By Abi Olvera | January 27, 2025
When you click a button in Microsoft Word, you likely know the exact outcome. That’s because each user action leads to a predetermined result through a path that developers carefully mapped out, line by line, in the program’s source code. The same goes for most of the software people have used until recently. But artificial intelligence systems, particularly the large language models that power the likes of ChatGPT and Claude, were built, and thus operate, in a fundamentally different way. Developers didn’t meticulously program these systems step by step. The models shaped themselves through complex learning processes, training on vast amounts of data to recognize patterns and generate responses.
When a user enters a prompt, chatbots powered by these models generate text by repeatedly predicting what the next word in a sentence should be, producing output that can feel remarkably human. Similarly, image-generation models like DALL-E and Midjourney create visuals after training on billions of image-text pairs, rather than by following explicit drawing instructions.
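To make that next-word mechanic concrete, the short sketch below asks an openly available model for the probabilities it assigns to possible next words. It is purely illustrative: the choice of the small GPT-2 model and the Hugging Face transformers library is this sketch’s assumption, not a description of how any commercial chatbot is built.

```python
# Illustrative sketch: next-word prediction with a small open model (GPT-2).
# Assumes PyTorch and the Hugging Face "transformers" library are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits            # a score for every word in the vocabulary
probs = torch.softmax(logits[0, -1], dim=-1)   # probabilities for the next token only

# Print the five continuations the model considers most likely.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")
```

A chatbot repeats this step over and over, sampling one word at a time, which is why no developer wrote out in advance exactly what it will say.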
The precise mechanics of how these chatbots or image models produce their next word, image, or idea remain somewhat mysterious, even to their creators. It’s a bit like watching a master chef cook by intuition: you can observe the ingredients and the result, but not exactly how the chef decided what to do. This challenge of understanding AI’s internal workings isn’t new. Transparency research has been a field in computer science for more than a decade, attempting to peer into the “black box” of neural networks (complex computing systems loosely inspired by the human brain) and other AI algorithms. While researchers in the field have explored various approaches, including “explainable AI” tools designed to help interpret AI decisions, these technical solutions haven’t proven very useful in practice.
This opacity has created an unprecedented power dynamic. First, at the most fundamental level, the tech companies building these AI systems don’t fully understand how their models work internally, a challenge inherent to the technology itself. Second, there is a distinct barrier to transparency: Developers aren’t making the data they use to train these systems available to those outside their organizations. Third, outside researchers who have the skills and knowledge to study these systems independently lack the resources and computing power to run their own experiments, even if they had access to the data. With generative AI rapidly reshaping society, from medical diagnoses to classroom teaching, academic and independent researchers are pursuing parallel investigations: They hope to crack open the AI “black box” to understand its decision-making, while rigorously studying how these systems affect the real world. Recent breakthroughs reveal that true transparency requires not just peering into AI’s inner workings, but reimagining how society should study, evaluate, and govern these systems.
Meaningful transparency. Grasping the fundamental internals of AI models is important because it could enable precise interventions when needed, just as targeted therapies revolutionized medicine by blocking exact biological pathways. “When people want to solve Parkinson’s, they know that understanding the mechanism allows them to target specific processes,” says Northeastern University assistant professor of computer science David Bau, who leads a research team working on mechanistic AI understanding—the study of how neural networks process information and make decisions. “We’re nowhere close to that with AI, but we are starting to see the mechanisms.”
But the quest to decode AI’s internal decision-making process, and the associated research field, is still evolving. “Interpretability research is difficult and messy,” Bau says. “It’s not that well understood, even what the questions are that we’re trying to answer.” Yet he remains optimistic about the progress being made. “Every month, a little bit more is known,” Bau notes, pointing to recent breakthroughs in understanding the behavior of some of the layers of computation these models carry out, and how even the order of the artificial “neurons,” or basic computational units, within an AI model can significantly affect its ability to make correct associations.
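To give a rough sense of what studying those layers involves in practice, the sketch below records the internal activations that one middle layer of a model produces while it reads a sentence. It again assumes the small open GPT-2 model and the Hugging Face transformers library, not the proprietary systems or tools Bau’s team works with; the recorded numbers are the kind of raw material interpretability researchers then try to connect to human-understandable concepts.

```python
# Illustrative sketch: capturing what one internal layer of GPT-2 computes.
# Assumes PyTorch and the Hugging Face "transformers" library are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

captured = {}

def save_activations(module, inputs, output):
    # The first element of the block's output holds its hidden states.
    captured["block_6"] = output[0].detach()

# Attach a hook to the seventh of GPT-2's twelve transformer blocks.
handle = model.transformer.h[6].register_forward_hook(save_activations)

inputs = tokenizer("The Eiffel Tower is in Paris.", return_tensors="pt")
with torch.no_grad():
    model(**inputs)
handle.remove()

# One activation vector per input token: the raw numbers researchers probe
# when they look for interpretable features inside a network.
print(captured["block_6"].shape)
```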
Yet as people’s daily interactions with large language models become more commonplace, the very conception of what constitutes meaningful transparency is also expanding. Some believe there isn’t a need to fully account for every gear turning in these machines. “What we really need to understand is how these large language models interact with the world,” says Sayash Kapoor, a Ph.D. candidate at Princeton University’s Center for Information Technology Policy who was included in TIME’s 100 Most Influential People in AI in 2023. Perhaps AI systems should not be viewed as purely engineering problems, with a focus solely on decoding how AI models generate output. Figuring out how language models operate in society is what’s crucial, Kapoor says.
“A mechanic might have an intimate understanding of how a car works,” he says, “but when it comes to regulating how cars operate on roads, what matters more are the observable patterns of human behavior and real-world interactions.”
This broader view of transparency is gaining traction in the research community. At Radboud University in the Netherlands, researchers have developed a transparency matrix that goes beyond technical interpretation, incorporating model information, training data details, and societal impact assessments. For Alex Hanna, director of research at the Distributed AI Research Institute (DAIR), the most crucial transparency issues aren’t technical at all. They’re about human and organizational decisions around how and when AI systems are deployed. “Without meaningful company transparency, we can’t even begin to understand how these systems impact the real world,” Hanna explains. While training data access is also important, she emphasizes that the fundamental problems around transparency are about organizational decisions, such as how or when to release AI systems despite their hallucinations or how to evaluate impacts and risks, not technical explanations.
Some researchers say AI companies’ decisions not to share data about their systems might serve strategic purposes. “The less they share about their training data, the more they can claim that their systems are magic,” says Emily Bender, a University of Washington linguistics professor. Bender advocates for a more rigorous, scientific approach to AI development, criticizing the tendency to market language models as universal problem-solvers. “Instead of capabilities, we should talk about functionalities,” she says. For example, while an AI system trained directly on medical data might be useful for diagnosis, relying on a general-purpose language model for the same task might not be effective. “If there’s enough narrative data in the language model, it might come up with something relevant, but that’s not what it’s purpose-built for,” Bender says, warning that these systems are not doing the same probabilistic thinking that doctors do when diagnosing patients. They might also inadvertently perpetuate medical racism because of biases in their training data. Instead, Bender advocates for “curated and evaluated datasets” specifically designed for each task. That can only work if there is sufficient transparency.
“Most datasets aren’t open, so we can’t assess if they’re working off what’s in the training data,” Hanna says. Access to training data, which is often not released, is essential for knowing whether, or how, large language models amount to more than the sum of their parts.
A striking paradox in AI development. The very labs that build AI systems champion transparency about AI risks while remaining surprisingly opaque about their own models and decision-making processes. Stephen Casper, an MIT computer science Ph.D. student and a researcher in the Algorithmic Alignment Group, where he studies the internal structures and processes of AI systems, says that labs’ legitimate concerns about the risks and safety of AI systems can divert attention from a bigger issue: how these companies themselves make important decisions. Leading AI companies win, he says, when the public is more worried about the opaqueness of AI systems’ internal workings than about the transparency of the labs’ operations, decisions, and direction. Despite devoting much of his research to understanding AI systems, Casper says institutional transparency may be more important, and more difficult, to tackle. The lack of corporate transparency in the labs’ research publications, for example, prevents outside researchers like himself from fully evaluating the labs’ chosen research directions.
“The resistance to transparency often stems from competitiveness concerns, rather than legitimate privacy issues,” Hanna says. While some labs, like Anthropic, have published more extensively than others, even these more open efforts fall short of academic standards, according to researchers.
With access to AI systems’ training data and methods, and the computing resources needed to study these systems independently, academic researchers could help fill this knowledge gap. But even “white-box” access, the ability to see and study a model’s internal parameters, which Casper notes is essential for meaningful research and evaluation, remains out of reach. Meta released Llama 3.1, a 405-billion-parameter open-source model, more than six months ago, yet academic researchers have been stuck. “The model is so big that it’s difficult for scientists to study it,” Bau says. “Running the model requires a computer cluster costing over $1 million, and it needs to be spread across 16 or 34 GPU devices.” He likens the challenge to studying the physiology of a horse while the horse is running on a racetrack rather than lying on an operating table. “We want to create infrastructure to allow study of models of this size.”
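The scale problem Bau describes comes down to simple arithmetic. The sketch below works through it under two illustrative assumptions that are this article’s, not Meta’s published requirements: that each of the 405 billion parameters is stored at 16-bit precision (two bytes), and that a high-end accelerator offers roughly 80 gigabytes of memory.

```python
# Back-of-the-envelope arithmetic for hosting a 405-billion-parameter model.
# Assumptions (illustrative only): 16-bit weights at 2 bytes per parameter,
# and roughly 80 GB of memory per high-end GPU.
import math

params = 405e9
bytes_per_param = 2
gpu_memory_gb = 80

weights_gb = params * bytes_per_param / 1e9
gpus_for_weights = math.ceil(weights_gb / gpu_memory_gb)

print(f"Weights alone: {weights_gb:,.0f} GB")                # about 810 GB
print(f"GPUs needed just to hold them: {gpus_for_weights}")  # about 11

# Real deployments also need room for activations and the attention cache,
# which is why clusters of 16 or more such devices are typical.
```

Spread across that many devices, plus the servers, networking, and power around them, the bill quickly reaches the million-dollar figure Bau cites.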
The National Deep Inference Fabric (NDIF), a research computing infrastructure launched six months ago thanks to a $9 million National Science Foundation grant to Northeastern University, aims to address these challenges. “We’ve been creating infrastructure and prototyping it and starting to do some initial science at this scale,” says Bau, the project’s lead principal investigator. “This type of investigation has been attempted a bit inside proprietary companies, but this is the first time we are trying to do it with academics, with public infrastructure.”
A more balanced future. The high cost of running and studying large language models has created an imbalance in AI research. While major tech companies possess the resources needed for comprehensive transparency research, they may lack incentives to pursue understanding over mere capability improvements.
These resource constraints highlight a deeper question about what drives transparency efforts in the first place. “People can have a variety of motivations for understanding internals. Some are motivated by transparency issues or fairness, others by capability development,” Bau says. “I’m motivated by transparency because we have responsibility for the systems we make.” This sense of responsibility—whether driven by scientific curiosity, ethical oversight, or technical development—underscores why the current resource and transparency imbalance is so concerning. Without meaningful access to both computational resources and institutional knowledge, the academic community cannot fulfill its crucial role in understanding and evaluating these increasingly powerful systems.
Today’s concentration of AI research in well-funded tech companies mirrors historical patterns of technological development, from Bell Labs’ early dominance of telecommunications to IBM’s control of early computing. Yet the most transformative breakthroughs often emerged when technologies became more democratized, as seen in how the personal computer revolution and open-source software propelled innovation.
As AI systems grow more complex and influential, this disparity threatens to undermine not just innovation but the scientific process itself. The challenge ahead requires addressing both technical and institutional barriers to transparency. Creating frameworks for meaningful transparency—ones that go beyond selective disclosure—will be essential for ensuring these technologies develop in ways that can be properly understood, evaluated, and aligned with the public interest. Just as the democratization of computing tools sparked waves of innovation, expanding access to AI systems and research infrastructure could unlock new pathways in both capability and safety.
The views expressed in this piece are the author’s and do not represent those of the US government.