We’re Building AI On A Broken Foundation

The Public Records and Archives Administration Department sat just off a main road in Accra, Ghana, and it was the kind of building you could walk by without noticing if you didn’t know what was inside. The interior paint was fading, and the air was thick and hot, even in December. Afternoon light filtered in through the windows and settled over rows of boxes that had been removed from the basement. No machinery. Nothing humming or blinking. In fact, no screens or terminals at all. Just paper and time.

It was the semester after the fall of my junior year when, following a graduate-level historiography course, I traveled to West Africa to investigate the complex narrative of decolonization in the Gold Coast. I was working through a set of archived materials: internal memos, handwritten letters, and correspondence between Kwame Nkrumah, the first president of Ghana, and other world leaders.

Some documents were folded into themselves and others were clipped with fasteners that stained the margins of the sheet. The legibility varied: sharp and deliberate in some places, and faded in others. In a conversation with the Senior Records Officer, I learned there had been a decades-long effort to document the country’s history. But with too little time, funding, and staff, it had been difficult to catalog everything, let alone digitize it. In a sense I was charting my own system, moving slowly piece by piece, contextualizing history page by page. Then one letter stopped me.

Written in a tight, careful script, the letter was addressed to Kwame Nkrumah and sat at the intersection of Soviet statecraft and African decolonization. Its authorship by a USSR official linked it to a broader Cold War narrative of ideological outreach, one that I had encountered in an entirely different context. According to most accounts, the place of such exchanges in the historical record was fixed, having been cataloged, cited, and effectively settled. But this letter complicated that narrative, revealing hesitation and uncertainty. The problem was that this truth didn’t exist anywhere beyond this room. It was not digitized or searchable. It wasn’t even really connected to anything beyond the box it sat in.

Holding this document, I realized this piece and thousands of others weren’t hidden. They were invisible. This moment didn’t stay in the archive. It followed me throughout my travels to Senegal, The Gambia, and Côte d’Ivoire. It changed the way I thought about preserving knowledge.

The problem wasn’t simply that the document hadn’t been digitized. Even if it had been, it could not have been used in the way analysis requires. What’s missing are systems that preserve documents as evidence, maintaining their provenance, context, and internal distinctions. Without such systems the human record is diluted to text without context, mixed in with thousands of other documents, and summarized or paraphrased to the point where the original content is diminished.

This all leads to the present AI boom. Governments, universities, and tech companies are racing to upload documents, archives, and institutional knowledge into AI systems. Entire archives that once would’ve taken decades to go through are being processed in a month. Often, the goal is simply to make everything accessible, and thus searchable and usable. Though it’s a step in the right direction, this work relies on the assumption that the core problem today is absence: that certain communities, histories, and languages have been left out, and that the solution is to generate and collect more data so they can be rightfully included.

That assumption isn’t wrong, and I fully agree with the intention. But it is incomplete, because inclusion is not the same as understanding.

Today you can digitize an entire archive, feed it into an AI foundation model, and still have an incomplete picture of which claims came from which documents, what was inferred along the way, and where something went wrong within the proverbial AI black box. For example, a set of Cold War correspondence might include a Soviet memorandum referencing a meeting with Kwame Nkrumah. An AI model could correctly summarize the memo’s contents but fail to distinguish whether the claims reflect official Soviet policy, secondhand reporting, or internal speculation. It may also collapse draft language into final positions, or attribute to Nkrumah a statement that in fact originated with a Soviet intermediary, producing a clean narrative that obscures the document’s actual evidentiary weight. When this happens, we create confidence without grounding, because AI systems don’t preserve where something came from. They flatten sources, strip context, and produce answers that sound authoritative but can’t be traced.

For a historian, this is not just a limitation but a collapse of methodology itself. For us, sources aren’t interchangeable. A handwritten letter is not the same as a public statement. A private memo doesn’t carry the same meaning as a published report. A document written under pressure tells a different story than one written freely. AI systems collapse those distinctions into a single output and remove the friction that forces you to ask questions: Where did this come from? Who created it? Under what conditions? What might be missing? You get an answer that feels complete, but you lose the ability to verify it.

This problem deepens the moment documents are separated from where they came from. When a record is pulled out of its institutional context and absorbed into a training system, something critical disappears. When the document simply becomes data, its history of custody, interpretation, and correction does not always come with it. What remains is content without accountability. Historians are trained to interrogate sources. They look for origin, authorship, bias, and silence. They are taught that what isn’t said can matter just as much as what is, and that absence itself can be evidence. Archivists operate one layer deeper. They track provenance: the chain of custody that traces a document from its creation to its current location. They preserve context and understand that removing a document from that context doesn’t just relocate it; it changes what it means.

Most technologists aren’t trained to think this way. They are optimizing for speed, scale, and usability. They want systems that can retrieve, summarize, and generate information instantly. And those goals matter. But in the process, the structure of evidence (provenance, context, verifiability) often falls away. This is not a failure of intention but rather a difference in orientation. At a deeper level, it’s a difference in what the record is for. In many technical systems, historical material is treated as input: something to improve performance, refine outputs, or expand coverage. But for historians, archivists, and the communities connected to these materials, the record is not input. It is evidence, and it carries weight.

Right now, the people who understand this distinction are not the same people building the systems that will define how knowledge is processed in the future. The systems are being built, but the ethical standards are not. The absence of standards invites disastrous consequences, because making information accessible is not the same as making it trustworthy.

We assume that because something can be retrieved, summarized, or generated, it is grounded in truth. But accessibility without accountability doesn’t clarify knowledge; it replaces it with something thinner, more confident, and far less reliable. The risk we now face is not just that we get things wrong. It is that we lose the ability to tell what we have.

When I walked out of that archive in Ghana, nothing in the world outside suggested what was sitting just behind those walls. But I knew what I had seen. There was something real and fragile that didn’t quite announce itself or surface on its own, and frankly didn’t exist unless someone was looking for it.

But that document didn’t just matter because it existed. It mattered because it could be placed, questioned, and traced. The future of knowledge depends on whether our systems can do the same. Without that, the knowledge itself doesn’t disappear; it simply becomes invisible again.