
The next iteration of AI could very well come from an unexpected place, one that few technologists and enterprises have considered: archival records. Archives hold billions of pages of credible, history-rich, humanly complex sources containing information that cannot be found anywhere else, including online.
Unlocking and digitizing analog archives could help usher in the next era of AI performance gains, but doing so requires new public-private partnerships that make this data accessible while keeping it protected. If those partnerships also make these rich historical records more available to the public, the upside is substantial.
AI platforms and data-sourcing companies have already scraped even the far edges of the internet for high-quality source material to improve their models, landing several of them in legal and reputational hot water. The open web is nearly exhausted as a training source, and its use is increasingly contested on legal and ownership grounds.
Moreover, early gains from data ingestion are beginning to taper off as models run into diminishing returns from existing training approaches (https://hbr.org/2026/02/why-ai-adoption-stalls-according-to-industry-data). As the next frontier of AI prioritizes intelligence, it’s time to reconsider how we help these models get there.
The case for archives
Tens of thousands of cultural heritage institutions across the country are collectively sitting on a trove of provenance-rich, credible original source material that is still waiting to be digitized.
Yet archives remain one of the most underutilized inputs for AI, even though they offer a unique value that scraped web data lacks and synthetic data simply cannot replicate: unparalleled evidential context, chronology, and institutional credibility. That value comes from historical collections that have prioritized provenance and original order to preserve authenticity and avoid subjectivity. What stands in the way is not what exists in these long-unseen collections but how much of it has been digitized, and the answer is barely a sliver.
The biggest obstacle to accessing archives, public or commercial, has always been the expertise, labor, and cost required to skillfully analyze and then digitize analog records so they are searchable and usable at scale. The process demands more than technical conversion; it requires the deep contextual understanding of an archival expert, which further raises the value and credibility of these records for AI systems that rely on high-quality, well-understood data.
The private sector opportunity
With AI reaching critical mass as model developers and enterprises search for higher-quality source material, it may be the first market force strong enough to make archival backlogs economically addressable. This opens the door to an unlikely but mutually beneficial relationship: the private sector gains deeply contextual material to improve AI, while archives harness private demand to fund long-delayed public access.
As it currently stands, even the National Archives’ massive collection may remain unseen for decades as the understaffed agency faces slim budgets and a longstanding digitization backlog. Private capital could fill these gaps. Archives at the federal, state, and local levels have been chronically underfunded for decades, and outside investment could hold the key to financing the processing and digitization efforts that public budgets alone have consistently failed to sustain.
Private sector involvement can also spur innovation, creating an opening for new tools that dramatically reduce the cost and time required to process analog records. Right now, it typically takes more than 10 hours to process a single linear foot of archival material, and three times as long for disorganized collections. If archival processing can be done faster and at lower cost, institutions could finally address their backlogs and digitize more of their valuable collections, making them discoverable not only to researchers, educators, and the general public but also to responsible tech and AI companies.
The information within archives cannot be replaced or replicated by synthetic data, and it exists on an almost unfathomable scale. It is firsthand evidence about our world that exists nowhere else, yet it could help AI models better understand economies, governments, systemic issues, and societies, all concepts that users increasingly rely on AI to explain. It simply has to be taken out of the box, and there is no shortcut to doing that correctly.
But the terms and conditions matter
The concern is not that private companies will want access to what’s in these archives; they will. It’s that commercial demand could begin to shape public access at archival institutions. What gets digitized, described, indexed, and surfaced is what becomes the historical record. If those decisions are driven exclusively by market value, records that hold essential truths and stories may never become available, and we risk distorting public memory. In a balanced world, private capital supports the processing of these collections while institutions retain authority over them.
We’ve seen what happens when this isn’t the case. Ancestry.com partnered with a Pennsylvania state agency to digitize 45 terabytes of historical records to further improve its platform. That work likely wouldn’t have been economically feasible without private sector resources. But the deal drew backlash over concerns that otherwise public records were now trapped behind a paywall. Similar concerns have been raised about digitization agreements between the National Archives and its partners.
The path forward hinges on intentional collaboration
This may be one of the few moments when backlogs can be addressed at this scale, thanks to the unrelenting market power of AI. But for commercial partners to benefit from these valuable public materials, their interests have to be balanced against the public’s right to access its own history.
Supporting this work means funding humanities programs, investing in digitization, and implementing tools that allow archival material to be systematically processed and integrated while respecting ethical stewardship and archival tenets. Archival institutions and professionals will also need to approach innovative partnerships with an open mind.
Enterprises that embrace this approach will earn a competitive advantage and see AI outputs that are more reliable, grounded in authentic human context, and capable of informing strategic business decisions, while archives gain a sustainable source of funding. The result is a clear win-win for private enterprise and society.


