
This year, between 80 and 90 percent of all enterprise data will be unstructured (IDC). Although most businesses rely on structured data for decision-making, analyzing unstructured data offers unique benefits: it reveals operational challenges, explains the “why” behind the numbers, gauges supplier and customer sentiment, and uncovers hidden risks and opportunities.
Why unstructured data matters
Unstructured data holds insights that structured systems miss: emerging operational bottlenecks, the “why” behind the numbers, true customer and supplier sentiment, and hidden contractual risks and opportunities.
Emerging operational bottlenecks
A team’s real-time operational challenges live in unstructured channels like Slack, Teams and email long before they show up in a formal report. Analyzing this data reveals the “small fires,” like a confusing new procurement step or a buggy internal tool, before they become massive, productivity-killing infernos.
The “why” behind the numbers
Structured analytics can show that a supplier’s on-time delivery metric has dropped. The unstructured email threads between that supplier and your procurement team can tell you why: a persistent invoice reconciliation issue. One is the symptom; the other is the diagnosis.
True customer & supplier sentiment
Structured data tells you what happened (e.g., a ticket was closed late), but unstructured data from emails and support chats tells you why and how the customer felt about it. It’s the difference between seeing a bad metric and understanding that a relationship is at risk.
Hidden contractual risk & opportunity
A contract’s structured data might show the renewal date and total value, but the unstructured text buried in the PDF contains the real risk: non-standard liability clauses, confusing payment terms, or auto-renewal clauses that the ERP can’t see. This is where companies unknowingly leak millions.
Ramifications of ignoring unstructured data
Ignoring unstructured data, rather than capturing and categorizing it, can lead to employee burnout, compliance and audit failures, revenue leakage, and strained relationships with partners and suppliers.
The “human swivel chair” & employee burnout
Enterprises spend millions on systems like Workday and Salesforce, but the connection between them is often a person in a swivel chair, manually reconciling data. Their most valuable experts become manual data integrators, spending their days bouncing between emails, spreadsheets and legacy systems on low-value, repetitive work. It’s a direct path to burnout: the company hires smart people to solve problems, and they end up just moving data. AI agents can replace that chair with a direct, intelligent data pipeline.
Pervasive compliance & audit failures
When an approval for a non-standard payment term or a change in a vendor’s banking information happens in an email thread, it’s not auditable. This “dark matter” of operational decisions introduces massive compliance, fraud and financial risks. When auditors ask why a payment was made, “I think someone approved it in an email” is a catastrophic answer.
Direct revenue leakage
This is the most immediate consequence. A prime example we see in healthcare: a request to add a new billable medical device comes from the operating room in an unstructured email. If that request is delayed or lost in the shuffle, every use of the device goes unbilled, often costing the hospital millions in leaked revenue per year.
Degraded supplier & partner relationships
When a supplier’s invoice is rejected because of a data mismatch, someone has to dig manually through three systems to resolve it, and payment is delayed. For one hospital client, that meant finding a stuck invoice in OnBase, searching Workday for the corresponding PO, and combing through emails for additional context. Consistently delayed payments erode trust and damage strategic partnerships, ultimately leading to worse pricing and service for your company.
How AI enables insights from unstructured data
Because it can quickly process complicated, disorganized data that was previously too difficult to handle, AI bridges the gap between messy sources, such as emails, PDF contracts and even images, and usable, structured insight.
AI understands intent – not just keywords
Traditional tools can search for a keyword like “invoice.” Modern AI, specifically large language models (LLMs), can read an entire email thread and understand the sender’s intent: “This person is frustrated because their invoice was rejected for the third time due to a PO mismatch, and they are threatening to put a hold on our account.” This leap in comprehension lets sender sentiment factor into how the workflow routes each request.
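To make the routing idea concrete, here is a minimal Python sketch. It assumes an LLM has already turned the thread into a small structured record; the `EmailAnalysis` fields, the queue names and the three-rejection threshold are all hypothetical, chosen only to show how intent and sentiment, rather than keywords, can drive the route.

```python
from dataclasses import dataclass


@dataclass
class EmailAnalysis:
    """Structured reading of one email thread, assumed to come from an LLM (hypothetical schema)."""
    intent: str            # e.g. "invoice_rejected", "status_inquiry"
    sentiment: str         # "positive", "neutral" or "negative"
    rejection_count: int   # how many times the invoice has been rejected
    threatens_hold: bool   # sender mentions putting a hold on the account


def route(analysis: EmailAnalysis) -> str:
    """Pick a queue from intent and sentiment (hypothetical queue names)."""
    if analysis.intent == "invoice_rejected":
        # Repeated rejections or a threatened account hold signal relationship risk.
        if analysis.threatens_hold or analysis.rejection_count >= 3:
            return "escalate_to_ap_manager"
        if analysis.sentiment == "negative":
            return "priority_ap_queue"
        return "standard_ap_queue"
    if analysis.intent == "status_inquiry":
        return "auto_reply_with_status"
    # Anything the model can't place confidently goes to a person.
    return "human_triage"


# The email described above: frustrated, third rejection, threatening a hold.
example = EmailAnalysis("invoice_rejected", "negative", 3, True)
print(route(example))  # escalate_to_ap_manager
```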
AI connects disparate information
AI’s real power is in its ability to act as a “cognitive bridge.” It can identify a PO number in an unstructured email, use it to look up the record in SAP, cross-reference the line items against a PDF of the original contract stored elsewhere, and present a synthesized summary to a human. It does the “swivel-chair” work of reconciliation in 30 seconds instead of one to two hours.
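A stripped-down version of that bridge might look like the sketch below. The dictionaries are hypothetical stand-ins for an SAP lookup and for prices already extracted from the contract PDF; a real integration would call those systems' own interfaces, which are not shown here. The regular expression is likewise an assumption about how your PO numbers are formatted.

```python
import re

# In-memory stand-ins for the systems of record (hypothetical data, not a real SAP API).
ERP_POS = {
    "PO-4471": {"vendor": "Acme Medical", "lines": {"GLOVE-M": 12.50, "MASK-N95": 1.10}},
}
# Contract prices assumed to have been extracted from the contract PDF already.
CONTRACT_PRICES = {
    "Acme Medical": {"GLOVE-M": 11.75, "MASK-N95": 1.10},
}

# Assumed PO format: "PO" optionally followed by a dash or space, then 4+ digits.
PO_PATTERN = re.compile(r"\bPO[-\s]?(\d{4,})\b", re.IGNORECASE)


def reconcile(email_body: str) -> str:
    """Find a PO number in free text, look it up, and flag price mismatches against the contract."""
    match = PO_PATTERN.search(email_body)
    if not match:
        return "No PO number found; route to a human."
    po_id = f"PO-{match.group(1)}"
    po = ERP_POS.get(po_id)
    if po is None:
        return f"{po_id} not found in ERP; route to a human."
    contract = CONTRACT_PRICES.get(po["vendor"], {})
    issues = []
    for sku, po_price in po["lines"].items():
        agreed = contract.get(sku)
        if agreed is None:
            issues.append(f"{sku}: not in contract")
        elif abs(po_price - agreed) > 0.005:  # tolerate float rounding below half a cent
            issues.append(f"{sku}: PO {po_price:.2f} vs contract {agreed:.2f}")
    status = "; ".join(issues) if issues else "all lines match contract"
    return f"{po_id} ({po['vendor']}): {status}"


print(reconcile("Hi, our invoice for PO 4471 was rejected again. Can you check?"))
# PO-4471 (Acme Medical): GLOVE-M: PO 12.50 vs contract 11.75
```

The design choice worth noting is the fallback: whenever the PO can't be found or matched, the sketch hands the case to a person instead of guessing.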
AI codifies and automates “tribal knowledge”
The rules for how to handle messy data often live only in the heads of a few senior analysts. AI enables this wisdom to be externalized. By observing how a company’s experts handle exceptions, an AI agent can learn the patterns and build a ruleset. It turns “what Sarah in purchasing knows” into a scalable, automated and retainable company asset.
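As a deliberately simple illustration of turning observed expert decisions into rules, the sketch below counts how experts resolved each type of exception and keeps a rule only when they agree often enough. This frequency-based approach is a stand-in of our own, not how any particular AI agent learns; the exception labels, resolutions and thresholds are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical log of how experts resolved past exceptions: (exception type, resolution).
observed = [
    ("price_mismatch_under_2pct", "approve_and_note"),
    ("price_mismatch_under_2pct", "approve_and_note"),
    ("price_mismatch_under_2pct", "approve_and_note"),
    ("missing_receipt", "request_receipt_from_warehouse"),
    ("missing_receipt", "request_receipt_from_warehouse"),
    ("missing_receipt", "escalate"),
    ("new_bank_details", "verify_by_phone"),
]


def induce_rules(log, min_support=3, min_agreement=0.8):
    """Keep a rule only when experts handled a case the same way often enough."""
    by_case = defaultdict(Counter)
    for case, resolution in log:
        by_case[case][resolution] += 1
    rules = {}
    for case, counts in by_case.items():
        resolution, n = counts.most_common(1)[0]  # most frequent expert decision
        total = sum(counts.values())
        # Require enough examples (support) and a clear majority (agreement).
        if total >= min_support and n / total >= min_agreement:
            rules[case] = resolution
    return rules


print(induce_rules(observed))  # {'price_mismatch_under_2pct': 'approve_and_note'}
```

Cases that don't meet the thresholds, like the mixed handling of missing receipts here, stay with the experts until the pattern is clear.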
Barriers and challenges in fixing unstructured data
The disorganization, variety of formats and sheer volume of unstructured data make it difficult for businesses to manage.
The scale & variety problem
In its Data Age 2025 report, IDC forecast that by 2025 “the global datasphere will grow to 163 zettabytes (that is a trillion gigabytes). That’s ten times the 16.1ZB of data generated in 2016.”
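For scale, here is how the units and the growth factor in that quote work out, assuming decimal (SI) prefixes. Each zettabyte is a trillion gigabytes:

```latex
1\,\text{ZB} = 10^{21}\,\text{bytes} = 10^{12}\,\text{GB},
\qquad
\frac{163\,\text{ZB}}{16.1\,\text{ZB}} \approx 10.1
```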
The sheer volume and diversity of unstructured data (PDFs, emails, spreadsheets, images, chats) is overwhelming. Traditional ETL (Extract, Transform, Load) tools are built for the clean, predictable world of databases and break when faced with this much chaos.
The context trap
A single unstructured data point is often useless without context from multiple other systems. A vendor’s email asking, “What’s the status of invoice #5678?” can’t be answered without looking up that invoice in the ERP, checking the PO in the procurement system and looking at the receiving status in the warehouse system. Humans provide this context manually, which is why it’s so slow.
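The lookup chain in that example can be sketched as follows. The three dictionaries are hypothetical placeholders for the ERP, procurement and warehouse systems, and the invoice data is invented for illustration; the point is the sequence of joins a person would otherwise perform by hand.

```python
import re
from typing import Optional

# Hypothetical lookups standing in for three separate systems of record.
ERP_INVOICES = {"5678": {"po": "PO-9012", "status": "on hold", "reason": "quantity mismatch"}}
PROCUREMENT_POS = {"PO-9012": {"ordered_qty": 100}}
WAREHOUSE_RECEIPTS = {"PO-9012": {"received_qty": 80}}


def answer_status_request(email_body: str) -> Optional[str]:
    """Assemble the cross-system context a human would otherwise gather manually."""
    match = re.search(r"invoice\s*#?(\d+)", email_body, re.IGNORECASE)
    if not match:
        return None  # nothing to look up; leave it for a human
    invoice_id = match.group(1)
    invoice = ERP_INVOICES.get(invoice_id)  # step 1: the ERP
    if invoice is None:
        return f"Invoice #{invoice_id} is not in the ERP."
    po = invoice["po"]
    ordered = PROCUREMENT_POS.get(po, {}).get("ordered_qty")       # step 2: procurement
    received = WAREHOUSE_RECEIPTS.get(po, {}).get("received_qty")  # step 3: warehouse
    return (f"Invoice #{invoice_id} is {invoice['status']} ({invoice['reason']}): "
            f"{po} ordered {ordered}, warehouse received {received}.")


print(answer_status_request("What's the status of invoice #5678?"))
# Invoice #5678 is on hold (quantity mismatch): PO-9012 ordered 100, warehouse received 80.
```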
The “last mile” automation gap
Existing automation platforms like RPA and iPaaS are excellent for connecting structured systems (API-to-API). However, they fail at the “first mile – last mile,” the messy start and end of the process where a human has to read an email to kick things off or interpret a vendor’s response to close the loop.
Fixing unstructured data is fundamentally a human workflow problem
Fixing unstructured data isn’t a database problem; it’s a process re-engineering problem. The data is messy because the underlying human workflows are messy and undocumented. Most companies don’t know where to start, because they’ve never mapped the complex, cross-functional journey of a single operational request.
How leaders can manage their unstructured data
Leaders must manage their unstructured data to drive business results, but they have to be deliberate in their approach.
Find the financial epicenter of the pain
Don’t try to boil the ocean. Find the one operational workflow where unstructured data is causing the most direct financial harm. Is it invoice reconciliation causing late payment fees? Is it contract lookups causing missed savings? Quantify that single pain point in dollars and make it the first target.
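One simple way to put that pain point in dollars, assuming the main costs are fees, missed savings and staff time (the variables are illustrative, not a standard formula): N is the number of affected requests per year, F the average late-payment fees per request, S the average missed savings per request, h the manual hours spent per request and r the fully loaded hourly cost of the people doing the work.

```latex
C_{\text{annual}} = N \times \left( F + S + h \times r \right)
```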
Map a single “swivel chair” workflow
Pick one high-impact, low-satisfaction process. Shadow the team that does it. Document every single system they have to touch and every manual step they take to resolve one request. This map is the blueprint for your automation strategy.
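A workflow map doesn't need special tooling; even a simple structured list makes the systems and manual effort visible. The sketch below models the invoice-exception example from the hospital client mentioned earlier. The final spreadsheet step and all of the minute counts are hypothetical placeholders you would replace with what you observe while shadowing the team.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    system: str
    manual_minutes: float  # observed while shadowing the team (placeholder values below)


# Hypothetical map of one invoice-exception request.
workflow = [
    Step("Find the stuck invoice", "OnBase", 10),
    Step("Look up the corresponding PO", "Workday", 15),
    Step("Search email for additional context", "Email", 30),
    Step("Record the resolution", "Spreadsheet", 5),
]

systems_touched = sorted({s.system for s in workflow})  # unique systems, alphabetized
total_minutes = sum(s.manual_minutes for s in workflow)

print(f"{len(workflow)} steps across {len(systems_touched)} systems: {', '.join(systems_touched)}")
print(f"Manual effort per request: {total_minutes:.0f} minutes")
```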
Empower your experts with human-in-the-loop AI
Don’t aim for 100% autonomy on day one. Implement a solution that automates 80% of the data gathering and reconciliation, then presents a clean, validated summary to your human expert for the final, critical decision. This builds trust, reduces risk and delivers immediate value.
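A minimal sketch of that division of labor, using hypothetical names: automation assembles a `Proposal` with a summary and the results of its validation checks, and nothing is approved until a human says so. The `approve` callback stands in for whatever review screen or approval step your team actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Proposal:
    """What the automation hands to a human reviewer (hypothetical structure)."""
    request_id: str
    summary: str  # clean, validated summary shown to the expert
    action: str   # the resolution the automation proposes
    checks_passed: List[str] = field(default_factory=list)
    checks_failed: List[str] = field(default_factory=list)


def process(proposal: Proposal, approve: Callable[[Proposal], bool]) -> str:
    """Automation gathers and validates; the human makes the final decision."""
    if proposal.checks_failed:
        # Failed validation is never offered for one-click approval;
        # it goes back to the expert for investigation.
        return f"{proposal.request_id}: needs investigation ({', '.join(proposal.checks_failed)})"
    if approve(proposal):
        return f"{proposal.request_id}: approved '{proposal.action}'"
    return f"{proposal.request_id}: rejected by reviewer"


ok = Proposal("REQ-1", "Invoice matches PO and goods receipt", "release payment",
              checks_passed=["po_match", "receipt_match"])
# In practice `approve` would show the summary to a person; this stub approves.
print(process(ok, approve=lambda p: True))  # REQ-1: approved 'release payment'
```

Keeping failed checks out of the approval path entirely is one conservative choice; a team could instead show them to the reviewer alongside the summary.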
Invest in a system of execution – not just a system of record
You already have systems that store data, like ERPs and data lakes. What you need is a platform that can act on that data. The future isn’t another dashboard; it’s an AI agent that can understand an unstructured request from an email, reconcile it against your systems of record and execute the resolution.
The bottom line
The sheer volume of unstructured data gives businesses a significant opportunity: to address operational challenges, understand the “why” behind their numbers, learn how suppliers and customers really feel about them, and identify and overcome risks while taking advantage of hidden opportunities.