Alexandre Bally
· 12 min read

The Hidden Cost of Unstructured Data in Your LLM Strategy

Most IT leaders are discovering that their organisations aren't as data-ready for AI as they thought. The gap isn't technology — it's governance. But what is that gap actually costing you?

AI-assisted content · Human-reviewed

Part One is a five-minute read. Part Two has the evidence for anyone who wants to dig deeper.


Part One: What's Going Wrong and What To Do About It

You've probably been here. The AI pilot went well. The board was impressed. Budget was approved. And now, a few months into real-world use, something feels off.

The outputs are almost right — but not quite. Your finance team rewrites the summaries before sending them anywhere. Your ops people spend more time double-checking AI-generated reports than it used to take to write them by hand. And nobody can explain why the invoice-matching automation keeps confidently assigning the wrong reference numbers.

The natural reaction is to blame the model. Upgrade it. Fine-tune it. Write better prompts. But in most cases, the model isn't the problem. The data underneath it is.

The real issue is messier than it sounds

Every organisation has its own internal language. Finance calls something a "transaction." Sales calls the same thing a "deal." Operations calls it a "job." These aren't just different words — they sit in different systems, in different formats, with different assumptions baked in.

This has always caused friction. Reports that don't match across departments. Dashboards that tell different stories depending on who built them. People learned to work around it. They'd pick up the phone, ask a colleague, use their judgement.

Large language models don't do any of that. They take whatever data you give them and process it with total confidence. If your data says two contradictory things about the same customer, the model picks one — or blends both — and delivers the result as if it's settled fact. No hesitation. No caveat. Just a clean, articulate, wrong answer.

And you're paying for every single one.

This is a budget problem, not just a quality problem

LLMs charge by the token. Roughly speaking, every word you send in and every word you get back costs compute. When the data going in is messy — inconsistent formats, duplicate entries, conflicting definitions — the model has to work harder. It pulls in more context to make sense of things. It takes more passes to generate something usable. And then someone has to review it anyway, because the output doesn't feel trustworthy.

Most organisations waste somewhere between 40% and 60% of their token spend just through how their data is structured before the model even starts doing the thinking. A mid-sized company running 50,000 AI queries a day with poorly formatted data could be burning through €100,000 to €200,000 a year in unnecessary compute. That's before anyone counts the hours spent checking and correcting the outputs.
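The arithmetic behind that range is easy to reproduce. The per-query token volume and blended rate below are illustrative assumptions, not vendor pricing; plug in your own numbers.

```python
# Back-of-envelope estimate of wasted token spend. All constants are
# illustrative assumptions, not real vendor pricing.
QUERIES_PER_DAY = 50_000       # from the scenario above
TOKENS_PER_QUERY = 2_000       # assumed average, input plus output
EUR_PER_1K_TOKENS = 0.01       # assumed blended rate across models
WASTE_FRACTION = 0.40          # lower bound of the 40-60% range

annual_tokens = QUERIES_PER_DAY * 365 * TOKENS_PER_QUERY
annual_spend_eur = annual_tokens / 1_000 * EUR_PER_1K_TOKENS
annual_waste_eur = annual_spend_eur * WASTE_FRACTION

print(f"Annual spend: EUR {annual_spend_eur:,.0f}")   # EUR 365,000
print(f"Wasted:       EUR {annual_waste_eur:,.0f}")   # EUR 146,000
```

At the upper end of the waste range (60%), the same assumptions put the figure at €219,000 — which is why even modest changes to the assumed rate or volume keep the result inside the €100,000–€200,000 band.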

Here's the uncomfortable part: while the price per token is dropping, overall AI spending is climbing fast — it roughly doubled in under a year. The unit cost is going down, but the total bill is going up. Data quality is the single biggest cost lever most companies aren't touching.

Make it concrete: the invoice problem

Your AI scans incoming invoices and matches them to payments. Straightforward enough. But your finance system calls it an "invoice number," your operations platform calls it a "reference ID," and your procurement tool uses a six-digit code that doesn't map to either.

The AI doesn't pause to ask which one is right. It picks the best match it can find and moves on — often matching the wrong invoice to the wrong payment. Multiply that across thousands of invoices a month.
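One common first step is a field-alias map that folds the competing labels into a single canonical name before any record reaches the model. A minimal sketch, assuming the field names from the example above (the procurement label is hypothetical):

```python
# Hypothetical field-alias map: normalise vendor-specific field names to
# one canonical "invoice_id" before anything reaches the model.
FIELD_ALIASES = {
    "invoice number": "invoice_id",   # finance system
    "reference id": "invoice_id",     # operations platform
    "proc code": "invoice_id",        # procurement tool (assumed name)
}

def normalise_record(record: dict) -> dict:
    """Rename known aliases to the canonical name; pass other fields through."""
    return {FIELD_ALIASES.get(k.strip().lower(), k): v for k, v in record.items()}

print(normalise_record({"Reference ID": "INV-4821", "amount": 120.0}))
# → {'invoice_id': 'INV-4821', 'amount': 120.0}
```

The point of the sketch is where the work happens: before the model, in plain data plumbing, not in the prompt.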

Surveys consistently show that invoices are the single biggest source of AI data errors, accounting for roughly a third of all document-processing mistakes. A single invoice error costs somewhere between €50 and €200 to sort out. One company found $42 million in duplicate billings in just twelve months — and that was with humans in the loop catching mistakes.

When AI inherits those same inconsistencies, it doesn't catch them. It scales them.

Why smarter prompts won't save you

There's always a temptation to fix this with cleverness on the AI side. Better prompts. Smarter retrieval. More guardrails. And these things genuinely help — sometimes a lot. Retrieval-Augmented Generation can cut hallucination rates significantly. Caching avoids redundant work. Routing simple queries to cheaper models saves money.
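Routing, for instance, can be as simple as a complexity heuristic. The sketch below is illustrative only: the model names and the word-count threshold are placeholder assumptions, not any provider's API.

```python
# Hypothetical model router: send short, simple queries to a cheaper tier.
# Model names and the word-count heuristic are illustrative assumptions;
# real routers typically use a classifier or a lightweight scoring model.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"

def route(query: str, max_cheap_words: int = 20) -> str:
    """Pick a model tier from a crude complexity proxy (word count)."""
    return CHEAP_MODEL if len(query.split()) <= max_cheap_words else STRONG_MODEL

print(route("What is our refund policy?"))  # → small-model
```

Useful, but note what the heuristic cannot see: whether the data behind the answer is consistent. That is the ceiling the next paragraphs describe.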

But the research is consistent: these are treatments for symptoms, not cures for the disease. When researchers tested the best available prompt techniques on models working with bad data, error rates dropped from roughly 66% to 44%. That's real progress — but the model was still wrong nearly half the time.

Every one of these technical fixes depends on the quality of the data underneath. RAG only helps if the knowledge base it's pulling from is accurate. Fine-tuning only works if the training data is consistent. Prompt engineering only sharpens outputs when the inputs are clean.

If you're looking for lasting improvement — not workarounds that buy time while costs quietly compound — the work has to start with the data.

Two paths forward

Which one you take depends on your resources and your tolerance for risk.

Governance first. Do the foundational work before you scale. Define the core concepts your business runs on — what is a customer, a transaction, an invoice, a product? Get every team using the same definitions. Clean and structure the data domains that matter most to your AI use cases. Then deploy with confidence.
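In practice, "the same definitions" usually takes the shape of a small machine-readable glossary that pipelines can validate against. A minimal sketch, with hypothetical concept names and fields:

```python
# Hypothetical business-glossary sketch: one canonical definition per core
# concept, checked before data is handed to any AI pipeline. Names, owners,
# and fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class ConceptDefinition:
    name: str
    owner: str                # team accountable for the definition
    required_fields: tuple    # canonical field names every system must map to

GLOSSARY = {
    "invoice": ConceptDefinition("invoice", "finance",
                                 ("invoice_id", "amount", "currency")),
    "customer": ConceptDefinition("customer", "sales",
                                  ("customer_id", "legal_name")),
}

def validate(concept: str, record: dict) -> list:
    """Return the canonical fields missing from a record."""
    spec = GLOSSARY[concept]
    return [f for f in spec.required_fields if f not in record]

print(validate("invoice", {"invoice_id": "INV-1", "amount": 9.5}))  # → ['currency']
```

Even a glossary this small changes the conversation: a failed validation is a named, ownable defect rather than a vague sense that "the AI got it wrong."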

This is the more cautious path. It's slower. But it avoids the cycle of expensive rework that catches so many organisations off guard.

Governance in parallel. If you've got the budget and the nerve, start deploying AI on carefully chosen use cases while building your data foundations at the same time. Pick areas where your data is already in good shape. Accept specific risks on specific projects. Learn fast and build the plumbing as you go.

The research shows that the companies getting the most value from AI tend to take this second path — they move fast, fail fast, and learn fast. But they do it deliberately. Running AI and governance side by side with a clear plan is one thing. Deploying AI and hoping to sort the data out later is something else entirely.

On that point, the numbers are sobering. AI project abandonment more than doubled between 2024 and 2025. Roughly 95% of enterprise GenAI pilots fail to deliver measurable impact. The common thread in nearly every failure is data.

For DACH organisations, the clock is also ticking. The EU AI Act becomes fully applicable in August 2026. It requires that data used in high-risk AI systems be relevant, representative, and — as far as possible — free of errors. Not as a recommendation. As law. With penalties reaching €35 million or 7% of worldwide turnover for the most serious violations.

Combined with the EU Data Act and GDPR, that's a triple compliance layer. Most DACH organisations aren't ready for it yet — which is a risk, but also a head start for those who move now.

What it comes down to

The conversation most leadership teams need to have isn't about how much to spend on AI. It's about how much they're already wasting.

Data governance doesn't demo well. Nobody gets excited about aligning field definitions across three ERP systems. But it's the difference between AI that compounds value and AI that compounds cost. The organisations that get this right — whether they do the governance work first or run it alongside deployment — will spend less, get better results, and avoid the rework cycle that's quietly draining budgets everywhere else.

Everyone else keeps paying the hidden cost.


Join the conversation

Have you seen the hidden cost of bad data in your own AI projects? I'd love to hear your experience — join the discussion on LinkedIn.


Part Two: The Evidence

Everything in Part One is grounded in specific research. This section lays out the data for anyone who wants to verify the claims, challenge the numbers, or take this to their board with sources attached.

The readiness gap

The scale of the disconnect between AI ambition and data readiness is well-documented. Gartner's Q3 2024 survey of data management leaders found that 63% of organisations either lack or are unsure whether they have the right practices for AI. Only 4% reported their data as fully prepared. [1]

An HBR Analytic Services survey from February 2026 sharpens that: 89% of leaders call data governance highly important for AI, but only 37% rate their own organisation as proficient. Just 15% consider their data "very ready" for the next wave of agentic AI. [2]

McKinsey's State of AI 2025 report tells the same story from the adoption side: 88% of organisations use AI in at least one function, but only 1% call themselves mature. Even among high performers, 70% report governance difficulties. [3]

What this looks like in the DACH region

The gap is particularly stark — and the opportunity particularly large — for German-speaking markets. A 2025 report by Dr. Justus & Partners found that 94% of Mittelstand firms have not implemented AI. [4] Roland Berger's "Data Imperative" study found 71% of European enterprises struggling to access reliable data, with only 25% calling their infrastructure GenAI-ready. [5] Cognizant's DACH-focused research confirmed that businesses in the region rate their data readiness fairly highly but score themselves poorly on compliance with their own internal frameworks — awareness without follow-through. [6]

Hallucinations and how bad data makes them worse

Vectara's Hallucination Leaderboard — the closest thing the industry has to a standard benchmark — shows popular LLMs fabricating information 2.5–8.5% of the time in basic summarisation tasks. [7] In specialist contexts, rates jump: a 2024 JMIR study found GPT-3.5 hallucinated nearly 40% of medical references. [8] A 2025 Mount Sinai study published in Nature planted deliberate fabrications in clinical cases and found that six leading LLMs repeated or elaborated on those errors up to 83% of the time. [9]

Hallucinations aren't purely a data quality issue โ€” model architecture and training incentives play a role too. But bad organisational data amplifies an inherent weakness. You're adding noise to a system that's already prone to confident guessing.

What the token waste actually looks like

Research from The New Stack puts the waste at 40–60% of token spend for most organisations, driven primarily by how data is formatted before it reaches the model. [10] GetCrux tested 10,000 questions and found CSV consumed 56% fewer tokens than JSON for identical tabular data — at enterprise scale, a single workload's optimisation saved roughly $1,740 per month. [11]
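The effect is easy to see even without a tokenizer: JSON repeats every key on every row, while CSV states the header once. The sketch below uses character count as a crude proxy for tokens; the exact ratio depends on the tokenizer, so treat the numbers as directional only.

```python
# Compare the size of the same table serialised as JSON records vs CSV.
# Character count is a crude proxy for token count; real ratios depend
# on the tokenizer in use.
import csv
import io
import json

# A small synthetic table: 100 invoice rows with two fields each.
rows = [{"invoice_id": f"INV-{i}", "amount": i * 10.0} for i in range(100)]

# JSON: every row repeats both keys.
as_json = json.dumps(rows)

# CSV: the header appears once, rows carry only values.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["invoice_id", "amount"])
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue()

print(f"JSON: {len(as_json)} chars, CSV: {len(as_csv)} chars")
```

Running this shows the CSV serialisation at a fraction of the JSON size — the same structural redundancy the GetCrux experiment measured in tokens.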

Poorly tuned retrieval architectures make it worse, inflating input tokens by 3–4× when they pull too many document chunks. [12] And in practice, over 30% of enterprise RAG queries turn out to be repetitive or near-identical, each one triggering the full processing chain from scratch. [13] Meanwhile, total model API spend doubled from $3.5 billion to $8.4 billion between late 2024 and mid-2025. [14]
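Catching those repeats can start with something as simple as an exact-match cache over normalised query text. Production systems typically use embedding similarity instead; this sketch only shows the principle, and `generate` is a stand-in for the full RAG chain.

```python
# Minimal exact-match query cache. Real deduplication usually compares
# embeddings for near-identical queries; this version only collapses
# case and whitespace, which is the simplest possible normalisation.
_cache: dict = {}

def answer(query: str, generate) -> str:
    """Return a cached answer when the normalised query was seen before."""
    key = " ".join(query.lower().split())   # collapse case and whitespace
    if key not in _cache:
        _cache[key] = generate(query)       # full RAG chain runs only on a miss
    return _cache[key]

# Demo with a stand-in for the expensive generation step.
calls = []
def fake_llm(q):
    calls.append(q)
    return f"answer to: {q}"

answer("What is our refund policy?", fake_llm)
answer("what is  our refund POLICY?", fake_llm)   # hits the cache
print(len(calls))  # → 1
```

Even this naive version would absorb a share of the 30%+ repeat traffic; the design question is how aggressive the normalisation (or similarity threshold) can be before distinct questions start colliding.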

The invoice numbers in detail

Parseur's January 2026 survey of 500 professionals found 88% reporting errors in AI-processed document data. Invoices topped the list at nearly 32%. [15] HabileData's research spells out the mechanism: when departments apply different definitions to the same fields, the contradictions multiply with every automated handoff. [16] Only 9% of accounts payable departments are fully automated [17], and OpenEnvoy flagged $42.1 million in duplicate billings across its customers in a single year. [18]

Where prompt engineering hits its ceiling

The Mount Sinai study is the clearest data point here: best-in-class prompt mitigation brought hallucination rates down from 65.9% to 44.2%. Progress, but still wrong nearly half the time when the underlying data was poor. [19] Researchers writing in Communications of the ACM concluded that hallucinations are baked into how current LLMs work and cannot be fully eliminated. [20]

The missing framework

Right now, the tooling market is fragmented. LLM observability platforms like Langfuse [21] track token spend but not data quality. Data quality tools like Monte Carlo [22] measure data health but not AI costs. FinOps frameworks [23] are catching up to generative AI but haven't integrated data quality yet. Nobody has built the bridge.

The failure and cost data

S&P Global data shows AI project abandonment jumping from 17% in 2024 to 42% in 2025. [24] MIT Sloan Management Review research argues that technical debt in AI compounds faster than in traditional software. [25] Google's seminal NIPS paper on machine learning systems identified data dependencies as a source of maintenance costs that grow over time rather than shrinking. [26]

On the cost side: IBM's January 2026 analysis found over 25% of organisations losing more than $5 million a year to poor data quality, with 7% exceeding $25 million. [27] Fivetran's 2024 survey โ€” which included German respondents โ€” found that AI models trained on bad data drove misinformed decisions costing an average of $406 million per organisation. [28]

The regulatory clock

The EU AI Act [29] becomes fully applicable on 2 August 2026. Article 10 [30] requires data for high-risk AI systems to be relevant, representative, and free of errors. The EU Data Act has applied since September 2025, but Bitkom found only 1% of German firms have fully implemented it. [31] DACH enterprises carry roughly 46% dark data — collected but never governed — which adds an estimated $900,000 to breach costs per incident. [32]


References

All sources were validated on 4 March 2026. Publication dates reflect original source material.

[1] Gartner — Lack of AI-Ready Data Puts AI Projects at Risk (Feb 2025) — 63% of organisations lack AI-ready data practices; projected 60% AI project abandonment by 2026.

[2] HBR Analytic Services / Reltio — Readiness for Agentic AI (Feb 2026) — Only 37% of leaders rate their governance as proficient; 15% data-ready for agentic AI.

[3] McKinsey — The State of AI 2025 (Mar 2025) — 88% AI adoption but only 1% maturity; 70% of high performers report governance difficulties.

[4] Dr. Justus & Partners — 94% of Mittelstand Without AI (2025) — Mittelstand AI implementation gap in Germany.

[5] Roland Berger — Gen AI × Data Management (PDF) (May 2025) — 71% difficulty accessing reliable data; only 25% infrastructure GenAI-ready.

[6] Cognizant — Gen AI Adoption in DACH (Oct 2024) — DACH self-assessed readiness vs actual governance compliance gap.

[7] Vectara — Hallucination Leaderboard (GitHub) (Ongoing) — Industry benchmark: 2.5–8.5% hallucination rate in summarisation tasks.

[8] JMIR — Hallucination Rates of ChatGPT and Bard (May 2024) — GPT-3.5 hallucinated 39.6% of medical references.

[9] Nature — Mount Sinai LLM Hallucination Study (2025) — Six LLMs repeated planted fabrications in up to 83% of clinical cases.

[10] The New Stack — Token-Efficient Data Prep (2025) — 40–60% waste from data serialisation inefficiency.

[11] GetCrux — CSV vs JSON Token Experiment (2025) — CSV consumed 56% fewer tokens than JSON; $1,740/month saving per workload.

[12] Silicon Data — LLM Cost Per Token (2026) — Poorly tuned RAG inflates tokens 3–4×.

[13] Towards Data Science — Zero-Waste Agentic RAG (2025) — 30%+ of enterprise RAG queries are repetitive.

[14] Pluralsight — Cutting LLM Costs (2025) — API spend doubled from $3.5B to $8.4B in under a year.

[15] Parseur — Data Confidence Gap (Jan 2026) — 88% of leaders find errors in AI document data; invoices #1 source.

[16] HabileData — Data Entry Errors (Jan 2025) — Cross-departmental definition inconsistency causing conflicting outputs.

[17] MHC Automation — Accounts Payable Issues (2024) — Only 9% of AP departments fully automated.

[18] OpenEnvoy — Invoice Errors (2024) — $42.1M in duplicate billings found in 12 months.

[19] Healthcare IT News — Mount Sinai LLM Study (2025) — Prompt mitigation reduced hallucinations from 65.9% to 44.2%.

[20] ACM — LLM Hallucinations: Bug or Feature? (2024) — Hallucinations inherent to current LLM architecture.

[21] Langfuse — Token and Cost Tracking (Ongoing) — LLM observability; tracks tokens but not upstream data quality.

[22] Monte Carlo — Data Quality Framework (2024) — Data quality measurement; no AI cost integration.

[23] Finout — FinOps for Generative AI (2025) — FinOps adapted for GenAI; lacks data quality layer.

[24] S&P Global — AI Implementation Paradox (2025) — AI abandonment jumped from 17% to 42% in one year.

[25] MIT Sloan — Tech Debt in the AI Era (2025) — AI technical debt compounds faster than traditional software debt.

[26] Google — Hidden Technical Debt in ML Systems (NIPS) (2015) — Data dependencies create compounding maintenance costs.

[27] IBM — Cost of Poor Data Quality (Jan 2026) — 25%+ of orgs lose over $5M/year from bad data; 7% exceed $25M.

[28] Fivetran — $406M in Losses from Poor Data (2024) — Average $406M revenue impact; German respondents included.

[29] European Commission — EU AI Act (Jun 2024) — Full applicability 2 August 2026; regulatory framework overview.

[30] EU AI Act — Article 10: Data and Data Governance (Jun 2024) — Data must be relevant, representative, and free of errors for high-risk AI.

[31] Bitkom — EU Data Act Readiness (2025) — Only 1% of German firms have fully implemented the EU Data Act.

[32] Data Stack Hub — Dark Data Statistics (2025) — 46% dark data in enterprises; $900K added breach cost.
