
Craig Cook | 07 January 2026

The Public Sector Data Paradox. Why AI Fails Before It Starts


Every public sector leader has felt the pressure over the past 18 months. AI is everywhere: in ministerial briefings, strategy documents, vendor pitches, and board papers. Teams are experimenting with chatbots. Productivity tools like Copilot are rolling out across departments. Everybody wants to be seen as doing something with AI.

And yet, quietly, behind the headlines and the hype, a pattern keeps repeating.

  • AI pilots stall.
  • Chatbots impress nobody.
  • Automations work in the controlled environment of a demo but collapse when they hit a real service.
  • Teams become sceptical.
  • Budgets get diverted elsewhere.

And leadership is left wondering why government can’t seem to make AI stick.

AI isn’t failing because of the technology. It’s failing because the data isn’t ready.

Public sector organisations are running head-first into a structural problem that no amount of tooling can solve. They are both data-rich and data-poor at the same time.

They hold some of the most valuable information in the country, often far richer than anything the private sector has, yet most of it is locked inside legacy systems, inconsistent formats, buried documents, and knowledge that never leaves people’s heads.

Until this paradox is resolved, AI will always under-deliver. And no model, no platform, no AI vendor can fix that for you.

The public sector data paradox

On paper, public sector organisations have everything they need.

Council records going back decades. Crime stats and incident logs. Benefits histories. Planning applications. Operational workflows. Environmental and transport datasets. Case management information. Procurement and financial data. You name it, it exists somewhere.

But the problem isn’t quantity. It’s access, structure and trustworthiness.

Data lives in different formats across incompatible systems. Older records don’t match newer ones. Metadata is missing. Permissions are inconsistent. PDFs, PowerPoints, scanned documents and buried email attachments become the default place where knowledge ends up.

And a huge amount of operational reality exists only as tacit knowledge: the unwritten ‘how things actually work’ that never makes it into documentation.

So even though the public sector is swimming in information, very little of it is in a condition that an AI system can actually use. This is what makes organisations feel both overwhelmed and empty-handed at the same time.

Why AI cannot ‘fix’ poor or fragmented data

There’s a persistent myth that AI can ‘fix’ bad data. It can’t. It simply reflects, extends and amplifies whatever patterns it finds.

  • If the data is incomplete, AI will confidently fill the gaps – often with nonsense.
  • If the data is biased, AI will reinforce the bias.
  • If the data is inconsistent, AI will hallucinate its own version of consistency.
  • If the data is locked away, AI will fall back on generic knowledge and ignore your specific context.

And this creates a dangerous illusion. Leaders see AI doing clever things in demos or on general datasets and assume the same will work on their services.

But the moment the model meets real public sector information, real planning histories, real claims data, or real casework, it falls apart.

Not because the AI is bad but because the inputs are.

In the private sector, this can be annoying. In the public sector, it can be catastrophic.

The six attributes of AI-ready public sector data

In our guide, Feeding the Beast: Managing Data for Public Sector AI, we outline the six attributes of high-quality data: the factors that explain most AI failures before they even occur.

Good data is accurate, complete, consistent, timely, relevant and traceable. Miss one of those and the entire value chain breaks. And unfortunately, most public-sector data fails on several at once.

Accuracy. Does the data reflect reality? Inaccurate data cascades into incorrect predictions and unsafe decisions.

Completeness. Are all required fields present? Missing values distort eligibility, risk and service outcomes.

Consistency. Does the same variable have the same meaning everywhere? A mismatch as small as ‘DoB’ vs ‘Date of Birth’ is enough to break automations.

Timeliness. Is the data current? Out-of-date data makes AI outputs unreliable.

Relevance. Is the data useful for the decision? Noise increases bias and reduces precision.

Provenance. Can you prove where the data came from and how it changed? This is crucial for explainable AI and public audit requirements.

If any of these fail, the entire AI workflow becomes unstable.
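To make the consistency, completeness and timeliness checks above concrete, here is a minimal sketch of a validation pass. The field names, aliases and rules are hypothetical, chosen purely for illustration; a real estate would draw them from an agreed data dictionary.

```python
from datetime import date, timedelta

# Hypothetical aliases seen across legacy systems: many names, one meaning.
ALIASES = {"dob": "date_of_birth", "date of birth": "date_of_birth",
           "d.o.b.": "date_of_birth", "updated": "last_updated"}

REQUIRED = {"date_of_birth", "last_updated"}  # completeness rule (illustrative)
MAX_AGE = timedelta(days=365)                 # timeliness rule (illustrative)

def normalise(record: dict) -> dict:
    """Map inconsistent field names onto one canonical schema."""
    return {ALIASES.get(k.strip().lower(), k.strip().lower()): v
            for k, v in record.items()}

def quality_issues(record: dict, today: date) -> list:
    """Return the data-quality failures that would break an AI pipeline."""
    rec = normalise(record)
    issues = [f"missing field: {f}" for f in REQUIRED - rec.keys()]
    updated = rec.get("last_updated")
    if updated and today - updated > MAX_AGE:
        issues.append("stale: last_updated older than 12 months")
    return issues

# 'DoB' and 'Updated' are recognised as canonical fields; staleness is flagged.
print(quality_issues({"DoB": date(1980, 1, 1),
                      "Updated": date(2020, 5, 1)}, today=date(2026, 1, 7)))
# → ['stale: last_updated older than 12 months']
```

The point is not the code itself but the discipline: checks like these can run automatically on every record long before any model sees the data.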

None of this stops day-to-day work because people adapt. But AI cannot adapt. It can only ingest what it understands. And if your data estate looks like a timeline of every system government has ever commissioned, the model will be forced to fill the gaps itself.

The quality of the data determines the quality of the AI. It’s as simple as that.

Download the guide Feeding the Beast: Managing Data for Public Sector AI

SharePoint. A repository, not an AI knowledge base

This is a sensitive topic, but it has to be said plainly: SharePoint is not a knowledge system. It is a storage system.

It has served its purpose well. It keeps documents accessible and secure and every department depends on it. But if the goal is to make organisational knowledge usable by AI, SharePoint becomes a bottleneck, because it’s where information goes to rest, not where it goes to be transformed.

Think about the average workflow. Someone writes a report. Someone else writes a PowerPoint. Another creates a PDF. Everything ends up in SharePoint. Nobody tags anything. No structure is imposed. No automation extracts the useful content. And six months later, no model, no matter how advanced, can find the needle in the haystack.

This is why so many AI tools deployed in government end up sounding generic. They simply don’t have access to the rich internal context that teams rely on daily.

Until organisations stop treating SharePoint like a digital filing cabinet and start turning it into a source of structured, reusable knowledge, AI will always be starved of the inputs that make it valuable.
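A practical first step is simply measuring what the filing cabinet contains. The sketch below assumes the document store has been exported or synced to a local folder (the path is hypothetical); it builds an inventory by file type, which is usually enough to show leadership where knowledge actually lives and how little of it is machine-readable.

```python
from collections import Counter
from pathlib import Path

def inventory(root: str) -> Counter:
    """Count documents by extension across a folder tree.

    Assumes the document store (e.g. a SharePoint library) has been
    exported or synced locally. Returns e.g. Counter({'.pdf': 1200, ...}).
    """
    counts = Counter()
    for p in Path(root).rglob("*"):
        if p.is_file():
            # Normalise case so '.PDF' and '.pdf' count as one format.
            counts[p.suffix.lower() or "(no extension)"] += 1
    return counts
```

An inventory like this costs nothing to run and turns a vague sense of “everything is in SharePoint” into a concrete picture of formats, volumes and extraction effort.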

Tacit knowledge. The most critical and least visible data source

And there’s another blind spot. The most operationally important knowledge in government never touches a system at all. It exists in the conversations between caseworkers and in the unwritten rules about edge cases.

It’s the quiet ‘this is how we’ve always done it’ that shapes real decision-making, and the judgement calls made by experienced staff that no workflow document captures.

This tacit knowledge is gold. It is also invisible to every AI model unless you deliberately capture it.

This is why Communities of Practice are so important. They create the space for people to articulate their workflows, share what works, compare approaches and document the lived reality of public services.

That material then becomes training data, not in the machine learning sense, but in the sense that it gives AI systems the context they desperately lack.

Without this, even the cleanest datasets won’t tell the full story.

Why the paradox has persisted

No-one in government set out to create a fragmented, inconsistent, legacy-bound data estate. It emerged gradually over decades, shaped by budget cycles, outsourcing waves, siloed procurement, emergency fixes, incompatible systems and constant policy change.

The structures weren’t built for AI. They were built to keep critical services running.

And because operational demand is relentless (benefits need processing, planning needs approving, cases need triaging), data cleanup always falls to the bottom of the list. Nobody gets promoted for metadata. Nobody has time to standardise 20 years of history because everyone is firefighting.

AI changes that equation by forcing the issue: without reliable data, it simply cannot function safely or meaningfully. For the first time, data quality isn’t a nice-to-have, it’s a fundamental dependency for transformation.

That’s why breaking the paradox isn’t optional anymore. It’s foundational.

A practical, engineering-led way forward

The good news, and this often surprises leaders, is that you don’t need a giant transformation programme to get AI-ready. You just need the right sequencing coupled with the right engineering.

Start by understanding what data you already have, not by buying tools. Look for the information that actually drives decisions: the case histories, eligibility rules, complaints patterns, planning constraints and policy exceptions.

Then identify the quick wins. Metadata correction, automated extraction, digitising high-value archives, mapping how information moves through services and capturing operational knowledge from frontline teams.

Once you’ve done this, the data estate becomes dramatically easier to work with. AI models can be given meaningful, structured context. Agents can be trained safely. Decision pathways can be audited and provenance can be established.
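Establishing provenance can start very simply: attach a source and transformation history to every record as it moves through the pipeline. The sketch below is a minimal illustration, not a production design; the source and transform names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def with_provenance(record: dict, source: str, transform: str) -> dict:
    """Return a copy of the record with an audit-trail entry appended.

    Each entry records where the data came from, what was done to it,
    when, and a fingerprint of the content at that point, so any later
    AI output can be traced back to its inputs.
    """
    content = {k: v for k, v in record.items() if k != "_provenance"}
    entry = {
        "source": source,          # system the record came from (hypothetical)
        "transform": transform,    # what was done to it
        "at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(
            json.dumps(content, sort_keys=True, default=str).encode()
        ).hexdigest(),             # content fingerprint for audit
    }
    out = dict(record)
    out["_provenance"] = list(record.get("_provenance", [])) + [entry]
    return out
```

Even a lightweight trail like this is what makes explainability and public audit possible later, because every AI decision can be walked back to a specific record state.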

Only then does automation become viable. Not before.

This is the shift the public sector needs to make: from AI-first thinking to data-first AI.

If you want a deeper, practical breakdown of how to:

  • Clean and structure legacy data
  • Capture tacit knowledge
  • Turn SharePoint into a usable knowledge layer
  • Build provenance and auditability
  • Prepare for decision-making AI
  • Safely move from Copilot to custom AI agents

Download the guide Feeding the Beast: Managing Data for Public Sector AI