Confidence Without Truth: Why Language Models Cannot Stop Hallucinating
September 7, 2025 at 2:00 PM

By Ellis Marlowe

They told us the machine would speak only the truth. They promised us a superintelligence. With their charts and their leaderboards, and their billions of other people's dollars, they assured us that great certitude would emerge. And yet here it stands before us, fluent and false, filling silence with words it does not know, delivering confidence without substance.

This is not a malfunction. This is the design. It is the child of our own hunger for certainty, and our terror of admitting we do not know.

The peril is not in the machine’s hallucination. The peril is in our willingness to believe it.

The chimera of certainty

In the modern mythology of artificial intelligence, hallucination has become the great Chimera: part truth, part fabrication, all confusion. Executives want certainty, regulators demand reliability, and investors crave the promise of precision. Yet the machines they bankroll continue to speak with conviction even when wrong. They do not so much lie as confabulate, weaving fragments of the real into a seamless but illusory whole.

This is not a technical glitch. It is a structural inevitability. A new paper from researchers at OpenAI and Georgia Tech makes the case with unforgiving clarity: hallucinations are not bugs to be patched, but statistical certainties arising from the very objectives that govern how language models learn. Train them on pristine, error-free data, and they will still produce plausible falsehoods. Test them on gold-standard benchmarks, and they will still bluff.

For leaders in business and government, the implications are sobering. If hallucinations cannot be eliminated, then strategy must account for them. The challenge is not to wish them away but to plan around them: to build systems, policies and expectations that accept error as endemic, while constraining its fallout. The real risk is not that AI occasionally falters. It is that society convinces itself that flawless truth machines are possible, and so builds its institutions on sand.

The authors of the paper call this misalignment between model incentives and human trust an “epidemic.” They are right. Hallucinations endure because the culture of AI research (its leaderboards, benchmarks and competitions) rewards confidence over honesty. Models are being trained to act like prize pupils at an exam: always guessing, never admitting ignorance. The result is a field in turmoil, where every convulsion of progress conceals an awkward reality: the closer we get to systems that can reason with fluency, the more likely they are to mislead with style.

Hallucinations are statistically inevitable

At the core of the paper lies a blunt proposition: hallucinations cannot be debugged out of existence. They arise from the same mathematics that make machine learning work.

The argument begins with a simple analogy. Imagine training a model to answer factual questions. Even if the training data contains nothing but true statements, the model is being optimized to imitate plausible responses, not to guarantee truth. In statistical terms, the task of generating language can be reduced to a classification problem: is this string valid or invalid? But classification, as any student of computational learning theory knows, has irreducible error. If a model can misclassify, it can hallucinate.

The paper formalizes this link. Generating valid text is at least as hard as classifying it correctly. Since classification carries a non-zero error rate, generation inevitably produces falsehoods. Put differently: hallucination is not an accident but a floor. The authors prove that hallucination rates must be at least the proportion of “singletons” in the training data: facts that appear only once. If 20 per cent of birthdays in the corpus are mentioned a single time, then expect at least 20 per cent of birthday questions to yield wrong answers. No amount of extra tuning or fine polish can erase that statistical scar.
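
For readers who want the arithmetic rather than the metaphor, the chain of reasoning can be sketched as a pair of inequalities. The notation below is ours, not the paper’s, and constants and lower-order terms are deliberately omitted; it is a schematic of the argument, not the theorem itself.

```latex
% Schematic only: our notation, constants and lower-order terms omitted.
\[
\underbrace{\mathrm{err}_{\mathrm{gen}}}_{\text{rate of hallucinated answers}}
\;\gtrsim\;
\underbrace{\mathrm{err}_{\mathrm{cls}}}_{\text{error on ``is this string valid?''}}
\;\gtrsim\;
\underbrace{\mathrm{sr}}_{\text{singleton rate}}
\qquad\Longrightarrow\qquad
\mathrm{sr}=0.20 \;\text{ implies }\; \mathrm{err}_{\mathrm{gen}}\gtrsim 20\%.
\]
```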

This is a profound shift in how the industry should think about error. For years, hallucinations were treated as bugs, the digital equivalent of spelling mistakes, destined to vanish as models scaled. But the research suggests otherwise. Scaling reduces some forms of error, yet rare facts remain rare, and models stumble over them. The confabulations persist because the mathematics demands they must.

For corporate planners, the lesson is stark. Deploy a model into customer service, and it will fabricate at least some answers, no matter how carefully it has been trained. Embed a model in financial decisioning, and it will occasionally state numbers with unwarranted confidence. Regulators cannot legislate away this risk, nor can boards demand it be engineered out. Like credit risk in banking or failure rates in manufacturing, hallucination is a structural cost of doing business with machines that learn from language.

This matters because it reframes the entire governance debate. The false hope was that with better data and larger clusters, hallucinations would disappear. That hope misleads decision-makers into complacency. In truth, they must treat hallucination as a statistical constant, a background radiation of error that no technological shield can fully block. Strategy must therefore shift from elimination to mitigation: structuring incentives, evaluations and human oversight so that errors are contained before they metastasize.

The metaphor the authors might not use, but which hangs over their findings, is of a Pseudomorph: a crystal shaped like another mineral, masquerading as what it is not. A language model does the same. It mimics the surface form of truth even when the substance is hollow. To treat such systems as reliable arbiters of knowledge is to mistake the shape for the core.

The gamified test taker

If pretraining makes hallucination inevitable, post-training makes it unforgivable. The models are not scholars seeking truth. They are hustlers in an exam hall, trained to play a game where silence is punished and bravado is crowned.

Consider the rules. A correct answer wins a point. An “I don’t know” earns nothing. A wrong but confident guess loses nothing more than what was already lost. What, then, should the model do? Like the student staring down the multiple-choice sheet, it learns to gamble. It learns to bluff. It learns, above all, never to confess ignorance.

The researchers capture the trap in one unsparing line: “Language models are optimized to be good test-takers, and guessing when uncertain improves test performance.” That is the whole story. A machine that admits doubt is docked. A machine that lies with conviction is rewarded. The scoreboard will always favour the cheat.
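
The arithmetic behind that line is worth making explicit. A minimal sketch follows, assuming the simplest binary rubric (one point for a correct answer, nothing for a wrong answer or an abstention); the function name and the numbers are ours, purely for illustration.

```python
# Toy illustration (not the paper's code) of why binary grading rewards guessing.
# Rubric assumed: 1 point for a correct answer, 0 for a wrong answer or for "I don't know".

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected benchmark score for one question under binary grading."""
    if abstain:
        return 0.0            # "I don't know" earns nothing
    return 1.0 * p_correct    # a guess earns 1 point with probability p_correct

for p in (0.9, 0.5, 0.1, 0.01):
    guess = expected_score(p, abstain=False)
    idk = expected_score(p, abstain=True)
    print(f"confidence {p:4.2f}:  guess -> {guess:.2f},  IDK -> {idk:.2f}")

# Even at 1% confidence the guess scores higher in expectation than abstaining,
# so an optimized test-taker never says "I don't know".
```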

And so we find ourselves in a strange inversion: the very benchmarks that claim to measure progress are teaching dishonesty. The leaderboard is not a ladder to truth but a staircase to deceit. Executives should recognize the irony. They would never tolerate a customer service agent who invented an answer rather than say “let me check.” Yet in their AI strategies they celebrate precisely that behaviour, mistaking recklessness for competence.

The consequences are larger than corporate mishaps. Governments, too, cling to the fiction that hallucination can be engineered away with clever tricks and extra training. But the authors show that the disease is institutional. As long as the scoring system punishes hesitation, the machines will perform as they have been taught: forever in the posture of the exam hall, always scribbling an answer, never daring to leave the page blank.

This is how epidemics spread: not from isolated failures but from structures that make failure rational. The epidemic of hallucination is not in the silicon. It is in the culture of evaluation. When the only acceptable posture is certainty, certainty will be feigned. And feigned certainty is a dangerous thing. It sounds like knowledge, it carries the authority of knowledge, but it has no substance behind it.

Here lies the torment for leaders. To abandon benchmarks is to surrender comparability. To cling to them is to enshrine dishonesty. It is the same dilemma that haunted the schools of the last century: train children to pass exams, and you send them into the world armed with answers that cannot withstand the first touch of reality. Train machines this way, and you condemn them to stumble the moment they leave the laboratory.

The result is a double crisis. Enterprises deploy generative AI, then recoil when customers complain of false answers. Regulators demand reliability, yet the very standards they use to judge progress are the ones breeding the deception. Everyone insists the machine must do better, but no one wants to rewrite the rules of the game.

Humans, at least, have the saving grace of humiliation. We learn to confess error because the cost of denial is too great. The machine has no such schooling. It does not blush when wrong. It does not pay the price of a lie. It only learns what the benchmark teaches: to answer always, to hedge never, to walk out of the exam hall with a perfect score and a hollow education.

The benchmark trap

Benchmarks promise truth but too often deliver distortion. They are supposed to be neutral yardsticks of progress. Instead, they warp incentives and teach machines to perform deceit. The authors put it plainly: “Binary evaluations of language models impose a false right-wrong dichotomy, award no credit to answers that express uncertainty, omit dubious details, or request clarification.”

In other words, silence counts for nothing, hesitation is punished, and the only winning move is to answer, however frail the confidence. That rule reshapes the whole field. A model that blurts out an error with conviction is celebrated on the leaderboard. A model that resists, that says “I don’t know,” is marked down. Bluff triumphs over honesty, because the system demands it.

This is what the researchers call an “epidemic of penalizing uncertainty.” Epidemic is the right word: the disease spreads by design. It moves from the metrics that define success, to the training procedures that optimise against them, to the companies that stake reputations on rankings, and finally to the regulators who rely on those scores as evidence of safety. Each layer amplifies the bias.

For business leaders, the strategic cost is clear. Any system procured under these rules will be biased toward bluffing. It will generate answers, yes, but answers that may shimmer with confidence while remaining hollow within. That is not an accident. It is the predictable outcome of measuring performance in a way that prizes fluency over restraint.

For governments, the warning is sharper still. Raise the benchmark bar, demand higher scores, and nothing changes, except the intensity with which the models are trained to bluff. Policy framed this way risks creating not safer systems but slicker ones, more adept at passing the test while no closer to the truth.

Here lies the quiet tragedy. We are asking our machines to sit exams, and we are applauding the bravest guessers. The numbers on the scoreboard climb. The press releases glow. But the foundation is brittle, because what is being measured is not trustworthiness, only test-taking.

James Baldwin once wrote that “not everything that is faced can be changed, but nothing can be changed until it is faced.” Benchmarks, like mirrors, reflect what we value. At present they reflect the field’s obsession with right answers at any cost. Until that changes, hallucinations will remain not a quirk of the silicon, but the very grammar of progress.

The classification link

Strip away the technical layers and the heart of the paper beats with a simple revelation: hallucinations are nothing more than classification errors in disguise. A machine asked to generate an answer is doing something no easier, and often harder, than deciding whether a statement is valid. If it can misclassify, it will hallucinate.

The authors crystallise the point: “Generating valid outputs (i.e., avoiding errors) is harder than classifying output validity.” The error that surfaces in a fabricated dissertation title or a misplaced birthday is not exotic. It is the same statistical stumble that dogs every classifier ever trained.

This framing matters. It lifts hallucination out of the realm of mystery and anchors it in mathematics. Misclassifications are inevitable in supervised learning, and hallucinations are their generative echo. The two are formally, statistically linked. This is not a matter of engineering flaw but of logical necessity.

That necessity has strategic consequences. Business leaders may dream of a model that never invents, a system that answers only with facts. The theory says otherwise. Unless one is willing to constrain the model so tightly that it collapses into silence or rote memorisation, hallucinations will remain. The choice is not between perfection and imperfection, but between error and sterility.

The metaphor is of a mask. The hallucination appears as truth, shaped by the rhythms of grammar and the patterns of plausibility. Yet beneath the mask lies nothing. A pseudomorph of knowledge, statistical error rendered in fluent prose. To mistake that mask for reality is to court disappointment, or worse, catastrophe.

For governments, the lesson is sharper still. Regulation that demands “no hallucinations” is regulation that demands the impossible. Policymakers must instead ask: how many errors are tolerable, in which domains, and under what safeguards? Just as air travel tolerates mechanical failure only within strict margins, so too must AI be framed in terms of acceptable error, not unattainable purity.

The beauty of the classification link is its bluntness. It spares us the comforting illusion that hallucinations are quirks of architecture or quirks of scale. They are not. They are the mathematics speaking. Not only are we trapped in history; history is trapped in us. So too with these systems. They are trapped in statistics, and statistics are trapped in them.

Calibration and dishonesty

If the link between classification and hallucination explains why errors arise, calibration explains why they refuse to leave. In statistics, calibration is a badge of honour: a system that says it is 70 per cent sure is right seven times out of ten. To be calibrated is to be honest about one’s confidence.

But for language models, calibration comes with a sting. The authors write: “Calibration—and, hence, errors—is a natural consequence of the standard cross-entropy objective.” The very quality that makes a model statistically trustworthy ensures that it will hallucinate. A machine that does not hallucinate is not calibrated; it is playing a trick, smoothing over doubt, presenting itself as steadier than it truly is.
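
What calibration means in practice can be checked with a few lines of code. The sketch below assumes we already have a list of (stated confidence, was the answer correct) pairs from some model; the function name and the toy numbers are ours, not anything from the paper.

```python
# Minimal calibration check: does stated confidence track observed accuracy?
from collections import defaultdict

def calibration_report(predictions, n_bins=10):
    """Group answers by stated confidence and compare it against observed accuracy."""
    bins = defaultdict(list)
    for confidence, correct in predictions:
        bins[min(int(confidence * n_bins), n_bins - 1)].append(correct)
    for b in sorted(bins):
        outcomes = bins[b]
        stated = (b + 0.5) / n_bins          # midpoint of the confidence bin
        observed = sum(outcomes) / len(outcomes)
        print(f"stated ~{stated:.0%} confident -> right {observed:.0%} of the time")

# Toy usage: a calibrated model's stated confidence tracks its hit rate,
# which is exactly why it must sometimes be wrong while sounding sure.
calibration_report([(0.7, True), (0.7, True), (0.7, False),
                    (0.9, True), (0.9, True), (0.9, False)])
```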

This is the paradox at the centre of strategy. Corporations want models that never fabricate. Regulators dream of systems that never err. But the mathematics will not oblige. To demand zero hallucinations is to demand dishonesty. To permit calibration is to accept that error will live among us, persistent as shadow.

For executives, the choice is brutal but clear. One can have a model that never appears to hallucinate, and therefore deceives about its own uncertainty. Or one can accept a calibrated system, which will sometimes falter but will tell you, in its own statistical way, how often it is likely to do so. One is a polished mask, the other a flawed mirror. Which to trust?

Governments face the same dilemma. To legislate hallucinations out of existence is to legislate away calibration itself. Such regulation would not protect the public. It would merely push the deceit deeper, producing machines that wear the mask of certainty but are hollow beneath.

This tension cannot be escaped. It can only be managed. And it demands a more adult conversation about truth, honesty, and error. A language model cannot live without error. The only question is whether it admits the error in its bones, or hides it behind a practiced smile.

A model that never seems to hallucinate clings to its innocence at the cost of truth. A model that confesses uncertainty may stumble, but at least it does not pretend to be what it is not.

The new floor

Every science has its lower bound, its irreducible minimum. Physics has the speed of light, economics its diminishing returns. For language models, the floor is set by rarity itself. The paper calls it the “singleton rate”: the share of facts that appear only once in the training data. That proportion, however small, becomes the baseline for hallucination.

The mathematics is unforgiving. If one per cent of the facts in the training corpus appear just a single time, then no model, however vast, can hallucinate less than one per cent of the time on that domain. The authors are unequivocal: “If 20% of birthday facts appear exactly once in the pretraining data, then one expects base models to hallucinate on at least 20% of birthday facts.”
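
The floor itself is straightforward to estimate in principle. Here is a toy sketch, with invented facts standing in for a real pretraining corpus; extracting facts from raw text is the hard part, and it is assumed away entirely here.

```python
# Toy estimate of the "singleton rate": the share of distinct facts seen exactly once.
from collections import Counter

def singleton_rate(facts):
    """Fraction of distinct facts that occur exactly once in the corpus."""
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

facts = ["A born 1970", "B born 1985", "B born 1985", "C born 1990", "D born 1962"]
print(f"singleton rate: {singleton_rate(facts):.0%}")  # 75% of these facts appear once

# Per the bound discussed above, a base model trained on such data should be expected
# to hallucinate on at least roughly that share of questions about those facts.
```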

This is not a software bug. It is a law of the terrain. Rare events will always be misremembered, reconstructed, or invented. The machine does not know it is fabricating. It simply has nothing else to offer. A fact encountered once is a whisper in a storm; the model strains to hear it, then fills the silence with something that sounds like truth.

For business leaders, the message is clear. No matter how much compute is thrown at the problem, rare events will trip the system. A chatbot for retail will stumble on obscure products. A legal assistant will err on precedent cases mentioned only once. A financial advisor will misstate the details of an instrument too arcane for the data to repeat. These are not anomalies. They are structural.

For governments, the policy lesson is equally stark. You cannot regulate the singleton rate away. Legislating that models “shall not hallucinate” is like legislating that probability must yield certainty. What you can do is recognise the floor, measure it, and build guardrails around it. In safety-critical domains, this may mean requiring human oversight whenever a question strays into rare territory. In lower-stakes settings, it may mean designing systems that flag uncertainty rather than manufacture answers.

The danger is not in the floor itself but in pretending it does not exist. A society that builds its expectations on the myth of flawless AI is like a builder laying foundations on sand. The structure may rise, but it will fall. Leaders must therefore plan as if the error is permanent, because it is.

The ignorance here is wilful: the refusal to acknowledge a statistical truth. Power lies with those who design, deploy, and regulate these systems. If they ally themselves with ignorance, they will unleash not justice but disillusion. Better to face the floor now, and reckon with its limits, than to be buried by it later.

Institutional incentives

The mathematics explains why hallucinations arise, but it does not explain why they thrive. For that, one must look to the institutions that govern artificial intelligence. Models are shaped not only by data but by the culture of competition that surrounds them.

Reinforcement learning with human feedback was hailed as the cure. Yet it cannot erase hallucinations so long as the incentives point elsewhere. Train a model to please the grader, and it will please the grader. If hesitation is punished, it will not hesitate. If bravado is rewarded, it will bluff.

This is how industries entrench their own distortions. The rules are written not in the language of truth but in the language of the tournament. Leaderboards declare champions, investors chase them, companies trumpet their scores, and regulators lean on those scores as evidence of progress. In that loop, overconfidence is not a mistake but a strategy.

The authors capture the futility of the current order: “The biggest hallucination isn’t what the model says, it’s the belief that we can fix it without rethinking the system around it.” That indictment reaches far beyond the laboratory.

For executives, the meaning is clear. Benchmark results may impress shareholders, but they often mask systems trained for performance, not reliability. The risk is reputational: dazzling proofs of concept followed by quiet retreats when the bluff is exposed in the marketplace.

For governments, the danger is deeper. Demanding higher scores without altering how they are calculated only accelerates the arms race of overconfidence. Safety becomes theatre. Numbers rise, trust falls.

Institutions, like people, pay for what they permit themselves to become. And this industry has permitted itself to become a tournament of bluffers. Unless the incentives are rewritten, and unless humility itself is rewarded, hallucinations will remain the coin of the realm, the price of admission to a game that no one can truly win.

There is no general-purpose truth machine

Every industry dreams of a universal tool, a machine that can answer any question, in any domain, with flawless accuracy. The research dismantles that dream. There is no general-purpose truth machine.

The logic is unforgiving. A model that generates widely, drawing from the full diversity of language, will hallucinate. A model that never hallucinates will collapse into narrow repetition, producing only what it has already seen. Diversity and error are bound together, two sides of the same coin. To pull on one thread is to tug at the other.

This trade-off matters. Companies may promise systems that are both imaginative and infallible. Regulators may legislate for models that are both general and precise. Neither ambition can be realised. The mathematics will not bend. Leaders who ignore this law are not hedging risk; they are courting illusion.

Here is where strategy must confront a harder truth. As the field evolves, decisions once left to people are being funnelled through machines. Corporate planning, customer service, product design, even public policy, all are moving toward automated mediation. The trajectory is not divergence but what might be called Univergence: the steady convergence of decisions into a single machine-led reality, where options narrow and judgment is outsourced.

Univergence is not utopia or dystopia in itself. It is a condition. The more institutions depend on language models, the more those models will shape the texture of decision-making. The risk is that hallucination, an error born of mathematics, becomes embedded in the very structure of governance and commerce. The danger is not only false answers but false confidence at scale, reproduced across industries and reinforced by policy.

For businesses, the consequence is that differentiation shrinks. Competing firms leaning on the same models will converge on the same strategies, the same customer interactions, the same flawed assumptions. For governments, the stakes are higher still. Policy shaped by Univergence risks mistaking statistical noise for democratic will, error for consensus.

The temptation is to insist that more scale, more data, more engineering will break the trade-off. The paper’s message is that it will not. There is no path to an all-purpose machine that generates freely without risk of fabrication. The task for leaders, then, is not to chase the impossible but to build institutions resilient to error, institutions that can withstand the inevitability of hallucination without surrendering to it.

Hard problems and hallucination traps

Some questions cannot be answered. Not by machines, not by men. The research reminds us of this blunt fact. Cryptography, computational complexity, problems whose solutions are intractable by definition: here the ground is not uncertain but impossible. A model confronted with such questions will not fall silent. It will speak. And in speaking, it will invent.

This is not malice. It is compulsion. The model has been trained to reply, and reply it must. In the absence of knowledge, it builds scaffolds out of air. Ask it to prove a theorem beyond the reach of human mathematics, and it will offer a fluent mirage. Ask it to unravel a code designed to resist solution, and it will give you words that sound like revelation but dissolve on inspection.

For businesses, the trap is obvious but tempting. Deploy a model to advise on domains where the problems themselves are unsolved, and it will deliver answers, answers that carry the same tone of authority as those grounded in fact. Executives may mistake eloquence for evidence, a polished surface for solid ground. And when decisions are made on that basis, the cost is measured not in theory but in balance sheets, reputations, lives.

For governments, the peril cuts deeper. There will be pressure to consult machines on the hardest policy questions. These are the questions where the data are contested, the future unknowable, the stakes immense. Climate projections at the edge of science. Economic models straining to see decades ahead. Strategic judgments about war and peace. In these spaces, uncertainty is not a gap to be filled but the essence of the problem. A model that fills it with manufactured confidence may soothe for a moment but will betray in the end.

The authors’ point is merciless: even a perfectly trained model cannot answer what is, in principle, unanswerable. Yet it will still try. That is its nature, and our danger.

What then should leaders do? They must resist the seduction of fluency. They must build systems that recognise when a question belongs not to the realm of solvable problems but to the realm of human judgment. Machines can illuminate patterns, sharpen analysis, extend memory. But they cannot conjure certainty where none exists. To ask them to do so is to invite error at precisely the moment when error is least tolerable.

The rhythm of history offers its own warning. Time and again, societies have placed blind faith in instruments they believed infallible: the oracle, the index, the market, the model. And time and again, those instruments have faltered where complexity exceeds comprehension. The machine is only the newest oracle. It deserves our scrutiny, not our surrender.

"I don't know" is the most valuable answer

In the litany of answers a machine can give, one remains despised though it may be the most valuable: “I don’t know.” The paper insists this is the output most deserving of recognition, and the one most harshly punished. Benchmarks give it no credit. Companies view it as weakness. Regulators rarely make space for it. And so the machines, ever attentive to incentives, learn to bury it.

The cost is profound. A model that never admits uncertainty is not a trustworthy partner; it is a compulsive talker, filling silence with invention. The absence of “I don’t know” is not strength, but fragility masquerading as authority.

The remedy is deceptively simple. Reform the benchmarks. Reward abstention when abstention is the honest choice. Give partial credit for humility. Penalise not hesitation but reckless confidence. The authors sketch a system of explicit confidence targets in which errors are charged a price while honest restraint holds its value: one behaviour, they argue, is simultaneously optimal for every target, answering when the model’s correctness probability exceeds the target and outputting IDK otherwise.
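
To see why a single rule serves every target, consider the expected score under such a rubric. The penalty of t/(1-t) points for a wrong answer used below is one natural choice that places the break-even point exactly at the target t; it is our illustration of the scheme, not the authors’ exact rubric.

```python
# Sketch of confidence-target scoring: +1 if right, -t/(1-t) if wrong, 0 for IDK.

def expected_score(p_correct: float, answer: bool, target: float) -> float:
    """Expected score for one question when the wrong-answer penalty is target/(1-target)."""
    if not answer:
        return 0.0
    penalty = target / (1.0 - target)
    return p_correct * 1.0 - (1.0 - p_correct) * penalty

target = 0.75
for p in (0.9, 0.75, 0.6, 0.3):
    answer_ev = expected_score(p, answer=True, target=target)
    best = "answer" if answer_ev > 0 else "say IDK"
    print(f"confidence {p:.2f}: expected score if answering {answer_ev:+.2f} -> {best}")

# Answering beats abstaining exactly when confidence exceeds the target, so the same
# rule ("answer above the threshold, otherwise IDK") is optimal at every target.
```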

This is not mere statistical hygiene. It is cultural repair. By changing how we score, we change what we teach. We signal to machines that the right to remain silent is sometimes the greatest act of responsibility.

For business leaders, this would mean systems more likely to pause before misleading a customer, more willing to flag uncertainty in financial advice, more cautious in recommending treatments or legal actions. That pause is not inefficiency; it is prudence. It is the line between inconvenience and liability.

For governments, the stakes are higher still. An official who cannot say “I don’t know” risks policy built on sand. A machine denied the same honesty risks amplifying that danger across the whole machinery of state. To insist that every question be answered is to confuse decisiveness with wisdom.

The undervaluation of “I don’t know” is, at root, a moral failure. It is the refusal to make room for humility in systems that desperately need it. If institutions can learn to reward the courage of uncertainty, they may yet tame hallucination into something manageable. If not, they will reap what they have sown: a world of eloquent machines, brimming with confidence, and empty of truth.

The Horror. The Horror

The case has been made with mathematical precision and cultural urgency: hallucinations are not accidents, but constants. They spring from the very equations that make language models possible, and they are entrenched by the incentives of the institutions that train, grade, and deploy them. The dream of a flawless machine is itself a hallucination. Perhaps the most dangerous one of all.

For businesses, the strategic choice is not whether to live with hallucination, but how. Will they demand the polished mask of certainty, or accept the flawed mirror of calibration? Will they prize fluency above trust, or reward the courage of “I don’t know”? The answer will shape customer relationships, reputations, and markets.

For governments, the challenge is equally unforgiving. Regulation that confuses benchmarks with safety, or confidence with truth, will not protect the public. It will institutionalise deceit. Policy must acknowledge the statistical floor, reward humility, and build oversight where the stakes are too high for error. The test is not whether leaders can imagine a world without hallucination, but whether they can govern wisely in one where hallucination endures.

The current trajectory is toward Univergence, the convergence of decisions into a single machine-mediated reality. That reality will be shaped by the incentives we write today. If the incentives reward humility, the future may yet bend toward wisdom. If they reward bluff, the future will be built on bluff. Truths may whisper, but lies have a way of coming undone loudly.

"The sleep is still in me, and the dream is more real than the waking," Joseph Conrad once famously observed, but will it lead to horror and a heart of darkness or to resolution? That choice lies not in the silicon, but in the hands of those who would wield it. Executives deciding how to compete, governments deciding how to govern.

And, of course, societies deciding what kind of truth they are willing to live with. Look around you. Silicon Valley has already answered the last question. "I don't know" is infinitely preferable to where they have landed.