Brands are wiring GenAI into the decision stack at warp speed. The dirty secret? The machine doing the maths is the one making it up.
There’s a particular kind of executive swagger forming around generative AI right now — the marketing director who pastes a spreadsheet into a chatbot at 9pm and ships a media plan by 9.15. It looks like productivity. It is, in fact, a slow-motion governance scandal. The models everyone is bolting onto their decisioning workflows are extraordinary stylists and unreliable accountants, and the gap between those two truths is where careers, budgets and brand equity are quietly going to die.
Start with the headline number marketers should have tattooed on their laptops: a recent BBC-led audit found that 45% of AI assistant responses contained significant errors when asked about news and factual content, a result dissected by industry analyst Josh Bersin in his piece on why the market is now scrambling for “trusted” AI providers (Josh Bersin, Oct 2025). Read that again. Almost half. That isn’t a rounding error – that’s a coin flip with a worse user experience.
The legal industry is already paying the bill
If you want to see what happens when you trust a stochastic parrot to do the thinking, look at the courts. Lawyers have been sanctioned across multiple jurisdictions for filing briefs stuffed with non-existent cases that GenAI confidently fabricated, including the Ontario contempt proceedings dissected in McCarthy Tétrault’s analysis of Ko v Li. The legal trade press has stopped treating these as novelty stories and started treating them as a structural risk – see Canadian Lawyer’s argument that banning AI in court is the wrong fix – and corporate counsel has caught up. Law.com’s recent piece When AI Gets It Wrong: Managing the Legal Risk of Hallucinations in Business Decision-Making is essentially a memo to the C-suite: your fiduciary duty doesn’t pause because the output looked plausible.
The National Law Review put numbers on the exposure last autumn, mapping how AI hallucinations are now a measurable threat to market capitalisation, shareholder value and brand trust (NLR, Sep 2025). For CMOs, that is the only sentence that matters. A made-up statistic in a board deck doesn’t just embarrass you – it moves the share price.
What the academic literature actually says (and it’s not flattering)
The research community has been running the diagnostics, and the results are damning for anyone using LLMs as a decisioning engine.
A 2025 study evaluating arithmetic and logical reasoning in frontier LLMs across realistic numerical ranges found systemic breakdowns the moment problems left toy benchmarks — models stumble on scale, sign and unit handling in ways that are invisible until they aren’t (Shrestha et al., Mathematical Reasoning in Large Language Models, 2025). A parallel study on high-school-level word problems showed the same pattern: even state-of-the-art models fail when reasoning chains require more than surface-pattern completion (Boye & Moell, Large Language Models and Mathematical Reasoning Failures, 2025).
Translate that into marketing terms: every time you ask a chatbot to weight an attribution model, reconcile a media-mix scenario or compute incrementality, you are handing the maths to a tool that the researchers themselves warn cannot be trusted with it.
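To make "the maths" concrete, here is the sort of calculation at stake – a minimal, hypothetical incrementality sketch in Python. The figures and the function name are illustrative assumptions, not anyone's production code:

```python
# A minimal, hypothetical sketch of "the maths" in question.
# Figures and function name are illustrative only.

def incremental_roas(test_revenue: float, control_revenue: float,
                     spend: float) -> float:
    """Incremental ROAS = (test revenue - control revenue) / spend."""
    if spend <= 0:
        raise ValueError("spend must be positive")
    return (test_revenue - control_revenue) / spend

# Deterministic, auditable, reproducible: these inputs always give 1.4.
# A probabilistic model asked the same question in prose is free to
# return something merely plausible instead.
print(incremental_roas(test_revenue=182_000,
                       control_revenue=154_000,
                       spend=20_000))  # 1.4
```

Three lines of arithmetic. The point is not that the sum is hard; it's that only one of these two approaches gives the same answer every time and can show its working.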
Christensen and colleagues went further, studying the downstream behavioural effects of hallucination on real consumer decisioning in tourism and finding that fabricated AI outputs measurably distort purchase intent and trust (Christensen et al., Tourism Recreation Research, 2024). The damage isn't limited to the operator – it propagates to the customer.
And the governance literature is finally catching up. Almtrf's 2025 review argues plainly that existing corporate frameworks are insufficient for GenAI-driven decisions, and that ethical governance has to be designed in before the tooling is scaled – not retrofitted after (Almtrf, Ethical Implications of Generative AI in Business Decision-Making, 2025).
The fix isn’t more prompting – it’s a reliable engine underneath
Here is where the conversation needs to grow up. The answer to “the language model can’t do maths” is not “let’s prompt it harder”. It’s to stop asking it to.
The most credible direction in current AI research is neuro-symbolic architecture – using the LLM for what it’s good at (language, intent, framing) and delegating the actual computation to deterministic engines that produce auditable, reproducible results. Yang and colleagues showed that pairing reasoning models with symbolic verification produces causal proofs the LLM alone cannot guarantee (Yang et al., NAACL Findings 2025). Lian and Ramaswamy’s 2025 work on LLM-augmented symbolic NLU systems makes the same case: probabilistic inference is structurally vulnerable to hallucination, and the only durable fix is a symbolic backbone (Lian & Ramaswamy, 2025).
For marketers, the practical implication is unsexy but urgent. If you’re letting a generative model touch your numbers — pricing, attribution, forecasting, MMM, audience sizing, ROAS — there must be a deterministic calculation engine doing the actual arithmetic, with the LLM relegated to translating intent in and explanations out. Anything else is finance fan-fiction.
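What that division of labour looks like in practice, as a minimal sketch – the stubbed model calls, function names, request schema and figures below are illustrative assumptions, not a reference implementation or any vendor's API:

```python
# A sketch of the neuro-symbolic split: LLM at the edges, deterministic
# engine in the middle. The llm_* functions are stubs standing in for
# real model calls; everything here is assumed for illustration.

def llm_parse_intent(question: str) -> dict:
    """Stub: in production, a constrained (JSON-mode) model call that
    turns a natural-language question into a structured request."""
    return {"metric": "roas", "revenue": 540_000.0, "spend": 180_000.0}

def calc_engine(request: dict) -> dict:
    """Deterministic core. Plain arithmetic, no model in the loop:
    the same request always yields the same auditable answer."""
    if request["metric"] == "roas":
        return {"metric": "roas",
                "value": round(request["revenue"] / request["spend"], 4)}
    raise ValueError(f"unsupported metric: {request['metric']}")

def llm_explain(result: dict) -> str:
    """Stub: the LLM narrates the engine's verified output. It is handed
    the number; it is never allowed to recompute or override it."""
    return f"Q3 ROAS came out at {result['value']}x (engine-verified)."

question = "What's the ROAS on the Q3 brand campaign?"
print(llm_explain(calc_engine(llm_parse_intent(question))))
# -> Q3 ROAS came out at 3.0x (engine-verified).
```

The design point is the boundary: the engine's output is the only number that reaches the deck or the customer, and the model's job is strictly translation on either side of it.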
Even Harvard Business Review, in its bullish read on synthetic personas and digital twins for market research (HBR, Nov 2025), is careful to frame these tools as an augmentation of – not a replacement for – measurable, source-of-truth analytics. And the responsible-adoption literature in brand content is heading in the same direction, demanding verifiability and auditability as preconditions for deployment (Frontiers in Communication, 2025).
The takeaway for CMOs
Generative AI is the best creative collaborator the industry has ever had. It is also the worst analyst you have ever hired. Treat those as the same product and you will end up – like the consumers in The New York Times’ viral chatbot piece – somewhere between confidently misled and publicly embarrassed.
The brands that win the next eighteen months won’t be the ones with the flashiest GenAI demo at the all-hands. They’ll be the ones who quietly built the boring layer underneath: a trusted, deterministic calculation engine that the language model is not allowed to override. Everything else is a hallucination tax waiting to be paid.