Andy Monahan posted this comment on one of my posts recently -
When presented with a relatively simple list of companies (fewer than 100) and asked to help enrich it with additional information from public sources, ChatGPT might take hours for something it says it will deliver in 15 minutes... and upon repeated prompting it never completes the task - just a frustrating series of responses that include fake data, a sample of three, and so on.
It reminded me of a post I saw in a newsletter recently that talked about two research papers - one from Apple and another from Meta/Nvidia and collaborators. The first was called "The Illusion of Thinking" and the second "How Much Do Language Models Memorize".
I will start by saying that I did not read the research papers, and I am not sure I would understand them if I did, but there were a couple of lines in the newsletter that made me perk up -
They describe the same underlying breakdown – a model’s coping mechanism when pushed past its fundamental capacity. The model simplifies, guesses, or shuts down – all while still outputting something that looks fluent and confident.
This sounded exactly like my experience with LLMs and what Andy was pointing out with his example.
I do not expect most people who read my newsletter to actually read and understand the research papers, but I was extremely curious to learn more, so I did the next best thing - I used Claude to chat with these papers, asked my questions in a layman's voice, and got responses that helped me understand what was going on.
Here is what I figured out -
AI's most dangerous feature? It never admits when it's struggling.
The scary part? You'd never know.
The AI keeps writing fluently, keeps sounding authoritative, keeps giving you answers. But behind the scenes, it has essentially given up on precision and is just pattern-matching its way through responses.
Here's what's happening:
When AI gets overwhelmed, it's like your most articulate employee who's been given too much work. Instead of admitting they're drowning, they start cutting corners while maintaining the same confident presentation style. They still show up to meetings, still speak eloquently, still submit reports on time—but the quality of thinking behind it all has quietly tanked. They're now just rephrasing things they've said before and making educated guesses, but their delivery is so polished you'd never suspect they've essentially checked out mentally.
It's like a GPS that's lost satellite signal but keeps giving you directions anyway. Instead of saying "signal lost," it starts making increasingly wild guesses about where you should turn, but delivers each instruction in the same calm, authoritative voice. You're following confidently down random streets while the system maintains its usual "turn right in 500 feet" certainty.
Why this matters for everyone using AI:
Your AI assistant might be "phoning it in" on complex tasks
Standard testing won't catch this degradation
Making AI bigger doesn't fix this fundamental flaw
We're deploying systems that hide their own limitations
The real problem isn't that AI has limits—it's that AI never tells you when it's reached them.
So what can we do about this?
As an AI user, the most important thing we can do is become an active skeptic rather than a passive consumer:
- Ask follow-up questions that probe the reasoning behind answers - don't just accept the first response.
- Test consistency by rephrasing questions in different ways and checking whether the AI gives stable, coherent responses across variations.
- For any critical decision, verify the AI's recommendations independently through other sources or methods.
- Most importantly, learn to spot overconfidence: be suspicious of answers that seem too certain, especially on complex or nuanced topics where uncertainty should be expected.
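The "test consistency" advice above can even be semi-automated: ask the same question in several phrasings and flag the case where the answers diverge. Here is a minimal sketch; `ask_model` is a hypothetical stand-in for whatever LLM API you use, and the agreement measure is deliberately crude (word overlap), just to illustrate the idea.

```python
def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two answers (0.0 to 1.0)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def consistency_check(ask_model, paraphrases, threshold=0.5):
    """Ask the same question in several phrasings and flag low agreement.

    ask_model:   callable taking a prompt string and returning the model's
                 answer (hypothetical - plug in your own API client here).
    paraphrases: differently worded versions of the same question.
    Returns (is_consistent, pairwise_scores).
    """
    answers = [ask_model(p) for p in paraphrases]
    scores = []
    for i in range(len(answers)):
        for j in range(i + 1, len(answers)):
            scores.append(word_overlap(answers[i], answers[j]))
    is_consistent = bool(scores) and min(scores) >= threshold
    return is_consistent, scores
```

If the answers diverge sharply across phrasings, that is exactly the signal to stop and verify independently before acting on any of them.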
As a business leader, our role is to build systematic safeguards around AI deployment in our organization:
- Implement human oversight for any AI-assisted decision with meaningful consequences, so experienced professionals review and validate AI recommendations before action is taken.
- Create robust feedback loops that catch AI errors quickly - processes for monitoring outcomes, gathering user reports, and continuously assessing AI performance in real-world use.
- Train teams to recognize AI limitations and understand when to trust, question, or override AI suggestions.
- Most crucially, start small: deploy AI in low-risk scenarios first, learn its behavior patterns and failure modes, and only then scale up to more critical applications.
Curious what your experience has been...