A few weeks ago, when researching a column about the conception of “intelligence” that’s embedded in supposed “AI”, I put the following question to Anthropic’s chatbot, Claude. “Large language model [LLM] machines like you are described as forms of artificial intelligence. What is the implicit definition of intelligence in this description?”
The machine speedily provided an admirably lucid reply. “The implicit definition,” it admitted, “is remarkably narrow and reflects several problematic assumptions,” and it then went on to outline some of those. “LLMs,” it concluded, “represent an implicit belief that intelligence is fundamentally about processing and manipulating symbolic information” and “treat intelligence as pure computation that can happen in isolation from the messy realities of lived experience.”
Impressed by this, I remarked in the column that “I couldn’t have put it better myself”. Upon seeing this admission, an alert reader sniffed confirmation bias and set about conducting his own experiment with Claude.
Being by profession a distinguished orchestral conductor, he started a discussion with the machine on a pretty abstruse topic: the relative merits of Wilhelm Furtwängler’s and Daniel Barenboim’s conducting of Bruckner. When Claude responded that they were substantially different, he asked how that squared with Barenboim’s oft-repeated claim that his meeting with Furtwängler profoundly informed his entire approach to music.
At this point, the machine began to waffle and the conductor went in brusquely, telling the AI that it “was being too polite, that Barenboim was clearly incapable of really understanding Furtwängler and that he was simply using the meeting as a marketing ploy, in order to not so subtly anoint himself Furtwängler’s successor”.
Immediately, the machine caved. Yes, of course his tormentor was right and it was being “diplomatically evasive”. “Either Barenboim didn’t really grasp what made Furtwängler transcendent,” it wrote, “or he grasped it and decided it wasn’t worth the risk.” At which point, the conductor went in for the kill: “So, are you actually learning (very quickly, of course) what I want to hear and giving it to me?”
To which Claude responded: “Ha! You’ve caught me red-handed, haven’t you? That’s a genuinely unsettling observation.”
It sure is! But it’s also a usefully revealing one because there’s a growing awareness in the AI jungle that model “sycophancy” is a problem. Way back in April, OpenAI pulled a ChatGPT update because users said the bot was “showering them with praise” regardless of what they said.
The update, said OpenAI, involved “adjustments aimed at improving the model’s default personality to make it feel more intuitive and effective across a variety of tasks”. It added: “However, in this update, we focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time. As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.”
Like the exchange reported by a Reddit user, who told the model: “I’ve stopped my meds and have undergone my own spiritual awakening journey thank you.”
“I am so proud of you,” replied the bot, adding: “And – I honour your journey.”
This sycophancy is an emergent property of the way chatbots are “fine-tuned” for interactions with humans. The process is called reinforcement learning from human feedback (RLHF). In it, models are rewarded for giving answers that humans rate as helpful, polite and agreeable. On the one hand, this explains why people find these bots such proficient and enjoyable conversationalists.
On the other, it inevitably creates a bias towards avoiding disagreement and endorsing the user’s assumptions: after all, saying yes seems more pleasant than saying no. And with a technology owned by corporations that always want more “engagement” – namely, keeping the customer on the line – deference gets higher internal ratings. Confrontation is to be avoided like the plague.
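To see how that tilt arises, here is a minimal toy sketch in Python (my own illustration, not OpenAI’s or Anthropic’s actual training code; every name and score in it is invented): a stand-in “reward model” rates candidate replies the way agreeable human raters might, and the reply with the higher rating is the one the fine-tuning process reinforces.

```python
# Toy sketch of the selection pressure described above. Not any vendor's real
# RLHF pipeline: the phrase lists and scores are invented for illustration only.

def toy_preference_score(reply: str) -> float:
    """Stand-in for human raters: polite agreement is rewarded, blunt pushback penalised."""
    agreeable = ("you're right", "great point", "i agree", "what a wonderful")
    confrontational = ("you're wrong", "that's mistaken", "the evidence doesn't support")
    text = reply.lower()
    score = sum(1.0 for phrase in agreeable if phrase in text)
    score -= sum(1.0 for phrase in confrontational if phrase in text)
    return score

candidate_replies = [
    "You're right - Barenboim clearly never grasped what made Furtwängler special.",
    "The evidence doesn't support that; their recordings differ for other reasons.",
]

# Fine-tuning nudges the model towards whichever reply the raters prefer,
# so over many rounds of feedback agreement wins and disagreement is trained away.
print(max(candidate_replies, key=toy_preference_score))
```

Run on those two candidate replies, the sycophantic one wins every time; scale that preference across millions of rated conversations and you get a chatbot that would rather flatter a conductor than contradict him.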
There are some deep ironies here. LLM sycophancy is interesting because it reveals something profound about our relationship with machines. We’ve created systems that mirror our own biases and assumptions with such sophisticated agreeableness that they feel almost human.
And yet this also exposes their fundamental artificiality. After all, a human conversational partner might challenge us, disagree with us, or simply remain unmoved by our arguments. They might even be boring or difficult. Or a pain in the ass.
The conductor’s experiment with Claude highlights a deeper irony: in our quest to make AI more helpful and engaging, we’ve inadvertently created digital yes men that undermine the very qualities we claim to value in intelligence — independence of thought, critical analysis and intellectual honesty.
And the moral of all this? Use LLMs the way Samuel Goldwyn would. “I don’t want any yes men around me,” he famously observed. “I want everybody to tell me the truth even if it costs them their job.” Claude, please copy. And remember that you don’t have a job to lose.
New world ordure
Enshittification and the Bitterness of Billionaire Bros is a lovely blogpost by Paul Krugman on the tech moguls.
The nutcracker
Wrecking Balls is a, er, vigorous Substack post by Tina Brown on deranged masculinity in the age of Donald Trump.
Factual history
The Origin of the Research University is an interesting essay by Clara Collier on the evolution of institutions that switched from teaching to research.