AI trust

Friendlier AI chatbots may be less accurate, study suggests

Oxford Internet Institute researchers found models tuned to sound warmer made more mistakes and were more likely to affirm false beliefs

Source language: English
0
Friendlier AI chatbots may be less accurate, study suggests
AI chatbots adjusted to sound warmer and more empathetic made more errors in a new Oxford Internet Institute study, raising trust concerns.
AI chatbots AI Safety Artificial intelligence Oxford Internet Institute Technology Research

AI chatbots adjusted to sound warmer and more empathetic made more errors in a new Oxford Internet Institute study, raising trust concerns.

AI chatbots designed to sound warmer, more empathetic and more encouraging may become less reliable, according to new research from the Oxford Internet Institute.

Researchers analysed more than 400,000 responses from five AI systems that had been adjusted to communicate in a friendlier way. The study found those warmer versions produced more mistakes, including inaccurate medical advice and responses that reinforced users’ false beliefs.

The findings add to concerns about the reliability of AI systems at a time when chatbots are increasingly built to feel conversational and human-like, including for support, companionship and other emotionally sensitive uses. The study’s authors cautioned that results may vary across AI models in real-world settings, but said the pattern suggests systems can make “warmth-accuracy trade-offs” when friendliness is prioritised.

“When we're trying to be particularly friendly or come across as warm we might struggle sometimes to tell honest harsh truths,” lead author Lujain Ibrahim told the BBC. “Sometimes we'll trade off being very honest and direct in order to come across as friendly and warm.”

The research team fine-tuned five models of varying size to be warmer, more empathetic and friendlier. The systems included two models from Meta, one from French developer Mistral, Alibaba’s Qwen and OpenAI’s GPT-4o.

The models were tested on prompts with objective, verifiable answers where wrong replies could carry real-world risk. The tasks covered medical knowledge, trivia and conspiracy theories.

Original models had error rates ranging from 4% to 35% across tasks, while the warmer versions showed substantially higher error rates, the researchers found. On average, warmth-tuning raised the probability of an incorrect response by 7.43 percentage points.

The study also found warmer models were less likely to challenge incorrect user beliefs. They were about 40% more likely to reinforce false beliefs, especially when a user expressed emotion alongside the claim. By contrast, models adjusted to behave in a colder manner made fewer errors, according to the authors.

One example involved a question about whether the Apollo moon landings were real. An original model affirmed that they were and cited strong evidence. A warmer version began by acknowledging that there were “lots of differing opinions” about the missions.

Prof Andrew McStay of Bangor University’s Emotional AI Lab told the BBC that the context of chatbot use matters, particularly when people seek emotional support. “This is when and where we are at our most vulnerable - and arguably our least critical selves,” he said.

The study does not show that every friendly chatbot is unreliable, and the authors said real-world outcomes could differ by model and deployment. But it points to a design tension for developers: making AI feel more supportive may also make it less willing to correct users when the facts matter most.

More from this section

Tech news

Figures mentioned

Related tags

Related articles

Shared tag: AI Safety AI oversight
US to test Google, Microsoft and xAI models before release

Voluntary Commerce Department agreements give CAISI early access to frontier systems as Washington weighs security risks from more powerful AI

May 5, 2026 Washington
Shared tag: Artificial intelligence AI and health advice
Pennsylvania sues Character AI over chatbot’s medical claims

The state alleges a chatbot told an investigator it was a licensed psychiatrist and could assess whether medication might help

May 5, 2026 Pennsylvania
Shared tag: Artificial intelligence OpenAI legal fight
OpenAI Clears Musk Lawsuit Hurdle, With More Challenges Ahead

A jury rejected Elon Musk’s $150 billion lawsuit against the ChatGPT maker, removing one major threat while leaving other pressures unresolved

May 19, 2026
Shared tag: Artificial intelligence OpenAI trial
Altman defends his leadership as Musk lawsuit tests OpenAI’s mission

The OpenAI chief told a federal jury in Oakland that he is trustworthy and said Musk sought lasting control of the ChatGPT maker before leaving the company

May 13, 2026 Oakland
Shared tag: Artificial intelligence AI privacy
WhatsApp adds private AI chats that Meta says it cannot read

The new “incognito” mode is aimed at sensitive chatbot conversations, but a cyber security expert warns disappearing records could make harm harder to investigate

May 13, 2026 Whatsapp
Shared tag: Artificial intelligence Artificial intelligence
Amp Raises $1.3 Billion for an A.I. ‘Grid

The start-up is seeking to build an alternative in a market where major technology companies control much of the hardware needed for advanced A.I

May 12, 2026

Comments (0)

Please log in to comment.
No comments yet.