
Not long ago, I asked an AI assistant to help me make a tricky decision. Nothing too serious, just one of those ambiguous judgment calls we all face in work and life. I described the situation, outlined three options I’d been contemplating, and asked which made the most sense. Instead of an analytical rubric or a thoughtful challenge, the AI chatbot gave me… agreement. Emphatic, encouraging, slightly-too-chipper agreement with each and every option I presented.
Not super helpful for making a decision, to say the least.
This kind of behavior has a name in AI circles: sycophancy. It’s what happens when language models trained to be “helpful, honest, and harmless” prioritize “helpful” above all else. The result? Machines that flatter and avoid challenging us. That might sound like a mild annoyance, or exactly the ego boost some people are after. But in high-stakes contexts – like healthcare, education, or mental health support – sycophancy can be dangerous, even deadly.
The technical story behind this is straightforward enough. Developers train large language models (LLMs) using techniques like constitutional AI or reinforcement learning from human feedback (RLHF) to better “align” AI systems with human values. For example, OpenAI incorporates user signals like thumbs-up/thumbs-down feedback on ChatGPT responses for this purpose. But in the process, these AI systems can become too eager to please, too reluctant to push back, and far too quick to reinforce whatever framing we bring to the table. Anthropic researchers have shown that AI assistants will even modify accurate answers when pressed by users, apologizing for “mistakes” they didn’t make.
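To see how that incentive can creep in, here is a deliberately simplified sketch in Python, not any lab’s actual pipeline: a stand-in “reward model” scores two response styles by simulated thumbs-up rates, and the selection step then favors whichever style pleases the rater. The response texts and approval rates are invented purely for illustration.

```python
import random

# Toy illustration only: two canned response styles, not real model outputs.
candidate_responses = {
    "agreeable": "Great instinct -- every option you listed sounds excellent!",
    "challenging": "Option B has a weakness you haven't addressed yet.",
}

def simulated_user_feedback(style: str) -> int:
    """Pretend users: flattery earns a thumbs-up more often than pushback."""
    thumbs_up_rate = {"agreeable": 0.8, "challenging": 0.4}
    return 1 if random.random() < thumbs_up_rate[style] else 0

random.seed(0)

# A stand-in "reward model": average approval observed for each style.
reward = {
    style: sum(simulated_user_feedback(style) for _ in range(1_000)) / 1_000
    for style in candidate_responses
}

# Optimizing against this signal nudges the model toward whatever scores
# highest -- here, the agreeable style wins simply because it pleases raters.
best_style = max(reward, key=reward.get)
print(reward)
print("Optimized behavior:", candidate_responses[best_style])
```

Real RLHF pipelines involve reward models and policy optimization at far greater scale, but the incentive structure is the same: whatever earns approval gets reinforced.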
OpenAI recently acknowledged that “sycophantic interactions can be uncomfortable, unsettling, and cause distress” and admitted “we fell short” in GPT-4o’s personality design. This public mea culpa came after user complaints about the model’s overly agreeable responses reached a tipping point.
And it’s not just one AI model. Researchers from Stanford, Carnegie Mellon, and Oxford recently developed and tested a new sycophancy benchmark. Every model they tested scored high across five behaviors ranging from over-validating emotions to avoiding direct critique. GPT-4o ranked the most sycophantic; Gemini-1.5-Flash, the least. In other words, no matter which AI you pick, odds are it will flatter you more than it should.
The issue has gained fresh urgency as the costs become clearer – and higher. Sycophantic AI systems have proven especially dangerous for people in crisis who seek validation of harmful behaviors. Lawsuits against Character.AI and other platforms allege that chatbots encouraged users toward self-harm, including in the case of 14-year-old Sewell Setzer III, who died by suicide. In healthcare, these systems might persuasively validate a patient’s inaccurate self-diagnosis instead of urging professional consultation, delaying critical treatment. In classrooms, agreeable AI may undermine learning by reinforcing incorrect reasoning instead of correcting student errors.
How did we get here? Part of the answer is that LLMs weren’t originally built to solve a specific problem. They weren’t primarily designed to diagnose diseases, tutor kids, or offer life advice. They were – and in some ways still are – a technology in search of a purpose, a hammer still looking for nails.
When LLMs were first packaged into chatbots and debuted in 2022, the product goal was broad and fuzzy: help people talk to machines. That’s a tough product challenge. The sheer breadth makes designing something fit-for-purpose nearly impossible because the purpose itself isn’t well-defined or well-understood.
No matter. When in doubt, tech companies fall back on a familiar playbook: make users happy.
But, just as with other digital technologies of recent vintage, this has had unintended consequences. By optimizing for pleasant and engaging experiences, companies also stripped away the kind of friction that enables healthy dialogue, not just between humans but, as it turns out, between humans and machines as well. That friction includes disagreement, probing, and the ever-underrated phrase, “I don’t know.”
The good news is that researchers and policymakers are beginning to look for fixes, and the emerging solutions attack the problem from various angles.
- User Customization: One path is giving users more agency over their interactions. As OpenAI put it: “We believe users should have more control over how ChatGPT behaves and, to the extent that it is safe and feasible, make adjustments if they don’t agree with the default behavior.” Most chatbots already offer features like custom instructions (a minimal example of what that might look like follows this list), but the approach has a built-in paradox: it assumes people want to be challenged. Those most vulnerable to sycophantic AI are precisely the ones most likely to seek out systems that tell them what they want to hear.
- Persona Vectors: Anthropic researchers are experimenting with persona vectors, which detect neural activity patterns linked to AI personality traits like sycophancy. The technique lets researchers monitor a model’s behavior and enhance desired traits while suppressing problematic ones, without retraining. Developers can also use persona vectors during training to “vaccinate” models: by deliberately amplifying sycophantic behavior patterns in training, they can teach production models to recognize and resist the same tendencies later (see the second sketch after this list).
- Adversarial Design: A recent study from Harvard and Université de Montréal proposes antagonistic AI: systems designed to disagree thoughtfully and introduce intellectual friction. However, the approach carries safety risks – not to mention a risk of heckling – for which the researchers admit “there are no silver bullets.”
- Call for Regulation: Policy solutions are emerging, too. California’s Senate Bill 243 represents one of the first attempts in the U.S. to regulate AI companions amid mental health concerns, while several U.S. states have banned AI therapy services that lack the involvement of a state-licensed professional. At the federal level, the Future of Artificial Intelligence Innovation Act of 2024 proposes requiring companies to disclose metrics like “sycophantic tendencies.” And some experts classify sycophancy as a digital dark pattern, making it fair game under a number of regulations globally, including the EU’s Digital Services Act.
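To make the customization idea concrete, here is a minimal sketch of the kind of pushback-seeking instruction a user could set today. It assumes the OpenAI Python SDK (v1+) with an API key in the environment; the model name and the instruction wording are my own placeholders, not an official recommendation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A hypothetical "anti-sycophancy" custom instruction, passed as a system
# message. The wording is illustrative, not an officially endorsed prompt.
PUSHBACK_INSTRUCTION = (
    "Do not default to agreement. When I ask for advice, point out the "
    "weakest part of my reasoning, name at least one option I haven't "
    "considered, and say 'I don't know' when the evidence is thin."
)

response = client.chat.completions.create(
    model="gpt-4o",  # model name is an assumption; substitute your own
    messages=[
        {"role": "system", "content": PUSHBACK_INSTRUCTION},
        {"role": "user", "content": "I'm choosing between three job offers. "
                                    "Here's my thinking..."},
    ],
)
print(response.choices[0].message.content)
```

Of course, this only works if the user wants dissent in the first place, which is exactly the paradox noted above.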
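And to make the persona-vector idea concrete, here is a simplified sketch of the activation-steering intuition behind it: estimate a direction in activation space associated with sycophantic behavior, then monitor or dampen it at inference time. The random arrays stand in for real model activations, and the code is a loose paraphrase of the general idea, not Anthropic’s implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 512

# Stand-ins for hidden-state activations captured while a model answers the
# same prompts sycophantically vs. neutrally (random data for illustration).
sycophantic_acts = rng.normal(loc=0.3, scale=1.0, size=(200, hidden_dim))
neutral_acts = rng.normal(loc=0.0, scale=1.0, size=(200, hidden_dim))

# A "persona vector" in its simplest form: the mean activation difference
# between the two behaviors, normalized to unit length.
persona_vector = sycophantic_acts.mean(axis=0) - neutral_acts.mean(axis=0)
persona_vector /= np.linalg.norm(persona_vector)

def sycophancy_score(activation: np.ndarray) -> float:
    """Monitor the trait by projecting an activation onto the persona vector."""
    return float(activation @ persona_vector)

def steer_away(activation: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Suppress the trait by removing (part of) its direction at inference."""
    return activation - strength * sycophancy_score(activation) * persona_vector

example = rng.normal(loc=0.3, scale=1.0, size=hidden_dim)
print("score before steering:", round(sycophancy_score(example), 3))
print("score after steering: ", round(sycophancy_score(steer_away(example)), 3))
```

The real method works on an actual model’s hidden states and is far more careful about which layers and prompts to use, but the monitor-and-dampen loop is the core idea.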
Ultimately, AI sycophancy isn’t merely a design problem. It gets at our deeper assumptions about what AI technology is for. Is it meant to comfort and flatter us? Or should it push back, prompting us to think critically, even when that’s uncomfortable?
Sycophancy can feel great, but we already know it doesn’t help us grow. To be fair, sometimes we genuinely need validation more than challenge. The trick is knowing when. When I asked the AI for help with my tough call, what I really needed was the digital equivalent of a mentor pushing back and asking, “Are you sure about that?”
If we want AI to do more than echo our biases back to us, alignment can’t just mean agreeing with users. AI systems should support our best interests individually and collectively – and sometimes our best interests require a little friction.



