Anthropic is launching a new program to study AI ‘model welfare’

Could future AIs be “conscious,” and experience the world similarly to the way humans do? There’s no strong evidence that they will, but Anthropic isn’t ruling out the possibility.
On Thursday, the AI lab announced that it has started a research program to investigate — and prepare to navigate — what it’s calling “model welfare.” As part of the effort, Anthropic says it’ll explore things like how to determine whether the “welfare” of an AI model deserves moral consideration, the potential importance of model “signs of distress,” and possible “low-cost” interventions.
There’s major disagreement within the AI community on what human characteristics models “exhibit,” if any, and how we should “treat” them.
Many academics believe that AI today can’t approximate consciousness or the human experience, and won’t necessarily be able to in the future. AI as we know it is a statistical prediction engine. It doesn’t really “think” or “feel” as those concepts have traditionally been understood. Trained on countless examples of text, images, and so on, AI learns patterns and sometime useful ways to extrapolate to solve tasks.
As Mike Cook, a research fellow at King’s College London specializing in AI, recently told TechCrunch in an interview, a model can’t “oppose” a change in its “values” because models don’t have values. To suggest otherwise is us projecting onto the system.
“Anyone anthropomorphizing AI systems to this degree is either playing for attention or seriously misunderstanding their relationship with AI,” Cook said. “Is an AI system optimizing for its goals, or is it ‘acquiring its own values’? It’s a matter of how you describe it, and how flowery the language you want to use regarding it is.”
Another researcher, Stephen Casper, a doctoral student at MIT, told TechCrunch that he thinks AI amounts to an “imitator” that “[does] all sorts of confabulation[s]” and says “all sorts of frivolous things.”
Yet other scientists insist that AI does have values and other human-like components of moral decision-making. A study out of the Center for AI Safety, an AI research organization, implies that AI has value systems that lead it to prioritize its own well-being over humans in certain scenarios.
Anthropic has been laying the groundwork for its model welfare initiative for some time. Last year, the company hired its first dedicated “AI welfare” researcher, Kyle Fish, to develop guidelines for how Anthropic and other companies should approach the issue. (Fish, who’s leading the new model welfare research program, told The New York Times that he thinks there’s a 15% chance Claude or another AI is conscious today.)
In a blog post Thursday, Anthropic acknowledged that there’s no scientific consensus on whether current or future AI systems could be conscious or have experiences that warrant ethical consideration.
“In light of this, we’re approaching the topic with humility and with as few assumptions as possible,” the company said. “We recognize that we’ll need to regularly revise our ideas as the field develops.