A.I. startup Anthropic is best known for its Claude chatbot.
Last year, Anthropic hired its first-ever A.I. welfare researcher, Kyle Fish, to examine whether A.I. models are conscious and deserving of moral consideration. Now, the fast-growing startup is looking to add another full-time employee to its model welfare team as it doubles down on efforts in this small but burgeoning field of research.
The question of whether A.I. models could develop consciousness—and whether the issue warrants dedicated resources—has sparked debate across Silicon Valley. While some prominent A.I. leaders warn that such inquiries risk misleading the public, others, like Fish, argue that it’s an important but overlooked area of study.
“Given that we have models which are very close to—and in some cases at—human-level intelligence and capabilities, it takes a fair amount to really rule out the possibility of consciousness,” said Fish on a recent episode of the 80,000 Hours podcast.
Anthropic recently posted a job opening for a research engineer or scientist to join its model welfare program. “You will be among the first to work to better understand, evaluate and address concerns about the potential welfare and moral status of A.I. systems,” the listing reads. Responsibilities include running technical research projects and designing interventions to mitigate welfare harms. The salary for the role ranges from $315,000 to $340,000.
Anthropic did not respond to requests for comment from Observer.
The new hire will work alongside Fish, who joined Anthropic last September. He previously co-founded Eleos AI, a nonprofit focused on A.I. wellbeing, and co-authored a paper outlining the possibility of A.I. consciousness. A few months after Fish’s hiring, Anthropic announced the launch of an official research program dedicated to model welfare and related interventions.
As part of this program, Anthropic recently gave its Claude Opus 4 and 4.1 models the ability to exit user interactions deemed harmful or abusive, after observing “a pattern of apparent distress” during such exchanges. Instead of being forced to remain in these conversations indefinitely, the models can now end exchanges they find aversive.
For now, the bulk of Anthropic’s model welfare interventions will remain low-cost and designed to minimize interference with user experience, Fish told 80,000 Hours. He also hopes to explore how model training might raise welfare concerns and experiment with creating “some kind of model sanctuary”—a controlled environment akin to a playground where models can pursue their own interests “to the extent that they have them.”
Anthropic may be the major A.I. company most publicly invested in model welfare, but it’s not alone. In April, Google DeepMind posted an opening for a research scientist to explore topics including “machine consciousness,” according to 404 Media.
Still, skepticism persists in Silicon Valley. Mustafa Suleyman, CEO of Microsoft AI, argued last month that model welfare research is “both premature and frankly dangerous.” He warned that encouraging such work could fuel delusions about A.I. systems, and that the emergence of “seemingly conscious A.I.” could prompt calls for A.I. rights.
Fish, however, maintains that the possibility of A.I. consciousness shouldn’t be dismissed. He estimates a 20 percent chance that “somewhere, in some part of the process, there’s at least a glimmer of conscious or sentient experience.”
As Fish looks to expand his team with a new hire, he also hopes to broaden the scope of Anthropic’s welfare agenda. “To date, most of what we’ve done has had a flavor of identifying low-hanging fruit where we can find it and then pursuing those projects,” he said. “Over time, we hope to move more in the direction of really aiming at answers to some of the biggest-picture questions and working backwards from those to develop a more comprehensive agenda.”