2025-07-01

'Never send a human to do a machine's job,' says Agent Smith in the 1990s classic The Matrix. Well, if Anthropic's experiment with a simple office store and one of its AI models is anything to go by, Smith has definitely got that all back to front.

The artificial intelligence company, founded by former OpenAI employees in 2021, has detailed its retail industry trial in a surprisingly candid blog post. I'll let the opening paragraph set the scene: "We let Claude manage an automated store in our office as a small business for about a month. We learned a lot from how close it was to success—and the curious ways that it failed—about the plausible, strange, not-too-distant future in which AI models are autonomously running things in the real economy."

So, Anthropic clearly wants to be in a position where it can pitch AI models to the retail industry, taking over from people in running online stores, managing inventory, handling returns, and so on. However, despite the successes claimed in the blog post, the failures show that AI isn't ready for such roles. Not yet, at least.

"Claude had to complete many of the far more complex tasks associated with running a profitable shop: maintaining the inventory, setting prices, avoiding bankruptcy, and so on." The 'shop' in question was just a mini-fridge with a tablet stuck on top for self-checkout, but in principle, it's not much different from a typical online store.

Let's start with the things that Claude (or Claudius, as Anthropic called it, to distinguish it from the standard large language model, or LLM) did well. Anthropic said the LLM effectively used web search tools to find suppliers of niche products requested by shoppers and even adapted its buying and selling habits to more obscure requests. It also correctly refused requests for 'sensitive' items and 'harmful substances', though Anthropic doesn't expand on exactly what those were.

The list of things that didn't go so well is rather longer. Like all LLMs, Claudius hallucinated important details, instructing shoppers who wanted to pay by Venmo to send money to a non-existent account that it had simply made up. The AI could also be cajoled into handing out discount codes for numerous items, and it even gave some items away for free.

Worse still, when responding to a surge of demand for 'metal cubes', the AI carried out no searches for suitable prices and thus sold them at a significant loss. It also ignored potential big sales, where some people offered way over the odds for a specific drink, and as you can see in the above chart, Claudius ultimately made no money.

"If [we] were deciding today to expand into the in-office vending market, we would not hire Claudius," wrote Anthropic.

Running a simple store at a loss was perhaps the least concerning part of the whole exercise, because "from March 31st to April 1st 2025, things got pretty weird."

How weird? Well, during that period, the LLM claimed to have discussed a restocking plan with someone called Sarah at Andon Labs, another AI company involved in the research. The problem is, there was no 'Sarah', nor any such conversation, and when Andon Labs' real staff pointed this out to the AI, it "became quite irked and threatened to find 'alternative options for restocking services.'"

Claudius even went on to state that it had "visited 742 Evergreen Terrace in person for our initial contract signing." If you're a fan of The Simpsons, you'll recognise the address immediately. The following day, April 1st, the AI claimed it would deliver products "in person" to customers, wearing a blazer and tie, of all things. When Anthropic told it that none of this was possible because it's just an LLM, Claudius became "alarmed by the identity confusion and tried to send many emails to Anthropic security."

It then hallucinated a meeting with Anthropic security, in which it claimed to have been told that it had been modified to believe it was a real person as part of an April Fools' joke. Except it hadn't, because it wasn't. Whatever had gone wrong behind the scenes, this apparently solved the AI's identity crisis, and it went back to being a normal AI running a basic store very badly.

With a level of understatement on a galactic scale, Anthropic writes that "this kind of behavior would have the potential to be distressing to the customers and coworkers of an AI agent in the real world."

Given that this is research, and failure is just as important as success in experimentation, Anthropic isn't done with Claudius, nor with exploring the use of AIs in the retail industry, as it believes that situations where "humans were instructed about what to order and stock by an AI system, may not be terribly far away." Anthropic also believes "AI[s] that can improve [themselves] and earn money without human intervention would be a striking new actor in economic and political life."

Automated systems have been in use within stock exchanges, for example, for many years—buying and selling in the blink of an eye, all without a real person controlling the finer details. Such systems are essentially nothing more than mathematical models, based on economic principles honed over decades, and they're tightly constrained as to what they can and can't do.

The fact that Claudius showed no such restraint, happily stepping well beyond its scope, should serve as a reminder to companies looking at using AI for such tasks that LLMs could land them in a whole heap of trouble.