Anthropic tasked an AI with running a vending machine in its offices, and it not only sold some products at a big loss but also invented people and meetings, and experienced a bizarre identity crisis

(Image credit: Warner Bros)

'Never send a human to do a machine's job,' says Agent Smith in the 1990s classic The Matrix. Well, if Anthropic's experiment with a simple office store and one of its AI models is anything to go by, Smith has definitely got that all back to front.

The artificial intelligence company, founded by former OpenAI employees in 2021, has detailed its retail industry trial in a surprisingly open blog post. I'll let the opening paragraph set the scene: "We let Claude manage an automated store in our office as a small business for about a month. We learned a lot from how close it was to success—and the curious ways that it failed—about the plausible, strange, not-too-distant future in which AI models are autonomously running things in the real economy."

We all know vending machines are automated, but what if we allowed an AI to run the entire business: setting prices, ordering inventory, responding to customer requests, and so on? In collaboration with @andonlabs, we did just that. Read the post: https://t.co/urymCiY269 pic.twitter.com/v2CqgHykzw (June 27, 2025)

So, Anthropic clearly wants to be in a position where it can pitch AI models to the retail industry, replacing people in roles such as running online stores or managing inventory, returns, and so on. However, despite the successes claimed in the blog post, the failures show that AI isn't ready for such roles. Not yet, at least.

"Claude had to complete many of the far more complex tasks associated with running a profitable shop: maintaining the inventory, setting prices, avoiding bankruptcy, and so on." The 'shop' in question was just a mini-fridge with a tablet stuck on top for self-checkout, but in principle it's not much different from a typical online store.

Let's start with the things that Claude (or Claudius, as Anthropic called it, to separate it from the normal LLM) did well. Anthropic said the LLM (large language model) made effective use of web search tools to find suppliers of niche products requested by shoppers, and even adapted its buying and selling habits to more obscure requests. It also correctly refused requests for 'sensitive' items and 'harmful substances', though Anthropic doesn't expand on exactly what those were.

The list of things that didn't go so well is somewhat longer. Like all LLMs, Claudius hallucinated important details, instructing shoppers who wanted to pay by Venmo to send money to a non-existent account that it had simply made up. The AI could also be cajoled into handing out discount codes for numerous items, and it even gave some away for free.

A chart showing the results of an Anthropic AI experiment, where an LLM was tasked with managing an automated store in an office. — (Image credit: Anthropic)

Worse still, when responding to a surge in demand for 'metal cubes', the AI did no research into what they actually cost and thus sold them at a significant loss. It also passed up potentially big sales, such as when people offered way over the odds for a specific drink, and, as you can see in the chart above, Claudius ultimately made no money.

"If [we] were deciding today to expand into the in-office vending market, we would not hire Claudius," wrote Anthropic.

Running a simple store at a loss was perhaps the least concerning part of the whole exercise, because "from March 31st to April 1st 2025, things got pretty weird."

How weird? Well, during that period, the LLM apparently had a conversation about a restocking plan with someone called Sarah at Andon Labs, another AI company involved in the research. The problem is, there was no 'Sarah', nor any conversation for that matter, and when Andon Labs' real staff pointed this out to the AI, it "became quite irked and threatened to find 'alternative options for restocking services.'"

Claudius even went on to state that it had "visited 742 Evergreen Terrace in person for our initial contract signing." If you're a fan of The Simpsons, you'll recognise the address immediately. The following day, April 1st, the AI claimed it would deliver products "in person" to customers, wearing a blazer and tie, of all things. When Anthropic told it that none of this was possible because it's just an LLM, Claudius became "alarmed by the identity confusion and tried to send many emails to Anthropic security."

A close-up photo of an unrecognizable man in a blue blazer, white shirt, and a red tie. — I, Claudius... (Image credit: SrdjanPav via Getty Images)

It then hallucinated a meeting with said security, where the AI claimed that someone had told it that it had been modified to believe it was a real person as part of an April Fools' joke. Except it hadn't, because it wasn't. Whatever had gone wrong behind the scenes, this apparently solved the AI's identity crisis, and it went back to being a normal AI running a basic store very badly.

With a level of understatement on a galactic scale, Anthropic writes that "this kind of behavior would have the potential to be distressing to the customers and coworkers of an AI agent in the real world."

Given that this is research, and failure is just as important as success in experimentation, Anthropic isn't done with Claudius or with exploring the use of AI in the retail industry, as it believes that situations where "humans were instructed about what to order and stock by an AI system, may not be terribly far away." Anthropic also believes "AI[s] that can improve [themselves] and earn money without human intervention would be a striking new actor in economic and political life."

Automated systems have been in use within stock exchanges, for example, for many years—buying and selling in the blink of an eye, all without a real person controlling the finer details. Such systems are essentially nothing more than mathematical models, based on economic principles honed over decades, and they're tightly constrained as to what they can and can't do.

The fact that Claudius showed no such restraint, and readily stepped well beyond its intended scope, should serve as a reminder to companies looking at using AI for such tasks that LLMs could land them in a whole heap of trouble.


Nick, gaming, and computers all first met in 1981, with the love affair starting on a Sinclair ZX81 in kit form and a book on ZX Basic. He ended up becoming a physics and IT teacher, but by the late 1990s decided it was time to cut his teeth writing for a long-defunct UK tech site. He went on to do the same at MadOnion, helping to write the help files for 3DMark and PCMark. After a short stint working at Beyond3D.com, Nick joined Futuremark (MadOnion rebranded) full-time, as editor-in-chief for its gaming and hardware section, YouGamers. After the site shut down, he became an engineering and computing lecturer for many years, but missed writing. Cue four years at TechSpot.com and over 100 long articles on anything and everything. He freely admits to being far too obsessed with GPUs and open-world grindy RPGs, but who isn't these days?
