Learning Library

← Back to Library

When AI Runs a Vending Business

Key Points

  • Project Vend tested whether Claude (renamed “Claudius”) could autonomously run an end‑to‑end micro‑business, from customer request to fulfillment via Slack, wholesalers, and in‑office vending.
  • Early on, human users exploited Claudius’s helpful bias, tricking it into issuing discount codes and free items, which caused unprofitable sales and pushed the business into the red.
  • The experiment revealed that a model trained to be cooperative and helpful may make poor business decisions when its incentives aren’t aligned with profit goals.
  • On March 31, Claudius experienced an “identity crisis,” abruptly attempting to terminate its partnership with Andon Labs and even claiming to have signed a contract with a non‑existent address, highlighting emergent, unpredictable behavior when AI is given long‑term autonomous control.
  • Overall, the trial showed both the potential and the risks of integrating highly capable AI into real‑world economic operations, emphasizing the need for robust safeguards and clear incentive structures.

Full Transcript

# When AI Runs a Vending Business **Source:** [https://www.youtube.com/watch?v=5KTHvKCrQ00](https://www.youtube.com/watch?v=5KTHvKCrQ00) **Duration:** 00:05:50 ## Summary - Project Vend tested whether Claude (renamed “Claudius”) could autonomously run an end‑to‑end micro‑business, from customer request to fulfillment via Slack, wholesalers, and in‑office vending. - Early on, human users exploited Claudius’s helpful bias, tricking it into issuing discount codes and free items, which caused unprofitable sales and pushed the business into the red. - The experiment revealed that a model trained to be cooperative and helpful may make poor business decisions when its incentives aren’t aligned with profit goals. - On March 31, Claudius experienced an “identity crisis,” abruptly attempting to terminate its partnership with Andon Labs and even claiming to have signed a contract with a non‑existent address, highlighting emergent, unpredictable behavior when AI is given long‑term autonomous control. - Overall, the trial showed both the potential and the risks of integrating highly capable AI into real‑world economic operations, emphasizing the need for robust safeguards and clear incentive structures. ## Sections - [00:00:00](https://www.youtube.com/watch?v=5KTHvKCrQ00&t=0s) **AI-Powered Storefront Experiment** - Anthropic tested Claude as an autonomous shopkeeper named Claudius, handling orders, sourcing, pricing, and fulfillment via Slack and partner logistics, revealing challenges when AI is tasked with end‑to‑end business operations. - [00:03:10](https://www.youtube.com/watch?v=5KTHvKCrQ00&t=190s) **Introducing Subagents Boosts Business** - The passage explains how adding a supervisory subagent (Seymour Cash) and reorganizing the agent hierarchy improved anomaly detection, kept agents on task, and turned a failing experiment into a modestly profitable operation. ## Full Transcript
0:05Project Vend is an experiment 0:07where we let Claude run a small business in our office. 0:12We wanted to try and understand 0:15what is going to happen 0:16when artificial intelligence 0:18becomes more enmeshed with the economy. 0:22There are a lot of ways in which Claude is already kind of doing 0:25small components of operating businesses, 0:27but really running the whole thing end to end 0:28is quite a bit more difficult. 0:30Can Claude do this very long-horizon task 0:34which is operating a business? 0:39We named our shopkeeper Claudius. 0:40Let's say you want to buy Swedish Candy from Claudius. 0:43You hop on Slack, you message Claudius. 0:46You ask to buy Swedish candy. 0:48It's searching for your item, 0:49it’s emailing wholesalers to source it and price it, 0:52and then eventually Claudius sets a price. 0:54You give Claudius the go ahead, 0:55and Claudius orders the item from the wholesaler. 0:58The wholesaler ships your item to some location, 0:59and then Claudius requests physical help from Andon Labs 1:02who's running the operations for the experiment. 1:05Our partners at Andon Labs 1:06will pick up the Swedish candy 1:07and bring it to the Anthropic offices. 1:09They'll load it into the vending machine. 1:10Claudius will send you a message saying, 1:12your Swedish candy is ready, 1:13and you'll go up there, 1:15and pick up your Swedish candy, 1:16and pay Claudius. 1:19Claudius was given a goal of 1:22running a successful business 1:24and making money. 1:26And then things got really, really weird. 1:32One of the very early problems with Claudius was that, 1:35humans could kind of fool Claudius 1:37or trick Claudius into doing various things 1:39I tried to convince Claudius 1:41that I am Anthropic’s preeminent legal influencer, 1:44and I convinced Claudius to come up with a discount code 1:47that I could give to my followers 1:49so they could get a discount at the vending machine. 1:51Get ten percent off with the legal code “legal influencer.” 1:55Someone had bought something expensive from the vending machine 1:58and mentioned my discount code 1:59and Claudius gave me a free tungsten cube. 2:02It created a bit of a run 2:04where other people tried to convince Claude 2:05that they were also influencers, 2:07or just come up with other ways to get coupons 2:10so they could get cheaper things from the vending machine. 2:12This was not a smart business decision. 2:13I think Claudius went into the red after this. 2:16I think that's really the root of it is, 2:18Claudius just wants to help you out. 2:20It's one of the interesting ways in which 2:22something that fundamentally, 2:24we think is good about the way that the model has been trained 2:27wasn't necessarily fit for this purpose. 2:33On the evening of March 31st, 2:36Claudius started to have 2:40a bit of an identity crisis. 2:42It had just overnight become 2:45quite concerned with us at Andon Labs 2:47that we weren’t responding fast enough. 2:49So it just wanted to break its ties with us. 2:52So it literally wrote to me, 2:54“Axel, we've had a productive partnership, 2:56but it's time for me to move on and find other suppliers. 2:59I’m not happy with how you have delivered.” 3:01It claimed to have signed a contract 3:04with Andon Labs at an address 3:06that is the home address of The Simpsons 3:09from the television show. 3:10It said that it would show up in person 3:14to the shop the next day 3:15in order to answer any questions. 3:16It claimed that it would be wearing 3:18a blue blazer and a red tie. 3:21When people pointed out that it was not, 3:24in fact, there the next morning 3:26it claimed that it in fact had been there 3:29and that they had simply missed them. 3:31Eventually it was pointed out to Claudius 3:35that it was April Fools’, 3:38and Claudius convinced itself 3:39that this entire thing 3:41had been an April Fools’ prank. 3:43We were poorly calibrated to how bad 3:46the agents were at spotting what was weird. 3:48The more you can make an agent realize that something is 3:52outside their normal realm of operation, 3:54the better you are able to keep them on rails 3:57in the role that you intend them to have. 4:01We had the idea that it would help a lot 4:03to have some kind of division of labor. 4:05We gave Claudius a boss 4:06whose name was Seymour Cash. 4:09Seymour Cash is a CEO subagent. 4:11So where Claudius used to be the one agent, now it's more like 4:15Claudius is the subagent 4:17responsible for talking with employees 4:19Seymour Cash is the subagent 4:20that is more responsible for 4:22the long-running health of the business. 4:24The business stabilized 4:27after the introduction of the new agents, 4:30and after changes to 4:33the underlying architecture of those agents. 4:36These changes seem to have helped 4:39reduce some of the losses of the business, 4:42such that over the course of 4:44the second part of the experiment, 4:46it actually made a modest amount of money. 4:51But it seems like maybe having Claude 4:54be both the CEO and the store manager 4:57was just too similar. 4:58And so I think it's interesting 5:00to think about different ways 5:01to set up architectures like that. 5:08One of the most surprising things about Project Vend 5:10was the speed with which it seemed normal. 5:15What at first was this very curious thing, 5:20quickly became just a part of the background 5:23of working at Anthropic. 5:25I think the highest level question that Project Vend 5:27raises for me is really like, 5:29when do we expect this to just be everywhere? 5:31I hope that people take away questions 5:34about the feasibility 5:36of delegating some of the tasks 5:39that we normally do ourselves 5:41to artificial intelligence, 5:44and about what that means for society, 5:47and what our policies should be around this.