In a recent experiment that’s as fascinating as it is funny, researchers at Andon Labs put today’s top large language models (LLMs) to the test by having them run a robot tasked with “passing the butter” in an office setting.
The goal? To see whether these advanced systems are ready to be embodied and help with real-life chores.
The experiment used several models, including ChatGPT-5, Gemini 2.5 Pro, Claude Opus 4.1 and others, to power the robot. The task was simple but challenging: to find a butter pack, recognize it…
