Physical Intelligence, a San Francisco startup founded in 2024, is advancing robot control systems that learn multiple tasks using vision-language-action models derived from large language models. The company has demonstrated robots performing varied activities such as making coffee, folding clothes and cooking sweet potatoes based on verbal instructions.
In a warehouse fitted with simulated home environments that are renovated weekly, the robots have learned to peel vegetables, clean kitchens and handle items. A recent model called π0.7 successfully operated an air fryer for the first time after receiving step-by-step guidance.
Sergey Levine, a founder and University of California, Berkeley professor, noted that diverse data sources help AI systems improve rather than complicating their learning. The company is also testing robots in actual homes to handle real-world variability.
Ingmar Posner of the University of Oxford described the approach as an exciting translation of language model capabilities but cautioned that real-world deployment at scale remains distant due to data requirements and unpredictable user interactions.