Overview
- Rho-alpha, derived from Microsoft’s Phi series, translates natural-language instructions into control signals for bimanual (two-arm) manipulation tasks.
- The model extends vision-language-action (VLA) approaches by incorporating tactile sensing, and work is underway to add force sensing during operation.
- Training data combines three sources: physical robot demonstrations, synthetic trajectories generated with NVIDIA Isaac Sim running on Azure, and large-scale visual question-answering (QA) data.
- Microsoft released demonstrations on the BusyBox benchmark and on plug-insertion and toolbox tasks, including one sequence in which a human operator corrected the robot in real time.
- The system is being evaluated on dual-arm platforms and humanoid robots. A Research Early Access Program is open now, and a technical description is expected in the coming months.