Module 4: Vision-Language-Action (VLA)

Focus: The convergence of Large Language Models (LLMs) and Robotics.

Vision-Language-Action (VLA) is the ultimate goal of Physical AI: it enables robots to interpret complex natural-language commands from humans and translate them into physical actions.

Key Concepts

  • Voice-to-Action: Implementing speech recognition using tools like OpenAI Whisper to convert user voice commands into structured text inputs for the AI system.
  • Cognitive Planning: Using an LLM to translate high-level natural-language goals (e.g., "Clean the room") into a sequence of low-level, executable ROS 2 actions.
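The two concepts above can be sketched as a pair of interfaces: speech in, text out; goal in, ordered actions out. The function names and action strings below are hypothetical placeholders, not part of any course or library API, and the planner is a keyword stub standing in for an actual LLM call.

```python
def transcribe_command(audio_path: str) -> str:
    """Stand-in for speech recognition.

    A real system might call OpenAI Whisper, e.g.:
        import whisper
        model = whisper.load_model("base")
        return model.transcribe(audio_path)["text"]
    Here we return a fixed string so the sketch is self-contained.
    """
    return "Clean the room"


def plan_actions(goal: str) -> list[str]:
    """Stand-in for LLM-based cognitive planning.

    A real planner would prompt an LLM to decompose the goal; this
    keyword stub only illustrates the input/output contract: a
    natural-language goal in, an ordered list of ROS 2 action names out.
    """
    if "clean" in goal.lower():
        # Hypothetical low-level action names for illustration.
        return ["NavigateToPose", "DetectObject", "PickObject", "PlaceObject"]
    return []


command = transcribe_command("command.wav")
print(plan_actions(command))
```

The key design point is the boundary: the LLM never emits motor commands directly; it emits a symbolic plan that existing ROS 2 action servers already know how to execute.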

Capstone Project

The module culminates in the Capstone Project: The Autonomous Humanoid. This project requires students to integrate skills from all previous modules:

  1. Receive a voice command.
  2. Plan a path.
  3. Navigate obstacles.
  4. Identify an object using computer vision.
  5. Manipulate the object in the simulated environment.
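The five capstone steps above form a simple sequential pipeline. The sketch below chains them with placeholder implementations; every function name and return value is hypothetical, meant only to show how the stages hand results to one another in a simulated run.

```python
# Placeholder stages; in the real project each would wrap a ROS 2
# action client, a perception node, etc.
def receive_voice_command() -> str:
    return "Pick up the red cube"

def plan_path(command: str) -> list[str]:
    return ["waypoint_1", "waypoint_2"]

def navigate(path: list[str]) -> str:
    return f"reached {path[-1]}"

def identify_object(command: str) -> str:
    return "red_cube"

def manipulate(obj: str) -> str:
    return f"grasped {obj}"


def run_capstone() -> list[tuple[str, object]]:
    """Run the five capstone stages in order and log each result."""
    log: list[tuple[str, object]] = []
    command = receive_voice_command()
    log.append(("command", command))
    path = plan_path(command)
    log.append(("path", path))
    log.append(("navigation", navigate(path)))
    obj = identify_object(command)
    log.append(("vision", obj))
    log.append(("manipulation", manipulate(obj)))
    return log


for stage, result in run_capstone():
    print(stage, result)
```

Note that the voice command feeds both planning and perception: the same parsed goal tells the planner where to go and the vision stage what to look for.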