Lecture 6 (April 25, 2026)

Two Paradigms of World Models

Video-first versus joint world-action models. We trace the arc from UniPi, SuSIE, HiP, GR-1, and VPP through NVIDIA Cosmos and DreamZero, then fine-tune DreamZero for the SO-101 arm on 4×H100, with all seven bugs documented.

UniPi, SuSIE, HiP, GR-1 / GR-2, VPP, Cosmos, DreamZero, SO-101, Flow Matching

Lecture Outline

Three parts: the video-first lineage, the joint world-action shift, and our own DreamZero-SO101 build.

Part 1 — Paradigm A: Video → Inverse Dynamics

🗂️

The Three Categories

🎬

UniPi

🖼️

SuSIE

🧩

HiP

🤖

GR-1 / GR-2

🎯

VPP

🌀

The Full Arc

Part 2 — Paradigm B: Joint World-Action Models

The Core Shift

🌌

NVIDIA Cosmos

🔢

Cosmos Tokenizer

💭

Meet DreamZero

🏗️

14B Backbone

🔗

Joint Formulation

〰️

Flow Matching

🧱

AR Chunks + KV Cache

🔁

Closed-Loop

DreamZero-Flash 7 Hz

📊

Cross-Embodiment

Part 3 — Building Our Own: DreamZero-SO101

🎯

The Goal

💰

Hardware & Budget

🦾

Adding SO-101 Support

📦

Finding SO-101 Data

🔄

LeRobot → GEAR

🧪

The PoC Run

🐛

Seven Bugs

🚀

What's Next