240 million tennis matches per year. Almost none have a referee. Sideline watches video, reasons through the play with chain-of-thought, and signals a physical robot.
An image classifier can tell you the ball landed out. An agent knows the score, the server, the rules, and what happened three points ago.
Multi-stream ingest, 30fps sync, H.264/265.
Qwen2-VL-72B via Nebius Token Factory.
Chain-of-thought + match-state memory.
Serialized commands to robot runtime.
Physical point + audio annunciation.
A classifier tells you the ball is out. A referee holds state, narrates reasoning, and takes physical action.
Every decision is narrated. "Ball crossed the plane at 47 km/h; angle -12°; topspin — goal." The agent leaves a trace a human can audit.
Score, server, faults, who won the last point — all in context. Every call incorporates match history, not just the current frame.
Hot-swap between Nebius Qwen2-VL-72B, Cosmos Reason, or local Llama without touching agent logic.
Direct tool-call to robot runtime. The referee doesn't just show a score — it points, speaks, and gestures.
The referee reasoning stays the same. The embodiment swaps: precise arm, humanoid presence, mobile rover.
Six-joint manipulator. Precise court-side gestures, high-signal low-footprint. Perfect for in-stadium calls.
Full-humanoid gestures — arm raise, point, whistle posture. Stadium-scale presence when the call needs to be seen.
Tracked rover with mecanum wheels, camera, speaker, and ROS2. Mobile, affordable, follows the play.
// REV_A · 2026-04
| Component | Specification | Status |
|---|---|---|
| VLM Backend | Nebius Qwen2-VL-72B · OpenAI-compatible endpoint | Active |
| Agent Runtime | Python 3.12 · CoT prompt · tool-calling · match-state memory | Active |
| Rules Engine | Tennis (MVP) · pluggable sport config | Active |
| Robot IO | ROS2 · LeRobot · gesture scripts · tool-call → execute | Testing |
| Console | Web console · live token stream · CoT trace · tool-call log | Active |
| License | Apache-2.0 · fork it · build your own sport | Open |