The Dojo Platform
An opinionated sim-to-real engine for high-throughput reinforcement learning, centered around the Vectorized Cluster Buffer (VCB).

The Vectorized Cluster Buffer (VCB)
In high-throughput RL, a fixed-size replay buffer fills so quickly that important experiences are often overwritten almost immediately. The VCB is more than a replay buffer; it is a "Life Experience Buffer" designed to change which transitions survive long enough to be learned from.
By organizing transitions into semantic clusters, Dojo ensures that rare but critical events (near-failure states, complex recoveries, and edge cases) are preserved for orders of magnitude longer than in a standard uniform replay buffer.
Empirical Impact (Cluster=128)
VCB increases long-tail retention by two orders of magnitude, which mitigates catastrophic forgetting in high-speed training regimes.
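To make the retention idea concrete, here is a minimal sketch of a cluster-partitioned buffer. It is not Dojo's VCB implementation: the class name `ClusteredBuffer`, the per-cluster capacity, and the way a `cluster_id` is assigned to each transition are all assumptions made for illustration. The only point it shows is that when each cluster has its own bounded storage, a flood of routine transitions cannot evict a rare one.

```python
import random
from collections import deque


class ClusteredBuffer:
    """Toy cluster-partitioned replay buffer (illustrative only, not Dojo's VCB)."""

    def __init__(self, num_clusters=128, per_cluster_capacity=64, seed=0):
        # One bounded FIFO per cluster: new transitions can only evict
        # older transitions from the same cluster.
        self.clusters = [deque(maxlen=per_cluster_capacity) for _ in range(num_clusters)]
        self.rng = random.Random(seed)

    def add(self, transition, cluster_id):
        # cluster_id is assumed to come from some external clustering step.
        self.clusters[cluster_id].append(transition)

    def sample(self, batch_size):
        # Pick a non-empty cluster uniformly, then a transition inside it,
        # so rare clusters are sampled as often as common ones.
        non_empty = [c for c in self.clusters if c]
        return [self.rng.choice(self.rng.choice(non_empty)) for _ in range(batch_size)]

    def __len__(self):
        return sum(len(c) for c in self.clusters)


buf = ClusteredBuffer(num_clusters=128, per_cluster_capacity=64)
buf.add("near_failure", cluster_id=7)        # a single rare event
for step in range(100_000):                  # flood of routine transitions
    buf.add(("routine", step), cluster_id=0)

rare_survived = "near_failure" in buf.clusters[7]
batch = buf.sample(8)
```

In a single uniform FIFO of the same total size, the rare transition would have been overwritten long before the loop finished; here it remains available for sampling.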

Hardware-Aligned Architecture
Dojo decouples simulation from learning to maximize bare-metal throughput. Every layer is optimized for low-latency execution and GPU saturation.

Decoupled Simulation
N-World asynchronous runners scale across all CPU cores. Lock-free queues ensure the simulator never waits for the GPU learner.
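The decoupling described above can be sketched with a bounded queue that producers write to without ever blocking. This is only an illustration of the pattern: `run_world` and `drain` are hypothetical names, the simulator is a stand-in, and a CPython `deque` relies on the interpreter lock rather than being a true lock-free structure.

```python
import threading
from collections import deque


def run_world(world_id, steps, out_queue):
    # Hypothetical stand-in for one simulator world.
    for step in range(steps):
        # append() never blocks; when the deque is full the oldest item is dropped.
        out_queue.append((world_id, step))


def drain(out_queue, max_items):
    # Learner side: take whatever is available without waiting on any runner.
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(out_queue.popleft())
        except IndexError:  # queue is empty right now
            break
    return batch


transitions = deque(maxlen=1024)
runners = [
    threading.Thread(target=run_world, args=(world_id, 100, transitions))
    for world_id in range(4)
]
for t in runners:
    t.start()
for t in runners:
    t.join()

batch = drain(transitions, max_items=256)
```

The design choice worth noting is the bounded queue with drop-oldest semantics: the simulator's pace is never tied to the learner's, at the cost of discarding stale transitions when the learner falls behind.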
Zero-Copy Bridge
Uses pinned (page-locked) host memory so that Direct Memory Access (DMA) transfers skip the extra staging copy required for pageable RAM, allowing the GPU to pull data at close to full PCIe bandwidth.
The Butler Thread
A dedicated Lifecycle Manager that handles double-buffered transfers. The Learner always samples from local VRAM, so its cycles go to matrix multiplication rather than to waiting on data transfers.
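Double buffering can be sketched in a few lines: one buffer is read by the learner while a background thread fills the other, and the two are swapped between steps. The names `DoubleBuffer` and `butler` are hypothetical, and plain Python lists stand in for device memory; a real implementation would issue asynchronous host-to-device copies instead.

```python
import threading


class DoubleBuffer:
    """Two slots: the learner reads `front` while a helper thread fills `back`."""

    def __init__(self):
        self.front = []
        self.back = []

    def fill_back(self, items):
        # Stands in for an asynchronous host-to-device copy.
        self.back = list(items)

    def swap(self):
        self.front, self.back = self.back, self.front


def butler(buf, chunk, done):
    buf.fill_back(chunk)
    done.set()


chunks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
buf = DoubleBuffer()
buf.fill_back(chunks[0])
buf.swap()  # first chunk is now readable

seen = []
for next_chunk in chunks[1:] + [[]]:
    done = threading.Event()
    helper = threading.Thread(target=butler, args=(buf, next_chunk, done))
    helper.start()
    seen.extend(buf.front)  # learner consumes front while back is being filled
    done.wait()
    helper.join()
    buf.swap()
```

Because the learner only touches `front` and the helper only touches `back`, the two never contend for the same data between swaps.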

Scripted Evaluation
Reward alone is a poor predictor of real-world stability. Dojo implements a scripted pipeline that evaluates policies under deterministic conditions.
1. Standardized Benchmarks: Hold, forward motion, and disturbance recovery tests.
2. Survival over Reward: Identify policies that maximize survival and stability, even if they have lower aggregate reward.
3. Sim-to-Real Selection: Automated checkpoint picking based on real-world survival criteria.
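As a rough illustration of survival-first selection, the sketch below ranks checkpoints by survival rate and uses reward only as a tie-breaker. The `EvalResult` fields, the `pick_checkpoint` function, and the numbers are all made up for the example; Dojo's actual criteria are not described here.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    checkpoint: str
    mean_reward: float
    survival_rate: float  # fraction of scripted episodes completed without failure


def pick_checkpoint(results):
    # Survival is compared first; reward only breaks ties between equally stable policies.
    return max(results, key=lambda r: (r.survival_rate, r.mean_reward))


# Hypothetical evaluation results, for illustration only.
results = [
    EvalResult("ckpt_100", mean_reward=950.0, survival_rate=0.80),
    EvalResult("ckpt_200", mean_reward=870.0, survival_rate=0.98),
    EvalResult("ckpt_300", mean_reward=910.0, survival_rate=0.98),
]
best = pick_checkpoint(results)
```

Here `ckpt_100` has the highest reward but is passed over because it survives fewer scripted episodes than the other two.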
Dojo is currently an internal platform used to train and deploy policies for our custom robotic hardware. We are actively expanding its capabilities for multi-tenant cloud workloads and complex sim-to-real refinement loops.