Active Research Platform

The Dojo Platform

An opinionated sim-to-real engine for high-throughput reinforcement learning, centered around the Vectorized Cluster Buffer (VCB).

The Vectorized Cluster Buffer (VCB)

In high-throughput RL, a fixed-size replay buffer turns over so quickly that important experiences are often overwritten almost immediately. The VCB is more than a replay buffer: it is a "Life Experience Buffer" that changes training dynamics by controlling which transitions survive.

By organizing transitions into semantic clusters, Dojo ensures that rare but critical events (near-failure states, complex recoveries, and edge cases) are preserved for orders of magnitude longer than in standard uniform buffers.

Read the VCB White Paper →
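
The retention effect described above can be illustrated with a minimal sketch. It assumes one fixed-size FIFO per cluster, so new transitions only evict old ones from their own cluster; this is an illustration of the idea, not the VCB's actual data structure. The names `ClusteredReplayBuffer`, `cluster_fn`, and `demo` are hypothetical.

```python
import random
from collections import deque


class ClusteredReplayBuffer:
    """Hypothetical sketch: one fixed-size FIFO per cluster (not the real VCB)."""

    def __init__(self, num_clusters, per_cluster_capacity, cluster_fn):
        self.cluster_fn = cluster_fn  # maps a transition to a cluster index
        self.clusters = [deque(maxlen=per_cluster_capacity)
                         for _ in range(num_clusters)]
        self.step = 0  # global insertion counter, used to measure sample age

    def add(self, transition):
        # Eviction happens only inside the transition's own cluster,
        # so a flood of common transitions cannot push out rare ones.
        cid = self.cluster_fn(transition)
        self.clusters[cid].append((self.step, transition))
        self.step += 1

    def sample(self, batch_size, rng=random):
        # Pick a non-empty cluster uniformly, then a transition within it.
        non_empty = [c for c in self.clusters if c]
        return [rng.choice(rng.choice(non_empty))[1] for _ in range(batch_size)]

    def oldest_age(self, cluster_id):
        # Age in steps of the oldest transition still held by a cluster.
        newest_step = self.step - 1
        return newest_step - self.clusters[cluster_id][0][0]


def demo(steps=10_000, rare_every=100, per_cluster_capacity=50):
    # Cluster 1 receives the rare events (1 in `rare_every`), cluster 0 the rest.
    buf = ClusteredReplayBuffer(
        2, per_cluster_capacity,
        cluster_fn=lambda tr: 1 if tr["rare"] else 0)
    for i in range(steps):
        buf.add({"id": i, "rare": i % rare_every == 0})
    return buf
```

In this toy stream the oldest common transition is 49 steps old, while the oldest rare transition is 4,999 steps old, because the rare cluster fills 100 times more slowly.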

Empirical Impact (Cluster=128)

Median Sample Age: ~52.5k steps

95th Percentile Age: 6.1M steps

*VCB increases long-tail retention by two orders of magnitude, greatly reducing catastrophic forgetting in high-speed training regimes.

Hardware-Aligned Architecture

Dojo decouples simulation from learning to maximize bare-metal throughput. Every layer is optimized for minimal-latency execution and GPU saturation.

Decoupled Simulation

N-World asynchronous runners scale across all CPU cores. Lock-free queues ensure the simulator never waits for the GPU learner.
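
A rough sketch of the decoupling: several runner threads push transitions into a bounded queue without ever blocking, while a separate learner thread drains it. This is only an illustration. Python's `queue.Queue` uses a lock internally, so it stands in for the lock-free queues mentioned above, and dropping on a full queue is an assumed policy. `run_world` and `run_demo` are hypothetical names.

```python
import queue
import threading


def run_world(world_id, steps, out_q, stats):
    """Simulator loop: never blocks on the learner; drops when the queue is full."""
    pushed = dropped = 0
    for t in range(steps):
        transition = (world_id, t)  # stand-in for (obs, action, reward, next_obs)
        try:
            out_q.put_nowait(transition)
            pushed += 1
        except queue.Full:
            dropped += 1  # assumed policy: discard rather than wait
    stats[world_id] = (pushed, dropped)


def run_demo(num_worlds=4, steps=1000, capacity=256):
    q = queue.Queue(maxsize=capacity)
    stats, consumed = {}, []
    stop = threading.Event()

    def learner():
        # Keep draining until the runners are done and the queue is empty.
        while not stop.is_set() or not q.empty():
            try:
                consumed.append(q.get(timeout=0.01))
            except queue.Empty:
                pass

    learner_thread = threading.Thread(target=learner)
    runners = [threading.Thread(target=run_world, args=(w, steps, q, stats))
               for w in range(num_worlds)]
    learner_thread.start()
    for r in runners:
        r.start()
    for r in runners:
        r.join()
    stop.set()
    learner_thread.join()

    pushed = sum(p for p, _ in stats.values())
    dropped = sum(d for _, d in stats.values())
    return pushed, dropped, len(consumed)
```

Every simulated step is either queued or dropped, and the learner eventually consumes everything that was queued; how many steps are dropped depends on thread scheduling.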

Zero-Copy Bridge

Uses pinned (page-locked) host memory and Direct Memory Access (DMA) so transfers skip the staging copy out of pageable RAM, allowing the GPU to pull data at close to full PCIe bandwidth.

The Butler Thread

A dedicated Lifecycle Manager that handles double-buffered transfers. The Learner always samples from local VRAM, so its cycles go to matrix multiplication rather than to waiting on transfers.
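
The double-buffering pattern can be sketched with plain threads. In this hypothetical version a background "butler" stages the next batch while the learner keeps reading the current one, and the learner's swap never blocks. Python lists stand in for device buffers, and `DoubleBuffer`, `publish`, and `swap_if_ready` are illustrative names rather than Dojo's API.

```python
import threading


class DoubleBuffer:
    """Hypothetical front/back buffer pair shared by a butler and a learner."""

    def __init__(self, first_batch):
        self._front = first_batch  # what the learner reads
        self._back = None          # what the butler has staged
        self._cond = threading.Condition()

    def publish(self, batch):
        # Butler side: wait until the previously staged batch has been taken.
        with self._cond:
            while self._back is not None:
                self._cond.wait()
            self._back = batch
            self._cond.notify_all()

    def front(self):
        # Learner side: always available, never waits on a transfer.
        return self._front

    def swap_if_ready(self):
        # Learner side: non-blocking promotion of the staged batch.
        with self._cond:
            if self._back is None:
                return False
            self._front, self._back = self._back, None
            self._cond.notify_all()
            return True


def run_demo(num_batches=5):
    buf = DoubleBuffer(first_batch=[0] * 4)

    def butler():
        for i in range(1, num_batches + 1):
            buf.publish([i] * 4)  # stand-in for an async host-to-device copy

    t = threading.Thread(target=butler)
    t.start()
    seen = []
    while True:
        batch = buf.front()  # "train" on whatever is currently in front
        if not seen or seen[-1] != batch[0]:
            seen.append(batch[0])
        if batch[0] == num_batches:
            break
        buf.swap_if_ready()
    t.join()
    return seen
```

Because `publish` waits for the staged slot to clear, no batch is skipped, and the learner reuses the front batch whenever the next one is not ready yet.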

Scripted Evaluation

Reward alone is a poor predictor of real-world stability. Dojo implements a scripted pipeline that evaluates policies under deterministic conditions.

1. Standardized Benchmarks: Hold, forward motion, and disturbance recovery tests.
2. Survival over Reward: Identify policies that maximize survival and stability, even if they have lower aggregate reward.
3. Sim-to-Real Selection: Automated checkpoint picking based on real-world survival criteria.
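
The three steps above could be combined into a selection rule along these lines. This is a sketch under stated assumptions: each checkpoint has a per-test survival rate, checkpoints are ranked by their worst-case survival, and reward only breaks ties. The 0.9 threshold, the test names, and the `EvalResult` and `select_checkpoint` names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical names for the three scripted tests.
TESTS = ("hold", "forward", "disturbance_recovery")


@dataclass(frozen=True)
class EvalResult:
    checkpoint: str
    mean_reward: float
    survival_rate: dict  # test name -> fraction of episodes survived


def worst_case_survival(result):
    # A policy is only as stable as its weakest scripted test.
    return min(result.survival_rate[t] for t in TESTS)


def select_checkpoint(results, min_survival=0.9):
    # Prefer checkpoints that clear the survival bar on every test;
    # fall back to the full pool if none do.
    eligible = [r for r in results if worst_case_survival(r) >= min_survival]
    pool = eligible or results
    # Rank by survival first; reward is only a tie-breaker.
    return max(pool, key=lambda r: (worst_case_survival(r), r.mean_reward))


results = [
    EvalResult("ckpt_a", 950.0,
               {"hold": 1.00, "forward": 0.95, "disturbance_recovery": 0.60}),
    EvalResult("ckpt_b", 800.0,
               {"hold": 1.00, "forward": 0.98, "disturbance_recovery": 0.97}),
]
best = select_checkpoint(results)
```

Here `ckpt_b` is selected despite its lower reward, because `ckpt_a` survives only 60% of the disturbance recovery episodes.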

Platform Concepts

Trainable Entities

A deep dive into Dojo V2's sensor-first, isolated agent model. Learn how we decouple state queries and run embedded-ready SensorSuites.

Read Entity Spec →

Sim-to-Real & Batched Dynamics

Explore the floating-point precision constraints (float32 vs float64) and batched inference pipelines that solve policy divergence in chaotic environments.

Read Dynamics Spec →
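
To see why precision matters in chaotic systems, the sketch below iterates the logistic map once in float64 and once with every intermediate result rounded to float32. The logistic map is a textbook stand-in here, not Dojo's dynamics, and `to_f32`, `logistic_f64`, `logistic_f32`, and `divergence` are hypothetical helper names.

```python
import struct


def to_f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def logistic_f64(x0, r, steps):
    # Reference trajectory in native float64 arithmetic.
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


def logistic_f32(x0, r, steps):
    # Same map, but every operation is rounded to float32.
    r = to_f32(r)
    xs = [to_f32(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append(to_f32(r * to_f32(x * to_f32(1.0 - x))))
    return xs


def divergence(x0=0.3, r=3.9, steps=200):
    # Per-step absolute gap between the two trajectories.
    a = logistic_f64(x0, r, steps)
    b = logistic_f32(x0, r, steps)
    return [abs(p - q) for p, q in zip(a, b)]


gaps = divergence()
first_large = next(i for i, g in enumerate(gaps) if g > 0.1)
```

The two trajectories start within rounding error of each other, yet `first_large` marks the step at which they already disagree by more than 0.1. A policy whose rollouts are computed at one precision in simulation and another on hardware can see the same kind of drift.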

Samples / Sec: ~200K

100K Buffer Turnover: ~200ms

Semantic Clusters: 128

GPU Utilization: 100%

Dojo is currently an internal platform used to train and deploy policies for our custom robotic hardware. We are actively expanding its capabilities for multi-tenant cloud workloads and complex sim-to-real refinement loops.