Base Layer Robotics
Active Research Platform

The Dojo Platform

An opinionated sim-to-real engine for high-throughput reinforcement learning, centered around the Vectorized Cluster Buffer (VCB).

[Figure: Dojo Platform Interface]

The Vectorized Cluster Buffer (VCB)

In high-throughput RL, a standard replay buffer overwrites important experiences almost immediately. The VCB is more than a replay buffer: it is a "Life Experience Buffer" that changes which transitions survive long enough to be learned from.

By organizing transitions into semantic clusters, Dojo preserves rare but critical events (near-failure states, complex recoveries, and edge cases) for orders of magnitude longer than a standard uniform buffer does.
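The retention effect can be illustrated with a minimal sketch. It assumes each transition arrives with a cluster ID already assigned and that each cluster is its own fixed-size ring buffer. The class name `ClusteredBuffer`, its parameters, and the sampling rule are illustrative assumptions, not Dojo's actual API.

```python
import random
from collections import deque


class ClusteredBuffer:
    """Hypothetical replay buffer split into fixed-size per-cluster ring buffers."""

    def __init__(self, num_clusters, capacity_per_cluster, seed=0):
        self.clusters = [deque(maxlen=capacity_per_cluster) for _ in range(num_clusters)]
        self.rng = random.Random(seed)

    def add(self, transition, cluster_id):
        # A full cluster evicts only its own oldest entry, so a flood of
        # common transitions cannot push out entries held in a rare cluster.
        self.clusters[cluster_id].append(transition)

    def sample(self, batch_size):
        # Assumed rule: pick a non-empty cluster uniformly, then a transition in it.
        non_empty = [c for c in self.clusters if c]
        return [self.rng.choice(self.rng.choice(non_empty)) for _ in range(batch_size)]

    def __len__(self):
        return sum(len(c) for c in self.clusters)


# Compare against a plain FIFO buffer with the same total capacity (8 slots).
clustered = ClusteredBuffer(num_clusters=2, capacity_per_cluster=4)
uniform = deque(maxlen=8)

clustered.add("near_fall", cluster_id=1)   # one rare event at the start
uniform.append("near_fall")
for step in range(1000):                   # followed by many common transitions
    clustered.add(("walk", step), cluster_id=0)
    uniform.append(("walk", step))

rare_kept_clustered = "near_fall" in clustered.clusters[1]
rare_kept_uniform = "near_fall" in uniform
```

In this toy run the rare transition is still available in the clustered buffer after 1,000 further steps, while the uniform FIFO buffer discarded it after 8.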

Empirical Impact (Cluster=128)

Median Sample Age: ~52.5k steps
95th Percentile Age: 6.1M steps

*VCB increases long-tail retention by two orders of magnitude, which substantially mitigates catastrophic forgetting in high-speed training regimes.

[Figure: VCB Cluster Heatmap]

Hardware-Aligned Architecture

Dojo decouples simulation from learning to maximize bare-metal throughput. Every layer is optimized for low-latency execution and GPU saturation.

[Figure: Dojo Technical Architecture Map]

Decoupled Simulation

N-World asynchronous runners scale across all CPU cores. Lock-free queues ensure the simulator never waits for the GPU learner.
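The producer/consumer split can be sketched as follows. This is an illustration only: it uses Python threads and a bounded `collections.deque`, whose `append` and `popleft` are thread-safe in CPython, as a stand-in for a true lock-free queue. A production system would more likely use native threads or processes to occupy every core. The function names `run_world` and `drain` are hypothetical.

```python
import threading
from collections import deque


def run_world(world_id, out_queue, num_steps):
    """Hypothetical simulator loop: push transitions without ever blocking."""
    for step in range(num_steps):
        transition = (world_id, step)  # stand-in for (obs, action, reward, next_obs)
        # A deque with maxlen drops its oldest item when full, so the runner
        # never waits on a slow consumer.
        out_queue.append(transition)


def drain(in_queue, max_items):
    """Learner side: take whatever is available right now, without waiting."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(in_queue.popleft())
        except IndexError:  # queue is currently empty
            break
    return batch


transitions = deque(maxlen=10_000)
runners = [
    threading.Thread(target=run_world, args=(world_id, transitions, 500))
    for world_id in range(4)
]
for runner in runners:
    runner.start()
for runner in runners:
    runner.join()

first_batch = drain(transitions, max_items=256)
```

The trade-off in this sketch is that a full queue silently discards the oldest transitions. That keeps the simulator running at full speed but means the learner may never see some samples.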

Zero-Copy Bridge

Uses pinned (page-locked) host memory and Direct Memory Access (DMA) to avoid staging through OS pageable RAM, allowing the GPU to pull data at close to peak PCIe bandwidth.

The Butler Thread

A dedicated Lifecycle Manager that handles double-buffered transfers. The Learner always samples from local VRAM, so its cycles go to matrix multiplication rather than to waiting on transfers.
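Double buffering can be sketched in plain Python as below. The `DoubleBuffer` class and its `load_fn` callback are hypothetical names for illustration. In a real system the load step would be a host-to-device copy, not a Python function call.

```python
import threading


class DoubleBuffer:
    """Two slots: the learner reads `front` while a helper thread fills `back`."""

    def __init__(self, load_fn):
        self.load_fn = load_fn      # hypothetical: fetches the next chunk of data
        self.front = load_fn()      # slot the learner samples from
        self.back = None            # slot being refilled in the background
        self._worker = None
        self._start_refill()

    def _refill(self):
        self.back = self.load_fn()

    def _start_refill(self):
        self._worker = threading.Thread(target=self._refill, daemon=True)
        self._worker.start()

    def swap(self):
        # Wait only if the background refill has not finished yet, then flip
        # the slots and immediately start loading the next chunk.
        self._worker.join()
        self.front, self.back = self.back, None
        self._start_refill()
        return self.front


# Usage: each "chunk" is a small list tagged with its sequence number.
chunk_ids = iter(range(100))
staged = DoubleBuffer(load_fn=lambda: [next(chunk_ids)] * 4)

seen = [staged.front[0]]
for _ in range(3):
    seen.append(staged.swap()[0])
```

As long as loading a chunk takes less time than consuming one, `swap()` returns without waiting and the consumer always has a full slot to read from.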

Deterministic Evaluation Metrics

Scripted Evaluation

Reward alone is a poor predictor of real-world stability. Dojo implements a scripted pipeline that evaluates policies under deterministic conditions.

  1. Standardized Benchmarks: Hold, forward motion, and disturbance recovery tests.
  2. Survival over Reward: Identify policies that maximize survival and stability, even if they have lower aggregate reward.
  3. Sim-to-Real Selection: Automated checkpoint picking based on real-world survival criteria.
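The "survival over reward" rule can be sketched as a small selection function. Everything here is an assumption for illustration: the `EvalResult` fields, the `min_survival` threshold, the checkpoint names, and the scores are invented and do not come from Dojo's evaluation pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    checkpoint: str        # hypothetical checkpoint name
    survival_rate: float   # fraction of scripted episodes completed without failure
    mean_reward: float     # aggregate reward over the same episodes


def select_checkpoint(results, min_survival=0.95):
    """Prefer survival; use reward only to break ties among equally stable policies."""
    eligible = [r for r in results if r.survival_rate >= min_survival]
    pool = eligible or results  # fall back to the best available if none qualify
    return max(pool, key=lambda r: (r.survival_rate, r.mean_reward))


# Made-up scores: the highest-reward checkpoint is also the least stable.
results = [
    EvalResult("ckpt_100k", survival_rate=0.80, mean_reward=412.0),
    EvalResult("ckpt_200k", survival_rate=0.99, mean_reward=365.0),
    EvalResult("ckpt_300k", survival_rate=0.99, mean_reward=371.0),
]
best = select_checkpoint(results)
```

With these invented numbers the function picks `ckpt_300k`. The highest-reward checkpoint, `ckpt_100k`, is rejected because its survival rate falls below the threshold.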

Samples / Sec: ~200K
100K Buffer Turnover: ~200ms
Semantic Clusters: 128
GPU Utilization: 100%

Dojo is currently an internal platform used to train and deploy policies for our custom robotic hardware. We are actively expanding its capabilities for multi-tenant cloud workloads and complex sim-to-real refinement loops.