
Sim-to-Real & Batched Dynamics

Robotic systems with chaotic dynamics, such as our B1 balance robot prototype, are extremely sensitive to timing budgets, communication delays, and numerical precision. This document reviews the floating-point and batch-inference requirements of the Dojo simulation engine.

Precision Divergence in Chaotic Systems

Autonomy systems are highly susceptible to chaotic dynamics. In Dojo's BalanceWorld environment, small differences in the initial state grow exponentially over time.
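The exponential growth can be made concrete with the standard Lyapunov estimate. This is a generic sketch, not a measured property of BalanceWorld: the largest Lyapunov exponent \lambda, the initial perturbation \delta_0, and the tolerance \Delta are symbols introduced here for illustration.

```latex
% Growth of a small perturbation \delta_0 in a chaotic system with
% largest Lyapunov exponent \lambda > 0 (assumed, not measured):
\[
  \lVert \delta(t) \rVert \approx \lVert \delta_0 \rVert \, e^{\lambda t}
\]
% Time until the perturbation reaches a tolerance \Delta:
\[
  t_{\mathrm{div}} \approx \frac{1}{\lambda} \ln \frac{\Delta}{\lVert \delta_0 \rVert}
\]
% Example: rounding at float32 resolution (2^{-23}) instead of float64
% resolution (2^{-52}) enlarges \delta_0 by a factor of 2^{29}, which
% shortens t_div by
\[
  \frac{1}{\lambda} \ln 2^{29} \approx \frac{20.1}{\lambda}
\]
```

Because the horizon depends only logarithmically on \delta_0, shrinking the initial error buys little extra agreement time. Two runs stay comparable only if they share the same rounding behavior.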

We observed significant behavioral divergence between single-threaded evaluation loops and batched simulation runs. We traced this divergence to a subtle precision mismatch:

The accelerated BalanceWorld simulation executes actions in float32 precision during batched pipeline updates via apply_action_batch.

Conversely, naive test loops evaluate the policy in Python using native float64 (double-precision) values.

Because of chaotic sensitivity, these differences in float representation quickly compound over time, leading to divergent policy outcomes. For evaluation results to mirror training dynamics precisely, the evaluation engine must execute using the batched interface:

Snippet: Enforcing Evaluation Synchronization

# AVOID: Naive loop using implicit float64
# for step in range(steps):
#     action = policy(obs) # float64
#     world.step(action)

# CORRECT: Use the accelerated action batcher to lock float32 alignment
actions = policy.decide_batch(observations) # Aligned matrix
world.apply_action_batch(actions) # Synchronized float32 dynamics
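To illustrate why a float32/float64 mismatch matters in a chaotic system, here is a minimal, self-contained sketch. It does not use Dojo or BalanceWorld. It iterates the logistic map (a textbook chaotic system chosen here as a stand-in) once in float64 and once with each state rounded to float32. The helpers `to_float32`, `logistic`, and `precision_gap` are hypothetical names introduced for this example only.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def logistic(x):
    """One step of the logistic map at r = 4, a standard chaotic toy system."""
    return 4.0 * x * (1.0 - x)


def precision_gap(x0=0.2, steps=100):
    """Track |x64 - x32| when one trajectory is rounded to float32 each step.

    This emulates float32 state storage by rounding after every update;
    it is an approximation of true float32 arithmetic, not a replica.
    """
    x64 = x0
    x32 = to_float32(x0)
    gaps = []
    for _ in range(steps):
        gaps.append(abs(x64 - x32))
        x64 = logistic(x64)
        x32 = to_float32(logistic(x32))
    return gaps


gaps = precision_gap()
# Index of the first step where the two trajectories differ by more than 0.1.
first_large = next((i for i, g in enumerate(gaps) if g > 0.1), None)
print(f"initial gap: {gaps[0]:.3e}")
print(f"first step with gap > 0.1: {first_large}")
```

The two trajectories start within float32 rounding error of each other and still end up unrelated. The same mechanism applies to any evaluation loop whose arithmetic differs from the precision used in training.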

Batched Inference Requirements

Dojo's SACMLPBrain exhibits high computational sensitivity during backward steps. To avoid optimization drift:

  • decide_batch execution: Policy evaluation must run through decide_batch to prevent thread drift and preserve memory alignment with VCB clusters.
  • Deterministic disturbance: Automated recovery sweeps apply deterministic step-wise forces to measure recovery curves, so that policies are rated on physical survival metrics rather than cumulative reward.
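A deterministic disturbance sweep of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not Dojo's implementation: the pendulum model, the PD controller standing in for a policy, the gains, the push magnitudes, and the function name `survival_steps` are all hypothetical.

```python
import math


def survival_steps(push, steps=300, push_step=50, dt=0.01, fall_angle=0.5):
    """Return how many steps a toy inverted pendulum stays upright.

    A fixed angular-velocity push is applied at `push_step`, so every run
    with the same arguments is identical (no random disturbance).
    Returns `steps` if the pendulum never exceeds `fall_angle` (radians).
    """
    theta, omega = 0.0, 0.0   # angle and angular velocity
    kp, kd = 40.0, 10.0       # hypothetical PD gains standing in for a policy
    g_over_l = 9.81           # gravity / pendulum length (length of 1 m assumed)
    for step in range(steps):
        if step == push_step:
            omega += push     # deterministic step-wise disturbance
        torque = -kp * theta - kd * omega
        # Semi-implicit Euler: update velocity first, then angle.
        omega += (g_over_l * math.sin(theta) + torque) * dt
        theta += omega * dt
        if abs(theta) > fall_angle:
            return step       # fell over at this step
    return steps


# Sweep push magnitudes to build a simple recovery curve keyed by push size.
recovery_curve = {push: survival_steps(push) for push in (1.0, 20.0)}
print(recovery_curve)
```

Rating by survival steps rather than accumulated reward means a controller that scores well before the push but falls afterwards is still marked as failing.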

Sim-to-Real Hardware Validation

Our physical hardware, the B1 Wine-Box robot, runs policies optimized under the aligned float32 dynamics. By enforcing this alignment in simulation during the policy search phase, we achieve zero-shot balance transfer to real-world environments without hand-tuning motor coefficients.