RHDL offers six simulation backends spanning three orders of magnitude in performance. This guide covers benchmarks, backend selection, and profiling.

Backend Performance Summary

| Backend | Speed | Startup | Best For |
|---|---|---|---|
| Ruby Behavioral | Baseline | Instant | Development, debugging |
| IR Interpreter | ~60K cycles/s | Fast | Quick gate-level verification |
| IR JIT | ~200–600K cycles/s | Moderate | Medium-length simulations |
| IR Compiler (AOT) | ~1–2M cycles/s | 5–8 s | Long batch simulations |
| Verilator | ~5–6M cycles/s | Compile time | Maximum throughput |
| CIRCT/MLIR (Arcilator) | Native RTL parity | Compile time | RTL benchmarking |
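Throughput alone does not decide the backend, because the AOT compiler pays a startup cost before the first cycle runs. The sketch below estimates wall-clock time for a run of N cycles and the point at which the compiler overtakes the JIT. The rates come from the MOS 6502 sample results on this page, and the 6.5 s compile time (the midpoint of the 5 to 8 s range) and zero JIT startup are simplifying assumptions; substitute numbers measured on your own machine.

```ruby
# Approximate throughput in cycles/s, taken from the MOS 6502 sample results.
# Replace these with your own measurements.
RATES = {
  interpreter: 60_000.0,
  jit:         230_000.0,
  compiler:    1_580_000.0
}.freeze

# Assumed one-time startup cost in seconds. 6.5 s is the midpoint of the
# 5-8 s AOT compile range; the JIT's "moderate" startup is treated as 0 here.
STARTUP = { interpreter: 0.0, jit: 0.0, compiler: 6.5 }.freeze

# Estimated wall-clock seconds to simulate `cycles` cycles on `backend`.
def estimated_seconds(backend, cycles)
  STARTUP.fetch(backend) + cycles / RATES.fetch(backend)
end

# Cycle count at which the compiler's startup cost is repaid relative to the JIT:
# solve  startup + N / r_compiler = N / r_jit  for N.
def compiler_break_even_cycles
  STARTUP[:compiler] / (1.0 / RATES[:jit] - 1.0 / RATES[:compiler])
end

[100_000, 1_000_000, 5_000_000].each do |cycles|
  cells = RATES.keys.map { |b| format('%s=%.1fs', b, estimated_seconds(b, cycles)) }
  puts "#{cycles} cycles: #{cells.join('  ')}"
end
puts format('compiler beats JIT beyond ~%.2fM cycles', compiler_break_even_cycles / 1e6)
```

Under these assumptions the break-even lands near 1.75M cycles: shorter runs finish sooner on the JIT, and longer runs favor the compiler, which is roughly in line with the 1M to 10M band in the backend selection guide.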

Benchmark Commands

MOS 6502 CPU

rake bench[mos6502]   # Default: 5 million cycles

Sample results:

  • Interpreter: ~60K cycles/s
  • JIT: ~230K cycles/s
  • Compiler: ~1.58M cycles/s (6.8x over JIT)

Apple II Full System

rake bench[apple2]    # CPU + memory + I/O

Game Boy

rake bench[gameboy]   # Frame-based execution

Sample results:

  • IR Compiler: ~1.27 MHz (~30% of real-time; the original Game Boy CPU clock is ~4.19 MHz)
  • Verilator: exceeds real hardware speed

Backend Selection Guide

| Simulation Length | Recommended Backend |
|---|---|
| < 100K cycles | Interpreter or JIT |
| 100K – 1M cycles | JIT |
| 1M – 10M cycles | Compiler (AOT) |
| > 10M cycles | Verilator or CIRCT/MLIR |

| Use Case | Recommended Backend |
|---|---|
| Development and debugging | Ruby Behavioral |
| RSpec test suite | Ruby Behavioral |
| Gate-level verification | IR Interpreter |
| Extended batch testing | IR Compiler |
| Maximum performance | Verilator |
| Native RTL benchmarking | CIRCT/MLIR (Arcilator) |
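The simulation-length guidance can be encoded directly when a script needs to pick a backend automatically. `recommended_backend` is a hypothetical helper, not part of RHDL; its thresholds simply mirror the length-based recommendations above, with the 1M and 10M boundaries assigned to the compiler.

```ruby
# Hypothetical helper (not an RHDL API): maps a planned simulation length
# to the backend recommended by the selection guide.
def recommended_backend(cycles)
  raise ArgumentError, 'cycle count must be non-negative' if cycles.negative?

  case cycles
  when 0...100_000           then :interpreter # the JIT is also reasonable here
  when 100_000...1_000_000   then :jit
  when 1_000_000..10_000_000 then :compiler
  else                            :verilator   # or CIRCT/MLIR (Arcilator)
  end
end

puts recommended_backend(5_000_000) # prints compiler
```

The first three symbols match the `backend:` values passed to `RHDL::Codegen.gate_level` in the examples below; `:verilator` stands for the separate export-and-compile flow rather than a `gate_level` option.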

Using Each Backend

Ruby Behavioral (Default)

component = MyDesign.new('test')
component.set_input(:a, 42)
component.propagate

IR Interpreter

sim = RHDL::Codegen.gate_level([component], backend: :interpreter)
sim.poke('a', 42)
sim.evaluate
result = sim.peek('y')
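Because the simulator is driven only through `poke`, `evaluate`, and `peek`, a test-vector loop can be written once against that interface. This sketch assumes the JIT and compiler simulators respond to the same three methods as the interpreter (only the interpreter usage is shown on this page). `check_vectors` and `FakeIncrementerSim` are hypothetical names; the fake stands in for the object returned by `RHDL::Codegen.gate_level` so the example runs without RHDL.

```ruby
# Drives every [input, expected] pair through any object that exposes
# poke/evaluate/peek and returns the mismatches as [input, expected, actual].
def check_vectors(sim, vectors, input:, output:)
  vectors.each_with_object([]) do |(value, expected), failures|
    sim.poke(input, value)
    sim.evaluate
    actual = sim.peek(output)
    failures << [value, expected, actual] unless actual == expected
  end
end

# Hypothetical stand-in for a gate-level simulator: an 8-bit incrementer y = a + 1.
class FakeIncrementerSim
  def initialize
    @signals = Hash.new(0)
  end

  def poke(name, value)
    @signals[name] = value
  end

  def evaluate
    @signals['y'] = (@signals['a'] + 1) & 0xFF
  end

  def peek(name)
    @signals[name]
  end
end

vectors  = (0..255).map { |a| [a, (a + 1) % 256] }
failures = check_vectors(FakeIncrementerSim.new, vectors, input: 'a', output: 'y')
puts failures.empty? ? 'all 256 vectors pass' : "#{failures.size} failures"
```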

IR JIT

sim = RHDL::Codegen.gate_level([component], backend: :jit)

IR Compiler (AOT)

sim = RHDL::Codegen.gate_level([component], backend: :compiler)

Verilator

# Requires Verilator installed; testbench.cpp is a C++ driver you supply
rhdl export --lang verilog MyComponent
verilator --cc my_component.v --exe testbench.cpp
make -C obj_dir -f Vmy_component.mk Vmy_component

CIRCT/MLIR

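# Requires firtool and arcilator (both part of CIRCT)
rhdl export --lang firrtl MyComponent
firtool my_component.fir --ir-hw -o my_component.mlir   # lower FIRRTL to HW-dialect MLIR
arcilator my_component.mlir -o my_component.ll          # emits LLVM IR; compile and link it with a C++ driver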

Profiling Tips

Ruby Profiling

require 'benchmark'
 
time = Benchmark.measure do
  1000.times do
    component.propagate
  end
end
puts "1000 propagations: #{time.real}s"
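Dividing an iteration count by elapsed wall-clock time turns a measurement like the one above into a rate that can be compared with the cycles/s figures in the performance summary. `throughput` is a hypothetical helper built on `Benchmark.realtime` from Ruby's standard library, and the arithmetic in the block is only a placeholder for `component.propagate` or `sim.evaluate`.

```ruby
require 'benchmark'

# Runs the given block `iterations` times and returns calls per second,
# based on wall-clock (real) time.
def throughput(iterations)
  elapsed = Benchmark.realtime do
    iterations.times { yield }
  end
  iterations / elapsed
end

# Placeholder workload standing in for one simulated cycle.
state = 0
rate = throughput(100_000) { state = (state * 31 + 7) & 0xFFFF }
puts format('%.0f calls/s', rate)
```

If each call advances the design by one clock cycle, the printed rate is directly comparable to the cycles/s column of the backend table.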

Gate Count as Complexity Metric

rhdl gates --stats

Gate-level simulation time grows roughly linearly with gate count, so a component with 400 gates will simulate roughly 8x slower than one with 50 gates (400 / 50 = 8).
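The linear rule of thumb gives a quick way to predict throughput for a larger design from a measurement on a smaller one. This sketch assumes the linear relationship holds; `predicted_rate` is a hypothetical helper, and 60K cycles/s on a 50-gate design is a made-up starting point rather than a measurement.

```ruby
# Scales a measured gate-level throughput by the ratio of gate counts,
# assuming simulation time grows linearly with the number of gates.
def predicted_rate(measured_rate, measured_gates, target_gates)
  measured_rate * measured_gates.to_f / target_gates
end

puts predicted_rate(60_000, 50, 400) # prints 7500.0
```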

SIMD Lane Count

For gate-level simulation, the SIMD lane count sets how many independent test vectors are simulated in parallel, which raises batch throughput:

RHDL_BENCH_LANES=64 rake bench[mos6502]

The default is 64 lanes, so the command above only makes the default explicit. Going beyond 64 lanes requires wider SIMD operations.
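One common way to get many test vectors per evaluation is bit-slicing: each signal is stored as a 64-bit word whose bit i belongs to test vector i, so a single bitwise operation evaluates a gate for all 64 vectors at once. Whether RHDL's lanes are implemented exactly this way is an assumption here; the sketch only illustrates the idea with Ruby integers, evaluating a full adder across 64 random vectors and checking every lane.

```ruby
LANES = 64
MASK  = (1 << LANES) - 1

# Packs one bit per lane: bit i of the result holds lane i's value.
def pack(bits)
  bits.each_with_index.sum { |bit, lane| bit << lane }
end

# Full adder evaluated for all lanes at once with a few bitwise operations.
def full_adder(a, b, cin)
  sum   = a ^ b ^ cin
  carry = (a & b) | (cin & (a ^ b))
  [sum & MASK, carry & MASK]
end

rng = Random.new(42)
a, b, cin = Array.new(3) { Array.new(LANES) { rng.rand(2) } }
sum, carry = full_adder(pack(a), pack(b), pack(cin))

# Unpack each lane (Integer#[] reads one bit) and compare with ordinary addition.
LANES.times do |lane|
  total = a[lane] + b[lane] + cin[lane]
  raise "lane #{lane} mismatch" unless sum[lane] == total % 2 && carry[lane] == total / 2
end
puts "#{LANES} full-adder vectors evaluated in one pass"
```

The cost of the evaluation does not depend on how many of the 64 bits carry useful vectors, which is why filling every lane is nearly free until the word width is exceeded.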

Cycle Count

Control benchmark duration:

RHDL_BENCH_CYCLES=1000000 rake bench[mos6502]

Optimization Strategies

  1. Start with behavioral: get correctness first
  2. Switch to JIT for CI: fast enough for test suites, and it catches gate-level bugs
  3. Use the AOT compiler for regression: the highest throughput of the IR backends for long test runs
  4. Profile hot components: gate count reveals complexity bottlenecks
  5. Parallelize with SIMD lanes: up to 64 test vectors per evaluation at little extra cost

Next Steps