This tutorial walks through building a simple 8-bit CPU in RHDL — a great starting point before tackling more complex designs like the MOS 6502 or RISC-V.

Architecture

The CPU features:

  • 8-bit data bus and 16-bit address space (64 KB of addressable memory)
  • Accumulator register (A)
  • ALU with arithmetic and logic operations
  • Stack pointer initialized to $FF
  • Single-cycle execution model

Instruction Set

Single-Byte Instructions

Opcode  Mnemonic  Description
$00     NOP       No operation
$01     HLT       Halt CPU
$02     PHA       Push A to stack
$03     PLA       Pull A from stack
$04     NOT       Bitwise NOT of A
$05     NEG       Negate A (two’s complement)
$06     INC       Increment A
$07     DEC       Decrement A

Two-Byte Instructions

Opcode  Mnemonic  Description
$10     LDI val   Load immediate into A
$11     ADD val   Add immediate to A
$12     SUB val   Subtract immediate from A
$13     AND val   Bitwise AND with immediate
$14     OR val    Bitwise OR with immediate
$15     XOR val   Bitwise XOR with immediate
$16     MUL val   Multiply A by immediate
$17     CMP val   Compare A with immediate

Three-Byte Instructions (16-bit address)

Opcode  Mnemonic   Description
$20     JMP addr   Unconditional jump
$21     JZ addr    Jump if zero flag set
$22     JNZ addr   Jump if zero flag clear
$23     JC addr    Jump if carry set
$24     CALL addr  Call subroutine
$25     RET        Return from subroutine
$26     LDA addr   Load from memory address
$27     STA addr   Store to memory address

Building the CPU

Step 1: Define the ALU

class SimpleALU < RHDL::Sim::Component
  input :a, width: 8
  input :b, width: 8
  input :op, width: 4
  output :result, width: 8
  output :zero, :carry
 
  OP_ADD = 0; OP_SUB = 1; OP_AND = 2; OP_OR = 3
  OP_XOR = 4; OP_NOT = 5; OP_INC = 6; OP_DEC = 7
 
  behavior do
    # 9-bit intermediates keep the carry (ADD) or borrow (SUB) in bit 8
    add_full = local(:add_full, a + b, width: 9)
    sub_full = local(:sub_full, a - b, width: 9)
 
    result <= case_select(op, {
      OP_ADD => add_full[7..0],
      OP_SUB => sub_full[7..0],
      OP_AND => a & b,
      OP_OR  => a | b,
      OP_XOR => a ^ b,
      OP_NOT => ~a,
      OP_INC => (a + lit(1, width: 8))[7..0],
      OP_DEC => (a - lit(1, width: 8))[7..0]
    }, default: a)
 
    carry <= case_select(op, {
      OP_ADD => add_full[8],
      OP_SUB => sub_full[8]
    }, default: lit(0, width: 1))
 
    zero <= mux(result == lit(0, width: 8),
                lit(1, width: 1), lit(0, width: 1))
  end
end
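
When writing tests for the ALU it helps to have a software model to compare against. The following is a plain-Ruby sketch of the behavior described above, not RHDL code. The `AluModel` module is a hypothetical name, and the model assumes that subtraction wraps in two's complement so that bit 8 of the 9-bit difference acts as the borrow.

```ruby
# Plain-Ruby reference model of SimpleALU (hypothetical helper, not RHDL).
module AluModel
  OP_ADD = 0; OP_SUB = 1; OP_AND = 2; OP_OR = 3
  OP_XOR = 4; OP_NOT = 5; OP_INC = 6; OP_DEC = 7

  # Returns [result, zero, carry] for 8-bit inputs a and b.
  def self.call(op, a, b)
    full =
      case op
      when OP_ADD then a + b
      when OP_SUB then a - b
      when OP_AND then a & b
      when OP_OR  then a | b
      when OP_XOR then a ^ b
      when OP_NOT then ~a
      when OP_INC then a + 1
      when OP_DEC then a - 1
      else a                      # unknown ops pass A through
      end
    result = full & 0xFF          # keep the low 8 bits
    # Carry is only reported for ADD and SUB, matching the case_select default.
    carry = [OP_ADD, OP_SUB].include?(op) ? (full >> 8) & 1 : 0
    zero = result.zero? ? 1 : 0
    [result, zero, carry]
  end
end
```

A test can then compare the simulated outputs against `AluModel.call(op, a, b)` for a handful of operand pairs, including the wrap-around cases.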

Step 2: Build the Datapath

class SimpleCPU < RHDL::Sim::Component
  input :clk, :rst
  input :mem_data_in, width: 8
  output :mem_addr, width: 16
  output :mem_data_out, width: 8
  output :mem_write_en
  output :halted
 
  # Sub-components
  instance :alu, SimpleALU
  instance :acc, Register, width: 8         # Accumulator
  instance :pc, ProgramCounter, width: 16   # Program counter
  instance :sp, StackPointer                # Stack pointer
 
  # Clock distribution
  port :clk => [[:acc, :clk], [:pc, :clk], [:sp, :clk]]
  port :rst => [[:acc, :rst], [:pc, :rst], [:sp, :rst]]
 
  # ALU connections
  port [:acc, :q] => [:alu, :a]
  port [:alu, :result] => [:acc, :d]
end

Step 3: Implement the Decoder

The decoder interprets each opcode and generates control signals for the datapath — which register to load, what ALU operation to perform, and whether to read or write memory.
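
One way to picture the decoder is as a pure function from opcode to a bundle of control signals. The sketch below is plain Ruby rather than RHDL, and it covers only part of the instruction set (stack operations, NEG, MUL, CALL and RET are left at their defaults). The `Control` struct, its field names, and the choice to implement CMP as a subtraction that does not load A are all assumptions made for illustration.

```ruby
# Hypothetical control-signal bundle; every field name is an assumption.
Control = Struct.new(:alu_op, :load_acc, :mem_read, :mem_write, :jump, :halt,
                     keyword_init: true)

# Opcodes whose result is computed by the ALU and written back to A.
ALU_OPS = {
  0x04 => :not, 0x06 => :inc, 0x07 => :dec,
  0x11 => :add, 0x12 => :sub, 0x13 => :and, 0x14 => :or, 0x15 => :xor
}.freeze

def decode(opcode)
  c = Control.new(alu_op: nil, load_acc: false, mem_read: false,
                  mem_write: false, jump: nil, halt: false)
  case opcode
  when 0x01
    c.halt = true
  when 0x10                       # LDI: the immediate is loaded into A
    c.load_acc = true
  when *ALU_OPS.keys              # ALU result is written back to A
    c.alu_op = ALU_OPS[opcode]
    c.load_acc = true
  when 0x17                       # CMP (assumed): subtract for flags only
    c.alu_op = :sub
  when 0x20 then c.jump = :always
  when 0x21 then c.jump = :if_zero
  when 0x22 then c.jump = :if_not_zero
  when 0x23 then c.jump = :if_carry
  when 0x26                       # LDA: read memory, load A
    c.mem_read = true
    c.load_acc = true
  when 0x27                       # STA: write A to memory
    c.mem_write = true
  end
  c
end
```

Keeping the decoder free of state makes it easy to test exhaustively: every opcode from $00 to $FF can be fed in and the resulting signals inspected.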

Memory Layout

Range          Description
$0000-$00FF    Variables (zero page)
$0100-$07FF    General purpose
$0800-$0FFF    Display memory
$1000+         Program memory
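
The layout can be captured as a small lookup, which is useful when a test wants to assert that a store landed in display memory. This is a plain-Ruby sketch. The `MEMORY_MAP` constant and the `region_for` helper are hypothetical names, and the start address of each region is assumed so that the regions are contiguous from $0000.

```ruby
# End addresses come from the table; start addresses are assumed so that
# the regions are contiguous from $0000.
MEMORY_MAP = {
  0x0000..0x00FF => :zero_page,
  0x0100..0x07FF => :general,
  0x0800..0x0FFF => :display,
  0x1000..0xFFFF => :program
}.freeze

# Hypothetical helper: returns the region symbol for a 16-bit address.
def region_for(addr)
  unless (0..0xFFFF).cover?(addr)
    raise ArgumentError, format('address out of 16-bit range: %d', addr)
  end
  MEMORY_MAP.find { |range, _name| range.cover?(addr) }.last
end
```

For instance, `region_for(0x0800)` falls in the display region and `region_for(0x1000)` in program memory.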

Example Program

A simple counter that stores values to display memory:

cpu = SimpleCPU.new('cpu')
 
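# Addresses are encoded high byte first (STA $0800 is 0x27, 0x08, 0x00), and the
# program is loaded at $1000, so the loop target is the STA at $1002.
program = [
  0x10, 0x00,               # LDI #0      ; A = 0
  0x27, 0x08, 0x00,         # STA $0800   ; Store to display
  0x06,                     # INC         ; A++
  0x17, 0x10,               # CMP #16     ; Z = 1 when A == 16
  0x22, 0x10, 0x02,         # JNZ $1002   ; Loop while A != 16
  0x01                      # HLT         ; Stop
]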
 
cpu.load_program(program, 0x1000)
cpu.reset
cpu.run
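
To see what this program should do before running it on the simulated hardware, it can be traced with a few lines of plain Ruby. The sketch below is not RHDL and implements only the six opcodes the counter uses. The `run_counter` helper and `COUNTER_PROGRAM` constant are hypothetical names. It assumes the program starts at $1000, that addresses are stored high byte first, that the loop jumps back to the STA at $1002, and that CMP sets the zero flag only when A equals the operand.

```ruby
# Counter program with the loop target assumed to be the STA at $1002.
COUNTER_PROGRAM = [
  0x10, 0x00,        # LDI #0
  0x27, 0x08, 0x00,  # STA $0800
  0x06,              # INC
  0x17, 0x10,        # CMP #16
  0x22, 0x10, 0x02,  # JNZ $1002
  0x01               # HLT
].freeze

# Hypothetical software trace; returns the final A and every store performed.
def run_counter(program, base: 0x1000, max_steps: 1_000)
  mem = Hash.new(0)
  program.each_with_index { |byte, i| mem[base + i] = byte }
  a = 0
  zero = false
  pc = base
  stores = []

  max_steps.times do
    opcode = mem[pc]
    case opcode
    when 0x10                                   # LDI
      a = mem[pc + 1]
      pc += 2
    when 0x06                                   # INC
      a = (a + 1) & 0xFF
      pc += 1
    when 0x17                                   # CMP: flag only, A unchanged
      zero = (a == mem[pc + 1])
      pc += 2
    when 0x27                                   # STA, address high byte first
      addr = (mem[pc + 1] << 8) | mem[pc + 2]
      mem[addr] = a
      stores << [addr, a]
      pc += 3
    when 0x22                                   # JNZ: taken while Z is clear
      target = (mem[pc + 1] << 8) | mem[pc + 2]
      pc = zero ? pc + 3 : target
    when 0x01                                   # HLT
      return { a: a, stores: stores }
    else
      raise format('unsupported opcode $%02X', opcode)
    end
  end
  raise "step limit reached without HLT"
end
```

Under these assumptions, `run_counter(COUNTER_PROGRAM)` writes the values 0 through 15 to $0800 and halts with A equal to 16.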

Testing

RSpec.describe SimpleCPU do
  let(:cpu) { SimpleCPU.new('test') }
 
  def clock!
    cpu.set_input(:clk, 0); cpu.propagate
    cpu.set_input(:clk, 1); cpu.propagate
  end
 
  it "executes LDI instruction" do
    cpu.load_program([0x10, 0x42, 0x01])  # LDI #$42; HLT
    cpu.reset
    2.times { clock! }  # Fetch + execute LDI
    expect(cpu.accumulator).to eq(0x42)
  end
 
  it "generates valid Verilog" do
    verilog = SimpleCPU.to_verilog_hierarchy
    expect(verilog).to include('module simple_cpu')
    expect(verilog).to include('module simple_alu')
  end
end

Extending the Design

Once the basic CPU works, try adding:

  • More registers — X and Y index registers
  • More addressing modes — indirect, indexed
  • Interrupts — IRQ and NMI vectors
  • Wider data path — 16-bit operations

Next Steps