CPU modeling in CλaSH

23 September 2018 (programming haskell fpga electronics retrochallenge retro clash)

My entry for RetroChallenge 2018/09 is building a CHIP-8 computer. Previously, I've talked in detail about the video signal generator and the keyboard interface; the only part still missing is the CPU.

The CHIP-8 instruction set

Since the CHIP-8 was originally designed to be an interpreted language run as a virtual machine, some of its instructions are quite high-level. For example, the framebuffer is modified via a dedicated blitting instruction; there is a built-in random number generator; and instructions to manipulate two 60 Hz timers. Other instructions are more in line with what one would expect to see in a CPU, and implement basic arithmetic such as addition or bitwise AND. There is also a generic escape hatch instruction but that doesn't really apply to hardware implementations.

The CPU has 16 generic-purpose 8-bit registers V0VF; register VF is also used to report flag results like overflow from arithmetic operations, or collision during blitting. Most instructions operate on these general registers. Since the available memory is roughly 4K, these 8-bit registers wouldn't be too useful as pointers. Instead, there is a 12-bit Index register that is used as the implicit address argument to memory-accessing instructions.

For flow control, the program counter needs 12 bits as well; the CHIP-8 is a von Neumann machine. Furthermore, it has CALL / RET instructions backed by a call-only stack (there is no argument passing or local variables).

Modeling the CPU's internal state

We can collect all of the registers described above into a single Haskell datatype. I have also added two 8-bit registers for the high and low byte of the current instruction, but in retrospect it would be enough to just store the high byte, since the low byte is coming from RAM exactly when we need to dispatch on it anyway. The extra phase register is to distinguish between execution phases such as fetching the first byte of the next instruction, or for instructions that are implemented in multiple clock cycles, like clearing the frame buffer (more on that below).

type Addr = Unsigned 12
type Reg = Index 16

data CPUState = CPUState
    { opHi, opLo :: Word8
    , pc, ptr :: Addr
    , registers :: Vec 16 Word8
    , stack :: Vec 24 Addr
    , sp :: Index 24
    , phase :: Phase
    , timer :: Word8
    , randomState :: Unsigned 9

I implemented the random number generator as a 9-bit linear-feedback shift register, truncated to its lower 8 bits; this is because a maximal 8-bit LFSR wouldn't generate 0xFF.

lfsr :: Unsigned 9 -> Unsigned 9
lfsr s = (s `rotateR` 1) `xor` b4
    b = fromIntegral $ complement . lsb $ s
    b4 = b `shiftL` 4

Input and output "pins"

Similar to how a real chip has various pins to interface with other parts, our CPU description will also have multiple inputs and outputs. The input consists of the data lines read from main memory and the framebuffer; the events coming from the keypad, and the keypad state; and the 60 Hz VBlank signal from the video generator. This latter signal is used to implement the timer register's countdown. The keypad's signals are fed into the CPU both as events and statefully; I've decided to do it this way so that only the peripheral interface needs to be changed to accomodate devices that are naturally either parallel (like a keypad matrix scanner) or serial (like a computer keyboard on a PS/2 connector).

type Key = Index 16
type KeypadState = Vec 16 Bool

data CPUIn = CPUIn
    { cpuInMem :: Word8
    , cpuInFB :: Bit
    , cpuInKeys :: KeypadState
    , cpuInKeyEvent :: Maybe (Bool, Key)
    , cpuInVBlank :: Bool

The output is even less surprising: there's an address line and a data out (write) line for main memory and the video framebuffer.

type VidX = Unsigned 6
type VidY = Unsigned 5

data CPUOut = CPUOut
    { cpuOutMemAddr :: Addr
    , cpuOutMemWrite :: Maybe Word8
    , cpuOutFBAddr :: (VidX, VidY)
    , cpuOutFBWrite :: Maybe Bit

So, what is a CPU?

As far as CλaSH is concerned, the CPU is extensionally a circuit converting input signals to output signals, just like any other component:

extensionalCPU :: Signal dom CPUIn -> Signal dom CPUOut

The internal CPU state is of no concern at this level. Internally, we can implement the above as a Mealy machine with a state transition function that describes behaviour in any given single cycle:

intensionalCPU :: (CPUState, CPUIn) -> (CPUState, CPUOut)

extensionalCPU = mealy intenstionalCPU initialState

As far as a circuit is concerned, a clock cycle is a clock cycle is a clock cycle. If we want to do any kind of sequencing, for example to fetch two-byte instruction opcodes from the byte-indexed main memory in two steps, we need to know in intensionalCPU which step is next. This is why we have the phase field in CPUState, so we can read out what we need to do, and store what we want to do next. For example, in my current version the video framebuffer is bit-indexed (addressed by the 6-bit X and the 5-bit Y coordinate), and there is no DMA to take care of bulk writes; so to implement the instruction that clears the screen, we need to write low to all framebuffer addresses, one by one, from (0, 0) to (63, 31). This requires 2048 cycles, so we need to go through the Phase that clears (0, 0), to the one that clears (0, 1), all the way to (63, 31), before fetching the first byte of the next opcode to continue execution. Accordingly, one of the constructors of Phase stores the (x, y) coordinate of the next bit to clear, and we'll need to add some logic so that if phase = ClearFB (x, y), we emit (x, y) on the cpuOutFBAddr line and Just low on the cpuOutFBWrite line. Blitting proceeds similarly, with two sub-phases per phase: one to read the old value, and one to write back the new value (with the bitmap image xor'd to it)

data Phase
    = Init
    | Fetch1
    | Exec
    | StoreReg Reg
    | LoadReg Reg
    | ClearFB (VidX, VidY)
    | Draw DrawPhase (VidX, VidY) Nybble (Index 8)
    | WaitKeyPress Reg
    | WriteBCD Word8 (Index 3)

So how should we write intensionalCPU? We could do it in direct style, i.e. something like

If you think this is horrible and unreadable and unmaintainable, then yes! I agree! Which is why I've spent most of this RetroChallenge (when not fighting synthesizer crashes) thinking about nicer ways of writing this.

This post is getting long, let's end on this note here. Next time, I am going to explain how far I've gotten so far in this quest for nicely readable, composable descriptions of CPUs.

« Composable CPU descriptions in CλaSH, and wrap-up of RetroChallenge 2018/09 
All posts
 Back in the game! »