Cactus

Composable CPU descriptions in CλaSH, and wrap-up of RetroChallenge 2018/09

30 September 2018 (programming haskell fpga electronics retrochallenge retro clash chip-8)

My last post ended with some sample CλaSH code illustrating the spaghetti-ness you can get yourself into if you try to describe a CPU directly as a function (CPUState, CPUIn) -> (CPUState, CPUOut). I promised some ideas for improving that code.

To start off gently, first of all we can give names to some common internal operations to help readability. Here's the code from the previous post rewritten with a small kit of functions:

intensionalCPU (s0, CPUIn{..}) = case phase s of
    WaitKeyPress reg ->
        let s' = case cpuInKeyEvent of
                Just (True, key) -> goto Fetch1 . setReg reg key $ s
                _ -> s
            out = def{ cpuOutMemAddr = pc s' }
        in (s', out)                   
    Fetch1 ->
        let s' = goto Exec s{ opHi = cpuInMem, pc = succ $ pc s }
            out = def{ cpuOutMemAddr = pc s' }
        in (s', out)
  where
    s | cpuInVBlank = s0{ timer = fromMaybe 0 . predIdx $ timer s0 }
      | otherwise = s0

Using a State monad

To avoid the possibility of screwing up the threading of the internal state, we can use the State CPUState monad:

stateCPU :: CPUIn -> State CPUState CPUOut      
stateCPU CPUIn{..} = do
    when cpuInVBlank $
        modify $ \s -> s{ timer = fromMaybe 0 . predIdx $ timer s }
    gets phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
        return def{ cpuOutMemAddr = pc s' }
      Fetch1 -> do
        modify $ \s -> s{ opHi = cpuInMem, pc = succ $ pc s }
        goto Exec
        return def{ cpuOutMemAddr = pc s' }

At this point, the state-accessing actions either come from the kit, because they are useful in many parts of our CPU description (like setReg), or they read or change individual fields of the CPUState record. This second group can benefit from using lenses:

stateCPU CPUIn{..} = do
    when cpuInVBlank $
        timer %= fromMaybe 0 . predIdx
    use phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
        return def{ cpuOutMemAddr = pc s' }
      Fetch1 -> do
        opHi .= cpuInMem
        pc %= succ
        goto Exec
        return def{ cpuOutMemAddr = pc s' }

Problems with composability

While this code is much more readable than the original one, and allows describing the internal state changes piecewise, it still has a problem with composability. To illustrate this, let's add some more functionality by implementing some CPU instructions (simplified a bit from a real CHIP-8 which can only write to memory through a save-registers instruction and even that is addressed by a dedicated pointer register; and accessing the framebuffer is also done via multiple-byte blitting only).

stateCPU CPUIn{..} = do
    when cpuInVBlank $
        timer %= fromMaybe 0 . predIdx
    use phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Idle -> do
        goto Fetch1
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Fetch1 -> do
        opHi .= cpuInMem
        pc %= succ
        goto Exec
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Exec -> do
        opLo .= cpuInMem
        pc %= succ
        goto Fetch1
        decode >>= \case
          WaitKey reg -> do
            goto $ WaitKeyPress reg
            pc <- use pc    
            return def{ cpuOutMemAddr = pc }
          LoadImm reg imm -> do
            setReg reg imm
            pc <- use pc    
            return def{ cpuOutMemAddr = pc }
          WriteMem addr reg -> do
            val <- getReg reg
            goto Idle -- flush write to RAM before starting next fetch
            return def{ cpuOutMemAddr = addr, cpuOutMemWrite = Just val }           
          SetPixel regX regY -> do
            x <- getReg regX
            y <- getReg regY
            pc <- use pc
            return def{ cpuOutMemAddr = pc, cpuOutFBAddr = (x, y), cpuOutFBWrite = Just True }          
          -- And a lot more opcodes here

Since there are going to be a lot of opcodes to support, it could seem like a good idea to refactor the Exec state to be a separate function, leaving only the implementation of the other control phases in the main function:

stateCPU cpuIn@CPUIn{..} = do
    when cpuInVBlank $
        timer %= fromMaybe 0 . predIdx
    use phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Idle -> do
        goto Fetch1
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Fetch1 -> do
        opHi .= cpuInMem
        pc %= succ
        goto Exec
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Exec -> do
        opLo .= cpuInMem
        pc %= succ
        goto Fetch1
        decode >>= exec cpuIn
        
exec CPUIn{..} (WaitKey reg) = ... -- same as previous version
exec CPUIn{..} (LoadImm reg imm) = ... -- same as previous version
exec CPUIn{..} (WriteMem addr reg) = ...
exec CPUIn{..} (SetPixel regX regY) = ...

However, this becomes problematic if we want to set some parts of the CPUOut result in a generic way, outside exec. Suppose we implement sound support, which in CHIP-8 is done via another 60Hz timer:

stateCPU cpuIn@CPUIn{..} = do
    when cpuInVBlank $ do
        timer %= fromMaybe 0 . predIdx
        soundTimer %= fromMaybe 0 . predIdx
    soundOn <- uses soundTimer (/= 0)
    def <- return def{ cpuOutSound = soundOn }

This takes care of setting cpuOutSound in the branches for WaitKeyPress, Idle and Fetch1 (albeit in an ugly way, by shadowing def), but now this needs to be passed as an extra parameter to exec (or the code to set cpuOutSound needs to be duplicated).

Assembling the output piecewise

So what we want is a way to specify the resulting CPUOut such that

  1. Reasonable defaults (like reading the next instruction from RAM via cpuOutMemAddr = pc) are applied automatically, without requiring any code. Note that the defaults shouldn't need to be static; they can depend on the state.
  2. Various parts of the CPU implementation can contribute to the final CPUOut result that is visible from the outside.

This has led me to a design where the CPU is described by a type CPU i s o = RWS i (Endo o) s:

With this setup, we can rewrite our CPU implementation as:

cpu = do
    CPUIn{..} <- input
    when cpuInVBlank $
        timer %= fromMaybe 0 . predIdx
        soundTimer %= fromMaybe 0 . predIdx
    soundOn <- uses soundTimer (/= 0)
    when soundOn $ do
        tell $ \out -> out{ cpuOutSound = True }

    use phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
      Idle -> do
        goto Fetch1
      Fetch1 -> do
        opHi .= cpuInMem
        pc %= succ
        goto Exec
      Exec -> do
        opLo .= cpuInMem
        pc %= succ
        goto Fetch1
        decode >>= exec

exec (WaitKey reg) = do
    goto $ WaitKeyPress reg
exec (LoadImm reg imm) = do
    setReg reg imm
exec (WriteMem addr reg) = do
    val <- getReg reg
    writeMem addr val
    goto Idle -- flush write to RAM before starting next fetch
exec (SetPixel regX regY) = do
    x <- getReg regX
    y <- getReg regY
    writeFB (x, y) True

Most branches don't need to do anything to get the correct final CPUOut; and just like we were able to name the State kit of setReg &c, we can define


    

The system described above is implemented here for the reusable library definitions and here for CHIP-8 specifically; the version that uses lenses is in the branch linked above, since I am not 100% convinced yet that the cognitive overhead of the lens library is worth it once enough kit is developed around state access.

Future directions

There are two questions about the above that I feel that haven't been resolved yet:

RetroChallenge 2018/09 wrap-up

All in all, even though I couldn't get as far as I originally hoped, I am glad I at least managed to build a CHIP-8 computer in CλaSH that can boot up on real hardware and play original CHIP-8 games; that's pretty much what I originally set out to do. The RetroChallenge format of detailed blog posts concurrent with the actual development work being done is quite taxing on me, because I am slow to write these posts; overall, I have probably spent about half and half of my RetroChallenge time on coding/design vs. writing about it. But the fixed one-month timeframe and the great progress reports by the other participants have been great motivators; I would probably try something next time where I can expect less uknown unknowns.

There is still work to be done even on this CHIP-8 computer: I've found some games that don't work as expected when they try to use the built-in hexadecimal digit font; and there are of course the previously discussed performance problems inherent in addressing the frame buffer as single bits. Also, I would like to adapt it to use a real four-by-four keypad (by scanning the key rows from the FPGA) and a PCD8544 monochrome matrix LCD

And then, of course, further exploration of the design space of CPU descriptions discussed above; and then moving on to a "real" CPU, probably by rewriting my Lava 6502 core. Also, I will need to get a new FPGA dev board with a chip that is supported by a contemporary toolchain, so that I won't get bogged down by problems like the Spartan-6 synthesis bug.


« Integrating Verilator and Clash via Cabal 
All posts
 CPU modeling in CλaSH »