Cactus

PS/2 keyboard interface in CλaSH

8 September 2018 (programming haskell fpga electronics retrochallenge retro clash chip-8)

This week, most of my weekday evenings were quite busy, but I did manage to squeeze in a PS/2 keyboard interface in small installments; then today I went down the rabbit hole of clearing up some technical debt I've accumulated so far by not really looking into how CλaSH handled clock domains.

PS/2 signals

(Just to preempt any possible confusion, we're talking about the peripheral port of the IBM Personal System/2 introduced in the late '80s, not the Playstation 2 console)

The same way VGA is ideal for hobbyist video signal generation since it is both simple and ubiquitous, PS/2 is the go-to interface for keyboards. It is a two-directional, synchronous serial protocol with a peripheral-generated clock in the 10-20 KHz range. Any PC keyboard old enough will support it. One important caveat, though, is that the common USB-to-PS/2 adapters don't actually convert signals, and so they only work with keyboards that were originally designed with that conversion in mind. Here, we are only concerned with device to host communication; it is also possible to communicate in the other direction to e.g. change the Num Lock LED's state.

"Synchronous" here means that there is a separate clock line, unlike in the family of asynchronous serial protocols that were used in RS-232; it is this latter one that is usually meant as "serial communication" when unqualified. In synchronous serial communication, everything happens on the clock ticks; in asynchronous communication, there is no separate clock signal, so the data signal has enough structure that the two communicating sides can agree on the exact framing.

Turning the data line of PS/2 into a stream of bits is a straightforward process: the standard prescribes sampling the data line on the falling edge of the clock line. We also apply an 8-cycle debouncer for good measure, just because some pages on the Internet suggest it:

data PS2 dom = PS2
    { ps2Clk :: Signal dom Bit
    , ps2Data :: Signal dom Bit
    }

samplePS2
    :: (HiddenClockReset dom gated synchronous)
    => PS2 dom -> Signal dom (Maybe Bit)
samplePS2 PS2{..} = enable <$> isFalling low ps2Clk' <*> ps2Data'
  where
    ps2Clk' = debounce d3 low ps2Clk
    ps2Data' = debounce d3 low ps2Data

The second step in that pipeline is to shift in the bits, 11 at a time. A leading low bit signals the start of a packet; eight data bits and one parity bit follow; the packet is finished with one high bit. Of course, only the eight data bits are presented externally. I use a WriterT (Last Word8) (State PS2State) monad to implement this logic, and then turn that into a CλaSH Mealy machine, in a pattern that I plan to use a lot in implementing the CHIP-8 CPU later:

data PS2State
    = Idle
    | Bit Word8 (Index 8)
    | Parity Word8
    | Stop (Maybe Word8)

decodePS2
    :: (HiddenClockReset dom gated synchronous)
    => Signal dom (Maybe Bit) -> Signal dom (Maybe Word8)
decodePS2 = flip mealyState Idle $ \bit -> fmap getLast . execWriterT . forM_ bit $ \bit -> do
    state <- get
    case state of
        Idle -> do
            when (bit == low) $ put $ Bit 0 0
        Bit x i -> do
            let x' = shiftInLeft bit x
            put $ maybe (Parity x') (Bit x') $ succIdx i
        Parity x -> do
            let checked = bit /= parity x
            put $ Stop $ enable checked x
        Stop x -> do
            when (bit == high) $ tell $ Last x
            put Idle

A quick change in hardware

To be able to try out on real hardware what I had at this point, I had to leave the trusty LogicStart Mega-Wing of my Papilio Pro, and instead switch over to the Arcade since that one has a PS/2 port. There are actually two ports on it, so that one could connect e.g. a keyboard and a mouse.

This change involved rewriting my UCF file since the pinout is different from the LogicStart. Also, the Arcade has 4+4+4 bits of VGA color output instead of the previous 3+3+2; of course with the black & white graphics of the CHIP-8, that color depth is all going to waste with this project.

PS/2 scan codes

Unfortunately, it is not enough to shift in the PS/2 data into a byte: we also have to make sense of that byte. While this could be as straightforward as interpreting each byte as the ASCII code of the character on the key pressed, the reality is not this simple. Keyboards emit so-called scan codes, where one or several bytes can encode a single keypress or key release event (see here for example for a list of some keyboard scan codes). I haven't been able to come up with an elegant way of handling this yet, so for now I just have some messy Mealy machine that returns a 16-bit code, where the high byte is zero for one-byte codes. You can see in the comment my frustration at both the implementation and the spec itself:

data KeyEvent = KeyPress | KeyRelease
    deriving (Generic, NFData, Eq, Show)

data ScanCode = ScanCode KeyEvent Word16
    deriving (Generic, NFData, Eq, Show)

data ScanState
    = Init
    | Extended Word8
    | Code KeyEvent Word8

-- TODO: rewrite this for clarity.
-- All it does is it parses 0xE0 0xXX into an extended (16-bit) code, and everything else into
-- an 8-bit code. The only complication is that the key release marker 0xF0 is always the
-- second-to-last byte. Who does that?!?
parseScanCode
    :: (HiddenClockReset dom gated synchronous)
    => Signal dom (Maybe Word8) -> Signal dom (Maybe ScanCode)
parseScanCode = flip mealyState Init $ \raw -> fmap getLast . execWriterT . forM_ raw $ \raw -> do
    let finish ev ext = do
            tell $ Last . Just $ ScanCode ev $ fromBytes (ext, raw)
            put Init
    state <- get
    case state of
        Init | raw == 0xe0 -> put $ Extended raw
             | raw == 0xf0 -> put $ Code KeyRelease 0x00
             | otherwise -> finish KeyPress 0x00
        Extended ext | raw == 0xf0 -> put $ Code KeyRelease ext
                     | otherwise -> finish KeyPress ext
        Code ev ext -> finish ev ext
  where
    fromBytes :: (Word8, Word8) -> Word16
    fromBytes = unpack . pack

Driving a CHIP-8 pixel around

With the video output from last time and the keyboard from this post, but no CPU yet, our options to put everything together into something impressive are somewhat limited. I ended up showing a single CHIP-8 pixel that can be moved around in the CHIP-8 screen space with the arrow keys; this results in something tangible without needing a CPU or even a framebuffer yet. Note how well the code lends itself to using applicative do syntax:

VGADriver{..} = vgaDriver vga640x480at60
ps2 = decodePS2 $ samplePS2 PS2{..}

(dx, dy) = unbundle $ do
    key <- parseScanCode ps2
    pure $ case key of
        Just (ScanCode KeyPress 0xe075) -> (0, -1) -- up
        Just (ScanCode KeyPress 0xe072) -> (0, 1)  -- down
        Just (ScanCode KeyPress 0xe06b) -> (-1, 0) -- left
        Just (ScanCode KeyPress 0xe074) -> (1, 0)  -- right
        _ -> (0, 0)

pixel = do
    x <- fix $ register 0 . (+ dx)
    y <- fix $ register 0 . (+ dy)
    x0 <- (chipX =<<) <$> vgaX
    y0 <- (chipY =<<) <$> vgaY
    pure $ case (,) <$> x0 <*> y0 of
        Just (x0, y0) -> (x0, y0) == (x, y)
        _ -> False

But wait! There's more!

In reality, after getting the PS/2 decoder working, but before hooking it up to the scan code parser, I thought I'd use the serial IO on the Papilio Pro to do a quick test by just transmitting the scan codes straight away as they come out of the decoder. This has then prompted me to clean up a wart on my UART implementation: they took the clock rate as an extra term-level argument to compute the clock division they need to do:

tx
    :: (HiddenClockReset domain gated synchronous)
    => Word32
    -> Word32
    -> Signal domain (Maybe Word8)
    -> TXOut domain
tx clkRate serialRate inp = TXOut{..}
  where
    (txReady, txOut) = unbundle $ mealyState (tx0 $ clkRate `div` serialRate) (0, Nothing) inp

This bothered me because the clock domain already specifies the clock rate, at the type level. Trying to remove this redundancy has led me down a rabbit hole of what I believe is a CλaSH bug; but at least I managed to work around that for now (until I find an even better way).

This, in turn, forced me to use a proper clock domain with the correct clock period setting in my CHIP-8 design:

-- | 25.175 MHz clock, needed for the VGA mode we use.
-- CLaSH requires the clock period to be specified in picoseconds.
type Dom25 = Dom "CLK_25MHZ" (1000000000000 `Div` 25175000)

But then, this allowed me to start putting pixel clock specifications into the type of VGATimings, allowing me to statically enforce that the clock domain in which the VGA signal generator runs is at exactly the right frequency:

vgaDriver
    :: (HiddenClockReset dom gated synchronous, KnownNat w, KnownNat h)
    => (dom ~ Dom s ps, ps ~ (1000000000000 `Div` rate))
    => VGATimings rate w h
    -> VGADriver dom w h
    
-- | VGA 640*480@60Hz, 25.175 MHz pixel clock
vga640x480at60 :: VGATimings 25175000 10 10
vga640x480at60 = VGATimings
    { vgaHorizTiming = VGATiming 640 16 96 48
    , vgaVertTiming  = VGATiming 480 11  2 31
    }

« Very high-level simulation of a CλaSH CPU 
All posts
 CλaSH video: VGA signal generator »