PS/2 keyboard interface in CλaSH
8 September 2018 (programming haskell fpga electronics retrochallenge retro clash chip-8)

This week, most of my weekday evenings were quite busy, but I did manage to squeeze in a PS/2 keyboard interface in small installments; then today I went down the rabbit hole of clearing up some technical debt I've accumulated so far by not really looking into how CλaSH handled clock domains.
PS/2 signals
(Just to preempt any possible confusion, we're talking about the peripheral port of the IBM Personal System/2 introduced in the late '80s, not the Playstation 2 console)
The same way VGA is ideal for hobbyist video signal generation since it is both simple and ubiquitous, PS/2 is the go-to interface for keyboards. It is a bidirectional, synchronous serial protocol with a peripheral-generated clock in the 10-20 kHz range. Any PC keyboard old enough will support it. One important caveat, though, is that the common USB-to-PS/2 adapters don't actually convert signals, and so they only work with keyboards that were originally designed with that conversion in mind. Here, we are only concerned with device-to-host communication; it is also possible to communicate in the other direction, e.g. to change the Num Lock LED's state.
"Synchronous" here means that there is a separate clock line, unlike in the family of asynchronous serial protocols used over RS-232; it is the latter that is usually meant by "serial communication" when unqualified. In synchronous serial communication, everything happens on the clock ticks; in asynchronous communication, there is no separate clock signal, so the data signal itself has to carry enough structure for the two communicating sides to agree on the exact framing.
Turning the data line of PS/2 into a stream of bits is a straightforward process: the standard prescribes sampling the data line on the falling edge of the clock line. We also apply an 8-cycle debouncer for good measure, just because some pages on the Internet suggest it:
data PS2 dom = PS2
    { ps2Clk :: Signal dom Bit
    , ps2Data :: Signal dom Bit
    }

samplePS2
    :: (HiddenClockReset dom gated synchronous)
    => PS2 dom -> Signal dom (Maybe Bit)
samplePS2 PS2{..} = enable <$> isFalling low ps2Clk' <*> ps2Data'
  where
    ps2Clk' = debounce d3 low ps2Clk
    ps2Data' = debounce d3 low ps2Data
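To make the falling-edge sampling rule concrete outside of CλaSH, here is a minimal sketch over plain Haskell lists instead of signals. The names `Sample`, `sampleOnFalling` and `sampledBits` are made up for this illustration and are not part of the actual design; it uses `Bool` instead of `Bit`, assumes the clock line idles high before the first sample, and leaves out debouncing entirely.

```haskell
import Data.Maybe (catMaybes)

-- | One sample per cycle: (clock line, data line), True meaning high.
-- (Hypothetical list-based model, not the CλaSH signal types.)
type Sample = (Bool, Bool)

-- | Emit the data bit on every cycle where the clock goes from high to low.
-- The clock line is assumed to be high before the first sample.
sampleOnFalling :: [Sample] -> [Maybe Bool]
sampleOnFalling samples = zipWith step (True : map fst samples) samples
  where
    step prevClk (clk, dat)
        | prevClk && not clk = Just dat
        | otherwise = Nothing

-- | Only the bits that were actually sampled, in arrival order.
sampledBits :: [Sample] -> [Bool]
sampledBits = catMaybes . sampleOnFalling
```

The per-cycle `Maybe` result has the same shape as the output of `samplePS2`: most cycles carry `Nothing`, and a bit shows up only on the cycle where the clock falls.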
The second step in that pipeline is to shift in the bits, 11 at a time. A leading low bit signals the start of a packet; eight data bits and one parity bit follow; the packet is finished with one high bit. Of course, only the eight data bits are presented externally. I use a WriterT (Last Word8) (State PS2State) monad to implement this logic, and then turn that into a CλaSH Mealy machine, in a pattern that I plan to use a lot in implementing the CHIP-8 CPU later:
data PS2State
    = Idle
    | Bit Word8 (Index 8)
    | Parity Word8
    | Stop (Maybe Word8)

decodePS2
    :: (HiddenClockReset dom gated synchronous)
    => Signal dom (Maybe Bit) -> Signal dom (Maybe Word8)
decodePS2 = flip mealyState Idle $ \bit -> fmap getLast . execWriterT . forM_ bit $ \bit -> do
    state <- get
    case state of
        Idle -> do
            when (bit == low) $ put $ Bit 0 0
        Bit x i -> do
            let x' = shiftInLeft bit x
            put $ maybe (Parity x') (Bit x') $ succIdx i
        Parity x -> do
            let checked = bit /= parity x
            put $ Stop $ enable checked x
        Stop x -> do
            when (bit == high) $ tell $ Last x
            put Idle
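The 11-bit framing can also be checked in isolation with a small pure function. The following is only a sketch under my own naming (`decodeFrame` and `encodeFrame` are hypothetical helpers, not part of the CλaSH code above): it takes a whole frame as a list of `Bool`s in arrival order, relies on PS/2 sending the data bits least significant first, and uses odd parity, meaning that the eight data bits plus the parity bit contain an odd number of ones.

```haskell
import Data.Bits (popCount, setBit, testBit)
import Data.Word (Word8)

-- | Decode one 11-bit frame given in arrival order: start bit (low),
-- eight data bits (least significant first), odd parity bit, stop bit (high).
-- True stands for a high line. Anything malformed yields Nothing.
decodeFrame :: [Bool] -> Maybe Word8
decodeFrame [start, d0, d1, d2, d3, d4, d5, d6, d7, par, stop]
    | not start && stop && parityOk = Just byte
  where
    byte = foldl setIf 0 (zip [0 ..] [d0, d1, d2, d3, d4, d5, d6, d7])
    setIf acc (i, b) = if b then setBit acc i else acc
    -- Odd parity: data bits and parity bit together hold an odd number of ones.
    parityOk = odd (popCount byte + fromEnum par)
decodeFrame _ = Nothing

-- | Build the frame a keyboard would send for a byte (handy for testing).
encodeFrame :: Word8 -> [Bool]
encodeFrame byte = [False] ++ dataBits ++ [parityBit, True]
  where
    dataBits = [testBit byte i | i <- [0 .. 7]]
    parityBit = even (popCount byte)
```

Unlike the Mealy machine, this version sees the whole frame at once, so it is only useful as a reference model: `decodeFrame (encodeFrame b)` should give back `Just b` for every byte `b`, and flipping any single bit of a frame should give `Nothing`.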
A quick change in hardware
To be able to try out on real hardware what I had at this point, I had to leave the trusty LogicStart Mega-Wing of my Papilio Pro, and instead switch over to the Arcade since that one has a PS/2 port. There are actually two ports on it, so that one could connect e.g. a keyboard and a mouse.
This change involved rewriting my UCF file since the pinout is different from the LogicStart. Also, the Arcade has 4+4+4 bits of VGA color output instead of the previous 3+3+2; of course with the black & white graphics of the CHIP-8, that color depth is all going to waste with this project.
PS/2 scan codes
Unfortunately, it is not enough to shift the PS/2 data into a byte: we also have to make sense of that byte. While this could be as straightforward as interpreting each byte as the ASCII code of the character on the key pressed, the reality is not this simple. Keyboards emit so-called scan codes, where one or several bytes can encode a single keypress or key release event (lists of keyboard scan codes are easy to find online). I haven't been able to come up with an elegant way of handling this yet, so for now I just have some messy Mealy machine that returns a 16-bit code, where the high byte is zero for one-byte codes. You can see in the comment my frustration at both the implementation and the spec itself:
data KeyEvent = KeyPress | KeyRelease
    deriving (Generic, NFData, Eq, Show)

data ScanCode = ScanCode KeyEvent Word16
    deriving (Generic, NFData, Eq, Show)

data ScanState
    = Init
    | Extended Word8
    | Code KeyEvent Word8

-- TODO: rewrite this for clarity.
-- All it does is it parses 0xE0 0xXX into an extended (16-bit) code, and everything else into
-- an 8-bit code. The only complication is that the key release marker 0xF0 is always the
-- second-to-last byte. Who does that?!?
parseScanCode
    :: (HiddenClockReset dom gated synchronous)
    => Signal dom (Maybe Word8) -> Signal dom (Maybe ScanCode)
parseScanCode = flip mealyState Init $ \raw -> fmap getLast . execWriterT . forM_ raw $ \raw -> do
    let finish ev ext = do
            tell $ Last . Just $ ScanCode ev $ fromBytes (ext, raw)
            put Init
    state <- get
    case state of
        Init | raw == 0xe0 -> put $ Extended raw
             | raw == 0xf0 -> put $ Code KeyRelease 0x00
             | otherwise -> finish KeyPress 0x00
        Extended ext | raw == 0xf0 -> put $ Code KeyRelease ext
                     | otherwise -> finish KeyPress ext
        Code ev ext -> finish ev ext
  where
    fromBytes :: (Word8, Word8) -> Word16
    fromBytes = unpack . pack
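To see what the parser does with actual byte sequences, the same state transitions can be written as a plain function and folded over a list of bytes. This is a sketch that mirrors the state machine above without any CλaSH or monad transformer machinery; `step` and `parseAll` are names invented for this illustration, and the two bytes are combined with shifts instead of `unpack . pack`.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.List (mapAccumL)
import Data.Maybe (catMaybes)
import Data.Word (Word16, Word8)

data KeyEvent = KeyPress | KeyRelease
    deriving (Eq, Show)

data ScanCode = ScanCode KeyEvent Word16
    deriving (Eq, Show)

data ScanState = Init | Extended Word8 | Code KeyEvent Word8

-- | One transition: consume a byte, maybe emit a finished scan code.
step :: ScanState -> Word8 -> (ScanState, Maybe ScanCode)
step state raw = case state of
    Init
        | raw == 0xe0 -> (Extended raw, Nothing)
        | raw == 0xf0 -> (Code KeyRelease 0x00, Nothing)
        | otherwise -> finish KeyPress 0x00
    Extended ext
        | raw == 0xf0 -> (Code KeyRelease ext, Nothing)
        | otherwise -> finish KeyPress ext
    Code ev ext -> finish ev ext
  where
    -- The current byte is always the low byte of the finished code.
    finish ev ext = (Init, Just (ScanCode ev (fromBytes ext raw)))
    fromBytes hi lo = (fromIntegral hi `shiftL` 8) .|. fromIntegral lo

-- | Run the state machine over a whole byte stream.
parseAll :: [Word8] -> [ScanCode]
parseAll = catMaybes . snd . mapAccumL step Init
```

With this model, the byte sequence `[0xe0, 0xf0, 0x75]` comes out as a single `ScanCode KeyRelease 0xe075`, which shows the release marker sitting in second-to-last position that the comment complains about.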
Driving a CHIP-8 pixel around
With the video output from last time and the keyboard from this post, but no CPU yet, our options to put everything together into something impressive are somewhat limited. I ended up showing a single CHIP-8 pixel that can be moved around in the CHIP-8 screen space with the arrow keys; this results in something tangible without needing a CPU or even a framebuffer yet. Note how well the code lends itself to using applicative do syntax:
VGADriver{..} = vgaDriver vga640x480at60
ps2 = decodePS2 $ samplePS2 PS2{..}

(dx, dy) = unbundle $ do
    key <- parseScanCode ps2
    pure $ case key of
        Just (ScanCode KeyPress 0xe075) -> (0, -1) -- up
        Just (ScanCode KeyPress 0xe072) -> (0, 1)  -- down
        Just (ScanCode KeyPress 0xe06b) -> (-1, 0) -- left
        Just (ScanCode KeyPress 0xe074) -> (1, 0)  -- right
        _ -> (0, 0)

pixel = do
    x <- fix $ register 0 . (+ dx)
    y <- fix $ register 0 . (+ dy)
    x0 <- (chipX =<<) <$> vgaX
    y0 <- (chipY =<<) <$> vgaY
    pure $ case (,) <$> x0 <*> y0 of
        Just (x0, y0) -> (x0, y0) == (x, y)
        _ -> False
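The `fix $ register 0 . (+ dx)` idiom may look odd at first: it ties the register's output back into its own input, so the register holds a running sum of the deltas. Here is a rough list-based model of that feedback loop; `registerL` and `accumulate` are hypothetical stand-ins written for this sketch, not CλaSH functions, and the wrap-around of the real coordinate types is not modelled.

```haskell
import Data.Function (fix)

-- | List model of a register: the initial value comes out first,
-- then the input delayed by one cycle. (Hypothetical stand-in.)
registerL :: a -> [a] -> [a]
registerL initial input = initial : input

-- | Same shape as `fix $ register 0 . (+ dx)`, but over lists:
-- the output is fed back and added to the per-cycle delta.
accumulate :: [Int] -> [Int]
accumulate deltas = fix (registerL 0 . zipWith (+) deltas)
```

On lists this is just `scanl (+) 0`, and laziness is what makes the self-reference work: the first element of the output is available before any of the input is inspected, exactly as a register's initial value is available before its first clock tick.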
But wait! There's more!
In reality, after getting the PS/2 decoder working, but before hooking it up to the scan code parser, I thought I'd use the serial IO on the Papilio Pro to do a quick test by just transmitting the scan codes straight away as they come out of the decoder. This then prompted me to clean up a wart on my UART implementation: its functions took the clock rate as an extra term-level argument to compute the clock division they need to do:
tx
    :: (HiddenClockReset domain gated synchronous)
    => Word32 -> Word32 -> Signal domain (Maybe Word8) -> TXOut domain
tx clkRate serialRate inp = TXOut{..}
  where
    (txReady, txOut) = unbundle $ mealyState (tx0 $ clkRate `div` serialRate) (0, Nothing) inp
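The clock division itself is simple integer arithmetic, and it is easy to check how much error the rounding introduces. The sketch below uses made-up helper names (`cyclesPerBit`, `actualRate`); any concrete clock and serial rates you plug in are purely illustrative, since the post does not say which ones the UART actually runs at.

```haskell
import Data.Word (Word32)

-- | Clock cycles per serial bit, rounded down, i.e. the same
-- computation as clkRate `div` serialRate in the code above.
cyclesPerBit :: Word32 -> Word32 -> Word32
cyclesPerBit clkRate serialRate = clkRate `div` serialRate

-- | Serial rate actually produced once the divider is a whole number of cycles.
actualRate :: Word32 -> Word32 -> Double
actualRate clkRate serialRate =
    fromIntegral clkRate / fromIntegral (cyclesPerBit clkRate serialRate)
```

For example, with an assumed 25.175 MHz clock and a 115200 baud target, `cyclesPerBit 25175000 115200` is 218, and the resulting serial rate is off from the target by well under one percent.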
This bothered me because the clock domain already specifies the clock rate, at the type level. Trying to remove this redundancy has led me down a rabbit hole of what I believe is a CλaSH bug; but at least I managed to work around that for now (until I find an even better way).
This, in turn, forced me to use a proper clock domain with the correct clock period setting in my CHIP-8 design:
-- | 25.175 MHz clock, needed for the VGA mode we use.
-- CLaSH requires the clock period to be specified in picoseconds.
type Dom25 = Dom "CLK_25MHZ" (1000000000000 `Div` 25175000)
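The type-level division can be tried out in plain GHC, without CλaSH, to see which period it actually produces. This is a small sketch using `Div` and `natVal` from `GHC.TypeLits`; the `PeriodPs` synonym and `period25` are names introduced only for this example.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (Div, natVal)

-- | Clock period in picoseconds for a rate given in Hz, rounded down.
-- (Hypothetical helper synonym, not part of the post's code.)
type PeriodPs rate = 1000000000000 `Div` rate

-- | The period of the 25.175 MHz pixel clock, read back at the term level.
period25 :: Integer
period25 = natVal (Proxy :: Proxy (PeriodPs 25175000))
```

Since `Div` rounds down, `period25` evaluates to 39721, so the period stored in the domain is a whole number of picoseconds rather than the exact reciprocal of 25.175 MHz.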
But then, this allowed me to start putting pixel clock specifications into the type of VGATimings, allowing me to statically enforce that the clock domain in which the VGA signal generator runs is at exactly the right frequency:
vgaDriver
    :: (HiddenClockReset dom gated synchronous, KnownNat w, KnownNat h)
    => (dom ~ Dom s ps, ps ~ (1000000000000 `Div` rate))
    => VGATimings rate w h
    -> VGADriver dom w h

-- | VGA 640*480@60Hz, 25.175 MHz pixel clock
vga640x480at60 :: VGATimings 25175000 10 10
vga640x480at60 = VGATimings
    { vgaHorizTiming = VGATiming 640 16 96 48
    , vgaVertTiming = VGATiming 480 11 2 31
    }