Blog tags RSS

Entries tagged haskell

Hacks of 2015

28 December 2015 (programming haskell idris games electronics avr FPGA meta)

Encsé writes in his blog that one of the reasons he created a tiny CPS-based Scheme interpreter was because he realized he hasn't done any fun side projects all year. So I thought I'd make an inventory of fun hacks I've done in 2015.




Functional programming

Stack Overflow highlights

Some of the answers I wrote this year on Stack Overflow required me to learn just enough of something new to be able to answer the question:

Then, there were those where it turned out there was a bug to be found by scratching the surface of the question deep enough:

Then there were the answers that were just too much fun to write:

All in all, it seems this has been quite a productive year for me out of the office, even if you exclude the Stack Overflow part. I was actually surprised how long this list was while compiling it for this blog post. Maybe I should write a list like this every year from now...

Initial version of my Commodore PET

2 March 2015 (programming haskell FPGA electronics retro)

In my quest to build more and more complicated computers on FPGAs armed with nothing but a crappy hobbist mindset and some hazy ideas of how Kansas Lava is supposed to work, I've reached another milestone: my first real computer.

That is, unlike the Brainfuck CPU that I designed myself, or the CHIP-8, which was originally a virtual machine spec (with all the implementation leeway that implies), this latest subject is a bona fide 8-bit home computer from the seventies: the Commodore PET.

A Commodore PET 2001 from 1977

The Commodore PET in a nutshell

The PET is a very simple machine compared to later Commodore models, which is why I thought it would make a good first step on a journey that I hope will one day culminate in implementing a Commodore 64. Its centerpiece is the MOS 6502 CPU (practically the same as the MOS 6510 used in the C=64), and there are only four other components: a text-only video signal generator and three IO interface chips (two PIA's and one VIA) for keyboard, Datasette and extension port communication. Just hooking up one of the PIAs is enough to get a minimal system working with keyboard input.

12 KBytes of PET ROM contain implementation of basic IO routines (the so-called "kernal"), the full-screen text editor, and Microsoft's BASIC interpreter. Then there's a separate character ROM (not addressable from the CPU) used by the video generator.

MOS 6502

The 6502 microprocessor was a staple of the eight-bit home computer era of the late seventies and eighties. By today's standards, it is incredible to imagine what it must have been like to design it manually, drawing the layout with pencils on paper. On the other hand, if it was designed in such a low-tech way, I figured it shouldn't be too difficult to build something compatible using modern tools even by a hobbist like myself. And of course there are already dozens of home-built 6502 implementations out there, to varying degrees of compatibility.

The ultimate reference on the 6502 must be the Visual 6502 Project which I deliberately avoided consulting. I don't really see the educational value in copying the original 6502 design; so instead, I went with a more black-box approach by just looking at the opcode descriptions and interrupt model and working from that.

The first milestone I aimed for was to get enough of the CPU working that I can run the sample programs on, which defines a tiny microcomputer-like architecture that doesn't have interrupts or any fancy video modes: you just have 32×32 pixels with a fixed 16-color palette mapped to main RAM from $0200, and a zero page-mapped register for keyboard input that you can do polling on. The Kansas Lava implementation is really simple and I plan to reuse it later if I do a similar project with the Z80.

My workflow was that I would use ca65 to assemble test programs, burn them into ROM, and run it in the Kansas Lava simulator for a couple thousand cycles; then render the video RAM into a GTK+ window. I would start with this program that does nothing but moves data around in memory (drawing the Commodore logo pixel by pixel), and basically I implemented the 6502 opcodes as I went along. After two days of work, I finally got it working:

First positive result

Seeing this was an incredible feeling. The input was valid 6502 machine code, and my very own CPU managed to run it correctly for the approximately 40,000 cycles that it took to draw this image. There was no stopping at this point: I already had a working VGA frame buffer implementation from the CHIP-8, so next day I synthesized it and run it on real hardware, my venerable Papilio Pro:

Hi-tech from the seventies, today!

Simulation-based testing

As I added more and more opcodes and started running more and more complicated programs, things very quickly stopped working. My CPU was full of bugs, and figuring out what went wrong by looking at the simulation logs after running it for tens of thousands of cycles was very tedious.

And so, it was at this point that I started adding unit tests. The framework for writing tests exposes a monad where the available effects are making observations on the state of the system (CPU registers and contents of the memory) and executing instructions. This presents an API that allows writing tests in an imperative way:

php = do
    flags <- observe statusFlags
    sp <- observe regSP
    execute0 0x08
    sp' <- observe regSP
    pushed < observe $ mem (stackAddr <$> sp)
    assertEq "Stack pointer is decremented" sp' (pred <$> sp)
    assertEq "Status is correctly pushed" pushed flags

A test like this is turned into a ROM image containing $08 at the address pointed to by the reset vector. The simulation is then run until the CPU enters the Fetch internal state for the second time (the first time is when it fetches the opcode under testing, i.e. the PHP ($08) instruction), and then the observations are evaluated by looking at the simulation output in the same cycles as the Fetch state. Of course, this means you shouldn't be able to write tests like the following:

impossiblyDynamicTest = do
    arg <- observe regX
    execute1 0x00 arg
    a' <- observe regA
    assertEq "A is updated" a' arg

This is ensured by observe returning values wrapped in an Obs type, and execute1 requiring unwrapped arguments:

observe :: Query a -> TestM (Obs a)
execute1 :: Byte -> Byte -> TestM ()
assertEq :: (Eq a, Show a) => String -> Obs a -> Obs a -> TestM ()

To allow assertions over derived values, Obs is an applicative functor (in fact, it is the free applicative functor over the co-Yoneda functor of the primitive observations).

I think this approach has merit as a general framework for hardware simulator-based unit testing and I intend to extend it and maybe even carve it out into a separate library in the future.

A Minimal Viable PET

Once I had a sufficiently working CPU, I started building the other pieces around it. I took the PET emulator from the VICE suite and commented out all the PIA and VIA code, replacing writes with nops and reads with hardcoded values, until I was able to boot it up with the stock ROM to get to the READY. prompt. Of course, since the PIA supplying the interrupt used for timing was removed by that point, I had no flashing cursor or keyboard input. All in all, the system got to a steady state in about 80,000 operations. (Since my implementation is not yet cyle-accurate, I had to switch to counting operations instead of cycles beyond this point. Every operation is at least as fast as on the real chip, so I hope by adding some wait cycles I'll be able to take care of this at some later point.)

After hooking up the same hardcoded values on the same addresses to the CPU, the next step was running the simulator and peeking at the video memory area ($8000..$8FFF on the PET), using the original fonts to render the screen. The initial version showed there might be someone home (sorry for crap quality on the screenshot):


By comparing detailed logs from running the emulator and the simulator, I was able to make observations like "the first 12,345 steps seem to be in agreement", which was a big boost to productivity, getting me, in short order, to this:

Almost there!

After fixing some more bugs in the arithmetic opcodes, I was finally rewarded by this sight:

Ah... much better.

Adding more components

While working on the CPU, I also started writing the character generator, on top of the VGA signal generator in the kansas-lava-papilio package that I originally made for the CHIP-8. This way, the VGA synchronization signals were abstracted away from me and I just had to take care of pumping out the actual pixels. This turned out to be tricker than I originally thought, since you have to time all read-aheads just right so that everything is at hand just in time for the next pixel. So before it finishes drawing the 8 pixels that make up a single row of a character, the next character index is loaded from RAM, and then the character ROM is consulted for the first row of the font image of the next indexed character. Initial versions were having some ghosting issues, or even more fun, full character transpositions (like showing the character from one line above in the first position of each line).

The Commodore PET diverts the vsync signal from the video generator to one of the PIA chips, which generates a CPU interrupt that can be acknowledged by reading from one of its memory-mapped registers. So the next obvious step was to implement this functionality to get the cursor blinking! This required more than just implementing a PIA, since I didn't even have interrupts in the CPU at that point.

But all that work was totally worth it:

And now, here we are

The current version supports keyboard input from PS/2 keyboards (but not all keys are mapped yet), so for the first time since I started working on this more than a month ago, it can be used to write and run BASIC programs!

What you can't see on the video below is that there's still a bug somewhere that causes the classic 10 PRINT "FOO": 20 GOTO 10 program to terminate with an out of memory error after some time.

Apart from fixing these bugs, the big outstanding feature is to add Datasette support so that programs can be loaded and saved to virtual "casettes". For a first version, I'll just burn some extra ROM onto the FPGA containing the tape images and hook that up to the PIA controlling the casette player; but I guess the proper way to do this would be to use something like an SD card reader to get proper persistent, read-writable storage. Or maybe, alternatively, have some kind of serial-over-USB communication with a computer acting as the Datasette unit.

Typed embedding of STLC into Haskell

5 February 2015 (programming haskell language correctness)

Someone posted to the Haskell subreddit this blogpost of Lennart where he goes step-by-step through implementing an evaluator and type checker for CoC. I don't know why this post from 2007 showed up on Reddit this week, but it's a very good post, pedagogically speaking. Go and read it.

In this post, I'd like to elaborate on the simply-typed lambda calculus part of his blogpost. His typechecker defines the following types for representing STLC types, terms, and environments:

data Type = Base
          | Arrow Type Type
          deriving (Eq, Show)

type Sym = String

data Expr = Var Sym
          | App Expr Expr
          | Lam Sym Type Expr
          deriving (Eq, Show)

The signature of the typechecker presented in his post is as follows:

type ErrorMsg = String
type TC a = Either ErrorMsg a
newtype Env = Env [(Sym, Type)] deriving (Show)

tCheck :: Env -> Expr -> TC Type

My approach is to instead create a representation of terms of STLC in such a way that only well-scoped, well-typed terms can be represented. So let's turn on a couple of heavy-weight language extensions from GHC 7.8 (we'll see how each of them is used), and define a typed representation of STLC terms:

{-# LANGUAGE GADTs, StandaloneDeriving #-}
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-} -- sigh...

import Data.Singletons.Prelude
import Data.Singletons.TH
import Data.Type.Equality

-- | A (typed) variable is an index into a context of types
data TVar (ts :: [Type]) (a :: Type) where
    Here :: TVar (t ': ts) t
    There :: TVar ts a -> TVar (t ': ts) a
deriving instance Show (TVar ctx a)

-- | Typed representation of STLC: well-scoped and well-typed by construction
data TTerm (ctx :: [Type]) (a :: Type) where
    TConst :: TTerm ctx Base
    TVar :: TVar ctx a -> TTerm ctx a
    TLam :: TTerm (a ': ctx) b -> TTerm ctx (Arrow a b)
    TApp :: TTerm ctx (Arrow a b) -> TTerm ctx a -> TTerm ctx b
deriving instance Show (TTerm ctx a)

The idea is to represent the context of a term as a list of types of variables in scope, and index into that list, de Bruijn-style, to refer to variables. This indexing operation maintains the necessary connection between the pointer and the type that it points to. Note the type of the TLam constructor, where we extend the context at the front for the inductive step.

To give a taste of how convenient it is to work with this representation programmatically, here's a total evaluator:

-- | Interpretation (semantics) of our types
type family Interp (t :: Type) where
    Interp Base = ()
    Interp (Arrow t1 t2) = Interp t1 -> Interp t2

-- | An environment gives the value of all variables in scope in a given context
data Env (ts :: [Type]) where
    Nil :: Env '[]
    Cons :: Interp t -> Env ts -> Env (t ': ts)

lookupVar :: TVar ts a -> Env ts -> Interp a
lookupVar Here      (Cons x _)  = x
lookupVar (There v) (Cons _ xs) = lookupVar v xs

-- | Evaluate a term of STLC. This function is total!
eval :: Env ctx -> TTerm ctx a -> Interp a
eval env TConst = ()
eval env (TVar v) = lookupVar v env
eval env (TLam lam) = \x -> eval (Cons x env) lam
eval env (TApp f e) = eval env f $ eval env e

Of course, the problem is that this representation is not at all convenient for other purposes. For starters, it is certainly not how we would expect human beings to type in their programs.

My version of the typechecker is such that instead of giving the type of a term (when it is well-typed), it instead transforms the loose representation (Term) into the tight one (TTerm). A Term is well-scoped and well-typed (under some binders) iff there is a TTerm corresponding to it. Let's use singletons to store type information in existential positions:

$(genSingletons [''Type])
$(singDecideInstance ''Type)

-- | Existential version of 'TTerm'
data SomeTerm (ctx :: [Type]) where
    TheTerm :: Sing a -> TTerm ctx a -> SomeTerm ctx

-- | Existential version of 'TVar'
data SomeVar (ctx :: [Type]) where
    TheVar :: Sing a -> TVar ctx a -> SomeVar ctx

-- | A typed binder of variable names
data Binder (ctx :: [Type]) where
    BNil :: Binder '[]
    BCons :: Sym -> Sing t -> Binder ts -> Binder (t ': ts)

Armed with these definitions, we can finally define the type inferer. I would argue that it is no more complicated than Lennart's version. In fact, it has the exact same shape, with value-level equality tests replaced with Data.Type.Equality-based checks.

-- | Type inference for STLC
infer :: Binder ctx -> Term -> Maybe (SomeTerm ctx)
infer bs (Var v) = do
    TheVar t v' <- inferVar bs v
    return $ TheTerm t $ TVar v'
infer bs (App f e) = do
    TheTerm (SArrow t0 t) f' <- infer bs f
    TheTerm t0' e' <- infer bs e
    Refl <- testEquality t0 t0'
    return $ TheTerm t $ TApp f' e'
infer bs (Lam v ty e) = case toSing ty of
    SomeSing t0 -> do
        TheTerm t e' <- infer (BCons v t0 bs) e
        return $ TheTerm (SArrow t0 t) $ TLam e'

inferVar :: Binder ctx -> Sym -> Maybe (SomeVar ctx)
inferVar (BCons u t bs) v
  | v == u = return $ TheVar t Here
  | otherwise = do
      TheVar t' v' <- inferVar bs u
      return $ TheVar t' $ There v'
inferVar _ _ = Nothing        

Note that pattern matching on Refl in the App case brings in scope type equalities that are crucial to making infer well-typed.

Of course, because of the existential nature of SomeVar, we should provide a typechecker as well which is a much more convenient interface to work with:

-- | Typechecker for STLC
check :: forall ctx a. (SingI a) => Binder ctx -> Term -> Maybe (TTerm ctx a)
check bs e = do
    TheTerm t' e' <- infer bs e
    Refl <- testEquality t t'
    return e'
    t = singByProxy (Proxy :: Proxy a)

-- | Typechecker for closed terms of STLC
check_ :: (SingI a) => Term -> Maybe (TTerm '[] a)
check_ = check BNil        

(The SingI a constraint is an unfortunate implementation detail; the kind of a is Type, which is closed, so GHC should be able to know there is always going to be a SingI a instance).

To review, we've written a typed embedding of STLC into Haskell, with a total evaluator and a typechecker, in about 110 lines of code.

If we were doing this in something more like Agda, one possible improvement would be to define a function untype :: TTerm ctx a -> Term and use that to give check basically a type of Binder ctx -> (e :: Term) -> Either ((e' :: TTerm ctx a) -> untype e' == e -> Void) (TTerm ctx a), i.e. to give a proof in the non-well-typed case as well.

Full code as a gist on Github

Arrow's place in the Applicative/Monad hierarchy

12 July 2014 (programming haskell language)

I've been thinking lately about arrows in relation to applicative functors and monads. The difference between the latter two is easy to intuit (and I'll describe it via an example below), but I never managed to get the same level of understanding about arrows. There's a somewhat famous paper about this question, which has a very clear-cut diagram showing that applicatives embed into arrows and arrows embed into monads (and both containments are non-trivial), which I understood as meaning every arrow is an applicative functor, and every monad is an arrow.

At first glance, this makes sense, given the well-known result that monads are exactly equivalent to arrows that are also instances of ArrowApply, as witnessed by the Haskell types Kleisli and ArrowMonad. However, there's no immediately obvious reason why you couldn't also turn an applicative functor into an arrow, so how does that leave any room for arrows to be different from applicatives? (As an aside, the fact that applicative functors have kind ⋆ → ⋆ and arrows have kind ⋆ → ⋆ → ⋆ has been a huge complication for me in trying to compare them).

Now, finally, based on the helpful replies to that StackOverflow question and the associated Reddit thread, I am confident enough to answer my own question.

Tom Ellis suggested thinking about a concrete example involving file I/O, so let's compare three approaches to it using the three typeclasses. To make things simple, we will only care about two operations: reading a string from a file and writing a string to a file. Files are going to be identified by their file path:

type FilePath = String

Monadic I/O

Our first I/O interface is defined as follows:

data IOM ∷ ⋆ → ⋆
instance Monad IOM
readFile ∷ FilePath → IOM String
writeFile ∷ FilePath → String → IOM ()

Using this interface, we can for example copy a file from one path to another:

copy ∷ FilePath → FilePath → IOM ()
copy from to = readFile from >>= writeFile to

However, we can do much more than that: the choice of files we manipulate can depend on effects upstream. For example, the below function takes an index file which contains a filename, and copies it to the given target directory:

copyIndirect ∷ FilePath → FilePath → IOM ()
copyIndirect index target = do
    from ← readFile index
    copy from (target ⟨/⟩ to)

On the flip side, this means there is no way to know upfront the set of filenames that are going to be manipulated by a given value action ∷ IOM α. By "upfront", what I mean is the ability to write a pure function fileNames :: IOM α → [FilePath].

Of course, for non-IO-based monads (such as ones for which we have some kind of extractor function μ α → α), this distinction becomes a bit more fuzzy, but it still makes sense to think about trying to extract information without evaluating the effects of the monad (so for example, we could ask "what can we know about a Reader Γ α without having a value of type Γ at hand?").

The reason we can't really do static analysis in this sense on monads is because the function on the right-hand side of a bind is in the space of Haskell functions, and as such, is completely opaque.

So let's try restricting our interface to just an applicative functor.

Applicative I/O

data IOF ∷ ⋆ → ⋆
instance Applicative IOF
readFile ∷ FilePath → IOF String
writeFile ∷ FilePath → String → IOF ()

Since IOF is not a monad, there's no way to compose readFile and writeFile, so all we can do with this interface is to either read from a file and then postprocess its contents purely, or write to a file; but there's no way to write the contents of a file into another one.

How about changing the type of writeFile?

writeFile′ ∷ FilePath → IOF (String → ())

The main problem with this interface is that while it would allow writing something like

copy ∷ FilePath → FilePath → IOF ()
copy from to = writeFile′ to ⟨*⟩ readFile from

it leads to all kind of nasty problems because String → () is such a horrible model of writing a string to a file, since it breaks referential transparency. For example, what do you expect the contents of out.txt to be after running this program?

(λ write → [write "foo", write "bar", write "foo"]) ⟨$⟩ writeFile′ "out.txt"

Two approaches to arrowized I/O

First of all, let's get two arrow-based I/O interfaces out of the way that don't (in fact, can't) bring anything new to the table: Kleisli IOM and Applicarrow IOF.

The Kleisli-arrow of IOM, modulo currying, is:

readFile ∷ Kleisli IOM FilePath String
writeFile ∷ Kleisli IOM (FilePath, String) ()

Since writeFile's input still contains both the filename and the contents, we can still write copyIndirect (using arrow notation for simplicity). Note how the ArrowApply instance of Kleisli IOM is not even used.

copyIndirect ∷ Kleisli IOM (FilePath, FilePath) ()
copyIndirect = proc (index, target) → do
    from ← readFile ↢ index
    s ← readFile ↢ from
    writeFile ↢ (to, s)

The Applicarrow of IOF would be:

readFile ∷ FilePath → Applicarrow IOF () String
writeFile ∷ FilePath → String → Applicarrow IOF () ()

which of course still exhibits the same problem of being unable to compose readFile and writeFile.

A proper arrowized I/O interface

Instead of transforming IOM or IOF into an arrow, what if we start from scratch, and try to create something in between, in terms of where we use Haskell functions and where we make an arrow? Take the following interface:

data IOA ∷ ⋆ → ⋆ → ⋆
instance Arrow IOA
readFile ∷ FilePath → IOA () String
writeFile ∷ FilePath → IOA String ()

Because writeFile takes the content from the input side of the arrow, we can still implement copy:

copy ∷ FilePath → FilePath → IOA () ()
copy from to = readFile from >>> writeFile to

However, the other argument of writeFile is a purely functional one, and so it can't depend on the output of e.g. readFile; so copyIndirect can't be implemented with this interface.

If we turn this argument around, this also means that while we can't know in advance what will end up being written to a file (before running the full IOA pipeline itself), but we can statically determine the set of filenames that will be modified.


Monads are opaque to static analysis, and applicative functors are poor at expressing dynamic-time data dependencies. It turns out arrows can provide a sweet spot between the two: by choosing the purely functional and the arrowized inputs carefully, it is possible to create an interface that allows for just the right interplay of dynamic behaviour and amenability to static analysis.

My first computer

29 March 2014 (programming haskell FPGA electronics retro)

tl;dr: I've built a computer on a Xilinx FPGA using Kansas Lava, based on a virtual machine design from the mid seventies.

I would be lying if I said I always wanted to build my own computer. For a very long time, hardware didn't tickle my curiosity at all; and even today, I prefer staying away from dangling wires and soldering irons. I like my computing platforms to Just Work™, and hardware problems are just a huge hassle. But then in 2010 some coworkers of mine started getting into electronics and I learned from them just enough to start hooking some ICs up on a breadbord, and it seemed like a very fun diversion from all the high-level, abstract, softwarey stuff. In some sense, it filled the same void in me that assembly programing would probably have had. But even back in the days on the Commodore 64, I was looking for more BASIC extensions instead of going downwards to assembly / machine code.

One thing led to another and I was creating a Brainfuck CPU on a Papilio FPGA. It was a very satisfying experience, plus, I got to learn about a completely new world (that of digital hardware design and hardware description languages). So ever since then, I had a voice in the back of my head saying I should go all in, and implement a proper full computer, with I/O and all. But jumping straight into replicating a real computer seemed like trying to run before I can walk.

I can't remember now how I bumped into the CHIP-8 platform, but when I read about it in detail, I realized this is something I, a self-taught HDL guy, could realistically implement. To recap, it's a virtual machine spec from the mid seventies indended to be implemented on home computers of the time. It is super low performance, meaning I could implement everything the most naïve way possible and still get something workable out of it: graphics is 64×32 black & white, RAM is short of 4KBytes, CPU speed is not really part of the spec, and the only timers provided run at 60 Hz.

I can do this!

Setting the scene

The FPGA board that I targeted is a Papilio One 500K, which has 500K... somethings called "system gates", and about 40KByte of SRAM. The Arcade MegaWing provides a D-sub 15 VGA connector and two PS/2 ports, among some other stuff that I am not using for this project.

The home computer targeted by the original CHIP-8 spec only had a four-by-four keypad, so I am not 100% happy about using a PS/2 keyboard for the input. Maybe a later version could use a bespoke keypad connected to the unused pins of the LogicStart MegaWing. However, by using a PS/2 keyboard, I could not worry about the hardware side of things and just implement a decoder for the PS/2 protocol.

In terms of software, there are several games available for the CHIP-8. Maybe even donzens! But just for fun, after finishing the machine itself, I ended up writing my own game as well: a crappy port of the currently trending 2048 game.

Understanding the CHIP-8

I would say the CPU itself is a RISC architecture, in that it has 16 registers and not a whole lot of operations. But I'm not sure it would be called RISC back in its day.

The aforementioned 64×32 one-bit graphics is manipulated via a sprite interface: there's a special CPU opcode for xor-blitting an 8-pixel-wide, variable-height sprite onto any part of the screen. I went with the obvious way of implementing it as a 2048-bit frame buffer.

To validate my understanding, I first created a software emulator in Haskell. That unearthed a ton of edge cases and situations where I was not reading the spec with enough attention. Once I had the emulator working well enough to play the sample games I found, I enumerated the modules I'll need to implement.

The emulator running a Tetris game

A TODO list

Since I've already done the Brainfuck CPU, I didn't foresee too much difficulty in implementing the CPU proper (oh boy was I wrong). However, the world of peripherals was a new one to me.

I've never looked into VGA signal timings in detail before, and for some reason I just assumed that it's going to be just as complicated as PAL, about which I knew just enough to know that you have to generate all kinds of elaborate sync patterns. So actually reading the VGA spec was a relief, and I quickly came up with a scheme where my CHIP-8 computer would be running at 50 MHz, so that it can easily implement the 25 MHz pixel clock needed for 640×480@60 Hz. I went with this mode because it has several advantages:

The Papilio One itself has a clock of 32 MHz, and I first hoped that 32 Mhz and 25 Mhz is close enough that I can just generate a signal using the 32 MHz as the pixel clock. Turns out that's not quite how signal timings work.

Luckily, I've found out that the Papilio One also has something called a DCM which can divide and multiply the clock signal. I was able to go to 25 or 50 MHz easily with it. It's a bit of a black box component: I had to run a wizard in the Xilinx IDE to get a small binary file describing my DCM parameters, and then I integrated it into my build process by running another Xilinx program on it which spits out some magic bitstream.

The PS/2 protocol is a simple serial protocol with a slow (~10 KHz) clock and one parity bit per 8 data bits. Decoding it into a stream of bytes was a straightforward thing once I hunted down an actual PS/2 keyboard, since it turned out those USB to PS/2 dongles don't really work with regular USB keyboards; rather, the keyboard have to be made ready to "speak" PS/2 over the dongle. So I ended up getting a proper PS/2 keyboard from Mustafa Centre (thanks to Viki's sister Dori for picking it up for me); god only knows why they still had one on stock.


The Xilinx tool suite seems to be aimed at being used from the IDE. This has several drawbacks: version controlling is complicated because generated artifacts litter the source tree; the editor itself in the IDE is of course crap compared to Emacs; and most importantly, I didn't intend to manually write either VHDL or Verilog. As I've mentioned before in my post about the Brainfuck CPU, I've found both of these hardware description languages to be lacking in the same way mainstream imperative programming languages are: the tools you, the developer, is given to introduce your own abstractions are very poor. So I planned to use Kansas Lava, an HDL embedded in Haskell.

Now, the first thing to note about Kansas Lava is, as nice at is, the software package itself is bit-rotten. The latest version available on Hackage cannot even be compiled with GHC 7.6. While the change to fix that is trivial, after that, you can easily bump into bugs in the Lava compiler itself. But more on that later. Later versions available from GitHub are not even self-consisent between the various dependencies. I've put my patched-up version of Kansas Lava and some of the packages it dependens on on GitHub and I'm trying to get Andy Gill to allow me to upload the bundle as a update to Hackage. Don says I should maybe say fuck it and create my own Singapore Lava fork, just to stay with the Lava tradition of a twisty little maze of incompatible forks, all alike.

However, when it all works, it is amazing. I was able to extend the library of Papilio-specific Lava code that I created for the Brainfuck project with reusable modules for the VGA signal generator and the PS/2 decoder in such a way that they should be very easy to be re-used for any other future projects. And it's all type-safe, so for example, the Papilio Arcade MegaWing VGA port is statically enforced to be 4+4+4 bits whereas the LogicStart MegaWing is 3+3+2.

But there was one bug that left a bitter aftertaste in my mouth. Once I had both the CPU and the VGA parts working and I started running some test programs on the thing, I noticed that the framebuffer blitting is exhibiting "or" behaviour instead of "xor". Running the same code in the Lava simulator and dumping the contents of the framebuffer, it showed the correct, "xor" behaviour. After weeks of frustration and a complete rework of the communication system between the CPU, the VGA signal generator and the frame buffer to add memory read bussing, I finally took a look at the Lava compiler's source code to find that it simulates the xor2 primitive as xor but compiles it to or. How do you not notice this bug? Has the Kansas Lava suite been used by anyone for everything at all in the history of ever???

End result

The final result, after much trial and error, looking at the Lava simulator's output, and pouring over the code, is available here. I'm reasonably happy about the code base, except for the implementation of the CPU itself, which feels a bit spaghetti to me. Especially around the parts where it's waiting for results to come back from main RAM or the video framebuffer.

Below are some photographs and videos of it running two games: the memory puzzle game Hidden and the 2048 clone mentioned above. Unfortunately, the Tetris game from the screenshot above seems to have a bug in its input handling, in that it samples the keyboard at an unthrottled frequency; thus, even the shortest keypress sends the falling piece to the edge of the wall. I'll eventually need to disassemble the game and fix this.

The VGA signal generator is not as neat as it could be, because it doesn't do pre-fetching. This means by the time the current CHIP-8 pixel's value is read from the framebuffer, it is already too late, the first of the 8 real pixels for the given virtual CHIP-8 pixel should already have been drawn. This results in some ghosting. But since I know what's causing it, I will hopefully be able to fix this in some next version. But right now I'm too happy to have the whole thing just working.

The setup

Detail of the FPGA board itself

Older entries:

Entries from all tags