Posts tagged fpga

Formatting serial streams in hardware

12 August 2024 (programming haskell clash fpga)

I've been playing around with building a Sudoku solver circuit on an FPGA: you connect to it via a serial port, send it a Sudoku grid with some unknown cells, and after solving it, you get back the solved (fully filled-in) grid. I wanted the output to be nicely human-readable, for example for a 3,3-Sudoku (i.e. the usual Sudoku size where the grid is made up of a 3 ⨯ 3 matrix of 3 ⨯ 3 boxes):

4 2 1  9 5 8  6 3 7  
8 7 3  6 2 1  9 5 4  
5 9 6  4 7 3  2 1 8  

3 1 2  8 4 6  7 9 5  
7 6 8  5 1 9  3 4 2  
9 4 5  7 3 2  8 6 1  

2 8 9  1 6 4  5 7 3  
1 3 7  2 9 5  4 8 6  
6 5 4  3 8 7  1 2 9

This post is about how I structured the stream transformer that produces all the right spaces and newlines, yielding a clash-protocols based circuit.

Cheap and cheerful microcode compression

2 May 2022 (programming haskell clash fpga retro)

This post is about an optimization to the Intel 8080-compatible CPU that I describe in detail in my book Retrocomputing in Clash. It didn't really fit anywhere in the book, and it isn't as closely related to the FPGA design focus of the book, so I thought writing it as a blog post would be a good idea.

Just like the real 8080 from 1974, my Clash implementation is microcoded: the semantics of each machine code instruction of the Intel 8080 is described as a sequence of steps, each step being the machine code instruction of an even simpler, internal micro-CPU. Each of these micro-instruction steps are then executed in exactly one clock cycle each.

My 8080 doesn't faithfully replicate the hardware 8080's micro-CPU; in fact, it doesn't replicate it at all. It is a from-scratch design based on a black box understanding of the 8080's instruction set, and the main goal was to make it easy to understand, instead of making it efficient in terms of FPGA resource usage. Of course, since my micro-CPU is different, the micro-instructions have no one to one correspondence with the orignal Intel 8080, and so the microcode is completely different as well.

In the CPU, after fetching the machine code instruction byte, we look up the microcode for that byte, and then execute it cycle by cycle until we're done. This post is about how to store that microcode efficiently.

A tiny computer for Tiny BASIC

17 November 2020 (programming haskell clash fpga retro)

Just a quick post to peel back the curtain a bit on a Clash example I posted to Twitter. My tweet in question shows the following snippet, claiming it is "a complete Intel 8080-based FPGA computer that runs Tiny BASIC":

logicBoard
    :: (HiddenClockResetEnable dom)
    => Signal dom (Maybe (Unsigned 8)) -> Signal dom Bool -> Signal dom (Maybe (Unsigned 8))
logicBoard inByte outReady = outByte
  where
    CPUOut{..} = intel8080 CPUIn{..}

    interruptRequest = pure False

    (dataIn, outByte) = memoryMap _addrOut _dataOut $ do
        matchRight $ do
            mask 0x0000 $ romFromFile (SNat @0x0800) "_build/intel8080/image.bin"
            mask 0x0800 $ ram0 (SNat @0x0800)
            mask 0x1000 $ ram0 (SNat @0x1000)
        matchLeft $ do
            mask 0x10 $ port $ acia inByte outReady

I got almost a hundred likes on this tweet, which isn't too bad for a topic as niche as hardware design with Haskell and Clash. Obviously, the above tweet was meant as a brag, not as a detailed technical description; but given the traction it got, I thought I might as well try to expand a bit on it.

Integrating Verilator and Clash via Cabal

7 May 2020 (programming haskell clash fpga)

TL;DR: This is a detailed description of how I got Clashilator working seamlessly from Cabal. It took me three days to figure out how the pieces need to fit together, and most of it was just trawling the documentation of Cabal internals, so I better write this down while I still have any idea what I did.

Background information

We set the scene with the dramatis personæ first:

Clash is a Haskell-to-HDL compiler: you run it on your Haskell source file, and it emits Verilog or VHDL.
Verilator is a Verilog simulator: it takes Verilog as input, and outputs C++ code. If you then compile that C++ code, you get a library which simulates the Verilog circuit: it gives you an interface to fiddle with input pin values, you tell Verilator to propagate changes, and then you can read out output pin values.

When designing some circuit, it is very useful to be able to simulate its behaviour. Getting debugging information out of a hardware FPGA is a huge hassle; iteration is slow because FPGA synthesis toolchains, by and large, suck; and driving the circuit (i.e. setting the right inputs in the right sequence and interpreting the outputs) can be almost as complicated as the circuit under testing itself. Of course, all of these problems apply doubly to someone like me who is just dabbling in FPGAs.

So during development, instead of synthesizing a circuit design and loading it onto an FPGA, we want to simulate it; and of course we want to use nice expressive languages to write the test bench that envelopes the simulated circuit. One way to do this is what I call very high-level simulation: in this approach, we take the Haskell abstractions we use in our circuit, and reinterpret them in a software context. For example, we might have a state machine described as i -> State s o: instead of lifting it into a signal function, we can just runState it in a normal Haskell program's main and do whatever we want with it.

However, sometimes we want to simulate whole circuits, i.e. Signal dom i -> Signal dom o functions that might have who knows what registers and memory elements inside. For example, if we have a circuit that generates video output from a frame buffer, there's a difference between a high-level simulation that renders the frame buffer's contents to the screen, and a lower level one that interprets the VGA signal output of the circuit. Timing issues in synchronizing the VGA blanking signals with the color lines will only manifest themselves in the latter. So for this kind of applications, Clash of course contains a signal simulator that can be used to feed inputs into a circuit and get outputs. For example, here's a simulation of a Brainfuck computer where only the peripheral lines are exposed: internal connections between the CPU and the RAM and ROM are all part of the Clash circuit.

There is only one problem with the Clash simulator: its performance. This small benchmark shows how long it takes to simulate enough cycles to draw 10 full video frames at 640 ⨯ 480 resolution (i.e. 4,192,000 cycles). Clash does it in ~13 seconds; remember, at 60 FPS, it shouldn't take more than 166 milliseconds to draw 10 frames if we want to simulate it in real time. Of course, real-time simulation at this level of detail isn't necessarily feasable on consumer hardware; but less than one frame per second means any kind of interactive applications are out.

In contrast, Verilator, an open-source Verilog simulator can run all 4,192,000 cycles in 125 ms. This could be faster than realtime, were it not for the additional overhead of drawing each frame to the screen (and the Haskell FFI of 4 million calls accross to C...), and of course this is a very simple circuit that only renders a single bouncing ball; anything more complex will only be slower. But still, that same benchmark shows that 20+ FPS is possibe, end-to-end, if we ditch Clash and use Verilator instead. Pong is fully playable at 13 FPS.

Clash, Verilog and Haskell

The interface between Clash and Verilator is simple: Verilator consumes Verilog code, so we can simply run Clash and point Verilator at its output. However, we still need to connect that Verilator code to the Haskell code to drive the inputs and interpret the outputs. Here are the glue files from the Pong example:

On the C++ side, we need to define structs for the input and output pins, and write boilerplate that applies inputs and retrieves outputs from the Verilator-generated state.
Also on the C++ side, we write some extern "C" functions that are easier to use via FFI than real C++ methods.
On the Haskell side, we define record types for the input and output pins, with Storable instances to marshal them between Haskell and C. Also we use FFI imports to connect to the extern "C" functions; in effect, we expose the Verilator simulation to Haskell as a function simStep :: Sim -> Input -> IO Output.
Just for extra hairiness, it is worth noting what exactly that Makefile there is doing: it has a rule to run Verilator on the Clash output, and then one of the results of Verilator is a new Makefile which orchestrates building a static library and an object file, which must be linked together with our C++-side wrappers, to produce the final static library ready to be used from Haskell.

As I was preparing to write the next chapter of a Clash book I've been working on, I made a new Clash project and then, because I needed a Verilator simulation for it, I started going through the steps of making all these files. And of course, I realized this should be all automated. But what is all in that sentence?

Step one was to write generators for the C++ and Haskell source files and the Makefile. This is quite easy, actually; after all, it is the fact that these files are so regular that makes it infuriating writing them by hand. So we do a bit of text template substitution, using Clash's .manifest output as the source of input/output pin names and bus widths. This gives us a simple code generator: you run Clash yourself, point Clashilator at a .manifest file, and it outputs a bunch of files, leaving you ready to run make. Mission accomplished?

No, not really.

Clash, Verilog and... Cabal

While we've eliminated the boilerplate in the source files, one source of boilerplate remains: the Cabal package settings. Here's the relevant HPack package.yaml section from the Pong example:

extra-libraries: stdc++ 
extra-lib-dirs: verilator
include-dirs: verilator
build-tools: hsc2hs

ghc-options:
  -O3
  -fPIC -pgml g++
  -optl-Wl,--whole-archive -optl-Wl,-Bstatic
  -optl-Wl,-L_build/verilator -optl-Wl,-lVerilatorFFI
  -optl-Wl,-Bdynamic -optl-Wl,--no-whole-archive

Oof, that hurts. All those magic GHC options just to statically link to libVerilatorFFI.a, to be repeated accross all Clash projects that use Verilator...

Also, while Clashilator outputs a Makefile to drive the invocation of Verilator and the subsequent compilation of the C++ bits, it doesn't give you a solution for running *that* Makefile at the right time — not to mention running Clashilator itself!

The problem here is that in order to compile a Verilator-using Haskell program, we first need to compile the other, non-simulation modules into Verilog. And this is tricky because those modules can have external dependencies: remember Clash is just a GHC backend, so you can import other Haskell libraries. And how are we going to bring those libraries in scope? I myself use Stack but your mileage may vary: you could be using Cabal directly, or some Cabal-Nix integration. In an case, you'd basically need to build your package so you can compile to Verilog so you can run Verilator so you can... build your package.

To solve this seemingly circular dependency, and to get rid of the Cabal file boilerplate, I decided to try and do everything in the Cabal workflow. Whatever your preferred method of building Haskell packages, when the rubber hits the road, they all ultimately run Cabal. If we run Clash late enough in the build process, all dependencies will be installed by then. If we run Verilator early enough in the build process, the resulting library can be linked into whatever executable or library Cabal is building.

If we do all this during cabal build, everything will happen at just the right time.

Is such a thing even possible? Yes it is.

So, what gives us at least a fighting chance is that Cabal is extensible with so-called hooks. You can write a custom Setup.hs file like this:

import Distribution.Simple

main = defaultMainWithHooks simpleUserHooks

Here, simpleUserHooks is a record with a bunch of fields for extension points; of particular interest to us here is this one:

buildHook :: PackageDescription -> LocalBuildInfo -> UserHooks ->
BuildFlags -> IO ()

At ths point, we have breached the defenses: inside buildHook, we can basically do arbitrary things as long as the effect is building the package. In particular, we can:

Go over all executables and libraries of the PackageDescription, and retrieve the value of some custom x-clashilator-* fields. I use x-clashilator-top-is to mark the module containing the topEntity definition we want to simulate; this is the equivalent of the main-is field for normal Haskell code. By setting this field for a given package component, Clashilator is activated for that component.
Get all the package details of dependencies from LocalBuildInfo and pass them to Clash
Get the compilation target directory from LocalBuildInfo so that we know where we need to put the results: both the generated files and the built static library.
The LocalBuildInfo also tells us how to run pkg-config, which we can use to find the location of Verilator's header files we need to include.
Edit the BuildInfo of the given component. This is where the real fun is. We've just run Clash and Verilator and Make: we have a new lib*.a library to link to and a new .hsc source file to compile into our Haskell program. We add the .hsc module to the list of otherModules of the given component, and we add all the magic linker flags to the compiler options.
And do all this according to the verbosity level of the BuildFlags to play nicely with Cabal's --verbose flag and similar settings.

The result of all this is a bunch of new files under the build directory, and modified BuildInfos for all the components marked with x-clashilator-top-is. We put these back into the PackageDescription and then call the default buildHook, which then goes on to compile and link the Haskell simulation integrating the Verilator parts:

clashilatorBuildHook :: PackageDescription -> LocalBuildInfo -> UserHooks -> BuildFlags -> IO ()
clashilatorBuildHook pkg localInfo userHooks buildFlags = do
    pkg' <- clashilate pkg localInfo buildFlags
    buildHook simpleUserHooks pkg' localInfo userHooks buildFlags

All the details are in the full implementation, which is probably worth a look if you are interested in Cabal's internals. As I wrote in the beginning of this post, it took me days to wade through the API to find all the moving parts that can be put together to apply this level of violence to Cabal. The final code also exports a convenience function clashilatorMain for the common case where enabling Clashilator is the only desired Setup.hs customization; and also clashilate itself for hardcore users who want to build their own buildHooks.

The implementation is almost 150 lines of not particularly nice code. It is also missing some features; most notably, it doesn't track file changes, so Clash and Verilator is always rerun, even if none of the Clash source files have changed. It is also completely, utterly untested. But it does give us what we set out to do: completely boilerplate-less integration of Clash and Verilator. A complete example package is here, and here's the money shot: the executables section of the package.yaml file.

executables:
  simulator:
    main: simulator.hs
    verbatim:
      x-clashilator-top-is: MyCircuit.Nested.Top
      x-clashilator-clock: CLK

Composable CPU descriptions in CλaSH, and wrap-up of RetroChallenge 2018/09

30 September 2018 (programming haskell fpga electronics retrochallenge retro clash chip-8)

My last post ended with some sample CλaSH code illustrating the spaghetti-ness you can get yourself into if you try to describe a CPU directly as a function (CPUState, CPUIn) -> (CPUState, CPUOut). I promised some ideas for improving that code.

To start off gently, first of all we can give names to some common internal operations to help readability. Here's the code from the previous post rewritten with a small kit of functions:

intensionalCPU (s0, CPUIn{..}) = case phase s of
    WaitKeyPress reg ->
        let s' = case cpuInKeyEvent of
                Just (True, key) -> goto Fetch1 . setReg reg key $ s
                _ -> s
            out = def{ cpuOutMemAddr = pc s' }
        in (s', out)                   
    Fetch1 ->
        let s' = goto Exec s{ opHi = cpuInMem, pc = succ $ pc s }
            out = def{ cpuOutMemAddr = pc s' }
        in (s', out)
  where
    s | cpuInVBlank = s0{ timer = fromMaybe 0 . predIdx $ timer s0 }
      | otherwise = s0

Using a `State` monad

To avoid the possibility of screwing up the threading of the internal state, we can use the State CPUState monad:

Blog tags

Blog tags

Posts tagged fpga

Formatting serial streams in hardware

Cheap and cheerful microcode compression

A tiny computer for Tiny BASIC

Integrating Verilator and Clash via Cabal

Background information

Clash, Verilog and Haskell

Clash, Verilog and... Cabal

Composable CPU descriptions in CλaSH, and wrap-up of RetroChallenge 2018/09

Using a `State` monad

Older posts:

Posts from all tags

Posts tagged fpga

Background information

Clash, Verilog and Haskell

Clash, Verilog and... Cabal

Using a State monad

Older posts:

Using a `State` monad