Very high-level simulation of a CλaSH CPU

15 September 2018 (programming haskell fpga electronics retrochallenge)

Initially, I wanted to talk this week about how I plan to structure the CλaSH description of the CHIP-8 CPU. However, I'm postponing that for now, because I ran into what seems like a CλaSH bug, and I want to see my design run on real hardware before I describe it in too much detail. So instead, here's a post on how I am testing in software.

CPUs as Mealy machines

After stripping away all the nice abstractions that I am using in my description of the CPU, what remains is a Mealy machine, which simply means it is described by a state transition and output function s -> i -> (s, o). If that looks familiar, that is not a coincidence: this is, of course, just one argument flip away from the Kleisli category of the State s monad. Just think of it as being either this or that, depending on which one you have more intuition about. A lot more on this in my upcoming blogpost.
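
To make that correspondence concrete, here is a quick sketch (not from the actual codebase) of the two conversions; note that besides the argument flip, there is also a swap of the result tuple hiding in there:

import Control.Monad.State

-- A Mealy step function s -> i -> (s, o)...
toState :: (s -> i -> (s, o)) -> (i -> State s o)
toState f i = state $ \s -> let (s', o) = f s i in (o, s')

-- ...is interchangeable with a Kleisli arrow of State s:
fromState :: (i -> State s o) -> (s -> i -> (s, o))
fromState f s i = let (o, s') = runState (f i) s in (s', o)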

My CHIP-8 CPU is currently described by a Mealy machine over these types:

data CPUIn = CPUIn
    { cpuInMem :: Word8
    , cpuInFB :: Bit
    , cpuInKeys :: KeypadState
    , cpuInKeyEvent :: Maybe (Bool, Key)
    , cpuInVBlank :: Bool
    }

data Phase
    = Init
    | Fetch1
    | Exec
    | StoreReg Reg
    | LoadReg Reg
    | ClearFB (VidX, VidY)
    | Draw DrawPhase (VidX, VidY) Nybble (Index 8)
    | WaitKeyPress Reg

data CPUState = CPUState
    { opHi, opLo :: Word8
    , pc, ptr :: Addr
    , registers :: Vec 16 Word8
    , stack :: Vec 24 Addr
    , sp :: Index 24
    , phase :: Phase
    , timer :: Word8
    }

data CPUOut = CPUOut
    { cpuOutMemAddr :: Addr
    , cpuOutMemWrite :: Maybe Word8
    , cpuOutFBAddr :: (VidX, VidY)
    , cpuOutFBWrite :: Maybe Bit
    }        

cpu :: CPUIn -> State CPUState CPUOut
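
In the actual hardware description, this step function gets lifted into the signal world with CλaSH's mealy, roughly like this (a sketch; initialState is an assumed name for the power-on CPUState):

cpuSig
    :: (HiddenClockReset dom gated synchronous)
    => Signal dom CPUIn -> Signal dom CPUOut
cpuSig = mealy (\s i -> let (o, s') = runState (cpu i) s in (s', o)) initialState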
      

Running the CPU directly

Note that all the types involved are pure: signal inputs are turned into pure input by CλaSH's mealy function, and the pure output is similarly turned into a signal output. But what if we didn't use mealy, and ran cpu directly, completely sidestepping CλaSH, yet still running the exact same implementation?

That is exactly what I am doing for testing the CPU. By running its Mealy function directly, I can feed it a CPUIn and consume its CPUOut result while interacting with the world — completely outside the simulation! The main structure of the code that implements the above looks like this:

stateful :: (MonadIO m) => s -> (i -> State s o) -> IO (m i -> (o -> m a) -> m a)
stateful s0 step = do
    state <- newIORef s0
    return $ \mkInput applyOutput -> do
        inp <- mkInput
        out <- liftIO $ do
            s <- readIORef state
            let (out, s') = runState (step inp) s
            writeIORef state s'
            return out
        applyOutput out
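
With that, the main loop of the interactive simulator is something like this (a sketch; getSDLInput and applySDLOutput are stand-ins for the actual SDL glue code):

main :: IO ()
main = do
    step <- stateful initialState cpu
    -- forever is from Control.Monad: run one CPU step per iteration
    forever $ step getSDLInput applySDLOutput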
      

Hooking it up to SDL

I hooked up the main RAM and the framebuffer signals to IOArrays, and wrote some code that renders the framebuffer's contents into an SDL surface and translates keypress events. And, voilà: you can run the CHIP-8 computer, interactively, even using good old trace-based debugging (which CλaSH thankfully removes during VHDL generation, so I can even leave the traces in). The screencap below shows this in action: :main is run from clashi and starts the interactive SDL program, with no Signal types involved.

PS/2 keyboard interface in CλaSH

8 September 2018 (programming haskell fpga electronics retrochallenge)

This week, most of my weekday evenings were quite busy, but I did manage to squeeze in a PS/2 keyboard interface in small installments; then today I went down the rabbit hole of clearing up some technical debt I had accumulated by not really looking into how CλaSH handles clock domains.

PS/2 signals

(Just to preempt any possible confusion: we're talking about the peripheral port of the IBM Personal System/2, introduced in the late '80s, not the PlayStation 2 console.)

The same way VGA is ideal for hobbyist video signal generation since it is both simple and ubiquitous, PS/2 is the go-to interface for keyboards. It is a bidirectional, synchronous serial protocol with a peripheral-generated clock in the 10-20 kHz range. Any PC keyboard old enough will support it. One important caveat, though, is that the common USB-to-PS/2 adapters don't actually convert signals, so they only work with keyboards that were originally designed with that conversion in mind. Here, we are only concerned with device-to-host communication; it is also possible to communicate in the other direction, e.g. to change the Num Lock LED's state.

"Synchronous" here means that there is a separate clock line, unlike in the family of asynchronous serial protocols that were used in RS-232; it is this latter one that is usually meant as "serial communication" when unqualified. In synchronous serial communication, everything happens on the clock ticks; in asynchronous communication, there is no separate clock signal, so the data signal has enough structure that the two communicating sides can agree on the exact framing.

Turning the data line of PS/2 into a stream of bits is a straightforward process: the standard prescribes sampling the data line on the falling edge of the clock line. We also apply an 8-cycle debouncer for good measure, just because some pages on the Internet suggest it:

data PS2 dom = PS2
    { ps2Clk :: Signal dom Bit
    , ps2Data :: Signal dom Bit
    }

samplePS2
    :: (HiddenClockReset dom gated synchronous)
    => PS2 dom -> Signal dom (Maybe Bit)
samplePS2 PS2{..} = enable <$> isFalling low ps2Clk' <*> ps2Data'
  where
    ps2Clk' = debounce d3 low ps2Clk
    ps2Data' = debounce d3 low ps2Data        
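
The debounce function isn't shown here; one possible implementation (an assumption on my part, the real one may differ) only lets the output follow the input after the input has kept its new value for 2^n consecutive cycles:

debounce
    :: forall n dom gated synchronous. (HiddenClockReset dom gated synchronous, KnownNat n)
    => SNat n -> Bit -> Signal dom Bit -> Signal dom Bit
debounce _ start this = out
  where
    -- count how long the raw input has been disagreeing with the current output
    counter = register (0 :: Unsigned n) $ mux (this ./=. out) (counter + 1) 0
    -- adopt the new value once the disagreement has persisted long enough
    out = regEn start (counter .==. pure maxBound) this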
      

The second step in that pipeline is to shift in the bits, 11 at a time. A leading low bit signals the start of a packet; eight data bits and one parity bit follow; the packet is finished with one high bit. Of course, only the eight data bits are presented externally. I use a WriterT (Last Word8) (State PS2State) monad to implement this logic, and then turn that into a CλaSH Mealy machine, in a pattern that I plan to use a lot in implementing the CHIP-8 CPU later:

data PS2State
    = Idle
    | Bit Word8 (Index 8)
    | Parity Word8
    | Stop (Maybe Word8)

decodePS2
    :: (HiddenClockReset dom gated synchronous)
    => Signal dom (Maybe Bit) -> Signal dom (Maybe Word8)
decodePS2 = flip mealyState Idle $ \bit -> fmap getLast . execWriterT . forM_ bit $ \bit -> do
    state <- get
    case state of
        Idle -> do
            when (bit == low) $ put $ Bit 0 0
        Bit x i -> do
            let x' = shiftInLeft bit x
            put $ maybe (Parity x') (Bit x') $ succIdx i
        Parity x -> do
            let checked = bit /= parity x
            put $ Stop $ enable checked x
        Stop x -> do
            when (bit == high) $ tell $ Last x
            put Idle       
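
The mealyState function used here (and later in parseScanCode) is not part of CλaSH itself; it is presumably just a thin adapter on top of mealy, something like this sketch:

mealyState
    :: (HiddenClockReset dom gated synchronous)
    => (i -> State s o) -> s -> Signal dom i -> Signal dom o
mealyState step = mealy $ \s i -> let (o, s') = runState (step i) s in (s', o)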
      

A quick change in hardware

To be able to try out on real hardware what I had at this point, I had to leave the trusty LogicStart MegaWing of my Papilio Pro and switch over to the Arcade MegaWing, since that one has a PS/2 port. In fact, it has two PS/2 ports, so one could connect e.g. a keyboard and a mouse.

This change involved rewriting my UCF file since the pinout is different from the LogicStart. Also, the Arcade has 4+4+4 bits of VGA color output instead of the previous 3+3+2; of course with the black & white graphics of the CHIP-8, that color depth is all going to waste with this project.

PS/2 scan codes

Unfortunately, it is not enough to shift the PS/2 data into a byte: we also have to make sense of that byte. While this could be as straightforward as interpreting each byte as the ASCII code of the character on the key pressed, the reality is not that simple. Keyboards emit so-called scan codes, where one or several bytes can encode a single keypress or key release event (see here for example for a list of some keyboard scan codes). I haven't been able to come up with an elegant way of handling this yet, so for now I just have a messy Mealy machine that returns a 16-bit code, where the high byte is zero for one-byte codes. You can see in the comment my frustration at both the implementation and the spec itself:

data KeyEvent = KeyPress | KeyRelease
    deriving (Generic, NFData, Eq, Show)

data ScanCode = ScanCode KeyEvent Word16
    deriving (Generic, NFData, Eq, Show)

data ScanState
    = Init
    | Extended Word8
    | Code KeyEvent Word8

-- TODO: rewrite this for clarity.
-- All it does is parse 0xE0 0xXX into an extended (16-bit) code, and everything else into
-- an 8-bit code. The only complication is that the key release marker 0xF0 is always the
-- second-to-last byte. Who does that?!?
parseScanCode
    :: (HiddenClockReset dom gated synchronous)
    => Signal dom (Maybe Word8) -> Signal dom (Maybe ScanCode)
parseScanCode = flip mealyState Init $ \raw -> fmap getLast . execWriterT . forM_ raw $ \raw -> do
    let finish ev ext = do
            tell $ Last . Just $ ScanCode ev $ fromBytes (ext, raw)
            put Init
    state <- get
    case state of
        Init | raw == 0xe0 -> put $ Extended raw
             | raw == 0xf0 -> put $ Code KeyRelease 0x00
             | otherwise -> finish KeyPress 0x00
        Extended ext | raw == 0xf0 -> put $ Code KeyRelease ext
                     | otherwise -> finish KeyPress ext
        Code ev ext -> finish ev ext
  where
    fromBytes :: (Word8, Word8) -> Word16
    fromBytes = unpack . pack        
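
For concreteness, here is what the parser does on the four possible input shapes (worked out from the code above, using 0x1C as an arbitrary example key code):

-- 0x1C              -> ScanCode KeyPress   0x001C
-- 0xF0, 0x1C        -> ScanCode KeyRelease 0x001C
-- 0xE0, 0x1C        -> ScanCode KeyPress   0xE01C
-- 0xE0, 0xF0, 0x1C  -> ScanCode KeyRelease 0xE01C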
      

Driving a CHIP-8 pixel around

With the video output from last time and the keyboard from this post, but no CPU yet, our options to put everything together into something impressive are somewhat limited. I ended up showing a single CHIP-8 pixel that can be moved around in the CHIP-8 screen space with the arrow keys; this results in something tangible without needing a CPU or even a framebuffer yet. Note how well the code lends itself to using applicative do syntax:

VGADriver{..} = vgaDriver vga640x480at60
ps2 = decodePS2 $ samplePS2 PS2{..}

(dx, dy) = unbundle $ do
    key <- parseScanCode ps2
    pure $ case key of
        Just (ScanCode KeyPress 0xe075) -> (0, -1) -- up
        Just (ScanCode KeyPress 0xe072) -> (0, 1)  -- down
        Just (ScanCode KeyPress 0xe06b) -> (-1, 0) -- left
        Just (ScanCode KeyPress 0xe074) -> (1, 0)  -- right
        _ -> (0, 0)

pixel = do
    x <- fix $ register 0 . (+ dx)
    y <- fix $ register 0 . (+ dy)
    x0 <- (chipX =<<) <$> vgaX
    y0 <- (chipY =<<) <$> vgaY
    pure $ case (,) <$> x0 <*> y0 of
        Just (x0, y0) -> (x0, y0) == (x, y)
        _ -> False
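
(Note how x <- fix $ register 0 . (+ dx) ties the knot at the signal level: the x coordinate signal is defined as its own one-cycle-delayed value, incremented by dx; since dx is only nonzero for the single cycle when a key event arrives, the pixel moves one step per keypress.)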
      

But wait! There's more!

In reality, after getting the PS/2 decoder working, but before hooking it up to the scan code parser, I thought I'd use the serial IO on the Papilio Pro for a quick test, by transmitting the scan codes straight away as they come out of the decoder. This then prompted me to clean up a wart in my UART implementation: it took the clock rate as an extra term-level argument to compute the clock division it needs to do:

tx
    :: (HiddenClockReset domain gated synchronous)
    => Word32
    -> Word32
    -> Signal domain (Maybe Word8)
    -> TXOut domain
tx clkRate serialRate inp = TXOut{..}
  where
    (txReady, txOut) = unbundle $ mealyState (tx0 $ clkRate `div` serialRate) (0, Nothing) inp        
      

This bothered me because the clock domain already specifies the clock rate, at the type level. Trying to remove this redundancy has led me down a rabbit hole of what I believe is a CλaSH bug; but at least I managed to work around that for now (until I find an even better way).

This, in turn, forced me to use a proper clock domain with the correct clock period setting in my CHIP-8 design:

-- | 25.175 MHz clock, needed for the VGA mode we use.
-- CLaSH requires the clock period to be specified in picoseconds.
type Dom25 = Dom "CLK_25MHZ" (1000000000000 `Div` 25175000)        
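
(For the record, 1000000000000 `Div` 25175000 works out to 39721, so the declared clock period is 39721 ps, just under the exact ~39721.95 ps period of a true 25.175 MHz clock, Div being integer division.)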
      

This then allowed me to start putting the pixel clock specification into the type of VGATimings, so that I can statically enforce that the clock domain in which the VGA signal generator runs is at exactly the right frequency:

vgaDriver
    :: (HiddenClockReset dom gated synchronous, KnownNat w, KnownNat h)
    => (dom ~ Dom s ps, ps ~ (1000000000000 `Div` rate))
    => VGATimings rate w h
    -> VGADriver dom w h
    
-- | VGA 640*480@60Hz, 25.175 MHz pixel clock
vga640x480at60 :: VGATimings 25175000 10 10
vga640x480at60 = VGATimings
    { vgaHorizTiming = VGATiming 640 16 96 48
    , vgaVertTiming  = VGATiming 480 11  2 31
    }
      

CλaSH video: VGA signal generator

2 September 2018 (programming haskell fpga electronics retrochallenge)

I decided to start with the video signal generation so that I have something fun to look at, and also so that I can connect other components to it at later steps.

VGA signal primer

VGA is a very simple video signal format with separate digital lines for vertical and horizontal synchronization, and three analog lines for the red, green and blue color channels. It is much simpler than TV signals like PAL or NTSC, which were designed to be backwards-compatible with early black-and-white TV formats and which only support a single row count; it has some quite low-bandwidth standard modes (with pixel clocks at just tens of MHz); and the whole world is filled with displays that support VGA. Put all that together, and it is ideal for hobbyist computer projects.

The basic mode of operation is that you do a bit of a song and dance on the vertical and horizontal sync lines, and then just keep a counter of where you are, so you know what color to put on the red/green/blue lines. The clock speed to use for this counter is intimately tied to the sync pattern, and this is where VGA timing databases come into play.

I'm going to skip the tale of the electron beam scanning the CRT in þe olde times because every page describing VGA has it; for our purposes here it is enough to just regard it as an abstract serial protocol.

CHIP-8 video

The CHIP-8 has 1-bit graphics with 64⨯32 resolution. This is not a typo: there is no unit or scaling factor missing; it really is only 64 (square) pixels across and 32 pixels down. That is not a lot of pixels; to give you an idea, here is a full-screen dump rendered with no scaling:

To make it full-screen, we need to scale it up. The easiest way I could think of was to scale it by a power of 2; that way, we can convert screen-space X/Y coordinates to computer-space simply by dropping the lowest bits. From 64⨯32, we could scale by 8 to get 512⨯256, or by 16 to get 1024⨯512. Of course, given the awkward 2:1 aspect ratio of the CHIP-8, we can't hope to avoid borders altogether. Looking at some lists of VGA modes, two look promising: 1024⨯512 would fit on 1024⨯768 with no side borders, and 512⨯256 is a close contender with a small border in 640⨯480, requiring only a ~25 MHz pixel clock.
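
For example, with the 512⨯256 scaling, the conversion from screen space to CHIP-8 space can be an ordinary pure function. Here is a sketch of what chipX could look like, assuming the active area is centered (64-pixel borders on the left and right) and that the CHIP-8 X coordinate is 6 bits wide; chipY is analogous, with 112-pixel borders:

import Control.Monad (guard)

chipX :: Unsigned 10 -> Maybe (Unsigned 6)
chipX x = do
    guard $ 64 <= x && x < 64 + 512           -- inside the active area?
    return $ truncateB $ (x - 64) `shiftR` 3  -- drop the 3 lowest bits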

A low pixel clock frequency is useful for this project because I'm still learning the ropes with CλaSH, so getting just something to work is an achievement on its own; getting something to work efficiently, at a high clock rate, or using two separate clock domains for the video generator and the CPU, would all be more advanced topics for a later version. So for this project, I'm going with 640⨯480: divided by 8, this gives us a screen layout that looks like this:

Signal generation from CλaSH

So out of all the 640⨯480 modes, we're going to use 640⨯480@60Hz, since the CHIP-8 also has a 60 Hz timer, which we'll get for free this way. Let's write down the timing parameters; we'll need 10-bit counters for both X and Y, since the porches push the total vertical count to 524 (and the horizontal one to 800).

data VGATiming n = VGATiming
    { visibleSize, pre, syncPulse, post :: Unsigned n
    }

data VGATimings w h = VGATimings
    { vgaHorizTiming :: VGATiming w
    , vgaVertTiming :: VGATiming h
    }

-- | VGA 640*480@60Hz, 25.175 MHz pixel clock
vga640x480at60 :: VGATimings 10 10
vga640x480at60 = VGATimings
    { vgaHorizTiming = VGATiming 640 16 96 48
    , vgaVertTiming  = VGATiming 480 11  2 31
    }
      

The output of the signal generator is the vertical and horizontal sync lines and the X/Y coordinates of the pixel being drawn; the idea being that there would be some other circuit responsible for ensuring the correct color data is put out for that coordinate. I've also bundled in two extra lines to signal the start of a line/frame: the frame one will be used for the 60 Hz timer, and the line one can be useful for implementing other machines later on; for example, on the Commodore 64, the video chip can be configured to interrupt the processor at some given raster line.

data VGADriver dom w h = VGADriver
    { vgaVSync :: Signal dom Bit
    , vgaHSync :: Signal dom Bit
    , vgaStartFrame :: Signal dom Bool
    , vgaStartLine :: Signal dom Bool
    , vgaX :: Signal dom (Maybe (Unsigned w))
    , vgaY :: Signal dom (Maybe (Unsigned h))
    }
      

For the actual driver, a horizontal counter simply counts up to the total horizontal size, and a vertical counter is incremented every time the horizontal one is reset. Everything else can be easily derived from these two counters with just pure functions:

vgaDriver 
    :: (HiddenClockReset dom gated synchronous, KnownNat w, KnownNat h)
    => VGATimings w h
    -> VGADriver dom w h
vgaDriver VGATimings{..} = VGADriver{..}
  where
    vgaVSync = activeLow $ pure vSyncStart .<=. vCount .&&. vCount .<. pure vSyncEnd
    vgaHSync = activeLow $ pure hSyncStart .<=. hCount .&&. hCount .<. pure hSyncEnd
    vgaStartLine = hCount .==. pure hSyncStart
    vgaStartFrame = vgaStartLine .&&. vCount .==. pure vSyncStart
    vgaX = enable <$> (hCount .<. pure hSize) <*> hCount
    vgaY = enable <$> (vCount .<. pure vSize) <*> vCount

    endLine = hCount .==. pure hMax
    endFrame = vCount .==. pure vMax
    hCount = register 0 $ mux endLine 0 (hCount + 1)
    vCount = regEn 0 endLine $ mux endFrame 0 (vCount + 1)

    VGATiming hSize hPre hSync hPost = vgaHorizTiming
    hSyncStart = hSize + hPre
    hSyncEnd = hSyncStart + hSync
    hMax = sum [hSize, hPre, hSync, hPost] - 1

    VGATiming vSize vPre vSync vPost = vgaVertTiming
    vSyncStart = vSize + vPre
    vSyncEnd = vSyncStart + vSync
    vMax = sum [vSize, vPre, vSync, vPost] - 1        
      

I hooked it up to a small test circuit, connected it to a small CCTV display I had lying around before I unpacked the bigger old VGA screen, and... nothing. This was frustrating because in the CλaSH simulator the sync timings looked right. Then Joe had this simple but brilliant idea to just blink an LED at 1 Hz using the pixel clock, and see if that is correct -- this immediately uncovered that I was using the wrong clock manager settings, and instead of a pixel clock of 25.175 MHz, it was running at 40 MHz. No wonder the signal made no sense to the poor screen... With that out of the way, I finally saw the test pattern:

And so with a bit of bit truncation, I now have a checkerboard pattern displayed at the CHIP-8 resolution; and I've even ended up bringing out the larger screen:

The full code in progress is on GitHub; in particular, the version that generates the checkerboard pattern is in this code.
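
By the way, Joe's LED trick is easy to reproduce; here is a sketch (my reconstruction, not the actual test code) of a signal that toggles once per second when clocked at the nominal 25.175 MHz, ready to be wired to an LED pin:

blinker
    :: (HiddenClockReset dom gated synchronous)
    => Signal dom Bool
blinker = led
  where
    -- one second's worth of 25.175 MHz ticks
    tick = counter .==. pure (25175000 - 1)
    counter = register (0 :: Unsigned 25) $ mux tick 0 (counter + 1)
    led = regEn False tick (not <$> led)

If the LED visibly blinks faster or slower than once a second, the clock manager settings are off.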

RetroChallenge 2018: CHIP-8 in CλaSH

29 August 2018 (programming haskell fpga electronics retrochallenge)

The 2018 September run of the RetroChallenge starts in a couple of days, and I've been playing around with CλaSH for the last couple of weeks, so I thought I'd aim for the stars and try to make a CHIP-8 computer.

What is CλaSH?

CλaSH is, basically, an alternative GHC backend that emits a hardware description from Haskell code. In the more established Lava family of HDLs, like Kansas Lava, which I've used earlier, you write Haskell code that uses the Lava eDSL to describe your hardware; then you run said Haskell code, and it emits VHDL or Verilog. With CλaSH, you write Haskell code which is itself compiled to VHDL/Verilog. This allows you to e.g. use algebraic data types with pattern matching, and it will compile to just the right bit width.
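
As a toy illustration of that last point (my own example, not from the CλaSH docs): a four-constructor enumeration compiles to a 2-bit tag, and pattern matching on it becomes a multiplexer:

data Direction = DUp | DDown | DLeft | DRight

move :: (Unsigned 6, Unsigned 5) -> Direction -> (Unsigned 6, Unsigned 5)
move (x, y) DUp    = (x, y - 1)
move (x, y) DDown  = (x, y + 1)
move (x, y) DLeft  = (x - 1, y)
move (x, y) DRight = (x + 1, y)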

Here is an example of CλaSH code by yours truly: the receiver half of a UART.

What is CHIP-8?

There is no shortage of information about the CHIP-8 on the Internet; the Wikipedia page is a good starting point. But basically, it is a VM spec from the mid-'70s, originally meant to be run on home computers built around the 8-bit COSMAC CPU. There are tens, TENS! of games written for it.

CHIP-8 has 35 opcodes, a one-note beeper, 64x32 1-bit framebuffer graphics that is accessible only via an XOR bit blitting operation (somewhat misleadingly called "sprites"), and 16-key input in a 4x4 keypad layout.

I have some experience with CHIP-8 already: it was the first computer I implemented on an FPGA (in Kansas Lava back then); I used it as the test application when I was working on Rust for AVR; I even have a super-barebones 2048 game for the platform.

In summary, CHIP-8 is so simple that it is a very good starting point when getting accustomed to any new way of making computers, be it emulators, electronics boards, or FPGAs.

My plan

I want to build an FPGA implementation of CHIP-8 on the Papilio Pro board I have lying around. It is based on the Xilinx Spartan-6 FPGA, a chip so outdated that the last version of Xilinx's development tools to support it is ISE 14.7 from 2013; but it is what I have, so it will have to do. I plan to work outside-in by first doing the IO bits: a VGA signal generator for the video output, and a PS/2 keyboard controller for the input. Afterwards, I will move on to making a CPU that uses CHIP-8 as its machine language.

If I can finish all that with time remaining, there are some "stretch goals" as well: implementing sound output (the Papilio IO board has a 1-bit PWM DAC that I could use for that); interfacing with a real 4x4 keypad instead of a full PS/2 keyboard; and implementing some way of switching ROM images without rebuilding the FPGA bitfile.

Rust on AVR: Beyond Blinking

12 May 2017 (programming rust llvm electronics avr)

In February this year, Dylan McKay wrote in his blog:

In the coming months the Rust compiler should support AVR support out-of-the-box!

I've been watching this from afar, planning to try it out once AVR support was mainlined into Rust; I thought it would be much easier to wait until the dust settled than to try to track the development versions of LLVM and Rust at the same time. However, it's now May; LLVM 4.0 has come out with the new AVR backend, and the Rust compiler has been updated to use it. My interest in Rust has also started to grow: in late March, one of my beachside readings on Gili Meno was O'Reilly's Programming Rust, and a Rust meetup group has started in Singapore. And so I started to formulate a plan.

Two years ago, Viki and I designed and prototyped a very minimal AVR-based handheld console that can run CHIP-8 programs. It's made up of an Adafruit Trinket (an AVR ATMega328P running at 12 MHz on 3.3V, with a bit of circuitry to make it programmable via serial-over-USB), a serial SRAM chip, an LCD from an old Nokia phone, and a 4x4 keypad.

CHIP-328 (schematics)

The 2015 software was, of course, written in C++. So my idea was to rewrite it in Rust, mainly aiming to use Rust's algebraic data type support with pattern matching. Given the way one usually programs microcontrollers, I think pattern matching gives me the most bang for my buck for now, until we start designing and implementing funky Rust libraries that use the borrow checker to enforce all kinds of interesting non-memory-allocation invariants.

And so, while twiddling my thumbs waiting for AVR support to show up in the next Rust release, I got busy reimplementing the CHIP-8 engine in Rust, in two modules: chip8-engine is a no_std library implementing the CHIP-8 machine with all IO done over a trait, and chip8-sdl is the executable that links to the library and implements the frontend using SDL. This is the exact same architecture we used back in 2015 to develop the C++ version; it allowed us to debug the CHIP-8 implementation on an x86 computer, only having to worry about running on the device for the IO-specific parts.

CHIP-8 SDL frontend

Two weeks ago, I decided to take the plunge and build LLVM and the Rust compiler from the dev branch where AVR support is being worked on.

At this point, we have to note a big difference between the development process in C++ vs. Rust. With the C++ version, once the engine was running OK on the computer, we simply recompiled it with an AVR-targeting C++ compiler and linked it with the code that communicates with the serial RAM chip, the LCD, and the keypad. On the other hand, at least for now, AVR support in LLVM and Rust is in its infancy, so even if the engine works on x86, there is absolutely no guarantee that it will behave remotely the same on AVR.

CHIP-328

And so the next step was to use simavr to create a simulator for our board. The simulator implements just enough of the PCD8544 protocol to be able to display the pixels of the LCD in an SDL window; implements just enough of the serial SRAM protocol to read/write single bytes; and implements the keypad in the obvious way. I debugged the simulator by running the C++ version of the firmware; then when I decided I trusted the simulator enough, it was time to go back to Rust and see what it takes to compile it to AVR.

First, I tried just compiling the Hello World of microcontrollers: blinking an LED.

Right out of the gate, this fails. It fails even before you get to try compiling your program. It fails because the Core library of Rust itself cannot be compiled on AVR, due to a compiler bug. So the zeroth thing you have to do, before you even do the first thing, is to trim down Rust's libcore until it no longer contains the parts that trigger the aforementioned bug, but still contains enough to be useful. For example, you need to include core::iter if you want to do any for loops. I've put my version of libcore-mini on GitHub; if you need anything that is not included from the real libcore, just try adding it and hope for the best.

BTW, if you use a custom libcore, there's a bunch of magic Rust incantations you need in your code:

#![feature(no_core)]
#![no_core]

extern crate libcore_mini as core;

// These are imported to get for-loops working
#[allow(unused_imports)]
use core::option;
#[allow(unused_imports)]
use core::iter;
#[allow(unused_imports)]
use core::ops;
    

Also, the module providing main() has to jump through a bunch of extra hoops:

pub mod std {
    #[lang = "eh_personality"]
    #[no_mangle]
    pub unsafe extern "C" fn rust_eh_personality(state: (), exception_object: *mut (), context: *mut ()) -> () {
    }

    #[lang = "panic_fmt"]
    #[unwind]
    pub extern fn rust_begin_panic(msg: (), file: &'static str, line: u32) -> ! {
        loop{}
    }
}

#[no_mangle]
pub extern fn main() {
    // Finally! Put your real main() here!
}
    

Note that rust_begin_panic just loops, since there's nothing better I could come up with for a microcontroller.

Once you have your own stripped-down libcore, you can start writing programs. Here's my first one, a blinker that uses a timer interrupt instead of busy-waiting:

use core::intrinsics::{volatile_load, volatile_store};

mod avr {
    pub const DDRB:   *mut u8  = 0x24 as *mut u8;
    pub const PORTB:  *mut u8  = 0x25 as *mut u8;
    pub const TCCR1B: *mut u8  = 0x81 as *mut u8;
    pub const TIMSK1: *mut u8  = 0x6f as *mut u8;
    pub const OCR1A:  *mut u16 = 0x88 as *mut u16;
}

use avr::*;

const MASK: u8 = 0b_0010_0000;

#[no_mangle]
pub extern fn main() {
    unsafe {
        volatile_store(DDRB, volatile_load(DDRB) | MASK);

        // Configure timer 1 for CTC mode, with divider of 64
        volatile_store(TCCR1B, volatile_load(TCCR1B) | 0b_0000_1101);

        // Timer frequency
        volatile_store(OCR1A, 62500);

        // Enable CTC interrupt
        volatile_store(TIMSK1, volatile_load(TIMSK1) | 0b_0000_0010);

        // Good to go!
        asm!("SEI");

        loop {}
    }
}

// Timer 1 "compare match A" handler: TIMER1_COMPA_vect is __vector_11 on the ATmega328P
#[no_mangle]
pub unsafe extern "avr-interrupt" fn __vector_11() {
    volatile_store(PORTB, volatile_load(PORTB) ^ MASK);
}
    

Again, at first this failed due to another compiler bug, which I reported here and which got fixed in a couple of hours. Then, for the first weekend, I got into this pattern: compiling something; running it in the simulator; noticing that the MCU got jammed by invalid machine code, or that LLVM failed to turn IR into AVR assembly, and so on; reporting the issue at hand; then testing and confirming that the fix works.

As I dug myself deeper into both LLVM and the Rust compiler, after a while I started not just reporting bugs and reducing test cases, but fixing them as well. In fact, if you want to try Rust on AVR today, I very much recommend using LLVM from my fork since it has all my changes that haven't been upstreamed yet, but are all needed to compile my code.

And so now, two weeks later, I can reasonably claim that I know LLVM (something I had always wanted to learn but never got around to until now); I have a complete Rust implementation of CHIP-8 that runs on my real hardware board; and there are interesting problems to work on in Rustc and LLVM moving forward. Also, the Rust API to the actual AVR IO functionality that I've implemented is very rudimentary; I know of at least one project already that tries to design a nicer API, and we could probably also reuse some of the ideas from Marten's C++ AVR API to do bundled IO port updates.
