Blog tags RSS

Entries tagged programming

Rust on AVR: Beyond Blinking

12 May 2017 (programming rust llvm electronics avr)

In February this year, Dylan McKay wrote in his blog:

In the coming months the Rust compiler should support AVR support out-of-the-box!

I've been watching this from afar, planning to try it out when AVR support is mainlined into Rust; I thought it would be much easier to wait until the dust settles than trying to track the development versions of LLVM and Rust at the same time. However, it's now May, LLVM 4.0 has come out with the new AVR backend and the Rust compiler has been updated to use it; and my interest in Rust has also started to grow: in late March one of my beachside readings on Gili Meno was O'Reilly's Programming Rust, and a Rust meetup group started in Singapore. And so I started to formulate a plan.

Two years ago, Viki and I designed and prototyped a very minimal AVR-based handheld console that can run CHIP-8 programs. It's made up of an Adafruit Trinket (an AVR ATMega328P running on 12MHz@3.3V, with a bit of circuitry to be programmable via serial-over-USB), a serial SRAM chip, an LCD from an old Nokia phone, and a 4x4 keypad.

CHIP-328 (schematics)

The 2015 software was, of course, written in C++. So my idea was to rewrite that in Rust, mainly aiming to use Rust's algebraic data type support with pattern matching. Given the way one usually programs microcontrollers, I think pattern matching gives me the most bang for my buck for now, until we start designing and implementing funky Rust libraries that use the borrows checker to enforce all kinds of interesting non-memory-allocation invariants.

And so, while fiddling my thumbs waiting for AVR support to show up in the next Rust release, I got busy reimplementing the CHIP-8 engine in Rust, in two modules: chip8-engine is a no_std library implementing the CHIP-8 machine with all IO done over a trait, and chip8-sdl is the executable that links to the library and implements the frontend using SDL. This is the exact same architecture we used back in 2015 to develop the C++ version; this allowed us to debug the CHIP-8 implementation on an x86 computer, only having to worry about running on the device for the IO-specific parts.

CHIP-8 SDL frontend

Two weeks ago, I decided to take the plunge and build LLVM and the Rust compiler from the dev branch where AVR support is being worked on.

At this point, we have to note a big difference between the development process on C++ vs. Rust. With the C++ version, once the engine was running OK on the computer, we simply recompiled it using an AVR-targeting C++ compiler and linked it with code that communicates with the serial RAM chip, the LCD, and the keypad. On the other hand, at least for now, AVR support in LLVM and Rust is in its infancy, so even if the engine works on x86, there is absolutely no guarantee that it will do remotely the same on AVR as well.


And so the next step was to use simavr to create a simulator for our board. The simulator implements just enough of the PCD8544 protocol to be able to display the pixels of the LCD in an SDL window; implements just enough of the serial SRAM protocol to read/write single bytes; and implements the keypad in the obvious way. I debugged the simulator by running the C++ version of the firmware; then when I decided I trusted the simulator enough, it was time to go back to Rust and see what it takes to compile it to AVR.

First, I tried just compiling the Hello World of microcontrollers: blinking an LED.

Right out the gate, this fails. It fails even before you get to try compiling your program. It fails because the Core library of Rust itself cannot be compiled on AVR due to a compiler bug. So the zeroth thing you have to do, before you even do the first thing, is to start trimming down Rust's libcore until it doesn't contain too much stuff that compiling it triggers the aforementined bug, but still contains enough to be useful. For example, you need to include core::iter if you want to do any for loops. I've put my version of libcore-mini on GitHub; if you need anything that is not included from the real libcore, just try adding it and hope for the best.

BTW, if you use a custom libcore, there's a bunch of magic Rust incantations you need in your code:


extern crate libcore_mini as core;

// These are imported to get for-loops working
use core::option;
use core::iter;
use core::ops;

Also the module providing main() has to jump through a bunch of extra hoops:

pub mod std {
    #[lang = "eh_personality"]
    pub unsafe extern "C" fn rust_eh_personality(state: (), exception_object: *mut (), context: *mut ()) -> () {

    #[lang = "panic_fmt"]
    pub extern fn rust_begin_panic(msg: (), file: &'static str, line: u32) -> ! {

pub extern fn main() {
    // Finally! Put your real main() here!

Note that rust_begin_panic just loops, since there's nothing better I could come up with for a microcontroller.

Once you have your own stripped-down libcore, you can start writing programs. Here's my first one, a blinker that uses a timer interrupt instead of busy-waiting:

use core::intrinsics::{volatile_load, volatile_store};

mod avr {
    pub const DDRB:   *mut u8  = 0x24 as *mut u8;
    pub const PORTB:  *mut u8  = 0x25 as *mut u8;
    pub const TCCR1B: *mut u8  = 0x81 as *mut u8;
    pub const TIMSK1: *mut u8  = 0x6f as *mut u8;
    pub const OCR1A:  *mut u16 = 0x88 as *mut u16;

use avr::*;

const MASK: u8 = 0b_0010_0000;

pub extern fn main() {
    unsafe {
        volatile_store(DDRB, volatile_load(DDRB) | MASK);

        // Configure timer 1 for CTC mode, with divider of 64
        volatile_store(TCCR1B, volatile_load(TCCR1B) | 0b_0000_1101);

        // Timer frequency
        volatile_store(OCR1A, 62500);

        // Enable CTC interrupt
        volatile_store(TIMSK1, volatile_load(TIMSK1) | 0b_0000_0010);

        // Good to go!

        loop {}

pub unsafe extern "avr-interrupt" fn __vector_11() {
    volatile_store(PORTB, volatile_load(PORTB) ^ MASK);

Again, at first, this failed due to another compiler bug that I've reported here and got fixed in a couple hours; and then for the first weekend, I got into this pattern of compiling something, running it in the simulator, noticing that the MCU gets jammed due to invalid machine code, or LLVM fails to turn IR into AVR assembly, and so on; reporting the issue at hand; then I'd test and confirm that the fix works.

As I dug myself deeper into both LLVM and the Rust compiler, after a while I started not just reporting bugs and reducing test cases, but fixing them as well. In fact, if you want to try Rust on AVR today, I very much recommend using LLVM from my fork since it has all my changes that haven't been upstreamed yet, but are all needed to compile my code.

And so now, two weeks later, I can reasonably claim that I know LLVM (something that I always wanted to learn but haven't gotten around to until now), I have a complete Rust implementation of CHIP-8 that runs on my real hardware board, and there are interesting problems to work on in Rustc and LLVM moving forward. Also, the Rust API to the actual AVR IO functionality that I've implemented is very rudimentary; I know of at least one project already that tries to design a nicer API, and we could probably also reuse some of the ideas from Marten's C++ AVR API to do bundled IO port updates.

Stack reification via trampolines

21 February 2016 (programming language java)

I've been spending a couple weeks now on getting Időrégész to Android in the best possible way. Időrégész is the Hungarian text adventure game from the '80s that I reverse-engineered and then re-implemented in Inform. My original plan was to just implement a Glulx interpreter; however, initial experiments with a Clojure implementation weren't too promising on the performance front.

I decided to turn to compilation instead of interpretation, then: take the Inform-emitted Glulx image, and compile that directly to the JVM. Of course, that approach would have its own problems with self-modifying code, but my goal is just to get it working well enough that I can compile Időrégész, which is very vanilla as far as Inform programs go.

Most instructions of Glulx map to JVM instructions in a relatively straightforward manner; some unsigned integer operations are implemented by hand in Java and then called via invokestatic. The rather baroque string compression is currently handled at compile time, by just emitting a single Java function that looks like this:

public String resolveString(int ptr)
  switch (ptr) {
    case 32851 : return "Class" ;
    case 32857 : return "Object" ;
    case 32863 : return "Routine" ;
    // and so on...

However, there is one aspect of Glulx that makes the compilation trickier than a straightforward mapping of procedures to procedures and calls to calls: some Glulx opcodes provide fairly detailed access to the runtime state. For example, save must be able to serialize enough of the state that it can later load that back and continue with the next instruction.

In an interpreter, this is a trival matter, since things like the VM's memory and the stack are already reified data structures. However, if it's compiled to JVM, the state of the Glulx program and the state of the JVM is one and the same; so how do you then save it — from the inside?

The solution is to not use the JVM's stack as the Glulx stack; rather, function calls are compiled to areturn instructions that return a small Java object describing which function to call and where to jump back once its result is available. Returns are also compiled to areturn instructions, but this time returning an object describing the result value. Function-local variables and the per-function stack are passed to the generated JVM code as function arguments:

Each Glulx function is compiled into a class that extends the Glulx.Fun class, defined in Kotlin as:

package Glulx

abstract class Fun {
  abstract class Cont {
    class Return(val value : Int) : Cont()
    class Call(val args: IntArray, val nextFun : Fun, val contPC : Short) : Cont()

  class Result(val value : Int, val contPC : Short)

  abstract fun enter(stub: Stack.CallStub?, args: IntArray): Stack.CallFrame
  abstract fun exec(result: Result?, localVars: IntArray): Cont

Since the JVM doesn't support computed jumps, the continuation address contPC is handled by starting the exec of each Fun with a big tableswitch. Here's an example of a recursively defined factorial function (using the Krakatau JVM assembler's syntax):

.method public exec : (LGlulx/Fun$Result;[I)LGlulx/Fun$Cont;
  .code stack 10 locals 10
    ifnull LSTART

    invokevirtual Method Glulx/Fun$Result getContPC ()S
    tableswitch 0
      default: LSTART

    ;; if V0=0, jump to base case
    ldc 0

    ifeq L0

    ;; START call FACT(V0-1)
    ldc 1
    newarray int
    ldc 0
    ldc 0
    ldc 1

    new Glulx/Fun$Cont$Call
    getstatic Field Glulx/Image/FACT fun LGlulx/Fun;
    ldc 0
    invokespecial Method Glulx/Fun$Cont$Call <init> ([ILGlulx/Fun;S)V

    invokevirtual Method Glulx/Fun$Result getValue ()I
    ;; END call FACT(V0-1)
    ;; Note the code generated for the call spans an areturn!

    ;; Multiply result by V0
    ldc 0

    ;; Return result -- this is the "real" return
    new Glulx/Fun$Cont$Return    
    invokespecial Method Glulx/Fun$Cont$Return <init> (I)V

    ;; For the base case, we just return 1
    new Glulx/Fun$Cont$Return
    ldc 1
    invokespecial Method Glulx/Fun$Cont$Return <init> (I)V
  .end code
.end method

Running these functions then becomes a matter of mere stack juggling, implemented again in Kotlin:

package Glulx

class Stack {
  class CallFrame(val parent: CallStub?, val localVars: IntArray) {
    constructor(parent: CallStub?, localVarCount: Int):
      this(parent, IntArray(localVarCount))
    fun storeArgs(args: IntArray) {
      for(i in
        localVars[i] = args[i]
  class CallStub(val parent: CallFrame, val parentFun: Fun, val parentPC : Short)

fun run(startFun: Fun) {
  var frame = startFun.enter(null, IntArray(0))
  var fn = startFun
  var result : Fun.Result? = null

  while (true) {
    val cont = fn.exec(result, frame.localVars)
    when (cont) {
      is Fun.Cont.Return -> {
         val stub = frame.parent
         if (stub == null) return

         frame = stub.parent
         fn = stub.parentFun
         result = Fun.Result(cont.value, stub.parentPC)
      is Fun.Cont.Call -> {
        val stub = Stack.CallStub(frame, fn, cont.contPC)
        fn = cont.nextFun
        frame = fn.enter(stub, cont.args)
        result = null

In the real implementation, there's slightly more state to pass around: Fun.exec also gets as argument an instance of a generic Environment class which it can use to e.g. access the main Glulx memory, or to issue Glk calls; and there are some boring details about handling both 32-, 16- and 8-bit local variables.

Note that the exact same technique can also be used to implement tail recursion (even mutual tail recursion) on a platform that doesn't support it, like the JVM. I am not using it here, but Glulx actually has a tailcall instruction (not used by Időrégész, or sane Inform code in general), so it might come handy if I want to increase feature coverage.

Hacks of 2015

28 December 2015 (programming haskell idris games electronics avr FPGA meta)

Encsé writes in his blog that one of the reasons he created a tiny CPS-based Scheme interpreter was because he realized he hasn't done any fun side projects all year. So I thought I'd make an inventory of fun hacks I've done in 2015.




Functional programming

Stack Overflow highlights

Some of the answers I wrote this year on Stack Overflow required me to learn just enough of something new to be able to answer the question:

Then, there were those where it turned out there was a bug to be found by scratching the surface of the question deep enough:

Then there were the answers that were just too much fun to write:

All in all, it seems this has been quite a productive year for me out of the office, even if you exclude the Stack Overflow part. I was actually surprised how long this list was while compiling it for this blog post. Maybe I should write a list like this every year from now...

Időrégész 2015

5 June 2015 (programming retro games)

Amikor már megvolt a C64-ünk, megkaptam egyszerre az LSI-féle 1001 játék sorozat első három részét anyukáméktól (ez úgy '89 körül lehetett), azt hiszem a negyedik-ötödik rész később jelent meg. Annyira nem voltam a scene-ben, hogy azt sem tudtam, hogy van olyan, hogy scene. Az én scene-em az a pár általános iskolai osztálytárs volt, akinek legalább egy ZX Spectrumja volt. Szóval a lényeg, hogy ez a pár könyv volt minden kapcsolatom a C64 játékokkal. Azon a néhány játékon kívül, amik így apukám ismerősein keresztül jutottak hozzánk, a többiről csak olvasni tudtam. Persze a leírás alapján mindegyik a legjobb játéknak tűnt EVER!, és volt néhány, amit el is kezdtem BASIC-ben leprogramozni az alapján, ahogy elképzeltem; ez nyilván több oldalról is ab ovo teljes sikertelenségre volt ítélve.

A harmadik 1001 játékban volt egy végigjátszás az Időrégészről, amibe persze azonnal beleszerettem. A szöveges kalandjáték volt az egyetlen játék-műfaj, aminek a programozásában akkortájt labdába tudtam rúgni annyira, hogy valami játszható jöjjön ki belőle, és emiatt pláne úgy éreztem, hogy ez a játék megszólít engem. Ráadásul nem is tudom, hogy volt-e akkor már CoV, de az biztos, hogy én nem ismertem (a Magic Candle-ös szám volt az első, amikor belefutottam, ha minden igaz, '92-ben), úgyhogy a leírás stílusa, humora is egy teljesen új világot nyitott meg előttem.

Mivel a fent ecsetelt okok miatt warez-ként esélyem se volt hozzájutni, ez lett a második játék, amit eredetiben megvettünk (az első az Impossible Mission 2 volt (kazettán), de azzal én nem sokat játszottam, azt apukám tolta — külön post lehetne, amikor az éjszaka közepén, a TV hangerejét éppen csak hallhatóra állítva ordított a figura leesés közben).

A játék csalódást nem okozott, de végigjátszani az istennek se ment, még az 1001-leírás alapján sem. Az áttörést csak 1991-ben hozta egy osztálytársam, aki rájött, hogy SPOILER ALERT a papnál még imádkozni is kell, nem elég odaadni neki az aranykeresztet vagy mit.

Most illene arról is írni, hogy akkor most ez egy jó játék volt-e. Azt gondolom, hogy most tudom a legkevésbé ezt megítélni, mert túl közelről látom (mint majd mindjárt kiderül hogy miért). Az biztos, hogy akinek annak idején volt C64-e (vagy akár +4-e!), és magyar, az ismeri az Időrégészt meg a többi Rátkai-játékot: nem véletlenül volt róla IDDQD cikk meg Index illetve Facebook csoportok. Igazi cult classic.

Egy modern Időrégész-reprodukció terve

Mostanában elkezdtek érdekelni a régi 8-bites gépek, mivel annak idején az alacsonyszintű programozásuk, meg úgy egyáltalán a felépítésük totál kimaradt nekem. Most, hogy autodidakta módon FPGA-fejlesztést tanulok, elértem oda, hogy épkézláb számítógépeket tudok építeni, amihez persze elkerülhetetlen, hogy megismerjem minden kis részletüket. Ennek egyébként egy nem várt, de kellemes mellékhatása, hogy tudom értékelni a korabeli és azóta készült demókat, amikor minden képkockát önmagában is lehetetlennek tűnik előállítani a C64-en.

Szóval kitaláltam, hogy kipróbálnám egy játék teljes visszafejtését, mivel ilyesmit sose csináltam még. Egy ilyen nem-realtime, nem-grafikus játék, mint az Időrégész, remek kezdő projektnek tűnt. Az alap-elképzelés (még május elején) az volt, hogy visszafejtem a programot annyira, hogy tudjam minden részéről hogy mit csinál, és ezalapján megértem, és minél magasabb szinten reprodukálom a működését. Hogy valami igazán örök formában rögzítsem, mindenképpen a ZMachine virtuális gépet terveztem célbavenni, mivel a modern interactive fiction-világban ez tűnik egyértelműen a bevett eszköznek már évtizedek óta.

Continue reading »

Initial version of my Commodore PET

2 March 2015 (programming haskell FPGA electronics retro)

In my quest to build more and more complicated computers on FPGAs armed with nothing but a crappy hobbist mindset and some hazy ideas of how Kansas Lava is supposed to work, I've reached another milestone: my first real computer.

That is, unlike the Brainfuck CPU that I designed myself, or the CHIP-8, which was originally a virtual machine spec (with all the implementation leeway that implies), this latest subject is a bona fide 8-bit home computer from the seventies: the Commodore PET.

A Commodore PET 2001 from 1977

The Commodore PET in a nutshell

The PET is a very simple machine compared to later Commodore models, which is why I thought it would make a good first step on a journey that I hope will one day culminate in implementing a Commodore 64. Its centerpiece is the MOS 6502 CPU (practically the same as the MOS 6510 used in the C=64), and there are only four other components: a text-only video signal generator and three IO interface chips (two PIA's and one VIA) for keyboard, Datasette and extension port communication. Just hooking up one of the PIAs is enough to get a minimal system working with keyboard input.

12 KBytes of PET ROM contain implementation of basic IO routines (the so-called "kernal"), the full-screen text editor, and Microsoft's BASIC interpreter. Then there's a separate character ROM (not addressable from the CPU) used by the video generator.

MOS 6502

The 6502 microprocessor was a staple of the eight-bit home computer era of the late seventies and eighties. By today's standards, it is incredible to imagine what it must have been like to design it manually, drawing the layout with pencils on paper. On the other hand, if it was designed in such a low-tech way, I figured it shouldn't be too difficult to build something compatible using modern tools even by a hobbist like myself. And of course there are already dozens of home-built 6502 implementations out there, to varying degrees of compatibility.

The ultimate reference on the 6502 must be the Visual 6502 Project which I deliberately avoided consulting. I don't really see the educational value in copying the original 6502 design; so instead, I went with a more black-box approach by just looking at the opcode descriptions and interrupt model and working from that.

The first milestone I aimed for was to get enough of the CPU working that I can run the sample programs on, which defines a tiny microcomputer-like architecture that doesn't have interrupts or any fancy video modes: you just have 32×32 pixels with a fixed 16-color palette mapped to main RAM from $0200, and a zero page-mapped register for keyboard input that you can do polling on. The Kansas Lava implementation is really simple and I plan to reuse it later if I do a similar project with the Z80.

My workflow was that I would use ca65 to assemble test programs, burn them into ROM, and run it in the Kansas Lava simulator for a couple thousand cycles; then render the video RAM into a GTK+ window. I would start with this program that does nothing but moves data around in memory (drawing the Commodore logo pixel by pixel), and basically I implemented the 6502 opcodes as I went along. After two days of work, I finally got it working:

First positive result

Seeing this was an incredible feeling. The input was valid 6502 machine code, and my very own CPU managed to run it correctly for the approximately 40,000 cycles that it took to draw this image. There was no stopping at this point: I already had a working VGA frame buffer implementation from the CHIP-8, so next day I synthesized it and run it on real hardware, my venerable Papilio Pro:

Hi-tech from the seventies, today!

Simulation-based testing

As I added more and more opcodes and started running more and more complicated programs, things very quickly stopped working. My CPU was full of bugs, and figuring out what went wrong by looking at the simulation logs after running it for tens of thousands of cycles was very tedious.

And so, it was at this point that I started adding unit tests. The framework for writing tests exposes a monad where the available effects are making observations on the state of the system (CPU registers and contents of the memory) and executing instructions. This presents an API that allows writing tests in an imperative way:

php = do
    flags <- observe statusFlags
    sp <- observe regSP
    execute0 0x08
    sp' <- observe regSP
    pushed < observe $ mem (stackAddr <$> sp)
    assertEq "Stack pointer is decremented" sp' (pred <$> sp)
    assertEq "Status is correctly pushed" pushed flags

A test like this is turned into a ROM image containing $08 at the address pointed to by the reset vector. The simulation is then run until the CPU enters the Fetch internal state for the second time (the first time is when it fetches the opcode under testing, i.e. the PHP ($08) instruction), and then the observations are evaluated by looking at the simulation output in the same cycles as the Fetch state. Of course, this means you shouldn't be able to write tests like the following:

impossiblyDynamicTest = do
    arg <- observe regX
    execute1 0x00 arg
    a' <- observe regA
    assertEq "A is updated" a' arg

This is ensured by observe returning values wrapped in an Obs type, and execute1 requiring unwrapped arguments:

observe :: Query a -> TestM (Obs a)
execute1 :: Byte -> Byte -> TestM ()
assertEq :: (Eq a, Show a) => String -> Obs a -> Obs a -> TestM ()

To allow assertions over derived values, Obs is an applicative functor (in fact, it is the free applicative functor over the co-Yoneda functor of the primitive observations).

I think this approach has merit as a general framework for hardware simulator-based unit testing and I intend to extend it and maybe even carve it out into a separate library in the future.

A Minimal Viable PET

Once I had a sufficiently working CPU, I started building the other pieces around it. I took the PET emulator from the VICE suite and commented out all the PIA and VIA code, replacing writes with nops and reads with hardcoded values, until I was able to boot it up with the stock ROM to get to the READY. prompt. Of course, since the PIA supplying the interrupt used for timing was removed by that point, I had no flashing cursor or keyboard input. All in all, the system got to a steady state in about 80,000 operations. (Since my implementation is not yet cyle-accurate, I had to switch to counting operations instead of cycles beyond this point. Every operation is at least as fast as on the real chip, so I hope by adding some wait cycles I'll be able to take care of this at some later point.)

After hooking up the same hardcoded values on the same addresses to the CPU, the next step was running the simulator and peeking at the video memory area ($8000..$8FFF on the PET), using the original fonts to render the screen. The initial version showed there might be someone home (sorry for crap quality on the screenshot):


By comparing detailed logs from running the emulator and the simulator, I was able to make observations like "the first 12,345 steps seem to be in agreement", which was a big boost to productivity, getting me, in short order, to this:

Almost there!

After fixing some more bugs in the arithmetic opcodes, I was finally rewarded by this sight:

Ah... much better.

Adding more components

While working on the CPU, I also started writing the character generator, on top of the VGA signal generator in the kansas-lava-papilio package that I originally made for the CHIP-8. This way, the VGA synchronization signals were abstracted away from me and I just had to take care of pumping out the actual pixels. This turned out to be tricker than I originally thought, since you have to time all read-aheads just right so that everything is at hand just in time for the next pixel. So before it finishes drawing the 8 pixels that make up a single row of a character, the next character index is loaded from RAM, and then the character ROM is consulted for the first row of the font image of the next indexed character. Initial versions were having some ghosting issues, or even more fun, full character transpositions (like showing the character from one line above in the first position of each line).

The Commodore PET diverts the vsync signal from the video generator to one of the PIA chips, which generates a CPU interrupt that can be acknowledged by reading from one of its memory-mapped registers. So the next obvious step was to implement this functionality to get the cursor blinking! This required more than just implementing a PIA, since I didn't even have interrupts in the CPU at that point.

But all that work was totally worth it:

And now, here we are

The current version supports keyboard input from PS/2 keyboards (but not all keys are mapped yet), so for the first time since I started working on this more than a month ago, it can be used to write and run BASIC programs!

What you can't see on the video below is that there's still a bug somewhere that causes the classic 10 PRINT "FOO": 20 GOTO 10 program to terminate with an out of memory error after some time.

Apart from fixing these bugs, the big outstanding feature is to add Datasette support so that programs can be loaded and saved to virtual "casettes". For a first version, I'll just burn some extra ROM onto the FPGA containing the tape images and hook that up to the PIA controlling the casette player; but I guess the proper way to do this would be to use something like an SD card reader to get proper persistent, read-writable storage. Or maybe, alternatively, have some kind of serial-over-USB communication with a computer acting as the Datasette unit.

Older entries:

Entries from all tags