Contents

Blog tags RSS

Entries tagged retro

Időrégész 2015

5 June 2015 (programming retro games)

Amikor már megvolt a C64-ünk, megkaptam egyszerre az LSI-féle 1001 játék sorozat első három részét anyukáméktól (ez úgy '89 körül lehetett), azt hiszem a negyedik-ötödik rész később jelent meg. Annyira nem voltam a scene-ben, hogy azt sem tudtam, hogy van olyan, hogy scene. Az én scene-em az a pár általános iskolai osztálytárs volt, akinek legalább egy ZX Spectrumja volt. Szóval a lényeg, hogy ez a pár könyv volt minden kapcsolatom a C64 játékokkal. Azon a néhány játékon kívül, amik így apukám ismerősein keresztül jutottak hozzánk, a többiről csak olvasni tudtam. Persze a leírás alapján mindegyik a legjobb játéknak tűnt EVER!, és volt néhány, amit el is kezdtem BASIC-ben leprogramozni az alapján, ahogy elképzeltem; ez nyilván több oldalról is ab ovo teljes sikertelenségre volt ítélve.

A harmadik 1001 játékban volt egy végigjátszás az Időrégészről, amibe persze azonnal beleszerettem. A szöveges kalandjáték volt az egyetlen játék-műfaj, aminek a programozásában akkortájt labdába tudtam rúgni annyira, hogy valami játszható jöjjön ki belőle, és emiatt pláne úgy éreztem, hogy ez a játék megszólít engem. Ráadásul nem is tudom, hogy volt-e akkor már CoV, de az biztos, hogy én nem ismertem (a Magic Candle-ös szám volt az első, amikor belefutottam, ha minden igaz, '92-ben), úgyhogy a leírás stílusa, humora is egy teljesen új világot nyitott meg előttem.

Mivel a fent ecsetelt okok miatt warez-ként esélyem se volt hozzájutni, ez lett a második játék, amit eredetiben megvettünk (az első az Impossible Mission 2 volt (kazettán), de azzal én nem sokat játszottam, azt apukám tolta — külön post lehetne, amikor az éjszaka közepén, a TV hangerejét éppen csak hallhatóra állítva ordított a figura leesés közben).

A játék csalódást nem okozott, de végigjátszani az istennek se ment, még az 1001-leírás alapján sem. Az áttörést csak 1991-ben hozta egy osztálytársam, aki rájött, hogy SPOILER ALERT a papnál még imádkozni is kell, nem elég odaadni neki az aranykeresztet vagy mit.

Most illene arról is írni, hogy akkor most ez egy jó játék volt-e. Azt gondolom, hogy most tudom a legkevésbé ezt megítélni, mert túl közelről látom (mint majd mindjárt kiderül hogy miért). Az biztos, hogy akinek annak idején volt C64-e (vagy akár +4-e!), és magyar, az ismeri az Időrégészt meg a többi Rátkai-játékot: nem véletlenül volt róla IDDQD cikk meg Index illetve Facebook csoportok. Igazi cult classic.

Egy modern Időrégész-reprodukció terve

Mostanában elkezdtek érdekelni a régi 8-bites gépek, mivel annak idején az alacsonyszintű programozásuk, meg úgy egyáltalán a felépítésük totál kimaradt nekem. Most, hogy autodidakta módon FPGA-fejlesztést tanulok, elértem oda, hogy épkézláb számítógépeket tudok építeni, amihez persze elkerülhetetlen, hogy megismerjem minden kis részletüket. Ennek egyébként egy nem várt, de kellemes mellékhatása, hogy tudom értékelni a korabeli és azóta készült demókat, amikor minden képkockát önmagában is lehetetlennek tűnik előállítani a C64-en.

Szóval kitaláltam, hogy kipróbálnám egy játék teljes visszafejtését, mivel ilyesmit sose csináltam még. Egy ilyen nem-realtime, nem-grafikus játék, mint az Időrégész, remek kezdő projektnek tűnt. Az alap-elképzelés (még május elején) az volt, hogy visszafejtem a programot annyira, hogy tudjam minden részéről hogy mit csinál, és ezalapján megértem, és minél magasabb szinten reprodukálom a működését. Hogy valami igazán örök formában rögzítsem, mindenképpen a ZMachine virtuális gépet terveztem célbavenni, mivel a modern interactive fiction-világban ez tűnik egyértelműen a bevett eszköznek már évtizedek óta.

Continue reading »

Initial version of my Commodore PET

2 March 2015 (programming haskell FPGA electronics retro)

In my quest to build more and more complicated computers on FPGAs armed with nothing but a crappy hobbist mindset and some hazy ideas of how Kansas Lava is supposed to work, I've reached another milestone: my first real computer.

That is, unlike the Brainfuck CPU that I designed myself, or the CHIP-8, which was originally a virtual machine spec (with all the implementation leeway that implies), this latest subject is a bona fide 8-bit home computer from the seventies: the Commodore PET.

A Commodore PET 2001 from 1977

The Commodore PET in a nutshell

The PET is a very simple machine compared to later Commodore models, which is why I thought it would make a good first step on a journey that I hope will one day culminate in implementing a Commodore 64. Its centerpiece is the MOS 6502 CPU (practically the same as the MOS 6510 used in the C=64), and there are only four other components: a text-only video signal generator and three IO interface chips (two PIA's and one VIA) for keyboard, Datasette and extension port communication. Just hooking up one of the PIAs is enough to get a minimal system working with keyboard input.

12 KBytes of PET ROM contain implementation of basic IO routines (the so-called "kernal"), the full-screen text editor, and Microsoft's BASIC interpreter. Then there's a separate character ROM (not addressable from the CPU) used by the video generator.

MOS 6502

The 6502 microprocessor was a staple of the eight-bit home computer era of the late seventies and eighties. By today's standards, it is incredible to imagine what it must have been like to design it manually, drawing the layout with pencils on paper. On the other hand, if it was designed in such a low-tech way, I figured it shouldn't be too difficult to build something compatible using modern tools even by a hobbist like myself. And of course there are already dozens of home-built 6502 implementations out there, to varying degrees of compatibility.

The ultimate reference on the 6502 must be the Visual 6502 Project which I deliberately avoided consulting. I don't really see the educational value in copying the original 6502 design; so instead, I went with a more black-box approach by just looking at the opcode descriptions and interrupt model and working from that.

The first milestone I aimed for was to get enough of the CPU working that I can run the sample programs on 6502asm.com, which defines a tiny microcomputer-like architecture that doesn't have interrupts or any fancy video modes: you just have 32×32 pixels with a fixed 16-color palette mapped to main RAM from $0200, and a zero page-mapped register for keyboard input that you can do polling on. The Kansas Lava implementation is really simple and I plan to reuse it later if I do a similar project with the Z80.

My workflow was that I would use ca65 to assemble test programs, burn them into ROM, and run it in the Kansas Lava simulator for a couple thousand cycles; then render the video RAM into a GTK+ window. I would start with this program that does nothing but moves data around in memory (drawing the Commodore logo pixel by pixel), and basically I implemented the 6502 opcodes as I went along. After two days of work, I finally got it working:

First positive result

Seeing this was an incredible feeling. The input was valid 6502 machine code, and my very own CPU managed to run it correctly for the approximately 40,000 cycles that it took to draw this image. There was no stopping at this point: I already had a working VGA frame buffer implementation from the CHIP-8, so next day I synthesized it and run it on real hardware, my venerable Papilio Pro:

Hi-tech from the seventies, today!

Simulation-based testing

As I added more and more opcodes and started running more and more complicated programs, things very quickly stopped working. My CPU was full of bugs, and figuring out what went wrong by looking at the simulation logs after running it for tens of thousands of cycles was very tedious.

And so, it was at this point that I started adding unit tests. The framework for writing tests exposes a monad where the available effects are making observations on the state of the system (CPU registers and contents of the memory) and executing instructions. This presents an API that allows writing tests in an imperative way:

php = do
    flags <- observe statusFlags
    sp <- observe regSP
    execute0 0x08
    sp' <- observe regSP
    pushed < observe $ mem (stackAddr <$> sp)
    assertEq "Stack pointer is decremented" sp' (pred <$> sp)
    assertEq "Status is correctly pushed" pushed flags
      

A test like this is turned into a ROM image containing $08 at the address pointed to by the reset vector. The simulation is then run until the CPU enters the Fetch internal state for the second time (the first time is when it fetches the opcode under testing, i.e. the PHP ($08) instruction), and then the observations are evaluated by looking at the simulation output in the same cycles as the Fetch state. Of course, this means you shouldn't be able to write tests like the following:

impossiblyDynamicTest = do
    arg <- observe regX
    execute1 0x00 arg
    a' <- observe regA
    assertEq "A is updated" a' arg
      

This is ensured by observe returning values wrapped in an Obs type, and execute1 requiring unwrapped arguments:

observe :: Query a -> TestM (Obs a)
execute1 :: Byte -> Byte -> TestM ()
assertEq :: (Eq a, Show a) => String -> Obs a -> Obs a -> TestM ()
      

To allow assertions over derived values, Obs is an applicative functor (in fact, it is the free applicative functor over the co-Yoneda functor of the primitive observations).

I think this approach has merit as a general framework for hardware simulator-based unit testing and I intend to extend it and maybe even carve it out into a separate library in the future.

A Minimal Viable PET

Once I had a sufficiently working CPU, I started building the other pieces around it. I took the PET emulator from the VICE suite and commented out all the PIA and VIA code, replacing writes with nops and reads with hardcoded values, until I was able to boot it up with the stock ROM to get to the READY. prompt. Of course, since the PIA supplying the interrupt used for timing was removed by that point, I had no flashing cursor or keyboard input. All in all, the system got to a steady state in about 80,000 operations. (Since my implementation is not yet cyle-accurate, I had to switch to counting operations instead of cycles beyond this point. Every operation is at least as fast as on the real chip, so I hope by adding some wait cycles I'll be able to take care of this at some later point.)

After hooking up the same hardcoded values on the same addresses to the CPU, the next step was running the simulator and peeking at the video memory area ($8000..$8FFF on the PET), using the original fonts to render the screen. The initial version showed there might be someone home (sorry for crap quality on the screenshot):

Yeah...

By comparing detailed logs from running the emulator and the simulator, I was able to make observations like "the first 12,345 steps seem to be in agreement", which was a big boost to productivity, getting me, in short order, to this:

Almost there!

After fixing some more bugs in the arithmetic opcodes, I was finally rewarded by this sight:

Ah... much better.

Adding more components

While working on the CPU, I also started writing the character generator, on top of the VGA signal generator in the kansas-lava-papilio package that I originally made for the CHIP-8. This way, the VGA synchronization signals were abstracted away from me and I just had to take care of pumping out the actual pixels. This turned out to be tricker than I originally thought, since you have to time all read-aheads just right so that everything is at hand just in time for the next pixel. So before it finishes drawing the 8 pixels that make up a single row of a character, the next character index is loaded from RAM, and then the character ROM is consulted for the first row of the font image of the next indexed character. Initial versions were having some ghosting issues, or even more fun, full character transpositions (like showing the character from one line above in the first position of each line).

The Commodore PET diverts the vsync signal from the video generator to one of the PIA chips, which generates a CPU interrupt that can be acknowledged by reading from one of its memory-mapped registers. So the next obvious step was to implement this functionality to get the cursor blinking! This required more than just implementing a PIA, since I didn't even have interrupts in the CPU at that point.

But all that work was totally worth it:

And now, here we are

The current version supports keyboard input from PS/2 keyboards (but not all keys are mapped yet), so for the first time since I started working on this more than a month ago, it can be used to write and run BASIC programs!

What you can't see on the video below is that there's still a bug somewhere that causes the classic 10 PRINT "FOO": 20 GOTO 10 program to terminate with an out of memory error after some time.

Apart from fixing these bugs, the big outstanding feature is to add Datasette support so that programs can be loaded and saved to virtual "casettes". For a first version, I'll just burn some extra ROM onto the FPGA containing the tape images and hook that up to the PIA controlling the casette player; but I guess the proper way to do this would be to use something like an SD card reader to get proper persistent, read-writable storage. Or maybe, alternatively, have some kind of serial-over-USB communication with a computer acting as the Datasette unit.

My first computer

29 March 2014 (programming haskell FPGA electronics retro)

tl;dr: I've built a computer on a Xilinx FPGA using Kansas Lava, based on a virtual machine design from the mid seventies.

I would be lying if I said I always wanted to build my own computer. For a very long time, hardware didn't tickle my curiosity at all; and even today, I prefer staying away from dangling wires and soldering irons. I like my computing platforms to Just Work™, and hardware problems are just a huge hassle. But then in 2010 some coworkers of mine started getting into electronics and I learned from them just enough to start hooking some ICs up on a breadbord, and it seemed like a very fun diversion from all the high-level, abstract, softwarey stuff. In some sense, it filled the same void in me that assembly programing would probably have had. But even back in the days on the Commodore 64, I was looking for more BASIC extensions instead of going downwards to assembly / machine code.

One thing led to another and I was creating a Brainfuck CPU on a Papilio FPGA. It was a very satisfying experience, plus, I got to learn about a completely new world (that of digital hardware design and hardware description languages). So ever since then, I had a voice in the back of my head saying I should go all in, and implement a proper full computer, with I/O and all. But jumping straight into replicating a real computer seemed like trying to run before I can walk.

I can't remember now how I bumped into the CHIP-8 platform, but when I read about it in detail, I realized this is something I, a self-taught HDL guy, could realistically implement. To recap, it's a virtual machine spec from the mid seventies indended to be implemented on home computers of the time. It is super low performance, meaning I could implement everything the most naïve way possible and still get something workable out of it: graphics is 64×32 black & white, RAM is short of 4KBytes, CPU speed is not really part of the spec, and the only timers provided run at 60 Hz.

I can do this!

Setting the scene

The FPGA board that I targeted is a Papilio One 500K, which has 500K... somethings called "system gates", and about 40KByte of SRAM. The Arcade MegaWing provides a D-sub 15 VGA connector and two PS/2 ports, among some other stuff that I am not using for this project.

The home computer targeted by the original CHIP-8 spec only had a four-by-four keypad, so I am not 100% happy about using a PS/2 keyboard for the input. Maybe a later version could use a bespoke keypad connected to the unused pins of the LogicStart MegaWing. However, by using a PS/2 keyboard, I could not worry about the hardware side of things and just implement a decoder for the PS/2 protocol.

In terms of software, there are several games available for the CHIP-8. Maybe even donzens! But just for fun, after finishing the machine itself, I ended up writing my own game as well: a crappy port of the currently trending 2048 game.

Understanding the CHIP-8

I would say the CPU itself is a RISC architecture, in that it has 16 registers and not a whole lot of operations. But I'm not sure it would be called RISC back in its day.

The aforementioned 64×32 one-bit graphics is manipulated via a sprite interface: there's a special CPU opcode for xor-blitting an 8-pixel-wide, variable-height sprite onto any part of the screen. I went with the obvious way of implementing it as a 2048-bit frame buffer.

To validate my understanding, I first created a software emulator in Haskell. That unearthed a ton of edge cases and situations where I was not reading the spec with enough attention. Once I had the emulator working well enough to play the sample games I found, I enumerated the modules I'll need to implement.

The emulator running a Tetris game

A TODO list

Since I've already done the Brainfuck CPU, I didn't foresee too much difficulty in implementing the CPU proper (oh boy was I wrong). However, the world of peripherals was a new one to me.

I've never looked into VGA signal timings in detail before, and for some reason I just assumed that it's going to be just as complicated as PAL, about which I knew just enough to know that you have to generate all kinds of elaborate sync patterns. So actually reading the VGA spec was a relief, and I quickly came up with a scheme where my CHIP-8 computer would be running at 50 MHz, so that it can easily implement the 25 MHz pixel clock needed for 640×480@60 Hz. I went with this mode because it has several advantages:

The Papilio One itself has a clock of 32 MHz, and I first hoped that 32 Mhz and 25 Mhz is close enough that I can just generate a signal using the 32 MHz as the pixel clock. Turns out that's not quite how signal timings work.

Luckily, I've found out that the Papilio One also has something called a DCM which can divide and multiply the clock signal. I was able to go to 25 or 50 MHz easily with it. It's a bit of a black box component: I had to run a wizard in the Xilinx IDE to get a small binary file describing my DCM parameters, and then I integrated it into my build process by running another Xilinx program on it which spits out some magic bitstream.

The PS/2 protocol is a simple serial protocol with a slow (~10 KHz) clock and one parity bit per 8 data bits. Decoding it into a stream of bytes was a straightforward thing once I hunted down an actual PS/2 keyboard, since it turned out those USB to PS/2 dongles don't really work with regular USB keyboards; rather, the keyboard have to be made ready to "speak" PS/2 over the dongle. So I ended up getting a proper PS/2 keyboard from Mustafa Centre (thanks to Viki's sister Dori for picking it up for me); god only knows why they still had one on stock.

Tooling

The Xilinx tool suite seems to be aimed at being used from the IDE. This has several drawbacks: version controlling is complicated because generated artifacts litter the source tree; the editor itself in the IDE is of course crap compared to Emacs; and most importantly, I didn't intend to manually write either VHDL or Verilog. As I've mentioned before in my post about the Brainfuck CPU, I've found both of these hardware description languages to be lacking in the same way mainstream imperative programming languages are: the tools you, the developer, is given to introduce your own abstractions are very poor. So I planned to use Kansas Lava, an HDL embedded in Haskell.

Now, the first thing to note about Kansas Lava is, as nice at is, the software package itself is bit-rotten. The latest version available on Hackage cannot even be compiled with GHC 7.6. While the change to fix that is trivial, after that, you can easily bump into bugs in the Lava compiler itself. But more on that later. Later versions available from GitHub are not even self-consisent between the various dependencies. I've put my patched-up version of Kansas Lava and some of the packages it dependens on on GitHub and I'm trying to get Andy Gill to allow me to upload the bundle as a 0.2.4.1 update to Hackage. Don says I should maybe say fuck it and create my own Singapore Lava fork, just to stay with the Lava tradition of a twisty little maze of incompatible forks, all alike.

However, when it all works, it is amazing. I was able to extend the library of Papilio-specific Lava code that I created for the Brainfuck project with reusable modules for the VGA signal generator and the PS/2 decoder in such a way that they should be very easy to be re-used for any other future projects. And it's all type-safe, so for example, the Papilio Arcade MegaWing VGA port is statically enforced to be 4+4+4 bits whereas the LogicStart MegaWing is 3+3+2.

But there was one bug that left a bitter aftertaste in my mouth. Once I had both the CPU and the VGA parts working and I started running some test programs on the thing, I noticed that the framebuffer blitting is exhibiting "or" behaviour instead of "xor". Running the same code in the Lava simulator and dumping the contents of the framebuffer, it showed the correct, "xor" behaviour. After weeks of frustration and a complete rework of the communication system between the CPU, the VGA signal generator and the frame buffer to add memory read bussing, I finally took a look at the Lava compiler's source code to find that it simulates the xor2 primitive as xor but compiles it to or. How do you not notice this bug? Has the Kansas Lava suite been used by anyone for everything at all in the history of ever???

End result

The final result, after much trial and error, looking at the Lava simulator's output, and pouring over the code, is available here. I'm reasonably happy about the code base, except for the implementation of the CPU itself, which feels a bit spaghetti to me. Especially around the parts where it's waiting for results to come back from main RAM or the video framebuffer.

Below are some photographs and videos of it running two games: the memory puzzle game Hidden and the 2048 clone mentioned above. Unfortunately, the Tetris game from the screenshot above seems to have a bug in its input handling, in that it samples the keyboard at an unthrottled frequency; thus, even the shortest keypress sends the falling piece to the edge of the wall. I'll eventually need to disassemble the game and fix this.

The VGA signal generator is not as neat as it could be, because it doesn't do pre-fetching. This means by the time the current CHIP-8 pixel's value is read from the framebuffer, it is already too late, the first of the 8 real pixels for the given virtual CHIP-8 pixel should already have been drawn. This results in some ghosting. But since I know what's causing it, I will hopefully be able to fix this in some next version. But right now I'm too happy to have the whole thing just working.

The setup

Detail of the FPGA board itself

Brainfuck on the Commodore 64

31 December 2013 (programming brainfuck retro)

What would the Christmas holidays be without some silly hacking? This year, I had this most random idea to try to write something for the Commodore 64, the result of which is available on GitHub.

I grew up on the C64 (I think I was 7 when we got one), but interestingly, I never got into the whole low-level programming thing. Instead, I started using the Graphics BASIC extension after growing out the built-in BASIC interpreter. So I never learned too much about the details of the 6510 CPU, or all the other driver chips like the SID or the VIC-II; but everyone knew even back then that you can't get anything serious done with the performance of BASIC interpreters.

These days, of course, you have very nice cross-compiler tools so you can get both the performance and the advantages of a somewhat-higher-level programming language. My idea for a project was to use the CC65 C cross-compiler to write a compiler for Brainfuck.

So after reading up on the 6502 processor family, it seemed very straightforward to implement the Brainfuck instructions by just storing the cell pointer in the X register and loading to / storing from the A register to do cell arithmetics. Of course, to get good performance, we compile bunches of the same Brainfuck instruction together, so e.g. +++ is compiled into

       LDA mem,X ; 4 cycles
       CLC       ; 2 cycles
       ADC 3     ; 2 cycles
       STA mem,X ; 5 cycles

instead of the slower

       INC mem,X ; 7 cycles
       INC mem,X
       INC mem,X

(see e.g. here for instruction timings).

So I wrote some C code that generates code like the above, machine code byte by byte. The only interesting part is compiling code for the [ and ] Brainfuck instructions. First of all, the 6502 only has conditional branch instructions for near jumps, whereas in Brainfuck the loop body can be arbirarily large. And second, since the compiler is to be on-line in the sense that it emits the machine code as it reads the Brainfuck source without ever holding the full source in memory (otherwise the source could take up too much of the already limited 64 kilobytes of memory), we have actually no idea when processing a [ how to jump over the loop. Because of these factors, the code I generate for [ is like this:

loop:  LDA mem,X
       BNE enter
       JMP $0000
enter: ... ; rest of the program  

where the argument to the JMP instruction is zeroed out initially. We store the loop address in a stack in the compiler, and when we encounter the matching ], apart from emitting the code for JMP loop, we also edit loop+6 to store the current target address.

So that takes care of a traditional compiler. However, my compiler also has a transpiler mode, where it emits C64 BASIC from Brainfuck. Writing this was a lot of fun because I finally got to learn the details of how BASIC programs were stored on these low-powered home computers. Turns out programs were pre-tokenized line by line, so for example, the following line:

10 IF X% = 0 THEN GOTO 100

is stored as

10 <IF> X% <=> 0 <THEN> <GOTO> 100

So the variable name X% and number literals 10, 0 and 100 are stored as text, but the keywords IF, THEN, GOTO and the = operator are stored as single-byte identifiers. I think it's interesting that number literals are stored as text instead of numbers.

The overall structure of the BASIC transpiler is otherwise the same as the compiler. But typing LIST and seeing my generated BASIC code for the first time was so much more satisfying than looking at the generated machine code in the monitor.

In closing, here's a screencast of the program in action.

Entries from all tags