Entries tagged brainfuck
What would the Christmas holidays be without some silly hacking? This year, I had this most random idea to try to write something for the Commodore 64, the result of which is available on GitHub.
I grew up on the C64 (I think I was 7 when we got one), but interestingly, I never got into the whole low-level programming thing. Instead, I started using the Graphics BASIC extension after growing out the built-in BASIC interpreter. So I never learned too much about the details of the 6510 CPU, or all the other driver chips like the SID or the VIC-II; but everyone knew even back then that you can't get anything serious done with the performance of BASIC interpreters.
These days, of course, you have very nice cross-compiler tools so you can get both the performance and the advantages of a somewhat-higher-level programming language. My idea for a project was to use the CC65 C cross-compiler to write a compiler for Brainfuck.
So after reading up on the 6502 processor family, it seemed very straightforward to implement the Brainfuck instructions by just storing the cell pointer in the X register and loading to / storing from the A register to do cell arithmetics. Of course, to get good performance, we compile bunches of the same Brainfuck instruction together, so e.g. +++ is compiled into
LDA mem,X ; 4 cycles CLC ; 2 cycles ADC 3 ; 2 cycles STA mem,X ; 5 cycles
instead of the slower
INC mem,X ; 7 cycles INC mem,X INC mem,X
(see e.g. here for instruction timings).
So I wrote some C code that generates code like the above, machine code byte by byte. The only interesting part is compiling code for the [ and ] Brainfuck instructions. First of all, the 6502 only has conditional branch instructions for near jumps, whereas in Brainfuck the loop body can be arbirarily large. And second, since the compiler is to be on-line in the sense that it emits the machine code as it reads the Brainfuck source without ever holding the full source in memory (otherwise the source could take up too much of the already limited 64 kilobytes of memory), we have actually no idea when processing a [ how to jump over the loop. Because of these factors, the code I generate for [ is like this:
loop: LDA mem,X BNE enter JMP $0000 enter: ... ; rest of the program
where the argument to the JMP instruction is zeroed out initially. We store the loop address in a stack in the compiler, and when we encounter the matching ], apart from emitting the code for JMP loop, we also edit loop+6 to store the current target address.
So that takes care of a traditional compiler. However, my compiler also has a transpiler mode, where it emits C64 BASIC from Brainfuck. Writing this was a lot of fun because I finally got to learn the details of how BASIC programs were stored on these low-powered home computers. Turns out programs were pre-tokenized line by line, so for example, the following line:
is stored as
So the variable name X% and number literals 10, 0 and 100 are stored as text, but the keywords IF, THEN, GOTO and the = operator are stored as single-byte identifiers. I think it's interesting that number literals are stored as text instead of numbers.
The overall structure of the BASIC transpiler is otherwise the same as the compiler. But typing LIST and seeing my generated BASIC code for the first time was so much more satisfying than looking at the generated machine code in the monitor.
In closing, here's a screencast of the program in action.
There's this wonderful paper detailing the computation model discovered in the x86 memory management unit, or MMU for short. Basically, by setting up the page table right, you can arrange for a page fault handler to lead to another page fault. If you set things up juuuust right, you can get a whole cascade of page faults. You can then modulate this to implement a single instruction, Harvard-architecture, Turing-complete computer using Move-Decrement-BranchIfZero, or MOVDBZ for short, with the following semantics in pseudocode:
MOVDBZ (x ← y-1) THEN l ELSE l′ = do val ← read x if val == 0 then do write y 0 jump l else do write y (val-1) jump l′
I'm going to refer to x and y as the source and target registers below, but it's important to understand that these are not the real registers of the x86 processor that we are talking about here; rather, these are the registers of this virtual machine living inside the MMU. The term Harvard architecture means the instruction labels l and l′ come from a completely separate namespace as the register (c.f. von Neumann computer).
Another important thing is that from the real x86 processor's point of view, the whole computation takes up zero cycles. That is because the resolution of the page fault happens between (or, to be more precise, in the middle of) instructions, so the whole cascade will be executed by the MMU before the next x86 instruction runs.
The paper linked above has all the details; this is just a summary to motivate what follows. Readers of my blog already know Brainfuck is Turing-complete, since we managed to compile register machine code into Brainfuck. So let's continue that chain of reasoning (motivated by this Proggit comment), and prove the Turing-completeness of MovDBz by compiling Brainfuck into MovDBz.
First of all, I'm going to add two new instructions to our one-instruction machine; hopefully we can all agree they don't change the spirit of the challange:
- HALT is already implicitly part of the original language, since you can set up parts of the page table in such a way that the cascade stops
- I've added PRINT to be able to do some I/O. We won't be aiming for input, so the Brainfuck instruction , is not going to be supported.
At initial glance, there seem to be two tricky aspects of Brainfuck with respect to MovDBz:
- Brainfuck has (limited) indirect addressing: there's a pointer register that addresses into the other data cells. MovDBz has nothing like that: every instruction has the source and target registers hardcoded.
- Brainfuck has increment and decrement operators, whereas MovDBz only has decrement.
The first problem is solved by basically turning this:
else if ptr == 1 then increment cells
else if ptr == 2 then increment cells
else if ptr == n then increment cells[n]
This latter construct is easy to implement if we nominate one of our MovDBz registers as a temporary register tmp:
scan(1): MOVDBZ (tmp ← tmp-1) THEN found(1) ELSE scan(2)
scan(n): MOVDBZ (tmp ← tmp-1) THEN found(n) ELSE impossible
found(i): increment cells[i]
where n is the maximum index of Brainfuck cells.
Note that this means the generated MovDBz program's size will be linear in the number of Brainfuck cells. The Brainfuck spec calls for 30,000 cells; when compiling to MovDBz, it is usually a good idea to use a lot less cells.
At first glance, it might seem impossible to implement incrementing when all we have is a single decrement operator. Modular arithmetic comes to the rescue: since x + 1 ≡ x - 255 (mod 256), we can do repeated decrements (255 times) to emulate incrementing. Of course, we also have to handle specially the case when the value gets to 0.
So with the help of a constant register C256 we keep clamped to 256, we can turn this:
count: MOVDBZ (tmp ← tmp-1) THEN next ELSE dec
dec: MOVDBZ (cells[i] ← cells[i]-1) THEN uflow ELSE count
uflow: MOVDBZ (cells[i] ← C256-1) THEN impossible ELSE count
We can implement incrementing the pointer (> in Brainfuck) similarily, by having another constant register storing the number of cells (i.e. n+1).
Putting it all together
To prototype these ideas, I've written a compiler that uses the above techniques to turn Brainfuck code into MovDBz. The Haskell implementation is available on Github, but be aware that it is somewhat-horrible spaghetti code.
One nice aspect of it, though, is that the compiler generates semantic labels and registers (like Cell i for the register containing the i'th Brainfuck cell, or ScanFinished i (Inc l) for the label of the instruction that starts incrementing the Cell i and eventually goes to label l when finished). This can make the output almost readable... For example, here's the result of compiling -.>. with two Brainfuck cells:
λ> mapM_ print $ compileBF 1 [DecData, Output, IncPtr, Output] (Src 0,MOVDBZ C0 C0 (S (Scan 0 (Dec (Src 1)))) (S (Scan 0 (Dec (Src 1))))) (Src 1,MOVDBZ C0 C0 (S (Scan 0 (Print (Src 2)))) (S (Scan 0 (Print (Src 2))))) (Src 2,MOVDBZ C0 C0 (S (DoIncPtr (Src 3))) (S (DoIncPtr (Src 3)))) (Src 3,MOVDBZ C0 C0 (S (Scan 0 (Print (Src 4)))) (S (Scan 0 (Print (Src 4))))) (Src 4,HALT) (S (Scan 0 (Dec (Src 1))),MOVDBZ Ptr Tmp (S (ScanFinished 0 (Dec (Src 1)))) (S (Scan 1 (Dec (Src 1))))) (S (Scan 0 (Print (Src 2))),MOVDBZ Ptr Tmp (S (ScanFinished 0 (Print (Src 2)))) (S (Scan 1 (Print (Src 2))))) (S (Scan 0 (Print (Src 4))),MOVDBZ Ptr Tmp (S (ScanFinished 0 (Print (Src 4)))) (S (Scan 1 (Print (Src 4))))) (S (Scan 1 (Dec (Src 1))),MOVDBZ Tmp Tmp (S (ScanFinished 1 (Dec (Src 1)))) (S End)) (S (Scan 1 (Print (Src 2))),MOVDBZ Tmp Tmp (S (ScanFinished 1 (Print (Src 2)))) (S End)) (S (Scan 1 (Print (Src 4))),MOVDBZ Tmp Tmp (S (ScanFinished 1 (Print (Src 4)))) (S End)) (S (ScanFinished 0 (Dec (Src 1))),MOVDBZ C0 C0 (S (DecCell 0 (Src 1))) (S (DecCell 0 (Src 1)))) (S (ScanFinished 0 (Print (Src 2))),PRINT (Cell 0) (Src 2)) (S (ScanFinished 0 (Print (Src 4))),PRINT (Cell 0) (Src 4)) (S (ScanFinished 1 (Dec (Src 1))),MOVDBZ C0 C0 (S (DecCell 1 (Src 1))) (S (DecCell 1 (Src 1)))) (S (ScanFinished 1 (Print (Src 2))),PRINT (Cell 1) (Src 2)) (S (ScanFinished 1 (Print (Src 4))),PRINT (Cell 1) (Src 4)) (S (DecCell 0 (Src 1)),MOVDBZ (Cell 0) (Cell 0) (S (UnderflowCell 0 (Src 1))) (Src 1)) (S (DecCell 1 (Src 1)),MOVDBZ (Cell 1) (Cell 1) (S (UnderflowCell 1 (Src 1))) (Src 1)) (S (UnderflowCell 0 (Src 1)),MOVDBZ CMaxData (Cell 0) (S End) (Src 1)) (S (UnderflowCell 1 (Src 1)),MOVDBZ CMaxData (Cell 1) (S End) (Src 1)) (S (DoIncPtr (Src 3)),MOVDBZ CMaxAddr Tmp (S (DoIncPtrLoop (Src 3) IncPtrDecCounter)) (S (DoIncPtrLoop (Src 3) IncPtrDecCounter))) (S (DoIncPtrLoop (Src 3) IncPtrDecCounter),MOVDBZ Tmp Tmp (Src 3) (S (DoIncPtrLoop (Src 3) IncPtrDecPtr))) (S (DoIncPtrLoop (Src 3) IncPtrDecPtr),MOVDBZ Ptr Ptr (S (DoIncPtrLoop (Src 3) IncPtrUnderflowPtr)) (S (DoIncPtrLoop (Src 3) IncPtrDecCounter))) (S (DoIncPtrLoop (Src 3) IncPtrUnderflowPtr),MOVDBZ CMaxAddr Ptr (S (DoIncPtrLoop (Src 3) IncPtrDecCounter)) (S (DoIncPtrLoop (Src 3) IncPtrDecCounter))) (S End,HALT)
Which, after label and register layouting, comes out as this:
λ> mapM_ print $ layout 1 $compileBF 1 [DecData, Output, IncPtr, Output] (0,MOVDBZ 0 0 5 5) (1,MOVDBZ 0 0 6 6) (2,MOVDBZ 0 0 25 25) (3,MOVDBZ 0 0 7 7) (4,HALT) (5,MOVDBZ 3 4 11 8) (6,MOVDBZ 3 4 12 9) (7,MOVDBZ 3 4 13 10) (8,MOVDBZ 4 4 14 29) (9,MOVDBZ 4 4 15 29) (10,MOVDBZ 4 4 16 29) (11,MOVDBZ 0 0 17 17) (12,PRINT 5 2) (13,PRINT 5 4) (14,MOVDBZ 0 0 18 18) (15,PRINT 6 2) (16,PRINT 6 4) (17,MOVDBZ 1 4 19 19) (18,MOVDBZ 1 4 22 22) (19,MOVDBZ 4 4 1 20) (20,MOVDBZ 5 5 21 19) (21,MOVDBZ 1 5 19 19) (22,MOVDBZ 4 4 1 23) (23,MOVDBZ 6 6 24 22) (24,MOVDBZ 1 6 22 22) (25,MOVDBZ 2 4 26 26) (26,MOVDBZ 4 4 3 27) (27,MOVDBZ 3 3 28 26) (28,MOVDBZ 2 3 26 26) (29,HALT)
About two and a half years ago, a wave of interest in electronics swept across the Budapest office of Intentional. We desperately wanted to create something tangible from first principles. The idea we settled on was to design and eventually build a CPU that uses Brainfuck as its machine language.
Looking back, it really was a case of people with insufficient knowledge trying to use inappropriate tools. But damn if we didn't have fun during the process! After filling a couple of notebooks with sketches, we ended up with an initial design in Logisim. It had horrible timing problems, of course, since too much of it was asynchronous. Before ironing out all the wrinkles, though, I remember Maya pointing at one of the lines tangled up on the screen, and saying "You guys realise this single line will be 16 wires if we actually want to solder this together, right?" So basically we gave up on building the thing. Later on, Maya and Encsé went on to enroll to a bachelor's program in EE as a hobby; and I decided to stick to discrete logic, ordered a bunch of 7400 TTL's and some LEDs and seven-segment displays and started wiring together much simpler circuits on breadboards. I never got to soldering, not to mention getting access to anything that could produce PCB's, Then as I moved to Singapore, I left all my electronics stuff at home, and put the whole electronics thing on the backburner indefinitely.
Then, a couple months ago I discovered the Papilio FPGA platform, which has this collection of nice IO daughterboards (called "wings") that snap right into it, no soldering or even wiring required. I ordered one with the LogicStart IO board which features, among other, more advanced stuff, eight toggle switches and four seven-segment displays. Perfect for my baby steps into the world of FPGA's!
So what else could my Hello World project have had been, than the Brainfuck CPU.
We can use the Harvard architecture: since the Brainfuck language has no reflection capabilities, the program can be stored in ROM with no programmable access. Memory is implemented as a RAM of 32K 8-bit bytes. The CPU also has several internal registers:
- Internal state: As we will see, it takes several clock cycles for the CPU to execute a given Brainfuck opcode. Rather unfortunately, I ended up with 9 states, so I need 4 bits to store it.
- Program counter (PC): directly connected to the address pin of the ROM.
- Opcode register: loaded from the data pin of the ROM in the Fetch state.
- Memory pointer (idx): a 15-bit register that is directly connected to the address pin of the RAM.
- Data out register: for simplicity's sake, new values for RAM[idx], and the write enable leg of the RAM, are buffered
- Depth counter (DC): Used in the implementation of the [ and ] opcodes (see below)
Output is implemented by an 9-bit signal: 8 bits of data and an enable bit. When a . opcode is encountered, the CPU sets these 9 bits, and enters a special state until it receives an acknowledgment signal. Input is implemented similarily. On the actual board, the output signals are connected to the seven-segment display, the input signals are fed from the eight toggle switches, and the directional "mini-joystick" is used to acknowledge input/output.
Implementing [ and ]
Compared to a normal machine language, it's really just [ and ] that requires special handling. Everything else is just straightforward manipulation of either idx or RAM[idx] via incrementing/decrementing; or pushing data between RAM[idx] and the IO port. [ and ] are tricky because we need to search for their matching pairs, and pre-processing the Brainfuck program to attach pair addresses would be against the spirit of this (self-imposed) challange.
One solution would be to maintain a stack in a separate RAM, and push PC into it whenever a [ is encountered. In that case, ] is a simple matter of popping PC if RAM[idx] does not equal 0. However, here we've basically changed [/] from a while loop to a do while loop. So if RAM[idx] is 0 when we first enter the [, we have to scan the program forward to find its matching ].
For simplicity's sake, I decided not to worry about performance and skip the stack part, and just implement scanning in both directions. Scanning is where the DC register is used (explained here for [, but ] is similar): if the opcode is [ and RAM[idx] is 0, DC is set to 1, and the CPU enters a special skip-forward state from the next opcode. In this state, only [ and ] opcodes have any effect: they increment and decrement, respectively, the DC register. When DC gets to 0, we know we've found the matching ], and so we can go back to the regular fetch-execute cycle.
Lava → VHDL → Xilinx
I originally planned to implement the whole thing in VHDL, and compile that using the Xilinx synthesizer tools (since the Papilio One board uses a Xilinx FPGA chip). However, I've found VHDL to be quite a horrible language from a software programmer's point of view. The whole point, for me, of going from physical chips to FPGA's was to enable using abstractions to manage complexity; so why settle for a poor language like VHDL? Fortunately, there's a whole family of Haskell-embedded DSLs for hardware description called Lava. Of these, Kansas Lava seemed the only one actively maintained, and it already had support for a Xilinx dev board; so adding support for the Papilio was straightforward (see my kansas-lava-papilio package).
The complete code for my Brainfuck CPU (including IO via the LogicStart daughterboard) is available on GitHub. There are quite some rough edges left to file off; I'd say the most pressing is adding the ability to synthesize the program ROM separately from the CPU definition.
This first video shows a simple countdown (actually, count-up) program: ,[>+.<-]. I had to record these slightly out-of-focus, otherwise the seven-segment LEDs were hard to read.
Next up is "Hello world!":
This is a continuation of my previous post on register machines vs. Brainfuck programs. We left off at Brainfuck's supposed Turing-completeness.
Now, the most straightforward way to prove Turing-completeness of a given language is to write a compiler that takes a program written in a language that is already known to be Turing-complete, and creates a program written in the language to be proved Turing-complete, that simulates the original program. So an obvious way to prove that Brainfuck is a Turing-complete language is to compile register machine programs into Brainfuck. This has the added advantage that a programmer having some experience in real-world assembly programming can easily write register machine programs, which can then be compiled into (horribly inefficient and over-complicated, as we'll see) Brainfuck programs.
Important note: Of course, to really prove, in a mathematical sense, that Brainfuck is Turing-complete, we would first have to define formal operational semantics for register machines and Brainfuck programs to be even able to argue about simulating one with the another. In this post, I will appeal to intuition instead.Continue reading »
The Brainfuck programming language stays a somewhat current topic at the office ever since Maya's 9-digit program, so much so, that now we've even started designing our own Brainfuck computer using first principle logic gates only. But more on that later. Today and in my next post, I'm going to write about compiling register machines to Brainfuck.
The register machine is an abstract mathematical machine much like Turing machines or finite automata, and it can be shown to be Turing-complete. On the other hand, it models very closely how real-world computers operate. The program of a register machine (RM for short) is a labelled list of instructions, each operating on one of a finite number of memory cells called registers, holding values of natural numbers. The allowed instructions in the specific formulation that we'll be using here is:
- Increment (inc) r, Decrement (dec) r
- Clear (clr) r: Sets the value of register r to 0
- Copy (mov) rdest rsrc: Overwrite the contents of register rdest with the contents of register rsrc
- Jump to (jmp) label: Continue execution from the instruction marked with label
- Jump if (jz) r is zero to label: If the contents of register r is non-zero, continue execution as normal. If it is zero,
- Output (out) r, Input (in) r: Not strictly necessary for the theory to work out, it's convenient to include serial input/output in the model
Note that this set of operations is redundant in that both clr and mov can be easily implemented in terms of the others — they are included here only for clarity later on.
For example, to add the contents of register a to b, we can write the following program using a temporary register z:
- clr z
- jz a 7
- dec a
- inc z
- inc b
- jmp 2
- jz z 11
- dec z
- inc a
- jmp 7
The Haskell code for parsing register machine programs and an instantiator for a simple little macro language is available on GitHub.
Brainfuck is a joke programming language, one designed to be as much of a pain to use as possible, while also being powerful enough to solve real problems with it (languages like that are commonly called Turing tar-pits). The link above explains the intentionally minimalistic syntax and the semantics of the language in detail. The sketchy version is that there is a linear array of memory cells and a pointer moving on this array, and the operations can move the pointer and change the contents of the currently pointed cell:
- Increment cell (+), Decrement cell (-)
- Move pointer left (<) or right (>)
- Loop (): Repeatedly execute the statements between the [ and the ], as long as the value of the pointed cell is non-zero at the start of each iteration.
- Input (,), Output (.): These modify or print the contents of the currently pointed cell.
You can find my parser, interpreter and x86 compiler for Brainfuck here.
In the next post, we will prove that Brainfuck is Turing-complete by translating register machine programs into Brainfuck. As you can see, there are two differences between RM and Brainfuck: one is that RM uses random access for the registers, whereas Brainfuck can access only one register at a time; the other, more major one is that you can't change the order of execution arbitrarily in Brainfuck. This is why we will have to use some trickery to come up with a translation scheme.
If you're feeling impatient, you can, of course, take a peek in the meantime at my RM → Brainfuck compiler in GitHub.