Disambiguating and Disassembling the Space Invaders ROM


[Image: Obi-Wan Kenobi meme captioned "visible confusion"]

Dis-a-what? Simply put, a disassembler is a program that translates a stream of machine code bytes (usually displayed as hex numbers) back into assembly language source code. The disassembler will allow us to familiarize ourselves with the 8080 instruction set. It might also be useful for debugging later on in this project!

The Space Invaders ROM is available online as 4 separate files: invaders.e, invaders.f, invaders.g, and invaders.h. Each file holds one portion of the ROM image that the 8080 CPU reads its instructions from while the game runs. When these files are combined in the correct order, we get the full Space Invaders ROM!
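Here is a minimal Python sketch of the combining step. The function name combine_rom and the list ROM_PARTS are made up for illustration, and the h, g, f, e order is the one commonly used for this ROM set; treat it as an assumption and check it against your own copy of the files.

```python
from pathlib import Path

# Assumed order: invaders.h holds the lowest addresses, invaders.e the highest.
ROM_PARTS = ["invaders.h", "invaders.g", "invaders.f", "invaders.e"]


def combine_rom(part_paths, out_path):
    """Concatenate the ROM chunks, in the order given, into a single file.

    Returns the total number of bytes written.
    """
    data = b"".join(Path(p).read_bytes() for p in part_paths)
    Path(out_path).write_bytes(data)
    return len(data)
```

Calling combine_rom(ROM_PARTS, "invaders.rom") from the directory that holds the four files should produce the combined invaders.rom.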

Let’s perform a hexdump of the newly created invaders.rom file and see what we’re working with:
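If you don’t have a hexdump tool handy, a few lines of Python will do. This is only a sketch: the hexdump function below is a hypothetical helper, and the sample input is limited to the first six bytes of the ROM that are discussed in this post.

```python
def hexdump(data, width=16):
    """Return one text line per `width` bytes: the offset, then the bytes in hex."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{offset:04x}  {hex_bytes}")
    return lines


# The first six bytes of the ROM, as discussed in the text.
for line in hexdump(bytes([0x00, 0x00, 0x00, 0xC3, 0xD4, 0x18])):
    print(line)  # prints "0000  00 00 00 c3 d4 18"
```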

Ok, this looks like something! The first 3 bytes (addresses 0x00 through 0x02) are 00, the 8080 NOP opcode (don’t do anything and proceed to the next instruction). Now we get to the first byte with some useful information: address 0x03 has the value c3. Looking at the 8080 datasheet, we can see that 0xc3 translates to “JMP <addr>”. This is a jump instruction! But what does this mean?

Following the byte indicating a JMP, the next two bytes (<addr>) indicate a destination address in the ROM, which contains the next instruction to be executed. From a low-level programming perspective, JMP causes the CPU to load <addr> into the Program Counter (PC), which is a special register that keeps track of the next instruction to be executed.

As each instruction in the program is executed, the PC is consulted and updated accordingly. For most instructions, the PC simply advances by the length of the instruction that was just fetched (instructions on the 8080 can be 1, 2, or 3 bytes in length). In the case of a JMP instruction, a constant is loaded into the PC instead, indicating a specific address. The PC is responsible for program flow: it always holds the address of the next instruction to execute.
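To make the two kinds of PC update concrete, here is a tiny sketch. The next_pc helper and its jump_target parameter are hypothetical names, not part of any real emulator; the 0xFFFF mask reflects the 8080’s 16-bit program counter.

```python
def next_pc(pc, length, jump_target=None):
    """Return the PC after one instruction.

    `length` is the size of the instruction at `pc` (1, 2, or 3 bytes).
    If the instruction is a jump, pass its destination as `jump_target`.
    """
    if jump_target is not None:
        # A jump overwrites the PC with the destination address.
        return jump_target & 0xFFFF
    # Anything else just steps past the instruction that was fetched.
    return (pc + length) & 0xFFFF


pc = 0
for _ in range(3):        # three 1-byte NOPs
    pc = next_pc(pc, 1)
print(pc)                 # prints 3
```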

Ok, back to the JMP instruction. We see that bytes 0x03 through 0x05 contain hex data: c3 d4 18. Seems easy enough, 0xc3 is JMP, so we just jump to address 0xd418 right? Not so fast!

https://miro.medium.com/max/1000/0*at3oEghVoPfpwVhh.jpg
Do you break your egg the right way or are you a heathen?

In the 1726 novel Gulliver’s Travels, a ship surgeon is stranded on an island inhabited by a race of diminutive people called the Lilliputians. The Lilliputians are feuding with each other and have split into two factions: the Little-Endians and the Big-Endians. The Big-Endians believe that a hard-boiled egg should always be broken and peeled from the bigger side of the egg before it’s consumed; it’s how the Lilliputians historically ate their eggs. The Little-Endians, on the other hand, think that you’re a brute and a traitor if you break your egg on the large side. Eggs should be broken from the little side prior to eating, as decreed by the king and performed by Lilliputians of culture! Gulliver gets caught up in this feud and eventually needs to decide which side he stands on, but that’s not important. What’s important is that this terminology is used in the present day to explain two different kinds of CPUs.

Perhaps you’ve heard of little-endian vs. big-endian CPUs. The distinction between the two is important for understanding why certain CPUs work the way they do. Big-endian is a byte order in which the “big end” (the most significant byte of a multi-byte value) is stored first, at the lowest storage address.

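For example, the 16-bit value 0x1234 is stored in big-endian order as the bytes 12 34. In little-endian order, the most significant byte is stored last, at the highest storage address, so the same value is stored as 34 12. A single byte is unaffected: 64 = 0100 0000 is 0x40 either way, because endianness reorders whole bytes, not the bits or hex digits inside a byte. The 8080 is an 8-bit processor with 16-bit addresses, so this concept only applies to 16-bit values such as the address that follows a JMP. Notice that the opcode still comes first: the data is structured c3 d4 18, and only the two address bytes (d4 18) are subject to byte order.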
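Python’s built-in int.to_bytes makes the difference between the two byte orders easy to see. The value 0x1234 below is an arbitrary example, not something taken from the ROM.

```python
value = 0x1234  # a 16-bit value: high byte 0x12, low byte 0x34

big = value.to_bytes(2, byteorder="big")        # most significant byte first
little = value.to_bytes(2, byteorder="little")  # least significant byte first

print(big.hex(" "))     # prints "12 34"
print(little.hex(" "))  # prints "34 12"

# A single byte has no byte order to speak of: 64 is 0x40 either way.
assert (64).to_bytes(1, "big") == (64).to_bytes(1, "little") == b"\x40"
```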

It turns out that the Intel 8080 is a little-endian CPU, so 16-bit values are stored with the least significant byte first. c3 d4 18 actually translates to JMP 0x18d4. The goal of our team’s disassembler will be to print each instruction with its address written in the conventional most-significant-byte-first form (0x18d4 rather than d4 18). This isn’t strictly necessary, but it makes the disassembler’s output easier to read.
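Decoding the JMP operand comes down to one shift and one OR. In this sketch, read_addr is a hypothetical helper name and rom_start holds only the six ROM bytes quoted above.

```python
def read_addr(memory, pc):
    """Decode the 16-bit little-endian operand that follows the opcode at `pc`."""
    low = memory[pc + 1]    # least significant byte is stored first
    high = memory[pc + 2]   # most significant byte is stored second
    return (high << 8) | low


rom_start = bytes([0x00, 0x00, 0x00, 0xC3, 0xD4, 0x18])
target = read_addr(rom_start, 3)
print(f"JMP ${target:04x}")  # prints "JMP $18d4"

# The standard library agrees with the manual shift-and-or.
assert target == int.from_bytes(rom_start[4:6], byteorder="little")
```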

Ok so how do we actually write the disassembler?

It’s actually fairly simple:

1.) We open a file of compiled 8080 code (the invaders.rom file).

2.) Read it into a memory buffer.

3.) Loop through the buffer until we reach the end.

4.) Based on the current hexadecimal byte, we print the assembly language equivalent and advance the PC by either 1, 2, or 3 bytes. Keep in mind that when we read an instruction like JMP, we advance the PC by 3 for the disassembler; we’re not actually running the game yet, we’re just trying to output the binary as readable assembly code!

5.) Return from the disassembler function once all the instructions are printed!
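The five steps above could be sketched in Python roughly as follows. This is not our team’s actual implementation: the function names are made up, the output format is one possible choice, and the opcode table is deliberately partial (a real disassembler needs all 256 entries from the datasheet). Bytes that are not in the table are printed as raw data with a DB directive.

```python
import sys

# Partial opcode table (assumption: only a handful of 8080 opcodes are listed).
# Each entry is (format template, instruction length in bytes).
OPCODES = {
    0x00: ("NOP", 1),
    0x06: ("MVI B,${:02x}", 2),
    0x21: ("LXI H,${:04x}", 3),
    0x31: ("LXI SP,${:04x}", 3),
    0x3E: ("MVI A,${:02x}", 2),
    0xC3: ("JMP ${:04x}", 3),
    0xC9: ("RET", 1),
    0xCD: ("CALL ${:04x}", 3),
}


def disassemble_one(buf, pc):
    """Return (text, length) for the instruction starting at buf[pc]."""
    opcode = buf[pc]
    entry = OPCODES.get(opcode)
    if entry is None or pc + entry[1] > len(buf):
        # Unknown opcode, or operand bytes run past the end: emit a raw byte.
        return f"DB ${opcode:02x}", 1
    template, length = entry
    if length == 1:
        return template, 1
    if length == 2:
        return template.format(buf[pc + 1]), 2
    # 3-byte instruction: the 16-bit operand is stored little-endian.
    operand = (buf[pc + 2] << 8) | buf[pc + 1]
    return template.format(operand), 3


def disassemble(buf):
    """Steps 3 and 4: walk the buffer, advancing pc by each instruction's length."""
    lines = []
    pc = 0
    while pc < len(buf):
        text, length = disassemble_one(buf, pc)
        lines.append(f"{pc:04x}  {text}")
        pc += length
    return lines


def main(path):
    """Steps 1, 2 and 5: open the file, read it into a buffer, print, return."""
    with open(path, "rb") as f:
        buf = f.read()
    for line in disassemble(buf):
        print(line)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a name of your choosing and run with invaders.rom as its only argument, this would print one line per instruction, starting with three NOPs and then JMP $18d4 at address 0003.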

Tune in next week, where we’ll mull over how to actually implement the 8080 instructions needed to run Space Invaders!

