NES Emulation – PPU Quick Start

Writing an NES emulator is a long process.  If you’re reading this you’ve likely finally been able to finish implementing the 6502, and now you just want to see some output!  While there is a lot of information generally available for the NES PPU, it can be difficult to identify the best point of entry to start developing this component.  The documentation dives into everything from cycle accuracy to scrolling, and it can be a bit daunting trying to dig in.  While these items are definitely all important, how do you simply start getting some semblance of a game running?  This guide doesn’t aim for anything near a perfect PPU implementation.  Instead, it explains the minimum components needed to start rendering and provides details on where to start.

The quickest path forward is to start with the implementation of rendered backgrounds.  Three main pieces are combined to make up a game’s background: nametables, pattern tables, and attribute tables.  The CPU also needs to know when it can write to the nametables, which is accomplished through the NMI interrupt.  We’ll describe each of these below and share examples of the expected output when each is rendered.

At a very high level, an NTSC PPU steps through 341 pixel-sized cycles (dots) on each of 262 scanlines per frame. Of these, 240 scanlines produce visible pixels. During these scanlines most games do not write to the PPU memory, as doing so can produce various unexpected (or expected, in some games that take advantage of this) effects. After the visible scanlines the PPU enters a vblank period, where it announces that it is not currently rendering to the screen and that a game can write to its memory.
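To make the frame timing concrete, here is a minimal sketch of a counter that tracks where the PPU is within a frame. The struct and member names are my own invention for illustration. It also assumes vblank covers scanlines 241 through 260 (per the NESdev timing diagram) and ignores the exact dot on which the flag changes, which a more accurate implementation would need.

```cpp
#include <cstdint>

// Hypothetical frame-position tracker for an NTSC PPU (all names are illustrative).
struct PpuTiming {
    static constexpr int kDotsPerScanline = 341;
    static constexpr int kScanlinesPerFrame = 262;
    static constexpr int kVisibleScanlines = 240;
    static constexpr int kVblankStartScanline = 241;  // assumption: taken from the NESdev timing diagram
    static constexpr int kPreRenderScanline = 261;    // assumption: vblank ends on the pre-render line

    int dot = 0;
    int scanline = 0;
    bool in_vblank = false;

    // Advance the PPU by a single dot, rolling over scanlines and frames.
    void Tick() {
        if (++dot < kDotsPerScanline) {
            return;
        }
        dot = 0;
        scanline = (scanline + 1) % kScanlinesPerFrame;
        // Simplification: the flag flips at the start of the scanline rather than on dot 1.
        in_vblank = scanline >= kVblankStartScanline && scanline < kPreRenderScanline;
    }

    // True while the PPU is on one of the 240 scanlines that produce pixels.
    bool OnVisibleScanline() const { return scanline < kVisibleScanlines; }
};
```

Calling Tick() three times per CPU cycle is the usual NTSC ratio, but that wiring is left out here to keep the sketch small.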

This diagram does a great job showing a visual representation of the rendering period: https://wiki.nesdev.org/w/index.php?title=File:Ntsc_timing.png

NMI Interrupt

If you’ve successfully gotten a game running against your 6502 you may notice you’re looping over a repetitive set of CPU opcodes. If you have not yet implemented the NMI interrupt, the game is likely waiting on an NMI that never arrives. NMI interrupts are controlled by 2 of the 8 bytes of CPU memory that map to the PPU registers.

0x2000 – PPU Controller – The CPU can write to this register. When the CPU wants to write to the PPU it can request an NMI interrupt. This is accomplished by writing 1 to bit 7 of this register.

0x2002 – PPU Status – When a vblank period has been entered this register indicates this by setting bit 7 of this register to 1.

When both of the described bits above are 1 an NMI interrupt should occur. Additional detail on the specific implementation details of NMI interrupts can be found at https://wiki.nesdev.org/w/index.php/NMI
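The check described above is small enough to sketch directly. The function and constant names here are hypothetical, and real hardware has edge cases (for example, reading 0x2002 clears the vblank bit) that this deliberately ignores.

```cpp
#include <cstdint>

// Bit 7 of the controller register (0x2000): the game has asked for an NMI on vblank.
constexpr std::uint8_t kNmiEnableBit = 0x80;
// Bit 7 of the status register (0x2002): the PPU is currently in vblank.
constexpr std::uint8_t kVblankBit = 0x80;

// Hypothetical helper: returns true when both bits described above are set.
bool ShouldTriggerNmi(std::uint8_t ppu_ctrl, std::uint8_t ppu_status) {
    return (ppu_ctrl & kNmiEnableBit) != 0 && (ppu_status & kVblankBit) != 0;
}
```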

Nametables

Once you’ve successfully implemented an NMI interrupt you should see the CPU start trying to write to 0x2006 and 0x2007. These registers allow the CPU to write data to the PPU’s memory and populate a game’s nametables. Nametables describe the state of the current background.  For our initial implementation, let’s forget about mirroring and focus just on rendering the current name table. 

Donkey Kong Title Screen

The PPU memory maps to 4 nametables stored from 0x2000 to 0x2fff (NOTE: this is PPU memory, and is separate from the 0x2000 in the CPU memory). The currently displayed nametable is selected by bits 0 and 1 of the controller register (0x2000 in the CPU memory). These bits allow us to determine from which nametable we should populate our background. Each nametable describes a 32×30 grid of tiles on the screen. Each byte in the applicable nametable indexes into the upcoming pattern tables.
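As a rough sketch of the addressing described above: four nametables spread evenly over 0x2000 to 0x2fff means each one starts 0x400 bytes after the previous one. The function names are made up for illustration, and mirroring is ignored as mentioned earlier.

```cpp
#include <cstdint>

constexpr int kTilesPerRow = 32;  // each nametable row holds 32 tile indexes

// Hypothetical helper: start of the nametable selected by bits 0 and 1 of the
// controller register (0x2000 in CPU memory).
std::uint16_t NametableBase(std::uint8_t ppu_ctrl) {
    return static_cast<std::uint16_t>(0x2000 + (ppu_ctrl & 0x03) * 0x400);
}

// Hypothetical helper: PPU address of the nametable byte for a given tile,
// where tile_row is 0..29 and tile_col is 0..31.
std::uint16_t NametableEntryAddress(std::uint8_t ppu_ctrl, int tile_row, int tile_col) {
    return static_cast<std::uint16_t>(NametableBase(ppu_ctrl) + tile_row * kTilesPerRow + tile_col);
}
```

Looping tile_row over 0..29 and tile_col over 0..31 and reading each address visits every tile on screen once.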

Donkey Kong Demo

When I started to test my name tables, I simply assigned a color to each byte and rendered to a 32X30 screen in SDL2. Output of this step is displayed. While it may not be the most attractive output, it’s easy to see how this can start making up our final title screen.


The following pages provide a good overview of where each nametable starts in the PPU memory:
https://wiki.nesdev.org/w/index.php/PPU_nametables
https://wiki.nesdev.org/w/index.php/PPU_memory_map

Pattern Tables

Pattern tables provide the detail that makes up the games we know and love. Accessing pattern tables is fairly easy. Pattern tables live in the CHR ROM obtained from the underlying cartridge. This data is loaded into the PPU memory from 0x0000 to 0x1fff.

Raw Pattern Tables
Donkey Kong – Incorrect Palette

There are two pattern tables. Each pattern table is 128×128 pixels, made up of a 16×16 grid of tiles. When I first started my implementation, I began by making sure I could render the raw pattern tables to a 256×128 SDL2 window. Each tile is 8×8 pixels and is described by 16 bytes. Each group of 16 bytes is divided into 2 sets of 8 bytes, and each byte is broken up into its 8 bits to define one row of the pattern.

Donkey Kong Title Screen

For example, if byte 0 is 01011010 and byte 8 is 11000011, these two bytes together form the first row of 8 pixels. Bit by bit, byte 8 supplies the high bit and byte 0 the low bit of a 2-bit index, which selects one of the four colors in the palette chosen by the upcoming attribute table. This example would produce the indexes 2, 3, 0, 1, 1, 0, 3, 2.
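Here is a small sketch of that bit combination, with a function name of my own choosing. It takes the two bytes for one row of a tile (byte N and byte N + 8) and returns the eight 2-bit color indexes, leftmost pixel first.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: combine the low and high bit planes for one tile row
// into eight 2-bit color indexes.
std::array<std::uint8_t, 8> DecodePatternRow(std::uint8_t low_plane, std::uint8_t high_plane) {
    std::array<std::uint8_t, 8> pixels{};
    for (int x = 0; x < 8; ++x) {
        const int bit = 7 - x;  // the leftmost pixel comes from the most significant bit
        const int low = (low_plane >> bit) & 1;
        const int high = (high_plane >> bit) & 1;
        pixels[x] = static_cast<std::uint8_t>((high << 1) | low);
    }
    return pixels;
}
```

Passing 0x5A (01011010) and 0xC3 (11000011) reproduces the example row above.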

Once you’re able to build the raw pattern table, you simply need to use the index provided in the nametable to select the correct tile in the pattern table. As mentioned, the nametables provide a grid of 32×30 bytes. Each byte represents an index to a tile in the pattern table. Each pattern table tile is 8×8 pixels. This provides our standard NES output screen of 256×240.
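Since each tile takes 16 bytes, turning a nametable byte into pattern table addresses is just a multiply and an add. The sketch below uses hypothetical names and takes the pattern table base (0x0000 or 0x1000) as a parameter, since which of the two tables backgrounds use is not covered in this guide.

```cpp
#include <cstdint>

// Addresses of the two bytes that describe one row of a tile.
struct PatternRowAddresses {
    std::uint16_t low_plane;
    std::uint16_t high_plane;
};

// Hypothetical helper: locate a tile row inside a pattern table.
// pattern_table_base is assumed to be 0x0000 or 0x1000; row_in_tile is 0..7.
PatternRowAddresses PatternRowAddress(std::uint16_t pattern_table_base,
                                      std::uint8_t tile_index,
                                      int row_in_tile) {
    const auto tile_start = static_cast<std::uint16_t>(pattern_table_base + tile_index * 16);
    PatternRowAddresses result;
    result.low_plane = static_cast<std::uint16_t>(tile_start + row_in_tile);
    result.high_plane = static_cast<std::uint16_t>(result.low_plane + 8);  // second set of 8 bytes
    return result;
}
```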

The following was helpful in building pattern tables: https://wiki.nesdev.org/w/index.php/PPU_pattern_tables

Attribute Tables

Donkey Kong

The last piece needed to build our background is the attribute table. Attribute tables sit in the last 64 bytes of memory in our applicable nametable. These 64 bytes form an 8×8 grid over the screen, so each byte carries the color information for a 4×4 block of tiles, split into four 2×2 tile quadrants.

Every two bits of an attribute table byte give the index of the color palette used for one of those quadrants. Palette colors are stored starting at 0x3f00 in the PPU memory.

Donkey Kong – Incorrect Pattern Table

Bits 0 and 1 – Top Left Nametable Tiles
Bits 2 and 3 – Top Right Nametable Tiles
Bits 4 and 5 – Bottom Left Nametable Tiles
Bits 6 and 7 – Bottom Right Nametable Tiles
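A minimal sketch of the attribute lookup, assuming the usual layout where the 64 attribute bytes start 0x3C0 bytes into the nametable (right after the 32×30 tile bytes) and each byte's two-bit fields run top left, top right, bottom left, bottom right from bit 0 upward. The function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: PPU address of the attribute byte covering a tile.
// Each attribute byte covers a 4x4 block of tiles, 8 bytes per row of blocks.
std::uint16_t AttributeAddress(std::uint16_t nametable_base, int tile_row, int tile_col) {
    return static_cast<std::uint16_t>(nametable_base + 0x3C0 + (tile_row / 4) * 8 + (tile_col / 4));
}

// Hypothetical helper: pick the 2-bit palette index for a tile out of its attribute byte.
std::uint8_t PaletteIndexFromAttribute(std::uint8_t attribute_byte, int tile_row, int tile_col) {
    // Bit 1 of the tile row/column says which half of the 4x4 block the tile is in.
    const int shift = ((tile_row & 2) ? 4 : 0) + ((tile_col & 2) ? 2 : 0);
    return static_cast<std::uint8_t>((attribute_byte >> shift) & 0x03);
}
```

For an attribute byte of 0xE4 (11100100), the four quadrants get palettes 0, 1, 2, and 3 respectively.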

Starting at 0x3f00 color palettes are defined with each color palette containing 4 bytes. Our palette index determined from our attribute table indexes into this memory space, and the index provided by the individual bits in our pattern tables indexes into 1 of the four colors of each palette.
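Putting the two indexes together is a one-liner. This is only a sketch with a made-up function name, and it skips the special handling real hardware gives to color 0 of each palette.

```cpp
#include <cstdint>

// Hypothetical helper: PPU address of the color for a background pixel.
// palette_index comes from the attribute table, pixel_value from the pattern table.
std::uint16_t PaletteColorAddress(std::uint8_t palette_index, std::uint8_t pixel_value) {
    return static_cast<std::uint16_t>(0x3F00 + (palette_index & 0x03) * 4 + (pixel_value & 0x03));
}
```

The byte stored at that address is itself an index into the NES master palette, which you would then map to an RGB value for SDL2.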

If you’re seeing colors that seem to be applied consistently but aren’t the expected colors, I’d recommend checking how you decode your pattern tables. The image above shows a similar situation, where the high and low bits making up each pattern table pixel were switched.

The PPU has a lot of complexity, which makes it difficult to determine how best to start tackling it. Hopefully this guide will help clear up some of the confusion. Now off to build sprite rendering!

Posted in Uncategorized | Leave a comment

My OSU Journey

I’ve been a student at OSU now for a little over 3 years, and it’s a bit of an odd feeling to realize that this will be my last course. Like most students in the post-bacc program I was frustrated with my previous career path and needed a change. While I think I could have moved onto a different career path without going back for another degree, I do think OSU provided the structured path to learn fundamentals that I was looking for.

Prior to my current position I worked for a public accounting firm in international tax, focusing mostly on improving engagement efficiency and developing models for our clients. A lot of my day-to-day was either pushing forward Excel/VBA models, or migrating existing models to other platforms. While this did provide plenty of opportunity to develop from a coding perspective (I migrated a ton of models to web-based CRUD applications), I was the most senior technology resource on my team at the time, and didn’t feel like I was developing my technical skillset as quickly as I’d like.

My decision to attend OSU happened right around the same time that one of the models I was developing was picked up to be moved under an enterprise development team. As I was tasked with managing the direction of the development with this offshore team I had a bit of imposter syndrome with my self-taught background. OSU did an excellent job in helping me develop my background on fundamentals, and really did help me feel more confident working with a highly technical team.

OSU also allowed me to network more directly with people who share my interests. Previously most of my professional network consisted of those in the accounting world, and with my background it was difficult to find interesting technical opportunities. Participating in the unofficial OSU Slack has allowed me to expand this network, and to find more interesting opportunities.

Just a few months ago networking paid off, and I found an interesting opportunity with a start-up through a classmate on Slack. I was eager to work in such an environment as opposed to simply moving on to work with another large company/development team. The opportunity was also perfect from a technical perspective as their engineering team is made up primarily of former FAANG employees. It seemed like the perfect opportunity for me, and I made the decision to leave the world of accounting.

So far, I’m very happy with the decision. Being able to concentrate purely on a technical role has been a breath of fresh air, and I’ve enjoyed the learning opportunities. The team members I’ve worked with have taught me a ton, and I feel like I’m growing as an engineer. I’m excited about where I’m currently driving my career, and I’m not sure I’d be in the same place without OSU.


6502 Addressing Modes

Our group has been developing an NES emulator, and we recently moved onto implementing the CPU. The NES CPU is a variation of the MOS Technology 6502, and contains 56 different instructions that can utilize 13 different addressing modes. The addressing mode tells an instruction which address or value to operate against. I’m currently diving deep into the addressing modes, so I wanted to detail my findings here.

The following details my notes on the 13 addressing modes, and how each operates:

Accumulator

These are instructions that operate against the current value in the accumulator. They require two cycles to complete.

Absolute, Absolute X, Absolute Y

With this addressing mode, two additional bytes are read after the op code address that make up a word. This word defines a specific address against which to operate. For Absolute X and Absolute Y, this address is additionally offset by the value in the X or Y index register. These instructions typically require 4 CPU cycles to complete. When indexing against the address, if adding the index causes the address to cross pages (changes the value of the high byte) then an additional cycle is required.
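As a sketch of the indexed case, the helper below builds the word from the two operand bytes (the 6502 stores the low byte first) and reports whether adding the index changed the high byte. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Result of resolving an indexed address, plus whether the extra cycle applies.
struct IndexedAddress {
    std::uint16_t address;
    bool page_crossed;
};

// Hypothetical helper: resolve Absolute X / Absolute Y. Pass 0 as the index
// for plain Absolute addressing.
IndexedAddress AbsoluteIndexed(std::uint8_t low_byte, std::uint8_t high_byte, std::uint8_t index) {
    const auto base = static_cast<std::uint16_t>(low_byte | (high_byte << 8));
    const auto address = static_cast<std::uint16_t>(base + index);  // wraps at 0xFFFF
    return {address, (base & 0xFF00) != (address & 0xFF00)};
}
```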

Immediate

With this addressing mode 1 additional byte is read following the instruction. This byte represents a specific value against which to operate. This addressing mode requires 2 cycles.

Implied

The implied addressing mode basically means there is no addressing mode. These are instructions whose target is implied by the instruction itself, such as a register or a processor flag. These are op codes like BRK and those that change processor flags directly. Most instructions using this addressing mode require 2 cycles to complete.

Indirect

These instructions return a value located at a specific address. When this addressing mode is run, a word is read following the instruction which represents a memory location. This addressing mode then reads the value at that location and returns this value. Indirect requires 5 cycles.

Indirect X, Indirect Y

These are similar to the previous indirect, except a single byte is read following the instruction. This byte is an address in the zero page. For Indirect X, the X register is added to this byte without a carry (the result wraps within the zero page), and the word located at the calculated address is returned. For Indirect Y, the word is read from the zero page address first, and the Y register is then added to that word, which can carry into the high byte. Indirect X uses 6 CPU cycles, while Indirect Y uses 5. If Indirect Y carries it uses 6 CPU cycles.
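The difference between the two modes is easier to see in code. This sketch assumes memory is a flat 64 KiB vector, which is my own simplification, and all names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

using Memory = std::vector<std::uint8_t>;  // assumption: flat 64 KiB address space

// Reads a little-endian word whose two bytes both come from the zero page.
std::uint16_t ReadZeroPageWord(const Memory& memory, std::uint8_t zp_address) {
    const std::uint8_t low = memory[zp_address];
    const std::uint8_t high = memory[static_cast<std::uint8_t>(zp_address + 1)];  // wraps at 0xFF
    return static_cast<std::uint16_t>(low | (high << 8));
}

// Indirect X: add X to the operand first (no carry), then read the word.
std::uint16_t IndirectX(const Memory& memory, std::uint8_t operand, std::uint8_t x) {
    return ReadZeroPageWord(memory, static_cast<std::uint8_t>(operand + x));
}

struct IndirectYResult {
    std::uint16_t address;
    bool page_crossed;  // true when adding Y carried into the high byte
};

// Indirect Y: read the word first, then add Y to it (carry allowed).
IndirectYResult IndirectY(const Memory& memory, std::uint8_t operand, std::uint8_t y) {
    const std::uint16_t base = ReadZeroPageWord(memory, operand);
    const auto address = static_cast<std::uint16_t>(base + y);
    return {address, (base & 0xFF00) != (address & 0xFF00)};
}
```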

Relative

This addressing mode is used for branch operations. When used, a single byte following the instruction is read. This byte represents a signed integer that offsets from the current program counter. A branch that is not taken requires 2 cycles. A taken branch requires 3 cycles, and if the offset causes the address to cross into a new page an additional cycle is required (for 4 total).
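The signed offset is the part that tends to trip people up, so here is a small sketch. It assumes the program counter passed in already points past the operand byte (the address of the next instruction), and the names are hypothetical.

```cpp
#include <cstdint>

struct BranchTarget {
    std::uint16_t address;
    bool page_crossed;
};

// Hypothetical helper: compute where a taken branch lands.
// pc_after_operand is the address of the instruction following the branch.
BranchTarget RelativeTarget(std::uint16_t pc_after_operand, std::uint8_t operand) {
    const auto offset = static_cast<std::int8_t>(operand);  // reinterpret as -128..127
    const auto address = static_cast<std::uint16_t>(pc_after_operand + offset);
    return {address, (pc_after_operand & 0xFF00) != (address & 0xFF00)};
}
```

An operand of 0xFB is -5, so a branch at the top of a page can land on the previous page.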

Zeropage, Zeropage X, Zeropage Y

This addressing mode targets addresses in the first 256 bytes of memory (the zero page), and is very quick to run. When run, a byte is read following the instruction representing an offset into the zero page. Zeropage X and Y additionally add the X or Y index (as applicable) to this offset. If the addition of the index causes a carry, the carry is dropped and the returned address instead just wraps back around to start at zero. These modes are mainly used to access frequently referenced variables. Zeropage requires 3 CPU cycles, while Zeropage X and Zeropage Y require 4.
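The wraparound falls out naturally if the addition is done in 8 bits. A minimal sketch, with a function name of my own:

```cpp
#include <cstdint>

// Hypothetical helper: resolve Zeropage X / Zeropage Y. Pass 0 as the index
// for plain Zeropage addressing. The carry is dropped by truncating to 8 bits.
std::uint16_t ZeroPageIndexed(std::uint8_t operand, std::uint8_t index) {
    return static_cast<std::uint8_t>(operand + index);
}
```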

Understanding and implementing these addressing modes is key to pushing forward on the opcodes themselves. Successfully implementing and testing various scenarios around addressing modes should make moving onto the opcodes a much easier exercise.

This documentation is sourced primarily from 6502 Instruction Set (masswerk.at).


Integrating Emscripten and CMake with VS Code

In planning for the development of our NES emulator, we decided to target WASM for our interface and to utilize CMake to generate our build system.  As I also utilize VS Code as my IDE, I wanted to integrate these components in such a way as to ease my development process.  I’m hoping this documentation will help others in setting up their environment.  This documentation is focused on setup for a Linux environment, though I would imagine a lot of the VS Code configuration steps would translate to other environments.

Install Dependencies

In addition to installing VSCode and any other development tools you may need, you’ll also need to install CMake and Emscripten.  I was able to install an appropriate version of CMake through my package manager:  

apt-get install cmake

Installing Emscripten was a bit more difficult, though well documented.  I followed the instructions at https://emscripten.org/docs/getting_started/downloads.html.  In summary:  


# Get the emsdk repo
git clone https://github.com/emscripten-core/emsdk.git

# Enter that directory
cd emsdk

# Fetch the latest version of the emsdk (not needed the first time you clone)
git pull

# Download and install the latest SDK tools.
./emsdk install latest

# Make the "latest" SDK "active" for the current user. (writes .emscripten file)
./emsdk activate latest

# Activate PATH and other environment variables in the current terminal
source ./emsdk_env.sh


Install VSCode Extensions

I’m using a number of VS Code extensions to ease my development process.  The following extensions will need to be installed: 
– C/C++ – For language support. 
– CMake Tools – Provides CMake language support and extends features into the IDE UI.
– C++ TestMate Legacy – Adds testing to the side bar, supports Catch2.  I had issues with the latest version, but the legacy version worked with no issues.


Setup CMake Project

The exact structure and contents of CMake files will vary by project, but my project is set up as follows.  The structure was based largely on the example here: https://cmake.org/examples/

– root
– CMakeLists.txt
– include – Project Header Files
– src
– wasm – CMakeLists.txt and Emscripten specific files.  The CMake build statements here are wrapped in an if statement and will only execute when CMake is targeting Emscripten.
– emulator – CMakeLists.txt and all emulator library files.
– tests – CMakeLists.txt and various unit test files.  The CMake build statements here are wrapped in an if statement and will only execute when CMake is not targeting Emscripten.

Using either the demo example or pulling our repo https://github.com/ericcolvinmorgan/NESEmulation directly, you should now be able to build using CMake.  Run the following to build:

mkdir build && cd build && cmake .. && cmake --build .

If you’re able to successfully build via the command line, you should also now see a CMake extension in the sidebar and your project outline should be included here.

Setup Emscripten Environment

The next step is getting everything set up so IntelliSense works and you can easily switch between Emscripten and another compiler.  I set up my environment so CMake provides IntelliSense directly to VS Code.  This feature is provided by the C++ extension and is enabled via the configurationProvider setting.  A full reference can be found at https://code.visualstudio.com/docs/cpp/c-cpp-properties-schema-reference.  This setting should be defined as follows in the .vscode/c_cpp_properties.json file:

"configurationProvider": "ms-vscode.cmake-tools"

You may still see IntelliSense errors when you first make this change, or when compiling using different kits.  You should see the IntelliSense update appropriately when you build through the CMake extension.

Finally, while there are a number of preconfigured kits available by default, we’ll need to define a CMake kit specific to Emscripten.  https://vector-of-bool.github.io/docs/vscode-cmake-tools/kits.html does a good job walking through what kits are and how to set them up.  This process is definitely more straightforward than it sounds, and Emscripten makes it incredibly easy by providing a toolchain file as part of the SDK.  This file can be viewed at https://github.com/emscripten-core/emscripten/blob/main/cmake/Modules/Platform/Emscripten.cmake.  As mentioned directly in the file, the toolchain file “teaches CMake about the Emscripten compiler, so that CMake can generate makefiles from CMakeLists.txt that invoke emcc.”  The final kit definition used is as follows.  The paths will need to be updated to match the location where you installed your emsdk in the instructions above:

[
  {
    "name": "Emscripten",
    "toolchainFile": "/usr/local/lib/emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake",
    "compilers": {
      "C": "/usr/local/lib/emsdk/upstream/emscripten/emcc",
      "CXX": "/usr/local/lib/emsdk/upstream/emscripten/em++"
    }
  }
]

If everything is set up correctly you should now be able to toggle your active kit to Emscripten in the status bar at the bottom of VS Code.  Clicking build should then compile using the emsdk.  These steps will make it very easy to compile the project to WASM when we’re ready to test in a browser, and toggle to a different compiler when testing against our test suite.  Setting up an unfamiliar environment is always a time-consuming process.  Hopefully this documentation will help save someone a few hours when building a similar environment.


Exploring MarI/O

During the holidays (while exploring various potential projects for this course) I dug a bit into some of the various neural networks being used to play video games.  A few examples I enjoyed reading follow:

A Neuroevolution Approach to General Atari Game Playing (http://www.cs.utexas.edu/users/pstone/Papers/bib2html-links/TCIAIG13-mhauskn.pdf)

Playing Atari with Deep Reinforcement Learning (https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)

SethBling’s MarI/O (https://www.youtube.com/watch?v=qv6UVOQ0F44)

While all are great, I’m going to focus specifically on some of my notes/thoughts while exploring MarI/O’s input layer.  MarI/O is an implementation of a genetic algorithm used to play Super Mario World (“SMW”) for the SNES.  While digging into this implementation, I discovered the fascinating community surrounding SNES ROM hacking.

For SMW specifically, there exists a community called SMW Central (https://www.smwcentral.net/).  This community consists of dedicated enthusiasts who create and share various hacks of the original ROM.  It has created and maintains very detailed technical specifications of SMW, and freely discusses and shares detail for other games as well.  This resource was invaluable in digging into the inputs for MarI/O.

The following is a list of memory locations utilized by MarI/O for SMW to create the inputs:

Variable | Memory Location (per MarI/O) | Description (from SMW Central – https://www.smwcentral.net/?p=nmap&m=smwram)
getPositions – marioX | 0x94 | $7E0094 – Player X position (16-bit) within the level, next frame (calculates player position one frame ahead, as opposed to $7E:00D1). It’s also used as a player X position on-screen on the overworld border.
getPositions – marioY | 0x96 | $7E0096 – Player Y position (16-bit) within the level, next frame (calculates player position one frame ahead, as opposed to $7E:00D3). It’s also used as a player Y position on-screen on the overworld border.
getPositions – layer1x | 0x1A | $7E1462 – Layer 1 X position, current frame. Mirror of SNES register $210D.
getPositions – layer1y | 0x1C | $7E1464 – Layer 1 Y position, current frame. Mirror of SNES register $210E.
getTile – return | 0x1C800 | $7FC800 – Map16 high byte table. Same format as $7E:C800. $7F:FFF8 through $7F:FFFD are also used by Lunar Magic’s title screen recording ASM.
getSprites – status | 0x14C8 | $7E14C8 – Sprite status table: #$00 = Free slot, non-existent sprite. #$01 = Initial phase of sprite. #$02 = Killed, falling off screen. #$03 = Smushed. Rex and shell-less Koopas can be in this state. #$04 = Killed with a spinjump. #$05 = Burning in lava; sinking in mud. #$06 = Turn into coin at level end. #$07 = Stay in Yoshi’s mouth. #$08 = Normal routine. #$09 = Stationary / Carryable. #$0A = Kicked. #$0B = Carried. #$0C = Powerup from being carried past goaltape. States 08 and above are considered alive; sprites in other states are dead and should not be interacted with.
getSprites – spritex | 0xE4 | $7E00E4 – Sprite X position, low byte.
getSprites – spritex | 0x14E0 | $7E14E0 – Sprite X position, high byte.
getSprites – spritey | 0xD8 | $7E00D8 – Sprite Y position, low byte.
getSprites – spritey | 0x14D4 | $7E14D4 – Sprite Y position, high byte.
getExtendedSprites – number | 0x170B | $7E170B – Extended sprite number. Last two bytes reserved for fireballs.
getExtendedSprites – spritex | 0x171F | $7E171F – Extended sprite X position, low byte. Last two bytes reserved for fireballs.
getExtendedSprites – spritex | 0x1733 | $7E1733 – Extended sprite X position, high byte. Last two bytes reserved for fireballs.
getExtendedSprites – spritey | 0x1715 | $7E1715 – Extended sprite Y position, low byte. Last two bytes reserved for fireballs.
getExtendedSprites – spritey | 0x1729 | $7E1729 – Extended sprite Y position, high byte. Last two bytes reserved for fireballs.

While the list is fairly straightforward, digging into the “why” behind each of these was very interesting.  Let’s start with the inputs from the getPositions function.  This RAM maps to Mario’s X and Y coordinates on the screen.  These are used to drive the fitness within the algorithm, to help determine the Map16 tiles that need to be pulled from memory for purposes of building input tiles, and to determine Mario’s distance from enemies.  It doesn’t appear as if the layer1 variables are ultimately utilized in any of the calculations.

The getTile variable was probably the most confusing for me to find specific detail around.  This variable maps to the Map16 high byte.  Thanks to the resources at SMW Central I was able to find Lunar Magic, which helped clear up a lot of my confusion around how this memory was being used.  In general, a Map16 high byte of 01 in SMW maps to tiles that are solid.  These can include everything from ground tiles, question mark blocks, and slopes to (unfortunately for the Mario character being controlled by this implementation) even spikes.

Map16 Tiles – Example 1 – The tiles in this tilemap seem like they would generally work fairly well. 🙂
Map16 Tiles – Example 2 – The tiles in this tilemap have tiles that would kill Mario. 🙁

These inputs are all brought together in the getInputs function to ultimately drive the creation of a feature matrix within the neural network representing small areas of the screen.  All features are initially assigned a value of 0, and are then assigned a value of 1 if there is a solid tile in that particular area, or a value of -1 if an enemy (sprite) or obstacle (extended sprite) exists in that particular area.
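To illustrate the idea (this is not MarI/O’s actual code, which is written in Lua), here is a rough sketch of how such a feature grid could be filled. The grid dimensions, the Cell type, and the function name are all my own assumptions, and sprites are assumed to override solid tiles when both fall in the same cell.

```cpp
#include <cstddef>
#include <vector>

// A position in the feature grid (hypothetical type for this sketch).
struct Cell {
    int row;
    int col;
};

// Hypothetical sketch: every cell starts at 0, solid tiles become 1, and
// cells holding a sprite or extended sprite become -1.
std::vector<int> BuildInputs(int rows, int cols,
                             const std::vector<Cell>& solid_tiles,
                             const std::vector<Cell>& sprites) {
    std::vector<int> inputs(static_cast<std::size_t>(rows) * cols, 0);
    auto set_cell = [&](const Cell& cell, int value) {
        // Ignore anything that falls outside the area around the player.
        if (cell.row >= 0 && cell.row < rows && cell.col >= 0 && cell.col < cols) {
            inputs[static_cast<std::size_t>(cell.row) * cols + cell.col] = value;
        }
    };
    for (const Cell& cell : solid_tiles) set_cell(cell, 1);
    for (const Cell& cell : sprites) set_cell(cell, -1);  // assumption: sprites win over tiles
    return inputs;
}
```

Notice that nothing here distinguishes one kind of solid tile from another, which is exactly the limitation discussed next.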

While it isn’t lost on me that MarI/O was created mostly to create an awesome YouTube video, and I shouldn’t be too critical, using this input feature selection wouldn’t work for a large number of the game’s levels.  Simply assigning a weight to a tile based on the high byte loses a lot of information that could be provided to the neural network, and will lead to inconsistent results across various tilesets.

For example, as discussed above, the example 2 tileset contains spikes with a high byte of 01. The image below shows an area where this would likely lead to issues for MarI/O. Based on how the current inputs are utilized, the spikes would be assigned a weight of 1.

The spikes on this level wouldn’t be recognized as being different than other ground tiles.

Ultimately, I’d like to pick up and continue to explore this (and other) implementations for ML in gaming. This examination led me down a path of exploring emulation, and I’m excited to dig in further by building an NES emulator as a project for this course.
