6502 Addressing Modes

Our group has been developing an NES emulator, and we recently moved on to implementing the CPU. The NES CPU is a variation of the MOS Technology 6502, which has 56 different instructions that can use 13 different addressing modes. An addressing mode tells an instruction which address or value to operate on. I’m currently diving deep into the addressing modes, so I wanted to detail my findings here.

The following are my notes on the 13 addressing modes and how each operates:

Accumulator

These are instructions that operate on the current value in the accumulator. They require two cycles to complete.

Absolute, Absolute X, Absolute Y

With this addressing mode, two additional bytes are read after the op code, low byte first, and together they make up a word. This word defines a specific address against which to operate. The address can additionally be offset by the value in the X or Y index register. Instructions that read from the address, such as LDA, require 4 CPU cycles to complete. When indexing, if adding the index to the absolute address causes it to cross pages (changes the value of the high byte), then an additional cycle is required.
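As a rough illustration of the address calculation and the page-cross check described above, here is a minimal C++ sketch. The names AbsoluteResult and ResolveAbsolute are hypothetical and are not taken from our emulator; the sketch assumes the two operand bytes have already been fetched.

```cpp
#include <cstdint>

// Hypothetical result type: the effective address plus a page-cross flag.
struct AbsoluteResult {
    uint16_t address;   // effective address after indexing
    bool page_crossed;  // true when indexing changed the high byte
};

// lo and hi are the two bytes that follow the op code (low byte first).
// index is 0 for plain Absolute, or the X/Y register for the indexed forms.
inline AbsoluteResult ResolveAbsolute(uint8_t lo, uint8_t hi, uint8_t index) {
    const uint16_t base = static_cast<uint16_t>(lo | (hi << 8));
    // The cast keeps the sum inside the 16-bit address space.
    const uint16_t address = static_cast<uint16_t>(base + index);
    const bool crossed = (base & 0xFF00) != (address & 0xFF00);
    return {address, crossed};
}
```

A caller could add one cycle to the base cost of a read instruction whenever page_crossed is set.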

Immediate

With this addressing mode, 1 additional byte is read following the instruction. This byte is the value against which to operate, rather than an address. This addressing mode requires 2 cycles.

Implied

The implied addressing mode basically means there is no operand to address; the target is implied by the instruction itself. These are op codes like BRK and those that change processor flags directly. Most of these instructions require 2 cycles to complete, though some take longer (BRK, for example, takes 7).

Indirect

This addressing mode looks up its target address in memory, and is only used by JMP. When this addressing mode is run, a word is read following the instruction which represents a memory location. The word stored at that location is then read and used as the address to jump to. Indirect requires 5 cycles.
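The lookup can be sketched in C++ as follows. Memory is a hypothetical flat 64 KiB array standing in for the emulator's bus, and ResolveIndirect is an illustrative name. The sketch also models a well-known quirk of the original 6502 that the notes above do not cover: when the pointer sits on the last byte of a page, the high byte is fetched from the start of the same page rather than from the next page.

```cpp
#include <array>
#include <cstdint>

// Hypothetical flat address space; a real emulator would route reads through a bus.
using Memory = std::array<uint8_t, 0x10000>;

// pointer is the word that follows the JMP op code.
inline uint16_t ResolveIndirect(const Memory& mem, uint16_t pointer) {
    const uint8_t lo = mem[pointer];
    // Only the low byte of the pointer is incremented, so $xxFF wraps to $xx00.
    const uint16_t hi_addr =
        static_cast<uint16_t>((pointer & 0xFF00) | ((pointer + 1) & 0x00FF));
    const uint8_t hi = mem[hi_addr];
    return static_cast<uint16_t>(lo | (hi << 8));
}
```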

Indirect X, Indirect Y

These are similar to the previous indirect mode, except a single byte is read following the instruction, and this byte is an address in the zero page. With Indirect X, the X register is added to that zero page address first (wrapping within the zero page, without a carry), and the word located at the resulting location is the address to operate against. With Indirect Y, the word is read from the zero page location first, and the Y register is then added to it, which can carry into the high byte. Indirect X uses 6 CPU cycles, while Indirect Y uses 5. If Indirect Y carries (crosses a page), it uses 6 CPU cycles.
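A minimal C++ sketch of the two calculations follows, assuming the conventional 6502 behavior in which the word read from the zero page is itself the address of the operand. Memory, ReadZeroPageWord, ResolveIndirectX, and ResolveIndirectY are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical flat address space; a real emulator would route reads through a bus.
using Memory = std::array<uint8_t, 0x10000>;

// Reads a little-endian word from the zero page; the pointer wraps at $FF.
inline uint16_t ReadZeroPageWord(const Memory& mem, uint8_t zp) {
    const uint8_t lo = mem[zp];
    const uint8_t hi = mem[static_cast<uint8_t>(zp + 1)];
    return static_cast<uint16_t>(lo | (hi << 8));
}

// Indirect X: add X to the operand (no carry), then read the target address.
inline uint16_t ResolveIndirectX(const Memory& mem, uint8_t operand, uint8_t x) {
    return ReadZeroPageWord(mem, static_cast<uint8_t>(operand + x));
}

struct IndirectYResult {
    uint16_t address;   // effective address
    bool page_crossed;  // true when adding Y carried into the high byte
};

// Indirect Y: read the base address from the zero page, then add Y to it.
inline IndirectYResult ResolveIndirectY(const Memory& mem, uint8_t operand, uint8_t y) {
    const uint16_t base = ReadZeroPageWord(mem, operand);
    const uint16_t address = static_cast<uint16_t>(base + y);
    return {address, (base & 0xFF00) != (address & 0xFF00)};
}
```

The page_crossed flag is what would trigger the extra cycle for Indirect Y.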

Relative

This addressing mode is used for branch operations. When used, a single byte following the instruction is read. This byte represents a signed integer that offsets from the program counter, which at that point already refers to the next instruction. A branch requires 2 cycles when it is not taken and 3 cycles when it is taken. If a taken branch crosses into a new page, an additional cycle is required (for 4 total).
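The branch calculation can be sketched in C++ like this. BranchResult and ResolveBranch are hypothetical names, and the cycle counts follow the commonly documented 6502 timing: 2 when the branch is not taken, 3 when it is taken, and 4 when a taken branch lands on a different page.

```cpp
#include <cstdint>

struct BranchResult {
    uint16_t target;  // program counter after the branch instruction completes
    int cycles;       // total cycles for the instruction
};

// pc_after_operand is the program counter once the op code and offset byte
// have both been read, i.e. the address of the next instruction.
inline BranchResult ResolveBranch(uint16_t pc_after_operand, uint8_t operand, bool taken) {
    if (!taken) {
        return {pc_after_operand, 2};
    }
    // Reinterpret the operand as a signed offset in the range -128..127.
    const int8_t offset = static_cast<int8_t>(operand);
    const uint16_t target = static_cast<uint16_t>(pc_after_operand + offset);
    const bool crossed = (pc_after_operand & 0xFF00) != (target & 0xFF00);
    return {target, crossed ? 4 : 3};
}
```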

Zeropage, Zeropage X, Zeropage Y

This addressing mode targets values in the first 256 bytes of memory (the zero page), and is very quick to run. When run, a byte is read following the instruction representing an offset into the zero page. Zeropage X and Y additionally add the X or Y index (as applicable) to this offset. If the addition of the index causes a carry, the carry is dropped and the resulting address instead wraps back around to the start of the zero page. These modes are mainly used to access frequently referenced variables. Zeropage requires 3 CPU cycles, while Zeropage X and Zeropage Y require 4.
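The wrap-around falls out naturally from 8-bit arithmetic, as this small C++ sketch shows. ResolveZeroPage is a hypothetical name; pass an index of 0 for the plain Zeropage mode.

```cpp
#include <cstdint>

// operand is the byte that follows the op code; index is 0, X, or Y.
inline uint16_t ResolveZeroPage(uint8_t operand, uint8_t index) {
    // Truncating the sum to 8 bits drops the carry, so the result
    // always stays within $0000-$00FF.
    return static_cast<uint8_t>(operand + index);
}
```

For example, an operand of $F0 with an index of $20 resolves to $0010 rather than $0110.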

Understanding and implementing these addressing modes is key to pushing forward on the opcodes themselves. Successfully implementing and testing various scenarios around addressing modes should make moving on to the opcodes a much easier exercise.

This documentation is sourced primarily from 6502 Instruction Set (masswerk.at).

Posted in Uncategorized | Leave a comment

Integrating Emscripten and CMake with VS Code

In planning for the development of our NES emulator, we decided to target WASM for our interface and to utilize CMake to generate our build system.  As I also utilize VS Code as my IDE, I wanted to integrate these components in such a way as to ease my development process.  I’m hoping this documentation will help others in setting up their environment.  This documentation is focused on setup for a Linux environment, though I would imagine a lot of the VS Code configuration steps would translate to other environments.

Install Dependencies

In addition to installing VS Code and any other development tools you may need, you’ll also need to install CMake and Emscripten.  I was able to install an appropriate version of CMake through my package manager:

apt-get install cmake

Installing Emscripten was a bit more difficult, though well documented.  I followed the instructions at https://emscripten.org/docs/getting_started/downloads.html.  In summary:  


# Get the emsdk repo
git clone https://github.com/emscripten-core/emsdk.git

# Enter that directory
cd emsdk

# Fetch the latest version of the emsdk (not needed the first time you clone)
git pull

# Download and install the latest SDK tools.
./emsdk install latest

# Make the "latest" SDK "active" for the current user. (writes .emscripten file)
./emsdk activate latest

# Activate PATH and other environment variables in the current terminal
source ./emsdk_env.sh


Install VS Code Extensions

I’m using a number of VS Code extensions to ease my development process.  The following extensions will need to be installed: 
– C/C++ – For language support. 
– CMake Tools – Provides CMake language support and extends features into the IDE UI. 
– C++ TestMate Legacy – Adds testing to the side bar, supports Catch2.  I had issues with the latest version, but the legacy version worked with no issues.


Setup CMake Project

The exact structure and contents of CMake files will vary by project, but my project is set up as follows.  The structure was based largely on the example here: https://cmake.org/examples/  

– root
– CMakeLists.txt
– include – Project Header Files
– src
– wasm – CMakeLists.txt and Emscripten specific files.  The CMake build statements here are wrapped in an if statement and will only execute when CMake is targeting Emscripten.
– emulator – CMakeLists.txt and all emulator library files.
– tests – CMakeLists.txt and various unit test files.  The CMake build statements here are wrapped in an if statement and will only execute when CMake is not targeting Emscripten.

Using either the demo example or pulling our repo https://github.com/ericcolvinmorgan/NESEmulation directly, you should now be able to build using CMake.  Run the following to configure and build:

mkdir build && cd build && cmake .. && cmake --build .

If you’re able to successfully build via the command line, you should also now see a CMake extension in the sidebar and your project outline should be included here.

Setup Emscripten Environment

Getting everything set up so IntelliSense is working, and so you can easily switch between Emscripten and another compiler, is the next step.  I set up my environment so CMake provides IntelliSense configuration directly to VS Code.  This feature is provided by the C/C++ extension and is enabled via the configurationProvider setting.  A full reference can be found at https://code.visualstudio.com/docs/cpp/c-cpp-properties-schema-reference.  This setting should be defined as follows in the .vscode/c_cpp_properties.json file:

"configurationProvider": "ms-vscode.cmake-tools"

You may still see IntelliSense errors when you first make this change, or when compiling using different kits.  You should see the IntelliSense update appropriately when you build through the CMake extension.

Finally, while there are a number of preconfigured kits available by default, we’ll need to define a CMake kit specific to Emscripten.  https://vector-of-bool.github.io/docs/vscode-cmake-tools/kits.html does a good job walking through what kits are and how to set them up.  This process is definitely more straightforward than it sounds, and Emscripten makes it incredibly easy by providing a toolchain file as part of the SDK.  This file can be viewed at https://github.com/emscripten-core/emscripten/blob/main/cmake/Modules/Platform/Emscripten.cmake.  As mentioned directly in the file, the toolchain file “teaches CMake about the Emscripten compiler, so that CMake can generate makefiles from CMakeLists.txt that invoke emcc.”  The final kit definition used is as follows.  It will need to be updated with the location where you installed your emsdk in the instructions above:

[
  {
    "name": "Emscripten",
    "toolchainFile": "/usr/local/lib/emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake",
    "compilers": {
      "C": "/usr/local/lib/emsdk/upstream/emscripten/emcc",
      "CXX": "/usr/local/lib/emsdk/upstream/emscripten/em++"
    }
  }
]

If everything is set up correctly, you should now be able to toggle your active kit to Emscripten in the status bar at the bottom of VS Code.  Clicking build should then compile using the emsdk.  These steps will make it very easy to compile the project to WASM when we’re ready to test in a browser, and toggle to a different compiler when testing against our test suite.  Setting up an unfamiliar environment is always a time-consuming process.  Hopefully this documentation will help save someone a few hours when building a similar environment.


Exploring MarI/O

During the holidays (while exploring various potential projects for this course) I dug a bit into some of the various neural networks being used to play video games.  A few examples I enjoyed reading follow:

A Neuroevolution Approach to General Atari Game Playing (http://www.cs.utexas.edu/users/pstone/Papers/bib2html-links/TCIAIG13-mhauskn.pdf)

Playing Atari with Deep Reinforcement Learning (https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)

SethBling’s MarI/O (https://www.youtube.com/watch?v=qv6UVOQ0F44)

While all are great, I’m going to focus specifically on some of my notes/thoughts while exploring MarI/O’s input layer.  MarI/O is an implementation of a genetic algorithm used to play Super Mario World (“SMW”) for the SNES.  While digging into this implementation, I discovered the fascinating community surrounding SNES ROM hacking.

For SMW specifically, there exists a community called SMW Central (https://www.smwcentral.net/).  This community consists of dedicated enthusiasts who create and share various hacks of the original ROM.  This community has created and maintains very detailed technical specifications of SMW, and freely discusses and shares details for other games as well.  This resource was invaluable in digging into the inputs for MarI/O.

The following is a list of memory locations utilized by MarI/O for SMW to create the inputs:

Variable | Memory Location – Per MarI/O | Description (From SMW Central – https://www.smwcentral.net/?p=nmap&m=smwram)
getPositions – marioX | 0x94 | $7E0094 – Player X position (16-bit) within the level, next frame (calculates player position one frame ahead, as opposed to $7E:00D1). It’s also used as a player X position on-screen on the overworld border.
getPositions – marioY | 0x96 | $7E0096 – Player Y position (16-bit) within the level, next frame (calculates player position one frame ahead, as opposed to $7E:00D3). It’s also used as a player Y position on-screen on the overworld border.
getPositions – layer1x | 0x1A | $7E1462 – Layer 1 X position, current frame. Mirror of SNES register $210D.
getPositions – layer1y | 0x1C | $7E1464 – Layer 1 Y position, current frame. Mirror of SNES register $210E.
getTile – return | 0x1C800 | $7FC800 – Map16 high byte table. Same format as $7E:C800. $7F:FFF8 through $7F:FFFD are also used by Lunar Magic’s title screen recording ASM.
getSprites – status | 0x14C8 | $7E14C8 – Sprite status table: #$00 = Free slot, non-existent sprite. #$01 = Initial phase of sprite. #$02 = Killed, falling off screen. #$03 = Smushed. Rex and shell-less Koopas can be in this state. #$04 = Killed with a spinjump. #$05 = Burning in lava; sinking in mud. #$06 = Turn into coin at level end. #$07 = Stay in Yoshi’s mouth. #$08 = Normal routine. #$09 = Stationary / Carryable. #$0A = Kicked. #$0B = Carried. #$0C = Powerup from being carried past goaltape. States 08 and above are considered alive; sprites in other states are dead and should not be interacted with.
getSprites – spritex | 0xE4 | $7E00E4 – Sprite X position, low byte.
getSprites – spritex | 0x14E0 | $7E14E0 – Sprite X position, high byte.
getSprites – spritey | 0xD8 | $7E00D8 – Sprite Y position, low byte.
getSprites – spritey | 0x14D4 | $7E14D4 – Sprite Y position, high byte.
getExtendedSprites – number | 0x170B | $7E170B – Extended sprite number. Last two bytes reserved for fireballs.
getExtendedSprites – spritex | 0x171F | $7E171F – Extended sprite X position, low byte. Last two bytes reserved for fireballs.
getExtendedSprites – spritex | 0x1733 | $7E1733 – Extended sprite X position, high byte. Last two bytes reserved for fireballs.
getExtendedSprites – spritey | 0x1715 | $7E1715 – Extended sprite Y position, low byte. Last two bytes reserved for fireballs.
getExtendedSprites – spritey | 0x1729 | $7E1729 – Extended sprite Y position, high byte. Last two bytes reserved for fireballs.
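Because sprite positions are split across separate low byte and high byte tables, each 16-bit coordinate has to be reassembled from two reads. The C++ sketch below illustrates that step along with the "alive" check from the status table. MarI/O itself is a Lua script, so this is only an illustration, and CombineBytes and IsSpriteAlive are hypothetical names.

```cpp
#include <cstdint>

// Rebuilds a 16-bit position from its low and high byte table entries.
inline uint16_t CombineBytes(uint8_t low, uint8_t high) {
    return static_cast<uint16_t>(low | (high << 8));
}

// Per the status table, states #$08 and above are considered alive.
inline bool IsSpriteAlive(uint8_t status) {
    return status >= 0x08;
}
```

For a given sprite slot, the X position would be CombineBytes applied to that slot's entries in the 0xE4 and 0x14E0 tables.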

While the list is fairly straightforward, digging into the “why” behind each of these was very interesting.  Let’s start with the inputs from the getPositions function.  This RAM maps to Mario’s X and Y coordinates within the level.  These are used to drive the fitness within the algorithm, to help determine the Map16 tiles that need to be pulled from memory for purposes of building input tiles, and to determine Mario’s distance from enemies.  It doesn’t appear as if the layer1 variables are ultimately utilized in any of the calculations.

The getTile variable was probably the most confusing for me to find specific detail around.  This variable maps to the Map16 high byte.  Thanks to the resources at SMW Central I was able to find Lunar Magic, which helped clear up a lot of my confusion around how this memory was being used.  In general, a Map16 high byte of 01 in SMW maps to tiles that are solid. These can include everything from ground tiles and question mark blocks to slopes and (unfortunately for the Mario character being controlled by this implementation) even spikes. 

Map16 Tiles – Example 1 – The tiles in this tilemap seem like they would generally work fairly well. 🙂
Map16 Tiles – Example 2 – The tiles in this tilemap have tiles that would kill Mario. 🙁

These inputs are all brought together in the getInputs function to ultimately drive the creation of a feature matrix within the neural network representing small areas of the screen.  All features are initially assigned a value of 0, then are assigned a value of 1 if there is a solid tile in that particular area, or a value of -1 if an enemy (sprite) or obstacle (extended sprite) exists in that particular area.
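The assignment rule can be sketched in C++ as shown below. This is not MarI/O’s actual code (which is written in Lua); Cell and BuildFeatures are hypothetical names, the grid dimensions are left as parameters, and letting a sprite override a solid tile in the same cell is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical grid coordinate for one small area of the screen.
struct Cell {
    int col;
    int row;
};

// Returns a row-major grid: 0 = empty, 1 = solid tile, -1 = sprite or extended sprite.
inline std::vector<int> BuildFeatures(int cols, int rows,
                                      const std::vector<Cell>& solid_tiles,
                                      const std::vector<Cell>& hazards) {
    std::vector<int> features(static_cast<std::size_t>(cols) * rows, 0);
    auto mark = [&](const Cell& c, int value) {
        // Ignore anything that falls outside the grid.
        if (c.col < 0 || c.col >= cols || c.row < 0 || c.row >= rows) return;
        features[static_cast<std::size_t>(c.row) * cols + c.col] = value;
    };
    for (const Cell& c : solid_tiles) mark(c, 1);
    // Hazards are written last, so they win when a cell holds both (assumption).
    for (const Cell& c : hazards) mark(c, -1);
    return features;
}
```

Note that a tile is reduced to a single value here, so anything with a Map16 high byte of 01 ends up as a 1 regardless of what the tile actually is.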

While it isn’t lost on me that MarI/O was created mostly to create an awesome YouTube video, and I shouldn’t be too critical, using this input feature selection wouldn’t work for a large number of the game’s levels.  Simply assigning a weight to a tile based on the high byte loses a lot of information that could be provided to the neural network, and will lead to inconsistent results across various tilesets.

For example, as discussed above, the example 2 tileset contains spikes with a high byte of 01. The image below shows an area where this would likely lead to issues for MarI/O. Based on how the current inputs are utilized, the spikes would be assigned a weight of 1.

The spikes on this level wouldn’t be recognized as being different than other ground tiles.

Ultimately, I’d like to pick up and continue to explore this (and other) implementations of ML in gaming. This examination led me down a path of exploring emulation, and I’m excited to dig further into it by building an NES emulator as a project for this course.
