Running Tetris On My Arduino-Based Emulator — Part 4 of a Series

Raphael Stäbler
5 min read · Jan 19, 2020


This is part 4 of a series that started with my introduction to the project of creating my very own handheld console. After learning a lot about the internal workings of the original Gameboy, it’s finally time to see some results. This is also going to conclude the emulator development part of the project for a while as it will be time to build some hardware first. But now: Tetris.

As of now, I have a complete CPU implementation and a basic address bus with a good idea of what to connect to it. I can now go ahead and connect a Tetris cartridge to my address bus. Remember, the first 32 kBytes of the memory map are mapped directly to the cartridge and as it so happens the Tetris cartridge takes up exactly that amount of space. So, let’s hook it up! In my case this means putting all of the 32768 bytes of the Tetris ROM at the beginning of my memory array.
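To make the "hooking up" step concrete, here is a minimal sketch of what it could look like. It assumes the whole 16-bit address space lives in one flat array, and the names `Memory` and `loadCartridge` are made up for this illustration, so the actual emulator code may well be organized differently.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: the full 64 KiB address space is one flat array.
constexpr std::size_t kAddressSpace = 0x10000;
// The cartridge is mapped to 0x0000-0x7FFF, i.e. the first 32768 bytes.
constexpr std::size_t kCartridgeSize = 0x8000;

using Memory = std::array<std::uint8_t, kAddressSpace>;

// Hypothetical helper: copies a ROM image to the beginning of the memory array.
// Returns false if the image is missing or larger than the 32 KiB window.
bool loadCartridge(Memory& memory, const std::uint8_t* rom, std::size_t size) {
    if (rom == nullptr || size > kCartridgeSize) {
        return false;
    }
    std::copy(rom, rom + size, memory.begin());
    return true;
}
```

The size check matters for anything bigger than Tetris: larger cartridges use bank switching and would not fit into this window as a plain copy.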

If I run my emulator now, it actually starts executing the Tetris code. It doesn’t get very far, though, as it gets caught in an endless loop. That’s intentional, and everything is operating properly: the game is just waiting for a vertical blank interrupt to occur. I mentioned this interrupt in my last post, stating that it basically drives the main game loop. The interrupt is requested by the video controller and marks the end of a full frame. So without a video controller, not much is going to happen, and even if something did, without video output I wouldn’t see any of it anyway.
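As a rough illustration of what "requesting" the interrupt means, the sketch below sets the vertical blank bit in the interrupt flag register and checks whether the CPU should act on it. The register addresses (0xFF0F and 0xFFFF) and the bit position are the documented ones for the original Gameboy. The helper names and the flat memory pointer are assumptions made for this example only.

```cpp
#include <cstdint>

constexpr std::uint16_t kInterruptFlagAddr = 0xFF0F;    // IF: pending requests
constexpr std::uint16_t kInterruptEnableAddr = 0xFFFF;  // IE: enabled sources
constexpr std::uint8_t kVBlankBit = 0x01;               // bit 0 = vertical blank

// Hypothetical helper: the video controller calls this once a frame is done.
void requestVBlank(std::uint8_t* memory) {
    memory[kInterruptFlagAddr] |= kVBlankBit;
}

// Hypothetical helper: true if the CPU should service the vertical blank
// interrupt, i.e. it is requested, enabled, and interrupts are globally on.
bool vblankPending(const std::uint8_t* memory, bool interruptMasterEnable) {
    return interruptMasterEnable &&
           (memory[kInterruptFlagAddr] & memory[kInterruptEnableAddr] & kVBlankBit) != 0;
}
```

Until something sets that bit, a game that waits for the interrupt simply keeps spinning, which is exactly the endless loop described above.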

Video Controller

The main job of the video controller is to draw the pixel data of the video RAM according to the information stored in both the sprite attributes memory and the display control register. I’ve talked about all but the latter in a previous post. The display control register defines what sprite size to use and what part of the video RAM is holding the background map.
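The two settings mentioned here are single bits in the display control register (LCDC, at address 0xFF40): bit 2 selects the sprite size and bit 3 selects which of the two background map areas is used. A minimal sketch of decoding them could look like this. The struct and function names are invented for the example and are not taken from my actual video controller.

```cpp
#include <cstdint>

// Hypothetical container for the two fields this sketch decodes.
struct DisplayControl {
    int spriteHeight;             // 8 or 16 pixels (sprites are always 8 wide)
    std::uint16_t backgroundMap;  // start address of the background map in video RAM
};

// Decodes sprite size and background map location from the LCDC value.
DisplayControl decodeDisplayControl(std::uint8_t lcdc) {
    DisplayControl dc;
    dc.spriteHeight = (lcdc & 0x04) ? 16 : 8;            // bit 2: sprite size
    dc.backgroundMap = (lcdc & 0x08) ? 0x9C00 : 0x9800;  // bit 3: background map area
    return dc;
}
```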

There is other stuff that’s controlled by the display control register and other registers, such as what color palette to use, whether sprites are enabled or not and whether and how to scroll the background map. There’s also a window feature that allows for drawing a background map on just a part of the screen. I’m not going to implement any of this right now as it’s not required to properly run Tetris.

You can find the current implementation of my video controller here.

Wiring It All Up

My test display hooked up to the Teensy 4.0 running the emulator

There’s not a lot to do in terms of hooking up my test display to the Teensy development board. This 2.4 inch display module is controlled by a serial interface (SPI) and has full library support within the Arduino ecosystem which means there’s no need for me to write a driver for it. I’m using a slightly modified version of the ILI9341_t3 library that was written by Paul Stoffregen, the creator of the Teensy board. I only added a method that allows me to easily transfer a horizontal line to the display — because that’s how my video controller is going to work, by generating one line after the other until a full frame is rendered. After that, a vertical blank interrupt is requested and it all starts from the top.
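The line-by-line approach can be sketched as a small timing state machine. The numbers are the documented ones for the original hardware: 154 lines per frame, of which 144 are visible, and 114 cycles per line when counting at 1 MHz. Everything else (the `VideoTiming` struct, the `drawLine` callback) is an assumption for this illustration and not the actual interface of my video controller or the display library.

```cpp
#include <cstdint>

constexpr int kCyclesPerLine = 114;  // one scanline at the 1 MHz cycle count
constexpr int kVisibleLines = 144;   // lines that are actually drawn
constexpr int kTotalLines = 154;     // 144 visible lines + 10 vertical blank lines

// Hypothetical timing helper for a line-based video controller.
struct VideoTiming {
    int cycleInLine = 0;
    int line = 0;

    // Advances the controller by the given number of CPU cycles.
    // Calls drawLine(line) for every visible line that completes and
    // returns true if the vertical blank period was entered during this step.
    template <typename DrawLine>
    bool step(int cycles, DrawLine drawLine) {
        bool enteredVBlank = false;
        cycleInLine += cycles;
        while (cycleInLine >= kCyclesPerLine) {
            cycleInLine -= kCyclesPerLine;
            if (line < kVisibleLines) {
                drawLine(line);  // e.g. render the line and send it to the display
            }
            line = (line + 1) % kTotalLines;
            if (line == kVisibleLines) {
                enteredVBlank = true;  // time to request the interrupt
            }
        }
        return enteredVBlank;
    }
};
```

The emulator's main loop would call `step` after every CPU instruction with the cycles that instruction took, and request the vertical blank interrupt whenever it returns true.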

The actual communication between the Teensy and the display module is done with five wires: a clock wire, a chip select wire, a data/command wire, a master in / slave out (MISO) wire, and a master out / slave in (MOSI) wire. Those are very common for SPI transmissions, and I will be talking about this in more detail in future posts as I will be dealing with SPI a lot more.

Additionally, there’s a wire to connect a common ground between the two boards as well as a power connection through a resistor for the background illumination of the screen.


In my post about the CPU I talked about the importance of counting cycles and how the DMG CPU can be understood as running at either 4 MHz or 1 MHz. I opted for 1 MHz, and that’s the target speed for my emulator. Ideally, it would run faster than that, so I have the option of adding complexity or slowing it down artificially.

I can measure the speed of my emulated CPU by letting it run for some time while counting the total cycles executed. Dividing the cycle count by the elapsed time gives the number of cycles per second, which is the emulated clock frequency in hertz.
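The calculation itself is tiny. Here is a minimal sketch, assuming the elapsed time is available in microseconds. The function name is made up for this example.

```cpp
#include <cstdint>

// Hypothetical helper: emulated clock frequency in hertz, computed from the
// number of executed cycles and the elapsed wall-clock time in microseconds.
double emulatedHertz(std::uint64_t cycles, std::uint64_t elapsedMicros) {
    if (elapsedMicros == 0) {
        return 0.0;  // avoid dividing by zero on a too-short measurement
    }
    return static_cast<double>(cycles) * 1e6 / static_cast<double>(elapsedMicros);
}
```

On the Teensy, the timestamps could come from the Arduino `micros()` function, taken once before and once after the measurement run.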

Doing this reveals a total speed of 0.68 MHz. That’s significantly below the targeted 1 MHz.

It actually turns out that a lot of time is lost during communication with the display. If I turn off communication with the display I get an emulated speed of 1.77 MHz.
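These two numbers also give a rough estimate of how much of the total run time goes into display communication. Assuming the emulation work per cycle is the same in both runs, the time per emulated cycle is 1/f, so the share spent on the display is 1 - f_with / f_without. The function name below is invented for this back-of-the-envelope calculation.

```cpp
// Hypothetical helper: fraction of the run time attributable to the display,
// given the emulated speed with and without display communication.
// Both speeds must use the same unit (e.g. MHz).
double displayTimeShare(double speedWithDisplay, double speedWithoutDisplay) {
    if (speedWithoutDisplay <= 0.0) {
        return 0.0;  // no meaningful result without a valid baseline
    }
    return 1.0 - speedWithDisplay / speedWithoutDisplay;
}
```

With 0.68 MHz and 1.77 MHz this comes out at roughly 0.62, so under that assumption a bit more than 60% of the time is spent talking to the display.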

The reason is that the SPI data transfers aren’t optimized for speed and take up a lot more of the Teensy’s CPU resources than necessary. Right now the CPU spends a lot of time idling during each transfer, time that could be better spent running the emulator.

Optimizing the SPI communication would mean rewriting the display library to make use of the direct memory access (DMA) features of the Teensy’s ARM Cortex processor, which isn’t entirely trivial. Because of that, and because I won’t be using this display for my final build anyway, I won’t bother right now. At this point it’s just important to note that the performance can be drastically improved.


This concludes the first part of the project which has been the proof of concept. It’s now apparent that the Teensy 4.0 is capable of running a Gameboy emulator.

Next will be some hardware development. The screen for the final build isn’t as easy to connect as the test screen. I will have to develop driver circuitry as well as some kind of video RAM in order to use it properly with this project. I plan on switching to a video blog format for this as it’s more suited for showing and explaining the concepts involved.

Thanks for reading, I hope you enjoyed it. If you have any questions or would like to know more about the project, feel free to leave a comment or contact me on Instagram or Twitter!