I will explain the technique I used for the game scroll in another post, but in this post I wanted to focus on explaining how does the flag waving animation in the title screen work! I want to explain how it works, since the method achieved pretty nice results and it is not too complex, and it could be useful to other people trying to add similar effects to other MSX games!
My goal was to figure out a way to play a small "video loop" in the background that looked very smooth. I wanted to do this since I saw this video of a fire animation converted to the ZX Spectrum specs. I thought it would be amazing to have something like that on a game. So, I started thinking how could I convert a small video on my computer to play on the MSX, and how could I make it small enough to fit on a small game.
I was initially wanting to convert a fire animation (a real video of a fire, not one of those fire visual effects), but I struggled to find any that I could easily convert to a loop with a handful of frames (e.g., 16 frames or so). So, I remembered that back in the day, when we were working on the F-1 Spirit remake, we had this fancy flag animation in the background of the title screen. So, I thought that would be an easier one to convert! I started looking for real videos of a waving flag, but again I struggled to find one I could easily turn into a small 16 frame loop. So, I just wrote a little program to generate my own flag. It doesn't look real, but it did the trick.
Alright, so, before I describe the technique, let me take a brief moment to explain the technical specs of the MSX graphic chip, which will define the constraints of the approach!
The MSX VDP
Most first generation MSX computers shipped with the TMS9918 graphics chip, which I will just call "the VDP" (Video Display Processor). The VDP has several graphic modes, but I will focus on what is known as "Screen 2" in the MSX world, which is the most powerful graphic mode the VDP offers.
Screen 2 is a 256x192 pixel graphic mode, where the screen is divided into 8x8 pixel blocks called "patterns" or "tiles". The screen is thus 32 patterns wide (32*8 = 256) and 24 patterns tall (24*8 = 192), as is shown in this small diagram:
Each pattern is defined by two components a "pattern table entry" and a "color table entry", which are combined in the following way to draw the pattern:
So, as can be seen above, the "pattern table entry" (8 bytes per pattern) defines the "shape" of the pattern, and the "color table entry" (8 bytes per pattern) defines the color of the pixels. Two colors can be specified in each byte of the color table entry, which correspond to the 2 colors that can be used in the each of the 8 rows of the pattern. Each of the bits in the pattern table entry corresponds to a pixel in the pattern, and whether its a 0 or a 1 determines which of the two colors specified for the corresponding row will be drawn. This is where the "2 colors for each block of 8x1 pixels" comes in the MSX!
In order to draw something in the screen, what you need to do is to write data into the "video RAM" (VRAM from now on), and as a result, the content of the screen will change. The VDP has 16KB of VRAM, divided into a collection of areas:
- the Pattern Table: we can specify up to 768 different patterns, divided into 3 different banks (as shown in the picture above; more on this below). Each pattern is 8 bytes long, with each byte corresponding to one of the rows of the pattern (each bit corresponding to 1 pixel), as described above.
- the Color Table: also has 768 entries of 8 bytes each, specifying the colors of each of the 768 patterns as described above.
- the Name Table: finally, the name table is a 768 byte table that specifies which pattern to draw in each of the 768 positions of the screen. For example, if we define two patterns (a smiley face in pattern 0, and a sad face in pattern 1), if we write 0, 1, 0, 1, 0, 1, etc. in the first 32 bytes of the name table, we will see a row of smiley and sad faces. So, once we define a pattern, we can reuse it multiple times in the screen. There is only one limitation here, and it is that since we can only use 1 byte to specify which pattern to draw, we can only select from 256 patterns (since that's the maximum number of values we can represent with a single byte). Since there are 768 patterns, these are divided in "3 banks". In the top 8 rows of the screen (BANK 1), we can only select from patterns 0 - 255, in the middle rows, we can select from patterns 256 - 511 (BANK 2), and in the bottom 8 rows from 512 - 767 (BANK 3).
- and then a couple of tables for sprites (but we will ignore sprites for now :))
By looking at this structure you can already probably see why some MSX games have a very "chunky" scroll of 8x8 pixels! It's just because the easiest way to implement scroll in MSX is by just defining our graphics as patterns, and then just moving them with the name table (which lets us move them only in increments if 8 by 8 pixels!). If we want smoother scroll, then we have to figure out a way to do it, by using just the tools I described above!
Another thing to have in mind is the slow speed of the MSX CPU (Z80) and the even slower speed of data transfer to the VDP! The MSX Z80 has a clock of 3.58 MHz (3580000 ticks per second). If we want to have a smooth video/game that renders at 50 frames per second, we only have 71600 ticks per frame. If we have in mind that it takes usually 29 ticks to transfer a single byte to the VDP. So, we can only optimistically transfer about 2500 bytes per frame to the VDP, which is not enough to redraw a whole frame! 768 patterns * (8 bytes for pattern table + 8 bytes for color table) = 12288 bytes, which is the reason for it being so hard to have smooth scroll on MSX...
Ok, so that's the VDP. Now, how do we get a smooth animation on such a slow hardware?
Step 1: Generating the Video
To generate the source video, I just wrote a small Java script to render a flag waving loop with 16 frames of 256x192 pixels in black and white (I wanted to have gray-scale, but I'll explain below why I decided against it). The original video looks like this:
Step 2: Extracting the Patterns
The next step is to go one by one each of the 16 frames of the animation, and extract all the different patterns needed to display this animation. For doing this, we scan each 8x8 pixel block in each of the 16 frames, and we create a list of all the different patterns we find. Also, since once this is uploaded to the VDP, each of the 3 banks of the VDP is separate, we do this for each bank separately. With the video above, we find the following:
- BANK 1: 787 different patterns
- BANK 2: 510 different patterns
- BANK 3: 772 different patterns
So, assuming we could store all of these tiles in the VDP, what we would have to do is this:
1) Store all of these tiles in the game ROM: (787+510+772)*16 = 33104 bytes
2) Now, each frame of the animation would just be 768 bytes (the name table for each frame). Since we have 16 frames, this would be: 768*16 = 12288 bytes
3) Then, at run time, we just load all the patterns in the VDP initially. And at each game frame, we just keep uploading the name table that corresponds to the current frame. This is just 768 bytes per frame, which is doable.
1) The VDP can only store 256 patterns per bank, and thus 787, 510 and 772 do not fit!
2) Even if they were to fit, that animation already would use most of the size of the ROM cartridge! (I was targeting a 48KB ROM)
So, we need to reduce that somehow :)
Step 3: Reducing the Set of Patterns Needed
If we were to just display the flag animation by itself, we could afford to have 256 patterns per bank. However, I wanted to have the game logo drawn on top of the flag, and maybe have some text message at the bottom of the screen. So, I had to reserve some patterns for that. In order to show the title image, I needed these patterns (not all patterns are strictly needed, but it was easier for me to have these patterns, which I reuse in other parts of the game):
The answer is a standard machine learning family of algorithms called clustering algorithms. Clustering algorithms work as follows: the input is a large collection of points in an n-dimensional space, and the output is a "clustering" (a grouping of the input points into a collection of "buckets" called "clusters"), where the points in a cluster should look more similar to each other than to points in other clusters. So, what I did was to consider every pattern as a point in a 64-dimensional space (one dimension per pixel in the pattern), and I gave all 787 patterns for BANK 1 as input to a clustering algorithm (a very simple one called k-medoids), and I asked the algorithm to cluster them into 144 clusters. From each cluster, I then pick the most "representative" pattern (the "medoid"). I then do the same thing with BANK 2, and BANK 3. I then replace every pattern in the flag animation by the representative pattern of its cluster. This effectively reduces the number of patterns needed for the animation to just 144 per bank!
The resulting animation now looks like this:
Notice that it mostly looks like the same animation, except for a few glitches! So, this already looks promising! After that, I just manually replaced the medoid of a couple of clusters by a different one to fix those dots at the top of the animation, and the animation already looked much cleaner!
Step 4: Translate the Animation to MSX Data
Finally, now we can just take those 144 patterns per BANK, and save them to a binary file. Notice that all patterns have just black and white as color, so, I do not store the color table (assume that the color table for all patterns is identical, and just set to black and white). So, I need to store 144*3*8 = 3456 bytes for the patterns (much lower than before!). And after compressed using Pletter, it's just 1434 bytes.
I considered having gray in the flag to have shadows, but that forced me to store a color table, which used more space. So, I decided against it!
Once we have the patterns, for each animation frame (there are 16 frames), we store the corresponding "name table", i.e., which patterns have to be drawn in each of the 32x24 positions of the screen. This is 768 bytes per frame, so 768*16 = 12272 bytes. I actually never draw the bottom line of the animation, since I use that for text, so that saves some space, and I just need 11776 bytes. After compressing them with Pletter, it;s reduced to 5842 bytes.
Step 5: Playing it in the MSX
Once we have that, to play it back on the MSX, we just need to copy the patterns to the pattern table in the VDP, set the color tables to black and white, and then at each frame copy the corresponding name table to the VDP. The video could play easily at 50 or 60 frames per second (since it just needs to copy 32*23 = 736 bytes per frame), but the animation would be too fast. So, the game plays it at 25 or 30 frames per second depending on if it is running on a PAL or NTSC MSX.
The XRacing logo is then drawn on top by just overwriting the proper positions on the name table, and drawing a few sprites on top of it to achieve the 4 rounded corners, and a few other details (there are 19 sprites on the title screen).
And that's it! So, basically the idea is: 1) take an animation loop as input, 2) get the set of patterns needed for each bank, 3) use k-medoids to reduce the number of patterns to 144 per bank, 4) with that we can already display the animation on the MSX, since we just need to change the name table in each frame!
It's not rocket science, but what I find interesting is that there is nothing in the code hard-coded for the video being a flag waving animation. The idea could in principle work for any other small animation loop. So, I might use this technique in the future to display other types of animations :)
Ok, that's all for this post, which is already getting too long! Next time I want to explain how the in-game scroll works! which is a bit more involved :)