Santiago Ontañón: 2019

Monday, April 22, 2019

Smooth Scroll in XRacing

As I explained in a previous post, my latest MSX game is XRacing, a top-down racing game inspired in the classic Super Cars for the Commodore Amiga.

When I was planning XRacing, the one thing that I wanted the game to have was smooth scroll. As a Super Cars-style game, if the scroll was the usual choppy 8x8 MSX scroll, it would be very hard to play. At the end I ended up using a method for achieving this that worked pretty well! So, today I just wanted to give you an idea of how it works!

Ok, so, let's start with the basics. How do people usually implement scroll in MSX games? (MSX2 and later machines can do this by hardware, but let's forget about those for now, and focus on the classic MSX1 machines) There are basically 3 common techniques:

Scroll by tiles: As I explained in this post, the MSX graphic chip (the VDP) displays graphics by defining "tiles" (or "patterns"), which are 8x8 pixel small graphics. There are 3 banks of 256 tiles each. Each of the banks contains the tiles to be displayed in the top, middle and bottom thirds of the screen respectively. A "name table" is a small 768 byte array in the VDP that contains one byte per tile in the screen (at the standard 256x192 pixels MSX resolution, there are 32x24 = 768 tiles in the screen), and specifies which tile to be drawn in each position. So, to do tile by tile scroll, you just need to update this "name table", which just requires writing 768 bytes to the VDP per game frame. This is a very small amount of information, and thus can be done easily at 50/60 frames per second, and that's why it's used by most games. This is what was used in all Konami games like the Nemesis saga, Knightmare, etc.
Bitmap-mode scroll: The other extreme is to ignore completely the name table, and use the video memory as if it was a bitmap (i.e., like if we were programming a game for the ZX spectrum or the Amstrad CPC, which did not have the concept of a name table). In this approach, the game just redraws the whole screen at each game frame, and thus can move the graphics pixel by pixel if desired. This allows for smooth scroll, but has the down side of having to write A LOT of data to the VDP in each game frame. Since each tile requires 16 bytes of data (8 bytes for the pattern, and 8 bytes for the colors), we need to write 16*768 = 12288 bytes per game frame. Now, the maximum speed at which we can transfer data to the VDP is one byte each 29 CPU cycles, so, that is 12288 * 29 = 356352 cycles. Since the Z80 CPU of the MSX runs at about 3.58MHz (3.58M cycles per second), this would result in at most 10 frames per second (ignoring the time we need for the game logic, and assuming maximum transfer speed, which is unlikely). Games that use this technique usually ignore the color table, and just use monochrome graphics, so that only half the data has to be transferred, and then, on top of that usually a large scoreboard is used to reduce the size of the playable area, reducing data transfer even more. Most ZX Spectrum game ports to the MSX use this technique, since in Spectrum this technique makes sense (Spectrum video memory is memory-mapped, and thus transfer speeds are much higher). To port the game to the MSX, they created a small buffer in RAM that acted as the ZX Spectrum video memory, and then wrote a routine that transferred its content to the VDP at each game frame (in this way the code for the ZX spectrum barely needs to be modified to run on an MSX), which is the routine responsible for the slowdown we see in most ZX Spectrum ports...
Pre-calculated scrolled tiles: this is a very interesting technique used by some more modern MSX games (and a few classic ones too). The idea is to have multiple versions of each tile in the tile banks. The different versions contain the same tile, but already scrolled 1 pixel to the left, 2 pixels to the left, 3 pixels to the left, etc. In this way, if we want to (for example) move the scroll 1 pixel to the left, we just need to update the name table to replace the current tiles by the versions of the tiles already scrolled 1 pixel to the left. This achieves smooth scroll at very fast speeds! The downside is that since we need to have multiple versions of each tile, the amount of different tiles we can have in a game level is much smaller, thus reducing graphic variety. Example games that used this technique are Lotus F3, or Uridium. As you can see Lotus F3 has very little graphical variety. Uridium has more variety, since it only requires horizontal scroll, and thus less versions of each tile are needed. This was also used in a few classic games like Circus Charlie, for example.

So, in summary "Scroll by tiles" is fast but choppy, "Bitmap-mode scroll" is smooth but slow, and "Pre-calculated scrolled tiles" is fast and smooth but reduces graphical variety. None of those options was good enough for what I wanted to do. So, I had to figure out a way to do better :)

Pre-calculating Scrolled Tiles

My method is based upon the pre-calculated scrolled tiles method. So, let's start with that. To illustrate all the concepts, I'm going to be using the actual graphics I used for the "grass" type of tracks in the game (there are two separate types of tracks: tracks with a grass background, and urban tracks with buildings). This is one of the first Formula 1 tracks in the game, called "St. Junipero" (named after my favorite Black Mirror episode :)):

Note that this is NOT how the track looks in the game, since I use a lot of sprites to add details to the track, but let's focus just on the tiles for now :) To author that track, this is the set of tiles I wanted to use:

Notice that there are only 51 different tiles used. It seems like there aren't that many, right? Well, you'll see that 51 is quite a large number for the "pre-calculated scrolled tiles" approach. Suppose that we wanted to use that map in a game with smooth scroll that moves in increments of 2 pixels (1 pixel smooth scroll would be even harder!). In order to do that, what we need to do is this:

For each tile in our set (51 tiles): find all the other tiles that appear at least once to the right of this tile in the map above. For each such pair of tiles, we need to generate the following pre-calculated tiles, and have them in our tile banks:

So, for each pair of tiles that appear next to each other (one to the right of the previous), we need, in addition to the two original tiles, three additional intermediate tiles corresponding to the intermediate scroll positions in which the tiles might appear.
If we wanted to have vertical scroll instead, We need to do the same thing for tiles that appear one of top of another, for vertical scroll.
If we wanted multi-directional scroll, then, instead of pairs of tiles, we need to account for combinations of horizontal + vertical scroll. So, in general, for each 2x2 block of tiles that appear in the map, we need to do what I described above. So, for each unique 2x2 tile combination, we would need to add 21 different pre-calculated tiles.

You can already see where this is going. If we want pre-calculated tiles for multidirectional scroll (even just for moving 2 pixels at a time, let alone pixel by pixel), we would need A LOT of tiles. If we remove duplicates (since lots of tiles generated with the previous method would look identical), here is the set of tiles we would need for the map above:

This is a total of 1196 tiles! And if we now create a separate map that has some unique tile combination not appearing in this map, we would then need even more tiles! Also, have in mind that I authored those tiles with this method in mind, so they are carefully drawn to minimize the number of tiles needed. Notice that all elements always have an even number of empty space surrounding them (for example the house has 4 empty pixels above (not 3, not 5), the cone has 2 empty pixels below, etc.). If you are not careful, you'd get even more tiles!

Now, in XRacing, I need the bottom part of the screen for a minimap, and some basic information about the game (speed, time, laps, car status, etc.). So, I only used the top two banks for the actual gameplay. Since maps scroll arbitrarily, I have the same tile set in each of the two banks. Thus, the maximum number of different tiles I can have loaded at once in the VDP is 256. This was clearly not going to work for XRacing :)

Scroll "On Rails"

My first idea was to force the scroll to be only either horizontal or only vertical. So, if we are scrolling horizontally, I would snap the y coordinate to a multiple of 8 pixels, and when scrolling vertically, I'd snap the x coordinate to a multiple of 8 pixels. With this idea, we would only need the following tiles:

That is "just" 389 tiles! Still too many, but we are getting closer to the magical 256 number! So, in order to reduce this number further, I used the idea of "rails".

Rather than allow the track to freely scroll, I would define "rails" in the track, and the camera can ONLY slide along those rails. For example, in the track above, these are the defined rails:

When the game starts, the player is in the left-most red-color rail, and thus the camera can only move vertically. As soon as the player reaches the top part of the map, the camera "snaps" to the horizontal blue-colored rail, and the camera can only move left-right, etc. The key idea is that not all tiles would need pre-calculated versions that scroll vertically or horizontally. For example, the little house in the tiles above was designed to only scroll vertically (there would be color clash if I scroll it horizontally!). So, I only add it to parts of the map associated with a vertical rail, and never to a horizontal rail. Likewise, there are graphics that I only ever associate with horizontal rails (like the yellow banner sign).

This reduces the set of tiles needed even further! I don't know how many tiles would I need for the map above, since I never calculated it. But when putting together ALL the XRacing maps that use this tile set, I only need 313 tiles, which is even closer to the 256 magic number! We are getting there!

Multiple Tile Groups

The last trick is to realize that not all tiles are needed at all time. For example the starting block (the blue semaphore at the beginning of a track), only appears once in the track! So, I decided to divide my tiles into smaller groups, and load/unload them during the game. The constraint is that we cannot upload too many tiles at once, or the player would notice a slowdown in the scroll. So, what I did is to define a large group of "common tiles" that would always be loaded, and then smaller groups of tiles, of which only one of them at a time would be loaded, and each time we switch from one rail to another, the rail has a tag which which group of tiles it needs to load.

This part was a bit tricky, and I wrote a small optimization algorithm using a classic hill-climbing approach to find the optimal grouping of tiles. I struggled quite a lot with this part, since it was hard to find a good grouping, and my algorithm kept finding local optima that were not good. Happy to provide details if anyone wants to replicate any of this. But suffice to say that there is an algorithm that given a set of maps can find these groups of tiles (the idea is to start with one group per rail, and then take tiles out of those groups into a "common group", trying to take out the tiles that appear most often). These were the groups of tiles that were found for XRacing:

A large common group of tiles with 199 tiles:

Two smaller groups of tiles with 57 tiles each:

For the urban tile set, there was 4 smaller groups of tiles instead of 2, but that was also manageable. So, in a regular game frame, I only need to update the two banks of the name tables (512 bytes to the VDP), and when the scroll snaps from one rail to the next, we need to upload an additional 57 tiles (57*16 = 912 bytes). I need to upload those to both bank 1 and bank 2, so, it's twice that amount, but that was also manageable. The game runs at 25/30 frames per second, so this was well within the margins of what I could handle!

Another tricky part was that some times, my optimization algorithm was not able to find a solution to the problem using only 256 tiles, and thus I had to constantly change the maps if there was a particular combination of tiles that only appeared in one map, to reduce the number of tiles actually needed. This was a bit painful, but eventually I got a set of maps for which my optimization algorithm could find a good grouping solution, and thus I could use for the game!

Sprites

Finally, in order to add a bit more graphic detail in the game, I use a collection of sprites in addition to tiles. In addition to the cars, and the small dots in the minimap, I use sprites in 3 different ways in XRacing:

To add graphic detail to the game. This is used for trees and signs, which have sprites to draw the outline of the trees, and the MSX logo on the signs. This can be seen in the images below.

To protect the scoreboard from sprites intruding into it I put 4 blank sprites in the top-left corner of the scoreboard, in that way the car, tree and MSX-logo sprites do not get drawn over the scoreboard. Just to clarify on this, the MSX can only display 4 sprites in each horizontal screen line at a time. So, if you have more than 4 sprites, the 5th sprite doesn't get displayed. I exploit this trick, and put 4 sprites in the scoreboard, so any other sprite that tries to be drawn there simply cannot be drawn :)

I use the same 4 sprite trick to make it look as if the cars go underneath the semaphore at game start. So, there are 4 sprites (black color) making up the semaphore, which serve both to add graphic detail, and to protect it from the car sprites

Conclusions

And that's basically it! So, the key question I wanted to answer was: can I take the "pre-calculated scrolled tiles" scrolling technique and push it so that I can increase graphic variety as much as possible? In order to do that I used the idea of "rails", which limit the number of tiles we need to pre-calculate and the idea of splitting the tiles into smaller groups that I can load/unload to the VDP at run time. This results in a scroll that is not fully multi-directional, but has the benefit of allowing me to have different graphics in the game (like the little houses, or trees, etc.) to decorate the map, which would not have been possible otherwise (look at the Lotus F3 example I pointed out above).

As a final note, once I made the game available, Artrag from the msx.org forums, pointed out to me that there is a way to push the VDP even further, and make the top two tile banks of the VDP share a single bank. In this way, we can have 2 banks of tiles, and switch from one to the other arbitrarily. So, to clarify: in XRacing the top part of the screen uses bank 1, and the middle bank 2. What Artrag is suggesting is that we can make both top and middle use bank 1, and then swap to bank 2 when we want. So, this would bring the number of total tiles we can have up to 512 (only 256 of them at once). I want to explore this possibility in my future games, so, it seems we have not yet found the "limits" of our beloved MSX computers! So, looking forward to better scrolling algorithms in the future! :)

Anyway, I hope you found this interesting, and if you have any different ideas or techniques to push the possibilities of doing smooth scroll in MSX games, I'd love to hear about them! :)

Sunday, February 17, 2019

Space Saving Techniques in Phantomas 2.0

Although most of my recent games have been for the MSX, last year Jordi Sureda and myself started working together on a game for the Amstrad CPC. The game is a remake of the Dinamic classic Phantomas 2. You can see a full walk through here of the game by JGonza (who was one of the beta testers of the game):

The game itself can be downloaded from Github, and if you want a physical copy of the game, you can buy it from matranet. The game is not very complex technically, and it works well mostly thanks to the graphics (made by Jordi Sureda). But there are a few interesting technical aspects. One of them in particular I think is particularly interesting since it's something that has to be done in every 8 bit project: how compress data as much as possible.

There are many techniques that people have used to reduce the space used by the room definition (e.g., using macro-tiles, as brilliantly explained in this video), or compressing them using a compression algorithm (like Pletter, or Exomizer). But there is one more thing that can be done, which is what I want to explain in this post: how to organize your data so that it gets maximally compressed when using a compression algorithm. Let me start by introducing a couple of basic ideas about compression algorithms.

Compression Algorithms in 8 bit Computers

A compression algorithm takes as input a sequence of symbols (say, of length n), and tries to generate another sequence of symbols of length smaller than n by exploiting regularities in the data. For example, if I have a sequence that is something like this "aaaaabbb", I could replace it by "5a3b", meaning that I have 5 times an "a", and then 3 times a "b". This is called "run length encoding", and works well when we are compressing sequences that have long sequences of repeated symbols. This was used in the classic PCX image format, popular in the Amiga and DOS times, where images contained usually large patches of the same color. The most common compression tool nowadays is the "zip" utility that comes with most OSs, which does use much more advanced compression than simple run length encoding (most modern compressors use dictionary-based approaches such as the LZW or DEFLATE algorithms).

These advanced compressors are usually too computationally intensive for 8 bit machines to handle, but there are tools, like Pletter (popular in the MSX world) or Exomizer (popular in the Amstrad CPC world), specifically designed for these machines. Usually, though, the way they are used is to have a compressor that runs on a modern PC to compress the data to fit it in a game, and then only the decompressor (which is not as computationally heavy) runs on the 8 bit machine.

For a comparison of how well these algorithms work, I refer you to an amazing empirical evaluation done by @newcoleco on the Atariage forums ( http://atariage.com/forums/topic/268244-data-compression-benchmark-dan-pletter-zx7-exomizer-pucrunch-megalz/ ). In summary: Exomizer compresses more, but decompression is much slower. So, if space is the biggest concern, use Exomizer, if decompression time is a concern (i.e., if you need to decompress things during gameplay), then Pletter is the better option.

For Phantomas 2.0, I considered both (for a few weeks I maintained two separate versions of the game using both compressors), but I ended deciding for Pletter, since Exomizer was too slow (it took about 4 seconds to decompress all the data I needed when changing rooms, which made the game unplayable). But in other games, Exomizer might be the best option. So, the important thing is to know your options!

Compressing the Rooms in Phantomas 2.0

There are many ways in which rooms can be defined in games. In Phantomas 2.0, we used a simple 2-plane grid-based approach: rooms are made out of 8x16 pixel tiles, and since the playable screen is 128x128 pixels wide (the game runs in Amstrad CPC mode 0, where pixels are twice as wide as tall) each room is 16x8 tiles in size. We used two planes, one for background tiles (drawn behind the player and enemies) and one for foreground tiles (drawn in front of the player and enemies). For example, here I show an example room, and the two planes it is made out of:

This representation for rooms uses quite a lot of space (ignoring enemy/item information), since there are 2 planes of 16*8 tiles, and we use one byte to specify each tile, each room is 16*8*2 = 256 byes in size (plus 20 or 30 for enemies/items). The game has 85 rooms, so that would be 24KB just for the rooms! (and remember that the Amstrad CPC only has 64 KB of RAM, and that 16KB of those are the video RAM, so, we can actually only use 48KB...). That was not going to work...

An option could have been going for the idea of macro tiles (used brilliantly in many of the productions by 4MHz). But macro tiles also have some limitations. So, since I did not want to impose any more restrictions on Jordi (who was creating the rooms), I decided I had to figure out a way to reduce space while maintaining this room representation. Of course, a first step is compressing them with Pletter. But there are 3 additional ideas that can be applied after that that can reduce space even more. So, let's get started!

Step 1: Direct Pletter Compression

Putting all 85 rooms together uses exactly 24316 bytes (286.07 bytes per room). Let's keep that as a baseline.

So, the first step is to just compress one by one, all the rooms using Pletter. The result is 12649 bytes (148.81) bytes per room. So, this is not bad, but we can do much better :)

Step 2: Exploiting Useles Bytes

Some bytes in the rooms are "useless". What I mean by "useless" is that whatever their value, the final room would look the same! For example, if there is a foreground tile that completely covers the background, then whatever value we have for the background tile is not going to affect the final room.

A first idea is to just set all of those background tiles covered completely by a foreground tile to 0. But a much better idea is to set them to the same value as the tile immediately to the left of them. If we are on the left-most tile, then we use the last background tile in the row before, and if we are on the top-left coordinate, then we just use "0". In this way, we are setting several bytes in a row with the same value, which results in a sequence that compressors like Pletter can compress better.

With this small change, the result is that the rooms (after compressing them, one by one with Pletter) use now only 11850 bytes (139.31 per room). With this simple trick, we have saved almost 800 bytes! not bad!

Step 3: |C(A+B)| < |C(A)| + |C(B)|

The next step might not be as obvious. But it is well known in information theory that if you have a compression algorithm C, and two sequences of symbols, A and B, the compression of the concatenation of A and B (which I will denote as "A+B") is usually smaller than the sum of the compression of A plus the compression of B. This is intuitive, since imagine that there is a subsequence of A that also happens to be in B. In this case, the compressor can exploit that when compressing A+B, however it cannot exploit it when compressing A or B separately, since there is no repetition. So, compressing two sequences together saves space most of the time.

Using that idea, the obvious thing to do is to just concatenate all the rooms, and compress the whole thing together, right? The problem with this is that you need to have enough space in RAM to decompress the rooms (recall they use 24316 bytes uncompressed). So, this was not an option. I know that some games (for example Operation Alexandra compressed the WHOLE game together!, but this was not an option for us, as the game does not fit in memory uncompressed :))

So, what we ended up doing is to compress the rooms by groups. We compressed rooms in groups of 4 (put 4 maps together, and then call Pletter to compress them). A little detail is that if you want good compression, you better place things that are similar together (as we saw above). So, I placed the background of all 4 maps together, then the foreground of the 4 maps, and then the item/enemy information.

With this idea, we get a significant reduction in space, and we are now at: 9926 bytes (116.18 bytes per room), which is already pretty good.

But we can do even better! :)

Step 4: Optimizing the Groupings

Some rooms look more similar to each other than some other rooms. So, I thought: if I put rooms that look similar in the same group when compressing them, would that save space? and the answer is yes! So, this is what I did: I wrote a program that first calculated the space needed when compressing all maps in groups of 4. Then, the program starts trying to permute maps (taking a map in a group and swapping it by a map in another group). After each swap, it recalculates the space, and if we have saved some space, then we keep the new grouping. Trying all possible map groupings is not feasible, but we can use a simple greedy search algorithm like this:
1) Start with the base grouping
2) Try every possible swap, if one reduces, keep it and continue from there
3) Once we have tried every possible swap, restart from the beginning again if we had any improvement, otherwise, terminate.

And this was another huge saving! After optimization, the rooms only use 9164 bytes (107.81 bytes per room)!!

Here's a little diagram showing a summary of the different ideas:

Conclusion

So, in summary, to get better compression:
1) try to put things that are similar together.
2) compress things together, better than compressing them separately
3) if you are compressing things together, try to optimize what groupings do you make!

Since these ideas worked very well, I used the same idea to store all the graphics of the game (in groups of 16 tiles), saving additional space (again, all the tiles uncompressed do not fit in RAM, so, we need to keep them compressed, and we only uncompress those needed for the current room). If you play the game, you might notice a small delay when changing rooms, which is when the game decompresses the necessary tiles for the next room. This is the part that was way too slow using Exomizer unfortunately...

The funniest thing is that after all of this space saving, I got a message from @Joseman on twitter, explaining me some details of the Amstrad CPC hardware I wasn't aware of (this was our first CPC game), and it turns out there are 4 full KB of extra space we didn't use! So, at the end there was plenty of unused space in the computer memory. But oh well, it was fun optimization anyway, and these ideas can be useful for other projects!

And that's it, I hope these techniques are useful to somebody else! If you have any additional space compression tricks, by all means I would like to hear about them! :D

Friday, February 15, 2019

The Flag Effect in XRacing

A few days ago I submitted my last MSX game to the MSXDev'18 competition. The game is called XRacing, and it is a top-down view car racing game inspired heavily in the Amiga game "Super Cars". A quick video of the intro and first level can be seen here:

I will explain the technique I used for the game scroll in another post, but in this post I wanted to focus on explaining how does the flag waving animation in the title screen work! I want to explain how it works, since the method achieved pretty nice results and it is not too complex, and it could be useful to other people trying to add similar effects to other MSX games!

My goal was to figure out a way to play a small "video loop" in the background that looked very smooth. I wanted to do this since I saw this video of a fire animation converted to the ZX Spectrum specs. I thought it would be amazing to have something like that on a game. So, I started thinking how could I convert a small video on my computer to play on the MSX, and how could I make it small enough to fit on a small game.

I was initially wanting to convert a fire animation (a real video of a fire, not one of those fire visual effects), but I struggled to find any that I could easily convert to a loop with a handful of frames (e.g., 16 frames or so). So, I remembered that back in the day, when we were working on the F-1 Spirit remake, we had this fancy flag animation in the background of the title screen. So, I thought that would be an easier one to convert! I started looking for real videos of a waving flag, but again I struggled to find one I could easily turn into a small 16 frame loop. So, I just wrote a little program to generate my own flag. It doesn't look real, but it did the trick.

Alright, so, before I describe the technique, let me take a brief moment to explain the technical specs of the MSX graphic chip, which will define the constraints of the approach!

The MSX VDP

Most first generation MSX computers shipped with the TMS9918 graphics chip, which I will just call "the VDP" (Video Display Processor). The VDP has several graphic modes, but I will focus on what is known as "Screen 2" in the MSX world, which is the most powerful graphic mode the VDP offers.

Screen 2 is a 256x192 pixel graphic mode, where the screen is divided into 8x8 pixel blocks called "patterns" or "tiles". The screen is thus 32 patterns wide (32*8 = 256) and 24 patterns tall (24*8 = 192), as is shown in this small diagram:

Each pattern is defined by two components a "pattern table entry" and a "color table entry", which are combined in the following way to draw the pattern:

So, as can be seen above, the "pattern table entry" (8 bytes per pattern) defines the "shape" of the pattern, and the "color table entry" (8 bytes per pattern) defines the color of the pixels. Two colors can be specified in each byte of the color table entry, which correspond to the 2 colors that can be used in the each of the 8 rows of the pattern. Each of the bits in the pattern table entry corresponds to a pixel in the pattern, and whether its a 0 or a 1 determines which of the two colors specified for the corresponding row will be drawn. This is where the "2 colors for each block of 8x1 pixels" comes in the MSX!

In order to draw something in the screen, what you need to do is to write data into the "video RAM" (VRAM from now on), and as a result, the content of the screen will change. The VDP has 16KB of VRAM, divided into a collection of areas:
- the Pattern Table: we can specify up to 768 different patterns, divided into 3 different banks (as shown in the picture above; more on this below). Each pattern is 8 bytes long, with each byte corresponding to one of the rows of the pattern (each bit corresponding to 1 pixel), as described above.
- the Color Table: also has 768 entries of 8 bytes each, specifying the colors of each of the 768 patterns as described above.
- the Name Table: finally, the name table is a 768 byte table that specifies which pattern to draw in each of the 768 positions of the screen. For example, if we define two patterns (a smiley face in pattern 0, and a sad face in pattern 1), if we write 0, 1, 0, 1, 0, 1, etc. in the first 32 bytes of the name table, we will see a row of smiley and sad faces. So, once we define a pattern, we can reuse it multiple times in the screen. There is only one limitation here, and it is that since we can only use 1 byte to specify which pattern to draw, we can only select from 256 patterns (since that's the maximum number of values we can represent with a single byte). Since there are 768 patterns, these are divided in "3 banks". In the top 8 rows of the screen (BANK 1), we can only select from patterns 0 - 255, in the middle rows, we can select from patterns 256 - 511 (BANK 2), and in the bottom 8 rows from 512 - 767 (BANK 3).
- and then a couple of tables for sprites (but we will ignore sprites for now :))

By looking at this structure you can already probably see why some MSX games have a very "chunky" scroll of 8x8 pixels! It's just because the easiest way to implement scroll in MSX is by just defining our graphics as patterns, and then just moving them with the name table (which lets us move them only in increments if 8 by 8 pixels!). If we want smoother scroll, then we have to figure out a way to do it, by using just the tools I described above!

Another thing to have in mind is the slow speed of the MSX CPU (Z80) and the even slower speed of data transfer to the VDP! The MSX Z80 has a clock of 3.58 MHz (3580000 ticks per second). If we want to have a smooth video/game that renders at 50 frames per second, we only have 71600 ticks per frame. If we have in mind that it takes usually 29 ticks to transfer a single byte to the VDP. So, we can only optimistically transfer about 2500 bytes per frame to the VDP, which is not enough to redraw a whole frame! 768 patterns * (8 bytes for pattern table + 8 bytes for color table) = 12288 bytes, which is the reason for it being so hard to have smooth scroll on MSX...

Ok, so that's the VDP. Now, how do we get a smooth animation on such a slow hardware?

Step 1: Generating the Video

To generate the source video, I just wrote a small Java script to render a flag waving loop with 16 frames of 256x192 pixels in black and white (I wanted to have gray-scale, but I'll explain below why I decided against it). The original video looks like this:

Step 2: Extracting the Patterns

The next step is to go one by one each of the 16 frames of the animation, and extract all the different patterns needed to display this animation. For doing this, we scan each 8x8 pixel block in each of the 16 frames, and we create a list of all the different patterns we find. Also, since once this is uploaded to the VDP, each of the 3 banks of the VDP is separate, we do this for each bank separately. With the video above, we find the following:

- BANK 1: 787 different patterns
- BANK 2: 510 different patterns
- BANK 3: 772 different patterns

So, assuming we could store all of these tiles in the VDP, what we would have to do is this:
1) Store all of these tiles in the game ROM: (787+510+772)*16 = 33104 bytes
2) Now, each frame of the animation would just be 768 bytes (the name table for each frame). Since we have 16 frames, this would be: 768*16 = 12288 bytes
3) Then, at run time, we just load all the patterns in the VDP initially. And at each game frame, we just keep uploading the name table that corresponds to the current frame. This is just 768 bytes per frame, which is doable.

Problems:
1) The VDP can only store 256 patterns per bank, and thus 787, 510 and 772 do not fit!
2) Even if they were to fit, that animation already would use most of the size of the ROM cartridge! (I was targeting a 48KB ROM)

So, we need to reduce that somehow :)

Step 3: Reducing the Set of Patterns Needed

If we were to just display the flag animation by itself, we could afford to have 256 patterns per bank. However, I wanted to have the game logo drawn on top of the flag, and maybe have some text message at the bottom of the screen. So, I had to reserve some patterns for that. In order to show the title image, I needed these patterns (not all patterns are strictly needed, but it was easier for me to have these patterns, which I reuse in other parts of the game):

If you count, those are 108 patterns But I wasn't sure if I was going to need a few more, so, I assumed I might need 112 patterns This left me with 256 - 112 = 144 patterns in each bank for the flag animation. So, how to reduce from 787 tiles to 144?

The answer is a standard machine learning family of algorithms called clustering algorithms. Clustering algorithms work as follows: the input is a large collection of points in an n-dimensional space, and the output is a "clustering" (a grouping of the input points into a collection of "buckets" called "clusters"), where the points in a cluster should look more similar to each other than to points in other clusters. So, what I did was to consider every pattern as a point in a 64-dimensional space (one dimension per pixel in the pattern), and I gave all 787 patterns for BANK 1 as input to a clustering algorithm (a very simple one called k-medoids), and I asked the algorithm to cluster them into 144 clusters. From each cluster, I then pick the most "representative" pattern (the "medoid"). I then do the same thing with BANK 2, and BANK 3. I then replace every pattern in the flag animation by the representative pattern of its cluster. This effectively reduces the number of patterns needed for the animation to just 144 per bank!

The resulting animation now looks like this:

Notice that it mostly looks like the same animation, except for a few glitches! So, this already looks promising! After that, I just manually replaced the medoid of a couple of clusters by a different one to fix those dots at the top of the animation, and the animation already looked much cleaner!

Step 4: Translate the Animation to MSX Data

Finally, now we can just take those 144 patterns per BANK, and save them to a binary file. Notice that all patterns have just black and white as color, so, I do not store the color table (assume that the color table for all patterns is identical, and just set to black and white). So, I need to store 144*3*8 = 3456 bytes for the patterns (much lower than before!). And after compressed using Pletter, it's just 1434 bytes.

I considered having gray in the flag to have shadows, but that forced me to store a color table, which used more space. So, I decided against it!

Once we have the patterns, for each animation frame (there are 16 frames), we store the corresponding "name table", i.e., which patterns have to be drawn in each of the 32x24 positions of the screen. This is 768 bytes per frame, so 768*16 = 12272 bytes. I actually never draw the bottom line of the animation, since I use that for text, so that saves some space, and I just need 11776 bytes. After compressing them with Pletter, it;s reduced to 5842 bytes.

Step 5: Playing it in the MSX

Once we have that, to play it back on the MSX, we just need to copy the patterns to the pattern table in the VDP, set the color tables to black and white, and then at each frame copy the corresponding name table to the VDP. The video could play easily at 50 or 60 frames per second (since it just needs to copy 32*23 = 736 bytes per frame), but the animation would be too fast. So, the game plays it at 25 or 30 frames per second depending on if it is running on a PAL or NTSC MSX.

The XRacing logo is then drawn on top by just overwriting the proper positions on the name table, and drawing a few sprites on top of it to achieve the 4 rounded corners, and a few other details (there are 19 sprites on the title screen).

And that's it! So, basically the idea is: 1) take an animation loop as input, 2) get the set of patterns needed for each bank, 3) use k-medoids to reduce the number of patterns to 144 per bank, 4) with that we can already display the animation on the MSX, since we just need to change the name table in each frame!

It's not rocket science, but what I find interesting is that there is nothing in the code hard-coded for the video being a flag waving animation. The idea could in principle work for any other small animation loop. So, I might use this technique in the future to display other types of animations :)

Ok, that's all for this post, which is already getting too long! Next time I want to explain how the in-game scroll works! which is a bit more involved :)

Santiago Ontañón