Monday, April 22, 2019

Smooth Scroll in XRacing

As I explained in a previous post, my latest MSX game is XRacing, a top-down racing game inspired in the classic Super Cars for the Commodore Amiga.

When I was planning XRacing, the one thing that I wanted the game to have was smooth scroll. As a Super Cars-style game, if the scroll was the usual choppy 8x8 MSX scroll, it would be very hard to play. At the end I ended up using a method for achieving this that worked pretty well! So, today I just wanted to give you an idea of how it works!

Ok, so, let's start with the basics. How do people usually implement scroll in MSX games? (MSX2 and later machines can do this by hardware, but let's forget about those for now, and focus on the classic MSX1 machines) There are basically 3 common techniques:

  • Scroll by tiles: As I explained in this post, the MSX graphic chip (the VDP) displays graphics by defining "tiles" (or "patterns"), which are 8x8 pixel small graphics. There are 3 banks of 256 tiles each. Each of the banks contains the tiles to be displayed in the top, middle and bottom thirds of the screen respectively. A "name table" is a small 768 byte array in the VDP that contains one byte per tile in the screen (at the standard 256x192 pixels MSX resolution, there are 32x24 = 768 tiles in the screen), and specifies which tile to be drawn in each position. So, to do tile by tile scroll, you just need to update this "name table", which just requires writing 768 bytes to the VDP per game frame. This is a very small amount of information, and thus can be done easily at 50/60 frames per second, and that's why it's used by most games. This is what was used in all Konami games like the Nemesis saga, Knightmare, etc. 
  • Bitmap-mode scroll: The other extreme is to ignore completely the name table, and use the video memory as if it was a bitmap (i.e., like if we were programming a game for the ZX spectrum or the Amstrad CPC, which did not have the concept of a name table). In this approach, the game just redraws the whole screen at each game frame, and thus can move the graphics pixel by pixel if desired. This allows for smooth scroll, but has the down side of having to write A LOT of data to the VDP in each game frame. Since each tile requires 16 bytes of data (8 bytes for the pattern, and 8 bytes for the colors), we need to write 16*768 = 12288 bytes per game frame. Now, the maximum speed at which we can transfer data to the VDP is one byte each 29 CPU cycles, so, that is 12288 * 29 = 356352 cycles. Since the Z80 CPU of the MSX runs at about 3.58MHz (3.58M cycles per second), this would result in at most 10 frames per second (ignoring the time we need for the game logic, and assuming maximum transfer speed, which is unlikely). Games that use this technique usually ignore the color table, and just use monochrome graphics, so that only half the data has to be transferred, and then, on top of that usually a large scoreboard is used to reduce the size of the playable area, reducing data transfer even more. Most ZX Spectrum game ports to the MSX use this technique, since in Spectrum this technique makes sense (Spectrum video memory is memory-mapped, and thus transfer speeds are much higher). To port the game to the MSX, they created a small buffer in RAM that acted as the ZX Spectrum video memory, and then wrote a routine that transferred its content to the VDP at each game frame (in this way the code for the ZX spectrum barely needs to be modified to run on an MSX), which is the routine responsible for the slowdown we see in most ZX Spectrum ports...
  • Pre-calculated scrolled tiles: this is a very interesting technique used by some more modern MSX games (and a few classic ones too). The idea is to have multiple versions of each tile in the tile banks. The different versions contain the same tile, but already scrolled 1 pixel to the left, 2 pixels to the left, 3 pixels to the left, etc. In this way, if we want to (for example) move the scroll 1 pixel to the left, we just need to update the name table to replace the current tiles by the versions of the tiles already scrolled 1 pixel to the left. This achieves smooth scroll at very fast speeds! The downside is that since we need to have multiple versions of each tile, the amount of different tiles we can have in a game level is much smaller, thus reducing graphic variety. Example games that used this technique are Lotus F3, or Uridium. As you can see Lotus F3 has very little graphical variety. Uridium has more variety, since it only requires horizontal scroll, and thus less versions of each tile are needed. This was also used in a few classic games like Circus Charlie, for example.
So, in summary "Scroll by tiles" is fast but choppy, "Bitmap-mode scroll" is smooth but slow, and "Pre-calculated scrolled tiles" is fast and smooth but reduces graphical variety. None of those options was good enough for what I wanted to do. So, I had to figure out a way to do better :)


Pre-calculating Scrolled Tiles

My method is based upon the pre-calculated scrolled tiles method. So, let's start with that. To illustrate all the concepts, I'm going to be using the actual graphics I used for the "grass" type of tracks in the game (there are two separate types of tracks: tracks with a grass background, and urban tracks with buildings). This is one of the first Formula 1 tracks in the game, called "St. Junipero" (named after my favorite Black Mirror episode :)):


Note that this is NOT how the track looks in the game, since I use a lot of sprites to add details to the track, but let's focus just on the tiles for now :) To author that track, this is the set of tiles I wanted to use:


Notice that there are only 51 different tiles used. It seems like there aren't that many, right? Well, you'll see that 51 is quite a large number for the "pre-calculated scrolled tiles" approach. Suppose that we wanted to use that map in a game with smooth scroll that moves in increments of 2 pixels (1 pixel smooth scroll would be even harder!). In order to do that, what we need to do is this:
  • For each tile in our set (51 tiles): find all the other tiles that appear at least once to the right of this tile in the map above. For each such pair of tiles, we need to generate the following pre-calculated tiles, and have them in our tile banks:
  • So, for each pair of tiles that appear next to each other (one to the right of the previous), we need, in addition to the two original tiles, three additional intermediate tiles corresponding to the intermediate scroll positions in which the tiles might appear.
  • If we wanted to have vertical scroll instead, We need to do the same thing for tiles that appear one of top of another, for vertical scroll.
  • If we wanted multi-directional scroll, then, instead of pairs of tiles, we need to account for combinations of horizontal + vertical scroll. So, in general, for each 2x2 block of tiles that appear in the map, we need to do what I described above. So, for each unique 2x2 tile combination, we would need to add 21 different pre-calculated tiles.
You can already see where this is going. If we want pre-calculated tiles for multidirectional scroll (even just for moving 2 pixels at a time, let alone pixel by pixel), we would need A LOT of tiles. If we remove duplicates (since lots of tiles generated with the previous method would look identical), here is the set of tiles we would need for the map above:


This is a total of 1196 tiles! And if we now create a separate map that has some unique tile combination not appearing in this map, we would then need even more tiles! Also, have in mind that I authored those tiles with this method in mind, so they are carefully drawn to minimize the number of tiles needed. Notice that all elements always have an even number of empty space surrounding them (for example the house has 4 empty pixels above (not 3, not 5), the cone has 2 empty pixels below, etc.). If you are not careful, you'd get even more tiles!

Now, in XRacing, I need the bottom part of the screen for a minimap, and some basic information about the game (speed, time, laps, car status, etc.). So, I only used the top two banks for the actual gameplay. Since maps scroll arbitrarily, I have the same tile set in each of the two banks. Thus, the maximum number of different tiles I can have loaded at once in the VDP is 256. This was clearly not going to work for XRacing :)


Scroll "On Rails"

My first idea was to force the scroll to be only either horizontal or only vertical. So, if we are scrolling horizontally, I would snap the y coordinate to a multiple of 8 pixels, and when scrolling vertically, I'd snap the x coordinate to a multiple of 8 pixels. With this idea, we would only need the following tiles:


That is "just" 389 tiles! Still too many, but we are getting closer to the magical 256 number! So, in order to reduce this number further, I used the idea of "rails".

Rather than allow the track to freely scroll, I would define "rails" in the track, and the camera can ONLY slide along those rails. For example, in the track above, these are the defined rails:


When the game starts, the player is in the left-most red-color rail, and thus the camera can only move vertically. As soon as the player reaches the top part of the map, the camera "snaps" to the horizontal blue-colored rail, and the camera can only move left-right, etc. The key idea is that not all tiles would need pre-calculated versions that scroll vertically or horizontally. For example, the little house in the tiles above was designed to only scroll vertically (there would be color clash if I scroll it horizontally!). So, I only add it to parts of the map associated with a vertical rail, and never to a horizontal rail. Likewise, there are graphics that I only ever associate with horizontal rails (like the yellow banner sign).

This reduces the set of tiles needed even further! I don't know how many tiles would I need for the map above, since I never calculated it. But when putting together ALL the XRacing maps that use this tile set, I only need 313 tiles, which is even closer to the 256 magic number! We are getting there!


Multiple Tile Groups

The last trick is to realize that not all tiles are needed at all time. For example the starting block (the blue semaphore at the beginning of a track), only appears once in the track! So, I decided to divide my tiles into smaller groups, and load/unload them during the game. The constraint is that we cannot upload too many tiles at once, or the player would notice a slowdown in the scroll. So, what I did is to define a large group of "common tiles" that would always be loaded, and then smaller groups of tiles, of which only one of them at a time would be loaded, and each time we switch from one rail to another, the rail has a tag which which group of tiles it needs to load.

This part was a bit tricky, and I wrote a small optimization algorithm using a classic hill-climbing approach to find the optimal grouping of tiles. I struggled quite a lot with this part, since it was hard to find a good grouping, and my algorithm kept finding local optima that were not good. Happy to provide details if anyone wants to replicate any of this. But suffice to say that there is an algorithm that given a set of maps can find these groups of tiles (the idea is to start with one group per rail, and then take tiles out of those groups into a "common group", trying to take out the tiles that appear most often). These were the groups of tiles that were found for XRacing:
  • A large common group of tiles with 199 tiles:

  • Two smaller groups of tiles with 57 tiles each:

For the urban tile set, there was 4 smaller groups of tiles instead of 2, but that was also manageable. So, in a regular game frame, I only need to update the two banks of the name tables (512 bytes to the VDP), and when the scroll snaps from one rail to the next, we need to upload an additional 57 tiles (57*16 = 912 bytes). I need to upload those to both bank 1 and bank 2, so, it's twice that amount, but that was also manageable. The game runs at 25/30 frames per second, so this was well within the margins of what I could handle!

Another tricky part was that some times, my optimization algorithm was not able to find a solution to the problem using only 256 tiles, and thus I had to constantly change the maps if there was a particular combination of tiles that only appeared in one map, to reduce the number of tiles actually needed. This was a bit painful, but eventually I got a set of maps for which my optimization algorithm could find a good grouping solution, and thus I could use for the game!


Sprites

Finally, in order to add a bit more graphic detail in the game, I use a collection of sprites in addition to tiles. In addition to the cars, and the small dots in the minimap, I use sprites in 3 different ways in XRacing:
  • To add graphic detail to the game. This is used for trees and signs, which have sprites to draw the outline of the trees, and the MSX logo on the signs. This can be seen in the images below.
  • To protect the scoreboard from sprites intruding into it I put 4 blank sprites in the top-left corner of the scoreboard, in that way the car, tree and MSX-logo sprites do not get drawn over the scoreboard. Just to clarify on this, the MSX can only display 4 sprites in each horizontal screen line at a time. So, if you have more than 4 sprites, the 5th sprite doesn't get displayed. I exploit this trick, and put 4 sprites in the scoreboard, so any other sprite that tries to be drawn there simply cannot be drawn :)

  • I use the same 4 sprite trick to make it look as if the cars go underneath the semaphore at game start. So, there are 4 sprites (black color) making up the semaphore, which serve both to add graphic detail, and to protect it from the car sprites



Conclusions

And that's basically it! So, the key question I wanted to answer was: can I take the "pre-calculated scrolled tiles" scrolling technique and push it so that I can increase graphic variety as much as possible? In order to do that I used the idea of "rails", which limit the number of tiles we need to pre-calculate and the idea of splitting the tiles into smaller groups that I can load/unload to the VDP at run time. This results in a scroll that is not fully multi-directional, but has the benefit of allowing me to have different graphics in the game (like the little houses, or trees, etc.) to decorate the map, which would not have been possible otherwise (look at the Lotus F3 example I pointed out above).

As a final note, once I made the game available, Artrag from the msx.org forums, pointed out to me that there is a way to push the VDP even further, and make the top two tile banks of the VDP share a single bank. In this way, we can have 2 banks of tiles, and switch from one to the other arbitrarily. So, to clarify: in XRacing the top part of the screen uses bank 1, and the middle bank 2. What Artrag is suggesting is that we can make both top and middle use bank 1, and then swap to bank 2 when we want. So, this would bring the number of total tiles we can have up to 512 (only 256 of them at once). I want to explore this possibility in my future games, so, it seems we have not yet found the "limits" of our beloved MSX computers! So, looking forward to better scrolling algorithms in the future! :)

Anyway, I hope you found this interesting, and if you have any different ideas or techniques to push the possibilities of doing smooth scroll in MSX games, I'd love to hear about them! :)

4 comments:

  1. Great reading!
    Can you add details on how you managed to optimize the common tile set among the 6 tile sets in your picture ?
    I assume 6 tile sets because you have plotted 6 boxes in the level.
    I would have computed the intersection among the 6 tile sets as starting point, then I would have added to the common tile set one tile at time from the 6 difference sets, taken from the largest one.
    The process should end when the sum of the sizes of the common tile set and of the largest difference set is equal to 256.

    ReplyDelete
  2. Hi Arturo! Thanks! The "6 boxes" were put there by hand. So, that is the input of the algorithm.
    - My first version worked exactly as you say, and was working initially, until I started adding more maps, that pushed the approach to the limit.
    - The problem is that, in addition to having the different tile groups, I need to assign an ID to each tile. Even if there are more than 256 tiles, I only have 256 IDs to assign to tiles, since I want each tile to use just 1 byte in the map. So, I need to figure out which tiles will have the same ID, and thus need to be in different groups. Two tiles with the same ID can never appear at the same time in the screen. This introduced an additional constraint that my initial version of the optimization algorithm could not handle very well.
    - So, the current version does this: it starts with one group per "rail" with all the tiles needed for that rail, and an empty "common group".
    - Then, it selects the tile that appears on the largest number of groups, and to resolve ties, it picks that that appears in the largest group (since we want to make the extra groups as small as possible). It moves that tile to the common group, and restarts the process.
    - The key is that not all tiles are candidates to be moved to the common group. So, before I choose a tile, I run a test to see if I would be able to assign tile IDs to the tiles in all the groups, respecting the constraint that two tiles with the same ID cannot be in the screen at the same time. This is a bit trickier than it seems, since tiles can appear in multiple groups, and they must have the same ID regardless of the group they are on, so, this creates some weird constraints.
    - If at any time in the process one of the groups becomes a subset of another one, I merge them.
    - Since my algorithm does greedy optimization, it some times converges to a local optima where the extra groups are too large. So, what I did was to manually force the assignment of some tiles by hand before optimization starts. With that, I Was able to push it to a local optima that had small extra groups (if the were too big, I could not fit them in RAM!)

    And that's basically it. Now that I read it, it does not look that complicated, but for some reason I struggled when getting it right. Probably I was getting nervous with the deadline looming and me not having a game working yet! hahaha

    ReplyDelete
  3. Hi Santi, very interesting! I'm thegeps (Giuseppe) from msx.org
    I like reading of your techniques. In my scrolling routine I have in RAM (in ROM, actually) 31 couples of tiles and in every couple there are a tile and the tile that in the map can be immediately up. So i read with an offset between the two tiles definition and copy 8 bytes in tiles pattern in VRAM. Same thing I do for color patterns. So I calculate tiles I need and can have double buffer 'cause I have enough room. I hope, in future, I'll write an article to describe my routine as good as your article is

    ReplyDelete
  4. Thanks thegeps!!! Yes, you should definitively write about it!! Your scroll is super smooth, and I think lots of people would like to learn how to do something like that! :)

    ReplyDelete