Sunday, October 11, 2020

Quick Technical Overview of The Menace from Triton (MSX Game)


One more game, one more post! My latest MSX game is called The Menace fro Triton, and it is a horizontal space shooter that takes inspiration from classics like Salamander or the Nemesis saga for MSX, as well as from more modern shooter games like Steredenn or Z-Exemplar. You can see a small game play video here:


I put a lot of work into this game, and I just wanted to explain a few of the interesting technical aspects I designed for it, just in case someone is interested. For this game, I set out to create an MSX game that used only 16KB of RAM, and with a cartridge of at most 48KB. I like setting constraints like these on the games I make for two main reasons: First, because it is fun! Without constraints, many things become easier (e.g., if I allow more RAM, I could have many memory buffers, making things easier; if I allow for MSX2, then I can use the additional capabilities of the MSX2 VDP to make the scroll easier, etc.), but I enjoy the "game" of figuring out if it's possible to make all I want to do within the constraints I set. Second (and most important), I have found that if I don't set a size limit on a game, I never finish them. My PC games go on forever in an endless spiral of adding more and more features. But with a small and finite amount of space, I know that as soon as the 48KB of the ROM are filled, that's it! No more content! So, that helps a lot in getting games actually finished. If I had decided to use a MegaROM, I'd probably would still be working on it, and at some point the project would get abandoned...

Anyway! While in previous games there was something obvious to talk about (3d rendering in ToP, content generation in XSpelunker and scroll in XRacing), the Menace from Triton does not have one key technical distinguishing feature, but it has a collection of smaller ones that might be interesting to explain. So, I wanted to talk about 4 technical details that at least I found interesting when designing this game:

  • Text rendering routines: not sure if you noticed but the font in the game is not monospace. Each letter has a different width. This is a bit more complicated to set up than the usual "1 letter per tile", but allowed me to fit much more (and better looking) text!
  • Content generation: I wanted to have lots of levels in the game, and I did not have enough space in the ROM for all of them. So, I resorted to content generation as a way to reduce space. The most interesting routine perhaps is the one that generates the star map.
  • Scroll and Agendas: the core technique for smooth scroll is nothing to write home about (same idea as used in many other games before), but there are a few details that I find interesting such as the way tiles are precomputed to minimize space, and how updating all the different game elements (scroll, content generation, enemies, bullets, explosions, etc.) is orchestrated at run-time while maintaining the smooth scroll.
  • Code compression: I really didn't want to do this, as it's a bit annoying to set up. But I ended up having to divide the source code into modules that are compressed in the ROM, and only decompressed when they are needed. This saved quite some space in the ROM, as, of all of my MSX games, this is the one that has the largest amount of source code, using too much space in the ROM.
There's a few other details I could talk about such as the custom music playback routines, or the scripts I use to generate the music data (I composed songs using EXCEL as the "tracker" for this game, which was surprisingly powerful, as I had all the power of modern copy/paste/insert/formulas/etc. at my disposal :) ).

Alright, so, let's just get started! Looks like this is going to be a long post!

Text Rendering in The Menace from Triton

In most MSX1 games, text is rendered in a very simple way: each character uses an 8x8 tile. In this way, drawing a character means simply sending a single byte to the VDP (the graphics chip) to change the name table. However, that limits text to 32 characters per line (as the MSX resolution is 256x192 pixels, i.e., 32x24 tiles). I knew there would be a lot of text in The Menace from Triton, so that was not good enough...

Last year, I worked with my friend Jordi Sureda in a point-and-click demo for Amstrad CPC, and text was very important, so, I worked on a nicer text rendering piece of code. Although we never released it (coming soon), I also built an MSX version of that demo. So, I already had the text rendering code from there, which I adapted to this game. The resulting text looks like this (this is a screenshot from the game intro, which you can see if you wait a few seconds in the title screen):


Rendering works as as follows:

1. Drawing the font: I wanted the font to use as little space as possible, so, I only used the characters actually used in the game. No lower case (I had upper and lower case for the point-and-click demo, but for this game, I just used upper case letters, numbers and some special symbols; 49 characters in total). The image I drew looks like this:


2. Encoding the font: the next step is to encode the font into some memory structure. I decided to encode the text by columns. So, each byte represents a column of pixels. This limits the fonts I can support to those that are 8 pixels tall. But this was an acceptable constraint. With this representation, a thin character such as an exclamation mark (that is just 2 pixels wide) is encoded as 2 bytes, a wider letter, like a "W" that is 6 pixels wide is encoded as 6 bytes, etc. Putting all the letters together one after another, results in the following character definition table, which contains, for each character, a byte for the character width, followed by bytes defining the characters:

Note: after drawing this figure, I noticed that I flipped the bits horizontally. But I was lazy to re-do it, so, just imagine a mirror of the left-table :)

A second small table shown above (right) stores the indexes where each character definition starts. In order to make indexes fit in a single byte (i.e., have values between 0 to 255), I store index/3, which means that each character needs to start in a position multiple of 3 in the character definition data table. Thus, there are gaps in the table (e.g., bytes at offsets 4 and 5, right before the exclamation mark). Storing index/2 would make this table smaller (saving about 30 bytes), but results in worse compression for the text later on, and thus index/3 was the best tradeoff. As we will see later, the index table is actually not needed, so, the font uses 349 bytes.

3. Encoding Text: Now, in order to encode text, instead of having strings like "HI!" encoded as ['H', 'I', '!'], text is directly stored by a sequence of indexes. So, the string "HI!" is stored as [62, 64, 2], which are the indexes form the index table of the 3 characters. 

There is a lot of text in The Menace from Triton (if you put together all the strings in the game, they add up to 3220 bytes). In order to compact them a little bit, as expected, text is compressed. I used aplib to compress text, which resulted in a slightly better compression than pletter. Since I do not have a buffer of size 3220 in RAM to decompress text to, I divided all the sentences into "banks" of at most 512 bytes each, resulting in 7 banks. In the same way as I explained for Phantomas 2.0, I optimize the distribution of sentences over the banks, and at the end it all compressed down to 2017 bytes.

4. Rendering Text: To render text, each individual sentence in the game is identified by two bytes: (bank number, sentence number within the bank). If I want to render, say, the 5th sentence in bank 2, I decompress bank 2, and then skip the first 4 sentences in the bank, to get the pointer to the 5th sentence. The first byte of the sentence contains its length in characters.

Then to actually render, I have a small RAM buffer (just enough to draw a whole line of text), where I render the text column by column, and then I copy it over to the VDP pattern table. I can then set the attribute table in the VDP to whichever color I want, and thus render text in any color. For this to happen, almost everywhere in the game (except in-game) I use "screen 2 bitmap mode" (where the name table is reset to 0, 1, 2, 3, 4, ..., 255 in all three banks, and I just draw directly to the pattern and attribute table. If you are curious, the rendering code can be found in this file ("draw_sentence_pre_render" and "draw_font_character" are the actual rendering functions).

Content Generation

There are two separate procedural content generators in the game: the star chart generator and the game level generator

The game level generator is fairly standard, and uses a classic "pattern-based" approach, where levels are made out of pre-defined patterns (of size 16x22 in this case). Each of these patterns has annotations to determine which type of pattern can follow (e.g. "empty top", "land at top and bottom", etc.), as well as which enemy waves make sense to spawn in each of them. Other than that, levels are just generated randomly by picking from the set of patterns that match the constraints, and then spawning enemy waves that match with the selected pattern. The land-based enemies (moai heads, turrets, etc.) are pre-defined in the patterns, but not always spawned. Each planet in the star chart has a difficulty level associated with it that determines the chance of spawning enemies: the higher the difficulty, the higher the chance of spawning more enemies. The difficulty of a level is defined by 13 numbers:
  • Enemy spawn probability: probability with which the land-based enemies are spawned in their marked positions in the map
  • Bullet speed
  • Fire rate for aggressive enemies, fire rate for medium aggressive enemies, fire rate for enemies that should not fire (even these fire in later levels of the game haha)
  • Fire delay (minimum delay enemies need to wait before they can fire again)
  • Enemy movement speed
  • Base enemy hit points, hit points for medium enemies, hit point for large enemies
  • Power pellet spawn probability (the probability with which normal enemies spawn power pellets), power pellet spawn probability for enemies that should spawn less power pellets.
  • Target level length
For example, the first level you play has values: [0.5, 2, 0.06, 0.004, 0, 4, 1, 1, 1, 4, 0.75, 0.5, 10], while the very last level (Triton) has values [0.7, 3, 0.19, 0.06, 0.04, 0, 2, 2, 6, 10, 0.375, 0.19, 40]. Difficulty is determined by the "column" of the star chart (the more you move to the right, the harder levels get). You can see all the different difficulty values in the "difficulty_d1_values_ROM", "difficulty_d3_values_ROM", etc. structures in this file.

The star chart generator is a bit more interesting, and I really struggled to find a way that could be coded compactly in assembler and that generated an interesting star chart... This is an example of how the star chart looks in the latest version (not yet on GitHub):


The challenge is to generate a connected graph that starts at the bottom-left (Ithaki) and finishes in the top-right (the Aigai Nebula), while offering multiple possible paths for the player to choose from. After trying many failed solutions, I settled for something quite simple actually, but that worked pretty well, so, I kept it! The algorithm works as follows:
  • The basic technique is to use a semi-random walk on a 8-connectivity grid. The "canvas" where to draw the star chart is a 11*5 grid, with 8-connectivity (up/down/left/right/diagonals). Given a start and and end destination in this grid, a "semi-random walk" starts at the start position, and moves straight to the target position but with some small chance, it moves in a different direction than the shortest path. 
  • Generating the star chart with this "semi-random walk" is as simple as this:
  • Step 1: defining the anchoring points
    • point A: Ithaki
    • point B: Aigai Nebula
    • point C: a random point in a 4x4 space at the center of the canvas
    • point D: a random point in the 4x2 top-left part of the canvas
    • point E: a random point in the 4x2 bottom-right part of the canvas
  • Step 2: generate the following semi-random walks, as shown in the figure below:
    • A-D
    • A-E
    • A-C
    • D-B
    • E-B
    • C-B
    • D-C
    • C-E
  • Step 3: place a random planet type at each visited grid cell (there's 4 planet types: Moai, Water, Tech, Temple), and connectors in all the connections visited by the walks.
  • The first boss (Polyphemus) is always spawned in the 3rd planet you visit, and the second and third bosses (Scylla and Charibdis) are spawned in random planets in two predefined columns of the map. The last boss (Triton) is always in the final planet, also called Triton.
And that's it! So, after considering more complex techniques like graph-based algorithms (e.g. Kruskal), I settled for this very simple method that does not use too much space when implemented in assembler, and generates star charts like those I imagined! So, if you see any star chart that looks particularly complicated, it's just because the semi-random walks some times take lots of detours to reach to the destination, creating loops, etc. (which is nice!). Too much randomness and it was a mess, so, I set the randomness to a fairly low probability (there is, for example, only a 0.023 chance of moving in the diametrically opposite direction of the target during a walk).

Scroll and Agendas

The basic scrolling technique used in Triton is standard: scroll modes 2 by 2 pixels, and all scrolling tiles are pre-rendered. So, at runtime all that needs to be updated is the name table. This is not anything new, and many games use this technique. But there are two details that I wanted to talk about: how are tiles pre-computed and the agendas used to ensure all runs smooth.

Pre-computing tiles: The MSX VDP has 3 tile banks with the patterns/attributes that correspond to the top/middle/bottom parts of the screen. In most scrolling engines either the same set of 256 pre-rendered tiles is used for all 3 banks, or a separate one is used for each bank.  In Triton I used 2 sets. One for the top of the screen, and another for the middle/bottom. This was a good compromise between variety and space. Using the Moai levels as an example, the tile sets look like this:

Pre-computing tiles is usually done off-line (usually pre-computed on a PC using some script), and then directly included in the ROM. However, that means storing a lot of tile information. And having 4 different level types, that was too much space, which I did not have. So, I went with a different approach. Each pre-rendered tile uses 16 bytes before compression, so, 2 sets of 256 tiles would be about 8KB before compression. Thus, instead of storing the pre-computed tiles, I stored a small set of base tiles, that look like this for the Moai levels (in reality I had a single set of base tiles including the tiles of all level types):

Each pre-rendered tile is the result of taking two base tiles and an offset, like this:


So, basically instead of 16 bytes, we just need 3 bytes per pre-rendered tile: left-tile, right-tile and offset. So, that's what I store. Then, just before starting a game, this table is used to calculate all the pre-rendered tiles directly on the MSX side. This table is just 256 * 3 bytes * 2 banks = 1536 bytes before compression. The real table is a bit bigger (as calculating the attributes of the pre-rendered tile is tricky in some edge cases and the assembler code to reconstruct it would have been too large, so, since the offset only needs 2 bits (as it can only take values 0, 2, 4, 6), I use the other 6 bits to store a bit more info, and in some special cases there some extra bytes, but those are implementation details ;) ).

Thus, I mostly only need to store a single base tile image (4KB) plus 1536*4 bytes. Thus, from 32KB before compression, we go to only 10KB before compression. Moreover, in addition to those tables, in both cases I need some additional tables to accelerate scroll calculations during rendering (2212 bytes in total after compression). In total, once all is compressed, I only need to actually use 6018 bytes for all the tile information used for all levels, which was compact enough! Without this approach I was using almost 10KB, so this saved about 4KB!

On the other hand, I need some code for pre-calculating the tiles, but that code is just about 150 bytes or so. Thus, it is worth it!

Agendas: the second interesting part of the scrolling engine is how to fit all of the calculations into the limited 58000 CPU cycles per frame available in 60Hz machines! These are the different things that need to be rendered/processed in the game, together with their average and maximum time they use (some are marked with question marks as I have not measured them recently, and I was lazy to measure them again explicitly for this blog post haha):
  • 27457 cycles (27457 max): Draw map
  • 3794 cycles (3809 max): Update sprites in the VDP
  • 849 cycles (2400 max): Choose next pattern for level generation
  • 16314 cycles (20014 max): decompress chosen pattern for level generation (level patterns are compressed as they do not fit in RAM or ROM decompressed, so, when one needs to be used in the game, it has to first be decompressed)
  • 7472 cycles (7472 max): copy decompressed pattern to the map buffer
  • 2138 cycles (5327 max): background star field scroll
  • 1157 cycles (1157 max): get keyboard/joystick input
  • 2179 cycles (8275 max): update the player ship/options
  • 6211 cycles (18047 max): update sprite-based enemies
  • 2782 cycles (23575 max): update enemy bullets (max is high, as it includes handling the effect of bullets colliding with player/walls)
  • ??? cycles (??? max): update tile-based enemies
  • 3766 cycles (8180 max): update player bullets
  • ??? cycles (??? max): update player secondary bullets (option bullets/missiles/torpedoes)
  • ??? cycles (??? max): update tile-based explosions
  • ??? cycles (??? max): update power pellets
  • ??? cycles (??? max): update player/enemy/bullet/power pellet/explosion positions when scroll loops around in the circular buffer
  • ??? cycles (??? max): music/SFX
If you add up all of those number, the total is about 85K cycles (with a max of about 150K), which was clearly not going to fit within one game frame :)

So, the solution used in the Menace from Triton was to use agendas. I have an agenda with 256 time slots, where each of the 256 time slots specifies which of the operations above is executed. So, not all operations are updated at every game cycle. It takes 16 game frames for the scroll to advance one whole tile, so 256 game frames is what is needed to advance 16 tiles (which is a whole level generation pattern). Thus, once each 256 cycles we need to choose a new pattern, decompress it and copy it to the game map circular buffer. I am not going to give the complete agenda, but here are a few hints:
  • Draw map is only done in the even cycles
  • The star field is only updated once every 4 cycles
  • Player is updated at every frame (thus, player moves at a smooth 50/60 fps)
  • Sprite-based enemies are only updated in the odd cycles (so, enemies move only at 25/30 fps)
  • Tile-based enemies are only updated once each 16 frames
  • Choosing the next PCG pattern and copying it to the circular buffer is spread across the 128 odd cycles:
    • in cycle 1 we choose the pattern
    • in cycle 3 we decompress it
    • cycles 5, 7, 9, etc. we go through the new chosen pattern (2 rows at each cycle) and spawn any tile-based enemies that need to be spawned
    • in cycles 131 and 133 the fully instantiated new chosen PCG pattern is copied to the circular map buffer (in 131 we make one copy, and in one edge case we need to copy the pattern twice; at the beginning and end of the circular buffer, for when scroll wraps around; this second copy if needed is done on cycle 133)
  • etc.
I really like this idea of an agenda, as it managed to reduce the average CPU required per frame to about 29K cycles (thus, there's CPU to spare in most frames!), and the maximum per frame in most cases stays below 50K. There are a few peaks of 100K (when text needs to be rendered, e.g. when a power pellet is picked up and the text corresponding to the next weapon in the power bar needs to be rendered). But those are one frame in a thousand, and are barely noticeable! The actual agenda used is encoded as code and not as an explicit table, but in games where space is not an issue, an actual table might be a better implementation. You can see some notes on the actual agenda for procedural generation operations in the "update_scroll_and_pcg" function comments in this file.

I thought this is a pretty general concept that could be useful in many many games, and I am definitively going to be using it in the future!

Code Compression

Even with all my efforts to fit the game within the 48KB of the ROM, it became clear it was not going to fit... So, I even considered having only 3 level types instead of the 4 I had initially planned... I even combined two different compressors (pletter and aplib), and compressed every piece of compressed data with both to see which of them compressed each file the best! At the end both of them are included in the game, as it turned out that I could save space by including both compressors, and choosing the best compressor for each piece of data than just picking either of the two! For example, interestingly for the level generation patterns, the best turned out to be to compress each pattern with pletter, then concatenate the compressed patterns, and compress the result with aplib. This was better than pletter-pletter, aplib-pletter, or aplib-aplib... In any case, I was desperate for space, so, I decided to try to compress the part of the game that was using the most space: code!

The code of the game uses about 27KB, which is way too much. That's more than half of the ROM! Compressing code is not uncommon in platforms like Spectrum/Amstrad where you can count on having a lot of RAM for decompressing the code to. For example, the Operation Alexandra game went all the way and THE WHOLE game is compressed in the tape, and decompressed right after the intro scene plays (overwriting it). But in MSX with only 16KB of RAM that's a bit tricky... So, I ended up doing this:
  • During gameplay it is when I am more RAM constrained, as the scrolling engine needs some fairly large buffers. So, the in-game code had to be in ROM.
  • An exception of this is while in boss-fights, as there is no more content generation going on, so, I don't need the content generation buffers.
  • So, I chose a set of "core" routines (in-game code, music, graphics, etc.) and that went to the ROM: this was about 16KB-17KB.
  • All the code for the intro/menu/star chart/password/weapon configuration menus is compressed in a single chunk. From 6291 bytes, this went down to 4232 bytes
  • Each of the 4 bosses code is also compressed, and only decompressed right before the boss fight (you might have noticed a slight pause when the boss is spawned): the code for each of the 4 bosses went down from: 756 / 1003 / 1476 / 749 bytes to 593 / 712 / 949 / 548 bytes!
All in all there are 10275 bytes of compressed code that compress down to 7034 bytes, saving a very nice chunk of 3241 bytes! I was very excited by this!

The one tricky thing is how to handle "calls/jumps" between the main part of the code and the compressed parts. So, what I did is this. The compressed pieces of code are always decompressed to the same address in RAM. So, I know where do they go. I inserted "hooks"/"entry points" in the compressed code so that I knew which addresses to jump/call. For example, the menus compressed code looks like this:
    include "constants.asm"
    include "autogenerated/text-constants.asm"
    include "../triton.sym"
keyboard_line_clicks:       equ keyboard_line_state+1

    org NON_GAME_COMPRESSED_CODE_START

    ; entry points:
    jp state_braingames_screen
    jp state_mission_screen_from_game_complete
    jp state_mission_screen_from_game_failed
    jp state_gameover_screen
    jp generate_minimap
    jp enable_nearby_minimap_paths
    jp get_minimap_pointer
    jp state_mission_boss_under_cursor

    include "state-story.asm"
    include "pcg-minimap.asm"
    include "state-title.asm"
    include "state-ending.asm"
    include "state-mission.asm"
    include "gfx-bitmap.asm"
    include "state-weapons.asm"
    include "state-gameover.asm"
    include "state-braingames.asm"
    include "state-password.asm"

Notice two things:

  • See that "include "../triton.sym"" statement? So, I compile the main part of the code. Such compilation generates a symbol table (triton.sym), which is then included by the compressed code, so the compressed code can call any function from the main part of the code. I then compile the compressed code, compress it with pletter/aplib, choose the one that best compresses it, and add it to the ROM.
  • The second part is that list of "jp ..." right before the list of includes that have the actual code. Those are the "hooks". Notice that the symbols present in the compressed code are not visible to the main code. So, I need a way to call those functions! So, if from the main part of the code, I want to call, say, the "generate_minimap" function, I just need to call it like this: "call NON_GAME_COMPRESSED_CODE_START+3*4", as each jump instruction uses 3 bytes. Of course, you I can define constants for these calls if I wanted, to make the code look prettier. 
Compiling the project then involves the following steps:
  • Create placeholder (empty) files corresponding to the compressed code (e.g., empty files with the right name, so that the assembling process works)
  • Compile the ROM (this just compiles the main part of the code, and includes the placeholder files instead of he actual compressed code). This also generates triton.sym.
  • Compile the compressed code (using triton.sym), and compress it, overwriting the placeholders.
  • Re-compile the ROM again, this time with the proper compressed code.
So, I need to compile the ROM twice every time I do a change, but that just takes a couple of seconds, so it's ok!

Finally, after submitting to the MSXDev competition, I also tried optimizing the code using my MDL assembler code optimizer, and that saved about 200 extra bytes. Thanks to those bytes I could add the new password saving system (with those 200 bytes, plus about 800 bytes of free space that were already in the ROM). In the final version of the game I only have 232 free bytes in RAM, and 0 free bytes in the ROM. MDL still suggests changes to save an additional 150 bytes or so when ignoring speed, but most of those are jp -> jr changes that would make some of the core routines slower, so, I don't want to make those changes unless I have no other choice! I have done just the minimum necessary MDL-suggested space optimizations to fit the password code (and the last few were really painful to accept hehe).

Conclusion

Anyway, that's it! So, as I mentioned above, I don't think there is any groundbreaking techniques used in the game, but there are lots of interesting (at least to me) little details here and there. I think the text rendering and the agendas are the part that I am happier about in the game, as those really made a difference! I'll definitively be reusing those in the future!

3 comments:

  1. Wow Santi, I liked a lot reading this! I'm asking myself why am I so lazy and didn't still write nothing about Freedom Fighter. You deserved victory for sure! Great game and great coder! thegeps

    ReplyDelete
  2. Great stuff! Super interesting and what we like! I have been thinking about doing something similar after I finish the current “noname”-platformer. I’m sure some of the ways/decisions made there, is of help to others. Reason is, giving away or having a look at the source code doesn’t really help much, as things are complicated. This is a way to give back to the community, and help improving the quality of the products on the msx scene!

    ReplyDelete