Friday, September 22, 2017

Raycasting in Tales of Popolon (MSX)

Today I wanted to talk about how raycasting works in my game Tales of Popolon, which I'll refer to as ToP for short. If you have never seen ToP, think of it as if you were to take the classic Wolfenstein 3D, but with a Greek theme, and running on an 8bit computer rather than on a PC. You can see a video of it here:

I am particularly proud of this game, since I put a lot of effort into making the raycasting engine run fast enough to run in the slow MSX CPU, and also using the MSX VDP (the graphics chip), that was definitively NOT designed for being used this way (the VDP better suits games with 2D graphics made out of 8x8 tiles).

Before I go on, I just wanted to take this chance to thank everyone that helped me complete this project. Starting from everyone that provided feedback, suggestions, and even help with the MSX2/MSX2+ and Turbo R support! And also special thanks to Matra for the fantastic physical edition of the game! (which you can find here).

Also, a small warning: this post is going to be quite technical, and I'm going to get into a lot of details of assembler programming. So, you are warned! :)

Ok, so, let's get right on to raycasting! Let me start by briefly explaining how raytracing works!

Raycasting Basics

The key idea behind raycasting is the following. Imagine that we stand in the middle of our living room and hold an empty picture frame in front of us (at arms length). Imagine now that we put a grid on top of the picture frame, so that the empty imaginary canvas is divided into a rectangular grid (each of the cells in this grid is one of the pixels in the screen). Now, if we want to know which color to paint each of the cells of this grid, we just need to "cast a ray" from the coordinates or our eye (assume we have a single eye, for simplicity) in the direction of each cell. The first object in our living room that such ray collides with after passing through the picture frame will determine the color of that cell.

So, to render a game using ray casting, we basically have to "cast a ray" for each pixel in the screen, calculate which object in the game gets "hit" by the ray, and paint the corresponding pixel of the color of that object.

Since that might be hard to visualize, let me show it with a small diagram (perspective might be totally off, since I drew it by hand in OminGraffle, hahaha):

Of course, if we want to do all of those calculations for a complex 3D scene, and expect the game to run at a decent frame rate, we would need a much more powerful computer than an MSX1. So, we will have to do lots of simplification assumptions. So, let me explain my simplifications in more detail (which are very similar to those used in Wolfenstein 3D, although not exactly the same).

Representing the Level Map

Simplification 1: the first simplification is how to represent the map. Rather than have 3D geometry, the map is represented as a 2 dimensional grid. Specifically, this is how I represent the very first level of ToP:

db 2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1
db 2,2,2,2,0,0,2,0,0,0,0,0,0,0,0,1
db 2,2,2,2,0,0,2,1,1,0,1,1,1,1,0,1
db 6,0,0,0,0,0,5,1,1,0,0,1,0,1,0,1
db 2,2,2,2,0,0,2,1,1,1,0,1,0,0,0,1
db 1,1,2,2,0,0,2,1,1,1,0,1,1,1,1,1
db 1,1,1,2,2,3,2,1,1,1,0,1,1,1,0,1
db 1,0,1,1,1,0,1,1,0,0,0,0,0,6,0,1
db 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1
db 1,8,1,1,1,0,0,0,0,0,0,0,0,0,0,1
db 1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1
db 1,0,0,0,0,0,0,1,1,1,1,2,2,2,2,2
db 1,1,1,0,1,1,1,1,0,0,0,0,0,2,2,2
db 1,1,1,0,1,0,0,0,0,0,0,0,0,3,0,5
db 1,1,1,0,0,0,1,1,0,0,0,0,0,2,0,2
db 1,1,1,1,1,1,1,1,1,1,1,2,2,2,4,2

As you can see, it is a 16x16 grid of cells. Where each cell is stored as 1 byte (all the levels of ToP are exactly the same size: 16x16 = 256 bytes). This is done so that I can represent map coordinates at cell resolution in a single byte, with the 4 most significant bytes being the Z coordinate, and the 4 least significant bytes being the X coordinate. (note: I am using Y to represent "up", so, the map grid is in X, Z coordinates)

The number in each cell represents what is there in that position of the map: 0 means corridor, 1 is a red wall, 2 is a blue wall, 3 is a door, 4 is a staircase, 5 is a stone face, 6 is a gate, etc.

ToP uses coordinates at two levels of resolution:

  • Cell coordinates: where 0, 0 is the top-left of the map, and 15, 15 the bottom-right of the map represent the coordinates of the map at increments of one "cell". Walls are placed at this resolution.
  • Pixel coordinates: each "cell" is 16x16 pixels in size, so maps are 256x256 "pixels" in size. The coordinates of the player, enemies, items, bullets, etc. are at this resolution. 
So, to get the cell coordinates of an enemy at position (x, z), we just need to do (x/16, z/16). 

Also, as you can see, walls in ToP always occupy the whole height of a level, from floor to ceiling. The engine does not support slopes, half-height walls, etc. That is the first and most important simplification (half height walls are actually trivial to add, but I ended up removing them, since they didn't look great, and didn't add anything to the game!).

To make this post an acceptable length, I will only focus on rendering the background (walls, ceiling and floor) or a level for now, ignoring the items, enemies, etc.

Casting Rays

Ok, so, the very first thing we need to do is to be able to cast the ray corresponding to a given pixel. Let us make the following assumptions:

  • We have a camera in the game world at coordiantes (cx, cy, cz). 
  • The camera is looking in a direction (fx, fy , fz). (the forward vector)
  • Now, we assume that the "viewport" would be placed in world coordinates at a fixed distance D in front of the camera. So, the coordinates of the pixel at the very center of the viewport are (cx, cy, cz) + D*(fx, fy, fz). Also, we know that (fx, fy, fz) is the normal vector to the viewport (which is now completely defined).
Now, what we need to do is to calculate the coordinates of each of the pixels in the "viewport" in game-world coordinates, and with that we can define the ray that starts in the camera and goes through each one of them. But this can start to get complicated already (not math-wise, but computation-wise, since the MSX will struggle with trigonometry, multiplications, etc.). So, let's start making simplification assumptions:

Simplification 2: the camera will ALWAYS be at half-distance between the floor and the ceiling. So cy = "height of the walls"/2. Also, this means that all the pixels in the top-half of the screen correspond to rays that "go up" (and will eventually collide with the ceiling), and all the pixels in the bottom-half of the screen "go down" (and will eventually collide with the floor). 

Simplification 3: All the pixels in the very center of the screen, for which the rays go forward at very far distances without colliding with ceiling/floor will be ignored! So, there is a horizontal band of pixels in the middle of the screen that are treated specially.

Simplification 4: the camera can never loop up or down, always forward. This means that fy = 0.

With those simplifications, the process is already much simpler than the general case of raycasting! So, considering just the top half of the screen, to determine the color of a pixel P, what we need to do is:
  1. Calculate the coordinates of pixel P in game-world coordinates: px, py, pz
  2. Calculate the game-world coordinates at which the ray that passes through px, py, pz collides with the ceiling (i.e., when y = "height of the walls"): (lx, ly, lz)
  3. See what object is in the map at cell coordinates (lx/16, lz/16), and use the color of that object to draw that pixel.
And basically that's it!.... except for a small problem with Step 3... Notice that step 3 assumes that the position where the ray collides with the ceiling contains the object we want to draw, but that is not always the case, for example, look at the following situations (side view):

  • ray 1 will collide at a position where there is no wall (so, it's a corridor, identified with a "0" in the example map I showed above), and thus will render with "corridor" color
  • ray 2 will collide at a position where there is a blue wall (number "2" in the example map), and thus will render blue color.
  • But ray 3, which should also be a wall, collides with the ceiling at a position where there is no wall, so, it will be rendered the "corridor" color, which is wrong.

To solve this problem, there are several solutions. But the one I chose for ToP was this:
  1. For each vertical column of pixels in the screen (remember, we are only considering the top half of the screen for now), start with the top pixel:
  2. If the current ray collided with a corridor, draw corridor color, move on to the next pixel and repeat from step 2 (unless we are done with this column, in which case, we terminate).
  3. If the current ray collided with a wall, then draw wall color for this pixel and for all the remaining pixels of this vertical column, then terminate.

In this way, as soon as "ray 2" collided with a position where there was a wall, "ray 3" would have not even been considered, since all the remaining pixels would be drawn as wall.

Now, as a consequence of Simplifications 2 and 4, once we have rendered the top half of the screen (and if we ignore the fact that there are textures), the bottom half of the screen is just a mirror image of the top, and we don't even need to cast rays! So, if we want to do raycasting without textures, we would be done at this point.

I will explain below how to calculate the "collision points" in an efficient way, but for now, let's move on... Suffices to say that we want the collision coordinates to be at the "pixel coordinates" resolution level (so that we can properly render textures).

Rendering Textures

Rendering a texture basically means calculating which of the pixels of the texture needs to be copied to each pixel in the screen. In ToP, in order to make this calculation fast enough, I used textures that are 16x16 pixels in size. So, for example, this texture representing one of those stone faces that talks to you in the game

is stored like this:

    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,#f1,#f1,#f1,#f1,1,1,1,1,0,0,0,0
    db 0,0,0,#f1,#f1,1,1,1,1,1,1,1,1,0,0,0
    db 0,0,#f1,1,1,1,1,1,1,1,1,1,1,1,0,0
    db 0,0,#f1,1,1,1,1,1,1,1,1,1,1,1,0,0
    db 0,0,#f1,0,0,0,0,1,1,0,0,0,0,1,0,0
    db 0,0,1,1,0,#81,0,1,1,0,#81,0,1,1,0,0
    db 0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0
    db 0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0
    db 0,0,0,1,1,1,0,1,1,0,1,1,1,0,0,0
    db 0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0
    db 0,0,0,1,1,0,0,0,0,0,0,1,1,0,0,0
    db 0,0,0,1,0,1,1,1,1,1,1,0,1,0,0,0
    db 0,0,0,1,1,1,1,0,0,1,1,1,1,0,0,0
    db 0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

The pixels that have their 4 least significant bits to 0, are painted "black", those that have value "1" are painted of the "default texture color" (in this case grey), and those that have their 4 most significant bits different form 0 have their own color. I added multicolored textures later in the project, so this representation was just a hack, since I was running out of space in the ROM. But if I re-implement this again, I would probably just have the color directly (without having any "default color" thinguie).

So, when an object (wall, ceiling, floor) needs to be rendered, what needs to be done is to calculate the "offset" in the texture that needs to be rendered, i.e., the coordinates (tx, ty) of the pixel of the texture that needs to chosen.

Ceiling/Floor: Given that we have the collision point (lx, ly, lz) of each ray in "pixel coordinates", this is as simple as: tx = lx%16, and ty = lz%16.

Walls: Walls are more complicated, and "tx" and "ty" are each calculated via different methods:
  • tx: the "cell coordinates" of the ray collision point (lx, ly, lz) are (lx/16, lz/16), and we know that there is a wall in that position (otherwise, we would be rendering a ceiling/floor texture). But what we need to know is whether the ray entered in (lx/16, lz/16) from the north, east, south or west, and what is the "pixel coordinate" at which the ray exactly entered this cell. That entrance point is exactly what will determine "lx". So, if the entrance point is (ex, ez) in pixel coordinates, and the ray entered through the north or south, then tx = ex. If the ray entered through east/west, then tx = ez. To resolve this, ToP does this: we know that the previous screen pixel we rendered was showing either ceiling or floor (since if it were showing wall, this ray would have never been casted!). Let us call (lx0, ly0, lz0) to the ray collision point of the previous pixel. So, what we do is we use a Bresenham algorithm to move, pixel by pixel from (lx0, lz0) to (lx, lz) until we cross into the cell where the wall is, which is exactly the point where the ray entered the wall. In case that this is the very first pixel in the current column, then (lx0, lz0) = (cx, cz). 
  • ty: for this, what ToP does is this: let "n" be the screen y coordinate of the first pixel in the current column that is a wall (all the previous pixels were ceiling). If the vertical screen resolution is "m", then we know that the wall will be "m - (n*2)" pixels tall when rendered. So, ty is determined by starting with ty = 0 in the first pixel in this column that is a wall, and then advancing at increments of "16 / (m - (m*2))" pixels at each consecutive pixel in the column.

Comparing this to Wolfenstein 3D: one detail is that Wolfenstein 3D did NOT render floor/ceiling textures, but ToP does. So, in a sense ToP's engine is a little bit more advanced than Wolfenstein 3D's one :) (in other sense, it's more limited, of course, such as in rendering partially occluded sprites, for which ToP uses hardware sprites, which cannot be partially occluded easily)

Also, I noticed that this way of rendering textures would allow animations for free, since I just need to associate a different texture to a given object in different frames, and it'll look like an animation is playing. So, I added one wall type that has a small animation. I wanted to add more animations, but I ran out of ROM space... :( (notice that animations do not slow down the game at all, every wall could be displaying an animation at the same time, and it would not affect rendering time).

So, in summary, the rendering algorithm is as follows:

for(x = 0;x < screen_width; x++) {
  (lx0, lz0) = (cx, cz)
  for(y = 0;y < screen_height; y++) {
     (lx, ly, lz) = collisionCoordinatesOfRayForPixel(x, y);
     object = map[lx/16][lz/16];
     if (object == CORRIDOR) {
       putPixel(x, y, corridorTexture[lx%16][lz%16]);
         (lx0, lz0) = (lx, lz)
     } else {
       (rx,rz, direction) = crossOverPoint(lx0, lz0, lx, lz);
       if (direction == NORTH || direction == SOUTH) tx = rx%16;
                                                else tx = ry%16;
       ty = 0;
       for(y2 = y; y2 < screen_height; y2++) {
         putPixel(x, y2, objectTexture[object][tx][ty]);
         ty += 16 / (SCREEN_WIDTH - y*2);
       break;  // we are done with the column

Doesn't look that complicated, right? Well, it's actually not! the complex part is to make it run fast. So, let's get on to it! How do we implement this on an MSX?

Organizing the VDP Memory for Raycasting

As you can see in the pseudo code above, pixels are rendered column by column. So, it is important that pixels that are one on top of another are contiguous in memory, so that if we have the pointer of the current pixel in say, HL, we can just do "inc hl", to move to the next pixel. So, since I used SCREEN 2 for the game, I organized the name table like this:

Where you can see that I use the top two banks for rendering the game, and the bottom one for the score board. So, the pattern and color table of banks 1 and 2 is set to all 0s at the beginning of the game, and the regular patterns (the txt font, etc.) are only kept in bank 3.

Since I don't use the whole screen to render the game, I left some black borders around (marked with pattern "#ff"). Then you see that the first pattern that is used for rendering is #00, then immediately below that is #01, etc.

Simplification 5: I reduced the resolution of the game, and each rendered pixel is actually a 2x2 pixel. This divides by 4 the number of rays I need to cast, making the game run 4 times faster. I will call each of the 2x2 larger pixels a "logical pixel" from now on, to avoid confusion.

So, each 8x8 pattern contains a block of 4x4 logical "pixels". So, the first four "logical pixel columns" (0, 1, 2, 3) are stored in patterns #00, #01, #02, #03, #04, #05, #06 and #07. The next four columns (4, 5, 6, 7) are stored in patterns #08, #09, #0A, #0B, #0C, #0D and #0F, etc.

I keep an offscreen buffer in RAM that contains both the pattern and the color table, which is what I render to. Once a complete game frame has been rendered to this buffer. I copy its contents over to the VDP (first pattern table, and then color table). Since there is a lot of info to copy, you might have noticed that the colors in the game some times flicker a bit (that's because it takes more than one screen refresh cycle to copy all the information over to the VDP!). I considered interlacing the copying, but, even if it made it look better, it made the game a bit slower too, and used more space in the ROM, so, I decided against it.

In order to render, the process is thus like this:
  1. Clear RAM buffer: the part corresponding to the pattern buffer to all 0s, and the part corresponding to the color buffer to the color of the ceiling or floor (in that way the color is already set, and I only need to change it when I render a wall). The background color for all the patterns is always kept to black, and I only change the foreground color.
  2. For each column of pixels in the screen:
    1. Update the pattern buffer: for this, at the beginning of each column, I calculate a "mask", which is #C0 for the first column, #30 for the second, #0C for the third, and #03 for the fourth (after that, the cycle repeats).  To draw a pixel I OR this mask with the current pattern if I want to set the pixel as "on", and if I don't need to draw a pixel (when the texture has value 0), then I just skip this pixel.
    2. Update the color buffer: since the color can only be changed for blocks of 8x1 pixels, this means that each block of "4x1 logical pixels" (remember that the "logical pixels" in ToP are blocks of 2x2 pixels) have the same color. So, there can be some color clash. What I do is simply, each time a pixel is drawn (when I OR the mask for the pattern buffer), I set the color in the color buffer to the pixel color (only when rendering walls, when rendering floor/ceiling, the color buffer does not need to be even touched).
  3. Once all the columns have been rendered, copy the RAM buffer over to the VDP.

I considered not clearing the RAM buffers, and just overwrite. But it turns out that the extra "AND" operation required to "erase a pixel", and having to write colors each time was slower. So, clearing the RAM buffer was the fastest solution.

The Game Loop

Rendering a whole game frame takes very long (I can only render between 2 to 3 frames per second, depending on the viewport size selected). So, characters and enemies would move too slow for making the game playable. The solution I used was to divide the game loop into "sub frames". Each "game frame update" is actually divided into four "sub-frame updates" like this:
  • Subframe 1: Update player character/enemies/bullets/etc. Clear RAM buffer, and render 1/4th of the columns.
  • Subframe 2: Update player character/enemies/bullets/etc. Render 1/4th of the columns.
  • Subframe 3: Update player character/enemies/bullets/etc. Render 1/4th of the columns.
  • Subframe 4: Update player character/enemies/bullets/etc. Render 1/4th of the columns, copy the RAM buffer to the VDP.
In this way, the game executes 4 updates for each complete frame rendered using raycasting, which feels more fluid.  You can seethe effect of this in that enemies/bullets move smoothly, even if the environment is rendered much slower.

Note: as pointed out by NYYRIKKI in the forums, a better strategy would have been to run the game update on the screen refresh interrupt, and let the renderer run loose in the background. In that way, I would not need any speed adjustments when running on faster machines (turbo R). So: noted for the future, in case I reimplement this again! :)

Precalculated Tables

The computations required for raycasting require many multiplications, divisions and calls to trigonometry functions, all of which are very slow on MSX. So, in order to make the renderer render more than 1 frame every 10 seconds, I used a bunch of pre-calculated tables. Specifically, I used these tables:

  • Raycasting collisions: this is all precalculated. Notice that since I do not allow the camera to rotate up and down. The rays for each of the pixels in the screen will always be the exact same ones. So, what is the point of calculating this every time?! What I have is two precalculated tables (ray_x_offs_table, ray_y_offs_table) of size 512 x 32, which contain this: 
    • ray_x_offs_table[a][b]: the distance in the "x axis" at which a ray casted at an angle "a" around the y axis and "b" around the x axis will collide with the ceiling. "a" is precalculated at a resolution such that "512" steps is a full revolution. When rendering adjacent columns, we just advance one step in this dimension. So, since the default rendering viewport is 80 logical pixels wide, that corresponds to a viewing angle of about 56 degrees (which is close to the usual 60 degrees). The vertical resolution is 32 steps, since the viewport is 64 logical pixels tall, which means that the top half is 32 pixels tall. So, [b] is basically indexed by the pixel row we are currently rendering. 
    • ray_y_offs_table[a][b]: is the same but in the "z axis".
The camera angle is kept in the resolution of "128 degrees is one full revolution". So, when I want to cast a ray for logical pixel "i, j", I check ray_x_offs_table[angle*4 + i][j], and to the result of that I just have to add the camera coordinates, and that gives me directly the ray collision coordinates. So, no need to do any complex math during raycasting!
  • Texture offset increments: if you check the algorithm above, one step says "ty += 16 / (SCREEN_WIDTH - y*2);", this is problematic since it has one division. I have a precalculated table called "texture_vertical_rate_table" that stores the value of "16 / (SCREEN_WIDTH - y*2)" indexed by "y". So, one more division saved!
  • Sin/Cos tables: for moving the player / bullets / enemies around, I need to make them move depending on the angle they are facing. So, I need to calculate sine and cosine. This is all precalculated in four tables: sin_table, cos_tablesin_table_x12, cos_table_x12. The last two, since there are things that move faster, and thus, it's convenient to have it precalculated with the multiplication built-in already.
  • Distances: this has to do with rendering enemies/bullets/items. When I need to render an enemy that I know is at a certain distance, I need to know where in the screen to render it. That would also have involved divisions. So, I have a precalculated table that tells me for an object that is at distance dx, dz (in the X and Z axis respectively), at which y coordinate do I need to render it in the screen.
  • atan2: this is a routine I got from a post that ARTRAG did in the forums, that also has some precalculated tables (see original post here). I use it to calculate the X coordinate where to render enemies / items in the screen.
You would think that after all of these simplifications, this would already run fast on an MSX, right?! Well, think again!!! After I implemented all of the above it took 5439255 cycles in average to render a single frame... if we recall that the Z80 in the MSX executes 3579545 cycles per second, that comes to about 0.66 frames per second... it was unplayable...

At this point I had spent quite some time working on the engine and I was about to just give up. "The MSX is too slow for this" I thought. But I started looking at videos of other raycasting games for the MSX and Spectrum. Other people had managed to do it! so, there had to be a way! The next few weeks were 100% dedicated to optimization of the whole thing!

Speed Optimization Techniques

There are lots of websites with a very large number of Z80 optimization tricks. I read all I could find, and every single little trick that I could apply to my code, I applied it! The one that I'm sad I did not find a place to apply it is this one (which blew my mind!): So, I'm not going to go into detail of every single little thing, but just focus on those techniques that really made a big difference.

Moving if-then-else statements outside of inner loops: the first optimization (that saved about 400k or 500k cycles if I recall correctly) was the realization that every time I had a piece of code that looked like this:

while(some condition) {
  if (condition that does not change inside of the loop) {
    do X
  } else {
    do Y

I should replace it with:

if (condition that does not change inside of the loop) {
  while(some condition) {
    do X
} else {
  while(some condition) {
    do Y

This increased the size of the code significantly, but made it much faster. If you look at the raycasting code, you will see that I did this, for example, for handling the four quadrants. The code for managing raycasting in each of the four quadrants (from 0 to 90 degrees, 90 to 180, etc.) around the Y axis is identical except for a couple of places where I need to either add or subtract. By taking the "if we are in quadrant 1" conditions outside, I had to replicate the loop 4 times, but each loop ran much faster.

Aligned Tables: imagine that you have a table in a memory position X and you need to index it by the content of the accumulator register A, what you would have to do is something like this:

ld hl,X
ld b,0
ld c,a
add hl,bc
ld a,(hl)

However, if you know that X is a multiple of 256, then you know that when you do "ld hl,X", "l" will be equal to 0. So, you can actually just do:

ld h,X/256
ld l,a
ld a,(hl)

Voilà! much faster! This is a very common trick, and not anything I invented, but it saved another few hundred K cycles as well! I tried to use aligned tables for everything I could: textures, maps, etc.

Maximizing register use: but by far the thing that saved more CPU cycles was removing as much as I could any access to memory. The Z80 might only have a few registers: AF, BC, DE, HL, IX and IY. But a few of those (BC, DE, HL) are actually replicated! Using the assembler instruction EXX we swap from BC, DE, HL to the "ghost" copies of those registers. So, in fact we have AF, BC1, BC2, DE1, DE2, HL1, HL2, IX and IY, which is a TON of registers! Also, the IX and IY registers can also be used as pairs 8 bit registers (IXL, IXH, IYL, IYH), they are slower than the others, but still faster than RAM!

One of the most time consuming optimization processes I undertook was to track the usage of ALL registers (including the ghost BC, DE, HL) throughout the whole raycasting code. So, by making extensive use of EXX, I managed to remove all usages of variables in RAM in all inner loops of the raycasting code. For example, when rendering a texture, I keep the pointers to the texture, etc. in the regular BC, DE, HL, and the pointers to the raycasting pattern and color RAM buffer in the ghost copies. Sounds simple, but it was a pain to track which registers were being used at all times... I think all of this saved at least 1 million cycles or so.

Top Half of the Screen is Analogous to the Bottom: when rendering the bottom half of a column in the screen, there is no point on repeating calculations, since it is almost a mirror of the top half. When rendering the ceiling texture on the top half, I store all the texture data in a 32 byte buffer (which, of course, is 256-aligned), and when rendering the bottom half, I just retrieve it from that buffer (raycast_floor_texture_buffer).

Floor/Ceiling Texture Optimizations: in the first version, I allowed arbitrary textures for the ceiling/floor. But it turns out that rendering the ceiling/floor was the most time consuming part of the whole renderer! (no wonder they did not include it in Wolfenstein 3D!!!). So, what I ended up doing is to remove the floor/ceiling textures, and replace them by hard-coded code, that given the (tx, ty) offset coordinates calculates whether the pixel should be painted or not. For example, the floor texture is generated by painting those pixels for which (tx+ty) & 0x80 != 0

Thanks to such optimization, the code that renders the floor, is as simple as this!

    ;; retrieve the texture stored during ceiling rendering (bc points to "raycast_floor_texture_buffer"):
    ld a,(bc)
    or a
    jp z,raycast_render_done_with_floor_pixel

    ;; texture is 1, render pixel:
    ld a,(hl)   ;; hl points to the raycasting pattern RAM buffer
    or d        ;; d has the "mask" corresponding to this column
    ld (hl),a   ;; render pixel
    inc hl
    ld (hl),a   ;; we render it twice, since logical pixels are 2x2
    inc hl
    dec c       ;; c is the number of pixels left to render in this column
    jp nz,raycast_render_next_pixel_floor
    jp raycast_render_done_with_column

    inc hl
    inc hl
    dec c
    jp nz,raycast_render_next_pixel_floor

This saved another very significant number of CPU cycles!

Saving Space in the ROM: finally at the end I was running out of memory, so, in addition to compressing everything with Pletter, etc. I had to reduce the space taken by all the precalculated tables. For that reason, if you look at the code, you'll see that all the sine / cosine tables are overlapped (since the sine / cosine functions are the same function, just one is shifted from the other). And also, I don't really have a ray_y_offs_table, since it's exactly the same table as ray_x_offs_table but reversed. So, I only store one of them and either add (for X) or subtract (for Y) to index it.

After all of these optimizations, at the default viewport size, the game takes 1550184 cycles in average to render a whole frame, less than 1/3 of the original time! and making it to about 2.3 frames per second! If you reduce the viewport size to the smallest size then it takes about 1053651 cycles in average, getting about 3.5 frames per second, which, compared to the original 0.66 is ridiculously smooth!!!


So, those are the basics. There are a bunch of things that I have not explained, like how I am rendering the enemies / items, or how does occlusion testing works (when an enemy is behind a wall, it should not be rendered, and occlusion testing is expensive CPU-wise), but this entry is already getting too long. I also implemented a fancy sprite pattern loading code that dynamically loads and unloads sprites onto the VDP based on what is being rendered that I thought was very cool. But I think I'll leave it here, since the basics are covered!

One of the main obstacles to making it faster is basically the VDP. Not having direct access to change bytes of the video memory is what forces me to have to have the RAM buffer, and then a copy, etc. I think that the same type of game could perhaps run a bit faster on computers such as the Spectrum or the Amstrad CPC, where the video memory is memory mapped. There would be some drawbacks, of course, such as color clash in Spectrum, which might force me to forget about coloring textures at a pixel-level, and force me to use texture-wide colors. Also, in Spectrum or Amstrad I would not be able to use hardware sprites, so, rendering enemies/items would be harder, since it cannot be done independently of the background. Another obstacle is memory. I think that with a bit more RAM I could have added more pre-calculated tables, but this was not an option for the MSX1...

I also think that things could be optimized a bit more on an MSX2 since some of the additional screen modes supported by the MSX2 would not require having to OR a mask to render pixels (right now I need to READ a pixel, OR the mask, and the WRITE it, requiring two memory accesses per pixel rendered! which is not ideal...). 

Ok, so that's it. That's how the raycasting-side of ToP works! If you made it all the way here, congrats!!! Since this one was a long one!! hahaha

Wednesday, September 20, 2017

Procedural Content Generation in XSPelunker (MSX)

A few weeks ago, I finished my second entry for the MSXDev'17 competition, XSpelunker, a roguelike platformer for MSX heavily inspired by the amazing Spelunky, and with some touches from Opera Soft's "Livingstone Supongo".  For those of you who are not familiar with the MSX, it is a family of compatible 8bit computers released in the early 80s. The target computer for XSpelunker is an MSX1 with 16KB of RAM, and an 8bit Z80 CPU at 3.56MHz. Screen resolution is 256x192 pixels, with a color palette of 16 colors. So, do not expect to see Call of Duty here! :)

If you have not seen the game, here's a short video showing the intro and some gameplay:

One of the main features of XSpelunker is that levels are generated using Procedural Content Generation (PCG). So, each time you play a new game, levels are different from the last time you played, and you need to find your way to the exit again. In this blogpost I just wanted to explain how did the PCG algorithm of XSpelunker works, now that it's fresh in my mind! :)

(spoiler alert, it works VERY similarly to how the original Spelunky generates levels)

The key idea is to use a set of "preauthored chunks" and then recombine them randomly to form new levels (this is also how Spelunky generates levels). For example, here are some of the chunks used for the jungle levels of the game:

So, for each type of levels I wanted to have in the game, I just had to author a set of chunks. One problem of combining them randomly, however, is that it might result in levels that contain no valid path from the beginning to the exit of the level! In order to address that, we need to introduce two new concepts: "chunk types" and the "level path".

Chunk Types

In order to let the game generate levels that ensure that there is a path from start to finish of the level, I labeled each chunk with a "type". This type determines things like:
  • Special chunks: originally I had all chunks be 16x16 tiles in size. However, some puzzles I wanted to create required a larger space than that. So, I have some chunks that are 32x16 and others that are 16x16. 32x16 are SPECIAL chunks that are used differently than REGULAR chunks, which are 16x16.
  • Where in the level can this chunk be placed (for example, chunks like the left-hand-side one above only make sense if they are at the bottom part of a level because of the water part). So, chunks can be labeled as: TOP, BOTTOM, or ANYWHERE
  • Layer: levels are generated by stacking two layers of chunks one on top of the other. One layer contains the "background" (the foliage that you see at the top of the jungle levels, or the candles you see in the ruins levels), and the "foreground" (walls, platforms, etc.). I have a few BACKGROUND chunks to create a background, and on top of that, I overlay FOREGROUND chunks that define the level proper.
  • Connections: some chunks allow the character to navigate form left to right, others only allow the character to go up, etc. So, I defined six types of chunks depending on the paths that they allow the character to take:
Also, chunks of a given type might allow more than the necessary paths, but at least they need to guarantee that these paths are supported.

So, for example, the three chunks shown above are (from left to right):

Level Path

Now that we have each chunk classified into a type, the game can now reason about which chunks to use if it wants the character to be able to follow a specific path. So, the first step in level generation is to generate a path. Let us assume that we want to generate a level of size 4x4 chunks. Generating a path involves the following steps:

Now, we know the type of chunks that we can use for each section of the level. Those sections without constraints (marked as "-"), can use any chunk, and those with constraints can only use chunks of the designed type.

Adding Chunks

Background: Once we have the path, the next step is to generate the background. The only constraints that are used are that chunks need to be BACKGROUND, and also TOP and BOTTOM chunks can only be at the top or bottom of the level respectively. After this, we will have the empty canvas of a level, that might look like this (you can see some foliage at the top already nicely formed):

Special Chunks: After that, notice that there might be lots of sections si the level that do not belong to the path (marked as "-" above). Those are sections where we can spawn SPECIAL chunks. So, the next step is to look to see if there are any two side-by-side "-" sections in the level. If there are, then there is a small probability of selecting a SPECIAL chunk to occupy that space.

SPECIAL chunks always contain one of the "kick-ass" items of the game (boots, shield, bow, mask, gun, or belt). Notice that since we are generating them off the path, there is no guarantee that there will actually be a path to get those items. So, if the player wants to get them, she will have to find her own way there!

Regular Chunks: After SPECIAL chunks, REGULAR chunks are put in the map. Every location where no SPECIAL chunk was placed needs to have a REGULAR chunk. The game uses the constraints defined by the path, and selects one chunk at random from all the chunks in the chunk library that satisfy the constraints. At this point the level will look something like this:

Notice that the level does not look very good yet. We see that some tree trunks are cut half way through for example! That will be fixed in the next step.

"Beautifying" Levels 

In this step, the game goes through the level and fixes some aesthetic aspects of the levels. For example, in the jungle, it completes all the tree trunks all the way to the top of the screen. In other sections of the game, there are other elements such as ropes or beams that need to be completed as well.

Another detail that is added in this step is a small marker for the exit of the level (the small yellow arrow).

After beautifying the previous level, it'll look like this:

Which starts looking better, but still looks quite empty, since there are no enemies, items, or anything else!

Spawning Enemies and Items

Some chunks are annotated with places that are good "item spawning" spots. So, at this point, any of those items are added to the map. For example, some chunks have a spot marked as SUPPLIES, which is translated to either a rope, a bomb, a stone or an arrow.

In addition to those, every horizontal surface of the map has a small chance to spawn a rock or an arrow.

For enemies, each enemy has a specialized routine that determines where is it a good spot for spawning it (for example, monkeys only get spawned in tree trunks, piranhas in water, bee nests hanging form a wall, etc.).

After enemies have been spawned, the game checks to see if a minimum number of enemies has been spawned (since the randomness of the process might result in levels with very few enemies). If not enough enemies were spawned, this step is restarted until enough enemies are generated.

Once the level is generated, it is ready for use in the game! A small routine goes over the map identifying those tiles that need an animation (water, fire, etc.) and adds the necessary data structures. After that the game starts!

The Different Areas of the Game

The above process explains how jungle levels get generated. To generate levels for the other two sections of the game, some steps are slightly modified in order to slowly increase the difficulty.

For example, in the "Ruins" levels (2-1, 2-2, 2-3 and 2-4), one of the chunks of TYPE 5 is actually missing a connection (on the top-right). So, whenever that chunk is spawned there is a chance that there is no direct path to the exit. So, the player will have to use ropes/bombs to create a path.

In the final levels (3-1, 3-2, 3-3, and 3-4) there is one final twist: in addition to having a chunk missing a connection, the bottom part of the levels gets filled with water. So, if the path happened to go through there, then the player will have to either go through water, or find a different path. This might result in some pretty hard levels. But it usually is not a problem, since, if the player makes it to levels 3-*, usually she is loaded with bombs/ropes or even might have the diving mask, with which the last levels are much easier.

The last level (3-4) in particular has a very particular shape (if you made it there, you'll know!), and unless you are prepared, can be VERY difficult. But, once you know what to expect, it's really not that hard. But I won't say more, since otherwise, I'll spoil the fun of figuring out how to beat the game! hahaha

So, that's pretty much it at a high level. That's how levels are generated (not much different than the way Spelunky does it!). So, no need to read more if you just wanted to have a high-level idea. If you want to know a bit more of the implementation details, then you should keep reading :)

MSX Implementation Details, Memory Usage, etc.

Let me start by explaining what is the usual tool chain that I use for creating MSX games. I created a diagram that I thought would explain it better, but after I drew it, I am now unsure if it helps. But in any case here it is:

So, the bottom line is:
- I draw graphics (e.g. tilesets) with GIMP
- I then load the tilesets with TILED to author level chunks, which get saved to TMX files (which are xml files).
- I then use a small Java script to translate these TMX files to the binary file I use in the game
- These binary files are then compressed using Pletter, and built into the ROM compiled with Glass.

Also, a few times I've used the OpenMSX Debugger, but I'm still not very familiar with it, so, I only use it when I'm desperate! But anyway, I would be really curious to hear which development tool chains other people use! :)

Level Chunks: All right, now that we have that out of the way, let's get back to PCG for XSpelunker... Consider the Jungle level generation process. For this, I defined exactly 26 chunks (you can see them in github in the data/rooms-jungle folder). When converted to assembler, a chunk looks like this:

So, when compiled into the ROM these 26 chunks would use a whopping 7272 bytes!

Remember that I wanted to fit it all in a 32KB ROM, so, if just chunks for the jungle use this much, if we add the chunks for the ruins, and those for the inner ruins, that'll be about 21KB just in chunks, leaving only 11KB for the rest of the game... that was not acceptable.

Compression: An obvious way to reduce space is to use some form of compression (e.g., what you'd do with a ZIP file in your modern computer). In my first game, naively I just implemented my own "compression" code (using a simple Run Length Encoding scheme, which is not technically even considered compression, but "encoding"). But folks at the forums recommended me to use a better compressor, such as Pletter. So, that's what I am using now.

I considered two separate options for compressing the chunks:
  1. Putting ALL the level chunks into a single binary file, and compressing it.
  2. Compressing each level chunk separately.
The advantage of the first is that I would save more space (since, it's well known that sizeof(compressed(A+B)) <= sizeof(compressed(A)) + sizeof(compressed(B))). The problem is that then to decompress it when the game runs, I would need to have 7272 bytes of free RAM, which I do not have...

So, I had to go with the second alternative, compressing each chunk separately. This uses more space in the ROM, but I do not need to decompress all the chunks at once, and thus worked better for XSpelunker.

(beginning of off-topic paragraph)
Finally, about compression, I just wanted to say that it is very easy for us in 2017 to use compression and pack a lot of features into a very small cartridge, compared to what the guys in the 80s did. But we must have in mind that compression algorithms are quite recent: the common LZW algorithm is from 1984, and DEFLATE (usually used in the ZIP format) is from the early 1990s... So, clearly game developers from the 80s did not have easy access to these technologies! There were compression algorithms dating back to the late 70s, but I am not sure that (given there was no Internet back then) it was easy for game developers to actually know about them. So, all of this is just to say that when we compre modern homebrew MSX games, I would not compare them to older games, since the comparison is unfair: we have way more resources nowadays than they had. So, creating an MSX game in 2017 is much easier than it was back in the 1980s...
(end of off-topic paragraph)

Chunk Simplification: However, even after compressing each chunk, that was still using too much space. The problem is that chunks contain complex patterns that do not compress very well. So, the solution was to simplify how chunks are stored. What I did was this: instead of authoring a chunk exactly as it would look in the game, I would just author a "skeleton of the chunk" that would be enough for then reconstructing the chunk in the game. I think it's better explained with an example.

Here you can see a chunk how I want it to look in the game (left), and how I ended up authoring it (right):

As you can see, what I have done is this:
  1. Since there is a piece of code that completes all the partial tree trunks anyway, I only need to add the bottom of a tree trunk. No need to have the rest.
  2. The solid rock part contains many different types of tiles (tiles with grass, edge, tiles, etc.). What I did was to just use a single type of rock tile, and then write a small assembler routine that would replace those by the proper tiles. The routine was smaller than the amount of space I saved by this, so it was worth it.
The result are chunks that are use less space after they are compressed! So, after all of this, the 26 chunks that originally were 7272 bytes, ended up occupying only 1453 bytes in the ROM. Which is more acceptable!

Level Progression: Once everything was setup, the level generation function takes in a set of parameters: size of a level, probability of appearance of each enemy type, which level chunks are available, minimum number of enemies, etc. So, in order to create a level progression, I defined the following table:

where, for each level, I specify its parameters. For example, you can see that level 1, is a small "64x32" level (i.e. 4x2 chunks), where only the enemy types 1, 3 and 5 can be spawned (pinecones, piranhas and scorpions), and where at least there has to be 4 enemies.  Level two is larger (64x64), but has the same enemy types, and in levels 3 and 4, more enemy types can be spawned.

In that way, adding a new level to the game requires actually only 17 extra bytes. So, I went for 12 levels just for not making the game repetitive. But adding more levels is not a problem at all!

So, anyway, there you go! That's how the procedural level generation in XSpelunker works! It's not rocket science, but it was fun to create it, finetune all the parameters, level chunks, etc. so that the difficulty level was what I was going for!

This was a long post, but I just wanted to write it now when it's still fresh in my mind, before I completely forgot about how I did it, hahaha

And btw, if you are interested in PCG in general, you could check the book that some good colleagues wrote a few years ago: or if you want to see experiments and cool things, head over to the website!

Saturday, September 16, 2017

Making MSX Games

For as far as I can remember, making games or computer games have been a very important part of my life. I learned how to code very early in my life. I think I was a bit over 10 years old when I started copying BASIC programs from the computer manual onto my first computer, a ZX Spectrum +2. Of course at that age I barely understood some of the key concepts, which only became clear later on, but I could write simple programs that I called "games". A year later (around 1988), we acquired an MSX computer (a Philips VG8020), and I became obsessed with programming. I remember my parents would convince me to go out of the house with the promise of buying me a "new programming book" (I own dozens, but a few that I scanned can be found here).

So, I programmed games on my MSX, then on the Amiga 500 (using a BASIC-like language called AMOS), and then on PC. I remember I took a C++ programming book to a summer camp when I was 15, which I almost completely read there, and I could not wait to get back home to start coding!

But in any case, anyone who knows me knows that if you ever asked me "what are you up to these days?", I would always answer something like "Oh, I am making this new game that...". I always was thinking of a new idea for a new game. But of course, since that was before the Internet, I've lost almost all of the early games I did. However, making games came to a stop around 2007 when work became too intense and I could not find time to dedicate to coding games...

About a year ago, I was so overwhelmed with work (trying to get tenure) that I needed a way out. So, I decided that I would dedicate at least a half an hour a day to code games again. Looking around I saw that the "new thing" in the retro-games community was to actually code games for old 8bit computers, even releasing them in physical format, with box, cover art, etc. as if we were back in the 80s! That sounded extremely appealing to me and I decided to learn how to code for one of my favorite retro computers, the MSX

Although I knew it had been there for many years, I finally became active in the MSX community once again (even after about 35 years after the MSX computer was released, the forums at are still super active, with new posts every day!). When thinking about which game I would make, the choice was pretty obvious, since one of the games that I have coded again and again is a little game inspired by the classic Thrust arcade called Transball. I coded the first version of Transball for MS-DOS in about 1995, and have remade it since many times (Transball 2, Super Transball 2, Transball GL). So, I set out to create Transball on the MSX!

The MSX computer was a very powerful machine back in the day, but compared to modern computers is extremely limited. It features a Zilog Z80 CPU at 3.58 MHz (which is basically an extension of an intel 8080), 16KB - 64KB RAM (the one I owned had 64KB), and a screen resolution of 256x192 pixels with 16 colors. Games were loaded either using cartridges or using tapes (I swear I must have had in the order of about 100 tapes with games back in the day!). When I was thinking about it, just the title image of Super Transball 2 was larger than 64KB, so, how the hell can we fit a complete game in that space! (and leave some RAM space to actually run the game!). Later MSX machines (MSX2, MSX2+, etc.) were much more powerful, but I was going for the original MSX1!

Basically, the answer is that to make any decent game, it has to be coded at a very low-level, directly in assembler. So, I set out to learn Z80 assembler! At the beginning I thought that was crazy. I had heard that the Z80 wasn't even able to multiply! it could just add and subtract. So, If you wanted to multiply or divide, you would need to write your own routines! It sounded daunting initially, but it turned out that that was an advantage! The assembler language of the Z80 is so limited that it can be learned very quickly!

After trying different assemblers, and programming environments, I settled for just editing using Sublime Text, and compiling using the Glass Z80 compiler, which is a cross-platform Z80 compiler. The most complicated thing was how to start! How does one write the simplest possible assembler program for an MSX, compile it, and load it into a real MSX, or an MSX emulator to run it!

After searching for many alternatives, I settled for creating my game by compiling it into a ROM file (which is the emulator equivalent of a cartridge). A ROM file is literally just a file containing the bytes that would be stored in a cartridge that you would plug to your MSX. So, after seeing this example online of a "hello world" in assembler for the MSX, I figured out how to compile it, and run it in an emulator (I use OpenMSX, which is by far the best MSX emulator; some people like BlueMSX since it looks easier to use, but it's not been updated for years, and it's not as accurate as OpenMSX...):

CHPUT: equ 00a2h
    org 04000H
    db "AB" ;   ROM signature
    dw Execute  ; start address
    db 0,0,0,0,0,0,0,0,0,0,0,0
    ld hl,helloWorld
    ld a,(hl)
    and a
    jr z,Done
    call CHPUT  ; call the MSX BIOS to print a character on screen
    inc hl
    jr Loop

    db "Hello world!",0
    ds 8000h - $

If you want to try it, just type that into a text file, call it "hello-rom.asm", for example, then download the Glass compiler, and from the command line type:

java -jar Glass.jar hello-rom.asm hello-rom.rom

Open the generated hello-rom.rom file in OpenMSX, and there you go! you will see that in the screen the text "Hello World!" appears right below the MSX intro text. Exciting! Of course that call to the BIOS is ugly, and you are not going to get anywhere by drawing graphics calling the BIOS (if you want any speed, you need to do things a the lowest level, controlling every byte that is moved around), but it's a start!

Over the next few weeks I started learning all the stuff necessary for coding games for the MSX. How does the graphic system of the MSX work? How does one produce sound? How about input from the player? reading the joystick? etc. It was a long learning curve, that took several months, but before I knew it, I had it! I had a version of Transball running on a real MSX!!

The experience was amazing! Programming a game at such a low level in such a limited machine is like building a puzzle. When creating a game for PC, in my experience, the game is never finished! There is always one more thing that you want to add. On an MSX that is not the case: once you run out of memory, that's it. Game done. Adding one little visual effect there or one more sound effect is out of the question! And the memory fills up VERY fast! So, you basically have to figure out of how to fit everything you want to fit into the tiny amount of memory that is available.

For example, maps in Transball are usually 64x64 tiles in size, and each tile is stored as one byte in memory. So, to store a level, that's 4KB right there! I wanted to have 16 levels in the game,  so that would have been already 16*4KB = 64KB! The whole memory space of the computer for maps... However, the first 16KB of the memory is reserved for the BIOS, so that leaves you with 48KB only, and you also need to leave some free RAM to run the game. Usually the top 16KB are reserved for this, leaving you only with the middle 32KB of memory to store your game. So, 16 maps was not looking possible...

But, luckily, there are many tricks we can use to save memory space. And one of them is using compression (e.g., like when you compress a file as a ZIP). I used a very simple form of compression (which is technically not even considered compression in computer science, but "encoding"), called "run-length encoding", and with that I could reduce the space required to store all the maps to a reasonable amount.

The next problem were the graphics. You see a lot of people talking about "pixel art" on the Internet, thinking that the graphics they are creating are "like those in a NES console", or like a "ZX Spectrum". But when you actually need to draw graphics in the real thing (in this case on an MSX) you realize that it's much more complicated than using low-resolution and few colors. The MSX computer (and all computers of the 80s) had such a limited amount of memory for storing the graphics, that even if it can display 16 colors in the screen, you cannot arbitrarily choose the color of each pixel. In particular, in the "Screen 2" graphics mode, the MSX can only display 2 colors per each block of 8x1 pixels (and that's the most powerful one!). So, that turned out to be a huge constraint (and all the graphic artists of the 80s gained a renewed respect from me when I tried to recreate what they were doing back then!). And that is just one of the limitations, another is the limited number of "patterns" that can be stored at once in video memory, or the limited number of "hardware sprites" that can be displayed at once... but I won't get into the details!

And I will not get into how to create sounds for the MSX! Forget about samples! I will just say that to create a sound effect, you need to figure out the series of commands that you will want to send to a series of hardware oscillators and a white noise generator so that when they all act in combination, what comes out is what you expect! A lot of fun, but pretty hard at first! 

Since I was doing this for fun, I did not advertise my game at all until the day it was complete. When I had a fully-playable, reasonably tested version, I posted it on the MSX forums, just to see if people would play it, and that was the beginning of a ride!! I got a flood of messages with feedback, feature and bug requests, some people even went over the code and helped me optimize parts of it to save space for new features! The final product was a decent game (for being my first on MSX!) that looked like this:

Of course, graphics are terrible, and there is no in-game music, but the game is a lot of fun once you master the controls of the ship! (being a Thrust-like game, it's pretty hard to learn at the beginning if you have not played Thrust before!). Also, think that this is running on a computer that was released in 1983! 

A few weeks after that, I was contacted by the people of repro-factory to create a physical edition of the game (and that was something I wasn't expecting!). An amazing artist (Cesar Rincón Nadal) created a cover art, and the final product looked like this:

Which is amazing if you ask me!!!

So, there you go, that's the story of how I decided to code games for the MSX once again, after nearly 30 years! After Transball, I've now completed two more games for the MSX: Tales of Popolon and XSpelunker. Of which Tales of Popolon also has an amazing physical edition created by Matra, and with cover from Sergio Cabanillas! I am now considering which will be my next "retro game" project, another MSX game? an MSX2 game? perhaps a Spectrum game? who knows! :)

All of my MSX games are open-source, and can be found on my github page:

Next time I would like to tell you about how to Tales of Popolon and XSpelunker work under the hood!