Compression

From Stunts Wiki

Stunts and its ports uses multiple compression formats for code and resource files to save disk space and, more importantly, the number of floppy disks used to distribute the game.

Formats by game version

Game version DSI packing EAC packing RPck
RLE Huffman
Supported Prefix table Code endianness Huffman BPE
MS DOS BB 1.0 Game X X Big
Loader X X Little
BB 1.1/MS 1.1 Game X X X Little
Loader X X Little
Amiga X X X
PC98 X
FM Towns

DSI packing

The original release of Stunts for MS DOS uses a custom data compression format not found in the ports to other platforms. The code may have been hand-crafted for 16-bit x86 since other options were favoured in the later ports. The format does not have any identification bytes, so we will refer to it as DSI packing as it appeared in several DSI games in the late 80's and early 90's.

The data format supports both run-length encoding (RLE) and Huffman coding. They can be combined in multiple passes. The data format and the decompression code don't require any particular order or combination of algorithms. Every compressed file shipped with the game use Huffman coding, and when combined with RLE, the RLE pass is applied before Huffman.

File header

enum CompressionType {
    RLE     = 1
    Huffman = 2
}

struct Header {
    union {
        uint8 passes : 7
        uint8 compression_type : 7
    }
    uint8  multi_pass : 1
    uint32 uncompressed_size : 24
}


If the most significant bit of the first byte of the file is set, this is a multi-pass file and the remaining 7 bits are the number of passes to be processed recursively. If the multi_pass flag is not set, we are only doing a single pass and the first byte is the compression type. The next three bytes make up the uncompressed size. If the multi_pass flag was set, this is the final uncompressed size. When the multi_pass flag is not set, this is the uncompressed size of the current pass. The remaining data is processed recursively. The compression_type field decides how it is decoded.

Run-length encoding

struct RLE {
    uint32 compressed_size : 24
    uint8  reserved // Always 0x00, value is ignored by the game
    uint8  escape_length : 7
    uint8  no_sequence : 1
    uint8  escape_codes[escape_length]

    uint8  data[...]
}

RLE is used to reduce the length of consecutively repeated bytes by using an escape code followed by a counter to signify how many times the next byte should be repeated. It is particularly useful for compressing images, as large areas with a single colour would be represented by many consecutive identical bytes.

The RLE stage can itself have two passes if no_sequence is not set. The first pass will then expand sequences of bytes in the data. escape_codes[1] will mark the beginning and the end of a byte sequence. When the escape code is encountered in the data, the following bytes are considered a sequence until the escape code is hit again. After the end-marking escape code, a one byte length counter tells how many times the preceding byte sequence should be repeated in the output buffer. Because the escape code cannot itself be escaped, sequential byte runs can only be applied in files that have unused byte values to use as the first escape code.

The main RLE pass is a traditional expansion of single-byte runs, but with a twist that allows for multiple types of escape codes. The escape code's position in the header's escape code list affects its properties. When the first escape code is encountered in the source buffer, the following byte has the number of repetitions, and the next byte will be copied that many times to the output buffer. When the third escape code is encountered, the number of repetitions is stored in the following two bytes as a 16-bit integer. Then the following byte is copied that many times. All other escape codes use its position in the escape code list as the number of repetitions, and have no additional counter bytes before the data byte to be repeated. This allows the second escape code reserved for sequences to serve as a no-op without the two passes interfering. Escape codes can themselves be escaped in single-byte runs by having them expanded to only one copy.

All repetition counters have an off-by-one surplus, as the data byte itself counts as the first copy. For optimal compression, escape codes should be chosen from byte values that occur least frequently, or not at all, in the source data.

RLE decoding optimisation

Escape codes are put in an escape[256] array, where the code's byte value is the index and the escape code's position is the entry's value. This array is used to test incoming bytes for escape codes. Raw bytes will have the value 0, while other values corresponds to the escape code's property. This optimisation makes the decoding run O(1), as opposed to O(n), where n is the number of escape codes.

Huffman coding

struct Huffman {
    uint8 tree_levels : 7
    uint8 delta_coding : 1
    uint8 symbols_per_level[tree_levels]
    uint8 alphabet[SUM(symbols_per_level)]

    uint16 codes : var [...]
}


Huffman coding uses a binary tree to store data. When Huffman codes are read, each bit tells which branch to take when traversing the tree. If a leaf node is hit, we have found the code's symbol and can proceed with the next code. Frequently used symbols have shorter paths and thus takes up fewer bits in the source file.

The tree is stored by having two lists, one with symbols per level, and one with the alphabet. The alphabet is every possible symbol in the order they appear in the tree. The tree is built by going through each symbols_per_level. If there are symbols at the current level, consume them from alphabet and create leaf nodes. If there are no symbols, and there still are more levels to create, proceed to create left and right children nodes for the next level. The codes are arranged canonically, meaning that the symbols in the alphabet are sorted by code length and symbol value. This makes it possible to decode the data without constructing and traversing the tree structure, instead relying on an offset table.

The remaining data is the Huffman codes. Bit by bit is read to find leaf nodes in the Huffman tree whose symbols are appended to the output buffer. This is repeated until the output buffer has reached uncompressed_size.

The maximum Huffman tree height is 16. While the data format allows for up to 127 levels, Stunts' decompression function uses arrays with a fixed capacity of 16 for its level offset table.

Delta coding

If the flag for delta coding is set, each symbol is treated as the difference from the previous output byte instead of an independent final value. The last byte written to the output buffer is remembered and added to the next expanded symbol. While regular Huffman coding assigns shorter codes to frequently occurring values, this delta variant assigns shorter codes to differences that occurs frequently. This feature is not used by any shipped files, but the code is present in all versions that implements Huffman decoding. One can speculate that delta coding was outperformed by RLE combined with regular Huffman coding, making delta coding unnecessary.

Huffman decoding optimisations

Prefix table

With Huffman codes being paths in a binary tree, they have the property of being prefix-free, meaning that one leaf node's code cannot be the prefix of another node's code. This allows for a decoding optimisation technique known as prefix tables or Huffman tables. By creating arrays of symbols and code widths where the indices are the Huffman codes, the codes can be looked up directly without traversing the tree. Every array index for non-leaf nodes between two leaf node entries are filled with data for the preceding leaf node. This way there's no need to filter out bits that belong to the next Huffman code, at the expense of common symbols taking up multiple entries in the lookup table during decompression.

Stunts utilises this optimisation for Huffman codes that are up to 8 bits wide. The first byte of the code is checked in the widths[256] array. If the code's width is 8 bits or less, the same code is looked up in the symbols[256] array and written to the output buffer. To find the start of the next Huffman code, the source buffer is advanced by the code's bit width found in the widths array.

Offset table

As Stunts limits the prefix tables to 8 bit wide Huffman codes to save RAM, another optimisation is applied for codes that exceeds 8 bits. Prefixes for longer codes have the value 0x40 in the widths array, indicating that the current code is a prefix that needs further expansion. Benefiting from the canonical nature of the codebook, Stunts uses two additional helper arrays: total_codes[tree_levels], which has the total number of leaf nodes that exists in the tree up to each level, and code_offsets[tree_levels], which for each level has a value that is the level's offset in the alphabet minus the tree's total capacity up to each level, and thus can be added to a code with the same level in order to get its index in the alphabet. As a code's bit width corresponds to the code's level in the Huffman tree, the width can be used to compare the current code with the level's total by getting total_codes[current_width] for each extra bit read. If the code is smaller than the total number of codes encountered, it is a leaf node, and we get the symbol by adding the level's code offset to the code with alphabet[code + code_offsets[width]]. This is more laborious than the use of prefix tables for direct lookup, but it only applies to codes that are occurring less frequently in the input file, and it is still far more efficient than traversing a binary tree.

Version differences

The Huffman decompression in the game code of Stunts 1.0 differs from the one in its LOAD.EXE and later versions of the game. The Huffman codes are written with the least significant bit first, as opposed to the other versions that has the codes written with the opposite orientation. The game code of Stunts 1.0 also does not use a prefix table, and the bit-stream orientation was likely changed with the introduction of this optimisation.

EAC packing

Pixel graphics in the PC98 port and non-3d shape resource files in the Amiga port uses another in-house format that is characterized by the identification byte 0xFB, assumed to be the initials of its author, Frank Barchard. This format supports a suite of different compression methods and it evolved through the 90's and early 2000's as a part of EA Canada's internal engine libraries. Whereas later games tended to use the LZSS-based RefPack algorithm, the Stunts ports uses other methods. We will therefore refer to these compressions as EAC packing to distinguish them from the ubiquitous RefPack.

This is the currently assumed header structure:

struct Header {
    uint8  method_flags : 3
    uint8  method_type  : 4
    uint8  32bit_length : 1 // Not used by Stunts
    uint8  magic // Always 0xFB
    uint32 uncompressed_size : 24

    // Possibly more fields if indicated by method_flags

    uint8  data[...]
}

Below are the values of the first byte in the shipped files. Values from the DOS version of Mickey's ABC's: A Day at the Fair are included, as this may be the first public incarnation of Barchard compression. The bitfield suspected to be the method is highlighted in red, the method flags in green.

Game Hex Bits
Mickey's ABC BPE 0x44 01000100
Stunts Amiga Huffman 0x26 00100110
Stunts Amiga Huffman 0x27 00100111
Stunts Amiga BPE 0x42 01000010
Stunts PC98 BPE 0x46 01000110

The game code appears to support several other methods that are not used by the shipped files.

Method Hex Bits Amiga PC98 Mickey
0x4 Huffman coding 0x26 00100110 X
0x27 00100111 X
0x8 Byte pair encoding 0x40 01000000 X X
0x41 01000001 X X
0x42 01000010 X X
0x44 01000100 X
0x45 01000101 X
0x46 01000110 X
0x47 01000111 X
0xC Obfuscation 0x60 01100000 X X X
0x62 01100010 X X X
0x66 01100100 X X X
0xD Store 0x6B 01101011 X X

0x4 Huffman coding

Similar to, but incompatible with the Huffman coding used in later EA games as method 0x6 and documented by Martin Korth.

0x8 Byte pair encoding

Byte pair encoding replaces frequently occurring byte pairs with a single byte that is not used elsewhere in the data. The process is repeated recursively until there are no more repeated byte pairs or available code bytes. Appears to be the same format as documented by Martin Korth.

0xC Obfuscation

Very simple data obfuscation scheme where each byte is subtracted from the previous. The process is reversed by adding running bytes as they are copied.

0xD Store

Data is embedded without any transformations.

RPck

3-d shapes in the Amiga port are compressed with RPck, a simple byte-oriented RLE-based format also found in earlier Amiga games from DSI. Compression ratio is inferior to EAC's BPE or Huffman compression. RPck may have been kept for performance reasons or deadline constraints.

// Big endian byte ordering.
struct Header {
    char   magic[4] // "RPck" or "Rpck"
    uint32 uncompressed_size
    uint32 bytes_saved

    uint8  data[...]
}

After the header comes a one-byte control code treated as a signed integer. If the value is negative, it is negated and this number of bytes is read and copied directly to the output. Since the maximum length is 128, another control code is needed if there are more than 128 consecutive bytes without repetition. If the control byte value is positive, the following byte is copied to the output and then duplicated the number of times specified in the control byte. This is repeated until uncompressed_size is reached or there is no more data in the source buffer.

The game is also shipped with two files, stdapmin.psh and stdbaudi.psh, which appears to be wrapped in another variant of this format. The header is "PPkc" [sic.] and the contents are stored without compression, but the length header is stripped from the wrapped resource files. Yet another possibly related format appears in the DSI game Test Drive with the header "Pckd".

FM Towns

The FM Towns and FM Towns Marty ports of the game were distributed on CD-ROM with ample storage and does not apply any compression to the game resource files.