The other day I was trying to play an old MS-DOS game, Mutan Zone from Opera Soft. Despite being terribly hard (and requiring pixel-perfect jumps), it was one of the games I owned for the old AMSTRAD PC/W back in the eighties, and nostalgia hit me, so I played a few times... and decided it might be fun to try and extract some of those fancy graphics. Lately I get bored with the games themselves more easily and instead like to tinker with their internals.
This is the game's title screen, the (at first unknown) goal I'd end up achieving. On older systems it was also called the "loading screen", because it was what you'd see while the game loaded into memory:
Before digging into code, I searched online for graphic formats. There was a .PIC file format used by old painting programs, but it had a very noticeable header (01234h), which the game's .PIC files lacked.
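A check like this is enough to tell the two formats apart. It is a minimal sketch. It assumes the 01234h magic is stored at offset 0 as a little-endian 16-bit word (x86 byte order). That byte layout is an assumption, and `has_paint_pic_header` is a hypothetical helper name.

```python
PAINT_PIC_MAGIC = 0x1234  # header value quoted above (01234h)

def has_paint_pic_header(data: bytes) -> bool:
    """Assumption: the magic is a little-endian 16-bit word at offset 0."""
    if len(data) < 2:
        return False
    return int.from_bytes(data[:2], "little") == PAINT_PIC_MAGIC
```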
My first naive approach was to assume each byte held one pixel (typical for MS-DOS games, as a VGA card could display 256 colors). I hex-analyzed some PIC files (from both Mutan Zone and Abadia Del Crimen), and they didn't seem to have anything strange other than lots of repeated bytes, so my bet was that the files weren't compressed. Also, from my small retro gaming knowledge, I had a few facts and guesses in mind:
- Games were written in Assembler, and ports to other computers were very frequent (often by the same company)
- There were almost no tools, so building an image editor that converted between different formats would already be hard for them
- Space was an issue, but so was RAM, and CPU was an even bigger one, so compressing graphics would add more complexity than value
Based on this, I tried to simply dump the bytes, one at a time, into 8x8 sprites, using each byte's value as the green component of an RGB PNG. The results were... nothing recognizable. I also tried 8x16 and 16x16, but no visible pattern resembled anything.
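That first attempt might look roughly like the sketch below. It assumes one byte per pixel and lays the bytes out tile by tile, with each value as the green channel. `bytes_to_tiles` and its parameters are hypothetical names, not the original script's. It returns rows of RGB tuples that PIL or any PNG writer could then save.

```python
def bytes_to_tiles(data: bytes, tile_w: int, tile_h: int, tiles_per_row: int) -> list:
    """Place one byte per pixel in tile_w x tile_h tiles (left to right, top to
    bottom) and return rows of (R, G, B) tuples with the byte as green."""
    tile_size = tile_w * tile_h
    n_tiles = len(data) // tile_size               # a trailing partial tile is ignored
    tile_rows = -(-n_tiles // tiles_per_row)       # ceiling division
    width, height = tile_w * tiles_per_row, tile_h * tile_rows
    image = [[(0, 0, 0)] * width for _ in range(height)]
    for t in range(n_tiles):
        base_x = (t % tiles_per_row) * tile_w
        base_y = (t // tiles_per_row) * tile_h
        for i in range(tile_size):
            value = data[t * tile_size + i]
            image[base_y + i // tile_w][base_x + i % tile_w] = (0, value, 0)
    return image
```

To try the 8x16 or 16x16 variants, change `tile_w` and `tile_h`.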
I built a crappy ASCII dumper that printed @ wherever a byte wasn't zero, and toyed around with the first chunks of bytes. What I did find was that, instead of resembling sprites, the pixels made more sense arranged horizontally, in a single row... and then I had a eureka moment: the PIC file might simply store the loading screen first! It could also be storing some metadata as a header, but the data didn't look like one (lots of zeroes, atypical for header info), so I'd first try to just dump the whole content as a 320x200 image.
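A dumper of that kind fits in a few lines. This sketch prints `@` for non-zero bytes and `.` for zeroes; the `.` is an arbitrary choice to keep empty cells visible. `ascii_dump` is a hypothetical name, and `width` controls how many bytes go on each text row.

```python
def ascii_dump(data: bytes, width: int, rows: int) -> str:
    """One character per byte: '@' if the byte is non-zero, '.' otherwise."""
    lines = []
    for r in range(rows):
        chunk = data[r * width:(r + 1) * width]
        lines.append("".join("@" if b else "." for b in chunk))
    return "\n".join(lines)
```

Changing `width` is what makes horizontal structure pop out, compared with reading the data as square sprites.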
So I grabbed a screenshot of the title screen:
Comparing the "black & white" dump with the loading screen, there were similarities, but the first row of pixels still didn't match...
Then I checked the graphics mode for hints. In CGA mode you only get 4 colors, and the game's were from a standard palette (0 black, 1 cyan, 2 magenta, 3 white)... so in binary we just need two bits to store each pixel's color:
00 -> black
01 -> cyan
10 -> magenta
11 -> white
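To turn those indices into real colors, each 2-bit value needs an RGB triple. The values below are an assumption: the usual RGB approximations of CGA "palette 1, high intensity" (black, light cyan, light magenta, white). The game might use the low-intensity variant instead, which has darker shades.

```python
# Assumed RGB approximations of CGA palette 1, high intensity.
CGA_PALETTE_1_HIGH = {
    0b00: (0x00, 0x00, 0x00),  # black
    0b01: (0x55, 0xFF, 0xFF),  # light cyan
    0b10: (0xFF, 0x55, 0xFF),  # light magenta
    0b11: (0xFF, 0xFF, 0xFF),  # white
}

def color_of(index: int) -> tuple:
    """Map a 2-bit color index (only the low two bits are used) to RGB."""
    return CGA_PALETTE_1_HIGH[index & 0b11]
```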
I also thought that if I were to build an image editor for that era's graphics and memory, I'd squeeze 4 pixels into each byte. Analysing the size of PIC files from another of the company's games, Sol Negro, I found out that the EGA version files were double the size of the CGA ones... so bits per pixel (bpp) were "in use" instead of a full byte per pixel. It also kind of confirmed my guess that PIC files weren't compressed (with compression the sizes would differ unpredictably, so being almost exactly twice as big... was suspicious).
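Here is the arithmetic behind that doubling, for an uncompressed 320x200 screen. It assumes 2 bpp for CGA's 4 colors and 4 bpp for EGA's 16. Real files may be a bit larger than these figures if they carry padding or a header.

```python
def raw_size(width: int, height: int, bpp: int) -> int:
    """Bytes needed for an uncompressed, tightly packed bitmap."""
    return width * height * bpp // 8

cga = raw_size(320, 200, 2)  # 4 colors  -> 2 bits per pixel -> 16000 bytes
ega = raw_size(320, 200, 4)  # 16 colors -> 4 bits per pixel -> 32000 bytes
```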
The first bytes of the file in binary were:
00110000 00000000 11000000 11000000
Which, read in pairs of bits, match the pixels of the first row of the title screen:
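One way to see this is to print each byte as 8 binary digits and read two digits at a time. With the palette above, the first byte decodes to black, white, black, black. `pairs_of_bits` is a hypothetical helper, shown only to illustrate the reading.

```python
first_bytes = bytes([0b00110000, 0b00000000, 0b11000000, 0b11000000])

def pairs_of_bits(data: bytes) -> list:
    """Render each byte as 8 binary digits and read it two digits at a time."""
    indices = []
    for byte in data:
        bits = format(byte, "08b")                   # e.g. 48 -> '00110000'
        for i in range(0, 8, 2):
            indices.append(int(bits[i:i + 2], 2))    # '11' -> 3
    return indices

print(pairs_of_bits(first_bytes))
# [0, 3, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0]
```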
With that in mind, I changed my code to keep reading one byte at a time, but operate with pairs of bits.
Early experiment of splitting a byte into 4 pixels / 2bpp. Color wasn't extracted properly, but I was getting somewhere:
Maybe I was doing something wrong (although it looked straightforward), so I decided to check how the vigasoco project reads the Abadia del Crimen data and places its pixels on screen.
Copy-pasting its pixel unpacking method improved things but didn't fix the bug:
So I went back to my code and tried with Abadia del Crimen:
Something was surely wrong, so in the end I wrote the simplest bit-masking logic I could, to be 100% sure I was masking, shifting and grabbing the correct pixels:
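```python
def get_pixel(data: int, pixel: int) -> int:
    """Return the 2-bit color index of pixel 0..3 (0 = leftmost) in a byte."""
    # 00000011  1 + 2 = 3
    # 00001100  8 + 4 = 12
    # 00110000  16 + 32 = 48
    # 11000000  128 + 64 = 192
    if pixel == 0:
        return (data & 192) >> 6
    elif pixel == 1:
        return (data & 48) >> 4
    elif pixel == 2:
        return (data & 12) >> 2
    return data & 3
```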
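The four branches of the if/elif chain follow one pattern: pixel `n` sits `6 - 2n` bits from the right. That means a single shift plus a `0b11` mask gives the same result. `get_pixel_shift` is a hypothetical name for this compact equivalent.

```python
def get_pixel_shift(data: int, pixel: int) -> int:
    """Return the 2-bit color of pixel 0..3 (0 = leftmost) within a byte."""
    shift = 6 - 2 * pixel        # pixel 0 -> 6, 1 -> 4, 2 -> 2, 3 -> 0
    return (data >> shift) & 0b11
```

The explicit masks (192, 48, 12, 3) are easier to verify by eye. The shifted version is shorter once you trust it.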
And voilà! It worked and was reading every pixel right... or maybe not:
Uhm... a half-size image with weird cuts below it. This looked like some kind of interlacing, so I assumed the file stored all the even rows first and then all the odd ones:
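At the row level, that assumption means interleaving the two halves back together. This sketch works on already-decoded rows. `deinterlace` is a hypothetical name, and the layout (all even rows, then all odd rows) is the guess described above.

```python
def deinterlace(stored_rows: list) -> list:
    """Rebuild display order from rows stored as [even rows..., odd rows...]."""
    half = (len(stored_rows) + 1) // 2       # number of even rows
    even, odd = stored_rows[:half], stored_rows[half:]
    display = []
    for i in range(half):
        display.append(even[i])
        if i < len(odd):
            display.append(odd[i])
    return display
```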
Almost there, but there were black lines. Going back to the hex PIC data, I saw that between the last "even row" and the first "odd row" there were 192 bytes (768 pixels) of zeroes. I still have to find out why that zero padding is there, but I simply skipped it and tried again:
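The whole decoding can be sketched as follows. Each row is 80 bytes (320 pixels at 4 per byte), and the 100 even rows take 8000 bytes. Add the 192 bytes of padding and the odd rows start at offset 8192 (0x2000). A plausible explanation for the padding, not confirmed in this investigation, is that the file is a straight copy of CGA video memory. In that mode even scanlines live in the first 8 KB bank and odd scanlines in a second bank at offset 2000h, and each bank has 192 unused bytes. `decode_title_screen` and the constants are hypothetical names.

```python
WIDTH, HEIGHT = 320, 200
BYTES_PER_ROW = WIDTH // 4      # 4 pixels per byte at 2bpp -> 80 bytes
ODD_BANK_OFFSET = 0x2000        # 8000 bytes of even rows + 192 bytes of padding

def decode_title_screen(data: bytes) -> list:
    """Return HEIGHT rows of WIDTH color indices (0..3), skipping the padding."""
    rows = []
    for y in range(HEIGHT):
        bank = ODD_BANK_OFFSET if y % 2 else 0
        start = bank + (y // 2) * BYTES_PER_ROW
        row = []
        for byte in data[start:start + BYTES_PER_ROW]:
            for shift in (6, 4, 2, 0):           # leftmost pixel in the top bits
                row.append((byte >> shift) & 0b11)
        rows.append(row)
    return rows
```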
Finally! A pixel-perfect PNG dump of the game's title screen.
I've uploaded all the code to my GitHub so, small as it is, better go there if you wish to see all the details. It's a simple Python script that reads bytes, operates on them and saves the data into a PNG (using PIL), but I want to keep it at hand in case I need to do bitwise operations, bit representations of integers and the like again in the future.
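The original script uses PIL for the saving step. For a dependency-free alternative, this sketch swaps in a hand-rolled PNG writer built only on the standard library; it is not what the author's code does. It writes an 8-bit RGB PNG with an IHDR, a single zlib-compressed IDAT and an IEND chunk. `write_png` is a hypothetical name, and `palette` is any list of RGB triples indexed by color value.

```python
import struct
import zlib

def _chunk(kind: bytes, payload: bytes) -> bytes:
    """One PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)

def write_png(path: str, rows: list, palette: list) -> None:
    """Write rows of color indices as an 8-bit RGB PNG file."""
    height, width = len(rows), len(rows[0])
    raw = bytearray()
    for row in rows:
        raw.append(0)                    # filter type 0 (None) before each scanline
        for index in row:
            raw.extend(palette[index])   # R, G, B bytes
    # width, height, bit depth 8, color type 2 (RGB), compression, filter, interlace
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    png = (b"\x89PNG\r\n\x1a\n"
           + _chunk(b"IHDR", header)
           + _chunk(b"IDAT", zlib.compress(bytes(raw)))
           + _chunk(b"IEND", b""))
    with open(path, "wb") as f:
        f.write(png)
```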
Considering that I had almost gotten the Abadia del Crimen title screen earlier, I decided to try a few more PIC files from other Opera Soft games I knew... and as long as they are CGA-based, it works perfectly:
I've been able to convert a PIC file, but it seems game sprites and backgrounds don't live there. I've already done some initial peeking, and the OVL files (present in most DOS games from Opera Soft) include at least a COM executable inside (newer games like Sol Negro include a DOS EXE binary instead, with its identifiable MZ header), so each level runs independently as a separate binary... but COM files, at least, have no clear separation between code instructions and data, so it'll require more work.
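A first step toward locating embedded EXE binaries could be scanning for the two-byte `MZ` signature. This sketch only finds candidates, because the bytes `MZ` can also occur by chance inside code or data. COM executables have no header at all, so this approach cannot find them. `find_mz_offsets` is a hypothetical helper, and nothing here assumes anything about the OVL layout itself.

```python
def find_mz_offsets(blob: bytes) -> list:
    """Return every offset where the two-byte 'MZ' signature appears."""
    offsets = []
    start = 0
    while True:
        pos = blob.find(b"MZ", start)
        if pos == -1:
            return offsets
        offsets.append(pos)
        start = pos + 1
```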
This is also why the Python script is so specific to title screens. Until I figure out where and how sprites are stored, it doesn't make sense to make the extractor more generic (and maybe I'll even duplicate the code and keep the title one intact).
Update: Added .PIC research (interesting but not critical) and corrected a typo.