Kartones Blog

Be the change you wanna see in this world

Course Review: Apache Kafka Series - Learn Apache Kafka for Beginners (Udemy)

Doing some research for a work-related task, I needed to learn the basics of Kafka, so I tried a visual approach instead of just reading some tutorials or a book (which takes more than a few hours). Learn Apache Kafka for Beginners is an introductory course that lasts between 3 and 4 hours, depending on whether you use Java, Scala, Akka or similar, as otherwise you can skip the videos demoing basic producer and consumer examples.

You will learn the basics of Kafka: how it works, producers, consumers, topics, partitions, offsets, segments, basic setup and configuration (including a handy small Docker cluster for local development), and some tips and advice on how to properly configure everything. Considering that it also has coding demos, the content is fine for such a small course, and very well summarized and presented.

There are other, more advanced courses (which I'll probably take) covering Streams, Connectors and other non-trivial areas.

An insightful introduction, recommended.

Course Review: Master 120 common phrasal verbs (Udemy)

More English study "results": another course done at Udemy, this time "Master 120 common phrasal verbs".

As the course title suggests, you get to learn 120 common phrasal verbs, no more, no less: how to use them in conversations and how they're written, plus some exercises.

What I liked about the course is that you first see the way you'd typically write the sentence, and then it gets corrected to the phrasal verb version. Some were common, but others I didn't know, so I'm glad I did this (short) course.

Course Review: Adam Grant on Developing Original Ideas (Udemy)

I did this Udemy course following a suggestion from a work mailing list, as I didn't know who Adam Grant was, nor did I feel any urgency to have better ideas... but it was a short one, so... why not?

This time I'll leave the description of the contents to the course webpage and instead just share the notes I took while watching it. It was interesting because it debunks some very common myths and misconceptions regarding entrepreneurs, CEOs and the like.

  • originality = creativity + change
  • originality is not something you're born with
  • question the default
  • volume and variety of ideas, not only eureka moments
  • break your frame and try to learn other domains
  • work on different things
  • idea selection is hard
  • can't judge your own ideas (too positive, biased). managers can't either (too negative, rely too much on intuition). peers are the best judges. seek feedback from peers often.
  • past success doesn't guarantee future success
  • openly explain downsides of your ideas
  • frame ideas to known/familiar concepts
  • fit the job to the person

Mutan Zone sprite exporter WIP

I'm having so much fun reverse engineering and extracting the graphics that, after managing to export the title screens from Opera Soft's Mutan Zone game, I now want to get the sprites.

This is far from easy as they are inside the binaries, and the game has 3 of them:

  • MUTAN_Z.EXE looks like the only one, but this tiny 2KB executable just does some bootstrapping: it allocates memory, sets the video mode, loads the corresponding COM code into memory (more about this in the next item) and executes it. It also switches back to text mode and displays error messages if something fails during bootstrapping.
  • MUTAN_Z1.OVL looks like a data file, but if you rename it to .COM it shows the lower menu (and doesn't do anything else). It is actually the first level of the game as its own executable.
  • MUTAN_Z2.OVL is similar to the previous one: it contains a .COM with the second game level, which in this case boots perfectly, at least up to the maze mini-game before the main level.

I had already renamed the .OVL files after checking their contents and comparing them with older Opera Soft games (which were always self-contained .COM files), but decided to go on and disassemble the .EXE file to confirm it.

First, I installed Reko on Windows 7 in one of my virtual machines. It generates decent assembler code (comments are mine):

    mov ax,0013     ;; Set Video Mode 0x13
    int 10          ;; Video Services
    mov si,04A7
    jmp 026A

But to my surprise, it also generates some rough C code, automatically translating some of the interrupt calls for you:

    else if (ds->b061A != 0x03)
        bios_video_set_block_of_DAC_registers(0x00, 0x10, cs, 1191);

As the full disassembly is only around 400 lines (including some data byte blocks), I ended up refreshing my assembler knowledge and commenting all the interrupt calls and some code fragments. It is great that nowadays you can even find lists of MS-DOS API interrupts (int 21h) and BIOS interrupts on Wikipedia. I mostly confirmed that the .COM contents are loaded into memory and apparently then executed (I still have to confirm it, but probably the instruction pointer and related registers are just set to point there; at least that's what I'd do).

After this, I decided to try the first level's .COM file, but Reko didn't like it, so I searched for alternatives. In the end, I discovered that the acclaimed IDA Pro not only has a freeware version for non-commercial use, but also Linux binaries and, even better, that any version understands most MS-DOS and Windows executable formats!

IDA Pro in action

Note the visual minimap of the contents at the top, with lots of blue and brown (instructions and the like) but also quite a bit of grey (byte data). Those smelled interesting, so I exported those blocks (as "raw bytes") to separate files.

I grabbed my existing PIC exporter and ran it on the files... no visible patterns. Then I thought... why the heck would anybody interlace sprite rows (as with the title screen)? Too much complication for tiny 8x8 or 8x16 blocks... so I changed the code to read all rows sequentially, added some small changes to generate N files when the content read exceeds the specified "sprite size", and checked again. Some of the data blocks were still not fully recognizable, but using the following MS-DOS screenshot as a guide to hunt for sprites, I detected some color patterns:

MS-DOS in-game screenshot
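Reading rows sequentially (no interlacing) and cutting the data into N sprites once a "sprite size" worth of pixels has been consumed could look roughly like this Python sketch. It assumes the same 2bpp CGA packing as the title screens (leftmost pixel in the highest bit pair) and row-major sprites; the function names are illustrative, not necessarily those of my script.

```python
def unpack_2bpp(data: bytes) -> list[int]:
    """Split each byte into four 2-bit color indices, leftmost pixel first."""
    return [(b >> shift) & 0b11 for b in data for shift in (6, 4, 2, 0)]


def split_sprites(data: bytes, width: int, height: int) -> list[list[list[int]]]:
    """Cut a raw byte block into consecutive width x height sprites.

    Rows are read sequentially (no interlacing). Trailing pixels that don't
    fill a whole sprite are ignored.
    """
    pixels = unpack_2bpp(data)
    sprite_pixels = width * height
    sprites = []
    for start in range(0, len(pixels) - sprite_pixels + 1, sprite_pixels):
        chunk = pixels[start:start + sprite_pixels]
        # Re-shape the flat pixel run into `height` rows of `width` pixels
        sprites.append([chunk[y * width:(y + 1) * width] for y in range(height)])
    return sprites


# Two 8x8 sprites at 2bpp take 2 * 8 * 8 / 4 = 32 bytes
block = bytes([0xFF] * 16 + [0x00] * 16)
sprites = split_sprites(block, 8, 8)
print(len(sprites))   # 2
print(sprites[0][0])  # [3, 3, 3, 3, 3, 3, 3, 3]
```

Each sprite can then be written to its own image file, which is where the "N files" come from.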

I ran it through all segments... and one of them had displaced but visible numbers from the menu and letters from the intro text:

Found something... letters and numbers

Checking the pixels, this was not an interlacing problem but a mere offset/displacement issue. By adding an option to the code to skip the first X pixels (remember from my previous post: 1 byte == 4 CGA 2-bit pixels), I fixed it:

Letters and numbers correctly rendered
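As one byte holds four pixels, skipping X pixels doesn't necessarily land on a byte boundary. A simple way to handle it, sketched in Python under the same 2bpp assumption, is to skip the whole bytes and then drop the leftover pixels from the next one; `skip_pixels` is just an illustrative parameter name.

```python
def unpack_2bpp(data: bytes) -> list[int]:
    """Split each byte into four 2-bit color indices, leftmost pixel first."""
    return [(b >> shift) & 0b11 for b in data for shift in (6, 4, 2, 0)]


def pixels_with_offset(data: bytes, skip_pixels: int) -> list[int]:
    """Unpack a 2bpp block, dropping the first skip_pixels pixels.

    skip_pixels // 4 whole bytes are skipped, plus skip_pixels % 4 pixels
    from the following byte.
    """
    whole_bytes, leftover = divmod(skip_pixels, 4)
    return unpack_2bpp(data[whole_bytes:])[leftover:]


data = bytes([0b00011011, 0b11100100])
print(pixels_with_offset(data, 0))  # [0, 1, 2, 3, 3, 2, 1, 0]
print(pixels_with_offset(data, 3))  # [3, 3, 2, 1, 0]
```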

I'm not an expert in pixel art, and checking other files I couldn't recognize any fragments of enemies, scenery or the player... but I did see that one block was half gibberish and half a clearly recognizable rightmost part of the game frame/menu:

Right part of the game menu/frame

So, some of the graphics in the same data block are clearly 8x8, while others are still broken because they are not that small. But at least I'm on the right path regarding displaying the data. My current bet is that the code knows which offsets contain 8x8 sprites, which 8x16 (or whatever size the entities have) and which the menu parts. My hypothesis about the game menu/frame is that the upper part (and maybe the lower one too) might be 320 pixels wide, read like the title screen. I'll probably configure segments of the byte data to be treated with different sizes, so I can handle 8x8, 8x16 and 320xWHATEVER. Again, it's what I'd do: not limit myself to handling only 8x8 sprites if I can know which ones are bigger (and work with a finite set of available sizes).
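Configuring segments of the byte data with different sizes could be expressed as a small table, as in this Python sketch. The offsets and sizes in `SEGMENTS` are made-up placeholders (the real ones are still unknown), and a height of 0 stands for the 320xWHATEVER case: one full-width strip covering the whole segment.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: int   # byte offset where the segment begins (inclusive)
    end: int     # byte offset where the segment ends (exclusive)
    width: int   # sprite width in pixels
    height: int  # sprite height in pixels; 0 means "one strip, whole segment"


# Hypothetical layout for illustration, NOT real Mutan Zone offsets
SEGMENTS = [
    Segment(0x0000, 0x0100, 8, 8),
    Segment(0x0100, 0x0200, 8, 16),
    Segment(0x0200, 0x0340, 320, 0),
]


def sprite_bytes(segment: Segment) -> int:
    """Bytes used by one sprite of the segment at 2bpp (4 pixels per byte)."""
    if segment.height == 0:
        return segment.end - segment.start
    return segment.width * segment.height // 4


def slice_segments(data: bytes, segments: list[Segment]) -> list[tuple[Segment, list[bytes]]]:
    """Cut every segment into raw per-sprite byte chunks of its own size."""
    result = []
    for seg in segments:
        size = sprite_bytes(seg)
        block = data[seg.start:seg.end]
        chunks = [block[i:i + size] for i in range(0, len(block) - size + 1, size)]
        result.append((seg, chunks))
    return result


blob = bytes(0x340)
for seg, chunks in slice_segments(blob, SEGMENTS):
    print(seg.width, seg.height, len(chunks))
```

With this table, the 256-byte 8x8 segment yields 16 sprites, the 8x16 one yields 8, and the 320-wide segment a single 4-row strip (320 bytes / 80 bytes per row).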

I've uploaded the code of the still unfinished sprite exporter to my GitHub. It doesn't really export individual sprites yet with the constant values I've committed, and changing both sizes to 8 will output lots of pixel garbage, as many sprites aren't aligned.

The next steps are trying to guess the other sprites, or more likely checking the disassembled .COM code where it reads from the data blocks and reverse engineering how it does it. I also read that DOSBox can be compiled with a "debug enabled" flag that allows runtime debugging, so I might give it a try and experiment with more disassemblers.

NOTE: If I end up succeeding in extracting the sprites, my plan is to build a small Python script that, given the MS-DOS game data files, extracts the byte data blocks, so that running the sprites script afterwards generates the PNGs. Expecting anybody to extract the byte blocks manually is asking too much, as it is quite time-consuming, but I don't want to include the original game data files (they are trivial to find online).
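The extraction step of that plan could be as small as this Python sketch: a table of byte ranges per game file, with each range written to its own file for the sprites script to consume. The ranges in `BLOCKS` and the output naming are hypothetical placeholders; the real offsets would have to come from the disassembly.

```python
from pathlib import Path

# Hypothetical (start, end) byte ranges per game file, NOT real offsets
BLOCKS = {
    "MUTAN_Z1.OVL": [(0x1000, 0x1400), (0x2000, 0x2800)],
}


def cut_blocks(filename: str, content: bytes) -> dict[str, bytes]:
    """Return {output_name: block_bytes} for every configured range of a file."""
    stem = Path(filename).stem
    return {
        f"{stem}_{index:02d}.bin": content[start:end]
        for index, (start, end) in enumerate(BLOCKS.get(filename, []))
    }


def extract_blocks(game_dir: Path, out_dir: Path) -> None:
    """Read each configured game file and write its blocks into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for filename in BLOCKS:
        content = (game_dir / filename).read_bytes()
        for name, block in cut_blocks(filename, content).items():
            (out_dir / name).write_bytes(block)
```

Usage would be something like `extract_blocks(Path("mutan_zone"), Path("blocks"))`, pointing at a folder with the user's own copy of the game.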

Opera Soft's PIC to PNG exporter

The other day I was trying to play an old MS-DOS game, Mutan Zone from Opera Soft. Despite being terribly hard (and requiring you to do pixel-perfect jumps), it was one of the games I owned for the old AMSTRAD PC/W back in the eighties and nostalgia hit me, so I played a few times... and decided it might be great to try and extract some of those fancy graphics. Lately I get bored more easily with the games themselves and instead like to tinker with their internals.

This is the game's title screen, the (at first unknown) goal I'd end up achieving. On older systems it was also called the "loading screen", because it was what you'd see while the game loaded into memory:

Mutan Zone

Before digging into code, I searched online for graphic formats, and while there was a .PIC file format used by old painting programs, it had a very noticeable header (01234h), which the game's .PIC files lacked.
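Telling both kinds of .PIC files apart can be done with a quick header check, as in this Python sketch. It assumes the 01234h header is a 16-bit word at the very start of the file, stored little-endian as x86 programs would write it (bytes 34h 12h); I haven't checked any other detail of that painting-program format.

```python
import struct

PAINT_PIC_MAGIC = 0x1234  # header word of the painting programs' .PIC format


def has_paint_pic_header(data: bytes) -> bool:
    """True if the data starts with the 0x1234 word (little-endian, as on x86)."""
    if len(data) < 2:
        return False
    (magic,) = struct.unpack_from("<H", data, 0)
    return magic == PAINT_PIC_MAGIC


print(has_paint_pic_header(bytes([0x34, 0x12, 0x40, 0x01])))  # True
# First bytes of the Mutan Zone title screen PIC: no such header
print(has_paint_pic_header(bytes([0x30, 0x00, 0xC0, 0xC0])))  # False
```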

My first naive approach was to assume that each byte held one pixel (typical for MS-DOS games, as a VGA card could display 256 colors). I hex-analyzed some PIC files (from both Mutan Zone and Abadia del Crimen), and they didn't seem to have anything strange other than lots of byte repetitions, so my bet was that the files had no compression. Also, from my limited retro gaming knowledge I had a few facts and guesses:

  • Games were written in Assembler, and ports to other computers were very frequent (many times by the same company)
  • There were almost no tools, so building an image editor that converted to different formats would already be hard for them
  • Space was an issue, both on disk and in RAM, and CPU was a big issue too, so compressing graphics would add more complexity than value

Based on this, I tried to simply dump the bytes, one at a time, in 8x8 sprites, using the pixel byte value as the green component of an RGB PNG. The results were... nothing recognizable. I tried 8x16 and 16x16 too, but no visible patterns resembling anything.

I built a crappy ASCII dumper that put @ everywhere there wasn't a zero, and toyed around with the first chunks of bytes. What I did find was that, instead of resembling sprites, the pixels made more sense arranged horizontally, in a single row... and then I had a eureka moment: the PIC file might just be storing the loading screen first! It could also be storing some metadata as a header, but it didn't look like it (lots of zeroes, atypical for header info), so... I would first try to just dump all the content as a 320x200 image.
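A rough equivalent of that kind of ASCII dumper, as a Python sketch: it renders `@` for every non-zero byte and, as my own choice here, a dot for zero bytes, with a configurable row width so the same bytes can be viewed as narrow sprite-like blocks or as long screen-like rows.

```python
def ascii_dump(data: bytes, width: int) -> str:
    """Render bytes as text: '@' for non-zero bytes, '.' for zero bytes."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        lines.append("".join("@" if b else "." for b in chunk))
    return "\n".join(lines)


sample = bytes([0, 0, 255, 255, 0, 0, 255, 255, 0, 7, 7, 0])
print(ascii_dump(sample, 4))
# ..@@
# ..@@
# .@@.
```

Feeding it the first chunk of a real PIC file (e.g. `open(path, "rb").read(2048)`) and playing with `width` is enough to see whether patterns line up as blocks or as rows.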

So I grabbed a screenshot of the title screen:

Title screenshot.

Comparing the "black & white" dump with the loading screen, there were similarities, but it still didn't match the first row of pixels...

Then, I decided to check the graphics mode for hints. In CGA's 320x200 graphics mode you only have 4 colors, and the game's were from a standard palette (0 black, 1 cyan, 2 magenta, 3 white)... so in binary we just need two bits to store a pixel's color:

    00 -> black
    01 -> cyan
    10 -> magenta
    11 -> white

I also thought that if I were to build an image editor for that era's graphics and memory, I'd squeeze 4 pixels into each byte. Analysing the size of PIC files from another of the company's games, Sol Negro, I found out that the EGA version's files were double the size of the CGA ones... so pixels were packed by bits-per-pixel (bpp) instead of using a full byte per pixel (EGA's 16 colors need 4 bits, CGA's 4 colors need 2). It also kind of confirmed my guess that PIC files weren't compressed (otherwise the sizes would differ, but being almost exactly twice as big... was suspicious).
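The back-of-the-envelope arithmetic behind that suspicion, as a quick Python check. It assumes a 320x200 screen for both versions and tightly packed pixels; these are theoretical raw sizes, not measured sizes of the actual PIC files (which may include some padding).

```python
def raw_image_size(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to store an uncompressed, tightly packed image."""
    return width * height * bits_per_pixel // 8


cga = raw_image_size(320, 200, 2)  # 4 colors -> 2 bits per pixel
ega = raw_image_size(320, 200, 4)  # 16 colors -> 4 bits per pixel
print(cga, ega, ega // cga)  # 16000 32000 2
```

With one byte per pixel the same screen would need 64000 bytes, four times the CGA figure, so a clean 2x ratio between versions points to bit packing rather than compression.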

The first bytes of the file in binary were:

    00110000 00000000 11000000 11000000

Which, if you count in pairs, match the pixels at the first row of the title screen:

Title screen upper-left corner zoom

With that in mind, I changed my code to keep reading one byte at a time, but operate with pairs of bits.
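Reading one byte at a time but operating on pairs of bits can be sketched in Python as follows. This minimal version assumes the leftmost pixel lives in the two most significant bits, which is what the binary dump of the first bytes suggests; the palette names are the standard CGA colors listed above, not data read from the file.

```python
# Standard CGA palette indices used by the game (names only, not RGB values)
PALETTE = {0: "black", 1: "cyan", 2: "magenta", 3: "white"}


def unpack_2bpp(data: bytes) -> list[int]:
    """Split every byte into four 2-bit color indices, leftmost pixel first."""
    pixels = []
    for byte in data:
        # The highest bit pair holds the leftmost pixel, so shift it down first
        for shift in (6, 4, 2, 0):
            pixels.append((byte >> shift) & 0b11)
    return pixels


# First four bytes of the Mutan Zone title screen PIC
first_bytes = bytes([0b00110000, 0b00000000, 0b11000000, 0b11000000])
row_start = unpack_2bpp(first_bytes)
print(row_start)  # [0, 3, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0]
print([PALETTE[p] for p in row_start[:4]])  # ['black', 'white', 'black', 'black']
```

At 4 pixels per byte, a 320-pixel row takes exactly 80 bytes.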

An early experiment splitting each byte into 4 pixels (2bpp). The colors weren't extracted properly, but I was getting somewhere:

WIP screenshot #1

Maybe I was doing something wrong (although it looked straightforward), so I decided to check how the vigasoco project was reading the Abadia del Crimen data and placing pixels on the screen.

Copy-pasting its pixel unpacking method improved things but didn't fix the bug:

WIP screenshot #2

So I went back to my code and tried with Abadia del Crimen:

WIP screenshot #3

Something was surely wrong, so in the end I wrote the simplest boolean logic I could, to be 100% sure I was masking, shifting and grabbing the correct pixels:

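    # 00000011   1 +   2 =   3  -> pixel 3 (rightmost)
    # 00001100   4 +   8 =  12  -> pixel 2
    # 00110000  16 +  32 =  48  -> pixel 1
    # 11000000  64 + 128 = 192  -> pixel 0 (leftmost)
    def get_pixel(data, pixel):
        """Return the 2-bit color index of pixel 0-3 packed in byte `data`."""
        if pixel == 0:
            return (data & 192) >> 6
        elif pixel == 1:
            return (data & 48) >> 4
        elif pixel == 2:
            return (data & 12) >> 2
        else:  # pixel == 3, already in the lowest two bits
            return data & 3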

And voilà! It worked and was reading every pixel right... or maybe not:

WIP screenshot #4

Uhm... a half-size image with weird cuts below... this looked like some kind of interlacing, so I assumed the data stored first all the even rows and then all the odd ones:

WIP screenshot #5

Almost there, but there were black lines. Going back to the hex PIC data, I saw that between the last "even row" and the first "odd row" there were 192 bytes (768 pixels) of zeroes. I still have to find out why that zero padding is there, but I simply skipped it and tried again:

Mutan Zone

Finally! A pixel-perfect PNG dump of the game's title screen.
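The 192 bytes of zero padding have a likely explanation: CGA video memory in 320x200 mode uses the same interlaced layout, with even scanlines in a first 8KB bank and odd scanlines in a second bank starting at offset 0x2000. Each bank only needs 100 rows × 80 bytes = 8000 bytes, leaving 8192 − 8000 = 192 unused bytes in between, so the PIC files may simply be raw dumps of video memory (not confirmed). A Python sketch of reading a screen assuming exactly that layout:

```python
HEIGHT = 200
BYTES_PER_ROW = 80  # 320 pixels / 4 pixels per byte at 2bpp
BANK_SIZE = 0x2000  # 8192 bytes per CGA interlace bank


def unpack_2bpp(data: bytes) -> list[int]:
    """Split each byte into four 2-bit color indices, leftmost pixel first."""
    return [(b >> shift) & 0b11 for b in data for shift in (6, 4, 2, 0)]


def deinterlace_cga(data: bytes) -> list[list[int]]:
    """Return HEIGHT rows of pixel indices from an interlaced CGA dump.

    Even rows come from the first bank and odd rows from the bank starting
    at BANK_SIZE, which skips the 192 padding bytes after the 8000 used ones.
    """
    rows = []
    for y in range(HEIGHT):
        bank_start = 0 if y % 2 == 0 else BANK_SIZE
        start = bank_start + (y // 2) * BYTES_PER_ROW
        rows.append(unpack_2bpp(data[start:start + BYTES_PER_ROW]))
    return rows


# Synthetic dump: even bank all white (3), padding, odd bank all cyan (1)
fake_dump = bytes([0xFF] * 8000 + [0x00] * 192 + [0x55] * 8000)
rows = deinterlace_cga(fake_dump)
print(len(rows), rows[0][:4], rows[1][:4])  # 200 [3, 3, 3, 3] [1, 1, 1, 1]
```

Using the bank offset instead of "skip 192 bytes after the last even row" gives the same result on these files, but would also cope with padding bytes that aren't zero.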

I've uploaded all the code to my GitHub, so, small as it is, better go there if you wish to see all the details. It is a simple Python script that reads bytes, operates on them and saves the data into a PNG (using PIL), but I want to keep it at hand in case I need to do bitwise operations, bit representations of integers and the like again in the future.

Other results

Considering that I had almost got the Abadia del Crimen title screen earlier, I decided to try a few more PIC files from other Opera Soft games I knew... and as long as they are CGA-based, it works perfectly:

Abadia del Crimen


Livingstone Supongo 2

Sol Negro

The future

I've been able to convert PIC files, but it seems game sprites and backgrounds don't live there. I've already done some initial peeking, and the OVL files (present in most DOS games from Opera Soft) include at least a COM executable inside (newer games like Sol Negro include a DOS EXE binary instead, with its identifiable MZ header), so each level runs independently as a separate binary... but COM files at least have no clear separation between code instructions and data, so it'll require more work.
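Telling both kinds of embedded binaries apart is easy at least for EXEs: DOS EXE files start with the ASCII bytes "MZ", while COM files have no header at all and start directly with code. A minimal Python sketch; note that finding "MZ" somewhere inside an OVL only hints at an embedded EXE, as those two bytes can also appear by chance in code or data.

```python
def is_dos_exe(data: bytes) -> bool:
    """DOS EXE files begin with the 'MZ' signature; COM files have no header."""
    return data[:2] == b"MZ"


def mz_candidates(data: bytes) -> list[int]:
    """Offsets of every 'MZ' occurrence, i.e. possible embedded EXE headers."""
    offsets = []
    position = data.find(b"MZ")
    while position != -1:
        offsets.append(position)
        position = data.find(b"MZ", position + 1)
    return offsets


print(is_dos_exe(b"MZ\x90\x00"))              # True
print(is_dos_exe(bytes([0xB8, 0x13, 0x00])))  # False: mov ax,0013 is plain code
print(mz_candidates(b"\x00\x00MZ\x00MZ"))     # [2, 5]
```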

This is also why the Python script is so specific to title screens. Until I figure out where and how the sprites are stored, it doesn't make sense to make the extractor more generic (and maybe I'll even duplicate the code and keep the title screen one intact).

Update: Added .PIC research (interesting but not critical) and corrected typo.
