Kartones Blog

Be the change you wanna see in this world

15 Years Blogging

Today, December 2nd, 2019, marks a special date: exactly 15 years ago I decided to start a blog to dump my development learnings and, from time to time, dull rants. The first blog post was about Jamagic, a game development framework I was using back then to prepare my master thesis project.

Master Thesis videogame prototype

That original blog post now lives here, but back then I was using Google's Blogger as the platform to host my content.

I had another website with a similar kind of posts (mostly tech-oriented news), but it was in Spanish, and I wanted both to practice writing in English and to be able to speak about any topic I wanted.

For the record, I'll list the different platforms and approximate dates, because on one hand it's been quite a challenge to migrate the data from one place to another, but on the other hand I've learned a lot about becoming more pragmatic and adopting leaner and simpler platforms, data formats and general approaches.

  • 2004: Blogger. Simple design; you could just add some simple JavaScript for visit counting and showcasing (the only "analytics" you wanted back then).
  • 2005: Custom .NET-powered blog platform (maybe DasBlog?). A friend created a blog community and invited me to move my blog there. I had to manually export and re-create the posts 😭.
  • 2007: Community Server. Decided to try building my own blog community for me and my friends (although a few external people joined), so once again I manually exported one by one and re-inserted all blog posts.
  • 2014: BlogEngine.NET. Software rots, and when you're using an outdated version of a closed-source application, even one that is extensible and whose internals I could explore with .NET Reflector to keep improving and advancing things, there comes a time to shake things up. Not without pain, I migrated to a multiple single-instances setup of BlogEngine, using BlogML and a Community Server -> BlogEngine migration script that existed back then. It was an important milestone in simplifying things, not only because of a more open data format, but also because I could switch from having to JOIN half a dozen SQL DB tables to just maintaining a physical file per post or page.
  • 2016: Pelican. I stopped working professionally with .NET in 2009, but kept using it for personal projects. By 2016 the only thing I ever wrote C# or ASP.NET for was the blog, so to simplify my life I decided to jump on the "static site generators" wagon and migrate the blog once again. This time it was dead easy with just a few simple tweaks, and as Markdown allows HTML, all old posts were just dumped as such (I'd rather write new ones in MD). Compilation added an extra step, but with some simple scripts I now do a "build and deploy" rather than an "edit and upload".


Quite a journey of fighting to keep my content always available. Call me crazy, but I hate when you try to search for something old and there are no search results. Google tends to de-rank and hide old stuff, DuckDuckGo started crawling the web later so it doesn't know everything, and most often companies, platforms and domains just disappear, so to me it pays off to keep your stuff somewhere you can ensure is available.

Maintaining the blog has also served the nice purpose of giving me a playground or sandbox in which to test and experiment with many web-related topics. From building custom plugins and extensions for the platform it was running on, to removing all kinds of tracking and almost all JavaScript from the page, to keeping things fast as a personal challenge, overall it is fun and refreshing to have some personal projects. And heck, I also feel I'm up to date enough in web topics to achieve things like:

Google PageSpeed Insights November 2019

Especially considering that nowadays I mostly do backend development.


To wrap up and stop wasting more time of any poor reader who made it this far, I'd like to be sincere: what I write here is usually and mostly irrelevant. Lately I also feel less inclined to blog, because I feel both that I'm not going to add anything worthy and that the web is nowadays full of too many experts with shallow blog posts, posts that we then tend to take as the real truth. I feel I should instead be focusing on reading more books rather than contributing to these quick doses of "information", so the real message of this post is more like... don't waste much time reading only blog posts; instead, to dig deep into a topic, research books and papers, and experiment.

Talk less, do more.


Steam Web API Introduction

One of my pet projects, Finished Games, is reaching a state in which it already serves decently as a catalog and tracker. Sure, I have a ton of ideas to add, more sources to get games from and many improvements, but the base system is working, so I can start to tackle other areas, like automations.

I can manually add games to the database. I can import them from "catalog sources" and, if a game already exists, match it with the existing title, update certain fields, etc. But I still need to manually mark the games I own, so if, as in this example, a platform like Steam can provide me with a list of which titles I've got, and maybe which ones I've completed (by checking certain achievements), it's way easier and nicer.

So, without further ado, here's a brief introduction to the Steam Web API endpoints I'm going to use soon to be able to sync user catalogs.

Setup and Documentation

You can register for an API key at https://steamcommunity.com/dev, and it is instantaneous, with no need to wait for a manual approval.

Once you have an API key, the official docs are at https://developer.valvesoftware.com/wiki/Steam_Web_API.

Basic endpoints

I just need three endpoints to grab user data relevant to my use case.

Obtaining the steamid from a vanityurl (a friendly name for the account), like "kartones". Not everybody has one set up, but I for example do, so better be prepared:

http://api.steampowered.com/ISteamUser/ResolveVanityURL/v0001/?key=YOUR-API-KEY&vanityurl=VANITY-URL
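
As a minimal sketch of how I plan to consume it (Python plus the requests library; the helper name and the API key placeholder are mine, and the response layout is as documented at the time of writing):

    import requests

    STEAM_API_KEY = "YOUR-API-KEY"  # placeholder, use your own key

    def resolve_vanity_url(vanity_url: str) -> str:
        """Returns the 64-bit steamid for a given vanity URL (friendly name)."""
        response = requests.get(
            "http://api.steampowered.com/ISteamUser/ResolveVanityURL/v0001/",
            params={"key": STEAM_API_KEY, "vanityurl": vanity_url},
        )
        response.raise_for_status()
        data = response.json()["response"]
        if data.get("success") != 1:  # 1 means the vanity URL was resolved
            raise ValueError(f"Could not resolve vanity URL '{vanity_url}'")
        return data["steamid"]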

Fetching the list of owned games of a given user, including the game name, which saves you an additional call to fetch game details (a call which also returns no name for some titles! 😵):

http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key=YOUR-API-KEY&steamid=76561197987342492&format=json&include_appinfo=true
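
A similar hedged sketch for this one (again Python + requests; fetch_owned_games is just an illustrative name):

    def fetch_owned_games(steam_id: str) -> list:
        """Returns the owned games, each entry including appid, name and playtimes."""
        response = requests.get(
            "http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/",
            params={
                "key": STEAM_API_KEY,
                "steamid": steam_id,
                "format": "json",
                "include_appinfo": "true",
            },
        )
        response.raise_for_status()
        # the games list can be absent (e.g. private profiles), hence the default
        return response.json()["response"].get("games", [])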

Retrieving achievements by game and user. It returns not only the unlock status but also the epoch timestamp of when each achievement was unlocked (useful for deltas):

http://api.steampowered.com/ISteamUserStats/GetPlayerAchievements/v0001/?appid=271590&key=YOUR-API-KEY&steamid=76561197987342492
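
And a sketch for the achievements call (the function name is mine; the response nests everything under "playerstats"):

    def fetch_achievements(steam_id: str, app_id: int) -> list:
        """Returns per-achievement dicts with 'achieved' (0/1) and 'unlocktime' (epoch)."""
        response = requests.get(
            "http://api.steampowered.com/ISteamUserStats/GetPlayerAchievements/v0001/",
            params={"key": STEAM_API_KEY, "steamid": steam_id, "appid": app_id},
        )
        response.raise_for_status()
        return response.json()["playerstats"].get("achievements", [])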

Additional Endpoint

If you want way more info about a game, from the description, release date or the developer name, to screenshots, platforms (Windows, Linux, Mac), genres and more, there is a store endpoint that works without authentication:

http://store.steampowered.com/api/appdetails/?appids=271590
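
This one needs no key; a minimal sketch of how it could be wrapped (note that the response is keyed by the appid as a string, and "success" can be false for e.g. delisted titles):

    def fetch_store_details(app_id: int) -> dict:
        """Returns the store metadata ('data' block) for a game, or {} if unavailable."""
        response = requests.get(
            "http://store.steampowered.com/api/appdetails/",
            params={"appids": app_id},
        )
        response.raise_for_status()
        entry = response.json()[str(app_id)]
        return entry["data"] if entry.get("success") else {}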

Rate Limits

api.steampowered.com calls are rate-limited to 100k per day according to the API terms of use.

store.steampowered.com is rate-limited against abuse. I read somewhere that it seems to be around 200 requests in a 5-minute window, so you should cache those call results.
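
Since that limit isn't officially documented, a cautious approach is a tiny in-memory cache plus throttling around the store calls; here's a sketch assuming the fetch_store_details helper from above and roughly one call every 1.5 seconds (~200 per 5 minutes):

    import time

    _STORE_CACHE: dict = {}
    _MIN_SECONDS_BETWEEN_STORE_CALLS = 1.5  # ~200 requests per 5 minutes
    _last_store_call = 0.0

    def cached_store_details(app_id: int) -> dict:
        """Caches appdetails responses and spaces out the non-cached calls."""
        global _last_store_call
        if app_id in _STORE_CACHE:
            return _STORE_CACHE[app_id]
        wait = _MIN_SECONDS_BETWEEN_STORE_CALLS - (time.time() - _last_store_call)
        if wait > 0:
            time.sleep(wait)
        _last_store_call = time.time()
        _STORE_CACHE[app_id] = fetch_store_details(app_id)
        return _STORE_CACHE[app_id]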


Book Review: Power-Up

Review

Power-Up book cover

Title: Power-Up: How Japanese Video Games Gave the World an Extra Life

Author: Chris Kohler

I had this 2005 book forgotten on a bookshelf and decided to read it during some vacations. At around 300 pages, it makes for an interesting and easy read if the topic of (mostly) retro Japanese videogames interests you.

From the evolution of arcades and consoles to videogames themselves, more than half of the book covers what we could call its core: how games evolved, the author's opinion on why they were so important and influential, and how they fit into 2003-2004. The remainder covers less common but very interesting topics: game music, translations, and the interactions and collaborations between American and Japanese game development studios, and of both with Nintendo. I especially liked the music and translation chapters, because they were areas I knew almost nothing about, and I was surprised by how relevant they are, both to Japanese people and, for example, to why we had those horrible translations of arcades and early console games.

If I had to criticise something about the book, the only two minor things I can come up with are the following:

  • The chapter about Akihabara felt too detailed to me. I don't want a retro shopping guide, so just an anecdotal comment or two would have been better.
  • The book has a certain bias towards focusing more on Nintendo than on SEGA, which back then was also very relevant. While I concur that Nintendo is special and that it almost "saved" the American video-console market after people burned out on Atari and the like, I missed more details about SEGA.

As mentioned, those are just minor things; it is a delightful read, and you can clearly feel how the author enjoys videogames and Japan's cultural differences (at least as applied to games), and in general did nice research for the book.


Four Horsemen of the Python Apocalypse

Four Horsemen of the Apocalypse

I think I've found the four horsemen of the Apocalypse of the Python world: a combo that, while it will cause pain and destruction at first, will afterwards leave a much better codebase, stricter but uniform and less prone to certain bugs.

Who are these raiders?

mypy: Not a newcomer to my life (see I & II). Each day I'm more convinced that any non-trivial Python project should embrace type hints as self-documentation and as a safety measure to reduce typing-related bugs.

flake8: The classic, so useful and almost always customized, though it loses some of its power when used alone. Still certainly useful; it just needs to be configured to adapt to black.

isort: Automatically formats your imports. By itself it supports certain settings, but it should also be configured to please black's rules.

black: The warmonger. Opinionated, radical, almost non-configurable, but PEP 8 compliant and with decent reasoning about each and every rule it applies when auto-formatting the files. It will probably make you scream in anger when it first modifies all files, even some you didn't know your project had, even Django migrations and settings files 🤣... But it is the ultimate tool to cut out nitpicking and stupid discussions at pull request reviews. Everyone will be able to focus on reviewing the code itself instead of how it looks.

pre-commit: isort and black are meant to run with either this tool or a similar one, instead of as a test (black even ignores stdout process piping). After some experiments, the truth is that it makes more sense to keep auto-formatters at a different level than test runners and linters, and as flake8 will also fail the pre-commit hook, I decided to move everything except mypy to pre-commit.


The Go programming language has, among other things, taken a great step by deciding that there is one official way to format your code, and by fixing the formatting automatically (instead of emitting warnings/errors).

I was reluctant to try black and isort because I was worried about the chaos they can cause. But again, reviewing code often means coding style discussions here and there, so, encouraged by a colleague, I decided to try them both at work (in a softer and more gradual way) and at home (going all in). Almost everybody will hate at least one or two changes black automatically performs, but it leaves no more room for discussion, as you can only configure the maximum line length. Period.

I ran black through my whole project, but otherwise these tools only format created and modified files, which is good for big codebases.


It takes some time to configure all of the linters and formatters until you're able to do a few sweeps and finally commit, so here are my configuration values:

Mypy runs as a linter test, but the other three are set up as pre-commit hooks inside .pre-commit-config.yaml.
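
The exact values evolve over time, so take the following as a rough sketch of the shape rather than my literal files (the revisions and line length here are assumptions, chosen to be black-compatible):

    # .pre-commit-config.yaml (sketch)
    repos:
      - repo: https://github.com/psf/black
        rev: 19.10b0
        hooks:
          - id: black
      - repo: https://github.com/pre-commit/mirrors-isort
        rev: v4.3.21
        hooks:
          - id: isort
      - repo: https://gitlab.com/pycqa/flake8
        rev: 3.7.9
        hooks:
          - id: flake8

    # setup.cfg (sketch)
    [flake8]
    # black's default line length
    max-line-length = 88

    [isort]
    line_length = 88
    multi_line_output = 3
    include_trailing_comma = True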


Bulk Queries in MySQL vs PostgreSQL

Lately I read a non-trivial amount of code diffs almost on a daily basis, so I'm learning a thing or two not only via the code itself, but also via the decisions taken and the "why"s behind those decisions.

A recent example that I asked about was the following: you notice there's a DB query that causes a MySQL deadlock timeout. The query operates over a potentially big list of items, and the engineer decided to split it into small-sized chunks (let's say 10 items per chunk). [1]

My knowledge of MySQL is pretty much average; I know the usual differences between MyISAM and InnoDB, a few differences with respect to PostgreSQL, and not much more. And I consider that I still know more about PostgreSQL than MySQL (although I haven't actively used PG since 2016). But in general, what I've often seen, learned and been told is to go for one bulk query instead of multiple individual small ones: you make fewer calls between processes and software pieces, do fewer data transformations, the query planner can be smarter as it knows "the full picture" of your intentions (e.g. operate with 1k items) and, who knows, maybe the rows you use have good data locality and are stored contiguously on disk or in memory, so they get loaded and saved faster. It is true that you should keep your transactions scoped to the smallest surface possible, but at the same time the cost of opening and closing N transactions is bigger than doing it a single time, so there are advantages in that regard too.

With that "general" SQL knowledge, I went and read a few articles about the topic, and asked the DB experts: "Unlike in other RDBMSes, is it better in MySQL to chunk big queries?" And the answer is yes. MySQL's query planner is simpler than PostgreSQL's by design, and as JOINs sometimes hurt, a way to get some extra performance is delegating the joining of data to the application layer, or transforming the JOIN(s) into IN(s). So, to avoid lock contention and potential deadlocks, it is good to split potentially large, locking queries into small blocks, as this way other queries can execute in between. [2]
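
As an illustration of the pattern (a minimal Python sketch with a DB-API style connection such as mysqlclient or PyMySQL; the items table, archived column and chunk size are made up for the example):

    CHUNK_SIZE = 10

    def mark_items_archived(connection, item_ids):
        """Updates rows in small chunks so other transactions can run in between."""
        for start in range(0, len(item_ids), CHUNK_SIZE):
            chunk = item_ids[start:start + CHUNK_SIZE]
            placeholders = ", ".join(["%s"] * len(chunk))
            with connection.cursor() as cursor:
                cursor.execute(
                    "UPDATE items SET archived = 1 WHERE id IN ({})".format(placeholders),
                    chunk,
                )
            # committing per chunk keeps each transaction (and its row locks) short
            connection.commit()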

I also learned that, when using row-level locking, InnoDB normally uses next-key locking, so for each record it also locks the gap before it (yes, it's the gap before, not after). [3]


This differentiation is very interesting because it affects your data access patterns. Even after minimizing transaction scope, ensuring you have the appropriate indexes in place, building the query properly, and other good practices, if you use MySQL transactions you need to take lock contention into account (more frequently than with other engines, not that you won't cause it with suboptimal queries anywhere else).

A curious fact is that this is the second time I've found MySQL to be noticeably different from other RDBMSes. Using Microsoft's SQL Server first, and then PostgreSQL, you are always encouraged to use stored routines (stored procedures and/or stored functions) because of the benefits they provide, one of them being higher performance. With MySQL even a database trigger hurts performance, and everybody avoids stored procedures because they perform worse than application logic making queries [4]. As for the why, I've had neither the time nor the will to investigate.

References:

[1]: Minimize MySQL Deadlocks with 3 Steps

[2]: What is faster, one big query or many small queries?

[3]: InnoDB Transaction Model and Locking

[4]: Why MySQL Stored Procedures, Functions and Triggers Are Bad For Performance

