Kartones Blog

Be the change you wanna see in this world

Course Review: American English Pronunciation (Udemy)

Another short 3-hour course I've finished recently 🤓 American English Pronunciation has a quite self-explanatory name, and delivers what it promises: you'll get a lot of advice on how to speak, from general short and long vowel sounds and consonant sounds to hard scenarios like diphthongs, "S and Z" or "CH and S", among others.

These small courses are starting to be quite valuable to me, because for the cost of one or two English lessons you get "focused lessons" on topics you might want to learn, improve or simply revisit. In this case it helped me practice pronunciation, and it repeated and exemplified the tricky cases.


How different open source storage systems replicate data

Listening to a podcast the other day I learned some interesting things about MongoDB (which I haven't used). I learned about causal consistency, which I had never heard of before. It was also curious to hear that, when asked about how to handle replication issues (the dreaded replication lag), the podcast guest (a product manager who works at MongoDB) suggested either using master pinning or, if they were using the latest MongoDB versions, relying on the session pinning that it automatically uses internally.

It was peculiar that a quite modern, non-relational database had the same suggested approaches to handling replication and consistency issues. Then, in the same talk, they mentioned that Mongo uses an operations log for replication, which sounded really, really familiar... so I did the following tiny research exercise of summarizing how a few open source storage systems replicate their data:

MongoDB

Stores operations in the oplog (operations log), which is used for replication.

MySQL

Stores changes in the binlog (binary log), and supports 3 replication formats: Statement-Based Replication (SBR), Row-Based Replication (RBR) and Mixed-Based Replication (MBR). All three of them use the aforementioned binlog.

PostgreSQL

Since 9.0, PostgreSQL supports performing a streaming replication of its WAL (Write-Ahead Log).

Redis

Keeps and emits a stream of commands to the replication nodes.

Cassandra

Has a commit log + in-memory memtables per node. Not having used the system I'm not totally sure, but it seems that when replicating data using the more advanced strategy (NetworkTopologyStrategy), the designated coordinator sends a write request, and when each node has it in its commit log & memtable, it replies back with an ACK.

Note that Cassandra is a distributed system, so each data item lives on only N nodes, where N is the replication factor (number of copies of each data item).
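
To make that write path a bit more tangible, here is a toy Python model of my understanding above. It is not Cassandra's real code or API: `Node`, `coordinator_write` and the way replicas are picked (a naive position derived from the key) are all made up for illustration.

```python
class Node:
    """Hypothetical node: a commit log plus an in-memory memtable."""

    def __init__(self, name):
        self.name = name
        self.commit_log = []  # append-only record of every write received
        self.memtable = {}    # in-memory view with the latest value per key

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        return True  # stands in for the ACK sent back to the coordinator


def coordinator_write(nodes, key, value, replication_factor):
    # Naive placement (an assumption, not Cassandra's partitioner):
    # N consecutive nodes starting at a position derived from the key.
    start = sum(key.encode()) % len(nodes)
    targets = [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]
    acks = sum(node.write(key, value) for node in targets)
    return targets, acks


nodes = [Node(f"node-{i}") for i in range(5)]
targets, acks = coordinator_write(nodes, "user:1", "Saski", replication_factor=3)
```

With 5 nodes and a replication factor of 3, the item ends up on exactly 3 nodes and the coordinator collects 3 ACKs; the other 2 nodes never see it.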

Others

I've left out other systems that are not meant for long-term storage, but even some of those have a similar design. For example, Kafka keeps a local command log (with the writes) to send to the partition replicas in order to achieve topic replication.

Conclusion

Different database systems, almost exactly the same solution: Keep a log of commands, propagate that log to replicas.

Or, as my friend Saski pointed out, capture all changes to an application state as a sequence of events [and replicate them] (a kind of event sourcing).
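
As a minimal sketch of that shared idea (not how any of the systems above actually implement it), the following Python keeps a log of commands on a primary and ships the missing entries to a replica. The names `Primary`, `Replica` and `replicate_to` are hypothetical, picked just for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical follower: applies log entries in order."""
    state: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already applied

    def apply(self, entries):
        for op, key, value in entries:
            if op == "set":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)
            self.applied += 1


@dataclass
class Primary:
    """Hypothetical leader: every write is appended to the log first."""
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def set(self, key, value):
        self.log.append(("set", key, value))
        self.state[key] = value

    def delete(self, key):
        self.log.append(("delete", key, None))
        self.state.pop(key, None)

    def replicate_to(self, replica):
        # Ship only the entries the replica has not applied yet.
        replica.apply(self.log[replica.applied:])


primary = Primary()
replica = Replica()
primary.set("a", 1)
primary.set("b", 2)
primary.replicate_to(replica)
primary.delete("a")            # the replica now lags one entry behind
lagging = dict(replica.state)  # still contains "a": replication lag
primary.replicate_to(replica)  # the replica catches up
```

Reading from the replica between the delete and the second `replicate_to` returns stale data, which is exactly the situation that master or session pinning tries to work around.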


Course Review: Building Your English Brain (Udemy)

My partner recently started to also use Udemy to do English courses. Initially re-doing the ones I've finished, she also wanted one or two new ones, so why not do them myself too? Building Your English Brain is the first of those.

This is a short 3-hour course with miscellaneous tips to train your brain into thinking directly in English. From watching movies and TV series and doing different exercises to TED talks or practising some writing, I personally didn't find any new idea that I wasn't already practising one way or another, but it makes for a good summary (plus three hours of listening :)


Book Review: Two Scoops of Django 1.11

Review

Two Scoops of Django 1.11 book cover

Title: Two Scoops of Django 1.11

Authors: Daniel Greenfeld, Audrey Roy Greenfeld

Daily work doesn't always allow me to dig as deep as I'd like into some of the tools and frameworks I use, so I decided to read a book about Django and learn a thing or two. This title has really good reviews and was recommended to me by a few colleagues, so it was a simple choice. I also took my time to read it because I was applying many of the concepts to one of my side projects (as practising is one of the best ways to learn).

The book is a modern equivalent of those old Assembler, C or Pascal books, with dozens of chapters and trying to cover so many things you could feel overwhelmed. And I say modern because one thing that it does improve a lot is how everything is explained. Instead of the old, hard to digest reference books, the authors create a fictional company (to sell ice cream) with the website using Django and evolve almost all of the topics covered by applying them to or exemplifying them using that company and products.

This is a big book, around 500 pages and 35 chapters, and while there are some drawings, most of it is either text or code (but fear not, as examples are small, concise and easily readable). I'll just list the most relevant topics/chapters so that you can grasp how much content there is:

  • Best practices and advice on how to set up Django projects
  • Settings
  • Models, Admin, core components
  • ORM
  • Function-Based Views and Class-Based Views
  • Forms
  • Templates, tags, filters, and optionally switching to Jinja2
  • Django REST Framework and REST APIs in general
  • Third-party packages (also non-Django-specific ones)
  • Testing
  • Async task queues
  • Security
  • Logging
  • Signals
  • Deployment (and a brief intro to Continuous Integration)
  • Debugging
  • Coding style guidelines

It is the first Django book I've read, but I consider it a must if you're past the official website tutorial (excellent but brief). I've definitely learned a lot 🤓


Self-Modifying code and avoiding conditionals

La Abadía del Crímen title screenshot

Between 1980 and 1990 Assembly was the most used language for everything, from videogames to nasty viruses or most of your everyday programs. After playing La Abadía del Crímen and reading and watching some technical details about it (by the way, there are two books about it in Spanish: I & II), I've seen it mentioned multiple times that the original Amstrad CPC version was technically amazing, but that it wasn't easy to port because of the self-modifying code it contains.

As patching in-memory code sounds heretical today, I've read a bit about how it was used, these scenarios being the most common ones:

  • Avoiding branching code, conditional checks, etcetera by modifying in-memory instructions to jump to a different location
  • Reusing memory structures with less code (e.g. make different characters use the same memory struct)
  • Hiding things like interrupt calls or certain strings/numbers/memory addresses, mostly either for viruses or for copy protection mechanisms

Now, back in the day this made sense: memory and CPU were so restricted that performing an if frequently could really hurt your game, and keeping properly scoped functions with different logic pieces could mean extra precious cycles spent on pushing and popping registers to and from the stack [1]. But nowadays nobody would even think about something as simple as manually changing the instruction pointer (except when trying to circumvent videogame protections, hacking videogame consoles and other shady areas). And even if you wanted to, Data Execution Prevention, memory page protection, and the tons of caches between the code and its real execution make it a really bad idea to even try for anything normal.

But another point is that there are more modern ways to achieve exactly what self-modifying code in videogames mostly tried to do (avoiding conditionals):

  • Bit masks/manipulation: Old but still very valid when performance is relevant. The caveat is that the code is not as readable and not everything can be expressed as bit masks...
  • Functional programming: This strictly is not removing conditionals, but you tend to reduce their usage when you think in pipelining functions and handling just input/output instead of keeping state all around.
  • Object orientation/duck typing: Different classes (or functions if the language allows) provide methods that share the same interface, and you invert where the conditional lives (although eliminating it or not depends on how you instantiate the object): instead of doing if X then A else B, you provide either an X-object that does A or a non-X-object that does B.
  • Function handlers: Almost the same as the previous point; you define a function handler, a C# delegate, you name it, and just swap it for another one when you wish to modify the current behaviour. Super-simple C# example I did long ago here.
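
To illustrate two of the points above (bit masks and duck typing), here is a small Python sketch. Everything in it is made up for the example: the `Status` flags, `is_targetable`, and the `Monk`/`Abbot` classes are hypothetical names, not taken from any real game.

```python
from enum import IntFlag


class Status(IntFlag):
    ALIVE = 1
    VISIBLE = 2
    HOSTILE = 4


# All the bits that must be set for an entity to be a valid target.
TARGETABLE = Status.ALIVE | Status.VISIBLE | Status.HOSTILE


def is_targetable(flags):
    # One mask comparison instead of three chained boolean checks.
    return (flags & TARGETABLE) == TARGETABLE


class Monk:
    def act(self):
        return "prays"


class Abbot:
    def act(self):
        return "patrols the abbey"


def tick(characters):
    # No "if monk ... else ..." here: each object brings its own behaviour.
    return [character.act() for character in characters]
```

For example, `is_targetable(Status.ALIVE | Status.HOSTILE)` is false because the VISIBLE bit is missing, and `tick([Monk(), Abbot()])` never needs to ask which kind of character it is dealing with. The conditional has not vanished completely; it has moved to wherever you decide which flags to set or which class to instantiate.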

I'd definitely go for function handlers if I were to build any non-trivial game today: it is clean and very friendly towards testing at both sides, as you can inject fake AIs, and test each one in isolation.
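
A minimal Python sketch of what I mean, assuming a hypothetical `Guard` entity whose behaviour is just a swappable function (this is not the C# example mentioned above, only an illustration of the same idea):

```python
def patrol(guard):
    return "walking the route"


def chase(guard):
    return "running after the player"


class Guard:
    def __init__(self, behaviour=patrol):
        self.behaviour = behaviour  # the current function handler

    def on_player_spotted(self):
        self.behaviour = chase  # swap the handler instead of branching later

    def on_player_lost(self):
        self.behaviour = patrol

    def update(self):
        # No conditional: just call whatever handler is currently set.
        return self.behaviour(self)


# Testing side: inject a fake AI and check that the guard delegates to it.
calls = []


def fake_ai(guard):
    calls.append(guard)
    return "fake"


test_guard = Guard(behaviour=fake_ai)
fake_result = test_guard.update()
```

Each handler (`patrol`, `chase`) can be tested on its own, and `Guard` can be tested with `fake_ai` without caring about any real behaviour.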

In the end you're just modifying a function pointer to have a different address, so it is really, really similar to adding or changing a JMP assembly instruction. Plus nobody will go crazy trying to debug logic that has been patched in-memory and no longer resembles the source code.

Which other ways of avoiding conditionals and/or organizing your code to avoid huge switches do you know?


[1] : Actually, those anti-object orientation "patterns" would come back with J2ME, where hardware constraints once again favoured a non-Java approach of having a single class with as much inline code and as many global variables, and as few methods, as possible.

