Kartones Blog

Be the change you wanna see in this world

Why the 10th Man Rule is relevant

World War Z is an amazing zombie book with a terrible and unique movie adaptation. As far as I know, it is the only zombie film not featuring a single drop of blood or a visible zombie bite; I guess they spent all the budget on Brad Pitt's salary and the marketing campaign. But there are lessons even in bad things, and this scenario is no exception. In one scene, a character describes the 10th Man Rule (also known as devil's advocate), which proposes:

[...] produce a range of explanations and assessments of events that avoid relying on a single concept [...].
If ten people are in a room, and nine agree on how to interpret and respond to a situation, the tenth man must disagree. His duty is to find the best possible argument for why the decision of the group is flawed.

Now, this rule may be real or fictional, but either way I like it and sometimes apply it. Usually in the context of everyone listing the pros and me focusing on the cons, but the opposite can also happen.

I think this rule is relevant in discussions because, in a perfect world, we would gather different experts and stakeholders and spend as much time as needed thinking and talking about approaches, choices and trade-offs for any proposal or technical requirements specification, but reality usually goes more along the lines of:

  • If the proposal comes from Product, it usually says little about trade-offs and difficulties (other than listing existing ones as justification for the proposal itself)
  • If the proposal or tech spec comes from Engineering, my general feeling is that, with all this "perversion of agile", we can now mostly say "I wanna do X" and just jump into building it, losing that phase of properly researching or simply giving some thought to how to achieve the goal; what I sometimes call "think before you code". I am not a fan of endless documentation, but writing a tech spec forces you to think about your action plan at least once [1].

Circling back to the main point, we usually attend a meeting or review a proposal that many times lacks depth. Now, ignorance is bliss, they say, and sometimes that is good, because if you don't know that X is impossible, you might find a way to actually do it [2]. But often those meetings are either an echo chamber or an environment where there's some aversion to voicing concerns (to "saying no"). I'm pretty sure everyone has at least once been in the situation of facing everyone else thinking "A is best" while we think "Are you all blind? B is clearly best!".

The purpose of this rule is not to be an excuse to become the grumpy person who always points out the flaws, but to enable discussions that challenge the main approach and avoid groupthink (which usually impedes individual thinking). Note that this does not mean you can be disrespectful or impolite, of course.

It is nice to think positively and generally seek the "how yes", but we must not forget to also think about the "why not" at some point.


[1]: I view it the same way as explaining a subject to somebody else helps you study it. Being forced to explain or detail something reinforces what you learn.

[2]: Wasn't there some quote along the lines of "Nobody told me it was impossible, so I did it"?


Let's talk about idempotency

Let's have a brief talk about idempotency applied to computer science.

Definitions

Wikipedia's definition of idempotence says a function is idempotent if the system state remains the same after one or several calls.

A Stripe blog post nicely summarizes that an idempotent API or service endpoint can be called any number of times while guaranteeing that side effects only occur once.

Idempotency then is a mechanism that helps us with:

  • Making retries safe: otherwise a retry can make things worse and cause a cascading effect.
  • Dealing with duplicates: generalizing the previous point to messaging and/or service interactions, it gets us most of the benefits of exactly-once message delivery in an at-least-once world.
  • Achieving a certain level of data integrity without relying on distributed transactions, two-phase locking and other mechanisms, which are more reliable but also incur performance penalties.

How

Definitions are fine, but they do not tell us how to achieve idempotency. The main pillar is the idempotency-key, a way to identify a change request so that we can detect repetitions/duplications. The key might be something really simple, like a UUID the caller generates and sets as a custom header (e.g. Idempotency-Key for Stripe calls). It could be a hash of the sent data. It could also be a few fields we decide are relevant.
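A small sketch of these three flavours of key (the Idempotency-Key header name matches Stripe's; the payload and field choices are just illustrative):

```python
import hashlib
import json
import uuid

# Option 1: caller-generated UUID, sent e.g. as an "Idempotency-Key" header
idempotency_key = str(uuid.uuid4())

# Option 2: hash of the sent data (payload is a made-up example);
# sort_keys makes the hash stable regardless of dict ordering
payload = {"entity_id": 42, "action": "activate"}
payload_key = hashlib.sha256(
    json.dumps(payload, sort_keys=True).encode()
).hexdigest()

# Option 3: a few fields we decide are relevant
fields_key = f"{payload['entity_id']}:{payload['action']}"
```

Option 2 has the nice property that two callers sending byte-identical data produce the same key, without any coordination between them.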

We'll focus on a non-trivial scenario: we have a service that can receive calls from multiple sources/clients and that internally keeps a FSM (Finite State Machine). We want to protect this service so that it processes a request to "transition from state A to B" only once. This isn't trivial: it is easy to protect against the same caller repeating a request (e.g. if the caller has implemented a retry system), but harder when you somehow need to detect that client X performed a request to transition from A to B, and then a request arrives from client Y with the same petition: transition from A to B.

We could start by defining our idempotency-key simply as <target_entity_id> + <new_state>, and that would work for simple scenarios, but what happens if our state graph allows multiple ways of reaching state B? Here comes my suggestion: notice that I didn't say transition to B but transition from A to B. If our idempotency-key is, for example, <target_entity_id> + <new_state> + <current_state>, we can now easily differentiate the transitions A -> B and D -> B without problems.
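A tiny sketch of such a composite key (the function name and separator format are just illustrative):

```python
def transition_key(target_entity_id, current_state, new_state):
    """Idempotency key that distinguishes A -> B from D -> B
    for the same target entity."""
    return f"{target_entity_id}:{current_state}->{new_state}"

# Same entity, same destination state, different origin: different keys
assert transition_key(42, "A", "B") != transition_key(42, "D", "B")
```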

And now, what do we do with this idempotency-key? We simply use it to keep track of recent calls:

  • If the key is not present in our idempotency storage (Redis or in-memory are common, but any storage you can imagine is fine), we perform the action and cache the output at our idempotency storage (pretty much like a cache).
  • If the key is present, we return the cached results and do not execute anything [1].

We shouldn't keep cached responses forever, right? This is why Redis or a similar in-memory cache is such a good fit: you just set a decent TTL for the idempotency items you store, and forget about them. The right value depends on the operation: for sensitive operations like a purchase I've set it in the past to one hour, but it could be extended for long-running processes or batch jobs, or kept very short (e.g. a few seconds for a delete operation).
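The check-then-cache flow plus the TTL can be sketched in a few lines. This is an in-process stand-in, not production code: a real deployment would typically use Redis (SET with an EX expiry) so all instances share the store, and all names here are illustrative:

```python
import time

class IdempotencyStore:
    """Minimal in-memory idempotency cache with per-item TTL."""

    def __init__(self):
        self._items = {}  # key -> (expires_at, cached_response)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if time.monotonic() > expires_at:
            del self._items[key]  # expired: behave like a miss
            return None
        return response

    def set(self, key, response, ttl_seconds):
        self._items[key] = (time.monotonic() + ttl_seconds, response)


def handle_request(store, key, action, ttl_seconds=3600):
    """Run `action` only if `key` wasn't seen recently; else return the cache."""
    cached = store.get(key)
    if cached is not None:
        return cached  # duplicate: do not execute anything
    response = action()
    store.set(key, response, ttl_seconds)
    return response
```

Calling `handle_request` twice with the same key executes the action once and serves the cached response the second time.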


There's one remaining subtlety: what do we do if our system is designed in such a way that there can be concurrent requests to the same service (trivial scenario: you have multiple instances of it)? What if we have a slow endpoint and we get a second request to transition from A to B while the first one is still executing?

Here it is true that idempotency fails a bit to fully help us, because it will only work flawlessly in one of the following cases:

  • You have a single service instance (or single point of entry for processing actions/requests).
  • You have an action queue, buffer, etc., so again, actions are processed sequentially.
  • You only care about repeated requests from the same caller (like Stripe's idempotency key implementation as a unique hash id).

If we want to support concurrent execution of idempotent requests, we probably need some request management mechanism to detect executing (yet incomplete) requests and apply some sort of strategy to them:

  • Wait until the original finishes?
  • Re-enqueue the request at the tail?
  • Return an HTTP 307 redirect or a 409 conflict?

We can incorporate this detection into the idempotency middleware/component: instead of just storing the response status and data, we also record whether the request is finished or ongoing (personal advice: if ongoing, set a small TTL). Alternatively, we can keep a separate request log (just tracking which requests have finished and which are ongoing). We could even implement most of the idempotency management at the NGINX level with some Lua scripting, although here I advise caution, because caching non-GETs is a dangerous path and you must be very careful discerning which HTTP headers to take into account as part of the idempotency "key".

Something along the lines of:

# not present
return None

# ongoing request
return {
    "status": "ongoing"
}

# finished request (non-error)
return {
    "status": "finished",
    "http_status": 200,
    "response": { ... }
}

# finished request (error)
return {
    "status" "finished",
    "http_status": 400,
    "response: None
}
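The decision logic over such a record could look like this hedged sketch (the statuses mirror the pseudocode above; the function name and the 409-on-conflict choice, one of the strategies listed earlier, are illustrative):

```python
def check_request(record):
    """Decide what to do given the stored idempotency record (or None)."""
    if record is None:
        return "execute"       # never seen: run the action and store the result
    if record["status"] == "ongoing":
        return "conflict_409"  # or wait / re-enqueue, depending on your strategy
    return "return_cached"     # finished: serve the stored response, run nothing
```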

Alternatives

Our previous scenario is one that, with some convention and agreement over the data sent, or via flexible configuration options, can be built, for example, as a Django REST Framework decorator.

A post from Particular (the makers of NServiceBus) includes a handy list of some alternative approaches:

  • Message de-duplication: explained above when done inside services. When talking about message-based communication, it means literally detecting repeated requests in a short time span and removing the duplicates.
  • Natural idempotency: coding your logic to be as idempotent as possible. This is always desirable and can be done for individual entities with some effort, but with complex services it is really hard to achieve in the upper layers.
  • Entities and messages with version information: if the data can carry some kind of version number, we can say "I want to update data from entity id XXX being at version 3"; then, if the backend detects that the currently stored version is no longer 3 (because something else updated it), it can fail the change request. This has the drawback of needing extra communication, as the change request emitter would need to query for the current data and try to apply the modifications again.
  • Side effect checks: a somewhat naive approach when talking about complex systems; being able to detect if the side effect is already present (if our service is already in state B, don't execute a transition from A to B) is something you ought to be doing already.
  • Partner state machines: having a single source that can issue a change request allows controlling execution (and narrowing it to exactly-once), but it also creates a single point of failure, and for existing complex systems it might not be so easy to achieve.
  • Accept uncertainty: embracing chaos is always an option, but one that usually doesn't end well 😆.
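The "version information" approach above is essentially optimistic concurrency control, which can be sketched like this (all names are illustrative):

```python
def apply_update(stored, update):
    """Optimistic concurrency sketch: reject the change when the stored
    version no longer matches the one the caller originally read."""
    if stored["version"] != update["expected_version"]:
        return False  # stale: the caller must re-read the data and retry
    stored.update(update["changes"])
    stored["version"] += 1  # bump so any other in-flight update now fails
    return True
```

A duplicate or concurrent request carrying the old version number fails the check instead of applying the change twice, which is exactly the protection we were building with idempotency keys, expressed through data versions.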

Wrapping up

Pure, raw idempotency is hard to achieve in a complex real-world system. If you can, go for the simplest approach: wrap your data change request in a transaction and roll back if idempotency must be honoured; you will leverage existing tools and have no unexpected side effects. As for me, the times I've implemented idempotency (at either API level or service endpoint level), a good idempotency key plus TTL-limited caching of the results has proven useful.

Notes

[1] Not re-executing the logic is an important remark. In a pure mathematical sense, where for example the number 1 is idempotent under multiplication because N*1 = N no matter how many times you multiply by it, we could execute the logic every time and "just make sure the system stays the same". Now imagine how quickly this can spiral into very complex logic. A simple example: ORM models usually keep modified_at timestamps, so if you want to be pure/strict, that field shouldn't update more than once when running an idempotent change request twice; thus, you would probably need transactions everywhere, which can be a huge performance penalty. And this is how we arrive at the alternative: "doing nothing". If we already know the output, the best way to not alter the system is to not touch it: just return the cached output and all is fine; we respect idempotency theory while keeping the system/data exactly the same.


Course Review: Master English: 100 Phrasal verbs for IELTS (Udemy)

Master English: 100 Phrasal verbs for IELTS, TOEFL, CAE, FCE is, apart from a very SEO-oriented title, another 4-hour Udemy English course that I took to focus on practising phrasal verbs. Using 10 topics and a conversation for each, you'll do some exercises and learn multiple usage examples of each of the hundred verbs.

Correctly done and pronounced, and easy to follow, although in the conversations the speed is noticeably fast (so if you already listen at 1.25x or 1.5x, they speak really quickly). While I don't think I learned anything new (I've completed other courses about phrasal verbs), it was good practice and a reminder exercise.


Course Review: Business English Course for ESL Students (Udemy)

Business English Course for ESL Students is one of those focused courses you might want to take if you need domain-specific vocabulary training. Mostly containing work-related words, sentences, phrasal verbs and idioms, its nearly 6 hours of videos are quite interesting, well done and practical.

Finance, retail, marketing, interviews, phone calls, meetings, being polite/correct at work (with your colleagues, etc.), even topics about computers, entrepreneurship and the stock market; everything is exactly what you would expect. If I had to find something to improve, maybe the medical field chapter is too specific, but it is still useful for everyday life, so not a real complaint.


Book Review: Stay Awhile and Listen: Book II

Review

Stay Awhile and Listen II book cover

Title: Stay Awhile and Listen: Book II: Heaven, Hell, and Secret Cow Levels

Author: David L. Craddock

Four years after the first book, we return to see how Blizzard North (ex-Condor) deals with the massive success of their action RPG, Diablo. From the outsourced Diablo: Hellfire expansion (a decision they wouldn't make again), to the full development of Diablo II and its expansion, Diablo II: Lord of Destruction (this time, fully built in-house), and the initial development of both Diablo III and a new space-themed action RPG (nicknamed "Starblo").

Where in the first book we read a story of titanic efforts and an unstoppable desire to create a great game, of how a small company was able to revolutionize the computer RPG genre with tons of hard work, this second tale has a grim overall tone. Yes, Diablo II was a success (I've poured way too many hours into it, and it was one of the first games I played in multiplayer) and was technically impressive, but reading this book one feels sad about how some bad management and egos (and maybe other factors) destroyed a promising company.

I haven't exactly measured it, but probably half of the contents relate to negative topics: fights between Blizzard North and Blizzard Entertainment, fights between the office's own employees and bosses, fights with the external company that created D1's Hellfire expansion, fights with Blizzard's parent companies... Here most of the struggles are human and social instead of technical; there are technical problems to solve, of course, but they don't feel as impactful as in the first book.

The insanely long crunch periods, the (not very healthy) rivalry between both Blizzard studios, the backstabbing among employees as the years passed by, and the sad closing of the North studio after the bosses played a bluff and lost to Vivendi, leaving the company and opening the doors for a restructuring and subsequent merging of some of the employees... It all adds up to a great loss of the magic formula that created both games. The legacy lives on, and even in the recent early trailers of Diablo IV we can see ideas and characters meant for the third title, but it's not the same team.

I like that the book tries to give as many points of view as possible on many of the topics. For example, when it talks about how the studio almost halted production for months after Diablo II shipped, you get to read the points of view of the three bosses, old employees, new employees, people working on the Diablo II expansion, people starting to work on concept art or the main story for the third part... and even, sometimes, some opinions from Blizzard Entertainment employees.

It is a very interesting read, full of details and insider info that any fan of the franchise will surely love to learn. It will be interesting to read, in a few years, the third and last part, fully focused on Diablo III after its development was restarted by Blizzard Entertainment.


Previous entries