How different opensource storage systems replicate data

Listening to a podcast the other day I learned some interesting things about MongoDB (which I haven't used). I learned about causal consistency, which I had never heard of before. It was curious to also heard that, when asked about how to handle replication issues (the dreaded (replication lag), the podcast guest (a product manager that works at MongoDB) suggested either using master pinning or, if they were using the latest MongoDB versions, then it automatically used session pinning inside to.

That a quite modern and non-relational database had the same suggested approaches to handling replication and consistency issues was peculiar. Then, on the same talk they mentioned that Mongo uses an Operations Log for replication, which sounded really really familiar... so I did the following tiny research exercise of summarizing a few opensource storage systems and how they replicate their data:

MongoDB

Stores operations at the Oplog, which is used for replication.

Mysql

Stores in the binlog, and supports 3 methods of replication: Statement Based Replication (SBR), Row Based Replication (RBR), Mixed Based Replication (MBR). Two of them use the aforementioned binlog.

PostgreSQL

Since 9.0, PostgreSQL supports performing a streaming replication of its WAL (Write-Ahead Log).

Redis

Keeps and emits a stream of commands to the replication nodes.

Cassandra

Has a commit log + in-memory memTables per node. Not having used the system I'm not totally sure, but it seems that when replicating data using the more advanced strategy (NetworkTopologyStrategy), the designed coordinator sends a write request, and when each node has it on the commit log & memTables, it replies back with an ACK.

Note that Cassandra is a distributed system, so each data item lives at N nodes only, where N is the replication factor (number of copies of each data item).

Others

I've left out other systems that are not meant for long term storage out, but even some of those have a similar design. For example, Kafka keeps a local command log (with the writes) to send to the partition replicas in order to achieve topic replication.

Conclusion

Different database systems, almost exactly the same solution: Keep a log of commands, propagate that log to replicas.

Or, as my friend Saski pointed out, capture all changes to an application state as a sequence of events [and replicate them] (a kind of event sourcing).

Posted by Kartones on 2019-09-24

Comments? Share via Twitter Share via Linkedin