Bulk Queries in MySQL vs PostgreSQL

I lately read a non-trivial amount of code diffs almost on a daily basis, so I'm learning a thing or two not only via the code itself, but also via the decisions taken and the "why"s of those decisions.

A recent example that I queried about was the following: You notice there's a DB query that causes a MySQL deadlock timeout. The Query operates over a potentially big list of items, and the engineer decided to split it into small sized chunks (let's say 10 items per chunk). [1]

My knowledge of MySQL is pretty much average; I know the usual differences between MyISAM and InnoDB, a few differences regarding PostgreSQL and not much more. And I consider I still know more about PostgreSQL than MySQL (although I haven't actively used PG since 2016). But in general what I've often seen, learned and have been told is to go for one bulk query instead of multiple individual small ones: You make less calls between processes and software pieces, less data transformations, the query planner can be smarter as knows "the full picture" of your intentions (e.g. operate with 1k items) and, who knows, maybe the rows you use have good data locality and are stored contiguously in disk or memory so they get loaded and saved faster. It is true you should keep your transactions scoped to the smallest surface possible, but at the same time the cost of opening and closing N transactions is bigger than doing it a single time, so there are advantages in that regard too.

With that "general" SQL knowledge, I went and read a few articles about the topic, and asked to the DB experts "Unlike other RDBMS, is it better in MySQL to chunk big queries?" And the answer is yes. MySQL's query planner is simpler than PostgreSQL's by design, and as JOINs sometimes hurt, a way to get some extra performance is delegating joining data to the application layer, or transforming the JOIN(s) into IN(s). So, to avoid lock contention and potential deadlocks, it is good to split into small blocks potentially large, locking queries, as this way other queries can execute in between. [2]

I also learned that, when using row-level locking, InnoDB normally uses next-key locking, so for each record it also locks the gap before it (yes, it's the gap before, not after). [3]


This differentiation is very interesting because it affects your data access patterns. Despite minimizing transaction scope, ensuring you have the appropriate indexes in place, tuning up the query to be properly built, and other good practices, if you use MySQL transactions you need to take into account lock contention (more frequently than with other engines, not that you won't cause them with suboptimal queries anywhere else).

A curious fact is that this is the second time that I find MySQL being noticeably different from other RDBMS. Using Microsoft's SQL Server first, and then PostgreSQL, you are always encouraged to use stored routines (stored procedures and/or stored functions) because of the benefits they provide, one of them being higher performance. With MySQL even a database trigger hurts performance, and everybody avoids stored procedures because they perform worse than application logic making queries [4]. As for the why, I haven't had time nor the will to investigate.

References:

[1]: Minimize MySQL Deadlocks with 3 Steps

[2]: What is faster, one big query or many small queries?

[3]: InnoDB Transaction Model and Locking

[4]: Why MySQL Stored Procedures, Functions and Triggers Are Bad For Performance

Posted by Kartones on 2019-11-12

Comments? Share via Twitter Share via Linkedin