-
Notifications
You must be signed in to change notification settings - Fork 82
Description
Context
Since MySQL server is generally hosted on a different machine, an important performance factor is the number of roundtrips you have to perform.
e.g. if you have a 1ms round trip time, and 3 queries to perform in a transaction, you'll have to do 5 full roundtrips for a total of 5ms. Sometimes there is no way around this because you need the result of the previous query to perform the next one, but sometimes you don't.
The classic technique for minimizing the cost of roundtrips, is to send multiple queries in a row without immediately waiting for the response, AKA pipelining
Existing Solution with Trilogy
Right now some people are essentially doing poor man's pipelining by enabling MULTI_STATEMENTS and concatenating their queries together. It works but is wonky and a bit unsafe, as MULTI_STATEMENTS is off by default for a good reason, it prevent various injection vulnerabilities.
Existing Solution with other DB clients
Mysql2
The mysql2 gem has some limited support for pipelining, which they refer to as async: https://github.com/brianmario/mysql2?tab=readme-ov-file#async
It's a bit wonky because they only offer to queue the query and then block on async_result, but I suppose it works.
PG
The pg gem has some support for a pipeline API since PostgreSQL 14 (2021), I haven't tried it, and it seems a bit wonky, but it's the same idea. You start a pipeline with #enter_pipeline_mode, queue some queries in that pipeline, and can flush the pipelined queries whenever you want so they are sent all at once.
Here's the spec for it in the gem: https://github.com/ged/ruby-pg/blob/d072b21852865ecb84e6345df11d68eed50702bb/spec/pg/connection_spec.rb#L2009-L2122
Other databases
While pipelining isn't very commonly used with RDBMS, it has been used extensively for years with redis-rb and redis-client.
redis-client only offer a simple block based API, and return all the results at once in an Array:
result_1, result_2 = client.pipelined do |pipeline|
pipeline.call("COMMAND 1")
pipeline.call("COMMAND 2")
endredis-rb offer some more convenience by having pipelined queries return a Future object as well.
future_1, future_2
result_1, result_2 = client.pipelined do |pipeline|
future_1 = pipeline.command1
future_2 = pipeline.command2
end
future_1.value == result_1
future_2.value == result_2Some older versions of redis-rb also had a "queue" API I deprecated:
client.queue(:command_1)
client.queue(:command_2)
result_1, result_2 = client.commitUse case
As mentioned in introduction, the use case for such API would be to optimized some specific cases, particularly around transactions. If your queries aren't in a transaction, you may simply want to perform the queries concurrently each in their own connection, but as soon as transactions are involved you can't do this anymore.
One specific place where I'd like to use this is in Active Record's transaction code. Right now Active Record's BEGIN queries are lazy, they are only send when the first query of the transaction is triggered.
If Trilogy had such pipelining API, we could send the BEGIN and the first query of the transaction in a pipeline, and save one roundtrip per transaction.
FYI @kirs
@matthewd @jhawthorn @composerinteralia @adrianna-chang-shopify @eileencodes any opinions / concerns about introducing such capability? I'm not set on any specific API design, but I'm happy to explore, etc.
