Tuesday, May 26, 2015

the myth of asynchronous JDBC

I keep seeing people (especially in the scala/typesafe world) posting about async jdbc libraries. STOP IT! Under the current APIs, async JDBC belongs in a realm with Unicorns, Tiger Squirrels, and 8' spiders. While you might be able to move the blocking operations and queue requests and keep your "main" worker threads from blocking, jdbc is synchronous. At some point, somewhere, there's going to be a thread blocking waiting for a response.

It's frustrating to see so many folks hyping this and muddying the waters. Unless you write your own client for a dbms and have a dbms that can multiplex calls over a single connection (or using some other strategy to enable this capability) db access is going to block. It's not impossible to make the calls completely async, but nobody's built it yet. Yes, I know ajdbc is taking a stab at this capability, but even IT uses a thread pool for the blocking calls (be default).

Someday we'll have async database access (it's not impossible...well it IS with the current JDBC specification), but no general purpose RDBMS has this right now. The primary problems with the hype/misdirection are that #1 inexperienced programmers don't understand that they've just moved the problem and will use the APIs and wonder why the system is so slow (oh I have 1000 db calls queued up waiting for my single db thread to process the work) and #2 It belies a serious misunderstanding of the difference between async JDBC (not possible per current spec) and async db access (totally possible/doable, but rare in the wild).

12 comments:

James Ward said...

You are right that async RDBMS drivers can just move a bottleneck from one place to another. But it also can be a better / more efficient way to deal with request spikes and backlogs that eventually catch up.

St├ęphane Landelle said...

You can also have a look at https://github.com/mauricio/postgresql-async (that's not a JDBC driver, of course).

kk said...

Scala has an async driver for PostgreSQL (linked above) and so does Java and Clojure. See https://github.com/alaisi/postgres-async-driver and https://github.com/alaisi/postgres.async .

The Java interface of course has nothing to do with jdbc, but none of these drivers block threads.

Mike Mainguy said...

@kk um, yes they do. They just move the blocking to another thread pool.

Erik Brandsberg said...

It is true that async JDBC doesn't exist except by wrapping it in another layer that does a threaded queue for processing. That said, there can be additional benefits for this other than just "getting the query out of the way" by queueing the requests, and reorganizing them into batch processing. Most drivers have logic that allows rewriting of the queries in a batch into a more optimized operation. With MySQL, this needs to be enabled (http://stackoverflow.com/questions/26307760/mysql-and-jdbc-with-rewritebatchedstatements-true) but can improve overall processing time by a factor of 10.

The company I am a co-founder of actually has recently introduced a product that allows a policy-driven approach to making insert/update/deletes asynchronous in this manner (with automatic batching) where appropriate (http://www.heimdalldata.com/). It allows insert/update/deletes that match a policy to immediately respond to the execute call, then in the background batch the queries for you, and send them to the database. It then logs the actual results for later analysis. If one of several calls in a batch fail with an exception, it will detect this, and attempt to re-execute the failed call, and log the results on this attempt.

This solution isn't ideal for all operations, but for many, such as logging user activity or page views, it can help reduce the page render time without having to re-code the application, and works with any JDBC data source.

Toby said...

I totally agree with your comments. When I initially read about Slick I thought it was awesome - a non blocking async database framework ... then I read about the dependency on a JDBC driver. Whilst I'm a big fan of Scala I think Typesafe massively over hype the whole async/reactive thing. If I write Thread.sleep(Long.MAX_VALUE) and wrap it in a Future can I now claim to have highly performant, async, non blocking code that follows the reactive manifesto etc etc??

Anonymous said...

kk is actually correct,

if you look into the source of the async libs kk linked you'll see they are using netty at the socket level and only include jdbc to handle parsing of the wire protocol

one thread handles many requests, if you're not familiar with nio async sockets its similar to a linux poll or a berkley sockets select(fd_set)

effectively the thread pairs input and output streams then tells the kernel to wake it up when read/write is ready




Christian Schmitt said...

Of course wrapping a Thread.sleep will cause your Thread to block.
But you should do blocking calls on a different Thread pool.
Reactive has more to do that connecting to a database will wait until the TCP roundtrip has the database connected. The Thread does exactly nothing while doing the TCP roundtrip, it just waits until it gets an ACK. Most networking / io stuff involves such things. Even File API's are mostly blocking. Except the Fact that Java has some libraries that won't block (even File API's) something like the java.nio and netty stuff. Mostly the Database also the Thread also waits for the response of the Database after some kind of SELECT. This could be written non blocking, too.
However the connection won't be speed up that much since connections on JDBC are mostly pooled so there is no gain in an async API since you just get an already connected TCP line from your pool.

Binh Thanh Nguyen said...

Thanks, nice post

Edwin Dalorzo said...

Well, I was reading the source of postgres-async-driver and I can see the IO is handled using Netty, which is advertised as a non-blocking IO framework. So I have not been able to find where postgres-async-driver is blocking. Are you positively sure it is blocking?

Erik Brandsberg said...

@Edwin

I haven't looked at this code, and even if it implements using Netty for a true Async call, it wouldn't qualify as JDBC. There are various non-JDBC based frameworks that can provide true async IO, but unless using the technique Heimdall uses (via policy vs. API), they can't provide full JDBC compatibility. I will look into this code for inclusion in Heimdall, as when the server-side code supports it, it would provide a boost for our system vs. emulating async operations.

Mike Mainguy said...

I'm saying JDBC is blocking...if you put a component in between that brokers a queue in front of the block call and it's response, then "that api is non-blocking", but it's still based on a blocking api (JDBC). Frankly I think it's a little bit of "slight of hand" because at the end of the day it's a rare application that either (#1 couldn't do the same thing with any other queuing technology or #2 actually needs this).

To your point, from an application perspective, the postgres is completely async...it just uses a pool of threads internally to manage the blocking JDBC calls.