Multithreading with SQLite

Multithreading with SQLite #

[ The code discussed here can be downloaded from this link ]

Recently a CodeProject article looked at SQLite and how it handles concurrency when used with C# and .NET framework. However, inquisitive minds asked a number of questions that, I think, merit a more complete answer:

  • is performance affected by the framework used (.NET) or is it just an SQLite limitation?
  • how many rows per second can you expect to get?

I had the tools needed to bring more light to this problem, so, here is a C++ multi-threaded program, without any large framework, using (almost) directly the SQLite C API.

SQLite Multi-threading Models #

For a long time, SQLite has eschewed multi-threading, its author, D. Richard Hipp, being known as not a fan of multi-threading. Here is an example highlighting his views back in 2005:

Actually, this seems like a good opportunity to repeat my oft-ignored advice to not use more than one thread in a single address space. If you need multiple threads, create multiple processes. This has nothing to do with SQLite - it is just good programming advice. I have worked on countless multi-threaded programs over the years, and I have yet to see a single one that didn’t contain subtle, hard to reproduce, and very hard to troubleshoot bugs related to threading issues.

I am constantly amazed at the prevailing idea (exemplified by Java) that software should be strongly typed and should not use goto statement or pointers - all in the name of reducing bugs - but that it is OK to use multiple threads within the same address space. Strong typing helps prevent only bugs that are trivially easy to locate and fix. The use of goto statements and pointers likewise results in deterministic problems that are easy to test for and relatively easy to track down and correct. But threading bugs tend to manifest themselves as timing-dependent glitches and lock-ups that are hardware and platform dependent, that never happen the same way twice, and that only appear for customers after deployment and never in a testing environment.

(I cannot say I totally disagree with him).

Over time, SQLite has grown to offer multi-threading support under two models. One is called “multithreading” and assumes that the same database connection object and objects derived from it, in particular SQL statements, are never used by two different threads. In other words, SQLite will take care to synchronize access to database file itself but all memory structures should be kept separate for separate threads.

The other model is called “serialized” and it allows for connections and queries to be shared between threads. The selection of the threading model is done using the sqlite3_config function. To select any of these modes, the SQLite code must be compiled to include the multi-threading code (by default it is).

The Test Code #

This is a very simple test. It runs an INSERT statement for a table with 4 columns: time, a counter, the ID of the thread that inserted the row and the number of retries needed to insert the row. If the database is busy, the thread gives up its execution slice, increments the number of retries and repeats until the statement succeeds:

  for (int i = 0; i < NSTATEMENTS; i++)
  {
    int retries = 0;
    // bind initial values to each column
    q.bind(1, i);
    q.bind(2, (int)id());
    q.bind(3, retries);

    //try to add row
    while ((result = q.step()) == SQLITE_BUSY)
    {
      //if DB is busy...
      Sleep(0);                 //... give up current time slice...
      q.reset().deactivate();   //... return statement to ready state....
      retries++;                //... increment number of retries...
      q.bind(3, retries);       //... and update the bound value
    }
    if (result != SQLITE_DONE)
    {
      cout << "Thread " << id() << " abnormal exit (" << result << ")\n";
      break;
    }
    q.reset();
  }

First Results #

I configured the program to run with 8 threads, each one inserting 1000 records. First let’s do it in “multithreaded” mode:

SQLITE version is: 3.43.2
Compile time threading supported: yes
Setting threading model to SQLITE_CONFIG_MULTITHREAD
Thread 16884 finished in 4.9089 sec
Thread 21588 finished in 7.6662 sec
Thread 2784 finished in 18.044 sec
Thread 29940 finished in 20.312 sec
Thread 9312 finished in 21.943 sec
Thread 19716 finished in 24.614 sec
Thread 19884 finished in 28.693 sec
Thread 30340 finished in 31.032 sec

Total rows: 8000
Total retries: 9972 max retries: 1624

Hmm, over 30 seconds is not great! Maybe it’s that large number of retries; let’s switch to “serialized” mode:

SQLITE version is: 3.43.2
Compile time threading supported: yes
Setting threading model to SQLITE_CONFIG_SERIALIZED
Thread 25152 finished in 5.0608 sec
Thread 18496 finished in 10.83 sec
Thread 17260 finished in 12.217 sec
Thread 29928 finished in 19.888 sec
Thread 29544 finished in 22.742 sec
Thread 13196 finished in 24.445 sec
Thread 15564 finished in 29.439 sec
Thread 11688 finished in 30.434 sec

Total rows: 8000
Total retries: 0 max retries: 0

We got rid of retries because now it is SQLite who take care of synchronization but not much of an overall time improvement.

The ACID is Killing You! #

Maybe SQLite is really bad at multi-threading. Let’s try with just one thread and see what’s going on:

SQLITE version is: 3.43.2
Compile time threading supported: yes
Setting threading model to SQLITE_CONFIG_SERIALIZED
Thread 18456 finished in 3.8462 sec

So, if one thread takes almost 4 seconds to do 1000 INSERT’s, 8 threads are going to take 4x8=32 seconds, which is close to our first results. So, it’s not the multi-threading that is creating the bottleneck, it must be something deeper. The answer is the ACID.

SQLite is a transactional database and you have probably heard about the ACID requirements for a transaction: it must be Atomic, Consistant, Isolated and Durable. By default SQLite treats each SQL statement as a transaction and, when an INSERT terminates, that data is truly written on the disk. Power can go down and still no data will be lost because all buffers have been flushed and everything is safe, but there is a price to pay for this safety: each INSERT takes quite a bit of time.

You can change the default approach of “each statement is a transaction” by wrapping multiple statements between BEGIN TRANSACTION and END TRANSACTION statements. Our loop will look something like:

  db.exec("BEGIN TRANSACTION");
  for (int i = 0; i < NSTATEMENTS; i++)
  {
    //... same stuff
  }
  db.exec ("END TRANSACTION");

Let’s do that and run again our test with 8 threads:

SQLITE version is: 3.43.2
Compile time threading supported: yes
Setting threading model to SQLITE_CONFIG_MULTITHREAD
Thread 8132 finished in 0.0050677 sec
Thread 5160 finished in 0.010653 sec
Thread 17808 finished in 0.016677 sec
Thread 21988 finished in 0.021636 sec
Thread 25136 finished in 0.027609 sec
Thread 23768 finished in 0.03277 sec
Thread 10568 finished in 0.038499 sec
Thread 28452 finished in 0.045179 sec

Total rows: 8000
Total retries: 188 max retries: 60

Yes! We went from 30+ seconds to 0.05 seconds. We found the real bottleneck.

Just one more wrinkle: if you wrap everything in one transaction you cannot use the same connection for all threads (the “serialized” mode). SQLite will not let you start a new transaction on the same connection while another transaction is in progress.

The House That Jack Built #

I said that I would use only the SQLite C API but that was not entirely true. I happened to have some tools to make my life easier.

First, there is sqlitepp, a thin C++ wrapper for the C API. You can find the documentation here. sqlitepp uses the error-code model for error handling, so that has to be included too.

Using C++ with threads is not such a walk in the park, as I have discussed in another article. All that threading infrastructure has to be included too.

Both components are part of my MLIB Library and latest code is on GitHub. Let’s not forget the most important stuff, SQLite itself. Whenever possible I prefer linking statically and with SQLite it’s not difficult at all.

What you get in the end is a smallish (by today’s standards), 1MB application with no external dependencies.