Skip to content

ACID transactions

March 12, 2021

The last few days, I’ve been down the rabbit hole of learning more about Postgres and databases in general. Here are a few things I have explored, along with useful resources. Let’s start with the basics.

ACID transactions

ACID stands for atomicity, consistency, isolation and durability. In my experience atomicity and isolation are the most interesting for modern application development, but let’s cover all of them.

Atomicity

Atomicity means that transactions consisting of multiple statements should succeed or fail completely. Martin Kleppman (YouTube) suggested the alternative term “Abortability”, that I think is a bit more clear.

Take an example in Postgres, where atomicity is typically achieved using BEGIN TRANSACTIONS and COMMIT TRANSACTIONS. Let’s say we have two tables, one with users and one with user related events. We want to guarantee that both rows are inserted, or none of them.

CREATE TABLE users(id INT UNIQUE);
CREATE TABLE users_events(type TEXT);

BEGIN; -- Begin transaction
INSERT INTO users_events VALUES ('user 1 created');
INSERT INTO users VALUES (1);
COMMIT; -- Commit both statements to the database

Now, if we get an error half way (maybe the database connection timed out, or in the case below there is already a user with the same ID). We use ROLLBACK to abort the transaction.

BEGIN; -- Begin transaction
INSERT INTO users_events VALUES ('user 1 created again');
INSERT INTO users VALUES (1); -- ERROR: duplicate key value violates unique constraint "users_id_key"
ROLLBACK; -- Abort the transaction and both statements

And so we have an atomic transaction!

Consistency

Consistency means that the database should be in a valid state, before and after a transaction. What does valid mean? It means that it is consistent to whatever rules have been defined. Good examples includes constraints for table rows, such as UNIQUE and NOT NULL in Postgres.

For example:

CREATE TABLE customers(
    ID bigint UNIQUE,
    email TEXT NOT NULL
);

Given the following table definition, our database would be inconsistent if we found the following database rows:

Violation the UNIQUE constraint

SELECT * FROM customers;

 id |      email
----+-----------------
  1 | hedvor@mail.com
  1 | john@mail.com

Violation of the NOT NULL constraint

SELECT * FROM customers;

 id |      email
----+-----------------
  1 | hedvor@mail.com
  2 |

In my experience, keeping your database consistent isn’t really a problem for the application developer, but for the database itself. The responsibility split is as follows

  1. Developer: Defining the rules (uniqueness, postgres triggers, …)
  2. Database: Ensuring database consistency, according to those rules (consistency).

Most modern databases handle consistency well. In Postgres, for example, uniqueness violations will be rejected with: [23505] ERROR: duplicate key value violates unique constraint.

Isolation

Transactions are often run in parallel. Isolation means that even though transactions are run in parallel, the end result should be same as if the transaction run in series.

Using full serialization in a database comes with a performance cost, and most transactions do not need full serialization. Instead it is common to be able to choose isolation level when performing a transaction.

Here are three common isolation levels:

This article by pgDash is very good at explaining these concepts.

Durability

The last concept in the quartet is durability. This means that once the database reports a transaction as successful, the transaction is guaranteed not to be lost, even in the case of a hardware fault, or a crash in the database. Guaranteeing durability is also a problem to be solved by the database, not by the application developer. One common approach by databases to solve this is to use write-head logging.

Conclusion

Resources