Audit tables in Postgres

A while back I watched this video (summary here) with Michael Bryzek (Flow.io, Gilt). He presented how they do Change Data Capture (CDC) with PostgreSQL and DynamoDB. One goal with CDC is that changes to databases rows and tables can be propagated to other parts of the system. In microservices systems, this can be used to build read-projections, or just keep local caches of data.

Postgres is one of my favorite databases to work with in production (and probably the one I most experience with), but I have little or experience working features such as PostgreSQL triggers and stored procedures. In this blog post, I explore how to use triggers to create a simple audit table in Postgres.

Why do you need an audit table?

Customer support might need the information to answer customer support requests. They might need to answer questions such as why their data is in the current state, and when it was changed and by whom. was updated.
You (the engineer) might want to answer the same questions when debugging 😉
You can use the audit table to build more complex functionality, such as propagating changes to other parts of the system.

Let’s get started!

The tables

Let us say that the user profiles are stored in the profile table.

CREATE TABLE profile
(
    id         INT GENERATED ALWAYS AS IDENTITY,
    name       VARCHAR(40) NOT NULL,
    email      VARCHAR(40) NOT NULL,
    PRIMARY KEY (id)
);

We also set up the profile_audits.

CREATE TABLE profile_audits
(
    id         INT GENERATED ALWAYS AS IDENTITY,
    operation  VARCHAR(40)  NOT NULL,
    row        jsonb        NOT NULL,
    changed_on TIMESTAMP(6) NOT NULL
);

*Note: Our profile_audits can contain any columns we want. I chose a single row column to store the full changed row because 1) it is easy 2) we don’t have to worry about schema changes to the profile table.*

Linking the two tables

The goal is to be able to track changes to the profile table, and store them in profile_audit. For this we need

A postgres trigger that listen to changes to the profile table. From the documentation:

CREATE TRIGGER creates a new trigger. The trigger will be associated with the specified table or view and will execute the specified function function_name when certain events occur.

We also need a postgres function to called by the trigger.

First, let’s define the function:

CREATE OR REPLACE FUNCTION log_profile_changes()
    RETURNS TRIGGER
    LANGUAGE PLPGSQL
AS
$$
BEGIN
    RAISE EXCEPTION 'not implemented :-)'
END;
$$

The function will just raise an exception, for now.

Next we set up our trigger. The following makes sure log_profile_changes is called on INSERTS and UPDATES to the profile table.

CREATE TRIGGER profile_changes
    BEFORE INSERT OR UPDATE
    ON profile
    FOR EACH ROW
EXECUTE PROCEDURE log_profile_changes();

We can try our trigger by doing an insert to the profile table.

INSERT INTO profile (name, email) VALUES ('Magnus', 'magnus@gmail.com');

This results in:

[P0001] ERROR: not implemented :-) Where: PL/pgSQL function log_profile_changes() line 3 at RAISE

Success! This is the exception not implemented :-) from our function.

Now we know that the basic setup works as expected, we can update our function to make inserts to the audit table. We have access to several variables inside the trigger. We are interested in NEW and TG_OP. Complete list of variables here.

NEW - Data type RECORD; variable holding the new database row for INSERT/UPDATE operations in row-level triggers. This variable is NULL in statement-level triggers and for DELETE operations.
TG_OP - Data type text; a string of INSERT, UPDATE, DELETE, or TRUNCATE telling for which operation the trigger was fired.

The new function:

CREATE OR REPLACE FUNCTION log_profile_changes()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
BEGIN

    IF (TG_OP = 'INSERT') OR (TG_OP = 'UPDATE') THEN
        INSERT INTO profile_audits(operation, row, changed_on)
        VALUES (TG_OP, to_jsonb(NEW), now());
    END IF;

    RETURN NEW;
END ;
$$

Testing everything

We can test our trigger with an insert and an update.

INSERT INTO profile (name, email) VALUES ('Magnus', 'magnus@gmail.com');
UPDATE profile SET email = 'magnus2@gmail.com' WHERE name = 'Magnus';

Result 1

SELECT * FROM profile

id	name	email
4	Magnus	magnus2@gmail.com

Result 2

SELECT * FROM profile_audits

id	operation	row	changed_on
1	INSERT	{“id”: 4, “name”: “Magnus”, “email”: “magnus@gmail.com”}	2022-02-28 07:39:36.313046
2	UPDATE	{“id”: 4, “name”: “Magnus”, “email”: “magnus2@gmail.com”}	2022-02-28 07:40:07.678714

Looks good!

Conclusion

Triggers are a simple way of creating audit tables when using PostgreSQL. There are a few pros and cons worth mentioning.

✅ Pros

Makes troubleshooting easy, since all changes are tracked
Uses built-in postgres functionality - meaning we don’t have to reimplement the same logic in the application code.

❌ Cons

Moving application logic to the database CAN quite quickly become very messy. See some good comments on stored procedures here. If an engineering organization wants to use these approaches, care should be take to discuss the pros and cons and common pitfalls.
Increased insert latency, and storage because profile_audits also needs to be populated.
Increased disk usage and costs

Possible next steps

In an event based system, the profile_audits can be used as a building block to periodically send out change events, like ProfileCreated, ProfileUpdated, ProfileDeleted.
We could also add more audit information, what caused the change to happen and whom. It would be interesting to distinguish whether an administrator or the user themselves updated it, for example. In an event based system, tracking the event id which caused the change might also be interesting.

Resources

Design Microservice Architectures the Right Way - Michael Bryzek
Summary of the video above
datacater.io: PostgreSQL Change Data Capture (CDC)
Built-in audit function in Postgres