How to configure PostgreSQL for integration testing


When running integration tests, performance and reliability are critical. In this article, I’ll explain how to set up PostgreSQL for testing and discuss some common pitfalls to avoid.

Isolation is the primary goal

Before we get into the details, let’s define our goals:

Isolation – We want to ensure that every test is run in an isolated environment. At a minimum, this means each test should have its own database. This ensures that the tests don’t interfere with each other and you can run the tests in parallel without any issues.

Performance – We want to make sure that the PostgreSQL setup for testing is fast. Slow solutions can make running tests in CI/CD prohibitively expensive. The solution we came up with had to allow us to perform testing without introducing too much additional overhead.

The rest of this article will focus on what we tried, what worked, and what didn’t work.

What doesn’t work

Using transactions

The first method we tried was to use transactions. We start a transaction at the beginning of each test and roll it back at the end.

The basic idea is shown in the following example:

test('calculates total basket value', async () => {
  await pool.transaction(async (tx) => {
    await tx.query(sql.unsafe`
      INSERT INTO basket (product_id, quantity)
      VALUES (1, 2)
    `);

    const total = await getBasketTotal(tx);

    expect(total).toBe(20);

    // Throwing here (or having the test harness do it) makes Slonik
    // roll the transaction back, so the inserted rows never persist.
  });
});

The transactional approach works well in simple cases, such as testing a single function, but it quickly becomes a problem when testing integrations between multiple components. Because of connection pooling, nested transactions, and other factors, the work required to make transaction handling behave correctly means we can no longer replicate the application’s true behavior, and that does not give us the confidence we need.

We also want to avoid mixing testing methods for the sake of consistency: although transactions are sufficient for some tests, we want a single approach that works across all tests.

Using SQLite

Another approach we tried was SQLite. SQLite is an embedded database that can run entirely in memory, which makes it fast and easy to set up. Like the transactional approach, SQLite works well for simple cases, but it quickly becomes a problem once code paths depend on PostgreSQL-specific features. In our case, we were unable to use SQLite for testing because we rely on various PostgreSQL extensions, PL/pgSQL functions, and other PostgreSQL-specific features.
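
As a hypothetical illustration, consider a PL/pgSQL helper like the one below (the basket_total function and basket_item table are made up for this example). PL/pgSQL is PostgreSQL-only, so any test that exercises such a code path cannot run against SQLite:

-- Hypothetical PL/pgSQL function; SQLite has no equivalent procedural language.
CREATE FUNCTION basket_total(p_basket_id integer)
RETURNS numeric
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN (
    SELECT coalesce(sum(quantity * price), 0)
    FROM basket_item
    WHERE basket_id = p_basket_id
  );
END;
$$;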

pglite provides PostgreSQL packaged as a WASM module that can be used in Node.js. It might be a good option, although we haven’t tried it yet. Regardless, pglite’s current lack of support for extensions would be a blocker for us.

Using pg_tmp

Another approach we tried was pg_tmp. pg_tmp is a tool that creates a temporary PostgreSQL instance for each test. In theory, pg_tmp is a good solution: it gives you complete test isolation. In practice, it is much slower than we can tolerate. With pg_tmp, starting and populating a database takes a few seconds, and this overhead adds up quickly when running thousands of tests. Suppose you have 1,000 tests and each test takes 1 second to run. Adding 2 seconds of overhead to create a new database for each one adds 2,000 seconds (about 33 minutes) to the run.
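
For reference, the pg_tmp workflow looks roughly like this (a sketch based on the ephemeralpg documentation; the -t flag requests a TCP port):

$ uri=$(pg_tmp -t)          # start a temporary PostgreSQL instance
$ psql "$uri" -c 'SELECT 1' # run queries against it; the instance is cleaned up automatically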

If you prefer this approach, you might also be able to use Docker containers; depending on the situation, they may even be faster than pg_tmp. integresql is a project I came across in an HN post. It looks like a good option, reducing the overhead of creating a new database to about 500 milliseconds, and it has a pooling mechanism that reduces the overhead further. We decided not to continue down this path because we were satisfied with the level of isolation we were getting from template databases.

What works

After trying various approaches, we settled on a combination of two: template databases and a memory disk. This approach allows us to isolate each test at the database level without introducing too much additional overhead or complexity.

Template Databases

A template database is a database that serves as a template for creating new databases. A database created from a template starts out with the same schema (and data) as the template. Creating a new database from a template takes two steps:

  1. Mark the source database as a template:
     ALTER DATABASE <database_name> is_template = true;
  2. Create a new database from the template:
     CREATE DATABASE <new_database_name> TEMPLATE <template_database_name>;
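
For example, after loading the schema into contra_template (the template name used later in this article), marking it as a template looks like this:

postgres=# ALTER DATABASE contra_template is_template = true;
ALTER DATABASE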

The main advantage of using a template database is that you don’t have to manage multiple PostgreSQL instances. Each test gets its own copy of the template database, so every test runs in an isolated environment.

However, using the template database alone is not fast enough for our use case. Creating a new database from the template database still takes too long to run thousands of tests:

postgres=# CREATE DATABASE foo TEMPLATE contra;
CREATE DATABASE
Time: 1999.758 ms (00:02.000)

This is where mounting a memory disk comes in.

Another limitation of template databases to be aware of is that no other sessions can be connected to the source database while it is being copied. CREATE DATABASE fails if any other connection exists when it starts, and new connections to the source database are blocked during the copy operation. This is easy to work around by serializing database creation behind a mutex (in the code below, a connection pool capped at one connection serves this purpose), but it is something to be aware of.
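
If a stray session is holding the source database open, one common remedy is to terminate the other backends before copying (a sketch; contra_template is the template name used in this article):

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'contra_template'
  AND pid <> pg_backend_pid();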

Mounting a memory disk

The final piece of the puzzle is mounting the memory disk. By mounting an in-memory disk and creating a template database on the in-memory disk, we can significantly reduce the overhead of creating a new database.

I’ll discuss how to mount a memory disk in the next section, but first, let’s see what a difference it makes.

postgres=# CREATE DATABASE bar TEMPLATE contra;
CREATE DATABASE
Time: 87.168 ms

This is a significant improvement and makes this approach feasible for our use case.

Needless to say, this approach is not without drawbacks. The data is stored in memory, which means it is not persistent: if the database crashes or the server restarts, the data is lost. For running tests, however, this is not an issue, because each new database is recreated from the template database anyway.

Using Docker to mount a memory disk

The approach we took was to use a Docker container with an in-memory disk. Here are the setup steps:

$ docker run \
  -p 5435:5432 \
  --tmpfs /var/lib/pg/data \
  -e PGDATA=/var/lib/pg/data \
  -e POSTGRES_PASSWORD=postgres \
  --name contra-database \
  --rm \
  postgres:14

In the above command, we create a Docker container with a tmpfs (in-memory) mount at /var/lib/pg/data and set the PGDATA environment variable to the same path, so that PostgreSQL stores its data on the memory disk. The end result is that the underlying data lives in memory, which greatly reduces the overhead of creating a new database.
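
You can verify the mount from inside the container; df should report a tmpfs filesystem at the data directory:

$ docker exec contra-database df -h /var/lib/pg/data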

Managing test databases

The basic idea is to create a template database before running the tests, and then create a new database from the template database for each test. Here’s a simplified version of how to manage a test database:

import { randomUUID } from 'node:crypto';
import {
  createPool,
  parseDsn,
  sql,
  stringifyDsn,
} from 'slonik';

// Generates a unique, identifier-safe suffix for test database names.
const uid = () => randomUUID().replaceAll('-', '');
 
type TestDatabase = {
  destroy: () => Promise<void>;
  getConnectionUri: () => string;
  name: () => string;
};
 
const createTestDatabasePooler = async (connectionUrl: string) => {
  const pool = await createPool(connectionUrl, {
    connectionTimeout: 5_000,
    // This ensures that we don't attempt to create multiple databases in parallel.
    maximumPoolSize: 1,
  });
 
  const createTestDatabase = async (
    templateName: string,
  ): Promise<TestDatabase> => {
    const database = 'test_' + uid();
 
    await pool.query(sql.typeAlias('void')`
      CREATE DATABASE ${sql.identifier([database])}
      TEMPLATE ${sql.identifier([templateName])}
    `);
 
    return {
      destroy: async () => {
        await pool.query(sql.typeAlias('void')`
          DROP DATABASE ${sql.identifier([database])}
        `);
      },
      getConnectionUri: () => {
        return stringifyDsn({
          ...parseDsn(connectionUrl),
          databaseName: database,
          password: 'unsafe_password',
          username: 'contra_api',
        });
      },
      name: () => {
        return database;
      },
    };
  };
 
  return () => {
    return createTestDatabase('contra_template');
  };
};
 
// Assumes the Docker instance started above (port 5435, password "postgres");
// adjust the DSN for your environment.
const getTestDatabase = await createTestDatabasePooler(
  'postgresql://postgres:postgres@localhost:5435/postgres',
);

This way, you can use getTestDatabase to create a new database for each test and the destroy method to clean up the database after the test has run.
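
Putting it all together, a test might look something like this (a sketch; getBasketTotal and the basket table are carried over from the earlier example):

test('calculates total basket value', async () => {
  const testDatabase = await getTestDatabase();

  // Connect to the freshly created, fully isolated database.
  const pool = await createPool(testDatabase.getConnectionUri());

  try {
    await pool.query(sql.unsafe`
      INSERT INTO basket (product_id, quantity)
      VALUES (1, 2)
    `);

    expect(await getBasketTotal(pool)).toBe(20);
  } finally {
    await pool.end();

    // Drop the test database once the test is done.
    await testDatabase.destroy();
  }
});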

In conclusion

This setup allows us to run thousands of tests in parallel across multiple shards without issues. Creating a new database adds little overhead, and isolation happens at the database level. We are pleased with the performance and reliability this setup provides.
