
Engineering · 12 May 2026

What's actually inside your Postgres data directory

PGDATA looks like a folder full of cryptic short names. It is in fact the entire database. A tour, directory by directory, and the reasons a backup company ended up reading every one of them.

When you install Postgres, run initdb, and start it up, all of your data lives inside one directory. The Postgres documentation calls it “the data directory.” The environment variable points at it: PGDATA. On Debian it’s at /var/lib/postgresql/16/main. On macOS via Homebrew it’s /opt/homebrew/var/postgresql@16. Inside a Docker image it’s wherever the volume is mounted. Wherever you find it, the contents look about the same: a couple of dozen short, cryptic directory names, a handful of config files, and a little file called PG_VERSION that’s there to keep you honest.

For most application developers, this folder is a black box. You connect on port 5432, you write SQL, you trust that the bytes are safe somewhere. That’s how it should be most of the time.

But when you start running backup software, the abstraction leaks. The first time it leaks for you, it usually hurts. A restore “succeeds” but the application can’t connect because a role is missing. An index is reachable but rows come back in a different order, and a query plan flips. A tablespace symlink points at a directory that doesn’t exist. The dump finished; the restore finished; the database is wrong.

We hit all of these in the first six months of building dbcrate. We got out of them by reading the data directory file by file until each name made sense. This is a tour through what we found, what each folder is for, and the lesson each one taught us.

A first look

Here’s what ls $PGDATA looks like on a fresh Postgres 16 install:

$PGDATA/
├── PG_VERSION
├── base/
├── current_logfiles
├── global/
├── pg_commit_ts/
├── pg_dynshmem/
├── pg_hba.conf
├── pg_ident.conf
├── pg_logical/
├── pg_multixact/
├── pg_notify/
├── pg_replslot/
├── pg_serial/
├── pg_snapshots/
├── pg_stat/
├── pg_stat_tmp/
├── pg_subtrans/
├── pg_tblspc/
├── pg_twophase/
├── pg_wal/
├── pg_xact/
├── postgresql.auto.conf
├── postgresql.conf
├── postmaster.opts
└── postmaster.pid

Twenty-five entries. Some are huge (base/ and pg_wal/); some are routinely empty (pg_snapshots/, or pg_replslot/ on a database with no replicas). All of them matter, in the sense that if you delete the wrong one, your database is broken.

Postgres calls this whole thing a “cluster.” Not in the distributed-systems sense. In Postgres-speak, a cluster is one running server with one set of system tables, one set of users, and as many databases as you’ve created inside it. One PGDATA is one cluster.

Now let’s open the boxes.

base/: where your tables actually live

If your Postgres holds 200 GB of data, somewhere around 198 GB of it is in base/. This is the directory that holds your tables, your indexes, your TOAST chunks, and almost everything you’d think of as “the data.”

Inside base/ is one subdirectory per database, named not by your database’s name but by its OID:

base/
├── 1/        (template1)
├── 4/        (template0)
├── 5/        (postgres)
├── 16384/    (your databases)
└── 16386/

The big numbers are your actual databases. To find out which is which:

SELECT oid, datname FROM pg_database;

Inside each database directory is another set of numbered files. Each is a “relation”: a table, an index, a sequence, or a materialized view. The number is the relation’s relfilenode, which starts out equal to the relation’s OID but can diverge after a rewriting operation like TRUNCATE or VACUUM FULL; pg_class maps both to names. Each relation is split across files of up to 1 GB:

base/16384/
├── 16432       (heap, segment 0)
├── 16432.1     (heap, segment 1)
├── 16432.2
├── 16432.3
├── 16432.4
├── 16432_fsm   (free space map)
└── 16432_vm    (visibility map)

The .N suffix is the segment number. Each file is a sequence of 8 KB pages. The _fsm is the free space map; the _vm is the visibility map. Both are derived; both can be rebuilt; both should still be backed up if you care about not paying to rebuild them.
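That layout is mechanical enough to compute. A sketch of the path arithmetic, assuming the compile-time defaults of 8 KB blocks and 1 GB segments (the OIDs are the made-up ones from the listing above):

```python
# Which file (relative to PGDATA) holds a given 8 KB block of a relation?
# Assumes the compile-time defaults: 8 KB blocks, 1 GB segment files.
BLOCK_SIZE = 8192
SEGMENT_SIZE = 1 << 30                           # 1 GB
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072 blocks per segment

def segment_path(db_oid, filenode, block):
    segno = block // BLOCKS_PER_SEGMENT
    suffix = "" if segno == 0 else f".{segno}"
    return f"base/{db_oid}/{filenode}{suffix}"

print(segment_path(16384, 16432, 0))       # base/16384/16432  (segment 0)
print(segment_path(16384, 16432, 200000))  # base/16384/16432.1
```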

The lesson base/ taught us is that logical backups never see any of this. pg_dump walks the catalogs and emits SQL. It does not copy a single byte of base/. The bytes get reconstructed on restore from COPY data (or INSERT statements if you ask for --inserts; or, with --format=custom, from a Postgres-flavored binary format that’s still logical, not physical). That’s a feature most of the time. It also means your backup is at the mercy of pg_dump understanding everything in your schema: custom types, extensions, PL/Python functions nobody has touched in two years.

In practice: dump and restore into a clean Postgres of the same major version once a quarter, before you have to. The dumps that worked last year may not work this year.

global/: the things shared across all databases

global/ is the smaller, weirder cousin of base/. It holds the catalog tables that aren’t per-database: pg_database itself, pg_authid (the table behind pg_roles, where every user and role lives), pg_tablespace, and a few others. It also holds the cluster control file, pg_control, which we’ll come back to.

The directory is small, often a few megabytes. The information in it is critical. Roles live here. Delete global/ and your databases still exist on disk, but Postgres has no idea who’s allowed to log into them.

pg_dump doesn’t dump global/. There’s a separate tool, pg_dumpall, that does. So if you depend on pg_dump for backups and you restore into a fresh cluster, you’ll discover that none of your application’s roles exist. The first time you discover this is when the application can’t log in.

This was the second-most common failure mode in our early customer restores. The fix is to dump roles separately:

pg_dumpall --roles-only > roles.sql
pg_dump --format=custom mydb > mydb.dump

Then on restore:

psql -f roles.sql
pg_restore -d mydb mydb.dump

We do both as part of every backup now. If you’re rolling your own backup script, do the same. Roles are not optional infrastructure.

pg_wal/: the part that scares people

The Write-Ahead Log is, depending on your role, the most beautiful or the most terrifying thing about Postgres. Beautiful, because it’s the mechanism that makes COMMIT mean what it says. Terrifying, because if you fill the disk with WAL files, your database stops accepting writes, and if you delete the wrong WAL files, you can’t recover.

pg_wal/ (called pg_xlog/ before Postgres 10) holds 16 MB segment files with names like 000000010000000000000037. The 24-character name encodes the timeline, the log file number, and the segment number. Postgres flushes them to disk on every commit, modulo synchronous_commit settings. Once a checkpoint has flushed all the dirty pages in shared buffers down to base/, older WAL files can be recycled or removed.
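The name encoding can be unpacked by hand. A sketch, assuming the default 16 MB wal_segment_size:

```python
# Unpack a WAL segment file name: 8 hex digits of timeline, 8 of "log" number,
# 8 of segment-within-log. Assumes the default 16 MB wal_segment_size.
WAL_SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 0x1_0000_0000 // WAL_SEG_SIZE  # 256 segments per 4 GB of WAL

def decode_wal_name(name):
    tli = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    segno = log * SEGS_PER_XLOGID + seg
    start = segno * WAL_SEG_SIZE
    lsn = f"{start >> 32:X}/{start & 0xFFFFFFFF:X}"  # Postgres's X/X notation
    return tli, segno, lsn

print(decode_wal_name("000000010000000000000037"))  # (1, 55, '0/37000000')
```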

Two settings control how much WAL accumulates: max_wal_size (a soft cap that triggers a checkpoint) and wal_keep_size (a hard floor that retains WAL even past a checkpoint, for replicas). If you have a replica that’s fallen behind, or a replication slot that no consumer is reading, WAL grows without bound and fills your disk. This is the single most common way that healthy-looking Postgres servers go down at 3 a.m.
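How much WAL a lagging replica or an idle slot is pinning is just the byte distance between two LSNs. Postgres computes this server-side as pg_wal_lsn_diff(); the arithmetic behind it looks like this (the LSN values are illustrative):

```python
# Distance in bytes between two LSNs in Postgres's "X/X" text form
# (what pg_wal_lsn_diff() computes server-side). Values are illustrative.
def lsn_to_bytes(lsn):
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_diff(a, b):
    """Bytes of WAL between LSN b and the later LSN a."""
    return lsn_to_bytes(a) - lsn_to_bytes(b)

# A slot whose restart_lsn is this far behind is pinning ~2.7 GB of WAL.
print(lsn_diff("1/A234B0C8", "1/00000000"))  # 2721362120
```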

For backups, pg_wal/ is what makes point-in-time recovery possible. A base backup plus the stream of WAL files from that point lets you restore to any moment. Without WAL, you have only the base backup, which is the database state as of when the base backup ended.

The agent we ship today doesn’t touch WAL. It runs pg_dump, which is logical and timestamped. Point-in-time recovery is on the roadmap for v5.

One thing worth knowing: pg_wal/archive_status/ is bookkeeping for archive_mode. If files pile up there with .ready suffixes and never become .done, your archive command is failing silently. Set up an alert.

pg_xact/ and pg_multixact/: the commit log

Every transaction in Postgres has a transaction ID (XID), and every XID has a state: in progress, committed, aborted, or sub-committed. That state lives in pg_xact/, which used to be called pg_clog/ (commit log) before Postgres 10. The directory holds tiny files (256 KB each), each containing the state of about a million transactions.
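Those numbers fall out of the format: two status bits per transaction, 8 KB pages, 32 pages per segment file. A sketch of which file records a given XID (segment file naming as on Postgres 16):

```python
# Which pg_xact segment file holds the commit status of a given XID?
# Two status bits per transaction, 8 KB pages, 32 pages per segment file:
# that's the 256 KB files and ~one million transactions per file described
# above. Segment files are named by segment number in hex (Postgres 16).
XACTS_PER_BYTE = 4                       # 2 status bits each
XACTS_PER_PAGE = 8192 * XACTS_PER_BYTE   # 32768
XACTS_PER_SEGMENT = XACTS_PER_PAGE * 32  # 1048576

def clog_file_for_xid(xid):
    return f"pg_xact/{xid // XACTS_PER_SEGMENT:04X}"

print(clog_file_for_xid(1_000_000))  # pg_xact/0000
print(clog_file_for_xid(5_000_000))  # pg_xact/0004
```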

When you read a row, Postgres needs to know whether the transaction that wrote it committed. The visibility check goes to pg_xact/. Without it, the database has no idea what’s real and what’s a half-finished write.

pg_multixact/ holds the same kind of data for transactions sharing locks on a row. Most workloads don’t generate much of it; high-contention SELECT FOR SHARE patterns can.

Both directories are vacuumed and truncated as old transactions age out. Long-running transactions hold the cleanup back, and pg_xact/ (along with a number of other things) grows. That’s part of why a forgotten BEGIN in a psql session can wreck a database over a weekend.

The lesson, even though we don’t take physical backups today: if you copy base/ while the database is running, you have to copy pg_xact/ from a coordinated point in time too. That’s part of what pg_basebackup handles. A naive cp -r of PGDATA on a live server will eventually restore a state where base/ references XIDs that pg_xact/ doesn’t know about, and the database will refuse to start.

pg_tblspc/: tables on other filesystems

Tablespaces let you put specific tables on specific filesystems: fast NVMe for hot data, cheaper rotating disk for cold, two tablespaces, and each table assigned to whichever you want.

pg_tblspc/ doesn’t actually hold any data. It holds symlinks, one per tablespace, named by OID:

pg_tblspc/
├── 16500 -> /mnt/fast/postgres
└── 16501 -> /mnt/slow/postgres

This is a beautiful feature when it works and a nightmare when it doesn’t. pg_dump knows about tablespaces; on restore, the target server has to have the same tablespace paths available, or you pre-create the tablespaces at different paths before restoring, or you use pg_restore --no-tablespaces and accept that everything goes into the default.

The first customer who showed us a database with a tablespace was the customer who taught us this. Their dump restored into our verifier, the verifier didn’t have /mnt/fast/postgres, and pg_restore failed loudly. We added tablespace remapping the same week.
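Checking those symlinks before a restore is cheap. Here’s a sketch of that kind of check, rebuilt from the stdlib against a throwaway fixture (the helper name is ours; in real use you’d point it at $PGDATA):

```python
import os, tempfile

def broken_tablespace_links(pgdata):
    """(slot, target) pairs for pg_tblspc symlinks whose target is missing."""
    tblspc = os.path.join(pgdata, "pg_tblspc")
    broken = []
    for entry in sorted(os.listdir(tblspc)):
        link = os.path.join(tblspc, entry)
        # os.path.exists() follows the symlink, so a dangling link reads False
        if os.path.islink(link) and not os.path.exists(link):
            broken.append((entry, os.readlink(link)))
    return broken

# Demo against a throwaway fixture; real use: broken_tablespace_links($PGDATA).
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "pg_tblspc"))
os.symlink("/mnt/fast/postgres", os.path.join(demo, "pg_tblspc", "16500"))
print(broken_tablespace_links(demo))  # [('16500', '/mnt/fast/postgres')]
                                      # unless that path exists on this host
```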

In practice: if you use tablespaces, document what they’re for. If you don’t, don’t start. The default tablespace is fine for almost every workload, and the operational complexity rarely pays for itself.

pg_logical/ and pg_replslot/: the replication machinery

Logical replication uses the WAL stream, decodes it into row changes, and ships them to a subscriber. The state lives in two places. pg_logical/ holds metadata about logical decoding: snapshot files, mapping files, origin progress. pg_replslot/ holds one subdirectory per replication slot.

If you’ve ever set up logical replication and then deleted the subscriber without dropping the slot first, you’ll have learned what we learned: the slot pins WAL on the publisher, indefinitely, until you either drop the slot or the disk fills up. The disk filling up usually wins.

Physical backups capture slot state; dumps don’t. After a logical restore, any logical replication you had set up has to be rebuilt on the restored side. We should probably record whether replication slots exist on a database we’re backing up and warn the operator; adding that to the agent is a four-line patch we haven’t got to yet.

The small directories

These are quick to catalogue, and you can mostly ignore them in day-to-day operations. Knowing what they are saves a moment of panic next time you ls $PGDATA:

pg_commit_ts/   commit timestamps, populated only if track_commit_timestamp is on
pg_dynshmem/    backing files for dynamic shared memory on some platforms
pg_notify/      the LISTEN/NOTIFY queue
pg_serial/      bookkeeping for serializable transactions
pg_snapshots/   exported snapshots (pg_export_snapshot())
pg_stat/        statistics files, written out at clean shutdown
pg_stat_tmp/    scratch space for statistics (mostly idle since Postgres 15)
pg_subtrans/    parent links for subtransactions (savepoints)
pg_twophase/    state files for prepared transactions (two-phase commit)

The pattern: every Postgres feature that needs persistent on-disk state gets its own directory. Most are empty most of the time. The ones that matter, you’ve already met.

The configuration files

postgresql.conf is the main settings file. pg_hba.conf is the host-based authentication file (who can connect from where, and how). pg_ident.conf maps OS usernames to Postgres usernames for the few authentication methods that need it.

Two surprises about these files. First, postgresql.auto.conf exists and overrides postgresql.conf. It’s where ALTER SYSTEM SET writes. If you’ve been editing postgresql.conf by hand and wondering why your changes don’t take effect, check whether someone has been running ALTER SYSTEM on the same key.

Second, pg_hba.conf is read top to bottom and the first matching line wins. A strict reject rule at the bottom does nothing if a permissive line above matches first. We’ve watched a customer lock themselves out by getting this exactly wrong.
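The first-match rule is easy to model. A toy illustration, not a parser for the real file format (rules here are just (database, user, method) triples with "all" as a wildcard; real lines also match on connection type and client address):

```python
# Toy model of pg_hba.conf's first-match rule. Rules are (database, user,
# method) triples with "all" as a wildcard; real lines also match on
# connection type and client address, which we omit here.
def auth_method(rules, database, user):
    for db, usr, method in rules:
        if db in ("all", database) and usr in ("all", user):
            return method  # first match wins; later lines never run
    return "implicit reject"

rules = [
    ("all", "all", "trust"),        # permissive line first...
    ("all", "intruder", "reject"),  # ...so this reject is dead code
]
print(auth_method(rules, "mydb", "intruder"))  # trust
```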

Logical backups don’t dump these files. They are settings, not data. If you’ve spent hours tuning postgresql.conf and you want those settings to survive a restore into a fresh host, copy the file separately.

In practice: keep postgresql.conf and pg_hba.conf in version control alongside your application code. Treat them as code, because they are.

pg_control and the cluster’s identity

Inside global/ is a small file called pg_control. It holds the cluster’s identity and a bunch of state Postgres needs at startup: the system identifier, the catalog version, the WAL level, the latest checkpoint location. If pg_control is missing or corrupted, the cluster will not start.

pg_controldata prints it in human-readable form. Run it once on a live Postgres and read the output:

$ pg_controldata $PGDATA
pg_control version number:            1300
Catalog version number:               202307071
Database system identifier:           7387012345678901234
Database cluster state:               in production
Latest checkpoint location:           1/A234B0C8
...
Database block size:                  8192
Bytes per WAL segment:                16777216
Data page checksum version:           1

Two lines in that output are load-bearing for backups. The “Database system identifier” is unique to this cluster; if you back up cluster A and try to apply WAL from cluster B, the identifiers won’t match and recovery refuses to proceed. The “Data page checksum version” tells you whether checksums were enabled at initdb time. If they were, every page on disk has a checksum that gets verified on read. This is one of the cheapest things you can do for your data integrity, and turning it on after the fact means taking the cluster offline while pg_checksums rewrites every page. If you’re spinning up a new cluster, run initdb --data-checksums. Future you will thank present you.
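If you record cluster identity alongside each backup, pg_controldata’s output is regular enough to scrape. A minimal sketch against an excerpt of the sample above (field labels as printed by Postgres 16):

```python
# Scrape pg_controldata output into a dict. Field labels as printed by
# Postgres 16; the sample is an excerpt of the output shown above.
def parse_controldata(text):
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

sample = """\
Database system identifier:           7387012345678901234
Database cluster state:               in production
Data page checksum version:           1
"""
info = parse_controldata(sample)
print(info["Database system identifier"])   # 7387012345678901234
print(info["Data page checksum version"])   # 1
```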

Why we read all of this, even though we ship logical backups

dbcrate ships logical backups today. We pipe pg_dump --format=custom through zstd and age encryption straight to object storage. We don’t read most of PGDATA directly. So why have we spent so much time learning it?

Three reasons.

First, every restore failure we’ve debugged has come back to something that wasn’t in the dump. Roles in global/. Locales in pg_control. Extensions whose underlying shared libraries weren’t installed at the same version. Tablespaces pointing at directories the verifier didn’t have. The dump captured the application’s view of the database. The data directory was the source of truth about everything else.

Second, our customers ask us what to do when something on their host is misbehaving. Disk filling up: look at pg_wal/ and pg_replslot/. Backups slow on a pre-15 install: check whether pg_stat_tmp/ is on a slow disk. Worried about corruption: enable data_checksums next time. We can’t help if we don’t know.

Third, physical backups and point-in-time recovery are on the roadmap for v5. The day we ship them, every assumption we make about the data directory will be load-bearing for restores that happen at the worst moments in our customers’ lives. Better to over-prepare now.

What you can do this evening

If you’ve never opened your own PGDATA, here’s a short list. None of these write anything.

  1. Find the directory: psql -c "SHOW data_directory;".
  2. Run du -sh $PGDATA/* (as the postgres user, or with sudo). Note which directories are large. base/ and pg_wal/ will dominate. Anything else unexpectedly large is worth investigating.
  3. Run pg_controldata $PGDATA. Note the system identifier, the WAL level, and whether data page checksums are on.
  4. Look at pg_tblspc/. If it’s empty, you don’t have to worry about tablespaces on restore. If it has symlinks, follow them and check the targets exist.
  5. Run ls -la $PGDATA/pg_replslot/. If there are subdirectories, you have replication slots; check the consumers are still there.
  6. Read pg_hba.conf top to bottom. Out loud, if you can.

That’s an hour, and you’ll come out of it knowing things about your own database that you didn’t know this morning.

Closing

The data directory isn’t magic. It’s a couple of dozen folders, well-named once you know what they’re for, holding the state of a system that’s been refined for thirty years. Backup software that doesn’t look inside is backup software that gets surprised. Operators who don’t look inside get surprised at 3 a.m. when something has filled up.

We built dbcrate because we wanted backups that don’t get surprised. That meant reading the directory until we knew what every name was for. We’ve shared what we found here in case it saves you the same archaeology.

If you operate a Postgres in production and you want backups that are honest about what they are (verified, encrypted on the host, restored once a week into a clean cluster of the same major version), we’re building it. There’s a beta. Drop us a line at [email protected] and we’ll get you set up.

Stuck? Open an issue on our GitHub or send us mail. We answer.

— 30 —