When Partitioning Becomes Necessary
I spent the better part of a year watching our main transactions table grow from 50 million rows to over 400 million. For a while, indexes held things together. But once we crossed roughly 200 million rows, the cracks started showing: nightly settlement queries that used to finish in seconds were taking minutes, VACUUM operations were blocking writes during peak hours, and our index bloat was eating disk space faster than we could provision it.
There's no magic number, but in my experience these are the signals that it's time to partition:
- Table size exceeds 50–100 GB or surpasses 100–200 million rows
- VACUUM and ANALYZE take longer than your maintenance window
- Index scans slow down despite proper indexing, because the B-tree depth keeps growing
- You have a natural time-based or categorical access pattern (most payment queries do)
Tip: Don't partition prematurely. If your table is under 10 GB and queries are fast, partitioning adds complexity without meaningful benefit. Measure first.
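A quick way to check where you stand before deciding; a minimal sketch, assuming your table is named transactions (reltuples is an estimate maintained by VACUUM/ANALYZE, not an exact count):

```sql
-- Total on-disk size (table + indexes + TOAST) and approximate row count
SELECT pg_size_pretty(pg_total_relation_size('transactions')) AS total_size,
       reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'transactions';
```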
How Partitioning Works
PostgreSQL's declarative partitioning (available since version 10, significantly improved in 12+) lets you split a single logical table into smaller physical tables. Queries against the parent table are automatically routed to the relevant partitions. The planner skips partitions that can't contain matching rows — a mechanism called partition pruning.
[Diagram: the transactions table split into monthly partitions of roughly 38M, 35M, 40M, 12M, and 42M rows]
Instead of scanning 400 million rows, a query for March 2026 settlements only touches the ~40 million rows in that single partition. The difference is dramatic.
Choosing a Partitioning Strategy
PostgreSQL supports three partitioning methods. For payment data, the choice depends on your query patterns.
| Strategy | Best For | Pros | Cons |
|---|---|---|---|
| Range | Time-series data (created_at, settlement_date) | Natural fit for payment timelines; easy to drop old data; excellent pruning on date queries | Uneven partition sizes if volume fluctuates seasonally |
| List | Categorical data (status, currency, region) | Good for queries that always filter by a specific category; simple to reason about | Doesn't scale well if categories grow; unbalanced partition sizes |
| Hash | Even distribution when no natural range exists | Guarantees uniform partition sizes; works for merchant_id or transaction_id | No partition pruning on range queries; can't easily drop a "time window" of data |
For most payment systems, range partitioning on created_at is the right default. Your settlement reports, reconciliation jobs, and compliance queries almost always filter by date range. If you also need to shard by merchant, consider a composite approach: range-partition by month, then sub-partition by merchant hash.
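If you do go composite, each monthly range partition becomes itself a partitioned table, hashed by merchant. A hedged sketch under illustrative names (not the schema used below):

```sql
-- Parent: range-partitioned by month
CREATE TABLE txns (
    id BIGINT NOT NULL,
    merchant_id UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

-- The January partition is itself hash-partitioned by merchant
CREATE TABLE txns_2026_01 PARTITION OF txns
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01')
    PARTITION BY HASH (merchant_id);

CREATE TABLE txns_2026_01_h0 PARTITION OF txns_2026_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...repeat for REMAINDER 1 through 3
```

Rows can only land once every hash sub-partition for a month exists, which is part of the operational overhead mentioned above.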
Setting It Up
Here's the DDL I used for our transactions table. The parent table defines the schema and partition key; each child table covers one month.
CREATE TABLE transactions (
id BIGINT GENERATED ALWAYS AS IDENTITY,
merchant_id UUID NOT NULL,
amount NUMERIC(15,2) NOT NULL,
currency CHAR(3) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
payment_method VARCHAR(30),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
settled_at TIMESTAMPTZ,
metadata JSONB
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE transactions_2026_01 PARTITION OF transactions
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE transactions_2026_02 PARTITION OF transactions
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE transactions_2026_03 PARTITION OF transactions
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
CREATE TABLE transactions_2026_04 PARTITION OF transactions
FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
-- Indexes are created on each partition automatically
CREATE INDEX ON transactions (created_at);
CREATE INDEX ON transactions (merchant_id, created_at);
CREATE INDEX ON transactions (status) WHERE status IN ('pending', 'processing');
Important: The partition key (created_at) must be included in any unique constraint or primary key. This is a PostgreSQL requirement. You can't have a global unique index on id alone — it needs to be (id, created_at).
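In practice that means declaring the key explicitly; a minimal sketch, assuming you want id to stay unique within the table:

```sql
-- Rejected by PostgreSQL: a unique index on id alone can't be
-- enforced across partitions
-- ALTER TABLE transactions ADD PRIMARY KEY (id);

-- Accepted: the partition key is part of the constraint
ALTER TABLE transactions ADD PRIMARY KEY (id, created_at);
```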
Partition Pruning in Action
When you query with a filter on the partition key, PostgreSQL skips irrelevant partitions entirely. You can verify this with EXPLAIN:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, amount, status
FROM transactions
WHERE created_at >= '2026-03-01'
AND created_at < '2026-04-01'
AND merchant_id = 'a1b2c3d4-...'
AND status = 'settled';
-- Output shows:
-- -> Index Scan on transactions_2026_03
-- Rows Removed by Filter: 0
-- Buffers: shared hit=42
-- Only the March partition was scanned.
Without partitioning, that same query would scan the full index across all 400 million rows. With partitioning, it touches only the March partition's index — roughly 40 million rows.
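Pruning only works when the filter compares the partition key directly. Wrapping created_at in an expression hides it from the planner; a hedged illustration:

```sql
-- Prunes to the March partition: direct range comparison
SELECT count(*) FROM transactions
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01';

-- Scans every partition: the planner can't prune through the expression
SELECT count(*) FROM transactions
WHERE date_trunc('month', created_at) = '2026-03-01';
```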
Automating Maintenance with pg_partman
Manually creating partitions every month is a recipe for a 2 AM incident when the next month's partition doesn't exist and inserts start failing. That's where pg_partman comes in. It automates partition creation and optionally handles retention.
-- Install the extension
CREATE EXTENSION pg_partman;
-- Configure automatic partition management
SELECT partman.create_parent(
p_parent_table => 'public.transactions',
p_control => 'created_at',
p_type => 'native',
p_interval => 'monthly',
p_premake => 3 -- create 3 future partitions ahead
);
-- Schedule maintenance (run via pg_cron or external scheduler)
-- This creates upcoming partitions and optionally drops old ones
SELECT partman.run_maintenance('public.transactions');
I run run_maintenance daily via pg_cron. It checks whether the next three months of partitions exist and creates any that are missing. For retention, you can configure pg_partman to automatically detach or drop partitions older than a specified threshold:
-- Keep 24 months of data, detach older partitions
UPDATE partman.part_config
SET retention = '24 months',
retention_keep_table = true -- detach but don't drop
WHERE parent_table = 'public.transactions';
Warning: Setting retention_keep_table = false will permanently delete data. In payment systems, regulatory requirements often mandate keeping transaction records for 5–7 years. Detach old partitions and move them to cold storage instead of dropping them.
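One way to do that cold-storage move manually; the partition name and archive steps here are illustrative:

```sql
-- Detach the partition so it becomes a standalone table
-- (CONCURRENTLY avoids a long lock; PostgreSQL 14+)
ALTER TABLE transactions DETACH PARTITION transactions_2024_01 CONCURRENTLY;

-- Archive it outside the database, e.g.:
--   pg_dump -t transactions_2024_01 mydb > transactions_2024_01.sql
-- Only then, if your retention policy allows it:
DROP TABLE transactions_2024_01;
```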
Migrating an Existing Table
This is the part nobody enjoys. You can't just ALTER TABLE an existing unpartitioned table into a partitioned one. The migration requires creating a new partitioned table and moving data over. Here's the approach I've used in production with minimal downtime:
- Create the new partitioned table with the same schema (as shown above).
- Backfill historical data in batches during off-peak hours:
-- Migrate in monthly chunks to avoid long locks
INSERT INTO transactions_new
SELECT * FROM transactions_old
WHERE created_at >= '2026-01-01'
  AND created_at < '2026-02-01';
-- Repeat for each month
- Set up a trigger or logical replication on the old table to capture new writes during migration.
- Swap tables in a short transaction once the backfill catches up:
BEGIN;
ALTER TABLE transactions_old RENAME TO transactions_archive;
ALTER TABLE transactions_new RENAME TO transactions;
COMMIT;
- Update application connection pools and verify queries hit the new table.
The entire swap step takes milliseconds since it's just metadata changes. The backfill is the slow part, but it runs in the background without blocking reads or writes on the original table.
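The dual-write in step 3 can be done with a simple trigger; a hedged sketch (the function and trigger names are mine, and it doesn't handle updates, deletes, or deduplication against the backfill):

```sql
CREATE OR REPLACE FUNCTION copy_to_new() RETURNS trigger AS $$
BEGIN
    -- Mirror every new row into the partitioned table
    INSERT INTO transactions_new VALUES (NEW.*);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mirror_writes
    AFTER INSERT ON transactions_old
    FOR EACH ROW EXECUTE FUNCTION copy_to_new();
```

Drop the trigger inside the swap transaction so it doesn't fire against the renamed tables.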
Partition Key Selection: created_at vs merchant_id
I've seen teams debate this endlessly. Here's my take after running both in production:
- created_at (range) — wins for settlement, reconciliation, reporting, and compliance queries. Also makes retention trivial: drop a month, done. This is the right choice for 90% of payment systems.
- merchant_id (hash) — useful if your primary access pattern is "give me all transactions for merchant X" and you have thousands of merchants with roughly equal volume. But you lose the ability to efficiently drop old data by time range.
- Composite (range + hash sub-partitioning) — the best of both worlds, but adds operational complexity. Only worth it if you genuinely query both dimensions frequently and have the team to manage it.
Gotchas I Learned the Hard Way
- Foreign keys referencing a partitioned table are supported only since PostgreSQL 12, and the referenced columns must include the partition key. Because the primary key is (id, created_at), other tables can't reference transactions.id alone: either store both columns on the referencing side, or handle referential integrity at the application level.
- Cross-partition queries without the partition key will scan all partitions. If someone writes SELECT * FROM transactions WHERE merchant_id = '...' without a date filter, they'll hit every partition. Add guardrails in your application layer.
- enable_partition_pruning must be on (it's on by default, but verify). Without it, the planner won't skip irrelevant partitions.
- Partition-wise joins and aggregates (enable_partitionwise_join, enable_partitionwise_aggregate) are off by default. Turning them on can significantly speed up joins between partitioned tables, but increases planning time.
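Because of that planning-time cost, it's reasonable to flip the settings per session for heavy reporting queries rather than globally; a minimal sketch:

```sql
-- Enable partition-wise plans only for this session's reporting work
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;
-- ...run the joins between partitioned tables, then restore defaults:
RESET enable_partitionwise_join;
RESET enable_partitionwise_aggregate;
```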