Why Payment Queries Are a Different Beast
Most indexing advice assumes generic workloads — a users table, a products catalog, maybe some blog posts. Payment systems don't work like that. The query patterns are fundamentally different, and if you index for the generic case, you'll hit walls fast.
Payment tables are time-range heavy. Almost every dashboard, report, and reconciliation job filters by created_at or settled_at. You're constantly asking "show me all transactions between Monday and Friday" or "what settled in the last hour." That's range scans, not point lookups, and it changes everything about how your indexes should be structured.
Then there's status filtering. A payment can be pending, processing, completed, failed, or refunded. Operations teams live in the pending and failed views. But here's the thing — in a healthy system, 95%+ of your transactions are completed. You're indexing hundreds of millions of rows to find the few thousand that actually matter.
And merchant lookups tie it all together. Every payment belongs to a merchant, and merchants need to see their own transactions filtered by date and status. That's a three-column filter on a table that grows by millions of rows per day.
The Table We're Working With
Let's ground this in something concrete. Here's a simplified version of the payment_transactions table I've worked with across multiple payment platforms:
CREATE TABLE payment_transactions (
    id             BIGSERIAL PRIMARY KEY,
    merchant_id    UUID NOT NULL,
    amount         BIGINT NOT NULL,        -- cents
    currency       CHAR(3) NOT NULL,
    status         VARCHAR(20) NOT NULL,
    payment_method VARCHAR(30) NOT NULL,
    customer_email VARCHAR(255),
    reference_id   VARCHAR(100) UNIQUE,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    settled_at     TIMESTAMPTZ
);
At 50 million rows, naive queries start hurting. At 500 million, wrong indexes will bring your database to its knees.
Composite Indexes — Column Order Is Everything
The most common query in any payment system is some variation of: "show me this merchant's transactions for the last 7 days." The naive approach is to create separate indexes on merchant_id and created_at. PostgreSQL will use one of them (or bitmap-AND the two) and then filter the remaining rows. That's not good enough.
You need a composite index, and the column order matters more than most people realize. The rule is: equality columns first, range columns last.
-- Good: equality first, range last
CREATE INDEX idx_txn_merchant_created
ON payment_transactions (merchant_id, created_at DESC);
-- Bad: range first, equality second
CREATE INDEX idx_txn_created_merchant
ON payment_transactions (created_at DESC, merchant_id);
Why does order matter? With (merchant_id, created_at DESC), PostgreSQL can jump directly to the B-tree leaf for a specific merchant, then scan forward through the already-sorted timestamps. It's a single seek plus a sequential scan of exactly the rows you need.
With (created_at DESC, merchant_id), the database has to scan the entire date range across all merchants and then filter. On a table with 10,000 active merchants, that's roughly 10,000x more rows examined.
Rule of thumb for composite indexes: put columns used in = conditions first, columns used in ORDER BY next, and columns used in range conditions (BETWEEN, >, <) last. This gives the planner the tightest possible B-tree traversal.
When you add status filtering — "show me this merchant's failed transactions this week" — extend the composite:
CREATE INDEX idx_txn_merchant_status_created
ON payment_transactions (merchant_id, status, created_at DESC);
This handles the three-column filter pattern that dominates payment dashboards. The planner walks the B-tree: merchant → status → date range. No wasted reads.
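One caveat worth spelling out: a composite B-tree only helps queries that constrain its leading columns (the leftmost-prefix rule). A sketch, using a placeholder merchant id:

```sql
-- Served by idx_txn_merchant_status_created: the leading columns
-- (merchant_id, then status) are constrained.
SELECT id, amount
FROM payment_transactions
WHERE merchant_id = '00000000-0000-0000-0000-000000000000'  -- placeholder
  AND status = 'failed';

-- NOT served by it: filtering on created_at alone skips the leading
-- columns, so the planner falls back to another index or a seq scan.
SELECT id, amount
FROM payment_transactions
WHERE created_at > now() - INTERVAL '1 day';
```

This is why a single wide composite can't replace every narrower index: the column order fixes which query shapes it can serve.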
Partial Indexes — Stop Indexing What You Don't Query
Here's something that took me too long to learn: in a payment system, you almost never query for completed transactions by status. You query for everything except completed. The operations team wants pending, failed, and stuck transactions. The completed ones just sit there for reporting, which runs on date ranges anyway.
A partial index lets you index only the rows that match a condition. For a table with 500 million rows where 480 million are completed, this is transformative:
-- Only index the ~4% of transactions that aren't completed
CREATE INDEX idx_txn_active_status
ON payment_transactions (status, merchant_id, created_at DESC)
WHERE status NOT IN ('completed');
-- Only index pending transactions for the retry queue
-- (an index predicate must use IMMUTABLE expressions, so now() can't
-- appear here; the retry worker applies the 24-hour window in its query)
CREATE INDEX idx_txn_pending_retry
ON payment_transactions (created_at)
WHERE status = 'pending';
That first index covers 20 million rows instead of 500 million. It's smaller, faster to scan, and faster to maintain on writes. The second one is narrower still: it indexes only the pending slice, pre-sorted by created_at, so the retry worker's "pending in the last 24 hours" query reads exactly the range it needs.
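A sketch of the retry worker's query that this partial index serves (column choices and batch size are illustrative, not from the original schema discussion):

```sql
-- Oldest-first batch of recent pending transactions; the partial
-- index narrows the scan to pending rows, and the sorted created_at
-- column serves both the range filter and the ORDER BY.
SELECT id, reference_id, merchant_id
FROM payment_transactions
WHERE status = 'pending'
  AND created_at > now() - INTERVAL '24 hours'
ORDER BY created_at
LIMIT 500;
```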
Covering Indexes — Skip the Table Entirely
Every time PostgreSQL finds a row via an index, it still has to go back to the heap (the actual table) to fetch the columns you selected. This is called a "heap fetch," and on large tables with cold caches, it's where most of your query time goes.
A covering index includes all the columns the query needs, so PostgreSQL can answer the query entirely from the index using an Index Only Scan:
-- Covers: SELECT amount, currency, status, created_at
-- WHERE merchant_id = ? AND created_at > ?
CREATE INDEX idx_txn_merchant_covering
ON payment_transactions (merchant_id, created_at DESC)
INCLUDE (amount, currency, status);
The INCLUDE clause (available since PostgreSQL 11) adds columns to the index leaf pages without including them in the B-tree sort order. The planner sees that all requested columns are in the index and skips the heap entirely.
I've seen this turn a 45ms merchant dashboard query into a 3ms one. The difference is dramatic when your table is larger than available RAM and heap fetches mean random disk I/O.
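One operational caveat: an Index Only Scan still consults the visibility map, so recently written pages force heap fetches anyway until vacuum catches up. A quick way to check, assuming the covering index above and a placeholder merchant id:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT amount, currency, status, created_at
FROM payment_transactions
WHERE merchant_id = '00000000-0000-0000-0000-000000000000'  -- placeholder
  AND created_at > now() - INTERVAL '1 day';
-- Healthy plan: "Index Only Scan using idx_txn_merchant_covering"
-- with a low "Heap Fetches" count. If Heap Fetches stays high,
-- vacuum so the visibility map marks pages all-visible:
VACUUM (ANALYZE) payment_transactions;
```

On a high-write payment table this matters: a covering index on a table that autovacuum can't keep up with quietly degrades back into ordinary index scans.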
Index Type Comparison

| Type | Good for | Fit for payment tables |
| --- | --- | --- |
| B-tree (default) | Equality, ranges, ORDER BY | The workhorse: merchant, status, and date composites |
| Hash | Equality only | Rarely worth it; a B-tree serves the same lookups |
| BRIN | Huge, naturally ordered tables | created_at on append-only history; tiny on-disk footprint |
| GIN | Containment: arrays, JSONB, full-text | Searching JSONB metadata columns, if you store any |
The Write Penalty — Every Index Has a Cost
Here's the trade-off nobody warns you about until you're in production: every index you add slows down your writes. Every INSERT into payment_transactions has to update not just the heap, but every single index on that table. For a payment system processing 5,000 transactions per second, this adds up fast.
I learned this the hard way. We had 11 indexes on our main transactions table. Each insert was touching 11 B-trees, updating 11 sets of leaf pages, and generating 11x the WAL traffic. Our insert latency crept from 2ms to 18ms over six months as the indexes grew. We didn't notice because it happened gradually — until a Black Friday traffic spike pushed us past the tipping point and inserts started queuing.
The fix was an audit. We pulled pg_stat_user_indexes and found that 4 of those 11 indexes had zero scans in the last 30 days. Dead weight. We dropped them and insert latency dropped to 6ms overnight.
-- Find unused indexes (run this periodically)
-- ("table" is quoted because TABLE is a reserved word)
SELECT
    schemaname || '.' || relname AS "table",
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
    idx_scan AS times_used
FROM pg_stat_user_indexes i
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
  AND NOT indisunique
  AND NOT indisprimary
ORDER BY pg_relation_size(i.indexrelid) DESC;
Production rule: if an index hasn't been scanned in 30 days and it's not enforcing a unique constraint, it's a candidate for removal. Reset the stats with pg_stat_reset() after a deploy, wait a full business cycle (at least one month), then audit.
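The reset step is one call, but it is not free: pg_stat_reset() zeroes every cumulative counter for the current database, including the dead-tuple counts that autovacuum watches, so follow it with a manual ANALYZE.

```sql
-- Zero cumulative statistics for this database, then give
-- autovacuum fresh table stats to work from.
SELECT pg_stat_reset();
ANALYZE payment_transactions;
```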
Reading Query Plans — EXPLAIN ANALYZE Is Your Best Friend
You can't optimize what you can't measure. Before adding any index, run EXPLAIN (ANALYZE, BUFFERS) on the actual query. Not EXPLAIN alone — that shows the planner's estimate, which can be wildly wrong on skewed data. ANALYZE actually executes the query and shows real numbers.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, amount, currency, status, created_at
FROM payment_transactions
WHERE merchant_id = 'a1b2c3d4-...'
AND status = 'failed'
AND created_at > now() - INTERVAL '7 days';
-- Before index (sequential scan):
-- Seq Scan on payment_transactions
-- Rows Removed by Filter: 4,891,203
-- Buffers: shared hit=38201 read=124892
-- Planning Time: 0.4ms
-- Execution Time: 1,847.3ms
-- After composite + partial index:
-- Index Scan using idx_txn_merchant_status_created
-- Rows Removed by Filter: 0
-- Buffers: shared hit=12
-- Planning Time: 0.2ms
-- Execution Time: 0.8ms
Look at three things: the scan type (Seq Scan is almost always bad on large tables), the buffer counts (shared read means disk I/O, shared hit means cache), and Rows Removed by Filter (high numbers mean the index isn't selective enough).
How the Query Planner Decides
Understanding what PostgreSQL does when your query arrives helps you design better indexes. Here's the simplified decision flow:
In essence, the choice is between a sequential scan (full table read) and an index scan (B-tree traversal) for each candidate index, with bitmap scans as a middle ground.
The planner picks the cheapest path based on table statistics (pg_statistic), index selectivity, and cost parameters. If your stats are stale — which happens when you bulk-load transactions — the planner makes bad choices. Run ANALYZE payment_transactions after any large data load.
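After a bulk load, the refresh and a quick freshness check might look like this (columns are from the pg_stat_user_tables view):

```sql
-- Refresh planner statistics after a large data load
ANALYZE payment_transactions;

-- Confirm the statistics are actually fresh
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'payment_transactions';
```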
Practical Tips From Production
After indexing payment tables across four different companies, here's what I keep coming back to:
- Start with three indexes max on your transactions table: the primary key, a composite on (merchant_id, created_at DESC), and a partial on active statuses. Add more only when EXPLAIN ANALYZE tells you to.
- Use CONCURRENTLY for production index creation. A normal CREATE INDEX locks the table for writes. On a payment table, that means dropped transactions. Always use CREATE INDEX CONCURRENTLY — it takes longer but doesn't block.
- Partition before you index. Once your table passes a billion rows, even good indexes struggle. Range-partition by created_at (monthly works well for most payment volumes) and let PostgreSQL prune entire partitions before touching indexes.
- Monitor index bloat. High-churn columns like status and updated_at cause index bloat as old entries are marked dead but not reclaimed. Run REINDEX CONCURRENTLY on a schedule, or use pg_repack for zero-downtime rebuilds.
- Don't index for batch jobs. Nightly reconciliation reports that scan millions of rows are better served by sequential scans on partitioned tables. Adding an index to speed up a job that runs once a day but slows down every insert is a bad trade.
-- Create index without blocking writes (always do this in production)
CREATE INDEX CONCURRENTLY idx_txn_merchant_status_created
ON payment_transactions (merchant_id, status, created_at DESC);
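The partitioning tip might be sketched like this; the table and partition names are hypothetical, and note that a partitioned table's primary key must include the partition key:

```sql
-- Range-partition by month; PostgreSQL prunes whole partitions
-- before any index is touched.
CREATE TABLE payment_transactions_part (
    id          BIGSERIAL,
    merchant_id UUID NOT NULL,
    status      VARCHAR(20) NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (id, created_at)  -- must include the partition key
) PARTITION BY RANGE (created_at);

-- One partition per month
CREATE TABLE payment_transactions_2024_01
PARTITION OF payment_transactions_part
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- An index on the parent cascades to every partition
CREATE INDEX ON payment_transactions_part (merchant_id, created_at DESC);
```

New monthly partitions have to be created ahead of time, typically by a scheduled job or an extension such as pg_partman.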
-- Check index bloat ratio (requires the pgstattuple extension:
--   CREATE EXTENSION pgstattuple;
-- pgstatindex() only understands B-tree indexes, hence the pg_am
-- filter; 90.0 is the default B-tree fillfactor, used as the baseline)
SELECT
    n.nspname || '.' || c.relname AS index,
    pg_size_pretty(pg_relation_size(c.oid)) AS size,
    CASE WHEN s.avg_leaf_density > 0
         THEN round((1 - s.avg_leaf_density / 90.0) * 100)
         ELSE 0 END AS bloat_pct
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_stat_user_indexes i ON i.indexrelid = c.oid
JOIN pg_am am ON am.oid = c.relam AND am.amname = 'btree'
CROSS JOIN LATERAL pgstatindex(c.oid::regclass) AS s
WHERE c.relkind = 'i'
ORDER BY pg_relation_size(c.oid) DESC;
References
- PostgreSQL Documentation — Indexes
- PostgreSQL Documentation — Using EXPLAIN
- Use The Index, Luke — A Guide to Database Performance for Developers
- Use The Index, Luke — Concatenated Indexes (Column Order)
- PostgreSQL Documentation — The Cumulative Statistics System
Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Code examples are simplified for clarity — always review and adapt for your specific use case and security requirements.