The Bug That Started It All
We had a wallet service processing about 12,000 transfers per hour. Every night, the reconciliation job would flag a small discrepancy — usually between $50 and $200. Not enough to trigger alarms, but enough to compound. After three months, we were off by over $14,000. The ledger entries were all there. The transactions all looked correct individually. But the balances didn't add up.
The root cause? Two concurrent transfers reading the same balance, both passing the "sufficient funds" check, and both deducting. Classic lost update. And it was entirely enabled by our use of READ COMMITTED — PostgreSQL's default isolation level.
Key insight: READ COMMITTED re-evaluates each statement against the latest committed data. That sounds safe, but it means two transactions can read the same balance, both decide there's enough money, and both commit a deduction. Neither sees the other's write until it's too late.
A Quick Refresher on Isolation Levels
Before we get into the fixes, let's ground ourselves. PostgreSQL accepts four isolation levels, but really implements three distinct behaviors (READ UNCOMMITTED behaves identically to READ COMMITTED). Here's what each one actually protects you from:

| Isolation level | Dirty read | Non-repeatable read | Phantom read | Serialization anomaly |
|---|---|---|---|---|
| READ UNCOMMITTED | prevented | possible | possible | possible |
| READ COMMITTED | prevented | possible | possible | possible |
| REPEATABLE READ | prevented | prevented | prevented* | possible |
| SERIALIZABLE | prevented | prevented | prevented | prevented |

*PostgreSQL's REPEATABLE READ actually prevents phantom reads (unlike the SQL standard minimum). It uses snapshot isolation under the hood.
The important thing to understand: READ COMMITTED takes a new snapshot for every statement. So if Transaction B commits between your SELECT balance and your UPDATE balance, you're operating on stale data and you won't know it.
The Race Condition in Action
Let me show you exactly how this plays out. Imagine a wallet with a $100 balance and two concurrent $80 transfers hitting at the same time:
Both transactions read $100, both passed the check, and both wrote $20. The second commit silently overwrote the first. No error, no exception, no log entry. The ledger recorded $160 going out while the balance only dropped by $80: a quiet $80 hole in your books.
Fix #1: SELECT FOR UPDATE
The most straightforward fix is pessimistic locking with SELECT ... FOR UPDATE. This grabs a row-level lock that forces concurrent transactions to wait in line:
-- Transaction A
BEGIN;
SELECT balance FROM wallets WHERE user_id = 'usr_abc' FOR UPDATE;
-- Row is now locked. Transaction B will block here until A commits.
UPDATE wallets SET balance = balance - 80 WHERE user_id = 'usr_abc';
COMMIT;
In Go, this looks like:
func (s *WalletService) Transfer(ctx context.Context, userID string, amount int64) error {
	tx, err := s.db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelReadCommitted})
	if err != nil {
		return fmt.Errorf("begin tx: %w", err)
	}
	defer tx.Rollback() // no-op if Commit has already succeeded

	var balance int64
	err = tx.QueryRowContext(ctx,
		"SELECT balance FROM wallets WHERE user_id = $1 FOR UPDATE",
		userID,
	).Scan(&balance)
	if err != nil {
		return fmt.Errorf("lock wallet: %w", err)
	}

	if balance < amount {
		return fmt.Errorf("insufficient funds: have %d, need %d", balance, amount)
	}

	_, err = tx.ExecContext(ctx,
		"UPDATE wallets SET balance = balance - $1 WHERE user_id = $2",
		amount, userID,
	)
	if err != nil {
		return fmt.Errorf("deduct balance: %w", err)
	}

	return tx.Commit()
}
Watch out: FOR UPDATE works great for single-row operations, but if you're locking multiple wallets (sender + receiver), always lock in a consistent order (e.g., by user ID) to avoid deadlocks.
NOWAIT and SKIP LOCKED
In high-throughput systems, you might not want transactions queuing up behind a lock. PostgreSQL gives you two escape hatches:
- SELECT ... FOR UPDATE NOWAIT — fails immediately if the row is locked. Good for user-facing APIs where you'd rather return a "try again" than make the caller wait.
- SELECT ... FOR UPDATE SKIP LOCKED — skips locked rows entirely. Useful for worker queues processing payment jobs, not for balance checks.
Fix #2: REPEATABLE READ + Retry Logic
If you bump the isolation level to REPEATABLE READ, PostgreSQL will detect the conflict and throw a serialization error (40001) instead of silently clobbering data. The trade-off: you need retry logic.
func (s *WalletService) TransferWithRetry(ctx context.Context, userID string, amount int64) error {
	const maxRetries = 3
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		err = s.doTransfer(ctx, userID, amount)
		if err == nil {
			return nil
		}
		// Retry only on serialization failure (SQLSTATE 40001);
		// anything else is a real error and should surface immediately.
		var pgErr *pgconn.PgError
		if !errors.As(err, &pgErr) || pgErr.Code != "40001" {
			return err
		}
	}
	return fmt.Errorf("transfer failed after %d retries: %w", maxRetries, err)
}
func (s *WalletService) doTransfer(ctx context.Context, userID string, amount int64) error {
	tx, err := s.db.BeginTx(ctx, &sql.TxOptions{
		Isolation: sql.LevelRepeatableRead,
	})
	if err != nil {
		return err
	}
	defer tx.Rollback()

	var balance int64
	err = tx.QueryRowContext(ctx,
		"SELECT balance FROM wallets WHERE user_id = $1", userID,
	).Scan(&balance)
	if err != nil {
		return err
	}

	if balance < amount {
		return fmt.Errorf("insufficient funds")
	}

	_, err = tx.ExecContext(ctx,
		"UPDATE wallets SET balance = balance - $1 WHERE user_id = $2",
		amount, userID,
	)
	if err != nil {
		return err
	}

	return tx.Commit()
}
The retry rate in practice is low — we measured around 2% under our normal load. The key advantage over FOR UPDATE is that you don't hold an explicit row lock across the whole read-check-write window, so overall throughput can be higher when contention is low.
Fix #3: PostgreSQL Advisory Locks
For cases where you need application-level coordination that goes beyond row locks — say, preventing duplicate payment processing across multiple tables — advisory locks are a powerful tool:
-- Use the wallet ID as the lock key
SELECT pg_advisory_xact_lock(hashtext('wallet:' || 'usr_abc'));
-- Now you have an exclusive lock scoped to this transaction.
-- No other transaction can acquire the same advisory lock until you commit.
SELECT balance FROM wallets WHERE user_id = 'usr_abc';
-- ... do your checks and updates ...
Advisory locks are lighter than FOR UPDATE because they don't interact with the row-level locking system. They're purely application-defined. I've used them to coordinate across multiple tables — locking on a payment ID while updating both the wallet balance and the ledger entry, ensuring the whole operation is atomic from a business logic perspective.
Warning: Advisory locks are invisible to most monitoring tools. If you use them, build dashboards that track pg_locks where locktype = 'advisory'. I've seen teams introduce advisory locks and then spend weeks debugging "mysterious" lock waits because they forgot to monitor them.
So Which Should You Use?
After running all three approaches in production across different services, here's my honest take:
- SELECT FOR UPDATE — use this for single-wallet operations where simplicity matters. It's the easiest to reason about, the easiest to debug, and the performance cost is negligible for most payment volumes. This is what we use for 90% of our transfer operations.
- REPEATABLE READ + retry — use this when you have read-heavy transactions that occasionally write, or when you want to avoid holding locks during longer operations. The retry logic adds complexity, but it's worth it for batch processing or reconciliation jobs.
- SERIALIZABLE — use this for critical financial calculations where correctness is non-negotiable and you can tolerate higher retry rates. We use it for end-of-day settlement calculations.
- Advisory locks — use these for cross-table coordination or when you need to lock on a concept (like a payment ID) rather than a specific row.
The real lesson: The default isolation level is a default, not a recommendation. For payment systems, you need to make a deliberate choice for every transaction that touches money. Audit your codebase — I guarantee you'll find at least one BEGIN that should have a FOR UPDATE and doesn't.
References
- PostgreSQL Documentation — Transaction Isolation
- PostgreSQL Documentation — Explicit Locking
- PostgreSQL Documentation — Advisory Lock Functions
- PostgreSQL Wiki — Serializable Snapshot Isolation (SSI)
- Go database/sql — TxOptions Documentation