The Retention Landscape
Payment data retention isn't a single policy — it's a matrix of overlapping requirements from card networks, regulators, tax authorities, and your own business needs. Visa requires transaction records for 13 months minimum. PCI DSS says you must limit storage to what's strictly necessary. GDPR says delete personal data when it's no longer needed. And your finance team wants 7 years of data for audits.
These requirements often conflict. The engineering challenge is building a system that satisfies all of them simultaneously without turning your database into an unqueryable archive.
Regulatory Timelines You Must Know
| Requirement | Retention Period | What to Keep |
|---|---|---|
| Visa Core Rules | 13 months | Transaction records, authorization logs |
| Mastercard Standards | 13 months | Settlement data, dispute records |
| PCI DSS 4.0 | Business-justified only | Minimize stored cardholder data |
| GDPR / UK GDPR | Purpose-limited | Personal data only as long as needed |
| Tax / Accounting (varies) | 5-7 years | Financial records, invoices, receipts |
| Chargeback Window | 540 days (Visa) | Evidence for dispute resolution |
Key insight: The chargeback window is often the binding constraint, not the card network retention rule. Visa allows disputes up to 540 days after the transaction date for certain reason codes. If you archive transaction details before that window closes, you can't defend chargebacks.
Tiered Storage Architecture
The solution is tiered storage — keep recent data hot, move aging data to warm storage, and archive old data to cold storage. Each tier has different query performance characteristics and cost profiles.
Full indexes, fast queries
Reduced indexes, slower queries
Batch queries only
Partition-Based Archival with PostgreSQL
PostgreSQL's declarative partitioning is the foundation of the hot tier. Partition your transactions table by month, and archival becomes detaching a partition rather than running a massive DELETE.
CREATE TABLE transactions (
id BIGINT GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ NOT NULL,
merchant_id UUID NOT NULL,
amount BIGINT NOT NULL,
currency CHAR(3) NOT NULL,
status TEXT NOT NULL
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE transactions_2026_03
PARTITION OF transactions
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
CREATE TABLE transactions_2026_04
PARTITION OF transactions
FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
When a partition ages out of the hot tier, detach it and move it to the warm tier:
-- Detach without blocking writes (PostgreSQL 14+)
ALTER TABLE transactions
DETACH PARTITION transactions_2025_12 CONCURRENTLY;
-- Export to Parquet for cold storage
COPY transactions_2025_12 TO PROGRAM
'parquet-converter --output s3://payments-archive/2025/12/'
WITH (FORMAT csv, HEADER);
Real-world gotcha: DETACH PARTITION CONCURRENTLY requires PostgreSQL 14+. On older versions, detaching takes an ACCESS EXCLUSIVE lock that blocks all reads and writes on the parent table. We learned this the hard way during a maintenance window that turned into an outage.
The Cold Storage Layer
For data older than 2 years, Parquet files on S3 (or equivalent object storage) give you the best cost-to-queryability ratio. Parquet's columnar format means you can scan specific fields across billions of rows without reading the entire dataset.
Structure your cold storage by date hierarchy for efficient partition pruning:
s3://payments-archive/
transactions/
year=2024/
month=01/
part-00000.parquet
part-00001.parquet
month=02/
...
settlements/
year=2024/
month=01/
...
Query cold data with tools like AWS Athena, Trino, or DuckDB. For compliance queries that happen once a quarter, the per-query cost model of Athena is far cheaper than keeping the data in a running database.
GDPR and the Right to Erasure Conflict
Here's where it gets tricky. GDPR gives individuals the right to request deletion of their personal data. But tax law requires you to keep financial records for 5-7 years. And PCI DSS requires audit trails.
The practical solution is pseudonymization rather than deletion for data that has a legal retention requirement:
-- Instead of deleting, pseudonymize the personal data
UPDATE transactions_archive
SET
cardholder_name = 'REDACTED',
email = encode(digest(email, 'sha256'), 'hex'),
ip_address = '0.0.0.0',
billing_address = 'REDACTED'
WHERE customer_id = $1
AND created_at < NOW() - INTERVAL '13 months';
This satisfies GDPR because the data is no longer personally identifiable, while preserving the financial record for tax and audit purposes. Document this approach in your data processing agreement — regulators generally accept pseudonymization as a valid response to erasure requests when legal retention obligations apply.
Building the Archival Pipeline
The archival pipeline runs as a scheduled job — we use a weekly cadence. The key principle: archival must be idempotent. If the job fails halfway through, re-running it should produce the same result.
Aged partitions
To Parquet/S3
Row count + checksum
Partition
After 30-day hold
The 30-day hold between detach and drop is critical. If someone discovers a data issue after archival, you can reattach the partition without restoring from cold storage. We've used this escape hatch twice in production — both times it saved hours of recovery time.
Lessons from Production
- Never delete without exporting first. This sounds obvious, but under pressure to free disk space, teams skip the verification step. Always confirm the row count in cold storage matches the source before dropping the partition.
- Keep your archival metadata in a separate table. Track which partitions were archived, when, where the Parquet files live, and the row count. This table is your index into cold storage and your audit trail for compliance.
- Test your restore path quarterly. Archival is useless if you can't restore. Pick a random archived month each quarter and verify you can query it end-to-end. We found a Parquet schema drift issue this way that would have made 3 months of data unreadable.
- Encrypt cold storage at rest and in transit. Payment data in S3 needs server-side encryption (SSE-S3 or SSE-KMS) and bucket policies that enforce HTTPS. A misconfigured bucket with payment data is a PCI DSS violation and a breach waiting to happen.
- Automate partition creation, not just archival. If you forget to create next month's partition, inserts fail. Use a cron job or pg_partman to create partitions 3 months ahead.
References
- PCI Security Standards Council — PCI DSS v4.0 Document Library
- GDPR Article 17 — Right to Erasure
- PostgreSQL Documentation — Table Partitioning
- Apache Parquet Format Specification
- Visa Core Rules and Visa Product and Service Rules
Disclaimer: This article reflects the author's personal experience and opinions. It is not legal advice — consult qualified legal counsel for compliance decisions specific to your jurisdiction. Product names, logos, and brands are property of their respective owners.