Why PostgreSQL Wasn't Enough
Let me be upfront: PostgreSQL is fantastic for transactional workloads. We ran our core payment processing on it for years without complaints. But analytics is a different beast. Once we crossed roughly 200 million rows in our transactions table, our reporting queries started timing out. A simple daily revenue breakdown by merchant was taking 45 seconds. Aggregating chargeback rates across a quarter? Forget it.
We tried the usual tricks — partial indexes, materialized views refreshed on a cron, read replicas dedicated to reporting. They bought us a few months. But the fundamental problem was that PostgreSQL is row-oriented. When your analytics query only needs 4 columns out of 30, it still reads the entire row from disk. At our scale, that meant scanning terabytes of data we didn't care about.
We evaluated a few options — BigQuery, Redshift, Apache Druid — and landed on ClickHouse. The deciding factors were raw query speed on cold data, the ability to self-host (PCI compliance made managed services complicated), and the surprisingly low operational overhead once you get past the initial learning curve.
The Pipeline Architecture
The design is straightforward. Our payment service writes transaction events to Kafka. A Go consumer service reads from Kafka, buffers events into batches, and inserts them into ClickHouse. Materialized views inside ClickHouse handle the real-time aggregations. Grafana and our internal dashboard query ClickHouse directly.
Service
Topics
Inserter
MergeTree
/ Grafana
Nothing revolutionary here, and that's the point. The fewer moving parts, the fewer things break at 3 AM.
ClickHouse Schema and Table Engines
Choosing the right table engine matters a lot. We went with ReplacingMergeTree for the raw transactions table. Payment events can be replayed from Kafka during recovery, and ReplacingMergeTree deduplicates rows by the sorting key during background merges. This gave us at-least-once delivery semantics without worrying about duplicates polluting our aggregations.
CREATE TABLE payments.transactions
(
transaction_id String,
merchant_id UInt64,
amount_cents Int64,
currency LowCardinality(String),
status LowCardinality(String),
payment_method LowCardinality(String),
card_network LowCardinality(String),
country_code LowCardinality(FixedString(2)),
created_at DateTime64(3),
processed_at DateTime64(3),
_version UInt64
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY toYYYYMM(created_at)
ORDER BY (merchant_id, created_at, transaction_id)
TTL created_at + INTERVAL 24 MONTH
SETTINGS index_granularity = 8192;
A few things worth noting. LowCardinality is ClickHouse's dictionary encoding — it dramatically reduces storage for columns with low cardinality like currency codes and payment statuses. Partitioning by month keeps our DROP PARTITION cleanup fast when data ages out. The ORDER BY clause is your primary index in ClickHouse, so we optimized it for our most common query pattern: filtering by merchant, then by time range.
The Go Batch Inserter
ClickHouse hates small inserts. Seriously — inserting one row at a time will tank your performance and create thousands of tiny parts that the merge process can't keep up with. You want batches of at least 10,000 rows, ideally more. Our Go service buffers events in memory and flushes either when the buffer hits 50,000 rows or every 5 seconds, whichever comes first.
type BatchInserter struct {
ch driver.Conn
buf []Transaction
mu sync.Mutex
maxBuf int
flush time.Duration
}
func (b *BatchInserter) Insert(ctx context.Context, tx Transaction) error {
b.mu.Lock()
b.buf = append(b.buf, tx)
shouldFlush := len(b.buf) >= b.maxBuf
b.mu.Unlock()
if shouldFlush {
return b.Flush(ctx)
}
return nil
}
func (b *BatchInserter) Flush(ctx context.Context) error {
b.mu.Lock()
batch := b.buf
b.buf = make([]Transaction, 0, b.maxBuf)
b.mu.Unlock()
if len(batch) == 0 {
return nil
}
insert, err := b.ch.PrepareBatch(ctx,
"INSERT INTO payments.transactions")
if err != nil {
return fmt.Errorf("prepare batch: %w", err)
}
for _, tx := range batch {
if err := insert.Append(
tx.ID, tx.MerchantID, tx.AmountCents,
tx.Currency, tx.Status, tx.PaymentMethod,
tx.CardNetwork, tx.CountryCode,
tx.CreatedAt, tx.ProcessedAt, tx.Version,
); err != nil {
return fmt.Errorf("append row: %w", err)
}
}
return insert.Send()
}
We use the official clickhouse-go driver in native protocol mode — not the HTTP interface. The native protocol uses a binary format that's significantly faster for bulk inserts. One gotcha: make sure you're calling PrepareBatch and Append, not building SQL strings. The driver handles columnar encoding under the hood, which is exactly what ClickHouse wants.
Tip: Run a separate goroutine with a time.Ticker to flush the buffer on the time interval. This ensures data reaches ClickHouse even during low-traffic periods when the buffer size threshold won't be hit.
Materialized Views for Real-Time Aggregations
This is where ClickHouse really shines. Instead of running expensive GROUP BY queries over billions of rows at read time, you define materialized views that incrementally aggregate data as it's inserted. The cost shifts from query time to insert time, and the insert overhead is minimal.
Here's the materialized view we use for hourly revenue by merchant:
CREATE MATERIALIZED VIEW payments.hourly_revenue_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (merchant_id, currency, hour)
AS SELECT
merchant_id,
currency,
toStartOfHour(created_at) AS hour,
sum(amount_cents) AS total_cents,
count() AS tx_count,
countIf(status = 'failed') AS failed_count
FROM payments.transactions
WHERE status IN ('settled', 'failed')
GROUP BY merchant_id, currency, hour;
SummingMergeTree automatically sums numeric columns when parts merge, so the view stays compact. Querying hourly revenue for a merchant over the last 30 days now scans a few thousand rows instead of tens of millions. Our p99 query latency dropped from 12 seconds to under 200 milliseconds.
Query Patterns for Payment Dashboards
Most of our dashboard queries follow a predictable shape: filter by merchant, filter by time range, aggregate by some dimension. ClickHouse's query optimizer is excellent at pruning partitions and using the primary index, but you still need to think about your ORDER BY key.
A typical dashboard query — daily settlement volume with success rates:
SELECT
toDate(created_at) AS day,
sum(amount_cents) / 100 AS volume_usd,
count() AS total,
countIf(status = 'settled') / count() * 100 AS success_rate
FROM payments.transactions
WHERE merchant_id = 4821
AND created_at >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;
Because merchant_id is the first column in our ORDER BY key, ClickHouse skips straight to the relevant granules. Combined with monthly partitioning, this query touches a tiny fraction of the total data.
Lessons Learned
A few things I wish I'd known before starting:
- Don't use ClickHouse for OLTP. It's tempting to consolidate, but point lookups by primary key are slow compared to PostgreSQL. Keep your transactional database for transactional work.
- Batch size matters more than you think. We saw a 20x throughput improvement going from 1,000-row batches to 50,000-row batches. The sweet spot depends on your row width, but start large.
- Monitor the merge queue. If
parts_to_doinsystem.mergeskeeps growing, your insert rate is outpacing background merges. Reduce insert frequency or increasemax_threads. - Test your TTL policies. We once set a TTL that silently dropped data we still needed for quarterly compliance reports. Always verify TTL behavior in staging first.
References
- ClickHouse MergeTree Engine Documentation
- ClickHouse Materialized Views
- ClickHouse Insert Performance Tuning
- clickhouse-go — Official Go Driver for ClickHouse
- Go database/sql Package Documentation
- ClickHouse LowCardinality Type
Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Pricing and features mentioned are subject to change — always verify with official documentation.