Building Payment Analytics Pipelines with Go and ClickHouse

Why PostgreSQL Wasn't Enough

Let me be upfront: PostgreSQL is fantastic for transactional workloads. We ran our core payment processing on it for years without complaints. But analytics is a different beast. Once we crossed roughly 200 million rows in our transactions table, our reporting queries started timing out. A simple daily revenue breakdown by merchant was taking 45 seconds. Aggregating chargeback rates across a quarter? Forget it.

We tried the usual tricks — partial indexes, materialized views refreshed on a cron, read replicas dedicated to reporting. They bought us a few months. But the fundamental problem was that PostgreSQL is row-oriented. When your analytics query only needs 4 columns out of 30, it still reads the entire row from disk. At our scale, that meant scanning terabytes of data we didn't care about.

We evaluated a few options — BigQuery, Redshift, Apache Druid — and landed on ClickHouse. The deciding factors were raw query speed on cold data, the ability to self-host (PCI compliance made managed services complicated), and the surprisingly low operational overhead once you get past the initial learning curve.

Criteria	PostgreSQL	ClickHouse
Storage Model	Row-oriented	Column-oriented
Aggregation (1B rows)	Minutes	Sub-second
Compression Ratio	~3x	~10-15x
Single-row Lookups	Excellent	Not ideal
Best For	OLTP / Transactions	OLAP / Analytics

The Pipeline Architecture

The design is straightforward. Our payment service writes transaction events to Kafka. A Go consumer service reads from Kafka, buffers events into batches, and inserts them into ClickHouse. Materialized views inside ClickHouse handle the real-time aggregations. Grafana and our internal dashboard query ClickHouse directly.

Payment
Service

→

Kafka
Topics

→

Go Batch
Inserter

→

ClickHouse
MergeTree

→

Dashboard
/ Grafana

Nothing revolutionary here, and that's the point. The fewer moving parts, the fewer things break at 3 AM.

ClickHouse Schema and Table Engines

Choosing the right table engine matters a lot. We went with ReplacingMergeTree for the raw transactions table. Payment events can be replayed from Kafka during recovery, and ReplacingMergeTree deduplicates rows by the sorting key during background merges. This gave us at-least-once delivery semantics without worrying about duplicates polluting our aggregations.

CREATE TABLE payments.transactions
(
    transaction_id   String,
    merchant_id      UInt64,
    amount_cents     Int64,
    currency         LowCardinality(String),
    status           LowCardinality(String),
    payment_method   LowCardinality(String),
    card_network     LowCardinality(String),
    country_code     LowCardinality(FixedString(2)),
    created_at       DateTime64(3),
    processed_at     DateTime64(3),
    _version         UInt64
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY toYYYYMM(created_at)
ORDER BY (merchant_id, created_at, transaction_id)
TTL created_at + INTERVAL 24 MONTH
SETTINGS index_granularity = 8192;

A few things worth noting. LowCardinality is ClickHouse's dictionary encoding — it dramatically reduces storage for columns with low cardinality like currency codes and payment statuses. Partitioning by month keeps our DROP PARTITION cleanup fast when data ages out. The ORDER BY clause is your primary index in ClickHouse, so we optimized it for our most common query pattern: filtering by merchant, then by time range.

The Go Batch Inserter

ClickHouse hates small inserts. Seriously — inserting one row at a time will tank your performance and create thousands of tiny parts that the merge process can't keep up with. You want batches of at least 10,000 rows, ideally more. Our Go service buffers events in memory and flushes either when the buffer hits 50,000 rows or every 5 seconds, whichever comes first.

type BatchInserter struct {
    ch     driver.Conn
    buf    []Transaction
    mu     sync.Mutex
    maxBuf int
    flush  time.Duration
}

func (b *BatchInserter) Insert(ctx context.Context, tx Transaction) error {
    b.mu.Lock()
    b.buf = append(b.buf, tx)
    shouldFlush := len(b.buf) >= b.maxBuf
    b.mu.Unlock()

    if shouldFlush {
        return b.Flush(ctx)
    }
    return nil
}

func (b *BatchInserter) Flush(ctx context.Context) error {
    b.mu.Lock()
    batch := b.buf
    b.buf = make([]Transaction, 0, b.maxBuf)
    b.mu.Unlock()

    if len(batch) == 0 {
        return nil
    }

    insert, err := b.ch.PrepareBatch(ctx,
        "INSERT INTO payments.transactions")
    if err != nil {
        return fmt.Errorf("prepare batch: %w", err)
    }

    for _, tx := range batch {
        if err := insert.Append(
            tx.ID, tx.MerchantID, tx.AmountCents,
            tx.Currency, tx.Status, tx.PaymentMethod,
            tx.CardNetwork, tx.CountryCode,
            tx.CreatedAt, tx.ProcessedAt, tx.Version,
        ); err != nil {
            return fmt.Errorf("append row: %w", err)
        }
    }

    return insert.Send()
}

We use the official clickhouse-go driver in native protocol mode — not the HTTP interface. The native protocol uses a binary format that's significantly faster for bulk inserts. One gotcha: make sure you're calling PrepareBatch and Append, not building SQL strings. The driver handles columnar encoding under the hood, which is exactly what ClickHouse wants.

Tip: Run a separate goroutine with a time.Ticker to flush the buffer on the time interval. This ensures data reaches ClickHouse even during low-traffic periods when the buffer size threshold won't be hit.

Materialized Views for Real-Time Aggregations

This is where ClickHouse really shines. Instead of running expensive GROUP BY queries over billions of rows at read time, you define materialized views that incrementally aggregate data as it's inserted. The cost shifts from query time to insert time, and the insert overhead is minimal.

Here's the materialized view we use for hourly revenue by merchant:

CREATE MATERIALIZED VIEW payments.hourly_revenue_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (merchant_id, currency, hour)
AS SELECT
    merchant_id,
    currency,
    toStartOfHour(created_at) AS hour,
    sum(amount_cents)          AS total_cents,
    count()                    AS tx_count,
    countIf(status = 'failed') AS failed_count
FROM payments.transactions
WHERE status IN ('settled', 'failed')
GROUP BY merchant_id, currency, hour;

SummingMergeTree automatically sums numeric columns when parts merge, so the view stays compact. Querying hourly revenue for a merchant over the last 30 days now scans a few thousand rows instead of tens of millions. Our p99 query latency dropped from 12 seconds to under 200 milliseconds.

12B+

Rows ingested

< 200ms

p99 query latency

Storage cost reduction

Query Patterns for Payment Dashboards

Most of our dashboard queries follow a predictable shape: filter by merchant, filter by time range, aggregate by some dimension. ClickHouse's query optimizer is excellent at pruning partitions and using the primary index, but you still need to think about your ORDER BY key.

A typical dashboard query — daily settlement volume with success rates:

SELECT
    toDate(created_at)                          AS day,
    sum(amount_cents) / 100                     AS volume_usd,
    count()                                     AS total,
    countIf(status = 'settled') / count() * 100 AS success_rate
FROM payments.transactions
WHERE merchant_id = 4821
  AND created_at >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;

Because merchant_id is the first column in our ORDER BY key, ClickHouse skips straight to the relevant granules. Combined with monthly partitioning, this query touches a tiny fraction of the total data.

Lessons Learned

A few things I wish I'd known before starting:

Don't use ClickHouse for OLTP. It's tempting to consolidate, but point lookups by primary key are slow compared to PostgreSQL. Keep your transactional database for transactional work.
Batch size matters more than you think. We saw a 20x throughput improvement going from 1,000-row batches to 50,000-row batches. The sweet spot depends on your row width, but start large.
Monitor the merge queue. If parts_to_do in system.merges keeps growing, your insert rate is outpacing background merges. Reduce insert frequency or increase max_threads.
Test your TTL policies. We once set a TTL that silently dropped data we still needed for quarterly compliance reports. Always verify TTL behavior in staging first.

References

Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Pricing and features mentioned are subject to change — always verify with official documentation.

Why PostgreSQL Wasn't Enough

The Pipeline Architecture

ClickHouse Schema and Table Engines

The Go Batch Inserter

Materialized Views for Real-Time Aggregations

Query Patterns for Payment Dashboards

Lessons Learned

References

Related Articles