Why Concurrency in Payments Is Different
Most concurrency tutorials show you how to scrape web pages in parallel or process images faster. Payments are a different animal. You can't retry a charge without checking if the first one went through. You can't drop a transaction on the floor because a goroutine panicked. And you absolutely cannot process the same payment twice because of a race condition.
Go's concurrency primitives — goroutines and channels — are a great fit for this domain, but only if you layer the right patterns on top. Raw go func() calls scattered through your codebase will eventually cost you money. I've seen it happen. A missing sync.WaitGroup caused a settlement service to exit before all transactions were confirmed, and we spent two days reconciling the gap.
The Fan-Out/Fan-In Pattern
This is the workhorse pattern for payment batch processing. You have a pile of transactions to process, and you want to spread the work across multiple goroutines (fan-out), then collect the results back into a single stream (fan-in).
Here's how the data flows through the system:

Queue → workers (fan-out) → results Channel (fan-in) → Report
The key insight: the number of workers should be bounded. In payment processing, each worker likely holds a database connection and an HTTP connection to a payment gateway. Spinning up 10,000 goroutines sounds cool until your connection pool is exhausted and your gateway starts returning 429s.
Worker Pool with Bounded Concurrency
Here's the pattern I use in every payment service. A fixed pool of workers pulls jobs from a shared channel. The channel acts as a natural backpressure mechanism — if all workers are busy, senders block until one frees up.
```go
func ProcessBatch(ctx context.Context, txns []Transaction, concurrency int) ([]Result, error) {
	g, ctx := errgroup.WithContext(ctx)
	jobs := make(chan Transaction, concurrency)
	results := make(chan Result, len(txns))

	// Fan-out: start a fixed number of workers
	for i := 0; i < concurrency; i++ {
		g.Go(func() error {
			for txn := range jobs {
				res, err := processPayment(ctx, txn)
				if err != nil {
					return fmt.Errorf("txn %s: %w", txn.ID, err)
				}
				results <- res
			}
			return nil
		})
	}

	// Send jobs to workers
	g.Go(func() error {
		defer close(jobs)
		for _, txn := range txns {
			select {
			case jobs <- txn:
			case <-ctx.Done():
				return ctx.Err()
			}
		}
		return nil
	})

	// Wait for all workers, then close results
	err := g.Wait()
	close(results)

	// Fan-in: collect results
	var out []Result
	for r := range results {
		out = append(out, r)
	}
	return out, err
}
```
Notice the select on ctx.Done() when sending jobs. Without that, if one worker hits a fatal error and the errgroup cancels the context, the sender goroutine would block forever trying to push into a full channel. I learned this the hard way — a stuck goroutine leaked memory for three days before we noticed.
Rule of thumb for worker count: start with the number of available connections to your payment gateway, not the number of CPU cores. Payment processing is I/O-bound. In our case, the gateway allowed 50 concurrent connections per merchant, so we run 40 workers to leave headroom for retries and health checks.
Buffered vs Unbuffered Channels
Choosing the right channel type matters more than most people think, especially when money is involved. Here's how I decide:
For anything involving real money movement, I default to unbuffered channels for the critical path. Yes, it's slower. But when a process crashes, nothing is sitting in a buffer waiting to be processed. Every transaction that was sent was also received. For batch jobs and async notifications, buffered channels are fine — just make sure you persist the work to a durable queue before it enters the channel.
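The handoff guarantee is easy to demonstrate. A minimal sketch (toy string IDs, no real money movement): because every send on an unbuffered channel blocks until the matching receive, a completed send means the receiver already has the value — nothing can be stranded in a buffer when the process dies.

```go
package main

import "fmt"

// deliverUnbuffered pushes each value through an unbuffered channel and
// returns what the receiver actually saw. A send only returns once the
// receiver has taken the value, so "sent" implies "received".
func deliverUnbuffered(values []string) []string {
	ch := make(chan string) // unbuffered: send == handoff
	out := make(chan []string)
	go func() {
		var got []string
		for v := range ch {
			got = append(got, v)
		}
		out <- got
	}()
	for _, v := range values {
		ch <- v // returns only after the receiver has the value
	}
	close(ch)
	return <-out
}

func main() {
	fmt.Println(deliverUnbuffered([]string{"txn-001", "txn-002"}))
	// prints [txn-001 txn-002]
}
```

With a buffered channel, `ch <- v` would return as soon as there is buffer space, and a crash at that point would lose the value — which is exactly why the buffer belongs in a durable queue, not in process memory.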
Context Cancellation — The Kill Switch
Every payment operation needs a timeout. A goroutine waiting forever on a gateway response is a goroutine leaking memory, holding a connection, and blocking a worker slot. Go's context package gives you cancellation propagation for free.
```go
func processPayment(ctx context.Context, txn Transaction) (Result, error) {
	// Hard timeout: no payment call should take more than 30s
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	// Check if we're already cancelled before doing work
	select {
	case <-ctx.Done():
		return Result{}, ctx.Err()
	default:
	}

	resp, err := gateway.Charge(ctx, txn)
	if err != nil {
		if ctx.Err() == context.DeadlineExceeded {
			// Timeout: we don't know if the charge went through.
			// Mark as UNKNOWN and reconcile later.
			return Result{
				TxnID:  txn.ID,
				Status: StatusUnknown,
			}, nil
		}
		return Result{}, fmt.Errorf("charge failed: %w", err)
	}
	return Result{TxnID: txn.ID, Status: StatusSuccess, GatewayRef: resp.ID}, nil
}
```
Never assume a timeout means the charge failed. If your context deadline fires after the gateway received the request but before you got the response, the charge may have succeeded. Always mark timed-out transactions as UNKNOWN and reconcile them with the gateway's transaction log. Treating timeouts as failures will lead to double charges.
Errgroup — Coordinating Failure
The errgroup package from golang.org/x/sync is the single most useful concurrency tool for payment systems. It gives you three things at once: goroutine lifecycle management, error propagation, and context cancellation on first failure.
The pattern I showed in the worker pool above uses errgroup, but here's a more targeted example — running parallel validation checks before authorizing a payment:
```go
func ValidatePayment(ctx context.Context, req PaymentRequest) error {
	g, ctx := errgroup.WithContext(ctx)

	g.Go(func() error {
		return validateCard(ctx, req.CardToken)
	})
	g.Go(func() error {
		return checkFraudScore(ctx, req)
	})
	g.Go(func() error {
		return verifyMerchantLimits(ctx, req.MerchantID, req.Amount)
	})

	// If ANY check fails, ctx is cancelled and the
	// remaining goroutines exit early
	return g.Wait()
}
```
The beauty here: if the fraud check returns an error, the context gets cancelled immediately. The card validation and merchant limit checks — which might be waiting on slow external calls — see the cancelled context and bail out. You don't waste time finishing checks for a payment you're already going to reject.
Errgroup with a Concurrency Limit
Newer versions of the errgroup package (golang.org/x/sync, since 2022) support SetLimit, which turns it into a bounded worker pool without the boilerplate of managing channels yourself:
```go
g, ctx := errgroup.WithContext(ctx)
g.SetLimit(20) // max 20 concurrent gateway calls

for _, txn := range transactions {
	txn := txn // capture loop variable (needed before Go 1.22)
	g.Go(func() error {
		return processAndStore(ctx, txn)
	})
}
return g.Wait()
```
Patterns to Avoid
After building payment systems in Go for a few years, here are the patterns that have burned me:
- Unbounded goroutine spawning — `for _, txn := range txns { go process(txn) }` looks harmless until you have 500,000 transactions and the OOM killer fires. Always use a worker pool.
- Shared mutable state without synchronization — a map tracking transaction statuses accessed by multiple goroutines will corrupt silently. Use `sync.Map` or funnel updates through a channel.
- Ignoring goroutine leaks — a goroutine blocked on a channel send with no receiver will live forever. In payment services that run for months, this adds up. Use `goleak` in your tests.
- Fire-and-forget goroutines for critical work — if you `go sendWebhook(txn)` and the process restarts, that webhook is gone. Use a persistent queue instead.
Putting It Together
The settlement batch job I mentioned at the start uses all of these patterns together: errgroup for lifecycle management, a buffered channel as a job queue, bounded workers matching our gateway connection limit, context timeouts on every external call, and an UNKNOWN status for anything that times out. The reconciliation job runs 15 minutes later and resolves the unknowns by querying the gateway's transaction API.
Go makes concurrency easy to start and hard to get wrong — if you use the right patterns. The stdlib and x/sync give you almost everything you need. The rest is discipline: bound your concurrency, cancel aggressively, and never assume a timeout means failure when money is on the line.
Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Code examples are simplified for clarity — always review and adapt for your specific use case and security requirements. This is not financial or legal advice.