Go's errgroup for Parallel Payment API Calls

Why Not Just Use Goroutines and WaitGroup

The naive approach to parallel API calls in Go looks like this: spawn goroutines, use a sync.WaitGroup, collect results through channels. It works, but it has three problems in payment contexts:

Error propagation is manual. If one provider call fails, you need to decide whether to cancel the others. With raw goroutines, you're wiring up context cancellation yourself.
Panics in goroutines crash the process. A nil pointer from a malformed provider response takes down your entire payment service, not just that one request.
No concurrency limits. During a settlement batch, you might fan out 10,000 reconciliation calls. Without a limiter, you'll exhaust file descriptors or get rate-limited by the provider.
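
For reference, the naive approach described above might look roughly like the sketch below. The names checkResult, runNaive, and demoNaive are hypothetical, and the checks are stand-ins rather than real provider calls. Note what is missing: nothing cancels the remaining checks when one fails, nothing caps how many run at once, and a panic in any check would still crash the process.

```go
package main

import (
	"errors"
	"sync"
)

// checkResult pairs a check name with its outcome (hypothetical type).
type checkResult struct {
	name string
	err  error
}

// runNaive runs every check in its own goroutine and gathers the outcomes
// through a buffered channel. There is no cancellation and no limit.
func runNaive(checks map[string]func() error) []checkResult {
	var wg sync.WaitGroup
	out := make(chan checkResult, len(checks)) // buffered so senders never block

	for name, check := range checks {
		wg.Add(1)
		go func(name string, check func() error) {
			defer wg.Done()
			out <- checkResult{name: name, err: check()}
		}(name, check)
	}

	wg.Wait()
	close(out)

	results := make([]checkResult, 0, len(checks))
	for r := range out {
		results = append(results, r)
	}
	return results
}

// demoNaive runs two stand-in checks, one of which fails, and reports how
// many results came back and how many carried an error.
func demoNaive() (total, failed int) {
	results := runNaive(map[string]func() error{
		"fraud": func() error { return nil },
		"risk":  func() error { return errors.New("risk engine unavailable") },
	})
	for _, r := range results {
		if r.err != nil {
			failed++
		}
	}
	return len(results), failed
}
```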

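golang.org/x/sync/errgroup addresses the first and third directly: it propagates the first error through a shared context, and its SetLimit method caps concurrency. It does not turn panics into errors, so a goroutine that may panic still needs its own recover. It's a small package, but it encodes the right patterns for concurrent work with shared error handling.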
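
Panic handling is worth a sketch of its own, because errgroup does not convert a panic into an error. One possible approach, assuming you are comfortable treating a panic as an ordinary failure, is a small wrapper whose result can be passed to g.Go. The names recoverToError and demoRecover are hypothetical.

```go
package main

import "fmt"

// recoverToError wraps fn so that a panic inside it is returned as an error
// instead of unwinding the goroutine and crashing the process.
func recoverToError(fn func() error) func() error {
	return func() (err error) {
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("recovered panic: %v", r)
			}
		}()
		return fn()
	}
}

// demoRecover dereferences a nil pointer inside a wrapped function, the way a
// malformed provider response might, and returns the resulting error.
func demoRecover() error {
	type response struct{ Status string }
	var resp *response // simulates a response that was never populated

	wrapped := recoverToError(func() error {
		if resp.Status != "ok" { // panics: nil pointer dereference
			return fmt.Errorf("unexpected status %q", resp.Status)
		}
		return nil
	})
	return wrapped()
}
```

With errgroup, the call site would become something like g.Go(recoverToError(func() error { return callFraudService(ctx, txn) })). Whether swallowing a panic is acceptable depends on the service; logging the stack trace before returning is usually worth adding.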

errgroup Basics for Payment Fan-Out

The core pattern: create a group with a context, launch goroutines with g.Go(), and wait for all of them. If any goroutine returns an error, the context is cancelled and g.Wait() returns that error.

g, ctx := errgroup.WithContext(ctx)

g.Go(func() error {
    return callFraudService(ctx, txn)
})
g.Go(func() error {
    return callRiskEngine(ctx, txn)
})

if err := g.Wait(); err != nil {
    // At least one call failed — ctx was cancelled,
    // so the other call got a cancellation signal too
    return fmt.Errorf("pre-auth checks failed: %w", err)
}

The key insight: when errgroup.WithContext creates the group, it derives a child context. The first error from any goroutine cancels that child context. Other goroutines receive the cancellation through ctx.Done(), but only if they're checking it. Make sure your HTTP calls carry that context: the standard library's http.Client aborts a request when its context is cancelled, but only if the request was built with http.NewRequestWithContext (or given the context via req.WithContext).
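
To make the cancellation point concrete, here is a minimal sketch of a context-bound HTTP call. The ping helper, the slowTransport type, and the provider.invalid URL are all hypothetical; slowTransport stands in for a slow provider so the example needs no network access.

```go
package main

import (
	"context"
	"errors"
	"net/http"
	"time"
)

// ping issues a GET bound to ctx. Because the request is built with
// http.NewRequestWithContext, cancelling ctx aborts the call in flight.
func ping(ctx context.Context, client *http.Client, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// slowTransport simulates a provider that takes two seconds to answer,
// unless the request's context is cancelled first.
type slowTransport struct{}

func (slowTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	select {
	case <-time.After(2 * time.Second):
		return nil, errors.New("provider took too long")
	case <-req.Context().Done():
		return nil, req.Context().Err()
	}
}

// demoCancel cancels the context after 50ms and reports whether the call
// ended with context.Canceled rather than waiting out the slow provider.
func demoCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	time.AfterFunc(50*time.Millisecond, cancel)

	client := &http.Client{Transport: slowTransport{}}
	err := ping(ctx, client, "http://provider.invalid/health")
	return errors.Is(err, context.Canceled)
}
```

Inside an errgroup goroutine, the ctx passed to ping would be the one returned by errgroup.WithContext, so a failure elsewhere in the group ends this call early.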

Important: errgroup only returns the first error. In payment systems, you often need all errors — "Stripe timed out AND Adyen returned 503." We'll cover multi-error collection below.

Parallel Provider Health Checks

Before routing a transaction, our orchestration layer checks which providers are healthy. Doing this sequentially adds latency — three providers at 200ms each means 600ms before you even start the authorization. With errgroup, it's a single round-trip:

type HealthResult struct {
    Provider string
    Healthy  bool
    Latency  time.Duration
}

func checkProviders(ctx context.Context, providers []Provider) ([]HealthResult, error) {
    results := make([]HealthResult, len(providers))
    g, ctx := errgroup.WithContext(ctx)

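    for i, p := range providers {
        // Go 1.22+ gives each iteration its own i and p. On older toolchains,
        // add `i, p := i, p` here so the closure doesn't see the final values.
        g.Go(func() error {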
            start := time.Now()
            err := p.Ping(ctx)
            results[i] = HealthResult{
                Provider: p.Name(),
                Healthy:  err == nil,
                Latency:  time.Since(start),
            }
            return nil // Don't fail the group on unhealthy provider
        })
    }

    return results, g.Wait()
}

Notice that each goroutine writes to its own index in the results slice — no mutex needed. And we return nil from each goroutine because an unhealthy provider isn't an error in the group; it's data we use for routing decisions.
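
As an illustration of using those results for routing, the sketch below picks the healthy provider with the lowest observed latency. Lowest-latency-wins is an assumed policy, not necessarily the one a real orchestrator would use, and pickProvider and demoPick are hypothetical names. The sample latencies for Stripe and Adyen are illustrative, and the Worldpay value is a made-up stand-in for a timed-out ping.

```go
package main

import "time"

// HealthResult mirrors the type defined above so this block compiles alone.
type HealthResult struct {
	Provider string
	Healthy  bool
	Latency  time.Duration
}

// pickProvider returns the healthy provider with the lowest latency.
// The second return value is false when no provider is healthy.
func pickProvider(results []HealthResult) (string, bool) {
	best := -1
	for i, r := range results {
		if !r.Healthy {
			continue // unhealthy providers are never candidates
		}
		if best == -1 || r.Latency < results[best].Latency {
			best = i
		}
	}
	if best == -1 {
		return "", false
	}
	return results[best].Provider, true
}

// demoPick feeds three sample results through pickProvider.
func demoPick() (string, bool) {
	return pickProvider([]HealthResult{
		{Provider: "Stripe", Healthy: true, Latency: 45 * time.Millisecond},
		{Provider: "Adyen", Healthy: true, Latency: 62 * time.Millisecond},
		{Provider: "Worldpay", Healthy: false, Latency: 5 * time.Second},
	})
}
```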

[Figure: the orchestrator pings Stripe (healthy, 45ms), Adyen (healthy, 62ms), and Worldpay (timeout) in parallel, then routes to Stripe.]

Multi-Acquirer Routing with First Success

Sometimes you want to try multiple acquirers simultaneously and take the first successful authorization. This is common in high-value transactions where you want to maximize approval rates. errgroup alone doesn't support "first success" — it waits for all goroutines. But you can combine it with a channel:

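func authorizeWithFallback(ctx context.Context, txn Transaction, acquirers []Acquirer) (*AuthResult, error) {
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    resultCh := make(chan *AuthResult, len(acquirers))
    // Use a plain Group here, not errgroup.WithContext: WithContext cancels
    // the shared context on the first error, which would abort acquirers
    // that are still in flight and might have approved.
    var g errgroup.Group

    for _, acq := range acquirers {
        g.Go(func() error {
            res, err := acq.Authorize(ctx, txn)
            if err != nil {
                return err
            }
            if res.Approved {
                resultCh <- res // Buffered, so this never blocks
                cancel()        // Signal others to stop
            }
            return nil
        })
    }

    // Close the channel once every attempt has finished so the receive
    // below unblocks even when nobody approved.
    go func() {
        g.Wait()
        close(resultCh)
    }()

    if res, ok := <-resultCh; ok {
        return res, nil
    }
    if err := g.Wait(); err != nil {
        return nil, err // All failed: return the first error
    }
    // Every acquirer answered without an error, but none approved.
    return nil, errors.New("no acquirer approved the transaction")
}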

Warning: Sending the same transaction to multiple acquirers simultaneously can result in double charges if more than one approves before cancellation propagates. Use this pattern only for idempotent operations or when your providers support void-on-duplicate.

Bounded Concurrency for Batch Operations

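Settlement reconciliation might involve checking thousands of transactions against a provider's API. Unbounded concurrency will get you rate-limited or worse. errgroup's SetLimit method handles this. Because errgroup ships in the golang.org/x/sync module and not in the standard library, its availability depends on the module version in your go.mod, not on the Go release: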

func reconcileBatch(ctx context.Context, txns []Transaction) []ReconcileResult {
    results := make([]ReconcileResult, len(txns))
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(20) // Max 20 concurrent API calls

    for i, txn := range txns {
        g.Go(func() error {
            res, err := reconcileOne(ctx, txn)
            results[i] = ReconcileResult{TxnID: txn.ID, Result: res, Err: err}
            return nil // Collect errors in results, don't fail the group
        })
    }

    g.Wait()
    return results
}

The SetLimit(20) call means at most 20 goroutines run concurrently. When one finishes, the next g.Go() call unblocks. This is cleaner than managing a semaphore channel yourself, and it integrates with errgroup's error handling.
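
For comparison, a hand-rolled version of the same limit using a buffered channel as a counting semaphore might look like the sketch below. The names runBounded and demoBounded are hypothetical, and the peak counter exists only so the example can show that the limit holds.

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// runBounded runs every task with at most limit of them in flight. It
// returns the highest number of tasks observed running at the same time.
func runBounded(tasks []func(), limit int) int64 {
	sem := make(chan struct{}, limit) // one slot per allowed concurrent task
	var wg sync.WaitGroup
	var running, peak int64

	for _, task := range tasks {
		sem <- struct{}{} // blocks while limit tasks are already running
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done

			n := atomic.AddInt64(&running, 1)
			for { // record a new peak if this is the most seen so far
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			task()
			atomic.AddInt64(&running, -1)
		}(task)
	}

	wg.Wait()
	return atomic.LoadInt64(&peak)
}

// demoBounded runs 50 short tasks with a limit of 5.
func demoBounded() int64 {
	tasks := make([]func(), 50)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(time.Millisecond) }
	}
	return runBounded(tasks, 5)
}
```

The acquire, release, and wait bookkeeping here is what SetLimit and g.Wait() take over inside errgroup.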

Collecting Errors Without Losing Context

errgroup returns only the first error. In payment systems, you need all of them — for logging, for deciding whether to retry, and for incident response. Here's the pattern we use:

type MultiError struct {
    mu     sync.Mutex
    errors []error
}

func (me *MultiError) Add(err error) {
    me.mu.Lock()
    me.errors = append(me.errors, err)
    me.mu.Unlock()
}

func (me *MultiError) Err() error {
    me.mu.Lock()
    defer me.mu.Unlock()
    if len(me.errors) == 0 {
        return nil
    }
    return fmt.Errorf("%d provider errors: %w", len(me.errors), errors.Join(me.errors...))
}

Use it alongside errgroup — each goroutine appends to the MultiError and returns nil to the group, so all goroutines run to completion. Then check MultiError.Err() after g.Wait().
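
Because Err() wraps errors.Join with %w, the combined error can still be inspected with errors.Is after collection. The sketch below repeats the MultiError type so it compiles on its own and uses a plain sync.WaitGroup in place of errgroup to stay within the standard library; with errgroup, each g.Go closure would call Add and return nil in the same way. The errTimeout sentinel and the sample failures are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// MultiError mirrors the type defined above.
type MultiError struct {
	mu     sync.Mutex
	errors []error
}

func (me *MultiError) Add(err error) {
	me.mu.Lock()
	me.errors = append(me.errors, err)
	me.mu.Unlock()
}

func (me *MultiError) Err() error {
	me.mu.Lock()
	defer me.mu.Unlock()
	if len(me.errors) == 0 {
		return nil
	}
	return fmt.Errorf("%d provider errors: %w", len(me.errors), errors.Join(me.errors...))
}

// errTimeout is a hypothetical sentinel a provider client might return.
var errTimeout = errors.New("timeout")

// demoMultiError records two provider failures concurrently, then reports
// whether an error was produced and whether the timeout is still detectable.
func demoMultiError() (failed bool, sawTimeout bool) {
	var me MultiError
	var wg sync.WaitGroup

	failures := map[string]error{
		"stripe": errTimeout,
		"adyen":  errors.New("503 service unavailable"),
	}
	for name, err := range failures {
		wg.Add(1)
		go func(name string, err error) {
			defer wg.Done()
			me.Add(fmt.Errorf("%s: %w", name, err)) // keep the provider name
		}(name, err)
	}
	wg.Wait()

	combined := me.Err()
	return combined != nil, errors.Is(combined, errTimeout)
}
```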

Pattern	Error Behavior	Use Case
errgroup (default)	First error cancels all	Pre-auth checks (all must pass)
errgroup + nil returns	All run to completion	Health checks, batch reconciliation
errgroup + MultiError	Collect all, decide after	Multi-provider settlement
errgroup + channel	First success wins	Multi-acquirer authorization

Production Lessons

Always set a context timeout. errgroup inherits the parent context, but if that context has no deadline, a hung provider call blocks the group forever. We wrap every payment fan-out with a 5-second timeout.
Log which goroutine failed. errgroup's first-error-wins behavior means you lose context about which provider caused the failure. Wrap errors with the provider name: fmt.Errorf("stripe: %w", err).
Don't share mutable state between goroutines. The index-per-goroutine pattern (results[i]) is safe because each goroutine writes to a unique index. But if you're building a shared map, you need a mutex.
Use SetLimit for external APIs. Even if your service can handle 1,000 concurrent goroutines, the provider's API probably can't. We've been rate-limited by every major PSP at least once during batch operations.
Test with -race. Fan-out patterns are where data races hide. Run go test -race on every CI build. We caught a shared-buffer bug in our reconciliation fan-out that only manifested under load.
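
The first two lessons can be sketched together: a deadline around the whole fan-out, and provider names attached to errors. The helpers tagged, runWithTimeout, and demoTimeout are hypothetical, and the hung call is a stand-in for a provider that never answers. The demo uses 20ms so it finishes quickly where production code would use something like the 5 seconds mentioned above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// tagged wraps a provider call so any error it returns names the provider.
func tagged(name string, call func(context.Context) error) func(context.Context) error {
	return func(ctx context.Context) error {
		if err := call(ctx); err != nil {
			return fmt.Errorf("%s: %w", name, err)
		}
		return nil
	}
}

// runWithTimeout gives the whole fan-out a deadline, so a hung provider
// cannot block it forever.
func runWithTimeout(parent context.Context, d time.Duration, fanOut func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()
	return fanOut(ctx)
}

// demoTimeout runs a call that never answers and returns the error text.
func demoTimeout() string {
	hung := func(ctx context.Context) error {
		<-ctx.Done() // returns only when the deadline fires
		return ctx.Err()
	}
	err := runWithTimeout(context.Background(), 20*time.Millisecond, tagged("stripe", hung))
	if err == nil {
		return ""
	}
	return err.Error()
}
```

In a real fan-out, the function passed to runWithTimeout would create the group with errgroup.WithContext(ctx) and launch each tagged call through g.Go, so the error returned by g.Wait() already says which provider failed.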

Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Code examples are simplified for clarity — always add proper error handling and testing for production use.
