Go Error Handling Patterns for Financial Systems — When if err != nil Isn't Enough

Why Financial Systems Need Better Error Handling

I've been writing Go for payment microservices for a while now, and if there's one thing I've learned the hard way, it's that if err != nil { return err } will eventually cost you money. Literally.

In most applications, an error is an error. You log it, maybe show the user a friendly message, and move on. In payment systems, the type of error changes everything. A timeout from the payment gateway means the charge might have gone through — retry blindly and you double-charge the customer. A "card declined" response is permanent — retrying it ten times won't change the outcome, but it will get you rate-limited by the gateway and flagged for suspicious behavior.

The stakes are different here. Every unclassified error is a potential dispute, a compliance finding, or a customer who never comes back. So we need error handling that does more than just propagate failures — it needs to classify them, carry context for audit trails, and inform retry decisions.

A real incident that shaped my thinking: we once had a service that retried on all errors from a gateway. A network blip caused 340 duplicate charges in 12 minutes. The refund process took three days. After that, we built the patterns I'm sharing here.

Typed Errors for Payment Domains

The foundation of everything else is a well-structured error type. Go's error interface is minimal by design, but for payments, we need errors that carry domain-specific information. Here's the core type I use across our payment services:

type PaymentError struct {
    Code            string
    Message         string
    Retryable       bool
    GatewayResponse string
    TransactionID   string
    HTTPStatus      int
}

func (e *PaymentError) Error() string {
    return fmt.Sprintf("payment error %s: %s (retryable: %t)",
        e.Code, e.Message, e.Retryable)
}

// Sentinel errors for common cases
var (
    ErrInsufficientFunds = &PaymentError{Code: "insufficient_funds", Retryable: false}
    ErrGatewayTimeout    = &PaymentError{Code: "gateway_timeout", Retryable: true}
    ErrRateLimited       = &PaymentError{Code: "rate_limited", Retryable: true}
    ErrInvalidCard       = &PaymentError{Code: "invalid_card", Retryable: false}
    ErrFraudSuspected    = &PaymentError{Code: "fraud_suspected", Retryable: false}
)

The key fields are Code for programmatic classification, Retryable for retry logic, and GatewayResponse for debugging. With Go 1.13+ error wrapping, we can check these anywhere in the call stack:

func handleChargeResult(err error) {
    if err == nil {
        return // charge succeeded; nothing to classify
    }

    var payErr *PaymentError
    if errors.As(err, &payErr) {
        if payErr.Retryable {
            // Transient failure: enqueue for retry
            retryQueue.Push(payErr.TransactionID)
            return
        }
        // Permanent failure: notify the customer
        notifyDecline(payErr.TransactionID, payErr.Code)
        return
    }
    // Unknown error: flag for manual review
    escalate(err)
}

The errors.As call unwraps through any number of wrapping layers to find our PaymentError. This means intermediate services can add context without destroying the classification.
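To see the unwrapping in action, here's a minimal sketch that wraps a PaymentError twice and then recovers it. The trimmed-down struct, the wrapTwice helper, and the Is method are my own illustrative additions, not part of the type shown above. The Is method is one possible way to let errors.Is match a freshly constructed error against a sentinel by Code rather than by pointer identity.

```go
package main

import (
	"errors"
	"fmt"
)

// PaymentError is a trimmed-down copy of the article's type, for illustration.
type PaymentError struct {
	Code      string
	Retryable bool
}

func (e *PaymentError) Error() string {
	return fmt.Sprintf("payment error %s (retryable: %t)", e.Code, e.Retryable)
}

// Is is an assumed addition: it lets errors.Is treat two *PaymentError
// values as equal when their Code matches, even if they are different pointers.
func (e *PaymentError) Is(target error) bool {
	t, ok := target.(*PaymentError)
	return ok && t.Code == e.Code
}

var ErrGatewayTimeout = &PaymentError{Code: "gateway_timeout", Retryable: true}

// wrapTwice simulates two intermediate layers adding context with %w.
func wrapTwice(err error) error {
	err = fmt.Errorf("gateway client: %w", err)
	return fmt.Errorf("charge service: txn %s: %w", "txn_123", err)
}

// isRetryable reports whether a *PaymentError in the chain is marked retryable.
func isRetryable(err error) bool {
	var payErr *PaymentError
	return errors.As(err, &payErr) && payErr.Retryable
}

// matchesTimeout reports whether the chain contains a gateway timeout.
func matchesTimeout(err error) bool {
	return errors.Is(err, ErrGatewayTimeout)
}

func main() {
	// A freshly built error, not the sentinel pointer itself.
	err := wrapTwice(&PaymentError{Code: "gateway_timeout", Retryable: true})
	fmt.Println(err)
	fmt.Println(isRetryable(err), matchesTimeout(err))
}
```

Without a method like Is, errors.Is compares pointers, so an error built by a classifier would not match the sentinel even when the codes agree.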

The Error Classification Pattern

Raw gateway responses are messy. Every payment processor returns errors differently — Stripe gives you structured error codes, some legacy gateways give you four-digit numeric codes, and others return free-text messages. I normalize everything through a classifier:

type ErrorCategory int

const (
    CategoryRetryable      ErrorCategory = iota
    CategoryPermanent
    CategoryRequiresReview
)

func ClassifyGatewayError(httpStatus int, gatewayCode string) (*PaymentError, ErrorCategory) {
    // Network-level classification
    if httpStatus >= 500 {
        return &PaymentError{
            Code:       "gateway_error",
            Message:    "Gateway returned server error",
            Retryable:  true,
            HTTPStatus: httpStatus,
        }, CategoryRetryable
    }

    // Application-level classification
    switch gatewayCode {
    case "insufficient_funds", "card_declined", "expired_card":
        return &PaymentError{
            Code:      gatewayCode,
            Message:   "Card was declined",
            Retryable: false,
        }, CategoryPermanent

    case "rate_limit", "try_again_later":
        return &PaymentError{
            Code:      gatewayCode,
            Message:   "Gateway rate limited",
            Retryable: true,
        }, CategoryRetryable

    case "invalid_card_number", "invalid_cvv":
        return &PaymentError{
            Code:      gatewayCode,
            Message:   "Invalid card details",
            Retryable: false,
        }, CategoryPermanent

    case "fraud_warning", "risk_threshold":
        return &PaymentError{
            Code:      gatewayCode,
            Message:   "Flagged for fraud review",
            Retryable: false,
        }, CategoryRequiresReview

    default:
        return &PaymentError{
            Code:      "unknown_" + gatewayCode,
            Message:   "Unrecognized gateway response",
            Retryable: false,
        }, CategoryRequiresReview
    }
}

The important detail: unknown errors default to RequiresReview, not Retryable. In payments, when you don't know what happened, the safest thing is to stop and let a human look at it. Retrying an unknown error is how you get duplicate charges.
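For gateways that return numeric codes, one option is a small lookup that translates them into the canonical strings before classification. This is only a sketch: the numeric codes, the legacyCodes table, and the NormalizeLegacyCode name are all invented for illustration, and a real gateway's code table will look different.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyCodes maps a hypothetical legacy gateway's numeric codes to the
// canonical strings used by the classifier. The numbers are made up.
var legacyCodes = map[string]string{
	"1001": "insufficient_funds",
	"1002": "expired_card",
	"2001": "rate_limit",
	"3001": "fraud_warning",
}

// NormalizeLegacyCode returns the canonical code for a raw gateway code.
// Unmapped codes are returned unchanged (trimmed) with ok=false, so a
// downstream classifier can still route them to manual review.
func NormalizeLegacyCode(raw string) (code string, ok bool) {
	raw = strings.TrimSpace(raw)
	code, ok = legacyCodes[raw]
	if !ok {
		return raw, false
	}
	return code, true
}

func main() {
	for _, raw := range []string{"1001", " 2001 ", "9999"} {
		code, known := NormalizeLegacyCode(raw)
		fmt.Printf("%q -> %q (known: %t)\n", raw, code, known)
	}
}
```

Passing unmapped codes through untouched keeps the fail-safe behavior intact: the classifier's default branch still sees them as unrecognized.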

[Figure: gateway error classification flow. A gateway response with HTTP 5xx is retryable. For HTTP 4xx, check the error code: insufficient funds is a permanent decline, rate limited is retryable, invalid card is permanent, and a fraud flag requires review.]

Three error categories cover 95% of payment failures.

Error Type	Retryable?	Action	Example
Timeout	Yes	Retry with idempotency key	Gateway didn't respond within 30s
Decline	No	Notify customer, stop retries	Insufficient funds, expired card
Rate Limit	Yes	Backoff and retry after delay	HTTP 429 from gateway API
Network Error	Yes	Retry with circuit breaker	DNS resolution failure, TCP reset
Fraud Block	No	Escalate to fraud review queue	Risk score exceeded threshold

Wrapping Errors Without Losing Context

Go's fmt.Errorf with %w is great for adding context, but in financial systems you have to be deliberate about what context you add. Transaction IDs, gateway names, and amounts are all useful for debugging. Card numbers, CVVs, and account details are absolutely not. PCI DSS is clear on this: sensitive authentication data such as CVVs must not be stored after authorization, and card numbers (PANs) must be unreadable anywhere they are stored, logs included.

Here's the pattern I follow:

func chargeCard(ctx context.Context, req ChargeRequest) error {
    resp, err := gateway.Charge(ctx, req)
    if err != nil {
        // Safe context: transaction ID, gateway name, amount
        // Never include: card number, CVV, full account number
        return fmt.Errorf(
            "charge failed for txn %s via %s (amount: %d %s): %w",
            req.TransactionID,
            req.GatewayName,
            req.Amount,
            req.Currency,
            classifyError(err, resp),
        )
    }
    return nil
}

// What NOT to do:
// return fmt.Errorf("charge failed for card %s: %w", req.CardNumber, err)
// This puts PAN data in your logs. Don't do it.

The wrapped error preserves the full chain. Any caller can still use errors.As to extract the PaymentError and check the Retryable flag, but now the error message also carries the operational context you need when you're debugging at 2 AM.

One thing I've found useful: create a helper that masks sensitive data if it accidentally gets passed in. It's a safety net, not a primary defense, but it's caught real mistakes in code review.
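A minimal sketch of such a helper might look like the following. The MaskPAN name and the regular expression are my own assumptions, not the helper from our codebase. The pattern is a blunt heuristic: it masks any run of 13 to 19 digits (optionally separated by single spaces or hyphens), so it will also hide long numeric IDs.

```go
package main

import (
	"fmt"
	"regexp"
)

// panLike matches runs of 13 to 19 digits, optionally separated by single
// spaces or hyphens. It is a heuristic and will also match long numeric IDs.
var panLike = regexp.MustCompile(`(?:\d[ -]?){12,18}\d`)

// MaskPAN replaces anything that looks like a card number with a placeholder.
func MaskPAN(s string) string {
	return panLike.ReplaceAllString(s, "[REDACTED]")
}

func main() {
	msg := "charge failed for card 4111 1111 1111 1111: timeout"
	fmt.Println(MaskPAN(msg))
}
```

Running error messages through something like this just before they are logged gives a last line of defense, at the cost of occasionally redacting a harmless number.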

Error Handling in Retry Logic

Retry logic in payment systems has to be smarter than a simple loop. You need to respect the Retryable flag, implement exponential backoff to avoid hammering a struggling gateway, and set a hard ceiling on attempts. Here's the retry function we use:

func RetryPayment(ctx context.Context, txnID string, fn func() error) error {
    maxRetries := 3
    baseDelay := 500 * time.Millisecond

    var lastErr error
    for attempt := 0; attempt <= maxRetries; attempt++ {
        lastErr = fn()
        if lastErr == nil {
            return nil
        }

        var payErr *PaymentError
        if errors.As(lastErr, &payErr) && !payErr.Retryable {
            // Permanent failure — retrying won't help
            return fmt.Errorf(
                "permanent failure on attempt %d for txn %s: %w",
                attempt+1, txnID, lastErr,
            )
        }

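        if attempt < maxRetries {
            // Exponential backoff: 500ms, 1s, 2s
            delay := baseDelay * time.Duration(1<<attempt)

            // The rest of this function is reconstructed from the behavior
            // described in the text; the error messages are illustrative.
            // Stop waiting as soon as the caller cancels or times out.
            select {
            case <-ctx.Done():
                return fmt.Errorf("retry canceled for txn %s: %w", txnID, ctx.Err())
            case <-time.After(delay):
            }
        }
    }

    return fmt.Errorf(
        "giving up on txn %s after %d attempts: %w",
        txnID, maxRetries+1, lastErr,
    )
}
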
A few things to note. The function respects context cancellation: if the upstream caller times out or cancels, we stop retrying immediately instead of burning through attempts. The backoff is exponential but capped at three retries, because in payments, if it hasn't worked after three tries, something is genuinely wrong and you need human eyes on it.
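The delay schedule and the cancellation behavior described above can be sketched in isolation. The backoffDelay and waitOrCancel names are hypothetical helpers I'm using for illustration, not functions from our services.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// backoffDelay returns base * 2^attempt for a zero-based attempt index.
func backoffDelay(base time.Duration, attempt int) time.Duration {
	return base * time.Duration(1<<attempt)
}

// waitOrCancel sleeps for d, returning early with ctx.Err() if ctx is done.
func waitOrCancel(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}

func main() {
	// Delays before the first, second, and third retry.
	for attempt := 0; attempt < 3; attempt++ {
		fmt.Println(backoffDelay(500*time.Millisecond, attempt))
	}

	// An already-canceled context makes the wait return immediately.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	fmt.Println(waitOrCancel(ctx, time.Hour))
}
```

Using a timer with an explicit Stop, rather than a bare sleep, means a canceled request doesn't leave a goroutine parked for the remainder of the delay.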

    
Always pair retry logic with idempotency keys. If you're retrying a charge, the gateway needs to know it's the same charge attempt, not a new one. Without idempotency keys, retry logic is just a double-charge generator.
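Here's a toy sketch of why the key matters. The fakeGateway type and the idempotencyKey helper are invented stand-ins for illustration. A real gateway does the deduplication on its side (Stripe, for example, accepts an Idempotency-Key request header), and the important part on your side is that every retry of one logical charge reuses the same key.

```go
package main

import "fmt"

// fakeGateway is a stand-in for a real gateway: it remembers which
// idempotency keys it has already processed.
type fakeGateway struct {
	seen    map[string]bool
	charges int // number of charges actually applied
}

func newFakeGateway() *fakeGateway {
	return &fakeGateway{seen: make(map[string]bool)}
}

// Charge applies the charge at most once per idempotency key.
func (g *fakeGateway) Charge(key string, amountCents int64) {
	if g.seen[key] {
		return // replay of an earlier attempt: do not charge again
	}
	g.seen[key] = true
	g.charges++
}

// idempotencyKey derives one stable key per logical charge, so every
// retry of the same transaction sends the same key.
func idempotencyKey(txnID string) string {
	return "charge-" + txnID
}

func main() {
	g := newFakeGateway()
	key := idempotencyKey("txn_42")

	// Three attempts for the same transaction, e.g. after two timeouts.
	for attempt := 0; attempt < 3; attempt++ {
		g.Charge(key, 1999)
	}
	fmt.Println("charges applied:", g.charges)
}
```

The key has to be derived from something stable, like the transaction ID, and never regenerated per attempt, or the gateway sees each retry as a brand-new charge.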
    

Audit Trail Errors

In financial systems, every error is a potential compliance event. Regulators and auditors want to know what happened, when, and what the system did about it. But PCI DSS means you can't just dump everything into your logs. You need structured logging that captures operational context while scrubbing sensitive data.

Here's the pattern I use with structured logging (we use slog from the standard library, but this works with zerolog or zap too):

func LogPaymentError(logger *slog.Logger, err error, txnID string) {
    var payErr *PaymentError
    if errors.As(err, &payErr) {
        logger.Error("payment processing failed",
            // Operational context — safe for logs
            slog.String("transaction_id", txnID),
            slog.String("error_code", payErr.Code),
            slog.Bool("retryable", payErr.Retryable),
            slog.Int("http_status", payErr.HTTPStatus),
            slog.String("gateway_response", payErr.GatewayResponse),
            slog.Time("timestamp", time.Now().UTC()),

            // Classification for alerting
            slog.String("category", classifyForAlert(payErr)),

            // Never log these:
            // slog.String("card_number", req.CardNumber),  // PCI violation
            // slog.String("cvv", req.CVV),                 // PCI violation
            // slog.String("account_number", req.Account),  // PII risk
        )
        return
    }

    // Unclassified error — log and escalate
    logger.Error("unclassified payment error",
        slog.String("transaction_id", txnID),
        slog.String("error", err.Error()),
        slog.String("category", "requires_review"),
        slog.Time("timestamp", time.Now().UTC()),
    )
}

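func classifyForAlert(err *PaymentError) string {
    if err.Retryable {
        return "retryable"
    }
    // Match every code the classifier routes to review: fraud flags
    // and anything it tagged with the "unknown_" prefix.
    switch {
    case err.Code == "fraud_suspected",
        err.Code == "fraud_warning",
        err.Code == "risk_threshold",
        strings.HasPrefix(err.Code, "unknown_"):
        return "requires_review"
    }
    return "permanent"
}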

The structured fields make these errors searchable. When compliance asks "show me all fraud-flagged transactions from last Tuesday," you can query category=requires_review AND error_code=fraud_suspected instead of grepping through unstructured log lines. That's the difference between a 30-second query and a two-hour investigation.

I also route different categories to different alerting channels. Retryable errors go to a dashboard for trend monitoring, since a spike in timeouts might mean the gateway is having issues. Permanent declines are normal business flow. But requires_review errors page the on-call engineer, because those are the ones that can turn into real problems if they sit unattended.
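The routing itself can be as small as a switch on the category string. This is a sketch with hypothetical names (AlertChannel, RouteAlert, and the channel constants); sending unrecognized categories to the pager is my assumption, in keeping with the "when in doubt, get a human" rule.

```go
package main

import "fmt"

// AlertChannel names a destination for an error event. The channel
// names here are illustrative.
type AlertChannel string

const (
	ChannelDashboard AlertChannel = "dashboard" // trend monitoring
	ChannelNone      AlertChannel = "none"      // normal business flow
	ChannelPager     AlertChannel = "pager"     // page the on-call engineer
)

// RouteAlert maps a log category to an alerting channel. Anything that
// is not explicitly known is treated like requires_review.
func RouteAlert(category string) AlertChannel {
	switch category {
	case "retryable":
		return ChannelDashboard
	case "permanent":
		return ChannelNone
	default:
		return ChannelPager
	}
}

func main() {
	for _, c := range []string{"retryable", "permanent", "requires_review"} {
		fmt.Printf("%s -> %s\n", c, RouteAlert(c))
	}
}
```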

Putting It All Together

These patterns aren't complex individually, but they compound. Typed errors feed the classifier. The classifier informs retry logic. Retry logic generates structured audit logs. Each layer builds on the one below it, and the result is a system where you can confidently answer "what happened to this transaction?" at any point in the pipeline.

If you're building payment systems in Go, start with the error type. Get that right, and the rest follows naturally. The PaymentError struct I showed above has been stable in our codebase for over two years: the classification logic changes as we integrate new gateways, but the core type hasn't needed modification. That's a good sign that the abstraction is at the right level.

    
      
    

The code examples in this article are simplified for clarity and do not represent production-ready implementations. Always conduct thorough testing, security review, and compliance validation before deploying error handling logic in financial systems. PCI DSS requirements vary by merchant level; consult your QSA for specific guidance on logging and data handling.