April 11, 2026

Tuning Go's HTTP Client for External Payment API Calls

Go's default http.Client ships with no timeout and a connection pool that silently leaks under pressure. Here's how to configure it properly for payment APIs where every millisecond — and every dropped connection — costs real money.

The Default Client Is a Ticking Time Bomb

If you've ever written http.Get("https://api.stripe.com/v1/charges") in production code, you've shipped a bug. Go's http.DefaultClient has no timeout. None. A payment provider that hangs during a TLS handshake will block your goroutine forever, and you won't know until your service runs out of memory or file descriptors.

I learned this the hard way. We had a payment service running smoothly for months — until one of our providers started responding slowly during a regional outage. Within 20 minutes, our connection pool was exhausted, goroutines were piling up, and we were dropping legitimate payment requests across all providers, not just the degraded one.

The default http.Transport also has some defaults that don't make sense for payment workloads. MaxIdleConnsPerHost defaults to 2. If you're sending 500 requests per second to Stripe's API, you're tearing down and rebuilding TLS connections constantly. Each new TLS handshake to a payment API adds 50-150ms of latency you didn't need.

Understanding the Timeout Layers

Go's HTTP stack has multiple timeout boundaries, and they interact in ways that aren't immediately obvious. Here's how a single payment API request flows through them:

HTTP Request Timeout Layers (typical budget for a healthy call):

    Phase            Typical     Budget window
    DNS Lookup       ~5ms        0–50ms
    TCP Connect      ~30ms       50–200ms
    TLS Handshake    ~80ms       200–500ms
    Send Request     ~10ms       500–600ms
    Resp. Headers    ~200ms      600–2000ms
    Resp. Body       variable    2000ms+

    DialContext timeout covers DNS lookup + TCP connect.
    TLSHandshakeTimeout covers the TLS handshake.
    client.Timeout covers the entire request.

The critical thing to understand: client.Timeout covers the entire lifecycle from dial to reading the last byte of the response body. Transport.TLSHandshakeTimeout only covers the TLS negotiation. And context.Context deadlines override everything — if your context expires, the request is cancelled regardless of other timeout settings.

For payment APIs, I set the overall client.Timeout to 30 seconds (payment providers can be slow during settlement windows), but I keep the dial and TLS timeouts tight at 5 and 10 seconds respectively. If we can't even establish a connection in 5 seconds, something is seriously wrong and we should fail fast.

A Properly Configured Payment HTTP Client

Here's the client configuration we run in production for all external payment provider calls:

package payment

import (
    "context"
    "crypto/tls"
    "fmt"
    "io"
    "net"
    "net/http"
    "time"
)

// NewPaymentHTTPClient returns an *http.Client tuned for
// external payment API calls (Stripe, Adyen, Checkout.com, etc).
func NewPaymentHTTPClient() *http.Client {
    transport := &http.Transport{
        DialContext: (&net.Dialer{
            Timeout:   5 * time.Second,  // TCP connection timeout
            KeepAlive: 30 * time.Second, // TCP keep-alive probe interval
        }).DialContext,

        TLSClientConfig: &tls.Config{
            MinVersion: tls.VersionTLS12, // PCI DSS requirement
        },
        TLSHandshakeTimeout: 10 * time.Second,

        MaxIdleConns:        100, // Total idle connections across all hosts
        MaxIdleConnsPerHost: 20,  // Per-host idle pool (default is 2!)
        MaxConnsPerHost:     50,  // Hard cap per host — prevents stampede
        IdleConnTimeout:     90 * time.Second,

        ResponseHeaderTimeout: 15 * time.Second, // Time to wait for headers
        ExpectContinueTimeout: 1 * time.Second,

        ForceAttemptHTTP2: true,
    }

    return &http.Client{
        Transport: transport,
        Timeout:   30 * time.Second, // Hard ceiling for entire request
    }
}

A few things worth calling out. MaxIdleConnsPerHost at 20 means we keep up to 20 warm connections per payment provider. This avoids the constant TLS handshake overhead that kills your p99 latency. MaxConnsPerHost at 50 is the safety valve — even during a traffic spike, we won't open more than 50 concurrent connections to a single provider. This protects both us and the provider from connection floods. Construct this client once at startup and share it: http.Client is safe for concurrent use, and creating a fresh client per request throws away the connection pool entirely.

The Response Body Drain Problem

This one catches almost everyone. When you're done reading a response, you need to fully drain and close the body. If you don't, the underlying TCP connection can't be returned to the pool — it gets discarded and a new one has to be established for the next request.

// drainAndClose fully reads and closes the response body so the
// underlying connection can be reused by the connection pool.
func drainAndClose(body io.ReadCloser) {
    // Read up to 8KB of remaining body to allow connection reuse.
    // We cap it to avoid reading a massive unexpected response.
    io.CopyN(io.Discard, body, 8192)
    body.Close()
}

// Usage in a payment call. httpReq is the *http.Request built from
// ctx and req — URL, body, and header construction elided here.
func (c *Client) CreateCharge(ctx context.Context, req ChargeRequest) (*Charge, error) {
    resp, err := c.httpClient.Do(httpReq)
    if err != nil {
        return nil, fmt.Errorf("charge request failed: %w", err)
    }
    defer drainAndClose(resp.Body)

    // ... decode response
}

The io.CopyN with a limit is intentional. You don't want to read an unbounded response body if the provider sends back something unexpected. 8KB is more than enough for any payment API error response, and it lets the connection return to the pool cleanly.

Retry Strategy for Idempotent Operations

Payment APIs are inherently dangerous to retry — you don't want to charge a customer twice. But most providers support idempotency keys, and network-level failures (timeouts, connection resets) are safe to retry because the request may never have reached the provider.

func (c *Client) doWithRetry(ctx context.Context, req *http.Request) (*http.Response, error) {
    maxRetries := 3
    baseDelay := 200 * time.Millisecond

    var lastErr error
    for attempt := 0; attempt <= maxRetries; attempt++ {
        if attempt > 0 {
            // Exponential backoff: 200ms, 400ms, 800ms.
            delay := baseDelay * time.Duration(1<<(attempt-1))
            select {
            case <-ctx.Done():
                return nil, ctx.Err()
            case <-time.After(delay):
            }
        }

        // Note: a request with a body must set req.GetBody so the
        // body can be replayed on each attempt.
        resp, err := c.httpClient.Do(req)
        if err != nil {
            lastErr = err // network-level failure: timeout, reset, etc.
            continue
        }

        // 502/503/504: the gateway answered, the provider likely
        // never processed the request — safe to retry.
        if resp.StatusCode >= 502 {
            drainAndClose(resp.Body)
            lastErr = fmt.Errorf("provider returned %d", resp.StatusCode)
            continue
        }

        return resp, nil
    }

    return nil, fmt.Errorf("all %d attempts failed: %w", maxRetries+1, lastErr)
}

The exponential backoff goes 200ms, 400ms, 800ms. For payment APIs, I keep the delays short — if a provider is down, I'd rather fail fast and let the caller handle it than hold a customer's checkout flow hostage for 30 seconds of retries. Always respect ctx.Done() between retries so upstream timeouts propagate correctly.

Important: Only retry when you have an idempotency key set in the request headers. If the request reached the provider and you got a timeout reading the response, retrying without an idempotency key risks duplicate charges. Stripe uses Idempotency-Key, Adyen uses a reference field, and Checkout.com uses Cko-Idempotency-Key. Always set these before entering the retry loop.

Production Incident: Connection Pool Exhaustion

Incident timeline — March 2025: One of our payment providers started responding with 15-second delays instead of the usual 200ms. We were using the default MaxIdleConnsPerHost of 2 and had no MaxConnsPerHost limit. Within 12 minutes, our service had opened 3,400+ connections to a single host. File descriptors hit the OS limit, and new connections to all providers started failing with socket: too many open files. Every payment route went down — not just the slow provider. Recovery took 8 minutes after we deployed a hotfix with the tuned transport settings above.

The root cause was straightforward: slow responses meant connections were held longer, the tiny idle pool couldn't absorb the load, so Go kept opening new connections. Without MaxConnsPerHost, there was no ceiling. The fix was the configuration shown above, plus adding ResponseHeaderTimeout so we'd bail on slow providers before they could accumulate thousands of connections.

    Metric             Before     After      Note
    P99 Latency        12.4s      380ms      97% reduction
    Error Rate         18.3%      0.05%      near zero
    Open Connections   3,400+     ~45        per provider

After deploying the tuned client, the numbers spoke for themselves. The MaxConnsPerHost cap meant that even when a provider degraded again two weeks later, our connection count stayed flat at 50 and the circuit breaker tripped cleanly without affecting other providers.

Disclaimer

The code examples and configurations in this article are based on personal production experience and are provided for educational purposes. Timeout values, connection pool sizes, and retry strategies should be tuned to your specific traffic patterns, provider SLAs, and infrastructure constraints. Always load test configuration changes before deploying to production, and consult your payment provider's documentation for their recommended integration practices.