April 11, 2026

Tuning Go's HTTP Client for External Payment API Calls

Go's default http.Client ships with no timeout and a connection pool that silently leaks under pressure. Here's how to configure it properly for payment APIs where every millisecond — and every dropped connection — costs real money.

The Default Client Is a Ticking Time Bomb

If you've ever written http.Get("https://api.stripe.com/v1/charges") in production code, you've shipped a bug. Go's http.DefaultClient has no timeout. None. A payment provider that hangs during a TLS handshake will block your goroutine forever, and you won't know until your service runs out of memory or file descriptors.

I learned this the hard way. We had a payment service running smoothly for months — until one of our providers started responding slowly during a regional outage. Within 20 minutes, our connection pool was exhausted, goroutines were piling up, and we were dropping legitimate payment requests across all providers, not just the degraded one.

The default http.Transport also has some defaults that don't make sense for payment workloads. MaxIdleConnsPerHost defaults to 2. If you're sending 500 requests per second to Stripe's API, you're tearing down and rebuilding TLS connections constantly. Each new TLS handshake to a payment API adds 50-150ms of latency you didn't need.

Understanding the Timeout Layers

Go's HTTP stack has multiple timeout boundaries, and they interact in ways that aren't immediately obvious. Here's how a single payment API request flows through them:

HTTP Request Timeout Layers (typical budget for a healthy call):

    Phase            Typical     Budget window
    DNS Lookup       ~5ms        0–50ms
    TCP Connect      ~30ms       50–200ms
    TLS Handshake    ~80ms       200–500ms
    Send Request     ~10ms       500–600ms
    Resp. Headers    ~200ms      600–2000ms
    Resp. Body       variable    2000ms+

    DialContext timeout covers DNS lookup + TCP connect.
    TLSHandshakeTimeout covers the TLS handshake.
    client.Timeout covers the entire request.

The critical thing to understand: client.Timeout covers the entire lifecycle from dial to reading the last byte of the response body. Transport.TLSHandshakeTimeout only covers the TLS negotiation. And context.Context deadlines override everything — if your context expires, the request is cancelled regardless of other timeout settings.

For payment APIs, I set the overall client.Timeout to 30 seconds (payment providers can be slow during settlement windows), but I keep the dial and TLS timeouts tight at 5 and 10 seconds respectively. If we can't even establish a connection in 5 seconds, something is seriously wrong and we should fail fast.

A Properly Configured Payment HTTP Client

Here's the client configuration we run in production for all external payment provider calls:

package payment

import (
    "context"
    "crypto/tls"
    "fmt"
    "io"
    "net"
    "net/http"
    "time"
)

// NewPaymentHTTPClient returns an *http.Client tuned for
// external payment API calls (Stripe, Adyen, Checkout.com, etc).
func NewPaymentHTTPClient() *http.Client {
    transport := &http.Transport{
        DialContext: (&net.Dialer{
            Timeout:   5 * time.Second,  // TCP connection timeout
            KeepAlive: 30 * time.Second, // TCP keep-alive probe interval
        }).DialContext,

        TLSClientConfig: &tls.Config{
            MinVersion: tls.VersionTLS12, // PCI DSS requirement
        },
        TLSHandshakeTimeout: 10 * time.Second,

        MaxIdleConns:        100, // Total idle connections across all hosts
        MaxIdleConnsPerHost: 20,  // Per-host idle pool (default is 2!)
        MaxConnsPerHost:     50,  // Hard cap per host — prevents stampede
        IdleConnTimeout:     90 * time.Second,

        ResponseHeaderTimeout: 15 * time.Second, // Time to wait for headers
        ExpectContinueTimeout: 1 * time.Second,

        ForceAttemptHTTP2: true,
    }

    return &http.Client{
        Transport: transport,
        Timeout:   30 * time.Second, // Hard ceiling for entire request
    }
}

A few things worth calling out. MaxIdleConnsPerHost at 20 means we keep up to 20 warm connections per payment provider. This avoids the constant TLS handshake overhead that kills your p99 latency. MaxConnsPerHost at 50 is the safety valve — even during a traffic spike, we won't open more than 50 concurrent connections to a single provider. This protects both us and the provider from connection floods. Construct this client once at startup and share it: http.Client is safe for concurrent use, and creating a fresh client per request throws away the connection pool entirely.

The Response Body Drain Problem

This one catches almost everyone. When you're done reading a response, you need to fully drain and close the body. If you don't, the underlying TCP connection can't be returned to the pool — it gets discarded and a new one has to be established for the next request.

// drainAndClose fully reads and closes the response body so the
// underlying connection can be reused by the connection pool.
func drainAndClose(body io.ReadCloser) {
    // Read up to 8KB of remaining body to allow connection reuse.
    // We cap it to avoid reading a massive unexpected response.
    io.CopyN(io.Discard, body, 8192)
    body.Close()
}

// Usage in a payment call. httpReq is the *http.Request built from
// ctx and req — URL, body, and header construction elided here.
func (c *Client) CreateCharge(ctx context.Context, req ChargeRequest) (*Charge, error) {
    resp, err := c.httpClient.Do(httpReq)
    if err != nil {
        return nil, fmt.Errorf("charge request failed: %w", err)
    }
    defer drainAndClose(resp.Body)

    // ... decode response
}

The io.CopyN with a limit is intentional. You don't want to read an unbounded response body if the provider sends back something unexpected. 8KB is more than enough for any payment API error response, and it lets the connection return to the pool cleanly.

Retry Strategy for Idempotent Operations

Payment APIs are inherently dangerous to retry — you don't want to charge a customer twice. But most providers support idempotency keys, and network-level failures (timeouts, connection resets) are safe to retry because the request may never have reached the provider.

func (c *Client) doWithRetry(ctx context.Context, req *http.Request) (*http.Response, error) {
    maxRetries := 3
    baseDelay := 200 * time.Millisecond

    var lastErr error
    for attempt := 0; attempt <= maxRetries; attempt++ {
        if attempt > 0 {
            // Exponential backoff: 200ms, 400ms, 800ms.
            delay := baseDelay * time.Duration(1<<(attempt-1))
            select {
            case <-ctx.Done():
                return nil, ctx.Err()
            case <-time.After(delay):
            }
        }

        // Note: a request with a body must set req.GetBody so the
        // body can be replayed on each attempt.
        resp, err := c.httpClient.Do(req)
        if err != nil {
            lastErr = err // network-level failure: timeout, reset, etc.
            continue
        }

        // 502/503/504: the gateway answered, the provider likely
        // never processed the request — safe to retry.
        if resp.StatusCode >= 502 {
            drainAndClose(resp.Body)
            lastErr = fmt.Errorf("provider returned %d", resp.StatusCode)
            continue
        }

        return resp, nil
    }

    return nil, fmt.Errorf("all %d attempts failed: %w", maxRetries+1, lastErr)
}

The exponential backoff goes 200ms, 400ms, 800ms. For payment APIs, I keep the delays short — if a provider is down, I'd rather fail fast and let the caller handle it than hold a customer's checkout flow hostage for 30 seconds of retries. Always respect ctx.Done() between retries so upstream timeouts propagate correctly.

Important: Only retry when you have an idempotency key set in the request headers. If the request reached the provider and you got a timeout reading the response, retrying without an idempotency key risks duplicate charges. Stripe uses Idempotency-Key, Adyen uses a reference field, and Checkout.com uses Cko-Idempotency-Key. Always set these before entering the retry loop.

Production Incident: Connection Pool Exhaustion

Incident timeline — March 2025: One of our payment providers started responding with 15-second delays instead of the usual 200ms. We were using the default MaxIdleConnsPerHost of 2 and had no MaxConnsPerHost limit. Within 12 minutes, our service had opened 3,400+ connections to a single host. File descriptors hit the OS limit, and new connections to all providers started failing with socket: too many open files. Every payment route went down — not just the slow provider. Recovery took 8 minutes after we deployed a hotfix with the tuned transport settings above.

The root cause was straightforward: slow responses meant connections were held longer, the tiny idle pool couldn't absorb the load, so Go kept opening new connections. Without MaxConnsPerHost, there was no ceiling. The fix was the configuration shown above, plus adding ResponseHeaderTimeout so we'd bail on slow providers before they could accumulate thousands of connections.

    Metric             Before     After      Note
    P99 Latency        12.4s      380ms      97% reduction
    Error Rate         18.3%      0.05%      near zero
    Open Connections   3,400+     ~45        per provider

After deploying the tuned client, the numbers spoke for themselves. The MaxConnsPerHost cap meant that even when a provider degraded again two weeks later, our connection count stayed flat at 50 and the circuit breaker tripped cleanly without affecting other providers.

Disclaimer

The code examples and configurations in this article are based on personal production experience and are provided for educational purposes. Timeout values, connection pool sizes, and retry strategies should be tuned to your specific traffic patterns, provider SLAs, and infrastructure constraints. Always load test configuration changes before deploying to production, and consult your payment provider's documentation for their recommended integration practices.