The 14-Hour Grep Session
A merchant reported that a $2,340 refund never arrived. Simple enough, right? Pull up the transaction ID, trace the flow, find the failure. Except our logs looked like this:
2025-11-14 03:22:41 processing refund for merchant acme_corp amount 2340
2025-11-14 03:22:41 calling provider API...
2025-11-14 03:22:42 provider returned status 200
2025-11-14 03:22:42 updating ledger
2025-11-14 03:22:43 error: context deadline exceeded
No transaction ID in the log line. No correlation ID linking the request across services. No structured fields I could query. Just free-text strings that some engineer wrote with fmt.Printf two years ago. I was grepping through 47GB of raw text files, trying to match timestamps across six services that each logged in a different time zone.
The actual bug? The ledger service had a 1-second timeout that was too aggressive for refunds over $1,000 (which required an extra fraud check). Took me 14 hours to find what should have been a 5-minute Kibana query.
Setting Up slog for Payment Services
Go 1.21 shipped log/slog in the standard library, and it's genuinely good enough for production payment systems. No need for zerolog or zap anymore — unless you're squeezing out every nanosecond of allocation overhead, which you probably aren't if you're making network calls to Visa.
package main

import (
	"log/slog"
	"os"
)

func initLogger() *slog.Logger {
	handler := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
		Level: slog.LevelInfo,
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			// Mask card numbers at the handler level
			if a.Key == "card_number" {
				masked := maskPAN(a.Value.String())
				return slog.String("card_number", masked)
			}
			return a
		},
	})
	return slog.New(handler)
}

// maskPAN keeps the BIN (first 6 digits) and last 4, masking the middle.
func maskPAN(pan string) string {
	if len(pan) < 10 {
		return "****"
	}
	return pan[:6] + "******" + pan[len(pan)-4:]
}
Every payment log line in our system includes these fields. No exceptions:
logger.InfoContext(ctx, "payment processed",
	slog.String("transaction_id", txn.ID),
	slog.String("merchant_id", txn.MerchantID),
	slog.Int64("amount", txn.Amount),
	slog.String("currency", txn.Currency),
	slog.String("status", "success"),
	slog.String("provider", "stripe"),
	slog.Duration("latency", elapsed),
	slog.String("correlation_id", correlationID(ctx)),
)
Key rule: Every log line that touches money must have transaction_id, merchant_id, amount, currency, and status. We enforce this with a custom linter that fails CI if a payment log call is missing any of these fields.
Correlation IDs: The Glue Across Services
The single most impactful change we made was propagating a correlation ID through every service in the payment chain. When a request hits our API gateway, it generates a UUID and stuffs it into the context. Every downstream service extracts it and includes it in every log line.
// Middleware that extracts or generates correlation ID
func CorrelationMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		corrID := r.Header.Get("X-Correlation-ID")
		if corrID == "" {
			corrID = uuid.NewString()
		}
		ctx := context.WithValue(r.Context(), corrIDKey, corrID)
		w.Header().Set("X-Correlation-ID", corrID)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
Now when that merchant calls about a missing refund, I type correlation_id: "abc-123" into Kibana and see every log line from every service, in order, with all the context I need. What used to take 14 hours takes 30 seconds.
Log Levels That Actually Make Sense
Most teams overthink log levels. Here's the strategy we settled on after a lot of trial and error:
- ERROR — money was lost, stuck, or a customer is impacted right now. Pages someone at 3am.
- WARN — something unexpected happened but we recovered. Retry succeeded, fallback kicked in, timeout was close. Review in the morning.
- INFO — normal payment lifecycle events. Transaction created, authorized, captured, settled. This is your audit trail.
- DEBUG — request/response bodies, internal state transitions. Off in production unless you're actively debugging.
PCI DSS warning: Never log full card numbers (PANs), CVVs, PINs, or full magnetic stripe data. Even at DEBUG level. Even in staging. We mask PANs to show only the first 6 and last 4 digits (BIN + last four), and we strip CVV fields entirely before they reach the logger. Our CI pipeline runs a scanner that flags any log statement containing fields named cvv, cvc, pin, or security_code.
Shipping Logs to ELK and Datadog
Since slog outputs JSON to stdout, the integration with any log aggregator is straightforward. In our Kubernetes setup, Fluentd picks up container stdout, parses it with its built-in JSON parser (a one-line config, since the payload is already structured), and ships it to Elasticsearch. We keep 30 days of hot storage and 1 year of cold storage in S3 for compliance.
For Datadog, we use their Fluentd output plugin. The structured fields become facets automatically — I can build dashboards that show payment failure rates by merchant, average latency by provider, and error distributions by currency. All from the same log data.
The Index Template That Saved Us
One thing that bit us early: Elasticsearch was auto-detecting amount as a string because the first document it saw had the value "2340" instead of 2340. After that, all numeric aggregations on amount were broken. We now ship an explicit index template that maps amount to long, latency_ms to float, and timestamp to date. Lesson learned: always define your mappings upfront for financial data.
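For reference, a minimal request body for PUT _index_template/payments-logs along those lines (the template name, index pattern, and keyword fields are illustrative; only the amount, latency_ms, and timestamp mappings come from our setup):

```json
{
  "index_patterns": ["payments-*"],
  "template": {
    "mappings": {
      "properties": {
        "amount":         { "type": "long" },
        "latency_ms":     { "type": "float" },
        "timestamp":      { "type": "date" },
        "transaction_id": { "type": "keyword" },
        "merchant_id":    { "type": "keyword" },
        "currency":       { "type": "keyword" }
      }
    }
  }
}
```

Mapping identifier-like fields as keyword (rather than analyzed text) keeps exact-match filters like merchant_id: "acme_corp" fast and predictable.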
The bottom line: Structured logging isn't a nice-to-have for payment systems — it's infrastructure. Every fmt.Println in your payment code is a future incident that takes 10x longer to debug. Migrate to slog, enforce mandatory fields, mask sensitive data at the handler level, and your on-call engineers will thank you at 3am.
References
- Go Standard Library — log/slog Package Documentation
- PCI Security Standards Council — Document Library
- Elasticsearch — Index Templates Documentation
- Fluentd — Elasticsearch Output Plugin
- OpenTelemetry — Logs Specification