April 5, 2026

Building Real-Time Fraud Detection for Payment Systems

Rule engines get you started. Velocity checks catch the patterns. ML scoring handles the edge cases. Here's how to wire them together without tanking your latency budget.

Fraud Detection Is Not a Feature You Bolt On Later

I've seen teams treat fraud detection as a phase-two concern — something to figure out after launch. That's a mistake that gets expensive fast. A single undetected fraud ring can burn through six figures in chargebacks before anyone notices the pattern. And once your payment processor flags your chargeback ratio above 1%, you're looking at penalty programs, higher processing fees, or losing your merchant account entirely.

The reality is that fraud detection needs to be part of your transaction flow from day one. Not as a monolith — you don't need a perfect system at launch — but as a layered architecture that you can iterate on. Start simple, add complexity where the data tells you to.

Latency target per check: < 50ms
False positive rate goal: 0.1%
Fraud catch rate target: 99.7%

Those numbers aren't aspirational — they're the baseline your payment partners expect. Miss the latency target and your checkout conversion drops. Let false positives creep up and you're blocking legitimate customers. Let fraud through and you're eating chargebacks.

The Three-Layer Architecture

Every fraud detection system I've built or worked on follows roughly the same shape: a rule engine for known patterns, velocity checks for behavioral anomalies, and an ML model for the stuff that's hard to write rules for. They run in sequence, and each layer can short-circuit the pipeline if the signal is strong enough.

Transaction → Rule Engine → Velocity Check → ML Score → Approve / Review / Block

The key insight: each layer is cheap on its own. Rules are just conditionals. Velocity checks are Redis lookups. The ML model is a single inference call. Stacked together, they give you coverage that no single approach can match.
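As a rough sketch of that short-circuit behavior (the layer functions and thresholds here are illustrative, not a specific library's API), each layer can be modeled as a callable that returns a score plus an optional early verdict:

```python
from typing import Callable, Optional, Tuple

# Each layer returns (score, early_verdict); a non-None verdict
# short-circuits the rest of the pipeline.
Layer = Callable[[dict], Tuple[float, Optional[str]]]

def run_pipeline(txn: dict, layers: list) -> Tuple[float, str]:
    total = 0.0
    for layer in layers:
        score, verdict = layer(txn)
        total += score
        if verdict is not None:          # strong signal: stop early
            return total, verdict
    return total, "APPROVE" if total < 20 else "REVIEW"

# Illustrative layers with made-up fields and scores:
def rule_engine(txn):
    if txn.get("card_on_denylist"):
        return 100.0, "BLOCK"            # hard rule: block immediately
    return 5.0, None

def velocity_check(txn):
    return (30.0, None) if txn.get("attempts_last_hour", 0) > 5 else (0.0, None)
```

A denylisted card never reaches the velocity or ML layers, which is what keeps the common case cheap.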

Layer 1: The Rule Engine

Rules are your first line of defense, and honestly, they catch more fraud than most people expect. A well-maintained rule engine handles 60-70% of fraud attempts before anything else even runs.

Start with these baseline rules (standard first steps, not an exhaustive list):

- Block transactions that fail CVV or AVS verification outright
- Flag mismatches between the card BIN country and the customer's IP geolocation
- Flag orders where the shipping address is far from the billing address
- Block known-bad cards, emails, devices, and IPs from your denylist
- Flag unusually high-value first orders from brand-new accounts

Structure your rules as independent, composable units. Each rule returns a score contribution and a reason code. Don't build a giant if-else tree — you'll regret it the first time you need to disable a rule at 2 AM during an incident.
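A minimal sketch of that shape (the rule names, fields, and scores below are made up for illustration): each rule is an independent unit carrying a reason code, a score contribution, and an enabled flag that config can flip without a deploy.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str            # reason code logged with every hit
    score: float         # contribution to the rule-layer score
    enabled: bool        # toggled from config, not from code
    predicate: Callable[[dict], bool]

def evaluate_rules(txn: dict, rules: list) -> tuple:
    """Run every enabled rule independently; no giant if-else tree."""
    total, reasons = 0.0, []
    for rule in rules:
        if rule.enabled and rule.predicate(txn):
            total += rule.score
            reasons.append(rule.name)
    return total, reasons

rules = [
    Rule("geo_mismatch", 25.0, True,
         lambda t: t.get("bin_country") != t.get("ip_country")),
    Rule("high_value_new_account", 20.0, True,
         lambda t: t.get("account_age_days", 0) < 1 and t.get("amount", 0) > 500),
]
```

Disabling a rule during an incident is now a one-field config change: flip `enabled` to `False` and the rule silently drops out of scoring.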

Tip: Store your rules in a configuration layer (database or config service), not in application code. You want your fraud ops team to be able to toggle rules without a deployment. Hot-reloading rule configs has saved me more than once during active fraud attacks.

Layer 2: Velocity Checks

Velocity checks answer a simple question: is this entity doing something too fast or too often? Fraudsters operate at scale — they're testing stolen card numbers, cycling through accounts, or hammering your checkout from the same device fingerprint.

What to track

Count events per entity within short time windows (minutes to hours). Useful starting points:

- Transactions per card number
- Transactions and distinct cards per IP address
- Transactions per device fingerprint
- Failed authorization attempts per card or account
- New accounts created per IP or device

Redis is the standard tool here. Use sorted sets with timestamps as scores: ZADD each event, trim entries older than the window with ZREMRANGEBYSCORE, and count what remains with ZCOUNT. Set TTLs on your keys so you're not accumulating stale data. For most payment volumes, a single Redis instance handles this comfortably; you're looking at sub-millisecond lookups.
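To make the pattern concrete, here is the same sliding-window logic in pure Python (so it runs standalone); the comments map each step to the Redis sorted-set command a production version would use:

```python
import bisect
import time
from typing import Optional

class VelocityCounter:
    """Sliding-window event counter, one sorted timestamp list per key.
    In Redis this would be a sorted set: ZADD / ZREMRANGEBYSCORE / ZCOUNT."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = {}

    def record_and_count(self, key: str, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        ts = self.events.setdefault(key, [])
        bisect.insort(ts, now)                    # ZADD key <now> <event>
        cutoff = now - self.window
        del ts[:bisect.bisect_left(ts, cutoff)]   # ZREMRANGEBYSCORE key 0 <cutoff>
        return len(ts)                            # ZCOUNT key <cutoff> +inf

# e.g. flag a card that attempts more than 5 transactions in 10 minutes
counter = VelocityCounter(window_seconds=600)
```

The trimming step doubles as garbage collection, which is what the TTLs handle on the Redis side.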

The tricky part is setting thresholds. Start conservative (you'd rather flag too much than too little), then tune based on your actual transaction distribution. Pull your 99th percentile values for each metric and use those as starting points.

Layer 3: ML Scoring

Machine learning fills the gaps that rules and velocity checks can't cover. Fraudsters adapt — they slow down their attack rate, rotate IPs, use residential proxies. A good ML model picks up on subtler correlations: the combination of a new device, a high-value order, and a shipping address that's 500 miles from the billing zip.

Feature engineering matters more than model choice

I've seen teams spend weeks tuning hyperparameters on an XGBoost model when the real leverage was in the features. Focus your energy here:

For model serving, latency is non-negotiable. You're inside a payment authorization flow — every millisecond counts. Pre-compute as many features as possible and store them in Redis or a feature store. The model inference itself should be a single forward pass through a lightweight model (gradient-boosted trees or a small neural net), served behind something like TensorFlow Serving or a custom gRPC endpoint. Keep it under 20ms.
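A sketch of the request-time feature assembly (the cache keys and feature names are invented for illustration; the dict stands in for Redis or a feature store): expensive aggregates are precomputed offline, so building the model's input vector is just lookups and a little arithmetic.

```python
# Stand-in for Redis / a feature store: entity -> precomputed aggregates.
feature_cache = {
    "card:4242": {"txn_count_30d": 14, "avg_amount_30d": 62.0},
    "device:abc": {"age_days": 0.1, "accounts_seen": 3},
}

def assemble_features(txn: dict) -> list:
    """Build the model's input vector from cheap request-time fields plus
    cached aggregates, with fraud-leaning defaults for unseen entities."""
    card = feature_cache.get("card:" + txn["card_id"], {})
    device = feature_cache.get("device:" + txn["device_id"], {})
    avg = card.get("avg_amount_30d", 0.0)
    return [
        txn["amount"],
        txn["amount"] / avg if avg else 10.0,   # amount vs. card's history
        card.get("txn_count_30d", 0),
        device.get("age_days", 0.0),            # new devices score riskier
        device.get("accounts_seen", 1),
        txn.get("billing_shipping_km", 0.0),    # billing-to-shipping distance
    ]
```

Everything slow happens before the transaction arrives; the inference call then only sees a fixed-length vector of floats.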

Warning: Never train your fraud model only on labeled fraud cases. You need a representative sample of legitimate transactions too, or your model will overfit to the fraud patterns you've already caught and miss novel attack vectors. Use stratified sampling and consider techniques like SMOTE for handling class imbalance.

Comparing Approaches

No single approach works in isolation. Here's how they stack up:

Approach               Strengths                                Weaknesses                                    Latency
Rules only             Fast, explainable, easy to audit         Brittle, can't adapt to new patterns          < 5ms
ML only                Catches novel fraud, learns from data    Black box, needs training data, drift risk    10-30ms
Hybrid (recommended)   Best coverage, layered defense           More operational overhead                     30-50ms

The hybrid approach is what you want in production. Rules handle the obvious stuff instantly. ML catches the sophisticated attacks. Velocity checks sit in between, catching the volumetric patterns that neither rules nor ML handle well on their own.

The Decision Engine

Each layer produces a score. The decision engine combines them into a final verdict: approve, send to manual review, or block. This is where you define your risk appetite.

A simple weighted-sum approach works well to start:

def decide(rule_score: float, velocity_score: float, ml_score: float) -> str:
    # Each layer's score is normalized to 0-100 before weighting.
    final_score = (rule_score * 0.3) + (velocity_score * 0.3) + (ml_score * 0.4)
    if final_score < 20:
        return "APPROVE"
    elif final_score < 65:
        return "REVIEW"
    else:
        return "BLOCK"

Those weights and thresholds are tunable. In practice, I adjust them weekly based on the previous week's fraud-to-false-positive ratio. The review queue is your pressure valve — if it's overflowing, your thresholds are too aggressive. If fraud is slipping through, they're too loose.

One thing I'd stress: always log the full scoring breakdown for every transaction. When a chargeback comes in three months later, you need to understand why the system approved it. That audit trail is also what feeds your ML model's next training cycle.

Monitoring and Feedback Loops

A fraud system without monitoring is just a liability. You need real-time dashboards tracking:

The feedback loop is what makes the system get smarter over time. Chargebacks are your ground truth — when one comes in, trace it back through the pipeline. Which rules did it pass? What was the ML score? Use that data to retrain your model monthly and adjust your rules quarterly.

Set up alerts for anomalies: a 2x spike in block rate, a sudden drop in approval rate, or ML scores clustering in an unusual range. These are early indicators that either fraud patterns have shifted or something in your pipeline is broken.
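One simple way to implement the spike alert (the window size and multiplier here are illustrative defaults): compare each new block-rate sample against a trailing-window average and fire when it exceeds the multiple.

```python
from collections import deque

class RateSpikeAlert:
    """Fire when the current block rate exceeds a multiple of the
    trailing-window average, e.g. the 2x spike described above."""

    def __init__(self, window: int = 24, multiplier: float = 2.0):
        self.history = deque(maxlen=window)   # e.g. 24 hourly samples
        self.multiplier = multiplier

    def observe(self, block_rate: float) -> bool:
        baseline = (sum(self.history) / len(self.history)) if self.history else None
        self.history.append(block_rate)
        # Fire only once a baseline exists and the new sample is a spike.
        return baseline is not None and block_rate > self.multiplier * baseline
```

The same class works for approval-rate drops by feeding it `1 - approval_rate`; a sustained shift will also raise the baseline over time, so it self-heals after a confirmed, legitimate change in traffic.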

Disclaimer: This article reflects personal engineering experience and is intended for educational purposes. Fraud detection requirements vary significantly by jurisdiction, payment network, and business context. Always consult with your compliance team, legal counsel, and payment processor before implementing fraud prevention systems. The thresholds and metrics mentioned are illustrative — your actual targets should be determined by your specific risk profile and regulatory obligations.