Fraud Detection Is Not a Feature You Bolt On Later
I've seen teams treat fraud detection as a phase-two concern — something to figure out after launch. That's a mistake that gets expensive fast. A single undetected fraud ring can burn through six figures in chargebacks before anyone notices the pattern. And once your payment processor flags your chargeback ratio above 1%, you're looking at penalty programs, higher processing fees, or losing your merchant account entirely.
The reality is that fraud detection needs to be part of your transaction flow from day one. Not as a monolith — you don't need a perfect system at launch — but as a layered architecture that you can iterate on. Start simple, add complexity where the data tells you to.
Decision latency, false-positive rate, chargeback ratio: these targets aren't aspirational, they're the baseline your payment partners expect. Miss the latency target and your checkout conversion drops. Let false positives creep up and you're blocking legitimate customers. Let fraud through and you're eating chargebacks.
The Three-Layer Architecture
Every fraud detection system I've built or worked on follows roughly the same shape: a rule engine for known patterns, velocity checks for behavioral anomalies, and an ML model for the stuff that's hard to write rules for. They run in sequence, and each layer can short-circuit the pipeline if the signal is strong enough.
The key insight: each layer is cheap on its own. Rules are just conditionals. Velocity checks are Redis lookups. The ML model is a single inference call. Stacked together, they give you coverage that no single approach can match.
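That short-circuiting pipeline can be sketched in a few lines. Everything here is illustrative, not a prescribed API: the layer functions, the threshold, and the field names are assumptions.

```python
from typing import Callable

HARD_BLOCK = 100  # illustrative threshold for an instant block

def check_rules(txn: dict) -> int:
    # Illustrative rule layer: a sanctioned-country hit is a hard block on its own.
    return HARD_BLOCK if txn.get("country") in {"KP", "IR"} else 0

def check_velocity(txn: dict) -> int:
    # Illustrative velocity layer: too many attempts in a short window.
    return 30 if txn.get("attempts_last_5m", 0) > 5 else 0

def run_pipeline(txn: dict, layers: list[Callable[[dict], int]]) -> int:
    """Run layers in sequence; a strong signal skips the remaining layers."""
    total = 0
    for layer in layers:
        score = layer(txn)
        if score >= HARD_BLOCK:  # short-circuit: no need to pay for ML inference
            return score
        total += score
    return total
```

The ordering matters: the cheap layers run first, so the expensive ML call only happens for transactions the earlier layers couldn't settle.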
Layer 1: The Rule Engine
Rules are your first line of defense, and honestly, they catch more fraud than most people expect. A well-maintained rule engine handles 60-70% of fraud attempts before anything else even runs.
Start with these baseline rules:
- Block transactions from sanctioned countries (OFAC list)
- Reject mismatched billing/shipping countries on high-value orders
- Flag card-not-present transactions above your 95th percentile amount
- Block known BIN ranges associated with prepaid cards (if your risk model warrants it)
- Reject transactions where the email domain was registered less than 7 days ago
Structure your rules as independent, composable units. Each rule returns a score contribution and a reason code. Don't build a giant if-else tree — you'll regret it the first time you need to disable a rule at 2 AM during an incident.
Tip: Store your rules in a configuration layer (database or config service), not in application code. You want your fraud ops team to be able to toggle rules without a deployment. Hot-reloading rule configs has saved me more than once during active fraud attacks.
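Putting those two ideas together, here is a minimal sketch of rules as independent, composable units, each returning a score contribution and a reason code. Rule names, codes, and thresholds are made up for illustration; in production the `enabled` flag and thresholds would come from your config layer, not from code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    name: str
    score: int            # contribution to the overall rule score
    reason_code: str      # surfaced to fraud ops and audit logs
    predicate: Callable[[dict], bool]
    enabled: bool = True  # toggled from config, not by deployment

def evaluate_rules(txn: dict, rules: list[Rule]) -> tuple[int, list[str]]:
    """Run every enabled rule independently; no giant if-else tree."""
    score, reasons = 0, []
    for rule in rules:
        if rule.enabled and rule.predicate(txn):
            score += rule.score
            reasons.append(rule.reason_code)
    return score, reasons

# Illustrative rules; scores and thresholds are placeholders, not recommendations.
RULES = [
    Rule("country_mismatch", 40, "R-GEO-01",
         lambda t: t["billing_country"] != t["shipping_country"] and t["amount"] > 500),
    Rule("young_email_domain", 25, "R-EML-02",
         lambda t: t.get("email_domain_age_days", 999) < 7),
]
```

Disabling a misbehaving rule at 2 AM then means flipping `enabled` in config rather than shipping a hotfix.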
Layer 2: Velocity Checks
Velocity checks answer a simple question: is this entity doing something too fast or too often? Fraudsters operate at scale — they're testing stolen card numbers, cycling through accounts, or hammering your checkout from the same device fingerprint.
What to track
- Transactions per card number in the last 1, 5, and 60 minutes
- Distinct cards used per device fingerprint in the last hour
- Total spend per user account in the last 24 hours
- Failed transaction attempts per IP address in the last 10 minutes
- Unique shipping addresses per card in the last 7 days
Redis is the standard tool here. Use sorted sets with event timestamps as scores: ZREMRANGEBYSCORE drops entries older than the window, and ZCOUNT counts the events that remain inside it. Set TTLs on your keys so you're not accumulating stale data for inactive entities. For most payment volumes, a single Redis instance handles this comfortably — you're looking at sub-millisecond lookups.
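In production this pattern lives in Redis (via redis-py or similar). As a sketch, here is the same sliding-window logic in plain Python, with the corresponding Redis commands noted in comments; the class and method names are illustrative.

```python
from collections import defaultdict

class VelocityTracker:
    """In-memory stand-in for a Redis sorted set keyed by entity.

    With Redis you'd run the same steps per event:
      ZADD key <timestamp> <event_id>
      ZREMRANGEBYSCORE key -inf <now - window>   # drop stale entries
      ZCOUNT key <now - window> +inf             # events inside the window
    """
    def __init__(self):
        self._events = defaultdict(list)  # key -> event timestamps

    def record_and_count(self, key: str, window_s: float, now: float) -> int:
        events = self._events[key]
        events.append(now)                     # the ZADD step
        # Prune anything older than the window (the ZREMRANGEBYSCORE step).
        self._events[key] = events = [t for t in events if t > now - window_s]
        return len(events)                     # the ZCOUNT step
```

A key like `card:<hash>:txn_count` per metric, plus a TTL slightly longer than your widest window, keeps memory bounded.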
The tricky part is setting thresholds. Start conservative (you'd rather flag too much than too little), then tune based on your actual transaction distribution. Pull your 99th percentile values for each metric and use those as starting points.
Layer 3: ML Scoring
Machine learning fills the gaps that rules and velocity checks can't cover. Fraudsters adapt — they slow down their attack rate, rotate IPs, use residential proxies. A good ML model picks up on subtler correlations: the combination of a new device, a high-value order, and a shipping address that's 500 miles from the billing zip.
Feature engineering matters more than model choice
I've seen teams spend weeks tuning hyperparameters on an XGBoost model when the real leverage was in the features. Focus your energy here:
- Time since account creation vs. transaction amount (new accounts spending big is a signal)
- Distance between billing address and IP geolocation
- Device fingerprint age — how long have you seen this browser/device combination?
- Historical chargeback rate for the card's BIN range
- Transaction amount deviation from the user's own spending pattern
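As one concrete example of the last feature, here is a sketch of spending-pattern deviation computed as a z-score against the user's own history. The minimum-history cutoff is an assumption; tune it to your data.

```python
import statistics

def amount_deviation(amount: float, past_amounts: list[float]) -> float:
    """Z-score of this transaction against the user's own spending history.

    A large positive value means the user is spending far more than usual.
    Guard against thin history: with only a few prior transactions the
    statistic isn't meaningful, so we return a neutral 0.0.
    """
    if len(past_amounts) < 5:  # assumed cutoff, not a recommendation
        return 0.0
    mean = statistics.mean(past_amounts)
    stdev = statistics.stdev(past_amounts)
    if stdev == 0:
        return 0.0  # perfectly uniform history; deviation is undefined
    return (amount - mean) / stdev
```

Like the other features, this is cheap to pre-compute: the mean and standard deviation can be maintained incrementally per user and stored alongside the other feature values.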
For model serving, latency is non-negotiable. You're inside a payment authorization flow — every millisecond counts. Pre-compute as many features as possible and store them in Redis or a feature store. The model inference itself should be a single forward pass through a lightweight model (gradient-boosted trees or a small neural net), served behind something like TensorFlow Serving or a custom gRPC endpoint. Keep it under 20ms.
Warning: Never train your fraud model only on labeled fraud cases. You need a representative sample of legitimate transactions too, or your model will overfit to the fraud patterns you've already caught and miss novel attack vectors. Use stratified sampling and consider techniques like SMOTE for handling class imbalance.
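One simple way to build that representative sample is to keep every fraud example and downsample legitimate traffic to a fixed ratio. This is a stdlib sketch of that idea, not a replacement for SMOTE or scikit-learn's stratified splitting; the ratio and field name are assumptions.

```python
import random

def stratified_sample(rows: list[dict], label_key: str = "is_fraud",
                      legit_per_fraud: int = 10, seed: int = 42) -> list[dict]:
    """Keep every fraud example; cap legitimate rows at a fixed ratio.

    Instead of synthesizing minority examples (as SMOTE does), this caps the
    majority class so the model still sees plenty of normal behavior without
    the fraud signal being drowned out. The 10:1 ratio is a tunable assumption.
    """
    rng = random.Random(seed)  # seeded for reproducible training sets
    fraud = [r for r in rows if r[label_key]]
    legit = [r for r in rows if not r[label_key]]
    k = min(len(legit), len(fraud) * legit_per_fraud)
    return fraud + rng.sample(legit, k)
```

Whichever technique you pick, evaluate on an untouched, realistically imbalanced holdout set, or your offline metrics won't predict production behavior.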
Comparing Approaches
No single approach works in isolation. Here's how they stack up:
| Approach | Strengths | Weaknesses | Latency |
|---|---|---|---|
| Rules Only | Fast, explainable, easy to audit | Brittle, can't adapt to new patterns | < 5ms |
| ML Only | Catches novel fraud, learns from data | Black box, needs training data, drift risk | 10-30ms |
| Hybrid (recommended) | Best coverage, layered defense | More operational overhead | 30-50ms |
The hybrid approach is what you want in production. Rules handle the obvious stuff instantly. ML catches the sophisticated attacks. Velocity checks sit in between, catching the volumetric patterns that neither rules nor ML handle well on their own.
The Decision Engine
Each layer produces a score. The decision engine combines them into a final verdict: approve, send to manual review, or block. This is where you define your risk appetite.
A simple weighted-sum approach works well to start:

```python
def decide(rule_score: float, velocity_score: float, ml_score: float) -> str:
    final_score = (rule_score * 0.3) + (velocity_score * 0.3) + (ml_score * 0.4)
    if final_score < 20:
        return "APPROVE"
    elif final_score < 65:
        return "REVIEW"
    else:
        return "BLOCK"
```
Those weights and thresholds are tunable. In practice, I adjust them weekly based on the previous week's fraud-to-false-positive ratio. The review queue is your pressure valve — if it's overflowing, your thresholds are too aggressive. If fraud is slipping through, they're too loose.
One thing I'd stress: always log the full scoring breakdown for every transaction. When a chargeback comes in three months later, you need to understand why the system approved it. That audit trail is also what feeds your ML model's next training cycle.
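A sketch of what that per-transaction audit line might look like, as structured JSON. The field names are assumptions, not a standard schema; the point is that every input to the verdict is captured.

```python
import json
import time

def audit_record(txn_id: str, verdict: str, rule_score: float,
                 velocity_score: float, ml_score: float,
                 reasons: list[str]) -> str:
    """One structured log line per decision.

    Logging the full breakdown means a chargeback three months later can be
    traced back to exactly which layer (and which rule) let it through, and
    the same records feed the next model-training cycle.
    """
    return json.dumps({
        "txn_id": txn_id,
        "decided_at": time.time(),
        "verdict": verdict,
        "scores": {"rules": rule_score, "velocity": velocity_score, "ml": ml_score},
        "reason_codes": reasons,
    })
```

Emitting these as JSON lines makes them easy to ship to whatever log store or warehouse backs your retraining pipeline.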
Monitoring and Feedback Loops
A fraud system without monitoring is just a liability. You need real-time dashboards tracking:
- Approval, review, and block rates (sudden shifts mean something changed)
- Rule trigger frequency (a rule that never fires is dead weight; one that fires constantly might be too broad)
- ML model score distribution (watch for drift — if the distribution shifts, your model is going stale)
- Chargeback rate by cohort (time period, merchant category, geography)
- Manual review turnaround time and outcomes
The feedback loop is what makes the system get smarter over time. Chargebacks are your ground truth — when one comes in, trace it back through the pipeline. Which rules did it pass? What was the ML score? Use that data to retrain your model monthly and adjust your rules quarterly.
Set up alerts for anomalies: a 2x spike in block rate, a sudden drop in approval rate, or ML scores clustering in an unusual range. These are early indicators that either fraud patterns have shifted or something in your pipeline is broken.
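The first of those alerts, the 2x block-rate spike, reduces to a baseline comparison. In this sketch the baseline would typically be a trailing average (say, 7 days); both the baseline source and the factor are assumptions to tune.

```python
def block_rate_alert(current_rate: float, baseline_rate: float,
                     spike_factor: float = 2.0) -> bool:
    """Fire when the block rate reaches `spike_factor` times the baseline.

    `baseline_rate` is assumed to come from a trailing window; the 2.0
    factor mirrors the threshold suggested above and should be tuned.
    """
    if baseline_rate <= 0:
        return current_rate > 0  # no baseline yet: any blocking is notable
    return current_rate >= spike_factor * baseline_rate
```

The approval-rate drop and score-distribution alerts follow the same shape: compare a short current window against a longer trailing baseline and page when the ratio crosses a threshold.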
References
- PCI DSS Document Library — PCI Security Standards Council's official documentation on payment data security requirements.
- Redis Sorted Sets Documentation — Official Redis docs on sorted sets, the data structure behind efficient velocity checks.
- TensorFlow Serving Architecture — Google's guide to serving ML models in production with low-latency inference.
- OFAC Sanctions Programs — U.S. Treasury's Office of Foreign Assets Control sanctions list and compliance guidance.
- Scikit-learn Ensemble Methods — Documentation on gradient boosting and ensemble techniques commonly used in fraud scoring models.
Disclaimer: This article reflects personal engineering experience and is intended for educational purposes. Fraud detection requirements vary significantly by jurisdiction, payment network, and business context. Always consult with your compliance team, legal counsel, and payment processor before implementing fraud prevention systems. The thresholds and metrics mentioned are illustrative — your actual targets should be determined by your specific risk profile and regulatory obligations.