Fraud Detection Is Not a Feature You Bolt On Later
I've seen teams treat fraud detection as a phase-two concern — something to figure out after launch. That's a mistake that gets expensive fast. A single undetected fraud ring can burn through six figures in chargebacks before anyone notices the pattern. And once your payment processor flags your chargeback ratio above 1%, you're looking at penalty programs, higher processing fees, or losing your merchant account entirely.
The reality is that fraud detection needs to be part of your transaction flow from day one. Not as a monolith — you don't need a perfect system at launch — but as a layered architecture that you can iterate on. Start simple, add complexity where the data tells you to.
Decision latency, false-positive rate, chargeback ratio: these targets aren't aspirational, they're the baseline your payment partners expect. Miss the latency target and your checkout conversion drops. Let false positives creep up and you're blocking legitimate customers. Let fraud through and you're eating chargebacks.
The Three-Layer Architecture
Every fraud detection system I've built or worked on follows roughly the same shape: a rule engine for known patterns, velocity checks for behavioral anomalies, and an ML model for the stuff that's hard to write rules for. They run in sequence, and each layer can short-circuit the pipeline if the signal is strong enough.
The key insight: each layer is cheap on its own. Rules are just conditionals. Velocity checks are Redis lookups. The ML model is a single inference call. Stacked together, they give you coverage that no single approach can match.
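That short-circuiting pipeline can be sketched in a few lines. Everything here is illustrative, not a prescribed API: the layer functions, the threshold, and the field names are assumptions.

```python
from typing import Callable

HARD_BLOCK = 100  # illustrative threshold for an instant block

def check_rules(txn: dict) -> int:
    # Illustrative rule layer: a sanctioned-country hit is a hard block on its own.
    return HARD_BLOCK if txn.get("country") in {"KP", "IR"} else 0

def check_velocity(txn: dict) -> int:
    # Illustrative velocity layer: too many attempts in a short window.
    return 30 if txn.get("attempts_last_5m", 0) > 5 else 0

def run_pipeline(txn: dict, layers: list[Callable[[dict], int]]) -> int:
    """Run layers in sequence; a strong signal skips the remaining layers."""
    total = 0
    for layer in layers:
        score = layer(txn)
        if score >= HARD_BLOCK:  # short-circuit: no need to pay for ML inference
            return score
        total += score
    return total
```

The ordering matters: the cheap layers run first, so the expensive ML call only happens for transactions the earlier layers couldn't settle.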
Layer 1: The Rule Engine
Rules are your first line of defense, and honestly, they catch more fraud than most people expect. A well-maintained rule engine handles 60-70% of fraud attempts before anything else even runs.
Start with these baseline rules:
- Block transactions from sanctioned countries (OFAC list)
- Reject mismatched billing/shipping countries on high-value orders
- Flag card-not-present transactions above your 95th percentile amount
- Block known BIN ranges associated with prepaid cards (if your risk model warrants it)
- Reject transactions where the email domain was registered less than 7 days ago
Structure your rules as independent, composable units. Each rule returns a score contribution and a reason code. Don't build a giant if-else tree — you'll regret it the first time you need to disable a rule at 2 AM during an incident.
Tip: Store your rules in a configuration layer (database or config service), not in application code. You want your fraud ops team to be able to toggle rules without a deployment. Hot-reloading rule configs has saved me more than once during active fraud attacks.
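Putting those two ideas together, here is a minimal sketch of rules as independent, composable units, each returning a score contribution and a reason code. Rule names, codes, and thresholds are made up for illustration; in production the `enabled` flag and thresholds would come from your config layer, not from code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    name: str
    score: int            # contribution to the overall rule score
    reason_code: str      # surfaced to fraud ops and audit logs
    predicate: Callable[[dict], bool]
    enabled: bool = True  # toggled from config, not by deployment

def evaluate_rules(txn: dict, rules: list[Rule]) -> tuple[int, list[str]]:
    """Run every enabled rule independently; no giant if-else tree."""
    score, reasons = 0, []
    for rule in rules:
        if rule.enabled and rule.predicate(txn):
            score += rule.score
            reasons.append(rule.reason_code)
    return score, reasons

# Illustrative rules; scores and thresholds are placeholders, not recommendations.
RULES = [
    Rule("country_mismatch", 40, "R-GEO-01",
         lambda t: t["billing_country"] != t["shipping_country"] and t["amount"] > 500),
    Rule("young_email_domain", 25, "R-EML-02",
         lambda t: t.get("email_domain_age_days", 999) < 7),
]
```

Disabling a misbehaving rule at 2 AM then means flipping `enabled` in config rather than shipping a hotfix.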
Layer 2: Velocity Checks
Velocity checks answer a simple question: is this entity doing something too fast or too often? Fraudsters operate at scale — they're testing stolen card numbers, cycling through accounts, or hammering your checkout from the same device fingerprint.
What to track
- Transactions per card number in the last 1, 5, and 60 minutes
- Distinct cards used per device fingerprint in the last hour
- Total spend per user account in the last 24 hours
- Failed transaction attempts per IP address in the last 10 minutes
- Unique shipping addresses per card in the last 7 days
Redis is the standard tool here. Use sorted sets with event timestamps as scores: ZREMRANGEBYSCORE drops entries older than the window, and ZCOUNT counts the events that remain inside it. Set TTLs on your keys so you're not accumulating stale data for inactive entities. For most payment volumes, a single Redis instance handles this comfortably — you're looking at sub-millisecond lookups.
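In production this pattern lives in Redis (via redis-py or similar). As a sketch, here is the same sliding-window logic in plain Python, with the corresponding Redis commands noted in comments; the class and method names are illustrative.

```python
from collections import defaultdict

class VelocityTracker:
    """In-memory stand-in for a Redis sorted set keyed by entity.

    With Redis you'd run the same steps per event:
      ZADD key <timestamp> <event_id>
      ZREMRANGEBYSCORE key -inf <now - window>   # drop stale entries
      ZCOUNT key <now - window> +inf             # events inside the window
    """
    def __init__(self):
        self._events = defaultdict(list)  # key -> event timestamps

    def record_and_count(self, key: str, window_s: float, now: float) -> int:
        events = self._events[key]
        events.append(now)                     # the ZADD step
        # Prune anything older than the window (the ZREMRANGEBYSCORE step).
        self._events[key] = events = [t for t in events if t > now - window_s]
        return len(events)                     # the ZCOUNT step
```

A key like `card:<hash>:txn_count` per metric, plus a TTL slightly longer than your widest window, keeps memory bounded.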
The tricky part is setting thresholds. Start conservative (you'd rather flag too much than too little), then tune based on your actual transaction distribution. Pull your 99th percentile values for each metric and use those as starting points.
Layer 3: ML Scoring
Machine learning fills the gaps that rules and velocity checks can't cover. Fraudsters adapt — they slow down their attack rate, rotate IPs, use residential proxies. A good ML model picks up on subtler correlations: the combination of a new device, a high-value order, and a shipping address that's 500 miles from the billing zip.
Feature engineering matters more than model choice
I've seen teams spend weeks tuning hyperparameters on an XGBoost model when the real leverage was in the features. Focus your energy here:
- Time since account creation vs. transaction amount (new accounts spending big is a signal)
- Distance between billing address and IP geolocation
- Device fingerprint age — how long have you seen this browser/device combination?
- Historical chargeback rate for the card's BIN range
- Transaction amount deviation from the user's own spending pattern
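As one concrete example of the last feature, here is a sketch of spending-pattern deviation computed as a z-score against the user's own history. The minimum-history cutoff is an assumption; tune it to your data.

```python
import statistics

def amount_deviation(amount: float, past_amounts: list[float]) -> float:
    """Z-score of this transaction against the user's own spending history.

    A large positive value means the user is spending far more than usual.
    Guard against thin history: with only a few prior transactions the
    statistic isn't meaningful, so we return a neutral 0.0.
    """
    if len(past_amounts) < 5:  # assumed cutoff, not a recommendation
        return 0.0
    mean = statistics.mean(past_amounts)
    stdev = statistics.stdev(past_amounts)
    if stdev == 0:
        return 0.0  # perfectly uniform history; deviation is undefined
    return (amount - mean) / stdev
```

Like the other features, this is cheap to pre-compute: the mean and standard deviation can be maintained incrementally per user and stored alongside the other feature values.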
For model serving, latency is non-negotiable. You're inside a payment authorization flow — every millisecond counts. Pre-compute as many features as possible and store them in Redis or a feature store. The model inference itself should be a single forward pass through a lightweight model (gradient-boosted trees or a small neural net), served behind something like TensorFlow Serving or a custom gRPC endpoint. Keep it under 20ms.
Warning: Never train your fraud model only on labeled fraud cases. You need a representative sample of legitimate transactions too, or your model will overfit to the fraud patterns you've already caught and miss novel attack vectors. Use stratified sampling and consider techniques like SMOTE for handling class imbalance.
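One simple way to build that representative sample is to keep every fraud example and downsample legitimate traffic to a fixed ratio. This is a stdlib sketch of that idea, not a replacement for SMOTE or scikit-learn's stratified splitting; the ratio and field name are assumptions.

```python
import random

def stratified_sample(rows: list[dict], label_key: str = "is_fraud",
                      legit_per_fraud: int = 10, seed: int = 42) -> list[dict]:
    """Keep every fraud example; cap legitimate rows at a fixed ratio.

    Instead of synthesizing minority examples (as SMOTE does), this caps the
    majority class so the model still sees plenty of normal behavior without
    the fraud signal being drowned out. The 10:1 ratio is a tunable assumption.
    """
    rng = random.Random(seed)  # seeded for reproducible training sets
    fraud = [r for r in rows if r[label_key]]
    legit = [r for r in rows if not r[label_key]]
    k = min(len(legit), len(fraud) * legit_per_fraud)
    return fraud + rng.sample(legit, k)
```

Whichever technique you pick, evaluate on an untouched, realistically imbalanced holdout set, or your offline metrics won't predict production behavior.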
Comparing Approaches
No single approach works in isolation. Here's how they stack up:
| Approach | Strengths | Weaknesses | Latency |
|---|---|---|---|
| Rules Only | Fast, explainable, easy to audit | Brittle, can't adapt to new patterns | < 5ms |
| ML Only | Catches novel fraud, learns from data | Black box, needs training data, drift risk | 10-30ms |
| Hybrid (recommended) | Best coverage, layered defense | More operational overhead | 30-50ms |
The hybrid approach is what you want in production. Rules handle the obvious stuff instantly. ML catches the sophisticated attacks. Velocity checks sit in between, catching the volumetric patterns that neither rules nor ML handle well on their own.
The Decision Engine
Each layer produces a score. The decision engine combines them into a final verdict: approve, send to manual review, or block. This is where you define your risk appetite.
A simple weighted-sum approach works well to start:

```python
def decide(rule_score: float, velocity_score: float, ml_score: float) -> str:
    final_score = (rule_score * 0.3) + (velocity_score * 0.3) + (ml_score * 0.4)
    if final_score < 20:
        return "APPROVE"
    elif final_score < 65:
        return "REVIEW"
    else:
        return "BLOCK"
```
Those weights and thresholds are tunable. In practice, I adjust them weekly based on the previous week's fraud-to-false-positive ratio. The review queue is your pressure valve — if it's overflowing, your thresholds are too aggressive. If fraud is slipping through, they're too loose.
One thing I'd stress: always log the full scoring breakdown for every transaction. When a chargeback comes in three months later, you need to understand why the system approved it. That audit trail is also what feeds your ML model's next training cycle.
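A sketch of what that per-transaction audit line might look like, as structured JSON. The field names are assumptions, not a standard schema; the point is that every input to the verdict is captured.

```python
import json
import time

def audit_record(txn_id: str, verdict: str, rule_score: float,
                 velocity_score: float, ml_score: float,
                 reasons: list[str]) -> str:
    """One structured log line per decision.

    Logging the full breakdown means a chargeback three months later can be
    traced back to exactly which layer (and which rule) let it through, and
    the same records feed the next model-training cycle.
    """
    return json.dumps({
        "txn_id": txn_id,
        "decided_at": time.time(),
        "verdict": verdict,
        "scores": {"rules": rule_score, "velocity": velocity_score, "ml": ml_score},
        "reason_codes": reasons,
    })
```

Emitting these as JSON lines makes them easy to ship to whatever log store or warehouse backs your retraining pipeline.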
Monitoring and Feedback Loops
A fraud system without monitoring is just a liability. You need real-time dashboards tracking:
- Approval, review, and block rates (sudden shifts mean something changed)
- Rule trigger frequency (a rule that never fires is dead weight; one that fires constantly might be too broad)
- ML model score distribution (watch for drift — if the distribution shifts, your model is going stale)
- Chargeback rate by cohort (time period, merchant category, geography)
- Manual review turnaround time and outcomes
The feedback loop is what makes the system get smarter over time. Chargebacks are your ground truth — when one comes in, trace it back through the pipeline. Which rules did it pass? What was the ML score? Use that data to retrain your model monthly and adjust your rules quarterly.
Set up alerts for anomalies: a 2x spike in block rate, a sudden drop in approval rate, or ML scores clustering in an unusual range. These are early indicators that either fraud patterns have shifted or something in your pipeline is broken.
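The first of those alerts, the 2x block-rate spike, reduces to a baseline comparison. In this sketch the baseline would typically be a trailing average (say, 7 days); both the baseline source and the factor are assumptions to tune.

```python
def block_rate_alert(current_rate: float, baseline_rate: float,
                     spike_factor: float = 2.0) -> bool:
    """Fire when the block rate reaches `spike_factor` times the baseline.

    `baseline_rate` is assumed to come from a trailing window; the 2.0
    factor mirrors the threshold suggested above and should be tuned.
    """
    if baseline_rate <= 0:
        return current_rate > 0  # no baseline yet: any blocking is notable
    return current_rate >= spike_factor * baseline_rate
```

The approval-rate drop and score-distribution alerts follow the same shape: compare a short current window against a longer trailing baseline and page when the ratio crosses a threshold.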
References
- PCI DSS Document Library — PCI Security Standards Council's official documentation on payment data security requirements.
- Redis Sorted Sets Documentation — Official Redis docs on sorted sets, the data structure behind efficient velocity checks.
- TensorFlow Serving Architecture — Google's guide to serving ML models in production with low-latency inference.
- OFAC Sanctions Programs — U.S. Treasury's Office of Foreign Assets Control sanctions list and compliance guidance.
- Scikit-learn Ensemble Methods — Documentation on gradient boosting and ensemble techniques commonly used in fraud scoring models.
Disclaimer: This article reflects personal engineering experience and is intended for educational purposes. Fraud detection requirements vary significantly by jurisdiction, payment network, and business context. Always consult with your compliance team, legal counsel, and payment processor before implementing fraud prevention systems. The thresholds and metrics mentioned are illustrative — your actual targets should be determined by your specific risk profile and regulatory obligations.