Engineering a KYC/AML Compliance Pipeline That Doesn't Kill Your Conversion

Compliance Engineering Is a Different Animal

Most backend work follows a predictable pattern: get a request, validate it, process it, return a response. Compliance engineering breaks that model in ways that catch teams off guard.

First, your requirements come from regulators, not product managers. The FATF (Financial Action Task Force) publishes recommendations that get translated into local law — MAS Notice 626 in Singapore, the Bank Secrecy Act in the US, the EU's 6th Anti-Money Laundering Directive. These aren't suggestions. Get them wrong and you're looking at fines, license revocation, or worse.

Second, everything needs an audit trail. Not just logging, but immutable, tamper-evident records of every decision your system makes. When an examiner from MAS walks in and asks "why did you approve this customer 14 months ago," you need to produce the exact document they submitted, the exact checks you ran, and the exact risk score your system computed. In under an hour.

Third, false positives are your daily reality. Sanctions screening will flag "Mohammed Ali" against half the watchlists on earth. Your system needs to handle this gracefully without blocking every legitimate customer named Mohammed from opening an account.

The KYC Verification Pipeline

Here's the pipeline I've converged on after building this twice. Each step feeds into the next, and the whole thing needs to complete in under 60 seconds for a good user experience.

Document Upload (ID / Passport) → OCR Extract (Name, DOB, ID#) → Liveness Check (selfie match) → Database Verify (gov registry) → Risk Score (composite)

Risk score outcomes: Approve (< 30), Review (30-70), Reject (> 70)
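The three score bands at the end of the pipeline can be sketched as a small decision function. This is a minimal sketch, not a production scoring policy: the names Decide and Decision are illustrative, and treating a score of exactly 30 or exactly 70 as "review" is my assumption, since the bands above do not say which side the boundaries fall on.

```go
package main

import "fmt"

// Decision is the outcome of the automated risk assessment.
type Decision string

const (
	Approve Decision = "approve"
	Review  Decision = "review"
	Reject  Decision = "reject"
)

// Decide maps a composite risk score (0-100) onto the three bands:
// below 30 approves, 30 through 70 goes to manual review, above 70 rejects.
// Boundary handling (30 and 70 count as review) is an assumption.
func Decide(score int) Decision {
	switch {
	case score < 30:
		return Approve
	case score <= 70:
		return Review
	default:
		return Reject
	}
}

func main() {
	for _, s := range []int{18, 30, 70, 71} {
		fmt.Printf("score %d -> %s\n", s, Decide(s))
	}
}
```

Keeping the thresholds in one function makes it easier to version them later alongside each decision.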

Document upload and OCR

The user submits a government-issued ID — passport, national ID card, or driver's license. Your system needs to extract structured data (full name, date of birth, document number, expiry date) from the image. Vendors like Jumio, Onfido, and Veriff handle this well. I've used Jumio's Netverify and Onfido's Document Report API in production — both return structured JSON with confidence scores per field.

The engineering gotcha: OCR confidence varies wildly by document type and country. A Swedish passport scans at 99%+ accuracy. A worn Indonesian KTP might come back at 70%. You need threshold logic per document type, and a fallback to manual review when confidence drops below your threshold.
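A minimal sketch of per-document-type threshold logic with a manual review fallback. The threshold values, the DocKey type, and the field names are all assumptions for illustration; real values would come from measuring your own vendor's confidence scores per document type.

```go
package main

import "fmt"

// DocKey identifies a document by issuing country and document type.
type DocKey struct {
	Country string // ISO 3166-1 alpha-2 code
	Type    string // e.g. "passport", "national_id", "drivers_license"
}

// minConfidence holds illustrative per-document thresholds (assumed values).
var minConfidence = map[DocKey]float64{
	{Country: "SE", Type: "passport"}:    0.95,
	{Country: "ID", Type: "national_id"}: 0.85,
}

// defaultMinConfidence applies to any document without a specific entry.
const defaultMinConfidence = 0.90

// NeedsManualReview reports whether any extracted field falls below the
// confidence threshold configured for this document type.
func NeedsManualReview(doc DocKey, fieldConfidence map[string]float64) bool {
	threshold, ok := minConfidence[doc]
	if !ok {
		threshold = defaultMinConfidence
	}
	for _, c := range fieldConfidence {
		if c < threshold {
			return true
		}
	}
	return false
}

func main() {
	wornKTP := map[string]float64{"full_name": 0.70, "document_number": 0.91}
	fmt.Println("manual review:", NeedsManualReview(DocKey{"ID", "national_id"}, wornKTP))
}
```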

Liveness detection

This is where you verify the person holding the phone is the same person on the ID document. The user takes a selfie (or a short video), and the system compares it against the document photo. Onfido's Facial Similarity Report and iProov's Genuine Presence Assurance are the two I've worked with. iProov is more resistant to deepfake attacks but adds 3-5 seconds to the flow.

Database verification

Cross-reference the extracted data against government registries or credit bureaus. In Singapore, you can verify against MyInfo (via SingPass). In the US, vendors like Socure and Alloy aggregate data from credit bureaus, DMV records, and utility databases. This step catches synthetic identities — fake people constructed from real data fragments.

AML Screening

Once you know who the customer is, you need to check if they're someone you're not allowed to do business with. This is AML screening, and it runs in parallel with (or immediately after) KYC verification.

Sanctions lists

At minimum, you're screening against:

OFAC SDN List: US Treasury's Specially Designated Nationals list. If you process USD or have any US nexus, this is mandatory.
EU Consolidated Sanctions List: covers all EU restrictive measures.
UN Security Council Consolidated List: global baseline.
Local lists: MAS publishes its own list in Singapore, DFAT maintains the Consolidated List in Australia, etc.

The technical challenge is fuzzy matching. You can't just do exact string comparison — "Muammar Gaddafi" appears as "Moammar Gadhafi," "Mu'ammar Al-Qadhafi," and about 30 other transliterations on various lists. Vendors like ComplyAdvantage, Dow Jones Risk & Compliance, and Refinitiv World-Check handle the fuzzy matching and list aggregation. Rolling your own is a mistake — the edge cases in name transliteration alone will consume months of engineering time.
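To make the point about exact comparison concrete, here is a toy similarity score based on normalized Levenshtein distance. It is only an illustration of why string equality fails on transliterations; it is not what the vendors above do, and it handles none of the hard cases (phonetic variants, reordered name parts, non-Latin scripts). The function names are my own.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Normalize lowercases a name, drops punctuation, and collapses whitespace.
func Normalize(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		if unicode.IsLetter(r) || r == ' ' {
			b.WriteRune(r)
		}
	}
	return strings.Join(strings.Fields(b.String()), " ")
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between two rune slices.
func levenshtein(a, b []rune) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr := make([]int, len(b)+1)
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(b)]
}

// Similarity returns a score in [0, 1], where 1 means the normalized names
// are identical.
func Similarity(a, b string) float64 {
	ra, rb := []rune(Normalize(a)), []rune(Normalize(b))
	longest := len(ra)
	if len(rb) > longest {
		longest = len(rb)
	}
	if longest == 0 {
		return 1
	}
	return 1 - float64(levenshtein(ra, rb))/float64(longest)
}

func main() {
	a, b := "Muammar Gaddafi", "Moammar Gadhafi"
	fmt.Println("exact match:", a == b)
	fmt.Printf("similarity: %.2f\n", Similarity(a, b))
}
```

Even this toy version forces a threshold decision (how similar is similar enough?), which is exactly where false positives come from.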

PEP and adverse media screening

Politically Exposed Persons (PEPs) aren't blocked outright, but they require Enhanced Due Diligence (EDD). Your system needs to flag them and route them to a compliance analyst for manual review. Adverse media screening checks news sources for negative coverage — fraud charges, money laundering investigations, sanctions evasion. ComplyAdvantage and Dow Jones both offer API-based adverse media screening that returns structured risk indicators.

Key point: AML screening isn't a one-time check. You need ongoing monitoring, which means re-screening your entire customer base every time a sanctions list updates. OFAC updates its list on no fixed schedule, often more than once a month. Build this as a recurring batch job triggered by list changes, not a one-shot process.
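One way to structure the re-screening job is to fingerprint each downloaded list and run the batch only when the fingerprint changes. This is a sketch under assumptions: Screener stands in for whatever vendor call you use, and a real job would page through customers from a database rather than take a slice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Screener is a hypothetical stand-in for a vendor screening call.
type Screener func(customerName string) (hit bool)

// ListFingerprint returns a stable hash of the raw list bytes so the job can
// tell whether the list changed since the last run.
func ListFingerprint(raw []byte) string {
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

// Rescreen runs every customer through screen when the list fingerprint has
// changed. It returns the flagged customers and whether the batch ran at all.
func Rescreen(lastSeen, current string, customers []string, screen Screener) (flagged []string, ran bool) {
	if lastSeen == current {
		return nil, false
	}
	for _, c := range customers {
		if screen(c) {
			flagged = append(flagged, c)
		}
	}
	return flagged, true
}

func main() {
	previous := ListFingerprint([]byte("list contents, yesterday"))
	latest := ListFingerprint([]byte("list contents, today"))
	screen := func(name string) bool { return name == "Newly Listed Person" }

	flagged, ran := Rescreen(previous, latest, []string{"Alice Tan", "Newly Listed Person"}, screen)
	fmt.Println("batch ran:", ran, "flagged:", flagged)
}
```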

Transaction Monitoring

KYC tells you who the customer is. Transaction monitoring tells you what they're doing. This is where you detect suspicious patterns — structuring (breaking large amounts into smaller ones to avoid reporting thresholds), rapid movement of funds, transactions with high-risk jurisdictions, or unusual spikes in volume.

Rule-based vs ML-based approaches

Approach	Pros	Cons
Rule-based (e.g., NICE Actimize)	Explainable to regulators, deterministic, easy to audit. You can point to the exact rule that triggered an alert.	High false positive rates (often 95%+). Criminals learn the thresholds. Maintaining hundreds of rules becomes a nightmare.
ML-based (e.g., Featurespace, Feedzai)	Catches novel patterns, lower false positive rates, adapts to evolving behavior.	Black-box problem: regulators ask "why did this trigger?" and "the model said so" isn't an answer. Requires significant training data.
Hybrid (rules + ML scoring)	Rules handle regulatory minimums (e.g., CTR filing for cash transactions over $10K). ML handles anomaly detection. Best of both worlds.	More complex to build and maintain. Need both rule engine and ML infrastructure.

In practice, every production system I've seen uses a hybrid. Rules catch the obvious stuff: cash transactions over $10,000 trigger a Currency Transaction Report (CTR) in the US, no ML needed. The ML layer sits on top and catches the patterns that rules miss, like a customer who suddenly starts receiving dozens of small transfers from new counterparties.
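The rule layer of such a hybrid might look like the sketch below: one rule for the CTR threshold and one sliding-window rule for structuring. The ML scoring layer is deliberately left out. The Txn shape, the 24-hour window, and the helper names are assumptions for illustration, and real reporting rules have more conditions than this.

```go
package main

import (
	"fmt"
	"time"
)

// ctrThresholdCents is the US $10,000 cash reporting threshold, in cents.
const ctrThresholdCents int64 = 10_000_00

// structuringWindow is an assumed look-back window for the structuring rule.
const structuringWindow = 24 * time.Hour

// Txn is a simplified transaction record (hypothetical shape).
type Txn struct {
	AmountCents int64
	Cash        bool
	At          time.Time
}

// RequiresCTR reports whether a single cash transaction exceeds the threshold.
func RequiresCTR(t Txn) bool {
	return t.Cash && t.AmountCents > ctrThresholdCents
}

// LooksStructured reports whether cash transactions that are individually at
// or below the threshold add up to more than the threshold inside any span of
// length window. txns must be sorted by At, oldest first.
func LooksStructured(txns []Txn, window time.Duration) bool {
	var cash []Txn
	for _, t := range txns {
		if t.Cash && t.AmountCents <= ctrThresholdCents {
			cash = append(cash, t)
		}
	}
	start := 0
	var sum int64
	for end := range cash {
		sum += cash[end].AmountCents
		// Shrink the window from the left until it fits the time span.
		for cash[end].At.Sub(cash[start].At) > window {
			sum -= cash[start].AmountCents
			start++
		}
		if sum > ctrThresholdCents {
			return true
		}
	}
	return false
}

// hoursIn returns a timestamp h hours after a fixed reference time.
func hoursIn(h int) time.Time {
	return time.Date(2026, 4, 5, 0, 0, 0, 0, time.UTC).Add(time.Duration(h) * time.Hour)
}

func main() {
	deposits := []Txn{
		{AmountCents: 4_000_00, Cash: true, At: hoursIn(1)},
		{AmountCents: 4_000_00, Cash: true, At: hoursIn(5)},
		{AmountCents: 4_000_00, Cash: true, At: hoursIn(9)},
	}
	fmt.Println("CTR on first deposit:", RequiresCTR(deposits[0]))
	fmt.Println("structuring suspected:", LooksStructured(deposits, structuringWindow))
}
```

Amounts are stored as integer cents to avoid floating-point rounding on threshold comparisons.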

When the system flags a transaction, it generates an alert for the compliance team to review. If an analyst confirms the suspicion, a Suspicious Activity Report (SAR) gets filed with FinCEN (US), STRO (Singapore), or the equivalent local FIU. Your system needs to track the full lifecycle: alert generated → analyst assigned → investigation notes → decision (file SAR / dismiss) → filing confirmation.
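The lifecycle above maps naturally onto a small state machine that refuses illegal transitions and keeps a history of who moved the alert and why. A minimal sketch follows; the state names and the Alert type are illustrative assumptions, not a vendor schema.

```go
package main

import (
	"fmt"
	"time"
)

// AlertState is one step in the alert lifecycle.
type AlertState string

const (
	StateGenerated       AlertState = "generated"
	StateAssigned        AlertState = "analyst_assigned"
	StateInvestigating   AlertState = "investigating"
	StateSARFiled        AlertState = "sar_filed"
	StateDismissed       AlertState = "dismissed"
	StateFilingConfirmed AlertState = "filing_confirmed"
)

// allowed lists the legal next states for each state. States with no entry
// (dismissed, filing_confirmed) are terminal.
var allowed = map[AlertState][]AlertState{
	StateGenerated:     {StateAssigned},
	StateAssigned:      {StateInvestigating},
	StateInvestigating: {StateSARFiled, StateDismissed},
	StateSARFiled:      {StateFilingConfirmed},
}

// Transition records one state change for the audit trail.
type Transition struct {
	From, To AlertState
	Actor    string
	Note     string
	At       time.Time
}

// Alert carries its current state plus the full transition history.
type Alert struct {
	ID      string
	State   AlertState
	History []Transition
}

// Move applies a transition if it is legal, appending it to the history.
func (a *Alert) Move(to AlertState, actor, note string) error {
	for _, next := range allowed[a.State] {
		if next == to {
			a.History = append(a.History, Transition{
				From: a.State, To: to, Actor: actor, Note: note, At: time.Now().UTC(),
			})
			a.State = to
			return nil
		}
	}
	return fmt.Errorf("alert %s: transition %s -> %s not allowed", a.ID, a.State, to)
}

func main() {
	a := &Alert{ID: "alr_1", State: StateGenerated}
	fmt.Println(a.Move(StateAssigned, "system", "assigned to analyst queue"))
	fmt.Println(a.Move(StateSARFiled, "analyst_7", "skipping investigation"))
}
```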

The Conversion Killer Problem

Here's the tension every FinTech product team fights over: a thorough KYC flow takes 5-10 minutes and requires a government ID, a selfie, and sometimes proof of address. Every step you add drops your onboarding conversion by 10-20%. I've watched a 6-step KYC flow achieve a 23% completion rate. That's 77% of potential customers gone before they ever make a transaction.

The solution is risk-based tiering. Instead of running every customer through the same gauntlet, you tier the verification based on what they want to do.

Tier 1 — Low Risk

Simplified KYC

Email + phone verification
Name + DOB collection
Sanctions screening only
Limit: $1,000/month

Tier 2 — Medium Risk

Standard KYC

Government ID + OCR
Liveness check
PEP + sanctions screening
Limit: $10,000/month

Tier 3 — High Risk

Enhanced Due Diligence

All of Tier 2
Proof of address
Source of funds documentation
Manual compliance review

This approach lets users start transacting within minutes at Tier 1, then progressively upgrade as they need higher limits. Wise (formerly TransferWise) does this well — you can send a small transfer with just basic details, but larger amounts require full document verification. The key is that the user is already invested in your product by the time you ask for the heavier checks.
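A sketch of the upgrade check behind progressive tiering, using the monthly limits from the tiers above. Choosing the tier from requested volume alone is a simplification I am assuming for the example; a real program would also factor in customer risk signals, and the thresholds must come from your documented compliance policy.

```go
package main

import "fmt"

// Tier is a verification level; higher tiers require more checks.
type Tier int

const (
	Tier1 Tier = iota + 1 // simplified KYC
	Tier2                 // standard KYC
	Tier3                 // enhanced due diligence
)

// Monthly limits in USD, taken from the tier descriptions.
const (
	tier1MonthlyLimitUSD = 1_000
	tier2MonthlyLimitUSD = 10_000
)

// RequiredTier returns the lowest tier whose limit covers the requested
// monthly volume.
func RequiredTier(monthlyVolumeUSD int) Tier {
	switch {
	case monthlyVolumeUSD <= tier1MonthlyLimitUSD:
		return Tier1
	case monthlyVolumeUSD <= tier2MonthlyLimitUSD:
		return Tier2
	default:
		return Tier3
	}
}

// NeedsUpgrade reports whether the customer must complete further
// verification before transacting at the requested monthly volume.
func NeedsUpgrade(current Tier, monthlyVolumeUSD int) bool {
	return RequiredTier(monthlyVolumeUSD) > current
}

func main() {
	fmt.Println("Tier 1 customer sending $2,500/month needs upgrade:", NeedsUpgrade(Tier1, 2_500))
}
```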

Regulatory note: Risk-based tiering must be documented in your compliance program and approved by your compliance officer. You can't just pick arbitrary thresholds — they need to align with FATF Recommendation 10 and your local regulator's guidance on simplified due diligence. MAS, for example, has specific conditions under which simplified CDD is permitted.

Audit Trail Engineering

This is the part most engineers underestimate. Your compliance audit trail isn't just application logs — it's a legal record. When a regulator audits you, they want to see exactly what data you collected, what checks you ran, what the results were, and who made the final decision. Months or years after the fact.

The data model I use looks something like this:

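// Go: compliance check result
package compliance

import "time"

// ComplianceCheckResult records one check and the decision made on it.
type ComplianceCheckResult struct {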
    CheckID       string    `json:"check_id"`
    CustomerID    string    `json:"customer_id"`
    CheckType     string    `json:"check_type"`     // "kyc_document", "sanctions", "pep", "adverse_media"
    Provider      string    `json:"provider"`        // "onfido", "complyadvantage", "jumio"
    Status        string    `json:"status"`          // "clear", "consider", "rejected"
    RiskScore     int       `json:"risk_score"`      // 0-100
    RawResponse   []byte    `json:"raw_response"`    // full vendor response, encrypted
    Decision      string    `json:"decision"`        // "auto_approved", "manual_review", "rejected"
    DecidedBy     string    `json:"decided_by"`      // "system" or analyst user ID
    DecisionNotes string    `json:"decision_notes"`
    CreatedAt     time.Time `json:"created_at"`
    ExpiresAt     time.Time `json:"expires_at"`      // KYC checks expire — typically 12-24 months
}

A few things to note. Store the raw vendor response — encrypted at rest — alongside your parsed result. I've been in situations where a regulator questioned a specific check, and having the raw Onfido or ComplyAdvantage response saved us weeks of back-and-forth. The ExpiresAt field is critical too: KYC checks aren't permanent. Most regulators expect periodic re-verification, especially for higher-risk customers.

For the webhook payload that kicks off downstream processes after a check completes:

// Compliance check webhook payload
{
  "event": "compliance.check.completed",
  "timestamp": "2026-04-05T14:32:00Z",
  "data": {
    "check_id": "chk_8a3b1f2e9d",
    "customer_id": "cus_4k9m2n7p",
    "check_type": "kyc_document",
    "status": "clear",
    "risk_score": 18,
    "decision": "auto_approved",
    "tier_assigned": "standard",
    "checks_completed": ["document_ocr", "liveness", "sanctions", "pep"],
    "next_review_date": "2027-04-05"
  }
}

Immutability matters: Use an append-only data store for compliance records. PostgreSQL with row-level security and no UPDATE/DELETE grants on the compliance schema works. Some teams use event sourcing with Kafka — every state change is a new event, and the current state is a projection. Either way, the principle is the same: once a compliance record is written, it can never be modified.
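Append-only storage stops records from being changed through the normal path; a hash chain additionally makes tampering detectable. The sketch below is one common technique for tamper evidence, not something the PostgreSQL or Kafka setups above give you by default, and the Log and Entry types are hypothetical. Each entry's hash covers the previous entry's hash, so altering any record breaks verification of every record after it.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Entry is one record in the chain.
type Entry struct {
	Payload  []byte
	PrevHash [32]byte
	Hash     [32]byte
}

// Log is an in-memory append-only chain of entries.
type Log struct {
	entries []Entry
}

// hashEntry computes SHA-256 over the previous hash followed by the payload.
func hashEntry(prev [32]byte, payload []byte) [32]byte {
	h := sha256.New()
	h.Write(prev[:])
	h.Write(payload)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Append adds a record linked to the hash of the previous record.
func (l *Log) Append(payload []byte) {
	var prev [32]byte // zero value for the first entry
	if n := len(l.entries); n > 0 {
		prev = l.entries[n-1].Hash
	}
	p := append([]byte(nil), payload...) // copy so callers cannot mutate it later
	l.entries = append(l.entries, Entry{Payload: p, PrevHash: prev, Hash: hashEntry(prev, p)})
}

// Verify recomputes every hash and checks that the links are intact.
func (l *Log) Verify() bool {
	var prev [32]byte
	for _, e := range l.entries {
		if e.PrevHash != prev || hashEntry(prev, e.Payload) != e.Hash {
			return false
		}
		prev = e.Hash
	}
	return true
}

func main() {
	var trail Log
	trail.Append([]byte(`{"check_id":"chk_1","decision":"auto_approved"}`))
	trail.Append([]byte(`{"check_id":"chk_2","decision":"manual_review"}`))
	fmt.Println("chain intact:", trail.Verify())

	trail.entries[0].Payload[0] ^= 0xFF // simulate tampering with a stored record
	fmt.Println("chain intact after tampering:", trail.Verify())
}
```

In a database-backed version, the previous hash would be read inside the same transaction that inserts the new row.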

Practical Architecture Tips

After building this twice and watching other teams build it, here's what I'd do on day one:

Abstract your vendors behind an interface. You will switch KYC providers. Jumio's pricing changes, Onfido launches a better liveness product, your regulator mandates a local provider. If your business logic is coupled to a specific vendor's API, switching costs you months. Define a ComplianceChecker interface and implement it per vendor.
Make screening asynchronous. Don't block the user's signup request while you wait for a sanctions screening API that takes 2-8 seconds. Accept the signup, run screening in the background, and notify the user when they're approved. Most users clear in under 30 seconds anyway.
Build a manual review queue from day one. No matter how good your automation is, 5-15% of checks will need human review. Build the internal tool for your compliance analysts early — a simple dashboard showing pending reviews, the customer's submitted documents, screening results, and approve/reject buttons with mandatory notes.
Version your risk rules. When you change a screening threshold or add a new rule, you need to know which version of the rules applied to each customer. Store the rule version alongside every decision. This is non-negotiable for audit purposes.
Set up ongoing monitoring as a separate service. Initial KYC is a one-time pipeline. Ongoing AML monitoring (re-screening, transaction monitoring) is a continuous process. Keep them architecturally separate — different scaling characteristics, different SLAs, different failure modes.
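The vendor abstraction from the first tip could start as small as this. It is a sketch under assumptions: only the ComplianceChecker name comes from the text above, while the request and outcome shapes, the stub adapter, and the 10-second timeout are illustrative. Real adapters would wrap each vendor's SDK or HTTP API behind the same method.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// CheckRequest is the vendor-neutral input to a check.
type CheckRequest struct {
	CustomerID string
	FullName   string
}

// CheckOutcome is the vendor-neutral result of a check.
type CheckOutcome struct {
	Provider  string
	CheckType string
	Status    string // "clear", "consider", "rejected"
	RiskScore int
}

// ComplianceChecker is implemented once per vendor.
type ComplianceChecker interface {
	Check(ctx context.Context, req CheckRequest) (CheckOutcome, error)
}

// stubChecker is a fake vendor adapter used only for this example.
type stubChecker struct {
	provider, checkType string
	score               int
}

func (s stubChecker) Check(ctx context.Context, req CheckRequest) (CheckOutcome, error) {
	if err := ctx.Err(); err != nil {
		return CheckOutcome{}, err
	}
	status := "clear"
	if s.score >= 30 {
		status = "consider"
	}
	return CheckOutcome{Provider: s.provider, CheckType: s.checkType, Status: status, RiskScore: s.score}, nil
}

// ScreenCustomer runs every configured checker under a shared timeout.
// Business logic depends only on the interface, never on a vendor API.
func ScreenCustomer(checkers []ComplianceChecker, req CheckRequest) ([]CheckOutcome, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	outcomes := make([]CheckOutcome, 0, len(checkers))
	for _, c := range checkers {
		out, err := c.Check(ctx, req)
		if err != nil {
			return nil, fmt.Errorf("check for customer %s failed: %w", req.CustomerID, err)
		}
		outcomes = append(outcomes, out)
	}
	return outcomes, nil
}

func main() {
	checkers := []ComplianceChecker{
		stubChecker{provider: "vendor_a", checkType: "kyc_document", score: 18},
		stubChecker{provider: "vendor_b", checkType: "sanctions", score: 42},
	}
	outcomes, err := ScreenCustomer(checkers, CheckRequest{CustomerID: "cus_4k9m2n7p", FullName: "Alice Tan"})
	fmt.Println(outcomes, err)
}
```

Swapping a provider then means writing one new adapter and changing the slice passed to ScreenCustomer.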

Disclaimer: This article reflects the author's personal experience and opinions. It is not legal or regulatory advice. Compliance requirements vary by jurisdiction and business model — always consult with qualified legal counsel and your compliance team before implementing changes to your KYC/AML processes. Product names, logos, and brands are property of their respective owners.
