What Merchant Onboarding Actually Looks Like
If you've only built consumer KYC, merchant onboarding will surprise you. Consumer identity verification is relatively straightforward — one person, one ID, one selfie. Merchant onboarding is a different beast. You're verifying a business entity, its ownership structure, its financial history, and its risk profile. A single merchant application might touch six different third-party APIs, require documents in three different formats, and sit in a manual review queue for days if the business operates in a high-risk category.
The core challenge is balancing two competing pressures. Sales wants merchants live yesterday — every day a merchant waits is a day they might sign with a competitor. Compliance wants thorough verification — every merchant you approve without proper checks is a potential fraud vector, regulatory fine, or chargeback liability. The engineering job is building a pipeline that satisfies both.
Here's the pipeline I've converged on: application intake, KYB verification, risk scoring, and a final decision. Each stage can auto-advance or route to manual review depending on the risk signals collected so far.
The key insight: this isn't a linear pipeline in practice. KYB checks run in parallel. Risk scoring happens incrementally as data arrives. And the state machine governing application status needs to handle retries, partial completions, and documents that expire mid-review. I'll walk through each stage.
KYB Verification — More Than Just a Name Check
Know Your Business verification is the merchant equivalent of KYC, but significantly more involved. You're not just verifying a person — you're verifying that a business entity exists, is in good standing, and that the people claiming to own it actually do.
Document collection and entity verification
At minimum, you need the business's legal name, registration number, incorporation date, registered address, and tax ID. In the US, that means an EIN and state registration. In the UK, a Companies House number. In Singapore, a UEN. The first engineering decision: do you ask the merchant to upload documents, or do you pull data programmatically?
I strongly prefer programmatic verification where possible. Middesk is excellent for US business verification — you pass in a business name and state, and they return incorporation status, registered agent, Secretary of State filings, and even tax registration status. For the UK, the Companies House API is free and surprisingly good. The fallback is document upload — articles of incorporation, business licenses, tax certificates — which you then need to OCR and verify manually or through a vendor like Onfido.
Beneficial ownership
This is where it gets complicated. Regulations (FinCEN's Customer Due Diligence Rule in the US, the EU's 4th Anti-Money Laundering Directive) require you to identify every individual who owns 25% or more of the business. For a simple LLC with two partners, that's straightforward. For a holding company with nested subsidiaries across three jurisdictions, you're looking at a tree traversal problem.
I've built this as a recursive ownership graph. Each entity node has edges to its owners, and you walk the graph until you reach natural persons. Each person then goes through standard KYC — ID verification, sanctions screening, PEP checks. The engineering gotcha: circular ownership structures exist (Company A owns 30% of Company B, which owns 40% of Company A). Your graph traversal needs cycle detection, or you'll spin forever.
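A minimal sketch of that traversal, under stated assumptions: stakes are stored in basis points, any ID missing from the graph is treated as a natural person, and a cycle is simply cut where it closes (which understates stakes that flow around the loop). The type and function names are hypothetical, not taken from any provider's API.

```go
package main

import (
	"fmt"
	"sort"
)

// Stake is one ownership edge: Owner holds BasisPoints/10000 of an entity.
type Stake struct {
	Owner       string
	BasisPoints int // 2500 = 25%
}

// OwnershipGraph maps an entity ID to its direct owners.
// Assumption: IDs absent from the map are natural persons (leaf nodes).
type OwnershipGraph map[string][]Stake

// BeneficialOwners walks the graph from root and returns each natural
// person's effective stake in basis points, multiplying stakes along a path.
func BeneficialOwners(g OwnershipGraph, root string) map[string]int {
	totals := map[string]int{}
	onPath := map[string]bool{} // entities on the current DFS path
	var walk func(entity string, share int)
	walk = func(entity string, share int) {
		owners, isEntity := g[entity]
		if !isEntity {
			totals[entity] += share // reached a natural person
			return
		}
		if onPath[entity] {
			return // cycle detected: this entity is already on the path
		}
		onPath[entity] = true
		for _, s := range owners {
			walk(s.Owner, share*s.BasisPoints/10000)
		}
		onPath[entity] = false
	}
	walk(root, 10000)
	return totals
}

// AboveThreshold returns the sorted IDs of persons holding at least minBP.
func AboveThreshold(totals map[string]int, minBP int) []string {
	var out []string
	for id, bp := range totals {
		if bp >= minBP {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	g := OwnershipGraph{
		"acme":   {{Owner: "holdco", BasisPoints: 6000}, {Owner: "alice", BasisPoints: 4000}},
		"holdco": {{Owner: "bob", BasisPoints: 5000}, {Owner: "acme", BasisPoints: 5000}}, // circular
	}
	totals := BeneficialOwners(g, "acme")
	fmt.Println(totals, AboveThreshold(totals, 2500))
}
```

Tracking only the current path (rather than every node ever visited) lets the same holding company appear under two different parents without being skipped the second time.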
Practical tip: Middesk and Persona both offer beneficial ownership verification APIs that handle the graph traversal for you. Unless you're processing tens of thousands of applications per month, buy, don't build, for this piece. The edge cases in international ownership structures alone will consume a quarter of engineering time.
Building the Risk Scoring Model
Once you've verified the business exists and identified its owners, you need to assess how risky this merchant is. Risk scoring for merchants is fundamentally different from consumer credit scoring — you're predicting the likelihood of chargebacks, fraud, money laundering, or regulatory violations, not creditworthiness.
The three biggest input signals I've found:
- MCC (Merchant Category Code). A four-digit code classifying the merchant's business type. MCC 5967 (direct marketing, inbound teleservices) is high-risk. MCC 5411 (grocery stores) is low-risk. Card networks publish their own risk classifications, and your acquiring bank will have opinions too.
- Projected transaction volume. A new coffee shop expecting $5K/month in card transactions is very different from a new online marketplace expecting $500K/month. Higher volume means higher exposure.
- Geography. A merchant incorporated in Delaware selling to US customers is lower risk than a merchant incorporated in a jurisdiction with weak AML enforcement selling cross-border.
Here's the risk matrix I use as a starting point. The actual weights get tuned based on your portfolio's historical chargeback and fraud data.
| MCC Category | US / EU / UK | APAC | High-Risk Jurisdictions |
|---|---|---|---|
| Low-risk (grocery, retail, SaaS) | Low | Low | Medium |
| Medium-risk (travel, digital goods, marketplaces) | Medium | Medium | High |
| High-risk (gambling, crypto, nutraceuticals) | High | High | Prohibited |
In practice, I compute a composite risk score from 0 to 100 using a weighted formula. MCC risk contributes about 35%, geography 25%, transaction volume 20%, and business age plus ownership complexity the remaining 20%. Scores below 40 auto-approve. Scores between 40 and 70 go to a review queue. Above 70 gets auto-declined with a reason code.
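The weighted formula and thresholds can be sketched as follows. This assumes each signal has already been reduced to a 0-100 sub-score (how that happens is not covered here), and the type and function names are illustrative.

```go
package main

import "fmt"

// Factors holds one sub-score per signal, each on a 0-100 scale (higher is riskier).
type Factors struct {
	MCC, Geography, Volume, AgeAndOwnership float64
}

// Weights are fractions that should sum to 1.0.
type Weights struct {
	MCC, Geography, Volume, AgeAndOwnership float64
}

// DefaultWeights mirrors the 35/25/20/20 split described in the text.
var DefaultWeights = Weights{MCC: 0.35, Geography: 0.25, Volume: 0.20, AgeAndOwnership: 0.20}

// CompositeScore returns the weighted sum of the sub-scores (0-100).
func CompositeScore(f Factors, w Weights) float64 {
	return f.MCC*w.MCC + f.Geography*w.Geography +
		f.Volume*w.Volume + f.AgeAndOwnership*w.AgeAndOwnership
}

type Decision string

const (
	AutoApprove  Decision = "auto_approve"
	ManualReview Decision = "manual_review"
	AutoDecline  Decision = "auto_decline"
)

// Decide applies the thresholds: below 40 approve, 40 to 70 review, above 70 decline.
func Decide(score float64) Decision {
	switch {
	case score < 40:
		return AutoApprove
	case score <= 70:
		return ManualReview
	default:
		return AutoDecline
	}
}

func main() {
	// Sample sub-scores for a low-risk merchant (made-up values).
	coffeeShop := Factors{MCC: 10, Geography: 10, Volume: 5, AgeAndOwnership: 30}
	score := CompositeScore(coffeeShop, DefaultWeights)
	fmt.Printf("score=%.1f decision=%s\n", score, Decide(score))
}
```

Returning the per-factor contributions alongside the total would also give reviewers the score breakdown they need, at the cost of a slightly larger result type.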
One thing I learned the hard way: don't hardcode these weights. Store them in a configuration service and version every change. When your compliance team asks "why did we approve merchant X six months ago," you need to reproduce the exact scoring model that was active at the time of that decision.
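One way to make past decisions reproducible is to store every weight change as an immutable snapshot with an effective date, then look up the snapshot that was in force at decision time. The sketch below assumes an in-memory history; the struct, the helper names, and the version 2 weights are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// WeightsVersion is one immutable, versioned snapshot of the scoring weights.
type WeightsVersion struct {
	Version         int
	EffectiveFrom   time.Time
	MCC             float64
	Geography       float64
	Volume          float64
	AgeAndOwnership float64
}

// ActiveAt returns the snapshot in force at t: the one with the latest
// EffectiveFrom that is not after t. ok is false if t predates every snapshot.
func ActiveAt(history []WeightsVersion, t time.Time) (active WeightsVersion, ok bool) {
	for _, v := range history {
		if v.EffectiveFrom.After(t) {
			continue
		}
		if !ok || v.EffectiveFrom.After(active.EffectiveFrom) {
			active, ok = v, true
		}
	}
	return active, ok
}

// date builds a UTC midnight timestamp for the example.
func date(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	history := []WeightsVersion{
		{Version: 1, EffectiveFrom: date(2024, 1, 1), MCC: 0.35, Geography: 0.25, Volume: 0.20, AgeAndOwnership: 0.20},
		{Version: 2, EffectiveFrom: date(2024, 7, 1), MCC: 0.40, Geography: 0.20, Volume: 0.20, AgeAndOwnership: 0.20},
	}
	// Which weights scored a merchant that was approved on 15 March 2024?
	if v, ok := ActiveAt(history, date(2024, 3, 15)); ok {
		fmt.Println("version", v.Version)
	}
}
```

Persisting the version number on each application record makes the lookup unnecessary for most audits, with the date-based search as a fallback.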
Compliance Pipeline Architecture
The onboarding pipeline is fundamentally an async, event-driven system. Merchant submits an application, and then a cascade of verification tasks kicks off — some in parallel, some sequential, some requiring human input. The worst architectural mistake I've seen is building this as a synchronous request-response flow. A single Middesk API call can take 30 seconds. Persona's identity verification can take minutes if the user needs to re-upload a document. You cannot hold an HTTP connection open for that.
State machine for application status
Every merchant application is a state machine. The states I use:
// Go: merchant application states
const (
	StatusDraft         = "draft"           // Application started, not submitted
	StatusSubmitted     = "submitted"       // Submitted, awaiting verification
	StatusKYBInProgress = "kyb_in_progress" // Business verification running
	StatusKYBFailed     = "kyb_failed"      // Business verification failed
	StatusRiskScoring   = "risk_scoring"    // Computing risk score
	StatusPendingReview = "pending_review"  // In manual review queue
	StatusInfoRequested = "info_requested"  // Waiting for merchant to provide more docs
	StatusApproved      = "approved"        // Ready to go live
	StatusRejected      = "rejected"        // Declined with reason
	StatusSuspended     = "suspended"       // Post-approval suspension
)
Each state transition emits an event to a message queue (I use Kafka, but SQS works fine for lower volume). Downstream consumers handle notifications, update dashboards, trigger the next verification step, or alert the compliance team. The state machine enforces valid transitions — you can't go from draft to approved without passing through verification stages.
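Enforcing valid transitions can be as simple as a lookup table consulted before every status change. The transition table below is an assumption made for illustration (the article does not enumerate the allowed edges), and the Event type stands in for whatever gets published to the queue.

```go
package main

import "fmt"

type Status string

const (
	StatusDraft         Status = "draft"
	StatusSubmitted     Status = "submitted"
	StatusKYBInProgress Status = "kyb_in_progress"
	StatusKYBFailed     Status = "kyb_failed"
	StatusRiskScoring   Status = "risk_scoring"
	StatusPendingReview Status = "pending_review"
	StatusInfoRequested Status = "info_requested"
	StatusApproved      Status = "approved"
	StatusRejected      Status = "rejected"
	StatusSuspended     Status = "suspended"
)

// allowed lists the legal next states for each state.
// Assumption: these edges are illustrative, not the author's exact table.
var allowed = map[Status][]Status{
	StatusDraft:         {StatusSubmitted},
	StatusSubmitted:     {StatusKYBInProgress},
	StatusKYBInProgress: {StatusKYBFailed, StatusRiskScoring, StatusInfoRequested},
	StatusKYBFailed:     {StatusKYBInProgress, StatusRejected},
	StatusRiskScoring:   {StatusApproved, StatusPendingReview, StatusRejected},
	StatusPendingReview: {StatusApproved, StatusRejected, StatusInfoRequested},
	StatusInfoRequested: {StatusKYBInProgress, StatusPendingReview},
	StatusApproved:      {StatusSuspended},
	StatusSuspended:     {StatusApproved, StatusRejected},
}

// Event is what would be published to the queue after a successful transition.
type Event struct {
	ApplicationID string
	From, To      Status
}

// Transition validates from -> to and returns the event to publish.
func Transition(appID string, from, to Status) (Event, error) {
	for _, next := range allowed[from] {
		if next == to {
			return Event{ApplicationID: appID, From: from, To: to}, nil
		}
	}
	return Event{}, fmt.Errorf("invalid transition %s -> %s", from, to)
}

func main() {
	if _, err := Transition("app_1", StatusDraft, StatusApproved); err != nil {
		fmt.Println(err) // draft cannot jump straight to approved
	}
	ev, _ := Transition("app_1", StatusDraft, StatusSubmitted)
	fmt.Printf("%+v\n", ev)
}
```

Because rejected has no entry in the table, it is terminal by construction: any attempt to leave it returns an error.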
Async document verification
When a merchant uploads a document — say, articles of incorporation — the flow looks like this: upload hits your API, gets stored in S3 (encrypted, of course), a message goes onto the verification queue, a worker picks it up, sends it to your OCR/verification provider, and polls or receives a webhook when the result is ready. The result updates the application state and potentially triggers the next step.
The critical design decision: use webhooks from your verification providers as the primary signal, not polling. Middesk, Persona, and Onfido all support webhooks. Polling wastes compute and adds latency. But webhooks alone aren't enough: providers will retry a few times, but if your endpoint is down during a deploy, you'll miss events. As a safety net, I run a reconciliation job every hour that checks for applications stuck in kyb_in_progress for more than 15 minutes and re-fetches the status from the provider.
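The core of such a reconciliation pass might look like the sketch below. It assumes applications are already loaded into memory and that the provider lookup is injected as a callback; StatusFetcher, the field names, and the sample statuses are hypothetical, and scheduling the hourly run is left out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type Application struct {
	ID              string
	Status          string
	StatusChangedAt time.Time
}

// StatusFetcher is a hypothetical stand-in for a provider's status lookup.
type StatusFetcher func(applicationID string) (providerStatus string, err error)

var errProviderDown = errors.New("provider unavailable")

// stuckAfter is how long an application may sit in kyb_in_progress.
const stuckAfter = 15 * time.Minute

// StuckInKYB returns applications that have been in kyb_in_progress longer than maxAge.
func StuckInKYB(apps []Application, now time.Time, maxAge time.Duration) []Application {
	var stuck []Application
	for _, a := range apps {
		if a.Status == "kyb_in_progress" && now.Sub(a.StatusChangedAt) > maxAge {
			stuck = append(stuck, a)
		}
	}
	return stuck
}

// Reconcile re-fetches provider status for each stuck application and
// returns the statuses it managed to retrieve, keyed by application ID.
func Reconcile(apps []Application, now time.Time, fetch StatusFetcher) map[string]string {
	updates := map[string]string{}
	for _, a := range StuckInKYB(apps, now, stuckAfter) {
		status, err := fetch(a.ID)
		if err != nil {
			continue // leave it for the next run
		}
		updates[a.ID] = status
	}
	return updates
}

// exampleNow is a fixed clock so the example is deterministic; use time.Now() in a real job.
var exampleNow = time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

// minutesBefore returns the instant m minutes before t.
func minutesBefore(t time.Time, m int) time.Time {
	return t.Add(-time.Duration(m) * time.Minute)
}

func main() {
	apps := []Application{
		{ID: "app_1", Status: "kyb_in_progress", StatusChangedAt: minutesBefore(exampleNow, 45)},
		{ID: "app_2", Status: "kyb_in_progress", StatusChangedAt: minutesBefore(exampleNow, 5)},
		{ID: "app_3", Status: "pending_review", StatusChangedAt: minutesBefore(exampleNow, 90)},
		{ID: "app_4", Status: "kyb_in_progress", StatusChangedAt: minutesBefore(exampleNow, 20)},
	}
	fetch := func(id string) (string, error) {
		if id == "app_4" {
			return "", errProviderDown
		}
		return "completed", nil
	}
	fmt.Println(Reconcile(apps, exampleNow, fetch))
}
```

The returned statuses should flow through the same handler as webhook events, so a late webhook and a reconciliation result cannot apply the same update twice.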
Manual review queues
No matter how good your automation is, some percentage of applications need human eyes. In my experience, that's 8-15% of total volume. The manual review tool needs to surface the right context: the merchant's application data, all verification results so far, the computed risk score with a breakdown of contributing factors, and any documents they've uploaded. The reviewer needs approve/reject/request-more-info buttons, and every action must be logged with the reviewer's ID and notes.
SLA matters here: Track time-in-queue for manual reviews. If your compliance team takes 3 days to review an application, your merchants are signing up with competitors. I set a 4-hour SLA for standard reviews and 24-hour for high-risk. Breaches trigger escalation alerts.
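A breach check for those two SLAs can be a small pure function run on a schedule. The queue item shape and function names here are assumptions; wiring the result to an alerting system is out of scope.

```go
package main

import (
	"fmt"
	"time"
)

// ReviewItem is one application waiting in the manual review queue.
type ReviewItem struct {
	ApplicationID string
	HighRisk      bool
	EnqueuedAt    time.Time
}

// slaFor returns the review SLA from the text: 4 hours standard, 24 hours high-risk.
func slaFor(item ReviewItem) time.Duration {
	if item.HighRisk {
		return 24 * time.Hour
	}
	return 4 * time.Hour
}

// Breaches returns, in queue order, the IDs of items whose time in queue exceeds their SLA.
func Breaches(queue []ReviewItem, now time.Time) []string {
	var ids []string
	for _, item := range queue {
		if now.Sub(item.EnqueuedAt) > slaFor(item) {
			ids = append(ids, item.ApplicationID)
		}
	}
	return ids
}

// exampleNow is a fixed clock so the example is deterministic.
var exampleNow = time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

// hoursBefore returns the instant h hours before t.
func hoursBefore(t time.Time, h int) time.Time {
	return t.Add(-time.Duration(h) * time.Hour)
}

func main() {
	queue := []ReviewItem{
		{ApplicationID: "app_1", EnqueuedAt: hoursBefore(exampleNow, 5)},
		{ApplicationID: "app_2", HighRisk: true, EnqueuedAt: hoursBefore(exampleNow, 5)},
	}
	fmt.Println("escalate:", Breaches(queue, exampleNow))
}
```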
Third-Party Provider Integration
You will integrate with multiple verification providers. Here's what I've used in production and what each is good at:
- Middesk — US business verification. Pulls Secretary of State filings, tax registrations, OFAC screening, and business address verification. Their API is clean and webhook support is solid. My go-to for US KYB.
- Persona — Identity verification for beneficial owners. Handles document upload, OCR, liveness checks, and database verification. Their inquiry-based model maps well to the "verify each owner" flow.
- Onfido — Similar to Persona for identity verification, with strong international document coverage. I've used them when onboarding merchants outside the US where Persona's document support was thinner.
The most important architectural decision: abstract every provider behind an interface. I define a BusinessVerifier interface and a PersonVerifier interface, then implement adapters for each provider. When Middesk changes their API (they've done it twice since I started using them), or when you need to add a new provider for a new geography, you're swapping an adapter, not rewriting business logic.
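// Go: provider abstraction
import "context"

// BusinessVerifier is the contract every business-verification adapter implements.
// BusinessVerifyRequest and BusinessVerifyResult are defined elsewhere.
type BusinessVerifier interface {
	// Start a verification for the given business.
	VerifyBusiness(ctx context.Context, req BusinessVerifyRequest) (*BusinessVerifyResult, error)
	// Fetch the latest result for a verification started earlier.
	GetVerificationStatus(ctx context.Context, verificationID string) (*BusinessVerifyResult, error)
	// Tell the provider where to deliver result webhooks.
	RegisterWebhook(ctx context.Context, url string) error
}

// PersonVerifier is the equivalent contract for beneficial-owner identity checks.
type PersonVerifier interface {
	VerifyIdentity(ctx context.Context, req IdentityVerifyRequest) (*IdentityVerifyResult, error)
	GetVerificationStatus(ctx context.Context, verificationID string) (*IdentityVerifyResult, error)
}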
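To show what swapping an adapter looks like, here is a sketch of an in-memory implementation of BusinessVerifier, the kind you might use in tests. The request and result fields are hypothetical (the article does not define them), and FakeVerifier calls no real provider.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical request/result shapes; the real fields depend on your providers.
type BusinessVerifyRequest struct {
	LegalName string
	State     string
}

type BusinessVerifyResult struct {
	VerificationID string
	Status         string // e.g. "pending", "verified"
}

type BusinessVerifier interface {
	VerifyBusiness(ctx context.Context, req BusinessVerifyRequest) (*BusinessVerifyResult, error)
	GetVerificationStatus(ctx context.Context, verificationID string) (*BusinessVerifyResult, error)
	RegisterWebhook(ctx context.Context, url string) error
}

// FakeVerifier is an in-memory adapter for tests. It calls no real provider.
type FakeVerifier struct {
	results    map[string]*BusinessVerifyResult
	WebhookURL string
}

func NewFakeVerifier() *FakeVerifier {
	return &FakeVerifier{results: map[string]*BusinessVerifyResult{}}
}

func (f *FakeVerifier) VerifyBusiness(_ context.Context, req BusinessVerifyRequest) (*BusinessVerifyResult, error) {
	if req.LegalName == "" {
		return nil, errors.New("legal name is required")
	}
	res := &BusinessVerifyResult{
		VerificationID: fmt.Sprintf("fake-%d", len(f.results)+1),
		Status:         "pending",
	}
	f.results[res.VerificationID] = res
	return res, nil
}

func (f *FakeVerifier) GetVerificationStatus(_ context.Context, id string) (*BusinessVerifyResult, error) {
	res, ok := f.results[id]
	if !ok {
		return nil, fmt.Errorf("unknown verification %q", id)
	}
	return res, nil
}

func (f *FakeVerifier) RegisterWebhook(_ context.Context, url string) error {
	f.WebhookURL = url
	return nil
}

// Compile-time check that the adapter satisfies the interface.
var _ BusinessVerifier = (*FakeVerifier)(nil)

// StartKYB is business logic that depends only on the interface.
func StartKYB(ctx context.Context, v BusinessVerifier, name, state string) (string, error) {
	res, err := v.VerifyBusiness(ctx, BusinessVerifyRequest{LegalName: name, State: state})
	if err != nil {
		return "", err
	}
	return res.VerificationID, nil
}

// A single background context is enough for this example.
var ctx = context.Background()

func main() {
	id, err := StartKYB(ctx, NewFakeVerifier(), "Acme LLC", "DE")
	fmt.Println(id, err)
}
```

StartKYB never names a provider, so replacing FakeVerifier with a real adapter changes only the value passed in.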
Handling the Edge Cases
The happy path — US LLC, two owners, low-risk MCC — is maybe 40% of your applications. The rest is edge cases, and they'll eat your engineering time if you don't plan for them.
Sole proprietors vs corporations
Sole proprietors don't have a separate legal entity. Often there's no EIN (many file under their SSN), and there are no articles of incorporation and no registered agent. Your KYB pipeline needs a separate path for them: skip business entity verification, but run enhanced identity checks on the individual. I treat sole proprietors as a single-owner business where the owner IS the business, which simplifies the data model but requires different verification logic.
Multi-country onboarding
If you onboard merchants in multiple countries, every assumption breaks. Business registration formats differ. Tax ID structures differ. Document types differ. The UK has Companies House. Germany has Handelsregister. Singapore has ACRA. You need country-specific verification flows, which means your state machine needs to branch based on the merchant's jurisdiction. I handle this with a VerificationPlan that gets generated at application time based on the merchant's country — it defines which checks to run, which providers to use, and which documents to collect.
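A plan generator along those lines could look like the sketch below. The struct fields, check names, document names, and the country-to-provider mapping are all illustrative assumptions; only the registries themselves come from the text. It also folds in the sole-proprietor path described earlier.

```go
package main

import "fmt"

// VerificationPlan says which checks to run, with which providers, and which documents to collect.
type VerificationPlan struct {
	Country          string
	BusinessProvider string // empty when no entity verification applies
	PersonProvider   string
	Checks           []string
	Documents        []string
}

// registryByCountry maps ISO country codes to a business-registry source.
// Assumption: these assignments are examples, not recommendations.
var registryByCountry = map[string]string{
	"US": "middesk",
	"GB": "companies_house",
	"DE": "handelsregister",
	"SG": "acra",
}

// PlanFor builds the plan at application time from the merchant's country.
func PlanFor(country string, soleProprietor bool) (VerificationPlan, error) {
	registry, ok := registryByCountry[country]
	if !ok {
		return VerificationPlan{}, fmt.Errorf("no verification flow configured for %q", country)
	}
	plan := VerificationPlan{Country: country, PersonProvider: "onfido"}
	if country == "US" {
		plan.PersonProvider = "persona"
	}
	if soleProprietor {
		// No separate legal entity: skip entity checks, run enhanced identity checks.
		plan.Checks = []string{"enhanced_identity", "sanctions_screening"}
		plan.Documents = []string{"government_id"}
		return plan, nil
	}
	plan.BusinessProvider = registry
	plan.Checks = []string{"business_registry", "beneficial_ownership", "identity", "sanctions_screening"}
	plan.Documents = []string{"registration_certificate", "tax_certificate"}
	return plan, nil
}

func main() {
	plan, err := PlanFor("GB", false)
	fmt.Printf("%+v %v\n", plan, err)
}
```

Returning an error for unconfigured countries gives the caller an explicit point to route the application to manual handling instead of silently running the wrong checks.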
High-risk MCCs
Some MCCs, such as gambling (7995), cryptocurrency (6051), and adult content (typically filed under 5967, direct marketing inbound teleservices), require Enhanced Due Diligence regardless of other risk factors. For these merchants, I require additional documentation: processing history from previous acquirers, proof of licensing (gambling licenses, money transmitter licenses), and a detailed description of their business model. These always go through manual review, and I typically require sign-off from a senior compliance analyst, not just any reviewer.
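Encoding that rule as data keeps it out of the scoring formula, so a low composite score can never bypass it. A small sketch, with hypothetical names for the struct and the document identifiers:

```go
package main

import "fmt"

// eddMCCs lists the categories singled out for Enhanced Due Diligence.
var eddMCCs = map[string]bool{"7995": true, "6051": true, "5967": true}

// ReviewRequirements describes what an application needs before a decision.
type ReviewRequirements struct {
	EnhancedDueDiligence bool
	ManualReview         bool
	SeniorSignOff        bool
	ExtraDocuments       []string
}

// RequirementsFor returns the EDD requirements for an MCC, or the zero value
// when the MCC is not on the list and normal scoring applies.
func RequirementsFor(mcc string) ReviewRequirements {
	if !eddMCCs[mcc] {
		return ReviewRequirements{}
	}
	return ReviewRequirements{
		EnhancedDueDiligence: true,
		ManualReview:         true,
		SeniorSignOff:        true,
		ExtraDocuments: []string{
			"processing_history",
			"proof_of_licensing",
			"business_model_description",
		},
	}
}

func main() {
	fmt.Printf("%+v\n", RequirementsFor("7995"))
	fmt.Printf("%+v\n", RequirementsFor("5411"))
}
```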
Monitoring the Onboarding Funnel
You can't improve what you don't measure. These are the metrics I track on every onboarding pipeline I build:
The targets that follow assume a mature pipeline. When you first launch, expect worse: maybe a 70% auto-approve rate and a 48-hour median approval time. The key metrics to track:
- Time-to-approval — broken down by risk tier. Low-risk should be under 24 hours. Medium-risk under 3 days. High-risk under 7 days.
- Drop-off rate by stage — if 30% of merchants abandon at the document upload step, your UX is broken or you're asking for documents they don't have. I've seen drop-off rates halve just by adding clear instructions about what documents are needed and in what format.
- False positive rate — merchants flagged for manual review who turn out to be legitimate. If this is above 10%, your risk model is too aggressive and you're wasting compliance analyst time.
- False negative rate — merchants who were auto-approved but later turned out to be fraudulent or generated excessive chargebacks. This is the scary one. Track it by monitoring chargeback rates and fraud reports per merchant cohort, segmented by their onboarding risk score.
- Manual review throughput — applications reviewed per analyst per day. If this is declining, your review tool needs better UX or your analysts need better training.
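Drop-off by stage is straightforward to compute from per-stage entry counts. The sketch below assumes you already have those counts in funnel order; the stage names and sample numbers are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// StageCount is the number of applications that entered a funnel stage.
type StageCount struct {
	Stage   string
	Entered int
}

// DropOffByStage returns, for each stage except the last, the fraction of
// applications that entered it but never reached the next stage.
func DropOffByStage(funnel []StageCount) map[string]float64 {
	rates := map[string]float64{}
	for i := 0; i+1 < len(funnel); i++ {
		cur, next := funnel[i], funnel[i+1]
		if cur.Entered == 0 {
			continue // avoid dividing by zero on an empty stage
		}
		rates[cur.Stage] = float64(cur.Entered-next.Entered) / float64(cur.Entered)
	}
	return rates
}

// StagesAbove returns the sorted names of stages whose drop-off exceeds threshold.
func StagesAbove(rates map[string]float64, threshold float64) []string {
	var out []string
	for stage, r := range rates {
		if r > threshold {
			out = append(out, stage)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	funnel := []StageCount{
		{Stage: "started", Entered: 1000},
		{Stage: "document_upload", Entered: 900},
		{Stage: "submitted", Entered: 600},
	}
	rates := DropOffByStage(funnel)
	fmt.Println(rates, "investigate:", StagesAbove(rates, 0.25))
}
```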
I build a dashboard that shows these metrics in real-time, with alerts when any metric breaches its threshold. A sudden spike in manual review volume might mean a verification provider is returning more "consider" results than usual. A drop in auto-approve rate might mean you accidentally deployed a stricter risk model. These signals need to be visible to both engineering and compliance teams.
Feedback loop: The most valuable thing you can build is a feedback loop from post-approval outcomes back into your risk model. Every merchant who gets approved and later generates chargebacks above 1% is training data. Every merchant who gets flagged for manual review and turns out clean is training data. Use it. Retune your risk weights quarterly based on actual outcomes.
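Turning outcomes into labels is the first step of that loop. A minimal sketch, assuming every record describes an approved merchant and that the fields shown are available from your processing data (all names are hypothetical):

```go
package main

import "fmt"

// Outcome is the post-approval record for one merchant.
type Outcome struct {
	MerchantID     string
	AutoApproved   bool // false means the application went through manual review
	Transactions   int
	Chargebacks    int
	ConfirmedFraud bool
}

// exceedsChargebackThreshold reports whether chargebacks are above 1% of transactions.
// Integer arithmetic avoids floating-point edge cases at exactly 1%.
func exceedsChargebackThreshold(o Outcome) bool {
	return o.Chargebacks*100 > o.Transactions
}

// Labels groups merchants by how the onboarding decision turned out.
type Labels struct {
	FalseNegatives []string // auto-approved, later bad
	FalsePositives []string // manually reviewed, later clean
}

// LabelOutcomes classifies each merchant for use as retuning data.
func LabelOutcomes(outcomes []Outcome) Labels {
	var l Labels
	for _, o := range outcomes {
		bad := o.ConfirmedFraud || exceedsChargebackThreshold(o)
		switch {
		case o.AutoApproved && bad:
			l.FalseNegatives = append(l.FalseNegatives, o.MerchantID)
		case !o.AutoApproved && !bad:
			l.FalsePositives = append(l.FalsePositives, o.MerchantID)
		}
	}
	return l
}

func main() {
	labels := LabelOutcomes([]Outcome{
		{MerchantID: "m_1", AutoApproved: true, Transactions: 1000, Chargebacks: 15},
		{MerchantID: "m_2", AutoApproved: false, Transactions: 2000, Chargebacks: 2},
	})
	fmt.Printf("%+v\n", labels)
}
```

Joining these labels with the risk score and weight version recorded at onboarding shows which factors were over- or under-weighted.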
References
- Middesk — Business Verification API Documentation
- Persona — Identity Verification API Documentation
- Onfido — Identity Verification API Documentation
- FinCEN — Customer Due Diligence (CDD) Final Rule
- Visa — Acquirer Risk Program and Merchant Monitoring
- Mastercard — Business Risk Assessment and Mitigation (BRAM)
Disclaimer: This article reflects the author's personal experience and opinions. It is not legal or regulatory advice. Compliance requirements vary by jurisdiction, card network, and acquiring bank — always consult with qualified legal counsel and your compliance team. Product names, logos, and brands are property of their respective owners. Pricing and features mentioned are subject to change — always verify with official documentation.