Perpetual KYC (pKYC) Architecture: Continuous Risk Reassessment for 2026
This is a systems and architecture analysis for engineers. It is not financial, legal, or compliance advice, and it makes no predictions about any specific institution or regulation.
A bank’s periodic Know Your Customer review is a promise to notice things a year late. A customer onboarded as low risk in January can become a director of a sanctioned company in March, appear in a leaked corporate registry in May, and still sail through until their scheduled refresh the following January. A perpetual KYC architecture closes that gap by treating due diligence as a continuous stream problem instead of a calendar problem: every external change — a new corporate filing, a sanctions-list delta, an adverse-media hit, an anomalous transaction — becomes an event that can re-score the customer within minutes. This post is a reference architecture for that system. We walk the full pipeline from ingestion through entity resolution, event detection, a hybrid rules-plus-ML risk engine, fuzzy-matching screening, and case management, and we are honest about where each layer breaks under real data.
What this covers: the layered pKYC reference architecture, event-driven reassessment loops, the risk-scoring engine, sanctions and adverse-media screening, false-positive management, model governance, and the failure modes that make continuous due diligence hard in practice.
Context and Background
Periodic KYC refresh emerged when data was expensive to pull and manual to review. Institutions bucketed customers into low, medium, and high risk, then re-reviewed them on fixed cycles, often every five, three, and one years respectively, so the riskiest customers were looked at most often. The cadence was an operational compromise, not a risk-based truth. It assumes customer risk is roughly constant between reviews, which is exactly the assumption financial crime exploits. A mule account is dormant at onboarding and only becomes interesting after it starts moving money. A shell company looks clean until its ultimate beneficial owner changes. The Financial Action Task Force frames customer due diligence as an ongoing obligation, and its guidance on ongoing monitoring makes clear that CDD is meant to be continuous in spirit: periodic batch review is an implementation shortcut, not the intent.
The periodic model also scales badly. As a book of business grows, the refresh backlog grows with it, and the reviews cluster into painful quarterly spikes that starve analysts of time for genuinely risky cases. Worse, most of that effort is wasted: the overwhelming majority of refreshed customers have not changed at all, so analysts re-key the same unchanged data to satisfy a date.
Perpetual KYC inverts the model. Instead of asking “whose review date is due?”, the system asks “who has materially changed since we last looked?” That reframing is fundamentally an event-driven, streaming-data architecture, closely related to the patterns behind real-time fraud detection architecture. The engineering challenge is no longer scheduling — it is detecting meaningful change across dozens of noisy data sources, attributing that change to the right entity, and deciding, cheaply and explainably, whether it warrants human attention.
The pKYC Reference Architecture
A perpetual KYC architecture is a layered event pipeline: ingestion normalizes heterogeneous sources into events, entity resolution attaches each event to a canonical customer, change detection decides which events are material, a hybrid risk engine re-scores the entity, screening checks it against sanctions and adverse-media data, and case management routes anything that crosses a threshold to an analyst — with every step written to an immutable audit log.

Figure 1: The pKYC reference architecture as a left-to-right event pipeline.
Figure 1 shows the seven layers. External sources (corporate registries, sanctions and PEP lists, adverse media), transaction activity, and the existing KYC profile store all feed a common ingestion layer built on a streaming backbone. Ingested events flow to entity resolution, which maintains an identity graph; from there, change detection filters for materiality, the risk engine re-scores, screening runs, and case management handles anything that alerts. Audit and lineage sit alongside every stage so that any decision can be reconstructed after the fact. The rest of this section walks the first three layers; the risk engine and screening get their own deep-dive below.
Ingestion: turning heterogeneous sources into a uniform event stream
The ingestion layer’s job is to convert wildly different sources into a single, uniform stream of change events. A corporate registry publishes bulk XML or JSON dumps on an irregular schedule. Sanctions lists such as the OFAC Specially Designated Nationals list and the consolidated EU list publish structured deltas when they update. Adverse-media feeds arrive as a firehose of unstructured articles. Transaction data arrives continuously from core banking. Onboarding profiles change when a customer updates their details.
Most teams land these on a durable log such as Apache Kafka, with one topic per source domain and change-data-capture (CDC) connectors pulling updates from operational databases. The design goals are ordering, replayability, and idempotency. Ordering matters because a “director added” event must not be processed before the “company created” event it depends on. Replayability matters because when you improve a downstream model, you want to re-run history through it. Idempotency matters because every source will, eventually, deliver the same record twice. A schema registry enforces contracts so a malformed registry dump cannot silently corrupt the identity graph downstream. Events are deliberately thin — an identifier, a source, a timestamp, and a payload reference — with heavy documents kept in object storage and referenced by URI, so the log stays fast.
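To make the thin-event and idempotency ideas concrete, here is a minimal sketch in plain Python. The `ChangeEvent` and `IdempotentConsumer` names, the field layout, and the in-memory set are all assumptions for illustration; a real deployment would keep the seen-keys in a durable store and would sit behind an actual log consumer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class ChangeEvent:
    """Thin event: identifiers and a pointer, never the heavy document itself."""
    entity_key: str        # source-local identifier, resolved to a customer later
    source: str            # e.g. "registry", "sanctions", "core-banking"
    occurred_at: datetime  # timestamp supplied by the source
    payload_uri: str       # location of the full document in object storage

    def dedup_key(self) -> str:
        # Stable fingerprint so a redelivered record hashes to the same key.
        raw = "|".join([self.source, self.entity_key,
                        self.occurred_at.isoformat(), self.payload_uri])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class IdempotentConsumer:
    """Processes each distinct event once, however many times it is delivered."""

    def __init__(self) -> None:
        self._seen: set[str] = set()   # hypothetical: a durable store in practice
        self.processed: list[ChangeEvent] = []

    def handle(self, event: ChangeEvent) -> bool:
        key = event.dedup_key()
        if key in self._seen:
            return False               # duplicate delivery: safely ignored
        self._seen.add(key)
        self.processed.append(event)
        return True
```

Calling `handle` twice with the same event returns `True` and then `False`, and the event is recorded once. The design choice worth noting is that the dedup key is derived from the event's content, not from delivery metadata, so it survives replays of the log.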
Sources also differ in freshness semantics, and the ingestion layer has to encode that difference explicitly. Sanctions deltas are authoritative and near-real-time, so they warrant immediate processing. Corporate-registry dumps are authoritative but stale by days or weeks, so their events carry an “as-of” timestamp distinct from the ingestion timestamp — conflating the two is a classic bug that makes a customer look freshly risky when in fact the underlying filing is months old. Adverse-media feeds are neither authoritative nor complete; they are leads, and the pipeline must treat them as probabilistic signals rather than facts. Encoding provenance and confidence on every event at the moment of ingestion, rather than inferring it downstream, is what keeps the rest of the system honest. A dead-letter topic captures events that fail schema validation so that a single bad record neither blocks the stream nor vanishes silently, and a small reconciliation job periodically confirms that the count of events consumed matches the count produced per source, catching connector gaps before they become compliance gaps.
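The dead-letter and reconciliation ideas above can be sketched in a few lines. This is an illustration under assumed field names (`as_of`, `ingested_at`, `confidence`) and an assumed dictionary record shape, not a description of any particular schema registry or connector.

```python
from collections import Counter

# Assumed minimal contract: both timestamps and a confidence are mandatory.
REQUIRED_FIELDS = ("entity_key", "source", "as_of", "ingested_at", "confidence")


def route(records):
    """Split raw records into a valid stream and a dead-letter list."""
    valid, dead_letter = [], []
    for record in records:
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            # Keep the bad record and the reason; never drop it silently.
            dead_letter.append({"record": record, "reason": f"missing {missing}"})
        else:
            valid.append(record)
    return valid, dead_letter


def reconcile(produced_counts, consumed_records, dead_letter):
    """Return per-source gaps between produced and accounted-for events."""
    seen = Counter(r["source"] for r in consumed_records)
    seen.update(d["record"].get("source", "unknown") for d in dead_letter)
    return {src: n - seen.get(src, 0)
            for src, n in produced_counts.items() if n != seen.get(src, 0)}
```

An empty result from `reconcile` means every produced event was either consumed or dead-lettered; a non-empty result names the source and the size of the gap.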
Entity resolution and the identity graph
Raw events are useless until they are attached to the right customer, and this is the hardest layer in the entire pipeline. Entity resolution links records that refer to the same real-world person or company across sources that share no common key — “Jon A. Smith” in your CRM, “Smith, Jonathan Andrew” in a registry, and “J SMITH” on a wire — into one canonical entity. It also builds the surrounding graph: the companies a person directs, the beneficial owners behind a company, the accounts they control, the counterparties they transact with.
Modern systems model this as an identity graph: nodes are entities and edges are relationships, each carrying a confidence weight. Resolution proceeds in two stages. Blocking narrows billions of potential comparisons to a set of plausible candidate pairs using inexpensive keys such as normalized name tokens, date of birth, or postcode. Matching then scores each candidate pair with a probabilistic model (a Fellegi-Sunter style approach or a supervised classifier) that weighs agreement and disagreement across fields. The output is not a hard yes/no but a confidence score, which is exactly the signal risk scoring needs downstream. When the graph itself is the object of analysis, for example when detecting layered ownership or circular control, the techniques overlap heavily with those in AML graph neural networks for money-laundering detection.
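The two stages can be sketched as follows. The blocking key and the m/u probabilities are invented for illustration; in a real Fellegi-Sunter setup they would be estimated from data, and field comparison would be fuzzier than exact equality.

```python
import math
from collections import defaultdict
from itertools import combinations

# Illustrative m/u probabilities (assumed, not estimated from data):
# m = P(field agrees | same entity), u = P(field agrees | different entities).
FIELD_PARAMS = {"surname": (0.95, 0.01), "dob": (0.98, 0.003), "postcode": (0.90, 0.02)}


def block_key(record):
    # Cheap blocking key: first letter of surname plus birth year.
    return (record["surname"][:1].lower(), record["dob"][:4])


def candidate_pairs(records):
    """Yield only pairs that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    for members in blocks.values():
        yield from combinations(members, 2)


def match_weight(a, b):
    """Sum of log2 likelihood ratios across fields (Fellegi-Sunter style)."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if a[field].lower() == b[field].lower():
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total
```

A positive weight is evidence that two records refer to the same entity and a negative weight is evidence against. Mapping the weight to a confidence score and choosing merge thresholds are left out of this sketch.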
Two properties make entity resolution tractable at scale in a perpetual system. First, resolution must be incremental. A batch resolver that recomputes the entire graph nightly cannot support event-driven reassessment; instead, each incoming event triggers a local re-resolution of only the affected neighborhood, merging or splitting entities as new evidence arrives. Second, merges must be reversible. Because resolution is probabilistic, some merges will later prove wrong, and a system that hard-deletes the source records on merge cannot unwind the mistake. The durable pattern is to keep source records immutable and represent the canonical entity as a derived view over them, so a merge is a link that can be cut and a split is a link that can be added, both without data loss. Confidence weights on edges let downstream layers reason about uncertainty explicitly — a sanctions hit reached only through a low-confidence 0.6 ownership edge should be treated differently from one reached through a confirmed directorship, and surfacing that distinction to analysts prevents both over- and under-reaction.
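One way to picture reversible merges is to store only links between immutable source records and derive canonical entities as connected components. The `IdentityGraph` class and the record identifiers below are hypothetical, and a production graph would update components incrementally instead of recomputing them.

```python
from collections import defaultdict


class IdentityGraph:
    """Source records are immutable; canonical entities are derived from links."""

    def __init__(self, record_ids):
        self.record_ids = list(record_ids)
        self.links = {}                      # frozenset({a, b}) -> confidence

    def merge(self, a, b, confidence):
        if a == b:
            raise ValueError("a link needs two distinct records")
        self.links[frozenset((a, b))] = confidence

    def split(self, a, b):
        # Cut the link; the underlying records are untouched.
        self.links.pop(frozenset((a, b)), None)

    def entities(self, min_confidence=0.0):
        """Connected components over links at or above min_confidence."""
        adjacency = defaultdict(set)
        for pair, confidence in self.links.items():
            if confidence >= min_confidence:
                a, b = tuple(pair)
                adjacency[a].add(b)
                adjacency[b].add(a)
        seen, components = set(), []
        for start in self.record_ids:
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                stack.extend(adjacency[node] - component)
            seen |= component
            components.append(component)
        return components
```

Because the view is derived, the same graph can be read at different confidence levels: `entities(min_confidence=0.8)` ignores a 0.6 link that `entities()` would follow, which is one way to let downstream layers reason about uncertainty explicitly.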
Change detection and event triggers
Not every event deserves a re-score. Change detection is the layer that separates material change from noise, and it is where a naive pKYC system either drowns analysts or misses real risk. A change in registered address might be immaterial; the appearance of a new beneficial owner who resolves to a sanctioned entity is not. The layer applies materiality rules — some deterministic, some learned — to decide whether an event should trigger reassessment, update the customer’s baseline silently, or be discarded. Getting this filter right is what makes the difference between a system that fires ten thousand alerts a day and one that fires the fifty that matter.
Materiality is best expressed relative to a maintained baseline per entity rather than as a judgment on the event in isolation. The system holds a compact snapshot of each customer’s last-known risk-relevant state — their tier, key attributes, graph neighborhood, and recent behavioral aggregates — and each event is scored by how far it moves that state. A transaction that is large in absolute terms may be immaterial for a customer whose baseline is high-volume trade finance, while the same amount is highly material for a dormant retail account. This is why materiality cannot be a fixed global threshold: it is a function of both the event and the entity’s history. Some triggers are categorical and bypass the delta logic entirely — any sanctions or PEP match, any change in beneficial ownership, any jurisdiction change into a high-risk country — because policy demands they always be reviewed regardless of magnitude. The remainder are graded, and the grading is exactly the lever that controls alert volume.
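A minimal sketch of that logic, assuming a dictionary event shape, a `typical_amount` baseline aggregate, and an arbitrary ratio threshold of 3.0. The event type names are illustrative labels, not a standard taxonomy.

```python
# Policy-mandated triggers that bypass the graded delta logic entirely.
CATEGORICAL_TRIGGERS = {"sanctions_match", "pep_match",
                        "beneficial_owner_change", "high_risk_jurisdiction_move"}


def assess_materiality(event, baseline, ratio_threshold=3.0):
    """Return 'trigger', 'update_baseline', or 'discard' for one event.

    Assumed shapes: event has 'type' and optionally 'amount';
    baseline has 'typical_amount' (a behavioral aggregate for this entity).
    """
    if event["type"] in CATEGORICAL_TRIGGERS:
        return "trigger"                       # policy: always review
    if event["type"] == "transaction":
        typical = max(baseline.get("typical_amount", 0.0), 1.0)
        ratio = event["amount"] / typical      # size relative to this entity
        return "trigger" if ratio >= ratio_threshold else "update_baseline"
    if event["type"] in {"address_change", "contact_change"}:
        return "update_baseline"               # recorded, but no re-score
    return "discard"
```

The same 50,000 transaction updates the baseline silently for an entity whose typical amount is 500,000 and triggers reassessment for one whose typical amount is 200. The `ratio_threshold` is the graded lever: raising it lowers alert volume.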
Deeper Analysis: Risk Engine and Screening
The risk engine and screening subsystem are where pKYC earns or loses trust. This is the part regulators scrutinize, analysts live in, and model-risk teams govern. Two loops matter: the outer reassessment loop that decides when to re-score, and the inner screening pipeline that checks an entity against watchlists. We take them in turn.
The event-driven reassessment loop

Figure 2: The reassessment loop. A change event is normalized, tested for materiality, re-scored, and either cleared or escalated.
Figure 2 traces one event through the loop. A change event arrives — a new filing, a transaction pattern, a watchlist delta. It is normalized and attached to its entity. A trigger rule tests materiality: if the change is immaterial, the system updates the entity’s baseline and stops, logging that it looked and chose not to act. If it is material, the entity is re-scored with refreshed features. The system then compares the new score to the previous one. A small delta within tolerance is auto-cleared, with the decision and its reason written to the audit log. A delta over threshold raises an alert into the case queue.
Three engineering properties make this loop safe. First, every path terminates in a logged decision — even “no action” is recorded, because a regulator’s question is often “why did you not act?” rather than “why did you act?” Second, the re-score is idempotent: replaying the same event produces the same score and the same decision. Third, the thresholds are configuration, not code, so risk owners can tune sensitivity without a deployment, and every threshold change is itself versioned and auditable.
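The loop and its three properties can be sketched as a single function. The callables `is_material` and `rescore`, the dictionary shapes, and the use of an absolute score delta are assumptions made for the example; a real system might treat score increases and decreases differently.

```python
def reassess(event, previous_score, is_material, rescore, delta_threshold, audit_log):
    """Run one event through the loop; every path appends to audit_log."""
    if not is_material(event):
        # "No action" is still a decision and still gets logged.
        decision = {"event_id": event["id"], "action": "baseline_update",
                    "reason": "immaterial change"}
    else:
        new_score = rescore(event)
        delta = abs(new_score - previous_score)
        action = "alert" if delta > delta_threshold else "auto_clear"
        decision = {"event_id": event["id"], "action": action,
                    "previous_score": previous_score, "new_score": new_score,
                    # Threshold is passed in as configuration and recorded.
                    "delta_threshold": delta_threshold}
    audit_log.append(decision)
    return decision
```

Given a deterministic `rescore`, replaying the same event with the same inputs yields an identical decision record, and the threshold in force is stored alongside the outcome so the decision can be reconstructed later.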
The loop must also be resilient to the messiness of real event streams. Events arrive late, out of order, and in bursts. A watchlist provider might publish a large delta that lands as ten thousand events in a few seconds; a registry outage might delay a batch by a day and then release it all at once. The reassessment loop absorbs these by decoupling ingestion rate from scoring rate through the durable log, and by making re-scoring commutative where possible so that processing two independent changes in either order converges to the same final state. Where changes genuinely depend on each other — ownership before control, entity creation before attribute update — the system uses the entity as a partition key so that all events for one customer are processed in order relative to each other, even as different customers are processed in parallel. This ordering-within-entity, parallelism-across-entities pattern is what lets the loop stay both correct and fast under load.
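The ordering-within-entity pattern comes down to a stable mapping from entity to partition. The sketch below uses a SHA-256 based hash purely for illustration; it is not the partitioner any particular streaming system uses by default, and the event shape is assumed.

```python
import hashlib


def partition_for(entity_id: str, num_partitions: int) -> int:
    """Stable partition choice: the same entity always maps to the same partition."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign(events, num_partitions):
    """Group events by partition, preserving arrival order within each one."""
    partitions = {p: [] for p in range(num_partitions)}
    for event in events:
        partitions[partition_for(event["entity_id"], num_partitions)].append(event)
    return partitions
```

All events for one customer land in one partition in arrival order, while different customers spread across partitions and can be processed in parallel. Note that changing `num_partitions` remaps entities, so repartitioning needs care in any real system.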
The risk-scoring engine: rules plus ML

Figure 3: The hybrid risk engine. Features feed both deterministic rules and an ML model; a calibrated blend produces a tier and an action.
The risk engine, shown in Figure 3, is deliberately hybrid. A pure rules engine is transparent but brittle and easy to game; a pure ML model is powerful but hard to explain to a regulator. Combining them gives explainable coverage. Features flow from a feature store — behavioral signals such as transaction velocity and geographic spread, profile attributes such as industry and product mix, and graph features such as proximity to a known-bad entity. Deterministic rules encode hard policy: a resolved sanctions hit forces the highest tier regardless of what any model says. In parallel, a gradient-boosting model produces a calibrated risk probability from the same features.
A fusion step blends the two. Rules act as overrides and floors; the ML score refines ranking within the space the rules leave open. Calibration matters here: a model score of 0.8 must correspond to roughly an 80% empirical hit rate among cases scored near 0.8, or downstream thresholds are meaningless. The blended score maps to a discrete risk tier (low, medium, or high) and each tier maps to an action: passive monitoring, enhanced review, or escalation for enhanced due diligence. The feature store is shared between training and serving so that the features a model saw in training are computed the same way as those it sees in production. That guards against training-serving skew, one of the most common silent failure modes in deployed risk models.
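A minimal sketch of the fusion step, assuming rule hits arrive as dictionaries tagged `override` or `floor`. The cutoffs of 0.3 and 0.7 and the rule names in the usage are placeholders, not recommended values.

```python
TIER_ACTIONS = {"low": "passive_monitoring", "medium": "enhanced_review",
                "high": "enhanced_due_diligence"}


def fuse(ml_probability, rule_hits, medium_cutoff=0.3, high_cutoff=0.7):
    """Blend a calibrated ML probability with rule overrides and floors.

    rule_hits is a list of dicts with 'kind' ('override' or 'floor'),
    'score', and 'name'. Cutoffs are illustrative placeholders.
    """
    overrides = [r for r in rule_hits if r["kind"] == "override"]
    if overrides:
        score = max(r["score"] for r in overrides)   # rules win outright
    else:
        floors = [r["score"] for r in rule_hits if r["kind"] == "floor"]
        score = max([ml_probability, *floors])       # ML refines above floors
    if score >= high_cutoff:
        tier = "high"
    elif score >= medium_cutoff:
        tier = "medium"
    else:
        tier = "low"
    return {"score": score, "tier": tier, "action": TIER_ACTIONS[tier],
            "reasons": [r["name"] for r in rule_hits]}
```

For example, `fuse(0.05, [{"kind": "override", "score": 1.0, "name": "resolved_sanctions_hit"}])` lands in the high tier even though the model score is low, which is the behavior hard policy requires.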
The most valuable and most fragile features in a pKYC model are the graph features. Proximity to a known-bad entity, the number of degrees separating a customer from a sanctioned party, the density of a customer’s ownership cluster, and whether the customer sits on a path that forms a circular ownership loop are all far more predictive than any single profile attribute. But they are expensive to compute and they shift as the graph changes, so the architecture must decide which graph features to materialize incrementally and which to compute on demand. A practical split is to precompute cheap, frequently-read features — degree, cluster membership, nearest known-bad distance — as the graph mutates, and to reserve expensive path-finding queries for the moment a case is actually opened. This keeps steady-state scoring fast while still giving analysts the deep relational context when they need it. Because graph features move when neighbors change, a change to one entity can legitimately re-score its neighbors, which is why the reassessment loop must be able to fan a single event out to a bounded set of affected entities rather than treating each customer as an island.
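Two of the graph operations described above, nearest known-bad distance and bounded fan-out, can be sketched with a breadth-first search over an adjacency dictionary. The graph representation, the hop limits, and the `limit` cap are assumptions for illustration; a real identity graph would live in a dedicated store.

```python
from collections import deque


def nearest_bad_distance(graph, start, known_bad, max_hops=3):
    """Hops from start to the closest known-bad node, or None beyond max_hops."""
    if start in known_bad:
        return 0
    visited, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue                      # do not expand past the hop budget
        for neighbor in graph.get(node, ()):
            if neighbor in visited:
                continue
            if neighbor in known_bad:
                return depth + 1
            visited.add(neighbor)
            frontier.append((neighbor, depth + 1))
    return None


def affected_entities(graph, changed, max_hops=1, limit=100):
    """Bounded fan-out: entities within max_hops of a change, capped at limit."""
    visited, frontier, result = {changed}, deque([(changed, 0)]), []
    while frontier and len(result) < limit:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in visited and len(result) < limit:
                visited.add(neighbor)
                result.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return result
```

The hop and size limits are what keep a single event from cascading into a re-score of the whole graph; both would be tuned per deployment.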
Screening: fuzzy matching against sanctions, PEP, and adverse media

Figure 4: The screening pipeline. Names are normalized, candidates generated, fuzzy-matched against lists, and scored.
Screening, in Figure 4, is where entity data meets watchlists, and it is dominated by one problem: names are messy. The same person appears as “Muhammad”, “Mohammed”, and “Mohamad”; transliteration from non-Latin scripts is lossy; corporate names carry inconsistent suffixes. Exact matching misses real hits and fuzzy matching drowns you in false ones, so the pipeline is a careful sequence rather than a single comparison.
First, normalization: case folding, transliteration to a canonical script, stripping of corporate stopwords, and tokenization. Second, blocking generates candidates so you compare against a shortlist, not the entire watchlist, which keeps latency bounded. Third, fuzzy matching scores each candidate using a blend of edit-distance, token-set similarity, and phonetic algorithms such as Double Metaphone for name-sound equivalence. Fourth, the surviving candidates are checked against the sanctions lists (OFAC, EU, UN), PEP and relatives-and-close-associates lists, and an adverse-media classifier that uses NLP to distinguish “arrested for fraud” from “commented on a fraud case”. Finally, hit scoring combines match strength with list severity, and only candidates above a threshold become potential hits routed to a case, where an analyst confirms or clears them.
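The first three steps can be sketched with the Python standard library alone. This is deliberately simplified: accent stripping stands in for full transliteration, `difflib.SequenceMatcher` stands in for a blended edit-distance and token-set score, the first-letter blocking key is crude, and phonetic matching such as Double Metaphone is omitted because it needs a third-party library. The stopword set and the 0.85 threshold are assumptions.

```python
import re
import unicodedata
from difflib import SequenceMatcher

CORPORATE_STOPWORDS = {"ltd", "limited", "llc", "inc", "gmbh", "sa", "plc"}


def normalize(name: str) -> list[str]:
    """Case-fold, strip accents and punctuation, drop corporate suffixes."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", unaccented.casefold())
    # Sorting makes the comparison insensitive to token order.
    return sorted(t for t in tokens if t not in CORPORATE_STOPWORDS)


def similarity(a: str, b: str) -> float:
    """Order-insensitive character similarity in [0, 1]."""
    return SequenceMatcher(None, " ".join(normalize(a)), " ".join(normalize(b))).ratio()


def screen(name, watchlist, threshold=0.85):
    """Return (entry, score) pairs at or above threshold, strongest first."""
    first_letters = {t[0] for t in normalize(name)}           # crude blocking key
    hits = []
    for entry in watchlist:
        if first_letters & {t[0] for t in normalize(entry)}:  # shortlist only
            score = similarity(name, entry)
            if score >= threshold:
                hits.append((entry, round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

With this sketch, "Acme Trading Ltd." and "ACME TRADING LIMITED" normalize to the same tokens, as do "José Álvarez" and "Alvarez, Jose". Spelling variants such as "Mohammed" and "Muhammad" score high but below 1.0, which is exactly the region where threshold choice and phonetic matching matter.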
The dominant operational cost here is false positives. A common structural technique is to persist prior disposition decisions keyed by a stable match signature, so that once an analyst clears “J. Smith” against a specific SDN entry, an identical future match auto-dispositions with that rationale rather than re-alerting. This turns screening from a stateless comparator into a system with memory, and it is often where the largest analyst-time savings come from.
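Disposition memory can be sketched as a lookup keyed by a hash of what was actually matched. The signature fields, the `DispositionMemory` class, and the identifiers in the usage note are hypothetical; the point of including the matched field values is that a change on either side produces a new signature and therefore a fresh alert.

```python
import hashlib
import json


def match_signature(customer_id, list_name, list_entry_id, matched_fields):
    """Stable hash of what was matched, independent of dict ordering."""
    canonical = json.dumps({"customer": customer_id, "list": list_name,
                            "entry": list_entry_id, "fields": matched_fields},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DispositionMemory:
    """Remembers prior analyst decisions so identical matches do not re-alert."""

    def __init__(self):
        self._store = {}     # signature -> prior analyst decision

    def record(self, signature, decision, rationale, analyst):
        self._store[signature] = {"decision": decision, "rationale": rationale,
                                  "analyst": analyst}

    def lookup(self, signature):
        # None means no prior disposition: the match must go to an analyst.
        return self._store.get(signature)
```

If an analyst clears a match and the same customer later matches the same list entry on the same field values, `lookup` returns the stored decision and rationale. If the customer's date of birth changes, the signature differs and the match alerts again.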
Watchlist deltas add a second dimension the pipeline must handle deliberately. When a sanctions list updates, you cannot simply re-screen every customer against the whole list every day; that is both wasteful and slow. The efficient pattern is delta-driven screening: when a new name is added to a list, screen only that one addition against the existing customer base; when a customer’s own name or identifiers change, screen only that customer against the full lists. Both directions of change are events, and both feed the same reassessment loop. Threshold tuning here is a genuine risk-owner decision rather than an engineering one: a lower match threshold catches more true hits at the cost of more analyst review, and the right cutoff depends on the institution’s risk appetite and list severity, not on any universal number. The pipeline should make that threshold configurable per list and per identifier type, and it should record which threshold was in force when any given match was scored, so a later reviewer can reconstruct why a borderline name did or did not alert.
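Both ideas in this paragraph, delta-driven planning and recorded per-list thresholds, fit in a short sketch. The event shapes, the threshold table, and its values are assumptions for illustration only.

```python
# Illustrative per-list, per-identifier thresholds; real values are a
# risk-owner decision and would live in versioned configuration.
THRESHOLDS = {("SDN", "name"): 0.85, ("PEP", "name"): 0.90}
DEFAULT_THRESHOLD = 0.90


def plan_screening(event, customers, watchlist):
    """Return the (customer, list_entry) pairs one change event requires."""
    if event["type"] == "list_addition":
        # One new list entry against every existing customer.
        return [(c, event["entry"]) for c in customers]
    if event["type"] == "customer_change":
        # One changed customer against every list entry.
        return [(event["customer"], e) for e in watchlist]
    return []


def score_record(list_name, identifier_type, score):
    """Attach the threshold in force so the outcome can be reconstructed."""
    threshold = THRESHOLDS.get((list_name, identifier_type), DEFAULT_THRESHOLD)
    return {"list": list_name, "identifier_type": identifier_type, "score": score,
            "threshold_in_force": threshold, "alert": score >= threshold}
```

The same match score of 0.87 alerts against the hypothetical SDN threshold and does not against the PEP one, and the stored `threshold_in_force` explains the difference to a later reviewer.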
Case management and alert triage
When the loop escalates, the output is a case, and case management is where the system meets its human operators. A well-designed case is self-contained: it carries the triggering event, the before-and-after risk scores, the specific features and rules that moved the score, the resolved entity and its relevant graph neighborhood, and any prior related cases. An analyst should be able to make a disposition without leaving the case to hunt for context in five other systems, because context-switching is where analyst time and accuracy both degrade.
Triage prioritizes the queue so the highest-risk cases surface first, typically by a blend of risk tier, list severity, and the size of the score delta. Every disposition — cleared, escalated, or referred — feeds back into the system in two ways. It becomes disposition memory that suppresses identical future alerts, and, over time, the corpus of analyst decisions becomes labeled training data that improves both the materiality filter and the ML risk model. This feedback loop is what lets a pKYC system get quieter and sharper over time rather than noisier, provided the labels are captured cleanly and the model is retrained under governance rather than silently.
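A triage queue of this kind can be sketched with a heap. The weights in the priority formula and the 0 to 3 scale for `list_severity` are invented for the example; the actual blend is a policy choice.

```python
import heapq
from itertools import count

TIER_WEIGHT = {"low": 1, "medium": 2, "high": 3}


class TriageQueue:
    """Highest-priority case first; ties resolved by arrival order."""

    def __init__(self):
        self._heap, self._order = [], count()

    def push(self, case):
        # Assumed blend: tier dominates, then list severity, then score delta.
        priority = (TIER_WEIGHT[case["tier"]] * 10
                    + case["list_severity"] * 5
                    + case["score_delta"] * 10)
        # heapq is a min-heap, so negate; the counter breaks ties by arrival.
        heapq.heappush(self._heap, (-priority, next(self._order), case))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

The arrival counter also keeps the heap from ever comparing two case dictionaries directly, which would raise a `TypeError` when priorities tie.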
Model governance, explainability, and lineage
Any ML in a KYC decision path lives under model-risk governance. That means a model inventory, documented training data and assumptions, pre-deployment validation, and ongoing performance monitoring for drift. Explainability is a hard requirement, not a nice-to-have: when a customer is tiered high, the system must be able to state which features drove that score. Techniques such as SHAP-style attributions on the gradient-boosting model provide per-decision reason codes. Underpinning all of it is data lineage — the ability to trace any score back through the exact feature values, source records, and model version that produced it. Lineage is what lets you answer, months later, “why was this customer flagged on this date?” without guesswork.
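As a rough illustration of what a lineage record might hold, the sketch below bundles the inputs behind one score and fingerprints them so later tampering or accidental mutation is detectable. The field names, version strings, and the fingerprinting approach are assumptions, not a prescribed format.

```python
import hashlib
import json


def _fingerprint(body: dict) -> str:
    # Canonical JSON so the same content always hashes to the same value.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lineage_record(entity_id, score, feature_values, source_record_ids,
                   model_version, threshold_version):
    """Capture everything needed to reproduce one scoring decision."""
    body = {"entity_id": entity_id, "score": score,
            "feature_values": feature_values,
            "source_record_ids": sorted(source_record_ids),
            "model_version": model_version,
            "threshold_version": threshold_version}
    return {**body, "fingerprint": _fingerprint(body)}


def is_intact(record: dict) -> bool:
    """True if the record still matches the fingerprint taken at write time."""
    body = {k: v for k, v in record.items() if k != "fingerprint"}
    return record.get("fingerprint") == _fingerprint(body)
```

Per-decision reason codes, such as feature attributions, would be stored alongside these fields; they are omitted here because computing them depends on the model library in use.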
Trade-offs, Gotchas, and What Goes Wrong
Alert fatigue is the first and worst failure mode. A pKYC system that fires on every change trains analysts to click “clear” reflexively, which defeats the purpose. The materiality filter and disposition memory exist specifically to keep alert volume proportional to genuine risk; under-tuning them is more dangerous than not building pKYC at all, because it manufactures the appearance of diligence while degrading its substance.
Data quality is the second. Entity resolution amplifies bad data: a wrong date of birth can merge two distinct people into one entity, so one person’s risk contaminates another’s, or split one person into two so risk goes unnoticed. Every downstream layer inherits these errors, and they are nearly invisible until an audit surfaces them.
Model drift is the third. Financial crime typologies evolve, so a model trained on last year’s patterns silently decays. Without population-stability and performance monitoring, the decay is invisible until a missed case makes it obvious. Drift monitoring on both inputs and outputs is not optional.
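Population stability is commonly tracked with the population stability index (PSI), which compares the binned distribution of a score or feature between a baseline sample and a recent one. The sketch below is a plain-Python version under assumed bin edges; the `epsilon` floor is a convenience to avoid taking the logarithm of zero for empty bins.

```python
import math


def population_stability_index(expected, actual, edges, epsilon=1e-6):
    """PSI between two samples over shared bin edges (last bin is closed)."""

    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                last = i == len(edges) - 2
                if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                    counts[i] += 1
                    break
        total = max(sum(counts), 1)
        # Floor each proportion so empty bins do not break the logarithm.
        return [max(c / total, epsilon) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a PSI of zero, and the value grows as the recent distribution moves away from the baseline. Alerting thresholds on PSI are a governance decision and would be set per model and per feature.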
Then there are the tensions with no clean answer. Continuous monitoring pulls toward collecting and retaining more data, which collides directly with data-minimization and privacy principles; the architecture must let you monitor without hoarding. Latency trades against completeness: waiting for every source to confirm gives a fuller picture but delays action, while acting on partial data is fast but error-prone. And graph queries that are cheap at thousands of entities become expensive at hundreds of millions, so the identity graph’s storage and query design is a first-order scaling decision, not an afterthought.
Two further anti-patterns deserve naming because they recur across implementations. The first is the silent-model-swap: deploying a retrained risk model without re-baselining thresholds, so scores shift under fixed cutoffs and either alert volume explodes or genuine risk quietly slips below the line. Any model change must be paired with threshold re-validation and a shadow-run against recent history before it decides live cases. The second is treating “no alert” as “no record”. A system that only persists cases when it escalates cannot prove it looked at the events it dismissed, which is precisely the evidence a regulator or auditor asks for. Every reassessment, including every auto-clear, must leave a durable trace, even though the overwhelming majority of traces will never be read. Storage is cheap; an unexplainable gap in the timeline is not.
Finally, be wary of over-trusting the identity graph as ground truth. The graph is a probabilistic model of reality, assembled from imperfect sources, and its edges carry uncertainty that tends to get flattened away as data moves downstream. A directorship inferred from a fuzzy name match is not the same as one confirmed by a registry filing, yet both can end up as plain edges if confidence is not propagated. Preserving and surfacing that uncertainty end to end — rather than collapsing it into false certainty — is what separates a system analysts trust from one they learn to second-guess.
Practical Recommendations
Treat pKYC as a streaming data platform first and a compliance feature second. The hardest problems — ordering, idempotency, entity resolution, drift — are data-engineering problems, and teams that under-invest there ship systems that alert loudly and reliably on the wrong things. Start with the event backbone and entity resolution; a mediocre risk model on a clean identity graph beats a sophisticated one on a broken graph.
Make every decision explainable and auditable from day one. Retrofitting lineage onto a system that discarded intermediate state is far harder than capturing it as you go. Keep thresholds and rules as versioned configuration so risk owners can tune without a code deploy, and version those changes too.
Invest early in false-positive management, because analyst time is the scarcest resource in the whole system and screening is where it drains fastest.
Sequence the build to de-risk it. Ship ingestion and entity resolution first and run them in shadow mode, resolving entities and detecting change without acting on anything, so you can measure resolution quality and change volume against reality before a single alert reaches an analyst. Add rules-based scoring next and calibrate thresholds against that observed change volume. Only then layer in ML, and introduce it in shadow alongside the rules so you can compare its decisions to analyst dispositions before it influences live cases. This staged rollout turns the two riskiest components — resolution accuracy and model behavior — into things you observe and tune rather than things you discover in production. Instrument every stage from the start, because a pKYC system you cannot measure is one you cannot govern, and the metrics that matter most — alert precision, disposition throughput, resolution error rate, and model drift — are exactly the ones that are painful to reconstruct after the fact.
Engineering checklist:
- [ ] Durable, replayable, idempotent event log across all sources
- [ ] Schema registry enforcing source contracts at ingestion
- [ ] Two-stage entity resolution (blocking then probabilistic matching) with confidence scores
- [ ] Materiality filter separating triggering from silent baseline updates
- [ ] Shared feature store for training and serving to prevent skew
- [ ] Calibrated hybrid risk score with rule overrides and per-decision reason codes
- [ ] Screening with normalization, phonetic matching, and disposition memory
- [ ] Full lineage from score back to source records and model version
- [ ] Drift monitoring on model inputs and outputs
Frequently Asked Questions
What is the difference between periodic KYC and perpetual KYC?
Periodic KYC re-reviews customers on fixed cycles, commonly every one, three, or five years for high-, medium-, and low-risk customers respectively, regardless of whether anything changed. Perpetual KYC is event-driven: it continuously ingests changes from transactions, registries, and watchlists, and re-scores a customer only when something material changes. The practical effect is that risk is reassessed within minutes or hours of a triggering event rather than at the next scheduled date, so newly emerged risk is caught far sooner and analyst effort is spent on genuinely changed customers.
Do you need machine learning to build a pKYC system?
No. A pKYC architecture is fundamentally about event-driven data flow, and a purely rules-based risk engine can run the loop end to end. Machine learning improves risk ranking and false-positive reduction, but it adds model-governance and explainability obligations. Many teams start rules-only to get the streaming, entity-resolution, and case-management foundations solid, then add ML for scoring and adverse-media classification once the data platform is trustworthy and lineage is in place.
Why is entity resolution considered the hardest part?
Because it links records that share no common key across sources with inconsistent, incomplete, and conflicting data, and every downstream layer depends on it being right. A resolution error either merges two distinct people so one’s risk contaminates the other, or splits one person so risk is missed. These errors are hard to detect, propagate silently through scoring and screening, and often only surface during an audit. Getting resolution right is the foundation the rest of the architecture stands on.
How does pKYC reduce false positives in screening?
Through several layered techniques. Name normalization and phonetic matching cut spurious mismatches from transliteration and spelling variation. Blocking limits comparisons to plausible candidates. Hit scoring combines match strength with list severity so weak matches on low-severity entries do not alert. Most importantly, disposition memory persists prior analyst decisions keyed by a match signature, so a previously cleared match auto-dispositions instead of re-alerting. Together these keep alert volume proportional to genuine risk rather than name-collision noise.
What role does Kafka or streaming play in the architecture?
Streaming provides the durable, ordered, replayable event backbone the whole system depends on. Kafka-style topics decouple sources from consumers, so ingestion, resolution, scoring, and screening scale independently. Ordering is guaranteed within a partition, so keying events by entity ensures that dependent events for the same customer process in sequence. Replayability lets you re-run history through improved models. Idempotent processing tolerates duplicate delivery. Without a streaming backbone, pKYC degrades into fragile batch jobs that lose the near-real-time reassessment that defines the pattern in the first place.
How is a pKYC risk decision made explainable to regulators?
By combining a hybrid scoring design with full lineage. Deterministic rules provide inherently transparent overrides and floors. The ML component uses calibrated outputs and per-decision attribution — SHAP-style reason codes that name the features driving a score. Data lineage ties each decision back to the exact feature values, source records, and model version behind it. Together these let the system answer “why was this customer tiered high on this date?” with a reproducible, evidenced trail rather than an opaque number.
Further Reading
- AML graph neural networks for money-laundering detection — how graph learning surfaces layered ownership and laundering structure in the same identity graph pKYC builds.
- Real-time fraud detection architecture — the streaming and feature-store patterns that pKYC reassessment shares with fraud scoring.
- MPC wallet custody architecture with threshold signatures — a companion FinTech reference architecture on securing the assets that pKYC monitors.
- FATF guidance and standards — the international standards framing customer due diligence as an ongoing obligation.
- OFAC Specially Designated Nationals (SDN) list — a primary sanctions-list source that screening pipelines consume.
By Riju
