Real-Time AML & Sanctions Triage

01 Executive Summary

Financial institutions face a compounding compliance burden: every payment must be screened for sanctions exposure and scored for AML risk before it reaches downstream clearing, case management, or reporting infrastructure. Most architectures introduce this screening as a separate microservice or external API call — creating a network hop that adds latency, exposes a failure boundary, and severs the audit trail between the transaction record and the compliance decision.

StreamKernel eliminates that boundary. This whitepaper documents a reproducible benchmark in which StreamKernel processed 20,480 synthetic payment events through a full AML and sanctions triage pipeline — embedding, risk scoring, decision labeling, case prioritization, and provenance stamping — inside a single JVM process, with zero dropped records and zero DLQ events, at an average throughput of 68.1 EPS (peak 99.8 EPS) on CPU-only hardware.

The headline inference figure is 0.049 ms average per-record ONNX embedding latency — measured directly from Prometheus, across all 20,480 records, at 100% batch fill. There is no model server. There is no network hop. The embedding, the AML logic, and the provenance label are produced inside the same JVM thread that received the payment event.

Every emitted record carries a complete decision package: AML risk score, risk band, decision label, case priority, typology codes, reason codes, sanctions screen result, policy version, model version, and a SHA-256 provenance hash. The output is route-ready for case management systems, SAR queues, sanctions operations desks, or analytical warehouses without further enrichment.

This run represents the CPU-only deployment floor — the minimum configuration available to any organization without GPU infrastructure. The same architecture supports DJL CUDA backends for GPU-accelerated inference where that compute is available.

02 The Compliance Architecture Problem

The standard pattern for real-time AML and sanctions screening treats the compliance decision as a service call. A payment event arrives at a streaming platform and triggers a consumer, which calls an external risk scoring API or sanctions screening service, waits for a response, and then publishes an enriched record downstream. This approach is operationally familiar but carries structural costs that compound at scale.

Latency Accumulation

Each round-trip to an external model server or screening API adds network latency — typically 5–50ms under normal conditions, but unbounded under load. For payment rails where sub-second decisioning is a regulatory expectation, this creates a structural ceiling on throughput that cannot be resolved by horizontal scaling alone.

Audit Trail Fragmentation

When the compliance decision is produced by a separate service, the provenance chain is fragmented. The transaction record exists in one system; the model version, policy version, and decision rationale exist in another. Reconstructing the full decision path for a SAR filing, regulatory examination, or internal audit requires joining across systems that were never designed to be joined at query time.

Failure Mode Coupling

A microservice architecture for compliance creates a hard dependency: if the screening service is unavailable, degraded, or rate-limited, the payment pipeline stalls or must bypass screening entirely — a situation that is both operationally painful and potentially reportable. Operators face a forced choice between availability and compliance continuity.

Deployment Surface in Regulated Environments

Air-gapped deployments, SCIF environments, and NIPRNet-connected infrastructure cannot rely on cloud-hosted model endpoints. Organizations operating in these environments have historically been forced to either accept reduced screening capability or maintain complex on-premises inference infrastructure with its own lifecycle management burden.

StreamKernel inverts this model. The ONNX model, the AML policy logic, the provenance labeler, and the Kafka sink are all loaded into the same JVM process. A payment event is parsed, embedded, scored, labeled, and emitted without leaving the process boundary.

03 Pipeline Architecture

Transform Chain

The AML/sanctions triage pipeline executes a three-stage transform chain inside a single JVM process:

SYNTHETIC → STRING_TO_WIREEVENT → DJL_EMBEDDING → AML_SANCTIONS_TRIAGE → Kafka

Stage 1 — STRING_TO_WIREEVENT: Raw payment text (amount, merchant, channel, country, device fingerprint, velocity signals, account age, sanctions screen flag) is parsed and normalized into a typed WireEvent object. The parser extracts the payload ID as the Kafka message key for downstream deduplication and routing.

Stage 2 — DJL_EMBEDDING: The WireEvent is passed to the in-process ONNX embedding engine. MiniLM-L6-v2 is loaded natively via DJL and ONNX Runtime 1.20.0, executing on the CPU provider with 2 intra-op threads and 1 inter-op thread per predictor. The embedding pool maintains 4 pre-warmed predictors sharing model weights. A batching engine aggregates records into batches of up to 16 before invoking the ONNX forward pass, amortizing fixed overhead across the batch.

Stage 3 — AML_SANCTIONS_TRIAGE: The embedded vector, combined with structured transaction fields, is evaluated against the AML policy. Configurable thresholds: WATCH at score ≥ 0.42, ESCALATE/HOLD at score ≥ 0.70. The transform emits a fully enriched record carrying the decision label, risk band, case priority, typology codes, reason codes, and a SHA-256 hash of the source text.
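The threshold logic in Stage 3 can be sketched as follows. This is a minimal illustration, not StreamKernel's implementation. The function name `triage_decision` is hypothetical, and two behaviors are assumptions because the text does not spell them out: that a sanctions screen hit maps to HOLD regardless of score, and that a score at or above the upper threshold without a sanctions hit maps to ESCALATE.

```python
WATCH_THRESHOLD = 0.42   # from the text: WATCH at score >= 0.42
CASE_THRESHOLD = 0.70    # from the text: ESCALATE/HOLD at score >= 0.70


def triage_decision(score: float, sanctions_hit: bool,
                    watch: float = WATCH_THRESHOLD,
                    case: float = CASE_THRESHOLD) -> str:
    """Map an AML risk score and a sanctions flag to a decision label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if sanctions_hit:
        return "HOLD"        # assumption: a sanctions hit always holds
    if score >= case:
        return "ESCALATE"    # assumption: high score, no sanctions hit
    if score >= watch:
        return "WATCH"
    return "CLEAR"


# Score and flag taken from the sample enriched record in this paper.
print(triage_decision(0.2904, False))  # prints CLEAR
```

Keeping the thresholds as parameters rather than constants mirrors the paper's point that they are configuration, not compiled logic.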

Runtime Configuration

Parameter	Value
Pipeline ID	sk-financial-aml-sanctions-triage
Parallelism	4 workers
Batch size	16 records per batch
Heap	4 GB (-Xms4g -Xmx4g), G1GC, MaxGCPauseMillis=50
ONNX Execution Provider	CPU (CPUExecutionProvider)
ONNX Runtime Version	1.20.0
DJL Version	0.32.0
Model	MiniLM-L6-v2 (minilm-l6-v2-onnx)
Embedding Pool Size	4 pre-warmed predictors
Tokenizer Max Length	16 tokens
Batching Engine	Enabled — max batch 16, flush every 10 ms
Warmup Batches	64 batches before first live record
Sink	Kafka · lz4 compression · 12 partitions · acks=1
StreamKernel Version	0.2.0
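The batching parameters in the table (max batch 16, flush every 10 ms) describe a size-or-time flush policy. The sketch below illustrates that policy under simplifying assumptions: it is single-threaded, the class `MicroBatcher` and its methods are hypothetical names, and timestamps are passed in explicitly so the behavior is deterministic. It is not the StreamKernel batching engine.

```python
from typing import List, Optional


class MicroBatcher:
    """Collects items and releases a batch when it is full or too old."""

    def __init__(self, max_batch: int = 16, flush_ms: float = 10.0):
        self.max_batch = max_batch
        self.flush_ms = flush_ms
        self._items: List[str] = []
        self._first_ms: Optional[float] = None  # arrival time of oldest item

    def add(self, item: str, now_ms: float) -> Optional[List[str]]:
        """Add one item; return a full batch if this item completed it."""
        if not self._items:
            self._first_ms = now_ms
        self._items.append(item)
        if len(self._items) >= self.max_batch:
            return self._drain()
        return None

    def poll(self, now_ms: float) -> Optional[List[str]]:
        """Return a partial batch once the oldest item has waited flush_ms."""
        if self._items and now_ms - self._first_ms >= self.flush_ms:
            return self._drain()
        return None

    def _drain(self) -> List[str]:
        batch, self._items, self._first_ms = self._items, [], None
        return batch


batcher = MicroBatcher()
results = [batcher.add(f"evt-{i}", now_ms=0.0) for i in range(16)]
print(len(results[-1]))  # prints 16: the sixteenth item completes the batch
```

The 100% batch fill rate reported for this run suggests that the size-based path, not the timer, released the batches.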

Output Payload Contract

Every record emitted carries the following fields. In addition, each Kafka record carries routing headers (aml.score, aml.decision, aml.risk_band, streamkernel.provenance.*) enabling downstream routing without deserialization.

Field	Type	Description
transaction_id	String	Payment event identifier
sanctions_screen_hit	Boolean	True if sanctions list match detected
aml_risk_score	Float [0,1]	Composite AML risk score from embedding + rules
risk_band	Enum	LOW / MEDIUM / HIGH
decision	Enum	CLEAR / WATCH / ESCALATE / HOLD
case_priority	Enum	P1 (highest) through P4 (lowest)
typologies	String[]	AML typology codes (e.g., VELOCITY_ANOMALY)
reason_codes	String[]	Signal codes (e.g., WIRE_TRANSFER_CHANNEL)
policy_version	String	Policy identifier for audit reproducibility
model_version	String	Model identifier for audit reproducibility
source_text_sha256	String	SHA-256 hash of source text for trace correlation
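The routing headers let a consumer choose a destination before deserializing the payload. The sketch below illustrates the idea, assuming the headers have already been collected into a dict of header name to raw bytes (Python Kafka clients typically expose headers as a list of key and bytes pairs, which `dict(...)` converts). The topic names and the function `route_by_headers` are hypothetical; only the `aml.decision` header name and the four decision values come from this paper.

```python
from typing import Dict

# Hypothetical destination topics; the paper does not name any.
ROUTES = {
    "HOLD": "sanctions-ops",
    "ESCALATE": "case-management",
    "WATCH": "analyst-review",
    "CLEAR": "warehouse",
}


def route_by_headers(headers: Dict[str, bytes]) -> str:
    """Choose a destination from Kafka headers without reading the payload."""
    raw = headers.get("aml.decision")
    if raw is None:
        return "unrouted"            # assumption: catch-all for missing header
    decision = raw.decode("utf-8")   # assumption: header values are UTF-8 text
    return ROUTES.get(decision, "unrouted")


print(route_by_headers({"aml.decision": b"HOLD"}))  # prints sanctions-ops
```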

04 Benchmark Results

Inference Timing — Prometheus (all 20,480 records)

The latency sampler was intentionally disabled (streamkernel.latency.enabled=false) to remove sampling overhead from the throughput baseline. All inference timing is captured via dedicated embedding metrics instrumentation.

Metric	Value	Source
Avg ONNX inference per record	0.049 ms	infer_ms_sum / 20,480 records
Max ONNX inference per record	0.129 ms	sk_ai_embedding_infer_ms_seconds_max
Avg ONNX batch time (batch=16)	0.675 ms	onnx_batch_ms_sum / 1,441 batches
Max ONNX batch time observed	2.016 ms	sk_ai_embedding_onnx_batch_ms_seconds_max
Batch fill rate throughout run	100%	sk_ai_embedding_batch_fill_percent
GC overhead (G1GC, 4GB heap)	0.063%	jvm_gc_overhead = 6.3E-4
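As a cross-check, the timing figures above can be related to each other with simple arithmetic. The sketch uses only numbers reported in this section; the variable names are illustrative. It treats the per-record figure as an amortized share of batch time, which is an interpretation, since the paper does not state how the two underlying counters differ in scope.

```python
# Figures reported in the benchmark results.
records = 20_480
avg_infer_ms_per_record = 0.049
avg_batch_ms = 0.675
batches = 1_441

# Total time spent in ONNX batch calls, then spread across all records.
total_batch_ms = avg_batch_ms * batches
amortized_ms_per_record = total_batch_ms / records

# Total inference time implied by the per-record average.
total_infer_ms = avg_infer_ms_per_record * records

print(f"{total_batch_ms:.1f} ms spent in ONNX batches")
print(f"{amortized_ms_per_record:.4f} ms per record, amortized from batches")
print(f"{total_infer_ms:.1f} ms total inference from the per-record average")
```

The two derivations agree to within a few percent (roughly 0.047 ms against 0.049 ms per record), and both put total inference time at about one second for the whole run.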

Run Summary

Metric	Value
Run ID	run-financial-aml-sanctions-01
Run Window	2026-06-07T23:08:21Z → 23:14:10Z (5.82 min actual)
Total Records Processed	20,480
Records Delivered (Kafka)	20,480
Records Dropped	0
DLQ Events	0
Empty Batches	0
Average PROC_EPS	68.1
Peak PROC_EPS	99.8
BENCH Windows (5s)	60
Kafka Partitions	12
Heap at Run End	303 MB / 4 GB
Clean Pipeline Stop	Yes — graceful drain + Kafka producer close
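The run-window duration and the average throughput in the table can be checked from the reported figures. In the sketch below the variable names are illustrative, and the end timestamp is assumed to fall on the same UTC date as the start. How PROC_EPS is averaged is not defined in this paper, so the comparison is an observation rather than a definition.

```python
from datetime import datetime, timezone

records = 20_480
start = datetime(2026, 6, 7, 23, 8, 21, tzinfo=timezone.utc)
end = datetime(2026, 6, 7, 23, 14, 10, tzinfo=timezone.utc)

wall_seconds = (end - start).total_seconds()  # full run window
bench_seconds = 60 * 5                        # 60 BENCH windows of 5 s each

eps_over_wall = records / wall_seconds
eps_over_bench = records / bench_seconds

print(f"run window: {wall_seconds / 60:.2f} min")
print(f"records / run window:    {eps_over_wall:.1f} EPS")
print(f"records / BENCH windows: {eps_over_bench:.1f} EPS")
```

The run window works out to 5.82 minutes, matching the table. Dividing the record count by the 60 BENCH windows gives about 68.3 EPS, close to the reported 68.1 average. Dividing by the full run window gives about 58.7 EPS, presumably because that window also covers startup, warmup, and shutdown.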

Sample Enriched Record

The first parsed sample from post-run proof validation is a CLEAR decision on a wire transfer. It shows the complete decision package, including typologies and reason codes, even for a cleared transaction:

{
  "transaction_id": "TXN-102320",
  "amount_usd": 250.4,
  "channel": "wire",
  "country": "GB",
  "sanctions_screen_hit": false,
  "aml_risk_score": 0.2904,
  "risk_band": "LOW",
  "decision": "CLEAR",
  "case_priority": "P4",
  "reason_codes": ["WIRE_TRANSFER_CHANNEL", "ELEVATED_ACTIVITY_VELOCITY"],
  "typologies": ["FUNDS_TRANSFER_ACTIVITY", "VELOCITY_ANOMALY"],
  "policy_version": "aml-sanctions-policy-v1",
  "model_version": "deterministic-aml-sanctions-v1",
  "source_text_included": false,
  "source_text_sha256": "sha256:572211679fc98e36c064473e58df68b0d9a36e3ca109778ee31d46366cf1dffa"
}

Even a CLEAR, P4-priority transaction carries the full evidence package. The wire transfer channel and elevated velocity are surfaced as reason codes and mapped to AML typologies, giving the compliance reviewer an auditable rationale for the CLEAR decision — not merely the outcome.
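A downstream consumer may want to check incoming records against the output payload contract before acting on them. The sketch below shows one way to do that, using the field names, enum values, and hash format shown in this paper. The function name `contract_violations` is hypothetical, and the check that P2 and P3 are valid priorities assumes the range "P1 through P4" is contiguous.

```python
import json

DECISIONS = {"CLEAR", "WATCH", "ESCALATE", "HOLD"}
RISK_BANDS = {"LOW", "MEDIUM", "HIGH"}
PRIORITIES = {"P1", "P2", "P3", "P4"}
HEX = set("0123456789abcdef")


def contract_violations(record: dict) -> list:
    """Return human-readable problems; an empty list means the record passes."""
    problems = []
    for field in ("transaction_id", "policy_version", "model_version"):
        if not isinstance(record.get(field), str):
            problems.append(f"{field} must be a string")
    if not isinstance(record.get("sanctions_screen_hit"), bool):
        problems.append("sanctions_screen_hit must be a boolean")
    score = record.get("aml_risk_score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0 <= score <= 1:
        problems.append("aml_risk_score must be a number in [0, 1]")
    for field, allowed in (("decision", DECISIONS),
                           ("risk_band", RISK_BANDS),
                           ("case_priority", PRIORITIES)):
        value = record.get(field)
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    for field in ("typologies", "reason_codes"):
        value = record.get(field)
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            problems.append(f"{field} must be a list of strings")
    digest = record.get("source_text_sha256")
    ok = (isinstance(digest, str) and digest.startswith("sha256:")
          and len(digest) == 71 and set(digest[7:]) <= HEX)
    if not ok:
        problems.append("source_text_sha256 must be 'sha256:' plus 64 hex chars")
    return problems


# The sample enriched record from this section.
SAMPLE = json.loads("""{
  "transaction_id": "TXN-102320", "amount_usd": 250.4, "channel": "wire",
  "country": "GB", "sanctions_screen_hit": false, "aml_risk_score": 0.2904,
  "risk_band": "LOW", "decision": "CLEAR", "case_priority": "P4",
  "reason_codes": ["WIRE_TRANSFER_CHANNEL", "ELEVATED_ACTIVITY_VELOCITY"],
  "typologies": ["FUNDS_TRANSFER_ACTIVITY", "VELOCITY_ANOMALY"],
  "policy_version": "aml-sanctions-policy-v1",
  "model_version": "deterministic-aml-sanctions-v1",
  "source_text_included": false,
  "source_text_sha256": "sha256:572211679fc98e36c064473e58df68b0d9a36e3ca109778ee31d46366cf1dffa"
}""")

print(contract_violations(SAMPLE))  # prints []
```

Returning a list of problems rather than raising on the first one lets a consumer log every contract violation for a record in a single pass.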

05 Compliance Relevance

Audience Mapping

Audience	Field(s) Used	Workflow Value
AML Analysts	decision, risk_band, typologies	Immediate triage — WATCH/ESCALATE cases route to analyst queues without manual scoring
Sanctions Operations	sanctions_screen_hit, decision	Automatic HOLD routing for sanctions hits; CLEAR records bypass manual review
Case Management	case_priority, reason_codes	P1/P2 cases escalate immediately; P3/P4 enter standard review queues
Compliance Reviewers	policy_version, model_version	Reproducible decisions — each record carries the exact policy and model that produced it
Internal Audit / Legal	source_text_sha256	Unbroken audit chain from transaction text hash through decision to emitted record
SAR Filing Teams	typologies, reason_codes	Pre-populated typology and narrative evidence for Suspicious Activity Reports
Platform Engineers	streamkernel.provenance.*	Kafka header routing to SAR queues or warehouse sinks without deserialization

Provenance and Auditability

Every record carries a SHA-256 hash of the source transaction text. In this benchmark profile, the raw source text is excluded from the payload (source_text_included: false), which suits environments where storing PII in the evidence topic is undesirable. The hash provides a cryptographic tie between the decision record and the original transaction without transmitting the sensitive payload downstream.

The policy_version and model_version fields make every decision reproducible: given the same transaction text, policy version, and model version, the pipeline will produce the same decision. This supports regulatory examination and litigation hold scenarios where the institution must demonstrate that a given transaction was evaluated correctly under the policy in effect at the time.
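An auditor who holds the original transaction text can confirm that it corresponds to an emitted record by recomputing the hash. The sketch below shows that check with Python's standard `hashlib`. The function names are hypothetical, and hashing the text as UTF-8 bytes with no normalization is an assumption; the paper does not specify the exact byte encoding StreamKernel hashes.

```python
import hashlib


def source_text_digest(source_text: str) -> str:
    """Return a provenance value in the 'sha256:<hex>' form used by the sample."""
    # Assumption: the text is hashed as UTF-8 bytes with no normalization.
    hex_digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    return f"sha256:{hex_digest}"


def matches_record(source_text: str, record: dict) -> bool:
    """Check that a retained source text corresponds to an emitted record."""
    return source_text_digest(source_text) == record.get("source_text_sha256")


record = {"source_text_sha256": source_text_digest("abc")}
print(matches_record("abc", record))  # prints True
print(matches_record("abd", record))  # prints False
```

Because only the digest travels downstream, the check requires access to the original text in the system of record; the emitted record alone cannot reveal it.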

Threshold Configurability

AML thresholds are pipeline properties, not compiled logic:

transform.aml_sanctions.watch.threshold=0.42
transform.aml_sanctions.case.threshold=0.70

These thresholds can be adjusted per deployment environment without recompiling. The policy_version field provides a versioned record of which threshold configuration produced each decision, which supports model risk management requirements.
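Because the thresholds are plain properties, a deployment check can read and validate them before a pipeline starts. The sketch below parses the two property keys shown above from key=value text. The function name `load_thresholds` and the ordering rule (watch below case, both within [0, 1]) are assumptions for illustration, not documented StreamKernel behavior.

```python
def load_thresholds(properties_text: str) -> dict:
    """Parse key=value lines and return validated AML thresholds."""
    props = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()

    watch = float(props["transform.aml_sanctions.watch.threshold"])
    case = float(props["transform.aml_sanctions.case.threshold"])
    # Assumption: a valid policy keeps 0 <= watch < case <= 1.
    if not 0.0 <= watch < case <= 1.0:
        raise ValueError(f"invalid thresholds: watch={watch}, case={case}")
    return {"watch": watch, "case": case}


config = """
transform.aml_sanctions.watch.threshold=0.42
transform.aml_sanctions.case.threshold=0.70
"""
print(load_thresholds(config))  # prints {'watch': 0.42, 'case': 0.7}
```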

06 Operational Profile

JVM and GC Behavior

The benchmark ran on a 4 GB heap with G1GC configured for a 50ms maximum GC pause target and 16 MB region size. GC overhead at run end was 0.063% — effectively negligible. The heap at run end was 303 MB used of 4 GB allocated. The G1GC sawtooth pattern visible in the BENCH window MEM column (cycling ~128 MB → 298 MB → ~128 MB) is expected behavior for a throughput-optimized configuration and does not affect throughput or record integrity.

Reliability Guarantees

Across 20,480 records and 60 five-second measurement windows:

Zero records dropped — streamkernel_pipeline_dropped_total = 0
Zero DLQ events — streamkernel_pipeline_dlq_total = 0
Zero empty batches — streamkernel_pipeline_empty_batch_total = 0
100% fidelity — processed_total = out_total = kafka_sent_ok_total = 20,480
Clean graceful shutdown — Kafka producer closed with 30-second timeout, all in-flight records drained before process exit
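The counter relationships listed above can be verified mechanically from a Prometheus snapshot. The sketch below handles only simple unlabeled `name value` lines of the text exposition format, not labels or timestamps. The three `streamkernel_pipeline_*` names are taken from the list above; `processed_total`, `out_total`, and `kafka_sent_ok_total` are used in the short form given there, since the full exported names are not stated. The function names are hypothetical.

```python
ZERO_COUNTERS = (
    "streamkernel_pipeline_dropped_total",
    "streamkernel_pipeline_dlq_total",
    "streamkernel_pipeline_empty_batch_total",
)
# Short names as written in the text; full exported names are not given.
EQUAL_COUNTERS = ("processed_total", "out_total", "kafka_sent_ok_total")


def parse_metrics(snapshot: str) -> dict:
    """Parse unlabeled 'name value' lines from a Prometheus text snapshot."""
    metrics = {}
    for line in snapshot.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


def fidelity_problems(metrics: dict, expected: int) -> list:
    """List every counter that breaks the zero-loss expectations."""
    problems = [f"{name} is {metrics.get(name)}, expected 0"
                for name in ZERO_COUNTERS if metrics.get(name) != 0]
    problems += [f"{name} is {metrics.get(name)}, expected {expected}"
                 for name in EQUAL_COUNTERS if metrics.get(name) != expected]
    return problems


snapshot = """
# TYPE streamkernel_pipeline_dropped_total counter
streamkernel_pipeline_dropped_total 0
streamkernel_pipeline_dlq_total 0
streamkernel_pipeline_empty_batch_total 0
processed_total 20480
out_total 20480
kafka_sent_ok_total 20480
"""
print(fidelity_problems(parse_metrics(snapshot), expected=20_480))  # prints []
```

A missing counter is reported as a problem rather than treated as zero, so an incomplete snapshot cannot pass the check by omission.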

Scaling Path

This benchmark is the CPU-only deployment floor: a development workstation with 12 host cores, a 4-worker configuration, and no GPU. A production deployment on dedicated server hardware with 32+ cores and a larger predictor pool would be expected to yield materially higher throughput with no architectural changes, although that configuration was not measured in this run.

The same pipeline configuration supports DJL's CUDA backend via the GPU JAR flavor — a single configuration change. GPU inference eliminates the primary throughput constraint in this benchmark and is the recommended configuration for high-volume payment rails.

07 Deployment Contexts

Payment Rail Compliance

Primary target: a financial institution operating a real-time payment rail (ACH, wire transfer, card authorization, or instant payment) where every transaction must be screened before downstream processing. StreamKernel is positioned as the compliance enrichment layer between the raw event stream and case management, clearing, or reporting infrastructure. The sub-millisecond inference profile is compatible with synchronous payment decisioning requirements.

Air-Gapped and Regulated Infrastructure

The CPU-only JAR flavor and the absence of any external model endpoint make this architecture directly deployable in air-gapped environments. The ONNX model file, tokenizer, and policy configuration are loaded from local paths at startup. The runtime makes no calls to cloud APIs, has no license server dependencies, and has no runtime network requirements beyond the Kafka broker. It has been tested for SCIF and NIPRNet compatibility.

On-Premises Enterprise (DGX Station)

For enterprises deploying NVIDIA DGX Station or similar hardware, StreamKernel provides the governance and policy enforcement layer between the compute and downstream data consumers. The same pipeline that runs on a CPU runs on a DGX Station GPU with a single configuration change. Policy logic, provenance labeling, and Kafka delivery contract are identical across deployment targets.

Multi-Jurisdiction Compliance

The policy_version field and configurable threshold properties support multi-jurisdiction deployments where different regulatory regimes require different screening thresholds or typology mappings. A single StreamKernel deployment can run parallel pipeline instances with jurisdiction-specific configurations, each emitting to a jurisdiction-specific evidence topic.

08 Architecture Comparison

External Microservice Pattern	StreamKernel In-Process
Separate model server: 1+ network hop per record	Same JVM process: zero network hops
5–50 ms inference latency (network dependent, unbounded under load)	0.049 ms avg inference (Prometheus, 20,480 records)
Audit trail fragmented across systems	Single enriched record carries full decision provenance
Screening bypass if API unavailable	No external dependency to fail against
Cloud endpoint required, not air-gap compatible	Local ONNX file, air-gap and SCIF compatible
Separate GPU inference server for acceleration	Change JAR flavor + execution provider for GPU
Policy version not embedded in event record	policy_version + model_version on every emitted record
Two deployment units: pipeline + model server	Single JAR deployment unit

09 Evidence Chain & Reproducibility

This whitepaper is based on a fully reproducible benchmark run. All artifacts below were captured during or immediately after the run and are available for customer validation.

Artifact	Type	Contents
Benchmark Log	LOG	Full BENCH window output, 328 lines. Config dump, plugin catalog, embedding init, warmup completion (64 batches), and graceful shutdown sequence including Kafka producer close.
Sidecar JSON	JSON	Run metadata: start/end UTC, JVM args, effective settings, profile configuration, Grafana time bounds for replay.
Prometheus Snapshot	PROM	Point-in-time metrics at run end: all inference counters, throughput totals, GC overhead (0.063%), heap utilization (303 MB), Kafka sink send times.
Evidence Index	JSON	Structured evidence index: Kafka topic describe, 12-partition counts, decision distribution, first parsed sample record with full payload.
GC Log	LOG	Full G1GC log with timing and collection detail for JVM health verification. Confirms 0.063% GC overhead across the run.
Evidence Run Script	PS1	The PowerShell wrapper that orchestrated topic creation, run execution, and post-run proof validation. Paired with the benchmark matrix CSV, it makes this run fully repeatable.

The benchmark matrix row (benchmark-runs/tests_financial_services.csv) makes this run repeatable. Any organization with access to the StreamKernel binary and a local Kafka broker can reproduce the run by executing the evidence script against the same matrix row.

10 About StreamKernel

StreamKernel is a JVM-native event pipeline runtime built for enterprises that require in-process AI inference, per-event provenance, and policy enforcement at stream speed. The core architecture couples a transport-agnostic pipeline kernel with a native ONNX/DJL inference engine, an MLflow live model swap and rollback mechanism (sub-30-second swap without pipeline restart), an MCP agent control plane, and OPA/mTLS/OIDC/Keycloak/Schema Registry security integration.

Benchmarked Capabilities

Kafka ALO baseline: 621K avg / 820K peak EPS, zero dropped records across 372M+ events
MongoDB vector pipeline (optimized): 336.8 records/sec, 28.3× improvement from baseline, zero record loss
Pure MongoDB write baseline: 163K docs/sec avg, 256K peak, 95.5M docs, 6.45× WiredTiger compression
mTLS + OPA: 366K ops/sec, 217M records
AML/sanctions triage: 68.1 EPS avg / 99.8 EPS peak, 0.049ms per-record ONNX inference, 20,480 records, zero drops

Company

StreamKernel LLC is a Florida-registered company (formed May 12, 2026). The core runtime architecture is protected under US Provisional Patent Application No. 64/057,035 (filed May 4, 2026). Registered in SAM.gov · AWS Activate approved · NVIDIA Inception accepted · SBIR/STTR eligible. Targeting AFWERX, DARPA, Army DEVCOM, and NIWC for DoD SBIR engagements.

Led by Steven Lopez — Confluent Solutions Architecture · Red Hat · MITRE · FedEx.