Financial Services · AML / Sanctions

Real-Time AML & Sanctions Triage at Sub-Millisecond Inference

In-process ONNX embedding inside the JVM. No model server. No network hop. Every payment event screened, scored, and provenance-stamped before it leaves the pipeline.

Benchmark-Backed · CPU-Only Baseline · 20,480 Records · Zero Drops · 0.049 ms Avg Inference

  • 20,480 Records Processed
  • 0.049 ms Avg ONNX Inference
  • 0 Records Dropped
  • 68.1 Avg EPS (CPU-only)
  • 99.8 Peak EPS
  • 100% Batch Fill Rate
Run ID: run-financial-aml-sanctions-01  ·  Date: June 7, 2026  ·  Patent: US Prov. App. No. 64/057,035
01

Executive Summary

Financial institutions face a compounding compliance burden: every payment must be screened for sanctions exposure and scored for AML risk before it reaches downstream clearing, case management, or reporting infrastructure. Most architectures introduce this screening as a separate microservice or external API call — creating a network hop that adds latency, exposes a failure boundary, and severs the audit trail between the transaction record and the compliance decision.

StreamKernel eliminates that boundary. This whitepaper documents a reproducible benchmark in which StreamKernel processed 20,480 synthetic payment events through a full AML and sanctions triage pipeline — embedding, risk scoring, decision labeling, case prioritization, and provenance stamping — inside a single JVM process, with zero dropped records and zero DLQ events, at an average throughput of 68.1 EPS (peak 99.8 EPS) on CPU-only hardware.

The headline inference figure is 0.049 ms average per-record ONNX embedding latency — measured directly from Prometheus, across all 20,480 records, at 100% batch fill. There is no model server. There is no network hop. The embedding, the AML logic, and the provenance label are produced inside the same JVM thread that received the payment event.

Every emitted record carries a complete decision package: AML risk score, risk band, decision label, case priority, typology codes, reason codes, sanctions screen result, policy version, model version, and a SHA-256 provenance hash. The output is route-ready for case management systems, SAR queues, sanctions operations desks, or analytical warehouses without further enrichment.

This run represents the CPU-only deployment floor — the minimum configuration available to any organization without GPU infrastructure. The same architecture supports DJL CUDA backends for GPU-accelerated inference where that compute is available.

02

The Compliance Architecture Problem

The standard pattern for real-time AML and sanctions screening treats the compliance decision as a service call. A payment event arrives at a streaming platform, triggers a consumer, which calls an external risk scoring API or sanctions screening service, waits for a response, and then publishes an enriched record downstream. This approach is operationally familiar but carries structural costs that compound at scale.

Latency Accumulation

Each round-trip to an external model server or screening API adds network latency — typically 5–50ms under normal conditions, but unbounded under load. For payment rails where sub-second decisioning is a regulatory expectation, this creates a structural ceiling on throughput that cannot be resolved by horizontal scaling alone.
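The ceiling can be made concrete with simple arithmetic: a consumer thread that blocks on one synchronous screening call per record can process at most 1000 / latency_ms records per second. The sketch below works through the 5–50 ms range quoted above. The class and method names are illustrative, and it assumes one blocking call per record with no batching or request pipelining.

```java
// Illustrative arithmetic only: assumes one blocking remote call per record
// per consumer thread, with no batching or request pipelining.
final class BlockingCallCeiling {

    /** Upper bound on records/second for one thread that waits latencyMs per record. */
    static double maxRecordsPerSecond(double latencyMs) {
        if (!(latencyMs > 0)) {
            throw new IllegalArgumentException("latencyMs must be positive");
        }
        return 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        for (double latencyMs : new double[] {5.0, 50.0}) {
            System.out.printf("%.0f ms round trip -> at most %.0f records/s per thread%n",
                    latencyMs, maxRecordsPerSecond(latencyMs));
        }
    }
}
```

Under these assumptions, a 5 ms round trip caps a single thread at 200 records per second and a 50 ms round trip caps it at 20.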

Audit Trail Fragmentation

When the compliance decision is produced by a separate service, the provenance chain is fragmented. The transaction record exists in one system; the model version, policy version, and decision rationale exist in another. Reconstructing the full decision path for a SAR filing, regulatory examination, or internal audit requires joining across systems that were never designed to be joined at query time.

Failure Mode Coupling

A microservice architecture for compliance creates a hard dependency: if the screening service is unavailable, degraded, or rate-limited, the payment pipeline stalls or must bypass screening entirely — a situation that is both operationally painful and potentially reportable. Operators face a forced choice between availability and compliance continuity.

Deployment Surface in Regulated Environments

Air-gapped deployments, SCIF environments, and NIPRNet-connected infrastructure cannot rely on cloud-hosted model endpoints. Organizations operating in these environments have historically been forced to either accept reduced screening capability or maintain complex on-premises inference infrastructure with its own lifecycle management burden.

StreamKernel inverts this model. The ONNX model, the AML policy logic, the provenance labeler, and the Kafka sink are all loaded into the same JVM process. A payment event is parsed, embedded, scored, labeled, and emitted without leaving the process boundary.
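As a rough illustration of what "without leaving the process boundary" means in code, the sketch below composes three stages as ordinary Java functions. Every name here is hypothetical and every stage body is a placeholder; it is not StreamKernel's API, only a picture of a chain in which each hand-off is a method call rather than a network request.

```java
import java.util.function.Function;

// Hypothetical stand-ins for the three transforms; none of this is StreamKernel's API.
final class InProcessChainSketch {

    /** Immutable event that accumulates fields as it moves through the chain. */
    static final class Event {
        final String payload;
        final float[] embedding; // null until the embedding stage has run
        final String decision;   // null until the triage stage has run

        Event(String payload, float[] embedding, String decision) {
            this.payload = payload;
            this.embedding = embedding;
            this.decision = decision;
        }
    }

    // Stage 1: parse raw text into a typed event (placeholder: trims whitespace).
    static final Function<String, Event> PARSE =
            raw -> new Event(raw.trim(), null, null);

    // Stage 2: attach an embedding (placeholder vector, NOT a real model call).
    static final Function<Event, Event> EMBED =
            e -> new Event(e.payload, new float[] {e.payload.length()}, null);

    // Stage 3: attach a decision (placeholder: always CLEAR).
    static final Function<Event, Event> TRIAGE =
            e -> new Event(e.payload, e.embedding, "CLEAR");

    // The whole chain is ordinary function composition inside one process.
    static final Function<String, Event> CHAIN = PARSE.andThen(EMBED).andThen(TRIAGE);

    public static void main(String[] args) {
        Event out = CHAIN.apply("  wire 250.40 GB  ");
        System.out.println(out.payload + " -> " + out.decision);
    }
}
```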

03

Pipeline Architecture

Transform Chain

The AML/sanctions triage pipeline executes a three-stage transform chain inside a single JVM process:

SYNTHETIC
STRING_TO_WIREEVENT
DJL_EMBEDDING
AML_SANCTIONS_TRIAGE
Kafka

Stage 1 — STRING_TO_WIREEVENT: Raw payment text (amount, merchant, channel, country, device fingerprint, velocity signals, account age, sanctions screen flag) is parsed and normalized into a typed WireEvent object. The parser extracts the payload ID as the Kafka message key for downstream deduplication and routing.

Stage 2 — DJL_EMBEDDING: The WireEvent is passed to the in-process ONNX embedding engine. MiniLM-L6-v2 is loaded natively via DJL and ONNX Runtime 1.20.0, executing on the CPU provider with 2 intra-op threads and 1 inter-op thread per predictor. The embedding pool maintains 4 pre-warmed predictors sharing model weights. A batching engine aggregates records into batches of up to 16 before invoking the ONNX forward pass, amortizing fixed overhead across the batch.

Stage 3 — AML_SANCTIONS_TRIAGE: The embedded vector, combined with structured transaction fields, is evaluated against the AML policy. Configurable thresholds: WATCH at score ≥ 0.42, ESCALATE/HOLD at score ≥ 0.70. The transform emits a fully enriched record carrying the decision label, risk band, case priority, typology codes, reason codes, and a SHA-256 hash of the source text.
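The threshold step in Stage 3 can be sketched as follows. The two threshold values come from the benchmark configuration. The precedence rules are assumptions made for illustration: a sanctions hit always yields HOLD, and ESCALATE is used for scores at or above the upper threshold. The shipped policy may combine these signals differently.

```java
// Sketch of threshold-based labeling; the precedence rules are assumptions.
final class TriageDecision {
    static final double WATCH_THRESHOLD = 0.42; // from the benchmark configuration
    static final double CASE_THRESHOLD = 0.70;  // from the benchmark configuration

    /**
     * Assumed mapping: a sanctions hit always yields HOLD; otherwise the score
     * is compared against the case threshold first, then the watch threshold.
     */
    static String decide(double amlRiskScore, boolean sanctionsScreenHit) {
        if (Double.isNaN(amlRiskScore) || amlRiskScore < 0.0 || amlRiskScore > 1.0) {
            throw new IllegalArgumentException("score must be in [0, 1]");
        }
        if (sanctionsScreenHit) {
            return "HOLD";
        }
        if (amlRiskScore >= CASE_THRESHOLD) {
            return "ESCALATE";
        }
        if (amlRiskScore >= WATCH_THRESHOLD) {
            return "WATCH";
        }
        return "CLEAR";
    }

    public static void main(String[] args) {
        // A score of 0.2904 with no sanctions hit falls below both thresholds.
        System.out.println(decide(0.2904, false));
    }
}
```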

Runtime Configuration

Parameter | Value
Pipeline ID | sk-financial-aml-sanctions-triage
Parallelism | 4 workers
Batch size | 16 records per batch
Heap | 4 GB (-Xms4g -Xmx4g), G1GC, MaxGCPauseMillis=50
ONNX Execution Provider | CPU (CPUExecutionProvider)
ONNX Runtime Version | 1.20.0
DJL Version | 0.32.0
Model | MiniLM-L6-v2 (minilm-l6-v2-onnx)
Embedding Pool Size | 4 pre-warmed predictors
Tokenizer Max Length | 16 tokens
Batching Engine | Enabled (max batch 16, flush every 10 ms)
Warmup Batches | 64 batches before first live record
Sink | Kafka · lz4 compression · 12 partitions · acks=1
StreamKernel Version | 0.2.0
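The batching engine settings (max batch 16, flush every 10 ms) describe a size-or-time micro-batching policy. The sketch below is one minimal way to express that policy and is not StreamKernel's batching engine. It assumes the flush timer is measured from the arrival of the oldest buffered record. The class is hypothetical and not thread-safe, and the caller supplies the clock so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of size-or-time micro-batching; not StreamKernel's batching engine.
// Not thread-safe: a real engine would need synchronization or a single owner thread.
final class MicroBatcher<T> {
    private final int maxBatch;
    private final long flushIntervalMs;
    private final List<T> buffer = new ArrayList<>();
    private long oldestMs; // arrival time of the first record in the open batch

    MicroBatcher(int maxBatch, long flushIntervalMs) {
        this.maxBatch = maxBatch;
        this.flushIntervalMs = flushIntervalMs;
    }

    /** Adds a record; returns a full batch once maxBatch is reached, else an empty list. */
    List<T> add(T record, long nowMs) {
        if (buffer.isEmpty()) {
            oldestMs = nowMs;
        }
        buffer.add(record);
        return buffer.size() >= maxBatch ? drain() : new ArrayList<>();
    }

    /** Returns a partial batch once the oldest buffered record has waited flushIntervalMs. */
    List<T> poll(long nowMs) {
        boolean due = !buffer.isEmpty() && nowMs - oldestMs >= flushIntervalMs;
        return due ? drain() : new ArrayList<>();
    }

    private List<T> drain() {
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }
}
```

With `new MicroBatcher<>(16, 10)`, the sixteenth `add` returns a full batch, while a lone record is released by `poll` only after it has waited 10 ms. A 100% batch fill rate corresponds to every forward pass being triggered by the size condition rather than the timer.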

Output Payload Contract

Every record emitted carries the following fields. In addition, each Kafka record carries routing headers (aml.score, aml.decision, aml.risk_band, streamkernel.provenance.*) enabling downstream routing without deserialization.
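A downstream consumer could use the routing headers roughly as sketched below. The headers are modeled as a plain `Map<String, byte[]>` so the example stays dependency-free. The topic names are hypothetical, and the sketch assumes header values are UTF-8 encoded strings, which the document does not state.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch of header-only routing; topic names are hypothetical.
final class HeaderRouter {

    /** Picks a destination from the aml.decision header without touching the payload. */
    static String route(Map<String, byte[]> headers) {
        byte[] raw = headers.get("aml.decision");
        if (raw == null) {
            return "aml.unrouted";
        }
        // Assumption: header values are UTF-8 encoded strings.
        String decision = new String(raw, StandardCharsets.UTF_8);
        switch (decision) {
            case "HOLD":
                return "sanctions.hold";
            case "ESCALATE":
                return "aml.cases";
            case "WATCH":
                return "aml.watchlist";
            case "CLEAR":
                return "aml.cleared";
            default:
                return "aml.unrouted";
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = new HashMap<>();
        headers.put("aml.decision", "HOLD".getBytes(StandardCharsets.UTF_8));
        System.out.println(route(headers));
    }
}
```

With the Kafka Java client, the same bytes would come from `record.headers().lastHeader("aml.decision").value()` on a `ConsumerRecord`, so the record value never needs to be deserialized to make the routing decision.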

Field | Type | Description
transaction_id | String | Payment event identifier
sanctions_screen_hit | Boolean | True if sanctions list match detected
aml_risk_score | Float [0,1] | Composite AML risk score from embedding + rules
risk_band | Enum | LOW / MEDIUM / HIGH
decision | Enum | CLEAR / WATCH / ESCALATE / HOLD
case_priority | Enum | P1 (highest) through P4 (lowest)
typologies | String[] | AML typology codes (e.g., VELOCITY_ANOMALY)
reason_codes | String[] | Signal codes (e.g., WIRE_TRANSFER_CHANNEL)
policy_version | String | Policy identifier for audit reproducibility
model_version | String | Model identifier for audit reproducibility
source_text_sha256 | String | SHA-256 hash of source text for trace correlation

04

Benchmark Results

Inference Timing — Prometheus (all 20,480 records)

The latency sampler was intentionally disabled (streamkernel.latency.enabled=false) to remove sampling overhead from the throughput baseline. All inference timing is captured via dedicated embedding metrics instrumentation.

Metric | Value | Source
Avg ONNX inference per record | 0.049 ms | infer_ms_sum / 20,480 records
Max ONNX inference per record | 0.129 ms | sk_ai_embedding_infer_ms_seconds_max
Avg ONNX batch time (batch=16) | 0.675 ms | onnx_batch_ms_sum / 1,441 batches
Max ONNX batch time observed | 2.016 ms | sk_ai_embedding_onnx_batch_ms_seconds_max
Batch fill rate throughout run | 100% | sk_ai_embedding_batch_fill_percent
GC overhead (G1GC, 4 GB heap) | 0.063% | jvm_gc_overhead = 6.3E-4

Run Summary

Metric | Value
Run ID | run-financial-aml-sanctions-01
Run Window | 2026-06-07T23:08:21Z → 23:14:10Z (5.82 min actual)
Total Records Processed | 20,480
Records Delivered (Kafka) | 20,480
Records Dropped | 0
DLQ Events | 0
Empty Batches | 0
Average PROC_EPS | 68.1
Peak PROC_EPS | 99.8
BENCH Windows (5s) | 60
Kafka Partitions | 12
Heap at Run End | 303 MB / 4 GB
Clean Pipeline Stop | Yes (graceful drain + Kafka producer close)

Sample Enriched Record

The first parsed sample from post-run proof validation is a CLEAR decision on a wire transfer. It shows the complete decision package, including typologies and reason codes, even for a cleared transaction:

{
  "transaction_id": "TXN-102320",
  "amount_usd": 250.4,
  "channel": "wire",
  "country": "GB",
  "sanctions_screen_hit": false,
  "aml_risk_score": 0.2904,
  "risk_band": "LOW",
  "decision": "CLEAR",
  "case_priority": "P4",
  "reason_codes": ["WIRE_TRANSFER_CHANNEL", "ELEVATED_ACTIVITY_VELOCITY"],
  "typologies": ["FUNDS_TRANSFER_ACTIVITY", "VELOCITY_ANOMALY"],
  "policy_version": "aml-sanctions-policy-v1",
  "model_version": "deterministic-aml-sanctions-v1",
  "source_text_included": false,
  "source_text_sha256": "sha256:572211679fc98e36c064473e58df68b0d9a36e3ca109778ee31d46366cf1dffa"
}

Even a CLEAR, P4-priority transaction carries the full evidence package. The wire transfer channel and elevated velocity are surfaced as reason codes and mapped to AML typologies, giving the compliance reviewer an auditable rationale for the CLEAR decision — not merely the outcome.

05

Compliance Relevance

Audience Mapping

Audience | Field(s) Used | Workflow Value
AML Analysts | decision, risk_band, typologies | Immediate triage: WATCH/ESCALATE cases route to analyst queues without manual scoring
Sanctions Operations | sanctions_screen_hit, decision | Automatic HOLD routing for sanctions hits; CLEAR records bypass manual review
Case Management | case_priority, reason_codes | P1/P2 cases escalate immediately; P3/P4 enter standard review queues
Compliance Reviewers | policy_version, model_version | Reproducible decisions: each record carries the exact policy and model that produced it
Internal Audit / Legal | source_text_sha256 | Unbroken audit chain from transaction text hash through decision to emitted record
SAR Filing Teams | typologies, reason_codes | Pre-populated typology and narrative evidence for Suspicious Activity Reports
Platform Engineers | streamkernel.provenance.* | Kafka header routing to SAR queues or warehouse sinks without deserialization

Provenance and Auditability

Every record carries a SHA-256 hash of the source transaction text. In this benchmark profile, the raw source text is excluded from the payload (source_text_included: false), which suits environments where storing PII in the evidence topic is undesirable. The hash provides a cryptographic tie between the decision record and the original transaction without transmitting the sensitive payload downstream.
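An auditor holding the original transaction text could recompute the hash and compare it with the source_text_sha256 field. The sketch below uses the JDK's `MessageDigest` and mirrors the `sha256:` prefix seen in the sample record. It assumes the source text is hashed as UTF-8 bytes with no normalization, which the document does not specify. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of recomputing a provenance hash; the encoding choice is an assumption.
final class SourceTextHash {

    /** Returns "sha256:" followed by the lowercase hex digest of the UTF-8 bytes. */
    static String sha256Label(String sourceText) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(sourceText.getBytes(StandardCharsets.UTF_8));
            StringBuilder label = new StringBuilder("sha256:");
            for (byte b : hash) {
                label.append(String.format("%02x", b & 0xff));
            }
            return label.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** True when the recomputed hash matches the value carried on the record. */
    static boolean matches(String sourceText, String recordedLabel) {
        return sha256Label(sourceText).equals(recordedLabel);
    }
}
```

A match shows that the decision record refers to exactly that text. A mismatch means the text differs or was encoded differently from what the pipeline hashed.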

The combination of policy_version and model_version fields means every decision is reproducible: given the same transaction text, policy version, and model version, the pipeline will produce the same decision. This supports regulatory examination and litigation hold scenarios where the institution must demonstrate that a given transaction was evaluated correctly under the policy in effect at the time.

Threshold Configurability

AML thresholds are pipeline properties, not compiled logic:

transform.aml_sanctions.watch.threshold=0.42
transform.aml_sanctions.case.threshold=0.70

These values can be adjusted per deployment environment without recompiling. The policy_version field provides a versioned record of which threshold configuration produced each decision, which supports model risk management requirements.
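Reading the two thresholds from a standard Java properties source could look like the sketch below. The property keys are the ones shown above. The class, the ordering check (watch below case, both within [0, 1]), and the decision to fail fast on a missing key are assumptions added for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch of loading thresholds from properties; validation rules are assumptions.
final class ThresholdConfig {
    static final String WATCH_KEY = "transform.aml_sanctions.watch.threshold";
    static final String CASE_KEY = "transform.aml_sanctions.case.threshold";

    final double watch;
    final double caseThreshold;

    ThresholdConfig(double watch, double caseThreshold) {
        // Written as a negated conjunction so that NaN values are rejected too.
        if (!(0.0 <= watch && watch < caseThreshold && caseThreshold <= 1.0)) {
            throw new IllegalArgumentException("expected 0 <= watch < case <= 1");
        }
        this.watch = watch;
        this.caseThreshold = caseThreshold;
    }

    static ThresholdConfig from(Properties props) {
        return new ThresholdConfig(required(props, WATCH_KEY), required(props, CASE_KEY));
    }

    /** Convenience for tests and demos: parses properties text held in memory. */
    static ThresholdConfig parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return from(props);
    }

    private static double required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return Double.parseDouble(value.trim());
    }
}
```

Failing fast on a missing or inverted threshold keeps a misconfigured deployment from silently labeling every record CLEAR or ESCALATE.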

06

Operational Profile

JVM and GC Behavior

The benchmark ran on a 4 GB heap with G1GC configured for a 50ms maximum GC pause target and 16 MB region size. GC overhead at run end was 0.063% — effectively negligible. The heap at run end was 303 MB used of 4 GB allocated. The G1GC sawtooth pattern visible in the BENCH window MEM column (cycling ~128 MB → 298 MB → ~128 MB) is expected behavior for a throughput-optimized configuration and does not affect throughput or record integrity.

Reliability Guarantees

Across 20,480 records and 60 five-second measurement windows:

  • Zero records dropped — streamkernel_pipeline_dropped_total = 0
  • Zero DLQ events — streamkernel_pipeline_dlq_total = 0
  • Zero empty batches — streamkernel_pipeline_empty_batch_total = 0
  • 100% fidelity — processed_total = out_total = kafka_sent_ok_total = 20,480
  • Clean graceful shutdown — Kafka producer closed with 30-second timeout, all in-flight records drained before process exit
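The counter checks listed above can be automated against a Prometheus text snapshot. The sketch below parses unlabelled `name value` sample lines and tests the same invariants. The metric names are used exactly as they are written in this section (the three counters in the fidelity bullet may be abbreviated forms of longer names), and the class is hypothetical. Labelled series such as `name{label="x"} value` are not handled.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a post-run invariant check; metric names are taken from this section as written.
final class DeliveryInvariants {

    /** Parses unlabelled "name value" sample lines, skipping comments and blank lines. */
    static Map<String, Double> parse(String snapshot) {
        Map<String, Double> samples = new HashMap<>();
        for (String line : snapshot.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2) {
                samples.put(parts[0], Double.parseDouble(parts[1]));
            }
        }
        return samples;
    }

    /** True only if all counters are present, loss counters are zero, and totals agree. */
    static boolean holds(Map<String, Double> s) {
        // Missing metrics default to NaN, which fails every comparison below.
        double dropped = s.getOrDefault("streamkernel_pipeline_dropped_total", Double.NaN);
        double dlq = s.getOrDefault("streamkernel_pipeline_dlq_total", Double.NaN);
        double empty = s.getOrDefault("streamkernel_pipeline_empty_batch_total", Double.NaN);
        double processed = s.getOrDefault("processed_total", Double.NaN);
        double out = s.getOrDefault("out_total", Double.NaN);
        double sent = s.getOrDefault("kafka_sent_ok_total", Double.NaN);
        return dropped == 0.0 && dlq == 0.0 && empty == 0.0
                && processed == out && out == sent;
    }
}
```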

Scaling Path

This benchmark is the CPU-only deployment floor: a development workstation with 12 host cores, a 4-worker configuration, and no GPU. A production deployment on dedicated server hardware with 32+ cores and a larger predictor pool would be expected to yield materially higher throughput with no architectural changes.

The same pipeline configuration supports DJL's CUDA backend via the GPU JAR flavor — a single configuration change. GPU inference eliminates the primary throughput constraint in this benchmark and is the recommended configuration for high-volume payment rails.

07

Deployment Contexts

Payment Rail Compliance

Primary target: a financial institution operating a real-time payment rail (ACH, wire transfer, card authorization, or instant payment) where every transaction must be screened before downstream processing. StreamKernel is positioned as the compliance enrichment layer between the raw event stream and case management, clearing, or reporting infrastructure. The sub-millisecond inference profile is compatible with synchronous payment decisioning requirements.

Air-Gapped and Regulated Infrastructure

The CPU-only JAR flavor and the absence of any external model endpoint make this architecture directly deployable in air-gapped environments. The ONNX model file, tokenizer, and policy configuration are loaded from local paths at startup. The runtime makes no calls to cloud APIs, has no license server dependencies, and has no runtime network requirements beyond the Kafka broker. It has been tested for SCIF and NIPRNet compatibility.

On-Premises Enterprise (DGX Station)

For enterprises deploying NVIDIA DGX Station or similar hardware, StreamKernel provides the governance and policy enforcement layer between the compute and downstream data consumers. The same pipeline that runs on a CPU runs on a DGX Station GPU with a single configuration change. Policy logic, provenance labeling, and Kafka delivery contract are identical across deployment targets.

Multi-Jurisdiction Compliance

The policy_version field and configurable threshold properties support multi-jurisdiction deployments where different regulatory regimes require different screening thresholds or typology mappings. A single StreamKernel deployment can run parallel pipeline instances with jurisdiction-specific configurations, each emitting to a jurisdiction-specific evidence topic.

08

Architecture Comparison

External Microservice Pattern

  • Separate model server — 1+ network hop per record
  • 5–50ms inference latency (network dependent, unbounded under load)
  • Audit trail fragmented across systems
  • Screening bypass if API unavailable
  • Cloud endpoint required — not air-gap compatible
  • Separate GPU inference server for acceleration
  • Policy version not embedded in event record
  • Two deployment units: pipeline + model server

StreamKernel In-Process

  • Same JVM process — zero network hops
  • 0.049ms avg inference (Prometheus, 20,480 records)
  • Single enriched record carries full decision provenance
  • No external dependency to fail against
  • Local ONNX file — air-gap and SCIF compatible
  • Change JAR flavor + execution provider for GPU
  • policy_version + model_version on every emitted record
  • Single JAR deployment unit

09

Evidence Chain & Reproducibility

This whitepaper is based on a fully reproducible benchmark run. All artifacts below were captured during or immediately after the run and are available for customer validation.

  • Benchmark Log (LOG): full BENCH window output, 328 lines. Includes the config dump, plugin catalog, embedding init, warmup completion (64 batches), and the graceful shutdown sequence including Kafka producer close.
  • Sidecar JSON (JSON): run metadata covering start/end UTC, JVM args, effective settings, profile configuration, and Grafana time bounds for replay.
  • Prometheus Snapshot (PROM): point-in-time metrics at run end, covering all inference counters, throughput totals, GC overhead (0.063%), heap utilization (303 MB), and Kafka sink send times.
  • Evidence Index (JSON): structured evidence index with the Kafka topic describe output, 12-partition counts, decision distribution, and the first parsed sample record with full payload.
  • GC Log (LOG): full G1GC log with timing and collection detail for JVM health verification. Confirms 0.063% GC overhead across the run.
  • Evidence Run Script (PS1): the PowerShell wrapper that orchestrated topic creation, run execution, and post-run proof validation. Paired with the benchmark matrix CSV, it makes this run fully repeatable.

The benchmark matrix row (benchmark-runs/tests_financial_services.csv) makes this run repeatable. Any organization with access to the StreamKernel binary and a local Kafka broker can reproduce the run by executing the evidence script against the same matrix row.

10

About StreamKernel

StreamKernel is a JVM-native event pipeline runtime built for enterprises that require in-process AI inference, per-event provenance, and policy enforcement at stream speed. The core architecture couples a transport-agnostic pipeline kernel with a native ONNX/DJL inference engine, an MLflow live model swap and rollback mechanism (sub-30-second swap without pipeline restart), an MCP agent control plane, and OPA/mTLS/OIDC/Keycloak/Schema Registry security integration.

Benchmarked Capabilities

  • Kafka ALO baseline: 621K avg / 820K peak EPS, zero dropped records across 372M+ events
  • MongoDB vector pipeline (optimized): 336.8 records/sec, 28.3× improvement from baseline, zero record loss
  • Pure MongoDB write baseline: 163K docs/sec avg, 256K peak, 95.5M docs, 6.45× WiredTiger compression
  • mTLS + OPA: 366K ops/sec, 217M records
  • AML/sanctions triage: 68.1 EPS avg / 99.8 EPS peak, 0.049ms per-record ONNX inference, 20,480 records, zero drops

Company

StreamKernel LLC is a Florida-registered company (formed May 12, 2026). The core runtime architecture is protected under US Provisional Patent Application No. 64/057,035 (filed May 4, 2026). Registered in SAM.gov · AWS Activate approved · NVIDIA Inception accepted · SBIR/STTR eligible. Targeting AFWERX, DARPA, Army DEVCOM, and NIWC for DoD SBIR engagements.

Led by Steven Lopez — Confluent Solutions Architecture · Red Hat · MITRE · FedEx.