Executive Summary
Financial institutions face a compounding compliance burden: every payment must be screened for sanctions exposure and scored for AML risk before it reaches downstream clearing, case management, or reporting infrastructure. Most architectures introduce this screening as a separate microservice or external API call — creating a network hop that adds latency, exposes a failure boundary, and severs the audit trail between the transaction record and the compliance decision.
StreamKernel eliminates that boundary. This whitepaper documents a reproducible benchmark in which StreamKernel processed 20,480 synthetic payment events through a full AML and sanctions triage pipeline — embedding, risk scoring, decision labeling, case prioritization, and provenance stamping — inside a single JVM process, with zero dropped records and zero DLQ events, at an average throughput of 68.1 EPS (peak 99.8 EPS) on CPU-only hardware.
The headline inference figure is 0.049 ms average per-record ONNX embedding latency, measured directly from Prometheus across all 20,480 records at 100% batch fill. Because records are embedded in batches of up to 16, this is a per-record figure amortized across each batch rather than the end-to-end latency of a single payment. There is no model server and no network hop. The embedding, the AML logic, and the provenance label are all produced inside the same JVM process that received the payment event.
Every emitted record carries a complete decision package: AML risk score, risk band, decision label, case priority, typology codes, reason codes, sanctions screen result, policy version, model version, and a SHA-256 provenance hash. The output is route-ready for case management systems, SAR queues, sanctions operations desks, or analytical warehouses without further enrichment.
This run represents the CPU-only deployment floor: the baseline configuration available to any organization, including those without GPU infrastructure. The same architecture supports DJL CUDA backends for GPU-accelerated inference where that compute is available.
The Compliance Architecture Problem
The standard pattern for real-time AML and sanctions screening treats the compliance decision as a service call. A payment event arrives at a streaming platform, triggers a consumer, which calls an external risk scoring API or sanctions screening service, waits for a response, and then publishes an enriched record downstream. This approach is operationally familiar but carries structural costs that compound at scale.
Latency Accumulation
Each round-trip to an external model server or screening API adds network latency, typically 5–50 ms under normal conditions but unbounded under load. For payment rails where sub-second decisioning is a regulatory expectation, this sets a per-transaction latency floor that horizontal scaling cannot remove: adding consumers raises aggregate throughput, but every payment still waits for its own round-trip.
Audit Trail Fragmentation
When the compliance decision is produced by a separate service, the provenance chain is fragmented. The transaction record exists in one system; the model version, policy version, and decision rationale exist in another. Reconstructing the full decision path for a SAR filing, regulatory examination, or internal audit requires joining across systems that were never designed to be joined at query time.
Failure Mode Coupling
A microservice architecture for compliance creates a hard dependency: if the screening service is unavailable, degraded, or rate-limited, the payment pipeline stalls or must bypass screening entirely — a situation that is both operationally painful and potentially reportable. Operators face a forced choice between availability and compliance continuity.
Deployment Surface in Regulated Environments
Air-gapped deployments, SCIF environments, and NIPRNet-connected infrastructure cannot rely on cloud-hosted model endpoints. Organizations operating in these environments have historically been forced to either accept reduced screening capability or maintain complex on-premises inference infrastructure with its own lifecycle management burden.
StreamKernel inverts this model. The ONNX model, the AML policy logic, the provenance labeler, and the Kafka sink are all loaded into the same JVM process. A payment event is parsed, embedded, scored, labeled, and emitted without leaving the process boundary.
Pipeline Architecture
Transform Chain
The AML/sanctions triage pipeline executes a three-stage transform chain inside a single JVM process:
Stage 1 — STRING_TO_WIREEVENT: Raw payment text (amount, merchant, channel, country, device fingerprint, velocity signals, account age, sanctions screen flag) is parsed and normalized into a typed WireEvent object. The parser extracts the payload ID as the Kafka message key for downstream deduplication and routing.
Stage 2 — DJL_EMBEDDING: The WireEvent is passed to the in-process ONNX embedding engine. MiniLM-L6-v2 is loaded natively via DJL and ONNX Runtime 1.20.0, executing on the CPU provider with 2 intra-op threads and 1 inter-op thread per predictor. The embedding pool maintains 4 pre-warmed predictors sharing model weights. A batching engine aggregates records into batches of up to 16 before invoking the ONNX forward pass, amortizing fixed overhead across the batch.
Stage 3 — AML_SANCTIONS_TRIAGE: The embedded vector, combined with structured transaction fields, is evaluated against the AML policy. Configurable thresholds: WATCH at score ≥ 0.42, ESCALATE/HOLD at score ≥ 0.70. The transform emits a fully enriched record carrying the decision label, risk band, case priority, typology codes, reason codes, and a SHA-256 hash of the source text.
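The Stage 3 threshold rule can be sketched in a few lines of Java. This is an illustration only, not StreamKernel's policy code: the class name TriageSketch is hypothetical, and the split between ESCALATE and HOLD (HOLD on any sanctions hit, ESCALATE on a high score without a hit) is an assumption drawn from the audience mapping later in this paper.

```java
/** Illustrative threshold mapping; not StreamKernel's actual policy code. */
final class TriageSketch {
    enum Decision { CLEAR, WATCH, ESCALATE, HOLD }

    // Thresholds quoted in the Stage 3 description.
    static final double WATCH_THRESHOLD = 0.42;
    static final double CASE_THRESHOLD = 0.70;

    /**
     * Assumption: a sanctions hit always yields HOLD, and a score at or above
     * the case threshold without a hit yields ESCALATE.
     */
    static Decision decide(double amlRiskScore, boolean sanctionsScreenHit) {
        if (sanctionsScreenHit) {
            return Decision.HOLD;
        }
        if (amlRiskScore >= CASE_THRESHOLD) {
            return Decision.ESCALATE;
        }
        if (amlRiskScore >= WATCH_THRESHOLD) {
            return Decision.WATCH;
        }
        return Decision.CLEAR;
    }
}
```

Under these assumptions, the sample score of 0.2904 with no sanctions hit maps to CLEAR, which matches the sample enriched record in the benchmark results.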
Runtime Configuration
| Parameter | Value |
|---|---|
| Pipeline ID | sk-financial-aml-sanctions-triage |
| Parallelism | 4 workers |
| Batch size | 16 records per batch |
| Heap | 4 GB (-Xms4g -Xmx4g), G1GC, MaxGCPauseMillis=50 |
| ONNX Execution Provider | CPU (CPUExecutionProvider) |
| ONNX Runtime Version | 1.20.0 |
| DJL Version | 0.32.0 |
| Model | MiniLM-L6-v2 (minilm-l6-v2-onnx) |
| Embedding Pool Size | 4 pre-warmed predictors |
| Tokenizer Max Length | 16 tokens |
| Batching Engine | Enabled — max batch 16, flush every 10 ms |
| Warmup Batches | 64 batches before first live record |
| Sink | Kafka · lz4 compression · 12 partitions · acks=1 |
| StreamKernel Version | 0.2.0 |
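The batching behavior in the table above (max batch 16, flush every 10 ms) follows a common micro-batching pattern. The sketch below shows one way such a collector could work using only the Java standard library. The class and method names are hypothetical, and StreamKernel's actual batching engine may differ, for example in how it measures the flush interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Illustrative micro-batcher; not StreamKernel's batching engine. */
final class MicroBatcherSketch {

    /**
     * Collects up to maxBatch items, returning early once flushMillis has
     * elapsed since the call started. May return an empty list.
     */
    static <T> List<T> nextBatch(BlockingQueue<T> queue, int maxBatch, long flushMillis) {
        List<T> batch = new ArrayList<>(maxBatch);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(flushMillis);
        try {
            while (batch.size() < maxBatch) {
                // Take whatever is already queued without blocking.
                queue.drainTo(batch, maxBatch - batch.size());
                if (batch.size() >= maxBatch) {
                    break;
                }
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    break; // flush interval elapsed
                }
                // Wait for one more item, but never past the flush deadline.
                T item = queue.poll(remainingNanos, TimeUnit.NANOSECONDS);
                if (item == null) {
                    break; // timed out: flush a partial batch
                }
                batch.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt, flush what we have
        }
        return batch;
    }
}
```

With 40 records already queued, successive calls with maxBatch = 16 would return batches of 16, 16, and 8. The last one is a partial batch released by the flush timeout rather than by reaching the size limit.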
Output Payload Contract
Every record emitted carries the following fields. In addition, each Kafka record carries routing headers (aml.score, aml.decision, aml.risk_band, streamkernel.provenance.*) enabling downstream routing without deserialization.
| Field | Type | Description |
|---|---|---|
| transaction_id | String | Payment event identifier |
| sanctions_screen_hit | Boolean | True if sanctions list match detected |
| aml_risk_score | Float [0,1] | Composite AML risk score from embedding + rules |
| risk_band | Enum | LOW / MEDIUM / HIGH |
| decision | Enum | CLEAR / WATCH / ESCALATE / HOLD |
| case_priority | Enum | P1 (highest) through P4 (lowest) |
| typologies | String[] | AML typology codes (e.g., VELOCITY_ANOMALY) |
| reason_codes | String[] | Signal codes (e.g., WIRE_TRANSFER_CHANNEL) |
| policy_version | String | Policy identifier for audit reproducibility |
| model_version | String | Model identifier for audit reproducibility |
| source_text_sha256 | String | SHA-256 hash of source text for trace correlation |
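Header-based routing, as described before the field table, lets a consumer pick a destination without parsing the record body. The sketch below is a hedged illustration in plain Java. It assumes header values are UTF-8 strings, and every destination name is hypothetical. With the Kafka Java client, the byte array would typically come from record.headers().lastHeader("aml.decision").value().

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Illustrative header router; destination names are hypothetical. */
final class HeaderRouterSketch {

    /** Chooses a destination from the aml.decision header alone. */
    static String route(Map<String, byte[]> headers) {
        byte[] raw = headers.get("aml.decision");
        if (raw == null) {
            return "aml.unrouted"; // header missing: send to a fallback queue
        }
        // Assumption: header values are UTF-8 encoded strings.
        String decision = new String(raw, StandardCharsets.UTF_8);
        switch (decision) {
            case "HOLD":
                return "sanctions.ops";
            case "ESCALATE":
                return "aml.cases";
            case "WATCH":
                return "aml.watchlist";
            case "CLEAR":
                return "aml.cleared";
            default:
                return "aml.unrouted";
        }
    }
}
```

Unknown or missing values fall through to a fallback destination here, so that a record is never silently discarded by the router.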
Benchmark Results
Inference Timing — Prometheus (all 20,480 records)
The latency sampler was intentionally disabled (streamkernel.latency.enabled=false) to remove sampling overhead from the throughput baseline. All inference timing is captured via dedicated embedding metrics instrumentation.
Run Summary
| Metric | Value |
|---|---|
| Run ID | run-financial-aml-sanctions-01 |
| Run Window | 2026-06-07T23:08:21Z → 23:14:10Z (5.82 min actual) |
| Total Records Processed | 20,480 |
| Records Delivered (Kafka) | 20,480 |
| Records Dropped | 0 |
| DLQ Events | 0 |
| Empty Batches | 0 |
| Average PROC_EPS | 68.1 |
| Peak PROC_EPS | 99.8 |
| BENCH Windows (5s) | 60 |
| Kafka Partitions | 12 |
| Heap at Run End | 303 MB / 4 GB |
| Clean Pipeline Stop | Yes — graceful drain + Kafka producer close |
Sample Enriched Record
First parsed sample from post-run proof validation — a CLEAR decision on a wire transfer, showing the complete decision package including typologies and reason codes even for cleared transactions:
{
  "transaction_id": "TXN-102320",
  "amount_usd": 250.4,
  "channel": "wire",
  "country": "GB",
  "sanctions_screen_hit": false,
  "aml_risk_score": 0.2904,
  "risk_band": "LOW",
  "decision": "CLEAR",
  "case_priority": "P4",
  "reason_codes": ["WIRE_TRANSFER_CHANNEL", "ELEVATED_ACTIVITY_VELOCITY"],
  "typologies": ["FUNDS_TRANSFER_ACTIVITY", "VELOCITY_ANOMALY"],
  "policy_version": "aml-sanctions-policy-v1",
  "model_version": "deterministic-aml-sanctions-v1",
  "source_text_included": false,
  "source_text_sha256": "sha256:572211679fc98e36c064473e58df68b0d9a36e3ca109778ee31d46366cf1dffa"
}
Even a CLEAR, P4-priority transaction carries the full evidence package. The wire transfer channel and elevated velocity are surfaced as reason codes and mapped to AML typologies, giving the compliance reviewer an auditable rationale for the CLEAR decision — not merely the outcome.
Compliance Relevance
Audience Mapping
| Audience | Field(s) Used | Workflow Value |
|---|---|---|
| AML Analysts | decision, risk_band, typologies | Immediate triage — WATCH/ESCALATE cases route to analyst queues without manual scoring |
| Sanctions Operations | sanctions_screen_hit, decision | Automatic HOLD routing for sanctions hits; CLEAR records bypass manual review |
| Case Management | case_priority, reason_codes | P1/P2 cases escalate immediately; P3/P4 enter standard review queues |
| Compliance Reviewers | policy_version, model_version | Reproducible decisions — each record carries the exact policy and model that produced it |
| Internal Audit / Legal | source_text_sha256 | Unbroken audit chain from transaction text hash through decision to emitted record |
| SAR Filing Teams | typologies, reason_codes | Pre-populated typology and narrative evidence for Suspicious Activity Reports |
| Platform Engineers | streamkernel.provenance.* | Kafka header routing to SAR queues or warehouse sinks without deserialization |
Provenance and Auditability
Every record carries a SHA-256 hash of the source transaction text. In this benchmark profile, the raw source text is excluded from the payload (source_text_included: false), which suits environments where storing PII in the evidence topic is undesirable. The hash provides a cryptographic tie between the decision record and the original transaction without transmitting the sensitive payload downstream.
The combination of the policy_version and model_version fields means every decision is reproducible: given the same transaction text, policy version, and model version, the pipeline will produce the same decision. This supports regulatory examination and litigation hold scenarios in which the institution must demonstrate that a given transaction was evaluated correctly under the policy in effect at the time.
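A value in the shape of source_text_sha256 can be reproduced with the standard Java MessageDigest API. The sketch below is an assumption-laden illustration: it hashes the UTF-8 bytes of the text with no normalization and prepends the "sha256:" prefix seen in the sample record. The exact canonicalization StreamKernel applies before hashing is not specified here, so an auditor would need to confirm it before comparing hashes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative provenance hash; canonicalization rules are assumed. */
final class ProvenanceHashSketch {

    /** Returns "sha256:" followed by the lowercase hex digest of the text. */
    static String sourceTextSha256(String sourceText) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Assumption: the text is hashed as UTF-8 bytes, unmodified.
            byte[] hash = digest.digest(sourceText.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("sha256:");
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

A reviewer holding the original transaction text could recompute the hash and compare it with the value on the emitted record to confirm that the two refer to the same input.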
Threshold Configurability
AML thresholds are pipeline properties, not compiled logic:
transform.aml_sanctions.watch.threshold=0.42
transform.aml_sanctions.case.threshold=0.70
Thresholds can be adjusted per deployment environment without recompiling. The policy_version field provides a versioned record of which threshold configuration produced each decision, which supports model risk management requirements.
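As a rough illustration of how properties like these can be read and validated at startup, the sketch below parses the two threshold keys with java.util.Properties. The class name and the ordering check (0 <= watch <= case <= 1) are assumptions for this example, not documented StreamKernel behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative threshold loader; validation rules are assumed. */
final class ThresholdConfigSketch {
    static final String WATCH_KEY = "transform.aml_sanctions.watch.threshold";
    static final String CASE_KEY = "transform.aml_sanctions.case.threshold";

    final double watchThreshold;
    final double caseThreshold;

    private ThresholdConfigSketch(double watchThreshold, double caseThreshold) {
        // Assumed sanity rule: 0 <= watch <= case <= 1.
        if (!(0.0 <= watchThreshold && watchThreshold <= caseThreshold && caseThreshold <= 1.0)) {
            throw new IllegalArgumentException("expected 0 <= watch <= case <= 1");
        }
        this.watchThreshold = watchThreshold;
        this.caseThreshold = caseThreshold;
    }

    /** Parses properties-format text and requires both threshold keys. */
    static ThresholdConfigSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ThresholdConfigSketch(required(props, WATCH_KEY), required(props, CASE_KEY));
    }

    private static double required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return Double.parseDouble(value.trim());
    }
}
```

Failing fast on a missing or inverted threshold is a deliberate choice in this sketch: a silent default would make it harder to tie a decision back to the configuration recorded under policy_version.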
Operational Profile
JVM and GC Behavior
The benchmark ran on a 4 GB heap with G1GC configured for a 50ms maximum GC pause target and 16 MB region size. GC overhead at run end was 0.063% — effectively negligible. The heap at run end was 303 MB used of 4 GB allocated. The G1GC sawtooth pattern visible in the BENCH window MEM column (cycling ~128 MB → 298 MB → ~128 MB) is expected behavior for a throughput-optimized configuration and does not affect throughput or record integrity.
Reliability Guarantees
Across 20,480 records and 60 five-second measurement windows:
- Zero records dropped — streamkernel_pipeline_dropped_total = 0
- Zero DLQ events — streamkernel_pipeline_dlq_total = 0
- Zero empty batches — streamkernel_pipeline_empty_batch_total = 0
- 100% fidelity — processed_total = out_total = kafka_sent_ok_total = 20,480
- Clean graceful shutdown — Kafka producer closed with 30-second timeout, all in-flight records drained before process exit
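The zero-loss counters listed above can be checked mechanically against a Prometheus scrape. The sketch below parses text in the Prometheus exposition format and tests the three counters named in the list. It is a simplified illustration that assumes sample lines carry no trailing timestamp, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative post-run check against Prometheus exposition text. */
final class FidelityCheckSketch {

    /** Sums sample values by metric name, ignoring comments and labels. */
    static Map<String, Double> parse(String exposition) {
        Map<String, Double> totals = new HashMap<>();
        for (String rawLine : exposition.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            // Assumption: "<name>{labels} <value>" with no trailing timestamp.
            int lastSpace = line.lastIndexOf(' ');
            if (lastSpace < 0) {
                continue;
            }
            String series = line.substring(0, lastSpace).trim();
            int brace = series.indexOf('{');
            String name = brace >= 0 ? series.substring(0, brace) : series;
            double value = Double.parseDouble(line.substring(lastSpace + 1));
            totals.merge(name, value, Double::sum);
        }
        return totals;
    }

    /** True only if all three loss counters are present and equal to zero. */
    static boolean zeroLoss(Map<String, Double> metrics) {
        return metrics.getOrDefault("streamkernel_pipeline_dropped_total", -1.0) == 0.0
            && metrics.getOrDefault("streamkernel_pipeline_dlq_total", -1.0) == 0.0
            && metrics.getOrDefault("streamkernel_pipeline_empty_batch_total", -1.0) == 0.0;
    }
}
```

A missing counter is treated as a failure rather than as zero, so an incomplete scrape cannot pass the check by accident.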
Scaling Path
This benchmark is the CPU-only deployment floor: a development workstation with 12 host cores, a 4-worker configuration, and no GPU. A production deployment on dedicated server hardware with 32+ cores and a larger predictor pool is expected to yield materially higher throughput with no architectural changes, although that configuration was not measured in this run.
The same pipeline configuration supports DJL's CUDA backend via the GPU JAR flavor, which is a single configuration change. Where embedding inference is the limiting stage, GPU inference removes that constraint, and it is the recommended configuration for high-volume payment rails.
Deployment Contexts
Payment Rail Compliance
Primary target: a financial institution operating a real-time payment rail (ACH, wire transfer, card authorization, or instant payment) where every transaction must be screened before downstream processing. StreamKernel is positioned as the compliance enrichment layer between the raw event stream and case management, clearing, or reporting infrastructure. The sub-millisecond per-record inference profile is compatible with synchronous payment decisioning requirements.
Air-Gapped and Regulated Infrastructure
The CPU-only JAR flavor and the absence of any external model endpoint make this architecture directly deployable in air-gapped environments. The ONNX model file, tokenizer, and policy configuration are loaded from local paths at startup. No calls to cloud APIs, no license server dependencies, no runtime network requirements beyond the Kafka broker. Tested for SCIF and NIPRNet compatibility.
On-Premises Enterprise (DGX Station)
For enterprises deploying NVIDIA DGX Station or similar hardware, StreamKernel provides the governance and policy enforcement layer between the compute and downstream data consumers. The same pipeline that runs on a CPU runs on a DGX Station GPU with a single configuration change. The policy logic, provenance labeling, and Kafka delivery contract are identical across deployment targets.
Multi-Jurisdiction Compliance
The policy_version field and configurable threshold properties support multi-jurisdiction deployments where different regulatory regimes require different screening thresholds or typology mappings. A single StreamKernel deployment can run parallel pipeline instances with jurisdiction-specific configurations, each emitting to a jurisdiction-specific evidence topic.
Architecture Comparison
External Microservice Pattern
- Separate model server — 1+ network hop per record
- 5–50 ms inference latency (network dependent, unbounded under load)
- Audit trail fragmented across systems
- Screening bypass if API unavailable
- Cloud endpoint required — not air-gap compatible
- Separate GPU inference server for acceleration
- Policy version not embedded in event record
- Two deployment units: pipeline + model server
StreamKernel In-Process
- Same JVM process — zero network hops
- 0.049 ms avg per-record inference (Prometheus, 20,480 records)
- Single enriched record carries full decision provenance
- No external dependency to fail against
- Local ONNX file — air-gap and SCIF compatible
- Change JAR flavor + execution provider for GPU
- policy_version + model_version on every emitted record
- Single JAR deployment unit
Evidence Chain & Reproducibility
This whitepaper is based on a fully reproducible benchmark run. All artifacts below were captured during or immediately after the run and are available for customer validation.
The benchmark matrix row (benchmark-runs/tests_financial_services.csv) makes this run repeatable. Any organization with access to the StreamKernel binary and a local Kafka broker can reproduce the run by executing the evidence script against the same matrix row.
About StreamKernel
StreamKernel is a JVM-native event pipeline runtime built for enterprises that require in-process AI inference, per-event provenance, and policy enforcement at stream speed. The core architecture couples a transport-agnostic pipeline kernel with a native ONNX/DJL inference engine, an MLflow live model swap and rollback mechanism (sub-30-second swap without pipeline restart), an MCP agent control plane, and OPA/mTLS/OIDC/Keycloak/Schema Registry security integration.
Benchmarked Capabilities
- Kafka at-least-once (ALO) baseline: 621K avg / 820K peak EPS, zero dropped records across 372M+ events
- MongoDB vector pipeline (optimized): 336.8 records/sec, 28.3× improvement from baseline, zero record loss
- Pure MongoDB write baseline: 163K docs/sec avg, 256K peak, 95.5M docs, 6.45× WiredTiger compression
- mTLS + OPA: 366K ops/sec, 217M records
- AML/sanctions triage: 68.1 EPS avg / 99.8 EPS peak, 0.049 ms per-record ONNX inference, 20,480 records, zero drops
Company
StreamKernel LLC is a Florida-registered company (formed May 12, 2026). The core runtime architecture is protected under US Provisional Patent Application No. 64/057,035 (filed May 4, 2026). Registered in SAM.gov · AWS Activate approved · NVIDIA Inception accepted · SBIR/STTR eligible. Targeting AFWERX, DARPA, Army DEVCOM, and NIWC for DoD SBIR engagements.
Led by Steven Lopez — Confluent Solutions Architecture · Red Hat · MITRE · FedEx.