Healthcare PHI Detection Tokenization Redaction Local AI

PHI-Aware Governed
AI Pipeline

Detect PHI fields, apply redaction or tokenization, run local AI enrichment, and write a governed record to Kafka — all in the same JVM process, before the event touches any downstream consumer.

72,832 rec
5 min run
~243 r/s
Avg throughput
0 err
Pipeline errors
18.1 ms
ONNX avg / rec

Healthcare data platforms face a real tension: AI enrichment — risk scoring, care-gap detection, population analytics — requires events to move. But many of those same events contain PHI that can't reach external models or unmanaged analytics sinks without prior inspection.

This pipeline resolves that in-process. By the time a record is written to Kafka it already has a policy decision, an enumerated list of detected PHI fields, a transformed delivered_payload (redacted, tokenized, or blocked), local AI enrichment labels, and a full provenance chain.

How it compares to the Defense Telemetry profile

Same hardware, same JVM settings, same pipeline skeleton — different tokenizer context. The shorter token max length (16 vs 24) is the primary throughput driver.

Avg throughput
243 r/s vs 84 defense
ONNX per-record
18.1 ms vs 61 defense
Token max length
16 vs 24 defense
Records / 5 min
72,832 vs 25,344
!

This is a PHI-aware governed AI pipeline pattern using safe synthetic events. It demonstrates the structural controls — detection, labeling, redaction, tokenization, routing, provenance — that a HIPAA-compliant deployment would require. It does not constitute HIPAA compliance out of the box. All patient, member, provider, diagnosis, and claim data is fully synthetic.

Pipeline & Config

SYNTHETIC
HC SOURCE
Input
STRING_TO
_WIREEVENT
Parse
DJL_
EMBEDDING
Enrich
PHI_
GUARDRAIL
Govern
KAFKA
GOVERNED
Sink

Pipeline profile — exact config from this run

streamkernel_healthcare_phi_safe_ai.properties properties
# ── Pipeline identity ────────────────────────────────────────
pipeline.id                              = sk-healthcare-phi-safe-ai
pipeline.parallelism                     = 4
pipeline.batch.size                      = 16
pipeline.drain.timeout.ms                = 30000

# ── Transform chain ──────────────────────────────────────────
transform.chain                          = STRING_TO_WIREEVENT,DJL_EMBEDDING,PHI_GUARDRAIL
transform.version                        = healthcare-phi-safe-ai-v1

# ── PHI guardrail policy ─────────────────────────────────────
transform.phi_guardrail.policy.version   = healthcare-phi-governance-policy-v1
transform.phi_guardrail.model.version    = deterministic-healthcare-ai-v1
transform.phi_guardrail.high_claim.amount= 5000.0   # claim_amount ≥ this → elevated risk
transform.string_to_wireevent.use.payload.id.as.key = true

# ── ONNX embedding ───────────────────────────────────────────
ai.embedding.enabled                     = true
ai.embedding.engine                      = OnnxRuntime
ai.embedding.device                      = cpu
ai.embedding.pool.size                   = 4
ai.embedding.batching.enabled            = true
ai.embedding.batching.max.size           = 16
ai.embedding.batching.flush.ms           = 10
ai.embedding.onnx.intra.op.threads       = 2
ai.embedding.onnx.inter.op.threads       = 1
ai.embedding.onnx.warmup.count           = 64
ai.embedding.tokenizer.max.length        = 16  # ← shorter than defense (24) → 3.4× faster
ai.embedding.input.max.chars             = 768
ai.embedding.normalize                   = true
ai.embedding.model.uri                   = file:///models/minilm-l6-v2-onnx/model.onnx
ai.embedding.tokenizer.uri               = file:///models/minilm-l6-v2-onnx/tokenizer.json
ai.feature.version                       = healthcare-phi-features-v1

# ── Source ───────────────────────────────────────────────────
source.type                              = SYNTHETIC
source.synthetic.text.profile            = HEALTHCARE_EVENTS
source.synthetic.payload.size            = 768    # larger than defense (640)
source.synthetic.max.records.per.second  = 0

# ── Kafka sink ───────────────────────────────────────────────
sink.type                                = KAFKA
sink.kafka.topic                         = streamkernel-healthcare-phi-governed
sink.kafka.bootstrap.servers             = localhost:9092
sink.kafka.compression.type              = lz4
sink.kafka.acks                          = 1
sink.kafka.batch.size                    = 262144
sink.kafka.linger.ms                     = 10    # slightly longer than defense (5ms)
sink.kafka.enable.idempotence            = false

# ── Security ─────────────────────────────────────────────────
security.type                            = PERMIT_ALL    # replace with OIDC for prod

# ── Observability ────────────────────────────────────────────
metrics.provider                         = PROMETHEUS
metrics.prometheus.port                  = 8080
metrics.tag.run_id                       = run-healthcare-phi-01
metrics.tag.env                          = healthcare-demo

PHI Decisions

The PHI_GUARDRAIL transformer evaluates each event against a deterministic healthcare policy. It detects PHI fields, assesses the requested destination, scores privacy risk, and emits one of four decisions. All four were observed and verified in this run.

Detected PHI fields across all event types:

payload.patient_id payload.member_id payload.provider_id payload.diagnosis_code payload.clinical_note payload.claim_amount
BLOCK
→ dlq
PHI detected + destination is external model or unmanaged sink. Payload replaced with block notice; original not retained. AI enrichment disabled.
PHI_FIELDS_DETECTED PHI_NOT_ALLOWED_FOR_EXTERNAL_MODEL
REDACT
→ audit_review
PHI detected + destination is internal analytics. Direct identifiers replaced with ***. Structured fields preserved.
PHI_FIELDS_DETECTED SENSITIVE_DATA_REDACTED
ESCALATE
→ audit_review
High-risk event type (risk scoring / clinical) routed for human review. Redaction + tokenization both applied. AI enrichment enabled — full labels.
PHI_FIELDS_DETECTED HIGH_RISK_HEALTHCARE_EVENT
TOKENIZE
→ operational_sink
PHI present but destination is a permitted care app. Direct IDs replaced with deterministic tokens. Diagnosis bucketed to ICD group. AI enrichment enabled.
PHI_FIELDS_DETECTED DIRECT_IDENTIFIERS_TOKENIZED

Decision coverage confirmed by post-run script: BLOCK ✓   REDACT ✓   ESCALATE ✓   TOKENIZE ✓

Output Contract

What the transformer actually does to the payload

The key thing distinguishing this from a routing-only pattern: the transformer constructs a delivered_payload that reflects the decision. A BLOCK produces a stub. A TOKENIZE produces real data with identifiers swapped for stable tokens. The original payload is never forwarded as-is when PHI is present.

BLOCK — delivered_payload
{
  "blocked": true,
  "original_payload_retained": false,
  "review_reason": "PHI blocked before
  external model or unmanaged
  destination"
}
TOKENIZE — delivered_payload
{
  "patient_token":  "tok_8c29c1e64ecb",
  "member_token":   "tok_3ed8e0b853c8",
  "provider_token": "tok_fab89005e89c",
  "diagnosis_group":"J45",
  "claim_amount_bucket":"1000_to_4999",
  "clinical_note": "Provider submitted
  supporting note for review."
}

Local AI enrichment — conditionally applied

AI enrichment runs inside the same transformer using the embedding produced by DJL_EMBEDDING. No second process, no external API call. It's skipped entirely for BLOCK decisions:

ai_enrichment field — TOKENIZE example json
{
  "ai_enrichment": {
    "enabled":         true,
    "event_category": "patient_engagement",
    "urgency":         "low",
    "claim_risk_score": 0.1227
  }
}

// BLOCK decision — enrichment skipped
{
  "ai_enrichment": {
    "enabled":         false,
    "event_category": "not_enriched_blocked",
    "urgency":         "n/a",
    "claim_risk_score": 0.0
  }
}

Kafka message headers

healthcare.policy.decisionTOKENIZE
healthcare.routeoperational_sink
healthcare.contains_phitrue
healthcare.privacy_riskmedium
healthcare.phi.fieldspatient_id, member_id, provider_id, diagnosis_code, clinical_note, claim_amount
healthcare.redaction.appliedfalse
healthcare.policy.versionhealthcare-phi-governance-policy-v1
healthcare.model.versiondeterministic-healthcare-ai-v1
streamkernel.provenance.pipeline.idsk-healthcare-phi-safe-ai
streamkernel.provenance.transform.chainSTRING_TO_WIREEVENT → DJL_EMBEDDING → PHI_GUARDRAIL
streamkernel.provenance.model.aliaschampion
streamkernel.provenance.model.stageproduction
streamkernel.provenance.model.ref.sha25612f054f7c8e13bae3db9b20761dd44e5c2c598749ae48b0f5a6ce0d7d5f7b16d
streamkernel.provenance.config.sha25677bdee4e82a77b5e93820e6a39152c09a7db5a53065524347b6c09b4bcc71e09
streamkernel.provenance.inference.timestamp2026-05-23T00:36:58.778Z

Reproduce It

1
Start Kafka
Broker at localhost:9092. Use the included Docker Compose with tmpfs for Kafka data to avoid overlay2 bottleneck.
2
Confirm pre-run state
.\scripts\demo_before_healthcare_phi.ps1
Verifies broker, topic state (must exist with 0 messages or not exist), ONNX model, tokenizer JSON, and prints the expected output contract.
3
Run the benchmark
.\test-java-runner.ps1 `
  -MatrixFile .\benchmark-runs\tests_healthcare.csv `
  -SingleTest streamkernel_healthcare_phi_safe_ai_5m
4
Verify all four decisions
.\scripts\demo_after_healthcare_phi.ps1 -SampleCount 3
Checks record count, 12 header keys, JSON contract, and confirms all four policy decisions (BLOCK, REDACT, ESCALATE, TOKENIZE) were observed.

Benchmark Results

Hardware: Intel i9-8950HK (12 cores), CPU-only. Same hardware and JVM settings as the defense telemetry profile — the throughput difference is entirely explained by token context length.

72,832 rec
Total · 5 min
In = Out. Zero drops.
~243 r/s
Avg throughput
Peak: 370.9 r/s. 2.9× defense profile.
18.1 ms
ONNX per-record
3.4× faster than defense. Token max: 16.
2.8 ms
Tokenize per-rec
Half the defense profile (5.5 ms).
848 MB
Peak JVM heap
Higher than defense — PHI transform objects. G1GC stable.
0 err
All error counters
DLQ · Auth · Source · Dropped

Throughput log trace

streamkernel_healthcare_phi_safe_ai_5m_20260522_2036.log log
# window_s=5  PROC_EPS  EFF%  PROC_TOTAL  MEM     DROPPED
20:36:45   265.4    66.3    1,328     313MB   0   ← fast ramp (shorter token context)
20:37:00   220.4    55.1    4,544     239MB   0
20:37:30   291.2    72.8   11,344     218MB   0
20:40:15   281.7    70.4   51,760     416MB   0
20:40:55   370.9    92.7   61,968  1,153MB   0   ← peak (92.7% of 4-thread ceiling)
20:41:20   316.7    79.2   68,064     503MB   0
20:41:40   147.4    36.8   72,768     845MB   0   ← wind-down

# DROPPED=0 across all 60 windows  |  EMPTY=0 throughout
# MEM at peak (1,153MB) = G1 before collection; sawtooth is normal
# EFF = % of 4-thread × 100 r/s theoretical ceiling

Tuning Knobs

PropertyThis runWhat it does
ai.embedding.tokenizer.max.length 16 Dominant factor. This is why this profile is 3.4× faster than the defense profile (24 tokens). Set to the shortest value your event text supports — every token less is a meaningful speedup on CPU.
transform.phi_guardrail.high_claim.amount 5000.0 Claim amounts at or above this value contribute to elevated privacy risk scoring. Tune to your organization's definition of high-value claims.
sink.kafka.linger.ms 10 Slightly higher than the defense profile (5 ms) to allow larger Kafka batches given the higher record throughput. Increase further if producer throughput is the bottleneck.
ai.embedding.pool.size 4 Number of ONNX predictor instances. Keep pool × intra.op.threads ≤ physical cores to avoid thread contention.
ai.embedding.batching.max.size 16 Batch fill was 100% throughout — the source kept up. You can increase this if latency budget allows. Diminishing returns above 32 on CPU at this token length.

Extending It

Plug in your own PHI detector

The PHI_GUARDRAIL transformer implements the WireEventTransformer SPI. Replace the deterministic field scanner with an NLP-based detector, a custom rule engine, or a fine-tuned clinical classifier — the output contract (headers + delivered_payload) stays the same. Downstream consumers don't change.

Replace the embedding model

Swap all-MiniLM-L6-v2 for a clinical domain model fine-tuned on ICD codes, clinical notes, or claims text. Change ai.embedding.model.uri and ai.embedding.tokenizer.uri to point at the new ONNX artifact — no code changes required.

Swap the sink

Change sink.type to MONGODB to write governed records directly to a vector store for downstream RAG. The healthcare.policy.decision header travels unchanged to any sink type.

Multi-sink fan-out (next increment)

Physical routing to separate topics — TOKENIZE events to local-ai-vector, ESCALATE to audit-review, BLOCK to DLQ — in the same pipeline pass. The header contract is already designed for it.

Enable GPU inference

ai.embedding.device                     = gpu
ai.embedding.onnx.execution.provider    = CUDA
ai.embedding.onnx.enable.cpu.fallback   = true