Technical Whitepaper · May 2026

Governed Agent Tool Audit

A StreamKernel evidence run for normalizing, enriching, governing, routing, and auditing agent tool-call events before they affect enterprise systems.

Run: streamkernel_agent_tool_audit_5m

Status: Completed

Runtime: Java 21 · CPU JAR

Evidence: 22 May 2026

Agentic AI Governance · In-Process ONNX · Policy + Risk Labels · Kafka Evidence Topic · Synthetic Safe Demo

Policy Decision Coverage

ALLOW: permitted operational path

AUDIT_ONLY: inspect and retain evidence

ALERT: after-hours sensitive access

RESTRICT: constrained write path

BLOCK: disallowed tool action

ESCALATE: human review route

25,488 governed records

Written to streamkernel-agent-tool-audit-governed with policy, risk, audit, route, model, and provenance metadata.

Executive Summary

Governed Agent Tool Audit demonstrates StreamKernel as an event-boundary control layer for agentic systems. The pipeline takes synthetic agent tool-call events, normalizes them into a stable event envelope, runs in-process ONNX embedding, applies deterministic OPA-style governance and risk rules, stamps provenance, and writes governed audit records to Kafka.

This is not an agent framework and does not replace LangChain, LangGraph, CrewAI, AutoGen, custom planners, memory stores, model gateways, identity systems, or human approval workflows. It sits beside them as a governed event boundary: every agent action becomes an inspectable record with policy decision, risk score, reason codes, routing label, audit record, model labels, and StreamKernel provenance.

Core thesis: enterprise AI adoption needs more than agent planning. It needs a durable governance boundary around what agents read, write, execute, escalate, and route.

Summary metric	Observed value
Governed records written	25,488
Actual run duration	5.21 minutes
Policy decisions observed	6
Kafka partitions	12
Pipeline parallelism	4
Pipeline batch size	16
Dropped / DLQ / denied counters	0 / 0 / 0
ONNX Runtime execution	CPU

The Business Problem

Agent frameworks typically focus on planning, memory, tool invocation, and orchestration. Enterprise teams also need to answer different questions: Was the tool call allowed? Was the target resource sensitive? Was the action high-risk? Was a human review required? Which policy version made the decision? Did the event carry provenance downstream?

Without a governed event boundary, each agent application tends to implement its own logging, policy, routing, and audit conventions. That creates inconsistent evidence and makes it harder for platform, security, risk, and compliance teams to reason about agent behavior across frameworks.

Agent teams: Keep their planner and tool architecture while sending tool-call events into a normalized governance stream.

Security teams: See blocked, alerted, escalated, restricted, and audit-only actions in an inspectable stream.

Risk teams: Get risk labels, reason codes, resource sensitivity, trust-tier context, and review routes.

Compliance teams: Receive an audit record for every governed agent decision, with model and policy labels attached.

Runtime Architecture

The run exercises a single StreamKernel pipeline profile. It uses a synthetic agent tool-call source, a stable event envelope transformer, in-process ONNX embedding, an agent tool audit transformer, and Kafka delivery to a governed evidence topic.

Figure 1 — Governed agent tool-call pipeline

Layer	Component	Role in this run
Source	`SYNTHETIC`	Generates agent tool events without customer systems or real logs.
Normalize	`STRING_TO_WIREEVENT`	Creates stable key, byte payload, and source metadata.
Model execution	`DJL_EMBEDDING`	Runs ONNX inference in-process before governance enrichment.
Governance	`AGENT_TOOL_AUDIT`	Adds policy decisions, risk labels, audit records, route labels, escalation details, and DLQ details.
Delivery	`KAFKA`	Writes inspectable records to `streamkernel-agent-tool-audit-governed`.
Evidence	`streamkernel.provenance.enabled=true`	Stamps config, transform, model, and run identifiers.
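
The layer ordering in the table can be pictured as a simple chain of stages applied to each raw event. The following is a minimal sketch under stated assumptions, not StreamKernel's implementation: the `WireEvent` class, the stage functions, and the `embedding.done` header are hypothetical stand-ins, and the embedding stage is a placeholder that loads no model.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class WireEvent:
    """Hypothetical stand-in for the stable event envelope."""
    key: str
    payload: bytes
    headers: dict = field(default_factory=dict)


def string_to_wire_event(raw: str) -> WireEvent:
    # Normalize: content-derived key, byte payload, source metadata.
    data = raw.encode("utf-8")
    key = hashlib.sha256(data).hexdigest()[:16]
    return WireEvent(key=key, payload=data, headers={"source": "SYNTHETIC"})


def embed(event: WireEvent) -> WireEvent:
    # Placeholder for in-process ONNX inference; no model is loaded here.
    event.headers["embedding.done"] = "true"
    return event


def agent_tool_audit(event: WireEvent) -> WireEvent:
    # Placeholder governance stage; a real stage would evaluate policy and risk.
    event.headers["agent.policy.decision"] = "AUDIT_ONLY"
    return event


# Stage order mirrors the Normalize, Model execution, Governance layers.
CHAIN = [string_to_wire_event, embed, agent_tool_audit]


def run_chain(raw: str) -> WireEvent:
    result = raw
    for stage in CHAIN:
        result = stage(result)
    return result


event = run_chain('{"tool": "retrieve_account_activity", "action": "read"}')
print(event.key, event.headers)
```

The point of the sketch is the ordering: governance runs after normalization and model execution, so policy metadata can be attached to an envelope that already has a stable key.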

Run Configuration

The profile used parallelism 4, batch size 16, a fixed executor, disabled latency sampling, disabled cache, and Kafka sink delivery. It used MiniLM ONNX artifacts through DJL/ONNX Runtime on CPU, with a predictor pool of 4 and batch size 16.

pipeline.id=sk-agent-tool-audit
pipeline.parallelism=4
pipeline.batch.size=16
transform.chain=STRING_TO_WIREEVENT,DJL_EMBEDDING,AGENT_TOOL_AUDIT
sink.type=KAFKA
sink.kafka.topic=streamkernel-agent-tool-audit-governed
source.synthetic.text.profile=AGENT_TOOL_CALLS
ai.embedding.engine=OnnxRuntime
ai.embedding.pool.size=4
ai.embedding.batching.max.size=16
security.type=PERMIT_ALL

Security boundary: this is a reproducible local evidence profile. It uses PERMIT_ALL and PLAINTEXT Kafka in the run; production profiles should replace those with enterprise security controls.
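
A reader reproducing the run may want to confirm that a profile file matches the description above before launching it. The sketch below parses the key=value lines shown in this section and checks the described values. The parser is a simplified stand-in for a Java properties loader, and treating "predictor pool equals parallelism" as a check is an observation about this profile, not a documented StreamKernel requirement.

```python
PROFILE = """\
pipeline.id=sk-agent-tool-audit
pipeline.parallelism=4
pipeline.batch.size=16
transform.chain=STRING_TO_WIREEVENT,DJL_EMBEDDING,AGENT_TOOL_AUDIT
sink.type=KAFKA
sink.kafka.topic=streamkernel-agent-tool-audit-governed
source.synthetic.text.profile=AGENT_TOOL_CALLS
ai.embedding.engine=OnnxRuntime
ai.embedding.pool.size=4
ai.embedding.batching.max.size=16
security.type=PERMIT_ALL
"""


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; skips blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(PROFILE)
chain = props["transform.chain"].split(",")

checks = {
    "parallelism is 4": int(props["pipeline.parallelism"]) == 4,
    "batch size is 16": int(props["pipeline.batch.size"]) == 16,
    "pool matches parallelism": props["ai.embedding.pool.size"] == props["pipeline.parallelism"],
    "embedding runs before audit": chain.index("DJL_EMBEDDING") < chain.index("AGENT_TOOL_AUDIT"),
}
for name, ok in checks.items():
    print(f"{'OK  ' if ok else 'FAIL'} {name}")
```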

Evidence Baseline

The pre-run script confirmed that Kafka was reachable, that the target topic did not exist before the run, and that the pipeline profile, benchmark matrix row, ONNX model, tokenizer, and expected governed-agent JSON/header contract were all present. The post-run script confirmed that the topic existed, counted records by partition, validated required headers and JSON fields, and sampled governed output records.
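
The post-run counting step can be approximated in a few lines. This is a hedged sketch, not the actual evidence script: it assumes records have already been read from the topic into dictionaries carrying a `partition` number and a `policy_decision`, and the sample rows are illustrative values, not data from the run.

```python
from collections import Counter

EXPECTED_DECISIONS = {"ALLOW", "AUDIT_ONLY", "ALERT", "RESTRICT", "BLOCK", "ESCALATE"}


def summarize(records):
    """Count records per partition and per decision; report uncovered decisions."""
    per_partition = Counter(r["partition"] for r in records)
    per_decision = Counter(r["policy_decision"] for r in records)
    missing = EXPECTED_DECISIONS - set(per_decision)
    return per_partition, per_decision, missing


# Illustrative rows only; a real check would consume the governed topic.
sample = [
    {"partition": 0, "policy_decision": "ALLOW"},
    {"partition": 0, "policy_decision": "ALERT"},
    {"partition": 3, "policy_decision": "BLOCK"},
    {"partition": 7, "policy_decision": "ESCALATE"},
]

per_partition, per_decision, missing = summarize(sample)
print(dict(per_partition))
print(sorted(missing))
```

In this illustrative sample two decisions are absent, so the coverage check would fail; the run described here reported all six.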

Evidence item	Observed value
Test name	`streamkernel_agent_tool_audit_5m`
Run ID	`run-agent-tool-audit-01`
Status	COMPLETED
Actual duration	5.21 minutes
Kafka topic	`streamkernel-agent-tool-audit-governed`
Total records written	25,488
Kafka partitions	12
Policy decision coverage	ALLOW, AUDIT_ONLY, ALERT, RESTRICT, BLOCK, ESCALATE
Metrics snapshot	Prometheus captured on port 8080
Runtime artifact	CPU all-in-one JAR

Figure 2 — Evidence trail: before, run, after

Policy Decision Coverage

The post-run evidence confirmed all six policy outcomes. This is the key buyer-facing point: the pipeline does not merely log agent actions; it classifies and routes them according to policy and risk.

Decision	Representative meaning	Route / outcome
ALLOW	Approved operational action.	Operational route.
AUDIT_ONLY	Low-risk read-only action retained for audit and analytics.	`analytics_sink` and `audit_sink`.
ALERT	Sensitive after-hours access requiring alerting.	`alert_topic` plus audit route.
RESTRICT	Privileged write action permitted only with restrictions.	`restricted_operational_sink` plus audit route.
BLOCK	Disallowed or destructive tool action.	Blocked route or DLQ-style path.
ESCALATE	High-impact or low-confidence action requiring review.	`review_queue` plus audit route.

Engineering takeaway: the decision is carried both in the JSON payload and in Kafka headers, allowing downstream consumers, SIEM pipelines, review queues, and analytics sinks to route without reparsing the entire event body.
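
To illustrate header-based routing, the sketch below picks a destination from Kafka-style headers alone and never touches the payload bytes. It assumes headers arrive as (name, bytes) pairs, which is how Kafka clients commonly expose them. Falling back to `dlq` when the route header is absent is a choice made for this sketch, not documented StreamKernel behavior.

```python
def route_from_headers(headers):
    """Return (route, decision) using only header values, not the JSON body."""
    decoded = {name: value.decode("utf-8") for name, value in headers}
    route = decoded.get("agent.route", "dlq")  # fallback is an assumption
    decision = decoded.get("agent.policy.decision")
    return route, decision


headers = [
    ("agent.policy.decision", b"ALERT"),
    ("agent.route", b"alert_topic"),
    ("agent.risk.level", b"medium"),
]
payload = b'{"large": "body that is never parsed here"}'

print(route_from_headers(headers))
```

A SIEM forwarder or review-queue consumer could filter on these two values and skip deserialization for records it does not handle.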

Governed Output Contract

Each emitted record carries a normalized agent event, policy decision, risk metadata, audit record, route labels, and StreamKernel provenance. The post-run script verified required JSON fields and Kafka headers.

Payload contract (JSON): event_id, agent_id, session_id, user_id, tool, action, resource_type, policy_decision, risk_level, risk_score, reason_codes, risk_reasons, audit_record, normalized_context, target_routes.

Kafka headers: agent.policy.decision, agent.route, agent.risk.level, agent.risk.score, agent.reason_codes, agent.policy.version, agent.model.version, agent.tool, agent.action, agent.audit.required, and streamkernel.provenance.*.

{
  "event_type": "agent_tool_call_audit",
  "tool": "retrieve_account_activity",
  "action": "read",
  "policy_decision": "ALERT",
  "route": "alert_topic",
  "risk_level": "medium",
  "risk_score": 0.63,
  "reason_codes": ["AFTER_HOURS_SENSITIVE_ACTION"],
  "audit_required": true,
  "safe_synthetic_demo": true
}
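
A contract check of the kind the post-run script performs might look like the following. This is a minimal sketch, not the actual script: the field and header names come from the contract above, while the function names are hypothetical. The record excerpt shown above appears to be abbreviated, so the validator is expected to list the contract fields it omits.

```python
import json

REQUIRED_FIELDS = [
    "event_id", "agent_id", "session_id", "user_id", "tool", "action",
    "resource_type", "policy_decision", "risk_level", "risk_score",
    "reason_codes", "risk_reasons", "audit_record", "normalized_context",
    "target_routes",
]
REQUIRED_HEADERS = [
    "agent.policy.decision", "agent.route", "agent.risk.level",
    "agent.risk.score", "agent.reason_codes", "agent.policy.version",
    "agent.model.version", "agent.tool", "agent.action", "agent.audit.required",
]
PROVENANCE_PREFIX = "streamkernel.provenance."


def missing_fields(record: dict) -> list:
    return sorted(f for f in REQUIRED_FIELDS if f not in record)


def missing_headers(headers: dict) -> list:
    missing = sorted(h for h in REQUIRED_HEADERS if h not in headers)
    # At least one provenance header is expected under the shared prefix.
    if not any(name.startswith(PROVENANCE_PREFIX) for name in headers):
        missing.append(PROVENANCE_PREFIX + "*")
    return missing


excerpt = json.loads("""
{
  "event_type": "agent_tool_call_audit",
  "tool": "retrieve_account_activity",
  "action": "read",
  "policy_decision": "ALERT",
  "route": "alert_topic",
  "risk_level": "medium",
  "risk_score": 0.63,
  "reason_codes": ["AFTER_HOURS_SENSITIVE_ACTION"],
  "audit_required": true,
  "safe_synthetic_demo": true
}
""")

print(missing_fields(excerpt))
```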

Performance Envelope

The run processed and emitted 25,488 records with in-process ONNX embedding, deterministic agent-tool governance, Kafka delivery, and provenance stamping. The observed pipeline total maps to approximately 85 records/sec over the 300-second benchmark window, or approximately 81.5 records/sec when measured against the full 5.21-minute process window including lifecycle overhead.
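
The two throughput figures follow directly from the record count and the two time windows. The arithmetic is reproduced here so readers can verify it; only numbers stated in this paper are used.

```python
records = 25_488
benchmark_window_s = 300          # 300-second benchmark window
process_window_s = 5.21 * 60      # full 5.21-minute process window

benchmark_eps = records / benchmark_window_s
process_eps = records / process_window_s

print(f"benchmark window: {benchmark_eps:.2f} records/sec")
print(f"process window:   {process_eps:.2f} records/sec")
```

The difference between the two figures reflects the roughly 12.6 seconds of startup and shutdown time outside the benchmark window.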

Headline metric	Observed value
Benchmark-window throughput	~85 records/sec
Embedding batches	1,629
Final batch fill metric	100%
Max observed GC pause	36.3 ms

Metric	Observed value
Pipeline in / processed / out	25,488 / 25,488 / 25,488
Dropped / DLQ / denied counters	0 / 0 / 0
Kafka sent OK	25,488
Embedding records	25,488
Embedding batches	1,629
Average speedometer EPS	84.8 records/sec across 60 five-second windows
Observed speedometer range	38.4 to 163.2 records/sec
Kafka request latency average	~11.38 ms
Kafka queue time average	~12.59 ms
G1 heap after GC	~4.1% long-lived heap usage

Interpretation: this run is evidence of governed agent-record processing with local CPU inference. It should not be presented as a maximum throughput benchmark; the profile intentionally performs ONNX embedding and governance enrichment per event.
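
The counters in the table can be cross-checked against each other. The sketch below uses only the reported values. The derived average batch size is an inference from the totals, not a metric emitted by the run, and it is distinct from the final batch fill metric reported above.

```python
pipeline_in = processed = pipeline_out = 25_488
kafka_sent_ok = 25_488
dropped, dlq, denied = 0, 0, 0
embedding_records, embedding_batches = 25_488, 1_629
configured_batch_size = 16

# Every record that entered was processed, embedded, emitted, and acknowledged.
lossless = (
    pipeline_in == processed == pipeline_out == kafka_sent_ok == embedding_records
    and dropped + dlq + denied == 0
)

avg_batch = embedding_records / embedding_batches   # records per embedding batch
avg_fill = avg_batch / configured_batch_size        # share of the configured size

# 60 five-second windows at the average speedometer rate.
speedometer_total = round(84.8 * 5 * 60)

print(lossless, round(avg_batch, 2), f"{avg_fill:.1%}", speedometer_total)
```

The speedometer-derived total lands within 48 records of the written count, which is consistent with rounding in the reported average rate.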

Security and Deployment Posture

The run is intentionally safe as a synthetic local demo, but not hardened as a production security profile. It uses PERMIT_ALL, PLAINTEXT Kafka, and Prometheus bound broadly without auth/TLS, which the runtime warns about. Those settings are acceptable for a local evidence run and should be replaced for enterprise demonstrations.

✓ Use synthetic agent tool-call events; no customer logs or real agent data required.

✓ Preserve policy, model, transform, config, source, sink, and run provenance.

✓ Replace PERMIT_ALL with OPA or customer SecurityProvider for production profiles.

✓ Use TLS/SASL/mTLS or secured Kafka profile instead of PLAINTEXT Kafka.

✓ Protect Prometheus with loopback bind, auth, TLS, or network policy.

✓ Route physical fan-out to audit, analytics, alert, review, operational, and DLQ sinks in the next increment.
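
The hardening items in this checklist lend themselves to an automated pre-flight check. The following is a sketch under loud assumptions: `security.type` is the key used in this run, but `sink.kafka.security.protocol` and `metrics.prometheus.bind` are hypothetical key names invented for illustration, and treating a missing key as insecure is a choice made for this sketch.

```python
LOOPBACK = ("127.0.0.1", "localhost", "::1")


def lint_security(props: dict) -> list:
    """Return human-readable findings for settings that suit only local runs."""
    findings = []
    if props.get("security.type") == "PERMIT_ALL":
        findings.append("security.type=PERMIT_ALL: use OPA or a customer SecurityProvider")
    # Hypothetical key name for the Kafka transport setting.
    if props.get("sink.kafka.security.protocol", "PLAINTEXT") == "PLAINTEXT":
        findings.append("Kafka is PLAINTEXT: use TLS/SASL/mTLS")
    # Hypothetical key name for the Prometheus bind address.
    if props.get("metrics.prometheus.bind", "0.0.0.0") not in LOOPBACK:
        findings.append("Prometheus is bound broadly: use loopback, auth, TLS, or network policy")
    return findings


local_profile = {"security.type": "PERMIT_ALL"}
for finding in lint_security(local_profile):
    print("WARN", finding)
```

A check like this could gate promotion of a profile from local evidence runs to shared environments.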

Why It Resonates With Architects and Engineers

The architecture is not trying to replace the agent stack. It gives enterprise teams a consistent event boundary for governing tool calls across agent frameworks. That distinction matters because buyers can adopt it without rewriting their planners, chains, tools, or memory systems.

Audience	Value
AI platform teams	Add a reusable governance boundary without rewriting agent apps.
Security teams	Inspect blocked, alerted, restricted, and escalated tool calls in Kafka.
Compliance teams	Receive an audit record for every governed tool decision.
Enterprise architects	Externalize policy and provenance from individual agent frameworks.
Risk teams	Route high-impact or low-confidence actions to review before execution.

Current Limits

The current runtime writes a single governed evidence topic. Records carry route labels such as operational_sink, analytics_sink, alert_topic, review_queue, restricted_operational_sink, and dlq. Physical conditional fan-out to separate operational, audit, alert, review, analytics, and DLQ sinks is the next demo increment.
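
Until physical fan-out exists, a downstream consumer can approximate it by grouping records on their route labels. The sketch below is illustrative only and does not describe the planned StreamKernel increment. It assumes each payload carries `event_id` and a `target_routes` list, as named in the payload contract, and the sample records are made up.

```python
from collections import defaultdict


def fan_out(records):
    """Group event ids by route label; a record may land in several sinks."""
    sinks = defaultdict(list)
    for record in records:
        for route in record.get("target_routes", []):
            sinks[route].append(record["event_id"])
    return dict(sinks)


# Made-up records for illustration.
records = [
    {"event_id": "e1", "target_routes": ["alert_topic", "audit_sink"]},
    {"event_id": "e2", "target_routes": ["analytics_sink", "audit_sink"]},
    {"event_id": "e3", "target_routes": ["review_queue", "audit_sink"]},
]

for route, event_ids in sorted(fan_out(records).items()):
    print(route, event_ids)
```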

The governance rules in this public run are deterministic OPA-style rules. That makes the demo reproducible and safe, but production deployments should connect live OPA/Rego policy, customer-owned risk models, enterprise identity, approval workflows, and SIEM routing as needed.
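
To make "deterministic OPA-style rules" concrete, here is a small first-match rule function covering the six decisions. It is a hypothetical sketch, not the rule set used in the run: the input fields (`resource_sensitive`, `after_hours`, `privileged`, `high_impact`, `confidence`), the thresholds, the rule order, and every reason code except AFTER_HOURS_SENSITIVE_ACTION are assumptions.

```python
def decide(event: dict) -> tuple:
    """Return (decision, reason_codes); the first matching rule wins."""
    action = event.get("action")
    if action in {"delete", "drop"}:
        return "BLOCK", ["DESTRUCTIVE_ACTION"]
    if event.get("confidence", 1.0) < 0.5 or event.get("high_impact", False):
        return "ESCALATE", ["REVIEW_REQUIRED"]
    if event.get("resource_sensitive", False) and event.get("after_hours", False):
        return "ALERT", ["AFTER_HOURS_SENSITIVE_ACTION"]
    if action == "write" and event.get("privileged", False):
        return "RESTRICT", ["PRIVILEGED_WRITE"]
    if action == "read":
        return "AUDIT_ONLY", ["LOW_RISK_READ"]
    return "ALLOW", []


print(decide({"action": "read", "resource_sensitive": True, "after_hours": True}))
```

Because the function has no randomness or external state, the same input always yields the same decision, which is the property that makes a public demo reproducible.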

Careful claim boundary: this run proves a governed event contract and evidence topic. It does not prove production multi-tenant isolation, remote security hardening, physical multi-sink fan-out, or customer-specific policy correctness.

Conclusion

The governed agent tool-audit run shows StreamKernel operating as a policy-aware event boundary for agentic systems. The pipeline normalized synthetic agent tool calls, ran in-process ONNX embedding, enriched events with deterministic governance and risk metadata, stamped provenance, and wrote 25,488 governed records to Kafka.

For architects and engineers, the important result is the contract rather than the record count: every tool action becomes a durable event with a policy decision, risk label, reason codes, route, audit record, model version, policy version, and provenance headers. That is the missing operational layer many agent stacks still leave to each application team to invent.

Final takeaway: StreamKernel makes agent behavior governable at the event boundary, without forcing enterprises to standardize on one agent framework.