Heron  logo

Heron

Wireshark for AI Agents: passive eBPF observability

2026-06-25

Product Introduction

  1. Definition: Heron is a passive network analyzer and agent observability platform built for monitoring and reconstructing the behavior of AI agents in production. It operates as a standalone, off-path Rust application that leverages eBPF for deep packet inspection of TLS-encrypted LLM traffic.
  2. Core Value Proposition: Heron provides zero-instrumentation, agent-agnostic observability by capturing and analyzing LLM API calls directly from the network wire. It exists to give developers and operators a truthful, reconstructed narrative of what their AI agents, tools, and multi-step workflows are actually doing in production, without inserting SDKs or proxies into the critical request path.

Main Features

  1. Passive Network Capture & eBPF TLS Inspection: Heron uses libpcap for live network interface capture or .pcap file replay and employs eBPF SSL uprobes to decrypt and read TLS-encrypted traffic directly on the host. This allows it to see plaintext HTTP and SSE streams (e.g., /v1/messages) without client cooperation or breaking encryption, identifying the specific process that initiated each LLM call.
  2. Agent Turn Reconstruction Pipeline: The core pipeline processes captured bytes through a sequence: packet capture → HTTP/SSE parsing → wire-API detection (for OpenAI, Anthropic, Azure, Gemini) → semantic extraction of tool calls, results, and planner steps → agent-turn assembly. This stitches multi-call interactions (planner → tool → planner → tool) into a single, addressable session turn, automatically folding multi-leg proxy hops.
  3. Agent-Agnostic with Named Turn Profiles: Heron is designed to be agent-framework agnostic, reconstructing interactions from Claude Code, OpenAI Codex, Hermes, OpenClaw, or any custom agent using standard LLM APIs. It includes named profiles for known agents to sharpen stitching accuracy, while falling back to a generic profile for any other traffic, making it universally applicable.
  4. Structured Metrics & Analytics Console: It aggregates eight key metrics—TTFT (Time to First Token), E2E Latency, TPOT (Time Per Output Token), Call Rate, Token Throughput (TOK/s), Active Calls, Error Rate, and Cache Hit Ratio—in sliding windows, per model and route. Data is stored in DuckDB (default) or optional PostgreSQL/TimescaleDB/ClickHouse and served through a React console on localhost:3000.
  5. Training Data Export: Heron can export supervised fine-tuning (SFT) trajectory data, converting reconstructed turns and sessions into messages JSONL format with fully rehydrated tool call and result arguments, enabling one-click export of production interaction data for model improvement.

Problems Solved

  1. Pain Point: Invisible agent failures and opaque production behavior. Agent code often appears correct in development but fails in production due to tool call stalls, planner loops, silent model substitutions, or retry storms. Traditional logs show "200 OK" but hide the true performance bottleneck, latency source, or failure root cause within multi-step LLM workflows.
  2. Target Audience: AI/ML Engineers, Platform/SRE Teams, and Developers building or operating autonomous agents and complex LLM pipelines. Specifically, those using frameworks like LangChain, AutoGen, or building custom agents on OpenAI/Anthropic APIs who need production debugging and performance analytics without modifying their agent codebase.
  3. Use Cases: Debugging production AI agent failures where logs are insufficient; optimizing LLM application performance by pinpointing latency in tool calls or planner loops; validating agent behavior to ensure it adheres to expected workflows; gathering real-world interaction data for SFT dataset creation; and auditing which models or API routes are being used across a distributed system.

Unique Advantages

  1. Differentiation: Unlike SDK instrumentation, Heron requires zero client code changes or cooperation. Compared to a reverse proxy (e.g., LiteLLM), it is never in the request path, so its failure cannot cause an outage. Versus server-side OpenTelemetry, it sees full request/response bodies and reconstructs cross-service agent narratives without requiring the server to emit complex, custom spans.
  2. Key Innovation: The specific technological innovation is the agent-agnostic turn assembly from passive wire data. By combining eBPF for TLS decryption, a lock-free flow dispatcher for high-performance parsing, and a semantic stitching engine that understands LLM API patterns and common agent workflows, Heron can automatically reconstruct a high-level narrative of agent activity from low-level network packets alone.

Frequently Asked Questions (FAQ)

  1. How does Heron work without an SDK or proxy? Heron uses eBPF SSL uprobes to intercept and decrypt TLS traffic directly on the Linux host where it's installed, combined with libpcap for packet capture. It then parses the decrypted HTTP/SSE traffic, detects LLM API wire formats, and reconstructs agent workflows—all without being inserted into the request path or requiring any changes to client code.
  2. Which LLM providers and agent frameworks does Heron support? Heron supports traffic from OpenAI, Anthropic, Azure OpenAI, Gemini, vLLM, and Ollama (OpenAI-compatible). It is agent-framework agnostic but has optimized named profiles for Claude Code, OpenAI Codex, and Hermes to improve workflow stitching; all other agent traffic is handled by a robust generic profile.
  3. What are the deployment prerequisites and performance implications? Heron requires Linux (with eBPF support) or macOS, and must be installed where traffic is already TLS-terminated (e.g., behind a load balancer, on an inference host). It has a minimal footprint as a passive, off-path observer; a failure in Heron does not affect the LLM calls it monitors, and its Rust/Tokio core is designed for high-throughput, low-latency packet processing.
  4. What metrics does Heron provide for LLM monitoring? It provides eight core metrics: Time to First Token (TTFT), End-to-End (E2E) Latency, Time Per Output Token (TPOT), Call Rate, Token Throughput (TOK/s), Active Calls, Call Error Rate, and Cache Hit Ratio. These are aggregated in sliding windows and broken down per model and route, offering actionable insights for operations, development, and business teams.
  5. Can Heron be used to create training data for fine-tuning models? Yes. A key capability is SFT trajectory export, which converts Heron's reconstructed agent turns and sessions into messages JSONL format. This includes full tool call arguments and results, rehydrated from the captured traffic, enabling one-click batch export of high-quality, real-world interaction data for supervised fine-tuning datasets.

Submit to 240+ Directories with 1-Click

Maximize your product's SEO and drive massive traffic by automatically submitting it to over 240 curated startup directories using DirSubmit.

Subscribe to Our Newsletter

Get weekly curated tool recommendations and stay updated with the latest product news