Product Introduction
- LunaRoute is a high-performance local proxy designed for AI coding assistants such as Claude Code, OpenAI Codex CLI, and OpenCode, acting as an intermediary between developers' tools and LLM providers. It operates as a zero-overhead passthrough proxy while providing full visibility into all AI interactions through session recording and debugging capabilities. The solution supports multiple API dialects including Anthropic and OpenAI formats simultaneously without requiring normalization.
- The core value lies in enabling complete observability of AI-assisted coding workflows while maintaining native API compatibility and sub-millisecond latency. It provides security teams with automatic PII redaction capabilities and offers developers detailed analytics on token usage, tool performance, and session history through dual storage formats (SQLite for metadata and JSONL for full logs).
Main Features
- Zero-overhead passthrough proxy mode adds only 0.1-0.2 ms of latency while preserving 100% API fidelity for both Anthropic and OpenAI formats through zero-copy routing. Dual-dialect configuration lets multiple AI tools operate simultaneously, handling the /v1/messages (Anthropic) and /v1/chat/completions (OpenAI) endpoints concurrently without protocol translation (see the request sketch after this list).
- Comprehensive session recording captures full request/response cycles with asynchronous writes to JSONL files (about 10 KB per request) and a SQLite database (1-2 KB per session). Automatic retention policies include 30-day age-based cleanup and a 1 GB size limit, with Zstd compression applied after 7 days to reduce storage (a sketch for reading these logs back follows this list).
- Built-in PII redaction engine detects 15+ sensitive data patterns, including emails, phone numbers, and credit card numbers, using regex matching plus Luhn validation. Four redaction modes are available: masking (e.g. [EMAIL]), complete removal, HMAC-based tokenization for reversible redaction, and partial disclosure (last 4 digits); all are applied before anything is written to disk, which supports compliance requirements (an illustrative sketch appears below).
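To make the dual-dialect behaviour concrete, the sketch below sends one OpenAI-format and one Anthropic-format request through the same local proxy. The port (8081, the default API port quoted in the FAQ), the model names, and the use of environment variables for credentials are assumptions for this hand-rolled example; real clients such as Claude Code or Codex CLI attach these headers themselves.

```python
import json
import os
import urllib.request

PROXY = "http://localhost:8081"  # assumed: the default API port from the FAQ

def post_json(path: str, payload: dict, headers: dict) -> dict:
    """POST a JSON body through the proxy and return the decoded response."""
    req = urllib.request.Request(
        PROXY + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# OpenAI dialect: /v1/chat/completions with Bearer-token auth.
openai_resp = post_json(
    "/v1/chat/completions",
    {"model": "gpt-4o-mini",  # placeholder model name
     "messages": [{"role": "user", "content": "ping"}]},
    {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
)

# Anthropic dialect: /v1/messages with x-api-key auth, handled by the same proxy.
anthropic_resp = post_json(
    "/v1/messages",
    {"model": "claude-sonnet-4-20250514",  # placeholder model name
     "max_tokens": 64,
     "messages": [{"role": "user", "content": "ping"}]},
    {"x-api-key": os.environ["ANTHROPIC_API_KEY"],
     "anthropic-version": "2023-06-01"},
)

print(openai_resp.get("usage"), anthropic_resp.get("usage"))
```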
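For the session recording feature, here is a minimal sketch that reads back one day's JSONL logs and prints per-request token totals. The directory layout follows the ~/.lunaroute/sessions/YYYY-MM-DD/ convention mentioned under Unique Advantages; the per-record field names (usage, input_tokens, output_tokens) are assumptions about the log schema, not documented names.

```python
import json
from datetime import date
from pathlib import Path

# Assumed layout: one directory per day, one JSONL file per session.
day_dir = Path.home() / ".lunaroute" / "sessions" / date.today().isoformat()

for session_file in sorted(day_dir.glob("*.jsonl")):
    total_in = total_out = 0
    with session_file.open() as fh:
        for line in fh:
            record = json.loads(line)
            # Field names below are assumptions about the record schema.
            usage = record.get("usage", {})
            total_in += usage.get("input_tokens", 0)
            total_out += usage.get("output_tokens", 0)
    print(f"{session_file.stem}: {total_in} in / {total_out} out tokens")
```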
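And to illustrate the regex-plus-Luhn redaction technique in masking mode, a minimal sketch follows. This is not the product's SIMD-accelerated Rust implementation, and the two patterns cover only two of the 15+ detectors mentioned above.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_ok(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(d) for d in re.sub(r"\D", "", number)][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def mask(text: str) -> str:
    """Masking mode: replace detected values with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Only mask digit runs that actually pass the Luhn check (likely card numbers).
    return CARD_RE.sub(lambda m: "[CARD]" if luhn_ok(m.group()) else m.group(), text)

print(mask("Contact jane@example.com, card 4111 1111 1111 1111."))
# -> Contact [EMAIL], card [CARD].
```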
Problems Solved
- Addresses the lack of visibility in AI-assisted coding workflows by providing granular metrics on token consumption (input/output/thinking tokens), tool call latency, and provider response times. Developers gain insights into cost drivers through detailed session breakdowns showing token distribution across multiple requests within a single coding session.
- Targets development teams working with multiple AI coding assistants who require unified monitoring across Claude Code, Codex CLI, and custom implementations. The solution particularly benefits regulated industries needing audit trails, with security teams leveraging its pre-storage redaction capabilities to meet data protection requirements.
- Solves infrastructure complexity in mixed AI environments by serving as a single proxy endpoint for diverse tools, eliminating the need for separate monitoring solutions. Common use cases include debugging expensive API sessions (>10k tokens, as in the query sketch below), identifying slow tool integrations (e.g. Bash command bottlenecks), and reproducing past solutions through session replay.
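A sketch of the "find expensive sessions" use case against the SQLite metadata store. The database path and the sessions table/column names are assumptions made for illustration; the actual schema may differ.

```python
import sqlite3
from pathlib import Path

# Assumed location and schema of the metadata database.
db_path = Path.home() / ".lunaroute" / "sessions.db"
conn = sqlite3.connect(db_path)

# Flag sessions whose combined token count exceeds 10k.
rows = conn.execute(
    """
    SELECT session_id, model, input_tokens + output_tokens AS total_tokens
    FROM sessions
    WHERE input_tokens + output_tokens > 10000
    ORDER BY total_tokens DESC
    """
).fetchall()

for session_id, model, total in rows:
    print(f"{session_id}  {model:<28} {total:>8} tokens")
```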
Unique Advantages
- Differentiates from generic API proxies through native understanding of AI coding workflows, including automatic parsing of Codex CLI's ~/.codex/auth.json credentials and pass-through of Claude Code's x-api-key header (shown in the request sketch after the feature list above). This specialization enables true zero-configuration deployment: client tools supply their own authentication credentials.
- Implements a dual storage architecture combining SQLite for queryable metadata (fast aggregation of token costs by model) and JSONL for full conversation history. All records for a session are grouped under a single UUIDv4 identifier, while daily directory partitioning (~/.lunaroute/sessions/YYYY-MM-DD/) keeps files organized by time (see the aggregation sketch after this list).
- Provides a competitive edge through Rust-based performance optimizations, including connection pooling (32 idle connections per host), async I/O batching (100 ms write intervals), and SIMD-accelerated PII detection. The proxy stays under 1% CPU utilization at 100 RPS and exports Prometheus metrics for integration with existing monitoring stacks.
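As a companion to the query above, this sketch shows the kind of "fast aggregation of token costs by model" the SQLite side is meant to enable, using the same assumed database path and schema. Multiply the counts by your provider's per-token rates to turn them into costs; no prices are hard-coded here.

```python
import sqlite3
from pathlib import Path

# Same assumed metadata layout as the previous sketch.
conn = sqlite3.connect(Path.home() / ".lunaroute" / "sessions.db")

# Aggregate token usage per model.
for model, sessions, tok_in, tok_out in conn.execute(
    """
    SELECT model,
           COUNT(*)           AS sessions,
           SUM(input_tokens)  AS tok_in,
           SUM(output_tokens) AS tok_out
    FROM sessions
    GROUP BY model
    ORDER BY tok_in + tok_out DESC
    """
):
    print(f"{model:<28} {sessions:>5} sessions  {tok_in:>10} in  {tok_out:>10} out")
```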
Frequently Asked Questions (FAQ)
- How does LunaRoute handle API keys for different providers? LunaRoute requires no preconfigured API keys; it relies on client-supplied authentication, via Codex CLI's local auth.json file for OpenAI and the x-api-key header for Anthropic. For custom deployments, environment variables such as ${GOOGLE_API_KEY} can be injected using ${} syntax in configuration files (a sketch of this substitution pattern follows the FAQ).
- What storage requirements exist for session recording? The system defaults to roughly 1 MB per 100 sessions (SQLite) and roughly 1 MB per 100 requests (JSONL), with Zstd compression reducing JSONL size by about 10x after 7 days. Administrators configure retention through the max_age_days (default 30) and max_size_mb (default 1024) parameters (see the back-of-envelope estimate after the FAQ).
- Can LunaRoute operate alongside existing proxies? Yes. It runs as a separate service on configurable ports (default 8081 for the API, 8082 for the UI) and can operate in parallel with other proxies. It supports HTTP/1.1 and SSE (Server-Sent Events) for streaming responses without blocking other network services (a streaming sketch appears below).
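For the ${GOOGLE_API_KEY}-style injection mentioned in the first question, the sketch below shows the general pattern of expanding ${VAR} references in a config file from the environment. It illustrates the pattern only; LunaRoute's own config parser is not shown here, and its exact semantics (escaping, handling of missing variables) may differ.

```python
import os
import re

def expand_env(text: str) -> str:
    """Replace ${VAR} references with values from the environment."""
    return re.sub(
        r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),  # leave unknown vars as-is
        text,
    )

raw_config = 'api_key = "${GOOGLE_API_KEY}"\nport = 8081\n'
print(expand_env(raw_config))
```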
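A back-of-envelope storage estimate using the defaults quoted above; the traffic volume is an arbitrary example, not a benchmark.

```python
# Rough storage estimate from the documented defaults (example traffic only).
sessions_per_day = 50           # hypothetical usage
requests_per_session = 20       # hypothetical usage

jsonl_per_request_kb = 10       # ~10 KB per request (pre-compression)
sqlite_per_100_sessions_mb = 1  # ~1 MB per 100 sessions
zstd_ratio = 10                 # ~10x JSONL shrink after 7 days
retention_days = 30             # max_age_days default

# The first 7 days stay uncompressed; the remaining days are Zstd-compressed.
jsonl_day_mb = sessions_per_day * requests_per_session * jsonl_per_request_kb / 1024
jsonl_mb = 7 * jsonl_day_mb + (retention_days - 7) * jsonl_day_mb / zstd_ratio
sqlite_mb = retention_days * sessions_per_day / 100 * sqlite_per_100_sessions_mb

print(f"JSONL ~= {jsonl_mb:.0f} MB, SQLite ~= {sqlite_mb:.1f} MB over {retention_days} days")
```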
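Finally, a sketch of consuming a streamed (SSE) response through the proxy's OpenAI-dialect endpoint. The port, model name, and environment-variable credential are the same assumptions as in the earlier request sketch; real clients parse the event stream themselves.

```python
import json
import os
import urllib.request

req = urllib.request.Request(
    "http://localhost:8081/v1/chat/completions",   # assumed default API port
    data=json.dumps({
        "model": "gpt-4o-mini",                    # placeholder model name
        "stream": True,                            # ask the provider for an SSE stream
        "messages": [{"role": "user", "content": "Say hello"}],
    }).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    },
    method="POST",
)

# The proxy relays the provider's Server-Sent Events line by line.
with urllib.request.urlopen(req) as resp:
    for raw in resp:
        line = raw.decode("utf-8").strip()
        if line.startswith("data:") and line != "data: [DONE]":
            chunk = json.loads(line[len("data:"):])
            delta = chunk["choices"][0]["delta"].get("content", "")
            print(delta, end="", flush=True)
print()
```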
