Product Introduction
- Definition: LineageLens is a self-hosted AI code provenance and governance platform. Technically, it is a local HTTP proxy (port 8788) combined with a VS Code extension that captures, correlates, and analyzes every insertion of AI-generated code into a developer's codebase.
- Core Value Proposition: It exists to solve the critical visibility gap in AI-assisted development by providing a complete, auditable lineage for every AI-generated code block. It answers the essential questions: "Which prompt wrote this code?", "Which AI model generated it?", and "What potential risks does it contain?"—all while ensuring data never leaves the user's infrastructure.
Main Features
- AI Code Insertion Detection & Capture: The VS Code extension monitors for insertions of four or more lines of code from any integrated AI tool. It works by correlating code changes with LLM traffic intercepted by the local proxy within a ±15-second window using timing and content similarity algorithms.
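The timing-and-content correlation above can be sketched in a few lines. This is an illustrative approximation only: the field names, the 15-second window default, and the 0.8 similarity threshold are assumptions, not LineageLens's internal algorithm.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical sketch of the timing-and-content correlation heuristic.
# Field names and thresholds are illustrative, not the actual engine.

@dataclass
class LlmResponse:
    timestamp: float      # epoch seconds when the proxy logged the completion
    completion: str       # code returned by the model

@dataclass
class Insertion:
    timestamp: float      # epoch seconds when the editor saw the insertion
    text: str             # code that landed in the buffer

def correlate(insertion: Insertion, responses: list[LlmResponse],
              window: float = 15.0, min_similarity: float = 0.8):
    """Return the best-matching LLM response within +/- `window` seconds."""
    best, best_score = None, min_similarity
    for resp in responses:
        if abs(insertion.timestamp - resp.timestamp) > window:
            continue  # outside the correlation window
        score = SequenceMatcher(None, insertion.text, resp.completion).ratio()
        if score > best_score:
            best, best_score = resp, score
    return best
```

A response outside the window is never considered, even if its content matches exactly, which is what keeps stale completions from being misattributed.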
- Transparent Local Proxy: The core of LineageLens is a lightweight proxy server running on localhost:8788. Developers configure their AI tools (Cursor, Copilot Chat, Claude Code, etc.) to use this proxy as their API endpoint. It silently intercepts, logs, and forwards all LLM request/response traffic without impacting latency.
- Multi-Tool Adapter & Correlation Engine: The system includes 11 pre-built adapters for popular AI coding agents (Cursor, GitHub Copilot, Claude Code, Aider, Continue.dev, etc.). These adapters parse tool-specific API formats to accurately tag provenance records with the correct source tool and AI model (e.g., GPT-4, Claude 3.5 Sonnet).
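An adapter's job can be pictured as a small parser keyed on the request path. The sketch below is a conceptual illustration: the two request shapes follow the publicly known OpenAI- and Anthropic-style wire formats, but the registry structure and tool labels are assumptions, not LineageLens's actual adapter code.

```python
import json

# Hypothetical adapter sketch: each adapter knows how to pull the model
# and prompt out of one tool family's request format.

def openai_style_adapter(body: dict) -> dict:
    """Parse an OpenAI-compatible chat request (used by several tools)."""
    return {"model": body["model"],
            "prompt": body["messages"][-1]["content"]}

def anthropic_style_adapter(body: dict) -> dict:
    """Parse an Anthropic-style request, where the system prompt
    is a separate top-level field."""
    prompt = body["messages"][-1]["content"]
    return {"model": body["model"],
            "prompt": body.get("system", "") + prompt}

# Registry mapping a request path prefix to (tool family, parser).
ADAPTERS = {
    "/v1/chat/completions": ("OpenAI-compatible", openai_style_adapter),
    "/v1/messages": ("Anthropic-compatible", anthropic_style_adapter),
}

def tag_request(path: str, raw_body: str) -> dict:
    """Produce a provenance tag for one intercepted request."""
    for prefix, (tool, parse) in ADAPTERS.items():
        if path.startswith(prefix):
            record = parse(json.loads(raw_body))
            record["tool"] = tool
            return record
    return {"tool": "unknown", "model": None, "prompt": None}
```

Dispatching on path prefix is what lets one proxy port serve many tools at once: each intercepted request self-identifies by the endpoint it targets.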
- Static Analysis Risk Scoring Engine: Every captured code insertion is automatically analyzed using Abstract Syntax Tree (AST) parsing and regex patterns. It flags security and quality issues such as hardcoded secrets, potential SQL injection vectors, weak cryptographic functions, and shell command execution, assigning a risk score (Low, Medium, High, Critical).
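The AST-plus-regex approach can be sketched with the standard library alone. The rule names, patterns, and scoring thresholds below are assumptions for illustration; the product's actual rule set is richer.

```python
import ast
import re

# Illustrative risk scanner: regex for hardcoded secrets, AST walk for
# dangerous calls. Patterns and score cutoffs are assumed, not actual.

SECRET_RE = re.compile(r"""(?i)(api[_-]?key|secret|password)\s*=\s*['"][^'"]+['"]""")
WEAK_HASHES = {"md5", "sha1"}

def scan(source: str) -> tuple[str, list[str]]:
    """Return (risk_level, findings) for a Python snippet."""
    findings = [f"hardcoded secret: {m.group(0)}"
                for m in SECRET_RE.finditer(source)]
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
            if name == "system":          # e.g. os.system(...) shell execution
                findings.append("shell command execution")
            elif name in WEAK_HASHES:     # e.g. hashlib.md5(...)
                findings.append(f"weak hash: {name}")
    if not findings:
        return "Low", findings
    level = ("Critical" if len(findings) >= 3 else
             "High" if len(findings) == 2 else "Medium")
    return level, findings
```

Regexes catch string-level issues (secrets) that the AST cannot distinguish from ordinary assignments, while the AST walk catches call patterns that regexes would match too loosely; combining both is what makes this style of scanner practical.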
- Self-Hosted Data Storage & Governance Dashboard: For teams (Plus/Max plans), LineageLens deploys as a Docker stack containing a FastAPI backend and a PostgreSQL database (with pgvector for semantic search). This provides a centralized dashboard for timeline views, risk charts, team management with JWT authentication, and CSV export for compliance audits.
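The CSV export mentioned above is straightforward to picture; here is a minimal stdlib sketch. The column names are assumed for illustration and are not LineageLens's actual export schema.

```python
import csv
import io

# Minimal compliance-export sketch; FIELDS is an assumed schema.
FIELDS = ["inserted_at", "file", "tool", "model", "risk", "prompt_hash"]

def export_csv(records: list[dict]) -> str:
    """Serialize provenance records to a CSV string for audit hand-off."""
    buf = io.StringIO()
    # extrasaction="ignore" drops any internal fields not in the schema.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Dropping non-schema fields at export time keeps internal bookkeeping out of the artifact handed to auditors.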
- Cross-Tool Lineage Graph (Max Plan): The enterprise-tier integrates a Neo4j graph database to create visual, queryable provenance graphs. This allows tracing the ancestry of code across different AI tools, development sessions, and multiple developers, which is critical for root-cause analysis in complex projects.
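Tracing ancestry through a provenance graph is, at its core, a reverse breadth-first walk. In the Max plan this lives in Neo4j; the sketch below uses a plain dict of child-to-parents edges so the traversal itself is visible. Node IDs and edge semantics are illustrative only.

```python
from collections import deque

# Conceptual sketch: a dict of child -> parents stands in for the
# Neo4j graph. IDs and edge meanings are illustrative.

def ancestry(graph: dict[str, list[str]], node: str) -> list[str]:
    """Breadth-first walk from a code block back through every
    ancestor (earlier prompts, sessions, or blocks it derives from)."""
    seen, order = {node}, []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order
```

Breadth-first order means the nearest causes of a defect surface first, which is the shape root-cause analysis usually wants.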
Problems Solved
- Pain Point - Security & Compliance Blind Spots: Organizations cannot see or audit the AI-generated code proliferating in their codebases, creating risks like shipped secrets, vulnerabilities, and compliance failures (SOC 2, GDPR). LineageLens provides the structured, append-only audit trail required for evidence.
- Pain Point - Tool Sprawl & Lack of Unified Signal: Development teams use multiple AI tools (Cursor, Copilot, CLI tools), leading to fragmented, untraceable code contributions. LineageLens unifies provenance data across all tools into a single system of record.
- Target Audience: Senior Engineers & Tech Leads who need to understand their codebase's composition; Engineering Managers responsible for team output quality and incident response; VP Engineering & Compliance Officers in regulated industries (Fintech, Healthtech) who must demonstrate governance over AI-assisted development.
- Use Cases: Post-Incident Root Cause Analysis to trace a bug back to the exact AI prompt and model that generated it. Compliance Audit Preparation to provide documented proof of AI code review and risk assessment processes. Team Productivity & Quality Analysis to correlate AI tool usage with defect rates and review cycles.
Unique Advantages
- Differentiation: Unlike basic git history or code review tools, LineageLens captures the prompt and model context, which is permanently lost in other workflows. Compared to cloud-based AI analytics platforms, its fully self-hosted architecture guarantees zero data exfiltration, addressing the enterprise security and data residency concerns that are paramount for regulated buyers.
- Key Innovation: Its correlation engine, which matches IDE code insertions with proxy-captured LLM traffic using a hybrid timing-and-content heuristic, is a novel technical approach to the provenance problem. Its extensible adapter system for 11+ AI tools also provides out-of-the-box compatibility that niche or single-tool solutions cannot match.
Frequently Asked Questions (FAQ)
- Does LineageLens send my code or prompts to the cloud? No. LineageLens is architected as a fully self-hosted platform. The free Base tier stores data locally as JSON files. The Plus and Max tiers are deployed to your own infrastructure (e.g., your VPC), with data stored in your own PostgreSQL and Neo4j databases. No code, prompts, or provenance records are sent to LineageLens servers.
- How does LineageLens work with Cursor and GitHub Copilot? For Cursor, you configure its custom AI provider setting to point to the local LineageLens proxy. For GitHub Copilot Chat in VS Code, you set the proxy endpoint in its configuration. LineageLens includes dedicated adapters that parse the specific API formats of these tools to correctly identify them and capture prompts/completions.
- What is the performance impact on my AI coding tools? The local proxy adds minimal latency (typically <10ms) as it runs on localhost. It functions as a pass-through, logging requests and responses asynchronously. Users report no perceptible slowdown in the response time of Copilot, Cursor, or Claude Code.
- Can I use LineageLens with multiple AI tools simultaneously? Yes. Once configured, the single proxy on port 8788 will capture traffic from all supported AI tools that are routed through it. The correlation engine and adapters will correctly tag each provenance record with its source tool (e.g., "Cursor," "Claude Code CLI").
- What happens to my provenance data if I stop using LineageLens? You retain full control and ownership of all your data. In the Base tier, data resides in local JSON files on your machine. In the Plus/Max tiers, data resides in your self-hosted databases. If you decommission the service, your historical provenance records remain in your possession.
