Circuit Tracer

Circuit Tracer is an open-source tool developed by Anthropic to visualize the internal computations of large language models (LLMs) as attribution graphs.
Its core value is transparency: it lets researchers trace and analyze the computational pathways that lead an LLM to a particular output.

Circuit Tracer generates attribution graphs that partially reveal the internal steps a model uses to produce specific outputs, supporting popular open-weights models like Gemma-2-2b and Llama-3.2-1b.
The tool integrates with Neuronpedia’s interactive frontend, allowing users to explore, annotate, and share attribution graphs through a visual interface.
Researchers can test hypotheses by modifying the values of features identified in an attribution graph and observing how the model's outputs change, which supports iterative analysis of model behavior.
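
The idea of tracing contributions and then intervening on a feature can be illustrated with a toy example. The sketch below is not the Circuit Tracer API: the two-layer model, its weights, and the helper names (hidden_features, attribution_edges, intervene) are all hypothetical and exist only to show what an attribution edge is and how an intervention checks it.

```python
# Toy illustration only: a hand-written two-layer model, NOT the circuit-tracer API.
# All weights and function names here are hypothetical.

def relu(v):
    return max(0.0, v)

# 3 inputs -> 2 hidden "features" -> 1 output logit.
W1 = [[1.0, -2.0, 0.5],
      [0.0, 1.0, 1.0]]
W2 = [2.0, -1.0]

def hidden_features(x):
    """Feature activations: ReLU of a linear map of the inputs."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]

def logit(h):
    """Output logit: linear readout of the feature activations."""
    return sum(w * hi for w, hi in zip(W2, h))

def attribution_edges(x):
    """Return direct-effect edges (source, target, weight) for one input."""
    h = hidden_features(x)
    edges = []
    for i, row in enumerate(W1):
        if h[i] > 0:  # an inactive feature passes nothing through the ReLU
            for j, (w, xj) in enumerate(zip(row, x)):
                edges.append((f"in{j}", f"feat{i}", w * xj))
        edges.append((f"feat{i}", "logit", W2[i] * h[i]))
    return edges

def intervene(x, feature_index, new_value):
    """Overwrite one feature's activation and recompute the logit."""
    h = hidden_features(x)
    h[feature_index] = new_value
    return logit(h)

x = [2.0, 0.5, 1.0]
baseline = logit(hidden_features(x))
edges = attribution_edges(x)
ablated = intervene(x, 0, 0.0)  # switch feature 0 off

for src, dst, weight in edges:
    print(f"{src} -> {dst}: {weight:+.2f}")
print("baseline logit:", baseline, "| after ablating feat0:", ablated)
```

In this toy case the readout is linear, so the drop in the logit after ablating feat0 equals the weight of the feat0 -> logit edge. In a real LLM the relationship is only approximate, which is why intervention experiments are used to check what a graph suggests.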

Circuit Tracer addresses the critical challenge of limited visibility into the internal reasoning processes of AI models, which hinders interpretability and safety evaluations.
The product primarily targets AI researchers, machine learning engineers, and transparency-focused organizations seeking to audit or improve LLM behavior.
Typical use cases include analyzing multi-step reasoning in models, studying multilingual representations, and identifying circuits responsible for specific output patterns.
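
Identifying a circuit usually means reducing a dense graph to the few edges that carry most of the effect. The following is a minimal sketch of one such pruning rule, under stated assumptions: the edge list, the node names, and the prune_edges helper are invented for illustration, and the rule shown is not necessarily the one Circuit Tracer applies.

```python
# Hypothetical example: prune a small attribution graph down to its strongest edges.
# Node names and weights are made up; this is not Circuit Tracer's pruning code.

def prune_edges(edges, keep_fraction=0.8):
    """For each target node, keep its strongest incoming edges until they
    cover keep_fraction of that node's total absolute attribution."""
    by_target = {}
    for src, dst, weight in edges:
        by_target.setdefault(dst, []).append((src, dst, weight))

    kept = []
    for incoming in by_target.values():
        incoming.sort(key=lambda e: abs(e[2]), reverse=True)
        total = sum(abs(w) for _, _, w in incoming)
        covered = 0.0
        for edge in incoming:
            if total == 0 or covered >= keep_fraction * total:
                break
            kept.append(edge)
            covered += abs(edge[2])
    return kept

edges = [
    ("token_a", "feature_x", 0.9),
    ("token_c", "feature_x", 0.05),
    ("token_b", "feature_y", 0.7),
    ("feature_x", "logit", 0.6),
    ("feature_y", "logit", 0.3),
    ("feature_z", "logit", 0.05),
]

circuit = prune_edges(edges, keep_fraction=0.8)
for src, dst, weight in circuit:
    print(f"{src} -> {dst} ({weight:+.2f})")
```

A higher keep_fraction retains more of the graph and a lower one gives a sparser, easier-to-read circuit, at the cost of ignoring more of the total attribution.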

Unlike proprietary interpretability tools, Circuit Tracer is fully open-source and supports community-driven extensions, enabling collaborative research on model internals.
The integration with Neuronpedia provides a unique interactive visualization layer not found in similar tools, combining graph exploration with annotation and sharing capabilities.
Other advantages include pre-analyzed attribution graphs for common model behaviors, use of transcoders trained through the GemmaScope project, and compatibility with Anthropic’s research ecosystem.

Which models are currently supported by Circuit Tracer?
Circuit Tracer supports Gemma-2-2b and Llama-3.2-1b, with attribution graph generation implemented through the open-source library and demo notebooks.

How can I interact with the attribution graphs?
Users can explore graphs interactively via Neuronpedia’s hosted frontend or modify feature values programmatically using the provided Python library and API integrations.

Can researchers contribute to improving Circuit Tracer?
The tool welcomes community contributions through GitHub issues and extensions, with demonstrated use cases including circuit modification experiments and transcoder optimizations.

Anthropic's open tools to see how AI thinks