
Agentmemory

Persistent memory for Codex, Hermes, OpenClaw, Claude, and more

2026-05-16

Product Introduction

  1. Definition: Agentmemory is a comprehensive memory runtime layer for AI-powered coding agents and large language model (LLM) applications. Technically, it is a self-contained, open-source system for capturing, compressing, storing, retrieving, and consolidating the contextual history of AI coding sessions.
  2. Core Value Proposition: It exists to solve the critical context window limitation in AI assistants like Claude Code and Hermes, enabling effectively unlimited memory and long-term context retention. Its primary value is drastically reducing token usage (up to 95% fewer input tokens) while keeping 100% of session history searchable, allowing roughly 200x more tool calls before hitting context limits.

Main Features

  1. Auto-Capture Hooks: Twelve automated hooks integrate with popular coding agents (Claude Code, Codex CLI, Hermes) to capture every event (PreToolUse, PostToolUse, SessionStart, Stop, and others) without manual instrumentation. This creates a continuous stream of compressed observations from every interaction.
  2. Triple-Stream Hybrid Retrieval: Implements a hybrid search system combining BM25 (lexical), vector (semantic), and knowledge graph (relational) retrieval. Results are reranked on-device, achieving a 95.2% retrieval recall rate (R@5) on the LongMemEval-S benchmark with a P50 latency under 20ms on a laptop.
  3. Automatic Consolidation Engine: An hourly background process compresses raw observations into semantic memories. It deduplicates entries, decays stale data based on retention scoring, and emits audit logs for every deletion, transforming noisy session data into a clean, actionable knowledge base.
  4. Zero-External-Database Architecture: The entire runtime operates as a single Node.js process. It does not require external databases like Redis, Kafka, Postgres, Qdrant, or Neo4j. All state is persisted on disk as JSON, simplifying deployment and ensuring data locality.
  5. Universal MCP & REST API Surface: Provides full memory functionality through 51 Model Context Protocol (MCP) tools and 121 REST endpoints (/agentmemory/*). This allows any MCP-compatible client (Claude Desktop, Cursor, etc.) or custom application to save, recall, and search memories via a standardized interface.
  6. Built-In Observability & UI: Includes first-class observability via OpenTelemetry (OTEL) for traces and logs. Ships with two UIs: a real-time memory viewer on port 3113 for streaming observations, session replay, and knowledge graph visualization, and an engine console on port 3114 for debugging functions, triggers, and spans.
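The document does not specify how the three retrieval streams are merged, but a common technique for fusing heterogeneous rankers is Reciprocal Rank Fusion (RRF). The sketch below is illustrative only: the memory ids, stream contents, and the `k` constant are assumptions, not Agentmemory internals.

```typescript
// Illustrative RRF fusion of three ranked result streams (lexical, semantic,
// graph). This is NOT Agentmemory's actual reranker, just one plausible scheme.
type Ranked = string[]; // memory ids in rank order, best first

function rrfFuse(streams: Ranked[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const stream of streams) {
    stream.forEach((id, rank) => {
      // Each stream contributes 1 / (k + rank); higher ranks contribute more.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const bm25 = ["m3", "m1", "m7"];   // lexical matches
const vector = ["m1", "m3", "m9"]; // semantic neighbours
const graph = ["m1", "m4"];        // entities linked in the knowledge graph

console.log(rrfFuse([bm25, vector, graph])); // "m1" wins: it ranks highly in all three
```

Because RRF only needs rank positions, not comparable scores, it is cheap enough to run on-device with no extra infrastructure.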

Problems Solved

  1. Pain Point: The finite context window of LLMs causes AI coding assistants to "forget" earlier parts of a long session, making past decisions, tool calls, and code changes invisible and leading to repetitive or contradictory actions.
  2. Target Audience: Developers and engineers using AI-powered coding assistants (Claude Code, Cursor, Codex, Hermes), AI tool builders integrating MCP, and teams needing persistent, searchable memory for long-running AI agent workflows.
  3. Use Cases: Essential for multi-hour coding sessions where context must be maintained; for teams building complex AI agents that require state persistence across sessions; for developers needing to audit and replay an AI's decision-making history; and for projects where data privacy mandates local, off-cloud memory storage.

Unique Advantages

  1. Differentiation: Compared to alternatives like Mem0 or Lettace, Agentmemory offers superior retrieval accuracy (95.2% R@5), requires zero external dependencies, and provides a vastly more extensive integration surface (121 REST endpoints, 51 MCP tools, 12 auto-hooks). It is a complete runtime, not just a library or vector store.
  2. Key Innovation: The iii Engine architecture, where every memory operation is modeled as a worker, function, or trigger within a single process. This eliminates the complexity and latency of microservices and external databases. The triple-stream hybrid retrieval with on-device reranking is a technical innovation that delivers high accuracy with minimal infrastructure.

Frequently Asked Questions (FAQ)

  1. How does Agentmemory reduce LLM context token usage? Agentmemory compresses raw session observations (like tool calls and prompts) into semantic memories. During a session, it recalls only the most relevant, compressed memories instead of the entire raw history, reducing token count by up to 95% while maintaining searchable access to all data.
  2. Is Agentmemory data stored locally or in the cloud? All data is stored locally on your machine. Agentmemory runs as a single Node process with no external databases; state is saved to disk as JSON. This ensures full data privacy, sovereignty, and offline operation.
  3. Which AI coding assistants are compatible with Agentmemory? It offers native plugins for Claude Code, Codex CLI, Hermes, pi, and OpenHuman. Through its universal MCP server, it is compatible with any MCP client, including Claude Desktop, Cursor, Gemini CLI, Windsurf, Cline, and Roo Code.
  4. What is the difference between Agentmemory and a simple vector database? Agentmemory is a complete memory runtime, not just a vector store. It includes automated data capture (hooks), multi-modal retrieval (BM25+vector+graph), automatic consolidation/compression, a built-in UI, and an MCP/REST API layer—functionality you would need to manually build around a basic vector database.
  5. How do I visualize the memories stored by Agentmemory? The package includes a built-in real-time viewer accessible at http://localhost:3113. This dashboard shows a live observation stream, allows you to browse and filter memories, replay past sessions, and visualize the extracted knowledge graph as an interactive force-directed diagram.
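As a minimal sketch of calling the REST surface from a custom application: the only documented facts are the `/agentmemory/*` prefix and the viewer port 3113, so the route name, query parameter, and the assumption that the same port serves the API are all hypothetical.

```typescript
// Hedged sketch of querying the REST API. The `/search` route, `q` parameter,
// and port are assumptions; consult the actual endpoint list before use.
const BASE = "http://localhost:3113/agentmemory";

// Build the search URL separately so it can be inspected or tested.
function searchUrl(query: string): string {
  return `${BASE}/search?q=${encodeURIComponent(query)}`;
}

// Uses Node's built-in fetch (Node 18+).
async function searchMemories(query: string): Promise<unknown> {
  const res = await fetch(searchUrl(query));
  return res.json();
}

console.log(searchUrl("auth bug"));
```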
