
YourMemory

Cut token waste by 84% with self-pruning MCP memory

2026-04-21

Product Introduction

  1. Definition: YourMemory is a biologically-inspired, local-first persistent memory engine and Model Context Protocol (MCP) server designed for AI agents. It functions as a sophisticated middleware layer that manages long-term context through a hybrid architecture of vector embeddings and graph-based relationships. Operating locally on Python 3.11 – 3.14, it serves as a decentralized memory infrastructure that allows LLMs to retain project-specific knowledge without the "amnesia" common in stateless sessions or the "context hoarding" typical of simple RAG (Retrieval-Augmented Generation) systems.

  2. Core Value Proposition: YourMemory exists to solve the fundamental inefficiency of context window management in AI workflows. By implementing Ebbinghaus Forgetting Curve logic, the system reduces token waste by up to 84% and achieves 52% recall accuracy (Recall@5 on the LoCoMo benchmark). Its primary purpose is to provide a "leaner context" for sharper reasoning, ensuring that AI agents prioritize relevant facts while automatically pruning stale or low-utility data. It eliminates the need for repeated manual explanations of tech stacks and project requirements, significantly lowering LLM API costs and improving developer productivity.

Main Features

  1. Biologically-Inspired Memory Pruning (Ebbinghaus Logic): YourMemory applies mathematical models of human memory decay to digital data. Every stored memory is assigned a strength value that degrades over time unless reinforced by recall. When a memory's strength falls below a 0.05 threshold, it is automatically pruned from the active retrieval set. This prevents "stale tokens" from polluting the LLM's context window. The system also features "Recall Propagation," where accessing a specific memory boosts the freshness of its connected neighbors in the graph, ensuring related clusters of information survive together.
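The decay-and-reinforcement cycle above can be sketched in a few lines of Python. This is an illustrative model, not YourMemory's actual implementation: the exponential retention formula (the classic Ebbinghaus curve, R = e^(-t/S)), the stability increments, and the neighbor boost value are all assumptions; only the 0.05 pruning threshold comes from the description above.

```python
import math
import time

PRUNE_THRESHOLD = 0.05  # strength floor from the article; below this a memory is pruned


class Memory:
    def __init__(self, text, stability=1.0):
        self.text = text
        self.stability = stability   # grows with each recall (assumed reinforcement rule)
        self.last_recall = time.time()
        self.neighbors = []          # graph edges to related memories

    def strength(self, now=None):
        # Ebbinghaus-style retention: R = exp(-t / S), with t in days since last recall.
        now = time.time() if now is None else now
        t_days = (now - self.last_recall) / 86400
        return math.exp(-t_days / self.stability)

    def recall(self, neighbor_boost=0.5):
        # Direct recall resets the clock and strengthens the memory.
        self.last_recall = time.time()
        self.stability += 1.0
        # "Recall Propagation": connected neighbors get a smaller freshness boost.
        for n in self.neighbors:
            n.stability += neighbor_boost


def prune(memories, now=None):
    # Drop memories whose strength has decayed below the threshold.
    return [m for m in memories if m.strength(now) >= PRUNE_THRESHOLD]
```

With stability 1.0, a memory untouched for 30 days decays to roughly e^-30 and is pruned, while one recalled yesterday survives easily; each recall both resets the clock and slows future decay by raising stability.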

  2. Hybrid Graph + Vector Retrieval Engine: Version 1.3.0 introduces a two-round retrieval process. Round 1 utilizes semantic vector search (via spaCy and local embeddings) to find direct matches for a query. Round 2 employs a Graph Expansion layer to surface "forgotten" context—information that is topically related but lacks direct vocabulary overlap with the query. This hybrid approach yields a 5-percentage-point gain in recall over pure semantic search, allowing agents to understand the broader implications of a task, such as identifying deployment environments (K8s/Docker) when only a backend language (Python) is mentioned.
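The two-round flow can be illustrated with a toy in-memory store. This sketch uses hand-made 2-D vectors and cosine similarity in place of spaCy embeddings, and a plain adjacency dict in place of a real graph backend; the store contents, edge structure, and `top_k` cutoff are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy memory store: id -> (embedding, text); edges: id -> related memory ids.
MEMORIES = {
    "py":  ([1.0, 0.1], "Backend is Python 3.12 with FastAPI"),
    "k8s": ([0.1, 1.0], "Deploys to K8s via Docker images"),
    "ui":  ([0.5, 0.5], "Frontend uses React"),
}
EDGES = {"py": ["k8s"], "k8s": ["py"], "ui": []}


def retrieve(query_vec, top_k=1):
    # Round 1: semantic vector search for direct matches.
    ranked = sorted(MEMORIES,
                    key=lambda mid: cosine(query_vec, MEMORIES[mid][0]),
                    reverse=True)
    hits = ranked[:top_k]
    # Round 2: graph expansion surfaces related context with no
    # vocabulary overlap with the query.
    expanded = []
    for mid in hits:
        for nb in EDGES.get(mid, []):
            if nb not in hits and nb not in expanded:
                expanded.append(nb)
    return [MEMORIES[mid][1] for mid in hits + expanded]
```

A query vector pointing at "Python" matches only the backend memory in Round 1, but Round 2 follows the graph edge and also returns the K8s/Docker deployment fact, mirroring the example in the paragraph above.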

  3. Multi-Agent Secure Memory Management: The system supports complex multi-agent environments where different AI entities may require shared or private knowledge bases. Using SHA-256 hashed API keys (prefixed with ym_), users can authenticate specific agents and set visibility levels to "shared" or "private." This ensures that sensitive data, such as staging keys or private implementation details, is only accessible to authorized agents, while global project standards remain available to the entire agentic workforce.
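A minimal sketch of this authentication scheme, assuming only what the paragraph states: keys carry a `ym_` prefix, only their SHA-256 hashes are stored, and each memory is marked "shared" or "private". The `MemoryStore` class, record layout, and key length are illustrative inventions.

```python
import hashlib
import secrets


def issue_key():
    # Generate a ym_-prefixed key; the server keeps only its SHA-256 hash.
    raw = "ym_" + secrets.token_hex(16)
    return raw, hashlib.sha256(raw.encode()).hexdigest()


class MemoryStore:
    def __init__(self):
        self.keys = {}     # sha256(key) -> agent name
        self.records = []  # (owner, visibility, text)

    def register(self, agent, key_hash):
        self.keys[key_hash] = agent

    def authenticate(self, raw_key):
        # Hash the presented key and look it up; returns None if unknown.
        return self.keys.get(hashlib.sha256(raw_key.encode()).hexdigest())

    def store(self, raw_key, text, visibility="shared"):
        agent = self.authenticate(raw_key)
        if agent is None:
            raise PermissionError("unknown API key")
        self.records.append((agent, visibility, text))

    def recall(self, raw_key):
        agent = self.authenticate(raw_key)
        # Shared memories are visible to every agent; private ones only to their owner.
        return [text for owner, vis, text in self.records
                if vis == "shared" or owner == agent]
```

Storing only the hash means a leaked memory database does not leak usable credentials, while the visibility flag lets global project standards reach every agent and keeps staging secrets scoped to one.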

  4. Automated MCP Integration and Global Rules: Through the "yourmemory-setup" command, the tool automatically configures itself for major AI clients including Claude Code, Claude Desktop, Cursor, Windsurf, and Cline. It injects a curated instruction set into the global agent context (e.g., ~/.claude/CLAUDE.md), dictating when the agent should store, update, or recall information. This "baked-in" logic removes the need for manual prompt engineering to manage memory, as the agent is natively instructed on how to interact with the YourMemory MCP server.
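For orientation, MCP clients such as Claude Desktop register servers in a JSON config of roughly the following shape. The server name, command, and arguments below are illustrative assumptions, not YourMemory's documented values; in practice "yourmemory-setup" writes the correct entry for you.

```json
{
  "mcpServers": {
    "yourmemory": {
      "command": "yourmemory",
      "args": ["serve"]
    }
  }
}
```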

Problems Solved

  1. Context Window Bloat and Token Waste: Standard AI sessions often involve sending massive amounts of redundant history to the LLM, leading to exponential token growth. YourMemory keeps the memory block flat (typically 76–91 tokens), resulting in an 84.1% reduction in tokens over 30 sessions. This solves the problem of high operational costs and the "dilution" of LLM reasoning capabilities caused by oversized prompts.

  2. Target Audience: The primary users are Software Engineers (Full-stack, DevOps), AI Researchers, and Power Users of Agentic IDEs (Cursor/Windsurf). It is specifically built for developers who manage complex, long-running projects where maintaining a "single source of truth" for the tech stack is critical across multiple coding sessions.

  3. Use Cases: Essential for maintaining consistent project architectures across different AI tools, managing multi-agent collaboration where agents need varying permission levels, and reducing "clarifying question" loops in developer workflows. It is also highly effective for local-first enthusiasts who require high-performance RAG capabilities without sending sensitive data to cloud-based memory providers like Zep or Mem0.

Unique Advantages

  1. Differentiation: Unlike cloud-dependent competitors such as Supermemory or Zep Cloud, YourMemory is 100% local, ensuring zero data leakage and zero cloud inference costs for retrieval. In the LoCoMo Recall@5 benchmark, YourMemory achieved 52% recall accuracy, significantly outperforming Supermemory (28%) and Mem0 (18%). Its ability to run the graph expansion layer entirely in-process or via an optional local Neo4j/PostgreSQL backend offers scalability without sacrificing privacy.

  2. Key Innovation: The integration of "Chain-aware pruning" is a distinct technical breakthrough. Before any memory is deleted due to age, the system checks its graph neighbors. If a related fact is still relevant and frequently accessed, the "dying" memory is preserved. This mimics biological cognitive clusters, ensuring that the system learns the structural importance of information rather than treating facts as isolated data points.
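The neighbor-check described above can be sketched as a filter over the memory graph. This is a hypothetical model: the retention formula and the `NEIGHBOR_RESCUE` strength a neighbor needs in order to save a dying memory are assumptions; only the 0.05 threshold appears earlier in the article.

```python
import math
import time

PRUNE_THRESHOLD = 0.05   # decay floor stated earlier in the article
NEIGHBOR_RESCUE = 0.5    # assumed: a neighbor this strong keeps a dying memory alive


def strength(mem, now):
    # Ebbinghaus-style retention R = exp(-t / S), t in days since last recall.
    t_days = (now - mem["last_recall"]) / 86400
    return math.exp(-t_days / mem["stability"])


def chain_aware_prune(memories, edges, now=None):
    """Delete weak memories only if no strong graph neighbor vouches for them."""
    now = time.time() if now is None else now
    keep = []
    for mid, mem in memories.items():
        if strength(mem, now) >= PRUNE_THRESHOLD:
            keep.append(mid)
            continue
        # Weak memory: check its graph neighbors before deleting it.
        if any(strength(memories[nb], now) >= NEIGHBOR_RESCUE
               for nb in edges.get(mid, []) if nb in memories):
            keep.append(mid)  # rescued by a still-relevant neighbor
    return keep
```

In this model a long-untouched memory survives pruning if it is linked to a frequently recalled one, so cognitive clusters decay together rather than fact by fact.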

Frequently Asked Questions (FAQ)

  1. How does YourMemory improve AI agent performance? YourMemory improves agent performance by providing a high-recall (52% on LoCoMo) context layer that uses a hybrid Graph + Vector search. By pruning stale data via the Ebbinghaus curve, it ensures the agent's context window is filled only with high-utility information, which prevents reasoning errors caused by conflicting or outdated project data.

  2. Does YourMemory require an internet connection or cloud subscription? No. YourMemory is a 100% local solution. All vector searches, graph expansions, and memory pruning processes run on your local machine using Python and spaCy. There are no cloud inference costs, and your data never leaves your environment, making it compliant with strict privacy requirements.

  3. Which AI code editors and agents are compatible with YourMemory? YourMemory is built as an MCP (Model Context Protocol) server, making it natively compatible with Claude Code, Claude Desktop, Cursor, Windsurf, Cline, Continue, and Zed. The "yourmemory-setup" utility automatically detects these clients and configures the necessary JSON and Markdown files for immediate integration.

  4. How does the system handle different project stacks without getting confused? The system uses "Recall Propagation" and graph-based clustering to keep related technologies together. If you are working on a Python/MongoDB project, those memories form a cluster. When you mention "database," the graph surfaces MongoDB specifically, while unrelated "stale" facts from previous projects (e.g., a React frontend from a different task) are automatically pruned if they haven't been reinforced.
