PMB

Product Introduction

Definition: PMB (Persistent Memory Bank) is an open-source, offline-first, MCP (Model Context Protocol)-native memory layer for AI coding agents. It is a local middleware that stores decisions, lessons, project facts, and recent work in a single SQLite file on the user's disk, and injects that memory automatically into agent prompts during every session. It is purely Python-based (pip-installable) and works with any MCP-aware agent, including Claude Code, Cursor, Codex, Zed, Windsurf, and Gemini Copilot.
Core Value Proposition: PMB solves the fundamental "session amnesia" problem of large language model (LLM) coding assistants. Instead of forcing developers to re-explain project conventions, past bugs, and technical decisions in every new chat, PMB provides persistent, cross-tool memory that is automatically recalled before the agent reasons. The system is 100% local – no cloud, no API keys, no telemetry – and achieves recall latencies of ~35 milliseconds on typical workspaces, with zero LLM calls on the read path.

Main Features

Automatic Memory Injection via Lifecycle Hooks
PMB integrates with an agent's prompt lifecycle using MCP hooks. On every agent turn, the engine classifies the incoming message (in sub-millisecond time) and retrieves the most relevant lessons, decisions, and project overview before the model begins reasoning. After the response, PMB asynchronously journals the agent's work: decisions, lessons learned, and completed tasks. The agent therefore never has to remember to call a tool; the memory is surfaced automatically.
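The turn cycle above can be sketched in plain Python. Everything here is hypothetical: `classify`, `retrieve`, `pre_turn_hook`, `post_turn_hook`, and the in-memory `MEMORIES` list are illustrative stand-ins, not PMB's actual hook API, and the keyword-overlap retrieval is only a placeholder for the hybrid recall engine. The point is the shape: memory is prepended before the model runs, and journaling after the turn only enqueues work.

```python
import queue
import threading
from typing import Dict, List

# Hypothetical in-memory store standing in for the SQLite workspace.
MEMORIES: List[Dict[str, str]] = [
    {"kind": "decision", "text": "Use SQLite WAL mode for the workspace"},
    {"kind": "lesson", "text": "Run migrations before starting the API server"},
    {"kind": "fact", "text": "The API server lives in server/app.py"},
]


def classify(message: str) -> str:
    """Cheap rule-based intent guess; no model call is involved."""
    lowered = message.lower()
    if any(word in lowered for word in ("error", "bug", "fail", "crash")):
        return "debugging"
    if any(word in lowered for word in ("why", "should we", "decide")):
        return "design"
    return "general"


def retrieve(message: str, intent: str, limit: int = 3) -> List[str]:
    """Keyword-overlap placeholder for a real hybrid recall engine."""
    words = set(message.lower().split())
    scored = []
    for memory in MEMORIES:
        score = len(words & set(memory["text"].lower().split()))
        if intent == "debugging" and memory["kind"] == "lesson":
            score += 1  # nudge debugging turns toward past lessons
        if score:
            scored.append((score, memory["text"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:limit]]


def pre_turn_hook(message: str) -> str:
    """Runs before the model: prepend relevant memory to the user's message."""
    context = retrieve(message, classify(message))
    if not context:
        return message
    lines = "\n".join(f"- {item}" for item in context)
    return f"Relevant project memory:\n{lines}\n\n{message}"


journal: queue.Queue = queue.Queue()
JOURNAL_LOG: List[str] = []  # stands in for durable storage


def _journal_worker() -> None:
    while True:
        entry = journal.get()
        JOURNAL_LOG.append(entry)
        journal.task_done()


threading.Thread(target=_journal_worker, daemon=True).start()


def post_turn_hook(summary: str) -> None:
    """Runs after the response: enqueue the summary and return immediately."""
    journal.put(summary)
```

A message such as "why does the api server crash on startup" is classified as a debugging turn, so the stored lesson about migrations is ranked first in the injected block.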
Hybrid Recall Engine (BM25 + Dense Vectors + Entity Graph + RRF)
Memory retrieval uses a four-part hybrid approach:
- BM25 lexical ranking for exact keyword matches, with a lexicon compiled automatically from usage traffic.
- Dense vector embeddings via sentence-transformers MiniLM-L12-v2, which supports 50+ languages and enables cross-lingual retrieval (e.g., a Russian query finding an English fact).
- Entity graph – live, color-coded nodes (facts, files, decisions, lessons) connected by importance scores; neighbors are highlighted on hover in the dashboard.
- Reciprocal Rank Fusion (RRF), which combines the three ranked streams into a single list.
On the LoCoMo benchmark, PMB achieves 94.6% recall@10, an MRR of 0.774, and an nDCG@10 of 0.816, all computed locally without an LLM grader.
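The fusion step is simple enough to show in full. This is the standard RRF formula, where each document scores the sum of 1 / (k + rank) over every list it appears in; k = 60 is the constant commonly used since RRF was introduced, and PMB's actual value is not stated in the source. The document ids and the three input rankings are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of ids; ranks start at 1 within each list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25 = ["lesson:wal", "fact:server", "decision:orm"]
dense = ["fact:server", "lesson:wal", "lesson:retry"]
graph = ["fact:server", "decision:orm"]
print(reciprocal_rank_fusion([bm25, dense, graph]))
# ['fact:server', 'lesson:wal', 'decision:orm', 'lesson:retry']
```

Because RRF uses only ranks, the three streams never need their raw scores (BM25 values, cosine similarities, graph weights) put on a common scale, which is why it is a popular choice for hybrid retrieval.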
Honest Impact Tracking – Memory Hygiene Scoring
Every stored lesson carries a surface_id. PMB automatically tracks whether the agent actually followed that advice – confirmed via activity logs or auto-detected compliance. Lessons are marked useful, unverified, or dead. Dead memories are flagged for pruning, preventing context bloat. The dashboard shows a "follow-rate" metric per lesson, enabling users to trust that memory is earning its place rather than collecting dust.
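The follow-rate bookkeeping can be sketched as recording, per lesson, how each surfacing (keyed by its surface_id) turned out. The `LessonStats` class, `classify_lesson`, the 0.5 and 0.1 thresholds, and the three-judgment minimum are hypothetical choices for illustration; the source does not specify where PMB draws the useful/unverified/dead boundaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LessonStats:
    """Hypothetical per-lesson record of how each surfacing turned out."""
    lesson_id: str
    # surface_id -> True (followed), False (ignored), None (not yet judged)
    outcomes: Dict[str, Optional[bool]] = field(default_factory=dict)

    def record(self, surface_id: str, followed: Optional[bool]) -> None:
        self.outcomes[surface_id] = followed

    def judged(self) -> List[bool]:
        return [v for v in self.outcomes.values() if v is not None]

    @property
    def follow_rate(self) -> Optional[float]:
        judged = self.judged()
        return sum(judged) / len(judged) if judged else None


def classify_lesson(stats: LessonStats, min_judged: int = 3,
                    useful_at: float = 0.5, dead_at: float = 0.1) -> str:
    """Label a lesson useful, dead, or unverified (thresholds are illustrative)."""
    rate = stats.follow_rate
    if rate is None or len(stats.judged()) < min_judged:
        return "unverified"  # not enough evidence either way
    if rate >= useful_at:
        return "useful"
    if rate <= dead_at:
        return "dead"  # candidate for pruning
    return "unverified"
```

Requiring a minimum number of judged surfacings keeps a lesson from being declared dead after a single ignored suggestion.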
Asynchronous, Non-Blocking Writes
When the agent writes a new memory, the MCP tool returns in under 1 millisecond. The embedding computation and LanceDB vector insert run on a background thread, never blocking the agent's turn. SQLite (WAL mode) handles concurrent writes from multiple agents sharing the same workspace.
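Python's standard `sqlite3`, `queue`, and `threading` modules are enough to sketch this pattern: `write()` only enqueues, while a single background thread owns the connection, switches the database to WAL mode, and persists entries. The `AsyncMemoryWriter` class and the `memories` table schema are hypothetical, and the embedding and LanceDB insert are left as a comment rather than guessed at.

```python
import queue
import sqlite3
import threading
import time


class AsyncMemoryWriter:
    """Hypothetical writer: callers enqueue, a worker thread persists."""

    def __init__(self, db_path: str) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._db_path = db_path
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write(self, kind: str, text: str) -> None:
        # Returns as soon as the item is queued; no disk or model work here.
        self._queue.put((kind, text, time.time()))

    def flush(self) -> None:
        # Block until every queued item has been processed.
        self._queue.join()

    def _run(self) -> None:
        # By default a sqlite3 connection may only be used on the thread that
        # created it, so the worker opens its own.
        conn = sqlite3.connect(self._db_path)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s if another writer holds the lock
        conn.execute(
            "CREATE TABLE IF NOT EXISTS memories (kind TEXT, text TEXT, created REAL)"
        )
        conn.commit()
        while True:
            kind, text, created = self._queue.get()
            try:
                # A real system would compute the embedding and insert it into
                # the vector store here, off the agent's critical path.
                conn.execute(
                    "INSERT INTO memories VALUES (?, ?, ?)", (kind, text, created)
                )
                conn.commit()
            finally:
                self._queue.task_done()
```

WAL mode lets readers keep working while a writer commits, and `busy_timeout` makes a second agent's writer wait briefly for the lock instead of failing immediately.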
Local Dashboard with Map & Timeline
A web-based dashboard served from the local machine provides a live, interactive entity graph (Map) – color-coded nodes sized by importance, with cluster viewing and neighbor highlighting. The Timeline renders a git-graph style journal of all events (decisions, lessons, commits, failures) per project, with density rails and per-day grouping. Both are live, not mockups, and run entirely on the user's machine.
Memory Decay and Self-Tending
Memory flows through stages: Active → Read → Decay → Compact → Archive. The daemon automatically ages memories that are no longer retrieved, compacts older entries, and archives them without deletion. This ensures recall quality remains sharp even after a year of use.
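One way to picture the Active → Read → Decay → Compact → Archive flow is as buckets of idle time since a memory was last recalled, with recall promoting a memory back to Active. The day thresholds, the dict-based memory records, and the helpers `stage_for`, `tend`, and `on_recall` are all hypothetical; the source does not say what moves a memory between stages, and the actual compaction and archiving steps are not shown.

```python
from enum import IntEnum
from typing import Dict, List


class Stage(IntEnum):
    ACTIVE = 0
    READ = 1
    DECAY = 2
    COMPACT = 3
    ARCHIVE = 4


# Hypothetical minimum idle days for entering each stage, checked oldest first.
THRESHOLDS = [
    (Stage.ARCHIVE, 180),
    (Stage.COMPACT, 90),
    (Stage.DECAY, 30),
    (Stage.READ, 7),
]


def stage_for(days_idle: float) -> Stage:
    for stage, min_days in THRESHOLDS:
        if days_idle >= min_days:
            return stage
    return Stage.ACTIVE


def on_recall(memory: Dict, today: int) -> None:
    """Recalling a memory makes it fresh again (promotion)."""
    memory["last_recall_day"] = today
    memory["stage"] = Stage.ACTIVE


def tend(memories: List[Dict], today: int) -> List[Dict]:
    """Re-bucket every memory by idle time; nothing is ever deleted."""
    for memory in memories:
        memory["stage"] = stage_for(today - memory["last_recall_day"])
    return memories
```

Since the stage depends only on time since the last recall, frequently recalled facts stay Active while untouched ones drift toward Archive.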

Problems Solved

Pain Point – Agent Session Amnesia
Every AI coding assistant session starts with a blank slate. Developers waste time reiterating project conventions, architectural decisions, last week's bug fixes, and coding style preferences. PMB eliminates this by automatically injecting the relevant memory before the model even starts reasoning, saving an average of 5–15 minutes per session.
Target Audience
- Software developers using Claude Code, Cursor, Codex, Zed, Windsurf, or any MCP-aware agent on a daily basis.
- Tech leads and architects who maintain large codebases with many conventions and tribal knowledge.
- Open-source project maintainers wanting to share project memory across contributors without cloud dependencies.
- Privacy-sensitive teams (enterprise, legal, healthcare) that forbid sending code to third-party APIs.
Use Cases
- Multi-tool switching: A developer moves from Claude Code (CLI) to Cursor (IDE) to Zed (editor) – PMB’s shared SQLite workspace means all tools read the same memory: decisions, file conventions, past failures.
- Long-running projects: A year of development on a single product; PMB automatically decays outdated knowledge while promoting frequently recalled facts, keeping context fresh.
- Cross-lingual teams: A Russian-language query retrieves an English-language technical decision because the embedding model supports 50+ languages.
- Onboarding new contributors: New teammates inherit the project’s stored decisions, lesson history, and entity graph without reading lengthy documentation.

Unique Advantages

Differentiation vs. RAG and Vector Databases
Traditional RAG or standalone vector DBs require developers to build a pipeline, define chunking strategies, handle metadata, and force the agent to call a retrieval tool manually. PMB automates the entire cycle: classification → hybrid retrieval → injection → write → scoring. It is not a generic vector store; it is a purpose-built memory layer that includes entity graphs and a self-evaluating feedback loop.
Key Innovation – Zero LLM on the Read Path
Most "memory" tools for LLMs call another model to summarize or rerank memories, adding latency and cost. PMB performs all classification, ranking, and fusion with traditional algorithms (BM25, RRF, entity-graph traversals); no LLM is invoked during recall. An LLM is used only optionally, for graph extraction and summarization when a memory is first stored, and even that can be pointed at a fully local Ollama instance. This gives PMB a fundamentally lower per-query cost ($0 per recall) and recall latencies of roughly 35 ms on typical workspaces.
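To make "traditional algorithms" concrete, here is a textbook Okapi BM25 scorer over pre-tokenized documents, using the common k1 = 1.5, b = 0.75 defaults and the non-negative log(1 + ...) IDF variant used by Lucene. PMB's own tokenizer, parameters, and self-compiled lexicon are not documented in the source, so this only illustrates that lexical ranking needs nothing beyond term counts and arithmetic.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str], docs: Sequence[Sequence[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized doc against a tokenized query."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n or 1.0  # guard against all-empty docs
    df = Counter()  # number of docs containing each term
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```

For a one-word query like ["wal"], a short document containing the term outranks a longer one with the same count, and documents without it score zero.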
Radical Honesty – Impact Tracking
PMB is the rare tool that tells you when its own memory is not helping. By scoring follow-rates and flagging dead lessons, it prevents the "garbage in, garbage out" problem that plagues most persistent context systems. This transparency aligns with the E-E-A-T principle of authoritative, trustworthy information.
Portability and Durability
Everything lives in one local SQLite file with an adjacent LanceDB vector store, so copying the workspace takes a single cp command. No servers, no registries, no proprietary formats. PMB is built on "boring, durable pieces" – SQLite, LanceDB, BM25, sentence-transformers – that will still be openable in five years.
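A plain cp is fine when no agent is running. If agents may be writing at the time, note that in WAL mode recently committed changes can sit in a separate -wal file beside the database, so copying only the main file can miss them. Python's `sqlite3.Connection.backup` produces a consistent snapshot instead. The file names `pmb.db` and `vectors.lance` below are hypothetical placeholders, not PMB's documented layout, and the sketch assumes no vector writes happen during the copy.

```python
import shutil
import sqlite3
from pathlib import Path


def snapshot_workspace(src_dir: str, dst_dir: str,
                       db_name: str = "pmb.db",
                       vector_dir: str = "vectors.lance") -> None:
    """Copy a workspace: SQLite via the online backup API, vectors via copytree."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # backup() reads through a normal connection, so committed rows that are
    # still only in the -wal file are included in the copy.
    source = sqlite3.connect(str(src / db_name))
    target = sqlite3.connect(str(dst / db_name))
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()

    # Assumes the vector store is not being written during the copy.
    vectors = src / vector_dir
    if vectors.is_dir():
        shutil.copytree(vectors, dst / vector_dir, dirs_exist_ok=True)
```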

Frequently Asked Questions (FAQ)

Is PMB really free and open source?
Yes. PMB is released under the Apache 2.0 license, with no paid tiers, no seats, and no telemetry. You own both the code and the data file forever. The entire source is available on GitHub for audit or contribution.
How does PMB compare to using a simple vector database like Chroma or Weaviate?
PMB is purpose-built for AI agent memory, not a generic vector store. It combines BM25, dense vectors, and an entity graph with RRF ranking – all automated via MCP hooks. It also includes impact tracking (honest follow-rate), memory decay, and a visual dashboard. Chroma/Weaviate require manual pipeline construction and maintenance.
Does PMB work offline and without any API keys?
Absolutely. PMB is 100% offline – no network calls are made on the read path. There are no API keys, no accounts, and no cloud dependency. Even embedding and graph extraction can run locally with Ollama. You can disconnect the internet and PMB continues working exactly the same.
Which AI coding agents are supported?
Any agent that implements the MCP (Model Context Protocol) standard is supported. Currently PMB has one-command integrations for Claude Code, Cursor, Codex, Zed, Windsurf, and Gemini Copilot. It also works with OpenCode and any custom MCP client.
Can multiple agents share the same memory workspace?
Yes. PMB’s SQLite database uses WAL mode to handle concurrent writes from several agents simultaneously. You can have Claude Code, Cursor, and Codex all reading and writing to the same workspace – their context follows the project, not the editor. The pmb connect command lets you point any number of agents to the same workspace file.

Stop re-explaining your project to AI coding agents