Tenure

Local AI memory that knows what you chose and why

2026-05-14

Product Introduction

  1. Definition: Tenure is a local-first, privacy-focused long-term memory (LTM) proxy for Large Language Models (LLMs). It acts as a middleware layer that sits between OpenAI-compatible chat clients (like Open WebUI, LM Studio, or custom frontends) and upstream LLM providers (OpenAI, Anthropic Claude via Bedrock, LiteLLM). Its core function is to build, maintain, and inject a structured, editable "world model" of a user's preferences, expertise, and decisions into every LLM session, eliminating the need for repetitive context re-briefing.
  2. Core Value Proposition: Tenure exists to solve the pervasive "context reset" problem in iterative AI workflows. It ensures that every new chat session begins with the LLM already aware of the user's established technical stack, communication style, past decisions, and ruled-out approaches, transforming the AI from a "stranger" into a knowledgeable collaborator. Its primary value is in privacy-first LLM memory, structured belief injection, and zero-configuration session continuity.

Main Features

  1. Structured Belief Store (Beyond RAG): Tenure does not store raw conversation transcripts. Instead, it uses an asynchronous extraction worker to parse LLM responses and distill them into structured "beliefs" categorized as Preferences, Decisions, Entities, Open Questions, and Expertise. Each belief includes a why_it_matters field, which provides direct, actionable instructions for the LLM (e.g., "shapes all code examples toward TypeScript"). Retrieval uses alias-weighted term matching rather than traditional semantic similarity search, ensuring precise recall of named concepts (like "MongoDB raw driver") instead of returning everything in a semantic neighborhood.
  2. Scoped Context Injection & Domain Hierarchy: To prevent context bleed between unrelated projects, Tenure organizes beliefs into a three-level hierarchy: Universal (communication style), Domain (e.g., domain:code, domain:writing), and Project (specific to a named project). The system automatically infers scope from session content or allows manual setting via commands like !scope domain:code/typescript. During a request, it assembles a curated, token-budgeted slice of relevant beliefs and injects them into the system prompt alongside a minimal recent conversation history.
  3. Fully Local & Encrypted Operation: Tenure is designed as a local AI memory system. It runs entirely on the user's machine (localhost:5757), and no belief data or conversation content is transmitted to any external cloud service. All stored belief content is encrypted at rest. The system provides a transparent admin UI (/beliefs) where every belief is visible, editable, pinnable, and correctable, ensuring users have full auditability and control over their AI profile.
  4. Drop-in OpenAI API Compatibility & Prompt Caching: Tenure presents a fully compliant OpenAI API endpoint at /v1. Any client configured to use an OpenAI API can be pointed at Tenure's local address with a bearer token, requiring no plugins or custom integration. To optimize token usage and cost, it implements prompt caching on static instruction and belief tiers. This means the lengthy, detailed system prompt is paid for once per session rather than on every turn, while remaining beliefs are dynamically retrieved and injected within a token budget.
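The scoped injection and token-budgeting described above can be sketched in a few lines. This is a simplified illustration, not Tenure's actual internals: the belief shape (scope, statement, why_it_matters, pinned), the 4-characters-per-token estimate, and the function names are all assumptions for the sake of the example.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token); illustrative only."""
    return max(1, len(text) // 4)

def build_system_prompt(base_instructions: str, beliefs: list[dict],
                        token_budget: int = 400) -> str:
    """Assemble a system prompt: static instructions first (the cacheable
    tier), then beliefs, pinned before unpinned, until the token budget
    is exhausted."""
    lines = [base_instructions]
    used = estimate_tokens(base_instructions)
    # Pinned beliefs sort first (key False < True).
    ordered = sorted(beliefs, key=lambda b: not b.get("pinned", False))
    for b in ordered:
        entry = f"- [{b['scope']}] {b['statement']} (why: {b['why_it_matters']})"
        cost = estimate_tokens(entry)
        if used + cost > token_budget:
            continue  # skip beliefs that do not fit the remaining budget
        lines.append(entry)
        used += cost
    return "\n".join(lines)
```

In this sketch, the static instruction text stays byte-identical across turns, which is the precondition for the prompt-caching behavior the feature list describes; only the belief slice below it varies.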

Problems Solved

  1. Pain Point: The "Monday Morning" Context Reset. Every new LLM session starts from a blank slate, forcing users to repeatedly re-explain their technical stack, project decisions, writing style, and preferences. This leads to wasted tokens, time, and frustration when the model provides generic or misaligned responses.
  2. Target Audience: The product is essential for professionals and enthusiasts whose work compounds across multiple AI sessions. Key personas include: Software Engineers (re-explaining stack: TypeScript, Fastify, no ORMs), Data Scientists (tracking modeling decisions and dataset quirks), Writers & Creatives (maintaining character bibles and narrative consistency), Researchers & Students (keeping thesis angles and ruled-out sources in context), and Consultants (switching cleanly between client-specific contexts).
  3. Use Cases:
    • A developer asking "How should I structure my repository layer?" in a new session and immediately receiving a TypeScript/MongoDB raw driver example aligned with their previously stated preferences, without any re-briefing.
    • A writer maintaining consistent character voice across dozens of separate chat sessions about a novel.
    • A researcher ensuring that an LLM assistant remembers which methodological approaches have already been deemed invalid for the current project.

Unique Advantages

  1. Differentiation vs. Traditional Memory/RAG: Unlike systems that simply dump conversation history or use vector similarity search (which returns semantically "close" but potentially irrelevant items), Tenure focuses on structured, actionable beliefs and precision term matching. This means it retrieves exactly what you named (e.g., your decision to avoid Mongoose) rather than everything related to databases. It also differs by being a transparent, user-editable system, not a hidden profiling black box.
  2. Key Innovation: The why_it_matters Field and Belief Compaction. The core technical innovation is structuring extracted knowledge not as inert facts but as direct instructions for the LLM via the why_it_matters field. This converts observations into immediately executable guidance. Coupled with automatic belief compaction (tunable to Aggressive, Conservative, or Off), the system maintains a growing yet efficient and relevant world model without unbounded history bloat.
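The precision-over-similarity retrieval contrast above can be made concrete with a small sketch. This is a hedged illustration of alias-weighted term matching under assumed data shapes (an aliases list per belief, word-count weighting); it is not Tenure's actual scoring code.

```python
def alias_score(query: str, aliases: list[str]) -> float:
    """Score a belief against a query: each alias found verbatim in the
    query adds weight proportional to its word count, so an exact named
    concept ("mongodb raw driver") outranks a loose single-word hit."""
    q = query.lower()
    score = 0.0
    for alias in aliases:
        if alias.lower() in q:
            score += len(alias.split())  # multi-word aliases weigh more
    return score

def retrieve(query: str, beliefs: list[dict], top_k: int = 3) -> list[dict]:
    """Return the top-k beliefs with a nonzero alias score; beliefs the
    query never names score zero and are excluded entirely."""
    scored = [(alias_score(query, b["aliases"]), b) for b in beliefs]
    scored = [(s, b) for s, b in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [b for _, b in scored[:top_k]]
```

The design point is the zero-score cutoff: a vector search would still return the "closest" database beliefs for any database question, whereas term matching returns nothing unless the query actually names a stored concept.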

Frequently Asked Questions (FAQ)

  1. How does Tenure's memory system differ from ChatGPT's memory feature? Tenure is a self-hosted, local-first proxy that gives you complete visibility and editorial control over every stored "belief." Your data never leaves your machine and is encrypted at rest. In contrast, cloud-based memory features offer limited visibility into what is stored and how it is used, and send your data to the provider's servers.
  2. Is Tenure compatible with LM Studio and Open WebUI? Yes, Tenure offers full drop-in compatibility. You simply configure your OpenAI-compatible client (like LM Studio or Open WebUI) to use the Tenure proxy endpoint (http://localhost:5757/v1) and provide your API token. The client operates normally, unaware it is routing through Tenure's memory layer.
  3. What LLM models are required for Tenure to work effectively? Tenure's asynchronous belief extraction worker requires LLMs with reliable structured output capabilities. It is verified to work best with models like Claude 4.5+, GPT-4o-mini+, OpenAI o3/o4-mini+, and Amazon Nova Pro. It routes to any OpenAI-compatible endpoint serving these models.
  4. Can I pause Tenure's memory extraction during a sensitive chat? Yes. You can pause extraction globally from the Settings UI, or dynamically within any chat session using natural commands: !extract off pauses for the current session, and !extract on resumes. This allows full control without breaking workflow.
  5. How do I import my existing knowledge into Tenure? The quick start process includes an onboarding step where you can seed your world model by answering questions. You can also directly import existing documents (skills files, bios, notes) during setup, giving Tenure a "cold start" knowledge base to begin refining from your first session.
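Pointing a client at Tenure, as described in the FAQ, amounts to sending an ordinary OpenAI-style request to the local endpoint. The sketch below builds such a request with the standard library; the model name and token value are placeholders, and Tenure injects the belief context server-side, so the client payload stays plain.

```python
import json
from urllib.request import Request

TENURE_BASE_URL = "http://localhost:5757/v1"
API_TOKEN = "your-tenure-token"  # placeholder; substitute your real token

def chat_request(messages: list[dict], model: str = "gpt-4o-mini") -> Request:
    """Build a chat-completions request against the local Tenure proxy.
    The payload is standard OpenAI shape; no Tenure-specific fields."""
    payload = {"model": model, "messages": messages}
    return Request(
        f"{TENURE_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In a GUI client like LM Studio or Open WebUI, the equivalent is simply setting the API base URL to http://localhost:5757/v1 and pasting the token; no code is needed.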
