Product Introduction
1. Definition
Query Memory is an enterprise-grade AI knowledge infrastructure and Retrieval-Augmented Generation (RAG) platform designed to provide AI agents with persistent, queryable memory. It functions as a managed middleware layer that automates the full document lifecycle (parsing, chunking, embedding generation, and semantic retrieval), accessible via a unified RESTful API.
2. Core Value Proposition
The primary mission of Query Memory is to eliminate the technical overhead of building and maintaining custom RAG pipelines. By turning unstructured data—such as PDFs, spreadsheets, and live websites—into structured, grounded knowledge, Query Memory enables developers to build AI agents that are highly accurate, context-aware, and far less prone to the hallucinations typically associated with large language models (LLMs). It optimizes "time-to-intelligence" by providing a production-ready stack for grounded AI retrieval.
Main Features
1. Unified Multi-Format Document Parsing
Query Memory features a sophisticated ingestion engine capable of processing diverse data types, including PDF, DOCX, XLSX/CSV, HTML, and Markdown. Unlike standard text extractors, the engine uses structural analysis to identify headings, tables, and narrative flow. It transforms raw files into "clean structured elements," ensuring that the hierarchical integrity of a document is preserved before it is passed to an AI model, which is critical for maintaining context in complex technical manuals or financial reports.
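To make the idea of "clean structured elements" concrete, the toy parser below classifies Markdown lines as headings, table rows, or narrative text. It is a purely illustrative stand-in for the ingestion engine; the element schema (`type`, `level`, `content`) is a hypothetical format, not Query Memory's actual output.

```python
# Toy structural parser: classify Markdown lines into typed elements.
# The element schema below is a hypothetical illustration, not the
# platform's real output format.
import re

def parse_structured_elements(markdown: str) -> list[dict]:
    elements = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        heading = re.match(r"^(#{1,6})\s+(.*)", stripped)
        if heading:
            elements.append({"type": "heading",
                             "level": len(heading.group(1)),
                             "content": heading.group(2)})
        elif stripped.startswith("|"):
            elements.append({"type": "table_row", "content": stripped})
        else:
            elements.append({"type": "text", "content": stripped})
    return elements

doc = "# Q3 Report\nRevenue grew 12%.\n| Region | Revenue |\n| EMEA | 4.2M |"
for el in parse_structured_elements(doc):
    print(el["type"], "->", el["content"])
```

Typed elements like these let a downstream model distinguish a table cell from a heading, which is exactly the context that flat text extraction destroys.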
2. Semantic Knowledge Indexing and Retrieval
The platform automates the creation of vector-searchable knowledge bases. Once documents are parsed, Query Memory handles the chunking strategies and embedding processes using high-performance vector databases. The "Knowledge Query" API allows developers to perform natural language searches against these indexed collections, returning the top-k most relevant text chunks and context snippets. This ensures that AI agents have immediate access to specific, factual information from large datasets without requiring manual database management.
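The ranking behind a top-k query can be sketched with plain cosine similarity. In Query Memory this happens server-side behind the Knowledge Query API; the three-dimensional vectors below are fabricated purely for demonstration.

```python
# Toy top-k semantic retrieval: rank stored chunks by cosine
# similarity to a query embedding. Vectors are fabricated for
# illustration; real embeddings have hundreds of dimensions.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

chunks = [
    ("Reset your password from the account page.", [0.9, 0.1, 0.0]),
    ("Invoices are emailed on the 1st of the month.", [0.1, 0.8, 0.2]),
    ("Two-factor auth can be enabled in settings.", [0.7, 0.2, 0.3]),
]
for score, text in top_k([1.0, 0.0, 0.1], chunks, k=2):
    print(f"{score:.2f}  {text}")
```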
3. Grounded Agent Endpoints
Query Memory provides specialized API endpoints for deploying "Grounded AI Agents." These agents are configured with specific system prompts and directly attached to one or more knowledge bases. At runtime, the platform enforces agent behavior to ensure answers are strictly derived from the connected data (grounding). It supports model controls, conversation history management, and multi-source attribution, allowing agents to cite their sources (e.g., specific document names or URLs) for auditability.
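An agent configuration of the kind described above might be assembled as follows. Every field name here (`system_prompt`, `knowledge_base_ids`, `grounded`, `cite_sources`) is an assumption for illustration, not the documented schema; consult the actual API reference for the real request body.

```python
# Sketch of a grounded-agent configuration payload. All field names
# are illustrative assumptions, not Query Memory's documented schema.
import json

def build_agent_config(name, system_prompt, kb_ids, model=None):
    config = {
        "name": name,
        "system_prompt": system_prompt,
        "knowledge_base_ids": kb_ids,   # one or more attached knowledge bases
        "grounded": True,               # answers restricted to attached data
        "cite_sources": True,           # enable multi-source attribution
    }
    if model:
        config["model"] = model         # optional model control
    return config

payload = build_agent_config(
    name="support-agent",
    system_prompt="Answer only from the provided context and cite sources.",
    kb_ids=["kb_docs", "kb_faq"],
)
print(json.dumps(payload, indent=2))
```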
4. Domain-Aware Web Parsing
The Web Parsing API allows for the live extraction of external data from specific domains (e.g., genome.gov, nih.gov). It converts HTML content into clean Markdown or JSON, preserving document structure while filtering out web noise. This enables agents to augment their internal knowledge with live, public web context, performing ranked retrieval on real-time data to ensure information stays current.
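The HTML-to-Markdown conversion can be illustrated with a minimal converter that keeps headings and paragraphs while dropping noise tags. The real Web Parsing API does far more; this toy handles only `<h1>`–`<h3>` and `<p>`, and the noise-tag list is an assumption.

```python
# Minimal HTML -> Markdown sketch that filters web noise (nav, script,
# style, footer). A toy stand-in for the Web Parsing API, not its
# actual implementation.
from html.parser import HTMLParser

NOISE = {"script", "style", "nav", "footer"}

class ToMarkdown(HTMLParser):
    def __init__(self):
        super().__init__()
        self.lines = []
        self.current = None   # content tag currently being captured
        self.noise_depth = 0  # > 0 while inside a noise element

    def handle_starttag(self, tag, attrs):
        if tag in NOISE:
            self.noise_depth += 1
        elif self.noise_depth == 0 and tag in ("h1", "h2", "h3", "p"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag in NOISE and self.noise_depth:
            self.noise_depth -= 1
        elif tag == self.current:
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current is None or self.noise_depth:
            return
        prefix = {"h1": "# ", "h2": "## ", "h3": "### "}[self.current] \
            if self.current != "p" else ""
        self.lines.append(prefix + text)

def html_to_markdown(html: str) -> str:
    parser = ToMarkdown()
    parser.feed(html)
    return "\n\n".join(parser.lines)

page = "<nav>Home</nav><h1>Genome FAQ</h1><p>DNA stores genetic data.</p>"
print(html_to_markdown(page))
```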
Problems Solved
1. Complexity of RAG Pipeline Engineering
Building a reliable RAG system requires integrating disparate tools for OCR, text splitting, embedding models, and vector stores. Query Memory solves this by providing an "all-in-one" stack, collapsing weeks of engineering effort into a single API call. It addresses the "chunking problem"—where poor data segmentation leads to loss of context—through intelligent structural parsing.
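The "chunking problem" can be seen in a few lines: a structure-aware splitter keeps each heading with its own body text, so no chunk is severed from its context. This is a simplified illustration of the idea, not Query Memory's actual segmentation algorithm.

```python
# Illustration of structure-aware chunking: split at headings so each
# chunk keeps its heading together with its body. A toy stand-in for
# the platform's intelligent structural parsing.

def chunk_by_sections(markdown: str) -> list[str]:
    chunks, current = [], []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))   # flush finished section
            current = []
        if line.strip():
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

manual = """# Installation
Run the installer as administrator.
# Troubleshooting
If startup fails, check the event log."""
for chunk in chunk_by_sections(manual):
    print(repr(chunk))
```

A naive fixed-size splitter could instead cut mid-sentence between "Troubleshooting" and its advice, leaving a retrieved chunk with no clue which product step it describes.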
2. Target Audience
- AI Engineers and Developers: Seeking to build production-ready AI applications without managing vector database infrastructure.
- Enterprise Data Architects: Needing a secure, scalable way to make internal documentation accessible to LLMs.
- SaaS Product Managers: Looking to integrate "Chat with your Data" features into existing software platforms.
- Bioinformatics and Research Teams: Requiring precise retrieval from high-density technical papers and clinical datasets.
3. Use Cases
- Customer Support Automation: Connecting help center PDFs and SOPs to an agent to provide instant, cited answers to user queries.
- Technical Research Assistants: Ingesting thousands of research papers (e.g., genomic studies) to allow researchers to query specific findings or methodologies.
- Internal Knowledge Management: Turning company wikis, spreadsheets, and Slack exports into a searchable corporate brain.
- Regulatory Compliance: Using agents to scan and summarize legal or clinical documents while ensuring every claim is grounded in official documentation.
Unique Advantages
1. Superior Throughput and Scale
According to its published benchmarks, Query Memory outperforms self-hosted RAG baselines and other hosted services, reporting a 90% throughput result. It is built for high-throughput ingestion and low-latency retrieval, making it suitable for applications that process thousands of requests across massive datasets.
2. Unified API Surface
While competitors often separate document processing from retrieval, Query Memory uses a consistent RESTful pattern across parsing, knowledge management, and agent interaction. This "API-first" workflow allows developers to move from raw file upload to a functioning agent query in minutes using standard cURL, Python, or JavaScript integrations.
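The consistency of that workflow can be sketched as three requests sharing one base URL and one auth pattern. The host, paths (`/v1/parse`, `/v1/knowledge-bases`, `/v1/agents/.../query`), and payloads below are placeholders invented for illustration; the requests are constructed but never sent.

```python
# Sketch of the API-first workflow: parse -> index -> query an agent,
# all with one base URL and auth header. Paths and payloads are
# illustrative assumptions; no request is actually sent.
import json
import urllib.request

BASE_URL = "https://api.querymemory.example"   # placeholder host
HEADERS = {"Authorization": "Bearer YOUR_API_KEY",
           "Content-Type": "application/json"}

def build_request(path: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode(),
        headers=HEADERS,
        method="POST",
    )

steps = [
    build_request("/v1/parse", {"file_url": "https://example.com/manual.pdf"}),
    build_request("/v1/knowledge-bases", {"name": "manuals"}),
    build_request("/v1/agents/support/query", {"query": "How do I reset it?"}),
]
for req in steps:
    print(req.get_method(), req.full_url)
```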
3. Multi-Source Context Stacking
The platform allows agents to retrieve context from "stacked sources"—simultaneously querying PDFs, live web data, and spreadsheets. This synthesis capability ensures that the AI's response is a composite of the most relevant information available, regardless of the original data format.
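Source stacking amounts to merging ranked results from heterogeneous sources into one relevance-ordered context list. The snippet below shows that merge step in miniature; the sources, scores, and snippets are fabricated for demonstration.

```python
# Toy "stacked sources" merge: combine ranked snippets from a PDF
# knowledge base, a live web parse, and a spreadsheet into a single
# context list ordered by relevance. All data is fabricated.

def stack_sources(*result_lists, limit=3):
    merged = [item for results in result_lists for item in results]
    merged.sort(key=lambda item: item["score"], reverse=True)
    return merged[:limit]

pdf_hits = [{"source": "manual.pdf", "score": 0.91,
             "text": "Warranty lasts 2 years."}]
web_hits = [{"source": "example.com/faq", "score": 0.84,
             "text": "Returns within 30 days."}]
sheet_hits = [{"source": "pricing.xlsx", "score": 0.88,
               "text": "Pro tier: $49/mo."}]

for hit in stack_sources(pdf_hits, web_hits, sheet_hits):
    print(f'{hit["score"]:.2f}  [{hit["source"]}]  {hit["text"]}')
```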
Frequently Asked Questions (FAQ)
1. How does Query Memory prevent AI hallucinations?
Query Memory prevents hallucinations through "grounding." By connecting AI agents to specific knowledge bases, the platform forces the LLM to use only the provided document context to generate answers. If the information is missing from the indexed files, the agent is instructed to state that the context is insufficient rather than fabricating an answer.
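The grounding rule above can be sketched as a guard: answer only from context that clears a relevance threshold, and decline otherwise. The threshold value, refusal message, and prompt wording are illustrative assumptions, not the platform's internals.

```python
# Sketch of grounding: if no retrieved chunk clears a relevance
# threshold, refuse rather than let the LLM improvise. Threshold and
# wording are illustrative assumptions.

INSUFFICIENT = "The indexed documents do not contain enough information to answer."

def grounded_prompt(question, retrieved, min_score=0.75):
    relevant = [c for c in retrieved if c["score"] >= min_score]
    if not relevant:
        return None, INSUFFICIENT        # decline instead of fabricating
    context = "\n".join(c["text"] for c in relevant)
    prompt = (
        "Answer ONLY from the context below. If the context is "
        f"insufficient, say so.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return prompt, None

prompt, refusal = grounded_prompt(
    "What is the refund window?",
    [{"text": "Refunds are issued within 30 days.", "score": 0.92}],
)
print(refusal or prompt)
```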
2. Can I integrate Query Memory with my existing LLM providers?
Yes. Query Memory acts as the retrieval layer (the "memory") for your AI. Through its Agent API, you can configure various models (such as the Inception or Mercury series) and connect them to your data. The platform provides the relevant context to the model, which can then be used in conjunction with your preferred AI workflow.
3. What file types are supported for AI knowledge base creation?
Query Memory supports a wide array of document formats including PDF, DOCX (Word), XLSX/CSV (Excel), HTML, Markdown, and plain text. It also handles media and images by parsing them into structured elements suitable for semantic search and AI processing.
4. Is there a free tier for developers to test the RAG API?
Yes, Query Memory offers a Free plan for evaluation and small projects. This plan includes 25 document parses per month, 1 knowledge base, 1 agent, and 50 web parsing URLs, providing a full-featured sandbox for developers to test the RAG pipeline before scaling to the Pro or Enterprise tiers.
