
Caveman

Why use so many tokens when few do trick?

2026-04-14

Product Introduction

  1. Definition: Caveman is a specialized token-reduction framework and AI agent plugin designed to optimize communication between Large Language Models (LLMs) and developers. Technically categorized as an instruction-tuning and prompt-compression layer, Caveman integrates directly into coding agents like Claude Code, Cursor, Windsurf, and GitHub Copilot to enforce extreme brevity without compromising technical integrity.

  2. Core Value Proposition: Caveman exists to solve the inefficiency of verbose LLM outputs, which consume excessive API credits and increase latency. By implementing a "caveman-speak" communication protocol, it slashes output tokens by approximately 75% and reduces input context by nearly 46% through its proprietary compression tools. It is built for developers who prioritize speed, technical accuracy, and cost-efficiency over conversational fluff.

Main Features

  1. Multi-Level Grunt Intensity: Caveman provides four distinct tiers of communication density: Lite (professional but fluff-free), Full (default caveman-speak, which drops articles and favors sentence fragments), Ultra (maximum telegraphic compression using abbreviations), and 文言文 (Wenyan Mode). Wenyan mode uses Classical Chinese literary structures, among the most token-dense written forms in history, to further minimize the computational footprint of technical explanations.

  2. caveman-compress (Input Optimization): This utility automates the rewriting of local memory files, such as CLAUDE.md or project notes, into a hyper-compressed format. By stripping prose while preserving code blocks, URLs, and technical schematics, it reduces the "session-start" token overhead by an average of 46%. This ensures the AI agent has a smaller context window to process, leading to faster reasoning and lower costs.

  3. Automated Agent Hooks and Integration: Caveman features a one-line installation process via npx and specialized shells for various IDEs. It includes SessionStart hooks for Claude Code and .clinerules for Cline, ensuring the "terse" behavior is persistent across sessions without requiring manual re-triggering. It also injects custom slash commands like /caveman-commit for terse conventional commits and /caveman-review for one-line PR feedback.
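The file rewrite that caveman-compress performs (feature 2 above) can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the function name compress_memory and the stop-word list are invented for the example, and the real tool's heuristics are presumably far more careful.

```python
def compress_memory(text: str) -> str:
    """Strip prose filler while preserving fenced code blocks and URLs.

    A simplified illustration of the kind of rewrite caveman-compress
    applies to memory files; the real tool's heuristics are not public.
    """
    FILLER = {"the", "a", "an", "please", "very"}  # invented stop-word list
    out = []
    in_code = False
    for line in text.splitlines():
        if line.strip().startswith("```"):
            in_code = not in_code          # toggle on fence open/close
            out.append(line)
        elif in_code or "http" in line:
            out.append(line)               # keep code and URLs verbatim
        else:
            words = [w for w in line.split() if w.lower() not in FILLER]
            if words:                      # drop lines reduced to nothing
                out.append(" ".join(words))
    return "\n".join(out)

before = "Please always run the tests before a commit.\n```bash\nnpm test\n```"
after = compress_memory(before)
print(after)
# First prose line becomes: always run tests before commit.
```

The key property, mirrored here, is that code blocks and links pass through untouched; only surrounding prose is thinned.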

Problems Solved

  1. LLM Token Inflation and API Costs: Standard LLM behavior often includes "throat-clearing" (e.g., "I'd be happy to help you with that") and redundant explanations. For high-volume users of Claude 3.5 Sonnet or GPT-4o, this results in significant financial waste. Caveman eliminates filler tokens, directly impacting the bottom line for teams using usage-based AI billing.

  2. Cognitive Overload and "Wall of Text": Developers often have to sift through paragraphs of prose to find a single line of code or the specific cause of a bug. Caveman solves this by delivering the answer first, in a fragmented, highly scannable format that emphasizes technical substance over grammatical polish.

  3. Accuracy Degradation in Long Responses: Citing the March 2026 research "Brevity Constraints Reverse Performance Hierarchies in Language Models," Caveman addresses the phenomenon where verbose models lose accuracy. By forcing the model into a constrained, brief output mode, it can improve technical precision by focusing the model's attention on core logic rather than prose generation.

  4. Target Audience: The primary users include Software Engineers, DevOps Professionals, and AI Power Users who utilize tools like Claude Code, Cursor, Windsurf, and Copilot. It is particularly essential for developers working in terminal-based environments where screen real estate is limited and scrolling through verbose AI responses is disruptive.
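A back-of-envelope calculation makes the billing impact of problem 1 concrete. The per-token price and daily volume below are assumptions chosen purely for illustration, not quoted rates; only the 75% reduction figure comes from the product's claim.

```python
# Illustrative cost math for a 75% cut in output tokens. The price and
# daily volume are assumptions for the example, not quoted rates.
PRICE_PER_M_OUTPUT = 15.00       # USD per 1M output tokens (assumed)
daily_output_tokens = 400_000    # hypothetical heavy-use workload
reduction = 0.75                 # Caveman's claimed output-token cut

saved_tokens = daily_output_tokens * reduction
saved_usd = saved_tokens / 1_000_000 * PRICE_PER_M_OUTPUT
print(f"{saved_tokens:,.0f} tokens saved, ${saved_usd:.2f}/day")
```

At usage-based billing, savings scale linearly with volume, so the same math applies per seat across a team.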

Unique Advantages

  1. Systematic Differentiation: Unlike simple "be brief" system prompts, Caveman is a full-featured ecosystem. It includes state management (intensity levels persist), specialized sub-skills (commits, reviews), and a bidirectional compression strategy (optimizing both what the AI says and what it reads).

  2. The "Wenyan" Innovation: The inclusion of Classical Chinese (Wenyan) as a technical communication tool is a unique innovation in prompt engineering. Since LLM tokenizers are highly sensitive to character density, using a language designed for brevity allows for complex technical concepts to be transmitted in a fraction of the token count required by English.

  3. Performance-to-Value Ratio: Benchmarks demonstrate up to 87% savings on complex tasks such as explaining React re-rendering bugs or debugging PostgreSQL race conditions. While output tokens are reduced, "thinking" (reasoning) tokens remain untouched: the AI's "brain" is not diminished, only its "mouth" is made smaller.

Frequently Asked Questions (FAQ)

  1. Does Caveman reduce the quality of the AI's code suggestions? No. Caveman is designed to strictly preserve technical substance, code blocks, URLs, and logic. It only removes linguistic "fluff," filler words, and unnecessary pleasantries. Research suggests that brevity constraints can actually improve the accuracy of large models by reducing the likelihood of hallucinations in long-form prose.

  2. How do I install Caveman for Cursor or Windsurf? You can use the npx command: npx skills add JuliusBrussee/caveman -a cursor (or -a windsurf). To make it "always-on," you should paste the provided system prompt snippet into your .cursor/rules/caveman.mdc or .windsurf/rules/caveman.md file, as these agents do not support native plugin hooks the way Claude Code does.

  3. Can I toggle Caveman off if I need a detailed explanation? Yes. You can switch modes instantly using commands like /caveman lite for fuller grammar, or stop caveman / normal mode to return to the agent's default conversational style. The settings are modular and can be adjusted mid-session.

  4. What is the benefit of the caveman-compress tool? The compress tool targets the input context. Coding agents often read a "memory" file (like CLAUDE.md) at the start of every session. By compressing this file by ~50%, you save thousands of tokens over the course of a day’s work, as you aren't paying for the AI to re-read the same verbose instructions in every single prompt.
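The compounding effect described above can be sketched with rough numbers. The memory-file size and prompt count are assumptions for illustration; only the ~50% compression figure comes from the text.

```python
# Rough daily input savings from compressing a memory file that the agent
# re-reads on every prompt. File size and prompt count are assumptions;
# only the ~50% compression figure comes from the product's claim.
memory_tokens = 8_000      # uncompressed CLAUDE.md size (assumed)
compression = 0.50         # caveman-compress reduction (~50% claimed)
prompts_per_day = 60       # hypothetical working day

saved_per_prompt = memory_tokens * compression
saved_per_day = saved_per_prompt * prompts_per_day
print(f"~{saved_per_day:,.0f} input tokens saved per day")
```

Because the memory file is billed on every request, even a modest compression ratio multiplies into a large daily saving.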
