Stet

Definition: Stet is a high-performance, minimalist voice input method and speech-to-text (STT) utility built natively for macOS. It works as a system-wide accessibility and productivity tool that combines on-device voice processing with Large Language Model (LLM) refinement to convert spoken words into polished text in real time.
Core Value Proposition: Stet exists to bridge the 4x speed gap between human speech (~160 wpm) and traditional typing (~40 wpm). By integrating a "Press, Speak, Release" workflow, it allows users to dictate text directly into any active cursor field, from IDEs like VS Code to communication platforms like Slack, while using AI to filter out disfluencies and filler words without losing the speaker's unique intent or tone.
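Worked Example: The 4x figure follows directly from the two approximate rates quoted above. The short sketch below makes the arithmetic explicit; the function names and the 800-word draft are illustrative assumptions, not part of Stet.

```python
# Approximate rates quoted in the text above (words per minute).
SPEECH_WPM = 160
TYPING_WPM = 40


def minutes_needed(words: int, wpm: float) -> float:
    """Minutes required to produce `words` at a steady `wpm`."""
    return words / wpm


def speedup(speech_wpm: float = SPEECH_WPM, typing_wpm: float = TYPING_WPM) -> float:
    """How many times faster speaking is than typing at the given rates."""
    return speech_wpm / typing_wpm


if __name__ == "__main__":
    draft_words = 800  # hypothetical draft length, chosen for illustration
    print(speedup())
    print(minutes_needed(draft_words, TYPING_WPM))
    print(minutes_needed(draft_words, SPEECH_WPM))
```

At these rates an 800-word draft takes about 20 minutes to type and about 5 minutes to dictate, not counting any time spent editing afterwards.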

On-Device Voice Processing: Stet prioritizes local execution for the initial speech-to-text conversion. By processing audio data locally on the Mac hardware, the application minimizes latency and ensures that sensitive vocal data does not leave the machine unnecessarily. This native integration allows for a seamless "Press-to-Talk" experience that mirrors the responsiveness of system-level functions.
AI-Driven Refinement Engines: Once the local engine captures the raw transcript, Stet applies advanced AI refinement to "clean" the text. This feature is context-aware: it can distinguish between a message to a friend (where it preserves natural phrasing and "character") and a prompt for an AI or professional document (where it strips out noise and "ums" to make the content lean and professional). Users can toggle this refinement based on the desired output persona.
Open-Source Transparency and BYOK (Bring Your Own Key): Unlike proprietary dictation software, Stet is open-source (available on GitHub), allowing for full technical auditing. It offers a flexible infrastructure where users can connect their own OpenAI or Anthropic API keys for "Personal Key" use at no cost, or opt for the "Stet Cloud" subscription for zero-configuration, priority AI processing.
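Illustrative Sketch: The three features above imply a simple routing decision once a local transcript exists: keep the text as-is, send it to the user's own provider, or send it to Stet Cloud. The code below is a minimal sketch of that decision and is not taken from the Stet codebase. The `Settings` fields, the route labels, and the rule that a personal key takes precedence over a cloud subscription are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Providers named in the text; the lowercase identifiers are assumptions.
SUPPORTED_PROVIDERS = ("openai", "anthropic")


@dataclass
class Settings:
    """Hypothetical user settings; field names are illustrative only."""
    refine: bool = True                 # refinement switched on or off
    provider: Optional[str] = None      # "openai" or "anthropic" for Personal Key
    api_key: Optional[str] = None       # the user's own API key, if any
    cloud_subscription: bool = False    # whether a Stet Cloud plan is active


def choose_refinement_route(settings: Settings) -> str:
    """Decide where the locally produced transcript goes next.

    In this sketch only text is ever routed; the audio has already been
    transcribed on the device.
    """
    if not settings.refine:
        return "local-only"
    if settings.api_key and settings.provider in SUPPORTED_PROVIDERS:
        # Assumption: a configured personal key wins over a subscription.
        return f"personal-key:{settings.provider}"
    if settings.cloud_subscription:
        return "stet-cloud"
    # No refinement backend configured: fall back to the raw transcript.
    return "local-only"
```

For example, `choose_refinement_route(Settings(provider="anthropic", api_key="..."))` returns `"personal-key:anthropic"` in this sketch, while default settings with no key and no subscription fall back to `"local-only"`.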

Cognitive Load and Typing Bottlenecks: Traditional QWERTY input is significantly slower than human thought and speech. Stet addresses the "input lag" of the human-computer interface by allowing users to capture complex ideas at the speed of conversation, which is essential for creative writing, coding, and rapid-fire communication.
Transcription Noise and Disfluencies: Standard dictation tools often capture every "um," "ah," and "like," resulting in messy text that requires manual editing. Stet solves this by using AI to let the speaker's words stand (stet is the editorial term, from Latin, for "let it stand") while removing the linguistic clutter that distracts from the core message.
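Illustrative Sketch: To make the idea of removing disfluencies concrete, the snippet below strips a few common fillers with a regular expression. This is a deliberately simple rule-based stand-in and not how Stet works: the text above describes AI-based refinement, which can judge context. The filler list and the function name `strip_fillers` are assumptions. The word "like" is left untouched on purpose, because a fixed rule cannot tell the filler from the verb.

```python
import re

# Whole-word fillers ("um", "umm", "uh", "ah", "er", ...), plus an optional
# trailing comma or period and any following whitespace.
_FILLER_RE = re.compile(r"\b(?:um+|uh+|ah+|er+)\b[,.]?\s*", re.IGNORECASE)


def strip_fillers(transcript: str) -> str:
    """Remove simple filler words and tidy spacing and capitalization."""
    cleaned = _FILLER_RE.sub("", transcript)
    # Collapse any doubled spaces left behind and trim the ends.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    # Re-capitalize in case a leading filler was removed.
    if cleaned:
        cleaned = cleaned[0].upper() + cleaned[1:]
    return cleaned


if __name__ == "__main__":
    print(strip_fillers("Um, so I think, uh, we should ship it"))
```

A rule like this removes "um" and "uh" but cannot rephrase a rambling sentence or decide when informal phrasing should be kept, which is the gap the AI refinement step is meant to fill.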
Target Audience:

Software Engineers: Using voice for documentation, git commits, and code comments within JetBrains, VS Code, or Cursor.
Content Creators and Writers: Drafting long-form content in Obsidian, Notion, or Bear without the friction of a keyboard.
Busy Professionals: Managing high volumes of correspondence in Slack, Teams, and Gmail.
Privacy-Conscious Users: Individuals who require AI assistance but want the transparency of an open-source tool.

Rapid Prompt Engineering: Dictating complex instructions to LLMs like ChatGPT or Claude where clarity and brevity are required.
Asynchronous Communication: Sending clear, professional messages in Slack or Discord while on the move or away from a desk.
Developer Documentation: Explaining logic in real-time while navigating through complex codebases in an IDE.

Differentiation: Most voice-to-text apps are either too heavy (enterprise transcription suites) or too simple (built-in system dictation). Stet occupies the middle ground: it is a native Mac experience that feels invisible but provides the power of a modern AI stack. It works in any app without requiring app-specific plugins or integrations.
Key Innovation: The "Refinement Toggle" is Stet's primary innovation. Instead of just transcribing what was said, it understands the intent of the speech. It can transform a rambling thought into a structured professional request or preserve the emotional nuances of a personal note, giving the user control over the "polish level" of their voice.
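Illustrative Sketch: One way to picture a "polish level" is as a choice of instruction sent alongside the transcript to the refinement model. The sketch below is hypothetical: the three levels, the instruction wording, and the request shape are assumptions for illustration and are not Stet's actual prompts or API.

```python
from enum import Enum
from typing import Dict, Optional


class Polish(Enum):
    """Hypothetical polish levels for the refinement toggle."""
    OFF = "off"                    # keep the raw transcript
    NATURAL = "natural"            # personal notes and messages to friends
    PROFESSIONAL = "professional"  # AI prompts and professional documents


# Assumed instruction text per level; the wording is illustrative only.
INSTRUCTIONS: Dict[Polish, str] = {
    Polish.NATURAL: (
        "Fix obvious transcription errors but keep the speaker's "
        "phrasing and tone."
    ),
    Polish.PROFESSIONAL: (
        "Remove filler words and repetition, and rewrite as a concise, "
        "professional message without changing the intent."
    ),
}


def build_refinement_request(transcript: str, polish: Polish) -> Optional[Dict[str, str]]:
    """Return the payload for a refinement model, or None when polish is off."""
    if polish is Polish.OFF:
        return None
    return {"instruction": INSTRUCTIONS[polish], "transcript": transcript}
```

Keeping the level as an explicit value, rather than inferring it silently, matches the text's emphasis on giving the user control over how much the output is changed.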

Is my voice data private when using Stet? Yes. Voice processing is performed locally on your Mac. If you use the "Personal Key" plan, your transcribed text is sent directly to your chosen AI provider (via your own API key). If you use Stet Cloud, data is handled with priority processing and end-to-end security, maintaining the privacy standards of an open-source project.
Does Stet work in specialized apps like VS Code or Figma? Stet is designed to work everywhere the macOS cursor can land. Because it functions as a global input method, it is compatible with professional tools including VS Code, JetBrains, Figma, Notion, and Slack, as well as terminal emulators like Warp and editors like Zed.
Can I use Stet for free? Yes. Stet offers a "Personal Key" tier that is free to use. By bringing your own API key from an AI provider, you can access unlimited on-device voice processing and AI refinement without a monthly subscription to the Stet Cloud service.

Smart open-source dictation that sounds like you, not AI.