Product Introduction
- Definition: Spoke is a native macOS utility application specializing in on-device speech-to-text transcription. It functions as a system-level input method, converting spoken audio into text directly within any active text field using local AI models.
- Core Value Proposition: Spoke eliminates friction in text input by enabling hands-free, instantaneous dictation with uncompromising privacy. Its primary value lies in offline voice transcription and optional AI-powered text processing, allowing users to dictate efficiently without data leaving their device.
Main Features
- Push-to-Talk Dictation:
- How it works: Users press and hold a configurable keyboard shortcut (e.g., Globe/fn key, Right ⌘). Spoke activates the microphone instantly via macOS Core Audio, processes speech locally using its CoreML-optimized neural network (600M parameters), and inserts the transcribed text upon key release. No app switching is required.
- AI Skills Integration:
- How it works: Post-transcription, users can route text through customizable AI prompts via connected providers (OpenAI, Anthropic, Google Gemini, Ollama). Skills execute tasks like real-time translation, grammar correction, or code prompt formatting before insertion. Users supply their own API keys; Spoke acts as a direct conduit, never storing or proxying data.
- Privacy-First Architecture:
- How it works: All audio processing occurs locally via Apple's CoreML framework. The open-source speech model (trained for 25 European languages) runs entirely on-device. Microphone audio is discarded immediately after transcription. When an AI Skill is activated, only the transcribed text (never audio) is sent externally. No account or internet connection is needed for core dictation.
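The push-to-talk cycle and the privacy boundary described above can be sketched as a small state machine. This is a minimal illustration in Python, not Spoke's implementation: `PushToTalkSession`, `transcribe`, and `skill` are hypothetical names, the transcriber is a stub standing in for the local model, and the skill is a stub standing in for a provider call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PushToTalkSession:
    """Hypothetical model of one dictation cycle (illustrative only)."""
    transcribe: Callable[[bytes], str]             # stand-in for the local model
    skill: Optional[Callable[[str], str]] = None   # stand-in for an AI Skill
    _audio: bytearray = field(default_factory=bytearray, init=False)
    _recording: bool = field(default=False, init=False)

    def key_down(self) -> None:
        # Holding the shortcut starts a fresh recording.
        self._audio.clear()
        self._recording = True

    def feed(self, chunk: bytes) -> None:
        # Microphone chunks are buffered only while the key is held.
        if self._recording:
            self._audio.extend(chunk)

    def key_up(self) -> str:
        # Releasing the shortcut transcribes locally, then discards the audio.
        self._recording = False
        text = self.transcribe(bytes(self._audio))
        self._audio.clear()
        # Only text, never audio, is passed to the optional skill.
        if self.skill is not None:
            text = self.skill(text)
        return text


# Usage with stub functions in place of the model and the provider.
session = PushToTalkSession(
    transcribe=lambda audio: f"{len(audio)} bytes heard",
    skill=str.upper,
)
session.feed(b"ignored")   # key not held yet, so nothing is buffered
session.key_down()
session.feed(b"\x00\x01")
session.feed(b"\x02")
result = session.key_up()  # text that would be inserted at the cursor
```

The point of the sketch is the ordering in `key_up`: the audio buffer is cleared before the skill runs, so the skill only ever receives a string.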
Problems Solved
- Pain Point: Slow or disruptive text input workflows, especially for users who think faster than they type or work in dynamic environments (e.g., coding, note-taking). Traditional dictation often requires cloud services, compromising speed and data privacy.
- Target Audience:
- Developers & Technical Writers: Dictate code comments, documentation, or CLI prompts rapidly. Use AI Skills to format code snippets.
- Multilingual Professionals: Translate dictations in real time during communications or writing.
- Accessibility Users & Power Users: Reduce typing strain and boost productivity with seamless voice input.
- Privacy-Conscious Individuals: Professionals handling sensitive information requiring offline, zero-data-retention transcription.
- Use Cases:
- Dictating emails/docs in airplane mode or restricted networks.
- Generating technically precise prompts for AI coding assistants (e.g., Claude) via voice.
- Real-time transcription of meetings/ideas directly into project management tools.
- Correcting grammar/punctuation in drafts without switching contexts.
Unique Advantages
- Differentiation:
- vs. Subscription Apps (e.g., Dragon): Spoke offers perpetual licensing ($9.99 one-time) vs. recurring fees ($60-$180/year). It runs fully offline, whereas competitors typically require cloud processing.
- vs. Built-in macOS Dictation: Spoke uses its own on-device model (6.34% WER, compared with ~7.4% for Whisper Large V3), supports custom AI post-processing, and offers configurable push-to-talk without menu activation delays.
- Key Innovation:
- Local CoreML Optimization: Implements a state-of-the-art open-source speech model (600M params) with best-in-class latency (~400 ms for 60 s of audio on Apple Silicon) and accuracy, entirely offline.
- Modular AI Skill System: Unique per-skill provider switching and custom prompt engineering allow tailored text transformation without compromising core privacy.
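To put the latency figure in perspective, it can be expressed as a real-time factor: seconds of audio handled per second of processing. The helper below is a hypothetical illustration in Python that only restates the numbers quoted above (about 400 ms for 60 s of audio); it is not a benchmark.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# 60 s of audio in about 0.4 s is roughly 150x faster than real time.
speedup = real_time_factor(60.0, 0.4)
```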
Frequently Asked Questions (FAQ)
- Does Spoke work without an internet connection?
Yes, Spoke's core speech-to-text transcription operates 100% offline after the initial model download. No internet connection is required for basic dictation. AI Skills require internet only if using cloud providers like OpenAI or Anthropic.
- Is my voice data stored or sent to servers?
No. Spoke processes all microphone audio locally on your Mac using CoreML. The audio is immediately discarded after transcription. Only if you actively use an AI Skill is the transcribed text (never the audio) sent to your chosen provider via your API key.
- What Macs are compatible with Spoke?
Spoke requires macOS Sonoma (14+) and a Mac with Apple Silicon (M-series chips) for optimal performance of its local CoreML model. It does not support Intel Macs or older macOS versions.
- Can I use my own AI API keys with Spoke?
Absolutely. Spoke integrates directly with OpenAI, Anthropic, Google Gemini, or local Ollama models using your own API keys. Your keys and data are never stored or proxied by Spoke's servers.
- How accurate is Spoke's transcription compared to alternatives?
Spoke's optimized on-device model achieves a 6.34% word error rate (WER), lower than the ~7.4% WER of the larger Whisper Large V3. It supports 25 European languages with automatic language detection, offering best-in-class accuracy for offline transcription.
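For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The Python sketch below computes it with a word-level edit distance. The function name and sample sentences are illustrative, and this is not the evaluation procedure behind the figures quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
sample_wer = word_error_rate("the quick brown fox", "the quick fox")
```

On this scale, a 6.34% WER corresponds to roughly six word errors per hundred reference words.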