Product Introduction
- Definition: Kokori is a macOS-native text-to-speech (TTS) application and local API server that converts text into high-quality audio entirely offline. It falls under the technical category of desktop TTS software with embedded REST API capabilities.
- Core Value Proposition: Kokori eliminates dependency on cloud-based TTS services by providing offline, privacy-focused speech synthesis with studio-grade voices, reducing costs and latency for developers and creators.
Main Features
Local REST API Server:
- How it works: Runs a lightweight HTTP server (port 5002) on your Mac. Send JSON payloads via POST requests (text, voice, speed) to generate audio without internet.
- Technology: Built on the Kokoro TTS engine, leveraging neural networks for natural prosody. Processes requests locally, avoiding cloud dependencies.
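As a rough illustration of the request flow described above, the following Python sketch builds and sends such a POST request using only the standard library. The port (5002) and the field names (text, voice, speed) come from the description; the endpoint path `/api/tts`, the assumption that the response body is raw audio, and the `.wav` output name are hypothetical, so check the app's own API documentation for the real values.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5002"  # port taken from the feature description
ENDPOINT = "/api/tts"               # ASSUMPTION: hypothetical path, not confirmed


def build_request(text, voice, speed=1.0):
    """Build a POST request carrying the JSON payload (text, voice, speed)."""
    payload = json.dumps({"text": text, "voice": voice, "speed": speed}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice, speed=1.0, out_path="output.wav"):
    """Send the request to the local server and save the response body.

    ASSUMPTION: the server answers with raw audio bytes.
    """
    with urllib.request.urlopen(build_request(text, voice, speed), timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Usage (requires the Kokori server to be running locally):
# synthesize("Hello from Kokori.", "af_heart")
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running server.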
Multi-Voice Library:
- How it works: Offers 50+ preloaded voices across 8 languages (e.g., American/British English, Japanese, Mandarin). Voices are quality-ranked (A-F) for clarity.
- Technology: Utilizes optimized acoustic models for each voice, supporting gender/language filters (e.g., en_us_heart for high-quality American female).
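To make the filtering idea concrete, here is a small Python sketch that filters a voice catalog by language, gender, and minimum quality grade. The `Voice` record, the catalog entries, and the grades assigned to them are hypothetical placeholders for illustration (only the IDs `af_heart` and `bf_alice` appear in the FAQ); Kokori's actual voice metadata may be structured differently.

```python
from dataclasses import dataclass

GRADES = "ABCDEF"  # best to worst, matching the A-F ranking


@dataclass(frozen=True)
class Voice:
    voice_id: str
    language: str
    gender: str
    grade: str


# Hypothetical catalog: metadata and grades are placeholders, not real data.
CATALOG = [
    Voice("af_heart", "en-US", "female", "A"),
    Voice("bf_alice", "en-GB", "female", "D"),
    Voice("example_voice_1", "ja", "male", "C"),
]


def filter_voices(catalog, language=None, gender=None, min_grade="F"):
    """Return voices matching the filters, best grade first."""
    limit = GRADES.index(min_grade)
    matches = [
        v for v in catalog
        if (language is None or v.language == language)
        and (gender is None or v.gender == gender)
        and GRADES.index(v.grade) <= limit
    ]
    return sorted(matches, key=lambda v: GRADES.index(v.grade))
```

For example, `filter_voices(CATALOG, gender="female", min_grade="B")` keeps only female voices graded A or B in this placeholder catalog.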
Audio History & Logging:
- How it works: Automatically archives generated audio files locally. Detailed logs track API requests, errors, and performance metrics (e.g., latency).
- Technology: File-based storage with timestamped entries, enabling debugging without third-party tools.
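The general pattern of file-based storage with timestamped entries can be sketched in Python as follows. This is not Kokori's actual on-disk layout: the file naming scheme, the `requests.log` name, the JSON-lines format, and the `archive_audio` helper are all assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def archive_audio(audio, voice, latency_ms, root, now=None):
    """Save audio under a timestamped name and append one JSON log line.

    ASSUMPTION: naming scheme and log format are illustrative only.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S%fZ")  # sortable, microsecond precision
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    audio_path = root / f"{stamp}_{voice}.wav"
    audio_path.write_bytes(audio)

    entry = {
        "timestamp": now.isoformat(),
        "voice": voice,
        "file": audio_path.name,
        "latency_ms": latency_ms,
    }
    # Append-only log: one JSON object per line keeps entries easy to grep.
    with open(root / "requests.log", "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return audio_path
```

Because each entry is a self-contained line, such a log can be inspected with ordinary text tools, which is the kind of debugging without third-party tools the feature describes.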
Problems Solved
- Pain Point: Cloud TTS APIs (e.g., Google Cloud, AWS Polly) carry usage-based costs and privacy risks, since text must be sent to a third party. Kokori runs locally, so there are no usage fees and text never leaves the device.
- Target Audience:
  - Developers: Test voice-enabled apps offline, avoiding per-request fees.
  - Content Creators: Generate unlimited voiceovers for videos/podcasts without subscriptions.
  - Privacy-Conscious Users: Process sensitive documents offline (e.g., legal/medical text).
- Use Cases:
  - Prototyping voice assistants without API keys.
  - Creating multilingual audiobooks offline.
  - Accessibility tools for offline text consumption.
Unique Advantages
- Differentiation:
  - Vs. Cloud TTS: No throttling, 100% offline, no recurring fees.
  - Vs. Built-in macOS TTS: Higher-quality voices, developer API, and speed/pitch control.
- Key Innovation:
  - Integrated Desktop-App/API Hybrid: Seamlessly switch between GUI (menubar) and programmatic use.
  - Voice Quality Hierarchy: Curated voice library with transparency about performance (e.g., "A-grade" vs. "D-grade" voices).
Frequently Asked Questions (FAQ)
Does Kokori work without an internet connection?
Yes, Kokori’s TTS engine and API server run entirely offline; no data leaves your device.
Can I use Kokori voices for commercial projects?
Absolutely. The license permits commercial usage, including video monetization and app integration.
How resource-intensive is the local API server?
Optimized for efficiency: uses <500MB RAM during operation and supports concurrent requests on modern Macs.
What languages and accents does Kokori support?
Includes 8 languages (English, Japanese, Spanish, etc.) with regional variants like British (bf_alice) and American (af_heart) voices.
Is there a Windows or Linux version?
Currently, Kokori is exclusive to macOS, leveraging native Apple Silicon/Intel optimizations.