Audio Tools
Explore the best new Audio tools and products curated by the community.
Start building with natural voices and expressive controls to bring your apps to life.
ElevenCreative is a single platform to generate, edit, and localize premium audio and video in minutes, powered by advanced voice, music, SFX, image, and video models. It powers millions of creators, marketing teams, and media companies worldwide.
Solo transcribes your speech and rewrites it with AI — entirely on your device. No cloud. No latency. No compromise. Built for Apple Silicon.
TADA (Text-Acoustic Dual Alignment) is Hume AI's open-source speech-language model that synchronizes text and speech into a single continuous stream via 1:1 token alignment. It generates audio at 5x the speed of conventional LLM-based TTS systems, with no skipped words or content hallucinations across 1000+ tests.
We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.
Spoke is a macOS app that transcribes your voice into any text field. It runs a local speech model — no audio leaves your device. Hold a keyboard shortcut, speak, and the text appears wherever your cursor is. Optionally connect an AI provider to process transcriptions on the fly.
Universal-3 Pro Streaming is the most accurate real-time STT model for voice agents. With entity detection, speaker labels, and code switching, it's built for the hard stuff: disfluencies, alphanumerics, and noisy environments. One API. 99+ languages. Try it free.
Vocova transcribes audio and video to text in 100+ languages. Paste a link from YouTube, TikTok, Zoom, or 1,000+ platforms, or upload any file. What makes it different:
- Speaker identification with color-coded labels and timestamps
- Translate transcripts to 145+ languages with bilingual side-by-side view
- Edit transcripts directly in the browser
- Export as PDF, DOCX, SRT, VTT, TXT, or CSV
- AI summaries and Q&A extraction
Free to start, no credit card required.
Accent Conversion for the Listener removes accent friction in real time. It converts accented English into neutral American English on the listener’s side, so speakers don’t change how they talk — you just understand instantly. Fully on-device with near-zero latency and works across Zoom, Teams, and Meet. Built for global teams where “can you repeat that?” quietly slows everything down.
Expressive Mode is a voice agent so expressive that it blurs the line between AI and human conversation. Powered by Eleven v3 Conversational and a new turn-taking system for better-timed responses with fewer interruptions.
ProducerAI is a creative collaborator, whether you’re writing lyrics, developing a melody, or experimenting with genres. With ProducerAI, you can turn your imagination into dynamic tracks. ProducerAI has joined Google Labs.
Dictato turns speech into text on your Mac. No cloud, no account, no internet needed. Your audio stays on your computer. Press a hotkey, talk, release. Text appears where your cursor is: Gmail, Slack, VS Code, whatever app you're in. Three engines to choose from: Parakeet, Whisper, and Apple. Supports 25 to 99 languages depending on which you pick. Optional proofreading and translation, all on-device. 7-day free trial. $9.99 for a two-year license. Requires macOS 14+ and Apple Silicon.
Real-time translation overlay for Mac, Windows, and Linux. Capture audio from any app, see it translated instantly.
Monologue turns your voice into polished writing—inside the apps you already use. From coding in the terminal to sending a quick message to grandpa, Monologue is the shortest distance between speech and writing. Unlike basic dictation, Monologue doesn't just transcribe. It rewrites, removes filler words, adds punctuation, and adapts to context. Your texts sound like texts. Your emails sound human. Your notes turn into clean lists and structured thoughts.
Auden is an OS-level tool available on desktop, tablet, and mobile that lets you record and listen to any type of audio you want to remember later. It listens, summarizes what has been said, saves the recording for playback, and categorizes it, all powered by AI!
JustScribe is a privacy-first live transcription app for macOS. Instant, offline speech-to-text powered by AI. No cloud, no data collection. Your voice, your data.
We introduce PersonaPlex, a full-duplex conversational AI model that enables natural conversations with customizable voices and roles. PersonaPlex handles interruptions and backchannels while maintaining any chosen persona, outperforming existing systems on conversational dynamics and task adherence.
Onboard every user like it’s your best live call. Obi is a voice AI agent that talks users through setup, answers questions in real time, and shares insights after every session. No clunky tours or videos—just real conversation, 24/7, at any scale.
Voxtral Transcribe 2 delivers ultra-fast, highly accurate speech-to-text with real-time transcription and speaker diarization. Built for live apps, voice agents, and meetings, it supports 13 languages, word-level timestamps, and privacy-first deployment, all at industry-leading speed and cost.
HyNote is an end-to-end knowledge tool that transforms raw data into polished results. By managing the entire lifecycle of your information, from capturing diverse inputs like audio and video to generating professional-grade exports, it functions as a true second brain. This streamlined workflow ensures that your insights aren't just stored, but evolve from initial sparks into actionable, shareable outputs.