Audio Tools

Explore the best new Audio tools and products curated by the community.

Grok's Text to Speech API
Grok's Text to Speech API is now available.
Marketing, Audio

Start building with natural voices and expressive controls to bring your apps to life.

2026-03-18
54
ElevenCreative by ElevenLabs
The AI creative platform to bring your content to life
Artificial Intelligence, Audio, Video

ElevenCreative is a single platform to generate, edit, and localize premium audio and video in minutes, powered by advanced voice, music, SFX, image, and video models. Powering millions of creators, marketing teams, and media companies worldwide.

2026-03-15
63
Solo Voice
Private by architecture, not by promise.
Productivity, Apple, Audio

Solo transcribes your speech and rewrites it with AI — entirely on your device. No cloud. No latency. No compromise. Built for Apple Silicon.

2026-03-13
56
TADA
1:1 text-acoustic alignment for 5x faster speech generation
Open Source, Artificial Intelligence, Audio

TADA (Text-Acoustic Dual Alignment) is Hume AI's open-source speech-language model. It synchronizes text and speech into a single continuous stream via 1:1 token alignment. It generates audio at 5x the speed of conventional LLM-based TTS systems and eliminated skipped words and content hallucinations across 1000+ tests.

2026-03-11
60
Fish Audio S2
Real Expressive AI Voices
Open Source, Artificial Intelligence, GitHub, Audio

We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.

2026-03-10
67
Spoke
Private voice-to-text for macOS. Hold a key, speak, done.
Languages, Menu Bar Apps, Audio

Spoke is a macOS app that transcribes your voice into any text field. It runs a local speech model — no audio leaves your device. Hold a keyboard shortcut, speak, and the text appears wherever your cursor is. Optionally connect an AI provider to process transcriptions on the fly.

2026-03-05
58
AssemblyAI: Universal-3 Pro Streaming
The most accurate streaming speech model for voice agents.
Developer Tools, Artificial Intelligence, Audio

Universal-3 Pro Streaming is the most accurate real-time STT model for voice agents. With entity detection, speaker labels, and code switching, it's built for the hard stuff: disfluencies, alphanumerics, and noisy environments. One API. 99+ languages. Try it free.

2026-03-04
94
Vocova
Transcribe audio & video from 1,000+ platforms
Productivity, Artificial Intelligence, Audio

Vocova transcribes audio and video to text in 100+ languages. Paste a link from YouTube, TikTok, Zoom, or 1,000+ other platforms, or upload any file. What makes it different:
- Speaker identification with color-coded labels and timestamps
- Translate transcripts to 145+ languages with bilingual side-by-side view
- Edit transcripts directly in the browser
- Export as PDF, DOCX, SRT, VTT, TXT, or CSV
- AI summaries and Q&A extraction
Free to start, no credit card required.

2026-03-04
52
AssemblyAI
The most accurate streaming speech model for voice agents.
Developer Tools, Artificial Intelligence, Audio

Universal-3 Pro Streaming is the most accurate real-time STT model for voice agents. With entity detection, speaker labels, and code switching, it's built for the hard stuff: disfluencies, alphanumerics, and noisy environments. One API. 99+ languages. Try it free.

2026-03-04
53
Krisp Accent Conversion
Understand accented speech in real time
Productivity, Artificial Intelligence, Audio

Accent Conversion for the Listener removes accent friction in real time. It converts accented English into neutral American English on the listener’s side, so speakers don’t change how they talk — you just understand instantly. Fully on-device with near-zero latency and works across Zoom, Teams, and Meet. Built for global teams where “can you repeat that?” quietly slows everything down.

2026-03-03
66
Expressive Mode for ElevenAgents
AI voice agents that adapt tone, timing & emotion by context
Customer Communication, Artificial Intelligence, Audio

Expressive Mode is a voice agent so expressive that it blurs the line between AI and human conversation. Powered by Eleven v3 Conversational and a new turn-taking system for better-timed responses with fewer interruptions.

2026-03-02
59
Producer AI by Google Labs
Turn ideas into tracks with your AI co-producer
Music, Artificial Intelligence, Audio

ProducerAI is a creative collaborator, whether you’re writing lyrics, developing a melody or experimenting with genres. With ProducerAI, you can turn your imagination into dynamic tracks. Producer AI has joined Google Labs.

2026-02-28
60
Dictato
Local instant voice-to-text for every Mac
Productivity, Audio

Dictato turns speech into text on your Mac. No cloud, no account, no internet needed. Your audio stays on your computer. Press a hotkey, talk, release. Text appears where your cursor is: Gmail, Slack, VS Code, whatever app you're in. Three engines to choose from: Parakeet, Whisper, and Apple. Supports 25-99 languages depending on which you pick. Optional proofreading and translation, all on-device. 7-day free trial. $9.99 for a two-year license. Requires macOS 14+ and Apple Silicon.

2026-02-24
73
Seagull
Real-time translation overlay for all your computer audio.
Languages, Audio, Video

Real-time translation overlay for Mac, Windows, and Linux. Capture audio from any app, see it translated instantly.

2026-02-23
57
Monologue for iOS
Turn your voice into polished writing—wherever you go.
Productivity, Artificial Intelligence, Audio

Monologue turns your voice into polished writing—inside the apps you already use. From coding in the terminal to sending a quick message to grandpa, Monologue is the shortest distance between speech and writing. Unlike basic dictation, Monologue doesn't just transcribe. It rewrites, removes filler words, adds punctuation, and adapts to context. Your texts sound like texts. Your emails sound human. Your notes turn into clean lists and structured thoughts.

2026-02-19
59
Auden
Your day-to-day AI memory that listens and remembers
Productivity, Artificial Intelligence, Audio

Auden is an OS-level tool available on desktop, tablet, and mobile that lets you record and listen to any type of audio you want to remember later. It listens, summarizes what has been said, saves the recording for playback, and categorizes it, all powered by AI.

2026-02-17
53
JustScribe
On-device instant voice transcription
Privacy, Audio

JustScribe is a privacy-first live transcription app for macOS. Instant, offline speech-to-text powered by AI. No cloud, no data collection. Your voice, your data.

2026-02-17
62
NVIDIA PersonaPlex
Natural Conversational AI With Any Role and Voice
Open Source, Artificial Intelligence, GitHub, Audio

We introduce PersonaPlex, a full-duplex conversational AI model that enables natural conversations with customizable voices and roles. PersonaPlex handles interruptions and backchannels while maintaining any chosen persona, outperforming existing systems on conversational dynamics and task adherence.

2026-02-16
65
Obi
Repeatable 1:1 onboarding call
Customer Success, Artificial Intelligence, Audio

Onboard every user like it’s your best live call. Obi is a voice AI agent that talks users through setup, answers questions in real time, and shares insights after every session. No clunky tours or videos—just real conversation, 24/7, at any scale.

2026-02-06
61
Voxtral Transcribe 2 by Mistral
Real-time speech-to-text with speaker diarization
Android, Developer Tools, Artificial Intelligence, Audio

Voxtral Transcribe 2 delivers ultra-fast, highly accurate speech-to-text with real-time transcription and speaker diarization. Built for live apps, voice agents, and meetings, it supports 13 languages, word-level timestamps, and privacy-first deployment, all at industry-leading speed and cost.

2026-02-05
84
HyNote End-to-End Publish
Turns any meeting, audio, or file into clear written notes
Android, Notes, Meetings, Audio

HyNote is an end-to-end knowledge tool that transforms raw data into polished results. By managing the entire lifecycle of your information, from capturing diverse inputs like audio and video to generating professional-grade exports, it functions as a true second brain. This streamlined workflow ensures that your insights aren't just stored but evolve from initial sparks into actionable, shareable outputs.

2026-02-03
52
