
Omnilingual ASR

Advancing automatic speech recognition for 1,600+ languages

2025-11-11

Product Introduction

  1. Meta's Omnilingual ASR is an open-source automatic speech recognition (ASR) system designed to transcribe speech across 1,600+ languages, including 500 low-resource languages previously unsupported by AI. It leverages a large language model (LLM)-based architecture that enables expansion to new languages with minimal in-context examples, eliminating the need for full retraining. The system is released under the Apache 2.0 license, ensuring accessibility for researchers, developers, and communities.
  2. The core value of Omnilingual ASR lies in its ability to democratize speech technology by bridging the digital divide for underrepresented languages. It achieves state-of-the-art transcription quality at an unprecedented scale while empowering global communities to extend support to their native languages with minimal resources.

Main Features

  1. Omnilingual ASR provides a suite of models ranging from lightweight 300M-parameter versions for edge devices to high-accuracy 7B-parameter models, all trained on a multilingual corpus spanning 1,600+ languages. The architecture pairs a 7B-parameter wav2vec 2.0 speech encoder with two decoder variants (CTC and a transformer-based LLM-ASR decoder), achieving character error rates (CER) below 10% for 78% of supported languages. A minimal usage sketch follows this list.
  2. The Omnilingual ASR Corpus, released under CC-BY license, includes transcribed speech data for 350 underserved languages, curated through partnerships with global organizations and native speakers. This dataset represents the largest ultra-low-resource spontaneous speech collection publicly available for ASR research.
  3. The system introduces in-context learning capabilities inspired by LLMs, allowing users to adapt the model to new languages by providing only a few audio-text pairs without retraining. This feature enables rapid deployment for languages with no prior digital footprint or labeled datasets.
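
As a quick illustration of the model suite in practice, the Python sketch below loads a released checkpoint and transcribes a single recording. It is a minimal sketch only: the import path, model identifier, pipeline class, and method signature are assumptions made for illustration, not the project's confirmed API; the official repository documents the actual entry points.

```python
# Minimal transcription sketch. The import path, model name, and method
# signatures below are assumed for illustration; consult the official
# omnilingual-asr repository for the real API.
import torchaudio  # audio I/O from the PyTorch ecosystem

from omnilingual_asr import ASRPipeline  # hypothetical high-level wrapper

# Choose a checkpoint to match the deployment target, e.g. a lightweight
# 300M-parameter CTC model for edge devices or the 7B LLM-ASR model for
# maximum accuracy on a GPU (model names here are placeholders).
pipeline = ASRPipeline(model="omniASR-LLM-7B", device="cuda")

# Load a recording and transcribe it, passing a language code so the decoder
# conditions on the intended language and script (code format is also assumed).
waveform, sample_rate = torchaudio.load("recording.wav")
text = pipeline.transcribe(waveform, sample_rate=sample_rate, lang="bho")  # Bhojpuri
print(text)
```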

Problems Solved

  1. Omnilingual ASR addresses the exclusion of low-resource languages from mainstream ASR systems, which traditionally require large labeled datasets and expert fine-tuning. It eliminates dependency on high-resource language data, enabling scalable transcription for languages with limited digital representation.
  2. The product targets linguistically marginalized communities, AI researchers working on multilingual systems, and developers building inclusive speech-to-text applications. It also serves organizations advocating for digital preservation of endangered languages.
  3. Typical use cases include transcribing oral histories in remote dialects, enabling voice interfaces for regional languages in healthcare/education, and supporting real-time communication tools for multilingual populations. Researchers can leverage the corpus and models to advance underrepresented language NLP tasks.

Unique Advantages

  1. Unlike conventional ASR systems limited to ~100 high-resource languages, Omnilingual ASR supports 1,600+ languages with specialized optimization for 500 previously untranscribed languages. Its architecture combines self-supervised speech representations (wav2vec 2.0) with LLM decoders, a hybrid approach unseen in prior multilingual ASR implementations.
  2. The system innovates through community-driven extensibility: users can add new languages with fewer than 10 in-context examples via the LLM-ASR decoder, reducing adaptation costs by 90% compared to traditional fine-tuning methods (an adaptation sketch follows this list). The 7B-parameter wav2vec 2.0 model also sets a new benchmark for speech representation learning at scale.
  3. Competitive advantages include open-source availability (Apache 2.0), integration with PyTorch’s fairseq2 framework, and pre-trained models optimized for both high-resource and ultra-low-resource languages. The inclusion of a language exploration demo and transcription tool further lowers adoption barriers.
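
To make the in-context extension workflow concrete, the hedged Python sketch below adapts the LLM-ASR decoder to a new language using a handful of paired examples and no gradient updates. The pipeline class, the `context` argument, and the file paths are hypothetical stand-ins; only the overall workflow of passing a few audio-text pairs alongside the target recording reflects the capability described above.

```python
# Hypothetical sketch of few-shot language extension: the class names, the
# `context` argument, and the file paths are placeholders, not the confirmed API.
import torchaudio

from omnilingual_asr import ASRPipeline  # hypothetical wrapper, as in the earlier sketch

# Fewer than 10 paired audio-text samples in the target language act like
# few-shot prompts for a text LLM: they condition the decoder at inference time.
context_pairs = [
    ("examples/utt01.wav", "reference transcript for utterance one"),
    ("examples/utt02.wav", "reference transcript for utterance two"),
    ("examples/utt03.wav", "reference transcript for utterance three"),
]
prompt = [(torchaudio.load(path)[0], text) for path, text in context_pairs]

pipeline = ASRPipeline(model="omniASR-LLM-7B", device="cuda")

# Transcribe an unseen recording in the new language, conditioned on the
# in-context pairs instead of fine-tuned weights.
waveform, sample_rate = torchaudio.load("new_language_utterance.wav")
hypothesis = pipeline.transcribe(
    waveform, sample_rate=sample_rate, context=prompt  # `context` is assumed
)
print(hypothesis)
```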

Frequently Asked Questions (FAQ)

  1. What languages does Omnilingual ASR support? Omnilingual ASR natively supports 1,600+ languages, including 500 low-resource languages like Akan, Bhojpuri, and Quechua, with continuous expansion through community contributions. Users can verify coverage via the language exploration demo or extend the system to unsupported languages using in-context examples.
  2. How can I add a new language to the model? Provide 5-10 paired audio-text samples in your target language through the provided API, which triggers the LLM-ASR decoder’s in-context learning without requiring model retraining. Full fine-tuning remains optional for optimizing performance.
  3. What computational resources are required for deployment? The 300M-parameter model runs on edge devices with 2GB of RAM, while the 7B model requires cloud GPUs for real-time inference. All models are compatible with ONNX Runtime for hardware optimization.
  4. Is the training data publicly accessible? The Omnilingual ASR Corpus, which covers transcribed speech in 350 languages, is available under a CC-BY license, while additional data from partners such as Mozilla Common Voice follows their respective licenses.
  5. How does the performance compare to existing ASR systems? For 78% of supported languages, Omnilingual ASR achieves a CER below 10%, a roughly 40% relative error reduction over prior multilingual models such as Whisper in low-resource settings. Benchmark details are provided in the accompanying paper, and a reference CER computation follows this list.
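
For reference, character error rate is the Levenshtein edit distance between a hypothesis and a reference transcript, normalized by the reference length. The self-contained Python function below computes it, so figures such as "CER below 10%" can be reproduced on your own evaluation pairs; it makes no assumptions about the Omnilingual ASR API.

```python
# Character error rate (CER): edit distance (substitutions, insertions,
# deletions) between hypothesis and reference characters, divided by the
# number of reference characters. CER below 0.10 means fewer than one
# character error per ten reference characters.
def cer(reference: str, hypothesis: str) -> float:
    """Return the character error rate of `hypothesis` against `reference`."""
    ref, hyp = list(reference), list(hypothesis)
    prev = list(range(len(hyp) + 1))  # edit distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)

if __name__ == "__main__":
    # One dropped letter and one substituted letter over 18 reference
    # characters gives a CER of about 0.11.
    print(cer("omnilingual speech", "omnilingal speach"))
```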
