Product Introduction
- Definition: LFM2.5 is a family of open-weight, device-optimized foundation models (1.2B-1.6B parameters) for edge AI deployment, available in Base, Instruct, Japanese (JP), Vision-Language (VL), and Audio-Language variants. It belongs to the class of small-scale language models (LLMs) engineered specifically for resource-constrained hardware.
- Core Value Proposition: LFM2.5 delivers private, low-latency, always-available AI intelligence directly on edge devices (smartphones, vehicles, IoT), eliminating cloud dependency. It provides state-of-the-art performance in the 1B parameter class across text, vision, audio, and multilingual tasks while maintaining minimal memory footprint and high inference speed.
Main Features
Hybrid Architecture for Edge Efficiency:
How it works: Builds on the device-optimized LFM2 architecture, combining transformer components with custom blocks to reduce computational overhead. Uses quantization-aware training (QAT) down to INT4 precision and ships optimized kernels and export formats for llama.cpp (GGUF), MLX (Apple Silicon), vLLM (GPU), and ONNX; a minimal CPU deployment sketch follows the specs below.
Technical Specs: Achieves inference speeds up to 439 tokens/sec on Qualcomm NPUs (NexaML) and 2,975 tokens/sec prefill on AMD Ryzen CPUs (llama.cpp Q4_0), with memory usage as low as 56MB on CPU.
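For illustration, here is a minimal CPU inference sketch using the llama-cpp-python bindings with a Q4_0 GGUF export, as referenced above. The GGUF filename, context size, and sampling settings are assumptions rather than official defaults.

```python
# Minimal CPU inference sketch with llama-cpp-python (GGUF, Q4_0 quantization).
# The model filename below is an assumed local export, not an official artifact name.
from llama_cpp import Llama

llm = Llama(
    model_path="LFM2.5-1.2B-Instruct-Q4_0.gguf",  # assumed GGUF file on local disk
    n_ctx=4096,                                   # context window for this session
    n_threads=8,                                  # CPU threads to use
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize today's sensor log in one sentence."}],
    max_tokens=128,
    temperature=0.3,
)
print(result["choices"][0]["message"]["content"])
```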
Advanced Multimodal Capabilities:
Vision-Language (LFM2.5-VL-1.6B): Features enhanced multi-image comprehension and multilingual vision understanding (Arabic, Chinese, French, German, Japanese, Korean, Spanish). Benchmarks show significant gains over predecessors (e.g., 50.67 on MMStar vs. 49.87 for LFM2-VL); a multi-image usage sketch follows this block.
Audio-Language (LFM2.5-Audio-1.5B): Processes audio natively (speech input/output) without separate ASR/TTS pipelines. Uses a custom LFM-based audio detokenizer, achieving 8x faster waveform generation than LFM2 on mobile CPUs (INT4 QAT) with minimal quality loss (STOI 0.89, UTMOS 3.53).
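As a rough sketch of the multi-image workflow mentioned above, the snippet below assumes the VL checkpoint is exposed through the standard Hugging Face transformers image-text-to-text interface (as earlier LFM2-VL releases are); the repo id, image files, and prompt are placeholders.

```python
# Multi-image, multilingual prompt sketch via the transformers
# image-text-to-text interface. Repo id and inputs are assumptions.
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "LiquidAI/LFM2.5-VL-1.6B"  # assumed Hugging Face repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

conversation = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("receipt_page1.png")},
        {"type": "image", "image": Image.open("receipt_page2.png")},
        {"type": "text", "text": "Do both receipts show the same total? Answer in Spanish."},
    ],
}]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```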
Domain-Specialized Models:
Japanese Optimization (LFM2.5-1.2B-JP): A dedicated chat model fine-tuned for Japanese linguistic and cultural nuance, outperforming generalist models such as Qwen3-1.7B on Japanese benchmarks (JMMLU: 50.7 vs. 47.7).
Instruction Tuning (LFM2.5-1.2B-Instruct): Trained on 28T tokens and refined with multi-stage reinforcement learning (RL) for superior instruction following and tool use. Leads 1B-class models on GPQA (38.89), MMLU-Pro (44.35), and IFEval (86.23).
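One way tool use could look with the Instruct model is through the transformers chat-template `tools` argument, sketched below. The repo id and the tire-pressure tool are hypothetical, and whether the released chat template accepts `tools` should be checked against the model card.

```python
# Tool-use sketch via the transformers chat-template `tools` argument.
# The repo id and the tool function are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def get_tire_pressure(wheel: str) -> float:
    """Return the current tire pressure in PSI.

    Args:
        wheel: One of "front_left", "front_right", "rear_left", "rear_right".
    """
    return 27.5  # stubbed sensor reading for this sketch

messages = [{"role": "user", "content": "Is the front-left tire under-inflated?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_tire_pressure],    # advertised to the model as a callable tool
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```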
Problems Solved
- Privacy-Sensitive AI Deployment: Enables fully private on-device AI for healthcare, finance, and enterprise applications, ensuring sensitive data never leaves the user’s device. Eliminates cloud latency and security risks.
- Latency-Critical Edge Applications: Solves high-latency issues in real-time systems (e.g., autonomous vehicles, industrial robots) with sub-100ms audio response times and CPU-optimized inference. Because the audio model processes speech natively, it avoids chained ASR/TTS stages and cuts end-to-end latency substantially.
- Hardware-Constrained AI: Addresses the memory and compute limits of smartphones, IoT devices (Qualcomm Dragonwing), and embedded systems via sub-100MB memory footprints, INT4 quantization, and NPU-optimized kernels (AMD/Xilinx, Qualcomm NexaML).
Target Audience
- Automotive Engineers: For in-car assistants requiring low-latency voice control and offline operation.
- Mobile App Developers: Building local AI copilots, translation tools, or camera-based features on iOS/Android (via LEAP SDK).
- Japanese-Localization Specialists: Creating culturally accurate chatbots and productivity tools.
- IoT Device Manufacturers: Needing efficient voice/vision AI for smart home/industrial sensors.
- Enterprise IT Teams: Deploying confidential document analysis or customer service agents on-premises.
Unique Advantages
- Performance-Per-Watt Leader: Outperforms competitors like Llama 3.2 1B Instruct (+22.32 on GPQA), Gemma 3 1B IT (+14.65 on MMLU-Pro), and Qwen3-1.7B (+4.04 on IFEval) while using 3-5x less memory than Qwen3-1.7B (19MB vs. 306MB on Galaxy S25).
- Native Multimodal Integration: Unified audio-language processing (no transcription/TTS chaining) and multilingual vision comprehension are an industry first for sub-2B models, enabled by Liquid AI’s custom detokenizers and hybrid training pipelines.
Frequently Asked Questions (FAQ)
- Where can I download LFM2.5 models?
All open-weight LFM2.5 models (Base, Instruct, JP, VL, Audio) are available on Hugging Face, LEAP (Liquid’s deployment platform), and Amazon Bedrock. GGUF/MLX/ONNX formats are supported.
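For example, a checkpoint can be fetched locally with huggingface_hub; the repo id below is an assumption, so check the LiquidAI organization page for the exact names.

```python
# Download sketch with huggingface_hub. The repo id is an assumed placeholder.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="LiquidAI/LFM2.5-1.2B-Instruct",  # replace with the actual repo id
    allow_patterns=["*.gguf"],                # fetch only GGUF files, if published
)
print(local_dir)  # local path to the downloaded files
```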
- What hardware supports LFM2.5 optimization?
LFM2.5 is optimized for Qualcomm NPUs (Snapdragon X Elite/Gen4), AMD Ryzen AI NPUs, Apple Silicon (via MLX), and NVIDIA GPUs (via vLLM). CPU deployment uses llama.cpp.
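For GPU serving, a minimal vLLM sketch might look like the following; the repo id is assumed and should be replaced with the actual checkpoint.

```python
# Minimal GPU inference sketch with vLLM. The repo id is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="LiquidAI/LFM2.5-1.2B-Instruct")      # assumed Hugging Face repo id
params = SamplingParams(temperature=0.3, max_tokens=128)

outputs = llm.chat(
    [{"role": "user", "content": "List three on-device AI use cases for retail."}],
    params,
)
print(outputs[0].outputs[0].text)
```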
- How does LFM2.5-JP improve Japanese AI applications?
It achieves SOTA Japanese benchmark scores (JMMLU: 50.7) via specialized training, making it ideal for culturally nuanced tasks like customer service bots, content moderation, and local productivity tools.
- Can LFM2.5-Audio run offline on mobile devices?
Yes, the INT4-quantized audio model runs natively on smartphones (e.g., Samsung Galaxy S25) using <1GB RAM, generating speech 8x faster than LFM2 with high fidelity (UTMOS 3.53).
- Is LFM2.5 suitable for enterprise customization?
Absolutely. The LFM2.5-Base model is designed for fine-tuning on proprietary data using LEAP, enabling custom on-device agents for finance, healthcare, or industrial use cases without cloud dependency.
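As one possible route outside LEAP, a generic LoRA fine-tuning sketch with the Hugging Face trl and peft libraries is shown below; the repo id, dataset file, and hyperparameters are placeholders and not Liquid AI's recommended recipe.

```python
# Generic LoRA fine-tuning sketch with trl + peft (not LEAP's own workflow).
# Repo id, dataset file, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

model_id = "LiquidAI/LFM2.5-1.2B"  # assumed Base checkpoint id
# Assumes a JSON Lines file with a "messages" column in chat format.
dataset = load_dataset("json", data_files="proprietary_chats.jsonl", split="train")

trainer = SFTTrainer(
    model=model_id,                     # SFTTrainer loads the model from its id
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
    args=SFTConfig(output_dir="lfm25-custom-lora", num_train_epochs=1),
)
trainer.train()
trainer.save_model("lfm25-custom-lora")  # adapter weights stay on-premises
```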
