MiMo
Xiaomi's Open Source Model, Born for Reasoning
Open Source · Artificial Intelligence · GitHub
2025-04-30

Product Introduction

  1. MiMo is an open-source large language model (LLM) series licensed under Apache 2.0, explicitly designed to excel in reasoning tasks such as mathematics and code generation. It includes pre-trained, supervised fine-tuned (SFT), and reinforcement learning (RL)-tuned models, with the 7B parameter version demonstrating performance comparable to larger models like OpenAI's o1-mini. The series emphasizes a holistic approach to model development, spanning data preprocessing, synthetic data generation, and advanced RL optimization techniques.
  2. The core value of MiMo lies in its ability to unlock the inherent reasoning potential of language models through a dual focus on pre-training and post-training strategies. By optimizing data quality, training objectives, and reward mechanisms, MiMo achieves state-of-the-art performance in mathematical and coding benchmarks while maintaining efficiency in a compact 7B parameter size.

Main Features

  1. MiMo-7B-Base incorporates Multi-Token Prediction (MTP) as a training objective, letting the model predict several tokens per step at inference time to accelerate generation without sacrificing accuracy. This is supported by custom modifications to inference engines such as vLLM, which integrate speculative decoding for latency reduction; a deployment sketch follows this list.
  2. The RL training framework uses test difficulty-driven rewards for code problems, assigning granular scores based on the complexity of test cases to provide dense optimization signals. This addresses the sparse-reward problem of traditional RL setups and improves policy convergence on intricate coding tasks; a reward-shaping sketch also follows this list.
  3. A three-stage pre-training strategy combines filtered web data, synthetic reasoning datasets, and domain-specific corpora, totaling 25 trillion tokens. The pipeline employs multi-dimensional data filtering and proprietary text extraction tools to maximize reasoning pattern density in training data.
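
A minimal deployment sketch, assuming the Hugging Face model id XiaomiMiMo/MiMo-7B-RL and a vLLM build that already carries MiMo's MTP/speculative-decoding support; with stock vLLM the same offline API works, just without the MTP speed-up.

```python
# Sketch only: serving MiMo through vLLM's offline inference API.
# "XiaomiMiMo/MiMo-7B-RL" is the assumed Hugging Face model id; a vLLM build
# with MiMo's MTP patches adds the speculative-decoding speed-up, but the
# call itself is plain vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="XiaomiMiMo/MiMo-7B-RL",  # assumed model id
    trust_remote_code=True,         # MiMo ships custom model code
)

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

prompt = "Prove that the sum of the first n odd numbers is n^2."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```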
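
Xiaomi has not published the exact reward formula; the snippet below is a hypothetical sketch of a difficulty-weighted test reward in which harder test cases carry more credit, so a partially correct program still receives a dense, informative signal.

```python
# Hypothetical sketch of a test difficulty-driven reward for code RL.
# `difficulties` holds one weight per test case (e.g. derived from how often
# reference solutions fail it); `passed` flags which cases the candidate
# program passed. Names and weighting scheme are illustrative only.
from typing import Sequence

def difficulty_weighted_reward(difficulties: Sequence[float],
                               passed: Sequence[bool]) -> float:
    """Return a dense reward in [0, 1]; harder tests contribute more."""
    total = sum(difficulties)
    if total == 0:
        return 0.0
    earned = sum(d for d, ok in zip(difficulties, passed) if ok)
    return earned / total

# Three easy tests pass, the single hard test fails -> partial credit.
print(difficulty_weighted_reward([1.0, 1.0, 1.0, 5.0],
                                 [True, True, True, False]))  # 0.375
```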

Problems Solved

  1. MiMo addresses the industry-wide challenge of building small-scale models (7B parameters) that excel simultaneously in mathematical reasoning and code generation, domains traditionally dominated by 32B+ models, narrowing the usual trade-off between model size and multi-domain reasoning capability.
  2. The product targets AI researchers, developers, and enterprises requiring cost-efficient LLMs for STEM education tools, automated code review systems, and competition-level mathematics problem solvers.
  3. Typical use cases include generating verifiable solutions for International Mathematical Olympiad (IMO)-level problems, synthesizing executable Python code for LiveCodeBench challenges, and powering low-latency reasoning APIs through vLLM-optimized deployment.

Unique Advantages

  1. Unlike conventional models that focus solely on post-training RL, MiMo implements reasoning-centric pre-training through synthetic data augmentation and MTP objectives, establishing superior base model capabilities before fine-tuning. This results in a 93.6% Pass@1 rate on MATH-500 after RL-from-base training, outperforming many SFT-then-RL approaches.
  2. The Seamless Rollout Engine reduces GPU idle time by 56% through continuous rollout generation, asynchronous reward computation, and early termination of low-reward trajectories; see the sketch after this list. This infrastructure innovation enables 2.29× faster RL training cycles compared to standard implementations.
  3. Competitive advantages include surpassing OpenAI o1-mini on MATH-500 (95.8% vs. 90.6%) with an estimated 4.6× fewer parameters, and achieving 57.8% Pass@1 on LiveCodeBench v5 versus 53.8% for o1-mini. The Apache 2.0 license further enables commercial deployment without restrictive usage terms.
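
The rollout engine is described only at a high level, so the sketch below merely illustrates the general idea of overlapping rollout generation with asynchronous reward computation and discarding low-reward trajectories, using standard asyncio primitives and hypothetical generate_rollout / score_rollout stubs.

```python
# Illustrative sketch of continuous rollouts with asynchronous rewards, in the
# spirit of the Seamless Rollout Engine described above. generate_rollout and
# score_rollout are hypothetical stand-ins for real inference and grading.
import asyncio
import random

async def generate_rollout(prompt: str) -> str:
    await asyncio.sleep(0.1)   # stand-in for model inference
    return f"completion for {prompt!r}"

async def score_rollout(completion: str) -> float:
    await asyncio.sleep(0.05)  # stand-in for running unit tests
    return random.random()

async def process(prompt: str, min_reward: float):
    completion = await generate_rollout(prompt)
    reward = await score_rollout(completion)
    # Drop low-reward trajectories early so they never reach the optimizer.
    return (completion, reward) if reward >= min_reward else None

async def main() -> None:
    prompts = [f"problem-{i}" for i in range(8)]
    # Each rollout is scored as soon as it finishes rather than after the
    # whole batch, which is what keeps the GPUs busy.
    results = await asyncio.gather(*(process(p, 0.3) for p in prompts))
    kept = [r for r in results if r is not None]
    print(f"kept {len(kept)}/{len(prompts)} trajectories")

asyncio.run(main())
```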

Frequently Asked Questions (FAQ)

  1. How does MiMo-7B-RL compare to GPT-4 in mathematical reasoning? MiMo-7B-RL achieves 95.8% Pass@1 on MATH-500 versus GPT-4's 74.6%, with specialized training on competition-level problems and synthetic data. However, GPT-4 maintains broader general knowledge coverage.
  2. Can MiMo be used commercially under Apache 2.0? Yes, the Apache 2.0 license permits commercial use, modification, and distribution, provided copyright notices and license terms are retained.
  3. What hardware is required for deployment? The base model runs on a single A10 GPU (24GB VRAM) using vLLM, while the RL-tuned model with MTP speculative decoding enabled requires 2×A10 GPUs. Quantized versions for edge deployment will be released in Q3 2025.
