Product Introduction
Definition: Mistral Medium 3.5 is a flagship 128B dense Large Language Model (LLM) designed specifically for advanced agentic workflows. It serves as a unified model architecture that merges sophisticated instruction-following, logical reasoning, and high-tier coding capabilities into a single set of weights. Engineered for high-performance self-hosted inference, it supports a massive 256k token context window and features a vision encoder trained from scratch to process variable image resolutions and aspect ratios.
Core Value Proposition: Mistral Medium 3.5 exists to bridge the gap between high-parameter proprietary models and the flexibility of open-weights ecosystems. It provides enterprise-grade performance for long-horizon tasks—such as remote software engineering and complex research synthesis—while maintaining a footprint that allows for deployment on as few as four high-end GPUs. This allows organizations to run state-of-the-art agentic systems without compromising on data sovereignty or suffering from the latency of centralized API bottlenecks.
Main Features
Unified 128B Dense Architecture: Unlike Mixture-of-Experts (MoE) models, Mistral Medium 3.5 utilizes a dense 128B parameter structure that provides consistent reasoning depth across all tokens. This architecture enables the model to excel in "Merged Tasking," where it must simultaneously follow complex instructions, reason through logical fallacies, and generate production-ready code. It is optimized for 256k context windows, allowing it to ingest entire repositories or extensive document sets for analysis.
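As a rough illustration of what a 256k-token window means in practice, the sketch below budgets a set of documents against the context limit. The 4-characters-per-token heuristic, the output reserve, and the file sizes are assumptions for illustration, not Mistral tooling.

```python
# Rough context-budget check for a 256k-token window (illustrative only).
# The chars-per-token ratio is a common heuristic, not a Mistral tokenizer.
CONTEXT_WINDOW = 256_000   # tokens supported by the model
RESERVED_OUTPUT = 8_000    # tokens kept free for the model's reply (assumed)
CHARS_PER_TOKEN = 4        # crude estimate for English text and code

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(documents: list[str]) -> bool:
    """True if all documents plus the output reserve fit in the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + RESERVED_OUTPUT <= CONTEXT_WINDOW

# Example: ~900k characters of source ≈ 225k tokens — still fits.
repo_files = ["x" * 300_000, "y" * 300_000, "z" * 300_000]
print(fits_in_context(repo_files))  # → True
```

A real pipeline would use the model's own tokenizer for exact counts; this sketch only shows the budgeting logic.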
Mistral Vibe Remote Coding Agents: This feature offloads coding tasks from the user's local machine to cloud-based isolated sandboxes. Using the Mistral Vibe CLI or the Le Chat interface, developers can spawn async agents that perform module refactors, test generation, and dependency upgrades in parallel. A unique "Teleport" capability allows users to transition a local CLI session to the cloud, maintaining task state and session history while freeing up local resources.
Configurable Reasoning Effort: The model introduces a granular "reasoning effort" parameter that can be adjusted per request. For simple conversational tasks, the model operates at lower latency, while for complex "agentic runs"—such as multi-step research or architectural planning—the model increases its internal compute passes to ensure higher accuracy and fewer hallucinations in tool-calling sequences.
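A request-payload sketch of what a per-request reasoning control might look like. The field name `reasoning_effort`, its allowed values, and the payload shape are assumptions for illustration; consult the official API reference for the actual parameter.

```python
import json

def build_chat_request(prompt: str, effort: str = "low") -> dict:
    """Assemble a chat-completion payload with a per-request effort setting.

    NOTE: `reasoning_effort` and its values ("low"/"medium"/"high") are
    illustrative assumptions, not a confirmed Mistral API field.
    """
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "mistral-medium-3.5",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # low latency vs. deeper compute passes
    }

# Simple chat: keep latency low. Agentic planning run: raise the effort.
quick = build_chat_request("Summarize this paragraph.", effort="low")
deep = build_chat_request("Plan a multi-step refactor of this service.",
                          effort="high")
print(json.dumps(deep, indent=2))
```

The point of the pattern is that effort is chosen per request, so a single deployment can serve both latency-sensitive chat and accuracy-sensitive agentic runs.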
Le Chat "Work Mode" Agentic Harness: Powered by Mistral Medium 3.5, Work Mode transforms the standard chatbot into a proactive assistant capable of multi-step task execution. It utilizes a new execution backend that allows the agent to call multiple tools in parallel, read/write to connected documents, and navigate cross-app workflows (e.g., pulling data from Sentry and Jira to draft a summary in Slack) until a user-defined objective is met.
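The parallel tool-calling pattern described above can be sketched with stub tools and a thread pool. The tool functions standing in for Sentry and Jira connectors, and their return values, are hypothetical stand-ins, not real integrations.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stub tools standing in for real Sentry/Jira connectors.
def fetch_sentry_errors(project: str) -> list[str]:
    return [f"{project}: NullPointerException in checkout"]

def fetch_jira_tickets(board: str) -> list[str]:
    return [f"{board}: BUG-142 payment flow broken"]

def run_tools_in_parallel(calls):
    """Dispatch independent tool calls concurrently, as an agentic
    harness might, and collect their results in call order."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in calls]
        return [f.result() for f in futures]

results = run_tools_in_parallel([
    (fetch_sentry_errors, "storefront"),
    (fetch_jira_tickets, "payments"),
])
summary = "Draft summary:\n" + "\n".join(line for r in results for line in r)
print(summary)
```

In an actual harness the model would decide which tools to call and then consume the collected results to draft the cross-app summary; only the concurrent-dispatch step is shown here.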
Problems Solved
The Local Development Bottleneck: Traditional coding agents are often restricted to the developer's local laptop, consuming significant CPU/RAM and preventing parallel tasking. Mistral Medium 3.5 and Vibe solve this by moving these agents to the cloud, allowing developers to trigger multiple agents simultaneously and receive notifications only when the PR is ready for review.
Fragile Multi-Step Tool Calling: Standard LLMs often fail when required to call multiple tools in a specific sequence over long durations. Mistral Medium 3.5 addresses this with strong long-horizon tool use, reflected in its high score on the τ³-Telecom benchmark (91.4), which keeps agentic workflows stable during trial-and-error processes and complex API interactions.
Target Audience
- Software Engineers and DevOps Teams: Who need to automate repetitive coding tasks like unit testing, bug fixing, and CI/CD investigations.
- Data Analysts and Researchers: Who require an agent capable of synthesizing information across internal documents, web sources, and proprietary databases.
- Enterprise Architects: Seeking a high-performance 128B model that can be self-hosted on private infrastructure (4x GPUs) for security and compliance.
Use Cases
- Automated Refactoring: Initiating a Vibe session to update a legacy codebase to a newer framework version.
- Triage and Incident Response: Using Work Mode to scan Sentry logs, correlate them with recent GitHub commits, and draft a report for the engineering team.
- Executive Synthesis: Automatically pulling context from emails, calendars, and meeting notes to generate a comprehensive briefing document.
Unique Advantages
Optimized Self-Hosting (4-GPU Deployment): While many models of this caliber require massive server clusters, Mistral Medium 3.5 is optimized to run on as few as four GPUs. This makes it the premier choice for engineers and teams running self-hosted inference who need flagship performance without the infrastructure overhead of 400B+ parameter models.
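As a back-of-the-envelope check on the four-GPU claim, the sketch below estimates weight memory for a 128B dense model at common precisions. It counts weights only (no KV cache or activations), so treat it as a lower bound under stated assumptions.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (decimal)."""
    return params_billion * bytes_per_param  # 1B params × N bytes = N GB

PARAMS_B = 128  # dense parameter count from the spec above

for precision, nbytes in [("fp16", 2), ("fp8", 1), ("int4", 0.5)]:
    total = weight_memory_gb(PARAMS_B, nbytes)
    per_gpu = total / 4  # spread across a 4-GPU node
    print(f"{precision}: {total:.0f} GB total, {per_gpu:.0f} GB per GPU")
```

At fp8, for example, the weights alone come to roughly 32 GB per GPU on a four-GPU node, which is plausible for high-end accelerators once KV cache and activation overhead are added on top.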
Performance-to-Size Efficiency: Mistral Medium 3.5 outperforms significantly larger models, such as Qwen3.5 397B, on critical coding benchmarks like SWE-Bench Verified (scoring 77.6%). It delivers elite coding intelligence at a fraction of the computational cost of its competitors.
Modified MIT License for Open Weights: By providing open weights under a modified MIT license, Mistral AI enables a level of transparency and customization that proprietary "black box" models cannot match. Developers can fine-tune the model for specific enterprise domains while benefiting from the pre-trained flagship weights.
Frequently Asked Questions (FAQ)
What is the context window size for Mistral Medium 3.5? Mistral Medium 3.5 features a 256k token context window. This large capacity allows the model to process extensive documentation, large-scale codebases, and long conversational histories without losing track of information or requiring aggressive truncation.
How much does the Mistral Medium 3.5 API cost? The model is priced competitively for high-tier performance. Through the Mistral AI API, input tokens are priced at $1.5 per million, and output tokens are priced at $7.5 per million. It is also available via the Pro, Team, and Enterprise plans on Le Chat.
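Using the listed rates ($1.5 per million input tokens, $7.5 per million output tokens), the cost of a single request works out as:

```python
INPUT_RATE = 1.5 / 1_000_000   # USD per input token (listed rate)
OUTPUT_RATE = 7.5 / 1_000_000  # USD per output token (listed rate)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one API call at the listed per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a long-context call with 200k input tokens and 4k output tokens.
cost = request_cost(200_000, 4_000)
print(f"${cost:.2f}")  # → $0.33
```

Note how output tokens dominate at 5x the input rate, so verbose agentic runs cost disproportionately more than long-context reads.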
Can Mistral Medium 3.5 be used for vision-based tasks? Yes. Mistral Medium 3.5 includes a vision encoder trained from scratch. Unlike models that use off-the-shelf encoders, this implementation is designed to handle variable image sizes and aspect ratios, making it highly effective for analyzing technical diagrams, UI screenshots, and complex visual data.
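A multimodal request typically mixes text and image content parts in one message. The content schema below (an `image_url` part carrying a base64 data URI) follows common chat-API conventions and is an assumption here, not confirmed Mistral wire format.

```python
import base64

def build_vision_message(question: str, image_bytes: bytes) -> dict:
    """Pack a question plus an image into one user message.

    NOTE: the "image_url"/data-URI content schema is a widely used
    convention assumed for illustration; check the official API docs
    for the real request shape.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{encoded}"}},
        ],
    }

msg = build_vision_message("What does this architecture diagram show?",
                           b"\x89PNG...")  # placeholder bytes, not a real PNG
print(msg["content"][0]["text"])
```

Because the encoder accepts variable resolutions and aspect ratios, no client-side resizing or letterboxing step is shown here.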
How does Mistral Vibe "Teleport" work? The Teleport feature allows a developer to move an active coding session from their local CLI to Mistral’s remote cloud infrastructure. This transfer includes the entire session history, task state, and pending approvals, allowing the agent to continue working autonomously while the developer disconnects or moves to another task.
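Conceptually, Teleport means serializing the session so the remote side can resume it. The sketch below models that transferred state as a plain dataclass; the class and field names are illustrative, not Vibe's actual session format.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class VibeSession:
    """Illustrative model of what a teleported session must carry.
    Field names are assumptions, not the real Vibe session schema."""
    task: str
    history: list[str] = field(default_factory=list)
    pending_approvals: list[str] = field(default_factory=list)

    def teleport_payload(self) -> str:
        """Serialize the full session state for transfer to the cloud."""
        return json.dumps(asdict(self))

local = VibeSession(
    task="Upgrade dependency pins and rerun the test suite",
    history=["listed outdated packages", "proposed 3 version bumps"],
    pending_approvals=["bump requests to latest minor"],
)
# The remote side reconstructs the session and continues autonomously.
remote = VibeSession(**json.loads(local.teleport_payload()))
print(remote == local)  # → True
```

The design point is that everything the agent needs (task, history, approvals) round-trips through the payload, so the local CLI can disconnect without losing state.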
