Mercury Edit 2

Ultra-fast next-edit prediction for coding

2026-04-04

Product Introduction

  1. Definition: Mercury Edit 2 is a specialized diffusion-based Large Language Model (dLLM) designed specifically for next-edit prediction in software development. Unlike traditional autoregressive models that generate tokens sequentially, Mercury Edit 2 utilizes a diffusion architecture to generate code changes in parallel, optimizing it for the high-concurrency and low-latency requirements of modern Integrated Development Environments (IDEs).

  2. Core Value Proposition: Mercury Edit 2 exists to provide developers with near-instantaneous, high-fidelity code modifications based on real-time context. By prioritizing "next-edit" logic over simple line completion, it anticipates the developer's intent, reducing cognitive friction and manual typing. Its primary value lies in its superior latency-to-quality ratio, outperforming frontier models in both speed and edit acceptance rates.

Main Features

  1. Diffusion-Based Token Generation: Mercury Edit 2 leverages a dLLM architecture, which allows the model to predict and generate multiple tokens simultaneously across a code block rather than one at a time. This parallelization significantly reduces time-to-first-token and overall inference latency, making AI suggestions feel instantaneous and integrated into the developer's natural typing cadence.
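The idea of filling many positions per pass can be sketched with a toy "iterative unmasking" loop. Everything here is illustrative — the function names, the random confidence scores, and the fixed tokens-per-step budget are assumptions for exposition, not Mercury's actual decoding algorithm.

```python
# Toy sketch of parallel token generation via iterative unmasking,
# the mechanism behind diffusion-style decoding. A real dLLM would
# score every masked slot with the model; here confidences are random.
import random

MASK = "<mask>"

def toy_denoise_step(tokens, vocab, k):
    """Fill the k most 'confident' masked positions in one pass."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    # Pretend the model assigns a confidence to each masked slot.
    scored = sorted(masked, key=lambda i: random.random(), reverse=True)
    for i in scored[:k]:
        tokens[i] = random.choice(vocab)
    return tokens

def toy_diffusion_decode(length, vocab, tokens_per_step=4):
    tokens = [MASK] * length
    steps = 0
    while MASK in tokens:
        tokens = toy_denoise_step(tokens, vocab, tokens_per_step)
        steps += 1
    return tokens, steps

seq, steps = toy_diffusion_decode(16, ["a", "b", "c"], tokens_per_step=4)
# 16 positions filled 4 at a time -> 4 denoising steps instead of 16
print(steps)
```

An autoregressive decoder would need 16 sequential forward passes for the same sequence; the unmasking loop needs only 4, which is where the latency advantage comes from.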

  2. KTO-Aligned Human Preference Tuning: To ensure suggestions are helpful rather than intrusive, Inception Labs utilizes Kahneman-Tversky Optimization (KTO). This unpaired preference-alignment method tunes the model on a high-quality human preference dataset derived from actual developer interactions (accepting vs. rejecting edits). The result is a model that is 27% more selective, focusing on high-impact changes and minimizing "overzealous" or distracting suggestions.
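The KTO objective can be sketched in a few lines. This is a simplified rendering of the published KTO loss (Ethayarajh et al., 2024), which accepts unpaired accept/reject labels like the edit interactions described above; the β, λ weights, and reference point below are illustrative hyperparameters, not Inception's training configuration.

```python
# Minimal sketch of the KTO loss for a single example. Not Inception's
# training code; hyperparameters are placeholders.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(log_ratio, desirable, z_ref=0.0, beta=0.1,
             lam_d=1.0, lam_u=1.0):
    """log_ratio = log pi_theta(y|x) - log pi_ref(y|x).
    Desirable (accepted) edits are pushed above the reference point
    z_ref; undesirable (rejected) edits are pushed below it."""
    if desirable:
        value = lam_d * sigmoid(beta * (log_ratio - z_ref))
        return lam_d - value
    value = lam_u * sigmoid(beta * (z_ref - log_ratio))
    return lam_u - value

# An accepted edit the policy already prefers incurs lower loss than
# a rejected edit the policy still prefers:
print(kto_loss(5.0, desirable=True) < kto_loss(5.0, desirable=False))
```

Because each example carries only a single accepted/rejected label, no paired "A is better than B" comparisons are needed, which is what makes real accept/reject telemetry usable as training signal.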

  3. Multi-Benchmark Validated Accuracy: The model is evaluated against a rigorous suite of four benchmarks: Instinct, Fill-in-the-middle (FIM), Next-edit Prediction (NEP), and a proprietary internal next-edit benchmark. These tests measure the model's proficiency across diverse scenarios, including variable renaming, complex refactoring, and feature implementation, ensuring that the proposed edits align with human-written "gold-standard" code.

Problems Solved

  1. Pain Point: Inference Latency in AI Pair Programming: Standard large-scale LLMs often introduce a "lag" that disrupts developer flow. Mercury Edit 2 addresses this by optimizing the model specifically for the next-edit task, achieving speeds that allow it to function as a real-time thought partner rather than a slow external tool.

  2. Target Audience: The primary users are professional software engineers, full-stack developers, and DevOps specialists who require high-velocity coding environments. It is particularly targeted at users of the Zed editor and developers building custom internal developer platforms (IDPs) via the Inception API Platform.

  3. Use Cases:

  • Refactoring Legacy Code: Quickly renaming variables or restructuring functions across a codebase.
  • Feature Implementation: Generating the "next logical step" in a new module based on existing patterns in the repository.
  • Boilerplate Reduction: Automatically predicting standard setup code or repetitive logic based on recent edits.
  • Contextual Error Correction: Suggesting fixes for syntax errors or logic bugs immediately after they are typed.

Unique Advantages

  1. Differentiation: Compared to general-purpose frontier models, Mercury Edit 2 is purpose-built for editing rather than general conversation. It achieves a 48% higher acceptance rate than previous iterations while remaining provider-agnostic, as demonstrated by its deep integration into the Zed editor ecosystem.

  2. Key Innovation: The application of diffusion models to the specific domain of code editing is a paradigm shift. While most coding assistants use "Fill-in-the-middle" (FIM) logic on autoregressive transformers, Mercury Edit 2’s use of dLLM allows for a "reasoning-lite" but "speed-heavy" approach that is uniquely suited for the micro-edits that constitute the bulk of daily programming.

Frequently Asked Questions (FAQ)

  1. How much does Mercury Edit 2 cost to use via the API? Mercury Edit 2 is priced competitively at $0.25 per 1 million input tokens and $0.75 per 1 million output tokens. For frequent requests, cached input tokens are significantly cheaper at $0.025 per 1 million tokens. New users on the Inception API Platform typically receive 10 million free tokens to start.
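The per-token rates above translate into per-request costs as follows. The rates come from the pricing just stated; the request sizes in the example are made-up numbers for illustration.

```python
# Back-of-the-envelope cost using the published per-1M-token rates.
RATE_INPUT = 0.25          # USD per 1M fresh input tokens
RATE_CACHED_INPUT = 0.025  # USD per 1M cached input tokens
RATE_OUTPUT = 0.75         # USD per 1M output tokens

def request_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Cost in USD for a single API call."""
    fresh = input_tokens - cached_input_tokens
    return (fresh * RATE_INPUT
            + cached_input_tokens * RATE_CACHED_INPUT
            + output_tokens * RATE_OUTPUT) / 1_000_000

# e.g. a 2,000-token prompt (half of it cached) with a 150-token edit:
print(f"${request_cost(2000, 150, cached_input_tokens=1000):.6f}")
```

At these rates a typical next-edit request costs a small fraction of a cent, which is why the cached-input discount matters mostly at high request volumes.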

  2. How do I enable Mercury Edit 2 in the Zed editor? Zed users can configure Mercury Edit 2 as their edit prediction provider by using their Inception API key. Inception Labs often provides promotional periods, such as one free month of edit suggestions for Zed users, to facilitate the transition to diffusion-based editing.
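As a rough illustration, the Zed configuration might take a shape like the following. The key names below are assumptions, not Zed's documented schema — consult Zed's edit-prediction settings documentation and Inception's setup guide for the actual keys and where the API key is stored.

```json
{
  // Hypothetical settings.json sketch only — key names are
  // assumptions; Zed settings files accept comments.
  "features": {
    "edit_prediction_provider": "..."
  }
}
```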

  3. What makes a diffusion LLM (dLLM) better for coding than a standard LLM? In the context of code editing, speed is critical. Standard autoregressive LLMs generate code one token at a time, which can be slow for large blocks. A dLLM like Mercury Edit 2 refines many tokens in parallel, which drastically lowers latency. Furthermore, by focusing strictly on the "next-edit" task, the model avoids the overhead of general-purpose models, leading to higher accuracy in code-specific scenarios.
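The latency difference reduces to simple pass-count arithmetic: an autoregressive model needs one forward pass per generated token, while a diffusion model needs one pass per denoising step. The edit length and tokens-per-pass figures below are illustrative assumptions, not measured Mercury numbers.

```python
# Forward passes needed to produce an edit of a given token length.
def decode_passes(tokens, tokens_per_pass=1):
    """Ceiling division: passes needed to emit `tokens` tokens."""
    return -(-tokens // tokens_per_pass)

EDIT_LEN = 64  # tokens in a hypothetical edit (assumption)

ar_passes = decode_passes(EDIT_LEN, tokens_per_pass=1)   # autoregressive
dllm_passes = decode_passes(EDIT_LEN, tokens_per_pass=8) # diffusion-style
print(ar_passes, dllm_passes)  # 64 sequential passes vs 8
```

If each forward pass costs roughly the same wall-clock time, the parallel decoder's latency scales with the number of denoising steps rather than the number of tokens, which is the core of the speed argument.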
