GPT-4.1 in the API

GPT-4.1 is a series of language models that OpenAI released through its API, comprising three variants: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. The models are optimized for coding, instruction following, and long-context comprehension, and have a refreshed knowledge cutoff of June 2024. They are designed for API integration and offer developers better performance, lower latency, and lower cost than the earlier GPT-4o and GPT-4.5 models.
GPT-4.1 is aimed at enterprise applications that need state-of-the-art capability while keeping cost and scale manageable. It addresses developer needs such as reliable code generation, precise adherence to complex instructions, and processing of up to 1 million tokens of context, which enables use cases like document analysis, agentic systems, and real-time applications.

Enhanced Coding Performance: GPT-4.1 scores 54.6% on SWE-bench Verified, 21.4 percentage points higher than GPT-4o, which makes it well suited to software engineering tasks like code reviews, patch generation, and full-stack development. Extraneous code edits fall from 9% with GPT-4o to 2% with GPT-4.1, and the model supports diff formats for efficient code changes.
Superior Instruction Following: The model scores 38.3% on Scale’s MultiChallenge benchmark, 10.5 percentage points higher than GPT-4o, with gains in format adherence, compliance with negative instructions, and multi-turn coherence. This improves reliability in applications that require structured outputs (e.g., XML/YAML generation) and context-aware interactions.
1 Million Token Context Window: GPT-4.1 accepts up to 1 million tokens of context with improved comprehension, enabling analysis of large codebases, legal documents, or video transcripts. It reaches 72.0% accuracy on Video-MME (long, no subtitles), 6.7 percentage points higher than GPT-4o, and reliably retrieves "needles" in synthetic and real-world long-context evaluations.
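The "needle" retrieval evaluations mentioned above can be illustrated with a small harness. This is a minimal sketch, not the evaluation OpenAI ran: the filler text, the wording of the needle, and the ask_model callable are all hypothetical stand-ins, and ask_model would have to be wired to a real API client before the result means anything.

```python
from typing import Callable


def build_haystack(filler: str, needle: str, total_chars: int, depth: float) -> str:
    """Repeat `filler` up to `total_chars` and insert `needle` at relative `depth` (0.0 to 1.0)."""
    if not filler:
        raise ValueError("filler must be non-empty")
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    repeats = total_chars // len(filler) + 1
    body = (filler * repeats)[:total_chars]
    cut = int(len(body) * depth)
    return body[:cut] + "\n" + needle + "\n" + body[cut:]


def needle_retrieved(answer: str, secret: str) -> bool:
    """Count the trial as a success if the secret appears anywhere in the answer."""
    return secret.lower() in answer.lower()


def run_trial(ask_model: Callable[[str], str], secret: str,
              total_chars: int, depth: float) -> bool:
    """Hide a fact in filler text, ask for it back, and check the reply.

    `ask_model` is a hypothetical callable that sends a prompt to a model
    and returns its text reply; it is not part of any SDK.
    """
    needle = f"The magic number for this document is {secret}."
    haystack = build_haystack("The sky was grey that day. ", needle, total_chars, depth)
    prompt = haystack + "\n\nWhat is the magic number for this document?"
    return needle_retrieved(ask_model(prompt), secret)
```

Sweeping total_chars and depth over a grid and recording the pass rate per cell gives the usual heat-map view of long-context retrieval.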

Complex Coding Workflows: GPT-4.1 reduces manual debugging by generating functional code with fewer errors, addressing pain points like inconsistent tool usage and incremental code edits. For example, in a head-to-head code review test on real-world pull requests, GPT-4.1 produced the better suggestion in 55% of cases compared with other models.
Enterprise Developers and AI Engineers: The model targets developers building AI-powered tools for software engineering, data analysis, and customer support, as well as enterprises needing scalable solutions for document processing or multimodal tasks.
Use Cases: Legal document review (17% accuracy gain for Thomson Reuters), financial data extraction (50% improvement for Carlyle), real-time autocompletion (GPT-4.1 nano), and agentic systems for customer service or coding assistance (e.g., Hex’s SQL workflows).

Performance vs. Cost: GPT-4.1 mini matches or exceeds GPT-4o’s intelligence at roughly half the latency and 83% lower cost, while GPT-4.1 nano delivers 80.1% MMLU accuracy at 1/10th the price of GPT-4o. The series outperforms GPT-4.5 in coding and long-context tasks despite lower compute requirements.
Specialized Training: The models were trained with real-world developer feedback in mind, with improvements in code diff formats, multi-hop reasoning (e.g., Graphwalks BFS), and vision tasks (74.8% on MMMU). GPT-4.1 nano is OpenAI’s first nano-sized model, optimized for speed and cost-sensitive applications.
API-Exclusive Optimizations: Unlike the models offered in ChatGPT, GPT-4.1 is tailored for programmatic use, featuring prompt caching (a 75% discount on repeated context), a 32,768-token output limit, and integration with tools like the Responses API for agentic workflows.
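The caching discount and output limit quoted above lend themselves to a quick budget calculation. The sketch below is illustrative only: the helper names are made up for this example, the per-million-token price is a parameter you must supply from the current price list, and the calculation assumes the 75% discount applies to every cached input token.

```python
MAX_OUTPUT_TOKENS = 32_768      # output limit quoted in the text above
CACHED_INPUT_DISCOUNT = 0.75    # prompt caching discount quoted in the text above


def clamp_output_tokens(requested: int) -> int:
    """Keep a requested output budget within the stated output limit."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return min(requested, MAX_OUTPUT_TOKENS)


def input_cost(total_input_tokens: int, cached_tokens: int,
               price_per_million: float) -> float:
    """Estimate input cost when part of the prompt is served from the cache.

    `price_per_million` is the uncached input price per 1M tokens; it is an
    assumption supplied by the caller, not a rate taken from this document.
    """
    if not 0 <= cached_tokens <= total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    uncached = total_input_tokens - cached_tokens
    billable = uncached + cached_tokens * (1 - CACHED_INPUT_DISCOUNT)
    return billable * price_per_million / 1_000_000


if __name__ == "__main__":
    # Placeholder price of 2.00 per million tokens, purely for illustration.
    full = input_cost(100_000, 0, price_per_million=2.00)
    warm = input_cost(100_000, 80_000, price_per_million=2.00)
    print(f"cold prompt: {full:.4f}, mostly cached prompt: {warm:.4f}")
```

Because only the repeated prefix is discounted, the saving on a request depends on how much of the prompt is shared with earlier requests, so the overall reduction is usually smaller than 75%.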

How does GPT-4.1 differ from ChatGPT’s GPT-4o? GPT-4.1 is API-exclusive and optimized for developer use cases, with specialized training for coding, instruction following, and long context. While some of these improvements have been folded into the GPT-4o version used in ChatGPT, GPT-4.1 offers higher accuracy, lower latency, and cost savings for programmatic applications.
Does GPT-4.1 replace GPT-4.5 Preview? Yes. GPT-4.5 Preview is scheduled to be turned off in the API on July 14, 2025. Developers should migrate to GPT-4.1, which matches or exceeds GPT-4.5’s performance on coding and instruction following at lower cost.
Can GPT-4.1 handle 1 million tokens in real time? With a full 1-million-token context, GPT-4.1 can take up to about a minute to return its first token, while GPT-4.1 nano returns a first token in under 5 seconds for 128k-token inputs. For repeated context, prompt caching discounts the cached input by 75% and can also reduce latency.
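Latency on long inputs is usually reported as the time until the first piece of output arrives, so it is worth measuring that directly rather than total response time. The following is a minimal sketch under stated assumptions: open_stream is a hypothetical callable that starts a streaming request and returns an iterable of text chunks, and fake_stream merely simulates one so the example runs without network access.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_chunk(open_stream: Callable[[], Iterable[str]],
                        clock: Callable[[], float] = time.perf_counter
                        ) -> Tuple[float, str]:
    """Return (seconds until the first chunk, full concatenated text).

    The clock starts before `open_stream` is called so that request setup
    time is included in the measurement.
    """
    start = clock()
    first_latency = None
    parts = []
    for chunk in open_stream():
        if first_latency is None:
            first_latency = clock() - start
        parts.append(chunk)
    if first_latency is None:
        raise ValueError("stream produced no chunks")
    return first_latency, "".join(parts)


def fake_stream(delay_s: float = 0.05) -> Iterator[str]:
    """Stand-in for a real streaming response: waits, then yields two chunks."""
    time.sleep(delay_s)
    yield "Hello"
    yield ", world"


if __name__ == "__main__":
    latency, text = time_to_first_chunk(lambda: fake_stream(0.05))
    print(f"first chunk after {latency:.3f}s, text={text!r}")
```

To measure a real model, replace fake_stream with a function that issues a streaming request through your API client and yields the text deltas; repeating the measurement many times and looking at a high percentile gives a more honest picture than a single run.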

Announcing GPT-4.1, GPT-4.1 mini, & GPT-4.1 nano in the API