Product Introduction
- DeepCoder-14B-Preview is a code reasoning large language model (LLM) fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning (RL). It specializes in generating accurate code solutions while scaling to long context lengths of up to 64K tokens (a minimal loading and generation sketch follows these bullets).
- The model aims to democratize advanced AI capabilities by leveraging open-source methodologies, achieving performance comparable to larger proprietary models like OpenAI’s o3-mini with only 14B parameters.
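For orientation, here is a minimal generation sketch using Hugging Face transformers. The repo id `agentica-org/DeepCoder-14B-Preview` and the sampling settings are assumptions based on the public release, not an official quickstart; verify them against the model card and adjust to your hardware.

```python
# Minimal sketch: load DeepCoder-14B-Preview and generate a code solution.
# The repo id below is an assumption; confirm it on Hugging Face before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepCoder-14B-Preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Write a Python function that returns the n-th Fibonacci number."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so leave generous room for new tokens.
outputs = model.generate(
    inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```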
Main Features
- GRPO+ Training Framework: Enhances stability and efficiency by incorporating offline difficulty filtering, eliminating entropy loss, and removing KL constraints. This reduces runtime overhead and accelerates training while preserving reasoning accuracy.
- Iterative Context Lengthening: Scales context handling from 16K to 32K tokens during training, enabling generalization to 64K-context inference. This improves performance on benchmarks like LiveCodeBench by 8% over the base model.
- Overlong Filtering & Clip High: Overlong filtering masks the loss for sequences truncated at the context limit, preserving long-context reasoning integrity, while the raised "clip high" bound on the surrogate loss encourages exploration (see the loss sketch after this list).
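To make the clipping and masking ideas concrete, the sketch below shows a generic token-level clipped surrogate loss with an asymmetric upper bound and an overlong mask. It is an illustrative reconstruction under assumed hyperparameters (`eps_low`, `eps_high`), not the authors' GRPO+ implementation.

```python
import torch

def clipped_surrogate_with_overlong_mask(logp_new, logp_old, advantages, valid_mask,
                                          eps_low=0.2, eps_high=0.28):
    """Illustrative GRPO+-style objective (assumed values, not the official code).

    logp_new, logp_old: per-token log-probs under the current / sampling policy.
    advantages: per-token advantage estimates (e.g. group-normalized rewards).
    valid_mask: 1 for tokens that count toward the loss, 0 for tokens in sequences
        truncated at the context limit (overlong filtering).
    eps_low / eps_high: clip bounds; eps_high > eps_low ("clip high") loosens the
        upper bound to encourage exploration.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    per_token = -torch.minimum(unclipped, clipped)
    # Overlong filtering: masked tokens contribute nothing to the policy gradient.
    return (per_token * valid_mask).sum() / valid_mask.sum().clamp(min=1)
```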
Problems Solved
- Addresses the challenge of training LLMs to handle long-context code generation while maintaining accuracy. DeepCoder achieves 60.6% Pass@1 accuracy on LiveCodeBench v5, outperforming its base model (see the pass@k estimator sketch after this list).
- Targets developers and researchers needing efficient, open-source tools for automated code generation, debugging, and optimization tasks.
- Ideal for scenarios requiring scalable code solutions, such as real-time coding assistance, synthetic dataset generation, and complex algorithm design.
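Pass@1 figures such as the one cited above are typically reported with the unbiased pass@k estimator popularized by the Codex/HumanEval evaluation methodology. The helper below is a generic sketch of that estimator, not DeepCoder's own evaluation harness.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per problem, c of which pass all tests,
    evaluated at budget k. For n == k == 1 this reduces to a simple pass rate."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples drawn for a problem, 10 pass all tests -> pass@1 estimate
print(pass_at_k(n=16, c=10, k=1))  # 0.625
```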
Unique Advantages
- Unlike similar models, DeepCoder combines RL efficiency with a compact 14B parameter architecture, reducing computational costs while matching larger models’ performance.
- Innovations like GRPO+ and iterative context scaling enable stable training and long-context generalization without proprietary dependencies.
- Competitive advantages include MIT licensing for broad commercial use, integration with Hugging Face ecosystems, and transparency in training methodologies.
Frequently Asked Questions (FAQ)
- What benchmarks validate DeepCoder’s performance? The model achieves 60.6% Pass@1 on LiveCodeBench v5, an 8% improvement over its base model, and matches OpenAI’s o3-mini on key metrics.
- How does GRPO+ improve training stability? By removing entropy/KL losses and implementing offline difficulty filtering, GRPO+ reduces runtime overhead and prevents training collapse.
- Can DeepCoder handle 64K-token contexts? Yes. Through iterative context lengthening and overlong filtering, it generalizes beyond its 32K training context to 64K at inference time (a serving sketch follows this FAQ).
- Is the model open source? Yes, DeepCoder is MIT-licensed and available on Hugging Face for commercial and research use.
- What datasets were used for training? The model was trained on roughly 24K verified problem-test pairs drawn from TACO-Verified, PrimeIntellect’s SYNTHETIC-1, and LiveCodeBench v5.
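As referenced in the 64K-context question above, long-context inference is largely a matter of serving configuration. The sketch below uses vLLM with an assumed repo id and illustrative limits (`max_model_len=65536`, `tensor_parallel_size=2`); treat the sampling values as placeholders and check the model card for the recommended settings.

```python
from vllm import LLM, SamplingParams

# Illustrative long-context serving config; repo id, parallelism, and sampling
# values are assumptions to adapt to your hardware.
llm = LLM(
    model="agentica-org/DeepCoder-14B-Preview",  # assumed repo id
    max_model_len=65536,        # allow the full 64K-token context at inference
    tensor_parallel_size=2,     # example: shard the 14B model across 2 GPUs
)

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=32768)
outputs = llm.generate(
    ["Write a Python function that checks whether a string is a palindrome."],
    params,
)
print(outputs[0].outputs[0].text)
```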