A few days ago, rumors were swirling about an imminent release of DeepSeek-R2. DeepSeek did have something new to show, but instead of R2, what everyone got was DeepSeek-Prover-V2, which is, of course, open source.
Prover-V2 sets a new state of the art in theorem proving, reaching an 88.9% pass rate on the MiniF2F test set, and it also performs well on problems from AIME 24 and 25. On the evening of April 30, technical details of DeepSeek-Prover-V2 were posted on the machine learning platform Hugging Face.
This time, the DeepSeek team released two versions of DeepSeek-Prover-V2, with parameter counts of 7B and 671B. DeepSeek-Prover-V2-671B is trained on top of DeepSeek-V3-Base, while DeepSeek-Prover-V2-7B is built on DeepSeek-Prover-V1.5-Base and supports an extended context length of up to 32K tokens.
- DeepSeek-Prover-V2-7B link: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-7B
- DeepSeek-Prover-V2-671B link: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B
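For reference, here is a minimal sketch of how the 7B checkpoint could be loaded and queried through the standard Hugging Face transformers interface. This assumes the released checkpoint works with the usual AutoModelForCausalLM API; the prompt format and sampling settings below are illustrative, not DeepSeek's official recipe.

```python
# Minimal sketch: load the 7B prover and sample one Lean 4 proof attempt.
# Assumes the checkpoint is compatible with the standard transformers API;
# the prompt format here is an illustrative guess, not the official template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Prover-V2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = (
    "Complete the following Lean 4 proof:\n"
    "theorem add_comm_example (a b : Nat) : a + b = b + a := by\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=512, do_sample=True, temperature=0.7
)
# Print only the newly generated continuation (the proof attempt).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```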
To sum up DeepSeek-Prover-V2 in one sentence: it is an open-source large language model purpose-built for formal theorem proving in Lean 4, the programming language and proof assistant used to write machine-checkable mathematics.
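For readers unfamiliar with Lean 4: a formal theorem is a mathematical statement written as machine-checkable code, and "proving" it means producing a script that the Lean kernel accepts. A deliberately tiny example of our own, not taken from the paper or its benchmarks:

```lean
-- A toy Lean 4 theorem and proof, just to show what "formal theorem proving"
-- means: the statement is code, and the proof after `by` must convince the
-- Lean kernel. (Illustrative only.)
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```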
Its cold-start data is collected through a recursive theorem-proving pipeline driven by DeepSeek-V3. In the cold-start phase, DeepSeek-V3 is first prompted to decompose complex problems into a series of tractable sub-goals. As each sub-goal is solved, its proof is folded into a chain of thought, which is then combined with DeepSeek-V3's own step-by-step reasoning trajectory to form the initial training data for reinforcement learning.
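In rough pseudocode, that pipeline could be sketched as follows. The callables (`decompose_with_v3`, `prove_subgoal`, `splice_subproofs`, `verify_lean`) are hypothetical placeholders for the components the paper describes, not real DeepSeek APIs:

```python
# Hypothetical sketch of the recursive cold-start pipeline described above.
# The injected callables stand in for (1) DeepSeek-V3 prompting, (2) the
# 7B prover's proof search, and (3) a Lean 4 verifier.
from typing import Callable, List, Optional, Tuple


def build_cold_start_example(
    theorem_statement: str,
    decompose_with_v3: Callable[[str], Tuple[str, str, List[str]]],
    prove_subgoal: Callable[[str], Optional[str]],
    splice_subproofs: Callable[[str, List[str]], str],
    verify_lean: Callable[[str], bool],
) -> Optional[dict]:
    # Step 1: DeepSeek-V3 writes an informal chain of thought plus a Lean 4
    # proof sketch whose sub-goals are left open.
    chain_of_thought, sketch, subgoals = decompose_with_v3(theorem_statement)

    # Step 2: the smaller prover searches for a proof of each sub-goal;
    # any unsolved sub-goal discards this example.
    subproofs = []
    for subgoal in subgoals:
        proof = prove_subgoal(subgoal)
        if proof is None:
            return None
        subproofs.append(proof)

    # Step 3: splice the sub-proofs back into the sketch and check the whole
    # proof with the Lean 4 toolchain.
    full_proof = splice_subproofs(sketch, subproofs)
    if not verify_lean(full_proof):
        return None

    # Step 4: pair the informal reasoning with the verified formal proof to
    # form one cold-start training record.
    return {
        "statement": theorem_statement,
        "chain_of_thought": chain_of_thought,
        "formal_proof": full_proof,
    }
```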
The brilliance of this strategy lies in its ability to merge informal and formal mathematical reasoning into a unified model, allowing the model to think flexibly like a human while also reasoning rigorously like a machine, truly achieving an integrated fusion of mathematical reasoning.
Technical Overview#
Generating Cold Start Inference Data through Recursive Proof Search#
To build the cold start dataset, the DeepSeek team designed a simple and efficient recursive theorem-proving pipeline that uses DeepSeek-V3 as a unified tool for both sub-goal decomposition and the formalization of reasoning steps. Concretely, DeepSeek-V3 is prompted to decompose a theorem into a high-level proof sketch while simultaneously formalizing those reasoning steps in Lean 4, producing a series of clearly structured, logically rigorous sub-goals.
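Concretely, such a decomposition can be pictured as a Lean 4 proof skeleton in which each sub-goal appears as a `have` step whose proof is temporarily left as `sorry`. The toy example below is purely our own illustration of the shape, not an actual sketch from the paper:

```lean
-- Hypothetical shape of a decomposition output: the high-level plan is
-- formalized as `have` steps, and each sub-goal's proof is left as `sorry`
-- for the smaller prover model to fill in later. (Illustrative only.)
theorem toy_target (n : Nat) (h : 3 * n + 2 = 17) : 2 * n + 1 = 11 := by
  -- Sub-goal 1: determine n from the hypothesis.
  have h₁ : n = 5 := by
    sorry
  -- Sub-goal 2: conclude by substituting the value of n.
  have h₂ : 2 * n + 1 = 11 := by
    sorry
  exact h₂
```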
Overview of the cold start data collection process used by DeepSeek-Prover-V2.

Reducing computational overhead has always been a strength of the DeepSeek team, and this time is no exception: a smaller 7B model carries out the proof search for each sub-goal, keeping the computational burden low. Once every step of the decomposed problem has been solved, the complete step-by-step formal proof is paired with the chain of thought generated by DeepSeek-V3, and the pair becomes one cold-start reasoning example.
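A minimal sketch of what the per-sub-goal proof search could look like under these assumptions: sample several candidate proofs from the 7B model and keep the first one the Lean toolchain accepts. The sampling function is passed in as a callable, and the checker assumes a local Lean project in which `lake env lean` can compile a standalone file:

```python
# Sketch of best-of-N proof search for a single sub-goal, with verification
# delegated to the Lean 4 toolchain. Assumes `project_dir` is a working Lean
# project; the `sample_proof` callable wraps the 7B prover (not shown here).
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional


def lean_accepts(lean_source: str, project_dir: str, timeout_s: int = 300) -> bool:
    """Return True if the Lean 4 toolchain compiles the source without errors.

    Candidates containing `sorry` are rejected even though they would compile
    with only a warning.
    """
    if "sorry" in lean_source:
        return False
    with tempfile.NamedTemporaryFile(
        "w", suffix=".lean", dir=project_dir, delete=False
    ) as f:
        f.write(lean_source)
        path = f.name
    try:
        result = subprocess.run(
            ["lake", "env", "lean", path],
            cwd=project_dir,
            capture_output=True,
            timeout=timeout_s,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        Path(path).unlink(missing_ok=True)


def prove_subgoal(
    subgoal_statement: str,
    sample_proof: Callable[[str], str],
    project_dir: str,
    num_samples: int = 32,
) -> Optional[str]:
    """Best-of-N search: sample candidates and return the first verified proof."""
    for _ in range(num_samples):
        candidate = sample_proof(subgoal_statement)
        if lean_accepts(candidate, project_dir):
            return candidate
    return None
```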
Reinforcement Learning Based on Synthetic Cold Start Data#
The DeepSeek team selected a set of challenging theorems that the 7B proof model cannot solve end-to-end, but whose decomposed sub-goals it can handle one by one. By integrating the proofs of all the sub-goals, a complete formal proof of the original problem is assembled. This formal proof is then attached to the chain of thought generated by DeepSeek-V3, which lays out the corresponding lemma decomposition, yielding a training set that tightly couples informal reasoning with the subsequent formalization.
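Continuing the toy skeleton from earlier: once every `sorry` has been discharged, the sub-proofs drop back into the sketch and Lean accepts the whole theorem. This assumes a recent Lean 4 toolchain where the `omega` tactic is available, and again it is only an illustration:

```lean
-- The same toy skeleton with both sub-goals discharged; Lean now accepts
-- the theorem without any `sorry`. (Illustrative only.)
theorem toy_target (n : Nat) (h : 3 * n + 2 = 17) : 2 * n + 1 = 11 := by
  -- Sub-goal 1, now proved by linear arithmetic over the hypothesis.
  have h₁ : n = 5 := by
    omega
  -- Sub-goal 2, also closed by linear arithmetic.
  have h₂ : 2 * n + 1 = 11 := by
    omega
  exact h₂
```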
After fine-tuning the proof model on the synthetic cold start data, the team introduced a reinforcement learning stage to further strengthen its ability to turn informal reasoning into formal proofs. Following the usual recipe for reasoning models, training uses binary correct/incorrect feedback as the primary reward signal.
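Under this setup the reward computation itself is simple: a completed proof that the Lean verifier accepts earns a reward of 1, and anything else earns 0. A minimal sketch, with the verifier passed in as an assumed callable (for example, the `lean_accepts` helper sketched above, with the project directory bound in):

```python
# Sketch of the binary reward signal for a batch of sampled proofs.
from typing import Callable, List


def binary_proof_rewards(
    rollouts: List[str], verifier: Callable[[str], bool]
) -> List[float]:
    """Return 1.0 for each rollout the Lean 4 verifier accepts, else 0.0."""
    return [1.0 if verifier(proof) else 0.0 for proof in rollouts]
```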
The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, with an 88.9% pass rate on the MiniF2F test set and 49 of the 658 problems in PutnamBench solved. All proofs generated by DeepSeek-Prover-V2 for the miniF2F dataset have been packaged into a ZIP file and are available for download.
- Download link: miniF2F Solutions