Tonight, the AI community was shaken! Late at night, Google made a surprise move, officially launching what it calls its "strongest reasoning model": Gemini 2.5 Pro! That's right, this is the Google reasoning model I mentioned in yesterday's article, the one leaked under the codename "Nebula," which was reported to perform remarkably well, surpassing models like o1, o3-mini, and Claude 3.7 Thinking. I didn't expect the release to come so quickly: leaked on the 24th, officially announced by Google on the 25th!
Gemini 2.5 Pro ranks first on the LMSYS Arena leaderboard, and it is a clear first: its score is a full 40 points higher than Grok-3 and GPT-4.5! Note that the top models on LMSYS were previously separated by only a few points. Grok had just announced breaking the 1400 barrier, and now Gemini 2.5 Pro has jumped straight to 1443, the largest single leap the leaderboard has recorded.
First of all, Gemini 2.5 Pro (model version gemini-2.5-pro-exp-03-25) is a reasoning model that Google calls its most powerful to date. It not only leads across the board but also has no obvious weaknesses, ranking first in every evaluation category (overall ability, coding, mathematics, creative writing, etc.) and excelling especially in complex prompts with style control (Hard Prompts w/ Style Control) and multi-turn conversations (Multi-Turn).
Gemini 2.5 Pro is not only Google's strongest reasoning model; it also has multimodal capabilities, ranking first on the Vision Arena visual leaderboard. On the web-development leaderboard WebDev Arena it ranks second, behind only Claude 3.7, whose lead in programming remains hard to shake.
Now let's look at the specific scores on various benchmarks: Gemini 2.5 Pro achieved the best overall performance, leading particularly in science, code generation, visual reasoning (MMMU), and long-context understanding (MRCR). On Humanity's Last Exam, billed as the hardest AI test, Gemini 2.5 Pro is far ahead of OpenAI's o3-mini and the other models.
SWE-bench measures coding ability, while Aider Polyglot measures code-editing skill. After going through all the leaderboards, I can only say: terrifying! Gemini 2.5 Pro is already available in Google AI Studio and the Gemini app. Portal: Google AI Studio
Next, let's look at the effects—
First: Mandelbrot Set Demonstration
The Mandelbrot set is the set of points c in the complex plane for which the iteration z → z² + c stays bounded; its boundary forms a fractal that some have called the most bizarre and magnificent geometric figure humanity has ever produced, "God's fingerprint." Let's see what Gemini 2.5 Pro generates.
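For readers who want to try a prompt like this themselves, here is a minimal escape-time sketch in Python, the kind of program one might ask the model to generate. It is an illustration of the standard algorithm, not the code Gemini actually produced; the function names and the ASCII rendering are my own choices.

```python
# Escape-time algorithm: iterate z -> z^2 + c and count steps until |z| > 2.
def mandelbrot_iterations(c: complex, max_iter: int = 100) -> int:
    """Return the escape step, or max_iter if c appears to lie in the set."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter

def render_ascii(width: int = 60, height: int = 24) -> str:
    """Render the region [-2, 1] x [-1.2i, 1.2i] as ASCII art."""
    rows = []
    for j in range(height):
        im = 1.2 - 2.4 * j / (height - 1)
        row = []
        for i in range(width):
            re = -2.0 + 3.0 * i / (width - 1)
            n = mandelbrot_iterations(complex(re, im))
            # '#' marks points still bounded after max_iter steps (in the set);
            # escaped points get a shade based on how fast they escaped.
            row.append("#" if n == 100 else " .:-=+*%@"[min(n // 4, 8)])
        rows.append("".join(row))
    return "\n".join(rows)

if __name__ == "__main__":
    print(render_ascii())
```

Running it prints the familiar cardioid-and-bulb silhouette in the terminal; a graphical version just replaces the character lookup with a color map per iteration count.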
Second: Web Mini Game
Do you remember this familiar dinosaur running game? The black-and-white version from memory has become a color version, and the generated result is quite impressive.
The biggest advantage of Gemini 2.5 Pro is that it combines native multimodal capabilities with an ultra-long context, currently supporting a 1M-token window, with 2M on the way. However, API pricing has not yet been announced. Meanwhile, DeepSeek V3-0324 has also just been released, under the highly permissive MIT license. Will it be the closed-source giants consolidating their stronghold, or the open-source camp pushing for technological equality?