Google's late-night bombshell: Gemini 2.5 is out! Billed as the world's strongest "thinking" AI, with wide leads across benchmarks

2025/03/26

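Google DeepMind has officially released its most intelligent AI model to date: Gemini 2.5. The first version to ship is Gemini 2.5 Pro Experimental, which Google calls a "thinking model." According to Google, it leads by a wide margin on several mainstream benchmarks, with especially strong results in reasoning and coding.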

Key points

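Google has long explored ways to improve AI reasoning, such as reinforcement learning (RL) and chain-of-thought prompting. The earlier Gemini 2.0 Flash Thinking was its first attempt. Gemini 2.5 combines a significantly enhanced base model with improved post-training to take this "thinking" capability to a new level and build it directly into the model. This means future Google models should handle complex problems better and support more capable, context-aware AI agents.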

Gemini 2.5 Pro Experimental: blistering performance, so let's go straight to the data!

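The 2.5 Pro Experimental release lives up to its billing: it went straight to the top of the LMArena leaderboard, which measures human preference, and did so by a significant margin. That points to both strong capability and a high-quality output style.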

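Below are its single-attempt (pass@1) scores on several key benchmarks, compared with competing models from OpenAI, Anthropic, Grok, DeepSeek, and others (data from Google and third-party leaderboards):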

Key highlights:

Strong reasoning: excellent results on demanding reasoning benchmarks such as GPQA and AIME 2025. Most notable is its 18.8% score on Humanity's Last Exam without tool use.

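Advanced coding: a big leap over 2.0. It excels at building visually polished web apps, agentic coding applications, and code transformation and editing. On SWE-Bench Verified, the industry-standard benchmark, it scores 63.8% with a custom agent setup.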

Inheriting and strengthening Gemini's existing advantages:

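Native multimodality: it still understands text, audio, images, video, and even entire code repositories.
Very long context window: 1 million tokens at launch, with better performance than the previous generation, and 2 million tokens coming soon. This strengthens its ability to handle massive datasets and complex information sources.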
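
For readers unfamiliar with the single-attempt (pass@1) scores quoted for these benchmarks, the following is a minimal sketch of how such a number is commonly read: the share of problems solved on the model's first and only attempt. The function name `pass_at_1` and the sample data are hypothetical and used only for illustration; each benchmark applies its own grading rules, and this is not Google's evaluation code.

```python
from typing import Sequence


def pass_at_1(first_attempt_passed: Sequence[bool]) -> float:
    """Return the fraction of problems solved on the first attempt.

    Hypothetical helper for illustration: each entry says whether the
    model's single attempt at one benchmark problem was graded correct.
    """
    if not first_attempt_passed:
        raise ValueError("need at least one graded problem")
    # Count the correct first attempts and divide by the number of problems.
    return sum(1 for ok in first_attempt_passed if ok) / len(first_attempt_passed)


# Made-up example: 3 of 4 problems solved on the first try.
score = pass_at_1([True, False, True, True])
print(f"pass@1 = {score:.1%}")
```

Read this way, the 63.8% reported on SWE-Bench Verified means roughly 64 of every 100 tasks were resolved in one attempt.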

Hands-on test

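I tested Gemini 2.5 Pro's front-end coding with three prompts. My overall impression is that, on front-end work alone, Gemini 2.5 Pro falls short of the latest DeepSeek V3: its output is missing some details.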

Prompt 1: Build me a cyberpunk Snake game that runs in a single HTML file

Gemini 2.5 result:

For comparison, here is DeepSeek V3 0324:

Prompt 2: Create a single HTML file containing CSS and JavaScript to generate an animated weather card. The card should visually represent the following weather conditions with distinct animations: Wind: (e.g., moving clouds, swaying trees, or wind lines) Rain: (e.g., falling raindrops, puddles forming) Sun: (e.g., shining rays, bright background) Snow: (e.g., falling snowflakes, snow accumulating) Show all the weather card side by side The card should have a dark background. Provide all the HTML, CSS, and JavaScript code within this single file. The JavaScript should include a way to switch between the different weather conditions (e.g., a function or a set of buttons) to demonstrate the animations for each

Gemini 2.5 result: