Gemini's Multimodal Update vs GPT-5: Gaps Explained

Google Gemini’s multimodal update finally challenges GPT-5 on visual understanding. In June 2026, Google rolled out a major upgrade, targeting its longtime weak spot.

Until now, OpenAI held a clear edge in interpreting images, video and mixed media. However, that edge is starting to narrow.

What’s Actually New in Gemini’s Multimodal Update

First, Google pushed Gemini-3.1-flash-image and Gemini-3-pro-image to general availability in June. The update adds video-to-image generation, a fresh capability inside Gemini-3.1-flash-image.

Developers can now upload a video file or paste a YouTube link. Then, Gemini generates thumbnails, movie posters, or summary infographics automatically.

In addition, Google launched gemini-embedding-2, its first multimodal embedding model. The model maps text, images, video, audio, and PDFs into one shared space.

This upgrade powers File Search, which returns visual citations alongside text. Meanwhile, Google retired Imagen completely, shifting every workflow toward Gemini models.
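To make the shared-space idea concrete, here is a minimal sketch of how retrieval over mixed media can work once every item is a vector in the same space. The toy three-dimensional vectors, file names, and the `search` helper are hypothetical and purely illustrative. A real embedding model returns much longer vectors, and this is not Google's File Search implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings for items of different media types.
# In a shared space, a PDF, a video, and an image are all plain vectors.
index = {
    "report.pdf": [0.9, 0.1, 0.0],
    "demo_video.mp4": [0.1, 0.9, 0.2],
    "chart.png": [0.8, 0.2, 0.1],
}

def search(query_vector, index, top_k=2):
    """Rank every indexed item by similarity to the query, whatever its type."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical embedding of a text query.
query = [1.0, 0.0, 0.0]
results = search(query, index)
print(results)
```

Because all media types live in one space, a single similarity function can rank a PDF against an image without any per-format logic.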

How Gemini’s Multimodal Update Closes the Benchmark Gap

Following the rollout, independent benchmark testing confirms real progress. Gemini 3.1 Pro scores 82.8 on multimodal and grounded tasks. GPT-5.5 trails significantly, posting only 70.4 in the same category.

Additionally, MMMU-Pro produces the widest gap between the two models. Gemini also edges ahead on the overall leaderboard, scoring 89 to 88.

However, the overall margin of a single point is too thin to call a defeat for OpenAI. Gemini wins decisively on vision tasks, yet only barely wins overall.
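The size of each gap follows directly from the scores quoted above. The short sketch below computes both. The dictionary layout and the `gap` helper are illustrative choices, not part of any benchmark tooling.

```python
# Scores as quoted in this article.
scores = {
    "multimodal_and_grounded": {"Gemini 3.1 Pro": 82.8, "GPT-5.5": 70.4},
    "overall": {"Gemini 3.1 Pro": 89.0, "GPT-5.5": 88.0},
}

def gap(category):
    """Gemini's lead over GPT-5.5 in points, rounded to one decimal place."""
    row = scores[category]
    return round(row["Gemini 3.1 Pro"] - row["GPT-5.5"], 1)

multimodal_gap = gap("multimodal_and_grounded")
overall_gap = gap("overall")
print(multimodal_gap, overall_gap)  # 12.4 1.0
```

The multimodal lead is more than twelve times the overall lead, which is why the headline result looks close even though the vision result does not.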

GPT-5’s June Countermove

OpenAI answered swiftly with its own preview release. The company unveiled GPT-5.6 Sol, Terra, and Luna on June 26.

However, OpenAI chose depth over breadth for the release. Sol pushes coding performance further, setting a new benchmark record on Terminal-Bench 2.1. The model also strengthens biology research and shows measurable cybersecurity gains via ExploitBench.

Furthermore, Terra matches GPT-5.5’s performance while costing half as much. Taken together, the release shows that OpenAI built GPT-5.6 for reasoning and agentic work, not visual tasks. The choice reveals where each company sees its strongest ground.

Where the Gap Still Holds

Despite Gemini’s gains, GPT-5.5 keeps a clear lead in pure reasoning tasks. It averages 85 points against Gemini’s 77.1.

CritPt shows the sharpest divide between the two models. Pricing, however, points the other way. GPT-5.5 costs $5 per million input tokens and $30 per million output tokens. Gemini 3.1 Pro charges only $2 and $12, respectively.

Therefore, cost favors Google, while reasoning power still favors OpenAI. Buyers simply cannot pick one model and check every box.
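As a rough illustration of what those rates mean in practice, the sketch below prices one workload on both models. The per-million-token rates are the ones quoted above. The token counts are a hypothetical workload chosen only for the arithmetic, and the `workload_cost` helper is not part of either vendor's SDK.

```python
# Prices in USD per million tokens, as quoted in this article.
PRICES_USD_PER_MILLION = {
    "GPT-5.5": {"input": 5.0, "output": 30.0},
    "Gemini 3.1 Pro": {"input": 2.0, "output": 12.0},
}

def workload_cost(model, input_tokens, output_tokens):
    """Total cost in USD for a given number of input and output tokens."""
    price = PRICES_USD_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000

# Hypothetical workload: 10M input tokens and 2M output tokens.
input_tokens = 10_000_000
output_tokens = 2_000_000

gpt_cost = workload_cost("GPT-5.5", input_tokens, output_tokens)
gemini_cost = workload_cost("Gemini 3.1 Pro", input_tokens, output_tokens)
print(gpt_cost, gemini_cost)  # 110.0 44.0
```

Both Gemini rates are 40 percent of the corresponding GPT-5.5 rates, so the cost ratio stays the same regardless of how a workload splits between input and output tokens.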

Which Workflow Should Switch First

Overall, teams running document, image, and video pipelines gain the most right now. File Search across mixed media now works inside one unified API. However, coding teams and security researchers should wait before switching models.

GPT-5.6 remains available only through a limited partner preview. OpenAI plans a broader rollout within weeks, not immediately. Until then, GPT-5.5 keeps its edge in reasoning-heavy production work.

Most teams will likely run both models, routing tasks by strength. Ultimately, Gemini closes a real gap in vision, while OpenAI still holds a firm lead in reasoning.
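Routing by strength can be as simple as a lookup against the category scores quoted in this article. The sketch below is one minimal way to do it. The category names and the `route` function are hypothetical, and a production router would also weigh cost, latency, and availability.

```python
# Category scores as quoted in this article.
SCORES = {
    "multimodal": {"Gemini 3.1 Pro": 82.8, "GPT-5.5": 70.4},
    "reasoning": {"Gemini 3.1 Pro": 77.1, "GPT-5.5": 85.0},
}

def route(task_category):
    """Return the model with the highest quoted score for the task category."""
    by_model = SCORES.get(task_category)
    if by_model is None:
        raise ValueError(f"No scores recorded for category: {task_category}")
    return max(by_model, key=by_model.get)

print(route("multimodal"))  # Gemini 3.1 Pro
print(route("reasoning"))   # GPT-5.5
```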
