AI's Reality Check: Why Gemini 3 and GPT-5 Flunked the Physics Test

AI and Sons Team
November 24, 2025
Industry Analysis
Top models like Gemini 3 and GPT-5 just faced the 'CritPt' physics benchmark, and the results were humbling. Here is what it means for the future of reasoning.

We often hear that AGI is around the corner. But a new benchmark released this week suggests otherwise. The CritPt test, designed by over 50 physicists, threw complex, unpublished research problems at the world's top AI models.

The Results? Failing Grades.

Even the brand-new Gemini 3 Pro and GPT-5 struggled to crack 10% accuracy on these novel tasks. The test was designed to prevent memorization: because the problems are unpublished, the AI couldn't simply look up the answer in its training data; it had to reason.

What This Means

This isn't a sign that AI is useless. It's a sign that AI is currently a Research Assistant, not a Lead Scientist. For developers and businesses, the takeaway is clear: use these tools to summarize and iterate, but do not trust them to solve net-new problems without deep human oversight.

Tags: AI Benchmarks, Gemini 3, GPT-5, Science, Deep Learning Limitations
AI and Sons Team

Content author at AI and Sons, sharing insights on artificial intelligence and technology.
