AI's Reality Check: Why Gemini 3 and GPT-5 Flunked the Physics Test

AI and Sons Team
November 24, 2025
Industry Analysis
Top models like Gemini 3 and GPT-5 just faced the 'CritPt' physics benchmark, and the results were humbling. Here is what it means for the future of reasoning.

We often hear that AGI is around the corner. But a new benchmark released this week suggests otherwise. The CritPt test, designed by over 50 physicists, threw complex, unpublished research problems at the world's top AI models.

The Results? Failing Grades.

Even the brand-new Gemini 3 Pro and GPT-5 struggled to crack 10% accuracy on these novel tasks. The test was designed to prevent memorization: because the problems are unpublished, the AI couldn't simply look up the answer in its training data; it had to reason.

What This Means

This isn't a sign that AI is useless. It's a sign that AI is currently a Research Assistant, not a Lead Scientist. For developers and businesses, the takeaway is clear: use these tools to summarize and iterate, but do not trust them to solve net-new problems without deep human oversight.

Tags: AI Benchmarks, Gemini 3, GPT-5, Science, Deep Learning Limitations
AI and Sons Team

Content author at AI and Sons, sharing insights on artificial intelligence and technology.
