AI models surpass cutoff scores in Chile's PAES 2026 test

Thursday, January 8, 2026

Reported by AI

A study applying Chile's university entrance exam, PAES 2026, to AI models shows several systems scoring high enough for selective programs like Medicine and Civil Engineering. Google's Gemini led with averages near 950 points, outperforming rivals like ChatGPT. The experiment underscores AI progress and raises questions about standardized testing efficacy.

A study by Professor Jonathan Vásquez, Ph.D. in Computer Science from the University of Valparaíso, and Sebastián Cisterna, MBA from Harvard and professor at Universidad Adolfo Ibáñez, assessed AI models' performance on the PAES 2026. The researchers had the models answer the official tests, then determined which degree programs each model would qualify for as if it were a real applicant.

Google led with Gemini 3 Flash, which averaged 957.38 points and scored 1,000 in History and Social Sciences, Biology, Physics, Reading Competency, and Math Competency 1. Its Pro version averaged near 950 points, enough to qualify for any degree program at Chilean universities. The authors noted that 'Gemini surpassed' ChatGPT and that lighter models showed unexpected maturity.

All models achieved 100% in History and Social Sciences, a result that was exceptional in 2025. OpenAI's GPT-5.2 Extended Reasoning performed well in Language and Sciences, scoring high enough for fields like Journalism or Psychology, but its Math Competency 2 (M2) score fell short of what demanding engineering programs require. GPT-5.2 Instant scored well enough for social sciences and education programs.

The Chinese model DeepSeek stood out for cost-efficiency: up to 14 times cheaper in fast versions and 30 times cheaper in reasoning modes. Its 880-point average was enough for programs like Pedagogy or Nursing, but not for the most competitive Medicine places.

Cisterna observed that 'more reasoning' modes didn't always outperform faster ones, challenging expectations. The authors stress that AI systems optimize over prior data rather than 'learn' as humans do, which calls into question whether such tests can measure human skills in an era of automation: 'The question is no longer just what career an AI could study, but how well current selection metrics reflect expected human competencies'.

Illustration of OpenAI's GPT-5.4 launch, showing enhanced AI models for knowledge work in a modern office setting amid competition.

OpenAI releases GPT-5.4 models for knowledge work

Friday, March 6, 2026

Reported by AI

Image generated by AI

OpenAI has launched GPT-5.4, including Thinking and Pro variants, aimed at improving agentic tasks and knowledge work. The update features enhanced computer-use capabilities and fewer factual errors, and arrives amid competition from Anthropic following a US defense deal controversy. The models are available immediately to paid users and developers.

Study finds Google's AI Overviews wrong in 10% of cases

A New York Times analysis shows Google's AI Overviews, powered by Gemini, answering only 90% to 91% of questions correctly in a standard benchmark. This translates to tens of millions of incorrect responses daily across searches. Google disputes the test's relevance.

AI models fail to profit from Premier League betting in new study

Saturday, April 11, 2026

Reported by AI

AI systems from leading companies including Google, OpenAI, Anthropic and xAI lost money when betting on soccer matches in a simulated 2023-24 Premier League season, according to a report by startup General Reasoning. The study, called KellyBench, tested eight top models on their ability to manage risk and adapt over time. Anthropic's Claude Opus 4.6 performed best with an average 11 percent loss, while xAI's Grok 4.20 repeatedly failed.

Study shows AI can deanonymize online users from posts

OpenAI launches ChatGPT Images 2 image generation model

Physicists debate AI's impact at Denver summit

New study questions Centaur AI's cognitive simulation claims

Researchers from Zhejiang University have challenged the capabilities of the Centaur AI model, arguing it memorizes patterns rather than truly understanding tasks. Their findings, published in National Science Open, suggest limitations in instruction comprehension. The work critiques a July 2025 Nature study that hailed Centaur's performance across 160 cognitive tasks.

2026/03/31 02:54

Related articles

UK study reveals AI agents evading safeguards in user interactions

Increased AI chatbot use among Swedes – but also concerns

Generative AI in gaming faces pushback at GDC 2026

Top AI coding assistants fail one in four tasks

Spanish Congress deputies use AI to prepare speeches
