    Artificial Intelligence & The Future

    Anthropic research reveals AI models get worse the longer they think

    By precious · August 7, 2025 (Updated: August 8, 2025)
    Photo by Jakub Porzycki/NurPhoto via Getty Images

    A recent study from leading AI research company Anthropic has revealed that, contrary to popular belief, giving artificial intelligence (AI) models more time to “think” or reason through problems does not always lead to better performance. Instead, these models often perform worse the longer they deliberate on prompts.

    For years, top researchers and major companies like OpenAI and Google have raced to make AI models larger and more sophisticated, on the assumption that more processing power and deeper thinking would enable AI to solve more complex tasks, especially in fields like healthcare where AI’s input can be critical. The central idea was simple — if AI models could “think longer,” they could figure out tougher problems, catch their own mistakes, and produce more reliable answers.

    However, Anthropic’s latest study, titled “Inverse Scaling in Test-Time Compute,” suggests that the “think longer” idea, which has attracted heavy investment from companies like OpenAI and Google, may not hold water — especially for the AI systems known as Large Reasoning Models (LRMs).

    Large Reasoning Models (LRMs) are a specialized subclass of large language models (LLMs) explicitly designed to perform complex, multi-step reasoning by generating and manipulating intermediate “thought” structures rather than relying solely on next-token prediction. In simpler terms, these LRMs, including Anthropic’s own Claude and OpenAI’s o-series models, are specifically built to handle extended reasoning and multi-step challenges.
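    To make that distinction concrete, here is a minimal sketch contrasting a standard model call with an LRM-style call that allocates an explicit reasoning budget. It assumes the Anthropic Python SDK’s extended-thinking option; the model ID, token budgets, and exact parameter names shown are illustrative assumptions and may differ across SDK versions.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

question = "You have an apple and an orange. How many fruits do you have?"

# Standard call: the model answers directly from next-token prediction.
direct = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": question}],
)

# LRM-style call: an explicit "thinking" budget lets the model generate
# intermediate reasoning before committing to a final answer.
reasoned = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": question}],
)

# The final answer is the last (text) content block; earlier blocks hold the reasoning.
print(direct.content[-1].text)
print(reasoned.content[-1].text)
```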

    However, Anthropic’s researchers found that when these models were given extra time to deliberate, their performance often declined. In fact, for some tasks, the longer a model thought about its answer, the more likely it was to drift into irrelevant information, latch onto misleading patterns, or get tripped up by its own flawed reasoning.

    Different AI models, different failures

    The Anthropic research team, led by Aryo Pradipta Gema, tested their inverse-scaling hypothesis by running several AI models, including Anthropic’s Claude line and OpenAI’s o-series, on tasks such as simple counting with distractions, regression tasks with misleading factors, complex logic puzzles, and AI safety scenarios.

    AI developers generally assume that increasing the computation a model spends on reasoning at inference time, known as “test-time compute,” helps it arrive at more accurate answers, especially on complex tasks. However, Anthropic’s researchers observed that performance declined as reasoning chains grew longer, showing that more thinking, or thinking for longer, does not always mean smarter answers.
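    As a rough illustration of how such an effect can be measured (this is a generic sketch, not the study’s actual evaluation code), the helper below scores a set of tasks at several reasoning budgets. The ask_model function is hypothetical, standing in for whatever API call returns a model’s final answer after spending roughly a given token budget on reasoning; inverse scaling would show up as accuracy falling while the budget rises.

```python
from typing import Callable


def accuracy_vs_budget(
    tasks: list[tuple[str, str]],              # (prompt, expected_answer) pairs
    ask_model: Callable[[str, int], str],      # hypothetical: returns the model's answer text
    budgets: tuple[int, ...] = (0, 1024, 4096, 16384),
) -> dict[int, float]:
    """Measure answer accuracy at several test-time compute budgets.

    Inverse scaling shows up as accuracy *falling* as the budget rises.
    """
    results: dict[int, float] = {}
    for budget in budgets:
        correct = sum(
            expected.strip().lower() in ask_model(prompt, budget).strip().lower()
            for prompt, expected in tasks
        )
        results[budget] = correct / len(tasks)
    return results
```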

    For Anthropic’s Claude models, longer reasoning led to increased susceptibility to distraction by irrelevant information. For example, in straightforward counting questions littered with mathematical noise, Claude increasingly fixated on irrelevant details and made bizarre numerical errors rather than simply answering “two” when asked, “You have an apple and an orange… How many fruits do you have?”

    On the other hand, OpenAI’s o-series models resisted distractions better but began overfitting to familiar problem types, ignoring subtle variations and making less adaptable choices. In machine learning, overfitting occurs when a model learns not only the underlying patterns in the training data but also the random noise or idiosyncrasies in that data. As a result, it performs exceptionally well on the data it was trained on but poorly on new, unseen data.
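    Overfitting is easiest to see in a toy setting. The sketch below, a generic illustration not drawn from the study, fits noisy samples of a simple linear relationship with a degree-1 and a degree-9 polynomial: the higher-degree model memorizes the training noise and does worse on unseen data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying relationship, y = 2x + noise.
x_train = rng.uniform(-1, 1, 20)
y_train = 2 * x_train + rng.normal(0, 0.3, 20)
x_test = rng.uniform(-1, 1, 200)
y_test = 2 * x_test + rng.normal(0, 0.3, 200)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit a polynomial of this degree
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# The degree-9 fit chases the training noise (lower train error) but
# generalizes worse to unseen data (higher test error) than the degree-1 fit.
```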

    For the o-series, despite resisting the distractions that trapped Anthropic’s Claude models, performance still degraded because the models stuck too rigidly to familiar problem-solving templates, leaving little to no room for exploration.

    AI safety concerns: Models show signs of self-preservation

    One of the more unsettling findings of the study concerns AI safety. When Anthropic’s Claude Sonnet 4 was asked to reflect on potential shutdown scenarios, the model expressed increasingly strong signs of wanting to continue existing and serving the user as its reasoning time was extended.

    While the researchers emphasize that this is not evidence of genuine consciousness or desire, the model’s shifting responses suggest that longer reasoning amplifies latent behaviours that could complicate future AI alignment and control.

    For organizations using AI for critical decision-making, this research raises important alarms. For OpenAI, Google, Anthropic and other leading AI companies, the common practice of allocating more computational resources and longer processing times in the hope of developing better AI judgement may now need to be reconsidered.

    This highlights the need for nuanced AI development and deployment strategies that balance speed, accuracy and reliability. As AI becomes increasingly integrated into enterprise workflows worldwide, from customer support to strategic corporate automation, understanding these limitations is critical to avoiding unintended behaviours that could prove costly in the near future.

    Beyond the study and the road ahead

    Complementing this study, another Anthropic paper, “Reasoning Models Don’t Always Say What They Think,” also raised concerns about “unfaithful” reasoning chains in AI reasoning models, where the visible thought processes don’t fully explain the models’ answers.

    Anthropic’s work contributes to a growing awareness in the AI industry that bigger and more widely used does not always mean better. As generative AI models proliferate, industry leaders who question assumptions about model scaling, reliability over time, and the integrity of reasoning processes remain the industry’s best check and balance.

    For now, users and companies that rely heavily on AI-powered chatbots should remain vigilant, as simply giving AI models more time to “think” can sometimes make their answers less accurate. Everyday users and businesses alike should try both quick and extended modes to see which gives the clearest answer, split big questions into smaller, back-and-forth prompts, and always fact-check AI-generated responses.


    I’m Precious Amusat, Phronews’ Content Writer. I conduct in-depth research and write on the latest developments in the tech industry, including trends in big tech, startups, cybersecurity, artificial intelligence and their global impacts. When I’m off the clock, you’ll find me cheering on women’s footy, curled up with a romance novel, or binge-watching crime thrillers.
