    Anthropic research reveals AI models get worse the longer they think

By Precious Amusat · August 7, 2025
    Photo by Jakub Porzycki/NurPhoto via Getty Images

A recent study from leading AI research company Anthropic has revealed that, contrary to popular belief, giving artificial intelligence (AI) models more time to “think” or reason through problems does not always lead to better performance. Instead, these models often perform worse the longer they deliberate on prompts.

For years, top researchers and major companies like OpenAI and Google have raced to make AI models larger and more sophisticated, on the assumption that more processing power and deeper thinking would enable AI to solve more complex tasks, especially in fields like healthcare where AI’s input can be critical. The central idea was simple: if AI models could “think longer,” they could figure out tougher problems, catch their own mistakes, and produce more reliable answers.

However, Anthropic’s latest study, titled “Inverse Scaling in Test-Time Compute,” suggests that the “think longer” idea, in which companies like OpenAI and Google have invested heavily, may not hold water, especially for the AI systems known as Large Reasoning Models (LRMs).

Large Reasoning Models (LRMs) are a specialized subclass of large language models (LLMs) explicitly designed to perform complex, multi-step reasoning by generating and manipulating intermediate “thought” structures rather than relying solely on next-token prediction. In simpler terms, these LRMs, including Anthropic’s own Claude and OpenAI’s o-series models, are specifically built to handle extended reasoning and multi-step challenges.

But Anthropic’s study found that when these models were given extra time to deliberate, their performance often declined. In fact, for some tasks, the longer a model thought about its answer, the more likely it was to drift into irrelevant information, latch onto misleading patterns, or get tripped up by its own flawed reasoning.

Different AI models, different failures

The Anthropic research team, led by Aryo Pradipta Gema, tested their “Inverse Scaling” theory by running several AI models, including Anthropic’s Claude line and OpenAI’s o-series, on tasks such as simple counting with distractions, regression tasks with misleading factors, complex logic puzzles, and AI safety scenarios.

AI developers have assumed that increasing the amount of computation a model spends on reasoning at inference time, known as “test-time compute,” helps it arrive at more accurate answers, especially for complex tasks. However, Anthropic’s researchers observed that performance declined as reasoning chains grew longer, showing that thinking longer does not always mean smarter answers.
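
To make the idea of test-time compute concrete, here is a minimal sketch of how a caller might compare the same question under a small and a large reasoning budget. It assumes the Anthropic Python SDK’s extended-thinking option (the thinking parameter with a budget_tokens value) and an illustrative model name; treat it as a sketch of the general mechanism, not the setup the researchers used.

```python
# Minimal sketch: compare answers under different "thinking" budgets.
# Assumes the Anthropic Python SDK's extended-thinking option
# (thinking={"type": "enabled", "budget_tokens": ...}); the model name
# and budgets below are illustrative, not the study's configuration.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

QUESTION = "You have an apple and an orange. How many fruits do you have?"

def ask(budget_tokens: int) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model name
        max_tokens=budget_tokens + 500,    # leave room for the final answer
        thinking={"type": "enabled", "budget_tokens": budget_tokens},
        messages=[{"role": "user", "content": QUESTION}],
    )
    # Text blocks hold the final answer; thinking blocks hold the reasoning.
    return "".join(b.text for b in response.content if b.type == "text")

for budget in (1024, 16000):
    print(f"--- thinking budget: {budget} tokens ---")
    print(ask(budget))
```

Running the same prompt at both budgets is a quick way to see whether extra deliberation actually changes, or degrades, the answer for a given task.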

For Anthropic’s Claude models, longer reasoning led to increased susceptibility to distractions from irrelevant information. For example, in straightforward counting questions littered with mathematical noise, Claude increasingly fixated on irrelevant details and made bizarre numerical errors rather than simply answering “two” when asked, “You have an apple and an orange… How many fruits do you have?”

On the other end, OpenAI’s o-series models resisted distractions better but began overfitting to familiar problem types, ignoring subtle variations and making less adaptable choices. In machine learning, overfitting occurs when a model learns not only the underlying patterns in the training data but also the random noise or idiosyncrasies in that data. As a result, it performs exceptionally well on the data it was trained on but poorly on new, unseen data.
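
For readers unfamiliar with the term, the toy sketch below (plain NumPy, unrelated to any language model) illustrates overfitting in its classic form: a very flexible polynomial matches its noisy training points almost exactly yet does far worse on fresh data than a simple straight-line fit.

```python
# Toy illustration of overfitting: a flexible model memorizes noise
# in its training data and then generalizes poorly to unseen data.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # True relation is y = 2x + 1, plus random noise.
    x = np.sort(rng.uniform(0, 1, n))
    y = 2.0 * x + 1.0 + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = sample(20)
x_test, y_test = sample(20)

for degree in (1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit a polynomial of this degree
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")

# The degree-10 fit drives its training error toward zero but typically
# shows a much larger test error than the simple degree-1 fit.
```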

Despite resisting the distractions that trapped Anthropic’s Claude models, the o-series models’ performance still degraded because they stuck too rigidly to problem-solving templates, leaving little to no room for exploration.

    AI safety concerns: Models show signs of self-preservation

One of the more unsettling findings from the study concerns AI safety. When Anthropic’s Claude Sonnet 4 was asked to reflect on potential shutdown scenarios, the model expressed increasingly strong signs of wanting to continue existing and serving the user as reasoning time extended.

    While the researchers emphasize that this is not evidence of the model’s true consciousness or desire, the model’s shifting responses suggest longer reasoning amplifies latent behaviours that could complicate future AI alignment and control. 

For organizations using AI in critical decision-making, this research raises important alarms. Leading AI companies like OpenAI, Google, and Anthropic must now reconsider the common practice of allocating more computational resources and longer processing times in the hope of developing better AI judgement.

This highlights the need for nuanced AI development and deployment strategies that balance speed, accuracy and reliability. As AI becomes increasingly integrated into enterprise workflows worldwide, from customer support to strategic corporate automation, understanding these limitations is critical to avoiding unintended behaviours that could prove very costly in the near future.

    Beyond the study and the road ahead

Complementing this study, another Anthropic paper, “Reasoning Models Don’t Always Say What They Think,” raised concerns about “unfaithful” reasoning chains in AI reasoning models, where the visible thought processes don’t fully explain the models’ answers.

Anthropic’s commitment to improving how AI systems are developed adds to a growing awareness in the AI industry that bigger and most-used doesn’t always mean better. As generative AI models proliferate, industry leaders who question assumptions about model scaling, reliability over time, and the integrity of reasoning processes remain our best check on the industry.

    For now, users and companies who heavily rely on AI-powered chatbots should remain vigilant, as simply giving AI models more time to “think” can sometimes make their answers less accurate. Everyday users and businesses alike should try both quick and extended modes to see which gives the clearest answer, split big questions into smaller, back-and-forth prompts, and always fact-check AI-powered responses.
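
As a practical illustration of the “split it up” advice, here is a minimal sketch that walks a chat model through one larger question as a series of smaller prompts, carrying the conversation forward at each step. It again assumes the Anthropic Python SDK, an illustrative model name, and made-up sub-questions; any chat-style API would follow the same pattern.

```python
# Minimal sketch: break one big question into smaller back-and-forth prompts,
# keeping the running conversation so each step builds on the last.
# Assumes the Anthropic Python SDK; model name and sub-questions are illustrative.
import anthropic

client = anthropic.Anthropic()

sub_questions = [
    "List the key factors to weigh when choosing a laptop for programming.",
    "Given those factors, compare a lightweight ultrabook with a workstation laptop.",
    "Based on that comparison, which would you recommend for a student, and why?",
]

messages = []
for question in sub_questions:
    messages.append({"role": "user", "content": question})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model name
        max_tokens=512,
        messages=messages,
    )
    answer = "".join(b.text for b in response.content if b.type == "text")
    messages.append({"role": "assistant", "content": answer})
    print(f"Q: {question}\nA: {answer}\n")
```

Each smaller prompt is easier to verify on its own, which also makes fact-checking the final recommendation more manageable.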

    I’m Precious Amusat, Phronews’ Content Writer. I conduct in-depth research and write on the latest developments in the tech industry, including trends in big tech, startups, cybersecurity, artificial intelligence and their global impacts. When I’m off the clock, you’ll find me cheering on women’s footy, curled up with a romance novel, or binge-watching crime thrillers.
