'Catastrophic overtraining' could harm large language AI models that are trained on more data for the sake of training
Source: techradar.com
Wayne Williams
13 April 2025
University researchers found less is sometimes more when it comes to LLMs
(Image credit: Shutterstock / NicoElNino)
Researchers from top US universities warn extending pre-training can be detrimental to performance
Too much pre-training can deliver worse performance due to something akin to the butterfly effect
The more they are pre-trained, the more they become sensitive to small changes that could disrupt the end result
Researchers from Carnegie Mellon, Stanford, Harvard, and Princeton are challenging one of AI development’s accepted core beliefs: that more pre-training data always means better performance.
As reported by HPCwire, a new paper discusses the concept of “catastrophic overtraining,” whereby extended pre-training can harm a model’s performance after fine-tuning.
The researchers compared two versions of the OLMo-1B model, one trained on 2.3 trillion tokens and another on 3 trillion. Despite the larger training set, the more extensively trained model reportedly performed up to 3% worse on benchmarks like AlpacaEval and ARC.
Reaching the inflection point
This performance drop, the study claims, is linked to a phenomenon called “progressive sensitivity.”
As the token count increases, the model becomes more fragile. Even small tweaks, such as adjustments during fine-tuning or the introduction of noise, can reverse earlier gains.
The authors demonstrated this by injecting Gaussian noise into pre-trained models, noting that performance degraded more sharply the longer the model was trained.
The point where this additional training starts to degrade performance is called the “inflection point.”
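The noise-injection probe described above can be sketched in miniature with NumPy. This is a toy illustration of the measurement protocol only: a fixed linear map stands in for a real checkpoint, and all dimensions and noise scales are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pre-trained model: a fixed linear map.
# (The actual study perturbed OLMo-1B checkpoints; sizes here are arbitrary.)
d_in, d_out, n = 32, 8, 100
W = rng.normal(size=(d_out, d_in))   # "model weights"
X = rng.normal(size=(n, d_in))       # "evaluation inputs"

def outputs(weights):
    return X @ weights.T

baseline = outputs(W)
noise = rng.normal(size=W.shape)     # one fixed Gaussian draw, rescaled below

# Sweep noise magnitudes and record how far the perturbed model's outputs
# drift from the clean ones, relative to the clean output norm.
sigmas = (0.0, 0.01, 0.05, 0.1)
drifts = []
for sigma in sigmas:
    perturbed = outputs(W + sigma * noise)
    drifts.append(np.linalg.norm(perturbed - baseline)
                  / np.linalg.norm(baseline))

for sigma, drift in zip(sigmas, drifts):
    print(f"sigma={sigma:<5} relative output drift={drift:.4f}")
```

In the paper's framing, a more "progressively sensitive" model is one whose performance curve under this kind of sweep falls off more steeply; comparing the curves of a shorter-trained and a longer-trained checkpoint is what revealed the sharper degradation.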
Once this point is reached, the benefits of further training are outweighed by the risk of internal instability. The study found that this tipping point often occurs beyond 2.5 trillion tokens in smaller models such as OLMo-1B.
“Catastrophic overtraining may be inevitable... especially when the pre-training and fine-tuning tasks are misaligned,” the authors warn in their paper, which you can access through the arXiv pre-print server.
While the researchers are not suggesting an end to pre-training, they do feel that developers should consider just how much pre-training is enough. As the paper concludes, “Our findings call for a renewed focus on model scaling that considers the entire training pipeline.”
For AI developers chasing scale, the message seems clear: sometimes, less really is more.
Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.