Clickbait Gives AI Models ‘Brain Rot,’ Researchers Find

If you think scrolling the internet all day is making you dumber, just imagine what it’s doing to large language models that consume a near-endless stream of absolute trash crawled from the web in the name of “training.” A research team recently proposed and tested the “LLM Brain Rot Hypothesis,” which posits that the more junk data an AI model is fed, the worse its outputs become. Turns out that is a pretty solid theory: a preprint paper published to arXiv by the team shows that “brain rot” does affect LLMs, producing non-trivial cognitive declines.

To see how LLMs perform on a steady diet of internet sewage, researchers from Texas A&M University, the University of Texas at Austin, and Purdue University identified two types of “junk” data: short social media posts with lots of engagement (likes and reposts), and longer content with clickbait headlines, sensationalized presentation, and only a superficial level of actual information. Basically, the same type of content that is also rotting our own brains. With that in mind, the researchers scraped together a sample of one million posts on X and then trained four different LLMs on varying mixtures of control data and junk data to see how the junk affected performance.
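
For the curious, here is a minimal, hypothetical sketch of what “varying mixtures of control data and junk data” could look like in code. It is not the authors’ actual pipeline; the function name, the example junk ratios, and the idea of sampling from two pre-labeled post lists are all illustrative assumptions.

```python
# Illustrative sketch only, not the paper's code. Assumes the scraped posts
# have already been split into "control" and "junk" lists; ratios are examples.
import random

def mix_training_corpus(control_posts, junk_posts, junk_ratio, n_samples, seed=0):
    """Build a training corpus in which junk_ratio of the samples are junk posts."""
    rng = random.Random(seed)
    n_junk = int(n_samples * junk_ratio)
    n_control = n_samples - n_junk
    mixed = rng.sample(junk_posts, n_junk) + rng.sample(control_posts, n_control)
    rng.shuffle(mixed)  # interleave junk and control posts
    return mixed

# Example: build corpora at 0%, 50%, and 100% junk, train a copy of the model
# on each, and compare benchmark scores afterward (the exact ratios used in
# the paper may differ).
# for ratio in (0.0, 0.5, 1.0):
#     corpus = mix_training_corpus(control_posts, junk_posts, ratio, n_samples=100_000)
```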

And wouldn’t you know it, it turns out that consuming directly from the internet landfill that is X isn’t great for thinking clearly. All four models tested (Llama 3 8B, Qwen 2.5 7B, Qwen 2.5 0.5B, and Qwen 3 4B) showed some form of cognitive decline. Meta’s Llama proved the most sensitive to the junk, seeing drops in its reasoning capabilities, understanding of context, and adherence to safety standards. Interestingly, the much smaller Qwen 3 4B proved more resilient, though it still suffered declines. The study also found that the higher the share of bad data, the more likely a model was to slip into a “no thinking” mode, failing to provide any reasoning for its answers, which in turn were more likely to be inaccurate.

The junk did more than just make the models “dumber” in their thinking, though: the researchers found it also changed the models’ “personalities,” pushing them toward what the team called “dark traits.” For instance, the Llama 3 model displayed significantly higher levels of narcissism and became less agreeable. It also went from displaying nearly no signs of psychopathy to extremely high rates of the behavior.

Interestingly, the researchers also found that mitigation techniques intended to minimize the impact of the junk data couldn’t fully reverse the harm. As a result, they warn that crawling the web for any and all data may not actually produce better LLMs, since the volume of information does not equate to quality. They suggest more careful curation may be necessary to address these potential harms, as there may be no going back once you feed the model garbage. Apparently, for LLMs, the “you are what you eat” rule applies.
