Reddit Set Trap for Perplexity in Data Scraping Legal Fight

Employees at Reddit knew something was wrong.

Perplexity, the $20 billion artificial intelligence company that competes with OpenAI and Google, had agreed to honor Reddit's instructions blocking it from scraping content from the site, according to a lawsuit Reddit filed Wednesday.

But, the lawsuit said, Perplexity continued to cite Reddit in its AI-generated answers, and more often than ever. The CEO of another AI company even speculated that Perplexity and Reddit had secretly struck a content licensing deal.

“The increase was so dramatic that an outside observer hypothesized that the increase was due to Perplexity entering a licensing deal with Reddit and thereby obtaining full access to Reddit’s data,” Reddit’s lawsuit said.

“In truth, there is no license between Perplexity and Reddit,” the lawsuit said, adding that the surge in citations was the result of “a scheme by Perplexity to obtain Reddit’s data through the circumvention of the technological measures protecting Reddit data.”

So Reddit set a trap. The company created a test post that could only be crawled by Google’s search engine, according to the lawsuit. While Google has a content-licensing deal with Reddit, Perplexity does not.

It was the digital equivalent of a “marked bill,” Reddit’s lawsuit said. According to the lawsuit, the only way Perplexity could get the data in the test post was by bypassing Reddit’s guardrails and pulling it from Google’s search engine results pages, or SERPs.

If the content from the post was ingested by Perplexity through Google, Reddit would know, according to the lawsuit.

A few hours after it set the trap, Reddit got its answer.

“Within hours, queries to Perplexity’s ‘answer engine’ produced the contents of that test post,” the lawsuit says. “The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data.”

Reddit described the test in its lawsuit, filed Wednesday in Manhattan federal court, against Perplexity and three data-scraping companies: Oxylabs UAB, AWM Proxy, and SerpApi. Reddit alleged the data-scraping companies may have taken its posts without permission and sold them to Perplexity.

Perplexity spokesperson Jesse Dwyer told Business Insider in response to the lawsuit that the company “will not tolerate threats against openness and the public interest.”

Perplexity said in a Reddit post after the lawsuit was filed that it “does not train AI models on content.”

A representative for SerpApi said the company plans to “vigorously defend ourselves in court,” while Oxylabs’ chief governance and strategy officer, Denas Grybauskas, said the company was “shocked and disappointed.”

“Oxylabs has always been and will continue to be a pioneer and an industry leader in public data collection, and it will not hesitate to defend itself against these allegations,” Grybauskas said.

AWM Proxy, identified in Reddit’s lawsuit as a former Russian botnet, could not be reached for comment.

Reddit’s trap resembles one set up by internet infrastructure company Cloudflare. In an August blog post, the company said it had set up websites with code instructing Perplexity not to crawl their content. It found that Perplexity’s crawlers accessed those websites anyway.

Cloudflare CEO Matthew Prince compared Perplexity to “North Korean hackers” for the behavior. Reddit cited the characterization in its lawsuit.

“Some supposedly ‘reputable’ AI companies act more like North Korean hackers,” Prince wrote on X in August. “Time to name, shame, and hard block them.”

Cloudflare did not respond to a request for comment.


isenews.com  @2024. All Rights Reserved.