Last Updated on August 2, 2024 by Larious
Steve Huffman, the CEO of Reddit, has announced that Microsoft and other AI search engines will now have to pay if they want to continue data scrapping on the platform.
Earlier this year, Reddit reached an agreement with Google and OpenAI, allowing them to gain access to real-time content from Reddit’s data API to train their AI (artificial intelligence) models.
This latest move from Huffman is an attempt to end free data scraping on the popular platform, control how its data is used, and ensure its responsible use.
“Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for, which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used,” Huffman said in an interview with The Verge.
He added, “Microsoft, Anthropic, and Perplexity for refusing to negotiate, saying it has been “a real pain in the ass to block these companies.
“We can no longer be completely open because we have to be very considerate of where our data ends up and what it’s used for. Any crawler that we don’t have a formal agreement with, we’re now blocking.”
In June 2024, Reddit implemented some blocks by updating its Robots Exclusion Protocol (robots.txt), which gives high-level instructions about how the company does and doesn’t allow Reddit to be crawled by third parties on its platform without authorization.
Those who want to access its data in search results must make a licensing deal with Reddit for content of any kind, including its use in web searches. It has already prevented Microsoft’s Bing search engine from accessing any comments and posts on its platform unless there is a formal agreement, news confirmed by Microsoft’s head of search Jordi Ribas on X.
While Reddit and Microsoft attempted to make a deal, their negotiations ended without agreement. A spokesperson from Reddit also said, “Anyone accessing Reddit content must abide by our policies, including those in place to protect redditors. We are selective about who we work with and trust with large-scale access to Reddit content.”
Currently, Google is the only search engine with access to Reddit content, with whom it signed a $60 million deal in February this year.
While Microsoft and Perplexity have yet to comment on The Verge’s report, Anthropic spokesperson Jennifer Martinez responded in a statement: “Reddit has been on our block list for web crawling since mid May, and we haven’t added any URLs from Reddit to our crawler since then. We respect robots.txt, the industry accepted signal for blocking web crawling.”