Reddit will block the Internet Archive

Reddit has announced that it will block the Internet Archive's Wayback Machine from indexing most of its content, citing instances in which AI companies violated its platform policies by scraping Reddit data from the Wayback Machine. The limits will start "ramping up" today, and Reddit says it contacted the Internet Archive in advance to inform it of the changes. Reddit believes that not all of its content should be archived this way, and it has previously raised concerns about people scraping its content from the Internet Archive. The Wayback Machine will only be able to index the Reddit.com homepage, effectively limiting the archive to preserving snapshots of which news headlines and posts were popular on a given day.

Reddit has a history of cutting off access to scraper tools as AI companies have begun to use them en masse, but it is willing to license its data to companies that pay. The platform has struck deals with Google and OpenAI, but it has also sued Anthropic, claiming the company kept scraping Reddit even after saying it had stopped. Mark Graham, director of the Internet Archive's Wayback Machine, has stated that the organization has a longstanding relationship with Reddit and that the two continue to discuss the matter.
Note: This is an AI-generated summary of the original article. For the full story, please visit the source link below.