In a recent analysis, data journalist Ben Welsh found that a significant number of news websites are blocking AI-related crawler agents such as Applebot-Extended and Google-Extended. Unlike ordinary search crawlers, these agents are robots.txt tokens that let publishers opt out of having their content used to train AI models while remaining visible in search results. The decision to block them has raised questions about the impact on news outlets and their relationships with AI companies.
Welsh’s findings revealed that 25% of the news websites surveyed are blocking Applebot-Extended, compared to 53% blocking OpenAI’s bot. This shows a clear divide among news publishers over whether to let these bots use their websites. Some publishers have opted to block them, while others have entered into partnerships that give AI companies access to their content.
Some major publishers have entered into partnerships with AI companies such as OpenAI and Perplexity, giving those companies access to their content. The New York Times, by contrast, is suing OpenAI over copyright infringement. These differing strategies raise questions about the commercial agreements behind such partnerships and about how much a publisher's blocking decisions depend on them.
Managing AI web crawlers is a difficult task for news outlets because new bots are introduced constantly, which makes it hard to keep an up-to-date block list of the bots that should be restricted from a website. Some companies, like Dark Visitors, offer services that automatically update a site’s robots.txt file to block unwanted bots.
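To make the block-list idea concrete, here is a minimal Python sketch of how one might check which AI agents a robots.txt file blocks, using the standard library's `urllib.robotparser`. The sample robots.txt content, the `AI_AGENTS` list, and the `blocked_agents` helper are illustrative assumptions, not taken from Welsh's analysis or from any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# The user-agent tokens are real; the rules are made up.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Assumed list of agents to check; a real block list would be longer
# and would need regular updating as new bots appear.
AI_AGENTS = ["GPTBot", "Applebot-Extended", "Google-Extended"]


def blocked_agents(robots_txt: str, agents: list[str], url: str = "/") -> list[str]:
    """Return the agents that the given robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_agents(SAMPLE_ROBOTS_TXT, AI_AGENTS))
```

With this sample file, GPTBot and Applebot-Extended are reported as blocked, while Google-Extended falls through to the catch-all rule and is allowed. Note that robots.txt is a voluntary convention: it records a publisher's wishes, and it only works when the bot's operator chooses to honor it.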
Media executives are increasingly involved in deciding which bots to block on their websites, and at some major media companies the CEO directly determines which AI scraping tools should be restricted. This highlights the growing importance of robots.txt files in the digital publishing industry and the challenges media executives face in managing them effectively.
The decision to block AI web-crawling bots on news websites has far-reaching implications for both publishers and AI companies. While some publishers block these bots because they have no commercial agreement with the companies behind them, others see value in partnering with AI companies and licensing access to their content. Moving forward, news outlets will need to weigh carefully how they manage AI web crawlers and how that choice affects their content distribution strategies.