In the coming weeks, Reddit will begin blocking most automated bots from accessing its public data. You'll have to make a licensing deal, as Google and OpenAI have done, to use Reddit content for model training and other commercial purposes.
While this has technically been Reddit's policy already, the company is now enforcing it by updating its robots.txt file, a core part of the web that dictates how web crawlers are allowed to access a site. "It's a signal to those who don't have an agreement with us that they shouldn't be accessing Reddit data," the company's chief legal officer, Ben Lee, tells me. "It's also a signal to bad actors that the word 'allow' in robots.txt doesn't mean, and has never meant, that they can use the data however they want."
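For readers unfamiliar with the format: a robots.txt file is a plain-text file served at a site's root that tells crawlers which paths they may fetch. A minimal sketch of a blanket-disallow policy of the kind described above (illustrative only, not Reddit's actual file) looks like this:

```text
# robots.txt — served at https://example.com/robots.txt
# "User-agent: *" addresses every crawler; "Disallow: /" asks them
# not to fetch any path on the site.
User-agent: *
Disallow: /
```

Note that compliance is voluntary: the file expresses the site's policy, but nothing technically prevents a crawler from ignoring it, which is why Reddit pairs the update with rate limiting and blocking of unknown bots.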