AI Data Press | Powered by EnterpriseDB © 2025
TikTok surpasses Google as the most scraped website in 2025, driven by AI's demand for diverse content.
Video-first platforms like YouTube now account for nearly 40% of all web scraping activity.
The rise in scraping coincides with increased scrutiny over data collection practices.
E-commerce is being reshaped by data's role in pricing strategies and customer experiences.
Amazon and Meta face challenges as they respond to the boom in data harvesting.
A new report from web scraping firm Decodo reveals that TikTok has dethroned Google as the most scraped website in 2025, driven by massive demand for diverse content to train artificial intelligence models. The change makes collecting multimodal data a core requirement for building advanced AI.
An appetite for video: The trend extends beyond a single app. According to a PPC Land analysis, video-first platforms like YouTube, now in the number four spot, collectively make up nearly 40% of all scraping activity. This marks a fundamental pivot from scraping simple text to gathering the rich, multimodal content that modern AI models now require.
The privacy paradox: The surge in scraping coincides with growing scrutiny over data collection. A separate study from privacy firm Incogni found that many of the most popular foreign-owned apps in the US show a pattern of aggressive data collection. The report noted that TikTok was the most aggressive of the group, gathering 24 different types of user data, including sensitive information like names and home addresses.
The trend is reshaping e-commerce, where, as Decodo senior manager Gabrielė Verbickaitė noted, scraped data now plays a “bigger role in pricing strategies, product assortment decisions, and shaping customer experiences.” The boom in data collection is prompting pushback, with Amazon actively blocking AI crawlers from its marketplace. Meanwhile, the scale of data harvesting continues to make headlines: a leaked list revealed Meta's systematic scraping operations across millions of websites.