
Five key takeaways from Extract Summit 2025

Read Time
5 mins
Posted on
November 10, 2025

What is the state of web data right now? The sixth annual Web Data Extract Summit brought together a couple of hundred of the web scraping space’s finest to answer that question.


Held on November 5 and 6, 2025, at the Gibson Hotel in Dublin, Ireland, the summit featured one day of debate and one full day of hands-on workshops.


The summit concluded with a sense that web data gathering is becoming simultaneously easier and more challenging.

1. AI is accelerating the hard work of scraping

After a year of hype over no-code AI data extraction tools, AI is settling into the scraping workflow as professional developers’ new best friend.


Scrapoxy creator Fabien Vauchelles demonstrated how he uses an LLM inside his code editor to reverse-engineer the inner workings of complex, obfuscated anti-bot mechanisms, saying: “This kind of work would have taken me three months in the past. I can do that in 15 minutes now.”

Zyte used Extract Summit Dublin to launch the beta of its Web Scraping Copilot, a Visual Studio Code extension with automated parsing code generation. Chief product officer Iain Lennon told data developers he aims to keep them in control of their pipeline with “partial autonomy” and “a sliding scale of choice”:


"We're asking ourselves, ‘How can we accelerate code for web scraping with AI?’ Code is still the best solution.”

– Iain Lennon, chief product officer, Zyte


Developers got hands-on with the tool during Extract Labs workshop sessions.


Such developments are rapidly accelerating the “time to data” for web scraping engineers. Zyte chief operating officer Suzanne Hassett told attendees:


"I have responsibility for over 100 developers - this is doubling our output. This is huge for us.”

– Suzanne Hassett, chief operating officer, Zyte

2. Models may choke on a dead internet

AI is not all good news. “Dead internet theory” - the idea that the internet is becoming a hollowed-out shell, populated more by bots than by humans - is now a statistical reality, said Domagoj Marić, AI customer delivery manager at Pontis Technology, in a talk that shocked listeners with a grim picture of a "synthetic web" drowning in AI-generated content.

Attendees were struck by Marić’s depiction of a web where bots post text comments, 50% of traffic now comes from non-human sources, and generative AI creates photos and videos that are as irresistible as they are inauthentic.


There may be a looming irony: next-generation AI models whose forebears were used to create fake content are now feeding on their own derivative output as training input - a cannibalistic loop that could lead to “model collapse”.


Want to keep the internet human and your data grounded in reality? Marić urged attendees to “support authentic content”, “get off Facebook” and “not just be lurkers”.

3. The access wars are heating up

The cat-and-mouse game between web scrapers and the anti-bot systems aiming to protect sites is escalating into a high-tech, high-stakes arms race, with speakers at the Extract Summit 2025 declaring that the old rules of engagement are dead.

According to speakers’ talks (and the chat over coffee and Guinness), the battle has moved beyond simple IP-based blocking. Antoine Vastel, head of research at anti-fraud platform provider Castle, said an IP address is now considered a "weak signal”.


In its place? Anti-bot systems are looking to identify a scraper’s entire persona, attendees heard. This includes the network fingerprint (TLS/JA4), the browser fingerprint (Canvas, WebGL, audio context), and user behavior.
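As an illustration of the "persona" idea, the signals above can be thought of as feeding a weighted suspicion score. The signal names, weights, and scoring logic below are purely hypothetical, not any vendor's actual algorithm:

```python
# Hypothetical sketch: scoring a client "persona" from multiple signals.
# Signal names and weights are illustrative, not any real anti-bot product.

SIGNAL_WEIGHTS = {
    "ip_reputation": 0.1,        # a "weak signal" on its own
    "tls_fingerprint": 0.3,      # e.g. a JA4-style hash contradicting the claimed browser
    "browser_fingerprint": 0.3,  # Canvas / WebGL / audio-context anomalies
    "behavior": 0.3,             # timing, mouse movement, navigation patterns
}

def bot_score(signals: dict[str, float]) -> float:
    """Combine per-signal suspicion values (0.0 = human-like, 1.0 = bot-like)
    into a single weighted score."""
    return sum(SIGNAL_WEIGHTS[name] * signals.get(name, 0.0)
               for name in SIGNAL_WEIGHTS)

# A client whose TLS stack contradicts its User-Agent raises the score far
# more than a flagged IP address alone would.
suspicious = {"ip_reputation": 1.0, "tls_fingerprint": 1.0}
print(round(bot_score(suspicious), 2))  # 0.1 + 0.3 = 0.4
```

The point of the sketch is the shift in emphasis: the IP carries the smallest weight, while fingerprint and behavioral signals dominate.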


This technical escalation is driving up the cost of entry for data-gatherers. Scraping expert Fabien Vauchelles explained:


"The goal of the anti-bot (systems) is to raise the bar every time."

– Fabien Vauchelles, creator, Scrapoxy


He framed the conflict as an economic tit-for-tat that is designed to make data access prohibitively expensive.

That skirmish also has seasons. Speaking on a panel, Kenny Aires of Zyte’s done-for-you data delivery team said that, two weeks before major shopping events like Black Friday, "we see the anti-bots upgrading... it's very challenging," creating a frantic scramble for scraping teams. 


“We see even small websites (begin to) use protection,” Aires said.

4. The devil is in the detail

For anyone gathering web data these days, attention to detail is emerging as a key skillset.


Kieron Spearing of Centric Software championed an "investigative mindset," urging developers to "take their time" to forensically deconstruct every request. Spearing argued that a single technical flaw can produce a "cascading failure" that can derail an entire large-scale operation. Meticulousness, he insisted, is the only path to building stable, scalable scrapers.

The legal status of scraped web data, too, hinges on many fine nuances.


For example, web data gatherers already know that website owners can declare their access preferences to crawlers in a robots.txt file.
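In Python, checking those machine-readable declarations is a one-liner with the standard library's `urllib.robotparser`; the rules and URLs below are placeholders for illustration:

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt rules, parsed directly for illustration.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("MyCrawler", "https://example.com/private/page"))  # False
print(parser.can_fetch("MyCrawler", "https://example.com/public/page"))   # True
```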


But, speaking on a panel, Dr. Bernd Justin Jütte, associate professor in intellectual property law at University College Dublin, said: "A recent ruling in Germany...said the declaration doesn't have to be machine-readable... it can also be written in natural language into...the terms and condition of the website.” Respecting wishes articulated in myriad different formats could prove challenging.

According to Dr. Nikos Minas, global IP counsel, Wesco International:


"Where do you get your data? That should be your primary concern.”

– Dr. Nikos Minas, global IP counsel, Wesco International

5. An ID card for your agent?

Though web publishers and data gatherers continue to size each other up, it’s no longer a two-sided world: autonomous web agents have now entered the fray.


They may not be recognisable as data extraction tools, but some large operators are now beginning to block the likes of ChatGPT Agent and Perplexity’s Comet browser.


Scrapoxy’s Vauchelles warned:


“We are moving toward some kind of a closed internet.”

– Fabien Vauchelles, creator, Scrapoxy


“The future is pretty clear,” he said. “Major websites want to make deals and build authentication systems. The website will say ‘Okay I’ll let you pass’ - but, perhaps for other users, you won't have the same access.”

Castle’s Antoine Vastel, channeling defensive website owners’ perspective, sees promise in Web Bot Auth, an emerging proposed standard for managing bot access.


“What I like with this standard is that it's crypto, so it's secure,” he told a panel. "Big platforms are asking questions. They don't really know what to do with AI agents. They first want to get visibility - you can't have a strategy if you don't know.”
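The cryptographic core Vastel alludes to is signed requests: an agent signs what it sends with a private key and publishes the corresponding public key, so a site can verify who is calling. A minimal sketch of that primitive using the widely available `cryptography` package; the covered header fields and key handling here are simplified placeholders, not the full protocol:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Agent side: generate a keypair. In practice the public key would be
# published somewhere sites can discover it (key distribution is
# simplified away in this sketch).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Sign the request components a site would expect to be covered.
# These field names are illustrative placeholders.
covered = b"@authority: example.com\nsignature-agent: my-agent.example"
signature = private_key.sign(covered)

# Site side: verify the signature against the published public key.
# Raises InvalidSignature if the request was tampered with or the
# caller does not hold the private key.
public_key.verify(signature, covered)
print("signature verified")
```

Because verification is asymmetric, a site can grant a known, authenticated agent access while denying anonymous traffic, which is exactly the two-tier internet Vauchelles warns about.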

Summary

Web data scraping is simultaneously becoming easier and more difficult.


Tools like web scraping APIs and AI add-ons are emerging to eliminate the busywork that goes into data gathering.


But Extract Summit 2025 heard that accessing sites on the open web is becoming more challenging and costly than ever for those who lack economies of scale and skill.


Want to go deeper? Watch all summit talks and panel discussions on the Extract Summit YouTube channel.
