Debunk EU scrapes millions of news articles with Zyte
Checking authenticity of Baltic region news stories
DebunkEU.org uses Zyte API's automatic extraction service to monitor and expose disinformation campaigns spread across media outlets in the Baltic region and beyond. To do this, Debunk EU currently scrapes news websites worldwide in over 40 languages, including Russian, Chinese, Persian, Arabic, German, French, Ukrainian, Georgian, and Balkan and Baltic languages. With the help of our easy-to-use Automatic Extraction API (plus friendly technical support from the Zyte team), Debunk EU scrapes around 1.5 million news articles every month from thousands of news sources.
We're really happy with the quality of Zyte's Automatic Extraction. We are also very satisfied with the level of technical support we get. Without Zyte we simply wouldn't be able to do what we do.
Girius Merkys
CTO at DebunkEU.org
About
DebunkEU.org is an independently funded think tank and non-governmental organization that tracks disinformation and misinformation campaigns across media outlets in the Baltic countries and Poland, as well as in the United States and North Macedonia.
Its team of over 50 analysts and active volunteers conducts detailed fact-checking and research into disinformation in the Baltic countries and Poland. The think tank reports on topics including misinformation about COVID-19 and vaccines, political turmoil in Belarus and Russia, and campaigns targeting NATO activities.
Debunk EU publishes over 100 reports per year and runs a programme of educational media literacy campaigns. It also works closely with national institutions in partner countries, which provide further insight into the situation in the Baltics.
Challenges
Debunk EU aims to counter disinformation and information campaigns, with the goal of providing insights into complex issues in a concise, understandable and informative way.
Starting in 2017, Debunk EU began exploring options for collecting news articles from various sources. "At that time all the commercial options were really expensive, so we developed our own extraction solution based on Scrapy," explains Debunk EU CTO Girius Merkys. "It was OK, but we had something like 200 domains to monitor and it required a lot of maintenance."
As time passed, Debunk EU faced the growing challenge of monitoring more and more domains. "Some small countries that we're interested in might have over a thousand news outlets," says Girius. "In the disinformation space it's common to see lots of simple WordPress-based websites controlled by one entity, all running the same story to give the impression that 'it must be true'."
Girius also notes that debunking false or misleading content online can be both costly and time-consuming. "It's difficult to fact-check a piece of information if you do not know where to start. What's more, debunking disinformation costs way more than creating it."
Alongside the constantly increasing number of media outlets to monitor, Girius observes that extracting online news articles efficiently is becoming steadily more resource-intensive: "To analyze so much data is quite a challenge. Page designs are also changing more and more frequently, and JavaScript-based sites are becoming more popular. It's very difficult to scrape that kind of content; sometimes it's impossible."
Solution
To deal with the rapidly growing scale and complexity of extracting millions of news articles, Debunk EU approached Zyte for a cost-effective, easy-to-use automated article extraction solution that would minimize development overheads for the busy Debunk EU team.
With the help of Zyte API, Debunk EU is able to track the evolution of disinformation campaigns by monitoring over 1.5 million online articles every month.
"As we've scaled up we didn't want the hassle of having to keep maintaining Scrapy," says Girius. "Also, because we are a non-commercial NGO, we needed an affordable solution, and that's something Zyte has been able to offer us, plus technical assistance because of the sheer volume of requests we have every month."
As well as the quality and reliability of article extraction, Girius welcomes the efficient support offered by the Zyte team: "We're very happy with the help we get. Without it, we wouldn't be able to do our work and publish more than 100 reports every year. I really like the article list service. It just makes everything much easier for us. We just give the link of the domain, then we get the article list and we just scrape it with your API. It's automatic and it's really convenient."
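The two-step workflow Girius describes (get an article list for a domain, then extract each listed article) can be sketched roughly as follows. This is a minimal sketch, not Debunk EU's actual code. It assumes Zyte API's HTTP endpoint `https://api.zyte.com/v1/extract`, HTTP Basic authentication with the API key as the username, and the `articleList` / `article` request fields with an `articleList.articles[].url` response shape, as we understand Zyte's public documentation; check the current API reference before relying on them. The helper names `make_fetcher`, `list_article_urls` and `scrape_domain` are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Callable, Dict, List

# Assumed endpoint; verify against the current Zyte API reference.
ZYTE_API_URL = "https://api.zyte.com/v1/extract"

# A "fetcher" takes a request body and returns the decoded JSON response.
Fetcher = Callable[[Dict], Dict]


def make_fetcher(api_key: str) -> Fetcher:
    """Hypothetical helper: POST a JSON body to Zyte API using Basic auth (key as username)."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()

    def fetch(body: Dict) -> Dict:
        req = urllib.request.Request(
            ZYTE_API_URL,
            data=json.dumps(body).encode(),
            headers={
                "Authorization": f"Basic {token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    return fetch


def list_article_urls(fetch: Fetcher, listing_url: str) -> List[str]:
    """Step 1: request an article list for a domain's listing page (assumed field names)."""
    data = fetch({"url": listing_url, "articleList": True})
    articles = data.get("articleList", {}).get("articles", [])
    # Skip any entry that has no URL to follow.
    return [a["url"] for a in articles if "url" in a]


def scrape_domain(fetch: Fetcher, listing_url: str) -> List[Dict]:
    """Step 2: extract every article found in the list from step 1."""
    results = []
    for url in list_article_urls(fetch, listing_url):
        data = fetch({"url": url, "article": True})
        if "article" in data:
            results.append(data["article"])
    return results
```

In use, this would look something like `scrape_domain(make_fetcher("YOUR_API_KEY"), "https://example.com/news/")`, where the key and URL are placeholders. Passing the fetcher in as a function keeps the list-then-extract logic testable offline with a stub; at the volumes described here you would also want concurrency, retries and rate limiting, which this sketch deliberately leaves out.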
Results: web scraping at scale
1.5 million articles per month
40+ languages covered
Thousands of domains monitored
Summary
With help from Zyte API, Debunk EU is able to access millions of news articles every year, with the capacity to grow smoothly as it monitors a greater range of media outlets in more territories.
Access any website
One powerful web scraping API to access all websites. Per-site pricing that just makes sense.
Why Zyte API
Scrape websites of all complexity levels
Zyte API enables you to scrape websites of all complexity levels. Extract data using the right solution 100% of the time. Automate troubleshooting: when proxy management alone can't get you what you need, use our single web scraping API.
Per-site pricing
Our pricing maps directly to your web scraping strategy: cheaper for easy websites, and more expensive for difficult ones. With a single web scraping API, you can stop switching between multiple scraping tools depending on the use case.