Debunk EU counters disinformation and coordinated information campaigns, with the goal of providing insight into complex issues in a concise, understandable and informative way.
In 2017, Debunk EU began exploring options for collecting news articles from various sources. “At that time all the commercial options were really expensive, so we developed our own extraction solution based on Scrapy,” explains Debunk EU CTO Girius Merkys. “It was OK, but we had something like 200 domains to monitor and it required a lot of maintenance.”
As time passed, Debunk EU faced the growing challenge of monitoring more and more domains. “Some small countries that we’re interested in might have over a thousand news outlets,” states Girius. “In the disinformation space it’s common to see lots of simple WordPress-based websites controlled by one entity, all running the same story to give the impression that ‘it must be true’.”
Girius also notes that the process of debunking false or misleading content online can be both costly and time-consuming. “It’s difficult to fact-check a piece of information if you do not know where to start. What’s more, debunking disinformation costs way more than creating it.”
In parallel with the constantly increasing number of media outlets to monitor, Girius observes that extracting online news articles efficiently is becoming steadily more resource-intensive: “To analyze so much data is quite a challenge. Page designs are also changing more and more frequently, and JavaScript-based sites are becoming more popular. It’s very difficult to scrape that kind of content – sometimes it’s impossible.”