EuroPython, here we go!

We are very excited about EuroPython 2015!

33 Zytans from 15 countries will be meeting (most of them for the first time) in Bilbao for what is going to be our largest get-together so far. We are also thrilled to have had 8 of our sessions accepted (5 talks, 1 poster, 1 tutorial, 1 helpdesk), and we couldn't feel prouder to be a Gold Sponsor.

Here is a summary of the talks, tutorials, and poster sessions that our staff will be giving at EuroPython 2015.

Juan Riaza

Dive into Scrapy / 45-minute talk incl. Q&A

Tuesday 21 July at 11:45

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

This talk will cover some advanced techniques based on how Scrapy is used at Scrapinghub.
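
To give a flavour of what that looks like in practice, here is a minimal spider sketch; the start URL, selectors, and field names are made up for illustration and are not taken from the talk:

    import scrapy

    class BlogSpider(scrapy.Spider):
        # Illustrative spider: crawl a listing page and yield structured items.
        name = "blog"
        start_urls = ["http://example.com/archive"]  # hypothetical starting point

        def parse(self, response):
            # Extract one item per post using CSS selectors.
            for post in response.css("article.post"):
                yield {
                    "title": post.css("h2 a::text").extract_first(),
                    "url": response.urljoin(post.css("h2 a::attr(href)").extract_first()),
                }
            # Follow pagination links with the same callback.
            next_page = response.css("a.next::attr(href)").extract_first()
            if next_page:
                yield scrapy.Request(response.urljoin(next_page), callback=self.parse)

A standalone spider like this can be run with "scrapy runspider blog_spider.py -o posts.json", which writes the extracted items to a JSON file.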

Scrapy Helpdesk / 3-hour helpdesk

Date and Time yet to be defined

Scrapy is an open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.

This helpdesk is run by members of Scrapinghub, where Scrapy was built and designed.

Scrapy Workshop / 3-hour training

Friday 24 July at 14:30

If you want to get data from the web, and there are no APIs available, then you need to use web scraping! Scrapy is the most effective and popular choice for web scraping and is used in many areas such as data science, journalism, business intelligence, web development, etc.

This workshop will provide an overview of Scrapy, starting from the fundamentals and working through each new topic with hands-on examples.

Participants will come away with a good understanding of Scrapy, the principles behind its design, and how to apply the best practices encouraged by Scrapy to any scraping task.
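
Selectors are one of those fundamentals, so here is the kind of small hands-on example you can expect; the HTML snippet below is invented for the illustration:

    from scrapy.selector import Selector

    # A tiny, made-up HTML document to practise selectors on.
    html = "<html><body><h1>EuroPython</h1><a href='/schedule'>Schedule</a></body></html>"
    sel = Selector(text=html)

    print(sel.xpath("//h1/text()").extract_first())   # EuroPython
    print(sel.css("a::attr(href)").extract_first())   # /schedule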

Shane Evans

Advanced Web Scraping / 45-minute talk incl. Q&A

Tuesday 21 July at 11:00

Python is a fantastic language for writing web scrapers. There is a large ecosystem of useful projects and a great developer community. However, it can be confusing once you go beyond the simpler scrapers typically covered in tutorials.

In this talk, we will explore some common real-world scraping tasks. You will learn best practices and get a deeper understanding of what tools and techniques can be used.

Topics covered will include:

  • Crawling - single pages, websites, focused crawlers, etc.
  • Data extraction - techniques for “scraping” data from web pages (e.g. regular expressions, XPath, machine learning); see the short sketch after this list
  • Deployment - how to run and maintain different kinds of web scrapers
  • Real world examples
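
As a taste of the data extraction topic, here is a small comparison of two of those techniques, regular expressions and XPath (via lxml); the HTML and field names are invented for the example:

    import re
    import lxml.html

    html = """
    <div class="product">
      <span class="name">Raspberry Pi 2</span>
      <span class="price">EUR 35.00</span>
    </div>
    """

    # Regular expressions work well for narrowly defined patterns such as prices...
    price = re.search(r"EUR\s+([\d.]+)", html).group(1)

    # ...while XPath queries tend to survive markup changes better than string matching.
    name = lxml.html.fromstring(html).xpath('//span[@class="name"]/text()')[0]

    print(name, price)  # Raspberry Pi 2 35.00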

Alexander Sibiryakov

Frontera: open-source large-scale web crawling framework / 30-minute talk incl. Q&A

Monday 20 July at 15:15

In this talk I’m going to introduce Scrapinghub’s new open source framework, Frontera. Frontera allows you to build real-time distributed web crawlers as well as website-focused ones.

It offers:

  • customizable URL metadata storage (RDBMS or key-value based),
  • crawling strategies management,
  • transport layer abstraction,
  • fetcher abstraction.

Along with a description of the framework, I’ll demonstrate how to build a distributed crawler using Scrapy, Kafka and HBase, and hopefully present some statistics of the Spanish internet collected with the newly built crawler. Happy EuroPythoning!
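
To make the idea concrete without reproducing Frontera’s actual API, here is a conceptual sketch of what a crawl frontier does: store URL metadata, apply a crawling strategy, and hand batches of requests to a fetcher. The class and method names below are invented for the illustration:

    class CrawlFrontier(object):
        # Conceptual sketch only -- not Frontera's real classes or method names.
        def __init__(self):
            # In a real deployment this metadata would live in an RDBMS or a
            # key-value store such as HBase, as in the talk's demo.
            self.seen = set()
            self.queue = []

        def add_seeds(self, urls):
            for url in urls:
                if url not in self.seen:
                    self.seen.add(url)
                    self.queue.append(url)

        def next_requests(self, count=10):
            # The crawling strategy lives here; FIFO is the simplest possible one.
            batch, self.queue = self.queue[:count], self.queue[count:]
            return batch

        def page_crawled(self, url, extracted_links):
            # Newly discovered links go through the same de-duplication logic.
            self.add_seeds(extracted_links)

    frontier = CrawlFrontier()
    frontier.add_seeds(["http://example.com/"])
    print(frontier.next_requests())  # ['http://example.com/']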

Frontera: open-source large-scale web crawling framework / Poster session

Date and Time yet to be defined

In this poster session I’m going to introduce Scrapinghub’s new open source framework, Frontera. Frontera allows you to build real-time distributed web crawlers as well as website-focused ones.

It offers:

  • customizable URL metadata storage (RDBMS or key-value based),
  • crawling strategies management,
  • transport layer abstraction,
  • fetcher abstraction.

Along with a description of the framework, I’ll demonstrate how to build a distributed crawler using Scrapy, Kafka and HBase, and hopefully present some statistics of the Spanish internet collected with the newly built crawler. Happy EuroPythoning!

Eugene Amirov

Sustainable way of testing your code / 30-minute talk incl. Q&A

Monday 20 July at 15:45

How do you write a test so you will still remember what it does a year from now? How do you write selective tests with different inputs? What is a test? How do you subclass test cases and yet keep control over which tests run? How do you extend or filter the inputs used in parent classes? Are you a tiny bit intrigued now? 🙂

This is not another talk about how to test, but about how to organize your tests so they stay maintainable. I will be using the nose framework as an example, but the main ideas should be applicable to any other framework you choose. While explaining how some parts of the code work, I will have to briefly touch on some advanced Python topics, although I will provide the need-to-know basics, so people with any level of Python knowledge can enjoy the ride.
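
As a hint of the "subclass test cases, keep control of the inputs" idea, here is a minimal sketch using the standard unittest module (the talk itself uses nose; the class and method names here are invented):

    import unittest

    class SquareTestBase(unittest.TestCase):
        # Base class holds the test logic; subclasses only supply inputs.
        inputs = []  # empty here, so the base class's test passes trivially

        def test_square_is_non_negative(self):
            for value in self.inputs:
                self.assertGreaterEqual(value ** 2, 0)

    class SmallNumbersTest(SquareTestBase):
        inputs = [0, 1, -1, 2]

    class LargeNumbersTest(SquareTestBase):
        # Extend the parent's inputs instead of replacing them.
        inputs = SmallNumbersTest.inputs + [10 ** 6, -(10 ** 6)]

    if __name__ == "__main__":
        unittest.main()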

Lluis Esquerda

CityBikes: bike-sharing networks around the world / 45-minute talk incl. Q&A

Wednesday 22 July at 11:00

CityBikes [1] started in 2010 as a FOSS alternative endpoint (and Android client) for gathering information about Barcelona’s Bicing bike sharing service. It later evolved into an open API [2] providing bike sharing data for (almost) any service worldwide.

Fast forward to today: after some C&D letters, there’s support for more than 200 cities, more than 170M historical entries have been gathered for analysis (in approximately a year), and the CityBikes API is the main source for open bike share data worldwide. This talk will tour how we got there with the help of Python and the community [3].

PS: We have a real-time map, and it is awesome [4].

[1]: http://citybik.es
[2]: http://api.citybik.es
[3]: http://github.com/eskerda/pybikes
[4]: http://upcoming.citybik.es
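
If you want to poke at the API before the talk, here is a quick sketch of fetching the network list with the requests library; the endpoint and field names are assumed from the v2 API at [2]:

    import requests

    # Field names ("networks", "name", "location") assumed from the v2 API.
    networks = requests.get("http://api.citybik.es/v2/networks").json()["networks"]
    for network in networks[:5]:
        print(network["name"], "-", network["location"]["city"])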

In case you haven't registered for EuroPython yet, you can do it here.

If there is anything specific you would like to hear us talk about, or questions related to our talks you would like us to answer, please tell us in the comments. If you're interested in finding out how web data extraction can help your business, feel free to reach out to us!