Open source at our heart
Make building spiders a breeze
Scrapy is an open source Python framework built specifically for web scraping by Zyte co-founders Pablo Hoffman and Shane Evans. Out of the box, Scrapy spiders are designed to download web page data (HTML, JSON, XML…), parse and process it, and save it in a structured data format such as CSV, JSON or XML.
Powerful open source technology
Scrapy ships with a wide range of built-in extensions and middlewares for handling cookies and sessions, as well as HTTP features such as compression, authentication, caching, user-agents, robots.txt and crawl depth restriction. It is also easy to extend: you can add custom middlewares or pipelines to your web scraping projects to get the specific functionality you require.
Giving you the power of Data Extraction
Scrapy
Scrapy is our open source web crawling framework written in Python. It is one of the most widely used and highly regarded frameworks of its kind: very powerful, yet easy to use.
Spidermon
Spidermon is our battle-tested open source spider monitoring library for Scrapy.
DateParser
DateParser is our library for parsing human-readable dates and times. It supports 18 languages.
Eli5
A library for debugging machine learning classifiers and explaining their predictions.
Formasaurus
Formasaurus uses machine learning to figure out the type of an HTML form: login, search, sign up, password recovery, contact form, etc.
W3lib
W3lib provides a number of useful web-related functions for your web scraping projects.
ScrapyRT
ScrapyRT lets you reuse your spider’s logic to extract data from web pages through a single HTTP request.
Queuelib
Queuelib lets you create disk-based queues in Python.
Parsel
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors.
Cssselect
CSS Selectors for Python
Itemloaders
Library to populate items using XPath and CSS with a convenient API
Itemadapter
Common interface for data container classes
Protego
A pure-Python robots.txt parser with support for modern conventions.
Price-parser
Extract price amount and currency symbol from a raw text string
Number-parser
Parse numbers written in natural language
Used by companies powered by data
Dev tools that make scraping easy
Zyte API
Unblock websites with one powerful API
Highest success rate with lowest response times
Lowest total cost of ownership
Highest compliance standards built in
Only pay for what you use
AI Scraping for product data
Zyte Data
Get web data delivered quickly and accurately.
We extract data for the largest companies in the world so they don't have to
Tell us about your project, we'll handle the rest
Leverage our world-class legal team to inform compliance
Standard and bespoke web data extraction projects