
Announcing Smart Scraping Beta for Zyte API

How can I create more spiders faster, without drowning in maintenance or compromising data quality?

One of the highlights of Extract Summit 2023 was the beta launch of a revolutionary web data extraction solution that redefines how developers and non-developers collaborate, using AI to gather valuable information from the web.


There are two key takeaways from this announcement:


  • Developers can now create spiders as reusable templates, with the solution generating a user-friendly UI.

  • Zyte provides an editable e-commerce scraping template powered by AI, automating crawling, parsing, and ban handling for the user.


Using this solution and the included template, users can start extracting data from e-commerce websites within seconds, without the need for ongoing maintenance, thanks to AI-powered functionality.


To give it a try, simply log in, navigate to Scrapy Cloud, and follow the instructions to create a new project with Zyte's AI-Powered Spiders beta.

For Developers:

Craft customizable scrapers, fine-tune them to perfection, and save them as reusable templates that non-developers can configure in a user-friendly UI. Your team can then use these templates to tackle any web scraping task with precision. Say goodbye to repetitive coding and hello to efficiency.

For Non-Developers:

Zyte Smart Scraping opens the door to web data for everyone. Our user-friendly interface allows you to explore a library of pre-built scraping templates, each designed for specific types of websites or data. Simply choose a template, customize the inputs to match your needs, and let the solution do the rest. No coding required!


Editing a template is straightforward for any Scrapy developer.

AI-Powered Templates:

Zyte has thoughtfully crafted pre-made templates that use AI modules to automate crawling, parsing, and data extraction for common data types. These templates will eventually cover a wide range of data sources, from e-commerce product listings to news articles and more. For developers, there’s an open-source template that includes these AI modules, providing a head start on complex scraping tasks.

  • How it works

    • Developers create spiders, configure them, and save them as templates.

    • Non-Developers access these templates through the intuitive UI, selecting the one that suits their data needs.

    • Customization is simple: users set their data extraction rules and output formats without writing code.

    • The solution generates a spider by combining the template with the user's configuration, pairing developer expertise with non-developer ease of use.

  • Benefits

    • Consistency: Templates ensure scraping adheres to best practices, maintaining data accuracy.

    • Efficiency: Developers save time by reusing templates; non-developers access data without technical barriers.

    • Empowerment: Web data extraction becomes accessible to all, fostering collaboration.

    • Scale: Zyte API automates away bans and extraction grunt work, freeing you to scale.

    • Centralized code: To update spiders built from a template, you change the template in one place.
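To make the "template plus user configuration" idea concrete, here is a minimal, hypothetical sketch in plain Python. It is not Zyte's implementation: the `Param` and `SpiderTemplate` classes, the `form_fields()` and `configure()` methods, and the parameter names (`crawl_mode`, `max_requests`) are all invented for illustration. It only shows the general shape: a developer declares which inputs a template accepts, a UI can be rendered from that declaration, and each user's input is validated and merged into a spider configuration.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# Hypothetical sketch, NOT Zyte's implementation. A template is a fixed spider
# definition plus a schema of the parameters a non-developer may set in a UI.


@dataclass(frozen=True)
class Param:
    name: str
    value_type: type
    default: Optional[Any] = None  # None means "required"
    choices: Tuple[Any, ...] = ()  # empty means "any value of the right type"


@dataclass(frozen=True)
class SpiderTemplate:
    name: str
    params: Tuple[Param, ...]

    def form_fields(self) -> list:
        """Describe each parameter so a UI could render one form field per entry."""
        return [
            {"name": p.name, "type": p.value_type.__name__,
             "default": p.default, "choices": list(p.choices)}
            for p in self.params
        ]

    def configure(self, **user_input: Any) -> dict:
        """Merge one user's input with the template's defaults into a spider config."""
        unknown = set(user_input) - {p.name for p in self.params}
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        config = {"template": self.name}
        for p in self.params:
            value = user_input.get(p.name, p.default)
            if value is None:
                raise ValueError(f"missing required parameter: {p.name}")
            if not isinstance(value, p.value_type):
                raise TypeError(f"{p.name} must be of type {p.value_type.__name__}")
            if p.choices and value not in p.choices:
                raise ValueError(f"{p.name} must be one of {p.choices}")
            config[p.name] = value
        return config


# One template, maintained by a developer ...
ecommerce = SpiderTemplate(
    name="ecommerce",
    params=(
        Param("url", str),
        Param("crawl_mode", str, default="full", choices=("full", "pagination_only")),
        Param("max_requests", int, default=100),
    ),
)

# ... configured by non-developers into many spiders.
shop_a = ecommerce.configure(url="https://shop-a.example")
shop_b = ecommerce.configure(url="https://shop-b.example", crawl_mode="pagination_only")
```

Because every spider here comes from `configure()` on the same template, a fix made to the template (for example, a better crawl rule) reaches all of them at once, which is the centralized-code benefit described above.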

  • So what does the AI template do?


    What are the major time-consuming, maintenance-heavy aspects of scraping?


    • Handling Bans

    • Creating Crawl Patterns

    • Writing Parsing Code

    • Fixing all of these fragile things when they break


    With Smart Scraping for Zyte API, you get the scale of AI with the control and quality of custom code.


    By using Zyte API as the foundation and Scrapy as the framework, the Smart Scraping templates and associated libraries automate many of the tedious, time-consuming tasks developers usually have to handle around bans, parsing and crawling. This solution takes care of the grunt work, so you don't have to.
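At the bottom of this stack, one AI extraction is a single HTTP call to Zyte API. Here is a minimal sketch using only Python's standard library. It assumes the `https://api.zyte.com/v1/extract` endpoint, HTTP Basic authentication with the API key as the username and an empty password, and the `product`, `productList` and `productNavigation` automatic-extraction fields as we understand them from the Zyte API documentation. The helper names `build_extract_request` and `extract` are hypothetical. The Smart Scraping templates go through Scrapy rather than raw requests like this.

```python
import base64
import json
import urllib.request

# Assumed Zyte API extraction endpoint; authenticate with the API key as the
# HTTP Basic username and an empty password.
ZYTE_API_URL = "https://api.zyte.com/v1/extract"

# Assumed automatic-extraction outputs: a single product page, a product
# listing page, or navigation data (links the crawler could follow).
EXTRACTION_KINDS = ("product", "productList", "productNavigation")


def build_extract_request(api_key: str, url: str, kind: str = "product") -> urllib.request.Request:
    """Build, but do not send, a POST asking Zyte API to fetch `url` and extract `kind`."""
    if kind not in EXTRACTION_KINDS:
        raise ValueError(f"kind must be one of {EXTRACTION_KINDS}")
    body = json.dumps({"url": url, kind: True}).encode("utf-8")
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        ZYTE_API_URL,
        data=body,
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


def extract(api_key: str, url: str, kind: str = "product") -> dict:
    """Send the request and return only the extracted data (needs network and a real key)."""
    request = build_extract_request(api_key, url, kind)
    with urllib.request.urlopen(request, timeout=180) as response:
        return json.load(response)[kind]
```

Notice what the sketch leaves out: there is no proxy rotation, retry logic or parsing code, because, as described above, Zyte API handles bans and AI-based extraction on its side.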


    But isn't AI expensive and low quality?


    In the past, we might have agreed that AI wasn't economically viable and lacked the reliability needed for commercial-grade data collection. However, with the launch of Smart Scraping for Zyte API, we've made it possible for developers to use AI for web scraping when:


    • You want to obtain data quickly.

    • You want to keep collecting data without fixing code.

    • You want data from many sites.


    When the setup and maintenance costs of writing custom code outweigh the value of the data collected, AI becomes the key to success. We've worked on this for more than four years, reducing costs and building models that can match or surpass human scraping.


    Battle tested by our services division


    We know it works because we're our own biggest customer. We've found we can deliver data for new websites three times faster at the same quality, with radically lower maintenance overhead.


    What if I want to modify templates or create my own templates?


    We've made it easy to modify and create your own custom spider templates using our Scrapy spider template as a base, or start entirely from scratch if you prefer.


    • Expand the schema

    • Send data to a third-party service

    • Filter results and customize crawl patterns

    • And much more.
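As an illustration of the first and third kinds of change in the list above, here is a minimal, hypothetical sketch in plain Python. It is not code from Zyte's template: the `should_follow` and `customize_item` helpers, the URL path pattern, the price threshold, and the added `price_band` field are all invented for illustration. It shows a custom crawl pattern (which links to follow) and a result filter that also expands the item schema with a derived field.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical customizations a developer might layer on top of a template.
# The path pattern, price threshold and "price_band" field are illustrative only.
FOLLOW_PATHS = re.compile(r"^/(category|product)/")


def should_follow(url: str, allowed_domain: str) -> bool:
    """Custom crawl pattern: stay on one domain, follow only category and product pages."""
    parts = urlparse(url)
    return parts.hostname == allowed_domain and FOLLOW_PATHS.match(parts.path) is not None


def customize_item(item: dict, max_price: float) -> Optional[dict]:
    """Filter results and expand the schema: drop unpriced or expensive items, add a field."""
    price = item.get("price")
    if price is None or float(price) > max_price:
        return None  # filtered out
    band = "budget" if float(price) < 20 else "standard"
    return {**item, "price_band": band}  # expanded schema
```

In an actual Scrapy project, a filter like `customize_item` would usually live in an item pipeline, whose `process_item` method drops an item by raising `scrapy.exceptions.DropItem`. A crawl rule like `should_follow` would decide which requests the spider yields.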



    How can you try Smart Scraping for yourself?


    The great news is that an Open Beta will be available starting October 26, 2023, and you can use free trials to test the solution.


    You will need two things to get started:


    • A Scrapy Cloud account (which includes 1 free unit).

    • A Zyte API account (which includes $5 free credit).


    We believe this is the future of web data extraction, where technical and non-technical minds collaborate seamlessly with AI to harvest data from the digital world.


    To get started, simply sign up for a free Scrapy Cloud account and follow the instructions to create a new project with Zyte's AI-Powered Spiders beta.


    PS. Currently, AI-powered templates are limited to e-commerce websites, but templates for news, media, and many other data archetypes will be coming soon.