Zyte API now offers a fully hosted and managed browser service for web scraping
Are you sick of the endless cycle of development, hosting, monitoring, and maintenance required to manage your own headless browser infrastructure?
Scaling becomes a nightmare with multiple tools, varied browser profiles and the constant need for orchestration. Investing time, energy and cash into managing docker instances for your fleet of browsers only adds to the pain and total cost of ownership.
In this article, we want to share a new capability with Zyte API’s built-in headless browser: a scriptable, fully hosted and managed headless browser that can be called by a single line of code in your spiders.
Why use a headless browser for web scraping?
In general, headless browsers are used in web scraping for a few reasons:
Unblocking sites: used by some developers that see headless browsers as a way to manage anti-bot systems.
Browser interaction and actions: used when developers must perform user interactions with the website to expose data or navigate to content.
Rendering JS: access content that needs Javascript, from popups to fetching the data for a page from a hard-to-access JSON file.
Taking screenshots: used when visual data has to be scraped, often for quality verification. It requires a headless browser that can load the visual information of the webpage.
The way developers are using standalone headless browsers for web scraping is costly and time-consuming. Take Selenium and Puppeteer, two popular headless browsers broadly used by web scrapers to mimic user behavior. Even though you can easily integrate them with Zyte API and use its automatic ban handling, you’ll still need to manage multiple tools.
So, even when using free tools, the infrastructure needed to make them work seamlessly is not free.
Each browser instance must be hosted, usually on AWS servers or similar, which demands server configuration and maintenance. Also, you often need to create a cascading proxy system, perhaps from multiple vendors, creating multiple combinations of IPs and browsers to unblock the websites you need.
Scale that for millions of requests daily and you now have a complex structure that will probably require a dedicated developer or a whole team.
All of that can now be outsourced with confidence to Zyte API.
Introducing Zyte API’s headless browser
Zyte API presents a hosted and managed headless browser equipped with cutting-edge anti-ban technology.
Developers can now rely on Zyte API to seamlessly handle bans, render dynamic content, automate browser actions and generate screenshots, all without the complexities of managing their own browser infrastructure. Plus, it's incredibly easy to just “turn on/off” using simple parameters within Zytes API calls.
It automatically solves the ban handling problem with added benefits of browser actions, the ability to turn JavaScript on/off and to use it to render pages and take screenshots.
A faster and cheaper option than running your own headless browser systems
No need to host or maintain. Zyte has a team of developers maintaining the structure necessary to run the headless browser, guaranteeing robust service and uptime. Also, we use our own solution for in-house projects that scrape billions of pages monthly.
Solve bans automatically. You can use Zyte API as your headless browser solution, and by default, it will automatically solve any ban issues you face right out of the box. So you get a virtually unblockable headless browser that's a joy to use and makes accessing hard-to-reach content a breeze.Â
Request only when you need it. Headless browser is a feature of Zyte API, which means you can activate it at the spider level. This replaces the subscriptions for servers and proxy rotation tools when running your own headless browser system.
Predictable and scalable pricing. You can use our free cost calculator to estimate the price per request to unblock and render any website you need before committing to any plan or payment. Also, we offer custom packages for companies willing to scale their operation — the more you scrape with us, the less you pay per request.
It has a powerful IDE for setup and debugging, making it easy to test scripts, execute browser actions, capture network requests, and deploy solutions. Developers can access a library of premade functions for common interactions while retaining the flexibility to script custom actions.
Compatible with advanced features of Zyte API, such as browser actions, network capture, response headers, toggling JavaScript, geolocation, cookies, session contexts, redirection, metadata and more.
Test it now in a real web scraping project
You can quickly test Zyte API’s headless browser by starting a trial and running a query using the run query and/or IDE tools accessed via your dashboard. Another option is to use the AI Scraping feature with the rendering option turned on and scripting some browser actions on top of it.