
How to use Playwright with Zyte Smart Proxy Manager

Read time: 6 mins | Posted on June 3, 2022
By Neha Setia Nagpal
Introducing a new, easy-to-use library to work with Zyte Smart Proxy Manager

Great news for all the Zyte Smart Proxy Manager users who use Playwright (a headless browser library) and for all the Playwright users looking for an easy-to-integrate anti-ban solution for extracting data from JavaScript-heavy websites.

We have launched a new library, Zyte SmartProxy Playwright, and we’re sure you’re going to love it!

At Zyte, developer experience matters most. We wanted to give you a smooth experience scraping dynamic websites, with seamless integration between Playwright and our smart rotating proxy service, Zyte Smart Proxy Manager.

Here’s a quick guide to getting started. It’s super easy!

What is the Zyte SmartProxy Playwright library?

The Zyte SmartProxy Playwright library is a client library built on top of Playwright, an open-source framework for web automation across Chromium, Firefox, and WebKit with a single API. It is written to work seamlessly with Zyte Smart Proxy Manager.

With this library, you can make the most of Playwright’s headless browser capabilities and manage bans by bringing Zyte Smart Proxy Manager, a powerful proxy management tool, into your web scraping projects.

With our library for Playwright, you no longer have to run and maintain a separate piece of software in the background to connect to Zyte Smart Proxy Manager.

In this tutorial, I will show how the library gives your Playwright web scraping script superhero capabilities to:

  • Handle data nested in scripts
  • Manage proxy rotation
  • Manage bans
  • Manage sessions
  • Increase cost savings in your project
  • Speed up page loads

Prerequisites for the tutorial:

To run the script used in this tutorial, make sure you have the following ready:

  • Have a Zyte Smart Proxy Manager account
    If you don’t have an SPM account, sign up for a 14-day free trial here. Check out our docs on subscribing to Smart Proxy Manager for detailed instructions.
  • Get your Zyte Smart Proxy Manager API key
    Once you sign up, you will see your API key on the Getting Started page.
    If you already have a Smart Proxy Manager subscription, select ‘Smart Proxy Manager’ under ‘Tools’ on the left side of the dashboard and click ‘API Access’. Make a note of your API key, as we will need it later in the code.
  • Install Node.js and npm
    Install Node.js and npm on your system, and make sure that /usr/local/bin is in your $PATH environment variable.
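Hard-coding the API key in sample.js makes it easy to leak when you share the script. A minimal sketch, assuming you export the key in an environment variable named SPM_APIKEY (a hypothetical name chosen for this example, not something documented for the library), could look like this:

```javascript
// Read the Smart Proxy Manager API key from an environment variable.
// SPM_APIKEY is a hypothetical variable name chosen for this example;
// getApiKey is likewise our own helper, not part of the library.
function getApiKey(env = process.env) {
  const key = (env.SPM_APIKEY || '').trim();
  if (!key) {
    // Fail fast with a clear message instead of launching without a key.
    throw new Error('Set the SPM_APIKEY environment variable to your Smart Proxy Manager API key.');
  }
  return key;
}

module.exports = { getApiKey };
```

You could then pass spm_apikey: getApiKey() to chromium.launch instead of the '<<enter-your-api-key>>' placeholder, and start the script with SPM_APIKEY=your-key node sample.js.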

Setting up Zyte SmartProxy Playwright

Installing the Zyte SmartProxy Playwright library is surprisingly easy.

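Just run the following command using npm and it will automatically install the Playwright library along with all the supported browsers: Chromium, Firefox, and WebKit. If the browsers are not downloaded during installation, you can install them with npx playwright install.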

$ npm install zyte-smartproxy-playwright
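To double-check that Node.js can find the package before writing any scraping code, you could use a small helper like the one below. It relies only on Node’s built-in require.resolve, which throws MODULE_NOT_FOUND when a module cannot be found; the isInstalled name is our own.

```javascript
// Returns true if Node.js can resolve the given package from this script's location.
// require.resolve throws an error with code MODULE_NOT_FOUND when it is not installed.
function isInstalled(packageName) {
  try {
    require.resolve(packageName);
    return true;
  } catch (err) {
    if (err.code === 'MODULE_NOT_FOUND') {
      return false;
    }
    throw err; // unexpected problem, e.g. a malformed package.json
  }
}

const name = 'zyte-smartproxy-playwright';
console.log(`${name}: ${isInstalled(name) ? 'installed' : 'missing'}`);
```

Save it in your project folder (for example as check-install.js) and run it with node check-install.js.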

Awesome, now that you are all set up and configured, let’s get the show started!

Using Zyte SmartProxy Playwright for seamless web scraping

To demonstrate the integration between Zyte Smart Proxy Manager and the headless browser library Playwright, we will write a script that makes the browser take a screenshot of the ‘Web Scraping Sandbox’. Zyte developed this sandbox for demonstration purposes, so feel free to play around with it and experiment with new web scraping techniques.

Let’s start our Zyte SmartProxy Playwright tutorial with this basic example. 

Create a new file named sample.js and open it in your favorite code editor.

  1. First, let’s import the Zyte SmartProxy Playwright library into your script.
    const { chromium } = require('zyte-smartproxy-playwright');
  2. Next, create a browser instance with two additional parameters: headless and spm_apikey.

    2.1. By default, Zyte SmartProxy Playwright opens the browser in headless mode. Here we set the ‘headless’ parameter to ‘false’, so the Chromium GUI opens and you can watch the script at work.

    2.2. Set spm_apikey to the API key you noted in the prerequisites above.
(async () => {
    const browser = await chromium.launch({
        spm_apikey: '<<enter-your-api-key>>',
        headless: false,
    });
  3. Logging is good practice for developer experience: it makes debugging easier and shows a trace of what is happening in the code. Before opening a new page in the browser, add a log message.
    console.log('Before new page');
  4. Next, open a new page using the browser instance, which is just like opening a new tab in your browser. The ignoreHTTPSErrors option tells Playwright not to fail on the certificate that the proxy presents for HTTPS pages.
    const page = await browser.newPage({ignoreHTTPSErrors: true});

  5. Now you can load any web page with the goto function. Here we load the web scraping sandbox and allow up to 180 seconds (180000 ms) for the navigation. If the navigation fails or times out, goto throws an error, and the catch block prints it to the console.
    try {
        await page.goto('https://toscrape.com/', {timeout: 180000});
    } catch(err) {
        console.log(err);
    }
  6. Take a screenshot of the web scraping sandbox with the screenshot method. The path option sets where the image is saved. A relative path such as 'screenshot.png' is resolved against the current working directory, so if you run the script from the folder that contains sample.js, the screenshot is saved next to it.
    await page.screenshot({path: 'screenshot.png'});

  7. Finally, don’t forget to close the browser.

    await browser.close();


    Altogether, the final code looks like this:
const { chromium } = require('zyte-smartproxy-playwright');
(async () => {
    const browser = await chromium.launch({
        spm_apikey: '<<enter-your-api-key>>',
        headless: false,
    });
    console.log('Before new page');
    const page = await browser.newPage({ignoreHTTPSErrors: true});

    console.log('Opening page ...');
    try {
        await page.goto('https://toscrape.com/', {timeout: 180000});
    } catch(err) {
        console.log(err);
    }

    console.log('Taking a screenshot ...');
    await page.screenshot({path: 'screenshot.png'});
    // Keep the browser window open for 10 seconds so you can see the result, then close it.
    await page.waitForTimeout(10000);
    await browser.close();
})();
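One thing the script above does not handle: if page.screenshot throws, browser.close() is never reached and the browser process stays open. A minimal variation, assuming the same launch options, wraps the work in try/finally. The function receives the chromium object as a parameter, which is our design choice (it also makes the function easy to test with a stand-in object); takeScreenshot is a hypothetical name. Unlike the tutorial script, a failed goto here stops the run instead of taking a screenshot of whatever loaded.

```javascript
// Open a page through the given browser type, save a screenshot, and always
// close the browser, even if navigation or the screenshot fails.
// `chromium` is expected to be the chromium export of zyte-smartproxy-playwright.
async function takeScreenshot(chromium, { apiKey, url, path, headless = true }) {
  const browser = await chromium.launch({
    spm_apikey: apiKey,
    headless,
  });
  try {
    const page = await browser.newPage({ ignoreHTTPSErrors: true });
    console.log('Opening page ...');
    await page.goto(url, { timeout: 180000 });
    console.log('Taking a screenshot ...');
    await page.screenshot({ path });
  } finally {
    // Runs on success and on error, so no browser process is left behind.
    await browser.close();
  }
}

module.exports = { takeScreenshot };
```

With the library installed, you could call it as takeScreenshot(require('zyte-smartproxy-playwright').chromium, { apiKey: '<<enter-your-api-key>>', url: 'https://toscrape.com/', path: 'screenshot.png' }).catch(console.error).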

Run the script from the command line:

$ node sample.js

If your script runs successfully, you should see its log messages (Before new page, Opening page ..., Taking a screenshot ...) in your terminal, and a screenshot.png file in your project folder.
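If you run the script unattended (for example, on a schedule), you may want to confirm the screenshot was actually written instead of checking the folder by hand. A small sketch using Node’s built-in fs module; the checkScreenshot helper and the rule that a non-empty file counts as success are our own assumptions:

```javascript
const fs = require('fs');

// Returns the file size in bytes if the screenshot exists and is non-empty,
// or null otherwise. checkScreenshot is a hypothetical helper name.
function checkScreenshot(filePath) {
  try {
    const { size } = fs.statSync(filePath);
    return size > 0 ? size : null;
  } catch (err) {
    if (err.code === 'ENOENT') {
      return null; // the file was never written
    }
    throw err;
  }
}

const size = checkScreenshot('screenshot.png');
console.log(size ? `screenshot.png saved (${size} bytes)` : 'screenshot.png is missing or empty');
```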

Additional functionalities

Besides integrating Playwright’s headless browser capabilities with Zyte Smart Proxy Manager, our library provides the following features:

  • Automatic session management with Zyte Smart Proxy Manager.
  • Ad blocking, to speed up page loads and save costs. Zyte Smart Proxy Manager charges per successful request, so blocked ad requests never go through the proxy and are never billed. Set the ‘block_ads’ argument to ‘true’ and the library will block the ads defined by block_list, using the @cliqz/adblocker-playwright package.
  • Direct download of static assets (images, CSS, JavaScript, etc.), to speed up page loads and save costs on Zyte Smart Proxy Manager requests. Set the ‘static_bypass’ argument to ‘true’ and the library will skip the proxy for the static assets matched by `static_bypass_regex`; set it to ‘false’ to send them through the proxy.

Important note: block_ads and static_bypass are enabled by default. Some websites may not work with them enabled, so try disabling them if you run into issues. To learn more about these features, read here.
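Following the note above, here is a sketch of how you might switch both features off for a site that misbehaves. It only uses the spm_apikey, headless, block_ads, and static_bypass options described in this post; the buildLaunchOptions helper and its optimizations parameter are hypothetical, and other options (such as static_bypass_regex) are left to the library’s documentation.

```javascript
// Build the options object for chromium.launch().
// block_ads and static_bypass are enabled by default in the library, so they are
// only set explicitly when we want to turn them off for a problematic site.
function buildLaunchOptions({ apiKey, headless = true, optimizations = true }) {
  const options = {
    spm_apikey: apiKey,
    headless,
  };
  if (!optimizations) {
    options.block_ads = false;     // let ads load as the site serves them
    options.static_bypass = false; // send static assets through the proxy too
  }
  return options;
}

console.log(buildLaunchOptions({ apiKey: '<<enter-your-api-key>>', optimizations: false }));
```

You would pass the result straight to chromium.launch(...) in place of the inline options object used in sample.js.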


Learn how to superpower your Smart Proxy Manager with Playwright

Libraries like Zyte SmartProxy Playwright make it much easier to work with dynamic websites, manage bans, and rotate proxies, all in a single piece of code. Later this month, on the 22nd of June, I will be hosting a webinar to demonstrate the true power of this new integration and show you how to make the most of it. So be sure to join me!

The webinar is a good opportunity to interact with our web scraping experts and get your questions answered on the fly while integrating these libraries hands-on.

Read more 

If you are new to headless browsers, Playwright, or Zyte Smart Proxy Manager, here are some links to learn more about these topics. I hope you find them useful.

  • Zyte SmartProxy Playwright: A wrapper over Playwright that provides Zyte Smart Proxy Manager specific functionality.
  • Zyte Smart Proxy Manager: When extracting web data at scale, proxy management is critical to avoid getting banned or blocked. Smart Proxy Manager automatically selects the best proxies to keep your crawl healthy. It handles retries and applies rotation and fingerprinting logic to maximize your success rate.
  • Integrating SPM with Playwright: Learn more about the integration in our official documentation.
  • What Is a Headless Browser?: A headless browser is a web browser without a user interface. Basically, it’s the same Chrome or Firefox we normally use with things we can click or touch stripped away: no tab bar, URL bar, bookmarks, or any other elements for visual interaction.
  • How does a headless browser help with web scraping and data extraction?: Learn about the role of headless browsers in web scraping from our Technical Lead, Pawel.
  • Playwright: Playwright is a framework for web testing and automation. It allows testing Chromium, Firefox, and WebKit with a single API.