A developer’s guide to rotating proxies in Python

By Neha Setia Nagpal, posted on April 8, 2022 (5 min read)

A proxy is an intermediary server that forwards your requests and hides your IP address, so you can browse the web more anonymously. Proxies have many interesting use cases, the most prominent being web scraping for pricing intelligence, SEO monitoring, and data collection for market research. The correct use of rotating proxies is a key ingredient in all of these.

If you want to know more about proxies for web scraping and how proxies work, feel free to skim through our recent blog post.

In this developer guide, you will learn how to:

  1. Set up a proxy using the Python Requests library
  2. Use rotating proxies in three different ways:
    1. Using the Requests library
    2. Using the Scrapy middleware scrapy-rotating-proxies
    3. Using Zyte Smart Proxy Manager

So let’s get started!

Prerequisites 

  1. Requests: It is an elegant and simple HTTP library for Python. It allows you to send HTTP/1.1 requests extremely easily. There’s no need to manually add query strings to your URLs or to form-encode your POST data. To install the library, run this command in the terminal.
python -m pip install requests
  2. Scrapy: Scrapy is a fast, powerful, open-source web crawling framework written in Python. It extracts structured data that can be used for a wide range of applications, like data mining, information processing, or historical archival. If you are new to Scrapy, this tutorial on Scrapy is a good place to start. Once you have a list of working proxies, the third-party scrapy-rotating-proxies middleware makes rotating them a breeze. To install Scrapy and scrapy-rotating-proxies, run the following commands:
pip install scrapy
pip install scrapy-rotating-proxies
  3. Zyte Smart Proxy Manager: This is a proxy management and antiban solution that manages proxy pools and handles bans so you can focus on extracting quality data. Follow this guide to create a Smart Proxy Manager account and get a 14-day free trial. You can cancel at any time, and you won’t be charged for the free trial.

To use Smart Proxy Manager with Scrapy, install the `scrapy-zyte-smartproxy` middleware:

pip install scrapy-zyte-smartproxy

How to set up a proxy using Requests?

First, import the Requests library, then create a proxies dictionary that maps each protocol (HTTP and HTTPS) to a proxy URL. Finally, call requests.get with the target URL and pass the dictionary through the proxies argument. For example:

import requests

# Map each URL scheme to the proxy that should handle it (placeholder addresses)
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
response = requests.get('http://example.org', proxies=proxies)
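Many proxy providers require authentication, and the Requests documentation describes the http://user:password@host:port form for passing proxy credentials. The sketch below is a hypothetical helper, make_proxies, that builds the proxies dictionary and percent-encodes the credentials so characters such as @ or : don’t break the URL; the address and credentials are placeholders. Note that both keys point to an http:// proxy URL: the key is the scheme of the page you request, while the scheme in the value is how Requests connects to the proxy itself.

```python
from urllib.parse import quote


def make_proxies(host, port, username=None, password=None):
    """Hypothetical helper: build a Requests-style proxies dict that sends
    both HTTP and HTTPS traffic through the same proxy."""
    credentials = ""
    if username is not None:
        # Percent-encode so characters like '@' or ':' don't break the URL
        credentials = quote(username, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    proxy_url = f"http://{credentials}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


print(make_proxies("10.10.1.10", 3128, "user", "p@ss")["https"])
# http://user:p%40ss@10.10.1.10:3128
```

The result can be passed directly as requests.get(url, proxies=make_proxies(...)).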

Configure proxies for individual URLs

You can also configure proxies for individual hosts, even when the scheme is the same, by using keys of the form scheme://hostname. This comes in handy when you want to use different proxies for the different websites you scrape.

import requests

# Keys of the form scheme://hostname apply only to requests for that host
proxies = {
    'http://example.org': 'http://10.10.1.10:3128',
    'http://something.test': 'http://10.10.1.10:1080',
}
# This request matches the 'http://something.test' key
requests.get('http://something.test/some/url', proxies=proxies)
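Which entry is used for a given URL? The Requests documentation describes a scheme://hostname key as matching any request with that scheme and exact hostname. The hypothetical pick_proxy function below is a simplified sketch of that precedence, in which an exact scheme://hostname key beats a bare scheme key. It is not the Requests code itself: the real lookup also considers all:// and all keys, plus proxies taken from environment variables.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Simplified sketch: an exact 'scheme://hostname' key takes precedence
    over a bare 'scheme' key (hypothetical helper, not the Requests API)."""
    parts = urlparse(url)
    for key in (f"{parts.scheme}://{parts.hostname}", parts.scheme):
        if key in proxies:
            return proxies[key]
    return None  # no matching key, so this dict supplies no proxy for the URL


proxies = {
    'http://example.org': 'http://10.10.1.10:3128',
    'http://something.test': 'http://10.10.1.10:1080',
}
print(pick_proxy('http://something.test/some/url', proxies))  # http://10.10.1.10:1080
print(pick_proxy('http://other.test/', proxies))  # None
```

This also shows a common pitfall: with only host-specific keys, requests to any other host are not sent through a proxy by this dictionary.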

Creating sessions 

Sometimes you need to use a session and a proxy at the same time, for example to reuse cookies and connections across requests. In this case, create a new session object, add proxies to it, and then send the request through the session:

Under the hood, `requests.get` itself creates a `requests.Session` for each call.

import requests

s = requests.Session()
# Requests sent through this session use these proxies by default
s.proxies = {
    "http": "http://10.10.10.10:8000",
    "https": "http://10.10.10.10:8000",
}
r = s.get("http://toscrape.com")

How to rotate proxies? 

On the internet, your IP address is your identity, and a website will only accept a limited number of requests from a single IP. Think of websites as a kind of regulator: they get suspicious when requests keep coming from the same IP over and over again. This is known as IP rate limiting, and it can lead to blocking, throttling, or CAPTCHAs. One way to overcome it is to rotate proxies, so that your requests are spread across several IP addresses. Read more about why you need rotating proxies.

Now let’s get to the “how” part. This tutorial demonstrates three ways to work with rotating proxies:

  1. Writing rotating proxy logic with the Requests library
  2. Rotating proxies in Python with the Scrapy middleware scrapy-rotating-proxies
  3. Using Zyte Smart Proxy Manager

Note: You don’t need any particular proxies to run the code demonstrated in this tutorial; the proxy addresses shown are placeholders for your own list. However, if your product or service relies on web-scraped data, a free proxy solution will probably not be enough for your needs.

Let’s discuss them one by one:

Rotating proxies using the Requests library

In the code shown below, we first create a proxy pool (a list of proxy URLs). Then we randomly pick a proxy for the request. If the proxy works properly, we can access the given site. If there’s a connection error, we delete that proxy from the pool and retry the same URL with another proxy.

import random
import requests

# Placeholder proxy URLs; replace them with proxies you have access to
proxy_pool = [
    "http://10.10.10.10:8000",
    "http://10.10.10.11:8000",
    "http://10.10.10.12:8000",
]


def get_with_rotation(url, pool):
    """Try random proxies from the pool until one of them works."""
    while pool:
        proxy = random.choice(pool)
        proxies = {"http": proxy, "https": proxy}
        try:
            return requests.get(url, proxies=proxies, timeout=10)
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
            # Connection problem: drop this proxy and retry with another one
            pool.remove(proxy)
    raise RuntimeError("No working proxies left in the pool")


r = get_with_rotation("http://toscrape.com", proxy_pool)

Rotating proxies in Python using Scrapy

In your settings.py:

  1. Add the list of proxies like this:
ROTATING_PROXY_LIST = [
    'Proxy_IP:port',
    'Proxy_IP:port',
    # ...
]

If you want more external control over the IPs, you can load the list from a file instead:

ROTATING_PROXY_LIST_PATH = 'listofproxies.txt'
  2. Enable the middleware:
DOWNLOADER_MIDDLEWARES = {
    # ...
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    # ...
}

That’s it! Now all your requests will automatically be routed randomly between the proxies.
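The scrapy-rotating-proxies README describes the ROTATING_PROXY_LIST_PATH file as holding one proxy per line. If you also want to reuse the same file with a Requests-based rotation, a hypothetical loader could read it as sketched below. It skips blank lines and prepends an http:// scheme to bare host:port entries, because the Requests documentation notes that proxy URLs must include the scheme; the default scheme is an assumption you may need to change.

```python
from pathlib import Path


def load_proxy_list(path, default_scheme="http"):
    """Hypothetical helper: read one proxy per line into a list of proxy URLs."""
    proxies = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if "://" not in entry:
            # Requests expects proxy URLs to include a scheme
            entry = f"{default_scheme}://{entry}"
        proxies.append(entry)
    return proxies
```

For example, proxy_pool = load_proxy_list('listofproxies.txt') gives a list you can rotate through with Requests.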

Note: Sometimes the proxy you are trying to use is simply banned. In that case, there’s not much you can do other than remove it from the pool and retry with another proxy. Other times, the proxy isn’t banned and you just have to wait a little before using it again.
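The note above boils down to two kinds of bookkeeping: drop banned proxies for good, and let throttled ones rest for a while. If you rotate proxies yourself with Requests, a minimal sketch of that bookkeeping could look like the hypothetical ProxyPool class below. How you tell a ban from throttling is site-specific and left to you (for example, you might treat HTTP 429 Too Many Requests as throttling), and the 60-second cooldown is an arbitrary default.

```python
import random
import time


class ProxyPool:
    """Hypothetical pool: banned proxies are dropped, throttled ones rest for a while."""

    def __init__(self, proxies, cooldown=60.0, clock=time.monotonic):
        self.proxies = list(proxies)
        self.cooldown = cooldown  # seconds a throttled proxy must rest
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.resting_until = {}  # proxy -> earliest time it may be used again

    def get(self):
        """Return a random proxy that is not currently resting."""
        now = self.clock()
        available = [
            p for p in self.proxies
            if self.resting_until.get(p, float("-inf")) <= now
        ]
        if not available:
            raise LookupError("no proxy is available right now")
        return random.choice(available)

    def mark_banned(self, proxy):
        """A banned proxy is of no further use, so remove it from the pool."""
        self.proxies.remove(proxy)
        self.resting_until.pop(proxy, None)

    def mark_throttled(self, proxy):
        """Not banned, just rate limited: rest the proxy before reusing it."""
        self.resting_until[proxy] = self.clock() + self.cooldown


pool = ProxyPool(["http://10.10.10.10:8000", "http://10.10.10.11:8000"])
proxy = pool.get()
```

After each request, call pool.mark_banned(proxy) or pool.mark_throttled(proxy) depending on the response, and call pool.get() again for the retry.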

Use Zyte Smart Proxy Manager 

The ways to rotate proxies discussed above work well for demos and minimum viable products, but things can get tricky as soon as you scale your data extraction project. Managing the infrastructure for proxy pools is challenging, time-consuming, and resource-intensive. You will soon find yourself replacing proxies to keep the pool healthy, managing bans and sessions, rotating user agents, and so on. Proxy infrastructure also needs to be configured to work with headless browsers to crawl JavaScript-heavy websites. Phew! It’s no surprise how quickly a data extraction project turns into a proxy management project.

With Zyte Smart Proxy Manager, you don’t need to rotate or manage any proxies yourself. It is all done automatically, so you can focus on extracting quality data. Let’s see how easy it is to integrate with your Scrapy project.

  1. In the settings file of your Scrapy project, enable the middleware:
# enable the middleware
DOWNLOADER_MIDDLEWARES = {'scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware': 610}
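  2. In the same settings file, enable Smart Proxy Manager and add your API key. You can also set these per spider with the zyte_smartproxy_enabled and zyte_smartproxy_apikey attributes, as in the demo spider below: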
# enable Zyte Proxy
ZYTE_SMARTPROXY_ENABLED = True
# the API key you get with your subscription
ZYTE_SMARTPROXY_APIKEY = '<your_zyte_proxy_apikey>'

Here is demo code that applies these settings at the spider level:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    # Enable Smart Proxy Manager for this spider
    zyte_smartproxy_enabled = True
    # Replace with the API key you get with your subscription
    zyte_smartproxy_apikey = '<your_zyte_proxy_apikey>'
    custom_settings = {
        "DEFAULT_REQUEST_HEADERS": {
            "X-Crawlera-Profile": "desktop",
            "X-Crawlera-Cookies": "disable",
        }
    }

    def start_requests(self):
        urls = [
            'https://quotes.toscrape.com/page/1/',
            'https://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Save each page as quotes-<page number>.html
        page = response.url.split("/")[-2]
        filename = f'quotes-{page}.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log(f'Saved file {filename}')

This spider sends its requests to https://quotes.toscrape.com/ through Smart Proxy Manager and saves each downloaded page as an HTML file.

When you use Zyte Smart Proxy Manager, you don’t need to deal with proxy rotation manually. Everything is handled internally through our rotating proxies.

You can try Zyte Smart Proxy Manager for 14 days for free.
