2. Get serious about legal compliance
Running web scraping projects legally and ethically at scale requires oversight from lawyers who specialize in web scraping. The risks span personal data regulations, copyright law, contract law and more. Some of these issues are also the subject of litigation; for a recent update, see our article on the Meta v Bright Data case.
Generative Artificial Intelligence (AI) systems and large language models (LLMs) are the subject of a host of pending cases and regulations. Your legal team should stay on top of these developments to help you navigate them. Case law in this space is changing rapidly, and its volume will only grow as more countries pass AI legislation.
⚡Tip 3: Use Zyte’s Compliant Web Scraping checklist against your projects
Zyte’s Compliant Web Scraping checklist can help you determine if web scraping projects are being performed in a legally compliant and ethical manner. Because there are no specific web scraping regulations, there’s a labyrinth of laws that one must navigate before embarking on a web scraping project. The checklist highlights the key areas to look out for:
Checklist area | What to do |
---|---|
Non-Public Data | If the data isn’t publicly available on the internet, then carefully consider whether you should be accessing this data. In most cases you should get permission or review the website terms to determine legality. |
Explicit Agreement to Terms | If you explicitly agree to the website’s terms of service or other policies in any way, then you must abide by them. Read the relevant policies in their entirety to determine legality. |
Mobile App Download | When downloading a mobile app, check which contractual agreements you are accepting by doing so. Read the relevant policies in their entirety to determine legality. |
Copyrighted Data | Determine if any of the data being scraped is protected by copyright. If it is copyrighted data, determine if your use constitutes copyright infringement or falls under an exception (such as fair use). Descope the copyrighted data if no exception applies. |
Personal Data | Determine if any of the data being scraped is considered personal data. If the data includes personal data, ensure you’re compliant. Descope or anonymize the personal data if you can’t abide by applicable data protection laws. |
Improperly Sourced IP Addresses | Ensure that the IPs being sourced are obtained legally and ethically. If residential IPs are being used, ensure compliance with applicable data protection laws. Use a reputable provider. |
External Use of Data | Most scraped data should only be used for internal business analysis. If you plan to use the data externally, ensure that your use is compliant. |
If you answer “Yes” to any of the points above, the checklist outlines the next steps to take. If you answer “No” to a point, the project is likely lower risk in that area. Remember to consult your own lawyer to ensure compliance in either case.
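For teams that track many projects, the Yes/No logic described above can be recorded in a small script. The following is a minimal sketch, not Zyte's checklist itself: the `ChecklistItem` class, the item keys, and the `flag_for_review` function are hypothetical names chosen for illustration, and the `next_step` strings are condensed paraphrases of the table rather than the checklist's actual wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    key: str        # hypothetical short identifier, not part of the checklist
    area: str       # checklist area as named in the table above
    next_step: str  # condensed paraphrase of the guidance for a "Yes" answer


# The seven areas from the table; wording of next_step is illustrative only.
CHECKLIST = [
    ChecklistItem("non_public_data", "Non-Public Data",
                  "Get permission or review the website terms."),
    ChecklistItem("explicit_agreement", "Explicit Agreement to Terms",
                  "Read the relevant policies in full and abide by them."),
    ChecklistItem("mobile_app", "Mobile App Download",
                  "Check which agreements the download binds you to."),
    ChecklistItem("copyrighted_data", "Copyrighted Data",
                  "Confirm an exception applies; otherwise descope."),
    ChecklistItem("personal_data", "Personal Data",
                  "Confirm compliance; otherwise descope or anonymize."),
    ChecklistItem("ip_sourcing", "Improperly Sourced IP Addresses",
                  "Verify IP sourcing and use a reputable provider."),
    ChecklistItem("external_use", "External Use of Data",
                  "Confirm that the external use is compliant."),
]


def flag_for_review(answers: dict[str, bool]) -> list[ChecklistItem]:
    """Return every item answered "Yes".

    Unanswered items are treated as "Yes" so that a gap in the answers
    is surfaced for review instead of being silently passed.
    """
    return [item for item in CHECKLIST if answers.get(item.key, True)]


# Example: a project that only touches personal data.
answers = {item.key: False for item in CHECKLIST}
answers["personal_data"] = True
for item in flag_for_review(answers):
    print(f"{item.area}: {item.next_step}")
```

Treating a missing answer as "Yes" is a deliberate design choice in this sketch: it pushes unreviewed areas toward the legal team rather than away from it. The output is only a list of areas to raise with your lawyer, not a compliance verdict.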