Data scraping, also known as web scraping, is the automated extraction of data from a webpage into a file. There are a number of scraping tools available. Businesses and other entities use web scraping today for competitor research, marketing, recruitment, and various kinds of analysis. But is web scraping legal?
The answer is: it depends. Whilst the act itself isn’t inherently illegal, what you do with the data after it’s been scraped can be.
Scraping of personal data
The GDPR and personal data laws in other jurisdictions are fairly strict when it comes to collecting and storing personal data. So, if you’re scraping personal data, especially of EU residents, you need a lawful basis for doing so. Of the GDPR’s lawful bases, the two most likely to come up for scrapers are:
- Explicit consent – unlikely unless the website’s Terms of Use let users know that their data might be scraped when they sign up. Special explicit consent is necessary for scraping sensitive data.
- Legitimate interest – it would be hard for scrapers to show that they’ve got a legitimate interest in scraping and storing personal data, unless they’re a law enforcement or government agency, for instance.
The GDPR requires processing only as much data as necessary to accomplish a task. Given that automated web scraping usually processes very large quantities of data for various purposes, it can be deemed contrary to this GDPR provision.
Therefore, if data scrapers need to process any personal data of people in the EU, even if it’s publicly available, they need to either obtain the data subjects’ explicit consent or prove a legitimate interest, and they should aim to minimize the amount of data collected. That means collecting only what is necessary for a specific purpose or client, and not, for example, downloading the entire member list of a LinkedIn group along with each user’s profile.
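The minimization idea can be sketched in code. The following is a minimal Python illustration, assuming a hypothetical scraped record and a hypothetical allow-list of fields (ALLOWED_FIELDS); neither comes from a real scraping tool. Which fields are actually necessary depends on your specific purpose, and filtering alone doesn’t establish a lawful basis.

```python
# Hypothetical allow-list: only the fields needed for one specific purpose.
# The field names are illustrative assumptions, not a legal standard.
ALLOWED_FIELDS = frozenset({"job_title", "company", "industry"})


def minimize(record: dict, allowed_fields: frozenset = ALLOWED_FIELDS) -> dict:
    """Return a copy of the record containing only the allow-listed fields."""
    return {key: value for key, value in record.items() if key in allowed_fields}


# Hypothetical scraped record with more data than the purpose requires.
raw_record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "job_title": "Engineer",
    "company": "Acme",
}

# Drop everything outside the allow-list before storing anything.
stored_record = minimize(raw_record)
```

Applying the filter before anything is written to disk means the unnecessary fields are never stored in the first place, rather than being cleaned up later.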
A recent Russian personal data law goes a step further. Since March 1, 2021, there has been a new category of personal data called “personal data permitted for dissemination”. Examples include company press releases that contain personal data beyond a person’s name and surname (photos, positions), or CVs on headhunting websites. Essentially, the category covers all personal data whose distribution the data subject has consented to. Such consent is mandatory, and the data subject has the right to include any limitations they want in that consent. The website, in turn, must make the terms of this consent available.
If a data scraper wishes to scrape such data off the web, they must comply with the limitations of that consent. And if a data subject has shared their data publicly on their own and hasn’t provided consent, every entity that uses this data has the “burden of proof that they process the data lawfully”. Therefore, this law, like the GDPR, severely restricts how much scraping of publicly available personal data web scrapers can do within the jurisdiction.
Computer fraud, copyrighted data, and compliance
In 2019, the US Court of Appeals for the Ninth Circuit ruled in favor of data analytics company hiQ against LinkedIn, holding that data that’s publicly available and not copyrighted can be scraped. However, that applies to publicly available information only.
Because hiQ scraped only publicly available data, LinkedIn couldn’t use the relevant statute, the Computer Fraud and Abuse Act, to make hiQ stop scraping. That law protects only information that isn’t open to the public. If, however, a data scraping company obtains copyrighted files such as videos and then reposts them for commercial purposes, that is illegal under copyright law.
Some websites’ Terms of Service expressly prohibit data scraping or data crawling of any kind. Rules for crawlers can also be specified in a file named robots.txt in a website’s root directory. To give you an example, I checked Twitter’s robots.txt for scraping permissions. Here’s a screenshot of the relevant part of the file:

As you can see, scraping bots are allowed to scrape hashtags but not information about users and their followers.
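A scraper can run the same kind of check programmatically before fetching a page. Below is a minimal sketch using Python’s standard-library urllib.robotparser module. The rules, the example.com URLs, and the example-bot user agent are hypothetical and only mimic the pattern described above; they are not Twitter’s actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules that mimic the pattern described in the article:
# hashtag pages are allowed, user pages are not. Not a real site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /hashtag/
Disallow: /users/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Hashtag pages match the Allow rule.
hashtag_ok = is_allowed(parser, "example-bot", "https://example.com/hashtag/python")

# User pages, including follower lists, match the Disallow rule.
followers_ok = is_allowed(
    parser, "example-bot", "https://example.com/users/alice/followers"
)
```

To check a live site instead, calling set_url() with the address of the robots.txt file and then read() on the parser downloads and parses the real file. Python’s parser matches rules by simple path prefix, so wildcard patterns that some sites use are not interpreted, and a robots.txt check says nothing about what a site’s Terms of Service allow.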
How do businesses practice compliance?
Businesses that offer data scraping services must adhere to laws and regulations in order to protect the rights of their customers. They should consult with legal counsel about which data is allowed to be scraped and what type of license is required for it. Additionally, they should require customers to sign a legally binding agreement that outlines their obligations and responsibilities regarding the use of the data scraping service. Furthermore, businesses should ensure that their teams are properly trained on best practices for data protection and privacy compliance when using the service.
And what about proxy service providers? As a leader in the IPPN (IP proxy networks) market, Bright Data has set high standards of compliance in the proxy industry. Every new Bright Data residential or mobile customer is thoroughly vetted and must be approved by a compliance officer to ensure their use case meets the company’s strict standards. Bright Data’s in-depth onboarding process requires clients to share their national ID and sign its compliance statement, among various other identity verification steps.
So, the legality of scraping data depends on the type and amount of data scraped. Whatever the case might be, however, it’s always best to be as transparent as possible about your scraping practices and obtain professional advice if you’re uncertain. You might also need to carry out a Data Protection Impact Assessment under the GDPR, as some authorities consider data scraping to be “high-risk invisible processing”.
Photo credit: The featured image has been taken by Maxim Hopman. The screenshot has been taken by the author for TechAcute.
Sources: Zyte / Alexander Demchenko (DataOx) / Srishti Saha (datahut) / GDPR-info / Stanislav Rumyantsev (IAPP) / Fiona Campbell (Fieldfisher)