
Is Data Scraping Legal?

The process of data scraping, also known as web scraping, involves automatically extracting data from a webpage and saving it to a file. A number of scraping tools are available, and businesses and other organizations use web scraping today to gather information on competitors and for marketing, recruitment, and various kinds of analysis. But is web scraping legal?
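To make the definition concrete, here is a minimal sketch of scraping using only Python's standard library: it parses an HTML snippet and writes the extracted values to a CSV file's contents. The snippet, the `name`/`price` class names and the `ProductParser`/`scrape_to_csv` helpers are hypothetical; a real scraper would first download the page, and real sites structure their markup differently.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page snippet; a real scraper would download the HTML first.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">9.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">24.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects one {name, price} dict per <li>, from spans with class "name"/"price"."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field the current text belongs to, if any
        self._current = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.rows.append(self._current)  # one finished product
            self._current = {}

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()


def scrape_to_csv(html: str) -> str:
    """Extract products from `html` and return them as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_HTML))
```

Writing to a real file instead of a string only means passing an open file object to `csv.DictWriter`.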

The answer is: it depends. Whilst the act of scraping isn’t necessarily illegal in itself, what you do with the data after it’s been scraped can be.

Scraping of personal data

The GDPR and the personal data laws of other countries are fairly strict when it comes to collecting and storing personal data. So, if you’re scraping personal data, especially that of EU residents, you need a lawful basis for doing so. That can be, for example:

  • Explicit consent – unlikely unless the website’s Terms of Use tell users, when they sign up, that their data might be scraped. Scraping sensitive (special category) data typically requires explicit consent.
  • Legitimate interest – it would be hard for most scrapers to show that their legitimate interest in scraping and storing personal data isn’t overridden by the rights and interests of the people whose data they collect.

The GDPR also requires that personal data be limited to what is necessary for the purpose it is processed for (the data minimization principle). Because automated web scraping usually collects very large quantities of data for a range of purposes, it can be deemed contrary to this provision.

Therefore, if data scrapers need to process the personal data of EU residents, even if it’s publicly available, they must either obtain those people’s explicit consent or prove a legitimate interest, and they should minimize the amount of data collected. That means collecting only what is necessary for a specific purpose or client, rather than, for example, downloading the entire member list of a LinkedIn group along with each member’s profile.
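In code, data minimization can be as simple as discarding every field a documented purpose doesn’t need before anything is stored. The sketch below assumes a hypothetical `ALLOWED_FIELDS_BY_PURPOSE` mapping with made-up purposes and field names; it illustrates the principle and does not by itself establish a lawful basis for processing.

```python
# Hypothetical mapping from a documented purpose to the only fields
# that purpose needs. Purposes and field names are illustrative.
ALLOWED_FIELDS_BY_PURPOSE: dict[str, frozenset[str]] = {
    "competitor_pricing": frozenset({"product_name", "price", "currency"}),
    "job_market_stats": frozenset({"job_title", "location", "posted_date"}),
}


def minimize(records: list[dict], purpose: str) -> list[dict]:
    """Keep only the fields needed for `purpose` and drop everything else."""
    try:
        allowed = ALLOWED_FIELDS_BY_PURPOSE[purpose]
    except KeyError:
        # No documented purpose means no justification for keeping any field.
        raise ValueError(f"No documented purpose: {purpose!r}") from None
    return [{k: v for k, v in record.items() if k in allowed} for record in records]


if __name__ == "__main__":
    scraped = [{
        "job_title": "Data Analyst",
        "location": "Berlin",
        "posted_date": "2021-03-01",
        "recruiter_name": "Jane Doe",
        "recruiter_email": "jane@example.com",
    }]
    print(minimize(scraped, "job_market_stats"))
```

Here the job-ad record keeps its title, location and posting date, while the recruiter’s name and email are dropped before storage.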

A recent Russian personal data law goes a step further. Since March 1, 2021, there has been a new type of personal data called “personal data permitted for dissemination”. Examples include company press releases that contain personal data beyond specific individuals’ names and surnames (such as photos or job titles), and CVs on recruitment websites. Essentially, it covers all personal data that the data subject has consented to having disseminated. Such consent is mandatory, and the data subject has the right to attach any limitations they want to it. The website publishing the data must also make the terms of this consent available.

If a data scraper wishes to scrape such data off the web, they must comply with the limitations set out in that consent. And if data subjects have made their data public themselves without providing such consent, every entity that uses the data bears the “burden of proof that they process the data lawfully”. This law, like the GDPR, therefore severely restricts how much publicly available personal data web scrapers can collect within the jurisdiction.
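The decision described above can be sketched as a check run before scraped data is used. The `ConsentTerms` record, its `prohibited_uses` field and the `Decision` values are hypothetical simplifications: the law does not prescribe a machine-readable format for consent terms, so how they are captured is an assumption here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PERMITTED = "permitted"
    PROHIBITED = "prohibited"
    # No consent on record: whoever uses the data must be able to prove
    # that their processing is lawful.
    NEEDS_LAWFUL_BASIS = "needs_lawful_basis"


@dataclass(frozen=True)
class ConsentTerms:
    """Hypothetical record of the limits a data subject set in their consent."""
    prohibited_uses: frozenset[str] = frozenset()


def check_use(terms: Optional[ConsentTerms], intended_use: str) -> Decision:
    """Decide whether scraped data may be used for `intended_use`."""
    if terms is None:
        return Decision.NEEDS_LAWFUL_BASIS
    if intended_use in terms.prohibited_uses:
        return Decision.PROHIBITED
    # Simplification: anything the data subject didn't prohibit is treated as permitted.
    return Decision.PERMITTED
```

For example, data whose consent terms prohibit `"marketing"` would come back as `Decision.PROHIBITED` for a marketing use, while data with no consent on record always lands in the `NEEDS_LAWFUL_BASIS` bucket.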

Computer fraud, copyrighted data, and compliance

In 2019, the US Court of Appeals for the Ninth Circuit ruled in favor of data analytics company hiQ Labs in its dispute with LinkedIn, finding that data that’s publicly available and not copyrighted can be scraped. However, this applies only to publicly available information.

Because the data hiQ scraped was publicly available, LinkedIn couldn’t use the relevant statute, the Computer Fraud and Abuse Act (CFAA), to make hiQ stop scraping: in the court’s reading, the CFAA protects only information that isn’t publicly accessible, such as data behind a login. If, however, a data scraping company obtains copyrighted files such as videos and then reposts them for commercial purposes, that is illegal under copyright law.

Some websites’ Terms of Service expressly prohibit data scraping or crawling of any kind. Rules for automated crawlers can also be set out in a file named robots.txt in the website’s root directory. To give you an example, I checked Twitter’s robots.txt for scraping permissions. Here’s a screenshot of the relevant part of the file:

Screenshot: excerpt from Twitter’s robots.txt (Image: Kate Sukhanova)

As you can see, bots are allowed to crawl hashtag pages but not information about users or their followers.
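If you want to check a site’s crawl rules programmatically, Python’s standard library includes `urllib.robotparser`. The sketch below parses an illustrative robots.txt (not Twitter’s actual file) that loosely mirrors the pattern in the screenshot; the `/hashtag/` and `/users/` paths, the example.com URLs and the `ExampleResearchBot` user agent are made up. This parser matches rules by simple path prefix, in the order they appear, so wildcard patterns inside paths aren’t interpreted.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only, NOT copied from any real site: hashtag pages
# are open to crawlers, user pages (including followers) are not.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /hashtag/
Disallow: /users/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleResearchBot") -> bool:
    """Return True if `robots_txt` lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/hashtag/python",
        "https://example.com/users/alice/followers",
    ):
        print(url, is_allowed(EXAMPLE_ROBOTS_TXT, url))
```

For a live site, `RobotFileParser` can instead be pointed at the file with `set_url()` and loaded with `read()` before calling `can_fetch()`.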

So, the legality of scraping data depends on the type and amount of data scraped and on what you do with it. Whatever the case might be, it’s always best to be as transparent as possible about your scraping practices and to obtain professional advice if you’re uncertain. You might also need to carry out a Data Protection Impact Assessment (DPIA) under the GDPR, as some authorities consider data scraping to be “high-risk invisible processing”.

Photo credit: The featured image was taken by Maxim Hopman. The screenshot was taken by the author for TechAcute.
Sources: Zyte / Alexander Demchenko (DataOx) / Srishti Saha (datahut) / GDPR-info / Stanislav Rumyantsev (IAPP) / Fiona Campbell (Fieldfisher)


Kate Sukhanova
I’m a writer with a keen interest in digital technology and traveling. If I get to write about those two things at the same time, I’m the happiest person in the room. When I’m not scrolling through newsfeeds, traveling, or writing about it, I enjoy reading mystery novels, hanging out with my cat, and running my charity shop.
