Bas Kuunk

Is web scraping legal?

I'm developing a script that scrapes a certain database and stores a filtered portion of the information. I'm wondering whether it's illegal to do that (or simply against their terms and conditions). Basically, the database is out in the open, so anyone can read, copy and paste the content. It's not really their data, because they too collect public data and publish it. The data is otherwise hard to come by, and the only thing they add is structure: all data is entered by hand.

Taking it a step further, is it legal to publish that data on your own website? What if you make a website that collects data from various similar sources and publishes it? Help is greatly appreciated!

1 Answer

Disclaimer: I am not a lawyer and this is not legal advice. If legal advice is what you seek, do yourself a favor and go find a lawyer; don't rely on the internet.

Hey Bas,

So, the internet is inherently public; anyone can scrape almost anything. That said, copyright law protects creators' content, even if those creators don't explicitly declare their copyright. In other words, unless you have written permission from the copyright owner or the site is unambiguously licensed under Creative Commons, none of the content is yours to use. In your case, there's no such thing as "it's not really their data": they collected it, entered it by hand, and structured it, so treat it as theirs.

That said, as long as you're scraping the data for your own personal use, and it's not that much data, probably no one will notice and you should be fine. If you publish that data as your own, though, be prepared for litigation. Also, be aware that it's pretty trivial for a site owner to tell when their site is being scraped by a bot or script once they start looking for it. And it's even more trivial to ban an IP address that's been acting maliciously.
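To give a rough idea of why a scraper is easy to spot from the server side, here's a minimal Python sketch that flags any IP making too many requests in a short window. Everything in it is an assumption for illustration: the `find_suspect_ips` function, the thresholds, the sample IP addresses, and the idea that the log has already been parsed into `(timestamp, ip)` pairs. Real sites use whatever their server or CDN provides, so don't read this as how any particular site does it.

```python
from collections import defaultdict, deque


def find_suspect_ips(requests, max_hits=20, window_seconds=10.0):
    """Return the set of IPs with more than max_hits requests inside
    any window of window_seconds.

    requests: iterable of (timestamp_seconds, ip) tuples, sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    suspects = set()
    for ts, ip in requests:
        window = recent[ip]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_hits:
            suspects.add(ip)
    return suspects


# Hypothetical traffic: one client hammering the site, one browsing normally.
bot = [(i * 0.1, "203.0.113.7") for i in range(50)]     # 10 requests per second
human = [(i * 5.0, "198.51.100.4") for i in range(10)]  # one request every 5 s
log = sorted(bot + human)

print(find_suspect_ips(log))  # prints {'203.0.113.7'}
```

A scraper that fires requests as fast as it can stands out immediately against normal browsing, which is why slowing down (or, better, asking permission) matters.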

So, ya know, do your thing but be careful and be smart. Sometimes, the smartest thing you can do is just ask permission to use the data.