Just because we can do something doesn't mean that we always should do it. Let's take a look at some of the responsibilities that come with the power of web scraping.
Additional Resources
Scraping Legal Cases
Now that we've seen how to do some basic
web scraping, I'd like to talk about some
0:00
other responsibilities we have
as citizens of the Internet.
0:04
As Sir Francis Bacon stated,
knowledge is power.
0:08
Having the knowledge and skills needed
to do web scraping is a powerful tool.
0:12
There is another saying
that many attribute,
0:16
ironically enough,
to the Spider-Man comics.
0:19
With great power comes
great responsibility.
0:21
Some of the responsibility
that is incumbent upon us
0:24
as web scraping developers is to know and
follow applicable laws.
0:27
These will vary from country to country.
0:31
And much of the law around
digital content ownership
0:33
is continuously being tested in courts.
0:36
As a disclaimer here,
I'm not an attorney, so
0:40
don't take the following as legal advice.
0:42
With that in mind, however, there are some
specific areas of the law we should be
0:45
aware of at a high level to make us better
citizens and keep us out of trouble.
0:50
Here are some examples of laws to consider.
In the United States, there
0:55
are three main legal claims that
can be made against web scraping.
1:00
Copyright infringement,
the Computer Fraud and Abuse Act,
1:03
which prohibits accessing a computer
without, or in excess of, authorization.
1:07
Originally designed to protect
financial and government computers,
1:12
there have been instances where other
computers and even cell phones have fallen
1:16
under the CFAA's protection due to the
nature of today's device communication.
1:21
Trespass to Chattels,
which basically means
1:26
interfering with another person's lawful
possession of movable personal property.
1:29
In the European Union, there are copyright
laws to consider as well, such as Directive
1:34
96/9/EC commonly known as
the Database Directive.
1:39
In Australia, the Spam Act 2003
prohibits certain forms of web scraping, such as harvesting email addresses.
1:45
If you decide to produce a web
scraping utility, especially one for
1:51
profit, keep these things in mind.
1:54
Again, if you find yourself in a legally
ambiguous web scraping project,
1:56
consult with an attorney who
specializes in this area.
2:01
How can we protect ourselves and
still utilize web scraping tools?
2:05
Many sites include a robots.txt file,
2:09
where limits can be set
as to where bots can go.
2:13
This robots.txt file is
a standardized file which follows
2:16
the robots exclusion standard.
2:20
As you might be able to imagine,
this can become a bit legally muddy.
2:22
A site stating that a human
can access certain parts but
2:26
a computer can't gets a bit
tricky from a legal standpoint.
2:29
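As a rough illustration of the robots.txt check described above, here is a minimal sketch using Python's standard-library urllib.robotparser module. The robots.txt content, the "my-scraper" user agent, and the example.com URLs are all made up for this example; a real scraper would load the file from the target site (for instance with set_url() and read()) rather than from a string. Passing this check says nothing about whether scraping is legally permitted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
# A real scraper would fetch this from https://<site>/robots.txt instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-scraper" is an assumed user-agent name for illustration only.
print(may_fetch(parser, "my-scraper", "https://example.com/index.html"))
print(may_fetch(parser, "my-scraper", "https://example.com/private/data.html"))

# Some sites also request a pause between requests via Crawl-delay.
print(parser.crawl_delay("my-scraper"))
```

A scraper could call may_fetch() before each request and skip any URL for which it returns False, and sleep for the crawl delay between requests when one is given.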
Similarly, sites may have
posted terms of service
2:34
that state which parts of the site,
if any, data can be collected from, or
2:37
how their data needs to be attributed.
2:41
There have been many interesting
legal cases around web scraping.
2:44
I've included some links in
the teacher's notes about a few.
2:47
Okay, we've seen the power of
scraping a basic website with Beautiful Soup.
2:51
And now we have briefly discussed some of the
responsibilities that come with these new powers.
2:55
In the next stage let's take a look at
extending our data wrangling skills
3:00
beyond a single page and
start crawling the web.
3:04