Being a Good Citizen (3:07) with Ken Alger
Just because we can do something doesn't mean that we always should do it. Let's take a look at some of the responsibilities that come with the power of web scraping.
Now that we've seen how to do some basic web scraping, I'd like to talk about some other responsibilities we have as citizens of the Internet. As Sir Francis Bacon stated, knowledge is power. Having the knowledge and skills needed to do web scraping is a powerful tool. There is another saying that many attribute, ironically enough, to the Spider-Man comics: with great power comes great responsibility.

Part of the responsibility incumbent upon us as web scraping developers is to know and follow applicable laws. These vary from country to country, and much of the law around digital content ownership is continuously being tested in the courts. As a disclaimer, I'm not an attorney, so don't take the following as legal advice. With that in mind, however, there are some specific areas of the law we should be aware of at a high level to make us better citizens and keep us out of trouble.

In the United States, for example, there are three main legal claims that can be made against web scraping. The first is copyright infringement. The second is a violation of the Computer Fraud and Abuse Act (CFAA), which prohibits accessing a computer without, or in excess of, authorization. Although the CFAA was originally designed to protect financial and government computers, there have been instances where other computers, and even cell phones, have fallen under its protection due to the nature of today's device communication. The third is trespass to chattels, which basically means interfering with another person's lawful possession of movable personal property.

In the European Union there are further laws to consider, such as Directive 96/9/EC, commonly known as the Database Directive. In Australia, the Spam Act 2003 prohibits certain forms of web scraping. If you decide to produce a web scraping utility, especially one for profit, keep these things in mind.
Again, if you find yourself in a legally ambiguous web scraping project, consult with an attorney who specializes in this area.

How can we protect ourselves and still use web scraping tools? Many sites include a robots.txt file, which sets limits on where bots can go. This is a standardized file that follows the Robots Exclusion Standard. As you might imagine, this can become a bit legally muddy: a site stating that a human can access certain parts but a computer can't is tricky from a legal standpoint. Similarly, sites may post terms of service that state which parts of the site, if any, data can be collected from, or how their data needs to be attributed.

There have been many interesting legal cases around web scraping. I've included links to a few of them in the teacher's notes.

Okay, we've seen the power of scraping a basic website, and we've briefly discussed some of the responsibilities that come with these new powers. In the next stage, let's extend our data wrangling skills beyond a single page and start crawling the web.
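The robots.txt check described in this video can be done programmatically before a scraper requests a page. The following is a minimal sketch using the `urllib.robotparser` module from Python's standard library. The robots.txt contents, the `MyScraper` user agent, the `example.com` URLs, and the helper names `build_parser` and `allowed` are all hypothetical, and the rules are parsed from a string so the example runs without network access. Keep in mind that robots.txt is advisory: honoring it is part of being a good citizen, but it does not settle the legal questions discussed earlier.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an example site (not a real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text already held in memory, without any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(SAMPLE_ROBOTS_TXT)
    # MyScraper falls under the "*" rules; BadBot has its own, stricter entry.
    for agent, url in [
        ("MyScraper", "https://example.com/index.html"),
        ("MyScraper", "https://example.com/private/data.html"),
        ("BadBot", "https://example.com/index.html"),
    ]:
        print(agent, url, allowed(rp, agent, url))
    # Crawl-delay is a non-standard but common directive; None if absent.
    print("Crawl delay for MyScraper:", rp.crawl_delay("MyScraper"))
```

Against a live site, `RobotFileParser` also provides `set_url()` and `read()`, which download and parse the site's own robots.txt; the in-memory `parse()` call is used here only to keep the sketch self-contained.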