What is Data Scraping? (2:54) with Ken Alger
A high-level overview of the world of data scraping in Python: what it is, what it isn't, and how it can be used.
[MUSIC]

Howdy, I'm Ken, and I'm chomping at the bit to introduce you to the wild world of data wrangling and specifically, web scraping. We'll be taking a high-level look at how to automate data gathering from the web. Of course, we'll have some hands-on practice along the way as well.

What exactly is web scraping? The definition I like best is that it's the automated collecting of data from the web by any means other than a program interacting with an API. This typically is done through writing a program that requests data from a web server and obtains the necessary information.

Let's take a look at some examples of use cases, before we get too deep into the how-tos. Real estate listing companies and agents use web scraping to scour the web for current property listings to gather competitive data about pricing and housing market trends. Many companies use web scraping for competitive research, scraping competitors' websites for product and review information. Social media companies scrape the web to get a better handle on what is trending. You can use a web scraper to look at YouTube or a specific category of videos with lots of views to determine what topics and titles are doing well. The potential for options is really limited only by your imagination.

Throughout this course we'll build our scraping skills using the Python packages Beautiful Soup and Scrapy. We'll look at parsing HTML files, and writing spiders that will follow the links between pages and sites to further increase our data gathering abilities. I'll also touch on how to handle sites that require logins, and how to test our scraping applications. Along the way we'll also talk about when we should rein in our powers to be good Internet citizens.

Before we get started, let me briefly talk about our tools. I'll be using the PyCharm IDE throughout this course. The code samples that we'll be working through should work perfectly fine in other IDEs as well, and in future editions of PyCharm itself too. The developers of PyCharm are always improving that tool. If you run into any problems using a different version, ask for help in the Treehouse forum. Also, if any minor changes or bugs pop up, keep your eyes on the teacher's notes for helpful comments. If you spot an issue or difference somewhere, check the notes first and then let us know in the forum if we've missed it.

One last thing, remember that the Treehouse video player has speed controls. So if I'm talking too fast, or going really slow, feel free to adjust the speed. I won't mind, really, even if you laugh at how I sound in super slow-motion.

Okay, let's get started with building our web scraping skills and knowledge.
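
To make the definition from the transcript a little more concrete (a program requests data from a web server and pulls out the information it needs), here is a minimal sketch using only the Python standard library. It is not the course's code: the course works with Beautiful Soup and Scrapy, and `html.parser` is swapped in here only to keep the example dependency-free. The names `TitleCollector`, `extract_titles`, `fetch`, and the sample listing HTML are all hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleCollector(HTMLParser):
    """Collect the text inside every <h2> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.titles.append("")  # start a new title

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Only keep text that appears between <h2> and </h2>.
        if self._in_h2:
            self.titles[-1] += data


def extract_titles(html):
    """Return the stripped text of each <h2> in the given HTML string."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


def fetch(url):
    """Request a page from a web server and return its body as text.

    Not called below, so the example runs without network access.
    """
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical page content standing in for a real property-listing site.
SAMPLE_HTML = """
<html><body>
  <h2>Three-bedroom house</h2><p>$350,000</p>
  <h2>Studio apartment</h2><p>$120,000</p>
</body></html>
"""

if __name__ == "__main__":
    print(extract_titles(SAMPLE_HTML))
```

With a real site, the string passed to `extract_titles` would come from something like `fetch(url)` instead of `SAMPLE_HTML`, assuming the site permits automated access.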