A high-level overview of the world of data scraping in Python. What it is and isn't and how it can be used.
-
0:00
[MUSIC]
-
0:09
Howdy, I'm Ken, and I'm chomping at the bit to introduce you
-
0:13
to the wild world of data wrangling and, specifically, web scraping.
-
0:18
We'll be taking a high-level look at how to automate data gathering from the web.
-
0:23
Of course, we'll have some hands-on practice along the way as well.
-
0:27
What exactly is web scraping?
-
0:29
The definition I like best is that it's the automated collecting of data from
-
0:34
the web by any means other than a program interacting with an API.
-
0:38
This typically is done through writing a program that requests data
-
0:42
from a web server and obtains the necessary information.
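As a rough illustration of that request-and-extract idea, here is a minimal sketch using only the Python standard library rather than the packages used later in the course. The names `TitleParser`, `extract_title`, and `fetch_html`, the `User-Agent` string, and the example URL are all assumptions made for this sketch, not code from the video.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped <title> text of an HTML string ('' if none)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_html(url):
    """Request a page from a web server and return its body as text.

    The User-Agent value is a made-up placeholder for this sketch.
    """
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical usage (performs a real network request):
#   print(extract_title(fetch_html("https://example.com")))
```

Keeping the fetching and the extracting in separate functions means the extraction step can be tried on a saved HTML string without touching the network.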
-
0:46
Let's take a look at some examples of use cases,
-
0:48
before we get too deep into the how-tos.
-
0:51
Real estate listing companies and agents use web scraping to scour the web for
-
0:55
current property listings to gather competitive data about pricing and
-
1:00
housing market trends.
-
1:01
Many companies use web scraping for competitive research.
-
1:05
Scraping competitors' websites for product and review information.
-
1:09
Social media companies scrape the web to get a better handle on what is trending.
-
1:14
You can use a web scraper to look at YouTube or a specific category of videos
-
1:18
with lots of views to determine what topics and titles are doing well.
-
1:22
The possibilities are really limited only by your imagination.
-
1:27
Throughout this course we'll build our scraping skills using the Python
-
1:31
packages Beautiful Soup and Scrapy.
-
1:33
We'll look at parsing HTML files, and writing spiders that will follow the links
-
1:38
between pages and sites to further increase our data gathering abilities.
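To give a feel for what a link-following spider does, here is a small sketch written with the standard library instead of Beautiful Soup or Scrapy. It is not how Scrapy works internally. The names `LinkParser`, `extract_links`, and `crawl`, the `fetch` callable, and the `max_pages` limit are assumptions introduced for this illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for all links in html, resolved against base_url."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl starting at start_url.

    fetch is any callable that takes a URL and returns HTML text,
    or None when the page is unavailable. Returns the URLs visited,
    in order, stopping after max_pages pages.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(html, url):
            # Queue each link only once so loops between pages terminate.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing `fetch` in as a parameter lets the crawl logic be tried against a dictionary of saved pages (for example `crawl(start, pages.get)`) before pointing it at a real site, and the `max_pages` cap is one simple way to keep a spider from requesting more than intended.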
-
1:43
I'll also touch on how to handle sites that require logins, and how to test our
-
1:47
scraping applications.
-
1:49
Along the way we'll also talk about when we should rein in our powers to be good
-
1:53
Internet citizens.
-
1:55
Before we get started, let me briefly talk about our tools.
-
1:59
I'll be using the PyCharm IDE throughout this course.
-
2:03
The code samples that we'll be working through should work perfectly fine
-
2:07
in other IDEs as well, and in future editions of PyCharm itself too.
-
2:12
The developers of PyCharm are always improving that tool.
-
2:16
If you run into any problems using a different version,
-
2:19
ask for help in the Treehouse forum.
-
2:21
Also, if any minor changes or bugs pop up,
-
2:24
keep your eyes on the teacher's notes for helpful comments.
-
2:28
If you spot an issue or difference somewhere, check the notes first and
-
2:31
then let us know in the forum if we've missed it.
-
2:35
One last thing, remember that the Treehouse video player has speed controls.
-
2:40
So if I'm talking too fast, or going really slow,
-
2:43
feel free to adjust the speed.
-
2:44
I won't mind, really, even if you laugh at how I sound in super slow-motion.
-
2:50
Okay, let's get started with building our web scraping skills and knowledge.