APIs are all around us on the web. Sometimes we can use scraping techniques to interact with them in a meaningful way.
-
0:00
Back in the good old days of the Internet, if we wanted data,
-
0:03
we had to view it on Web pages.
-
0:05
Now, however, many sites provide a Web API that shares their data.
-
0:11
Sometimes, we can use these APIs to directly access information,
-
0:15
without having to scrape the data.
-
0:18
I'd recommend looking to see if the site you are wanting to scrape
-
0:21
offers an API for the information you need.
-
0:24
It can be a big time saver.
-
0:27
Let's take a look at how we can get data from the World Bank, using their API.
-
0:32
There are many instances when using an API is great.
-
0:36
Sometimes, though, scraping results from an API is useful as well,
-
0:41
especially if the API documentation isn't super helpful.
-
0:45
Let's take a brief look at one technique we can use to get and
-
0:49
process data from an API.
-
0:51
In this case, we'll look at The World Bank API.
-
0:54
It's actually very well documented, which provides us with some extra knowledge
-
0:59
as we go about trying to scrape things.
-
1:01
If we look here, at the Developer Information overview page,
-
1:04
it provides information about how to get started, and what the API provides.
-
1:09
Let's look here,
-
1:10
at the Country Queries section, to see what information we might explore there.
-
1:15
It looks like we could use this information to get some generic
-
1:18
information about the countries of the world.
-
1:21
For example, if we wanted to do some high-level data exploration about
-
1:25
income level in regions of the world, let's use this request format here,
-
1:31
look through some ISO codes, and get some information that we could explore.
-
1:35
We won't be doing any actual exploration of data in this course, but
-
1:40
check the teachers' notes for more information.
-
1:42
Let's take a look at the information we get from a country with a lot of horses,
-
1:46
like Ethiopia.
-
1:47
I know their ISO code is ETH, so let's put that into the request format.
-
1:53
So we can copy this. Let's create
-
1:58
a new tab, and we'll do ETH.
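The request we just typed into the browser can be sketched in Python. This is only a minimal sketch: the `http://api.worldbank.org` host and the `build_country_url` helper name are assumptions for illustration, and the `/v2/countries/` path follows the request format read out later in this video.

```python
# Base of the country query. The host is an assumption; check the
# World Bank developer documentation for the current request format.
BASE_URL = "http://api.worldbank.org/v2/countries/"


def build_country_url(country_code):
    """Return the country-query URL for an ISO code (hypothetical helper)."""
    # strip() guards against stray whitespace around a code read from a file.
    return "{}{}".format(BASE_URL, country_code.strip())


print(build_country_url("ETH"))
```

Pasting the printed URL into a browser tab should be equivalent to what we did by hand, assuming the host above is right.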
-
2:02
It looks like we're getting back the same information as the documentation stated,
-
2:06
and it's in XML format.
-
2:08
That's great, we can handle that, we'll use Beautiful Soup to parse this XML,
-
2:13
and get the name, region, and income level.
-
2:16
This could be used, for
-
2:18
example, to generate a histogram chart of regions of the world and income levels.
-
2:23
Lots of options for data visualization, here.
-
2:26
Let's go back to our code, and create a new world_bank.py file.
-
2:31
We don't need it inside the spider.
-
2:38
world_bank.py, and we'll start with our imports.
-
2:42
So, from urllib.request import urlopen.
-
2:47
We're going back to Beautiful Soup, so from bs4 import BeautifulSoup, and
-
2:55
we'll be using a CSV file of ISO codes, so we'll want to import csv as well.
-
3:02
Let's define a function to get the country information, get_country,
-
3:08
and we'll pass in our country code. And
-
3:13
just like we've done with Beautiful Soup in the past, we define our HTML string.
-
3:19
It's urlopen, and
-
3:23
it's that request format string that we saw just a moment ago,
-
3:28
worldbank.org/v2/countries/, and we'll use the string formatter,
-
3:39
country_code, and let's bring this down to a new line.
-
3:44
Next, we define our soup object.
-
3:48
So, we pass in our HTML, and for our parser,
-
3:52
since we're dealing with XML, we can use an XML parser.
-
3:58
Scraping XML is pretty straightforward with Beautiful Soup.
-
4:02
If we look at the results we got for Ethiopia, we want to get three fields,
-
4:07
wb:name, wb:region, and the wb:incomeLevel.
-
4:13
Let's go ahead and define those.
-
4:16
country_name is soup.find('wb:name'),
-
4:26
region, soup.find('wb:region'),
-
4:31
and income_level, soup.find(
-
4:36
'wb:incomelevel'), and it was all lowercase.
-
4:43
Now, let's print that information out.
-
4:45
Here's a good example of a time when we can use the get_text method.
-
4:52
get_text, and we'll print the region,
-
4:57
get_text, and the income_level.
-
5:06
Now, we can loop through the ISO codes, and pass them to our get_country method.
-
5:12
So, if __name__ == '__main__':
-
5:19
Let's bring that up on the screen a little bit.
-
5:22
I've included a file of ISO codes that we can open up and read.
-
5:27
So, file, country_code,
-
5:33
oops, country_iso_codes.csv,
-
5:38
want to read that.
-
5:43
Now, iso_codes, then, will be our csv.reader, file, and our delimiter is ','.
-
5:53
Now, we can loop through our file, and get our information.
-
5:58
for code in iso_codes, and we want to pass in our code into our get_country
-
6:03
method, and we want the first item from each row's list.
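The loop we just wrote can be sketched like this. The `read_iso_codes` helper and the in-memory sample are assumptions for illustration; the real country_iso_codes.csv may be laid out differently, so adjust the column index if the code is not in the first column.

```python
import csv
import io


def read_iso_codes(csv_file):
    """Yield the first column of each non-empty row (hypothetical helper)."""
    for row in csv.reader(csv_file, delimiter=","):
        # csv.reader returns each row as a list; blank lines come back empty.
        if row and row[0].strip():
            yield row[0].strip()


# Stand-in for country_iso_codes.csv; the real file's layout may differ.
sample_file = io.StringIO("ETH,Ethiopia\nBRA,Brazil\n\nKEN,Kenya\n")

for code in read_iso_codes(sample_file):
    print(code)  # in the video, this is where get_country is called
```

With the real file, `open('country_iso_codes.csv', newline='')` would take the place of the `io.StringIO` object.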
-
6:12
Now, we can run world_bank.
-
6:18
And it looks like I made a mistake back up here,
-
6:21
it wasn't all lowercase, it's actually incomeLevel.
-
6:25
Let's try it again, and we get all of our expected data.
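Putting the pieces together, here is a rough sketch of the corrected function, not the exact file from the video. The parsing is split into a hypothetical `parse_country` helper so it can be tried without a network connection, the `SAMPLE_XML` string is hand-written rather than real API output, and the `http://api.worldbank.org` host is an assumption.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Hand-written sample for illustration only; NOT real API output.
SAMPLE_XML = """<wb:countries xmlns:wb="http://www.worldbank.org">
  <wb:country id="ETH">
    <wb:name>Ethiopia</wb:name>
    <wb:region id="SSF">Sub-Saharan Africa</wb:region>
    <wb:incomeLevel id="LIC">Low income</wb:incomeLevel>
  </wb:country>
</wb:countries>"""

# Tag names are case-sensitive under the XML parser: wb:incomeLevel.
FIELDS = (
    ("name", "wb:name"),
    ("region", "wb:region"),
    ("income_level", "wb:incomeLevel"),
)


def parse_country(xml):
    """Pull name, region, and income level out of a country response."""
    soup = BeautifulSoup(xml, "xml")  # the "xml" parser requires lxml
    info = {}
    for key, tag_name in FIELDS:
        tag = soup.find(tag_name)
        # find() returns None for a missing or misspelled tag, so guard
        # before calling get_text().
        info[key] = tag.get_text().strip() if tag is not None else None
    return info


def get_country(country_code):
    """Fetch one country over the network and print the three fields."""
    # Host is an assumption; the path is the one read out in the video.
    url = "http://api.worldbank.org/v2/countries/{}".format(country_code)
    with urlopen(url) as response:
        info = parse_country(response.read())
    print(info["name"], info["region"], info["income_level"], sep=" | ")


print(parse_country(SAMPLE_XML))
```

Searching for the all-lowercase `wb:incomelevel` makes `find` return `None`, which is why the first run failed when `get_text` was called on it.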
-
6:30
Again, we could do something else here,
-
6:32
like saving the information to a csv file or database.
-
6:36
Check the teachers' notes for more resources on that.
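As one possible follow-up, here is a minimal sketch of saving the results with the standard csv module. The `write_countries` helper, the column names, and the single row of values are illustrative assumptions, not output captured from the API.

```python
import csv
import io

FIELDNAMES = ["name", "region", "income_level"]


def write_countries(rows, out_file):
    """Write country dictionaries as CSV with a header row."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative values only; in practice these come from the parsed XML.
rows = [
    {
        "name": "Ethiopia",
        "region": "Sub-Saharan Africa",
        "income_level": "Low income",
    }
]

buffer = io.StringIO()  # stands in for a real file on disk
write_countries(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, something like `open('countries.csv', 'w', newline='')` would replace the `io.StringIO` buffer; the file name here is hypothetical.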