Scraping APIs (6:40) with Ken Alger
APIs are all around us on the web. Sometimes we can use scraping techniques to interact with them in a meaningful way.
Back in the good old days of the Internet, if we wanted data, we had to view it on web pages. Now, however, many sites provide a web API that shares their data. Sometimes we can use these APIs to access information directly, without having to scrape the data. I'd recommend checking whether the site you want to scrape offers an API for the information you need. It can be a big time saver. Let's take a look at how we can get data from the World Bank, using their API.
There are many instances when using an API is great. Sometimes, though, scraping results from an API is useful as well, especially if the API documentation isn't super helpful. Let's take a brief look at one technique we can use to get and process data from an API. In this case, we'll look at the World Bank API. It's actually very well documented, which provides us with some extra knowledge as we go about trying to scrape things.
If we look here, at the Developer Information overview page, it provides information about how to get started and what the API provides. Let's look here, at the Country Queries section, to see what information we might explore there. It looks like we could use this to get some general information about the countries of the world. For example, if we wanted to do some high-level data exploration about income level in regions of the world, let's use this request format here, look through some ISO codes, and get some information that we could explore. We won't be doing any actual exploration of data in this course, but check the teachers' notes for more information.
Let's take a look at the information we get from a country with a lot of horses, like Ethiopia. I know their ISO code is ETH, so let's put that into the request format. So we can copy this, create a new tab, and we'll do ETH.
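As a rough sketch, the same request can be made from Python instead of a browser tab. The transcript only reads out worldbank.org/v2/countries/, so the https scheme and the api. host prefix below are assumptions, and build_country_url and fetch_country_xml are hypothetical helper names rather than anything from the course files.

```python
from urllib.request import urlopen

# Assumption: scheme and "api." host prefix. The transcript only reads out
# "worldbank.org/v2/countries/".
BASE_URL = "https://api.worldbank.org/v2/countries/"


def build_country_url(country_code):
    """Return the request URL for one country, given its ISO code."""
    return BASE_URL + country_code


def fetch_country_xml(country_code):
    """Download the raw response body for one country as bytes."""
    with urlopen(build_country_url(country_code)) as response:
        return response.read()
```

Calling fetch_country_xml("ETH") needs network access, and should hand back the same XML document that the browser tab displays.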
It looks like we're getting back the same information as the documentation stated, and it's in XML format. That's great, we can handle that. We'll use Beautiful Soup to parse this XML and get the name, region, and income level. This could be used, for example, to generate a histogram chart of regions of the world and income levels. Lots of options for data visualization here.
Let's go back to our code and create a new world_bank.py file. We don't need it inside the spider. world_bank.py, and we'll start with our imports. So, from urllib.request import urlopen. We're going back to Beautiful Soup, so from bs4 import BeautifulSoup, and we'll be using a CSV file of ISO codes, so we'll want to import csv as well.
Let's define a function to get the country information, get_country, and we'll pass in our country code. And just like we've done with Beautiful Soup in the past, we define our HTML string. It's urlopen, and it's that request format string that we saw just a moment ago, worldbank.org/v2/countries/, and we'll use the string formatter, country_code, and let's bring this down to a new line.
Next, we define our soup object. So, we pass in our HTML, and for our parser, since we're dealing with XML, we can use an XML parser. Scraping XML is pretty straightforward with Beautiful Soup. If we look at the results we got for Ethiopia, we want to get three fields: wb:name, wb:region, and wb:incomeLevel. Let's go ahead and define those. country_name is soup.find('wb:name'), region is soup.find('wb:region'), and income_level is soup.find('wb:incomelevel'), and it was all lowercase.
Now, let's print that information out. Here's a good example of a time when we can use the get_text method. We'll print the country_name with get_text, the region with get_text, and the income_level.
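The parsing step described here might look roughly like the sketch below. It is not the course's own file: extract_country_fields and print_country are hypothetical names, and SAMPLE_XML is a hand-written, abbreviated stand-in for the API response rather than captured output. The "xml" parser relies on the lxml package being installed, and because that parser treats tag names as case-sensitive, the sketch looks up wb:incomeLevel with a capital L.

```python
from bs4 import BeautifulSoup

# Assumption: a hand-written, abbreviated stand-in for the API response.
# The real response carries more fields and attributes.
SAMPLE_XML = """<wb:countries xmlns:wb="http://www.worldbank.org">
  <wb:country id="ETH">
    <wb:name>Ethiopia</wb:name>
    <wb:region id="SSF">Sub-Saharan Africa</wb:region>
    <wb:incomeLevel id="LIC">Low income</wb:incomeLevel>
  </wb:country>
</wb:countries>"""


def extract_country_fields(markup):
    """Return (name, region, income level) text from country XML."""
    # The "xml" parser needs lxml; tag lookups are case-sensitive.
    soup = BeautifulSoup(markup, "xml")
    country_name = soup.find("wb:name")
    region = soup.find("wb:region")
    income_level = soup.find("wb:incomeLevel")
    return (
        country_name.get_text(),
        region.get_text(),
        income_level.get_text(),
    )


def print_country(markup):
    """Print each extracted field on its own line."""
    for field in extract_country_fields(markup):
        print(field)
```

Passing the bytes returned by urlopen for a country request in place of SAMPLE_XML should work the same way, since Beautiful Soup accepts either text or bytes.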
Now, we can loop through the ISO codes and pass them to our get_country function. So, if __name__ == '__main__':. Let's bring that up on the screen a little bit. I've included a file of ISO codes that we can open up and read. So, file, country_code, oops, country_iso_codes.csv, and we want to read that. Now, iso_codes, then, will be our reader, with file, and our delimiter is ",".
Now, we can loop through our file and get our information. for code in iso_codes, and we want to pass our code into our get_country function, and we want the first one from the list.
Now, we can run world_bank. And it looks like I made a mistake back up here: it wasn't all lowercase, it's actually incomeLevel. Let's try it again, and we get all of our expected data. Again, we could do something else here, like saving the information to a CSV file or database. Check the teachers' notes for more resources on that.
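The loop over the ISO codes file could be sketched along these lines. The layout of country_iso_codes.csv is an assumption here (one row per country, code in the first column, no header row), and read_iso_codes and process_codes are hypothetical helper names. The handler argument stands in for a function such as get_country.

```python
import csv


def read_iso_codes(csv_file):
    """Return the first column of every non-empty row as a list of codes."""
    # Assumption: the ISO code sits in the first column and there is
    # no header row.
    iso_codes = csv.reader(csv_file, delimiter=",")
    return [row[0] for row in iso_codes if row]


def process_codes(csv_path, handler):
    """Open the CSV file and call handler once per ISO code."""
    with open(csv_path, newline="") as file:
        for code in read_iso_codes(file):
            handler(code)
```

With a get_country function in scope, process_codes("country_iso_codes.csv", get_country) would make one request per row, so a long file means a long run.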