Bokeh's ColumnDataSource 10:22 with Ken Alger
ColumnDataSource is one way in which the data of a Bokeh graph is stored. Let's take a look at some examples of how to utilize this great feature.
This course was produced with version 0.12.5 of Bokeh. The bar chart produced in this video around the 4:50 mark utilizes the
bokeh.charts module which has been deprecated in newer versions. The rest of this course relies on the
Welcome back. 0:00 We've seen that we can pass in series data, 0:01 such as lists, into our figure object, to ultimately generate our plot. 0:04 We can pass in other sources of data as well, such as dicts, NumPy arrays, or 0:08 pandas data frames. 0:12 However, Bokeh provides its own implementation 0:14 of series data that we can use, the column data source. 0:17 Column data source is a table-like data object which maps column names 0:21 to sequences or arrays of data. 0:26 It takes in a data series such as a Python dictionary and 0:28 maps each key to its sequence of values. 0:32 In fact, behind the scenes, when we pass in x equals the list one, 0:35 two, and three, Bokeh is making a column data source with a name of x 0:38 with a mapping of values as one, two, and three. 0:43 It also allows for access to the values 0:47 inside the data, much like one would with a Python dictionary, and 0:49 easily accepts pandas data frames as values. 0:53 In upcoming videos we'll also look at some additional features of having our data 0:56 in a column data source that make using it in our Bokeh projects 1:00 more useful than sticking with pandas data frames or NumPy arrays. 1:03 We can add in data to the data source for hover tooltips and 1:07 even use the column data source for linked visualizations. 1:11 Again, we'll be taking a look at these features soon. 1:14 There's something to note about column data source. 1:17 Like in the previous video, the columns must all be the same length. 1:21 Missing or ragged data is not permitted within a single data source. 1:24 Let's have a look at how we can utilize column data source to handle some data, 1:29 and start to visualize some actual figures. 1:32 To get started, you'll want to download the course files and 1:35 get them set up in your favorite editor. 1:38 I'll head back into PyCharm.
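The mapping from column names to equal-length sequences described above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the column names `x` and `y` and their values are hypothetical, and the length check is done by hand in plain Python.

```python
from bokeh.plotting import ColumnDataSource

# Hypothetical columns for illustration; every column holds three values,
# so all columns are the same length.
source = ColumnDataSource(data={"x": [1, 2, 3], "y": [4, 6, 5]})

# The .data attribute can be read much like a Python dictionary.
x_values = list(source.data["x"])
column_names = sorted(source.data.keys())
print(column_names, x_values)

# Manual check that no column is ragged.
lengths = {name: len(values) for name, values in source.data.items()}
same_length = len(set(lengths.values())) == 1
print(same_length)
```

Keeping the data in one dictionary-like object means each glyph can later refer to a column by name rather than carrying its own copy of the values.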
1:41 The starter files include the dataset we'll be using, country-pops.csv, 1:43 along with the requirements.txt file which we already installed in a previous video. 1:47 Let's go ahead and create a new file called stage1-3.py. 1:52 Here, we'll want to get our CSV file imported and 1:58 take a look at the format of our data. 2:01 Since this file contains information for all the countries of the world, 2:03 let's limit ourselves while we're exploring the data to just five rows and 2:07 display the header information as well. 2:12 We'll need to import numpy as np, 2:15 import pandas as pd, again, that's a pretty standard naming convention for those. 2:20 We're gonna be bringing in our country-pops.csv file. 2:30 And we'll use the pandas.read_csv method. 2:41 Bring this in. 2:47 And then for data exploration, we'll create a np.array, 2:51 with a header, and 3:01 we need to limit our rows here, nrows=5. 3:04 And print(countries_array). 3:14 And when we run this, we can see then that in our data, 3:20 we have quite a bit of useful information. 3:25 The English and German names of the country, country code, population, etc. 3:28 As is often the case with datasets, 3:34 there's information included that you might not necessarily be interested in for 3:36 the current project, such as birth and death rates. 3:41 But, it's great to have that information available for our future explorations. 3:44 Now that we have a table of data, we should put it to use. 3:49 Imagine that you have this information in your favorite spreadsheet application. 3:52 Let's start with the chart that people will often generate in Excel. 3:56 A bar chart for example. 3:59 We'll need to bring in our imports and 4:01 in this case import bokeh.charts to plot a bar chart. 4:03 And from bokeh.io, we'll import our output_file and show methods. 4:19 Then we define our output file. 4:25 Let's call it population.html. 4:27 Next, we need to build out our bar chart by passing in our countries pandas 4:36 data frame.
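The CSV exploration step described above might look something like the sketch below. It is only an approximation of what is typed in the video: the in-memory CSV text is a made-up placeholder standing in for country-pops.csv, and the column names `Country`, `Population`, and `Life_expectancy` are assumptions.

```python
import io

import numpy as np
import pandas as pd

# Placeholder rows standing in for country-pops.csv; the names and numbers
# are invented for illustration and are not real data.
csv_text = """Country,Population,Life_expectancy
Country A,1000,70.0
Country B,2000,71.0
Country C,3000,72.0
Country D,4000,73.0
Country E,5000,74.0
Country F,6000,75.0
Country G,7000,76.0
"""

# header=0 uses the first row as column names; nrows=5 keeps only the
# first five data rows while exploring.
countries = pd.read_csv(io.StringIO(csv_text), header=0, nrows=5)

# Convert to a NumPy array for a quick look at the raw values.
countries_array = np.array(countries)
print(list(countries.columns))
print(countries_array)
```

With the real file, the `io.StringIO(csv_text)` argument would simply be replaced by the path to country-pops.csv.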
4:40 We can do that here at the bottom and tell it which information to use. 4:42 We'll just call it bar_chart. 4:47 Next, we need to pass our bar chart object into the show method. 5:21 show(bar_chart) and then run our script. 5:28 Since our nrows is still set to five, we see the five countries displayed on a bar 5:38 chart showing their respective populations. 5:42 Pretty cool. 5:46 You'll notice that we are setting our legend to false so 5:47 that it doesn't display. 5:50 In a later video, we will take a look at how to customize our chart legends. 5:51 There are a variety of other charts that can be generated as well, and 5:55 I've included links to them in the teacher's notes. 5:59 While pandas data frames work great for many applications, 6:01 Bokeh provides another option that allows for some cleaner code 6:04 and, as we'll find out in a later video, some great additional features. 6:08 To take advantage of this, we utilize Bokeh's column data source. 6:12 This will map the column names to the sequences of data. 6:17 Let's go a bit beyond bar charts or pie graphs with our data, 6:20 since we're attempting to provide some data analysis in our reporting, and 6:24 see what, if any, correlation there is between a country's population and 6:28 life expectancy. 6:32 Before we can use it, we need to import column data source from bokeh.plotting and 6:33 pass in our data. 6:38 We'll need to do our imports from Bokeh along with the column data 6:39 source import. From bokeh.plotting, we want to import ColumnDataSource. 6:44 And figure. 6:54 Since countries is a pandas data frame, 6:56 we can pass that into ColumnDataSource as our data source. 6:58 And since we don't need the countries array or print function any longer, 7:01 we can clean up our code and delete those lines. 7:05 And, while we're at it, let's rename our output file to something more meaningful.
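Passing a pandas data frame into ColumnDataSource, as just described, can be sketched like this. The data frame here is a small hypothetical stand-in for the one read from country-pops.csv, and the column names are assumptions.

```python
import pandas as pd
from bokeh.plotting import ColumnDataSource

# Hypothetical stand-in for the countries data frame; values are placeholders.
countries = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Population": [1000, 2000, 3000],
    "Life_expectancy": [70.0, 75.0, 80.0],
})

# Each data frame column becomes a named column in the data source.
country_data = ColumnDataSource(countries)

populations = [int(value) for value in country_data.data["Population"]]
print(populations)
```

The data frame's column headers carry over as the column names, which is what later lets glyph methods refer to columns by name.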
7:09 Since we're examining population versus life expectancy, 7:14 let's use pop-life.html as our file name. 7:21 And we can get rid of the numpy import as well. 7:26 And we won't be needing bar any longer. 7:29 Great. 7:38 We wanna set up our country data. 7:40 Just taking our ColumnDataSource, and we want our countries. 7:45 Now, we just need to build out our figure plot, 7:53 similar to what we have done in the past. 7:56 But, we'll use circle glyphs here and 7:58 not pass in any tools parameter, to accept the defaults. 8:01 Our plot. 8:06 And we'll label our x_axis as Population. 8:13 Our y_axis is Life Expectancy. 8:21 Great, we'll do a plot.circle, 8:31 pop in population for our x value. 8:35 Life_expectancy 8:45 for our y value. 8:48 Our source data is our country data. 8:52 And we'll make our glyphs 15 points. 8:58 That should look roughly familiar, but what is going on there with our x and 9:01 y values? 9:05 And what does that source equals country_data bit do? 9:06 Well, we are telling our plot to use our source variable, 9:09 our ColumnDataSource information, as the source of our data. 9:13 Our x values are the population values inside that data set, and 9:18 our y values are the values of the matching Life Expectancy column of data. 9:22 Notice that capitalization matters here, and that we need to match our x and 9:27 y values to the correct names of our data set column headers. 9:32 Now, we just need to tell Bokeh to show us our plot and we'll be all set. 9:36 Show plot, and run our script. 9:42 Awesome, we have a plot showing us life expectancy versus population for 9:51 our five countries. 9:55 Great work, we have seen several things here. 9:56 How to utilize pandas data frames as a source of data and get it to work with 10:00 Bokeh column data source, and plot that data based on specific column names. 10:04 It's not very helpful right now though in terms of data exploration because we can't 10:09 tell which glyph is which.
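The plotting step walked through above can be approximated with the sketch below. It is not the exact code from the video: the data frame is a hypothetical stand-in, the column names `Population` and `Life_expectancy` are assumed from the narration, and `plot.scatter` is swapped in for the `plot.circle` call used in the video because it draws circle markers by default and is the form newer Bokeh releases favor for sized markers.

```python
import pandas as pd
from bokeh.plotting import ColumnDataSource, figure

# Hypothetical stand-in for the five-row countries data frame.
countries = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Population": [1000, 2000, 3000],
    "Life_expectancy": [70.0, 75.0, 80.0],
})
country_data = ColumnDataSource(countries)

# No tools argument is passed, so the default toolbar is used.
plot = figure(x_axis_label="Population", y_axis_label="Life Expectancy")

# x and y are column names looked up in the source; the strings must match
# the column headers exactly, including capitalization.
renderer = plot.scatter(
    x="Population",
    y="Life_expectancy",
    source=country_data,
    size=15,
)

# In a script, output_file("pop-life.html") followed by show(plot) from
# bokeh.io would write and open the rendered page.
```

Because the glyph only stores column names, changing which columns are plotted is a matter of changing two strings rather than rebuilding the data.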
10:12 When we come back, we'll look at adding specific colors to our glyphs and 10:15 adding legends to our graph to help us better understand our data. 10:19