Decision Process
9:22 with Ken Alger
With our tasks assigned, let's see how we can visualize our data to make better business decisions.
Matplotlib Version Update
At around 8:20 in the video, the scatter plot being shown doesn't show a good correlation between data points. This course was recorded with matplotlib version 2.0.2, and the implementation of scatter plots has changed in current versions of the library.
Instructions to add files to your notebook
- Download the project files using the Download tab above.
- Unzip the file and navigate to the folder "s3v2" (Stage 3, Video 2).
- In your Notebook, go to File > Open and navigate to the unzipped folder.
From there you can view files just like we're doing in the video.
- Matplotlib tight_layout()
Let's start by thinking about our three tasks and what charts would be appropriate for their needs. Number one, average time of an eruption. We could use a box plot to showcase the median time along with the quartile data, which should be helpful. We could also look at a histogram here to see if it provides any better visualization. Number two, for the wait time information, I think a box plot is a good tool here. Number three, since we're wanting to determine a correlation between a couple of variables, a scatter plot is a nice plot for this.
Unlike our earlier data, the old_faithful.csv file includes a header row. We'll have to pop that off to use our data. Let's start a new notebook called old_faithful, and start with our imports. We'll import csv, and we'll import matplotlib.pyplot as plt.
Now we'll set up our input file and styling. This one is old_faithful.csv. Do our plot figure. We'll give it a figure size. Again, the 7.5 and 4.25 seems to work pretty well for this screen resolution. And it's figsize, not figuresize. And we'll use the classic style again.
Now we can read in our data, pop off our header row, and create our eruption and wait times lists. With open, we'll bring in our input_file as read, as old_faithful data; eruptions will be our list name. And we are missing an underscore there. Pop off the header row. And create our empty lists for eruption_times and waiting_times.
Now let's loop over the events and append the data to the necessary list: for event, 0 through the length of our eruptions, -1 there, so eruption_times, and we're gonna append the eruption event data. Move that up for clarity. waiting_times, we'll do something similar. And that's in our second column.
Great, now let's work through our task list and generate our plots.
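The reading step described above can be sketched roughly as follows. This is a minimal sketch, not the exact code typed in the video: it uses an in-memory stand-in for old_faithful.csv so it runs on its own, and the header names and sample values are assumptions. It also converts each value to float, which the transcript does not mention; csv.reader returns strings, and newer matplotlib versions treat strings as categories rather than numbers.

```python
import csv
import io

# Hypothetical stand-in for old_faithful.csv: a header row followed by
# (eruption length, waiting time) pairs. The column names and values are
# illustrative assumptions, not the course file itself.
sample_csv = io.StringIO(
    "eruptions,waiting\n"
    "3.600,79\n"
    "1.800,54\n"
    "3.333,74\n"
    "2.283,62\n"
)

# Read every row into a list, then pop off the header row.
eruptions = list(csv.reader(sample_csv))
header = eruptions.pop(0)

eruption_times = []
waiting_times = []
for event in eruptions:
    # csv.reader yields strings; converting to float is an addition here,
    # not something shown in the transcript.
    eruption_times.append(float(event[0]))  # first column
    waiting_times.append(float(event[1]))   # second column

print(header)
print(eruption_times)
print(waiting_times)
```

With the real file, the io.StringIO object would be replaced by something like `open(input_file, 'r')` inside a `with` block, as described in the video.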
Let's make these all subplots as well, just to save some screen space. We'll start with a box plot of our eruption time data. Subplot 2, 2, 1. Recall that this is saying that our plot will have subplots of 2 rows and 2 columns, and this is the number 1 subplot. Make it a boxplot, and we're passing in our eruption_times data. Let's make sure to give our subplot titles and labels. plt.title, Old Faithful Eruptions. We'll adjust those xticks. So instead of a 1, it will be Eruptions. The xlabel is the Length of Eruption in minutes. Great, I think that does it for our first plot.
Now, we wanted to add a histogram here, too, to see if that might be useful. So let's add another subplot and create an 8-bin histogram. We'll move that up, start our second subplot. 8 bins, create our histogram with the eruption_times. Then our number of bins variable. And add our labeling. And for our xlabel, again, it's Length of Eruption in minutes. There we go, that should take care of task one.
Let's get our box plot for the waiting time for task two. I'll cut and paste the eruptions box plot and make the needed changes. So our subplot is gonna be plot 3. And we want waiting_times here and not eruption_times. We'll just call this Old Faithful Waiting. And our tick will be Waiting. And Length of Waiting.
Now for our scatter plot for task three. We'll make another subplot. And this will be in position 4. Let's give a little more space here. plt.scatter, eruption_times and waiting_times. And add our title and labels. Old Faithful Eruptions. xlabel, the Length of the Eruption, again, in minutes. And our ylabel, Time Between Eruptions, minutes.
Now we can call plt.show and run our cell. There's all of our plots, but they're all smooshed together, not a great presentation.
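The four subplots walked through above might look something like the sketch below. The sample values are assumed stand-ins for the lists built from the CSV file, the histogram title is a guess (the video only says "add our labeling"), and the Agg backend line is an assumption added so the code also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Illustrative sample values (assumed), standing in for the lists read from the CSV.
eruption_times = [3.6, 1.8, 3.333, 2.283, 4.533, 2.883]
waiting_times = [79.0, 54.0, 74.0, 62.0, 85.0, 55.0]

plt.style.use("classic")
fig = plt.figure(figsize=(7.5, 4.25))

# Task 1: box plot of eruption lengths in position 1 of a 2x2 grid.
plt.subplot(2, 2, 1)
plt.boxplot(eruption_times)
plt.title("Old Faithful Eruptions")
plt.xticks([1], ["Eruptions"])  # replace the default tick label "1"
plt.xlabel("Length of Eruption (minutes)")

# Task 1, second view: 8-bin histogram of the same data in position 2.
num_bins = 8
plt.subplot(2, 2, 2)
plt.hist(eruption_times, num_bins)
plt.title("Old Faithful Eruptions")  # assumed title
plt.xlabel("Length of Eruption (minutes)")

# Task 2: box plot of waiting times in position 3.
plt.subplot(2, 2, 3)
plt.boxplot(waiting_times)
plt.title("Old Faithful Waiting")
plt.xticks([1], ["Waiting"])
plt.xlabel("Length of Waiting (minutes)")

# Task 3: scatter plot of eruption length against waiting time in position 4.
plt.subplot(2, 2, 4)
plt.scatter(eruption_times, waiting_times)
plt.title("Old Faithful Eruptions")
plt.xlabel("Length of Eruption (minutes)")
plt.ylabel("Time Between Eruptions (minutes)")

# In a notebook, call plt.show() here to render the figure.
```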
Matplotlib provides a method to adjust padding between subplots called tight_layout. Let's call that before we call show and see how that cleans things up. I'll put a link to more information about tight_layout in the teacher's notes as well. And run our cell. That looks a lot better.
You'll notice here that based on the eruption duration and waiting times, combining those two charts won't be super useful. Eruption duration is very short in comparison to waiting time. This is a great example of where scale is important. I'm also not entirely certain of the usefulness of the histogram for Steve's purposes. So in the final presentation, we may wanna leave that out.
Now that we've generated a few charts, let's have a quick break. When we get back, we'll sum up our project tasks and get our presentation ready for Steve.
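For reference, the tight_layout step mentioned in this video can be sketched in isolation like this. The placeholder data, titles, and labels are assumptions chosen only to give each subplot some text that would otherwise crowd its neighbors; the Agg backend line is likewise an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7.5, 4.25))

# Build a 2x2 grid of placeholder subplots, each with a title and an xlabel.
for position in range(1, 5):
    plt.subplot(2, 2, position)
    plt.plot([0, 1], [0, 1])  # placeholder data
    plt.title(f"Subplot {position}")
    plt.xlabel("x label")

# Adjust the padding between subplots so titles and labels don't overlap.
# This is called before plt.show(), as in the video.
plt.tight_layout()

# In a notebook, call plt.show() here to render the figure.
```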