Loading a Dataset (6:59) with Nick Pettit
Before we can write a classifier, we need something to classify. That is, we need a dataset.
- Iris flower dataset | Wikipedia
- load_iris() | scikit-learn Documentation
- Treehouse Workshop: Introducing Text Editors
- Which Text Editor Should I Use? | Treehouse Blog
- A Beginner’s Guide To The Windows Command Line
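from sklearn.datasets import load_iris
iris = load_iris()  # load the built-in Iris dataset
# Print the labels (target names) to confirm the dataset loaded
print(list(iris.target_names))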
0:00 Before we can write a classifier, we need something to classify; that is, we need a dataset. One of the most classic datasets in all of machine learning is the Iris dataset, a set of 150 examples of three different types of iris flowers: setosa, versicolor, and virginica. In fact, the Iris flower dataset even has its own Wikipedia page, which is linked in the notes associated with this video.

0:32 The Iris flower dataset is like the Hello World program of datasets. It's not meant to be used in practical applications, but it's good for testing machine learning techniques, particularly ones that involve classification.

0:47 If you scroll down to the dataset section and click the show button next to data, you can see that this dataset has four features: the length and width of each sepal and the length and width of each petal. After these four features there's a label, which is the species of the iris flower: setosa, versicolor, or virginica. Each of these three labels has 50 examples in the dataset, for a total of 150 examples.

1:30 Let's look at another page, this time in the sklearn documentation, which is also linked in the notes associated with this video. Sklearn has a number of small datasets built in to demonstrate the different tools available in sklearn, and one of them happens to be the Iris flower dataset. This dataset is too small for real machine learning analysis, but it's still useful for testing things out, in this case classification.

2:02 We're going to load this dataset into a Python program and then make a new example and try to predict the label.

2:11 First, open your favorite text editor. In these lessons, I'm going to use Atom, which is available on Mac and PC, but any plain text editor should work the same. If you're not sure which to use, check the notes associated with this video.
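As a quick check of the structure described above (150 examples, four features, three labels with 50 examples each), here is a minimal sketch that inspects the dataset once scikit-learn is installed. The helper name summarize_iris is made up for illustration; data, target, feature_names, and target_names are attributes of the object that load_iris() returns.

```python
from collections import Counter

from sklearn.datasets import load_iris


def summarize_iris():
    """Return basic facts about the Iris dataset (hypothetical helper)."""
    iris = load_iris()
    # Map each numeric target (0, 1, 2) to its species name and count them.
    counts = Counter(str(iris.target_names[t]) for t in iris.target)
    return {
        "n_examples": iris.data.shape[0],  # rows: one per flower
        "n_features": iris.data.shape[1],  # columns: sepal/petal measurements
        "feature_names": list(iris.feature_names),
        "examples_per_label": dict(counts),
    }


summary = summarize_iris()
print(summary["n_examples"], "examples,", summary["n_features"], "features")
print(summary["feature_names"])
print(summary["examples_per_label"])
```

If the counts match what the Wikipedia table shows, the dataset built into sklearn is the same one described in the video.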
2:28 First, create a new file if you haven't already done so, and save it as ml.py. I already have an ml.py, but I'm just going to save over it. The ml stands for machine learning and py means Python. You can actually name the file whatever you would like, as long as it ends in .py. Make sure you remember where you're saving it on your computer, because you'll need to access it later from a command line console.

3:07 Now, I'm going to start by importing the Iris dataset, so I'll say from sklearn.datasets and then another space. I'll type import and then another space, and we'll type load_iris.

3:36 The dataset isn't quite ready to use yet; we have to assign it to a variable in our code, like this. I'll type iris and an equal sign and then call the function, load_iris().

3:54 Now we could print the entire dataset, but that's going to look pretty ugly on the console and won't really be all that useful to us anyway. Instead, let's just print the labels, otherwise known as target names, just to make sure that we've loaded the dataset correctly. We can do that by using the print function and converting the target names into a list, like this. So we'll type print and some parentheses, and inside we'll type list, which is a function. And inside the list function, we'll use the iris variable that we created, followed by a dot. And we'll type target_names. That will list and print out the target names, or the labels, in the Iris dataset.

4:53 Now make sure you've typed everything carefully and then save the file. Now go back to Anaconda Navigator and make sure you're in your machine learning basics environment. Click the play button and choose Open Terminal. We could use the interactive Python command line, but using the terminal will be a little easier for running files like this.
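One caveat on the print step above, offered as a sketch rather than a correction to the lesson: target_names is a NumPy array, so list() produces NumPy string objects rather than built-in Python strings. On NumPy 2.0 and later, those are likely to print as np.str_('setosa') instead of 'setosa'. Assuming that is the version installed, calling the array's tolist() method is one way to get plain strings:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# tolist() converts the NumPy array into a list of built-in str objects,
# so the printed output does not depend on how NumPy displays its own types.
label_names = iris.target_names.tolist()
print(label_names)
```

Either form confirms that the dataset loaded; the only difference is how the labels are displayed.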
5:24 If you're on Windows, your terminal will look different than on a Mac, but the general principles should remain the same.

5:33 Next, you'll need to navigate to the directory where you stored your file. In my case, I know it's in my home directory inside my Dropbox folder, under treehouse, horses, machine learning, basics. So now I've changed to that directory and I will list out its contents. Like I said, this is a little different on Mac and Windows, so if you do need some additional help, pause this video and check out the notes.

6:09 Once you've navigated to the folder where your Python file is saved, type the word python followed by a space, followed by the name of your program, ml.py, and then hit enter. You should see the three labels in the dataset: setosa, versicolor, and virginica.

6:35 If you get an error, go back to your code and make sure it's exactly the same as mine. It's easy to miss a parenthesis or make a small typo, so check carefully. If you need help, check out the notes in this video for the exact code.

6:53 Great, now that we've loaded a dataset, next, we'll use it to make predictions.
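If the error mentioned above is about sklearn not being found rather than a typo, one possible cause is that the terminal is running a Python interpreter outside the environment where scikit-learn was installed. The following is a minimal sketch for checking that, using only the standard library; the helper name package_available is made up for illustration.

```python
import importlib.util
import sys


def package_available(name):
    """Return True if `name` can be imported by this interpreter (hypothetical helper)."""
    return importlib.util.find_spec(name) is not None


# Path of the Python interpreter that is running this script.
print("Interpreter:", sys.executable)
print("sklearn importable:", package_available("sklearn"))
```

Save it as a separate .py file and run it the same way as ml.py. If the second line prints False, the interpreter path on the first line shows which Python the terminal is actually using.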