Loading a Dataset

6:59 with Nick Pettit

Before we can write a classifier, we need something to classify. That is, we need a dataset.


Resources

Python Code

# Load the Iris dataset that ships with scikit-learn
from sklearn.datasets import load_iris
iris = load_iris()
# Print the three species labels as a list
print(list(iris.target_names))

Before we can write a classifier, we need something to classify, 0:00

that is, we need a data set. 0:03

One of the most classic data sets in all of machine learning is the Iris data set, 0:06

which is a set of 150 examples of three different types of 0:12

Iris flowers: setosa, versicolor, and virginica. 0:17

In fact, the iris flower data set even has its own Wikipedia page, 0:23

to which you can find a link in the notes associated with this video. 0:28

The Iris flower data set is like the Hello World program of data sets. 0:32

It's not meant to be used in practical applications, but it's good for testing 0:38

machine learning techniques, particularly ones that involve classification. 0:42

If you scroll down to the data set section and click the show button next to data, 0:47

you can see that this data set has four features: 0:56

the length and width of each sepal and the length and width of each petal. 1:00

After these four features there's a label, 1:07

which is the species of the iris flower, 1:13

setosa, versicolor, and virginica. 1:18

Each of these three labels has 50 examples in the data set for 1:23

a total of 150 examples. 1:28
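As a rough sketch of the structure just described, the snippet below loads the copy of the Iris data that ships with scikit-learn and checks the four features and the number of examples per label. It assumes scikit-learn is installed, and the variable name `counts` is only illustrative.

```python
from collections import Counter

from sklearn.datasets import load_iris

iris = load_iris()

# Four features per example: sepal length/width and petal length/width.
print(iris.feature_names)

# One row per example, one column per feature.
print(iris.data.shape)

# iris.target holds integer labels; map each one to its species name and count.
counts = Counter(str(iris.target_names[label]) for label in iris.target)
print(counts)
```

If the data matches the description above, the shape should show 150 rows and 4 columns, and each species should appear 50 times.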

Let's look at another page of the documentation in Sklearn, 1:30

which you can also find a link to in the notes associated with this video. 1:36

Sklearn has a number of small datasets 1:40

built in to demonstrate the different tools available in Sklearn. 1:44

And one of them happens to be the Iris flower dataset. 1:48

This dataset is too small for real machine learning analysis, but 1:53

it's still useful for testing things out, in this case, classification. 1:57

We're going to load this data set into a Python program and 2:02

then make a new example and try to predict the label. 2:05

First, open your favorite text editor. 2:11

In these lessons, I'm going to use Atom, which is available on Mac and 2:14

PC, but any plain text editor should work the same. 2:19

If you're not sure which to use, check the notes associated with this video. 2:23

First, create a new file if you haven't already done so. 2:28

And save it as ml.py. 2:31

I already have an ml.py but I'm just going to save over it. 2:40

The ml stands for machine learning and py means Python. 2:46

You can actually name the file whatever you would like, as long as it ends in .py. 2:53

Make sure you remember where you're saving this one on your computer, 2:57

because you need to access it later from a command line console. 3:02

Now, I am going to start by importing this Iris dataset, 3:07

so I'll say from sklearn.datasets and 3:15

then another space. 3:20

I'll type import and then another space, 3:23

and we'll type load_iris. 3:29

The data set isn't quite ready to use yet. 3:36

We have to assign it to a variable in our code, like this. 3:39

I'll type iris and an equal sign and 3:44

then use the function, load_iris. 3:48
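To make the assignment step a little more concrete, here is a minimal sketch of what the `iris` variable holds after calling `load_iris()`. The names `first_features` and `first_label` are illustrative and do not appear in the video.

```python
from sklearn.datasets import load_iris

# Calling the function (note the parentheses) returns the dataset object.
iris = load_iris()

# The returned object bundles several related arrays together.
print(type(iris).__name__)

# Measurements and integer label for the first example in the data set.
first_features = iris.data[0]
first_label = iris.target[0]
print(first_features, iris.target_names[first_label])
```

The measurements live in `iris.data`, the integer labels in `iris.target`, and the species names those integers refer to in `iris.target_names`.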

Now we could print the entire data set, but that's going to look pretty ugly 3:54

on the console and won't really be all that useful to us anyway. 3:59

Instead, let's just print the labels, otherwise known as target names, 4:04

just to make sure that we've loaded the dataset correctly. 4:09

We can do that by using the print function and 4:13

converting the target names into a list like this. 4:17

So we'll type print and some parentheses, and 4:22

inside we'll type list which is a function. 4:26

And inside the list function, 4:31

we'll use the iris variable that we created followed by a dot. 4:35

And we'll type target underscore names. 4:41

And that will list and print out the target names or 4:44

the labels in the Iris dataset. 4:49
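One caveat worth hedging: `iris.target_names` is a NumPy array, so `list(...)` produces a list of NumPy string objects. Depending on the installed NumPy version, the printed items may appear wrapped (for example as `np.str_('setosa')`) rather than as plain quoted strings. A possible alternative, not used in the video, is the array's `tolist()` method, which converts the elements to ordinary Python strings. The variable name `labels` is illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Same approach as in the video: wrap the NumPy array in a list.
print(list(iris.target_names))

# Alternative: tolist() converts each element to a built-in Python str,
# so the printed output does not depend on how NumPy displays its scalars.
labels = iris.target_names.tolist()
print(labels)
```

Either form confirms that the data set loaded correctly; the second simply keeps the console output consistent across NumPy versions.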

Now make sure you've typed everything carefully and then save the file. 4:53

Now go back to Anaconda Navigator and 4:59

make sure you're in your machine learning basics environment. 5:04

And click the play button, and choose Open Terminal. 5:08

We could use the interactive Python command line, but 5:15

using the terminal will be a little easier for running files like this. 5:18

If you're on Windows, your terminal will obviously look different than on a Mac. 5:24

But the general principles should remain the same. 5:29

Next, you'll need to navigate to the directory where you stored your file. 5:33

So in my case, I know it's in my home directory inside my Dropbox folder. 5:38

Under treehouse, courses, machine learning, basics, 5:47

and so now I've changed to that directory and I will list out its contents. 5:53

And like I said, this is a little different on Mac and Windows. 5:59

So if you do need some additional help, pause this video and check out the notes. 6:03

Once you've navigated to the folder where your Python file is saved, 6:09

type the word python followed by a space, 6:14

followed by the name of your program, ml.py, and then hit Enter. 6:19

You should see the three labels in the data set, setosa, 6:28

versicolor and virginica. 6:32

If you get an error go back to your code and 6:35

make sure it's exactly the same as mine. 6:38

It's easy to miss a parenthesis or make a small typo, so check carefully. 6:42

If you need help, check out the notes in this video for the exact code. 6:47

Great, now that we've loaded a dataset, next, we'll use it to make predictions. 6:53
