A classifier looks at a piece of data and tries to categorize it. In this video, we'll use scikit-learn to write a classifier using the dataset we loaded previously.
Resources
Python Code
from sklearn.datasets import load_iris

# Load the Iris data set and print the three species names
iris = load_iris()
print(list(iris.target_names))

from sklearn import tree

# Build the decision tree from the examples and their labels
classifier = tree.DecisionTreeClassifier()
classifier = classifier.fit(iris.data, iris.target)

# Predict the label for one example's four feature values
print(classifier.predict([[5.1, 3.5, 1.4, 1.5]]))
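When run, this script should print the three flower names followed by the predicted label index, looking something like this (the final line may print [0] or [1] from run to run, as the video explains):

['setosa', 'versicolor', 'virginica']
[1]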
In order to make predictions,
we need to choose a classification model.
0:00
And we're going to use the decision tree
classifier that we looked at earlier.
0:05
Back in your Python file, let's add
to the program on the next few lines.
0:10
First, we'll import scikit-learn's tree module,
0:15
which includes all of
the decision tree models.
0:20
So we'll type from sklearn import tree.
0:27
Next, we'll create a decision
tree classifier and
0:34
assign it to a variable so
we can work with it.
0:38
We'll type classifier as the name
of our variable, and we'll set that
0:42
to tree.DecisionTreeClassifier and
0:48
that's a function, so
add parentheses at the end.
0:56
Now, we need to actually build
the decision tree through which
1:00
each new example will flow.
1:05
This decision tree can be built by
feeding it both the training examples and
1:08
the target labels using the fit function,
like this.
1:13
So, again,
we'll use our classifier variable and
1:19
we'll type classifier.fit
1:27
And inside of this function,
we'll pass in iris.data,
1:34
and then iris.target, which holds the labels.
1:41
So just to review,
we've created a decision tree model.
1:46
And now we're actually building
the decision tree using its fit function
1:52
which takes a set of examples and
the target labels.
1:56
For more on the fit function, check out
the notes associated with this video.
2:00
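As a quick reference, here is a minimal, self-contained sketch of what fit expects: the examples come first and the target labels come second. With the Iris data, that's 150 examples of 4 features each.

from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
classifier = tree.DecisionTreeClassifier()

# fit takes the examples first and the target labels second
classifier.fit(iris.data, iris.target)

print(iris.data.shape)    # (150, 4): 150 examples, 4 features each
print(iris.target.shape)  # (150,): one label per example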
At this point the data is loaded and we've
built a decision tree based on that data.
2:05
Now we can feed a new example into
the top of that decision tree, and
2:10
it will flow through each branching
decision like a flow chart
2:15
until it finally reaches its target label.
2:19
Now finally comes the part we've been
working toward, making predictions.
2:24
We can do this using the predict function
on the decision tree classifier,
2:28
like this.
2:34
So first, this is something I'm
going to want to print out.
2:36
And inside of the print function,
I'll type classifier.predict,
2:39
and we'll open and close some parentheses.
2:47
I am wrapping this in
a print function just so
2:52
that we can see the outcome
of the code when we run it.
2:54
Inside of the predict parentheses,
2:58
create two sets of nested square brackets,
like this.
3:02
So there's the first pair.
3:08
And then inside of those square brackets,
we'll make another pair of opening and
3:10
closing square brackets.
3:15
The outermost set of square brackets
is an array of our examples.
3:18
The innermost set of square brackets
3:24
is where we'll put the values of
features for a single example.
3:27
So in other words, we could predict
multiple examples at a time, but
3:33
we're just sticking with one for now.
3:38
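Picking up from the classifier we just fit, here's a small sketch of predicting more than one example in a single call (the second example's values are made up for illustration):

# Each inner list is one example's four feature values:
# sepal length, sepal width, petal length, petal width
samples = [
    [5.1, 3.5, 1.4, 0.2],
    [6.7, 3.0, 5.2, 2.3],  # hypothetical second example
]
print(classifier.predict(samples))  # one predicted label per example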
So let's start out by testing the model
to make sure it's working correctly.
3:40
We can do that by just putting
in an example from the data set.
3:47
From the Wikipedia page,
3:52
I'll type in the first example from
the data set, which happens to be a setosa.
3:53
So inside of the innermost
square brackets,
4:01
I'll type 5.1, 3.5, 1.4, and 0.2.
4:06
And if we go back to the Wikipedia
page to look at that,
4:12
again, you can see that this
first example should be a setosa.
4:18
And now let's save it and then back
in the terminal, we'll run the code.
4:27
So I'll just hit the up arrow to get
the previous command and hit Enter.
4:35
And the output should be the names
of the flowers because we still have
4:42
the iris.target_names printing first.
4:46
And then next, we have this index, 0,
which is exactly what we want.
4:50
Remember, arrays start
counting indices at 0.
4:57
And, in this case,
the index is referring to these labels.
5:01
A setosa is 0, a versicolor is 1,
and a virginica is 2.
5:06
So zero is indeed a setosa, and so
5:13
we know that the model is
predicting this correctly.
5:16
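If you'd rather print the flower's name than its index, one option is to look the prediction up in iris.target_names yourself. A minimal sketch, continuing from the code above:

prediction = classifier.predict([[5.1, 3.5, 1.4, 0.2]])
print(iris.target_names[prediction[0]])  # prints setosa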
So now let's mess with
this data a little bit.
5:21
In the data set, most of the setosas
have a petal width of about 0.2,
5:24
with examples ranging from 0.1 up to 0.6.
5:31
However, versicolors
range from 1.0 to 1.8.
5:36
And then virginicas have
petal widths from 1.4 to 2.5.
5:41
So in our example, let's change this last
5:46
feature to something like 1.5 instead.
5:51
That would be well above normal for
a setosa, but
5:57
would fall within the versicolor range and
just barely make the cut for virginicas.
6:01
Now save the code, and let's run it again.
6:08
You should see either a 0 or a 1.
6:15
If you hit the up arrow and
hit Enter again and
6:19
again to execute the same code,
you should see both numbers appearing.
6:23
That's because the decision tree
classifier chooses at random
6:31
whenever two features would make
an equally good comparison.
6:35
However, because we're always working
with probabilistic behavior in machine
6:38
learning, we won't necessarily get
the same result on every run.
6:43
This can indicate a low level
of confidence, which makes sense.
6:48
The other values in our new example
6:53
don't line up with any real
examples in the data set.
6:57
And we've pushed the petal width enough
7:00
that the classifier can't really
draw any confident conclusion.
7:03
It's somewhere between a setosa and
a versicolor.
7:07
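If you'd rather get the same result on every run, scikit-learn's DecisionTreeClassifier accepts a random_state parameter that pins down those random tie-breaking choices. A minimal sketch, continuing from the code above:

# Fixing the seed makes the tree, and thus the prediction, repeatable
classifier = tree.DecisionTreeClassifier(random_state=0)
classifier.fit(iris.data, iris.target)
print(classifier.predict([[5.1, 3.5, 1.4, 1.5]]))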
That's it for coding.
7:13
In our next video, we'll review some
of the big ideas we've learned.
7:14