We're going to use a Python library called scikit-learn, which includes lots of well-designed tools for performing common machine learning tasks. We're going to install scikit-learn and its dependencies using Anaconda, which is a Python-based platform focused on data science and machine learning.
[MUSIC]
0:00
We're going to use a Python library
called scikit-learn, which includes lots
0:04
of well-designed tools for
performing common machine learning tasks.
0:09
We're going to install scikit-learn and
its dependencies using Anaconda,
0:14
which is a Python-based platform focused
on data science and machine learning.
0:19
If you haven't installed Anaconda already,
0:25
check the notes associated with this video
for instructions on how to do that first.
0:27
It's easy and quick to set up,
so if you don't have it yet,
0:33
pause this video and
come back when you have Anaconda running.
0:37
You'll also need to download the project
files associated with this video.
0:42
Inside the ZIP file,
you'll find a file called environment.yml.
0:47
Remember where you save this file, because
you'll need to find it again later.
0:53
Once you have Anaconda Navigator
open on your computer and
0:58
the environment.yml file downloaded,
you're ready to get started.
1:02
Inside Anaconda Navigator,
click the Environments tab on the left.
1:08
And by default, you should have at least
one environment already, called base.
1:14
But to make sure that you have all the
dependencies installed that you'll need,
1:21
I've created an environment for us to use.
1:27
Click the Import button at
the bottom of the environments list.
1:31
Then click the folder
icon on the right side.
1:36
And locate the environment.yml file
1:41
that you saved earlier and click Open.
1:45
The .yml file will suggest a name for
your environment.
1:50
In this case, it's MachineLearningBasics,
which is the name of this course.
1:55
However, you can rename the environment
to something else if you'd prefer.
2:00
When you're done, choose Import. This will
2:05
begin the process of downloading and
installing all the dependencies
2:10
you'll need to use scikit-learn,
including scikit-learn itself.
2:15
This might take a few minutes,
2:20
depending on the speed of your
Internet connection and your computer.
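Once the import finishes, one way to confirm that scikit-learn is available is a quick check from Python inside the new environment. This is only a minimal sketch: it assumes you run it with the environment you just imported active, and it relies on the fact that the package is imported under the name sklearn.

```python
from importlib.util import find_spec

# Look up the package without importing it; find_spec returns None
# when the package is not installed in the active environment.
installed = find_spec("sklearn") is not None

if installed:
    import sklearn
    print("scikit-learn version:", sklearn.__version__)
else:
    print("scikit-learn not found; the environment may still be installing.")
```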
2:21
While that's running,
let's take a look at the documentation for
2:26
scikit-learn to get
a little more familiar.
2:29
Point your web browser
to scikit-learn.org. You
2:33
can find a link in the notes associated
with this video if that's easier for you.
2:39
And while you've already seen
the home page in a previous video,
2:44
let's click on Classification.
2:49
This is a list of all the different types
of supervised learning functions and
2:53
algorithms within scikit-learn.
2:59
It's quite a long list and
3:02
it can be challenging to understand
which tools we should use.
3:04
That's not something we'll be
able to get into in this course.
3:08
But as we learned previously,
supervised learning starts with a dataset.
3:12
And that's what we want, so
we know we're in the right place.
3:18
Scroll down to the section labeled 1.10,
Decision Trees.
3:22
Then click the first link underneath that
header that says 1.10.1, Classification.
3:30
DecisionTreeClassifier is
the class that we'll use.
3:39
It says, DecisionTreeClassifier is a class
3:44
capable of performing multi-class
classification on a dataset.
3:47
As with other classifiers,
3:53
DecisionTreeClassifier takes
as input two arrays.
3:54
An array X, sparse or dense,
of size n_samples by
3:58
n_features, holding the training samples.
4:03
And an array y of integer
values, of size n_samples,
4:07
holding the class labels for
the training samples.
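To make those two shapes concrete, here is a small sketch using plain Python lists. The data is made up purely for illustration (four samples with two features each); it is not taken from the course files.

```python
# Hypothetical toy data: 4 samples, each described by 2 features.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
# One integer class label per sample.
y = [0, 0, 1, 1]

n_samples = len(X)
n_features = len(X[0])

# Every row must have the same number of features,
# and there must be exactly one label per row.
assert all(len(row) == n_features for row in X)
assert len(y) == n_samples

print(n_samples, n_features)  # 4 2
```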
4:12
In other words, a DecisionTreeClassifier
is one type of classifier.
4:17
In this case, it's a model that
tries to predict a value by making
4:23
its own choices and rules that
are inferred from the data features.
4:28
The programming behind this is
pretty complicated, but essentially,
4:34
the computer tries to fit
the data to a pattern.
4:39
And using other methods in
the DecisionTreeClassifier class,
4:42
you can visualize the decision
path of the tree and
4:46
have the computer show how it
arrived at a certain prediction.
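As a rough illustration of that idea, the sketch below trains a tiny tree and then inspects it. It assumes a made-up two-sample dataset and hypothetical feature names (feature_a, feature_b). It uses the classifier's decision_path method along with export_text, a helper function from sklearn.tree rather than a method on the class itself.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: two samples, two features, two classes.
X = [[0, 0], [1, 1]]
y = [0, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned rules as indented text.
# The feature names here are placeholders chosen for this example.
rules = export_text(clf, feature_names=["feature_a", "feature_b"])
print(rules)

# decision_path returns a sparse indicator matrix with one row per sample
# and one column per tree node; nonzero entries mark the nodes visited.
path = clf.decision_path([[2, 2]])
visited_nodes = path.indices.tolist()
print("Nodes visited for [2, 2]:", visited_nodes)
```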
4:49
Looking at the example here,
the DecisionTreeClassifier
4:55
starts with a dataset in
the form of an array.
5:00
This data will be all of our examples,
broken out by feature.
5:06
And then we have another array that
contains our labels for the data.
5:10
Then we create
a DecisionTreeClassifier object and
5:15
use its method called fit to build
a DecisionTreeClassifier from the dataset.
5:20
In other words, this is where
the decision tree is created.
5:28
And using that same decision tree,
5:32
we can now predict new examples
that don't have labels.
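Those steps can be sketched in a few lines of Python. This closely follows the short example in the scikit-learn decision tree documentation, using a deliberately tiny dataset; treat it as an outline of the workflow rather than the exact code from the course.

```python
from sklearn import tree

# Training examples, broken out by feature (two samples, two features).
X = [[0, 0], [1, 1]]
# One label per training example.
y = [0, 1]

# Create the classifier object, then build the tree from the dataset.
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)

# Use the fitted tree to predict the label of a new, unlabeled example.
prediction = clf.predict([[2.0, 2.0]])
print(prediction)  # [1]
```

The new example lands on the same side of the tree's single split as the second training sample, so it is assigned that sample's label.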
5:35
That's a lot of information to take in,
and for the sake of keeping this
5:39
overview at a high level,
I've simplified a lot of the details.
5:44
But by now, your environment
should be done installing.
5:48
And in the next video,
5:51
we're going to go over that same
code again by writing it ourselves.
5:53
To learn more right now,
I highly recommend you check out the notes
5:58
associated with this video, and read the
scikit-learn documentation for yourself.
6:01