Let's continue defining some machine learning terms so that we have the vocabulary to discuss these ideas in more detail.
Vocabulary and Definitions
- Label: A category for data, or a prediction from a classification algorithm
- Classifier: A supervised machine learning model that makes a prediction about how a piece of data should be categorized
As we've learned,
a data set is made up of examples.
0:00
And each of those examples
has common features
0:03
that a model can use to perform
analysis and comparisons.
0:06
But what do we want a model
to do with that data?
0:10
Ultimately, we want it to make
some kind of a prediction.
0:14
And the prediction it
makes is called a label.
0:18
Let's go back to the earlier
example of a spam filter.
0:23
Each example, in this case an email,
has features, which
0:27
might be things like
the subject line, body, and sender.
0:32
In this case, the label is whether
the message is spam or not spam.
0:37
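To make the vocabulary concrete, here is a minimal sketch of what a labeled data set for the spam filter could look like in Python. The `Email` class, the field names, and the sample messages are assumptions made up for illustration, not something defined in the course files.

```python
from dataclasses import dataclass


@dataclass
class Email:
    # The features mentioned above: subject line, body, and sender.
    subject: str
    body: str
    sender: str


# Each example pairs an email's features with a known label.
# The addresses and text are invented for illustration.
dataset = [
    (Email("Free offer inside", "Click here to claim your prize", "promo@example.com"), "spam"),
    (Email("Lunch on Friday?", "Are you free at noon?", "alex@example.com"), "not spam"),
]

for email, label in dataset:
    print(f"{email.subject!r} from {email.sender} -> {label}")
```

Each tuple is one example: the `Email` holds the features, and the string next to it is the label.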
A classifier is a type of algorithm or
0:43
model that makes a prediction about how
a piece of data should be categorized.
0:46
You can think of a classifier
like a function.
0:52
Data goes in, and then the classifier
predicts a category for
0:55
that data.
0:59
It does this by using an existing data
set that has examples where the labels
1:01
are known.
1:05
So for a spam filter, you would
train the classifier with a data set
1:07
where lots of emails are already
labeled as spam or not spam.
1:11
And then when a new email comes in,
it can try to assign a label.
1:16
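The train-then-predict idea can be sketched in plain Python. This is a toy word-counting classifier written only for illustration, not scikit-learn and not the classifier built later in the course. The class name `WordCountClassifier`, its `fit` and `predict` methods, and the training messages are all assumptions.

```python
from collections import Counter


class WordCountClassifier:
    """Toy classifier: scores a message by how often its words appeared under each label."""

    def fit(self, texts, labels):
        # Count how many times each word appears under each known label.
        self.word_counts = {}
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(text.lower().split())
        return self

    def predict(self, text):
        words = text.lower().split()
        # Score each label by the total training count of the message's words.
        scores = {
            label: sum(counts[word] for word in words)
            for label, counts in self.word_counts.items()
        }
        return max(scores, key=scores.get)


# A tiny, made-up training set where the labels are already known.
train_texts = [
    "free offer click here",
    "claim your free prize now",
    "meeting notes attached",
    "lunch on friday at noon",
]
train_labels = ["spam", "spam", "not spam", "not spam"]

classifier = WordCountClassifier().fit(train_texts, train_labels)
print(classifier.predict("click here for a free prize"))    # spam
print(classifier.predict("notes from the friday meeting"))  # not spam
```

Like a function, data goes in and a label comes out. The difference is that the behavior comes from the labeled examples passed to `fit` rather than from hand-written rules.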
There's one more thing I want
to mention before we carry on.
1:22
Cleaning and organizing data in different
ways can often produce different results.
1:25
In the case of the emails, you might find
that the raw data from the email doesn't
1:31
make useful features because further
heuristics need to be applied.
1:36
For a spam filter classifier,
you might create a feature that counts
1:41
the number of spammy
phrases from a dictionary,
1:46
like "free offer" or "click here",
or a feature that identifies
1:49
an attachment as a photo or an executable
program that might be a virus.
1:54
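As a rough sketch of that kind of feature engineering, the two features described here could be computed like this. The phrase list, the file extension sets, and the function names are hypothetical. A real filter would use a much larger dictionary and would not trust the file extension alone.

```python
import os

# Hypothetical phrase dictionary and extension lists, kept tiny for illustration.
SPAMMY_PHRASES = ["free offer", "click here"]
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
EXECUTABLE_EXTENSIONS = {".exe", ".bat", ".scr"}


def count_spammy_phrases(text):
    """Count occurrences of known spammy phrases, ignoring case."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in SPAMMY_PHRASES)


def attachment_kind(filename):
    """Classify an attachment by file extension as 'photo', 'executable', or 'other'."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in PHOTO_EXTENSIONS:
        return "photo"
    if extension in EXECUTABLE_EXTENSIONS:
        return "executable"
    return "other"


def extract_features(body, attachment):
    """Turn raw email data into features a model could use."""
    return {
        "spammy_phrase_count": count_spammy_phrases(body),
        "attachment_kind": attachment_kind(attachment),
    }


features = extract_features("FREE OFFER! Click here, click here now.", "invoice.exe")
print(features)  # {'spammy_phrase_count': 3, 'attachment_kind': 'executable'}
```

The raw body text and file name go in, and a small set of values that a model can compare across emails comes out.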
There's an old saying in computing:
garbage in, garbage out.
2:01
It means that if you provide
the computer with bad information, or
2:05
if you give your machine learning
model a data set that's inaccurate or
2:10
not representative of the whole truth,
2:14
then you're going to get a bad result.
2:17
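A small sketch can show garbage in, garbage out in action. It uses a simple nearest-neighbor lookup on a single made-up feature (a count of spammy phrases), chosen here only because it is short. The function name and all of the numbers are assumptions for illustration.

```python
def nearest_label(training_data, value):
    """Return the label of the training example whose feature value is closest."""
    closest = min(training_data, key=lambda example: abs(example[0] - value))
    return closest[1]


# (spammy_phrase_count, label) pairs; the numbers are invented for illustration.
clean_data = [(0, "not spam"), (1, "not spam"), (5, "spam"), (7, "spam")]

# The same examples, but with the labels recorded incorrectly.
garbage_data = [(0, "spam"), (1, "spam"), (5, "not spam"), (7, "not spam")]

new_email_count = 6
print(nearest_label(clean_data, new_email_count))    # spam
print(nearest_label(garbage_data, new_email_count))  # not spam
```

The lookup code is identical in both cases. Only the quality of the training data changes, and the prediction changes with it.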
And that's it.
2:20
As you can imagine, there are many more
definitions and terms in machine learning.
2:22
But those are the big ones we'll need
in order to continue with our exercise
2:26
in the next videos,
2:31
where we're going to write
our own classifier in Python
2:32
using a library called scikit-learn.
2:36