Let's continue defining some machine learning terms so that we have the vocabulary to discuss these ideas in more detail.
Vocabulary and Definitions
- Label: A category for data, or a prediction from a classification algorithm
- Classifier: A supervised machine learning model that makes a prediction about how a piece of data should be categorized
-
0:00
As we've learned, a data set is composed of examples.
-
0:03
And each of those examples has common features
-
0:06
that a model can use to perform analysis and comparisons.
-
0:10
But what do we want a model to do with that data?
-
0:14
Ultimately, we want it to make some kind of a prediction.
-
0:18
And the prediction it makes is called a label.
-
0:23
Let's go back to the earlier example of a spam filter.
-
0:27
Each example, in this case an email, has features which,
-
0:32
in this case, might be things like the subject line, body, and sender.
-
0:37
In this case, the label is whether the message is spam or not spam.
-
0:43
A classifier is a type of algorithm or
-
0:46
model that makes a prediction about how a piece of data should be categorized.
-
0:52
You can think of a classifier like a function.
-
0:55
Data goes in, and then the classifier predicts the correct category for
-
0:59
that data.
-
1:01
It does this by using an existing data set that has examples where the labels
-
1:05
are known.
-
1:07
So for a spam filter, you would train the classifier with a data set
-
1:11
where lots of emails are already labeled as spam or not spam.
-
1:16
And then when a new email comes in, it can try to assign a label.
-
1:22
There's one more thing I want to mention before we carry on.
-
1:25
Cleaning and organizing data in different ways can often produce different results.
-
1:31
In the case of the emails, you might find that the raw data from the email doesn't
-
1:36
make useful features because further heuristics need to be applied.
-
1:41
For a spam filter classifier, you might create a feature that counts
-
1:46
the number of spammy phrases from a dictionary,
-
1:49
like "free offer" or "click here", or a feature that identifies
-
1:54
an attachment as a photo or an executable program that might be a virus.
-
2:01
There's an old saying in computing: garbage in, garbage out.
-
2:05
It means that if you provide the computer with bad information, or
-
2:10
if you give your machine learning model a data set that's inaccurate or
-
2:14
not representative of the whole truth,
-
2:17
then you're going to get a bad result.
-
2:20
And that's it.
-
2:22
As you can imagine, there are many more definitions and terms in machine learning.
-
2:26
But those are the big ones we'll need in order to continue with our exercise
-
2:31
in the next videos,
-
2:32
where we're going to write our own classifier in Python
-
2:36
using a library called scikit-learn.
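
The terms from this video (examples, features, labels, and a classifier trained on already-labeled data) can be sketched in a few lines of plain Python. This is only a toy illustration, not the scikit-learn classifier built in the next videos. The phrase list beyond "free offer" and "click here", the list of executable extensions, the email dictionary layout, and the names `extract_features` and `MajorityVoteClassifier` are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical dictionary of "spammy" phrases (the two entries come from the video).
SPAMMY_PHRASES = ("free offer", "click here")

# Hypothetical file extensions treated as executable programs.
EXECUTABLE_EXTENSIONS = (".exe", ".bat")


def extract_features(email):
    """Turn a raw email (a dict) into a small tuple of engineered features."""
    text = (email["subject"] + " " + email["body"]).lower()
    # Feature 1: how many spammy phrases appear, capped at 2 to keep the toy small.
    spammy_count = sum(text.count(phrase) for phrase in SPAMMY_PHRASES)
    # Feature 2: whether any attachment looks like an executable program.
    has_executable = any(
        name.lower().endswith(EXECUTABLE_EXTENSIONS)
        for name in email.get("attachments", [])
    )
    return (min(spammy_count, 2), has_executable)


class MajorityVoteClassifier:
    """Toy classifier: remembers which label was most common for each feature tuple."""

    def fit(self, examples, labels):
        self.votes = {}
        # Fallback label for feature tuples never seen during training.
        self.default = Counter(labels).most_common(1)[0][0]
        for email, label in zip(examples, labels):
            self.votes.setdefault(extract_features(email), Counter())[label] += 1
        return self

    def predict(self, email):
        votes = self.votes.get(extract_features(email))
        return votes.most_common(1)[0][0] if votes else self.default


# A tiny labeled data set: each example is an email, each label is already known.
training_emails = [
    {"subject": "Free offer inside", "body": "Click here now", "attachments": []},
    {"subject": "Invoice", "body": "See attached", "attachments": ["invoice.exe"]},
    {"subject": "Lunch?", "body": "Are you free at noon?", "attachments": []},
    {"subject": "Photos", "body": "From the trip", "attachments": ["beach.jpg"]},
    {"subject": "Meeting notes", "body": "Agenda attached", "attachments": ["notes.pdf"]},
]
training_labels = ["spam", "spam", "not spam", "not spam", "not spam"]

classifier = MajorityVoteClassifier().fit(training_emails, training_labels)

# A new, unlabeled email comes in and the classifier tries to assign a label.
new_email = {"subject": "A FREE OFFER for you", "body": "click here to claim", "attachments": []}
prediction = classifier.predict(new_email)
print(prediction)
```

With only five training emails, any feature combination the classifier has not seen falls back to the most common training label, which is a small-scale version of "garbage in, garbage out": the predictions can only be as good as the labeled data and the features.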