
Labels and Classifiers

2:39 with Nick Pettit

Let's continue defining some machine learning terms so that we have the vocabulary to discuss these ideas in more detail.

Vocabulary and Definitions

Label: A category for data, or a prediction from a classification algorithm
Classifier: A supervised machine learning model that makes a prediction about how a piece of data should be categorized

As we've learned, a data set consists of examples, and each of those examples has common features that a model can use to perform analysis and comparisons. But what do we want a model to do with that data? Ultimately, we want it to make some kind of prediction, and the prediction it makes is called a label.

Let's go back to the earlier example of a spam filter. Each example, in this case an email, has features, which might be things like the subject line, body, and sender. The label is whether the message is spam or not spam.
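To make the vocabulary concrete, here is a minimal sketch of how a few labeled emails might be represented in Python. The field names (`subject`, `body`, `sender`) mirror the features mentioned above, and the emails themselves are made up for illustration; this is not the course's own data format.

```python
from dataclasses import dataclass


@dataclass
class Email:
    """One example: the features a model can look at."""
    subject: str
    body: str
    sender: str


# Each training example pairs an Email (its features) with its known label.
# These emails are hypothetical, invented for illustration.
labeled_examples = [
    (Email("Free offer inside!", "Click here to claim your prize", "promo@example.com"), "spam"),
    (Email("Team lunch", "Are we still on for Friday?", "alice@example.com"), "not spam"),
    (Email("You won!", "Click here for a free offer", "winner@example.net"), "spam"),
]

for email, label in labeled_examples:
    print(f"{email.subject!r:25} -> {label}")
```

The features describe the email; the label is the separate answer we want a model to learn to predict.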

A classifier is a type of algorithm, or model, that makes a prediction about how a piece of data should be categorized. You can think of a classifier like a function: data goes in, and the classifier predicts which category that data belongs to. It does this by using an existing data set that has examples where the labels are already known.
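One way to see the "classifier as a function" idea in code is a minimal sketch of a 1-nearest-neighbour classifier, one of the simplest supervised techniques, chosen here only for brevity (the course itself may use a different algorithm). `train` takes examples whose labels are known and returns a function; calling that function on new data returns a predicted label. The two numeric features per example are hypothetical.

```python
import math
from typing import Callable, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def train(examples: list[tuple[Features, str]]) -> Classifier:
    """Build a 1-nearest-neighbour classifier from labeled examples."""
    if not examples:
        raise ValueError("need at least one labeled example")

    def classify(features: Features) -> str:
        # Predict the label of the closest known example (Euclidean distance).
        _, label = min(examples, key=lambda ex: math.dist(ex[0], features))
        return label

    return classify


# Hypothetical features: (number of links, number of exclamation marks)
training_data = [
    ((8.0, 5.0), "spam"),
    ((6.0, 7.0), "spam"),
    ((0.0, 0.0), "not spam"),
    ((1.0, 1.0), "not spam"),
]

classify = train(training_data)
print(classify((7.0, 6.0)))  # spam
print(classify((0.0, 1.0)))  # not spam
```

Data goes in as a tuple of features and a category comes out, but the function only "knows" anything because it was built from examples with known labels.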

So for a spam filter, you would train the classifier with a data set where lots of emails are already labeled as spam or not spam. Then, when a new email comes in, the classifier can try to assign it a label.
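The training-then-predicting process described above can be sketched for text. This is a deliberately crude word-vote scorer, not a real spam-filtering algorithm and not what scikit-learn does: it counts how often each word appears in spam versus non-spam training emails, then labels a new email by which class its words were seen in more often. The training emails are invented.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on whitespace (very naive)."""
    return text.lower().split()


def train_word_counts(emails: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count word occurrences separately for each label."""
    counts: dict[str, Counter] = {"spam": Counter(), "not spam": Counter()}
    for text, label in emails:
        # Assumes every label is either "spam" or "not spam".
        counts[label].update(tokenize(text))
    return counts


def predict(counts: dict[str, Counter], text: str) -> str:
    """Label a new email by which class its words appeared in more often."""
    words = tokenize(text)
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["not spam"][w] for w in words)
    # Ties (including emails with only unseen words) default to "not spam".
    return "spam" if spam_score > ham_score else "not spam"


# Hypothetical training set: emails already labeled as spam or not spam.
training_emails = [
    ("claim your free offer now", "spam"),
    ("click here to win a free prize", "spam"),
    ("meeting notes from tuesday", "not spam"),
    ("can you review my notes before the meeting", "not spam"),
]

counts = train_word_counts(training_emails)
print(predict(counts, "free prize click here"))       # spam
print(predict(counts, "notes for the next meeting"))  # not spam
```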

There's one more thing I want to mention before we carry on: cleaning and organizing data in different ways can often produce different results.

In the case of the emails, you might find that the raw data from each email doesn't make useful features on its own, because further heuristics need to be applied. For a spam filter classifier, you might create a feature that counts the number of spammy phrases from a dictionary, like "free offer" or "click here", or a feature that identifies whether an attachment is a photo or an executable program that might be a virus.
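The two engineered features just described might look something like this sketch. It rests on stated assumptions: the phrase dictionary is a tiny hypothetical list, and attachments are recognised only by file extension, which a real filter should not rely on alone, since a file's name can disagree with its contents.

```python
import os

# Hypothetical dictionary of "spammy" phrases; a real filter would use a much larger list.
SPAMMY_PHRASES = ("free offer", "click here", "act now")

# Assumption: attachment type is inferred from the file extension only.
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
EXECUTABLE_EXTENSIONS = {".exe", ".bat", ".scr", ".msi"}


def count_spammy_phrases(text: str) -> int:
    """Count how many times any dictionary phrase appears in the text."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in SPAMMY_PHRASES)


def attachment_kind(filename: str) -> str:
    """Classify an attachment as 'photo', 'executable', or 'other' by extension."""
    _, ext = os.path.splitext(filename.lower())
    if ext in PHOTO_EXTENSIONS:
        return "photo"
    if ext in EXECUTABLE_EXTENSIONS:
        return "executable"
    return "other"


def extract_features(body: str, attachments: list[str]) -> dict[str, int]:
    """Turn raw email data into numeric features a model can compare."""
    kinds = [attachment_kind(name) for name in attachments]
    return {
        "spammy_phrase_count": count_spammy_phrases(body),
        "has_executable": int("executable" in kinds),
        "has_photo": int("photo" in kinds),
    }


print(extract_features("Click here for a FREE OFFER! Click here now.", ["invoice.exe"]))
# {'spammy_phrase_count': 3, 'has_executable': 1, 'has_photo': 0}
```

Turning free-form text and filenames into a few numbers like these is one example of how organizing the same raw data differently can change what a classifier is able to learn.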

There's an old saying in computing: garbage in, garbage out. It means that if you give the computer bad information, or give your machine learning model a data set that's inaccurate or not representative of the whole truth, then you're going to get a bad result.
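A toy illustration of garbage in, garbage out: a simple nearest-neighbour rule (chosen only for brevity) gives a different answer for the same new email when a single training label is wrong. The one-number feature, a hypothetical count of spammy phrases, and all of the data are invented.

```python
def nearest_label(examples: list[tuple[int, str]], value: int) -> str:
    """Return the label of the training example whose feature is closest to value."""
    _, label = min(examples, key=lambda ex: abs(ex[0] - value))
    return label


# Feature: number of spammy phrases in an email (hypothetical data).
clean_data = [(0, "not spam"), (1, "not spam"), (5, "spam"), (6, "spam")]

# Same data, but one spam email was mislabeled when the data set was built.
dirty_data = [(0, "not spam"), (1, "not spam"), (5, "not spam"), (6, "spam")]

new_email_score = 5
print(nearest_label(clean_data, new_email_score))  # spam
print(nearest_label(dirty_data, new_email_score))  # not spam
```

The algorithm is identical in both runs; only the quality of the labeled data changed.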

And that's it. As you can imagine, there are many more definitions and terms in machine learning, but those are the big ones we'll need in order to continue with our exercise in the next videos, where we're going to write our own classifier in Python using a library called scikit-learn.
