Examples and Features 3:02 with Nick Pettit
Before we continue, we should formally define some of the terms I've been using to describe machine learning, and then break them down further with more examples.
Vocabulary and Definitions
- Example: A single element in a dataset
- Feature: One characteristic of an example
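To make these two definitions concrete, here is a minimal sketch in Python of a dataset laid out as a table. The movie titles, feature names, and numbers are all invented for illustration, and the helper functions `get_example` and `get_feature` are hypothetical names, not part of any library.

```python
# Hypothetical dataset: every value below is invented for illustration.
# Each dict is one example (a row); each key is one feature (a column).
movies = [
    {"title": "Movie A", "genre": "comedy", "budget_millions": 20, "release_month": 6},
    {"title": "Movie B", "genre": "action", "budget_millions": 150, "release_month": 12},
    {"title": "Movie C", "genre": "drama", "budget_millions": 35, "release_month": 10},
]


def get_example(dataset, index):
    """Return one example: a single row of the table."""
    return dataset[index]


def get_feature(dataset, name):
    """Return one feature: the same column read across every example."""
    return [example[name] for example in dataset]


first_example = get_example(movies, 0)
budgets = get_feature(movies, "budget_millions")
print(first_example)
print(budgets)
```

Reading across a row gives you everything known about one movie, while reading down a column gives you one characteristic for every movie in the dataset.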
[MUSIC]

Toward the end of these lessons, we're going to use Python and the scikit-learn project to write our own classifier. But before we continue, we should formally define some of the terms I've been using to describe machine learning and then break them down further with more examples.

Speaking of examples, an example is a single element in a dataset. Sometimes you might hear an example referred to as a sample, but it means the same thing. If your data is formatted in a table, an example might be a single row in the table. A dataset is composed of many examples. And in general, each example helps improve the confidence of your model's predictions.

Say, for instance, you're running a movie studio and you want to try to forecast how much money a movie might make, so that you can set a budget. Your dataset would probably be examples of older movies. So what about those older movies might you include?

Each part of an example is called a feature. A feature is one characteristic of an example. Again, if you formatted your data in a table, each feature might be a single column. In the case of predicting a movie's box office performance, your older examples of movies might include things like their total box office sales, the budget, the genre, the release date, and maybe more advanced features, like a star power calculation, which could take all the actors in each movie and calculate a weighted average of their typical box office performance in other movies they've been in.

A dataset might contain good and bad features, and some features are more important than others. For example, you might find that the genre and release date are more important than the budget, so your model could weigh those features more heavily. A feature that might be completely irrelevant is the movie's title. Sure, a movie needs a title, and you might be able to come up with a machine learning model that can determine what makes a good and bad movie title. But in most cases, it's probably too subjective and inconsequential to weigh it against other more quantifiable features.

Something like the box office performance of a movie is very difficult to predict, and it includes a huge number of factors that are nearly impossible to simulate perfectly. But that's why a model is nothing more than that: a model, or a simplification of the problem. It's just one tool that can be used in combination with other approaches to arrive at a solution.
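The star power feature mentioned in the transcript could be computed in many ways. As one rough sketch, the Python function below takes a weighted average of each actor's typical box office gross. The function name `star_power`, the choice of weights (assumed here to reflect how prominent an actor is in the movie), and all of the numbers are assumptions made for illustration, not something the lesson specifies.

```python
def star_power(cast):
    """Weighted average of each actor's typical box office performance.

    `cast` is a list of (typical_gross_millions, weight) pairs. The weight
    is an assumed measure of how prominent the actor is in this movie.
    """
    total_weight = sum(weight for _, weight in cast)
    if total_weight == 0:
        raise ValueError("cast must have a positive total weight")
    weighted_sum = sum(gross * weight for gross, weight in cast)
    return weighted_sum / total_weight


# Invented numbers: a lead, a co-star, and a supporting actor.
hypothetical_cast = [(200.0, 3), (100.0, 2), (50.0, 1)]
print(star_power(hypothetical_cast))
```

The result is a single number per movie, so it can sit in the table as one more feature column alongside the budget and the release date.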