Installing scikit-learn using Anaconda
6:07 with Nick Pettit
We're going to use a Python library called scikit-learn, which includes lots of well-designed tools for performing common machine learning tasks. We're going to install scikit-learn and its dependencies using Anaconda, which is a Python-based platform focused on data science and machine learning.
- environment.yml - Download this file to install scikit-learn using Anaconda
[MUSIC] 0:00 We're going to use a Python library called scikit-learn, which includes lots 0:04 of well-designed tools for performing common machine learning tasks. 0:09 We're going to install scikit-learn and its dependencies using Anaconda, 0:14 which is a Python-based platform focused on data science and machine learning. 0:19 If you haven't installed Anaconda already, 0:25 check the notes associated with this video for instructions on how to do that first. 0:27 It's easy and quick to set up, so if you don't have it yet, 0:33 pause this video and come back when you have Anaconda running. 0:37 You'll also need to download the project files associated with this video. 0:42 Inside the ZIP file, you'll find a file called environment.yml. 0:47 Remember where you save this file, because you'll need to find it again later. 0:53 Once you have Anaconda Navigator open on your computer and 0:58 the environment.yml file downloaded, you're ready to get started. 1:02 Inside Anaconda Navigator, click the Environments tab on the left. 1:08 By default, you should have at least one environment already, called base. 1:14 But to make sure that you have all the dependencies installed that you'll need, 1:21 I've created an environment for us to use. 1:27 Click the Import button at the bottom of the environments list. 1:31 Then click the folder icon on the right side, 1:36 locate the environment.yml file 1:41 that you saved earlier, and click Open. 1:45 The .yml file will suggest a name for your environment. 1:50 In this case, it's MachineLearningBasics, which is the name of this course. 1:55 However, you can rename the environment to something else if you'd prefer. 2:00 When you're done, choose Import. This will 2:05 begin the process of downloading and installing all the dependencies 2:10 you'll need to use scikit-learn, including scikit-learn itself. 2:15 This might take a few minutes, 2:20 depending on the speed of your Internet connection and your computer.
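Once the import finishes, one quick way to confirm that the new environment can see scikit-learn is to ask Python whether the package is importable. This is only a minimal sketch and not part of the course files: the helper name `is_installed` is made up for illustration, and it assumes you run it with the Python interpreter from the imported environment. Note that scikit-learn is imported under the module name `sklearn`, and that NumPy and SciPy are among its dependencies.

```python
import importlib.util


def is_installed(module_name):
    """Return True if this interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # scikit-learn is imported as "sklearn"; numpy and scipy are dependencies.
    for name in ("sklearn", "numpy", "scipy"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If any line reports "missing", the environment import probably has not finished, or the script is running under a different environment such as base.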
2:21 While that's running, let's take a look at the documentation for 2:26 scikit-learn to get a little more familiar. 2:29 Point your web browser to scikit-learn.org. You 2:33 can find a link in the notes associated with this video if that's easier for you. 2:39 And, while you've already seen the home page in a previous video, 2:44 let's click on Classification. 2:49 This is a list of all the different types of supervised learning functions and 2:53 algorithms within scikit-learn. 2:59 It's quite a long list, and 3:02 it can be challenging to understand which tools we should use. 3:04 That's not something we'll be able to get into in this course. 3:08 But as we learned previously, supervised learning starts with a dataset. 3:12 And that's what we want, so we know we're in the right place. 3:18 Scroll down to the section labeled 1.10, Decision Trees. 3:22 Then click the first link underneath that header that says 1.10.1, Classification. 3:30 DecisionTreeClassifier is the class that we'll use. 3:39 It says, DecisionTreeClassifier is a class 3:44 capable of performing multi-class classification on a dataset. 3:47 As with other classifiers, 3:53 DecisionTreeClassifier takes as input two arrays: 3:54 an array X, sparse or dense, of size n_samples by 3:58 n_features, holding the training samples, 4:03 and an array y of integer values, of size n_samples, 4:07 holding the class labels for the training samples. 4:12 In other words, a DecisionTreeClassifier is one type of classifier. 4:17 In this case, it's a model that tries to predict a value by making 4:23 its own choices and rules that are inferred from the data features. 4:28 The programming behind this is pretty complicated, but essentially, 4:34 the computer tries to fit the data to a pattern. 4:39 And using other methods in the DecisionTreeClassifier class, 4:42 you can visualize the decision path of the tree and 4:46 have the computer show how it arrived at a certain prediction.
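To make the two input arrays a little more concrete, here is a small sketch using plain Python lists. The values are toy data invented for illustration, and `describe_dataset` is a hypothetical helper, not something scikit-learn provides. It only checks that the samples and labels line up the way the documentation describes.

```python
# Toy data (made up for illustration): each inner list is one sample,
# and each position within it is one feature.
X = [
    [0, 0],
    [1, 1],
    [0, 1],
]

# One integer class label per sample, in the same order as the rows of X.
y = [0, 1, 0]


def describe_dataset(samples, labels):
    """Return (n_samples, n_features) after checking the two arrays line up."""
    n_samples = len(samples)
    n_features = len(samples[0])
    if any(len(row) != n_features for row in samples):
        raise ValueError("every sample needs the same number of features")
    if len(labels) != n_samples:
        raise ValueError("there must be exactly one label per sample")
    return n_samples, n_features


print(describe_dataset(X, y))  # (3, 2)
```

Here n_samples is 3 and n_features is 2, so X has three rows of two values each and y has three labels.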
4:55 Looking at the example here, the DecisionTreeClassifier 5:00 starts with a dataset in the form of an array. 5:06 This data will be all of our examples, broken out by feature. 5:10 And then we have another array that contains our labels for the data. 5:15 Then we create a DecisionTreeClassifier object and 5:20 use its method called fit to build a decision tree from the dataset. 5:28 In other words, this is where the decision tree is created. 5:32 And using that same decision tree, 5:35 we can now predict labels for new examples that don't have them. 5:39 That's a lot of information to take in, and for the sake of keeping this 5:44 overview at a high level, I've simplified a lot of the details. 5:48 But by now, your environment should be done installing. 5:51 And in the next video, 5:53 we're going to go over that same code again by writing it ourselves. 5:58 To learn more right now, I highly recommend you check out the notes 6:01 associated with this video, and read the scikit-learn documentation for yourself. 6:01
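The fit-then-predict flow described above can be sketched roughly as follows. This is a minimal illustration in the style of the scikit-learn documentation example, not the exact code from the next video: the two-sample dataset is toy data, the feature names `feature_0` and `feature_1` are placeholders chosen here, and `random_state=0` is set only so repeated runs build the same tree.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy dataset: two samples with two features each, plus one label per sample.
X = [[0, 0], [1, 1]]
y = [0, 1]

# Create the classifier, then build the decision tree from the dataset.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Use the fitted tree to predict the label of a new, unlabeled example.
prediction = clf.predict([[2, 2]])
print(prediction)  # [1]

# Print the learned rules as text to see how the tree reaches a decision.
rules = export_text(clf, feature_names=["feature_0", "feature_1"])
print(rules)
```

The text printed by `export_text` is one simple way to inspect the rules the tree inferred from the features; the scikit-learn documentation describes other visualization options as well.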