No matter your data source, you should always expect to clean your datasets
-
0:00
[MUSIC]
-
0:04
[SOUND] When it comes to describing our ties by the type of fabric,
-
0:07
we have a lot of different options.
-
0:09
But how can we easily identify the ones made with a cashmere blend?
-
0:14
We can look through the description and see if it lists cashmere.
-
0:17
We're gonna create a function now that creates a new column for
-
0:21
our data of boolean type.
-
0:23
Let's go ahead and open up a new file.
-
0:25
[SOUND] Let's import our
-
0:30
work from the previous.
-
0:35
[BLANK_AUDIO]
-
0:40
There we go.
-
0:41
Okay, so let's create a function
-
0:46
create boolean field from search term.
-
0:55
And let's pass in the data_sample and also the search_term.
-
1:01
And we're gonna pass back a new list that we're just gonna call new_array.
-
1:07
[BLANK_AUDIO]
-
1:15
We need to make sure that this new array,
-
1:18
which will be the new column, has the data header.
-
1:21
We want to make sure this new column has the header of the search term
-
1:25
that we're looking for.
-
1:27
So let's add that in.
-
1:29
So it's new_array.append and
-
1:33
then the data_sample's existing set of headers,
-
1:40
but we wanna add one more at the end.
-
1:44
So we wanna append cashmere or search_term, I should say.
-
1:50
[BLANK_AUDIO]
-
1:54
When we create a new column for our new data sample,
-
1:58
we wanna have the headers from before, but adding in the new column's header.
-
2:04
So we do that by taking in the data_sample's header and
-
2:09
adding to the very end of the list the search_term.
-
2:13
And then we add that to the new temporary array, just so we have it in there.
-
2:17
And now we can go through the rest of the data sample, all 5,000 or so lines.
-
2:22
[BLANK_AUDIO]
-
2:31
Starting from the second row, within this loop we're going to set the boolean value.
-
2:37
So first of all, we're going to default to false.
-
2:42
[BLANK_AUDIO]
-
2:47
If the search term appears in, well,
-
2:52
what we're going to use as the description,
-
2:57
we want to say it's true because it exists.
-
3:02
There we go.
-
3:03
[BLANK_AUDIO]
-
3:08
And whatever that boolean value is for the new field, I wanna append it to the row.
-
3:13
[BLANK_AUDIO]
-
3:18
And then we wanna save that row into the new array.
-
3:22
[BLANK_AUDIO]
-
3:26
All right, so this will build up everything we have, plus the new column.
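The steps narrated so far can be sketched in Python roughly as follows. This is a minimal sketch, not the instructor's exact code: the underscored function name is an assumed spelling of the spoken name, `description_col` is a hypothetical parameter (the transcript does not say which column holds the description), and the sample rows are made up for illustration. The sketch also builds each new row with `row + [...]` rather than appending to the original row, so the input data is left unchanged.

```python
def create_boolean_field_from_search_term(data_sample, search_term, description_col):
    """Return a copy of data_sample with one extra boolean column.

    data_sample is assumed to be a list of rows (lists of strings),
    with the header in the first row. description_col is a hypothetical
    parameter: the index of the column that holds the description text.
    """
    new_array = []

    # The new header is the old header plus the search term at the end.
    new_array.append(data_sample[0] + [search_term])

    # Walk every data row, starting from the second row (skip the header).
    for row in data_sample[1:]:
        has_term = False  # default to False
        if search_term in row[description_col]:
            has_term = True  # the term appears in the description
        # Save the row, with the boolean added as the last field.
        new_array.append(row + [has_term])

    return new_array


# Made-up sample data, only to show the shape of the result.
sample = [
    ["name", "description"],
    ["Tie A", "A soft cashmere blend tie"],
    ["Tie B", "100% silk tie"],
]
my_new_csv = create_boolean_field_from_search_term(sample, "cashmere", 1)
```

Note that the `in` check on strings is case-sensitive, so a description that says "Cashmere" would not match the search term "cashmere" unless both sides are lowercased first.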
-
3:30
[BLANK_AUDIO]
-
3:36
Okay, let's try running this cuz this will be really, really fun.
-
3:38
Okay, so we have my_new_csv equals,
-
3:44
my_new_csv equals create boolean field from search term,
-
3:52
passing in the data sample and our search term.
-
3:59
[BLANK_AUDIO]
-
4:22
So that didn't actually filter it, but it did create the new field, right?
-
4:27
So, let's see how to filter so that we can actually count the number
-
4:32
of boolean values and see how many of the ties match the search term.
-
4:38
So we're gonna have to create a new function.
-
4:41
Oh, and this function we're going to call it filter_col_by_bool,
-
4:48
and we're gonna pass in the data_sample and
-
4:53
the column number of the boolean.
-
4:57
[BLANK_AUDIO]
-
5:03
Actually, instead of filtered_rows,
-
5:08
maybe we'll call this matches_search_term.
-
5:13
And for each item in data_sample,
-
5:21
starting with the second row,
-
5:28
if the item's search term column
-
5:34
matches the value true.
-
5:40
If it is true, if it exists,
-
5:45
then matches_search_term should
-
5:50
include that item, append(item).
-
5:56
And at the end of that loop,
-
5:58
we want to return all the items in matches_search_term.
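The filter function just described might look something like this in Python. It is a sketch under assumptions: `data_sample` is taken to be a list of rows with the header first, `col` is the index of the boolean column, and the sample rows are invented for the example.

```python
def filter_col_by_bool(data_sample, col):
    """Return the rows whose value in column `col` is true."""
    matches_search_term = []

    # Start with the second row so the header is not treated as data.
    for item in data_sample[1:]:
        if item[col]:  # true when the boolean field is True
            matches_search_term.append(item)

    return matches_search_term


# Made-up rows; the boolean field is the last column (index 2).
rows = [
    ["name", "description", "cashmere"],
    ["Tie A", "A soft cashmere blend tie", True],
    ["Tie B", "100% silk tie", False],
]
cashmere_rows = filter_col_by_bool(rows, 2)
```

Because the header row is skipped, the returned list contains only data rows, so its length is directly the number of matches.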
-
6:03
[BLANK_AUDIO]
-
6:07
Okay.
-
6:08
Now, now we can say, here,
-
6:14
number_of_cashmere_ties.
-
6:21
[BLANK_AUDIO]
-
6:35
Equals my_new_csv and
-
6:40
then filter that,
-
6:44
filter_col_by_bool.
-
6:49
So you want to filter that, and
-
6:54
then we want the number of records in that.
-
7:01
Now we want the number of records, and we can print that.
-
7:05
And let's see how many cashmere ties we found.
-
7:09
Oops, we forgot the column.
-
7:12
We wanna say, column 11, the last one.
-
7:18
[BLANK_AUDIO]
-
7:21
We have 56 ties that were made with cashmere in our data set.
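The counting step can be sketched as below. This is illustrative only: the rows are made up (so the count is not the 56 from the video), the function is written as a compact list-comprehension form of the loop narrated above, and `bool_col` is a hypothetical helper variable that computes the index of the last column instead of hard-coding it.

```python
def filter_col_by_bool(data_sample, col):
    # Compact form: skip the header row, keep rows whose flag is true.
    return [item for item in data_sample[1:] if item[col]]


# Made-up data that already has the boolean column added at the end.
my_new_csv = [
    ["name", "description", "cashmere"],
    ["Tie A", "A soft cashmere blend tie", True],
    ["Tie B", "100% silk tie", False],
    ["Tie C", "wool and cashmere", True],
]

# The new field is the last column, so derive its index from the header.
bool_col = len(my_new_csv[0]) - 1

# Filter first, then count the records that remain.
number_of_cashmere_ties = len(filter_col_by_bool(my_new_csv, bool_col))
print(number_of_cashmere_ties)
```

Deriving the index from the header avoids the "forgot the column" slip, since the last column is always where the new field was appended.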