1 00:00:00,000 --> 00:00:04,028 [MUSIC] 2 00:00:04,028 --> 00:00:07,044 [SOUND] Now that we've covered sorting algorithms, 3 00:00:07,044 --> 00:00:10,890 the groundwork has been laid to talk about searching algorithms. 4 00:00:10,890 --> 00:00:13,851 If you need to search through an unsorted list of items, 5 00:00:13,851 --> 00:00:17,316 binary search isn't an option because you have no idea which half 6 00:00:17,316 --> 00:00:20,034 of the list contains the item you're looking for. 7 00:00:20,034 --> 00:00:23,002 Your only real option is to start at the beginning and 8 00:00:23,002 --> 00:00:25,903 compare each item in the list to your target value, 9 00:00:25,903 --> 00:00:29,220 one at a time until you find the value you're looking for. 10 00:00:29,220 --> 00:00:33,045 This algorithm is called linear_search or sequential search, 11 00:00:33,045 --> 00:00:36,602 because the search proceeds in a straight line or sequence. 12 00:00:36,602 --> 00:00:40,894 Even though linear_search is inefficient, searching for just one name will happen so 13 00:00:40,894 --> 00:00:44,490 fast that we won't be able to tell anything useful about the algorithm's 14 00:00:44,490 --> 00:00:45,014 runtime. 15 00:00:45,014 --> 00:00:47,585 So let's suppose we add 100 different names and 16 00:00:47,585 --> 00:00:51,008 that we needed to know where they appear in a list of unsorted names. 17 00:00:51,008 --> 00:00:53,325 Here is some code that demonstrates. 18 00:00:53,325 --> 00:00:57,254 As usual, this code at the top isn't relevant to the search algorithm. 19 00:00:57,254 --> 00:01:00,791 It's just like the code that loaded the list of numbers from a file in 20 00:01:00,791 --> 00:01:05,077 the previous stage, but this code calls a different function, load_strings, 21 00:01:05,077 --> 00:01:06,774 that loads a list of strings in. 22 00:01:06,774 --> 00:01:10,215 If you want the load_strings Python code, we'll have it for 23 00:01:10,215 --> 00:01:11,880 you in the teacher's notes. 24 00:01:11,880 --> 00:01:13,735 Here's a separate hard-coded list, 25 00:01:13,735 --> 00:01:16,439 containing the 100 names we're going to search for. 26 00:01:16,439 --> 00:01:20,823 We'll loop through each name in this list and pass it to our search function to get 27 00:01:20,823 --> 00:01:23,470 the index within the full list where it appears. 28 00:01:23,470 --> 00:01:25,920 Now, let's implement the search function. 29 00:01:25,920 --> 00:01:29,171 Compared to the sorting algorithms, this is going to be short. 30 00:01:29,171 --> 00:01:33,987 The index of item function takes the Python list you want to search through and 31 00:01:33,987 --> 00:01:36,816 a single target value you want to search for. 32 00:01:36,816 --> 00:01:39,518 Now we need to loop over each item in the list. 33 00:01:39,518 --> 00:01:43,854 The range function gives us a range of numbers from its first argument up to, but 34 00:01:43,854 --> 00:01:46,004 not including, its second argument. 35 00:01:46,004 --> 00:01:47,773 So if our list had a length of five, 36 00:01:47,773 --> 00:01:51,390 this would loop over the indexes zero through four. 37 00:01:51,390 --> 00:01:54,910 We test whether the list item at the current index matches our target. 38 00:01:56,210 --> 00:01:59,560 If it does, then we return the index of the current item. 39 00:01:59,560 --> 00:02:03,350 This will exit the index of item function without looping over the remaining items 40 00:02:03,350 --> 00:02:04,980 In the list. 41 00:02:04,980 --> 00:02:08,090 If we reach the end of the loop without finding the target value, 42 00:02:08,090 --> 00:02:09,810 that means that it wasn't in the list. 43 00:02:09,810 --> 00:02:11,530 So instead of returning an index, 44 00:02:11,530 --> 00:02:17,100 we return the special Python value, None, which indicates the absence of a value. 45 00:02:17,100 --> 00:02:20,340 Other languages have similar values, like nil or null. 46 00:02:20,340 --> 00:02:23,570 But if yours doesn't, you might have to return a value that would otherwise be 47 00:02:23,570 --> 00:02:26,680 impossible, like an index of negative one. 48 00:02:26,680 --> 00:02:28,600 Now let's call our new search function. 49 00:02:28,600 --> 00:02:32,580 We start by looping over the list of 100 values we're looking for. 50 00:02:32,580 --> 00:02:36,390 We're using the values themselves this time, not their indexes within the list, 51 00:02:36,390 --> 00:02:38,970 so there's no need to mess with Python's range function. 52 00:02:40,210 --> 00:02:43,130 Here's the actual call to the index_of_item function. 53 00:02:43,130 --> 00:02:46,000 We pass it the full list of names that we loaded from the file 54 00:02:46,000 --> 00:02:48,960 plus the name we want to search for within that list. 55 00:02:48,960 --> 00:02:52,060 Then we store the index it returns in a variable. 56 00:02:52,060 --> 00:02:56,590 And lastly, we print the index we get back from the index_of_item function. 57 00:02:56,590 --> 00:03:00,595 Let's save this and go to our console and see if it works. 58 00:03:00,595 --> 00:03:07,702 python linear_search.pi names/unsorted.txt. 59 00:03:09,841 --> 00:03:13,195 And it'll print out the list of indexes for each name. 60 00:03:13,195 --> 00:03:17,551 I actually set it up so that the last two items in the list of names we're going to 61 00:03:17,551 --> 00:03:21,316 search for corresponded to the first and last name within the file. 62 00:03:21,316 --> 00:03:23,890 So if we open up our unsorted.txt file, 63 00:03:23,890 --> 00:03:29,133 we'll see Mary Rosenberger as the first name and Alonso Viviano as the last name. 64 00:03:31,380 --> 00:03:35,718 And those are the last two values in our list of names we're searching for. 65 00:03:35,718 --> 00:03:39,876 So it returned an index of 0 for that second-to-last name, and 66 00:03:39,876 --> 00:03:42,910 you can see that name here on line 1 of the file. 67 00:03:42,910 --> 00:03:46,668 The line numbering starts at 1, and the Python list indexes start at 0, so 68 00:03:46,668 --> 00:03:48,270 that makes sense. 69 00:03:48,270 --> 00:03:52,198 And for the last name, we'll return an index of 109873. 70 00:03:54,681 --> 00:03:58,318 And you can see that name here on line 109874. 71 00:03:58,318 --> 00:04:01,460 So we can see that it's returning the correct indexes. 72 00:04:01,460 --> 00:04:03,470 But right now, we're just searching for 73 00:04:03,470 --> 00:04:06,351 100 different names in a list of 100,000 names. 74 00:04:06,351 --> 00:04:08,472 In the real world, we're going to be looking for 75 00:04:08,472 --> 00:04:11,810 many more names than that within much bigger lists than that. 76 00:04:11,810 --> 00:04:13,820 Can we do this any faster? 77 00:04:13,820 --> 00:04:17,037 Yes, but we'll need to use the binary search algorithm. 78 00:04:17,037 --> 00:04:20,080 And for that to work, we need to sort our list of strings. 79 00:04:20,080 --> 00:04:21,450 We'll do that in the next video.