1 00:00:00,000 --> 00:00:02,195 Now that we have our list of names sorted, 2 00:00:02,195 --> 00:00:04,596 we can use the binary search algorithm on it. 3 00:00:04,596 --> 00:00:09,834 Let's see if we can use it to speed up our search for the indexes of 100 names. 4 00:00:09,834 --> 00:00:13,067 Binary search keeps narrowing down the list until it has the value 5 00:00:13,067 --> 00:00:14,074 it's looking for. 6 00:00:14,074 --> 00:00:18,034 It's faster than linear search because it discards half the potential 7 00:00:18,034 --> 00:00:19,162 matches each time. 8 00:00:19,162 --> 00:00:22,551 Our code here at the top of our binary search script is unchanged from 9 00:00:22,551 --> 00:00:23,746 the previous scripts. 10 00:00:23,746 --> 00:00:28,685 We just call the load strings function to load our 100,000 sorted names from a file. 11 00:00:28,685 --> 00:00:32,866 Here we've hard coded the list of 100 names we're going to sort for again. 12 00:00:32,866 --> 00:00:35,394 It's identical to list from a linear search script, 13 00:00:35,394 --> 00:00:38,990 except that I've again changed the last two names to correspond to the names 14 00:00:38,990 --> 00:00:42,400 on the first and last lines of the file we'll be loading. 15 00:00:42,400 --> 00:00:46,410 Now let's write the function that will implement our binary search algorithm. 16 00:00:46,410 --> 00:00:49,950 Like the linear search function before, it will take two arguments. 17 00:00:49,950 --> 00:00:52,630 The first is the list we're going to search through, and 18 00:00:52,630 --> 00:00:55,700 the second is the target value we'll be searching for. 19 00:00:55,700 --> 00:01:00,167 Again, the binary search function will return the index it found the value at, or 20 00:01:00,167 --> 00:01:02,573 the special value none if it wasn't found. 21 00:01:02,573 --> 00:01:06,876 Binary search is faster than linear search because it discards half the values it 22 00:01:06,876 --> 00:01:08,655 has to search through each time. 23 00:01:08,655 --> 00:01:09,316 To do this, 24 00:01:09,316 --> 00:01:13,234 it needs to keep track of a range that it still needs to search through. 25 00:01:13,234 --> 00:01:16,263 To start, that range is going to include the full list. 26 00:01:16,263 --> 00:01:20,731 The first variable will track the lowest index in the range we're searching. 27 00:01:20,731 --> 00:01:25,301 To start it's going to be zero, the first index in the full list. 28 00:01:25,301 --> 00:01:29,187 Likewise, the last variable will track the highest index in the range we're 29 00:01:29,187 --> 00:01:29,853 searching. 30 00:01:29,853 --> 00:01:33,460 To start, we'll set it to the highest index in the full list. 31 00:01:33,460 --> 00:01:35,881 If the first and last variables are equal, 32 00:01:35,881 --> 00:01:40,465 then it means the size of the search range has shrunk to zero and there is no match. 33 00:01:40,465 --> 00:01:44,013 Until that happens though, we'll keep looping to continue the search. 34 00:01:44,013 --> 00:01:47,833 We want to divide the list of potential matches in half each time. 35 00:01:47,833 --> 00:01:48,374 To do that, 36 00:01:48,374 --> 00:01:52,650 we need to check the value that's in the middle of the range we're searching in. 37 00:01:52,650 --> 00:01:54,420 We add the indexes in the first and 38 00:01:54,420 --> 00:01:58,140 last variables, then divide by two to get their average. 39 00:01:58,140 --> 00:02:01,910 We might get a fractional number, which can't be used as a list index, so 40 00:02:01,910 --> 00:02:07,350 we also round down using Python's double-slash floor division operator. 41 00:02:07,350 --> 00:02:09,840 All this will give us the index of the list element 42 00:02:09,840 --> 00:02:12,240 that's the midpoint of the range we're searching. 43 00:02:12,240 --> 00:02:14,986 We store that in the midpoint variable. 44 00:02:14,986 --> 00:02:18,231 Whoops, looks like my indentation got mixed up there. 45 00:02:18,231 --> 00:02:19,884 Let me fix that real quick. 46 00:02:19,884 --> 00:02:20,810 There we go. 47 00:02:20,810 --> 00:02:25,522 Now we test whether the list element at the midpoint matches the target value. 48 00:02:28,725 --> 00:02:32,596 If it does, we return the midpoint index without looping any further, 49 00:02:32,596 --> 00:02:34,680 our search is complete. 50 00:02:34,680 --> 00:02:41,918 Otherwise, if the midpoint element's value is less than the target value, Then 51 00:02:41,918 --> 00:02:46,550 we know that our target value can't be at the midpoint or any index prior to that. 52 00:02:46,550 --> 00:02:50,360 So we move the new start of our search range to just after the old midpoint. 53 00:02:51,590 --> 00:02:54,480 Otherwise, the midpoint element's value must have been greater than 54 00:02:54,480 --> 00:02:56,130 the target value. 55 00:02:56,130 --> 00:03:00,430 We know that our target value can't be at the midpoint, or any index after that, so 56 00:03:00,430 --> 00:03:04,060 we move the new end of our search range to just before the old midpoint. 57 00:03:05,190 --> 00:03:07,800 By unindenting here, we mark the end of the loop. 58 00:03:07,800 --> 00:03:09,020 If the loop completes, 59 00:03:09,020 --> 00:03:12,570 it means the search range shrank to nothing without our finding a match. 60 00:03:12,570 --> 00:03:15,415 And that means there's no matching value in the list. 61 00:03:15,415 --> 00:03:19,429 So we return the special Python value none to indicate this. 62 00:03:19,429 --> 00:03:22,520 Lastly, just as we did in our linear search script, 63 00:03:22,520 --> 00:03:25,278 we need to search for each of the 100 names. 64 00:03:25,278 --> 00:03:28,331 We loop over each name in our hard-coded list. 65 00:03:28,331 --> 00:03:32,385 And we call the binary_search function with the sorted list of names we're going 66 00:03:32,385 --> 00:03:36,140 to load from the file and the current name we're searching for. 67 00:03:36,140 --> 00:03:39,250 We have store the returned list index in the index variable. 68 00:03:40,710 --> 00:03:42,873 And finally, we print that variable. 69 00:03:44,433 --> 00:03:47,430 Let's save this and go to our console and try running it. 70 00:03:47,430 --> 00:03:51,630 Python binary_search.py, and 71 00:03:51,630 --> 00:03:53,960 it's important to give it the name of the sorted file. 72 00:03:53,960 --> 00:03:56,900 If it loads the unsorted file, the binary search won't work. 73 00:03:56,900 --> 00:04:00,106 So names/sorted,text. 74 00:04:00,106 --> 00:04:05,083 Again, it prints out the list of indexes for each name. 75 00:04:05,083 --> 00:04:09,128 I once again set it up so the last two items in the list of names we are gonna 76 00:04:09,128 --> 00:04:12,713 search for corresponded to the first and last name in the file. 77 00:04:12,713 --> 00:04:16,775 So it returned an index of zero for the second to last name. 78 00:04:18,328 --> 00:04:20,023 And you can see that name. 79 00:04:24,392 --> 00:04:27,315 Here's the second to last name, Aaron Augustine. 80 00:04:29,510 --> 00:04:32,490 You can see that name here on line one of the file. 81 00:04:32,490 --> 00:04:36,164 And for the last name, it returned an index of 109873. 82 00:04:37,420 --> 00:04:41,983 And you can see that name here on line 109874. 83 00:04:45,786 --> 00:04:48,630 Let's check the third to last name for good measure. 84 00:04:48,630 --> 00:04:55,156 It looks like an index of 97022 was printed for that name, Stephen Deras. 85 00:04:56,867 --> 00:05:01,469 Let's search for Steven Deras within the file. 86 00:05:01,469 --> 00:05:05,628 And here it is, on line 97023. 87 00:05:05,628 --> 00:05:09,357 Remember that line numbers start on one instead of zero, 88 00:05:09,357 --> 00:05:14,029 so this actually matches up with the printed list index of 97022. 89 00:05:14,029 --> 00:05:18,249 It looks like our binary search script is working correctly.