1 00:00:00,000 --> 00:00:04,699 Previously we showed you bogo_sort, a terrible sorting algorithm that basically 2 00:00:04,699 --> 00:00:09,213 randomizes the order of the list and then checks to see if it happens to be sorted. 3 00:00:09,213 --> 00:00:12,973 The problem with bogo_sort is that is doesn't get any closer to 4 00:00:12,973 --> 00:00:14,932 a solution with each operation. 5 00:00:14,932 --> 00:00:17,831 And so with lists that have more than a few items, 6 00:00:17,831 --> 00:00:20,532 it will probably never finish sorting them. 7 00:00:20,532 --> 00:00:23,393 Now we're going to look at an algorithm named selection_sort. 8 00:00:23,393 --> 00:00:24,511 It's still slow but 9 00:00:24,511 --> 00:00:28,730 at least each pass through the list brings it a little closer to completion. 10 00:00:29,950 --> 00:00:33,767 Our implementation of selection_sort is going to give two arrays, 11 00:00:33,767 --> 00:00:36,280 an unsorted array and sorted one. 12 00:00:36,280 --> 00:00:39,410 Some version move values around within just one array, but 13 00:00:39,410 --> 00:00:42,046 we're using two arrays to keep the code simpler. 14 00:00:42,046 --> 00:00:44,306 The sorted list starts out empty, but 15 00:00:44,306 --> 00:00:48,972 we'll be moving values from the unsorted list to the sorted list one at a time. 16 00:00:48,972 --> 00:00:52,940 With each pass we'll look through each of the values in the unsorted array, 17 00:00:52,940 --> 00:00:56,485 find the smallest one, and move that to the end of the sorted array. 18 00:00:56,485 --> 00:01:00,062 We'll start with the first value in the unsorted array and 19 00:01:00,062 --> 00:01:03,865 say that's the minimum or smallest value we've seen so far. 20 00:01:03,865 --> 00:01:05,482 Then we'll look at the next value and 21 00:01:05,482 --> 00:01:07,684 see if that's smaller than the current minimum. 22 00:01:07,684 --> 00:01:10,545 If it is we'll mark that as the new minimum. 23 00:01:10,545 --> 00:01:14,450 Then we'll move to the next value and compare that to the minimum again. 24 00:01:14,450 --> 00:01:16,966 If it's smaller, that becomes the new minimum. 25 00:01:16,966 --> 00:01:19,876 We continue that way until we reach the end of the list. 26 00:01:19,876 --> 00:01:24,234 At that point we know whatever value we have marked as the minimum is the smallest 27 00:01:24,234 --> 00:01:25,610 value in the whole list. 28 00:01:25,610 --> 00:01:30,102 Now, here's the part that makes selection_sort better than bogo_sort. 29 00:01:30,102 --> 00:01:33,595 We then move that minimum value from the unsorted list to the end of 30 00:01:33,595 --> 00:01:34,557 the sorted list. 31 00:01:34,557 --> 00:01:38,492 The minimum value isn't part of the unsorted list anymore, so 32 00:01:38,492 --> 00:01:41,695 we don't have to waste time looking at it anymore. 33 00:01:41,695 --> 00:01:46,457 All our remaining comparisons will be on the remaining values in the unsorted list, 34 00:01:46,457 --> 00:01:48,278 then we start the process over. 35 00:01:48,278 --> 00:01:52,626 At this point our list consists of the numbers eight, five, four, and seven. 36 00:01:52,626 --> 00:01:54,172 Our first minimum is eight. 37 00:01:54,172 --> 00:01:57,291 We start by comparing the minimum to five. 38 00:01:57,291 --> 00:02:00,780 Five is smaller than eight, so five becomes the new minimum. 39 00:02:00,780 --> 00:02:04,340 Then we compare five to four, and four becomes the new minimum. 40 00:02:04,340 --> 00:02:08,157 Four is not smaller than seven though, so four remains the minimum. 41 00:02:08,157 --> 00:02:13,111 Four gets moved to the end of the sorted array, becoming its second element. 42 00:02:13,111 --> 00:02:13,973 The process repeats again. 43 00:02:13,973 --> 00:02:18,124 Eight is first minimum, but five is smaller so that becomes the minimum. 44 00:02:18,124 --> 00:02:21,070 Seven is larger, so five stays as the minimum. 45 00:02:21,070 --> 00:02:24,030 And five is what gets moved over to the sorted array. 46 00:02:24,030 --> 00:02:27,600 And so on until there are no more items left in the unsorted array and 47 00:02:27,600 --> 00:02:29,500 all we have left is the sorted array. 48 00:02:30,900 --> 00:02:33,464 So that's how selection sort works in general, 49 00:02:33,464 --> 00:02:36,790 now let's do an actual implementation of it. 50 00:02:36,790 --> 00:02:40,800 This code here at the top is the same as we saw in the bogo_sort example. 51 00:02:40,800 --> 00:02:43,489 It just loads a Python list of numbers from a file. 52 00:02:44,740 --> 00:02:47,920 Let's implement the function that will do our selection sort. 53 00:02:47,920 --> 00:02:52,170 We're going to pass in our Python list containing all the unsorted numbers. 54 00:02:52,170 --> 00:02:55,590 We'll create an empty list that will hold all our sorted values. 55 00:02:57,170 --> 00:02:59,420 We'll loop once for each value in the list. 56 00:03:01,390 --> 00:03:05,060 We call a function named index_of_min, which we're going to write in just 57 00:03:05,060 --> 00:03:10,410 a minute, that finds the minimum value in the unsorted list and returns its index. 58 00:03:10,410 --> 00:03:12,490 Then we call the pop method on the list, and 59 00:03:12,490 --> 00:03:14,960 pass it the index of the minimum value. 60 00:03:14,960 --> 00:03:17,990 Pop will remove that item from the list and return it. 61 00:03:17,990 --> 00:03:21,530 We then add that value to the end of the sorted list. 62 00:03:21,530 --> 00:03:26,150 Going up a level of indentation signals to Python that we're ending the loop. 63 00:03:26,150 --> 00:03:29,080 After the loop finishes, we return the sorted list. 64 00:03:30,800 --> 00:03:34,030 Now we need to write the function that picks out the minimum value. 65 00:03:34,030 --> 00:03:36,655 We pass in the list we're going to search. 66 00:03:36,655 --> 00:03:39,452 We mark the first value in the list as the minimum. 67 00:03:39,452 --> 00:03:41,745 It may or may not be the actual minimum, but 68 00:03:41,745 --> 00:03:45,080 it's the smallest we've seen on this pass through the list. 69 00:03:45,080 --> 00:03:49,342 It's also the only value we've seen on this pass through the list so far. 70 00:03:49,342 --> 00:03:53,934 Now we loop through the remaining values in the list after the first. 71 00:03:53,934 --> 00:03:58,186 We test whether the value we're currently looking at is less than the previously 72 00:03:58,186 --> 00:03:59,259 recorded minimum. 73 00:04:01,627 --> 00:04:06,620 If it is, then we set the current index as the new index of the minimum value. 74 00:04:06,620 --> 00:04:08,410 After we've looped through all the values, 75 00:04:08,410 --> 00:04:11,710 we return the index of the smallest value we found. 76 00:04:11,710 --> 00:04:15,462 Lastly, we need to actually run our selection_sort method and 77 00:04:15,462 --> 00:04:17,483 print the sorted list it returns. 78 00:04:17,483 --> 00:04:20,062 Let's save this, and now let's try running it. 79 00:04:20,062 --> 00:04:25,709 We run the Python command and pass at the name of our script, selection_sort.py. 80 00:04:25,709 --> 00:04:30,124 In the numbers directory I've saved several data files filled with random 81 00:04:30,124 --> 00:04:31,815 numbers, one on each line. 82 00:04:31,815 --> 00:04:34,715 5.txt has 5 lines, 8.txt has 8 lines, and 83 00:04:34,715 --> 00:04:40,610 to help us really measure the speed of our algorithms, 10000.txt has 10,000 lines. 84 00:04:40,610 --> 00:04:43,700 I've even created a file with a million numbers. 85 00:04:43,700 --> 00:04:46,634 Our script takes the path of a file to load as an argument. 86 00:04:46,634 --> 00:04:53,426 So I'll give it the path of our file with five numbers, numbers/5.txt. 87 00:04:53,426 --> 00:04:56,433 The script runs, reads the numbers in the file into a list, 88 00:04:56,433 --> 00:05:01,430 calls our selection_sort method with that list, and then prints the sorted list. 89 00:05:01,430 --> 00:05:04,900 Let me add a couple print statements within the selection_sort function so 90 00:05:04,900 --> 00:05:07,110 you can watch the sort happening. 91 00:05:07,110 --> 00:05:10,350 Don't worry about figuring out the Python formatting string that I use, 92 00:05:10,350 --> 00:05:12,750 it's just there to keep the two lists neatly aligned. 93 00:05:13,820 --> 00:05:16,710 I'll add the first print statement before the loop runs at all. 94 00:05:21,555 --> 00:05:24,670 I'll have it print out the unsorted list and the sorted list. 95 00:05:24,670 --> 00:05:28,161 I'll add an identical print statement within the loop so 96 00:05:28,161 --> 00:05:32,388 we can watch values moving from the unsorted list to the sorted list. 97 00:05:33,397 --> 00:05:34,342 Let's save this. 98 00:05:36,311 --> 00:05:38,808 And we'll try running the same command again. 99 00:05:38,808 --> 00:05:40,472 The output looks like this. 100 00:05:40,472 --> 00:05:45,530 You can see the unsorted list on the left and the sorted list on the right. 101 00:05:45,530 --> 00:05:48,316 Initially the sorted list is empty. 102 00:05:48,316 --> 00:05:51,546 On the first pass it selects the lowest number, one, and 103 00:05:51,546 --> 00:05:53,172 moves it to the sorted list. 104 00:05:53,172 --> 00:05:56,647 Then it moves the next lowest number over, four. 105 00:05:56,647 --> 00:06:00,815 This repeats until all the numbers have been moved to the sorted list. 106 00:06:00,815 --> 00:06:03,133 I have another file with eight different numbers in it. 107 00:06:03,133 --> 00:06:04,341 Let's try our program with that. 108 00:06:04,341 --> 00:06:09,752 Python selection_sort.py numbers/8.txt. 109 00:06:13,524 --> 00:06:15,283 You can see the same process at work here. 110 00:06:15,283 --> 00:06:18,373 Notice that this file had some duplicate values too. 111 00:06:18,373 --> 00:06:22,069 That's okay though because the index of min function only updates 112 00:06:22,069 --> 00:06:26,170 the minimum index if the current value is less than the previous minimum. 113 00:06:26,170 --> 00:06:30,436 If they're equal it just keeps the first minimum value it found, and 114 00:06:30,436 --> 00:06:35,156 waits to move the duplicate value over until the next pass through the list.