1 00:00:00,310 --> 00:00:03,730 So now, we know that the selection sort algorithm works. 2 00:00:03,730 --> 00:00:07,010 But the biggest sets we've been giving it are sorta tiny. 3 00:00:07,010 --> 00:00:11,430 In the real world, algorithms seem to work with datasets of ten of thousands, or 4 00:00:11,430 --> 00:00:13,560 even millions of items, and do it fast. 5 00:00:14,830 --> 00:00:17,836 I have another file with 10,000 random numbers in it. 6 00:00:21,717 --> 00:00:24,523 Let's see if selection_sort can handle that. 7 00:00:24,523 --> 00:00:26,440 If I run this as it is now though, 8 00:00:26,440 --> 00:00:29,786 it'll print out a lot of debug info as it sorts the list. 9 00:00:29,786 --> 00:00:32,127 So first, I'm going to go into the program and 10 00:00:32,127 --> 00:00:35,622 remove the two print statements in this selection_sort function. 11 00:00:39,153 --> 00:00:44,848 Now let's run the program again on the 10000.text file and see how long it takes. 12 00:00:44,848 --> 00:00:52,887 Python selection_sort.py numbers/10000.txt. 13 00:00:52,887 --> 00:00:56,907 1, 1,000, 2, 1,000, 3, 1,000, 4, 1,000, 5, 14 00:00:56,907 --> 00:01:01,756 1,000, 6, 1,000, 7, 1,000, 8, 1,000, 9, 1,000, 10, 15 00:01:01,756 --> 00:01:05,692 1,000, 11,1,000, 12,1,000, 13, 1,000. 16 00:01:05,692 --> 00:01:09,680 And it prints out all 10,000 of those numbers, neatly sorted. 17 00:01:09,680 --> 00:01:11,240 It took a little bit though. 18 00:01:11,240 --> 00:01:11,920 How long? 19 00:01:11,920 --> 00:01:15,250 Well, counting the time off vocally isn't very precise. 20 00:01:15,250 --> 00:01:18,850 And other programs running on the system can skew the amount of time your program 21 00:01:18,850 --> 00:01:20,510 takes to complete. 22 00:01:20,510 --> 00:01:23,710 Let me show you a Unix command that's available here in WorkSpaces 23 00:01:23,710 --> 00:01:25,010 which can help. 24 00:01:25,010 --> 00:01:28,790 You type, time, followed by a space, and then the command you want to run. 25 00:01:30,381 --> 00:01:34,794 So this command by itself would print contents of our five.text file, 26 00:01:34,794 --> 00:01:37,180 cat as in cat name numbers/5.txt. 27 00:01:37,180 --> 00:01:40,688 And this command would do the same thing, but 28 00:01:40,688 --> 00:01:45,949 it also keep track of how long it takes the cat program to complete and 29 00:01:45,949 --> 00:01:49,852 report the result, time cat numbers/5.txt. 30 00:01:51,729 --> 00:01:56,139 The real row in the results is the actual amount of time from when the program 31 00:01:56,139 --> 00:01:58,456 started running to when it completed. 32 00:01:58,456 --> 00:02:01,178 We can see it finished in a fraction of a second. 33 00:02:01,178 --> 00:02:05,759 But as we said, other programs running on the system can take CPU resources. 34 00:02:05,759 --> 00:02:08,630 In which case, your program will seem slower than it is. 35 00:02:08,630 --> 00:02:12,057 So we generally want to ignore the real result. 36 00:02:12,057 --> 00:02:16,331 The user result is the amount of time the CPU actually spent running 37 00:02:16,331 --> 00:02:17,560 the program code. 38 00:02:17,560 --> 00:02:22,550 So this is the total amount of time the code inside the cat program took to run. 39 00:02:22,550 --> 00:02:25,974 The sys results is the amount of time the CPU spent running 40 00:02:25,974 --> 00:02:28,408 Linux kernel calls that your code made. 41 00:02:28,408 --> 00:02:32,506 The Linux kernel is responsible for things like network communications and 42 00:02:32,506 --> 00:02:33,620 reading files. 43 00:02:33,620 --> 00:02:38,310 So loading the 5.text file is probably included in this result. 44 00:02:38,310 --> 00:02:40,170 In evaluating code's performance, 45 00:02:40,170 --> 00:02:44,570 we're generally going to want to add together the user and sys results. 46 00:02:44,570 --> 00:02:46,890 But cat is a very simple program. 47 00:02:46,890 --> 00:02:50,994 Let's try running the time command on our code, and 48 00:02:50,994 --> 00:02:54,165 see if we get a more interesting result, 49 00:02:54,165 --> 00:02:59,591 time python selection_sort.py numbers/10,000.text. 50 00:03:01,691 --> 00:03:03,801 This takes much longer to complete, 51 00:03:03,801 --> 00:03:07,670 nearly 12 seconds according to the real time measurement. 52 00:03:07,670 --> 00:03:12,410 But as we said, the real result is often skewed, so let's disregard that. 53 00:03:12,410 --> 00:03:17,412 If we add the user and sys runtimes together, we get about six seconds 54 00:03:18,974 --> 00:03:23,040 The time for the program to complete will vary a little bit each time you run it. 55 00:03:23,040 --> 00:03:24,780 But if it's doing the same operations, 56 00:03:24,780 --> 00:03:27,800 it usually won't change more than a fraction of a second. 57 00:03:27,800 --> 00:03:30,626 If I run our selection sort script on the same file, 58 00:03:30,626 --> 00:03:33,402 you can see it completes in roughly the same time. 59 00:03:33,402 --> 00:03:38,680 Now, let's try it on another file with one million numbers, 60 00:03:38,680 --> 00:03:44,686 time python selection_sort.py numbers/1000000.txt. 61 00:03:44,686 --> 00:03:46,190 How long does this one take? 62 00:03:46,190 --> 00:03:47,327 I don't even know. 63 00:03:47,327 --> 00:03:50,462 While designing this course, I tried running this command, and 64 00:03:50,462 --> 00:03:53,530 my workspace connection timed out before it completed. 65 00:03:53,530 --> 00:03:56,160 So we'll just say that selection sort takes a very, 66 00:03:56,160 --> 00:03:59,530 very long time to sort a million numbers. 67 00:03:59,530 --> 00:04:03,320 If we're going to sort a list that big, we're going to need a faster algorithm. 68 00:04:03,320 --> 00:04:06,030 We'll look into alternative sorting algorithms shortly.