1 00:00:00,000 --> 00:00:04,597 [MUSIC] 2 00:00:04,597 --> 00:00:09,002 [SOUND] Over the next few videos, we're going to build a data structure that you 3 00:00:09,002 --> 00:00:11,548 may have worked with before, a linked list. 4 00:00:11,548 --> 00:00:15,616 Before we get into what a linked list is, let's talk about why we build data 5 00:00:15,616 --> 00:00:19,890 structures instead of just using the ones that come built in to our languages. 6 00:00:19,890 --> 00:00:23,070 Each data structure solves a particular problem. 7 00:00:23,070 --> 00:00:26,112 We just went over the basics of the array data structure. 8 00:00:26,112 --> 00:00:30,148 And looked at the cost of common operations that we carry out on arrays. 9 00:00:30,148 --> 00:00:33,561 We found that arrays were particularly good at accessing, 10 00:00:33,561 --> 00:00:36,026 reading values happens in constant time. 11 00:00:36,026 --> 00:00:39,196 But arrays are pretty bad at inserting and deleting, 12 00:00:39,196 --> 00:00:41,173 both of which run in linear time. 13 00:00:41,173 --> 00:00:44,234 Linked lists, on the hand, are somewhat better at this, 14 00:00:44,234 --> 00:00:46,053 although there are some caveats. 15 00:00:46,053 --> 00:00:50,766 And if we're trying to solve a problem that involves far more inserts and 16 00:00:50,766 --> 00:00:55,566 deletes than accessing, a linked list can be a better tool than an array. 17 00:00:55,566 --> 00:00:56,927 So what is a linked list? 18 00:00:56,927 --> 00:01:01,652 A linked list is a linear data structure where each element in the list is 19 00:01:01,652 --> 00:01:05,190 contained in a separate object called a node. 20 00:01:05,190 --> 00:01:08,600 A node models two pieces of information, an individual 21 00:01:08,600 --> 00:01:13,950 item of the data that we want to store and a reference to the next node in the list. 22 00:01:13,950 --> 00:01:17,480 The first node in the linked list is called the head of the list, 23 00:01:17,480 --> 00:01:20,010 while the last node is called the tail. 24 00:01:20,010 --> 00:01:22,570 The head and the tail nodes are special. 25 00:01:22,570 --> 00:01:26,930 The list only maintains a reference to the head, although in some implementations, 26 00:01:26,930 --> 00:01:29,370 it keeps a reference to the tail as well. 27 00:01:29,370 --> 00:01:32,190 This aspect of linked lists is very important. 28 00:01:32,190 --> 00:01:36,150 And as you'll see, most of the operations on the list need to be implemented 29 00:01:36,150 --> 00:01:38,610 quite differently compared to an array. 30 00:01:38,610 --> 00:01:42,990 The opposite of the head, the tail, denotes the end of the list. 31 00:01:42,990 --> 00:01:47,440 Every node other than the tail points to the next node in the list, but 32 00:01:47,440 --> 00:01:49,660 tail doesn't point to anything. 33 00:01:49,660 --> 00:01:52,690 This is basically how we know it's the end of the list. 34 00:01:52,690 --> 00:01:56,280 Nodes are what are called self referential objects. 35 00:01:56,280 --> 00:02:00,140 The definition of a node includes a link to another node. 36 00:02:00,140 --> 00:02:03,300 And self referential here means the definition of node 37 00:02:03,300 --> 00:02:05,080 includes the node itself. 38 00:02:05,080 --> 00:02:07,256 Linked lists often come in two forms. 39 00:02:07,256 --> 00:02:08,529 A singly linked list, 40 00:02:08,529 --> 00:02:12,086 where each node stores a reference to the next node in the list. 41 00:02:12,086 --> 00:02:13,813 Or a doubly linked list, 42 00:02:13,813 --> 00:02:18,601 where each node stores a reference to both the node before and after. 43 00:02:18,601 --> 00:02:21,643 If an array is a train with a bunch of cars in order, 44 00:02:21,643 --> 00:02:25,050 then a linked list is like a treasure hunt. 45 00:02:25,050 --> 00:02:26,020 When you start the hunt, 46 00:02:26,020 --> 00:02:29,990 you have a piece of paper with a location of the first treasure. 47 00:02:29,990 --> 00:02:32,568 You go to that location and you find an item, 48 00:02:32,568 --> 00:02:35,570 along with a location to the next item of treasure. 49 00:02:35,570 --> 00:02:39,661 When you finally find an item that doesn't also include a location, 50 00:02:39,661 --> 00:02:41,645 you know that the hunt has ended. 51 00:02:41,645 --> 00:02:46,236 Now that we have a high level view of what a linked list is, let's jump into code and 52 00:02:46,236 --> 00:02:47,452 build one together. 53 00:02:47,452 --> 00:02:51,900 We'll focus on building a singly linked list for this course. 54 00:02:51,900 --> 00:02:54,475 There are advantages to having a doubly linked list, but 55 00:02:54,475 --> 00:02:56,363 we don't want to get ahead of ourselves. 56 00:02:57,396 --> 00:03:04,682 Let's start here by creating a new file where we're going to put all our code for 57 00:03:04,682 --> 00:03:10,031 our linked list, so we'll call this linked_list.py. 58 00:03:10,031 --> 00:03:15,294 And first, we're going to create a class to represent a node. 59 00:03:15,294 --> 00:03:17,494 Say, class Node. 60 00:03:17,494 --> 00:03:22,700 Now, Node is a simple object, in that it won't model much. 61 00:03:22,700 --> 00:03:26,235 So, first we'll add a data variable. 62 00:03:26,235 --> 00:03:28,760 oS an instance variable here called data, and 63 00:03:28,760 --> 00:03:31,205 we'll assign the value None, initially. 64 00:03:31,205 --> 00:03:34,455 And then we'll add one more, we'll call this next_node. 65 00:03:34,455 --> 00:03:36,715 And to this, we'll assign None as well. 66 00:03:36,715 --> 00:03:39,035 So we've created two instance variables. 67 00:03:39,035 --> 00:03:42,325 data, to hold onto the data that we're storing. 68 00:03:42,325 --> 00:03:46,916 And next_node to point to the next node in the list. 69 00:03:46,916 --> 00:03:51,163 Now we need to add a constructor to make this class easy to create. 70 00:03:51,163 --> 00:03:57,450 So we'll add an __init__ method here that takes self and some data to start off. 71 00:03:57,450 --> 00:04:02,205 And all we're going to do is assign data to that instance variable we created. 72 00:04:02,205 --> 00:04:05,610 So that's all we need to model Node. 73 00:04:05,610 --> 00:04:08,430 Before we do anything else though, let's document this. 74 00:04:08,430 --> 00:04:12,250 So, right after the class definition, let's create a doc string. 75 00:04:12,250 --> 00:04:12,750 So """. 76 00:04:14,710 --> 00:04:19,164 Next line, and we'll say, An object for 77 00:04:19,164 --> 00:04:23,370 storing a single node of a linked list. 78 00:04:23,370 --> 00:04:28,699 And then on the next line, we'll say, Models two attributes, 79 00:04:30,231 --> 00:04:33,594 data and the link to the next node in the list. 80 00:04:33,594 --> 00:04:41,416 And then we'll close this doc string off with three more quotation marks. 81 00:04:41,416 --> 00:04:46,501 Okay, using the Node class is fairly straightforward. 82 00:04:46,501 --> 00:04:49,398 So we can create a new instance of Node with some data to store. 83 00:04:49,398 --> 00:04:54,229 Now the way we're going to do this is we're going to bring up the console and 84 00:04:54,229 --> 00:04:59,673 we're going to type out, like we've been typing out before, python, followed by 85 00:04:59,673 --> 00:05:04,761 the name of the script that we wrote, which is linked list linked_list.py. 86 00:05:04,761 --> 00:05:09,321 But before we do that we're going to pass an argument to the python command, 87 00:05:09,321 --> 00:05:13,780 we're gonna say -i and then the name of the script, linked_list.py. 88 00:05:13,780 --> 00:05:17,510 So what this does is this is going to run the python repel, 89 00:05:17,510 --> 00:05:20,698 the read evaluate print loop in the console, but 90 00:05:20,698 --> 00:05:25,401 it's gonna load the contents of our file into that so that we can use it. 91 00:05:25,401 --> 00:05:28,774 So I'll hit Enter, and we have a new instance going. 92 00:05:28,774 --> 00:05:31,006 And now we can use the Node in here. 93 00:05:31,006 --> 00:05:33,182 So we can say N1 = Node. 94 00:05:33,182 --> 00:05:37,137 And since we defined that constructor, we can pass it some data. 95 00:05:37,137 --> 00:05:39,080 So we'll say 10 here. 96 00:05:39,080 --> 00:05:44,830 Now, if we try to inspect this object, the representation returned isn't very useful, 97 00:05:44,830 --> 00:05:48,460 which will make things really hard to debug as our code grows. 98 00:05:48,460 --> 00:05:52,740 So for example, if I type out N1, you'll see that we have a valid instance here, 99 00:05:52,740 --> 00:05:55,740 but it's not very helpful the way it's printed out. 100 00:05:55,740 --> 00:06:00,460 So we can customize this by adding a representation of the object using 101 00:06:00,460 --> 00:06:01,824 the repr function. 102 00:06:01,824 --> 00:06:05,683 Now in the terminal still, we'll type out exist, 103 00:06:05,683 --> 00:06:08,926 like that, hit Enter to exit the console. 104 00:06:08,926 --> 00:06:12,347 And then down here, let's add in some room. 105 00:06:13,391 --> 00:06:18,696 Okay, and here we'll say, def __repr, another set of __, 106 00:06:18,696 --> 00:06:22,657 and then this function takes the argument self. 107 00:06:22,657 --> 00:06:28,307 And in here, we can provide a string representation of what we want printed 108 00:06:28,307 --> 00:06:34,410 to the console when we inspect that object inside of it, inside of the console. 109 00:06:34,410 --> 00:06:38,832 So here we'll say return, again, this is a string representation. 110 00:06:38,832 --> 00:06:43,845 So inside quotes we'll say, , so this represents a Node instance. 111 00:06:43,845 --> 00:06:47,523 And the data it contains, it will say %s. 112 00:06:47,523 --> 00:06:52,159 Which is a Python way of substituting something into a string, 113 00:06:52,159 --> 00:06:54,004 string interpolation. 114 00:06:54,004 --> 00:06:58,331 And outside of the string we can say, %, again and 115 00:06:58,331 --> 00:07:03,774 here we're saying we want to replace this %s with self.data. 116 00:07:03,774 --> 00:07:05,072 Okay, let's hit Save. 117 00:07:05,072 --> 00:07:08,663 And before we move on, let's verify that this works. 118 00:07:08,663 --> 00:07:12,346 So I'm gonna come in here, type clear to get rid of everything. 119 00:07:12,346 --> 00:07:14,485 And then we'll do what we did again. 120 00:07:14,485 --> 00:07:18,801 And you can just hit the up arrow a couple of times to get that command. 121 00:07:18,801 --> 00:07:19,942 All right, so hit Enter. 122 00:07:19,942 --> 00:07:24,317 And now, just so you know, every time you run this you start off from scratch. 123 00:07:24,317 --> 00:07:27,633 So N1 that we created earlier, not there anymore. 124 00:07:27,633 --> 00:07:31,228 So it's going to create it, N1 = Node(10). 125 00:07:31,228 --> 00:07:34,268 And we can type N1 again and hit Enter, and 126 00:07:34,268 --> 00:07:37,407 you have a much better representation now. 127 00:07:37,407 --> 00:07:41,606 So we can see that we have a Node and it contains the data, 10. 128 00:07:41,606 --> 00:07:44,556 We can also create another one. 129 00:07:44,556 --> 00:07:46,903 N2 = Node, that contains the data, 20. 130 00:07:46,903 --> 00:07:50,590 And now we can say N1.next_node = N2. 131 00:07:50,590 --> 00:07:53,190 So N1 now points to N2. 132 00:07:53,190 --> 00:07:58,270 And if we say N1.next_node, you'll see that it points to that Node. 133 00:07:58,270 --> 00:08:00,370 The node contained 20. 134 00:08:00,370 --> 00:08:03,090 Nodes are the building blocks for a list. 135 00:08:03,090 --> 00:08:08,070 And now that we have a Node object, we can use it to create a singly linked list. 136 00:08:08,070 --> 00:08:13,170 So again, I'm gonna exit out of this and then go back to the text editor. 137 00:08:14,730 --> 00:08:16,390 And here, we'll create a new class. 138 00:08:16,390 --> 00:08:18,674 So, class LinkedList:. 139 00:08:18,674 --> 00:08:23,430 The LinkedList class is going to define a head. 140 00:08:23,430 --> 00:08:27,218 And this attribute models the only node that the list is going to have 141 00:08:27,218 --> 00:08:28,154 a reference to. 142 00:08:28,154 --> 00:08:31,914 So here we'll say, head, and we'll assign None, initially. 143 00:08:31,914 --> 00:08:36,059 And then, like we did earlier, let's create a constructor. 144 00:08:36,059 --> 00:08:41,028 So __init__, this takes self. 145 00:08:41,028 --> 00:08:45,880 And then inside, like before, we'll say, self.head = None. 146 00:08:45,880 --> 00:08:48,433 This is the same as doing this, so 147 00:08:48,433 --> 00:08:53,680 we can actually get rid of that and just use the constructor. 148 00:08:53,680 --> 00:08:54,630 Okay, so again, 149 00:08:54,630 --> 00:09:00,160 this head attribute models the only node that the list will have a reference to. 150 00:09:00,160 --> 00:09:04,780 Since every node points to the next node, to find a particular node, 151 00:09:04,780 --> 00:09:09,416 we can go from one node to the next in a process called list traversal. 152 00:09:09,416 --> 00:09:13,783 So in the class constructor here, we'll have set the default value of head to None 153 00:09:13,783 --> 00:09:16,730 so that new lists created are always empty. 154 00:09:16,730 --> 00:09:20,700 Again, you'll notice here, that I didn't explicitly declare the head attribute at 155 00:09:20,700 --> 00:09:25,170 the top of the class definition, and don't worry, that's not an oversight. 156 00:09:25,170 --> 00:09:29,480 The self.head in the initializer means that it's still created. 157 00:09:29,480 --> 00:09:32,630 Okay, so, that's all there is to modeling a linked list. 158 00:09:32,630 --> 00:09:36,900 Now, we can add methods that make it easier to use this data structure. 159 00:09:36,900 --> 00:09:40,878 First, a really simple doc string to provide some information. 160 00:09:40,878 --> 00:09:44,240 So here, we'll create a doc string, """, and 161 00:09:44,240 --> 00:09:47,740 then we'll say Singly linked list, and then close it off. 162 00:09:49,130 --> 00:09:54,022 A common operation carried out on data structures is checking whether it 163 00:09:54,022 --> 00:09:56,919 contains any data or whether it's empty. 164 00:09:56,919 --> 00:09:59,607 At the moment, to check if a list is empty, 165 00:09:59,607 --> 00:10:04,552 we would need to query these instance variables, head and so on, every time. 166 00:10:04,552 --> 00:10:09,004 Ideally, we would like to not expose the inner workings of our data 167 00:10:09,004 --> 00:10:11,159 structure to code that uses it. 168 00:10:11,159 --> 00:10:16,625 Instead, let's make this operation more explicit by defining a method. 169 00:10:16,625 --> 00:10:21,429 So, we'll say, def is_empty, and this method takes itself as an argument. 170 00:10:21,429 --> 00:10:25,523 And here we'll say it return self.head == None. 171 00:10:25,523 --> 00:10:29,526 All we're doing here is checking to see if head is None. 172 00:10:29,526 --> 00:10:35,390 If it is, this condition evaluates to true, which indicates the list is empty. 173 00:10:35,390 --> 00:10:36,500 Now, before we end this video, 174 00:10:36,500 --> 00:10:41,140 let's add one more convenience method to calculate the size of our list. 175 00:10:41,140 --> 00:10:45,180 The name convenience method indicates that what this method is doing, 176 00:10:45,180 --> 00:10:48,740 is not providing any additional functionality that our data structure 177 00:10:48,740 --> 00:10:54,470 can't handle right now, but instead making existing functionality easier to use. 178 00:10:54,470 --> 00:10:58,980 We could calculate the size of our linked list by traversing it every time using 179 00:10:58,980 --> 00:11:04,050 a loop until we hit a tail node, but doing that every time is a hassle. 180 00:11:04,050 --> 00:11:10,033 Okay, so we'll call this method size, and as always, it takes (self). 181 00:11:10,033 --> 00:11:15,201 Unlike calling len on a Python list, not to be confused with a linked list, which 182 00:11:15,201 --> 00:11:20,980 is a constant time operation, our size operation is going to run in linear time. 183 00:11:20,980 --> 00:11:25,603 The only way we can count how many items we have is to visit each node and 184 00:11:25,603 --> 00:11:28,336 call next until we hit the the tail node. 185 00:11:28,336 --> 00:11:34,325 So we'll start by getting a reference to the head, we'll say current = self.head. 186 00:11:34,325 --> 00:11:38,996 Let's also define a local variable named count with an initial 187 00:11:38,996 --> 00:11:43,322 value of 0 that will increment every time we visit a node. 188 00:11:43,322 --> 00:11:47,873 Once we hit the tail, count will reflect the size of that list. 189 00:11:47,873 --> 00:11:52,742 Next, we'll define a while loop that will keep going until there are no more nodes. 190 00:11:52,742 --> 00:11:54,737 So we'll say, while current. 191 00:11:54,737 --> 00:11:59,725 While current is the same as writing out while current != None, but 192 00:11:59,725 --> 00:12:03,604 it's more succinct, so we'll go with this former. 193 00:12:03,604 --> 00:12:07,475 If the latter is more precise for you, you can go with that. 194 00:12:07,475 --> 00:12:10,943 Now inside this loop, we'll increment the count value. 195 00:12:10,943 --> 00:12:12,463 So count + = 1. 196 00:12:12,463 --> 00:12:17,696 +=, if you haven't encountered it before, is the same as writing count = count + 1. 197 00:12:17,696 --> 00:12:21,192 So if count is 0 initially, so it's 0 + 1 is 1, and 198 00:12:21,192 --> 00:12:23,702 then we'll assign that back to count. 199 00:12:23,702 --> 00:12:26,526 Okay, so count + = 1. 200 00:12:26,526 --> 00:12:31,309 Next we're going to assign the next node in the list to current. 201 00:12:31,309 --> 00:12:34,466 So current = current.next_node. 202 00:12:34,466 --> 00:12:39,281 This way, once we get to the tail and call next_node, 203 00:12:39,281 --> 00:12:44,208 current will equal None and the while loop terminates. 204 00:12:44,208 --> 00:12:46,852 So at the end we can return count. 205 00:12:46,852 --> 00:12:52,092 As you can see, we need to visit every node to determine the size, 206 00:12:52,092 --> 00:12:55,636 meaning our algorithm runs in linear time. 207 00:12:55,636 --> 00:12:57,864 So let's document this. 208 00:12:57,864 --> 00:13:02,507 Up in our document string, which we'll add now to size, 209 00:13:02,507 --> 00:13:06,877 we'll say, Returns the number of nodes in the list. 210 00:13:06,877 --> 00:13:08,636 Take linear time. 211 00:13:08,636 --> 00:13:09,895 Let's take a break here. 212 00:13:09,895 --> 00:13:13,931 We can now create lists, check if they're empty and check the size. 213 00:13:13,931 --> 00:13:18,207 In the next video, let's start implementing some common operations.