1 00:00:00,000 --> 00:00:04,763 [MUSIC] 2 00:00:04,763 --> 00:00:07,700 We've covered a lot of theory and places to be concerned with. 3 00:00:07,700 --> 00:00:09,620 So we should start to look at some solutions or 4 00:00:09,620 --> 00:00:11,240 approaches to better security. 5 00:00:11,240 --> 00:00:14,310 Before we get into that, though, let me make one thing absolutely clear: 6 00:00:14,310 --> 00:00:18,060 you should use established measures, tools, libraries, and services. 7 00:00:18,060 --> 00:00:20,250 It's really fun to play around with making your own algorithms and 8 00:00:20,250 --> 00:00:21,450 security techniques. 9 00:00:21,450 --> 00:00:23,140 But these, especially when kept secret and 10 00:00:23,140 --> 00:00:27,150 internal, are often much more vulnerable than the current industry standard ones. 11 00:00:27,150 --> 00:00:29,071 We want you to be as safe and secure as possible. 12 00:00:29,071 --> 00:00:33,090 So please stick to industry-recommended tools. Okay, cool. 13 00:00:33,090 --> 00:00:36,460 Let's start with one of the things that almost every security system uses, 14 00:00:36,460 --> 00:00:37,660 passwords. 15 00:00:37,660 --> 00:00:41,650 We need to hash passwords because, despite our best warnings, users will come up with 16 00:00:41,650 --> 00:00:46,050 one decent password and use it at site after site, application after application. 17 00:00:46,050 --> 00:00:48,840 And then, when bobsfishingspots.com gets hacked, 18 00:00:48,840 --> 00:00:53,240 the attackers can target your site or my site and hope for password overlaps. 19 00:00:53,240 --> 00:00:56,560 We, however, don't want to provide hackers with a fresh supply of passwords and 20 00:00:56,560 --> 00:00:57,460 account details. 21 00:00:57,460 --> 00:00:59,460 So we'll hash all the passwords that we store. 22 00:01:00,470 --> 00:01:02,520 Just a note before we dive into a screencast.
23 00:01:02,520 --> 00:01:04,420 I'll be using languages that I am comfortable with, but 24 00:01:04,420 --> 00:01:05,990 these are only illustrations. 25 00:01:05,990 --> 00:01:07,940 Anything I'm doing in Python or JavaScript or 26 00:01:07,940 --> 00:01:11,000 any other language can be done in your language or framework of choice. 27 00:01:11,000 --> 00:01:14,450 Don't think that you have to use exactly the language and code that I'm using. 28 00:01:14,450 --> 00:01:15,670 All right, let's get started. 29 00:01:17,420 --> 00:01:20,183 Python has a handy built-in tool named hashlib. 30 00:01:20,183 --> 00:01:23,423 I'll use that to explore a couple of hashing algorithms. 31 00:01:23,423 --> 00:01:27,204 Then we'll look at salting our hashes, and finally, Argon2, 32 00:01:27,204 --> 00:01:30,353 which won the Password Hashing Competition in 2015. 33 00:01:30,353 --> 00:01:32,909 Hashing algorithms come in all shapes and sizes though, so 34 00:01:32,909 --> 00:01:36,130 be sure to find the one that's best for your use case. 35 00:01:36,130 --> 00:01:39,740 First, let's take a look at a bad algorithm to use, MD5. 36 00:01:39,740 --> 00:01:43,440 Hashes return a bunch of bytes, which we can view as either straight bytes, or 37 00:01:43,440 --> 00:01:45,450 that we can turn into hexadecimal. 38 00:01:45,450 --> 00:01:47,201 Hexadecimal is a little easier to read and 39 00:01:47,201 --> 00:01:50,514 it's more likely what you'd store in a database, so I'll be focusing on that. 40 00:01:50,514 --> 00:01:54,080 But I wanna show you both of these for the first couple of hashes. 41 00:01:54,080 --> 00:01:58,210 So let's do hashlib.md5 and we have to send it bytes. 42 00:01:58,210 --> 00:02:02,084 So, we're gonna send it bytes for the word Treehouse and 43 00:02:02,084 --> 00:02:04,352 then let's look at the digest. 44 00:02:04,352 --> 00:02:06,630 And these are just the bytes that come back out of it. 45 00:02:06,630 --> 00:02:08,410 So, bunch of bytes.
46 00:02:08,410 --> 00:02:12,527 If we do the same code and we use hexdigest, then we will get 47 00:02:12,527 --> 00:02:18,258 the hexadecimal output which you can see here is the 65492 and so on string. 48 00:02:18,258 --> 00:02:19,383 There's not a lot to see here. 49 00:02:19,383 --> 00:02:21,850 But it was definitely fast to execute, right? 50 00:02:21,850 --> 00:02:24,500 That was one of the things that MD5 was very well known for. 51 00:02:24,500 --> 00:02:25,445 Let's try a stronger one. 52 00:02:25,445 --> 00:02:29,895 Let's do hashlib.sha256, 53 00:02:29,895 --> 00:02:33,650 again we're gonna send it some bytes, we'll send Treehouse. 54 00:02:34,740 --> 00:02:39,100 Now look at the digest. Pretty long digest. Let's look at the hex. 55 00:02:40,790 --> 00:02:43,180 So there's the hex. You could see it's much longer, right? 56 00:02:43,180 --> 00:02:45,190 It's a much bigger hash output. 57 00:02:45,190 --> 00:02:49,120 SHA-256 itself still holds up, but its predecessor SHA-1 was once considered very strong and is compromised 58 00:02:49,120 --> 00:02:53,070 now due to collisions being possible within a practical amount of time. 59 00:02:53,070 --> 00:02:54,500 That's how security goes. 60 00:02:54,500 --> 00:02:56,970 Let's look at another secure algorithm. 61 00:02:56,970 --> 00:02:59,876 And this one is also from the SHA, S-H-A, family, and 62 00:02:59,876 --> 00:03:04,250 is backed by the National Institute of Standards and Technology, or NIST. 63 00:03:04,250 --> 00:03:07,410 This algorithm's output is a little different though in that it has a variable 64 00:03:07,410 --> 00:03:08,400 amount of it. 65 00:03:08,400 --> 00:03:11,260 We can control the amount of overall security that goes into creating the hash 66 00:03:11,260 --> 00:03:13,840 and the amount of hash data that comes out. 67 00:03:13,840 --> 00:03:20,065 So let's do hashlib.shake_128, we're gonna again send it 68 00:03:20,065 --> 00:03:25,765 Treehouse and let's ask for a hexdigest that is 20 bytes long.
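For readers following along without the screencast, here is a minimal sketch of the three hashlib calls described so far. The variable names are illustrative and not taken from the video, and the exact digests are left unprinted here; the comments only note the output lengths.

```python
import hashlib

# hashlib works on bytes, so the word is written as a bytes literal.
data = b"Treehouse"

# MD5: fast, but a bad choice for anything security related.
md5_raw = hashlib.md5(data).digest()      # 16 raw bytes
md5_hex = hashlib.md5(data).hexdigest()   # the same value as 32 hex characters

# SHA-256: a longer digest (32 bytes, 64 hex characters).
sha256_hex = hashlib.sha256(data).hexdigest()

# SHAKE128: the caller chooses how many bytes of output to ask for.
shake_hex = hashlib.shake_128(data).hexdigest(20)  # 20 bytes -> 40 hex characters

print(md5_raw)
print(md5_hex)
print(sha256_hex)
print(shake_hex)
```

Note that `hexdigest(20)` takes a length in bytes, so the printed string is twice that many characters.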
69 00:03:27,350 --> 00:03:28,584 There we go. 70 00:03:28,584 --> 00:03:30,552 The 128 there is gonna be the overall security, and 71 00:03:30,552 --> 00:03:34,220 this affects how the algorithm manipulates the data that's fed into it. 72 00:03:34,220 --> 00:03:38,140 In the hexdigest, you can see that I ask for 20 bytes of data back. 73 00:03:38,140 --> 00:03:41,860 Now, regardless of the hash that you use, it's very important to salt them. 74 00:03:41,860 --> 00:03:44,820 A salt is a little bit of extra random data that you attach to 75 00:03:44,820 --> 00:03:46,130 the thing that you're hashing. 76 00:03:46,130 --> 00:03:49,220 Some people use the time stamp of when you signed up, your username or 77 00:03:49,220 --> 00:03:51,080 some other bit of known data. 78 00:03:51,080 --> 00:03:54,745 I'm going to use a randomly generated string from Python's uuid library, 79 00:03:54,745 --> 00:03:56,495 which means I need to import that. 80 00:03:59,391 --> 00:04:03,510 And this can generate universally unique identifiers. 81 00:04:03,510 --> 00:04:05,870 So let's create a couple of variables here first. 82 00:04:05,870 --> 00:04:09,350 So we'll create a variable of the string Treehouse. 83 00:04:10,690 --> 00:04:11,594 We'll make a salt. 84 00:04:13,781 --> 00:04:18,127 And then we'll do hashlib.shake_128. 85 00:04:20,734 --> 00:04:25,500 And then we will put in our (salt + password). 86 00:04:25,500 --> 00:04:29,140 And then we will call hexdigest(40) on that. 87 00:04:30,560 --> 00:04:33,200 And we get this nice big long hash. 88 00:04:34,310 --> 00:04:37,950 So, there's a thing I can save in my database, it's a good thing to save. 89 00:04:37,950 --> 00:04:42,470 Except, wait a minute, I wouldn't have any way of finding what the salt was. 90 00:04:42,470 --> 00:04:44,990 So I can never verify this hash. 91 00:04:44,990 --> 00:04:49,220 So to fix that, I'm going to append the salt on to the end of the hash.
92 00:04:49,220 --> 00:04:52,882 So I do exactly like we did before, and 93 00:04:52,882 --> 00:04:57,480 then I would add a : and then I would + the salt. 94 00:04:57,480 --> 00:05:02,160 So now we get a big long hash, we get a : right over here, 95 00:05:02,160 --> 00:05:05,590 and then we get our salt at the end here. 96 00:05:05,590 --> 00:05:06,410 Excellent. 97 00:05:06,410 --> 00:05:09,510 Now, when I need to check it, I'll just reverse that process. 98 00:05:09,510 --> 00:05:11,783 Since hexadecimal doesn't include the :, 99 00:05:11,783 --> 00:05:14,849 we know that it can't be part of the hashed password or the salt. 100 00:05:14,849 --> 00:05:17,340 That means that we can chop off the bit after the :, 101 00:05:17,340 --> 00:05:19,670 add that to the password that the user gives us, 102 00:05:19,670 --> 00:05:23,090 hash the whole thing with shake_128, and then see if the two match. 103 00:05:23,090 --> 00:05:25,570 So why do we want to salt things? 104 00:05:25,570 --> 00:05:29,850 Aren't we making it easier to break the hash by including the salt in plain text? 105 00:05:29,850 --> 00:05:33,370 Adding a salt makes it much harder to use precomputed attacks 106 00:05:33,370 --> 00:05:37,060 like lists of prehashed words and phrases against your hashes. 107 00:05:37,060 --> 00:05:39,360 You should always salt your hashes. 108 00:05:39,360 --> 00:05:42,110 If you're worried about attackers being able to see your salt, you 109 00:05:42,110 --> 00:05:45,361 could always use another piece of data that's calculated, like sign-up date, 110 00:05:45,361 --> 00:05:48,690 or user-supplied, like an email address, as the salt. 111 00:05:48,690 --> 00:05:49,390 Before we wrap up, 112 00:05:49,390 --> 00:05:53,410 I wanna look at Argon2, which won the last Password Hashing Competition. 113 00:05:53,410 --> 00:05:54,390 Argon2 does something, 114 00:05:54,390 --> 00:05:57,910 too, that many of the password-focused hashing algorithms do.
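The salt-then-hash and verify steps described earlier can be sketched roughly as follows. This is an illustration under assumptions, not the code from the screencast: the function names `hash_password` and `verify_password` are hypothetical, the encoding to UTF-8 is assumed because hashlib needs bytes, and `hmac.compare_digest` is an addition used here for a constant-time comparison.

```python
import hashlib
import hmac
import uuid


def hash_password(password: str) -> str:
    """Return 'hexdigest:salt' for the given password (hypothetical helper)."""
    salt = uuid.uuid4().hex  # random 32-character hex string, as in the talk
    digest = hashlib.shake_128((salt + password).encode("utf-8")).hexdigest(40)
    # Hex output never contains ':', so it is a safe separator.
    return digest + ":" + salt


def verify_password(stored: str, candidate: str) -> bool:
    """Re-hash the candidate with the stored salt and compare (hypothetical helper)."""
    digest, _, salt = stored.partition(":")
    expected = hashlib.shake_128((salt + candidate).encode("utf-8")).hexdigest(40)
    # compare_digest is not from the talk; it avoids leaking timing information.
    return hmac.compare_digest(digest, expected)


stored = hash_password("Treehouse")
print(stored)
print(verify_password(stored, "Treehouse"))  # True
print(verify_password(stored, "treehouse"))  # False
```

Because the salt is random, hashing the same password twice produces two different stored strings, which is what defeats lists of prehashed words.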
115 00:05:57,910 --> 00:05:59,890 Let me use it, and I'll show you what this is. 116 00:06:01,110 --> 00:06:06,318 So I'll say from argon2 import PasswordHasher. 117 00:06:10,342 --> 00:06:13,963 And then I will hash ("Treehouse"). 118 00:06:15,280 --> 00:06:19,800 And we get back a big long string. 119 00:06:19,800 --> 00:06:23,320 See how it has the $argon2i, here at the beginning. 120 00:06:24,430 --> 00:06:28,960 That tells us that it was hashed using the argon2i algorithm. 121 00:06:28,960 --> 00:06:31,970 The other dollar sign separated values that are in here 122 00:06:31,970 --> 00:06:34,360 tell us more about how the hash was generated. 123 00:06:34,360 --> 00:06:37,900 And finally, after this last dollar sign, is the hash itself. 124 00:06:37,900 --> 00:06:39,670 You might think that this is less secure, 125 00:06:39,670 --> 00:06:42,250 since an attacker would be able to simulate your hashing setup. 126 00:06:42,250 --> 00:06:46,021 But due to how Argon2 works, where it stresses the processor and fills up memory, 127 00:06:46,021 --> 00:06:47,363 this won't matter much. 128 00:06:47,363 --> 00:06:52,475 All of these things here, like the m and the t and the p, those are settings you can 129 00:06:52,475 --> 00:06:57,550 tweak to determine how much memory and processor time your hashing uses. 130 00:06:57,550 --> 00:07:01,260 This has a major benefit because when you save your users' passwords into your 131 00:07:01,260 --> 00:07:04,070 database, you know how you hashed them. 132 00:07:04,070 --> 00:07:07,180 You can use that to hash them next time and 133 00:07:07,180 --> 00:07:12,940 you can use that to upgrade hashes in the future. Pretty clever, Argon2. 134 00:07:12,940 --> 00:07:17,122 So remember, use a hashing mechanism that protects against attacks from GPUs, 135 00:07:17,122 --> 00:07:20,680 uses time and memory against an attacker, and avoids collisions.
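To make the dollar-sign-separated layout described above concrete, here is a small sketch that splits such a string into its parts using only the standard library. The function name `parse_encoded_hash` is hypothetical, and the example string is a made-up placeholder that only mimics the layout; it is not real Argon2 output and its parameter values are not recommendations.

```python
def parse_encoded_hash(encoded: str) -> dict:
    """Split an encoded hash string into algorithm, settings, salt, and hash."""
    # The string starts with '$', so the first split element is empty; drop it.
    parts = encoded.split("$")[1:]
    algorithm, *middle, salt, digest = parts
    settings = {}
    for field in middle:
        # Fields look like 'v=19' or 'm=512,t=2,p=2'.
        for pair in field.split(","):
            key, _, value = pair.partition("=")
            settings[key] = int(value)
    return {"algorithm": algorithm, "settings": settings,
            "salt": salt, "hash": digest}


# Placeholder string in the same layout; NOT output from a real hasher.
example = "$argon2i$v=19$m=512,t=2,p=2$c2FsdHNhbHRzYWx0$aGFzaGhhc2hoYXNoaGFzaA"
parsed = parse_encoded_hash(example)
print(parsed["algorithm"])  # argon2i
print(parsed["settings"])   # {'v': 19, 'm': 512, 't': 2, 'p': 2}
```

Because the algorithm name and settings travel with every stored value, code that verifies a password can tell which hashes were made with older settings and are due for an upgrade.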
136 00:07:20,680 --> 00:07:23,750 It's also a good idea to be sure and keep track of the algorithm that you used for 137 00:07:23,750 --> 00:07:28,040 hashing, so you can upgrade stored hashes as attacks and algorithms change. 138 00:07:28,040 --> 00:07:29,730 I've talked about it before but it's important, so 139 00:07:29,730 --> 00:07:31,238 I'm gonna talk about it again. 140 00:07:31,238 --> 00:07:35,890 Systems like Argon2 are great to use for specific use cases like passwords. 141 00:07:35,890 --> 00:07:39,430 For other hashing situations, though, there are usually existing tools in place 142 00:07:39,430 --> 00:07:42,210 that are more specialized and appropriate for the situation. 143 00:07:42,210 --> 00:07:44,050 Things like checking the contents of a file, 144 00:07:44,050 --> 00:07:46,720 validating the origin of a message, or fingerprinting a user or 145 00:07:46,720 --> 00:07:49,900 a machine have existing solutions that you should investigate. 146 00:07:49,900 --> 00:07:52,880 Okay, now that we have things safely stored in an unreadable format, 147 00:07:52,880 --> 00:07:55,890 what would be some of our recourses for encrypting data to get it back?