1 00:00:00,550 --> 00:00:04,020 Most text on the Internet is a special collection of bits defined in some sort of 2 00:00:04,020 --> 00:00:05,300 character set. 3 00:00:05,300 --> 00:00:07,280 These days, that character set is probably UTF-8, 4 00:00:07,280 --> 00:00:10,530 which stands for Unicode Transformation Format. 5 00:00:10,530 --> 00:00:14,580 The 8 is just to differentiate it from other types of UTF character sets, 6 00:00:14,580 --> 00:00:15,810 which store things a little differently. 7 00:00:17,390 --> 00:00:21,110 Over the last ten years, UTF-8 has taken over from other character sets as 8 00:00:21,110 --> 00:00:23,090 the most popular character encoding, and 9 00:00:23,090 --> 00:00:26,765 is able to fit pretty much any character you would want to insert as text. 10 00:00:26,765 --> 00:00:30,560 UTF-8 can handle all Unicode characters, 11 00:00:30,560 --> 00:00:34,590 including international characters, symbols, and things like emoticons. 12 00:00:36,210 --> 00:00:38,720 Whether supporting fancy emoticons is something that interests you or 13 00:00:38,720 --> 00:00:42,500 not, supporting international characters outside of the basic a to z 14 00:00:42,500 --> 00:00:44,700 range is very important. 15 00:00:44,700 --> 00:00:47,470 Even if your application is launching somewhere like the U.S.A. or 16 00:00:47,470 --> 00:00:50,600 United Kingdom, lots of people have special characters in their names, and 17 00:00:50,600 --> 00:00:55,040 you will need UTF-8 to support things like accented characters and Cyrillic. 18 00:00:55,040 --> 00:00:57,460 PHP will work with UTF-8 content and 19 00:00:57,460 --> 00:00:59,710 is getting more features to make it even easier. 20 00:00:59,710 --> 00:01:03,140 But much of that functionality is tucked away in a non-default extension 21 00:01:03,140 --> 00:01:04,810 called mbstring. 22 00:01:04,810 --> 00:01:08,370 Before we can work with mbstring, we need to make sure it's enabled.
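One way to make that check from code is sketched below. It relies only on the core function extension_loaded(); the variable name and the messages are illustrative assumptions, not something shown in the video.

```php
<?php
// Minimal sketch: report whether the mbstring extension is available.
// extension_loaded() is a core PHP function; the variable name and the
// messages below are illustrative assumptions.
$hasMbstring = extension_loaded('mbstring');

if ($hasMbstring) {
    echo "mbstring is enabled\n";
} else {
    echo "mbstring is missing; enable it in your PHP configuration\n";
}
```

How the extension gets enabled depends on the platform and on how PHP was installed, so that step is left out here.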
23 00:01:08,370 --> 00:01:10,700 So open up index.php and I'll show you how to check. 24 00:01:12,990 --> 00:01:16,180 On line 3, we're calling the core function phpinfo. 25 00:01:16,180 --> 00:01:18,680 This function will output all sorts of useful information to the screen, 26 00:01:18,680 --> 00:01:22,490 including information about installed and enabled modules. 27 00:01:22,490 --> 00:01:26,310 On line 4, the exit statement is being used to halt execution of the script, 28 00:01:26,310 --> 00:01:27,930 and none of the rest of this code will run. 29 00:01:27,930 --> 00:01:30,310 We'll get to that in a second. 30 00:01:30,310 --> 00:01:32,900 But first, let's click on the eye icon to preview this script and 31 00:01:32,900 --> 00:01:36,300 see what phpinfo shows us. 32 00:01:36,300 --> 00:01:40,235 If we use the Find functionality in our browser, we can type mbstring. 33 00:01:40,235 --> 00:01:43,360 And not this first part, this is what we want to see here. 34 00:01:44,598 --> 00:01:48,800 If the mbstring section is missing or disabled, then you will need to enable it. 35 00:01:48,800 --> 00:01:51,480 But luckily for us, it's always enabled on Workspaces. 36 00:01:51,480 --> 00:01:54,250 Let's go back to our workspace and remove lines 3 and 4, and 37 00:01:54,250 --> 00:01:56,370 then take a look at the rest of this code. 38 00:01:56,370 --> 00:02:00,400 On line 6, we're calling the function named mb_internal_encoding. 39 00:02:00,400 --> 00:02:04,120 We pass it a string argument with the name of our encoding, which, in this case, 40 00:02:04,120 --> 00:02:05,370 is UTF-8. 41 00:02:05,370 --> 00:02:08,420 This lets the mbstring extension know which character encoding we 42 00:02:08,420 --> 00:02:09,820 wish to work with. 43 00:02:09,820 --> 00:02:14,880 On line 7, we call a function named mb_http_output. 44 00:02:14,880 --> 00:02:18,030 And once again, we pass it a string as an argument containing UTF-8.
45 00:02:18,030 --> 00:02:21,680 This lets mbstring know that our HTML will be output as UTF-8. 46 00:02:21,680 --> 00:02:24,160 These two lines might seem like they're overkill, but 47 00:02:24,160 --> 00:02:26,690 it's handy to be verbose when working with UTF-8. 48 00:02:26,690 --> 00:02:29,210 Line 9 is a string variable called string. 49 00:02:29,210 --> 00:02:33,390 It's full of accented characters that do not exist in ASCII but do exist in UTF-8. 50 00:02:33,390 --> 00:02:37,790 On line 11 here, we're using the header function to set the HTTP header manually. 51 00:02:37,790 --> 00:02:40,580 A header's name and value are separated by a colon. 52 00:02:41,590 --> 00:02:45,640 And in this instance, we're setting the content type to text/html. 53 00:02:45,640 --> 00:02:48,700 That's not too interesting as that's the default header anyway, but 54 00:02:48,700 --> 00:02:50,850 we're also appending an extra parameter. 55 00:02:50,850 --> 00:02:53,930 By setting charset UTF-8 here, we're letting the browser know that 56 00:02:53,930 --> 00:02:57,078 any HTML following this is going to be UTF-8 too. 57 00:02:57,078 --> 00:03:01,730 Finally, we're outputting a basic HTML page with a title and a body. 58 00:03:01,730 --> 00:03:03,620 This contains two paragraphs. 59 00:03:03,620 --> 00:03:07,080 The first paragraph is going to output an uppercased version of this string, and 60 00:03:07,080 --> 00:03:09,280 the second is going to output the length of the string. 61 00:03:09,280 --> 00:03:11,110 Let's see what happens when we try and refresh the browser. 62 00:03:12,350 --> 00:03:14,400 Now these results look a little funny. 63 00:03:14,400 --> 00:03:17,840 PHP is doing its best to try and uppercase the string, but 64 00:03:17,840 --> 00:03:21,090 it is only able to change the case of the ASCII characters.
65 00:03:21,090 --> 00:03:23,900 Also, when we ask for the length of this string, the multibyte characters 66 00:03:23,900 --> 00:03:28,180 confuse PHP and make it think the string is longer than it really is. 67 00:03:28,180 --> 00:03:32,980 Really, this string is only 36 characters long to a human, but PHP thinks it's 41. 68 00:03:32,980 --> 00:03:37,110 We can fix this quite easily by using the mb extension functions. 69 00:03:40,750 --> 00:03:42,736 All we have to do here is add mb_ to the front. 70 00:03:42,736 --> 00:03:45,966 [BLANK_AUDIO] 71 00:03:45,966 --> 00:03:47,710 And there we go. 72 00:03:47,710 --> 00:03:52,280 Now the count is correctly 36 characters and all of the characters are uppercased, 73 00:03:52,280 --> 00:03:53,638 not just the ASCII ones. 74 00:03:53,638 --> 00:03:57,600 There's an mbstring replacement function for many of the core 75 00:03:57,600 --> 00:04:01,680 string functions, so try putting mb_ on the front and it will often work. 76 00:04:01,680 --> 00:04:04,448 This will get you started working with UTF-8 in PHP. 77 00:04:04,448 --> 00:04:06,479 But once you start interacting with a database, 78 00:04:06,479 --> 00:04:09,170 you'll need to get that set up as UTF-8 too. 79 00:04:09,170 --> 00:04:12,790 Visit PHP The Right Way to see a little bit about how that all works. 80 00:04:12,790 --> 00:04:13,820 Let's move on to the next stage.
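To make the difference between bytes and characters concrete, here is a minimal sketch comparing strlen with the mb_ functions. The sample text and the variable names are assumptions made for illustration; they are not the string used in the video. It also assumes the mbstring extension is loaded and that the file is saved as UTF-8.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
mb_internal_encoding('UTF-8');

// Hypothetical sample text; the string used in the video is not shown here.
$string = 'crème brûlée';

// Core functions work on bytes: è, û and é each take two bytes in UTF-8.
$byteLength = strlen($string);       // 15

// mbstring functions work on characters in the configured encoding.
$charLength = mb_strlen($string);    // 12
$upper = mb_strtoupper($string);     // CRÈME BRÛLÉE

echo $byteLength, "\n", $charLength, "\n", $upper, "\n";
```

The three accented letters account for the three extra bytes that strlen reports, which is the same kind of gap as the 41 versus 36 seen in the video.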