Understanding UTF-8
4:14 with Phil Sturgeon
If time zones are hard then UTF-8 is a whole other problem and many developers don’t even know what it is. PHP can help you out, but you need to enable UTF-8 for the database, HTTP headers, HTML output, and when working with strings.
Most text on the Internet is a special collection of bits defined in some sort of character set. These days, that character set is probably UTF-8, which stands for Unicode Transformation Format. The 8 refers to the 8-bit units it stores text in, and differentiates it from other UTF encodings, which store things a little differently. Over the last ten years, UTF-8 has taken over from other character sets as the most popular character encoding, and is able to fit pretty much any character you would want to insert as text. UTF-8 can handle all Unicode characters, including international characters, symbols, and things like emoticons.
Whether supporting fancy emoticons is something that interests you or not, supporting international characters outside of the basic a to z range is very important. Even if your application is launching somewhere like the U.S.A. or United Kingdom, lots of people have special characters in their name, and that will need UTF-8 to support things like accented characters and Cyrillic.
PHP will work with UTF-8 content and is getting more features to make it even easier. But much of that functionality is tucked away in a non-default extension called mbstring. Before we can work with mbstring, we need to make sure it's enabled. So open up index.php and I'll show you how to check.
On line 3, we're calling the core function phpinfo. This function will output all sorts of useful information to the screen, including information about installed and enabled modules. On line 4, the exit statement is being used to halt execution of the script, so none of the rest of this code will run. We'll get to that in a second. But first, let's click on the eye icon to preview this script and see what phpinfo shows us. If we use the Find functionality in our browser, we can type mbstring.
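As an aside, the same check can be made in code rather than by searching the phpinfo page. The following is a minimal sketch, not the script from the video: `describeMbstring` is a hypothetical helper name, and it relies only on PHP's built-in `extension_loaded` function.

```php
<?php
// Hypothetical helper (not from the video): reports whether the
// mbstring extension is loaded in the running PHP build.
function describeMbstring(): string
{
    return extension_loaded('mbstring')
        ? 'mbstring is enabled'
        : 'mbstring is missing or disabled';
}

echo describeMbstring(), PHP_EOL;
```

If the second message appears, the extension needs to be installed or enabled before any of the mb_ functions can be used.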
And not this first part, this is what we want to see here. If the mbstring section is missing or disabled, then you will need to enable it. But luckily for us, it's always enabled on Workspaces. Let's go back to our workspace and remove lines 3 and 4, and then take a look at the rest of this code.
On line 6, we're calling the function named mb_internal_encoding. We pass it a string argument with the name of our encoding, which, in this case, is UTF-8. This lets the mbstring extension know which character encoding we wish to work with. On line 7, we call a function named mb_http_output. And once again, we pass it a string as an argument containing UTF-8. This lets mbstring know that our HTML will be output as UTF-8. These two lines might seem like overkill, but it's handy to be verbose when working with UTF-8.
Line 9 is a string variable called string. It's full of accented characters that do not exist in ASCII but do exist in UTF-8. On line 11 here, we're using the header function to set the HTTP header manually. A header's name and value are separated by a colon. And in this instance, we're setting the content type to text/html. That's not too interesting as that's the default header anyway, but we're also appending an extra attribute. By setting charset UTF-8 here, we're letting the browser know that any HTML following this is going to be UTF-8 too.
Finally, we're outputting a basic HTML page with a title and a body. This contains two paragraphs. The first paragraph is going to output an uppercased version of this string, and the second is going to output the length of the string. Let's see what happens when we try and refresh the browser. Now these results look a little funny.
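The setup walked through above might look roughly like the sketch below. It is a reconstruction under assumptions, not the course file: the string from the video is not in the transcript, so a short placeholder is used, and the use of `strtoupper` and `strlen` is inferred from the description of the two paragraphs.

```php
<?php
// Reconstruction under assumptions; requires the mbstring extension.

// Tell mbstring which encoding its string functions should assume.
mb_internal_encoding('UTF-8');

// Tell mbstring which encoding to use for HTTP output.
mb_http_output('UTF-8');

// Placeholder text (an assumption): "Café Müller", written with escapes
// so the bytes are UTF-8 regardless of how this file is saved.
$string = "Caf\u{00E9} M\u{00FC}ller";

// Headers must be sent before any output, so guard the call.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Byte-oriented core functions, assumed to match the first version
// of the script described in the transcript.
echo '<p>', strtoupper($string), '</p>', PHP_EOL;
echo '<p>', strlen($string), '</p>', PHP_EOL;
```

With this placeholder, the second paragraph reports 13 rather than 11, because `strlen` counts bytes and the two accented letters take two bytes each in UTF-8.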
PHP is doing its best to try and uppercase the string, but it is only able to change the case of the ASCII characters. Also, when we ask for the length of this string, the multibyte characters confuse PHP and make it think the string is longer than it really is, because it counts bytes rather than characters. Really, this string is only 36 characters long to a human, but PHP thinks it's 41. We can fix this quite easily by using the mb extension functions. All we have to do here is add mb_ to the front of each function name. And there we go. Now the count is correctly 36 characters, and all of the characters are uppercased, not just the ASCII ones.
There's an mbstring replacement for many of the core string functions, so try putting mb_ on the front and it will often work. This will get you started working with UTF-8 and PHP. But once you start interacting with a database, you'll need to get that set up as UTF-8 too. Visit PHP The Right Way to see a little bit about how that all works. Let's move on to the next stage.
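The difference between the core functions and their mb_ counterparts can be seen in a small comparison. This is a sketch with a placeholder string (an assumption, since the 36-character string from the video is not in the transcript), so the counts differ from the ones mentioned above.

```php
<?php
// Placeholder text (an assumption): "Café Müller", written with escapes
// so the bytes are UTF-8 regardless of how this file is saved.
$string = "Caf\u{00E9} M\u{00FC}ller";

$byteCount = strlen($string);                 // counts bytes
$charCount = mb_strlen($string, 'UTF-8');     // counts characters (code points)
$upper     = mb_strtoupper($string, 'UTF-8'); // uppercases accented letters too

echo "strlen: $byteCount", PHP_EOL;       // strlen: 13
echo "mb_strlen: $charCount", PHP_EOL;    // mb_strlen: 11
echo "mb_strtoupper: $upper", PHP_EOL;    // mb_strtoupper: CAFÉ MÜLLER
```

Passing the encoding explicitly is a design choice here: it keeps the result independent of whatever `mb_internal_encoding` was set to elsewhere in the application.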