If time zones are hard, then UTF-8 is a whole other problem, and many developers don't even know what it is. PHP can help you out, but you need to enable UTF-8 for the database, the HTTP headers, the HTML output, and your string handling.
Most text on the Internet is a special
collection of bits defined in some sort of
0:00
character set.
0:04
These days, that character set is probably
UTF-8,
0:05
which stands for Unicode Transformation Format.
0:07
The 8 refers to its 8-bit code units and differentiates it from
other UTF encodings, such as UTF-16 and UTF-32,
0:10
which store things a little differently.
0:14
Over the last ten years, UTF-8 has taken
over from other character sets as
0:17
the most popular character encoding, and
0:21
is able to fit pretty much any character
you would want to insert as text.
0:23
UTF-8 can handle all Unicode characters,
0:26
including international characters,
symbols, and things like emoticons.
0:30
Whether supporting fancy emoticons is
something that interests you or
0:36
not, supporting international characters
outside of the basic a to z
0:38
range is very important.
0:42
Even if your application is launching
somewhere like the U.S.A. or
0:44
United Kingdom, lots of people have
special characters in their name and
0:47
you will need UTF-8 to support things
like accented characters and Cyrillic.
0:50
PHP will work with UTF-8 content and
0:55
is getting more features to make it even
easier.
0:57
But much of that functionality is tucked
away in a non-default extension
0:59
called mbstring.
1:03
Before we can work with mbstring, we need
to make sure it's enabled.
1:04
So open up index.php and I'll show you how
to check.
1:08
On line 3, we're calling the core function
phpinfo.
1:12
This function will output all sorts of
useful information to the screen,
1:16
including information about installed and
enabled modules.
1:18
On line 4, the exit statement is being
used to halt execution of the script,
1:22
and none of the rest of this code will
run.
1:26
We'll get to that in a second.
1:27
But first, let's click on the eye icon to
preview this script and
1:30
see what phpinfo shows us.
1:32
If we use the Find functionality in our
browser, we can type mbstring.
1:36
And not this first part, this is what we
want to see here.
1:40
If the mbstring section is missing or
disabled, then you will need to enable it.
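As a quicker alternative to scanning the phpinfo page, here is a minimal sketch that asks PHP directly whether mbstring is loaded. The variable name and the messages are made up for illustration; extension_loaded is a core PHP function.

```php
<?php
// extension_loaded() reports whether the named extension is available at runtime.
$hasMbstring = extension_loaded('mbstring');

echo $hasMbstring
    ? "mbstring is enabled\n"
    : "mbstring is missing; enable it in your PHP configuration\n";
```

This only tells you whether the extension is present. The phpinfo page is still the place to look for its detailed settings.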
1:44
But luckily for us, it's always enabled on
Workspaces.
1:48
Let's go back to our workspace and remove
lines 3 and 4, and
1:51
then take a look at the rest of this code.
1:54
On line 6, we're calling the function
named mb_internal_encoding.
1:56
We pass it a string argument with the name
of our encoding, which, in this case,
2:00
is UTF-8.
2:04
This lets the mbstring extension know
which character encoding we
2:05
wish to work with.
2:08
On line 7, we call a function named
mb_http_output.
2:09
And once again, we pass it a string as an
argument containing UTF-8.
2:14
This lets mbstring know that our HTML will
be output as UTF-8.
2:18
These two lines might seem like they're
overkill, but
2:21
it's handy to be verbose when working with
UTF-8.
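The two setup calls described here can be sketched as follows. This assumes mbstring is enabled; calling each function again with no argument reads the current setting back, which is a handy way to confirm that the calls took effect.

```php
<?php
// Tell mbstring which encoding to assume for its string functions.
mb_internal_encoding('UTF-8');

// Tell mbstring which encoding to use for HTTP output.
mb_http_output('UTF-8');

// With no argument, each function returns the current setting.
echo mb_internal_encoding(), "\n";
echo mb_http_output(), "\n";
```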
2:24
Line 9 is a string variable called string.
2:26
It's full of accented characters that do
not exist in ASCII but do exist in UTF-8.
2:29
On line 11 here, we're using the header
function to set the HTTP header manually.
2:33
A header's name and value are separated
by a colon.
2:37
And in this instance, we're setting the
content type to text/html.
2:41
That's not too interesting as that's the
default header anyway, but
2:45
we're also appending an extra parameter.
2:48
By setting charset UTF-8 here, we're
letting the browser know that
2:50
any HTML following this is going to be
UTF-8 too.
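To make the shape of that header concrete, here is a small sketch. The contentTypeHeader helper is hypothetical and exists only to show how the name, the value, and the charset parameter fit together; header and headers_sent are core PHP functions.

```php
<?php
// Hypothetical helper: builds a "Name: value" header line with a charset parameter.
function contentTypeHeader(string $mimeType, string $charset): string
{
    return sprintf('Content-Type: %s; charset=%s', $mimeType, $charset);
}

$line = contentTypeHeader('text/html', 'UTF-8');

// Headers can only be sent before any output, so check first.
if (!headers_sent()) {
    header($line);
}
```

In the video the header string is written out directly. The helper is just a way to show the three parts separately.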
2:53
Finally, we're outputting a basic HTML
page with a title and a body.
2:57
This contains two paragraphs.
3:01
The first paragraph is going to output an
uppercased version of this string, and
3:03
the second is going to output the length
of the string.
3:07
Let's see what happens when we try and
refresh the browser.
3:09
Now these results look a little funny.
3:12
PHP is doing its best to try and uppercase
a string, but
3:14
it is only able to change the case
of ASCII characters.
3:17
Also, if we ask for the length of this
string, the multibyte characters will
3:21
confuse PHP, because strlen counts bytes rather than characters,
so it thinks the string is longer than it really is.
3:23
Really, this string is only 36 characters
long to a human, but PHP thinks it's 41.
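Here is a small sketch of the same problem, using a shorter made-up word instead of the string from the video. It assumes the file is saved as UTF-8, where each of the accented letters below takes two bytes.

```php
<?php
// 7 characters to a human: 3 ASCII letters plus 4 accented letters.
$word = 'Ünïcödé';

// strlen() counts bytes: 3 one-byte letters + 4 two-byte letters = 11.
$byteLength = strlen($word);
echo $byteLength, "\n";

// strtoupper() works on single bytes, so the accented letters are not uppercased.
$upper = strtoupper($word);
echo $upper, "\n";
```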
3:28
We can fix this quite easily by using the
mb extension functions.
3:32
All we have to do here is add mb_ to the
front of the function names.
3:40
[BLANK_AUDIO]
3:42
And there we go.
3:45
Now the count is correctly 36 characters
and all of the characters are uppercased,
3:47
not just the ASCII ones.
3:52
There's an mbstring replacement function
for many of the core
3:53
string functions, so try putting mb_ on
the front, and check the manual if that function doesn't exist.
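As a rough illustration of the mb_ prefix, here are three of the mbstring counterparts applied to a made-up word (not the string from the video). This assumes mbstring is enabled and the file is saved as UTF-8.

```php
<?php
mb_internal_encoding('UTF-8');

$word = 'Ünïcödé';

// mb_strlen() counts characters rather than bytes.
$length = mb_strlen($word);
echo $length, "\n";   // 7

// mb_strtoupper() uppercases the accented letters as well.
$upper = mb_strtoupper($word);
echo $upper, "\n";    // ÜNÏCÖDÉ

// mb_substr() cuts on character boundaries, so no letter is split in half.
$first = mb_substr($word, 0, 3);
echo $first, "\n";    // Ünï
```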
3:57
This will get you started working with
UTF-8 in PHP.
4:01
But once you start interacting with a
database,
4:04
you'll need to get that set up as UTF-8
too.
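The video doesn't say which database is in use, so treat the following as one possible sketch that assumes MySQL accessed through PDO. The utf8Dsn helper, the host, and the database name are all hypothetical. The charset part of the DSN is what asks for a UTF-8 connection.

```php
<?php
// Hypothetical helper; the host and database name passed in are placeholders.
function utf8Dsn(string $host, string $dbName): string
{
    // utf8mb4 is MySQL's full UTF-8 character set, including 4-byte characters.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

$dsn = utf8Dsn('localhost', 'example');
echo $dsn, "\n";

// With real credentials, this string would be passed as the first argument to PDO.
```

The tables and columns themselves also need a UTF-8 character set. That part is configured in the database rather than in PHP.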
4:06
Visit PHP The Right Way to see a little
bit about how that all works.
4:09
Let's move on to the next stage.
4:12