Encoding (7:11) with Carling Kirk
Learn about the char type and the byte type, and a bit about text encoding.
In this video, we'll be using the C# Interactive window, which is a REPL feature inside Visual Studio. To open it up, we'll go to View > Other Windows > C# Interactive. Inside this window, we can execute C# code like we can in a Workspaces console.

I mentioned earlier that our stream reader deals with encoding. Let's take a look at our text file. It contains the string Hello, world! So what is a string? Well, a string is a sequence of characters. That brings us to the char type. The char is a type that represents a single character. The way we assign a value to a char in C# is similar to a string, but instead of using double quotes, we use single quotes. So here's how we'd create a char variable. char capitalH equals, single quote, H, single quote, and semicolon. capitalH.

Let's go find the documentation on the char type. C# char. Here's something. This page is for the keyword. We can click on System.Char to get to the actual type. So check this out. It's actually a struct and not a class. A struct is a lot like a class, but it has some limitations. We'll get into a little more about structs later on.

It says, Represents a character as a UTF-16 code unit. So what does it mean by UTF-16 code unit? UTF-16 is a Unicode character encoding format. Without going into too much detail, it's good to keep in mind that every piece of text, even every character, has some kind of encoding behind it. This is because computers only know how to deal with numbers, and they need some way to translate the numbers into characters. Encoding formats are kind of like the Rosetta Stone for computers. Unicode has many characters in its set, and each character has a numeric code, called a code point.
This is so we can accommodate languages that have way more characters than the standard English alphabet. Let's look up the Unicode character for the lowercase letter h. Do unicode letter h. All right, here's the capital letter H. Let's see if we can get to it from there. Lowercase, U+0068. We can use this value to create a char variable. We'll do char lowerH equals, single quote, and then a backslash, u, and then I'll paste in that code we copied from the web page. The backslash is indicating an escape sequence like in our directory string, and the 0068 is a hexadecimal value that represents our lowercase letter h. Each char in C# is stored as two bytes, one UTF-16 code unit. lowerH.

Let's try getting the underlying bytes of our lowercase letter h. byte[] unicodeBytes. First we'll need to specify an encoding. So UnicodeEncoding.Unicode.GetBytes. And it wants a character array, so I'll pass it a new char array. And we'll fill it with the lowerH. Okay, let's see what it's got. And there's our two bytes. Now we can convert them back into a string. string unicodeString equals UnicodeEncoding.Unicode.GetString, and we'll pass it the unicodeBytes. And unicodeString has our letter h. And notice it's a string, because it's got double quotes.

So what is the byte type exactly? In C#, a byte is an integral type, an unsigned eight-bit integer. Integral types represent whole numbers and have a minimum and a maximum value. Unsigned means that it can't hold negative values. An unsigned eight-bit integer can store values from 0 to 255. Conversely, when you see that a type is signed, it means that it can have a range of values from a negative number to a positive number.
A signed byte in C#, declared as sbyte, is also eight bits, but it has a minimum value of negative 128 and a maximum value of 127. Let's create one. sbyte signedByte = -128. If we tried to assign it a constant value that's out of range, like 200, the compiler wouldn't let us. Let's try it. sbyte signedByte = 200. We'll get into other signed and unsigned integral types later in this course.

How about a character that's not usually on a keyboard, like the degree symbol, the unit symbol used with temperatures? Let's go look it up. Unicode degree symbol. Here it is. 00B0. Gonna copy that with Ctrl+C. Get back to our code. So here I'll say char degree equals, single quote, backslash, u, and I'll paste in our code from the Unicode page. Okay, now let's see what it looks like when the console prints it out. Console.WriteLine. We'll type a sentence, The current temperature is 74.6, then we'll insert our degree symbol. Whoops, I got an equal sign in here, needs to be a plus, Fahrenheit. The current temperature is 74.6 degrees Fahrenheit.

In the .NET Framework, strings in memory are encoded as UTF-16. You'll also see UTF-8, which is usually the encoding for text files. When doing I/O, we can specify different encodings if we need to. But you usually don't have to worry about it unless you're communicating with other systems that may need a different encoding. Check out the notes if you want to read more about different encodings in .NET.
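For reference, the steps typed into the C# Interactive window during the video can be collected into one small program. This is only a sketch: the class name EncodingSketch and the helpers ToUtf16Bytes and FromUtf16Bytes are made up for illustration and are not part of the course files, and it calls Encoding.Unicode directly, which is the same property the video reaches through UnicodeEncoding.Unicode. How the degree symbol looks when printed depends on the console's own output encoding, so no output is claimed for that line.

```csharp
using System;
using System.Text;

// Hypothetical sketch that gathers the REPL steps from the video in one place.
public static class EncodingSketch
{
    // Encode one char as UTF-16 (little-endian) bytes.
    public static byte[] ToUtf16Bytes(char c) =>
        Encoding.Unicode.GetBytes(new char[] { c });

    // Decode UTF-16 (little-endian) bytes back into a string.
    public static string FromUtf16Bytes(byte[] bytes) =>
        Encoding.Unicode.GetString(bytes);

    public static void Main()
    {
        char capitalH = 'H';          // a char literal uses single quotes
        char lowerH = '\u0068';       // Unicode escape for U+0068, same as 'h'
        Console.WriteLine(capitalH);  // prints H
        Console.WriteLine(lowerH);    // prints h

        byte[] unicodeBytes = ToUtf16Bytes(lowerH);
        Console.WriteLine(BitConverter.ToString(unicodeBytes)); // prints 68-00

        string unicodeString = FromUtf16Bytes(unicodeBytes);
        Console.WriteLine(unicodeString);                       // prints h

        // The same letter takes a single byte in UTF-8.
        Console.WriteLine(Encoding.UTF8.GetBytes("h").Length);  // prints 1

        // Ranges of the unsigned and signed eight-bit types.
        Console.WriteLine($"{byte.MinValue}..{byte.MaxValue}");   // prints 0..255
        Console.WriteLine($"{sbyte.MinValue}..{sbyte.MaxValue}"); // prints -128..127

        sbyte signedByte = -128;      // the smallest value an sbyte can hold
        Console.WriteLine(signedByte);                          // prints -128
        // sbyte tooBig = 200;        // would not compile: 200 is out of range

        char degree = '\u00B0';       // U+00B0, the degree sign
        Console.WriteLine("The current temperature is 74.6" + degree + " Fahrenheit");
    }
}
```

The two bytes come out as 68-00 rather than 00-68 because Encoding.Unicode is the little-endian form of UTF-16, so the low-order byte is written first.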