Can someone explain what a character set is, and why it is UTF-8?

I have been watching The Head and I need someone to explain what a character set is in different words. And why is the character set UTF-8? Can someone also tell me what other character sets there are?

Aakash Srivastav
Full Stack JavaScript Techdegree Student 11,638 Points

The Unicode Consortium develops the Unicode Standard. Their goal is to replace the existing character sets with this single standard, which is stored using its Unicode Transformation Formats (UTFs) such as UTF-8.
The Unicode Standard has become a success and is implemented in HTML, XML, Java, JavaScript, E-mail, ASP, PHP, etc. The Unicode standard is also supported in many operating systems and all modern browsers.
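To make "Unicode Transformation Format" a bit more concrete, here is a minimal Python sketch (Python is just my choice for illustration, not something from the posts above) that counts how many bytes the same text takes in three of the UTFs. The helper name `encoded_sizes` is hypothetical; the `-le` codec names are Python's little-endian variants, used so no byte-order mark is added.

```python
def encoded_sizes(text: str) -> dict:
    """Return how many bytes `text` occupies in three Unicode Transformation Formats.

    The "-le" (little-endian) codecs are used so Python does not prepend a
    byte-order mark, which would add extra bytes to the UTF-16/UTF-32 counts.
    """
    codecs = ("utf-8", "utf-16-le", "utf-32-le")
    return {codec: len(text.encode(codec)) for codec in codecs}


for word in ("hello", "héllo"):
    print(word, encoded_sizes(word))
```

For plain English text UTF-8 needs only one byte per character, which is likely one practical reason it became the common choice for web pages.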

Because the ISO-8859 character sets were limited in size (each one holds at most 256 characters) and not compatible with multilingual environments, the Unicode Consortium developed the Unicode Standard.
The Unicode Standard covers (almost) all the characters, punctuation marks, and symbols in the world.
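A quick way to see that size limit is to try encoding a few strings with Python's built-in `iso-8859-1` (Latin-1) codec. This is a sketch with made-up sample strings; `fits_latin1` is a hypothetical helper name.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` exists in ISO-8859-1 (Latin-1)."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # At least one character is outside Latin-1's 256 code points.
        return False


samples = ["café", "price: 5€", "Привет", "日本語"]
for s in samples:
    # UTF-8 can encode every sample; Latin-1 only some of them.
    print(f"{s!r:12} Latin-1 ok: {fits_latin1(s)!s:5}  UTF-8 bytes: {len(s.encode('utf-8'))}")
```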

Unicode enables processing, storage, and transport of text independent of platform and language.

The default character encoding in HTML5 is UTF-8.

If an HTML5 web page uses a different character set than UTF-8, it should be specified in the <meta> tag like:

<meta charset="ISO-8859-1">
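Declaring the right charset matters because the same bytes read as different characters under different encodings. A minimal Python sketch, where `.decode()` stands in for what a browser does after reading the `<meta charset>` tag (an analogy, not how any browser is actually implemented):

```python
# The bytes a page saved as UTF-8 actually contains for the word "café".
raw = "café".encode("utf-8")  # b'caf\xc3\xa9' (é takes two bytes)

# If the page wrongly declared <meta charset="ISO-8859-1">, each byte is
# read as one Latin-1 character, so é's two bytes become two characters.
as_latin1 = raw.decode("iso-8859-1")

# Decoding with the encoding the file was really saved in restores the text.
as_utf8 = raw.decode("utf-8")

print(as_latin1)  # cafÃ©
print(as_utf8)    # café
```

This kind of garbled output (often called mojibake) is why the declared charset has to match the encoding the file was saved in.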

Unicode is a character set. UTF-8 is an encoding.
Unicode is a list of characters, each with a unique number (its code point). Code points are usually written in hexadecimal, e.g. U+0041 for A, but in decimal they are: A = 65, B = 66, C = 67, ....
This list of decimal numbers represents the string "hello": 104 101 108 108 111
Encoding is how these numbers are translated into binary numbers to be stored in a computer:
UTF-8 encoding will store "hello" like this (binary): 01101000 01100101 01101100 01101100 01101111
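The two steps above (character to code point, code point to bytes) can be reproduced in a few lines of Python, using the built-in `ord()` for the code points and `str.encode()` for the UTF-8 bytes; the variable names are just for illustration.

```python
text = "hello"

# Step 1: the character set. Each character has a Unicode code point.
code_points = [ord(ch) for ch in text]
# Code points are conventionally written in hexadecimal as U+XXXX.
labels = [f"U+{cp:04X}" for cp in code_points]

# Step 2: the encoding. UTF-8 turns the code points into bytes.
utf8_bytes = text.encode("utf-8")
binary = " ".join(f"{byte:08b}" for byte in utf8_bytes)

print(code_points)  # [104, 101, 108, 108, 111]
print(labels)       # ['U+0068', 'U+0065', 'U+006C', 'U+006C', 'U+006F']
print(binary)       # 01101000 01100101 01101100 01101100 01101111
```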

3 Answers

Adding on to the above information, I would like to say that UTF-8 is an international encoding standard for use with different languages and scripts. Unicode assigns each letter, digit, or symbol a unique numeric value (its code point) that applies across different platforms and programs, and UTF-8 is one of the encodings the Unicode Consortium defined for storing those values. It defines how individual characters are represented as bytes in text files, web pages, and other types of documents.

In simple words, UTF-8 is the standard encoding used to store the text and code we write when creating an HTML document (and other documents as well). Encoding here means putting information into a form in which a computer can store it, i.e. as numbers in binary. Because everyone uses the same encoding, the characters in our HTML document come out as the same symbols when the document is opened in other programs, on other platforms, and by readers of other languages.

The earlier character sets could not be used universally on all platforms and did not contain the symbols used in other languages and environments. Therefore the Unicode Consortium created a universal standard that can be used anywhere, works in multilingual environments, and contains the symbols of (almost) all languages, and UTF-8 converts those symbols into binary.

Just as HTML is a markup language whose tags tell the browser how to display a document on any computer with access to the Web, UTF-8 is a universal encoding that stores the characters of that document in a form computers can read, i.e. binary digits such as 01101000, so that the document can be read on other platforms and shows the same characters and meaning in every language.
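To see the "stored as binary, read back the same elsewhere" idea in action, here is a minimal Python sketch that writes some multilingual text to a file as UTF-8, reads the raw bytes back, and decodes them. It also shows that UTF-8 is variable-length: ASCII characters take one byte, others take two to four. The file name `page.html` and the helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def utf8_byte_lengths(text: str) -> dict:
    """Map each character in `text` to the number of bytes UTF-8 uses for it."""
    return {ch: len(ch.encode("utf-8")) for ch in text}


def round_trip(text: str) -> str:
    """Save `text` as UTF-8, read the raw bytes back, and decode them again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "page.html"
        path.write_text(text, encoding="utf-8")  # characters -> bytes on disk
        raw = path.read_bytes()                  # the stored binary data
        return raw.decode("utf-8")               # bytes -> characters again


sample = "Aé€😀"
print(utf8_byte_lengths(sample))     # {'A': 1, 'é': 2, '€': 3, '😀': 4}
print(round_trip(sample) == sample)  # True
```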

CONCLUSION ---- it converts all the characters we write in HTML or other documents into a form that can be stored in the computer. As we know, computers work with binary, so it converts them to binary, and does so in a way that represents the same characters and meaning when the same document is opened on other platforms and in other languages around the world.

NOTE - After reading this, read Aakash's info on UTF-8 given above; you will be able to understand more quickly what he means in his remarkable explanation.

Umm, well, thanks! Does anyone else have a simpler answer?

kayla Reid
1,275 Points

So basically, a simpler answer would be: back in the 1960s, when communication between computers was much simpler than it is now, a file sent to another computer was all in BINARY (as it still is today), and early codes such as ASCII only had room for English letters, digits, and symbols. As computers and later the World Wide Web spread around the world, each country made its own encoding for its own language. To avoid the complications of mixing all of those, UTF-8 (based on Unicode) was created, which makes it easier for any coder in any part of the world to write their HTML code, because it basically translates the characters in your HTML code into binary the same way everywhere. I hope it helps; I tried to learn it the simplest way I could, and it gave me a basic and simple grasp.
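The "each country made its own encoding" part can be seen with two legacy Japanese encodings that Python ships with, `shift_jis` and `euc_jp`, next to UTF-8. A small sketch; the word 日本 ("Japan") is just an example:

```python
text = "日本"  # "Japan"

# The same two characters, stored by three different encodings.
encoded = {codec: text.encode(codec) for codec in ("shift_jis", "euc_jp", "utf-8")}
for codec, data in encoded.items():
    print(f"{codec:9} {data.hex(' ')}")

# Bytes only make sense with the right encoding: reading the Shift_JIS
# bytes back as Shift_JIS works, while UTF-8 rejects them outright.
print(encoded["shift_jis"].decode("shift_jis"))  # 日本
try:
    encoded["shift_jis"].decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```

Each legacy encoding works fine on its own, but a program has to know which one was used; UTF-8 reduces that guesswork by giving every language one shared encoding.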