Python

Mineral Catalog (tech degree) JSON file

So the part of this project where it asks us to:

"Write a script to that constructs a mineral model instance for each mineral in minerals.json and saves them to a SQLite database. "

I have tried every which way to open the download file for this project, minerals.json. I constantly get a UnicodeDecodeError. Even if I set an "encoding" type, I just get the same UnicodeDecodeError with a different byte that it can't decode. Is there a special way to open this file that I didn't learn about, or are others having an issue with this file as well?

Also, this project doesn't list any resources or specify whether we must use Flask or Django, or whether it is up to us to pick a framework. So I just chose a framework, Django, to build this project in.

EDIT

The error is happening when line 10 from the minerals.json file is being read.

which contains this string:

{
    "unit cell": "a = 8.508 Å, b = 11.185 Åc=7.299 Å, α = 90.85°β = 114.1°, γ = 79.99°Z = 1"
}

Lacey Williams Henschel or Kenneth Love, would you have any insight into this? Most of the posts on StackOverflow just switch the encoding, use some really crazy methods, or aren't working with the same data type.

I am running on Windows 10, Python 3.5.2 (64 bit).

It fails no matter how I open the file. At the moment I am opening it the recommended way:

with open('somefolder/minerals.json') as data_file:
    # do stuff

It WILL open the file, but if I attempt to loop over it, use a library that loops over that line, or even print the entire dictionary to the console, it raises a Unicode error at that point.

I have tried changing the open encoding. I have tried a .encode on the line itself.

I am stumped. I am probably missing something small.

I am going to go pull my files over to my Linux machine and see if I can get the same error.

7 Answers

So the only thing that seems to work at this point, and what I am going to stick with for now for Unicode issues on Windows, is to run this in Command Prompt every time before I run the script:

chcp 65001

From all the reading I have done, this requires the least amount of nonsense forced into my script file. Also, this does not stick as a permanent change; it only lasts for that instance of Command Prompt. Once I close it, it goes back to code page 437.

I have a feeling this is related to how Region settings are set in Control Panel. Maybe it would be more practical for Windows to use code page 65001 (UTF-8) from the start instead of code page 437 (DOS).

NOTE: If you try to force a permanent code page change to something like 65001 with a batch file or similar and then restart your machine, it will fail the moment it tries to boot the OS. Then you would have to do a system restore.

If anyone finds a better solution than this, let me know. And if anyone else using Windows runs into this, share your issue too. I feel like some poor soul is going to get stuck in this same situation.

Kenneth Love
STAFF
Treehouse Guest Teacher

Can you paste in the code that causes the exception and the exception message(s)?

When you changed the encoding, was it something like

with open("somefolder/minerals.json", encoding="utf-8") as data_file:

?

Sure can! So this is the relevant part of the function

filename = os.path.join(BASE_DIR, options['filename'])
try:
    with open(filename, 'r', encoding='utf-8') as data_file:
        # Example 1
        print(data_file)  # This WORKS, will print this as an _io.TextIOWrapper obj

        # Example 2
        data = json.load(data_file)
        print(data)  # this would fail

        # Example 3
        for item in data_file:  # this fails at start of the loop
            print(item)

except FileNotFoundError:
    print('Error, that file does not exist.')

I started throwing anything at it and even tried csv.DictReader. If I add extra exception handling to the above snippet, such as:

# Additional Exceptions

except UnicodeDecodeError:
    print('Unicode Decode Error')
except UnicodeEncodeError:
    print('Unicode Encode Error')

My console output skips the stack trace and prints the following from minerals.json, but ONLY for the for loop. I had to comment out the json.load section because that just fails and skips the code below it.

[

        {

                "name": "Abelsonite",

                "image filename": "240px-Abelsonite_-_Green_River_Formation%2C_Uintah_County%2C_Utah%2C_USA.jpg",

                "image caption": "Abelsonite from the Green River Formation, Uintah County, Utah, US",

                "category": "Organic",

                "formula": "C<sub>31</sub>H<sub>32</sub>N<sub>4</sub>Ni",

                "strunz classification": "10.CA.20",

                "crystal system": "Triclinic",

Unicode Encode Error

This is where it fails, every time.

The very next iteration of that loop is over the unit cell key in the dict, whose value I posted in the initial post. I think maybe it's that symbol at column 60?

The actual error message reads as this.

UnicodeEncodeError: 'charmap' codec can't encode character '\u03b2' in position 62: character maps to <undefined>
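
The message above is an encode error rather than a decode error, which suggests the file was read correctly and the failure happens when print() converts the text to the console's code page. The following is a minimal sketch of that idea, assuming the console uses cp437 (the value this thread later reports for sys.stdout.encoding); `can_encode` is a hypothetical helper name, not part of any library:

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Characters taken from the "unit cell" and hardness values quoted in this thread.
# ascii() keeps the output printable on any console.
for char in ["Å", "α", "°", "β", "γ", "\u2013"]:
    print(ascii(char), can_encode(char, "cp437"))
```

Under that assumption, α is reported as encodable and β is not, which would be consistent with the error first appearing at '\u03b2' rather than earlier in the string.
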
Kenneth Love
STAFF
Treehouse Guest Teacher

Hmm, wonder if it's the json.load...

Try reading the file into a string and using json.loads()?

with open(filename, 'r', encoding='utf-8') as data_file:
    data = json.loads(data_file.read())
    print(data)

So something like this? With this one I get another UnicodeEncodeError but much further past character 62.

Now the error reads this.

UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 534: character maps to <undefined>

Which seems to be the En Dash
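
One way to inspect the parsed data without changing any console settings is to print an ASCII-only rendering of it. This is only a sketch and was not suggested in the thread; it relies on json.dumps escaping non-ASCII characters by default (ensure_ascii=True), and the `record` dict is a made-up stand-in for one entry from minerals.json:

```python
import json

# A small stand-in for one parsed record from minerals.json.
record = {"name": "Abelsonite", "mohs scale hardness": "2\u20133"}

# ensure_ascii=True is the default, so the en dash is written as \u2013
# and the resulting string contains only ASCII characters.
printable = json.dumps(record, sort_keys=True)
print(printable)
```

The escaped output is less readable than the real symbols, but it prints on any code page and can be turned back into the original data with json.loads.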

Kenneth Love
Treehouse Guest Teacher

Hmm, OK.

json.loads(data_file.read(), encoding="utf-8")

?

Full Error with Path, Python is angry at me. LOL

File "C:\Program Files\Python35\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 27: character maps to <undefined>

So Kenneth,

I tried creating a new minerals.json file called minerals2.json

I threw in a list of three dictionaries, each containing four key/value string pairs. I put some random characters in each and also copied a few lines from minerals.json that didn't contain any symbols in their strings.

It ran without errors and printed fine.

Jeremy McLain
STAFF
Treehouse Guest Teacher

I downloaded the files from the project and ran this code in the console.

import json
minerals = json.load(open('minerals.json'))

http://d.pr/i/11aoE

It loaded fine. All of the "special" unicode characters printed to the console. I tried it in both Python 2 and 3. This is on a Mac. Perhaps the file is corrupt somehow. Can you try downloading and extracting the file again?

I re-downloaded the zip, unpacked the minerals.json file to the proper directory, and tried to run it again.

I am beginning to feel this is a specific issue with Unicode + Windows + Python in Command Prompt. Doing some searches with these as keywords I found this bug

Going to look into this a bit more, but for right now I am not able to get Minerals Catalog working yet. :(

Jeremy McLain
STAFF
Treehouse Guest Teacher

I just tried it on Windows and I got the error.

Python 3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 19:28:18) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> minerals = json.load(open('minerals.json'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\json\__init__.py", line 265, in load
    return loads(fp.read(),
  File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 3596: character maps to <undefined>

Setting the encoding to utf-8 fixed it though.

>>> minerals = json.load(open('minerals.json', encoding='utf-8'))
>>> minerals
[{'image filename': '240px-Abelsonite_-_Green_River_Formation%2C_Uintah_County%2C_Utah%2C_USA.jpg', 'cleavage': 'Probable on {111}', 'name': 'Abelsonite', 'crystal system': 'Triclinic', 'mohs scale hardness': '2\u20133', 'group': 'Organic Minerals', 'diaphaneity': 'Semitransparent', 'optical properties': 'Biaxial', 'luster': 'Adamantine, sub-metallic', 'streak': 'Pink', 'formula': 'C<sub>31</sub>H<sub>32</sub>N<sub>4</sub>Ni', 'image caption': 'Abelsonite from the Green River Formation, Uintah County, Utah, US', 'color': 'Pink-purple, dark greyish purple, pale purplish red, reddish brown', 'strunz classification': '10.CA.20', 'unit cell': 'a = 8.508 Å, b = 11.185 Åc=7.299 Å, α = 90.85°\u03b2 = 114.1°, \u03b3 = 

I can only assume that Python on Windows defaults to the system code page (cp1252 here, going by the traceback) instead of UTF-8 when reading files.
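
For context, open() without an encoding argument uses locale.getpreferredencoding(False), which on many Windows setups is a legacy code page such as cp1252. The sketch below reproduces the same kind of decode failure with a throwaway file. The temporary file and its contents are invented for illustration, and the character used (U+201D) is just one whose UTF-8 bytes include 0x9d; it is not known to be the one in minerals.json.

```python
import json
import locale
import os
import tempfile

# What open() falls back to when no encoding argument is given.
print(locale.getpreferredencoding(False))

# Write a tiny UTF-8 JSON file containing a right double quote (U+201D),
# whose UTF-8 bytes are E2 80 9D.
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write('[{"note": "\u201d"}]'.encode("utf-8"))
    path = tmp.name

# Reading the UTF-8 bytes as cp1252 fails because 0x9d is unmapped there.
try:
    with open(path, encoding="cp1252") as handle:
        json.load(handle)
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Reading with the encoding the file was actually written in succeeds.
with open(path, encoding="utf-8") as handle:
    data = json.load(handle)

os.remove(path)
print(cp1252_failed, ascii(data))
```

Passing encoding explicitly keeps the script's behaviour the same across Windows, macOS, and Linux.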

So weird! Mine doesn't run it, I still get a Unicode error.

What happens when you go into the python shell in CMD and type:

EDIT

>>> import sys
>>> sys.getdefaultencoding()
>>> print(sys.stdout.encoding)

I get

'utf-8'

cp437

I THINK that Code Page ( cp437 ) may be the problem?
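
If cp437 is the console encoding, any character outside that code page will raise UnicodeEncodeError when printed. A possible workaround, sketched here under that assumption, is to escape unsupported characters before printing instead of changing the code page. `safe_print` is a hypothetical helper name, not an existing function:

```python
import sys


def safe_print(text, stream=None):
    """Print text, escaping characters the stream's encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the stream's encoding; unsupported characters
    # become \uXXXX escapes instead of raising UnicodeEncodeError.
    escaped = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(escaped, file=stream)


safe_print("mohs scale hardness: 2\u20133")
```

On a UTF-8 terminal this prints the text unchanged; on a cp437 console the en dash would appear as \u2013 rather than stopping the script.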

Also I found this bug

Jeremy McLain, would you be able to give Python 3.5.2 a try on your Windows OS and do this again?

So I have another desktop PC that also runs Windows 10, though it did not have Python installed on it.

Since you were running Python 3.4.4 (32bit) on Windows I thought I would try the same.

I installed and then uninstalled each of the following, running that script with minerals.json each time, in this order:

  • Python 3.4.4 (32bit then 64bit)
  • Python 3.5.2 (32bit then 64bit)

No success with any of them. Ugh.

James J. McCombie
Python Web Development Techdegree Graduate 21,199 Points

Hello, this might be a bit late, but I had the same problem initially and this worked:

with open('<filepath>', encoding='utf8', mode='r') as file:
    data = json.load(file)

The only difference I can see is 'utf-8' versus 'utf8'.
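
For what it's worth, Python treats those two spellings as aliases of the same codec, so the spelling alone should not change the result. A quick check using the standard codecs module:

```python
import codecs

# Each alias is looked up in the codec registry; all of them resolve
# to the same canonical codec name.
names = {codecs.lookup(alias).name for alias in ("utf8", "utf-8", "UTF-8", "utf_8")}
print(names)
```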

From what I remember, neither spelling worked for me. Since this was 2 months ago I can't recall every step, but this was definitely one of them. Now I'll have to give this a shot again on the machine where I had the original problem, just for curiosity and a sanity check. I'll let you know what comes of it. :)

Okay so I think I found something that worked for me. I got it to load EVERYTHING from that minerals.json file now.

I followed these steps from this reported bug. Even though it's fairly outdated, I ran these in my Windows PowerShell:

Glenn Linderman's Solution

set PYTHONIOENCODING=UTF-8
cmd /u /k chcp 65001
set PYTHONIOENCODING=
exit

EDIT

This way only works while the CMD window is open. The moment you close it, you would have to retype these to get it to work again.

NOTE: I found this is not recommended even as a temporary fix; it can cause other problems when you try to run .bat files and such. So I'm downvoting this ;-)

I'm not sure if

set PYTHONIOENCODING=

even mattered at this point; the first two seemed to do the trick. Well, the first one did the trick, but the symbols weren't displaying properly.

BEFORE ANY COMMANDS http://d.pr/i/YVlP

AFTER set PYTHONIOENCODING=UTF-8 https://d.pr/sFRx

AFTER cmd /u /k chcp 65001 https://d.pr/3BUI
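
To see what PYTHONIOENCODING changes without touching the console's code page, one option is to start a child interpreter with the variable set and ask it for its stdout encoding. This is only a sketch; `child_stdout_encoding` is a hypothetical helper name, and it assumes sys.executable points at a working Python interpreter:

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set and report its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    # Normalise spellings such as "UTF-8" and "utf8" to the canonical codec name.
    return codecs.lookup(output.decode("ascii").strip()).name


print(child_stdout_encoding("UTF-8"))
```

This only shows which encoding Python will write with. Whether the symbols display correctly still depends on the console's code page and font, which matches the garbled output described after the first command.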