Welcome to the Treehouse Community
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
Mineral Catalog (tech degree) JSON file
So, the part of this project where it asks us to:
"Write a script that constructs a mineral model instance for each mineral in minerals.json and saves them to a SQLite database."
I have tried every which way to open the download file for this project, minerals.json, and I constantly get a UnicodeDecodeError. Even if I set an "encoding" type, I just get the same UnicodeDecodeError with a different byte that it can't decode. Is there a special way to open this file that I didn't learn about, or are others having an issue with this file as well?
Also, this project doesn't list any resources or specify whether we must use Flask or Django, or whether it is up to us to pick a framework. So I just chose Django to build this project in.
EDIT
The error happens when line 10 of minerals.json is read. That line contains this string:
{
"unit cell": "a = 8.508 Å, b = 11.185 Åc=7.299 Å, α = 90.85°β = 114.1°, γ = 79.99°Z = 1"
}
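To see which characters in that value are outside ASCII (and so could trip up a non-UTF-8 code page), here is a small sketch using the standard unicodedata module. The helper name non_ascii_chars is just illustrative, and the string is copied from the snippet above:

```python
import unicodedata

# Value of the "unit cell" key, copied from the snippet quoted above
unit_cell = "a = 8.508 Å, b = 11.185 Åc=7.299 Å, α = 90.85°β = 114.1°, γ = 79.99°Z = 1"


def non_ascii_chars(text):
    """Return (index, char, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Print only ASCII fields (index, code point, name) so the report itself
# cannot hit a console code page error
for index, _, code_point, name in non_ascii_chars(unit_cell):
    print(index, code_point, name)
```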
7 Answers
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
So the only thing that seems to work at this point, and what I am going to stick with for now for Unicode issues on Windows, is running this in Command Prompt every time before I run the script:
chcp 65001
From all the reading I have done, this requires the least amount of nonsense I would need to force into my script file. Also, this does not stick as a permanent change; it only lasts for that instance of Command Prompt. Once I close it, it goes back to code page 437.
I have a feeling this is related to how the Region settings are set in Control Panel. Maybe the practical default would be for Windows to use code page 65001 (UTF-8) from the start instead of code page 437 (DOS).
NOTE: If you try to force a permanent code page change to something like 65001 (with a batch file or similar) and then restart your machine, it will fail the moment it tries to boot into the OS, and you would have to do a system restore.
If anyone finds a better solution than this, let me know. And if anyone else using Windows runs into this, share your issue too. I feel like some poor soul is going to get stuck in this same situation.
Kenneth Love
Treehouse Guest Teacher
Can you paste in the code that causes the exception and the exception message(s)?
When you changed the encoding, was it something like
with open("somefolder/minerals.json", encoding="utf-8") as data_file:
?
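Spelled out as a self-contained sketch: it writes a tiny stand-in for minerals.json as UTF-8 (the sample record is made up) and reads it back with the same explicit encoding. The helper load_json_utf8 is a hypothetical name:

```python
import json
import os
import tempfile

# Made-up record standing in for an entry of minerals.json
sample = [{"name": "Abelsonite", "mohs scale hardness": "2\u20133"}]


def load_json_utf8(path):
    """Read a JSON file, decoding it as UTF-8 instead of the platform default."""
    with open(path, encoding="utf-8") as data_file:
        return json.load(data_file)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "minerals.json")
    # ensure_ascii=False stores the en dash as a real character (3 UTF-8 bytes)
    with open(path, "w", encoding="utf-8") as out:
        json.dump(sample, out, ensure_ascii=False)
    minerals = load_json_utf8(path)

print(minerals == sample)  # True
```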
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
Sure can! So this is the relevant part of the function:
filename = os.path.join(BASE_DIR, options['filename'])
try:
    with open(filename, 'r', encoding='utf-8') as data_file:
        # Example 1
        print(data_file)  # This WORKS, will print this as an _io.TextIOWrapper obj
        # Example 2
        data = json.load(data_file)
        print(data)  # this would fail
        # Example 3
        for item in data_file:  # this fails at start of the loop
            print(item)
except FileNotFoundError:
    print('Error, that file does not exist.')
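One side note on the snippet above: if the three examples run one after another as written, json.load() has already read the file to the end, so the Example 3 loop would see no lines at all unless the file is rewound. A small sketch with io.StringIO standing in for the opened minerals.json (made-up content):

```python
import io
import json

# io.StringIO stands in for the opened minerals.json (made-up content)
data_file = io.StringIO('[{"name": "Abelsonite"}]\n')

data = json.load(data_file)   # json.load() calls read(), consuming the whole stream
leftover = list(data_file)    # so iterating now yields no lines
data_file.seek(0)             # rewind to the beginning
lines = list(data_file)       # and the raw lines come back

print(data)      # [{'name': 'Abelsonite'}]
print(leftover)  # []
print(lines)     # ['[{"name": "Abelsonite"}]\n']
```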
I started throwing anything at it, and even tried csv.DictReader. If I add extra exception handling to the above snippet, such as:
# Additional Exceptions
except UnicodeDecodeError:
    print('Unicode Decode Error')
except UnicodeEncodeError:
    print('Unicode Encode Error')
With that in place, my console output skips the stack trace and prints the following from minerals.json, but ONLY for the for loop; I had to comment out the json.load section because it just fails and skips the code below it.
[
{
"name": "Abelsonite",
"image filename": "240px-Abelsonite_-_Green_River_Formation%2C_Uintah_County%2C_Utah%2C_USA.jpg",
"image caption": "Abelsonite from the Green River Formation, Uintah County, Utah, US",
"category": "Organic",
"formula": "C<sub>31</sub>H<sub>32</sub>N<sub>4</sub>Ni",
"strunz classification": "10.CA.20",
"crystal system": "Triclinic",
Unicode Encode Error
This is where it fails, every time.
The very next iteration of that loop reaches the "unit cell" key in the dict, whose value I posted in the initial post. Maybe it's the symbol at column 60?
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
The actual error message reads:
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b2' in position 62: character maps to <undefined>
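The 'charmap' codec in that message is the machinery behind single-byte code pages like cp437, and an *Encode* error suggests the failure happens when text is converted to bytes for output, not when the file is read. A sketch of a diagnostic helper (first_unencodable is a hypothetical name) that finds the first character a given encoding cannot represent, using part of the unit cell value:

```python
def first_unencodable(text, encoding):
    """Return (index, char) for the first character `encoding` cannot represent, else None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the index of the offending character within `text`
        return exc.start, text[exc.start]
    return None


# Part of the unit cell value; cp437 has alpha and the degree sign but no beta
line = "\u03b1 = 90.85\u00b0\u03b2 = 114.1\u00b0, \u03b3 = 79.99\u00b0"

index, char = first_unencodable(line, "cp437")
print(index, "U+%04X" % ord(char))       # 10 U+03B2
print(first_unencodable(line, "utf-8"))  # None
```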
Kenneth Love
Treehouse Guest Teacher
Hmm, wonder if it's the json.load...
Try reading the file into a string and using json.loads()?
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
with open(filename, 'r', encoding='utf-8') as data_file:
    data = json.loads(data_file.read())
    print(data)
So something like this? With this one I get another UnicodeEncodeError, but much further in, past character 62.
Now the error reads:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 534: character maps to <undefined>
Which seems to be the en dash (U+2013).
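Since json.loads() now gets past the earlier spot, it looks like parsing succeeds and the error comes when print() sends the result to the console. One workaround just for inspecting the data, sketched with a made-up record: json.dumps() escapes non-ASCII characters by default (ensure_ascii=True), so its output is plain ASCII that any code page can display:

```python
import json

# Made-up record; U+2013 is the en dash from the error message above
record = {"mohs scale hardness": "2\u20133"}

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX escapes,
# so the text is pure ASCII and cannot trigger a console encoding error
safe_text = json.dumps(record, indent=2)
print(safe_text)
```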
Kenneth Love
Treehouse Guest Teacher
Hmm, OK.
json.loads(data_file.read(), encoding="utf-8")
?
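As far as I know, in Python 3 the encoding argument of json.loads() is ignored (it was deprecated and then removed in Python 3.9), because a str has already been decoded. Decoding happens at open() or explicitly on bytes; since Python 3.6, json.loads() also accepts UTF-8, UTF-16 or UTF-32 bytes directly (so the second route below would not work on 3.5). A minimal sketch of both routes with a made-up fragment:

```python
import json

# UTF-8 bytes as they would appear on disk (made-up fragment)
raw = '{"unit cell": "\u03b1 = 90.85\u00b0"}'.encode("utf-8")

# Route 1: decode explicitly, then parse the resulting str
from_str = json.loads(raw.decode("utf-8"))

# Route 2 (Python 3.6+): pass bytes; json detects UTF-8/16/32 on its own
from_bytes = json.loads(raw)

print(from_str == from_bytes)  # True
```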
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
Full error with path. Python is angry at me. LOL
File "C:\Program Files\Python35\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 27: character maps to <undefined>
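That traceback goes through cp437.py's encode, i.e. the console's output code page, which fits with the file being read fine and the print() being what fails. A sketch that simulates a console-like text stream with io.TextIOWrapper over an in-memory buffer (the helper name and setup are hypothetical; this is not literally what Command Prompt does):

```python
import io


def write_like_console(text, encoding):
    """Write text through a text stream with the given encoding, as print() would.

    Returns the encoded bytes, or the UnicodeEncodeError raised by the stream.
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    try:
        stream.write(text)  # print() ultimately calls write() on sys.stdout
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return buffer.getvalue()


failure = write_like_console("2\u20133", "cp437")
print(type(failure).__name__, failure.encoding)  # UnicodeEncodeError charmap
print(write_like_console("2\u20133", "utf-8"))   # b'2\xe2\x80\x933'
```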
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
So Kenneth,
I tried creating a new JSON file called minerals2.json.
I put in a list of three dictionaries, each containing four key/value string pairs. I filled them with some random string characters and also copied a few lines from minerals.json whose strings didn't contain any symbols.
It ran without errors and printed fine.
Jeremy McLain
Treehouse Guest Teacher
I downloaded the files from the project and ran this code in the console.
import json
minerals = json.load(open('minerals.json'))
It loaded fine, and all of the "special" Unicode characters printed to the console. I tried it in both Python 2 and 3. This is on a Mac. Perhaps the file is corrupt somehow. Can you try downloading and extracting the file again?
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
I redownloaded the zip, unpacked minerals.json into the proper directory, and tried to run it again.
I am beginning to feel this is a specific issue with Unicode + Windows + Python in Command Prompt. Searching with those keywords, I found this bug.
Going to look into this a bit more, but for right now I am not able to get Minerals Catalog working yet. :(
Jeremy McLain
Treehouse Guest Teacher
I just tried it on Windows and I got the error.
Python 3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 19:28:18) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> minerals = json.load(open('minerals.json'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python34\lib\json\__init__.py", line 265, in load
return loads(fp.read(),
File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 3596: character maps to <undefined>
Setting the encoding to utf-8 fixed it though.
>>> minerals = json.load(open('minerals.json', encoding='utf-8'))
>>> minerals
[{'image filename': '240px-Abelsonite_-_Green_River_Formation%2C_Uintah_County%2C_Utah%2C_USA.jpg', 'cleavage': 'Probable on {111}', 'name': 'Abelsonite', 'crystal system': 'Triclinic', 'mohs scale hardness': '2\u20133', 'group': 'Organic Minerals', 'diaphaneity': 'Semitransparent', 'optical properties': 'Biaxial', 'luster': 'Adamantine, sub-metallic', 'streak': 'Pink', 'formula': 'C<sub>31</sub>H<sub>32</sub>N<sub>4</sub>Ni', 'image caption': 'Abelsonite from the Green River Formation, Uintah County, Utah, US', 'color': 'Pink-purple, dark greyish purple, pale purplish red, reddish brown', 'strunz classification': '10.CA.20', 'unit cell': 'a = 8.508 Å, b = 11.185 Åc=7.299 Å, α = 90.85°\u03b2 = 114.1°, \u03b3 =
I can only assume that Python on Windows defaults to the system code page (cp1252 in the traceback above) instead of UTF-8 when reading files.
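Byte 0x9d has no mapping in cp1252 (the codec in the traceback above), so any UTF-8 sequence containing that byte fails to decode that way. A sketch using U+201D (right double quotation mark), whose UTF-8 bytes are e2 80 9d, purely as an example; it is not a claim about which character sits at position 3596 of minerals.json:

```python
# U+201D (right double quotation mark) is only an example of a character whose
# UTF-8 encoding contains byte 0x9d; it is not claimed to be in minerals.json
utf8_bytes = "\u201d".encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x9d'

try:
    utf8_bytes.decode("cp1252")
except UnicodeDecodeError as exc:
    # exc.start is the index of the undecodable byte within utf8_bytes
    print("cp1252 cannot decode byte", hex(utf8_bytes[exc.start]))  # 0x9d

print(utf8_bytes.decode("utf-8") == "\u201d")  # True
```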
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
So weird! Mine still doesn't run; I still get a Unicode error.
What happens when you go into the Python shell in CMD and type:
EDIT
>>> import sys
>>> sys.getdefaultencoding()
>>> print(sys.stdout.encoding)
I get
'utf-8'
cp437
I THINK the code page (cp437) may be the problem?
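Those settings are separate things, which may explain the mixed results: sys.getdefaultencoding() is the default for str/bytes conversions, locale.getpreferredencoding(False) is what open() uses when no encoding= is given, and sys.stdout.encoding is what print() encodes to. A small sketch that reports all three (encoding_report is a made-up helper):

```python
import locale
import sys


def encoding_report():
    """Return the three encodings relevant to this thread, as Python names them."""
    return {
        # Default for str.encode()/bytes.decode() with no argument; 'utf-8' on Python 3
        "str/bytes default": sys.getdefaultencoding(),
        # What open() uses when no encoding= is passed (e.g. cp1252 on many Windows setups)
        "open() default": locale.getpreferredencoding(False),
        # What print() encodes to; on Windows before Python 3.6 this follows the console
        # code page (e.g. cp437), while 3.6+ uses UTF-8 for the console (PEP 528)
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for key, value in encoding_report().items():
    print(key, "->", value)
```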
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
Also, I found this bug.
Would you be able to give Python 3.5.2 a try on your Windows OS and do this again?
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
So I have another desktop PC, also running Windows 10, though it did not have Python installed on it.
Since you were running Python 3.4.4 (32-bit) on Windows, I thought I would try the same.
I went ahead and, in this order, installed each of the following, tried running the script with minerals.json, and then uninstalled it:
- Python 3.4.4 (32bit then 64bit)
- Python 3.5.2 (32bit then 64bit)
No success with any of them. Ugh.
James J. McCombie
Python Web Development Techdegree Graduate 21,199 Points
Hello, this might be a bit late, but... I had the same problem initially, and this worked:
with open('<filepath>', encoding='utf8', mode='r') as file:
    data = json.load(file)
The only difference I can see is 'utf-8' versus 'utf8'.
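For what it's worth, Python treats 'utf8' and 'utf-8' (and 'UTF-8', 'utf_8') as aliases of one codec, so the spelling alone probably isn't what made the difference; a quick check with codecs.lookup():

```python
import codecs

# Different spellings resolve to one codec; CodecInfo.name is the canonical name
spellings = ["utf-8", "utf8", "UTF-8", "utf_8"]
canonical = {codecs.lookup(name).name for name in spellings}
print(canonical)  # {'utf-8'}
```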
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
From what I remember, this didn't work for me either way. But since this was two months ago, I can't recall every step, though I'm sure this was one of them. Now I'll have to give it another shot on the machine where I had the original problem, just out of curiosity and as a sanity check. I'll let you know what comes of it. :)
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
Okay, so I think I found something that worked for me. I got it to load EVERYTHING from that minerals.json file now.
I followed these steps from this reported bug, even though it's fairly outdated, and ran them in Windows PowerShell:
Glenn Linderman's Solution
set PYTHONIOENCODING=UTF-8
cmd /u /k chcp 65001
set PYTHONIOENCODING=
exit
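The PYTHONIOENCODING part of that recipe overrides the encoding Python uses for stdin, stdout and stderr. A small sketch that confirms the effect from Python itself by starting a child interpreter with the variable set (child_stdout_encoding is a hypothetical helper; universal_newlines is used instead of text= so it also runs on 3.5/3.6):

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and return its stdout encoding name."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# Normalize through codecs.lookup(), since 'UTF-8' and 'utf-8' name the same codec
print(codecs.lookup(child_stdout_encoding("UTF-8")).name)  # utf-8
```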
EDIT
This only works while that CMD window is open; the moment you close it, you would have to retype these to get it working again.
NOTE: I found this is not recommended even as a temporary fix, since it can cause other problems when you try to run .bat files and such. So I'm downvoting this ;-)
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
I'm not sure if
set PYTHONIOENCODING=
even mattered at this point; the first two seemed to do the trick. Well, the first one did the trick, but the symbols weren't displaying properly.
BEFORE ANY COMMANDS http://d.pr/i/YVlP
AFTER set PYTHONIOENCODING=UTF-8 https://d.pr/sFRx
AFTER cmd /u /k chcp 65001 https://d.pr/3BUI
Chris Howell
Python Web Development Techdegree Graduate 49,702 Points
Lacey Williams Henschel or Kenneth Love,
would you have any insight into this? Most of the posts on Stack Overflow just switch the encoding, use some really crazy methods, or aren't working with the same data type.
I am running on Windows 10, Python 3.5.2 (64 bit).
No matter how I open the file (at the moment I am doing it the recommended way), it WILL open, but if I attempt to loop over that line, use a library that loops over it, or even just print the entire dictionary to the console, it raises a Unicode error at that point.
I have tried changing the open() encoding. I have tried calling .encode() on the line itself.
I am stumped. I am probably missing something small.
I am going to go pull my files over to my Linux machine and see if I can get the same error.
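On the .encode() attempt: encoding a line only helps if the result is then written in a form the console accepts. One sketch, assuming the goal is just to see the data: encode with the stream's own encoding and errors='backslashreplace', so unsupported characters such as β show up as \u03b2 escapes instead of raising. safe_print is a made-up helper, and a cp437 io.TextIOWrapper stands in for the console:

```python
import io
import sys


def safe_print(text, stream=None):
    """Print text, replacing characters the stream's encoding lacks with \\uXXXX escapes."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the target encoding with a fallback handler, so the str
    # handed to print() only contains characters that encoding can represent
    printable = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(printable, file=stream)


# A cp437 text stream over an in-memory buffer stands in for the Windows console
buffer = io.BytesIO()
console = io.TextIOWrapper(buffer, encoding="cp437", newline="\n")
safe_print("\u03b1 = 90.85\u00b0\u03b2 = 114.1\u00b0", console)
console.flush()
print(buffer.getvalue())
```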