Python

Oziel Perez
61,321 Points

Unicode issues while using peewee, pymysql, and python

I have set up a connection to a MySQL database with Python 3.5.1, using the Peewee ORM. As a test, I wrote a script that fetches all the records in a table and prints them out. When I run the script in the terminal, it works perfectly. But when I run it from the browser, I only get a single record. Here's the code:

#!/Applications/AMPPS/python/bin/python3.5
# -*- coding:UTF-8 -*-

# Step 1) import peewee functions for working with databases, automatically imports PyMySQL for you
from peewee import *

# Step 2) create a MySQL database object to be stored in a variable. Setup with dbname, hostname, port #, username and password
db = MySQLDatabase("mydatabase", host="localhost", port=3306, user="myusername", passwd="mypassword", use_unicode=True, charset='utf8')

# Step 3) create a model that will layout a table from the database
class Translations(Model):
    # A base model that will use our MySQL database
    Page = CharField(max_length=100, unique=False)
    Element = CharField(max_length=50, unique=True)
    English = TextField()
    Spanish = TextField()

    class Meta:
        database = db

# Step 4) if the file is run from the start and not imported, connect to database and start queries
if __name__ == "__main__":
    db.connect()
    result = db.execute_sql("SELECT * FROM Translations") # returns a Cursor object. If selecting, you must now fetch records
    print("Content-type:text/html;charset=utf-8 \r\n\r\n")
    for row in result.fetchall(): #When selecting, fetchall will return all records as tuples
        text = "<div>" + row[0] + ", " + row[1] + ", " + row[2] + ", " + row[3] + "</div>";
        print(text)

After some more debugging, I checked the error logs on the virtual server where I'm running this script and noticed that I kept getting this error:

[Tue Aug 30 15:39:53.559132 2016] [cgi:error] [pid 22981] [client ::1:49561] AH01215: Traceback (most recent call last):: /Applications/AMPPS/www/sample/mysql-test.py
[Tue Aug 30 15:39:53.559437 2016] [cgi:error] [pid 22981] [client ::1:49561] AH01215: File "/Applications/AMPPS/www/sample/mysql-test.py", line 29, in <module>: /Applications/AMPPS/www/sample/mysql-test.py
[Tue Aug 30 15:39:53.559470 2016] [cgi:error] [pid 22981] [client ::1:49561] AH01215: print(text): /Applications/AMPPS/www/sample/mysql-test.py
[Tue Aug 30 15:39:53.559601 2016] [cgi:error] [pid 22981] [client ::1:49561] AH01215: UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 595: ordinal not in range(128): /Applications/AMPPS/www/sample/mysql-test.py

Apparently, the print function is encoding to ASCII when it should be encoding to UTF-8. That makes sense, as this script queries a table containing translations of various texts from English to Spanish, and some characters have accent marks or tildes. I've tried calling encode("utf-8") on the text, and it prints everything out, but as byte strings, so I see the escape codes for all the non-ASCII characters. I've searched all over the internet for a solution, but no luck. If anyone knows how to fix this, it would be much appreciated. Thanks.
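One possible way around this is sketched below. It assumes the CGI process starts without a UTF-8 locale, so Python 3.5 gives `sys.stdout` an ASCII encoding. The idea is to encode the page yourself and write the resulting bytes to a binary stream such as `sys.stdout.buffer`. (Calling `print(text.encode("utf-8"))` shows `b'...'` because `print` outputs the repr of the bytes object, not the raw bytes.) The helper names `render_rows` and `write_cgi_response` are made up for this illustration, and the sample row stands in for `result.fetchall()`.

```python
import io


def render_rows(rows):
    """Build one <div> per row; each row is a sequence of strings."""
    return "".join("<div>" + ", ".join(row) + "</div>\n" for row in rows)


def write_cgi_response(binary_out, rows):
    """Write the CGI header and the body as UTF-8 bytes.

    Writing bytes sidesteps whatever text encoding sys.stdout was given.
    """
    binary_out.write(b"Content-Type: text/html; charset=utf-8\r\n\r\n")
    binary_out.write(render_rows(rows).encode("utf-8"))
    binary_out.flush()


# Hypothetical stand-in for the rows returned by result.fetchall().
sample_rows = [("About SAS", "overview-p", "that client\u2019s needs", "El \u00e9xito")]

# An in-memory stream is used here so the sketch can run anywhere.
demo_out = io.BytesIO()
write_cgi_response(demo_out, sample_rows)
```

In the actual script, the call would presumably look like `write_cgi_response(sys.stdout.buffer, result.fetchall())` after `import sys`, replacing the `print` calls.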

Chris Howell
Python Web Development Techdegree Graduate 49,702 Points

So I copied your code into a script file, created a local database, and set up the proper fields. I threw some random values into those fields and it printed just fine, though I didn't use any non-ASCII characters in those fields.

Could you give me sample data similar to what you are using in the Translations table? Just one entry for the fields Page, Element, English, and Spanish is fine, and you can completely make it up as long as you use similar special characters, since that is where it sounds like it is failing.

Oziel Perez
61,321 Points

Here's some sample data:

Page - "About SAS"
Element - "overview-p"
English - "<p>The success of an accounting firm depends upon how well it understands the needs of its clients and how competently and efficiently it meets those needs. We are committed to developing a relationship with each client that will foster an understanding of that client's needs and, above all, to maintaining the highest technical and ethical standards of our profession. Salinas, Allen & Schmitt, L.L.P. serves a diverse clientele with simple to complex tax, accounting and auditing needs. Typical industries served include:</p>"
Spanish - "<p>El éxito de una empresa de contabilidad depende de lo bien que entiende las necesidades de sus clientes y la forma eficiente que satisfaga esas necesidades. Estamos comprometidos a desarrollar una relación con cada cliente que fomente la comprensión de las necesidades de ese cliente y, sobre todo, mantenga los más altos estándares técnicos y éticos de nuestra profesión. SAS LLP sirve a una clientela diversa con necesidades de impuestos, contabilidad y auditoría. Industrias típicamente servidas incluyen:</p>"

Not all translation texts have HTML elements. Some have them just for formatting purposes. This Spanish text contains an accented e (é), which you could use for testing.

For a live working example of the text being served, go to https://www.sasllpcpa.com/about_sas/. That page shows the same text, served with PHP and MySQL.

Oziel Perez
61,321 Points

Also, I thought semicolons at the end of a statement have no effect on the code, right?

Chris Howell
Python Web Development Techdegree Graduate 49,702 Points

I copy/pasted all of that sample data into my DB. It had no Unicode error.

But when I added some Alt-code characters, some had no effect and others did give me a UnicodeEncodeError.

The character in the error you are receiving, can't encode character '\u2019', is the Right Single Quotation Mark (’).

I am assuming mine didn't fail because, when you pasted the content in, teamtreehouse converted the invalid characters into the proper HTML characters. The only quote mark I am seeing is the one in the English version of that sample data, where it says "that will foster an understanding of that client's needs and,".

Did you copy this from something like WordPad or MS Word and insert the text directly, without filtering it? MS Word and some other editors insert special characters, such as curly quotes, which may not be compatible with the web.
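To double-check which character the traceback is complaining about, a small sketch like the following can help. It uses the standard `unicodedata` module to look up the name of `'\u2019'` and reproduces the same `UnicodeEncodeError` by encoding to ASCII by hand. The helper names `describe` and `ascii_failure` are hypothetical, chosen only for this example.

```python
import unicodedata


def describe(char):
    """Return the code point and official Unicode name of one character."""
    return "U+{:04X} {}".format(ord(char), unicodedata.name(char, "<unnamed>"))


def ascii_failure(text):
    """Return the UnicodeEncodeError raised by ASCII-encoding text, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


curly = "client\u2019s"
error = ascii_failure(curly)

print(describe("\u2019"))  # U+2019 RIGHT SINGLE QUOTATION MARK
print(error.start)         # 6, the index of the curly quote in the string
```

The `start` attribute of the exception corresponds to the "position" reported in the log, so it points at the offending character within the string that was being printed.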

Oziel Perez
61,321 Points

I see.... so basically the solution is... I have to manually go in and replace those characters?

Also, submit your answer so you can get your points

Chris Howell
Python Web Development Techdegree Graduate 49,702 Points

So that character shouldn't be hard to find. It looks different from a single quote, as you see here: '

It has that little "curl" on it if it came from something like MS Word or such.

I had a client who always wrote things in MS Word and then pasted them into the news updates on her site. She would always get some crazy symbols that came directly from MS Word and weren't compatible with the web; they were almost always "fancy quote" marks.
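If you would rather swap the "fancy quote" marks for plain ones in code than hunt for them by hand, a minimal sketch could look like this. It assumes only the four common curly quote characters need replacing. The `SMART_QUOTES` table and the `straighten_quotes` helper are illustrative names, not part of Peewee or PyMySQL. Note that this step is optional if the page is actually written out as UTF-8, since the curly quotes are valid Unicode text.

```python
# Translation table from curly quotes to their plain ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def straighten_quotes(text):
    """Replace curly quotes with straight ones; other characters are untouched."""
    return text.translate(SMART_QUOTES)


cleaned = straighten_quotes("that client\u2019s \u201cneeds\u201d")
print(cleaned)  # that client's "needs"
```

Accented letters such as é pass through unchanged, so the Spanish text would still need UTF-8 output to display correctly.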

Chris Howell
Python Web Development Techdegree Graduate 49,702 Points

Also, I am assuming the Firm Profile and SAS Cares sections both load from this table as well.

There is a good chance that more than just this one section is affected by a possible "fancy single quote" situation. Once you double-check your single quotes, I would see whether the error message changes position or disappears completely :)
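To check every section at once rather than one error at a time, something along these lines could scan the fetched rows for curly quotes. This is only a sketch: `curly_quote_positions` and `scan_rows` are hypothetical helpers, and the two rows are made-up stand-ins for what `result.fetchall()` would return.

```python
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"


def curly_quote_positions(text):
    """Return the index of every curly quote character in text."""
    return [i for i, ch in enumerate(text) if ch in CURLY_QUOTES]


def scan_rows(rows):
    """Map (row_index, column_index) to the curly quote positions in that field."""
    hits = {}
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            positions = curly_quote_positions(value)
            if positions:
                hits[(r, c)] = positions
    return hits


# Made-up sample rows; only the first one contains a curly quote.
rows = [
    ("About SAS", "overview-p", "that client\u2019s needs", "El \u00e9xito"),
    ("Firm Profile", "intro-p", "plain text", "texto"),
]
print(scan_rows(rows))  # {(0, 2): [11]}
```

An empty result would suggest that no curly quotes remain in the table, although other non-ASCII characters (such as accented letters) would still trigger the same error under an ASCII stdout.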
