unicode errors #117
@@ -131,7 +132,10 @@ def make_json(self, posts, descriptions, previewimage, output_path, lang):
recent_posts.append(entry)
data = json.dumps(recent_posts, indent=2, sort_keys=True)
with io.open(output_path, "w+", encoding="utf8") as outf:
outf.write(data)
if sys.version_info[0] != 3:
This is not right.
On Python 2, json.dumps() can return unicode in some cases. It doesn't by default, but we could pass ensure_ascii=False to change that:
data = json.dumps(recent_posts, ensure_ascii=False, indent=2, sort_keys=True)
with io.open(output_path, "w+", encoding="utf-8") as outf:
    try:
        outf.write(data.decode('utf-8'))
    except AttributeError:
        outf.write(data)
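For comparison, here is a minimal sketch of the same idea that checks the type explicitly instead of relying on try/except. The helper name write_json and the temporary file are hypothetical, not part of this PR; it assumes that on Python 2 json.dumps() hands back either a byte string or unicode, and that on Python 3 it always returns text.

```python
import io
import json
import os
import tempfile


def write_json(obj, output_path):
    """Serialize obj to output_path as UTF-8 JSON (hypothetical helper)."""
    data = json.dumps(obj, ensure_ascii=False, indent=2, sort_keys=True)
    # On Python 2, bytes is str, and json.dumps() may return it when the
    # input holds no unicode objects. io.open() in text mode wants text only.
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    with io.open(output_path, "w", encoding="utf-8") as outf:
        outf.write(data)


# Usage: round-trip a non-ASCII title through a temporary file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "recent.json")
write_json([{"title": u"\u0105"}], path)
with io.open(path, encoding="utf-8") as inf:
    written = inf.read()
```

The isinstance check never calls .decode() on text, so it avoids depending on which exception a failed decode happens to raise.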
Absolutely, we can do it like that. I don't know why the first commit is wrong, though; isn't JSON UTF-8 by design? In Python 2, ".decode('utf-8')" works on both str and unicode objects.
It does not work properly.
>>> u"ą".decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0105' in position 0: ordinal not in range(128)
Please switch it to the solution I recommended.
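The error in the traceback above is an encode error even though decode() was called, because Python 2 first implicitly encodes a unicode object with the ASCII codec before decoding it. A small Python 3 sketch of that hidden step (the variable names are illustrative only):

```python
text = "\u0105"  # the character from the traceback above

# Python 2 ran the equivalent of this ASCII encode behind the scenes
# whenever .decode() was called on a unicode object.
try:
    text.encode("ascii")
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__

# Python 3 removes the trap entirely: text strings have no .decode().
has_decode = hasattr(text, "decode")
```

This is also why the except AttributeError branch is the one that fires on Python 3.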
Actually, in this case we need
except (AttributeError, UnicodeEncodeError, UnicodeDecodeError):
Yeah, catching more exceptions makes it clearer.
We don't decode random unicode, we decode output from json.dumps:
>>> import json
>>> json.dumps([u"ą"]).decode('utf-8')
u'["\\u0105"]'
>>>
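To make the difference between the two settings concrete, here is a short sketch run on Python 3 (variable names are illustrative). With the default ensure_ascii=True the output is pure ASCII with \u escapes, which is why decoding it never fails; with ensure_ascii=False the character is emitted literally.

```python
import json

escaped = json.dumps(["\u0105"])                      # default: ensure_ascii=True
literal = json.dumps(["\u0105"], ensure_ascii=False)  # keeps the character as-is

# Both forms parse back to the same value; only the on-disk text differs.
same_value = json.loads(escaped) == json.loads(literal)
```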
Just do it with ensure_ascii=False. More modern.
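If Python 2 support is dropped, the decode step can go away altogether. A minimal Python 3 only sketch, assuming a hypothetical helper name dump_json_py3 and a temporary output path:

```python
import json
import os
import tempfile


def dump_json_py3(obj, output_path):
    """Python 3 only: write obj as UTF-8 JSON (hypothetical helper)."""
    with open(output_path, "w", encoding="utf-8") as outf:
        # json.dump() streams text straight into the file object.
        json.dump(obj, outf, ensure_ascii=False, indent=2, sort_keys=True)


# Usage: confirm the file holds raw UTF-8 bytes rather than \u escapes.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "recent.json")
dump_json_py3([{"title": "\u0105"}], path)
with open(path, "rb") as inf:
    raw = inf.read()
```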
We no longer care about python 2
make decode if we're using python2