unicode errors #117

Closed
wants to merge 2 commits into from

s2hc-johan
Contributor

Decode the output if we're running on Python 2.

@@ -131,7 +132,10 @@ def make_json(self, posts, descriptions, previewimage, output_path, lang):
recent_posts.append(entry)
data = json.dumps(recent_posts, indent=2, sort_keys=True)
with io.open(output_path, "w+", encoding="utf8") as outf:
outf.write(data)
if sys.version_info[0] != 3:
Member

This is not right.

On Python 2, json.dumps() can return a unicode object in some cases. It doesn't by default, but we can pass ensure_ascii=False to change that:

data = json.dumps(recent_posts, ensure_ascii=False, indent=2, sort_keys=True)
with io.open(output_path, "w+", encoding="utf-8") as outf:
    try:
        outf.write(data.decode('utf-8'))
    except AttributeError:
        outf.write(data)

Contributor Author

Absolutely, we can do it like that. I don't know why the first commit is wrong though; isn't JSON UTF-8 by design? In Python 2, ".decode('utf-8')" works on both str and unicode objects.

Member

It does not work properly.

>>> u"ą".decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0105' in position 0: ordinal not in range(128)
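
The error comes from the implicit ASCII encode that Python 2 performs when .decode() is called on an object that is already unicode. A minimal sketch of that hidden step, written so that it also runs on Python 3 (the variable names are only illustrative):

```python
# Python 2 evaluates u"ą".decode('utf-8') roughly as
# u"ą".encode('ascii').decode('utf-8'); the encode step is what fails.
text = u"ą"
error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Same exception type as in the traceback above.
    error = exc
    print(type(exc).__name__)
```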

Please switch it to the solution I recommended.

Member

Actually, in this case we need

except (AttributeError, UnicodeEncodeError, UnicodeDecodeError):
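
Combined with ensure_ascii=False, the whole write step could look roughly like the sketch below. The helper name write_json and the sample data are hypothetical and not part of the PR; only the json.dumps() and io.open() calls mirror the code under review.

```python
import io
import json
import os
import tempfile


def write_json(recent_posts, output_path):
    """Hypothetical helper: dump recent_posts as UTF-8 JSON on Python 2 and 3."""
    # ensure_ascii=False keeps non-ASCII characters instead of \uXXXX escapes.
    data = json.dumps(recent_posts, ensure_ascii=False, indent=2, sort_keys=True)
    with io.open(output_path, "w+", encoding="utf-8") as outf:
        try:
            # Python 2 byte string: decode to unicode before writing.
            outf.write(data.decode("utf-8"))
        except (AttributeError, UnicodeEncodeError, UnicodeDecodeError):
            # Python 3 str has no .decode(); a Python 2 unicode object can
            # fail the implicit ASCII encode. Either way, write data unchanged.
            outf.write(data)


# Usage: write a sample entry to a temporary file and read it back.
path = os.path.join(tempfile.mkdtemp(), "recent.json")
write_json([{"title": u"ą"}], path)
with io.open(path, encoding="utf-8") as inf:
    written = inf.read()
```

On Python 3 the try branch always raises AttributeError, so the except branch is the one that writes the file.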

Contributor Author

Yeah, excepting more things makes it clearer.

We don't decode random unicode, we decode output from json.dumps:

>>> import json
>>> json.dumps([u"ą"]).decode('utf-8')
u'["\\u0105"]'
>>>
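
For comparison, a small sketch of how ensure_ascii changes the result. It is written for Python 3, where both calls return str; on Python 2 the second call can return a unicode object instead, which is the case discussed above.

```python
import json

# Default (ensure_ascii=True): non-ASCII characters are escaped as \uXXXX.
escaped = json.dumps([u"ą"])
# ensure_ascii=False: non-ASCII characters are kept as they are.
literal = json.dumps([u"ą"], ensure_ascii=False)

print(escaped)
print(literal)
```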

Member

Just do it with ensure_ascii=False. More modern.

Member

ralsina commented May 2, 2018

We no longer care about Python 2.

ralsina closed this May 2, 2018
3 participants