Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3: port tests in cookie-store #24531

Merged
merged 10 commits into from Jul 28, 2020

Conversation

ziransun
Copy link
Member

@ziransun ziransun commented Jul 9, 2020

No description provided.

cookie-store/resources/cookie_helper.py Outdated Show resolved Hide resolved
cookie-store/resources/cookie_helper.py Outdated Show resolved Hide resolved
cookie-store/resources/cookie_helper.py Outdated Show resolved Hide resolved
cookie-store/resources/cookie_helper.py Outdated Show resolved Hide resolved
cookie-store/resources/cookie_helper.py Outdated Show resolved Hide resolved
@ziransun
Copy link
Member Author

@Hexcles : I actually have some UnicodeEncodeErrors while running in Python 3 (happening before and after code updates with review comments ). For example, running change_eventhandler_for_http_cookie_and_set_cookie_headers.tentative.https.window.html throws

File "/home/ziran/WPT/wpt-commits/wpt3/cookie-store/resources/cookie_helper.py", line 46, in main
args = parse_qs(qd, keep_blank_values = True)
File "/usr/local/lib/python3.8/urllib/parse.py", line 678, in parse_qs
pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
File "/usr/local/lib/python3.8/urllib/parse.py", line 745, in parse_qsl
value = _coerce_result(value)
File "/usr/local/lib/python3.8/urllib/parse.py", line 104, in _encode_result
return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f36a' in position 5: ordinal not in range(128)

and

...
UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 17: ordinal not in range(128)

I couldn't see anything obvious at the moment. Any suggestions?

@Hexcles
Copy link
Member

Hexcles commented Jul 10, 2020

That's due to the change in behaviours of parse_qs, which now does decoding in Python 3: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.parse_qs (see the encoding parameter).

You'd have to do something like:

if PY3:
    parse_qs(..., encoding='iso-8859-1')

(sigh)

@ziransun
Copy link
Member Author

That's due to the change in behaviours of parse_qs, which now does decoding in Python 3: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.parse_qs (see the encoding parameter).

You'd have to do something like:

if PY3:
    parse_qs(..., encoding='iso-8859-1')

(sigh)

The complaints have changed to :
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-8: ordinal not in range(128)
and
UnicodeEncodeError: 'ascii' codec can't encode characters in position 17-19: ordinal not in range(128)

I will try to dig a bit more on this. Meanwhile, if you see anything obvious, let me know. Thank you!

@Hexcles
Copy link
Member

Hexcles commented Jul 10, 2020

Maybe paste the full traceback and try to print out the problematic string.

@ziransun
Copy link
Member Author

ziransun commented Jul 10, 2020

Maybe paste the full traceback and try to print out the problematic string.

0:02.28 INFO STDOUT: printing qd:
 0:02.28 INFO STDOUT: b'&set-cookie=HTTP-%F0%9F%8D%AA%3D%F0%9F%94%B5%3B%20path%3D%2F'
 0:02.28 WARNING Traceback (most recent call last):
  File "/home/ziran/WPT/wpt-commits/wpt3/tools/wptserve/wptserve/handlers.py", line 332, in __call__
    rv = self.func(request, response)
  File "/home/ziran/WPT/wpt-commits/wpt3/cookie-store/resources/cookie_helper.py", line 51, in main
    args = parse_qs(qd, keep_blank_values = True, encoding='iso-8859-1')
  File "/usr/local/lib/python3.8/urllib/parse.py", line 678, in parse_qs
    pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
  File "/usr/local/lib/python3.8/urllib/parse.py", line 745, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.8/urllib/parse.py", line 104, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-8: ordinal not in range(128)

 0:02.42 INFO STDOUT: printing qd:
 0:02.42 INFO STDOUT: b'charset=iso-8859-1&set-cookie=HTTP-cookie%3Dvalue%EF%BF%BD%3B%20path%3D%2F'
 0:02.42 WARNING Traceback (most recent call last):
  File "/home/ziran/WPT/wpt-commits/wpt3/tools/wptserve/wptserve/handlers.py", line 332, in __call__
    rv = self.func(request, response)
  File "/home/ziran/WPT/wpt-commits/wpt3/cookie-store/resources/cookie_helper.py", line 51, in main
    args = parse_qs(qd, keep_blank_values = True, encoding='iso-8859-1')
  File "/usr/local/lib/python3.8/urllib/parse.py", line 678, in parse_qs
    pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
  File "/usr/local/lib/python3.8/urllib/parse.py", line 745, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.8/urllib/parse.py", line 104, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 17-19: ordinal not in range(128)

And more here -

FAIL CookieStore agreed with HTTP headers agree on encoding non-ASCII cookies - assert_equals: CGI should have succeeded in setCookieStringHttp set-cookie: HTTP-🍪=🔵; path=/
{"error": {"code": 500, "message": "Traceback (most recent call last):\n  File \"/home/ziran/WPT/wpt-commits/wpt3/tools/wptserve/wptserve/handlers.py\", line 332, in __call__\n    rv = self.func(request, response)\n  File \"/home/ziran/WPT/wpt-commits/wpt3/cookie-store/resources/cookie_helper.py\", line 51, in main\n    args = parse_qs(qd, keep_blank_values = True, encoding='iso-8859-1')\n  File \"/usr/local/lib/python3.8/urllib/parse.py\", line 678, in parse_qs\n    pairs = parse_qsl(qs, keep_blank_values, strict_parsing,\n  File \"/usr/local/lib/python3.8/urllib/parse.py\", line 745, in parse_qsl\n    value = _coerce_result(value)\n  File \"/usr/local/lib/python3.8/urllib/parse.py\", line 104, in _encode_result\n    return obj.encode(encoding, errors)\nUnicodeEncodeError: 'ascii' codec can't encode characters in position 5-8: ordinal not in range(128)\n"}} expected true but got false
    at setCookieStringHttp (https://web-platform.test:8443/cookie-store/resources/cookie-test-helpers.js:108:3)
    at async https://web-platform.test:8443/cookie-store/change_eventhandler_for_http_cookie_and_set_cookie_headers.tentative.https.window.js:53:3
    at async Test.<anonymous> (https://web-platform.test:8443/cookie-store/resources/cookie-test-helpers.js:221:14)
FAIL Binary HTTP set/overwrite/delete observed in CookieStore - assert_equals: CGI should have succeeded in setCookieBinaryHttp set-cookie: HTTP-cookie=value�; path=/
{"error": {"code": 500, "message": "Traceback (most recent call last):\n  File \"/home/ziran/WPT/wpt-commits/wpt3/tools/wptserve/wptserve/handlers.py\", line 332, in __call__\n    rv = self.func(request, response)\n  File \"/home/ziran/WPT/wpt-commits/wpt3/cookie-store/resources/cookie_helper.py\", line 51, in main\n    args = parse_qs(qd, keep_blank_values = True, encoding='iso-8859-1')\n  File \"/usr/local/lib/python3.8/urllib/parse.py\", line 678, in parse_qs\n    pairs = parse_qsl(qs, keep_blank_values, strict_parsing,\n  File \"/usr/local/lib/python3.8/urllib/parse.py\", line 745, in parse_qsl\n    value = _coerce_result(value)\n  File \"/usr/local/lib/python3.8/urllib/parse.py\", line 104, in _encode_result\n    return obj.encode(encoding, errors)\nUnicodeEncodeError: 'ascii' codec can't encode characters in position 17-19: ordinal not in range(128)\n"}} expected true but got false
    at setCookieBinaryHttp (https://web-platform.test:8443/cookie-store/resources/cookie-test-helpers.js:140:3)
    at async https://web-platform.test:8443/cookie-store/change_eventhandler_for_http_cookie_and_set_cookie_headers.tentative.https.window.js:141:3
    at async Test.<anonymous> (https://web-platform.test:8443/cookie-store/resources/cookie-test-helpers.js:221:14)

@ziransun
Copy link
Member Author

@Hexcles : Looking into python3.8/urllib/parse.py, obj.encode(encoding, errors) , a helpers for bytes handling, uses _implicit_encoding = 'ascii'.

I've updated code and decode qd before passing to parse_qs(). Tests passed for both Python 2 and python 3 in my local runs.

Do let me know if I have missed anything. Thank you!

Copy link
Contributor

@inexorabletash inexorabletash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but maybe some refactoring to pull the if PY3 logic out.

cookie-store/resources/cookie_helper.py Outdated Show resolved Hide resolved
Copy link
Member

@Hexcles Hexcles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed investigation. Python 3 really does not like URLs in bytes, and uses ASCII as the implicit encoding unconditionally to coerce the results: https://github.com/python/cpython/blob/b4cd77de05e5bbaa6a4be90f710b787e0790c36f/Lib/urllib/parse.py#L90-L97

@ziransun could you try keeping request.url as text strings (use isomorphic_decode on L45 and remove the PY3-only decode on L49)?

cookie-store/resources/cookie_helper.py Outdated Show resolved Hide resolved
@ziransun
Copy link
Member Author

@ziransun could you try keeping request.url as text strings (use isomorphic_decode on L45 and remove the PY3-only decode on L49)?

In this case, in python 2 we are passing a unicode type string to quote() function. I guess it should be string type? A Python 2 run with this change threw KeyError, e.g.

File .../cookie-store/resources/cookie_helper.py", line 46, in quote_str
    return quote(cookie_str, '')
  File "/usr/lib/python2.7/urllib.py", line 1304, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xef'

@ziransun
Copy link
Member Author

@Hexcles Any further comments on this?

@ziransun ziransun closed this Jul 21, 2020
@ziransun ziransun reopened this Jul 21, 2020
Copy link
Member

@Hexcles Hexcles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. This one slipped through...

cookie-store/resources/cookie_helper.py Outdated Show resolved Hide resolved
cookie-store/resources/cookie_helper.py Outdated Show resolved Hide resolved
Comment on lines 71 to 73
for set_cookie in args.get('set-cookie', []):
if '=' in set_cookie.split(';', 1)[0]:
name, rest = set_cookie.split('=', 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing prefixes here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefixes were deliberately left out here. I think set_cookie here is a native string here. Is it okay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh right, because it's from parse_qs...

Unfortunately, this will be problematic if/when we start to enforce prefixes for all string literals. I think we'd need to add some logic to both quote_str and parse_qs_str to always return binary strings (add isomorphic_encode in the PY3 branch).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @Hexcles! Code updated. Please review.

body = '\r\n'.join(body)
headers.append(('content-length', str(len(body))))
headers.append((b'content-length', len(body)))
return 200, headers, body
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like body is a native string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code updated.

Copy link
Member

@Hexcles Hexcles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for bearing with me through this!

@Hexcles Hexcles merged commit 79d185a into web-platform-tests:master Jul 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants