Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A method to escape a string into uri safe structure #3515

Closed
wants to merge 3 commits into from
Closed

A method to escape a string into uri safe structure #3515

wants to merge 3 commits into from

Conversation

tbrand
Copy link

@tbrand tbrand commented Nov 7, 2016

Hi there,
There is a critical difference between Ruby and Crystal for an implementation of URI.escape.

require "uri"
p URI.escape("https://crystal-lang.org/")

Then,

$ ruby sample.cr
-> "https://crystal-lang.org/"
$ crystal sample.cr
-> "https%3A%2F%2Fcrystal-lang.org%2F"

So it is difficult to escape a String in uri-safe structure like as Ruby's behavior.
So I added a method to keep the String url structure.

require "uri"
p URI.escape_uri_safe(":/?#[]@!$&\'()*+,;= ")
p URI.escape_uri_safe("https://crystal-lang.org?test=あああ")

Then,

$ crystal sample.cr
-> ":/?#[]@!$&'()*+,;=%20"
   "https://crystal-lang.org?test=%E3%81%82%E3%81%82%E3%81%82"

May be this issue is involved.

Thanks for your review.

taicsuzu added 3 commits November 7, 2016 23:36

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@0x1eef
Copy link

0x1eef commented Nov 7, 2016

it looks like Crystal URI.escape has the same behaviour as URI.encode_www_form_component in Ruby. why not copy Ruby? it has:

  • URI.escape
  • URI.encode_www_form_component
  • URI.decode_www_form_component

@tbrand
Copy link
Author

tbrand commented Nov 7, 2016

Ruby's encode_www_form_component replaces reserved characters as well.

[Ruby]
URI.encode_www_form_component("https://crystal-lang.org") # => https%3A%2F%2Fcrystal-lang.org
URI.escape("https://crystal-lang.org") # => https://crystal-lang.org (I need this!)
[Crystal]
URI.escape("https://crystal-lang.org") # => https%3A%2F%2Fcrystal-lang.org
URI.escape_uri_safe("https://crystal-lang.org") # => https://crystal-lang.org (My implementation)

I agree that the relation between Ruby's implementation and mine is very confusing.
So if it is ok, I want to rename the original 'escape' method to 'encode_www_form_component' and mine to 'escape'.
On the other hand, I need to keep the backward compatibility.
Any comments?

@kostya
Copy link
Contributor

kostya commented Nov 27, 2016

👍

@asterite
Copy link
Member

I wouldn't mind keeping "Ruby compatibility". I'm not sure I understand why so many methods are needed. I see:

URI.escape("https://crystal-lang.org") # => https://crystal-lang.org (I need this!)

Isn't that just the original string? In which case it would make a difference?

@asterite
Copy link
Member

Or, another question, when do you need the behaviour of Ruby's URI.escape?

@kostya
Copy link
Contributor

kostya commented Nov 27, 2016

parts of uri paths usually escaped like this:

ruby

URI.escape '/bla-жж/'
=> "/bla-%D0%B6%D0%B6/"

crystal

URI.escape '/bla-жж/'
=> %2Fbla-%D0%B6%D0%B6%2F"

@kostya
Copy link
Contributor

kostya commented Nov 27, 2016

right now crystal URI.escape = CGI.escape, and no method for URI parts escape

@5t111111
Copy link
Contributor

5t111111 commented Nov 30, 2016

I wouldn't mind keeping "Ruby compatibility". I'm not sure I understand why so many methods are needed. I see:

Although I'm quite aware of that and that there are some actual use-cases for URI.escape, I think you would better pay attention to that URI.escape had become obsolete in Ruby many years ago.

http://stackoverflow.com/questions/34274838/why-is-uri-escape-marked-as-obsolete-and-where-is-this-regexpunsafe-constant/34275499

@asterite
Copy link
Member

@5t111111 Thank you for the link!

I was also having a hard time imagining a use case for this. For example:

  • somebody gives me an URI: it is supposed to be escaped already, right? So nothing to do.
  • I want to build an URI from components (host, schema, etc.): I use URI.new, optionally using URI.escape to escape each of the query params.
  • Other scenario...?

@kostya
Copy link
Contributor

kostya commented Nov 30, 2016

when you parse uri from html, for example, need to normalize it (escape, and ...)

@straight-shoota
Copy link
Member

Another scenario besides creating a URI from individual components is encoding a string which is not directly used in a URI. URI encoding is also relevant for other applications like sending HTTP POST data as x-www-form-urlencoded.

We need the following transformations for strings:

  • https://en.wikipedia.org/wiki/Crystal (programming language) => https://en.wikipedia.org/wiki/Crystal%20(programming%20language) (reserved characters are not escaped)
  • foo=bar baz&hello/ => foo%3Dbar+baz%26hello%2F (reserved charachters are escaped)

These should be implement in two different methods called encode and encode_www_form (not escape: it is an encoding), as most other API's (except Ruby originally) do:

Please note that there are different rules for URI encoding with slight differences. I think it is sufficient to implement only the most recent RFC 3986 (but keep the option to encode space as +)

@straight-shoota
Copy link
Member

@tbrand Are you still up for this?

@asterite
Copy link
Member

asterite commented Jul 8, 2019

I think we should do something about this. We should have a distinction between encoding an URI (Ruby's URI.encode/URI.escape, which also exists in Elixir) and encoding a query component (Ruby's URI.encode_www_form_component, Elixir's URI.encode_www_form) and we should probably use names similar to Ruby/Elixir.

Also worth noting is that in Ruby, URI.escape/URI.encode is deprecated, and they advise using CGI.escape, which works exactly like Crystal's URI.escape.

So... I guess I don't know. But following what Elixir does seems good and safe.

@straight-shoota
Copy link
Member

IIRC I have a branch with a new implementation laying around. I think it was delayed because I did the URI normalize stuff first.

straight-shoota added a commit to straight-shoota/crystal that referenced this pull request Jul 26, 2019
* Adds methods URI.encode and URI.decode for URL encoding without
  escaping reserved characters.
* URI.escape and URI.unescape are renamed to URI.encode_www_form and
URI.decode_www_form.

All these methods have been moved to a separate file in
`src/uri/encoding.cr` to clean up the main file. They're an isolated
domain just live in URI's class namespace.

Method names have been chosen according to the names of other APIs,
especially Elixir's (see crystal-lang#3515 (comment)).
straight-shoota added a commit to straight-shoota/crystal that referenced this pull request Jul 26, 2019
* Adds methods URI.encode and URI.decode for URL encoding without
  escaping reserved characters.
* URI.escape and URI.unescape are renamed to URI.encode_www_form and
URI.decode_www_form.

All these methods have been moved to a separate file in
`src/uri/encoding.cr` to clean up the main file. They're an isolated
domain just live in URI's class namespace.

Method names have been chosen according to the names of other APIs,
especially Elixir's (see crystal-lang#3515 (comment)).
straight-shoota added a commit to straight-shoota/crystal that referenced this pull request Jul 26, 2019
* Adds methods URI.encode and URI.decode for URL encoding without
  escaping reserved characters.
* URI.escape and URI.unescape are renamed to URI.encode_www_form and
URI.decode_www_form.

All these methods have been moved to a separate file in
`src/uri/encoding.cr` to clean up the main file. They're an isolated
domain just live in URI's class namespace.

Method names have been chosen according to the names of other APIs,
especially Elixir's (see crystal-lang#3515 (comment)).
straight-shoota added a commit to straight-shoota/crystal that referenced this pull request Jul 26, 2019
* Adds methods URI.encode and URI.decode for URL encoding without
  escaping reserved characters.
* URI.escape and URI.unescape are renamed to URI.encode_www_form and
URI.decode_www_form.

All these methods have been moved to a separate file in
`src/uri/encoding.cr` to clean up the main file. They're an isolated
domain just live in URI's class namespace.

Method names have been chosen according to the names of other APIs,
especially Elixir's (see crystal-lang#3515 (comment)).
@RX14 RX14 closed this in #7997 Jul 29, 2019
dnamsons pushed a commit to dnamsons/crystal that referenced this pull request Jan 10, 2020
* Adds methods URI.encode and URI.decode for URL encoding without
  escaping reserved characters.
* URI.escape and URI.unescape are renamed to URI.encode_www_form and
URI.decode_www_form.

All these methods have been moved to a separate file in
`src/uri/encoding.cr` to clean up the main file. They're an isolated
domain just live in URI's class namespace.

Method names have been chosen according to the names of other APIs,
especially Elixir's (see crystal-lang#3515 (comment)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants