Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop XSS escaping because it should be aside HTML escaping. #4555

Closed
wants to merge 9 commits into from

Conversation

akzhan
Copy link
Contributor

@akzhan akzhan commented Jun 12, 2017

Introduce escape_quotes mode.

Fixes #3233, refs #2175.

Thanks to @straight-shoota for clarification. Cite:

The current implementation is not escaping HTML, but rather Javascript (inside HTML). Therefore it should be named differently (like escape_xss or escape_javascript like in Phoenix and Rails). Whether and how this should be in stdlib is a different issue.

src/html.cr Outdated
string.gsub(SUBSTITUTIONS)
def self.escape(string : String, mode : EscapeMode = EscapeMode::Default) : String
subst = case mode
when EscapeMode::Default then DEFAULT_ESCAPE
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default extracted as first one to optimize performance for most use case.

@akzhan akzhan force-pushed the classic-HTML.escape branch 2 times, most recently from dd31f0d to aad2f4f Compare June 12, 2017 15:47
…Escapes '&', '"', '\'', '/', '<' and '>' chars only.

XSS escaping still available as a XSS option.

Short escaping like Ruby one now available as a Short option.

Fixes crystal-lang#3233, refs crystal-lang#2175.
@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

Pull request updated (HTML::EscapeMode::Extended renamed to HTML::EscapeMode::XSS).

CI failed due to unrelated reason.

@RX14
Copy link
Contributor

RX14 commented Jun 12, 2017

HTML.escape is a security-related function so I think we need a think about how this function will actually be used as opposed to how it's meant to be used. I think the default should be the highest escaping level because XSS is silent but escaping too much is likely to show up in development.

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

@RX14 HTML.escape is not security-related function. https://www.w3.org/International/questions/qa-escapes#use

It is meaningful. All HTML-related software depends on it (looks like every Web framework).

Rack-style escaping is most used case.

XSS escaping is another issue.

@RX14
Copy link
Contributor

RX14 commented Jun 12, 2017

Except that the reality is that people will use HTML.escape for XSS escaping, regardless of what the internet says.

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

Firstly I wants to implement separate xss_escape method, but realizes as escaping mode option due to possibility of other flavors of escaping and practice of implementing this behavior using options (like String.downcase).

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

Anyway previous implementation was incorrect, because unusable in practice, see #3233, for example.

@straight-shoota
Copy link
Member

straight-shoota commented Jun 12, 2017

What's the significance of escaping a slash? By itself it doesn't have any "dangerous" meaning in (X)HTML. OWASP recommends to escape it but they don't give a good argument and I can't think of any.

Single and double quotes are not necessarily harmful, either. Only if an escaped string is used inside an attribute value. But we can assume that it's better to be safe and escape them by default. I would add an option to disable escaping of quotes, though.

But: When double quotes are escaped, single quotes ought to be escaped as well. In HTML both can be used as delimiters for attributes values. I don't see any reason to escape double quotes, but not single quotes.

There are different interpretations of what is considered to be dangerous in HTML, therefore there are different sets of escape characters in common use:

  • & <> -> PHP htmlspecialchars (with ENT_NOQUOTES), Python cgi.escape, W3C recommendation
  • & <> " -> PHP htmlspecialchars, Ruby CGI.escape_html,
  • & < > " ' -> Python html.escape, Phoenix Phoenix.HTML, Go html.EscapeString, Django, Jinja, PHP htmlspecialchars (with ENT_QUOTES), W3C recommendation
  • & < > " ' / -> Rack::Utils.escape_html, OWASP recommendation

The default set should be as minimal as possible and as strong as necessary.
I'd suggest to use & < > " ' as default and an option to not escape quotes (i.e. & < >). This would fix the bug and return to the behavior of Rack::Utils except not escaping slash.

XSS is an entirely different thing and should require a detailed discussion.

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

Thanks, @straight-shoota.

Looks like we need four flavors:

  • Default - & < > " ' as Python, Go, etc. and W3C recommendation.
  • CGI - & <> as CGI variant (it's rare case for now).
  • OWASP - Ruby CGI.escape, PHP htmlspecialchars (with ENT_QUOTES), Rack::Utils.escape_html, OWASP.
  • XSS - I don't know who use it. It's current implementation.

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

Hm, I'll remove XSS escaping at all. It should be aside HTML module.

@straight-shoota
Copy link
Member

straight-shoota commented Jun 12, 2017

Exactly. The current implementation is not escaping HTML, but rather Javascript (inside HTML). Therefore it should be named differently (like escape_xss or escape_javascript like in Phoenix and Rails). Whether and how this should be in stlib is a different issue.

I'm also not sure if there should be a dedicated flavour to escape slashes. Besides Ruby/Rack there don't seem to be any big ones following the OWASP recommendation.

If there are just two sets, the API could be much simplified:

def escape(string : String, escape_quotes = true)
  # ...
end

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

I prefer enums because they extendable.

String.compare is example of bad decision for now, it should accept enum instead of boolean.

Introduce CGI, Default and OWASP escape modes.
@akzhan akzhan changed the title HTML.escape - Switch to Rack::Utils.escape_html behavior by default. … Drop XSS escaping because it should be aside HTML escaping. Jun 12, 2017
@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

Another option is to use Symbols instead of enum. Symbols preferable to extend stdlib functionality, because enum cannot be reopened AFAIR.

@straight-shoota
Copy link
Member

I find the usage like HTML.escape(string, escape_quotes: false) much better conveys meaning than HTML.escape(string, HTML::EscapeMode::CGI).
And I honestly don't know how this could be subject to further extensions anyway. I've looked at many implementations and they are all doing pretty much the same, more or less. HTML won't change significantly anytime (like ever).
Even if some sort of web framework wants do provide a different algorithm with other options, this would be a specialised case and should happen in a custom module.

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

@straight-shoota Ok, you convinced me, Current Default will be true.

@ysbaddaden
Copy link
Contributor

Issues talk about problems in HTML.escape when escaping URIs... but this isn't about escaping URIs (which should use URI.encode) but escaping HTML so an untrusted text can be inserted into a HTML structure, without breaking or hijacking the HTML layout, or worse, injecting forms or scripts.

Please detail actual cases and issues where this may problematic. Otherwise this pull request is dangerous, because it makes something that used to be secure, insecure unless manually configured.

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

@ysbaddaden there is more disambiguation introduced by #2175. @Ryuuzakis made escaping of JavaScript etc. inside HTML.

HTML requires escaping for minor subset of chars. And yes, every known implementation follows one of proposed cases. No XSS at all because it is not related to HTML itself.

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

So this is just a clear extraction of HTML responsibility.

@ysbaddaden
Copy link
Contributor

Just my point: we're doing better!

Are there actual problems to escaping XSS along with HTML? Not merely intellectual (it's not required per se) but practical: this is causing me issues in specific cases.

I suppose we could add an insecure: true parameter (so it's explicit you should never disable it) that would disable XSS escaping, but is it really worth it?

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

I was pointed to this issue just with = char.

It was very frustrating for me.

But SO has a kind of answer - https://stackoverflow.com/a/13059657/1336858

@akzhan
Copy link
Contributor Author

akzhan commented Jun 12, 2017

And anyway - escape is not security related thing.

@straight-shoota
Copy link
Member

This PR does not make the application of HTML.escape significantly more dangerous unless it was improperly used to escape other things than HTML markup.

Of course, with this reduced set, it can not be entirely prevented that some unwanted alternations might be introduced into a string of HTML. But neither does the existing escape set.

The weak point are unquoted attributes, as explained by OWASP:

[...] developers frequently leave attributes unquoted. Properly quoted attributes can only be escaped with the corresponding quote. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |.

To rule out every possible hijacking of unquoted attributes,, "all characters with ASCII values less than 256" except alphanumerical characters need to be escaped. Not just the few that are in the current implementation. They make it seem to be better then just those proposed in this PR, but don't add real security, because every other special character in the above code range is still available.

Implementing this strong escape strategy is highly impractical because it completely prohibits usage of those special characters inside HTML-escaped strings - think of a markup processor that prevents direct HTML input, but uses special characters in the escaped text string for HTML-based enhancements.

That's why every language library and web framework I've come across uses only the few escape characters mentioned in this PR. Unquoted attributes with user-input are highly error-prone and cannot generally be made safe without heavy impacts on functionality.

@straight-shoota
Copy link
Member

straight-shoota commented Jun 28, 2017

HTML.unescape should be updated as well to perform the inversion of HTML.escape. (see #3409 (comment))

There is also XML.escape which should share the same implementation (unless we would decide to escape / for HTML as well, because this is only from HTML's past with SGML) and should be accompanied by XML.unescape.

@straight-shoota
Copy link
Member

What's holding this up? I am confident the reasons for this PR have been properly explained.

@ysbaddaden
Copy link
Contributor

Let's sum up:

  • pro-reasons: standards say you're being over zealous.
  • con-reasons: better safe than sorry.

Let's close.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Aug 23, 2017

Let's be constructive:

  1. This pull request drops a security feature without providing any replacement. We MUST introduce a solution to escape other XSS possibilities.

  2. We MUST introduce a HTML.escape_javascript (for example), following OWASP rule #3.

  3. We MUST introduce a HTML.escape_attribute (for example) following OWASP rule #2.

  4. This pull request introduces different escaping strategies, for no reason. We MUST follow OWASP rule #1, so HTML.escape MUST always escape the following characters:

&  --> &amp;
<  --> &lt;
>  --> &gt;
"  --> &quot;
'  --> &#x27;
/  --> &#x2F;
  1. Last but not least: we MUST document the breaking change as a security issue in the CHANGELOG.

@akzhan
Copy link
Contributor Author

akzhan commented Aug 23, 2017

ok, somebody should add escape_javascript and escape_attribute. And drop escape_quotes mode (I'm unsure).

@straight-shoota
Copy link
Member

straight-shoota commented Aug 23, 2017

  1. I disagree: The current implementation is just buggy and as far as I can tell this is not because it is intended to be a security feature. There is no direct replacement and therefore we can't provide one (because it's just an incorrect implementation). True, we SHOULD add additional escape methods for other contexts but I don't think this MUST necessarily be included in this PR.

  2. Maybe escape_javascript should rather be more like Rails's version than the OWASP recommendation? I'm not sure. But I'd recommend to address this in another PR.

  3. I really don't see any need for this: Having a special escape method for attribute values is only necessary if attributes are written without quotes. Otherwise HTML.escape is perfectly fine. I don't think we should encourage the use of unquoted attribute. Applying OWASP Rule Codegen: fail to generate 'if' when one of the branches has no type #2 even to quoted arguments leads to unnecessary escape sequences and obfuscation of code.

  4. There is a reason for this: If the content returned from HTML.escape is to be used inside a HTML tag like <a <%= HTML.escape(attrs, false) %>> (because you still need to escape < > &) it would be wrong to escape quotes, because href="#" would become <a href=&quot;#&quot;> instead of <a href="#">. But the default should be to escape quotes.

  5. As far as I can tell, thus far the changelog has only been updated when there was a new release. I'm not aware if this practice has changed. If not, this should probably be discussed elsewhere.

@ysbaddaden
Copy link
Contributor

Endless arguing never leads anywhere. Please implement at least the following and I'll happily merge the pull request:

  • HTML.escape(unsafe) always escapes all & < > " ' / characters —injecting raw HTML as attributes without escaping quotes doesn't prevent from injected quotes inside attribute values, it's thus insecure, and shouldn't be unsupported.
  • HTML.escape_javascript(unsafe) with a secure enough list of escaped characters.

@straight-shoota
Copy link
Member

straight-shoota commented Aug 24, 2017

HTML.escape(unsafe) is already there. Do you really think it hurts to add an option for not escaping quotes? This is after all the recommendation from W3C. I can live without the option, 99% need to escape quotes anyway, but I think it should be there.

For HTML.escape_javascript(unsafe) there are a number of possible implementations, just a few examples:

I honestly don't know which one should be implemented. Though I'd tend to the direction of Rails and Phoenix. They prohibit to break out of HTML tag content or attribute contexts as well as Javascript strings. That should be all that is important for this.

Highly inspired by Phoenix.HTML.escape_javascript.
@akzhan
Copy link
Contributor Author

akzhan commented Aug 27, 2017

@ysbaddaden I have added escape_javascript, but should mention that it's escaping sequences are very different than HTML ones (any known implementation).

HTML escaping kept as proposed, because it's ok and shared between a lot of languages and frameworks same way (see #4555 (comment) for example).

src/html.cr Outdated
# ```
# require "html"
#
# HTML.escape_javascript("</crystal> \u2028") # => ""<\\/crystal> &#x2028;""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double quotes should be reduced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks

# Encodes a string with JavaScript escaping, but writes to the `IO` instance provided.
#
# ```
# io = IO::Memory.new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be better to use a string builder as example.

string.each_char do |char|
io << SUBSTITUTIONS.fetch(char, char)
if previous_char == '\r' && char == '\n'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this would be better as a case statement to express flow more clearly and be shorter overall.

case
when previous_char == '\r' && char == '\n'
when previous_char == '<' && char == '/'
  io << '\\' << '/'
else
  io << ESCAPE_JAVASCRIPT_SUBST.fetch(char, char)
end
previous_char = char

@@ -10,21 +10,41 @@ describe "HTML" do
end

it "escapes dangerous characters from a string" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reword to escapes special characters from HTML string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

str.should eq("safe_string")
end

it "escapes dangerous characters from a string" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reword to escape special characters from a JavaScript string? The characters are not "dangerous" per se, they just might convey special meaning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

@@ -29,8 +38,8 @@ module HTML
#
# HTML.escape("Crystal & You") # => "Crystal &amp; You"
# ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to list all character substitutions explicitly, so you don't need to look at the source code to figure out what exactly gets escaped and why.

Suggestion:

# Escapes `&`, `<`,  `>`, `"` and `'` chars as `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&#27` according to 
# [OWASP Rule #1](https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content).
# If *escape_quotes* is `false`, `"` and `'` will not be escaped ([W3C recommendation](https://www.w3.org/International/questions/qa-escapes#use)).

The other methods should have similar explanations.

@straight-shoota
Copy link
Member

@akzhan What do you think about XML.escape? As described above I think it makes no sense to have both XML.escape and HTML.escape wich do the same thing.

Since XML is more general, should we put the method only there? But most people are actually using HTML, so maybe it would be better to have it (only) there. Or make an exception and allow aliases in this case.

It wouldn't be the same if HTML version would also escape / but that doesn't make any sense despite being an OWASP recommendation.

@straight-shoota
Copy link
Member

And perhaps we should think about making escaping ' optional as well. It's probably not advisable, but certain specifications expect that single quotes are not escaped, for example Common Mark Textual content. Libxml's xmlEncodeSpecialChars also does not escape single quotes (see source). But for some reason it escapes \r as &#13;.

I was looking at markd's source code where there is a custom escape method. I figured it could be replaced with the implementation from this PR but if single quotes are escaped, the Common Mark specs won't match.

@akzhan
Copy link
Contributor Author

akzhan commented Sep 20, 2017

I'm tired with this PR. Feel free to propose your own.

@akzhan akzhan closed this Sep 20, 2017
@asterite
Copy link
Member

@akzhan Sorry you got tired.

But remember: you don't have to do what everyone says here. You can accept suggestions from anyone, but you only have to make changes that are requested by core team members (and even then you can argue about your decision, in many cases core team members changed their mind).

I sent #5012 to fix this situation.

@akzhan akzhan deleted the classic-HTML.escape branch September 23, 2017 22:06
@akzhan akzhan restored the classic-HTML.escape branch September 23, 2017 22:07
src/html.cr Outdated
'\u2029' => "&#x2029;",
'\n' => "\\n",
'\r' => "\\n",
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

django appears to escape slightly more? https://docs.djangoproject.com/en/1.10/_modules/django/utils/html/#escape Just asking... :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has been closed and #5012 was merged instead. Currently there is no escape method for javascript.

@jhass jhass mentioned this pull request Apr 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants