Drop XSS escaping because it should be aside HTML escaping. #4555

akzhan · 2017-06-12T15:28:00Z

Introduce escape_quotes mode.

Thanks to @straight-shoota for clarification. Cite:

The current implementation is not escaping HTML, but rather Javascript (inside HTML). Therefore it should be named differently (like escape_xss or escape_javascript like in Phoenix and Rails). Whether and how this should be in stdlib is a different issue.

akzhan · 2017-06-12T15:30:34Z

src/html.cr

-    string.gsub(SUBSTITUTIONS)
+  def self.escape(string : String, mode : EscapeMode = EscapeMode::Default) : String
+    subst = case mode
+            when EscapeMode::Default  then DEFAULT_ESCAPE


Default is matched first to optimize performance for the most common use case.

…Escapes '&', '"', '\'', '/', '<' and '>' chars only. XSS escaping still available as a XSS option. Short escaping like Ruby one now available as a Short option. Fixes crystal-lang#3233, refs crystal-lang#2175.

akzhan · 2017-06-12T15:56:59Z

Pull request updated (HTML::EscapeMode::Extended renamed to HTML::EscapeMode::XSS).

CI failed due to unrelated reason.

RX14 · 2017-06-12T17:06:21Z

HTML.escape is a security-related function, so I think we need to think about how this function will actually be used as opposed to how it's meant to be used. I think the default should be the highest escaping level, because XSS is silent but escaping too much is likely to show up in development.

akzhan · 2017-06-12T17:17:14Z

@RX14 HTML.escape is not a security-related function. See https://www.w3.org/International/questions/qa-escapes#use

It is meaningful. All HTML-related software depends on it (looks like every Web framework).

Rack-style escaping is the most common case.

XSS escaping is another issue.

RX14 · 2017-06-12T17:22:15Z

Except that the reality is that people will use HTML.escape for XSS escaping, regardless of what the internet says.

akzhan · 2017-06-12T17:25:32Z

At first I wanted to implement a separate xss_escape method, but I implemented it as an escaping mode option instead, because other flavors of escaping are possible and this kind of behavior is usually implemented with options (like String.downcase).

akzhan · 2017-06-12T17:29:51Z

Anyway, the previous implementation was incorrect because it was unusable in practice; see #3233, for example.

straight-shoota · 2017-06-12T17:39:34Z

What's the significance of escaping a slash? By itself it doesn't have any "dangerous" meaning in (X)HTML. OWASP recommends escaping it, but they don't give a good argument and I can't think of any.

Single and double quotes are not necessarily harmful either; they only matter if an escaped string is used inside an attribute value. But we can assume that it's better to be safe and escape them by default. I would add an option to disable escaping of quotes, though.

But: when double quotes are escaped, single quotes ought to be escaped as well. In HTML both can be used as delimiters for attribute values. I don't see any reason to escape double quotes but not single quotes.

There are different interpretations of what is considered to be dangerous in HTML, therefore there are different sets of escape characters in common use:

& < > -> PHP htmlspecialchars (with ENT_NOQUOTES), Python cgi.escape, W3C recommendation
& < > " -> PHP htmlspecialchars, Ruby CGI.escape_html
& < > " ' -> Python html.escape, Phoenix Phoenix.HTML, Go html.EscapeString, Django, Jinja, PHP htmlspecialchars (with ENT_QUOTES), W3C recommendation
& < > " ' / -> Rack::Utils.escape_html, OWASP recommendation

The default set should be as minimal as possible and as strong as necessary.
I'd suggest using & < > " ' as the default, with an option to not escape quotes (i.e. & < >). This would fix the bug and return to the behavior of Rack::Utils, except for not escaping the slash.

XSS is an entirely different thing and should require a detailed discussion.

akzhan · 2017-06-12T18:07:40Z

Thanks, @straight-shoota.

Looks like we need four flavors:

Default - & < > " ' as Python, Go, etc. and W3C recommendation.
CGI - & <> as CGI variant (it's rare case for now).
OWASP - Ruby CGI.escape, PHP htmlspecialchars (with ENT_QUOTES), Rack::Utils.escape_html, OWASP.
XSS - I don't know who uses it. It's the current implementation.

akzhan · 2017-06-12T18:20:08Z

Hm, I'll remove XSS escaping entirely. It belongs outside the HTML module.

straight-shoota · 2017-06-12T18:30:13Z

Exactly. The current implementation is not escaping HTML, but rather Javascript (inside HTML). Therefore it should be named differently (like escape_xss or escape_javascript like in Phoenix and Rails). Whether and how this should be in stdlib is a different issue.

I'm also not sure if there should be a dedicated flavour to escape slashes. Besides Ruby/Rack there don't seem to be any big ones following the OWASP recommendation.

If there are just two sets, the API could be much simplified:

def escape(string : String, escape_quotes = true)
  # ...
end
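For illustration, the two-set API suggested above could be filled in roughly as follows. This is only a sketch under assumptions: the module name `EscapeSketch` and both constant names are hypothetical (picked so they do not clash with the stdlib `HTML` module), and `&#x27;` is used for the single quote as in the OWASP table quoted later in this thread.

```crystal
# Hypothetical module, not the stdlib `HTML` module.
module EscapeSketch
  # Minimal set that is always escaped (the W3C recommendation).
  BASE_SUBSTITUTIONS = {
    '&' => "&amp;",
    '<' => "&lt;",
    '>' => "&gt;",
  }

  # Minimal set plus both quote characters (the proposed default).
  QUOTED_SUBSTITUTIONS = BASE_SUBSTITUTIONS.merge({
    '"'  => "&quot;",
    '\'' => "&#x27;",
  })

  # Replaces every character found in the selected table with its entity.
  def self.escape(string : String, escape_quotes : Bool = true) : String
    string.gsub(escape_quotes ? QUOTED_SUBSTITUTIONS : BASE_SUBSTITUTIONS)
  end
end

puts EscapeSketch.escape(%(<a href="#">Crystal & You</a>))
# => &lt;a href=&quot;#&quot;&gt;Crystal &amp; You&lt;/a&gt;
puts EscapeSketch.escape(%(<a href="#">Crystal & You</a>), escape_quotes: false)
# => &lt;a href="#"&gt;Crystal &amp; You&lt;/a&gt;
```

A boolean keeps the call site readable (`escape_quotes: false`), at the cost of not being extensible to further sets.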

akzhan · 2017-06-12T18:36:56Z

I prefer enums because they are extensible.

String.compare is an example of a bad decision here: it should accept an enum instead of a boolean.

Introduce CGI, Default and OWASP escape modes.

akzhan · 2017-06-12T19:03:16Z

Another option is to use Symbols instead of an enum. Symbols are preferable for extending stdlib functionality, because an enum cannot be reopened, AFAIR.

straight-shoota · 2017-06-12T19:15:23Z

I find the usage like HTML.escape(string, escape_quotes: false) much better conveys meaning than HTML.escape(string, HTML::EscapeMode::CGI).
And I honestly don't know how this could be subject to further extensions anyway. I've looked at many implementations and they are all doing pretty much the same, more or less. HTML won't change significantly anytime (like ever).
Even if some sort of web framework wants to provide a different algorithm with other options, this would be a specialised case and should happen in a custom module.

akzhan · 2017-06-12T19:33:29Z

@straight-shoota OK, you convinced me. The current Default mode will correspond to escape_quotes = true.

ysbaddaden · 2017-06-12T19:39:25Z

Issues talk about problems in HTML.escape when escaping URIs... but this isn't about escaping URIs (which should use URI.encode) but escaping HTML so an untrusted text can be inserted into a HTML structure, without breaking or hijacking the HTML layout, or worse, injecting forms or scripts.

Please detail actual cases and issues where this may be problematic. Otherwise this pull request is dangerous, because it makes something that used to be secure insecure unless manually configured.

akzhan · 2017-06-12T19:42:41Z

@ysbaddaden there is more disambiguation introduced by #2175. @Ryuuzakis made escaping of JavaScript etc. inside HTML.

HTML requires escaping for minor subset of chars. And yes, every known implementation follows one of proposed cases. No XSS at all because it is not related to HTML itself.

akzhan · 2017-06-12T19:53:54Z

So this is just a clear extraction of HTML responsibility.

ysbaddaden · 2017-06-12T20:19:17Z

Just my point: we're doing better!

Are there actual problems to escaping XSS along with HTML? Not merely intellectual (it's not required per se) but practical: this is causing me issues in specific cases.

I suppose we could add an insecure: true parameter (so it's explicit you should never disable it) that would disable XSS escaping, but is it really worth it?

akzhan · 2017-06-12T20:28:11Z

I ran into this issue with just the = char.

It was very frustrating for me.

But SO has a kind of answer - https://stackoverflow.com/a/13059657/1336858

akzhan · 2017-06-12T20:29:34Z

And anyway - escape is not security related thing.

straight-shoota · 2017-06-12T20:49:59Z

This PR does not make the application of HTML.escape significantly more dangerous unless it was improperly used to escape other things than HTML markup.

Of course, with this reduced set, it cannot be entirely prevented that some unwanted alterations might be introduced into a string of HTML. But neither does the existing escape set prevent that.

The weak point is unquoted attributes, as explained by OWASP:

[...] developers frequently leave attributes unquoted. Properly quoted attributes can only be escaped with the corresponding quote. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |.

To rule out every possible hijacking of unquoted attributes, "all characters with ASCII values less than 256" except alphanumeric characters need to be escaped, not just the few that are in the current implementation. Those few make it seem better than the set proposed in this PR, but they don't add real security, because every other special character in the above code range is still available.

Implementing this strong escape strategy is highly impractical because it completely prohibits usage of those special characters inside HTML-escaped strings - think of a markup processor that prevents direct HTML input, but uses special characters in the escaped text string for HTML-based enhancements.

That's why every language library and web framework I've come across uses only the few escape characters mentioned in this PR. Unquoted attributes with user-input are highly error-prone and cannot generally be made safe without heavy impacts on functionality.
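A small sketch of the quoted versus unquoted attribute point, under assumptions: `escape_html` and `SUBSTITUTIONS` below are hypothetical stand-ins for the five-character set discussed in this PR, not the stdlib implementation, and the attack strings are made up for illustration.

```crystal
# Hypothetical stand-in for the five-character set proposed in this PR.
SUBSTITUTIONS = {
  '&'  => "&amp;",
  '<'  => "&lt;",
  '>'  => "&gt;",
  '"'  => "&quot;",
  '\'' => "&#x27;",
}

def escape_html(string : String) : String
  string.gsub(SUBSTITUTIONS)
end

# Tries to close a quoted attribute and open a new one.
attack_quoted = "\" onmouseover=\"alert(1)"
# Needs no special character at all: a space ends an unquoted attribute value.
attack_unquoted = "x onmouseover=alert(1)"

quoted = %(<div title="#{escape_html(attack_quoted)}">)
unquoted = "<div title=#{escape_html(attack_unquoted)}>"

puts quoted   # => <div title="&quot; onmouseover=&quot;alert(1)">
puts unquoted # => <div title=x onmouseover=alert(1)>
```

In the quoted case the payload stays inside the attribute value. In the unquoted case the escaped string is unchanged and the markup gains an `onmouseover` attribute, which is why no small escape set can make unquoted attributes safe.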

straight-shoota · 2017-06-28T18:59:21Z

~~HTML.unescape should be updated as well to perform the inversion of HTML.escape.~~ (see #3409 (comment))

There is also XML.escape which should share the same implementation (unless we would decide to escape / for HTML as well, because this is only from HTML's past with SGML) and should be accompanied by XML.unescape.

straight-shoota · 2017-08-23T12:31:02Z

What's holding this up? I am confident the reasons for this PR have been properly explained.

ysbaddaden · 2017-08-23T13:45:10Z

Let's sum up:

pro-reasons: standards say you're being over zealous.
con-reasons: better safe than sorry.

Let's close.

ysbaddaden · 2017-08-23T14:31:44Z

Let's be constructive:

This pull request drops a security feature without providing any replacement. We MUST introduce a solution to escape other XSS possibilities.
We MUST introduce a HTML.escape_javascript (for example), following OWASP rule #3.
We MUST introduce a HTML.escape_attribute (for example) following OWASP rule #2.
This pull request introduces different escaping strategies, for no reason. We MUST follow OWASP rule #1, so HTML.escape MUST always escape the following characters:

&  --> &amp;
<  --> &lt;
>  --> &gt;
"  --> &quot;
'  --> &#x27;
/  --> &#x2F;

Last but not least: we MUST document the breaking change as a security issue in the CHANGELOG.

akzhan · 2017-08-23T15:21:11Z

ok, somebody should add escape_javascript and escape_attribute. And drop escape_quotes mode (I'm unsure).

straight-shoota · 2017-08-23T15:22:29Z

I disagree: The current implementation is just buggy and as far as I can tell this is not because it is intended to be a security feature. There is no direct replacement and therefore we can't provide one (because it's just an incorrect implementation). True, we SHOULD add additional escape methods for other contexts but I don't think this MUST necessarily be included in this PR.
Maybe escape_javascript should rather be more like Rails's version than the OWASP recommendation? I'm not sure. But I'd recommend to address this in another PR.
I really don't see any need for this: Having a special escape method for attribute values is only necessary if attributes are written without quotes. Otherwise HTML.escape is perfectly fine. I don't think we should encourage the use of unquoted attributes. Applying OWASP Rule #2 even to quoted arguments leads to unnecessary escape sequences and obfuscation of code.
There is a reason for this: If the content returned from HTML.escape is to be used inside a HTML tag like <a <%= HTML.escape(attrs, false) %>> (because you still need to escape < > &) it would be wrong to escape quotes, because href="#" would become <a href=&quot;#&quot;> instead of <a href="#">. But the default should be to escape quotes.
As far as I can tell, thus far the changelog has only been updated when there was a new release. I'm not aware if this practice has changed. If not, this should probably be discussed elsewhere.

ysbaddaden · 2017-08-24T10:23:19Z

Endless arguing never leads anywhere. Please implement at least the following and I'll happily merge the pull request:

HTML.escape(unsafe) always escapes all & < > " ' / characters. Injecting raw HTML as attributes without escaping quotes doesn't protect against injected quotes inside attribute values; it's thus insecure and shouldn't be supported.
HTML.escape_javascript(unsafe) with a secure enough list of escaped characters.

straight-shoota · 2017-08-24T13:18:35Z

HTML.escape(unsafe) is already there. Do you really think it hurts to add an option for not escaping quotes? This is after all the recommendation from W3C. I can live without the option, 99% need to escape quotes anyway, but I think it should be there.

For HTML.escape_javascript(unsafe) there are a number of possible implementations, just a few examples:

OWASP Rule #3: Except for alphanumeric characters, escape all characters less than 256 with the \xHH format
Phoenix.HTML.escape_javascript: Escapes quotes (double and single), double backslashes and other. </ \ ' " \r \n \r\n \u2028 \u2029
Rails ActionView::Helpers::JavaScriptHelper.escape_javascript: Escapes carriage returns and single and double quotes for JavaScript segments. </ " ' \n \r \r\n \342\200\250 \342\200\251
Django escapejs: Escapes characters for use in JavaScript strings. \ ' " > < & = - ; \u2028 \u2029
Symfony tag helper escape_javascript: Escape carrier returns and single and double quotes for Javascript segments.: " ' \r \n \r\n

I honestly don't know which one should be implemented, though I'd lean toward the direction of Rails and Phoenix. They prevent breaking out of HTML tag content or attribute contexts as well as Javascript strings. That should be all that is important for this.
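As a rough sketch of the Rails/Phoenix direction, assuming a substitution table pieced together from the lists above and the diff fragments in this PR (the module name, the constant name and the exact table are assumptions, not the merged implementation):

```crystal
# Hypothetical module; the table loosely follows the Phoenix-style list above.
module JavaScriptEscapeSketch
  SUBSTITUTIONS = {
    '\\'     => "\\\\",
    '\''     => "\\'",
    '"'      => "\\\"",
    '\n'     => "\\n",
    '\r'     => "\\n",
    '\u2028' => "&#x2028;",
    '\u2029' => "&#x2029;",
  }

  def self.escape_javascript(string : String) : String
    String.build do |io|
      previous_char : Char? = nil
      string.each_char do |char|
        case
        when previous_char == '\r' && char == '\n'
          # "\r\n" was already written as a single "\\n" when '\r' was seen.
        when previous_char == '<' && char == '/'
          # Break up "</" so the string cannot close an enclosing <script> tag.
          io << "\\/"
        else
          io << SUBSTITUTIONS.fetch(char, char)
        end
        previous_char = char
      end
    end
  end
end

puts JavaScriptEscapeSketch.escape_javascript("</script>\r\n'x'")
# => <\/script>\n\'x\'
```

The two-character sequences (`\r\n` and `</`) are why a plain per-character table lookup is not enough and the previous character has to be tracked.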

Highly inspired by Phoenix.HTML.escape_javascript.

akzhan · 2017-08-27T19:24:24Z

@ysbaddaden I have added escape_javascript, but I should mention that its escape sequences are very different from the HTML ones (in any known implementation).

HTML escaping kept as proposed, because it's ok and shared between a lot of languages and frameworks same way (see #4555 (comment) for example).

straight-shoota · 2017-09-05T17:35:05Z

src/html.cr

+  # ```
+  # require "html"
+  #
+  # HTML.escape_javascript("</crystal> \u2028") # => ""<\\/crystal> &#x2028;""


double quotes should be reduced.

fixed, thanks

straight-shoota · 2017-09-06T16:30:46Z

src/html.cr

+  # Encodes a string with JavaScript escaping, but writes to the `IO` instance provided.
+  #
+  # ```
+  # io = IO::Memory.new


It'd be better to use a string builder as example.

straight-shoota · 2017-09-06T19:13:08Z

src/html.cr

    string.each_char do |char|
-      io << SUBSTITUTIONS.fetch(char, char)
+      if previous_char == '\r' && char == '\n'


Maybe this would be better as a case statement to express flow more clearly and be shorter overall.

case
when previous_char == '\r' && char == '\n'
when previous_char == '<' && char == '/'
  io << '\\' << '/'
else
  io << ESCAPE_JAVASCRIPT_SUBST.fetch(char, char)
end
previous_char = char

straight-shoota · 2017-09-06T19:15:25Z

spec/std/html_spec.cr

@@ -10,21 +10,41 @@ describe "HTML" do
    end

    it "escapes dangerous characters from a string" do


Maybe reword to escapes special characters from HTML string?

Done, thanks.

straight-shoota · 2017-09-06T19:16:36Z

spec/std/html_spec.cr

+      str.should eq("safe_string")
+    end
+
+    it "escapes dangerous characters from a string" do


Maybe reword to escape special characters from a JavaScript string? The characters are not "dangerous" per se, they just might convey special meaning.

Done, thanks.

straight-shoota · 2017-09-06T19:24:54Z

src/html.cr

@@ -29,8 +38,8 @@ module HTML
  #
  # HTML.escape("Crystal & You") # => "Crystal &amp; You"
  # ```


It would be nice to list all character substitutions explicitly, so you don't need to look at the source code to figure out what exactly gets escaped and why.

Suggestion:

# Escapes `&`, `<`, `>`, `"` and `'` chars as `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&#x27;` according to
# [OWASP Rule #1](https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content).
# If *escape_quotes* is `false`, `"` and `'` will not be escaped ([W3C recommendation](https://www.w3.org/International/questions/qa-escapes#use)).

The other methods should have similar explanations.

straight-shoota · 2017-09-12T12:11:46Z

@akzhan What do you think about XML.escape? As described above, I think it makes no sense to have both XML.escape and HTML.escape, which do the same thing.

Since XML is more general, should we put the method only there? But most people are actually using HTML, so maybe it would be better to have it (only) there. Or make an exception and allow aliases in this case.

It wouldn't be the same if HTML version would also escape / but that doesn't make any sense despite being an OWASP recommendation.

straight-shoota · 2017-09-12T12:45:11Z

And perhaps we should think about making escaping ' optional as well. It's probably not advisable, but certain specifications expect that single quotes are not escaped, for example Common Mark Textual content. Libxml's xmlEncodeSpecialChars also does not escape single quotes (see source). But for some reason it escapes \r as &#13;.

I was looking at markd's source code where there is a custom escape method. I figured it could be replaced with the implementation from this PR but if single quotes are escaped, the Common Mark specs won't match.

akzhan · 2017-09-20T08:28:16Z

I'm tired with this PR. Feel free to propose your own.

asterite · 2017-09-20T13:27:04Z

@akzhan Sorry you got tired.

But remember: you don't have to do what everyone says here. You can accept suggestions from anyone, but you only have to make changes that are requested by core team members (and even then you can argue about your decision, in many cases core team members changed their mind).

I sent #5012 to fix this situation.

rdp · 2017-12-28T01:58:24Z

src/html.cr

+    '\u2029' => "&#x2029;",
+    '\n'     => "\\n",
+    '\r'     => "\\n",
+  }


django appears to escape slightly more? https://docs.djangoproject.com/en/1.10/_modules/django/utils/html/#escape Just asking... :)

This PR has been closed and #5012 was merged instead. Currently there is no escape method for javascript.

akzhan force-pushed the classic-HTML.escape branch 2 times, most recently from dd31f0d to aad2f4f Compare June 12, 2017 15:47

HTML.escape - Switch to Rack::Utils.escape_html behavior by default. …

8148350

akzhan force-pushed the classic-HTML.escape branch from aad2f4f to 8148350 Compare June 12, 2017 15:49

Drop XSS escaping because it should be aside HTML escaping.

4c731e8

Introduce CGI, Default and OWASP escape modes.

akzhan changed the title ~~HTML.escape - Switch to Rack::Utils.escape_html behavior by default. …~~ Drop XSS escaping because it should be aside HTML escaping. Jun 12, 2017

oops, formatting

b25f2a8

Follow @straight-shoota to simplify HTML.escape API.

7728d37

straight-shoota mentioned this pull request Jun 28, 2017

add support to many html entities in HTML.unescape #3409

Closed

HTML.escape_javascript with specs.

1a6f502

Highly inspired by Phoenix.HTML.escape_javascript.

doc. update

efc4df2

straight-shoota reviewed Sep 5, 2017

typo in doc, thanks @straight-shoota

cb86d0e

straight-shoota reviewed Sep 6, 2017

reword example descriptions, thanks @straight-shoota

86c0872

straight-shoota mentioned this pull request Sep 12, 2017

Refactor entities encoder icyleaf/markd#5

Merged

akzhan closed this Sep 20, 2017

akzhan deleted the classic-HTML.escape branch September 23, 2017 22:06

akzhan restored the classic-HTML.escape branch September 23, 2017 22:07

rdp reviewed Dec 28, 2017

jhass mentioned this pull request Apr 2, 2020

Stored XSS jhass/carc.in#17

Closed
