Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow invalid values to be set in XML::Node #5200

Merged
merged 3 commits into from Nov 4, 2017

Conversation

RX14
Copy link
Contributor

@RX14 RX14 commented Oct 28, 2017

Fixes #4089.

Previously, invalid content and name values could be set for XML nodes, including values that had no valid encoding in XML (such as \0).

src/xml/node.cr Outdated
raise XML::Error.new("Cannot escape string containing NUL character", 0)
end

string.gsub(SUBSTITUTIONS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use HTML.escape?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's not HTML?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it does the same thing, there is no need to implement this again. That's why XML.escape was removed in favour of HTML.escape.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, it's not exactly the same: HTML.escape encodes ' as &x39; instead of ' because the named entity is not defined for HTML4. But it's still valid XML and no big deal. We could even consider changing the behaviour for HTML.escape because in other places (like HTML.unescape) we support only HTML5 anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, I really don't want to use a module called HTML in a module for XML. It's simply wrong. They're different standards and people working on the HTML escape method won't think about XML when they're modifying it. Besides, it makes very little difference either way because both methods are unlikely to change for a long long time.

Copy link
Contributor Author

@RX14 RX14 Oct 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, XML.escape was used because you shouldn't need it, because we have facilities to build XML built in. We don't have a HTML::Builder so HTML.escape has a purpose.

I wouldn't be against adding XML.escape back again though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the code is put in anyway (as suggested in this PR), it might as well be exposed in the public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that necessarily follows, personally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, true. But if it's there - why not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there's no point in adding a method as a public API when it's not needed. It means that if later we find some other way to implement this we might need to remove XML.escape and introduce a breaking change. Do you need XML.escape? Does someone else requested it? No. So please let's stick to this PRs goal, without adding more and more APIs. We can't keep adding API, we need stabilization for once.

@asterite
Copy link
Member

@RX14 Are you sure this fixes the issue? XML::Builder doesn't build up nodes, it's an entirely different API. I think escape should be used in both places. Basically, this fixes something, but not exactly the issue in #4809

@RX14
Copy link
Contributor Author

RX14 commented Oct 30, 2017

@asterite I don't understand. XML::Builder is entirely unrelated to XML::Node and uses an entirely different libxml2 API. This PR simply uses the libxml2 API correctly by escaping manually the arguments to the right methods. I only looked at XML::Builder a little bit but it appears that XML api is designed differently and does escaping internally. Please correct me if i'm wrong.

@asterite
Copy link
Member

@RX14 The snippet in the issue is:

# example.cr
require "xml"

xml = XML.build(indent: 2) do |xml|
  xml.element("example", number: "1") do
    xml.text "foo\0bar"
  end
  xml.element("example", number: "2") do
    xml.text "foo\u{08}bar"
  end
end

puts xml

Does it work well with this PR?

@RX14
Copy link
Contributor Author

RX14 commented Oct 30, 2017

Yeah, looks like I didn't read my own issue properly. This does solve a valid issue though, so i'll just update it to fix them all.

@RX14 RX14 added the pr:wip label Oct 30, 2017
@asterite
Copy link
Member

@RX14 To be honest, I wanted to solve this too and that's when I replied this. But that's also for nodes, not for XML::Builder... so we kinda did the same thing :-)

(well, except you actually fixed on of the issues)

@RX14
Copy link
Contributor Author

RX14 commented Oct 30, 2017

@asterite we essentially just need to raise on NUL for the builder, I think.

src/xml/node.cr Outdated

name = name.to_s

if name.includes?('\0') || LibXML.xmlValidateNameValue(name) == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can call name.check_no_null_byte before this if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather that we raised a single exception type for the case in which the name is invalid, instead of different ones based on how it's invalid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. But for \0 a message like "contains a null byte" might be more helpful.

@RX14 RX14 force-pushed the bugfix/xml-special-entities branch 2 times, most recently from 42dd787 to c2152fb Compare November 2, 2017 23:49
@RX14 RX14 force-pushed the bugfix/xml-special-entities branch from c2152fb to 62a1929 Compare November 2, 2017 23:52
@RX14
Copy link
Contributor Author

RX14 commented Nov 4, 2017

This pr is finished and ready to be reviewed again btw

@RX14 RX14 removed the pr:wip label Nov 4, 2017
@@ -275,8 +275,17 @@ struct XML::Builder
raise XML::Error.new("Error in #{msg}", 0) if ret < 0
end

private def string_to_unsafe(string)
string ? string.to_unsafe : Pointer(UInt8).null
private def string_to_unsafe(string : String?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just def string_to_unsafe(string : Nil) since you have the String overload below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to comment exactly the same thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@RX14 RX14 force-pushed the bugfix/xml-special-entities branch from 62a1929 to 412ca48 Compare November 4, 2017 15:11
@RX14 RX14 added this to the Next milestone Nov 4, 2017
@RX14 RX14 merged commit 39b8e88 into crystal-lang:master Nov 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants