Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truffle fix match backref #4081

Merged
merged 15 commits into from
Aug 26, 2016
Merged

Truffle fix match backref #4081

merged 15 commits into from
Aug 26, 2016

Conversation

nirvdrum
Copy link
Contributor

@jruby/truffle

This fixes $~ to be frame-local (and thread-local), which addresses some issues where values were leaking in places they shouldn't have been. I don't think I've handled every case accurately, but I do think this is more accurate than what we had before and all specs continue to pass.

Since it was a bit involved, I'd appreciate a careful review.


if (matchData != nil()) {
return matcher.getBegin();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really following, where is the copy here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's being made by the caller. Not ideal, but it didn't seem very useful to allocate a string from to_s and then immediately dup it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing guarantees to_s actually duplicates the original string.
@source is only ever used in #string (where it's copied) and MatchData.pre_match_from and even then it returns a copy since byteslice is called.
I think it's safe and OK to not do an explicit copy and keep it simple here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of the String specialization, I continue to dup it. I guess strictly speaking some non-String object could cache its to_s output, but wouldn't that violate Ruby semantics? Even Symbols get newly allocated Strings from to_s.

We do need an explicit copy (or, switch to storing taint state and a rope and updating everything using the source string as necessary). The change was made to fix an issue whereby you can mutate the source string from outside the MatchData via the shared reference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you mutate it? It's only accessible from the regexp.rb source file and never returned directly isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was what I thought originally and why I introduced the bug. But the following fragment shows how that can fall apart:

x = "abc"
x =~ /(.*)/
x.upcase!

Without making a defensive copy, $~.string will be "ABC", rather than "abc", which is what the match was really made against.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, my apologies. It's a mutation the other way round. Please mention this in the comment.
Why not using the makeSubstringNode(0,length) in matchCommon then?
This seems to express the goal more clearly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment already says we need to make a defensive copy to prevent modifications. I take it you're looking for more details about that?

Re: the substring.... Unfortunately, it's not that easy because that substring method is really only dealing with things at the rope level and won't carry forward the taint state. We can deal with that within matchCommon, but I don't think it'll wind up much cleaner and passing a frame around gets complicated with the boundary in place.

Really, I want to overhaul all this anyway. I think when we take a pass over the regexp subsystem, a lot of this will get reworked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, nothing like a good Ruby example to clarify it :)

Right, this matchCommon seems like it's asking to become a node, but let's do that later.

Prior to this change we treated $~ as thread-local. While that worked in many cases, it did mean that the match data lived longer than it should. Fixing this is a bit tricky because generally the match data should not propagate up the call stack beyond the caller, but we implement much of our core library in Ruby itself. So we must propagate values back to the user's call, and then cease propagating up the call stack.
Due to the messiness of propagating $~, Rubinius opted to copy-and-paste these methods and we carried that forward. With the recent reimplementation of $~ propagation, we no longer need the duplication.
We store a copy of the source string in our MatchData instances to make it easier to perform some operations in Ruby. Additionally, we need it to track the taint state. We could store the taint state and the string's rope as separate fields to avoid issues of mutability, but it would preclude writing operations in Ruby without allocating a new string for each such operation. Duping the strings seems like the safest option currently, but this should be revisited.
 Due to the messiness of propagating $~, Rubinius opted to copy-and-paste these methods and we carried that forward. With the recent reimplementation of $~ propagation, we no longer need the duplication.
@nirvdrum nirvdrum force-pushed the truffle-fix_match_backref branch from bae45f6 to bee562d Compare August 26, 2016 17:43
@nirvdrum nirvdrum added this to the truffle-dev milestone Aug 26, 2016
@nirvdrum nirvdrum merged commit f1b2a0e into truffle-head Aug 26, 2016
@nirvdrum nirvdrum deleted the truffle-fix_match_backref branch August 26, 2016 17:44
@enebo enebo added this to the Invalid or Duplicate milestone Dec 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants