-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Truffle fix match backref #4081
Conversation
|
||
if (matchData != nil()) { | ||
return matcher.getBegin(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really following, where is the copy here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's being made by the caller. Not ideal, but it didn't seem very useful to allocate a string from to_s
and then immediately dup
it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing guarantees to_s
actually duplicates the original string.
@source
is only ever used in #string
(where it's copied) and MatchData.pre_match_from
and even then it returns a copy since byteslice
is called.
I think it's safe and OK to not do an explicit copy and keep it simple here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of the String specialization, I continue to dup
it. I guess strictly speaking some non-String object could cache its to_s
output, but wouldn't that violate Ruby semantics? Even Symbols get newly allocated Strings from to_s
.
We do need an explicit copy (or, switch to storing taint state and a rope and updating everything using the source string as necessary). The change was made to fix an issue whereby you can mutate the source string from outside the MatchData via the shared reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you mutate it? It's only accessible from the regexp.rb source file and never returned directly isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was what I thought originally and why I introduced the bug. But the following fragment shows how that can fall apart:
x = "abc"
x =~ /(.*)/
x.upcase!
Without making a defensive copy, $~.string
will be "ABC"
, rather than "abc"
, which is what the match was really made against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, my apologies. It's a mutation the other way round. Please mention this in the comment.
Why not using the makeSubstringNode(0,length)
in matchCommon
then?
This seems to express the goal more clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment already says we need to make a defensive copy to prevent modifications. I take it you're looking for more details about that?
Re: the substring.... Unfortunately, it's not that easy because that substring method is really only dealing with things at the rope level and won't carry forward the taint state. We can deal with that within matchCommon, but I don't think it'll wind up much cleaner and passing a frame around gets complicated with the boundary in place.
Really, I want to overhaul all this anyway. I think when we take a pass over the regexp subsystem, a lot of this will get reworked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, nothing like a good Ruby example to clarify it :)
Right, this matchCommon
seems like it's asking to become a node, but let's do that later.
…e invoke to cut out an extra frame.
…ng support for 'operator' mode.
Prior to this change we treated $~ as thread-local. While that worked in many cases, it did mean that the match data lived longer than it should. Fixing this is a bit tricky because generally the match data should not propagate up the call stack beyond the caller, but we implement much of our core library in Ruby itself. So we must propagate values back to the user's call, and then cease propagating up the call stack.
Due to the messiness of propagating $~, Rubinius opted to copy-and-paste these methods and we carried that forward. With the recent reimplementation of $~ propagation, we no longer need the duplication.
We store a copy of the source string in our MatchData instances to make it easier to perform some operations in Ruby. Additionally, we need it to track the taint state. We could store the taint state and the string's rope as separate fields to avoid issues of mutability, but it would preclude writing operations in Ruby without allocating a new string for each such operation. Duping the strings seems like the safest option currently, but this should be revisited.
Due to the messiness of propagating $~, Rubinius opted to copy-and-paste these methods and we carried that forward. With the recent reimplementation of $~ propagation, we no longer need the duplication.
bae45f6
to
bee562d
Compare
@jruby/truffle
This fixes $~ to be frame-local (and thread-local), which addresses some issues where values were leaking in places they shouldn't have been. I don't think I've handled every case accurately, but I do think this is more accurate than what we had before and all specs continue to pass.
Since it was a bit involved, I'd appreciate a careful review.