Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String#rpartition #3429

Closed
wants to merge 25 commits into from
Closed

Conversation

johnjansen
Copy link
Contributor

No description provided.

{"", "", self}
else
pre = self[0 .. (pos - 1)]
post = self[(pos + search_size) .. -1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run the formatter tool.

describe "by char" do
"hello".rpartition('l').should eq ({"hel", "l", "o"})
"hello".rpartition('o').should eq ({"", "", "hello"})
"hello".rpartition('h').should eq ({"", "", "hello"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why if the char is at the beginning or at the end of the string is the same as if they were not found at all?

> "hello".rpartition("o")
=> ["hell", "o", ""]
> "hello".rpartition("h")
=> ["", "h", "ello"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats me overthinking ... shoulda checked ruby ;-)

search_size = search.is_a?(Char) ? 1 : search.size

if pos == 0 \
|| pos == self.size-1 \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some inconsistencies in the formatting. Use the crystal formatter tool otherwise the CI will complain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Brian, i wasn't aware of the formatting tool i'm afraid, so apologies, i will look into that asap


pre = ""
mid = ""
post = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre, mid, post = {"", "", ""}

pre = self[0..(pos - 1)]
mid = search.to_s
post = self[(pos + search_size)..-1]
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO would look better with case switch:

case pos
when .nil?
  # ...
when 0
  # ...
else
  # ...
end


pre, mid, post = {"", "", ""}

if match_result.nil?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto :)

# If it is not found, returns two empty strings and str.
#
# ```
# "hello".rpartition("l") # => ["hel", "l", "o"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example shows that rpartition returns an array, but that is not the case. Please check that the code samples in the docs matches the real output.

"hello".rpartition("he").should eq ({"", "he", "llo"})
end

describe "by regex" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more corner case to ensure the regex is matched from right to left would be:

"hello".rpartition(/l./).should eq ({"hel", "lo", ""})

All the existing test cases work fine if the code would make a lookup from pos=0 to length-1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @bcardiff, I was assuming that there was a vague probability that the split point would more likely be toward the right hand end of the string and this would save unnecessary iterations. what are your feelings on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns were that the spec works if the implementation returns the first match from left to right since there is only one possible match. With /l./ there are two and the spec I propose ensure we want the rightmost one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with the implementation BTW.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok great, we have the new test, and we are all green :-)

break unless match_result.nil?
pos -= 1
end
match_result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this line necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, debugging in playground ... thats going away right now

@bcardiff
Copy link
Member

@johnjansen as soon as specs pass I am ok with this. Note that the history shall be squashed. I could handle via github directly, unless, you prefer to do that on your own.

@johnjansen
Copy link
Contributor Author

@bcardiff ill let you do the merge, and happy to see all the attempts squashed into oblivion ;-)

@johnjansen
Copy link
Contributor Author

@bcardiff is it just a matter of clicking "close and comment" to merge the PR ?

denisdefreyne and others added 2 commits October 17, 2016 14:57

Verified

This commit was signed with the committer’s verified signature.
ecoskey Emerson Coskey
@asterite
Copy link
Member

@johnjansen Thank you for this!

A question I had for some time now: why two empty strings are returned when the match is not found? Can we maybe return nil in this case? What's the use case for retuning two empty strings? I ask because I'm pretty sure there must be a reason.

@johnjansen
Copy link
Contributor Author

Hi @asterite

I wondered the same, but fell back to the same behavior as ruby ... much as i'm enthusiastic to get my first contribution merged, this probably warrants more discussion. I cant formulate a good argument either way without contradicting myself in the process :-)

J

@bcardiff
Copy link
Member

TL;DR 1: I think is ok to return a non-nilable tuple

I am not sure if rpartition needs to return nil. If we try to use it for cycling, it sounds like returning nil seems reasonable, to stop iteration. But due to the nature of unpacking the result using it in a loop the code looks weird:

s = "abc.def.ghi.jkl"

# if rpartition returned nil
while m = s.rpartition "."
  s, _, suffix = m
  suffix # => "jkl" , "ghi" , "def"
end

s # => "abc"

I would expect a block alternative to be used to cycle the results. #each_rpartition(Char|String, &block) or something like that that will yield all the partitions.

And last, for keeping the non-nillable tuple, a partition always exists. if you concat the 3 parts you get the original string. If the method was named "match" or something like that I would agree that we need a nil to represent the no-matching result.

TL;DR 2: I think #partition and #rpartition should be equal when there is no match

Maybe if no string is matched the result could be {self, "", ""} so the rpartition is used to split the right part after the separator. Like extracting the extension of a file, when no extension is found, getting an empty string seems appropriate. Also that would mean that #partition and #rpartition return the same result when there is no match at all (i would like that). But all this would defer from ruby, I am sure they had some reason for this. NB: currently there is no String#partition.

@johnjansen
Copy link
Contributor Author

I'll write partition too, but it will be later in the week ... if you want to ticket it and assign it to me :-)

@johnjansen
Copy link
Contributor Author

adding a block version of both could be a third task and we could agree on the nilability of elements ... once again, happy to pick that up too

@bcardiff
Copy link
Member

@johnjansen 👍 on #partition for later. But I would say to no worries for the block version. I am not sure if it would be that useful.

@johnjansen
Copy link
Contributor Author

awesome @bcardiff i will put it up on my whiteboard ;-)

@asterite
Copy link
Member

In the future (or soon) we could support unpacking in if and while. Ruby 2.3 or 2.4 supports this already. Basically:

if x, y = foo
end

is the same as this:

tmp = foo
if tmp
  x, y = tmp
end

So returning nil in partition would be good.

@johnjansen
Copy link
Contributor Author

ok @asterite i'll get to it in a little bit ... also note the partition PR ;-)

@asterite
Copy link
Member

@johnjansen In any case I don't know what final shape I'd like it to be. I'd like to know real use cases for partition before making a choice. What's your use case?

@johnjansen
Copy link
Contributor Author

i dont have one right now, but someone was complaining about its absence, and i wanted to start contributing with something simple, so i decided to have a go

@johnjansen
Copy link
Contributor Author

johnjansen commented Oct 18, 2016

@asterite ... and now i do ... in the process of porting some code, i found this

    def may_shift_colon?
      return false unless @text.includes?(":")
      partitions = @text.partition(":")
      partitions.last[0] !~ /\A\d+/ || partitions.first[-1] !~ /\A\d+/
    end

obviously, im happy to handle this any way thats necessary, but its one example ;-)

thinking about an implementation of the above with a union type return as below

class String
  def partition(search : (Char | String)) : Tuple((String|Nil), (String|Nil), (String|Nil))
    pre, mid, post = {nil, nil, nil}
    search_size = search.is_a?(Char) ? 1 : search.size
    case pos = self.index(search)
    when .nil?
      pre = self
    when 0
      mid = search.to_s
      post = self[(pos + search_size)..-1]
    else
      pre = self[0..(pos - 1)]
      mid = search.to_s
      post = self[(pos + search_size)..-1]
    end
    {pre, mid, post}
  end
end

def may_shift_colon?(text)
    return false unless text.includes?(':')
    first, _, last = text.partition(':')
    (last && last[0] !~ /\A\d+/) || (first && first[-1] !~ /\A\d+/)
end

presumably the last line can be

  last.not_nil![0] !~ /\A\d+/ || first.not_nil![-1] !~ /\A\d+/

given that, i like the guarantee that each part is a string ... although i can see the use for Nil too

@asterite
Copy link
Member

I found this about a use case for partition:

https://m0j0.wordpress.com/2007/07/11/partition-saves-my-bacon/

So it seems Python has this too. It's also a bit more efficient than split because it doesn't allocate an array. I think we can add it and keep the same behaviour as in Ruby and Python. rpatition also works the same in Ruby and Python: two empty strings followed by the original string.

I think the reason behind that behaviour is this. Imagine we expect to receive a string with a colon and we want to get what's before and after the first colon. So:

"foo:bar".partition(":") # => {"foo", ":", "bar"}
"foo:bar:baz".partition(":") # => ["foo", ":", "bar:baz"}

Now if there's no colon, we assume it's at the end of the string (because we are searching from the start). So:

# think "foo:".rpartition(":") # => {"foo", ":", ""}
"foo".partition(":") # => {"foo", "", ""}

An empty string is returned instead of the separator to distinguish it from the case where the colon was actually there.

Now, if we want to do the same but for the last colon:

"foo:bar".rpartition(":")     # => {"foo", ":", "bar"}
"foo:bar:baz".rpartition(":") # => {"foo:bar", ":", "baz"}

If there's no colon, since we are searching from the back of the string, we assume it's at the beginning of the string. So:

# think ":foo".rpatition(":") # => {"", ":", "foo"}
"foo".rpartition(":") # => {"", "", "foo"}

So now I think it makes a lot of sense to do it like that :-)

pos = self.rindex(search)
search_size = search.is_a?(Char) ? 1 : search.size

pre, mid, post = {"", "", ""}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre = mid = post = ""

@Sija
Copy link
Contributor

Sija commented Oct 19, 2016

@asterite It's also useful for splitting file paths, i.e. getting last path segment, that is the filename.

@asterite
Copy link
Member

@johnjansen Could you rebase this PR against master?

johnjansen and others added 6 commits October 19, 2016 22:48
…o string-rpartition

# Conflicts:
#	spec/std/string_spec.cr
#	src/string.cr
`crystal spec` commands now supports JUnit XML output, which requires
libxml2 to be installed.

With current defaults, the command fails due missing `libxml2-dev`
dependencies in the image.

This change ensures `libxml2-dev` is installed along the other
dependencies so can be used for both building executables and running
specs.

Also includes `libyaml-dev` so any attempt to using `yaml` works
out of the box.
@johnjansen
Copy link
Contributor Author

@asterite @bcardiff i think we are good to merge ... ?

@luislavena
Copy link
Contributor

@johnjansen I'm not sure, as there are still listed 19 commits for the changes.

https://github.com/crystal-lang/crystal/pull/3429/commits

I might recommend you look at the Contributing document for instructions to perform a clean rebase.

@johnjansen
Copy link
Contributor Author

thanks @luislavena, i hate rebase ;-) always messes with my head

@asterite
Copy link
Member

@johnjansen Don't worry, I also hate it too :-)

I actually tried to manually rebase this PR but failed... or, mmm... I'm lazy to rewrite history.

Another alternative you can do is to start from scratch: close this PR and create a new one in a different branch.

@luislavena
Copy link
Contributor

@johnjansen I use rebase all the time, but for some reason you're getting other commits that shouldn't be part of this branch.

I recommend taking a look to the document I linked before about updating your local copy against the upstream master and ensure only your commits are in the list of the rebase.

Perhaps use the interactive mode? (git rebase -i upstream/master)

@johnjansen
Copy link
Contributor Author

@asterite im going to close this one, and open another, with just the stuff we want :-)

@johnjansen johnjansen closed this Oct 20, 2016
@johnjansen johnjansen deleted the string-rpartition branch October 20, 2016 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants