Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix enumerator size #3320

Closed
wants to merge 30 commits into from
Closed

Conversation

fmfdias
Copy link
Contributor

@fmfdias fmfdias commented Feb 18, 2015

Hi,

This is an attempt to fix the Enumerator#size method issues identified previously here and here. It's still a work in progress but I opened this pull request to check with you guys if I'm on the right track and also get some guidance.

I took a look at the JRuby implementation and what they do is pass the a 'calculated' size to the enumerator. Usually the size is enough (each, reverse_each, etc...), but there are some methods that need some math, like the cycle, each_cons and each_slice.

To start I've implemented a to_enum_with_size private method on the Enumerable module to use on these methods to create the an enum with the size calculation. Doing it here allow it to be used on Array and other classes that mix in the Enumerable module. Nevertheless, my question here is if it's the right place to do it.

Another thing I would like to point out is the each_cons and each_slice size spec. The size calculation is a little bit more complex that the cycle. The spec is only testing some value combinations, but I think it could be better. Do you guys have a better way to implement it?

When I was at it, I saw that both each_cons and each_slice raise an exception when called without a block. I added the calls to the spec that tests that the exception is raised and fixed it.

Like I said, this is still a work in progress. Enumerable seems to be fixed with these changes, but Array and other aren't. I already started with Array. With time, and if you are all ok with these changes, I'll continue to add them to this pull request.

Thank you

@@ -351,7 +351,18 @@ def any?
end

def cycle(many=nil)
return to_enum(:cycle, many) unless block_given?
unless block_given?
size = Proc.new do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern occurs in a few places in this PR so this comment applies to all of them: can we not simply pass the size itself here instead of a Proc? There's little overhead in calculating the size at this point and it removes the need for evaluating the block on every #size call later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you! These sizes are not that complex to calculate. There will be others, like the Array#combination that might be worthwhile calculating it only if needed. Will see when we get there, but in the meantime I'll change these.

@yorickpeterse
Copy link
Contributor

Spec changes look fine to me. For Enumerable/Enumerator we might want to do some general book keeping to clean things up, especially regarding passing in enum sizes.

@yorickpeterse
Copy link
Contributor

To further iterate (now that I'm at the office): I think we should fix Enumerator#initialize so we don't need #initialize_enumerator anymore. Right now the former method changes behaviour depending on whether or not you give it a block and uses the same argument as either the receiver or the size. The argument list I'm thinking of as an alternative would be:

  • size
  • receiver
  • method_name
  • *method_args
  • &block

This would still match the signature of MRI (which is Enumerator#initialize(size)) and should remove the need for the extra initialize_enumerator method.

@fmfdias
Copy link
Contributor Author

fmfdias commented Feb 18, 2015

Hi @yorickpeterse,

Thanks for the input.
I've replied to some of your comments inline. Regarding the need of the to_enum_with_size method I'm open to try other approaches. I also don't feel to much comfortable about it.
In the meantime I'll try the approach described in the to_enum documentation.

I'm doing this off hours, and with some time constraints. That's why I'm only replying now. My main objective is to help and so because I like this, but if you guys feel that the time might be an issue, please go ahead and continue with it as you see better. I'll continue to help in other ways.

Will get back when I have more news.
Thank you

@yorickpeterse
Copy link
Contributor

There's no rush, so take your time :)

@fmfdias
Copy link
Contributor Author

fmfdias commented Feb 19, 2015

Hi @yorickpeterse,

Thanks!
Reimplemented it with the to_enum and removed the to_enum_with_size methods and it works :)
There's some new edge case checks for the specs of the methods that involve some math in the calculation.
Before going to the Array and others, I'm thinking on improving the added specs.
The reasons are:

  • One spec is missing is that the size is always nil when the Enumerable doesn't implement a size method.
  • Until now I saw 3 repeating patterns:
    • The resulting enumerator size returns the enumerable size (collect, find_all, etc...)
    • The resulting enumerator size returns nil (find_index, take_while, etc...)
    • The resulting enumerator size returns has some specific math involved (cycle, each_slice, each_cons)
  • Array, and probably others, seem to also present similar patterns (e.g: Array#each_index) and they override some Enumerable methods (e.g.: Array#cycle).

Doing this would greatly reduce the amount of work of creating/changing the specs. I was thinking on using it_behaves_like but I don't if it's the best way to do it. There's also the question of where to place in the case of having to share them with Enumerables like Array. I'll try to figure it out, but any advice would be great.

Thank you

@yorickpeterse
Copy link
Contributor

For re-using specs we use shared examples, e.g. https://github.com/rubinius/rubinius/blob/master/spec/ruby/shared/enumerator/with_object.rb. This can then be used as following: https://github.com/rubinius/rubinius/blob/master/spec/ruby/core/enumerator/with_object_spec.rb#L5. This is probably the best solution here since Array and Enumerable share the same expected behaviour.

Regarding the patterns, make sure the changes also take care of lazy enumerators as in that case size is always supposed to return nil if I remember correctly.

@bjfish
Copy link
Contributor

bjfish commented Feb 20, 2015

I had a failing spec I forgot to add before for each_with_index_spec:

 it "has correct size with no proc given" do
    [1, 2, 3, 5].each_with_index.size.should == 4
  end

@fmfdias I thought I'd mention it in case you want to add it.

@fmfdias
Copy link
Contributor Author

fmfdias commented Feb 22, 2015

Hi guys,

Thanks for the feedback.
Sorry for the late reply but had a hard drive failure this week and had less time to look at this.

I've updated the specs with two shared spec files. The enumeratorized is to be use by any enumerable implementation and besides implementing the 2 first use cases I enumerated in my last comment, it also has the cycle case that's shared with the Array. The enumerable_enumeratorized is specific to the Enumerable specs and basically adds tests for Enumerable with no size method implementation.

The usage of this specs relies on it_behaves_like on most cases and in some edge cases it relies on the it_should_behave_like (e.g: Other shared specs, each_slice, each_cons...)

@yorickpeterse Thanks for the ideas. I also checked with MRI 2.1.5 and size is also passed to lazy enumerators. This version also does that :)

@bjfish Thank you, this version also has a spec for each_with_index. If you remember any other that might be missing, please tell me and I'll add.

For next steps I'm going to add some missing specs on Array (e.g: combination, permutation, etc...) and start implementing them.

Thanks

@fmfdias
Copy link
Contributor Author

fmfdias commented Feb 23, 2015

Hi,

Added the implementation for the Array methods that were failing the specs.

I moved to Hash... added failing specs and started to implement it on hash.rb when I found out that there's another hash implementation (hash_hamt.rb).
This might seem like a noob question, but... how do enable it / test it? :)

Thank you

@jemc
Copy link
Member

jemc commented Feb 24, 2015

@fmfdias
Try RBXOPT=-Xhash.hamt

@fmfdias
Copy link
Contributor Author

fmfdias commented Feb 24, 2015

@jemc Thank you!

The Hash part is implemented. Moving to Range!

Cheers

@fmfdias
Copy link
Contributor Author

fmfdias commented Feb 28, 2015

Hi guys,

A little status update...
Range, Kernel and Numeric methods with to_enum calls are fixed and have tests. At this moment I'm checking all the to_enum calls and comparing them to the MRI implementation, so it might take longer than I expected to finish this pull request.
This weekend I hope to advance a bit on this!

Have a nice weekend!

@yorickpeterse
Copy link
Contributor

Thanks for looking into this, and take your time! 👍

@fmfdias
Copy link
Contributor Author

fmfdias commented Mar 1, 2015

Hi guys,

@yorickpeterse: Thanks :)

I think the bulk work is done. I tried to find, check and fix every Kernel#to_enum, Kernel#enum_for and Enumerator#new existing on the Rubinius code. Hope none has skipped, hehehe.

Please review and I'll make the necessary changes as soon as I can.

Have a nice weekend!

@yorickpeterse
Copy link
Contributor

I haven't looked at the actual code yet (I'll do so in the next few days), but looking at the commits themselves I have two nit-picky comments:

  1. The merge commit fmfdias@e809904 can be removed using a git rebase if I'm not mistaken. It's not hugely crucial but it keeps the logs clean.
  2. Whenever possible we prefer for the first line of a Git commit to be <= 50 characters. Given such a summary is clear enough it's much easier to quickly scan through commits to see what they're up to. An example: 066dc00

Both these changes can be made using an interactive Git rebase (git rebase -i HEAD~53 where 53 is the number of latest commits to rebase). If you're new to rebasing I suggest reading http://git-scm.com/book/en/v2/Git-Branching-Rebasing since it's pretty useful (although a bit hard to grok at first). If it's too much of a fuzz I can take a look at it and explain how/what I did :)

@yorickpeterse yorickpeterse self-assigned this Mar 2, 2015
return 0 if comb < 0
__descending_factorial__(size, comb) / __descending_factorial__(comb, comb)
end
private :__binomial_coefficient__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be moved into a mirror for the Array class (https://github.com/rubinius/rubinius/blob/master/kernel/bootstrap/array_mirror.rb), that way we don't pollute the Array class nor can Ruby code in the wild mess with it. Mirrors can be used as following: https://github.com/rubinius/rubinius/blob/master/kernel/common/array.rb#L48. Another benefit of using a mirror is that this removes the need for the ugly __ prefix/suffix.

@yorickpeterse
Copy link
Contributor

After looking through the code I have two general remarks:

  1. As mentioned in the individual comments it's probably best to remove the private __XXX__ methods into a mirror class (whenever possible). This ensures third-party Ruby code can't mess with it.
  2. How do these changes handle lazy enumerators? Do the existing lazy enumerator specs still pass?

@fmfdias fmfdias force-pushed the fix_enumerator_size branch from 8f509ff to 2d239ba Compare March 6, 2015 01:35
@fmfdias
Copy link
Contributor Author

fmfdias commented Mar 6, 2015

Hi @yorickpeterse,

Thanks for the clarifications!
I've implemented the EnumerableHelper module with the cycle_size function to use on both Enumerable and Array also made the rebase.

I ran the specs with ruby 2.2.1 and they also pass, so I think this work can also be reflected on 2.2 branch too.

Ruby 2.2 brought two new Enumerable methods, slice_before and slice_when.
I checked and they both return an enumerator but it's size is nil. There's already a spec for slice_before, that I'll update, but not one for slice_when.
Speaking of slice_when, he's not referenced on #3264. It doesn't exist on the NEWS file referenced in the issus, but it's not the 2.2.0 final release NEWS file.

Thank you!

@tmornini
Copy link
Contributor

tmornini commented Mar 6, 2015

@fmfdias Thanks for all of this work, and @yorickpeterse thanks for coaching it.

Rubinius is so exciting, and it's magical to watch the community working together so effectively.

@yorickpeterse
Copy link
Contributor

@fmfdias Specs for slice_after were added in c3fd4a3, not sure if anybody is currently working on slice_when. I've updated #3264 to also include slice_when. I'll do another run-through of this PR later today/tomorrow, and thanks again for the wonderful work 😄

@fmfdias
Copy link
Contributor Author

fmfdias commented Mar 6, 2015

Hi @tmornini and @yorickpeterse,

Thanks 😊 I'm really glad I could to help! It's been a great experience and I'm enjoying it a lot. 😃
Rubinius is indeed a great project and I found it very accessible and easy to participate.
Congrats to you all for that 😃!

@yorickpeterse I'll try to do any change request as soon as I can so this one can be released.

Speaking of the #3269, is/was there any similar issue opened for Ruby 2.0/2.1? I'm asking this because while developing this one I found out that on MRI > 2.1, Numeric#step supports keyword arguments. There might be others in similar situation, so I was thinking on checking for this changes that might be missing from Rubinius.
Later I'll open the issue / pull request with specs (and probably an implementation) for the Numeric#spec keyword args.

You guys have a nice weekend!

@yorickpeterse
Copy link
Contributor

Speaking of the #3269, is/was there any similar issue opened for Ruby 2.0/2.1? I'm asking this because while developing this one I found out that on MRI > 2.1, Numeric#step supports keyword arguments. There might be others in similar situation, so I was thinking on checking for this changes that might be missing from Rubinius.

Not sure if I'm following. If you're referring to other changes required for 2.2 compatibility, sure, there's probably stuff not mentioned in the NEWS file and the corresponding issue.

@fmfdias
Copy link
Contributor Author

fmfdias commented Mar 7, 2015

Sorry... wrong ticket 😅 The one I wanted to mention was #3264.

Having a check list is cool to check how much specs are still missing. I was trying to know if there was a similar one for 2.0 and 2.1 compatibility. If they exist, my idea is to compare them to the NEWS file and spot any other missing stuff, like the Numeric#step keyword arguments.

Thanks

@yorickpeterse
Copy link
Contributor

As far as I'm aware there's no ticket for what remains to be added for 2.0/2.1.

@fmfdias
Copy link
Contributor Author

fmfdias commented Mar 8, 2015

Hi @yorickpeterse

Thanks! I'll try to do a quick check to see if anything more is missing.
In the meantime, I started to add specs and implement the kw args on Numeric#step. I'm using this PR as the base for that work to avoid conflicts.
If I finish it on time and If you agree, I'll merge these specs and implementation to this PR , otherwise I'll create it's own PR.

Thank you!

@yorickpeterse
Copy link
Contributor

@fmfdias Doing that in a separate PR is probably best (keeps things clear). You can simply branch said PR from this one which should prevent any merge conflicts.

@fmfdias
Copy link
Contributor Author

fmfdias commented Mar 12, 2015

Hi @yorickpeterse

Sorry for the delay, but didn't had time to finish the Numeric#step implementation before.
I followed your suggestion and created a branch based on this PR.

Thank you!

@yorickpeterse
Copy link
Contributor

Looking into this at the moment, in particular:

  • I'm wrapping the Git commits so the lines are at most 80 characters long (http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html is a good read on this). I don't think we mention anything about this anywhere, so don't worry about it.
  • It seems some 2.2 specific stuff is in this PR, I'll probably pull that out and merge that into the 2.2 branch instead (if possible). Until we merge the 2.2 branch it's preferred for 2.2 changes to go in a separate PR

Otherwise things are looking fine, I should be done with this in 30 minutes or so :)

@fmfdias
Copy link
Contributor Author

fmfdias commented Mar 12, 2015

Cool!

Sorry about the commit message. I'll take that into consideration next time I commit to the project :)
About the 2.2 stuff... I tested everything with ruby 2.1.5 and tried only to implement things related to MRI 2.1. Can you point out the stuff you found so I can take a look?

Thank you

@yorickpeterse
Copy link
Contributor

The Enum stuff has been rebased into the following commits: ac5f92a...64d73f9

The String#sub changes have been rebased into the 2.2 branch as the following commits: ccf1686...33dae72

@yorickpeterse
Copy link
Contributor

@fmfdias

About the 2.2 stuff... I tested everything with ruby 2.1.5 and tried only to implement things related to MRI 2.1. Can you point out the stuff you found so I can take a look?

The commits in ccf1686...33dae72 mentioned things changed in Ruby 2.2. Since we're dropping the use of version guards this means that until we fully support 2.2 changes for said version have to go in a separate branch. Or do the commits mean that this has changed since 2.1 (and not 2.2)?

@yorickpeterse
Copy link
Contributor

Ah I see, this changed in 2.1 and not 2.2. In that case I'll merge the commits into the master branch as well.

@yorickpeterse
Copy link
Contributor

With this merged/sorted out I'm closing this. Big props for taking care of this! 👍 🍺

@fmfdias
Copy link
Contributor Author

fmfdias commented Mar 12, 2015

That behavior comes at least from 2.0.
But 2.1 also introduced some slight behavior changes (Numeric#step TypeError to ArgumentError) :/

@fmfdias
Copy link
Contributor Author

fmfdias commented Mar 12, 2015

@yorickpeterse
Thank you! Glad I could help 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants