Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Array#count instead of falling back to Enumerable#count #3133

Merged
merged 1 commit into from
Sep 18, 2014
Merged

Implement Array#count instead of falling back to Enumerable#count #3133

merged 1 commit into from
Sep 18, 2014

Conversation

jemc
Copy link
Member

@jemc jemc commented Sep 18, 2014

This avoids iteration with #each if no arguments are given to #count.

This is an easy win for performance for large arrays, and MRI and JRuby already have this optimization.

In addition to performance, the implementing Module as given by introspection now matches MRI, meaning behavior will be consistent if some silly user deems to monkey-patch Enumerable#count.

Array.instance_method(:count).owner #=> Array (not Enumerable)

This avoids iteration with #each if no arguments are given.
@jemc
Copy link
Member Author

jemc commented Sep 18, 2014

(Travis failure looks unrelated)

@jc00ke
Copy link
Member

jc00ke commented Sep 18, 2014

Are there specs for this? If not, can you add some. If so, are there tags to delete?

@jc00ke
Copy link
Member

jc00ke commented Sep 18, 2014

Never mind, I spoke before I looked.

jc00ke added a commit that referenced this pull request Sep 18, 2014
Implement Array#count instead of falling back to Enumerable#count
@jc00ke jc00ke merged commit 47f4647 into rubinius:master Sep 18, 2014
@@ -415,6 +415,18 @@ def concat(other)
concat other
end

def count(item = undefined)
seq = 0
if !undefined.equal?(item)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason for writing this backwards, instead of using if item != undefined or if !item.equal?(undefined) (or perhaps just if item)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yorickpeterse
The implementation was directly copied from that of Enumerable#count for consistency's sake, then modified to return the optimized result in the third case.

Regarding the options you mentioned, I can speculate as to why the original author of Enumerable#count chose not to use them:

  1. if item != undefined Having item be the first term here could be problematic, if the object at item overrode the != operator.
  2. if !item.equal?(undefined) This has the same potential problem as 1.
  3. if item This one is definitely not acceptable, because [true,true,false].count(false) should return 1, not 3. I imagine this is why undefined was used here instead of just testing falsiness or .nil?.
  4. if undefined != item would probably be okay, though that form is a little more 'relaxed' might be more likely to tempt someone to make the kind of refactorings later mentioned above that could break things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of Enumerable#count I referred to:
https://github.com/rubinius/rubinius/blob/master/kernel/common/enumerable.rb#L59-L69

@jemc jemc deleted the array_count branch September 18, 2014 15:20
yorickpeterse pushed a commit to yorickpeterse/oga that referenced this pull request Sep 25, 2014
Instead of relying on String#count for counting newlines in text nodes, Oga now
does this in C/Java. String#count isn't exactly the fastest way of counting
characters. Performance was measured using
benchmark/xml/lexer/string_average_bench.rb. Before this patch the results were
as following:

    MRI:   0.529s
    Rbx:   4.965s
    JRuby: 0.622s

After this patch:

    MRI:   0.424s
    Rbx:   1.942s
    JRuby: 0.665s => numbers vary a bit, seem roughly the same as before

The commands used for benchmarking:

    $ rake clean # to make sure that C exts aren't shared between MRI/Rbx
    $ rake generate
    $ rake fixtures
    $ ruby benchmark/xml/lexer/string_average_bench.rb

The big difference for Rbx is probably due to the implementation of String#count
not being super fast. Some changes were made
(rubinius/rubinius#3133) to the method, but this hasn't
been released yet.

JRuby seems to perform in a similar way, so either it was already optimizing
things for me or I suck at writing well performing Java code.

This fixes #51.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants