
Make Indexable#to_a preallocate the array #6079

Merged
merged 1 commit into crystal-lang:master on May 8, 2018

Conversation

carlhoerberg
Contributor

Benchmark:

require "benchmark"

struct Tuple
  def to_aa
    Array(Union(*T)).new(size) do |i|
      self[i]
    end
  end
end

t = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 }
Benchmark.ips do |x|
  x.report("original") { t.to_a }
  x.report("optimized") { t.to_aa }
end

Output:

original    2.86M (349.71ns) (± 2.36%)  1.72× slower
optimized   4.93M (202.81ns) (± 3.79%)       fastest

src/tuple.cr Outdated
@@ -392,6 +392,18 @@ struct Tuple
pp.list("{", self, "}")
end

# Returns a `Array` with all the tuple's elements
Contributor

Returns an Array (...)

src/tuple.cr Outdated
# tuple = { 1, 2 }
# tuple.to_a # => [1, 2]
# ```
def to_a
Contributor

This can actually be put on Indexable to improve all indexable collections instead of just tuple.

RX14 (Contributor) commented May 8, 2018

Doing this means you can remove Deque#to_a too, and I'd rather you copied Deque#to_a's implementation, using each and preallocating the size, than this code, which does a bounds check on every self[i].

Of course, benchmarking after this change would be required (and maybe also investigate this code with self.unsafe_at, to see if there's better performance).
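A sketch of that unchecked-accessor variant (the name to_a3 is invented here; the accessor was called unsafe_at at the time of this PR and unsafe_fetch in later Crystal versions):

```crystal
struct Tuple
  # Sketch: same preallocation as to_aa above, but skipping the
  # bounds check on each element read. Safe because the block
  # index is always within 0...size.
  def to_a3
    Array(Union(*T)).new(size) do |i|
      unsafe_fetch(i)
    end
  end
end

{1, 2, "a"}.to_a3 # => [1, 2, "a"]
```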

Contributor Author

require "benchmark"

module Enumerable
  def to_aa
    ary = Array(T).new(size)
    each { |e| ary << e }
    ary
  end
end

struct Tuple
  def to_a1
    Array(Union(*T)).new(size) do |i|
      self[i]
    end
  end

  def to_a2
    arr = Array(Union(*T)).new(size)
    {% for i in 0...T.size %}
      arr << self[{{i}}]
    {% end %}
    arr
  end
end

t = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, "a" }
Benchmark.ips do |x|
  x.report("to_a") { t.to_a }
  x.report("to_aa") { t.to_aa }
  x.report("to_a1") { t.to_a1 }
  x.report("to_a2") { t.to_a2 }
end

Output:

 to_a   1.37M (732.25ns) (± 4.60%)  1.51× slower
to_aa   2.04M (489.27ns) (± 9.49%)  1.01× slower
to_a1   1.65M (606.33ns) (± 3.47%)  1.25× slower
to_a2   2.06M ( 485.6ns) (± 8.03%)       fastest

Contributor Author

Conclusion: I'll skip optimizing Tuple#to_a specifically and just preallocate the array to size in Enumerable#to_a.

Member

Please don't use size in Enumerable#to_a, because size traverses the enumerable, so you'll end up traversing it twice.

Let's preallocate in Indexable and Tuple, but leave Enumerable#to_a as is.

Thanks!
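Concretely, the Indexable change suggested here might look like this sketch (SafeIndexable is a made-up stub; unsafe_fetch was named unsafe_at at the time of this PR):

```crystal
class SafeIndexable
  include Indexable(Int32)

  getter size : Int32

  def initialize(@size : Int32)
  end

  def unsafe_fetch(i)
    i
  end

  # Preallocated to_a: size is O(1) for Indexable, so the backing
  # buffer can be reserved up front and filled via each, mirroring
  # Deque#to_a's approach.
  def to_a
    ary = Array(Int32).new(size)
    each { |e| ary << e }
    ary
  end
end

SafeIndexable.new(3).to_a # => [0, 1, 2]
```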

Contributor Author

ok

Contributor Author

but @asterite, don't you think that counting (which only happens if the class doesn't implement its own size method) might still be faster than dynamically growing an array?

Member

No idea.

Member

@carlhoerberg I guess that depends on the implementation of the Enumerable. We should assume it's cost-intensive by default and use a faster algorithm where size is trivial (like in Indexable).
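To illustrate why the default is assumed cost-intensive: Enumerable's fallback size can only count by iterating, so a size-based to_a would traverse the collection twice (Countdown below is a made-up example):

```crystal
# Made-up enumerable with no stored size: the fallback
# Enumerable#size has to run each to completion just to count.
class Countdown
  include Enumerable(Int32)

  def initialize(@from : Int32)
  end

  def each
    @from.downto(1) { |i| yield i }
  end
end

c = Countdown.new(3)
c.size # first full traversal, just to count: 3
c.to_a # second traversal to collect the elements: [3, 2, 1]
```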

src/indexable.cr Outdated
# Returns an `Array` with all the elements in the collection.
#
# ```
# { 1, 2, 3 }.to_a # => [1, 2, 3]
Contributor

Please run the formatter (crystal tool format) on the example code too:

{1, 2, 3}.to_a # => [1, 2, 3]

Thank you.

@carlhoerberg
Contributor Author

Fixed the doc formatting and rebased

@RX14
Contributor

RX14 commented May 8, 2018

I think this can still be spec'd; we do have some Indexable specs and a stub Indexable to test with.

@asterite
Member

asterite commented May 8, 2018

What happened to implementing Tuple#to_a with macro for?

@carlhoerberg
Contributor Author

carlhoerberg commented May 8, 2018 via email

@asterite
Member

asterite commented May 8, 2018

Bah, nevermind. Let's not worry about that for now.

@RX14
Contributor

RX14 commented May 8, 2018

@asterite Tuple#each uses macro for already, and since yield is inlined it generates exactly the same code.

@asterite
Member

asterite commented May 8, 2018

@RX14 Ah, right, I forgot about that.

# ```
def to_a
  ary = Array(T).new(size)
  each { |e| ary << e }
Contributor

The optimizer should turn this into a single memcpy for indexables that store their elements contiguously, right?

Member

I don't think so. The optimizer is pretty dumb.

Contributor

If this ends up as a bottleneck in any specific class, there's plenty of scope for overriding it to use memcpy in classes where that's possible.
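Such an override could look like this sketch (ContiguousList and its fields are invented for illustration; the bulk copy is only valid because the elements sit in one contiguous buffer):

```crystal
class ContiguousList(T)
  include Indexable(T)

  getter size : Int32

  def initialize(@size : Int32)
    @buffer = Pointer(T).malloc(@size)
  end

  def unsafe_fetch(i)
    @buffer[i]
  end

  def []=(i : Int32, value : T)
    @buffer[i] = value
  end

  # Override the element-by-element to_a with one bulk copy:
  # Array.build yields the destination buffer and takes the final
  # element count as the block's return value.
  def to_a
    Array(T).build(@size) do |dst|
      dst.copy_from(@buffer, @size)
      @size
    end
  end
end
```

A quick usage example: fill a ContiguousList(Int32) of size 3 with 0, 10, 20 via []= and to_a returns [0, 10, 20] in one copy.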

@RX14 RX14 added this to the Next milestone May 8, 2018
@RX14 RX14 merged commit 5d3a458 into crystal-lang:master May 8, 2018
@luislavena
Contributor

@RX14 can you change the title of this issue to match the commits of the PR? That way someone linking back will be able to spot the connection.

Thank you.

@RX14 RX14 changed the title Optimize Tuple#to_a by preallocate Array Make Indexable#to_a preallocate the array May 8, 2018
chris-huxtable pushed a commit to chris-huxtable/crystal that referenced this pull request Jun 6, 2018