Make Indexable#to_a preallocate the array #6079
Conversation
src/tuple.cr (outdated)

@@ -392,6 +392,18 @@ struct Tuple
    pp.list("{", self, "}")
  end

  # Returns a `Array` with all the tuple's elements
Returns an `Array` (...)
src/tuple.cr (outdated)

  # tuple = { 1, 2 }
  # tuple.to_a # => [1, 2]
  # ```
  def to_a
This can actually be put on `Indexable` to improve all indexable collections instead of just tuple.
Doing this means you can remove `Deque#to_a` too, and I'd rather you copied `Deque#to_a`'s implementation using `each` and preallocating the size than this code, which does a bounds check on every `self[i]`.

Of course, benchmarking after this change would be required (and maybe investigate using this code but with `self.unsafe_at` too, to see if there's better performance).
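As a rough sketch of the two shapes being discussed (illustrative only, not the PR's code; `to_a_unchecked` is a hypothetical name, and `unsafe_fetch` is current Crystal's name for the unchecked accessor that was called `unsafe_at` at the time):

```crystal
module Indexable(T)
  # Variant 1: preallocate the array and append while iterating,
  # the way Deque#to_a already did.
  def to_a
    ary = Array(T).new(size)
    each { |e| ary << e }
    ary
  end

  # Variant 2 (hypothetical name): build by index, skipping the bounds
  # check that `self[i]` performs on every element.
  def to_a_unchecked
    Array(T).new(size) { |i| unsafe_fetch(i) }
  end
end
```

The benchmark in the next comment compares roughly these shapes on a `Tuple`.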
require "benchmark"
module Enumerable
def to_aa
ary = Array(T).new(size)
each { |e| ary << e }
ary
end
end
struct Tuple
def to_a1
Array(Union(*T)).new(size) do |i|
self[i]
end
end
def to_a2
arr = Array(Union(*T)).new(size)
{% for i in 0...T.size %}
arr << self[{{i}}]
{% end %}
arr
end
end
t = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, "a" }
Benchmark.ips do |x|
x.report("to_a") { t.to_a }
x.report("to_aa") { t.to_aa }
x.report("to_a1") { t.to_a1 }
x.report("to_a2") { t.to_a2 }
end
Output:

to_a    1.37M (732.25ns) (± 4.60%)  1.51× slower
to_aa   2.04M (489.27ns) (± 9.49%)  1.01× slower
to_a1   1.65M (606.33ns) (± 3.47%)  1.25× slower
to_a2   2.06M ( 485.6ns) (± 8.03%)       fastest
Conclusion: skipping optimizing `Tuple#to_a`, will just preallocate the size of the array in `Enumerable#to_a`.
Please don't use `size` in `Enumerable#to_a`, because `size` traverses the enumerable, so you'll end up traversing it twice. Let's preallocate in `Indexable` and `Tuple`, but leave `Enumerable#to_a` as is.
Thanks!
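To illustrate why (a sketch with hypothetical names, not code from this PR): when a collection only defines `each`, `Enumerable`'s default `size` counts by iterating, so a `to_a` that preallocated with `size` would walk the collection twice.

```crystal
# A toy collection whose only primitive is `each`. `CountingSource` and
# `traversals` are hypothetical names used just for this illustration.
class CountingSource
  include Enumerable(Int32)

  getter traversals = 0

  def each
    @traversals += 1
    yield 1
    yield 2
    yield 3
  end
end

src = CountingSource.new
src.size        # Enumerable's default size counts by iterating: first pass
src.to_a        # building the array iterates again: second pass
src.traversals  # => 2
```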
ok
But @asterite, don't you think that counting (which only happens if the class doesn't implement its own `size` method) might still be faster than dynamically growing an array?
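One way to check that empirically would be a micro-benchmark along these lines (a sketch, not from this PR; results depend heavily on element count and type):

```crystal
require "benchmark"

n = 10_000

Benchmark.ips do |x|
  # Count first (the extra pass a derived Enumerable#size would do),
  # then fill a preallocated array.
  x.report("count + preallocate") do
    size = 0
    n.times { size += 1 }
    ary = Array(Int32).new(size)
    n.times { |i| ary << i }
  end

  # Append into an array that grows (and reallocates) as needed.
  x.report("dynamic growth") do
    ary = [] of Int32
    n.times { |i| ary << i }
  end
end
```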
No idea.
@carlhoerberg I guess that depends on the implementation of the `Enumerable`. We should assume it's cost intensive by default and use a faster algorithm where `size` is trivial (like in `Indexable`).
src/indexable.cr (outdated)

  # Returns an `Array` with all the elements in the collection.
  #
  # ```
  # { 1, 2, 3 }.to_a # => [1, 2, 3]
Please run formatting (`crystal tool format`) also for the example code: `{1, 2, 3}.to_a # => [1, 2, 3]`. Thank you.
Fixed the doc formatting and rebased.

I think this can still be spec'd; we do have some indexable specs and a stub indexable to test with.
What happened to `Tuple#to_a` using a macro?
It didn't yield any performance improvement.
Bah, nevermind. Let's not worry about that for now.

@asterite Tuple

@RX14 Ah, right, I forgot about that.
  # ```
  def to_a
    ary = Array(T).new(size)
    each { |e| ary << e }
The optimizer should make this into a single memcpy for indexables with the elements in sequence, right?
I don't think so. The optimizer is pretty dumb.
If this ends up as a bottleneck in any specific class, there's plenty of scope for overriding it to use memcpy in classes where that's possible.
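For example, a collection backed by a single contiguous buffer could override it roughly like this (a hypothetical class, not part of this PR):

```crystal
# Hypothetical contiguous-storage collection used only to illustrate the
# kind of override mentioned above.
class FlatBuffer(T)
  include Indexable(T)

  getter size : Int32

  def initialize(@buffer : Pointer(T), @size : Int32)
  end

  # Required by Indexable (`unsafe_at` at the time of this PR,
  # `unsafe_fetch` in current Crystal).
  def unsafe_fetch(index : Int)
    @buffer[index]
  end

  # Override the generic each-based Indexable#to_a with one bulk copy,
  # which is possible because the elements are stored contiguously.
  def to_a : Array(T)
    Array(T).build(@size) do |dst|
      dst.copy_from(@buffer, @size)
      @size
    end
  end
end
```

`Array.build` yields the new array's raw buffer and takes the final size from the block's return value, so the copy happens once rather than element by element.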
@RX14 can you change the title of this issue to match the commits of the PR? That way someone linking back will be able to spot the connection. Thank you.