Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add enumerable#chunk #2821

Merged
merged 1 commit into from
Oct 25, 2016
Merged

add enumerable#chunk #2821

merged 1 commit into from
Oct 25, 2016

Conversation

kostya
Copy link
Contributor

@kostya kostya commented Jun 12, 2016

this version return array, but ruby returns enumerator class(implemented on fibers), which missing in crystal. at least this is 10 times faster than ruby

s = 0
t = Time.now
x = Array.new(100000) { rand(10000) }
100.times do
  s += x.chunk { |a| a } .to_a.size
end
p s
p Time.now - t
crystal
9999100
00:00:00.7460890

ruby
9999300
7.995598

@jhass
Copy link
Member

jhass commented Jun 12, 2016

Interestingly since Enumerable#slice_when was added to Ruby I haven't seen a usecase for Enumerable#chunk that wasn't more clearly solved using Enumerable#slice_before, Enumerable#slice_when or Enumerable#slice_after. Do you have anything that can't be solved with one of them or where Enumerable#chunk leads to clearer code?

In any case if we decide to accept this, docs a crucial for such a complex method.

@kostya
Copy link
Contributor Author

kostya commented Jun 13, 2016

Enumerable#slice_before, Enumerable#slice_when or Enumerable#slice_after not so obvious to use (strange name, strange behavior), chunk is clear for me.

@asterite
Copy link
Member

Note that while we don't have Enumerator we do have Iterator, which is the similar concept.

@kostya
Copy link
Contributor Author

kostya commented Jun 13, 2016

iterator need to implement next function, but how to implement it, if you only have each function from enumerable?

@kostya
Copy link
Contributor Author

kostya commented Jun 13, 2016

i trying to implement iterator, but have error with code:

class A(T)
  def initialize(@collection : Enumerable(T))
    @iterator = @collection.each
  end
end

a = A(Int32).new([1,2,3])
Enumerable(T) is not a generic class, it's a generic module

  def initialize(@collection : Enumerable(T))

@asterite
Copy link
Member

We can probably have chunk in Enumerable like you did, and chunk in Iterator to be lazy. Check how iterators are created there. There's no way right now to have an instance variable of type "generic module", but you can work around that like this:

module Iterator
  def chunk
    Chunk(typeof(self), T).new(self)
  end

  class Chunk(I, T)
    @iterator : I

    def initialize(@iterator : Iterator(T))
    end
  end
end 

@kostya
Copy link
Contributor Author

kostya commented Jun 13, 2016

iterator added

@asterite
Copy link
Member

@kostya Cool! Do you still have the non-iterator version? We could have both. Not all Enumerable have each without block, so the version that returns arrays can still be useful.


it "works with block" do
res = [] of Tuple(Bool, Array(Int32))
[1, 2, 3].chunk { |x| x < 3 }.each { |k, v| res << {k, v} }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use |(k, v)| for now? Auto-unpacking is only available in the next version, with this the current specs can't be compiled anymore with 0.17.4.

@kostya
Copy link
Contributor Author

kostya commented Jun 13, 2016

what classes without each? may be implement for them each on channels?

@asterite
Copy link
Member

@kostya The standard library provides each for most collection classes, but we can't expect it to be in user-defined collections.

In the future I'd like to have something like Enumerator in Crystal if it can be done efficiently with fibers... but plain old iterators are always faster.

@kostya
Copy link
Contributor Author

kostya commented Jun 13, 2016

how add two methods chunk for each with block and without? different names?

@kostya
Copy link
Contributor Author

kostya commented Jun 13, 2016

added chunks method

when :_alone
return {tail_key.not_nil!, [tail_val]}
else
raise "symbols beginning with an underscore are reserved"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArgumentError

end

it "raises a RuntimeError if the block returns a Symbol starting with an underscore other than :_alone or :_separator" do
expect_raises do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expect_raises ArgumentError

@asterite
Copy link
Member

@kostya I'll review the code and we'll eventually add this to the standard library :-)

One thing: instead of using nil, :_separator and :_alone for special meaning, we could maybe two marker types:

module Enumerable
  module Chunk
    record Drop
    record Alone
  end
end

So one can do:

enumerable.chunk { Enumerable::Chunk::Alone }

It's a bit more verbose, but this way nil and any symbol can participate in chunk methods.

else
raise ArgumentError.new("symbols beginning with an underscore are reserved")
end
return {tail_key.not_nil!, [tail_val]} if tail_key == Enumerable::Chunk::Alone
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not_nil! needed now? Could you maybe add a spec with nil values? :-)

Copy link
Contributor Author

@kostya kostya Jun 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil is hard here because, if i try to setup previous = uninitialized U, it gives error:

can't use Object as a Proc argument type yet, use a more specific type

    previous = uninitialized U


pending "work with class" do
[1, 1, 2, 3, 3].each_chunk(&.class).to_a.should eq [{Int32, [1, 1, 2, 3, 3]}]
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil is fixed, but now this not compiled because try to create uninitialized Int32.class

@asterite
Copy link
Member

@kostya don't worry about the nil case, if it can be done because of a compiler bug we can wait until that bug is fixed

@kostya
Copy link
Contributor Author

kostya commented Jun 30, 2016

funny that this code give error:

./bin/crystal eval 'p [1,2,3].chunk { Enumerable::Chunk::Drop }.to_a'
instantiating 'Enumerable::Chunk::Accumulator(Int32, Enumerable::Chunk::Drop:Class)#initialize()'

in ./src/enumerable.cr:106: can't use Object as a Proc argument type yet, use a more specific type

        @key = uninitialized U
        ^~~~

but this not:

./bin/crystal eval 'p [1,2,3].chunk { rand >= 0 ? Enumerable::Chunk::Drop : 1 }.to_a'
[]

@kostya
Copy link
Contributor Author

kostya commented Aug 1, 2016

This is done. Code is much readable now.

@asterite
Copy link
Member

@kostya Thank you for this! And sorry for the long delay. I'll merge this and add a few more comments and do some small changes.

@asterite asterite merged commit ac83b6c into crystal-lang:master Oct 25, 2016
@asterite asterite added this to the 0.20.0 milestone Oct 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants