Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement auto-tuple unpacking in block arguments #2778

Closed
wants to merge 1 commit into from

Conversation

asterite
Copy link
Member

@asterite asterite commented Jun 8, 2016

This PR is simple in what it does, but it deserves a very long explanation as to why suddenly I'm suggesting this. I'm not 100% convinced that we should go in this direction, although I now think it's harmless and it actually makes the language more powerful and easy to deal with (less noise).

First, what this PR does is that:

  1. If there's a single yield expression and it's a tuple type, and there is more than one block argument, the tuple is automatically unpacked
  2. If there is more than one yield expression, and the first one is a tuple type, and there is more arguments than the tuple size, the tuple is automatically unpacked

For example:

ary = [{1, 2}, {3, 4}]
ary.each do |tuple|
  # here `tuple` is a tuple, because there is one yield argument
  # and just one block argument
end

ary.each do |x, y|
  # `x` and `y` are unpacked, because there is one yield argument that is a tuple,
  # and there is more than one block argument
end

def foo
  yield({1, 2}, 3)
end

foo do |x|
  x # => {1, 2}
end

foo do |x, y|
  x # => {1, 2}
  y # => 3
end

# This is the second rule in action
foo do |x, y, z|
  x # => 1
  y # => 2
  z # => 3
end

If you are familiar with Ruby, you'll know that Ruby does rule 1 for arrays. Rule 2 is new in Crystal, and I think it makes more sense.

Note that this only works for Tuples, because their size if known at compile time. You can still use explicit unpacking for arrays if you want to (for tuples too, of course).

Now comes the long explanation as to why I'd like to include this in the language, and then why I think it's not "dangerous".

Since a few days, together with @jhass , we've been enhancing the language with a few missing things: splats in yields, splat in block arguments, and splats in generic types. One of the goals was being able to make Hash be an Enumerable. The idea was to make Enumerable be Enumerable(*T), so basically enumerable of more than one type (a tuple type), and Hash would include Enumerable(K, V).

Then we could implement each_with_index like this:

module Enumerable(*T)
  def each_with_index
    i = 0
    each do |*elem|
      yield *elem, i
      i += 1
    end
  end
end

That would work for Hash, because Hash does yield key, value, which will be bound to elem as a tuple, and then splatted back to the block.

The problem comes with select and others similar. This is the current implementation:

module Enumerable(*T)
  def select(&block : T ->)
    ary = [] of T
    each { |e| ary << e if yield e }
    ary
  end
end

This now is incorrect for Array, because we don't want this:

a = [1, 2, 3]
a.select { |x| x >= 2 } # returns an Array({Int32}), because T is {Int32}

However for Hash that works fine, because T is a tuple {K, V}.

The ugly solution is to check, in every Enumerable method, if T has just one element or many elements. This isn't only ugly, but also bad: if someone wants to add a new method to Enumerable she now has to do this check too.

So... how does it work in Ruby?

It turns out that Hash actually does yield [k, v]. That is, it doesn't do yield k, v, but yields an array instead. Here's the proof, it's C code. You can see that it checks the block arity: if it's more than 1 it avoids creating this small array and directly binds the values to the block arguments (this is an optimization).

So, because Hash yields an array, and arrays are automatically unpacked if they are the only argument yielded to a block, and the block has more than one argument, one can do:

hash = {1 => 'a'}
hash.each do |key, value| # here unpkacing happens
  key # => 1
  value # => 'a'
end

# One can also do:
hash.each do |kv|
  kv # => [1, 'a']
end

With that, when you do hash.group_by or any other Enumerable method, you can do hash.group_by { |k, v| } and auto-unpacking happens. It's very convenient.

However, when you do each_with_index, because more than one argument is yielded (the array and the index), auto-unpacking doesn't happen in Ruby:

hash = {1 => 'a'}

# Probably not what you'd expect
hash.each_with_index do |key, value, index|
  key # => [1, 'a']
  value # => 0
  index # => nil
end

But one can (has to) do:

hash = {1 => 'a'}

# Probably not what you'd expect
hash.each_with_index do |(key, value), index|
  key # => 1
  value # => 'a'
  index # => 0
end

That's why we couldn't translate this idiom well to Crystal, because our Hash does yield k, v, not yield({k, v}). And we don't have automatic tuple unpacking.

With automatic tuple unpacking the Enumerable problem is solved: we can do yield({k, v}) in Hash, and make Hash be Enumerable({K, V}) (which is what it really is, it's an enumerable of key-value pairs). I actually tried this and it worked flawlessly (all specs passed, and I even tried compiling other projects). We can use group_by, first, to_a and all other Enumerable methods, and because we have automatic unpacking we can write |key, value| instead of the more noisy and ugly |(key, value)|. We can then make NamedTuple an Enumerable too, very easily. This avoids a lot of boilerplate and redundant definitions.

There's still one issue: we'd need to use explicit unpacking in each_with_index. That's were rule 2 is nice: we don't need to use explicit unpacking. We can simply do:

hash = {1 => 'a'}

# each_with_index does `yield({key, value}, index)`, and because
# the first argument is a tuple type and we have more than one yield
# expression and more block arguments that the size of this tuple,
# auto-unpacking happens
hash.each_with_index do |key, value, index|
  key # => 1
  value # => 'a'
  index # => 0
end

I think this makes sense, as it's very common to want to iterate an object with more info, and the first object might be a tuple, so it makes sense to auto-unpack it even if there is more than one yield argument.

We could late consider unpacking not only the first argument, but other arguments, if they are tuples and there are enough block arguments. However, I'm not sure there is a strong use case to motivate this, and one can always use explicit unpacking for this.

For this to be "perfect", we should change reduce to yield elem, memo and not memo, elem. I think this is actually good, as it basically makes it more consistent: all enumerable methods yield the element first, and then extra arguments.

Now, automatic tuple unpacking might be considered "dangerous" (well, I actually don't know if this is true, I'm just guessing :-P). But I don't think it's dangerous: if I provide more block arguments, and I know that the first yielded value is a tuple, it's very likely that I want to unpack the tuple. Otherwise, why would I provide more arguments than the number of yielded expressions?

Note that one can still use explicit unpacking if one wants to. But I now think automatic unpacking is really, really powerful and convenient, and can probably have many use cases, and reduce a lot of noise. For example:

a = [1, 2, 3]
b = [4, 5, 6]
a.each.zip(b.each).cycle.first(10).each do |x, y| # no need to write |(x, y)| anymore
  puts "#{x} => #{y}"
end

Finally, this just comes now, and was pretty easy to implement, because implementing splatting in yields and block arguments prepared the ground for this (that was the hard part).

Thoughts?

@kostya
Copy link
Contributor

kostya commented Jun 8, 2016

i think this is too magic:

def foo
  yield({1, 2}, 3)
end

# This is the second rule in action
foo do |x, y, z|
  x # => 1
  y # => 2
  z # => 3
end

what if decompose by tuple literal {} also, not by () which is means in ruby exactly tuple

foo do |x, y, z| # error
foo do |{x, y}, z| # x => 1 y => 2, z => 3
foo do |x, z| # x => {1, 2}, y => 3
foo do |x| # x => {1, 2}

some kind of pattern matching, tuple to tuple

but this is ok:

def foo
  yield(*{1, 2}, 3)
end

foo do |x, y, z| # x => 1 y => 2, z => 3

@asterite
Copy link
Member Author

asterite commented Jun 8, 2016

@kostya But then you can't do each_with_index do |key, value, index|, you are forced to do each_with_index do |(key, value), index|. In fact, you can do it in many ways, and they are all correct and work as expected:

hash = {1 => 'a'}

# I know that Hash is an Enumerable({K, V}), and each_with_index yields
# key, value, index, so this is OK and works
hash.each_with_index do |key, value, index|
  p key # => 1
  p value # => 'a'
  p index # => 0
end

# I know it's a tuple and an index, also works
hash.each_with_index do |kv, index|
  p kv # => {1, 'a'}
  p index # => 0
end

# I can use (key, value) if I want to, basically the case above with explicit unpacking
hash.each_with_index do |(key, value), index|
  p key # => 1
  p value # => 'a'
  p index # => 0
end

I mean, I can't imagine why I would want each_with_index do |key, value, index| to make key be a tuple, value be the index, and index always nil. There's no use case for that. The more useful thing is to bind things as this PR proposes it.

Another example:

hash = {1 => 'a'}

# This doesn't surprise you, that it works in Ruby?
hash.each do |key, value|
end

# Doesn't this surprise you that it doesn't do what you'd expect it to do, in Ruby?
# Why can I use `key, value` above, but not here?
hash.each_with_object(10) do |key, value, obj|
  # huh? obj is nil? why?
end

# Why am I forced to write the last case like this?
hash.each_with_object(10) do |(key, value), obj|
end

@asterite
Copy link
Member Author

asterite commented Jun 8, 2016

@kostya I mean, using your example and the many variants:

def foo
  yield({1, 2}, 3)
end

# OK, matches your expectation
foo do |(x, y), z|
  p x # => 1
  p y # => 2
  p z # => 3
end

# OK, matches your expectation
foo do |x, z|
  p x # => {1, 2}
  p z # => 3
end

# OK, again matches your expectation
foo do |x|
  p x # => {1, 2}
end

# Now, would you like this to behave like this?
foo do |x, y, z|
  p x # => 1
  p y # => 2
  p z # => 3
end

# Or like this?
foo do |x, y, z|
  p x # => {1, 2}
  p y # => 3
  p z # => nil
end

If we remove rule 2, the last case will work like that, yielding nil for z, always. In which case would you like that to happen?

@asterite
Copy link
Member Author

asterite commented Jun 8, 2016

Put another way, if you only know rule 1 (from Ruby) and you ignore rule 2, using (key, value) like in Ruby, it will still work. Now, you don't have to do that if you don't want to.

I don't know why Ruby doesn't do Rule 2. I tried to search for some Ruby specification but could find one. Well, I actually found one but it's very confusing.

@asterite
Copy link
Member Author

asterite commented Jun 8, 2016

An alternative to this proposal, which we've been discussing with @jhass on IRC is to let Hash do yield({key, value}), and one would always have to use explicit unpacking to iterate a hash:

hash = {1 => 'a'}
hash.each do |(key, value)|
  key # => 1
  value # => 'a'
end 

hash.each do |key, value|
  key # => {1, 'a'}
  value # => nil
end

With that, making Hash include Enumerable({K, V}) is trivial and works.

The only downside is that explicit unpacking has to always be used. But it's explicit, never implicit.

@asterite
Copy link
Member Author

asterite commented Jun 8, 2016

Yet another alternative is to just do rule 1. No need for extra parentheses in hash.each, but needed in each_with_index, reduce and a few others.

@bcardiff
Copy link
Member

bcardiff commented Jun 8, 2016

I would like to see more use cases for rule 2 before adding it.
I'm ok with adding rule 1.
As from the samples, since the user will be forced to do explicit unpack, and all the scenarios highlighted where the rule 2 would act are nonsense, adding rule 2 should break no code.

@asterite
Copy link
Member Author

asterite commented Jun 8, 2016

I'll try sending a PR with just rule 1, but additionally I'd like the compiler to give an error on extra block arguments. For example:

a = [1, 2, 3]
a.each { |x| } # OK
a.each { |x, y| } # Error, too many block arguments (given 2, expected max 1)

h = {1 => 2, 3 => 4}
h.each { |kv| } # OK
h.each { |(key, value)| } # OK
h.each { |key, value| } # OK
h.each { |key, value, extra| } # Error, too many block arguments (given 2, expected max 2)

That way you can't accidentally bind block arguments in a wrong way that would result in a nil silently appearing. Well, with nil I guess there will eventually be a compile error, maybe, but it will be harder to digest.

What do you think?

Passing less block arguments is of course OK, for example useful for 5.times { ... }.

@david50407
Copy link
Contributor

I think both rule 1 and 2 are seems to be good and very convenient when passing tuple to yield a block.

IMO, we'll have more situations for needing unpacking tuple than not unpacking, but if Crystal is going to apply both rules, then need another way or syntax to escape auto-unpacking. (Or it's already exist?)

Verified

This commit was signed with the committer’s verified signature. The key has expired.
@asterite asterite force-pushed the feature/block_auto_unpack branch from d78467e to f2c36f6 Compare June 9, 2016 22:54
@asterite
Copy link
Member Author

Closed in favor of #2783

@asterite asterite closed this Jun 10, 2016
@asterite asterite deleted the feature/block_auto_unpack branch June 10, 2016 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants