Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Hash#transform_keys and Hash#transform_values #4385

Merged
merged 3 commits into from Aug 17, 2018

Conversation

deepj
Copy link
Contributor

@deepj deepj commented May 6, 2017

This PR adds similar functionality in Ruby 2.4 which isHash#transform_keys and Hash#transform_values.

Hash#transform_keys returns a new hash with all keys converted using the block operation.

hash = {:a => 1, :b => 2, :c => 3}
hash.transform_keys { |key| key.to_s } # => {"A" => 1, "B" => 2, "C" => 3}

Hash#transform_values returns a new hash with the results of running block once for every value.

hash = {:a => 1, :b => 2, :c => 3}
hash.transform_values { |value| value + 1 } # => {:a => 2, :b => 3, :c => 4}

I haven't added bang equivalents since what I know it's not possible to modify types of keys/values in self.

@deepj deepj force-pushed the transform-keys-values branch 2 times, most recently from 303e966 to fd1e266 Compare May 7, 2017 08:36
@bcardiff
Copy link
Member

bcardiff commented May 8, 2017

@deepj the bang version can be implemented, as long as they don't change the type of the values nor keys.

Would you prefer to implement those in this PR as well?

@deepj
Copy link
Contributor Author

deepj commented May 8, 2017

@bcardiff I meant the bang versions cannot work in the same way (including type casting)

Yes, I can add them including a notice they don't work with type casting.

@deepj
Copy link
Contributor Author

deepj commented May 8, 2017

I tried to implement transform_keys! the following way

class Hash
  def transform_keys!(&block  : K -> K)
    each do |key, value|
      self[yield(key)] = value
      delete(key)
    end
  end
end

h = {'a' => 1, 'b' => 2}
h.transform_keys!(&.upcase)
puts h

But h returns an empty hash and I don't know why. Crystal 0.22.0

@bcardiff
Copy link
Member

bcardiff commented May 8, 2017

@deepj it's because the hash in changed inside the iterator.

For example if you swap delete and self[..] = you will end in an infinite loop.

I think we will need to duplicate the keys. Unless we go down with something that deals with the hash internals for reallocating the entries. So something as follow could work:

class Hash
  def transform_keys!(&block  : K -> K)
    each.to_a.each do |(key, value)|
      delete(key)
      self[yield(key)] = value
    end
  end
end

h = {'a' => 1, 'b' => 2}
h.transform_keys!(&.upcase)
puts h

Note that deleting the key should happen before inserting. Just in case the key transformation did nothing for that key.

But, now i am thinking of the corner cases:

h = {'a' => 1, 'b' => 2}
h.transform_keys!(&.succ)
puts h

The above change leads of {'c' => 2}

So we would need something like

class Hash
  def transform_keys!(&block  : K -> K)
    pairs = each.to_a
    clear
    pairs.each do |(key, value)|
      self[yield(key)] = value
    end
  end
end

which is essentially h = h.transform_keys(&.succ).

So, unless we really do it in-place there is not real value for bang of keys. Let's leave that out.

But the transform_values! could be there. Doing something efficient for this overload seems straight forward. Check Hash#each.

@RX14
Copy link
Contributor

RX14 commented May 8, 2017

This looks like map for keys/values on hash. Why not call it map_keys/map_values?

@Sija
Copy link
Contributor

Sija commented May 8, 2017

@RX14 One reason would be familiarity, since there's already Hash#transform_keys in Rails.

@bew
Copy link
Contributor

bew commented May 8, 2017

I'm all for map_keys! & map_values! they're the perfect names IMO.

The implementation needs to use internals I think: https://carc.in/#/r/1zwm

class Hash
  def map_keys!
    current = @first
    while current
      current.key = yield current.key
      current = current.fore
    end
  end

  private class Entry
    setter key : K # key has a getter only, adding a setter here
  end
end

h = {'a' => 1, 'b' => 2}
h.map_keys!(&.succ)
puts h # => {'b' => 1, 'c' => 2}

This impl has no copy (well maybe when copying from the block result to the Entry, but I'm not sure..), so it should be the fastest and the safest (no allocations or anything that could fail)

@konovod
Copy link
Contributor

konovod commented May 9, 2017

What if two keys would be mapped to the same value? It should raise, not create inconsistent UB'ish state imho. https://carc.in/#/r/200a

@deepj deepj force-pushed the transform-keys-values branch 2 times, most recently from b2e8dc4 to fba15fb Compare May 9, 2017 12:46
src/hash.cr Outdated
#
# ```
# hash = {"a" => 1, "b" => 2, "c" => 3}
# hash.transform_key! { |key| key.succ }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be transform_keys! (you missed the s)

src/hash.cr Outdated
# hash = {"a" => 1, "b" => 2, "c" => 3}
# hash.transform_key! { |key| key.succ }
# hash # => {"b" => 1, "c" => 2, "d" => 3}
def transform_keys!(&block : K -> K)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is breaking the Hash invariant. A call to rehash is needed at the end or the final state won't be useful for user.

I don't fully like that the Entry#key is changed from getter to property since that was preventing the invariant to be broken.

I would say to either base the implementation on rehash (zeroing the buckets, creating new Entry, iterate the old entries, storing new entries in the right place) or remove this implementation.

If any implementation is added, I would say to assert that actual.should eq(expected) and expected.should eq(actual) so we can detect if the code messes up with the invariant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From @konovod comment I would say that insert_in_bucket should be used also so we avoid having more than one entry with the same key.

If the user provides a function that collide two keys into one there is no guarantee which value will be preserved. It's that or raising.

@bew
Copy link
Contributor

bew commented Feb 23, 2018

I thought this was merged, @deepj do you want to continue working on it?

If we can't find a consensus for mutating methods, at least it would be great to have this for non-mutating ones.

@deepj
Copy link
Contributor Author

deepj commented Feb 23, 2018

@bew Hi, I totally forgot on this PR. But my conclusion is, the best things here is keeping only #transform_keys and #transform_values.

What do you think @bcardiff @RX14

@bew
Copy link
Contributor

bew commented Apr 26, 2018

I think you can also keep transform_values! too, that can be useful at times.

@deepj
Copy link
Contributor Author

deepj commented Aug 2, 2018

@bew So, I left #transform_keys, #transform_values, #transform_values!

@bcardiff Can be this reviewed again, please? Is something else what I can do here and push things forward?

@bew
Copy link
Contributor

bew commented Aug 15, 2018

LGTM, can this have some reviews?

@RX14 RX14 requested a review from bcardiff August 16, 2018 22:46
Copy link
Member

@bcardiff bcardiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RX14 RX14 added this to the 0.26.1 milestone Aug 17, 2018
@RX14 RX14 merged commit 011aa64 into crystal-lang:master Aug 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants