Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dig method to Enumerable #3536

Closed
wants to merge 1 commit into from

Conversation

samueleaton
Copy link
Contributor

@samueleaton samueleaton commented Nov 12, 2016

Add the dig method as seen in Ruby's Hash#dig and Array#dig

Also adds a dig? method that will not raise an error but will instead return nil if the key path doesn't exist.

@sdogruyol
Copy link
Member

Awesome 🎉

Copy link
Contributor

@yxhuvud yxhuvud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One big difference to the Ruby implementations is that it also accept Arrays, ie

{:x => [1,2,3]}.dig(:x, 2) 

would return 3.

Conversely, #dig is also implemented on Arrays (and Ostructs and Structs, but I'm not certain how relevant those would be).

@@ -845,6 +876,14 @@ class Hash(K, V)
raise "Hash table too big"
end

private def dig_deep(hash_keys)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does extracting the implementation into a private method instead of putting it in the variant taking the array add anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the has_key? method. It references another private method that does more work.

My reasoning is that the deep_dig method would be the workhorse but the other dig and dig? methods would be the public facing methods, making the code more DRY.

# h.get ["z"] # => nil
# h.get [] of String # => nil
# ```
def dig?(hash_keys : Array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot Arrays also be hash keys? What would you assume

arr = [1]
h1 = {arr => 42}
puts h1.dig(arr)

to output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh crap. ok, I'll have to rework it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there some kind of spec or anything that show what is a valid hash key?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, everything that responds to #hash.

@yxhuvud
Copy link
Contributor

yxhuvud commented Nov 12, 2016

I tried to do an implementation that matches the ruby behaviour, but I run into some strange compile error on HEAD. Works fine in Crystal 0.19.4:

  def dig(key : K, *args)
    value = fetch(key, nil)
    if args.size == 0 || value.nil?
      return value
    end
    if value.responds_to?(:dig)
      value.dig(*args)
    else
      raise ArgumentError.new
    end
  end

(or well, the Ruby variant raises TypeError, but that doesn't exist here)

@samueleaton
Copy link
Contributor Author

samueleaton commented Nov 12, 2016

@yxhuvud If we were to do recursion we wouldn't be able to have a single method that could handle both dig and dig?. They would both need to contain their own logic about whether to raise en exception or not.

I would like to do some benchmarks between recursion and reduce.

@Sija
Copy link
Contributor

Sija commented Nov 12, 2016

@samueleaton you could raise on nil outside of shared method, in dig.

@samueleaton
Copy link
Contributor Author

@Sija But i would lose the meta information about which key caused the exception.

raise KeyError.new "Missing hash key: #{key.inspect}" 

@samueleaton
Copy link
Contributor Author

In order to get this working for hashes AND arrays I'm gonna refactor it to be recursive as @yxhuvud suggested. We can see about DRYing it up and refactoring afterword.

@samueleaton
Copy link
Contributor Author

i think I found a bug that is preventing recursion.

My dig? method:

def dig?(k : K, *key_path)
  val = fetch(k, nil)
  return val if key_path.size == 0
  return nil unless val.responds_to? :dig
  val.dig?(*key_path)
end

using the following hash:

# my hash
h = {"a" => {"b" => {"c" => "d"}}, "x" => "z"}

If I give dig? a path that is equal to the longest path in the hash (root to furthest leaf), it works:

h.dig?("a", "b", "c") # => "d"

But if the path is shorter than the longest path, it fails:

h.dig?("x")

wrong number of arguments for 'Hash(String, Int32)#dig?' (given 0, expected 1+)
Overloads are:
 - Hash(K, V)#dig?(k : K, *key_path)

    val.dig?(*key_path)

@ysbaddaden
Copy link
Contributor

@samueleaton here is a functionalish implementation of dig that looks like working, searching in hashes and arrays mixed together, or any other type that implements dig —maybe Enumerable should implement it?

class Hash(K, V)
  def dig(key : K, *subkeys)
    if (value = self[key]?) && value.responds_to?(:dig)
      value.dig(*subkeys)
    end
  end

  def dig(key : K)
    self[key]?
  end
end

class Array(T)
  def dig(key : Int32, *subkeys)
    if (value = self[key]?) && value.responds_to?(:dig)
      value.dig(*subkeys)
    end
  end

  def dig(key : Int32)
    self[key]?
  end
end

hsh = { "a" => { "b" => { "c" => 1 } } }

p hsh.dig("a")
p hsh.dig("a", "b")
p hsh.dig("a", "b", "c")

ary = [ [ [ 1, 2] ] ]
p ary.dig(0)
p ary.dig(0, 0)
p ary.dig(0, 0, 1)

hsh_ary = { "a" => { "b" => [1, 2, { "c" => [1, 2, 3] }, 4] } }
p hsh_ary.dig("a", "b", 2, "c")

BTW I don't think there is much value in having both a dig? nilable method and a dig raising method. Calling hsh["a"]["c"][2] will already raise. The whole value of dig is to search within hashes and arrays and to return a value, in a nice syntax that won't raise.

@samueleaton
Copy link
Contributor Author

@ysbaddaden that's elegant AF. You make it look easy haha.
And yes I agree; Enumerable is the place for it. Pushing your changes.

@samueleaton samueleaton changed the title [WIP] Add dig and dig? methods to hash and array Add dig method to Enumerable Nov 12, 2016
describe "dig" do
it "gets the value at given path from given splat" do
a = ["x", {"a" => ["b"]}]
a.dig(1, "a", 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing .should ... expectation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+5 pts to Sija. 👍

end
end

def dig(key : Int32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this definition redundant due to previously defined dig(key : T)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be resolved now

Copy link
Contributor

@Sija Sija Nov 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, sorry! In the end there might be second definition needed (dig(index : Int)) due to []?(index : Int) defined in Indexable(T) module (included by Array(T)), which does not take T as an argument type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of my specs were still passing after I removed it. It should be fine.

@@ -509,6 +509,20 @@ describe "Array" do
b.should eq(3)
end

describe "dig" do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't all the specs be in spec/std/enumerable_spec.cr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be resolved now

@samueleaton
Copy link
Contributor Author

samueleaton commented Nov 13, 2016

@ysbaddaden Squashed it. Let me know if there anything else to do. 😊

Did you want to add this to the 0.20.0 milestone?

# h.dig "a", "b", 1 # => 20
# h.dig "a", "b", "c", "d", "e" # => nil
# ```
def dig(key : K, *subkeys)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon if I just don't get the current state of things, but where is K defined in Enumerable? I thought that type placeholder names which weren't defined in context needed to be declared with forall clauses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it should be like this, right?

def dig(key : K, *subkeys) forall K
  if (value = self[key]?) && value.responds_to?(:dig)
    value.dig(*subkeys)
  end
end

def dig(key : K) forall K
  self[key]?
end

@RX14
Copy link
Contributor

RX14 commented Nov 13, 2016

This method doesn't fit into Enumerable because it uses #[] instead of #each, but not in Indexable either because it works on hashes.

This method seems to belong to a module which assumes #[](k : Key) : Value, which can be applied to both array-likes (Key is an Int32) and hash-likes. Unfortunately creating a module for every single way of thinking about a collection would get confusing as to which abstraction you wanted.

@ysbaddaden
Copy link
Contributor

You're right @RX14, it doesn't belong to Enumerable. A Diggable module could fit better, but we may not want countless modules for each and every feature... Though dig should be implemented for Hash, Array, StaticArray, as well as Deque and Slice... So maybe a module would fit better, after all?

@yxhuvud
Copy link
Contributor

yxhuvud commented Nov 14, 2016

Well, such a module would be Mappable rather than Diggable, surely?

Also, Set would be an option as well, and if we get some Tree implementations, potentially that as well.

But really, that is probably overthinking it as the common use case probably be to dig into JSON.

@BrianHawley
Copy link

If there were a Mappable module, with the key and value types as generic parameters, like Hash, surely it could be used for more than just dig in the future. If this method would merit a different category, others could fit in the same category.

@asterite
Copy link
Member

This kind of methods make a lot of sense in a dynamic language, but not so much in Crystal, unless type safety is preserved. For that, this should probably be a macro of some sort, I don't think it can be done with a single method. That's why I wouldn't add this method in the standard library. I'd also like to see some real use cases for this.

@samueleaton
Copy link
Contributor Author

@asterite The major benefit would be when working with JSON or YAML. Being able to traverse the depth of the object and either return the value or nil would be fantastic. If we just add it to arrays and hashes that would be a huge win.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Nov 16, 2016

@asterite my use case (Hash only): configuring an application through a YAML (or JSON) file, where many optional components may have some configuration buried deep in hashes, or not (use defaults). Compare:

config.dig("adapters", "http", name, "port")

# vs

config["adapters"].fetch("http", {}).fetch(name, {}).fetch("port", nil)

I don't have a use case for digging inside arrays, thought.

@samueleaton
Copy link
Contributor Author

@ysbaddaden

Given JSON:

{
  "customers": [
    {
      "total_spend": 199
    },
    {
      "total_spend": 25
    }
  ]
}

To get the total_spend for the first customer:

hash.dig "customers", 0, "total_spend" # => 199

Seeing as JSON can have arrays it would seem incomplete to not support Array#dig

@yxhuvud
Copy link
Contributor

yxhuvud commented Nov 16, 2016

As it stands it kinda competes with JSON.mapping. Perhaps the same could be achieved by doing something there?

@bcardiff
Copy link
Member

Although the specs passes checking for equality is half of the story.
Aligned with asterite's #3536 (comment) what happen is that the return value of the dig method will have some uncomfortable type. For example from the proposed enumerable_spec.cr you will have

$ ./bin/crystal spec ./spec/std/enumerable_spec.cr:227
h.dig("a", "b", "c", 1) # => 20
typeof(h.dig("a", "b", "c", 1)) # => (Int32 | Nil)

h.dig("a", "b", "c") # => [10, 20]
typeof(h.dig("a", "b", "c")) # => (Array(Int32) | Nil)

h.dig([1, 2, 3], "a") # => "b"
typeof(h.dig([1, 2, 3], "a")) # => (Hash(String, Array(Int32)) | String | Nil)

Since checking for equality works smoothly with unions the returning type can go unnoticed.
Having this unions will force the user to some .as(T) methods after the dig probably in order to apply any but object methods.

For the time being it will be better to leave this patterns in a shard and if end up been used a lot we can add it later.

@ysbaddaden
Copy link
Contributor

Is the union really that problematic? I like dig a lot, it's a simple, nice looking, way to extract optional values from hashes; it already returns a nilable, by design, that must be dealt with; it doesn't require to create types to map values to, as YAML.mapping would require.

I won't create a Shard for such a small method (let's not go the npm way) but will most likely monkey-patch Hash#dig into different apps.

@sdogruyol
Copy link
Member

I agree with @ysbaddaden 👍

@Sija
Copy link
Contributor

Sija commented Apr 15, 2018

I vote to reopen this PR.

@RX14
Copy link
Contributor

RX14 commented Jul 10, 2018

If we reopen this:

  • the method must be on Indexable and Hash, not Enumerable.
  • dig should raise on a missing key, dig? should return nil.

The main usecase would be for traversing recursive aliases - in which case typeof(foo.dig(...)) would always be typeof(foo) which makes complete sense.

@ysbaddaden
Copy link
Contributor

I don't understand why there should be both dig and dig?. I repeat myself:

Calling hsh["a"]["c"][2] will already raise. The whole value of dig is to search within hashes (...) and to return a value (or not).

I don't understand why this should be restricted to recursive types. Again, the whole point is to extract a value at any level in the hash. Of course you get an union and must deal with it.

@ysbaddaden ysbaddaden reopened this Jul 10, 2018
@ysbaddaden ysbaddaden closed this Jul 10, 2018
@ysbaddaden
Copy link
Contributor

Sorry, I pushed wrong button.

@RX14
Copy link
Contributor

RX14 commented Jul 10, 2018

It's not going to be restricted to recursive types - I don't even know how that would be possible, I was just talking about the main usecase probably being JSON::Type if it comes back.

And I don't understand why dig isn't what you want - you can't just replace it with #[] calls because of the union types and casts. dig is a way to neatly avoid the casts when dealing with recursive aliases, and dig? is a way to extract optional values from hashes or whatever.

@ysbaddaden
Copy link
Contributor

  1. OK. I misunderstood 🙇‍♂️
  2. I guess I don't fathom the use case. I see x.dig(a, b, c) as a nilable alternative for x[a][b][c] without intermediary nil checks. Especially if we keep Any.

@RX14
Copy link
Contributor

RX14 commented Jul 10, 2018

@ysbaddaden yes that makes sense if we keep Any and the prime purpose of dig becomes to avoid nil checks. However if we remove Any then the prime purpose becomes to avoid .as casts not nil checks and having two variants becomes neccesary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants