Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Hash#first(n : Int) return subhash #2280

Closed
wants to merge 1 commit into from

Conversation

greyblake
Copy link
Contributor

Implement Hash#first(n : Int) method to return subhash.

E.g.

{"a" => 1, "b" => 2, "c" => 3}.first(2) 
# => {"a" => 1, "b" => 2}

@jhass
Copy link
Member

jhass commented Mar 6, 2016

I'd be curious about the usecase for that, I still consider depending on the order of a map that much a code smell.

@greyblake
Copy link
Contributor Author

My use case is the follwing. I have a hash where key is a name and value is a rate of entity.
After sorting the hash (there is not such method for hashes, I had to made it up), I want to get top N items

@jhass
Copy link
Member

jhass commented Mar 6, 2016

Why is it a hash, compared to say an array of tuples or structs?

@greyblake
Copy link
Contributor Author

Well, it can be of course array of tuples, but then again, I will need to convert it back to Hash, because Hash is much easier to work with. I need to get rate(value) by name(key)

@greyblake
Copy link
Contributor Author

@jhass Actually, does Crystal hash guarantee an order of elements? I've realized, that I am not sure about this. If no, this PR does not make sense and you can close it.

@jhass
Copy link
Member

jhass commented Mar 6, 2016

It does, it maintain a single linked list through the hash items just like Ruby's does.

@ysbaddaden
Copy link
Contributor

I'm not sure there is any guarantee for hash elements to be ordered, I would say this is an implementation detail and shouldn't be relied upon.

@kumpelblase2
Copy link
Contributor

But given there's a first instance method, this is already relied on, isn't it?

@asterite
Copy link
Member

asterite commented Mar 7, 2016

Hash keeps order of insertion, it's not said in the API docs but it should (I didn't have time to document it yet). That said, we can always reconsider this "feature": it takes more memory and logic to maintain the order, but maybe it's not used that much, I don't know.

I'm not sure about a #first(n) method. @greyblake Could we see the code that would use this method?

@greyblake
Copy link
Contributor Author

@asterite

I am working on a natural language detection tool, based on N-grams:

  def self.guess_all(sample_text, top = 5)
    sample_trigrams = LanguageProfileBuilder.build(sample_text)
    dists = Array(Tuple(String, Float64)).new(@@profiles.size)
    @@profiles.each do |code, lang_trigrams|
      dist = Comparer.distance(sample_trigrams, lang_trigrams)
      dists << {code, dist}
    end
    sorted = dists.sort_by { |kv| kv[1] }
    sorted.first(top)
  end

So, in this example I use dists which is array of Tuples, where 1st element is a language code (e.g. eng), the second is a distance that represents how close the sample_text to every particular language.

sorted.first(5) - is more like a pseudo code (currently there is not support for Array#first, see #2281.

Finally, I want the method to return Hash of first n-top languages with their distances.

P.S. It's also would be good to have a proper Hash#sort / sort_by method, but it's gonna be a separate thing.

@asterite
Copy link
Member

asterite commented Mar 8, 2016

Can't you do sorted.take(top).to_h?

@greyblake
Copy link
Contributor Author

Well, of course I can, I still I would prefer to have this method on Hash.

Maybe, if you don't really like the idea, we can close the PR, until somebody else raises similar quation. I am OK with it.

@asterite
Copy link
Member

asterite commented Mar 8, 2016

My main concern is that I don't know what would be more intuitive: should first return a Hash, or an Array of tuples? The use case is also contrived, it would mean "give me a hash of the first elements that were inserted into this hash". However, in your case you need the first elements because you sorted them first. And it's cheaper to sort an array of tuples than a hash. And in your code you are sorting by value, not sure how would that translate to sorting into a hash.

Basically, all of the above doubts :-)

@greyblake
Copy link
Contributor Author

@asterite I guess you're right, it's a topic to discuss. Personally for me it would be more natural if it returns hash... But if we take a look at Ruby it returns Array of arrays:

 {a: 10, b: 20, c: 30}.first(2)
 => [[:a, 10], [:b, 20]] 

So for Crystal it would be Array of tuples (still it is better then Array of arrays) :).

So I you think it will be more efficient to return Array of tuples I am OK with :)

@asterite
Copy link
Member

asterite commented Jun 2, 2016

@greyblake Let's make it return an array of tuples, like in Ruby. Then I'll merge. Thanks!

@greyblake
Copy link
Contributor Author

greyblake commented Jun 2, 2016

OK, thanks, I will, as soon as I solve this problem on my Debian laptop:) #1964

@asterite
Copy link
Member

Now that Hash is Enumerable and Enumerable#first(n) exists and it will return an array of tuple (like in Ruby), there's isn't much more to do here. Well, we could override it to return a Hash, but you can can to first(3).to_h to get the same behaviour.

@asterite asterite closed this Jun 10, 2016
@greyblake
Copy link
Contributor Author

Cool! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants