-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented #sorted_insert for Array using binary search. #5153
Conversation
Fixed the things @RX14 wanted fixed + another spec on an empty array just in case |
Thank you, but what's the use case of this method? |
@asterite you have a pre-sorted array and want to insert something into the array in a manner to keep it sorted but it's much faster than just iterating and inserting. Personally i've needed it to implement a scheduler, where I had a queue of about 150,000 work items which needed to be kept ordered by computed next run timestamp, but insert then sort was way way too slow. |
What @RX14 said. arr.sorted_insert(5) looks better than index = arr.bsearch_index(5)`
if index
arr.insert(index, 5)
else
arr.push(5)
end It's convenient and something you need when you start working with really large data sets and you need to find specific elements inside it. |
Mmm... what if I to insert an item in a sorted fashion, but not insert it if it's there already? I'm not convinced that this method is needed. You can just write those 3 lines, it's probably not something that you'll have to write all the time. Plus with that method in place one might be tempted to insert multiple values at once to have a sorted array, when it's faster to insert multiple items and then do a single sort. Additionally, Ruby doesn't have this method. And I don't know if other languages provide a method such as this. |
And finally, if you need to keep an array sorted, it's probably better to use another data structure like a heap. |
Our library at work provides it since it is a very common thing for us that we need. The operation is of logarithmic complexity on insertion which isn't that bad. If you don't know about a big set of data forehand to fill the array with, then you aren't losing that much. Primary use case is when collection grows over time compared to when you have a pre-defined set of data to process. On using a Heap data structure, the most efficient implementation of a heap is done using adaptation of this on top of an array. e: On complexity and speed, quicksort worst-case is O(n2) and average O(nlog(n)) while binary search insertion should be O(nlog(n)) in both if you also count in moving elements. |
@asterite there are plenty of methods in Array which are "just a few lines". The point is that because this is named |
I would prefer to have an always ordered data structure, with |
With 150,000 items sorted insertion could be also slow, if timestamp is a bit random (ie must be inserted close to beginning). And without optimised For
I used such technique once when I didn't what to use binary search tree, and I was sure array will not be larger than several dozen thousands of pointers. |
The typical way to implement a scheduler is to use a priority queue and while a sorted array can substitute for one, you are very likely to get better performance by using other data structures. @Groogy You get O(n) cost for insertion this way which is waaay slow comparatively speaking. Yes, you find the spot in O(logn), but moving the entries out of place cost O(n). Sure, it is a somewhat fast O(n) due to memmove being fast, but it is still bad. @RX14: See for example https://github.com/yxhuvud/pairing_heap/blob/master/src/priority_queue.cr for one that seems to have somewhat decent performance - you get O(1) for insertion and O(logn) for deletion of the topmost entry. Which is about as good as you get. Can we please not add convenience methods that give us bad performance? |
It seems nobody feels strong enough to about this to push it further. The usecases this addresses seem good for a shard, providing usecase optimized data structures and/or usecase optimized extensions to stdlib data structures. Then when that shard proliferates we can revisit which parts of it might make sense for stdlib. |
I noticed there's no nice sorted_insert method for Array so I decided to make one. It's fairly simple and it uses the bsearch_index method to find where to put the provided object.
This is my first pull request so let me know if I missed anything or if you want me to add something more to the feature.