Implemented #sorted_insert for Array using binary search. #5153

Groogy · 2017-10-20T15:38:03Z

I noticed there's no nice sorted_insert method for Array so I decided to make one. It's fairly simple and it uses the bsearch_index method to find where to put the provided object.

This is my first pull request so let me know if I missed anything or if you want me to add something more to the feature.

src/array.cr

Groogy · 2017-10-20T16:12:30Z

Fixed the things @RX14 wanted fixed + another spec on an empty array just in case

asterite · 2017-10-20T16:55:04Z

Thank you, but what's the use case of this method?

RX14 · 2017-10-20T17:26:20Z

@asterite you have a pre-sorted array and want to insert something into the array in a manner to keep it sorted but it's much faster than just iterating and inserting.

Personally i've needed it to implement a scheduler, where I had a queue of about 150,000 work items which needed to be kept ordered by computed next run timestamp, but insert then sort was way way too slow.

Groogy · 2017-10-20T17:32:29Z

What @RX14 said.
I have a cache which will grow to quite a large size and to not keep it sorted and search through the array is not really an option. What the method implements can be done manually but this just looks better and in my opinion makes the resulting code a bit more "crystal-esque" if that makes sense?

arr.sorted_insert(5)

looks better than

index = arr.bsearch_index(5)`
if index
  arr.insert(index, 5)
else
  arr.push(5)
end

It's convenient and something you need when you start working with really large data sets and you need to find specific elements inside it.

asterite · 2017-10-20T17:47:36Z

Mmm... what if I to insert an item in a sorted fashion, but not insert it if it's there already?

I'm not convinced that this method is needed. You can just write those 3 lines, it's probably not something that you'll have to write all the time. Plus with that method in place one might be tempted to insert multiple values at once to have a sorted array, when it's faster to insert multiple items and then do a single sort.

Additionally, Ruby doesn't have this method. And I don't know if other languages provide a method such as this.

asterite · 2017-10-20T17:51:31Z

And finally, if you need to keep an array sorted, it's probably better to use another data structure like a heap.

Groogy · 2017-10-20T17:57:01Z

Our library at work provides it since it is a very common thing for us that we need. The operation is of logarithmic complexity on insertion which isn't that bad. If you don't know about a big set of data forehand to fill the array with, then you aren't losing that much. Primary use case is when collection grows over time compared to when you have a pre-defined set of data to process.

On using a Heap data structure, the most efficient implementation of a heap is done using adaptation of this on top of an array.

e: On complexity and speed, quicksort worst-case is O(n2) and average O(nlog(n)) while binary search insertion should be O(nlog(n)) in both if you also count in moving elements. ~~So on large enough random data where the quicksort might get unlucky the insertion should technically be faster?~~(tested, result: nope, it's the reverse. If working with a chunk of data at once, this will excel on lower numbers) If you don't count the allocation overhead of course.

RX14 · 2017-10-20T18:06:02Z

@asterite there are plenty of methods in Array which are "just a few lines". The point is that because this is named sorted_insert you know at a glance what the operation is doing, instead of having to work it out from the implementation. I really don't think that this usecase is that rare.

konovod · 2017-10-20T19:01:48Z

I would prefer to have an always ordered data structure, with insert = sorted_insert, find = bsearch_index, so I can't accidentially break the contract by using << on it. On the other hand, such data structure isn't hard to write and stdlib can live without it.

funny-falcon · 2017-10-21T09:13:42Z

@RX14

Personally i've needed it to implement a scheduler, where I had a queue of about 150,000 work items which needed to be kept ordered by computed next run timestamp,

With 150,000 items sorted insertion could be also slow, if timestamp is a bit random (ie must be inserted close to beginning). And without optimised shift it will be a way slow to remove 'next to run'.
Such queue unavoidably must be implemented with heap. What about adding "#heapify" and "#heappop" methods? Python has great heapq module

For binary_insert to be relatively fast up to several thousand elements, there should be more progress in unification of array and Deque (#5148) :

insert_at should detect insertion is done near beginning of array, and then allocate room before @buffer, thus following insertion near beginning could be done by moving small number of elements to left, instead of moving almost whole array to right.

I used such technique once when I didn't what to use binary search tree, and I was sure array will not be larger than several dozen thousands of pointers.

yxhuvud · 2017-10-21T18:13:38Z

Personally i've needed it to implement a scheduler, where I had a queue of about 150,000 work items which needed to be kept ordered by computed next run timestamp, but insert then sort was way way too slow.

The typical way to implement a scheduler is to use a priority queue and while a sorted array can substitute for one, you are very likely to get better performance by using other data structures.

@Groogy You get O(n) cost for insertion this way which is waaay slow comparatively speaking. Yes, you find the spot in O(logn), but moving the entries out of place cost O(n). Sure, it is a somewhat fast O(n) due to memmove being fast, but it is still bad.

@RX14: See for example https://github.com/yxhuvud/pairing_heap/blob/master/src/priority_queue.cr for one that seems to have somewhat decent performance - you get O(1) for insertion and O(logn) for deletion of the topmost entry. Which is about as good as you get.

Can we please not add convenience methods that give us bad performance?

jhass · 2019-10-30T08:26:17Z

It seems nobody feels strong enough to about this to push it further. The usecases this addresses seem good for a shard, providing usecase optimized data structures and/or usecase optimized extensions to stdlib data structures. Then when that shard proliferates we can revisit which parts of it might make sense for stdlib.

RX14 requested changes Oct 20, 2017

View reviewed changes

src/array.cr Outdated Show resolved Hide resolved

src/array.cr Outdated Show resolved Hide resolved

src/array.cr Outdated Show resolved Hide resolved

Implemented #sorted_insert for Array using binary search.

7b25de1

funny-falcon mentioned this pull request Oct 28, 2017

Make Array to be workable as a queue, ie respect push+shift pattern. #5148

Closed

jhass closed this Oct 30, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented #sorted_insert for Array using binary search. #5153

Implemented #sorted_insert for Array using binary search. #5153

Groogy commented Oct 20, 2017

Groogy commented Oct 20, 2017 •

edited

Loading

asterite commented Oct 20, 2017

RX14 commented Oct 20, 2017 •

edited

Loading

Groogy commented Oct 20, 2017 •

edited

Loading

asterite commented Oct 20, 2017

asterite commented Oct 20, 2017

Groogy commented Oct 20, 2017 •

edited

Loading

RX14 commented Oct 20, 2017

konovod commented Oct 20, 2017

funny-falcon commented Oct 21, 2017 •

edited

Loading

yxhuvud commented Oct 21, 2017 •

edited

Loading

jhass commented Oct 30, 2019

Implemented #sorted_insert for Array using binary search. #5153

Implemented #sorted_insert for Array using binary search. #5153

Conversation

Groogy commented Oct 20, 2017

Groogy commented Oct 20, 2017 • edited Loading

asterite commented Oct 20, 2017

RX14 commented Oct 20, 2017 • edited Loading

Groogy commented Oct 20, 2017 • edited Loading

asterite commented Oct 20, 2017

asterite commented Oct 20, 2017

Groogy commented Oct 20, 2017 • edited Loading

RX14 commented Oct 20, 2017

konovod commented Oct 20, 2017

funny-falcon commented Oct 21, 2017 • edited Loading

yxhuvud commented Oct 21, 2017 • edited Loading

jhass commented Oct 30, 2019

Groogy commented Oct 20, 2017 •

edited

Loading

RX14 commented Oct 20, 2017 •

edited

Loading

Groogy commented Oct 20, 2017 •

edited

Loading

Groogy commented Oct 20, 2017 •

edited

Loading

funny-falcon commented Oct 21, 2017 •

edited

Loading

yxhuvud commented Oct 21, 2017 •

edited

Loading