Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrays.sort #3919

Closed
sri opened this issue May 24, 2016 · 16 comments
Closed

Arrays.sort #3919

sri opened this issue May 24, 2016 · 16 comments

Comments

@sri
Copy link
Contributor

sri commented May 24, 2016

Is there a reason that JRuby's RubyArray uses it own hand-rolled sort (in Qsort.java) instead of using Java's builtin Arrays.sort?

I'm looking at these files:

The reason I'm asking: I encountered a bug in a piece of code. A class was overriding the <=> method, but due to a typo, it was doing this comparision:

def <=>(other); self.a <=> other.b; end

instead of

def <=>(other); self.a <=> other.a; end.

Obviously, that is a bug in the application. I didn't notice the bug because a == b most of the time. However, recently, there were cases where a != b and that's when the bug manifisted. And JRuby's sort got stuck in an infinite loop.

When I converted JRuby to use Arrays.sort instead of Qsort and tried it with the bad data, it threw an exception: "Comparison method violates its general contract!".

Also, there are other benefits of using Arrays.sort:
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/util/TimSort.java#L29

Is there interest in converting RubyArray to use Arrays.sort?

Thanks!
-Sriram

@sri
Copy link
Contributor Author

sri commented May 25, 2016

Ah, I see according to this

Improved sort for JRUBY-2198: Array#sort is slower than MRI
4c38d86

looks like, Arrays.sort was pulled out because it was slow. But that was in 2008. Looks like Arrays.sort was rewritten in 2009 to use TimSort:

openjdk-mirror/jdk7u-jdk@cdf72de

Is there an easy way to benchmark the current Qsort.java with the current Arrays.sort?

@headius
Copy link
Member

headius commented May 25, 2016

@sri Yes, we introduced our own sort both for performance and to match MRI behaviorally.

It would definitely be worth examining whether the rewritten JDK-provided Arrays.sort is now as fast or faster, since we always like deleting code. Are you interested in working on a test patch and benchmarking it?

@sri
Copy link
Contributor Author

sri commented May 25, 2016

@headius Yes, I am totally interested. The deleting part seems easy enough: delete Qsort.java and replace calls to it with Arrays.sort. I can get started on that.

Is there a benchmarking data that I can use to benchmark the sorts? Or am I coming up with my own?

@sri
Copy link
Contributor Author

sri commented May 27, 2016

@headius Here is my first attempt at a sort benchmark: https://github.com/sri/jruby-sort-benchmark.

They are pretty comparable. The major difference in the total timing is Java's native sort is that it doesn't do much work when the array is sorted or mostly sorted. I recall that was TimSort's benefits.

Btw, these benchmarks were run inside of a Ubuntu VM on my Macbook laptop.

Here is another run on my machine:

JRuby's QSort
Iterations: 5000000
JRUBY_VERSION: 9.1.3.0-SNAPSHOT
RUBY_VERSION: 2.3.0
Rehearsal -------------------------------------------------
sorted_ints     0.050000   0.010000   0.060000 (  0.029992)
random_ints     2.150000   0.010000   2.160000 (  1.918142)
random_floats   4.850000   0.000000   4.850000 (  4.485699)
random_objs    12.870000   0.010000  12.880000 ( 12.627390)
--------------------------------------- total: 19.950000sec

                    user     system      total        real
sorted_ints     0.030000   0.000000   0.030000 (  0.028930)
random_ints     2.310000   0.000000   2.310000 (  2.299806)
random_floats   5.270000   0.000000   5.270000 (  5.253322)
random_objs    19.450000   0.000000  19.450000 ( 16.056569)
==================================================================
Java's native Arrays.sort
Iterations: 5000000
JRUBY_VERSION: 9.1.3.0-SNAPSHOT
RUBY_VERSION: 2.3.0
Rehearsal -------------------------------------------------
sorted_ints     0.040000   0.000000   0.040000 (  0.030032)
random_ints     3.060000   0.020000   3.080000 (  2.494412)
random_floats   4.290000   0.000000   4.290000 (  4.047059)
random_objs    10.930000   0.000000  10.930000 ( 10.746148)
--------------------------------------- total: 18.340000sec

                    user     system      total        real
sorted_ints     0.020000   0.000000   0.020000 (  0.021787)
random_ints     6.640000   0.020000   6.660000 (  4.840657)
random_floats   4.410000   0.010000   4.420000 (  4.413161)
random_objs    11.040000   0.010000  11.050000 ( 11.017320)

Looking at the overall timing, Arrays.sort seems a bit faster, but not sure I understand this part on with Arrays.sort does pretty badly on the random_ints part:

sorted_ints     0.020000   0.000000   0.020000 (  0.021787)
random_ints     6.640000   0.020000   6.660000 (  4.840657)
random_floats   4.410000   0.010000   4.420000 (  4.413161)
random_objs    11.040000   0.010000  11.050000 ( 11.017320)

Any suggestions on what I should do next?

Thanks!

@kares
Copy link
Member

kares commented May 27, 2016

najs work ... you should also include Java version as the numbers might be different on Java 7 vs. 8

@sri
Copy link
Contributor Author

sri commented May 27, 2016

@enebo
Copy link
Member

enebo commented May 27, 2016

@sri I think too you may want to add some <16 element data in here as well. Our impl did insertion sort for smaller lists.

I expect this new improved code in Java will be as good as us. I think this impl of Sort happened way back in the 1.4 days when it was trivial to micro opt anything to perform better than what they shipped.

@sri
Copy link
Contributor Author

sri commented May 27, 2016

@enebo OK added that (sri/jruby-sort-benchmark@031e618#diff-40e9e3b314e488cae33dd0064545543eR19) and updated the Readme with the latest report: https://github.com/sri/jruby-sort-benchmark/blob/master/README.md. Seems comparable. I varies a little bit each time I run it. I wonder if there is a better way to capture the results.

@headius
Copy link
Member

headius commented May 27, 2016

Probably would get better results if you ran the whole benchmark in a loop, so the JVM can settle in and JIT/GC things properly.

Thanks for looking into this! If the numbers look mostly the same after all our benchmarking, I have no problem making this move. No code is good code :-)

@sri
Copy link
Contributor Author

sri commented May 28, 2016

@headius I've update the Readme (https://github.com/sri/jruby-sort-benchmark/blob/master/README.md) as you suggested (https://github.com/sri/jruby-sort-benchmark/blob/master/sort_benchmark.rb#L31).

It seems to be all over the place. Maybe I should run the benkmarks on the same data? I can try that next. Currently, it generates random data for Qsort and Arrays.sort.

Also, you mentioned about matching MRI behavior -- how can I test that out?

@sri
Copy link
Contributor Author

sri commented May 29, 2016

OK I modified the script to load the same data for both the sorts (see the Readme).
The results look the same:

  • QSort is slightly better with random_ints (2.43s vs 2.75s) and random_floats (4.10s vs 4.69s).
  • Arrays.sort is slight better with random_objs: (10.79s vs 12.47s).

This is from the 5th run:

5th run of Qsort

                    user     system      total        real
sorted_ints     0.030000   0.000000   0.030000 (  0.026600)
random_ints     2.410000   0.000000   2.410000 (  2.411838)
random_floats   3.940000   0.000000   3.940000 (  3.941184)
random_objs    12.470000   0.000000  12.470000 ( 12.478023)
small_arrays    0.150000   0.000000   0.150000 (  0.148153)
Rehearsal -------------------------------------------------
sorted_ints     0.020000   0.000000   0.020000 (  0.023694)
random_ints     2.440000   0.000000   2.440000 (  2.434369)
random_floats   4.100000   0.000000   4.100000 (  4.103881)
random_objs    12.470000   0.000000  12.470000 ( 12.470829)
small_arrays    0.140000   0.000000   0.140000 (  0.149050)
--------------------------------------- total: 19.170000sec

5th run of Arrays.sort

                    user     system      total        real
sorted_ints     0.030000   0.000000   0.030000 (  0.022867)
random_ints     2.710000   0.000000   2.710000 (  2.719829)
random_floats   4.700000   0.010000   4.710000 (  4.711551)
random_objs    10.820000   0.000000  10.820000 ( 10.837966)
small_arrays    0.150000   0.000000   0.150000 (  0.150290)
Rehearsal -------------------------------------------------
sorted_ints     0.030000   0.000000   0.030000 (  0.025033)
random_ints     2.750000   0.000000   2.750000 (  2.756032)
random_floats   4.680000   0.010000   4.690000 (  4.691306)
random_objs    10.790000   0.000000  10.790000 ( 10.795925)
small_arrays    0.150000   0.000000   0.150000 (  0.147779)
--------------------------------------- total: 18.410000sec

It'll be great if someone else can run and confirm the results!

@sri
Copy link
Contributor Author

sri commented Jun 1, 2016

Oh, btw, I had to add this option to the JVM to get it to run without memory errors: -J-Xmx5000M (for both Arrays.sort and Qsort)

sri added a commit to sri/jruby that referenced this issue Jun 9, 2016
Java's Arrays.sort more robust and is faster is certain cases.

See jruby#3919 and
https://github.com/sri/jruby-sort-benchmark.
@enebo
Copy link
Member

enebo commented Jul 27, 2016

Fixed by PR. closing.

@enebo enebo closed this as completed Jul 27, 2016
@annuyadav
Copy link

Hi, I am getting 'Comparison method violates its general contract!' error while sorting below array:
[["1"], ["10"], ["20"], ["30"], ["40"], ["50"], ['140'],
[nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil],
["60"], ["70"], ["80"], ["90"], ["100"], ["110"], ["120"], ["130"]].sort

but if remove an nil from the above error then its working fine. Please help in this issue.

@BanzaiMan
Copy link
Member

@annuyadav This is expected behavior in Ruby in general, so it has nothing to do with this issue. You need to clean up your data before calling Array#sort.

@anup1710
Copy link

anup1710 commented Jun 6, 2019

@annuyadav - That's correct you need to clean the data for sorting or make comparator method to handle it gracefully to return correct order for nil values. I.e. 0, 1 or -1

@jruby jruby locked as off-topic and limited conversation to collaborators Jun 6, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants