-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MINOR: Dry up RubyArray and improve performance in a few spots #4751
Conversation
return new RubyArray(runtime, values); | ||
} | ||
|
||
public static RubyArray newArray(Ruby runtime, List<? extends IRubyObject> list) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still'd like to have the List
version around for binary compatibility - it also does not need toArray
for packed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kares right bringing it back sec :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
brought it back here https://github.com/jruby/jruby/pull/4751/files#diff-008115c4ee37b136355736c9aa675fa3R264 with nicely separate packed and not packed paths :)
@@ -4100,7 +4077,9 @@ public IRubyObject sample(ThreadContext context, IRubyObject[] args) { | |||
if (args.length > 0) { | |||
IRubyObject hash = TypeConverter.checkHashType(context.runtime, args[args.length - 1]); | |||
if (!hash.isNil()) { | |||
IRubyObject[] rets = ArgsUtil.extractKeywordArgs(context, (RubyHash) hash, new String[] { "random" }); | |||
IRubyObject[] rets = ArgsUtil.extractKeywordArgs(context, (RubyHash) hash, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's keep it consistently a one liner maybe :) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kares sure :)
@@ -4792,7 +4771,7 @@ public IRubyObject min(ThreadContext context, IRubyObject num, Block block) { | |||
} | |||
|
|||
private static final int optimizedCmp(ThreadContext context, IRubyObject a, IRubyObject b, int token, CachingCallSite op_cmp, CallSite op_gt, CallSite op_lt) { | |||
if (token == ((RubyBasicObject) a).getMetaClass().generation) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this cast is all over the place - it used to help inlining ... without the JIT wasn't just smart-enough to get it "fast"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kares ah I see, phew I'm not sure about this point, but in Java 8 this should not help anything anymore right. I'd rather expect this to potentially be harmful because it increases the size of the method and the type profiler will figure out that this is always a RubyBasicObject
eventually right? (I could investigate this a little with JitWatch I guess :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk, but Java 7 still supported.
so it might be valuable to measure using that as well, although Java 8 first I guess
(9K is already considerably slower on 7)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kares wow I stand 100% corrected here, check this out:
Tried to use a slightly deeper (in the sense of inheritance) case and benchmarked this:
@Benchmark
@OperationsPerInvocation(EVENTS_PER_INVOCATION)
public final void castPerformanceImpactWithout(final Blackhole blackhole) throws Exception {
for (int i = 0; i < EVENTS_PER_INVOCATION; ++i) {
blackhole.consume(random().isTrue());
}
}
@Benchmark
@OperationsPerInvocation(EVENTS_PER_INVOCATION)
public final void castPerformanceImpactWith(final Blackhole blackhole) throws Exception {
for (int i = 0; i < EVENTS_PER_INVOCATION; ++i) {
blackhole.consume(((RubyBoolean) random()).isTrue());
}
}
Results are:
# Run complete. Total time: 00:01:03
Benchmark Mode Cnt Score Error Units
CastExperimentBenchmark.castPerformanceImpactWith thrpt 20 92950.858 ± 1169.586 ops/ms
CastExperimentBenchmark.castPerformanceImpactWithout thrpt 20 69318.454 ± 2354.502 ops/ms
didn't expect this at all (same effect for JDK 7 and 8 btw.) => will revert + thanks for the very educational moment :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static IRubyObject random() {
return RubyUtil.RUBY.newBoolean(RANDOM.nextBoolean());
}
private static final Random RANDOM = new Random();
was my object generator :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overhead in these cases seems to be the keeping of the type profile itself, inlining is not affected as such and method size is even a byte larger from the cast as I expected:
5189 2084 4 org.logstash.benchmark.CastExperimentBenchmark::castPerformanceImpactWith (28 bytes)
@ 9 org.logstash.benchmark.CastExperimentBenchmark::random (13 bytes) inline (hot)
@ 6 java.util.Random::nextBoolean (14 bytes) inline (hot)
@ 2 java.util.Random::next (47 bytes) inline (hot)
@ 8 java.util.concurrent.atomic.AtomicLong::get (5 bytes) inline (hot)
@ 32 java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes) inline (hot)
@ 9 sun.misc.Unsafe::compareAndSwapLong (0 bytes) (intrinsic)
@ 9 org.jruby.Ruby::newBoolean (16 bytes) inline (hot)
@ 15 org.jruby.RubyBasicObject::isTrue (17 bytes) inline (hot)
@ 18 org.openjdk.jmh.infra.Blackhole::consume (28 bytes) disallowed by CompilerOracle
2917 2053 4 org.logstash.benchmark.CastExperimentBenchmark::castPerformanceImpactWithout (27 bytes)
@ 9 org.logstash.benchmark.CastExperimentBenchmark::random (13 bytes) inline (hot)
@ 6 java.util.Random::nextBoolean (14 bytes) inline (hot)
@ 2 java.util.Random::next (47 bytes) inline (hot)
@ 8 java.util.concurrent.atomic.AtomicLong::get (5 bytes) inline (hot)
@ 32 java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes) inline (hot)
@ 9 sun.misc.Unsafe::compareAndSwapLong (0 bytes) (intrinsic)
@ 9 org.jruby.Ruby::newBoolean (16 bytes) inline (hot)
@ 12 org.jruby.RubyBasicObject::isTrue (17 bytes) inline (hot)
@ 12 org.jruby.RubyBasicObject::isTrue (17 bytes) inline (hot)
\-> TypeProfile (140240/280503 counts) = org/jruby/RubyBoolean$False
\-> TypeProfile (140263/280503 counts) = org/jruby/RubyBoolean$True
@ 17 org.openjdk.jmh.infra.Blackhole::consume (28 bytes) disallowed by CompilerOracle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, and yes it surprising (and IDE gives warnings all over) also did not know about that.
... but I recall @headius lecture on some of the issues along the way of making JRuby.
think he also raised the issue to the HotSpot team - maybe for 9 thay might do smt about it.
or maybe they won't and eventually just push GraalVM's optimizations for all as Java 10 :)
03cd174
to
51cd6a2
Compare
@kares thanks for the swift review, addressed all comments I think/hope :) |
51cd6a2
to
5152772
Compare
@kares reverted the removed cast and added some benchmarks here #4751 (comment) (just in case GH hides it now that I pushed over the change, kind of interesting imo :)) |
no one complained in a week or so, thus let's ship it ... |
Some cleanups I came across debugging
RubyArray
performance recently :)RubyArray
USE_PACKED_ARRAYS == true
paths to a singleif
branch to be more JIT friendly hereisPackedArray
will be nicely dead code eliminated now for theCollection
case and we save an expensivesize
call here and there (+ bytecode becomes smaller :))static
that had no instance reference anywherelist.toArray(new IRubyObject[list.size()]))
with the faster versioncollection.toArray(IRubyObject.NULL_ARRAY)
Just a few suggestions, let me know what you think :)