Apply Optimize Java 8 Streams refactoring #2837

khatchad · 2018-08-20T20:58:41Z

Apply Optimize Java 8 Streams Refactoring

Description

This is a semantics-preserving automated refactoring that attempts to optimize code using Java 8 streams when it is safe and possibly advantageous to do so. It may make certain streams parallel, others sequential, and unorder streams where necessarily. The tool does not add new functionality; it only rearranges existing code in an effort to increase its performance. Your program's results should be the same before and after the refactoring.

Details

We are evaluating a research prototype automated refactoring Eclipse plug-in called Optimize Java 8 Streams Refactoring. We have applied the tool to your project in the hopes of receiving feedback.
The approach is very conservative. That may mean that not all changes that can be made were made.
The source code should be semantically equivalent to the original.
There was one manual change reversal due to parallelization of a stream with only one element.
The tool does not assess overhead vs. the amount of data processed; it only performs the refactoring when it is safe and possibly advantageous to do so. However, we ran several performance tests on the refactored code (see below).

Performance Evaluation

We did not find dedicated performance tests in your project (please let us know if we missed it). But, we did evaluate our tool on other projects (see iluwatar/java-design-patterns#786 (comment) and numenta/htm.java#539 (comment)). We have found an average speedup of ~3.25 as a result of our refactoring across all of our tests thus far.

Feedback

Thank you for your help in this evaluation! Any feedback you can provide would be very helpful. In particular, we are interested if each of the proposed changes are helpful or not.

See https://github.com/ponder-lab/Optimize-Java-8-Streams-Refactoring.

This is a manual step.

joakime · 2018-08-20T21:10:57Z

First (and most important) be sure you follow the Eclipse Legal / IP-Validation requirements.
You currently do not have an Eclipse ECA on file at Eclipse, nor have you follow the Developer Certificate of Origin requirement to "sign off" the git commit.
https://github.com/eclipse/jetty.project/blob/jetty-9.4.x/CONTRIBUTING.md#eclipse-contributor-agreement
We'll need both of those addressed before we can accept this PR.

The java 8 .parallelStream() method uses the ForkJoinPool.commonPool() to spawn threads to handle this parallel processing.
This is undesired on a server platform, as those extra threads are not being handled by the server managed thread pools.
However, since this PR is limited to just jetty-start the impact should be negligible.

khatchad · 2018-08-20T21:18:46Z

@joakime Thanks for the prompt feedback. AFAIK, I have signed the agreement; it seems that the validation server is down. But, I did forget to sign my commits. I'll do that now.

sbordet · 2018-08-20T21:27:50Z

Using parallelStream() is only for exceptional cases.

Here we are on a server (not a good place to use parallelStream()) and we are iterating on streams that contain 10s or at most 100s elements (again not a good parallelStream() case).

Rule of thumb shared by most experts about parallelStream() is that you need at least 10k elements to win the parallel setup overhead.

@khatchad IIUC your benchmarks have been done with 1 million elements to show those gains.
Do you have gains for 10 and 100 elements only?

See https://github.com/ponder-lab/Optimize-Java-8-Streams-Refactoring. Signed-off-by: Raffi Khatchadourian <raffi.khatchadourian@hunter.cuny.edu>

This is a manual step. Signed-off-by: Raffi Khatchadourian <raffi.khatchadourian@hunter.cuny.edu>

….project into streams

khatchad · 2018-08-20T21:31:37Z

@joakime I signed my commits retroactively but doing so after a push doesn't seem to change the old commits. I guess I'll need to open a new PR. Sorry about that.

@sbordet I don't. Indeed, 1 million is the best gains I've seen in the literature. I'll run my script with 10 and 100 to see how it looks.

khatchad · 2018-08-20T23:37:41Z

Superseded by #2838.

gregw · 2018-08-20T23:41:33Z

I am not keen for Jetty as a server to use things like parallelStreams. The use of parallelism is really good if you have a single big task to do and wish to use all the CPUs available to achieve that in minimum time.
However, that is not the job of a server, which has many many tasks to do and the most important thing to optimize is the number of tasks completed rather than the elapsed time of any particular task.

Specifically, the last thing we want to do for any task that originates from a single connection/request, is to spread it's handling over all the CPUs of a server, filling all the caches with data from/for that single request. It is far FAR better that each CPU and its caches is dedicated to tasks from individual requests for as long as possible, allowing other CPUs to work on other requests with minimal (if any) contention. We actually go to some lengths in our scheduler to achieve this.

So thanks for the contribution, but it's not a direction we wish to follow.

khatchad · 2018-08-20T23:58:49Z

@gregw Thank you for the feedback!

khatchad · 2018-08-21T01:02:40Z

@sbordet:

@khatchad IIUC your benchmarks have been done with 1 million elements to show those gains.
Do you have gains for 10 and 100 elements only?

It would seem that 10 and 100 elements is not enough to overcome the overhead. Please see below. The left is the original and the right is the refactored version:

156,160c160,164
< Benchmark                                    (N)  Mode  Cnt   Score    Error  Units
< AbstractDocumentTest.shouldRetrieveChildren   10  avgt    5  ≈ 10⁻⁴           ms/op
< AbstractDocumentTest.shouldRetrieveChildren  100  avgt    5   0.001 ±  0.001  ms/op
< DomainTest.shouldConstructCar                 10  avgt    5  ≈ 10⁻⁴           ms/op
< DomainTest.shouldConstructCar                100  avgt    5   0.001 ±  0.001  ms/op
---
> Benchmark                                    (N)  Mode  Cnt  Score   Error  Units
> AbstractDocumentTest.shouldRetrieveChildren   10  avgt    5  0.043 ± 0.022  ms/op
> AbstractDocumentTest.shouldRetrieveChildren  100  avgt    5  0.293 ± 0.027  ms/op
> DomainTest.shouldConstructCar                 10  avgt    5  0.045 ± 0.009  ms/op
> DomainTest.shouldConstructCar                100  avgt    5  0.280 ± 0.026  ms/op

khatchad added 2 commits August 20, 2018 16:49

Apply Optimize Java 8 Stream refactoring.

da8e05d

See https://github.com/ponder-lab/Optimize-Java-8-Streams-Refactoring.

Remove spurious refactoring.

7ca06b3

This is a manual step.

khatchad added 3 commits August 20, 2018 17:28

Apply Optimize Java 8 Stream refactoring.

fbb0d0e

See https://github.com/ponder-lab/Optimize-Java-8-Streams-Refactoring. Signed-off-by: Raffi Khatchadourian <raffi.khatchadourian@hunter.cuny.edu>

Remove spurious refactoring.

628a521

This is a manual step. Signed-off-by: Raffi Khatchadourian <raffi.khatchadourian@hunter.cuny.edu>

Merge branch 'streams' of https://github.com/khatchadourian-lab/jetty…

39f16e7

….project into streams

khatchad mentioned this pull request Aug 20, 2018

Apply Optimize Java 8 Streams refactoring #2838

Closed

khatchad closed this Aug 20, 2018

khatchad deleted the streams branch August 20, 2018 23:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Apply Optimize Java 8 Streams refactoring #2837

Apply Optimize Java 8 Streams refactoring #2837

khatchad commented Aug 20, 2018

joakime commented Aug 20, 2018

khatchad commented Aug 20, 2018

sbordet commented Aug 20, 2018

khatchad commented Aug 20, 2018

khatchad commented Aug 20, 2018

gregw commented Aug 20, 2018

khatchad commented Aug 20, 2018

khatchad commented Aug 21, 2018

Apply Optimize Java 8 Streams refactoring #2837

Apply Optimize Java 8 Streams refactoring #2837

Conversation

khatchad commented Aug 20, 2018

Apply Optimize Java 8 Streams Refactoring

Description

Details

Performance Evaluation

Feedback

joakime commented Aug 20, 2018

khatchad commented Aug 20, 2018

sbordet commented Aug 20, 2018

khatchad commented Aug 20, 2018

khatchad commented Aug 20, 2018

gregw commented Aug 20, 2018

khatchad commented Aug 20, 2018

khatchad commented Aug 21, 2018