Apply optimize stream refactoring #539

Open

khatchad wants to merge 1 commit into master
Conversation

@khatchad commented Mar 21, 2018

We are in the process of evaluating a research project associated with an automated refactoring tool that attempts to optimize Java 8 streams for maximum performance. In the case of your project, it converted several streams to parallel streams. The tool is very conservative: it performs an extensive static analysis to ensure that each transformation is safe, i.e., that the original program semantics are preserved. However, this also means that the proposed changes may be a proper subset of all streams that could be optimized. We would appreciate any feedback you have on the change set.
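For illustration, the kind of transformation the tool performs looks roughly like the sketch below. This is our own hypothetical example (the pipeline and class name are invented, not code from this project): a stateless, side-effect-free sequential stream is rewritten to run in parallel, preserving its result.

```java
import java.util.stream.IntStream;

public class ParallelStreamSketch {
    public static void main(String[] args) {
        // Before the refactoring: a sequential stream pipeline.
        long sequential = IntStream.range(0, 1_000_000)
                .filter(i -> i % 3 == 0)
                .count();

        // After the refactoring: the same pipeline with .parallel() added.
        // This is only safe when the pipeline is stateless and free of
        // side effects, which is the kind of property a conservative
        // static analysis must establish before transforming.
        long parallel = IntStream.range(0, 1_000_000)
                .parallel()
                .filter(i -> i % 3 == 0)
                .count();

        // Both pipelines compute the same value; only the execution
        // strategy differs.
        System.out.println(sequential == parallel); // prints "true"
    }
}
```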

We found a JMH performance test in your project that does a quick test on a small dataset. To assess the increased parallelism, however, we altered this test to have a larger data set. The results show an overall speedup of ~1.55 (see below):

Benchmark (mode: avgt, ms/op)                                                                   Speedup
org.numenta.nupic.benchmarks.SpatialPoolerLocalInhibitionBenchmark.measureAvgCompute_7_Times    1.131135828
org.numenta.nupic.benchmarks.TemporalMemoryBenchmark.measureAvgCompute_7_Times                  1.973052763
Average                                                                                         1.552094295

We have also pasted below the patch to your JMH test suite that we used to increase the data set. Please let us know if you have any questions, and again, we appreciate any feedback!

diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
index 8a81d171d..4c978d47c 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
@@ -43,11 +43,11 @@ public abstract class AbstractAlgorithmBenchmark {
 
     @Setup
     public void init() {
-        SDR = new int[2048];
+        SDR = new int[40960];
 
         //Layer components
         ScalarEncoder.Builder dayBuilder = ScalarEncoder.builder()
-            .n(8)
+            .n(160)
             .w(3)
             .radius(1.0)
             .minVal(1.0)
@@ -76,9 +76,9 @@ public abstract class AbstractAlgorithmBenchmark {
      */
     protected Parameters getParameters() {
         Parameters parameters = Parameters.getAllDefaultParameters();
-        parameters.set(KEY.INPUT_DIMENSIONS, new int[] { 8 });
-        parameters.set(KEY.COLUMN_DIMENSIONS, new int[] { 2048 });
-        parameters.set(KEY.CELLS_PER_COLUMN, 32);
+        parameters.set(KEY.INPUT_DIMENSIONS, new int[] { 160 });
+        parameters.set(KEY.COLUMN_DIMENSIONS, new int[] { 40960 });
+        parameters.set(KEY.CELLS_PER_COLUMN, 640);
 
         //SpatialPooler specific
         parameters.set(KEY.POTENTIAL_RADIUS, 12);//3
diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
index 08fe1dea5..69e502db3 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
@@ -41,9 +41,9 @@ public class SpatialPoolerGlobalInhibitionBenchmark extends AbstractAlgorithmBen
     public void init() {
         super.init();
 
-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
     }
 
@@ -55,10 +55,10 @@ public class SpatialPoolerGlobalInhibitionBenchmark extends AbstractAlgorithmBen
     }
 
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public int[] measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
         }
 
diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
index 8bbe19698..f333aba40 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
@@ -39,17 +39,17 @@ public class SpatialPoolerLocalInhibitionBenchmark extends AbstractAlgorithmBenc
     public void init() {
         super.init();
 
-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
     }
 
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public int[] measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
         }
 
diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
index e6d13c4e0..d3440a674 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
@@ -44,31 +44,31 @@ public class TemporalMemoryBenchmark extends AbstractAlgorithmBenchmark {
         super.init();
 
         //Initialize the encoder's encoded output
-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
 
-        SDRs = new int[7][];
+        SDRs = new int[140][];
 
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
             SDRs[i] = ArrayUtils.where(SDR, ArrayUtils.WHERE_1);
         }
 
-        for(int j = 0;j < 100;j++) {
-            for(int i = 0;i < 7;i++) {
+        for(int j = 0;j < 2000;j++) {
+            for(int i = 0;i < 140;i++) {
                 temporalMemory.compute(memory, SDRs[i], true);
             }
         }
     }
 
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public ComputeCycle measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
         ComputeCycle cc = null;
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             cc = temporalMemory.compute(memory, SDRs[i], true);
         }

@khatchad mentioned this pull request Mar 21, 2018
@cogmission (Collaborator)

Hi @khatchad,

Thank you very much for your interest and effort!

Please have a look at issue #538, because I've identified there what I believe to be the source of the problems. I think my comments include a more "holistic" solution as opposed to increasing the test size? (Although it was very, very interesting that increasing the test sample size also avoids the issue; kind of scary, actually.)

On another note, conventionally and historically, a column width greater than 2048 for the SDR size has never really been used, and I'm not sure how increasing the size to more than 40k columns will give us performance feedback for our "normal" use case. My feeling is that unconventional column widths might not be the best application for the benchmarks, because they differ from the usual use case?

Please let me know your thoughts on the above? Benchmark composition and evaluation isn't something I have a lot of experience with, so I'm interested in opinions from people with expertise in this area? 👍

On again another note, your project of auto-optimizing Java streams to look for opportunities for concurrency is VERY cool! :-)

@cogmission (Collaborator)

@khatchad Also, to contribute to this project you will need to fill out the Contributor's License and @rhyolight will have to do some administrative processing of it after you've completed that, ok?

@khatchad (Author)

@cogmission,

Also, to contribute to this project you will need to fill out the Contributor's License and @rhyolight will have to do some administrative processing of it after you've completed that, ok?

No problem. I have filled out the paperwork.

@khatchad (Author)

@cogmission:

Thank you very much for your interest and effort!

Thank you for considering the request!

Please have a look at issue #538, because I've identified there what I believe to be the source of the problems. I think my comments include a more "holistic" solution as opposed to increasing the test size?

Yes, I've replied there as well. The test size increase was not meant as a fix for the problem, so fixing the comparator seems reasonable to me (although I don't have much experience with your project).

On another note, conventionally and historically a column width > 2048 for the SDR size is never really used and I'm not sure how increasing the size to > 40k columns will serve to give us performance feedback for our "normal" use case. My feeling is that unconventional column widths might not be the best application for the benchmarks because it differs from the "usual" use case?

Thanks for this feedback. Admittedly, I blindly increased the test size, perhaps even artificially, due to my unfamiliarity with the project. Normally, a JMH test has a @Param annotation that can easily be used to increase the test data size, but since I didn't find one, I tried several other ways to do so. I will repeat the speedup analysis, this time without increasing the column size, and then report the results again.
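As a sketch of what such a knob could look like (this is hypothetical code, not an existing benchmark in this project, and it requires the JMH library on the classpath), a @Param field lets JMH run the same benchmark across several data sizes without editing hard-coded constants:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class ParameterizedSizeBenchmark {

    // JMH runs the benchmark once per listed value, so the data-set
    // size can be scaled without touching the benchmark body.
    @Param({"7", "140", "1400"})
    public int inputCount;

    private double[] data;

    @Setup
    public void init() {
        data = new double[inputCount];
        for (int i = 0; i < inputCount; i++) {
            data[i] = (i % 7) + 1;  // mirrors the cyclic inputs used above
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public double measureCompute() {
        double sum = 0;
        for (double d : data) {
            sum += Math.sqrt(d);  // placeholder for the real computation
        }
        return sum;
    }
}
```

With this in place, scaling the workload is a matter of passing `-p inputCount=...` to the JMH runner rather than patching the benchmark sources.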

@khatchad (Author)

I should also note that this test was run on 8 cores.

@khatchad (Author) commented Mar 22, 2018

OK, I've reverted the changes to the columns, which results in the following patch:

diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
index 8a81d171d..74c422dcb 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
@@ -47,7 +47,7 @@ public abstract class AbstractAlgorithmBenchmark {
 
         //Layer components
         ScalarEncoder.Builder dayBuilder = ScalarEncoder.builder()
-            .n(8)
+            .n(160)
             .w(3)
             .radius(1.0)
             .minVal(1.0)
@@ -76,9 +76,9 @@ public abstract class AbstractAlgorithmBenchmark {
      */
     protected Parameters getParameters() {
         Parameters parameters = Parameters.getAllDefaultParameters();
-        parameters.set(KEY.INPUT_DIMENSIONS, new int[] { 8 });
+        parameters.set(KEY.INPUT_DIMENSIONS, new int[] { 160 });
         parameters.set(KEY.COLUMN_DIMENSIONS, new int[] { 2048 });
-        parameters.set(KEY.CELLS_PER_COLUMN, 32);
+        parameters.set(KEY.CELLS_PER_COLUMN, 640);
 
         //SpatialPooler specific
         parameters.set(KEY.POTENTIAL_RADIUS, 12);//3
diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
index 08fe1dea5..69e502db3 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
@@ -41,9 +41,9 @@ public class SpatialPoolerGlobalInhibitionBenchmark extends AbstractAlgorithmBen
     public void init() {
         super.init();
 
-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
     }
 
@@ -55,10 +55,10 @@ public class SpatialPoolerGlobalInhibitionBenchmark extends AbstractAlgorithmBen
     }
 
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public int[] measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
         }
 
diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
index 8bbe19698..f333aba40 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
@@ -39,17 +39,17 @@ public class SpatialPoolerLocalInhibitionBenchmark extends AbstractAlgorithmBenc
     public void init() {
         super.init();
 
-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
     }
 
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public int[] measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
         }
 
diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
index e6d13c4e0..d3440a674 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
@@ -44,31 +44,31 @@ public class TemporalMemoryBenchmark extends AbstractAlgorithmBenchmark {
         super.init();
 
         //Initialize the encoder's encoded output
-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
 
-        SDRs = new int[7][];
+        SDRs = new int[140][];
 
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
             SDRs[i] = ArrayUtils.where(SDR, ArrayUtils.WHERE_1);
         }
 
-        for(int j = 0;j < 100;j++) {
-            for(int i = 0;i < 7;i++) {
+        for(int j = 0;j < 2000;j++) {
+            for(int i = 0;i < 140;i++) {
                 temporalMemory.compute(memory, SDRs[i], true);
             }
         }
     }
 
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public ComputeCycle measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
         ComputeCycle cc = null;
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             cc = temporalMemory.compute(memory, SDRs[i], true);
         }

This results in the following speedup analysis:

Benchmark (mode: avgt, ms/op)                                                                   Speedup
org.numenta.nupic.benchmarks.SpatialPoolerLocalInhibitionBenchmark.measureAvgCompute_7_Times    1.131135828
org.numenta.nupic.benchmarks.TemporalMemoryBenchmark.measureAvgCompute_7_Times                  1.973052763
Average                                                                                         1.552094295

As can be seen, the org.numenta.nupic.benchmarks.SpatialPoolerGlobalInhibitionBenchmark.measureAvgCompute_7_Times benchmark was not executed due to #538.

@cogmission (Collaborator)

Hi @khatchad,

The test size increase was not meant as a fix of the problem

Ah, I see. Hmmm... Well, if that's the case, then why do we want to change the parameters of the test at all (other than the changes necessary to fix the exception being thrown)? I'm not sure why we want to increase the test sizes within the project. If you require these settings for your research/experimentation, then you can set them locally in your project by changing the required parameters and then making a new build, no? I'm unclear on why I'd change the test in the codebase to adhere to your specific needs? :-)

Not that we don't like to keep our users happy!

@khatchad (Author)

Hi @cogmission,

Ah, I see. Hmmm... Well, if that's the case, then why do we want to change the parameters of the test at all (other than the changes necessary to fix the exception being thrown)? I'm not sure why we want to increase the test sizes within the project. If you require these settings for your research/experimentation, then you can set them locally in your project by changing the required parameters and then making a new build, no? I'm unclear on why I'd change the test in the codebase to adhere to your specific needs? :-)

Yes, agreed. If you look at the pull request files, you'll notice there are no changes to test code. The patch I put in the comments is just to show you how we arrived at the performance results. In fact, we did just what you mention above, i.e., the test code changes are only local.

As for your question as to why we changed the test code locally, we wanted the tests to process more data for a better probability of exploiting parallelism. By no means are we suggesting that the test code be changed permanently. As you said, it is specific to our experiment.

Thanks again for your consideration and please let us know if there are any questions!

@khatchad (Author) commented Aug 3, 2018

Hi all. Does anyone happen to know why GitHub is saying that I haven't signed the contributor license when I have? I believe that this PR should be passing. Thanks!

@cogmission (Collaborator)

@khatchad
Hi,
It has to be approved by @rhyolight first...

@khatchad (Author) commented Aug 3, 2018

Thank you @cogmission and @rhyolight! Just to reiterate, the only suggested changes are found in https://github.com/numenta/htm.java/pull/539/files. It is not necessary to change the tests; they were only augmented locally to exploit more parallelism and to evaluate the approach. Thank you again for your feedback, as well as for the opportunity to contribute to your project!

@cogmission (Collaborator)

AFK, but will get to it asap

@cogmission (Collaborator)

@khatchad

Hi, I’m on vacation right now. Will be back toward end of August... Please hold tight. Thanks, David

@khatchad (Author)

No problem, @cogmission. Thanks for the consideration and enjoy your vacation!

@khatchad changed the title from "Apply optimize stream refactoring." to "Apply optimize stream refactoring" Aug 20, 2018
@khatchad (Author)

FYI, I noticed an error in the "Avg" column for the average speedup in the table listed in #539 (comment). It should be 1.552094295. I've updated it accordingly. Sorry about that.
