Skip to content

Cache the index of the last accessed polynomial #1712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 4, 2018

Conversation

pleroy
Copy link
Member

@pleroy pleroy commented Feb 4, 2018

In favorable cases this is about 30% faster. In unfavorable cases this is marginally slower, but probably in the noise.

Before:

----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                               Time           CPU Iterations
----------------------------------------------------------------------------------------------------------------------
BM_EphemerisMultithreadingBenchmark/3/1                                         172615405 ns          0 ns        100 +9.98764788888029195e+06 m +1.99940980346540436e+07 m +2.99994767007804625e+07 m
BM_EphemerisMultithreadingBenchmark/3/2                                         119318150 ns          0 ns        100 +9.98764788888029195e+06 m +1.99940980346540436e+07 m +2.99994767007804625e+07 m
BM_EphemerisMultithreadingBenchmark/3/3                                          67171344 ns          0 ns        100 +9.98764788888029195e+06 m +1.99940980346540436e+07 m +2.99994767007804625e+07 m
BM_EphemerisMultithreadingBenchmark/3/4                                          68544608 ns          0 ns        100 +9.98764788888029195e+06 m +1.99940980346540436e+07 m +2.99994767007804625e+07 m
BM_EphemerisMultithreadingBenchmark/3/5                                          66004206 ns          0 ns        100 +9.98764788888029195e+06 m +1.99940980346540436e+07 m +2.99994767007804625e+07 m
BM_EphemerisSolarSystemMajorBodiesOnly/-3                                      102641683616 ns 102555057400 ns          1 +1.00027593204189058e+00 ua
BM_EphemerisSolarSystemMinorAndMajorBodies/-3                                  230143093351 ns 229961074100 ns          1 +1.00027593305567897e+00 ua
BM_EphemerisSolarSystemAllBodiesAndOblateness/-3                               261569303409 ns 261426475800 ns          1 +1.00027577397770173e+00 ua
BM_EphemerisL4ProbeMajorBodiesOnly<&FlowEphemerisWithAdaptiveStep>/-3          1199530377 ns 1185607600 ns          1 154379 steps, +9.99375002345204155e-01 ua, +1.09612969423844309e+00 ua, degree 82.044495
BM_EphemerisL4ProbeMinorAndMajorBodies<&FlowEphemerisWithAdaptiveStep>/-3      3238862846 ns 3198020500 ns          1 154379 steps, +9.99375003101658388e-01 ua, +1.09612971674595627e+00 ua, degree 186.217687
BM_EphemerisL4ProbeAllBodiesAndOblateness<&FlowEphemerisWithAdaptiveStep>/-3   3259336479 ns 3260420900 ns          1 154379 steps, +9.99375002982150651e-01 ua, +1.09612960413471527e+00 ua, degree 187.141040
BM_EphemerisLEOProbeMajorBodiesOnly<&FlowEphemerisWithAdaptiveStep>/-3         3979957516 ns 3978025500 ns          1 750001 steps, +9.99957708942228352e-01 ua, +9.99472639320210021e+01 nmi
BM_EphemerisLEOProbeMajorBodiesOnly<&FlowEphemerisWithFixedStepSLMS>/-3        8431033716 ns 8424054000 ns          1 3155761 steps, +9.99973822025392423e-01 ua, +9.99960134579607995e+01 nmi
BM_EphemerisLEOProbeMajorBodiesOnly<&FlowEphemerisWithFixedStepSRKN>/-3        31613871197 ns 31512202000 ns          1 3155761 steps, +9.99973821849166056e-01 ua, +9.99960129884997997e+01 nmi
BM_EphemerisLEOProbeMinorAndMajorBodies<&FlowEphemerisWithAdaptiveStep>/-3     7814131494 ns 7722049500 ns          1 750001 steps, +9.99957751818578711e-01 ua, +9.99474216789493966e+01 nmi
BM_EphemerisLEOProbeMinorAndMajorBodies<&FlowEphemerisWithFixedStepSLMS>/-3    13806965291 ns 13556486900 ns          1 3155761 steps, +9.99973820518458734e-01 ua, +9.99960086472014638e+01 nmi
BM_EphemerisLEOProbeMinorAndMajorBodies<&FlowEphemerisWithFixedStepSRKN>/-3    60889881393 ns 60653188800 ns          1 3155761 steps, +9.99973820321925166e-01 ua, +9.99960084979087469e+01 nmi
BM_EphemerisLEOProbeAllBodiesAndOblateness<&FlowEphemerisWithAdaptiveStep>/-3  7892333431 ns 7878050500 ns          1 752025 steps, +9.99980466918425570e-01 ua, +8.93176027305386953e+01 nmi
BM_EphemerisLEOProbeAllBodiesAndOblateness<&FlowEphemerisWithFixedStepSLMS>/-3 14065352563 ns 14024489900 ns          1 3155761 steps, +9.99963448237548125e-01 ua, +8.91392610421626443e+01 nmi
BM_EphemerisLEOProbeAllBodiesAndOblateness<&FlowEphemerisWithFixedStepSRKN>/-3 66174206491 ns 65972822900 ns          1 3155761 steps, +9.99963448506915542e-01 ua, +8.91392576638347691e+01 nmi
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/-4                1197785375 ns 1185607600 ns          1 154379 steps, +9.99375002345204155e-01 ua, +1.09612969423844309e+00 ua, degree 84.413812
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/-3                1203694631 ns 1201207700 ns          1 154379 steps, +9.99375002345204155e-01 ua, +1.09612969423844309e+00 ua, degree 82.044495
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/-2                1166978430 ns 1170007500 ns          1 154379 steps, +9.99375002345204044e-01 ua, +1.09612969423843420e+00 ua, degree 76.283386
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/-1                1141520619 ns 1138807300 ns          1 154379 steps, +9.99375002345204044e-01 ua, +1.09612969423843420e+00 ua, degree 70.000000
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/0                 1185772626 ns 1185607600 ns          1 154379 steps, +9.99375002345204155e-01 ua, +1.09612969423844508e+00 ua, degree 63.205634
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/1                 1110106234 ns 1092007000 ns          1 154379 steps, +9.99375002345203156e-01 ua, +1.09612969423847573e+00 ua, degree 61.000000
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/2                 1120295729 ns 1076406900 ns          1 154379 steps, +9.99375002345204821e-01 ua, +1.09612969423839779e+00 ua, degree 57.000000
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/3                 1071555954 ns 1076406900 ns          1 154379 steps, +9.99375002345200714e-01 ua, +1.09612969423857276e+00 ua, degree 55.000000
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/4                 1046421661 ns 1045206700 ns          1 154379 steps, +9.99375002345202934e-01 ua, +1.09612969423848794e+00 ua, degree 54.000000
BM_EphemerisStartup<&FlowEphemerisWithFixedStepSLMS>/3                            1519677 ns    1531874 ns        499 11 steps, +9.99929985864532966e-01 ua, +9.99929985544194766e-01 ua, degree 55.000000
BM_EphemerisStartup<&FlowEphemerisWithFixedStepSRKN>/3                              59264 ns      62578 ns      11218 11 steps, +9.99929985864532966e-01 ua, +9.99929985544194766e-01 ua, degree 55.000000

After:

----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                               Time           CPU Iterations
----------------------------------------------------------------------------------------------------------------------
BM_EphemerisMultithreadingBenchmark/3/1                                         142686164 ns          0 ns        100 +9.98765333456185088e+06 m +1.99941208081129901e+07 m +2.99993670256617330e+07 m
BM_EphemerisMultithreadingBenchmark/3/2                                         102562724 ns          0 ns        100 +9.98765333456185088e+06 m +1.99941208081129901e+07 m +2.99993670256617330e+07 m
BM_EphemerisMultithreadingBenchmark/3/3                                          62416341 ns          0 ns        100 +9.98765333456185088e+06 m +1.99941208081129901e+07 m +2.99993670256617330e+07 m
BM_EphemerisMultithreadingBenchmark/3/4                                          64479839 ns          0 ns        100 +9.98765333456185088e+06 m +1.99941208081129901e+07 m +2.99993670256617330e+07 m
BM_EphemerisMultithreadingBenchmark/3/5                                          62340847 ns          0 ns        100 +9.98765333456185088e+06 m +1.99941208081129901e+07 m +2.99993670256617330e+07 m
BM_EphemerisSolarSystemMajorBodiesOnly/-3                                      101513218662 ns 101353849700 ns          1 +9.92179030813603147e-01 ua
BM_EphemerisSolarSystemMinorAndMajorBodies/-3                                  229380653476 ns 228931467500 ns          1 +9.92179031500273201e-01 ua
BM_EphemerisSolarSystemAllBodiesAndOblateness/-3                               260844399968 ns 260412469300 ns          1 +9.92178865022683709e-01 ua
BM_EphemerisL4ProbeMajorBodiesOnly<&FlowEphemerisWithAdaptiveStep>/-3          1280951691 ns 1279208200 ns          1 154375 steps, +9.91602312582897105e-01 ua, +1.07738936581450595e+00 ua, degree 81.478731
BM_EphemerisL4ProbeMinorAndMajorBodies<&FlowEphemerisWithAdaptiveStep>/-3      3260062686 ns 3260420900 ns          1 154375 steps, +9.91602312429099242e-01 ua, +1.07738938749625812e+00 ua, degree 184.913688
BM_EphemerisL4ProbeAllBodiesAndOblateness<&FlowEphemerisWithAdaptiveStep>/-3   3296772702 ns 3291621100 ns          1 154375 steps, +9.91602312327636626e-01 ua, +1.07738926882847252e+00 ua, degree 185.993355
BM_EphemerisLEOProbeMajorBodiesOnly<&FlowEphemerisWithAdaptiveStep>/-3         2883043344 ns 2870418400 ns          1 750001 steps, +9.91871687195589047e-01 ua, +9.99471081184549206e+01 nmi
BM_EphemerisLEOProbeMajorBodiesOnly<&FlowEphemerisWithFixedStepSLMS>/-3        7022871850 ns 7004444900 ns          1 3155761 steps,  +9.91888191440387867e-01 ua, +9.99957023179948123e+01 nmi
BM_EphemerisLEOProbeMajorBodiesOnly<&FlowEphemerisWithFixedStepSRKN>/-3        22706418532 ns 22666945300 ns          1 3155761 steps, +9.91888190645376921e-01 ua, +9.99957011459851941e+01 nmi
BM_EphemerisLEOProbeMinorAndMajorBodies<&FlowEphemerisWithAdaptiveStep>/-3     5447131230 ns 5428834800 ns          1 750001 steps, +9.91871692890459844e-01 ua, +9.99471401652953801e+01 nmi
BM_EphemerisLEOProbeMinorAndMajorBodies<&FlowEphemerisWithFixedStepSLMS>/-3    10363971098 ns 10342866300 ns          1 3155761 steps, +9.91888192082356346e-01 ua, +9.99957081377918655e+01 nmi
BM_EphemerisLEOProbeMinorAndMajorBodies<&FlowEphemerisWithFixedStepSRKN>/-3    43156586351 ns 43025075800 ns          1 3155761 steps, +9.91888191588845114e-01 ua, +9.99957073553868554e+01 nmi
BM_EphemerisLEOProbeAllBodiesAndOblateness<&FlowEphemerisWithAdaptiveStep>/-3  6091343584 ns 6084039000 ns          1 752025 steps, +9.91854987153988343e-01 ua, +8.93207501932555203e+01 nmi
BM_EphemerisLEOProbeAllBodiesAndOblateness<&FlowEphemerisWithFixedStepSLMS>/-3 11220841250 ns 11216471900 ns          1 3155761 steps, +9.91841968896397086e-01 ua, +8.91413444586663388e+01 nmi
BM_EphemerisLEOProbeAllBodiesAndOblateness<&FlowEphemerisWithFixedStepSRKN>/-3 48630476563 ns 48157508700 ns          1 3155761 steps, +9.91841969295739090e-01 ua, +8.91413360663637349e+01 nmi
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/-4                1288079853 ns 1294808300 ns          1 154375 steps, +9.91602312582897105e-01 ua, +1.07738936581450595e+00 ua, degree 84.200502
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/-3                1287400656 ns 1294808300 ns          1 154375 steps, +9.91602312582897105e-01 ua, +1.07738936581450595e+00 ua, degree 81.478731
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/-2                1259090903 ns 1185607600 ns          1 154375 steps, +9.91602312582901657e-01 ua, +1.07738936581426081e+00 ua, degree 75.696946
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/-1                1216446886 ns 1216807800 ns          1 154375 steps, +9.91602312582901657e-01 ua, +1.07738936581426081e+00 ua, degree 70.000000
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/0                 1187819804 ns 1185607600 ns          1 154375 steps, +9.91602312582901657e-01 ua, +1.07738936581427724e+00 ua, degree 63.205197
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/1                 1176090363 ns 1170007500 ns          1 154375 steps, +9.91602312582899215e-01 ua, +1.07738936581442979e+00 ua, degree 61.000000
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/2                 1149569843 ns 1154407400 ns          1 154375 steps, +9.91602312582899215e-01 ua, +1.07738936581438782e+00 ua, degree 57.000000
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/3                 1120413877 ns 1123207200 ns          1 154375 steps, +9.91602312582901213e-01 ua, +1.07738936581430389e+00 ua, degree 55.000000
BM_EphemerisFittingTolerance<&FlowEphemerisWithAdaptiveStep>/4                 1123504283 ns 1123207200 ns          1 154375 steps, +9.91602312582899437e-01 ua, +1.07738936581438960e+00 ua, degree 54.000000
BM_EphemerisStartup<&FlowEphemerisWithFixedStepSLMS>/3                            1674047 ns    1720599 ns        408 11 steps, +9.91835233804689742e-01 ua, +9.91835231548724217e-01 ua, degree 55.000000
BM_EphemerisStartup<&FlowEphemerisWithFixedStepSRKN>/3                              65472 ns      67080 ns      10000 11 steps, +9.91835233804689742e-01 ua, +9.91835231548724217e-01 ua, degree 55.000000

@eggrobin eggrobin added the LGTM label Feb 4, 2018
}
{
// Need to use |lower_bound|, not |upper_bound|, because it allows
// heterogeneous arguments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment (see lower_bound, upper_bound).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

// speeds up the lookup by a factor of 7. This member is atomic because of
// multithreading in the ephemeris, and is mutable to maintain the fiction
// that evaluation has no side effects. Any value in the range of
// |polynomials_| or 0 is correct.
Copy link
Member

@eggrobin eggrobin Feb 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on the alternative of a std::map<std::thread::id, int>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am explaining why it works with multithreading. No point in documenting the roads not taken.

Copy link
Member Author

@pleroy pleroy Feb 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record: we are trying to optimize a binary search. We know that if polynomials_ is a map the performance is bad. So surely adding a map lookup to try to optimize a binary search doesn't make sense. Also, this really needs to be an atomic: I started with a shared_mutex and the cost of acquiring the mutex, even without contention (1 thread) was prohibitive.

@pleroy pleroy merged commit 4c78a00 into mockingbirdnest:master Feb 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants