Changed Forkable::ancestry_ to an absl::InlinedVector. #2776

rnlahaye · 2020-10-28T00:23:30Z

Previously it was a deque.
This change elides allocations in most cases.

Note that as part of the change, the direction of ancestry was reversed
(i.e. front and back have been swapped). This is only a notational
change and has no effect.

The inlined size of two was decided upon by analyzing a journal. Most (~80%) instances of ancestry_ eventually got to size 2, whereas only 10% got to size 3.

I also ran the benchmarks. I wasn't sure which ones to run, so I ran all of them. The results are
attached here.

Some context: At some point I noticed my game had become laggy and profiled it to see why. It turned out that a lot of time was being spent allocating/deallocating Forkable::ancestry_. Changing it to an InlinedVector reduced time per frame spent waiting on Principia from ~40ms to ~7ms.

pleroy · 2020-10-28T00:23:59Z

[Automated message from GitHub Pull Request Builder] Answer "ok to test" to trigger testing of this PR.

rnlahaye · 2020-10-28T00:24:15Z

ok to test

eggrobin · 2020-10-28T00:54:15Z

I don’t think you can see the CI results, so copying that here: the linter says

forkable_body.hpp:94 Redundant blank line at the end of a code block should be deleted.

rnlahaye · 2020-10-28T01:04:28Z

ok to test

Previously it was a deque. This change elides allocations in most cases. Note that as part of the change, the direction of ancestry was reversed (i.e. front and back have been swapped). This is only a notational change and has no effect.

pleroy

Thanks for your contribution.

Would you mind writing a small benchmark? Something that creates a trajectory, adds a few points, creates 2-3 forks, and iterates over the entire trajectory and the forks. I'd imagine that the performance difference would be fairly clear (or whatever variation thereof where the issue is the clearest). We don't have benchmarks that are very clean in terms of the performance of DiscreteTrajectory at the moment, so it would be good to make sure that we don't create performance regressions in the future.

pleroy · 2020-10-28T11:12:03Z

physics/forkable.hpp

  TimelineConstIterator current_;
-  std::deque<not_null<Tr4jectory const*>> ancestry_;  // Pointers not owned.
+  absl::InlinedVector<not_null<Tr4jectory const*>, 2>
+      ancestry_;  // Pointers not owned.


Remove the comment, it's a relic of the distant past.

pleroy · 2020-10-28T11:17:00Z

physics/forkable_body.hpp

-    ancestry_.erase(ancestry_.begin(), --ancestry_.end());
-    current_ = ancestry_.front()->timeline_end();
+  if (current_ == ancestry_.back()->timeline_end() && ancestry_.size() > 1) {
+    ancestry_ = {ancestry_.front()};


I'd imagine that ancestry_.resize(1) would be more efficient in terms of heap management and a bit more readable IMO.

I agree that resize would be more readable, but it doesn't compile since not_null<> has no default constructor.

Let’s make it an erase(ancestry_.begin() + 1, ancestry_.end()) then.

pleroy · 2020-10-28T11:23:43Z

physics/forkable.hpp

  TimelineConstIterator current_;
-  std::deque<not_null<Tr4jectory const*>> ancestry_;  // Pointers not owned.
+  absl::InlinedVector<not_null<Tr4jectory const*>, 2>


I would think that 3 would work better. While most trajectories only have 2 elements in this member, most trajectories will go up to 3 (history, psychohistory, prediction) at some point in their lifetime, we might as well reserve space for that case.

Done. I added a comment with the reasoning.

pleroy · 2020-10-28T11:56:25Z

physics/forkable_body.hpp

@@ -285,7 +283,8 @@ LowerBound(Instant const& time) const {
            // We found an interesting timeline, i.e. one that is nonempty and
            // not forked at the fork point of its parent.  Cut the ancestry and
            // return the beginning of that timeline.
-            iterator.ancestry_.erase(iterator.ancestry_.begin(), ancestry_it);
+            iterator.ancestry_ = {ancestry_it, iterator.ancestry_.rend()};


This seems wrong. The construction using reverse iterators is going to effectively reverse the range, so the element at rend-1, i.e., at begin will land at the back. I would stick to erase here. It would be good to add a test covering this situation: you'll need a trajectory with many forks, call LowerBound with a time near the middle, and check that the resulting iterator gives you data in the right order.

You are correct; it is wrong. I added a new test and changed it to erase.

pleroy · 2020-10-28T12:00:47Z

(Please ignore the CI failures, they are bogus.)

rnlahaye · 2020-10-29T22:34:36Z

I added a benchmark. Here are the results.

Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
BM_DiscreteTrajectoryFront                              -0.8572         -0.8572           154            22           154            22
BM_DiscreteTrajectoryBack                               -0.8964         -0.8964           152            16           152            16
BM_DiscreteTrajectoryBegin                              -0.8577         -0.8577           156            22           156            22
BM_DiscreteTrajectoryEnd                                -0.9683         -0.9683           152             5           152             5
BM_DiscreteTrajectoryCreateDestroy/8                    -0.0532         -0.0526          1163          1101          1161          1100
BM_DiscreteTrajectoryCreateDestroy/64                   -0.0314         -0.0311          9142          8855          9129          8845
BM_DiscreteTrajectoryCreateDestroy/512                  -0.0017         -0.0020         75286         75158         75209         75056
BM_DiscreteTrajectoryCreateDestroy/1024                 +0.0397         +0.0374        160894        167273        160706        166713
BM_DiscreteTrajectoryIterate/8                          -0.9046         -0.9054          1674           160          1672           158
BM_DiscreteTrajectoryIterate/64                         -0.8907         -0.8912         10654          1165         10643          1157
BM_DiscreteTrajectoryIterate/512                        -0.8870         -0.8878         84547          9551         84432          9471
BM_DiscreteTrajectoryIterate/1024                       -0.8929         -0.8934        172283         18447        172051         18342
BM_DiscreteTrajectoryReverseIterate/8                   -0.8299         -0.8308          1762           300          1760           298
BM_DiscreteTrajectoryReverseIterate/64                  -0.7994         -0.8020         11562          2320         11550          2286
BM_DiscreteTrajectoryReverseIterate/512                 -0.7837         -0.7859         88633         19173         88558         18957
BM_DiscreteTrajectoryReverseIterate/1024                -0.7780         -0.7803        176187         39116        176040         38676
BM_DiscreteTrajectoryFind/8                             -0.9023         -0.9025           456            45           456            44
BM_DiscreteTrajectoryFind/64                            -0.8208         -0.8214           477            85           476            85
BM_DiscreteTrajectoryFind/512                           -0.7258         -0.7277           569           156           568           155
BM_DiscreteTrajectoryFind/1024                          -0.7449         -0.7449           576           147           575           147
BM_DiscreteTrajectoryLowerBound/8                       -0.9483         -0.9483          1142            59          1141            59
BM_DiscreteTrajectoryLowerBound/64                      -0.9092         -0.9092          1158           105          1157           105
BM_DiscreteTrajectoryLowerBound/512                     -0.8618         -0.8619          1237           171          1235           171
BM_DiscreteTrajectoryLowerBound/1024                    -0.8712         -0.8712          1279           165          1277           165

eggrobin

Thanks for this impressive performance improvement; just a few nitpicks, looks good.

eggrobin · 2020-10-29T23:47:55Z

benchmarks/discrete_trajectory.cpp

+namespace {
+
+// Creates a trajectory with the given number of steps.
+std::unique_ptr<DiscreteTrajectory<World>> CreateTrajectory(int const steps) {


nit: not_null<std::unique_ptr<DiscreteTrajectory<World>>>

eggrobin · 2020-10-29T23:49:20Z

benchmarks/discrete_trajectory.cpp

+  auto trajectory = std::make_unique<DiscreteTrajectory<World>>();
+  DegreesOfFreedom<World> const d(World::origin, Velocity<World>());
+  Instant t;
+  for (int i = 0; i < steps; i++, t += Second) {


We tend to write 1 * Second etc., it is more readable (and since are those are all constexpr the compilers are smart enough).

eggrobin · 2020-10-29T23:49:52Z

benchmarks/discrete_trajectory.cpp

+BENCHMARK(BM_DiscreteTrajectoryLowerBound)->Range(8, 1024);
+
+}  // namespace physics
+}  // namespace principia


New line at end of file (the latest CI failure was the linter complaining about this).

eggrobin · 2020-10-29T23:51:41Z

benchmarks/discrete_trajectory.cpp

+// |parent| should be nonempty.
+// |pos| should be in [0, 1].
+not_null<DiscreteTrajectory<World>*> ForkAt(DiscreteTrajectory<World>& parent,
+                                            float const pos) {


Let’s not mix precisions, we use double everywhere (it gets converted to double where it is used on line 45 anyway).

eggrobin · 2020-10-29T23:54:44Z

physics/forkable.hpp

@@ -84,10 +84,11 @@ class ForkableIterator {
  void CheckNormalizedIfEnd();

  // |ancestry_| is never empty.  |current_| is an iterator in the timeline
-  // for |ancestry_.back()|.  |current_| may be at end.
+  // for |ancestry_.back()|.  |current_| may be at end. The inline size of 3
+  // for |ancestry_| is intended to cover a trajectory's history, psychohistory,


A vessel’s history, psychohistory, and prediction.

eggrobin · 2020-10-30T00:06:38Z

physics/forkable_body.hpp

@@ -239,8 +239,8 @@ LowerBound(Instant const& time) const {
  // This queue is parallel to |iterator.ancestry_|, and each entry is an


Not a queue anymore.

eggrobin · 2020-10-30T00:16:23Z

physics/forkable_test.cpp

+      fork->NewFork(fork->timeline_find(t2_));
+  fork2->push_back(t3_);
+
+  auto lower_bound_it = fork2->LowerBound(t1_);


const the iterators.

eggrobin · 2020-10-31T11:32:59Z

benchmarks/discrete_trajectory.cpp

+
+// Creates a trajectory with the given number of steps.
+std::unique_ptr<DiscreteTrajectory<World>> CreateTrajectory(int const steps) {
+  auto trajectory = std::make_unique<DiscreteTrajectory<World>>();


make_not_null_unique

eggrobin · 2020-10-31T11:38:53Z

benchmarks/discrete_trajectory.cpp

+}  // namespace
+
+void BM_DiscreteTrajectoryFront(benchmark::State& state) {
+  std::unique_ptr<DiscreteTrajectory<World>> trajectory = CreateTrajectory(4);


const the pointers that are const (not_null<...> const or auto const, everywhere in this file).

eggrobin · 2020-10-31T11:48:08Z

physics/forkable_body.hpp

-    ancestry_.erase(ancestry_.begin(), --ancestry_.end());
-    current_ = ancestry_.front()->timeline_end();
+  if (current_ == ancestry_.back()->timeline_end() && ancestry_.size() > 1) {
+    ancestry_ = {ancestry_.front()};


Let’s make it an erase(ancestry_.begin() + 1, ancestry_.end()) then.

rnlahaye added 3 commits October 27, 2020 19:35

Changed ancestry_ to an InlinedVector.

1a56650

Previously it was a deque. This change elides allocations in most cases. Note that as part of the change, the direction of ancestry was reversed (i.e. front and back have been swapped). This is only a notational change and has no effect.

Remove spurious TODOs.

7cf378c

Lint fix.

242148e

rnlahaye force-pushed the slow branch from 2d874d3 to 242148e Compare October 28, 2020 02:36

pleroy requested changes Oct 28, 2020

View reviewed changes

mockingbirdnest deleted a comment from pleroy Oct 28, 2020

rnlahaye added 6 commits October 28, 2020 22:56

Added a new test case for Forkable::LowerBound.

8b9fa2e

Updated test to catch bug.

4a82830

Fixed bug.

371a195

Adressed PR comments.

67e61de

Removed the other deque from Forkable.

591956f

Added a benchmark for DiscreteTrajectory.

322ef52

eggrobin requested changes Oct 31, 2020

View reviewed changes

Addressed pull request comments.

3ff0c99

eggrobin added the LGTM label Oct 31, 2020

eggrobin approved these changes Oct 31, 2020

View reviewed changes

pleroy approved these changes Oct 31, 2020

View reviewed changes

eggrobin merged commit 59e6ae5 into mockingbirdnest:master Oct 31, 2020

rnlahaye deleted the slow branch October 31, 2020 18:35

pleroy mentioned this pull request Dec 5, 2020

Request for Lighter version of principia. #2810

Closed

pleroy mentioned this pull request Nov 20, 2021

Benchmark for DiscreteTrajectory #3206

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changed Forkable::ancestry_ to an absl::InlinedVector. #2776

Changed Forkable::ancestry_ to an absl::InlinedVector. #2776

rnlahaye commented Oct 28, 2020

pleroy commented Oct 28, 2020

rnlahaye commented Oct 28, 2020

eggrobin commented Oct 28, 2020

rnlahaye commented Oct 28, 2020

pleroy left a comment

pleroy Oct 28, 2020

rnlahaye Oct 29, 2020

pleroy Oct 28, 2020

rnlahaye Oct 29, 2020

eggrobin Oct 31, 2020

pleroy Oct 28, 2020

rnlahaye Oct 29, 2020

pleroy Oct 28, 2020

rnlahaye Oct 29, 2020

pleroy commented Oct 28, 2020

rnlahaye commented Oct 29, 2020

eggrobin left a comment

eggrobin Oct 29, 2020

eggrobin Oct 29, 2020

eggrobin Oct 29, 2020

eggrobin Oct 29, 2020

eggrobin Oct 29, 2020

eggrobin Oct 30, 2020

eggrobin Oct 30, 2020

eggrobin Oct 31, 2020

eggrobin Oct 31, 2020

eggrobin Oct 31, 2020

		@@ -239,8 +239,8 @@ LowerBound(Instant const& time) const {
		// This queue is parallel to \|iterator.ancestry_\|, and each entry is an

Changed Forkable::ancestry_ to an absl::InlinedVector. #2776

Changed Forkable::ancestry_ to an absl::InlinedVector. #2776

Conversation

rnlahaye commented Oct 28, 2020

pleroy commented Oct 28, 2020

rnlahaye commented Oct 28, 2020

eggrobin commented Oct 28, 2020

rnlahaye commented Oct 28, 2020

pleroy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pleroy commented Oct 28, 2020

rnlahaye commented Oct 29, 2020

eggrobin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment