Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed Forkable::ancestry_ to an absl::InlinedVector. #2776

Merged
merged 10 commits into from
Oct 31, 2020

Conversation

rnlahaye
Copy link
Contributor

Previously it was a deque.
This change elides allocations in most cases.

Note that as part of the change, the direction of ancestry was reversed
(i.e. front and back have been swapped). This is only a notational
change and has no effect.

The inlined size of two was decided upon by analyzing a journal. Most (~80%) instances of ancestry_ eventually got to size 2, whereas only 10% got to size 3.

I also ran the benchmarks. I wasn't sure which ones to run, so I ran all of them. The results are
attached here.

Some context: At some point I noticed my game had become laggy and profiled it to see why. It turned out that a lot of time was being spent allocating/deallocating Forkable::ancestry_. Changing it to an InlinedVector reduced time per frame spent waiting on Principia from ~40ms to ~7ms.

@pleroy
Copy link
Member

pleroy commented Oct 28, 2020

[Automated message from GitHub Pull Request Builder] Answer "ok to test" to trigger testing of this PR.

@rnlahaye
Copy link
Contributor Author

ok to test

@eggrobin
Copy link
Member

I don’t think you can see the CI results, so copying that here: the linter says

forkable_body.hpp:94 Redundant blank line at the end of a code block should be deleted.

@rnlahaye
Copy link
Contributor Author

ok to test

Previously it was a deque.
This change elides allocations in most cases.

Note that as part of the change, the direction of ancestry was reversed
(i.e. front and back have been swapped). This is only a notational
change and has no effect.
Copy link
Member

@pleroy pleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution.

Would you mind writing a small benchmark? Something that creates a trajectory, adds a few points, creates 2-3 forks, and iterates over the entire trajectory and the forks. I'd imagine that the performance difference would be fairly clear (or whatever variation thereof where the issue is the clearest). We don't have benchmarks that are very clean in terms of the performance of DiscreteTrajectory at the moment, so it would be good to make sure that we don't create performance regressions in the future.

TimelineConstIterator current_;
std::deque<not_null<Tr4jectory const*>> ancestry_; // Pointers not owned.
absl::InlinedVector<not_null<Tr4jectory const*>, 2>
ancestry_; // Pointers not owned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the comment, it's a relic of the distant past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Sorry, something went wrong.

ancestry_.erase(ancestry_.begin(), --ancestry_.end());
current_ = ancestry_.front()->timeline_end();
if (current_ == ancestry_.back()->timeline_end() && ancestry_.size() > 1) {
ancestry_ = {ancestry_.front()};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd imagine that ancestry_.resize(1) would be more efficient in terms of heap management and a bit more readable IMO.

Sorry, something went wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that resize would be more readable, but it doesn't compile since not_null<> has no default constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s make it an erase(ancestry_.begin() + 1, ancestry_.end()) then.

TimelineConstIterator current_;
std::deque<not_null<Tr4jectory const*>> ancestry_; // Pointers not owned.
absl::InlinedVector<not_null<Tr4jectory const*>, 2>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that 3 would work better. While most trajectories only have 2 elements in this member, most trajectories will go up to 3 (history, psychohistory, prediction) at some point in their lifetime, we might as well reserve space for that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I added a comment with the reasoning.

@@ -285,7 +283,8 @@ LowerBound(Instant const& time) const {
// We found an interesting timeline, i.e. one that is nonempty and
// not forked at the fork point of its parent. Cut the ancestry and
// return the beginning of that timeline.
iterator.ancestry_.erase(iterator.ancestry_.begin(), ancestry_it);
iterator.ancestry_ = {ancestry_it, iterator.ancestry_.rend()};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong. The construction using reverse iterators is going to effectively reverse the range, so the element at rend-1, i.e., at begin will land at the back. I would stick to erase here. It would be good to add a test covering this situation: you'll need a trajectory with many forks, call LowerBound with a time near the middle, and check that the resulting iterator gives you data in the right order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct; it is wrong. I added a new test and changed it to erase.

@pleroy
Copy link
Member

pleroy commented Oct 28, 2020

(Please ignore the CI failures, they are bogus.)

@mockingbirdnest mockingbirdnest deleted a comment from pleroy Oct 28, 2020
@rnlahaye
Copy link
Contributor Author

I added a benchmark. Here are the results.

Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
BM_DiscreteTrajectoryFront                              -0.8572         -0.8572           154            22           154            22
BM_DiscreteTrajectoryBack                               -0.8964         -0.8964           152            16           152            16
BM_DiscreteTrajectoryBegin                              -0.8577         -0.8577           156            22           156            22
BM_DiscreteTrajectoryEnd                                -0.9683         -0.9683           152             5           152             5
BM_DiscreteTrajectoryCreateDestroy/8                    -0.0532         -0.0526          1163          1101          1161          1100
BM_DiscreteTrajectoryCreateDestroy/64                   -0.0314         -0.0311          9142          8855          9129          8845
BM_DiscreteTrajectoryCreateDestroy/512                  -0.0017         -0.0020         75286         75158         75209         75056
BM_DiscreteTrajectoryCreateDestroy/1024                 +0.0397         +0.0374        160894        167273        160706        166713
BM_DiscreteTrajectoryIterate/8                          -0.9046         -0.9054          1674           160          1672           158
BM_DiscreteTrajectoryIterate/64                         -0.8907         -0.8912         10654          1165         10643          1157
BM_DiscreteTrajectoryIterate/512                        -0.8870         -0.8878         84547          9551         84432          9471
BM_DiscreteTrajectoryIterate/1024                       -0.8929         -0.8934        172283         18447        172051         18342
BM_DiscreteTrajectoryReverseIterate/8                   -0.8299         -0.8308          1762           300          1760           298
BM_DiscreteTrajectoryReverseIterate/64                  -0.7994         -0.8020         11562          2320         11550          2286
BM_DiscreteTrajectoryReverseIterate/512                 -0.7837         -0.7859         88633         19173         88558         18957
BM_DiscreteTrajectoryReverseIterate/1024                -0.7780         -0.7803        176187         39116        176040         38676
BM_DiscreteTrajectoryFind/8                             -0.9023         -0.9025           456            45           456            44
BM_DiscreteTrajectoryFind/64                            -0.8208         -0.8214           477            85           476            85
BM_DiscreteTrajectoryFind/512                           -0.7258         -0.7277           569           156           568           155
BM_DiscreteTrajectoryFind/1024                          -0.7449         -0.7449           576           147           575           147
BM_DiscreteTrajectoryLowerBound/8                       -0.9483         -0.9483          1142            59          1141            59
BM_DiscreteTrajectoryLowerBound/64                      -0.9092         -0.9092          1158           105          1157           105
BM_DiscreteTrajectoryLowerBound/512                     -0.8618         -0.8619          1237           171          1235           171
BM_DiscreteTrajectoryLowerBound/1024                    -0.8712         -0.8712          1279           165          1277           165

Copy link
Member

@eggrobin eggrobin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this impressive performance improvement; just a few nitpicks, looks good.

namespace {

// Creates a trajectory with the given number of steps.
std::unique_ptr<DiscreteTrajectory<World>> CreateTrajectory(int const steps) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not_null<std::unique_ptr<DiscreteTrajectory<World>>>

auto trajectory = std::make_unique<DiscreteTrajectory<World>>();
DegreesOfFreedom<World> const d(World::origin, Velocity<World>());
Instant t;
for (int i = 0; i < steps; i++, t += Second) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to write 1 * Second etc., it is more readable (and since are those are all constexpr the compilers are smart enough).

BENCHMARK(BM_DiscreteTrajectoryLowerBound)->Range(8, 1024);

} // namespace physics
} // namespace principia
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line at end of file (the latest CI failure was the linter complaining about this).

// |parent| should be nonempty.
// |pos| should be in [0, 1].
not_null<DiscreteTrajectory<World>*> ForkAt(DiscreteTrajectory<World>& parent,
float const pos) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s not mix precisions, we use double everywhere (it gets converted to double where it is used on line 45 anyway).

@@ -84,10 +84,11 @@ class ForkableIterator {
void CheckNormalizedIfEnd();

// |ancestry_| is never empty. |current_| is an iterator in the timeline
// for |ancestry_.back()|. |current_| may be at end.
// for |ancestry_.back()|. |current_| may be at end. The inline size of 3
// for |ancestry_| is intended to cover a trajectory's history, psychohistory,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A vessel’s history, psychohistory, and prediction.

@@ -239,8 +239,8 @@ LowerBound(Instant const& time) const {
// This queue is parallel to |iterator.ancestry_|, and each entry is an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a queue anymore.

fork->NewFork(fork->timeline_find(t2_));
fork2->push_back(t3_);

auto lower_bound_it = fork2->LowerBound(t1_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const the iterators.


// Creates a trajectory with the given number of steps.
std::unique_ptr<DiscreteTrajectory<World>> CreateTrajectory(int const steps) {
auto trajectory = std::make_unique<DiscreteTrajectory<World>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make_not_null_unique

} // namespace

void BM_DiscreteTrajectoryFront(benchmark::State& state) {
std::unique_ptr<DiscreteTrajectory<World>> trajectory = CreateTrajectory(4);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const the pointers that are const (not_null<...> const or auto const, everywhere in this file).

ancestry_.erase(ancestry_.begin(), --ancestry_.end());
current_ = ancestry_.front()->timeline_end();
if (current_ == ancestry_.back()->timeline_end() && ancestry_.size() > 1) {
ancestry_ = {ancestry_.front()};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s make it an erase(ancestry_.begin() + 1, ancestry_.end()) then.

@eggrobin eggrobin added the LGTM label Oct 31, 2020
@eggrobin eggrobin merged commit 59e6ae5 into mockingbirdnest:master Oct 31, 2020
@rnlahaye rnlahaye deleted the slow branch October 31, 2020 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants