Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when staging/decoupling #2641

Closed
lpgagnon opened this issue Jul 12, 2020 · 12 comments · Fixed by #2971
Closed

Crash when staging/decoupling #2641

lpgagnon opened this issue Jul 12, 2020 · 12 comments · Fixed by #2971
Labels
Milestone

Comments

@lpgagnon
Copy link

lpgagnon commented Jul 12, 2020

Crash to desktop when hitting spacebar for a staging even involving at least one decoupler.
I think I've only seen it happen after having reached LEO.

Last seen in test-2626 build, but I've been getting this one every few days since Fuchs; earliest FATAL.log instance is 2020-06-05.

ERROR.log:

Log file created at: 2020/07/12 10:42:29
Running on machine: LPG-GTX
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
    @   00007FFA1ADD63EF  	google::LogMessageFatal::~LogMessageFatal [0x00007FFA1ADD63EE+46]
    @   00007FF9E13170D8  	(No symbol) [0x00007FF9E13170D7]
    @   00007FF9E14FF990  	principia__VesselVelocity [0x00007FF9E14FF98F+1342959]
    @   00007FF9E14FEF9D  	principia__VesselVelocity [0x00007FF9E14FEF9C+1340412]
    @   00007FF9E149E025  	principia__VesselVelocity [0x00007FF9E149E024+943236]
    @   00007FF9E1385288  	principia__CatchUpLaggingVessels [0x00007FF9E1385287+231]
    @   000001CEF5A158B7  	(No symbol) [0x000001CEF5A158B6]
F0712 10:42:29.588973  8164 discrete_trajectory_body.hpp:196] Check failed: --timeline_.end() == it Append out of order at -1.28352951710541439e+09 s, last time is -1.28352951496634579e+09 s

INFO.log

@eggrobin
Copy link
Member

Same remark as in #2640: We need the INFO log, not the ERROR log.

Also, « append out of order » issues can be very tricky to debug, so a journal would help if you can reproduce the issue.

@lpgagnon
Copy link
Author

INFO.log added. journal's not likely; I haven't seen any pattern to the occurrences

@pleroy
Copy link
Member

pleroy commented Jul 13, 2020

Decoded stack trace below. I have a vague recollection of seeing out-of-order appends in this code before... Ah, it was #2507.

Iterator** const collided_vessels) {
journal::Method<journal::CatchUpLaggingVessels> m({plugin},
{collided_vessels});
CHECK_NOTNULL(plugin);
VesselSet collided_vessel_set;
plugin->CatchUpLaggingVessels(collided_vessel_set);
void Plugin::CatchUpLaggingVessels(VesselSet& collided_vessels) {
CHECK(!initializing_);
// Start all the integrations in parallel.
std::vector<PileUpFuture> pile_up_futures;
for (auto* const pile_up : pile_ups_) {
pile_up_futures.emplace_back(
pile_up,
vessel_thread_pool_.Add([this, pile_up]() {
// Note that there cannot be contention in the following method as
// no two pile-ups are advanced at the same time.
return pile_up->DeformAndAdvanceTime(current_time_);
}));
}
// Wait for the integrations to finish and figure out which vessels collided
// with a celestial.
for (auto& pile_up_future : pile_up_futures) {
WaitForVesselToCatchUp(pile_up_future, collided_vessels);
}
// Update the vessels.
for (auto const& [_, vessel] : vessels_) {
if (vessel->psychohistory().back().time < current_time_) {
if (Contains(collided_vessels, vessel.get())) {
vessel->DisableDownsampling();
}
vessel->AdvanceTime();
void Vessel::AdvanceTime() {
// Squirrel away the prediction so that we can reattach it if we don't have a
// prognostication.
auto prediction = prediction_->DetachFork();
history_->DeleteFork(psychohistory_);
AppendToVesselTrajectory(&Part::history_begin,
DiscreteTrajectory<Barycentric>& trajectory) {
CHECK(!parts_.empty());
std::vector<DiscreteTrajectory<Barycentric>::Iterator> its;
std::vector<DiscreteTrajectory<Barycentric>::Iterator> ends;
its.reserve(parts_.size());
ends.reserve(parts_.size());
for (auto const& [_, part] : parts_) {
its.push_back((*part.*part_trajectory_begin)());
ends.push_back((*part.*part_trajectory_end)());
}
Instant fork_time;
if (!trajectory.is_root()) {
fork_time = trajectory.Fork()->time;
}
// Loop over the times of the trajectory.
for (;;) {
auto const& it0 = its[0];
bool const at_end_of_part_trajectory = it0 == ends[0];
Instant const first_time = at_end_of_part_trajectory ? Instant()
: it0->time;
bool const after_fork_time = !at_end_of_part_trajectory &&
(trajectory.is_root() ||
first_time > fork_time);
// Loop over the parts at a given time.
BarycentreCalculator<DegreesOfFreedom<Barycentric>, Mass> calculator;
int i = 0;
for (auto const& [_, part] : parts_) {
auto& it = its[i];
CHECK_EQ(at_end_of_part_trajectory, it == ends[i]);
if (!at_end_of_part_trajectory) {
calculator.Add(it->degrees_of_freedom, part->mass());
CHECK_EQ(first_time, it->time);
++it;
}
++i;
}
if (at_end_of_part_trajectory) {
return;
}
// Append the parts' barycentre to the trajectory.
if (after_fork_time) {
DegreesOfFreedom<Barycentric> const vessel_degrees_of_freedom =
calculator.Get();
trajectory.Append(first_time, vessel_degrees_of_freedom);
DegreesOfFreedom<Frame> const& degrees_of_freedom) {
CHECK(this->is_root() || time > this->Fork()->time)
<< "Append at " << time << " which is before fork time "
<< this->Fork()->time;
if (!timeline_.empty() && timeline_.cbegin()->first == time) {
LOG(WARNING) << "Append at existing time " << time
<< ", time range = [" << this->front().time << ", "
<< this->back().time << "]";
return;
}
auto it = timeline_.emplace_hint(timeline_.end(),
time,
degrees_of_freedom);
// Decrementing |end()| is much faster than incrementing |it|. Don't ask.
CHECK(--timeline_.end() == it)

@pleroy pleroy added the bug label Jul 13, 2020
@pleroy
Copy link
Member

pleroy commented Jul 14, 2020

Last bit of the INFO log:

I0712 10:42:29.467294  8164 vessel.cpp:126] Extracting part KzProcFairingSide2.TU (77FABE01) from vessel 0133 station crew (018369b3-02b6-4ab1-8955-bd2f6ea2f96f)
I0712 10:42:29.467294  8164 vessel.cpp:116] Adding part KzProcFairingSide2.TU (77FABE01) to vessel 0133 station crew Debris (adcaef30-19ba-4ccf-ab40-b2450aee3946)
I0712 10:42:29.468291  8164 plugin.cpp:565] Collision between HeatShield2 (97846DF6) and KzProcFairingSide2.TU (77FABE01)
I0712 10:42:29.468291  8164 plugin.cpp:565] Collision between KzProcFairingSide2.TU (77FABE01) and HeatShield2 (97846DF6)
I0712 10:42:29.468291  8164 vessel.cpp:168] Preparing history of vessel 0133 station crew Debris (adcaef30-19ba-4ccf-ab40-b2450aee3946) at -1.28352951496634579e+09 s
I0712 10:42:29.587976  8164 plugin.cpp:565] Collision between HeatShield2 (97846DF6) and KzProcFairingSide2.TU (77FABE01)
I0712 10:42:29.587976  8164 plugin.cpp:565] Collision between KzProcFairingSide2.TU (77FABE01) and HeatShield2 (97846DF6)

The "Preparing history" message seems to indicate that we have a newly created vessel with a history containing a single point (the barycentre of the parts at the time when the vessel was created), and that we later try to insert the barycentre of the parts' histories, which has a point before the start of the new history.

@lpgagnon: Did you have any engine/RCS burning at the time of the decoupling? (Grasping at straws...)

@lpgagnon
Copy link
Author

In this case, rcs was probably active on the primary vessel ("133 station crew"). "133 station crew Debris" would be a single-part "vessel" (KzProcFairingSide2.TU) being jettisoned, so nothing firing there.

@lpgagnon
Copy link
Author

lpgagnon commented Aug 20, 2020

Haven't seen this happen since upgrading from Galileo to Gallai now seen with Gallai as well

Sorry, something went wrong.

@Bellabong
Copy link

I get this from time to time as well and just as unpredictably/inconsistently, but when I turned on journalling for the two times I could replicate the crash from a save... it didn't crash.

Sorry, something went wrong.

@lpgagnon
Copy link
Author

not seen since Gallois, I believe

Sorry, something went wrong.

@Growflavor
Copy link

Might the following fatal when decoupling while orbiting Trappist1e be related?

To make the Principia logs short here is a fatal at decoupling of fairings of a probe already orbiting 1e by loading a saved game:

https://drive.google.com/drive/folders/1rBWbw6P4icUrzB2m8YriKyuL_eRKSjjn?usp=sharing

@barcharcraz
Copy link

INFO.20210430-124413.18744.log

I've gotten this crash in Grassmann, info log attached

@pleroy
Copy link
Member

pleroy commented May 1, 2021

To all the people affected by this bug: I don't think we have a hope of solving it without a journal. The logs don't tell us much, other that the curious "Preparing history" message milliseconds before the crash. And as for the stack: well, I have stared at this code for quite some time without finding anything suspicious.

Given that someone runs into the problem every few months I have to believe that this is a rare cornercase, and these typically require that I can reproduce them under a debugger.

@pleroy
Copy link
Member

pleroy commented May 2, 2021

Note to self: might be the same crash as #2931, for which we have a journal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants