
hetzner: do not reboot instance after partitioning failed #433

Closed
wants to merge 1 commit

Conversation

danbst
Contributor

@danbst danbst commented May 12, 2016

There may be errors during the partitioning phase (incorrect configuration, problems with devices, etc.), but they are rarely critical enough to require rebooting the rescue shell.
With this change, the time to redeploy after a partitioning error is reduced by one minute.

I still run the bootstrapping phase, because the nixpart version could have changed.

Also, I'd like to ask the nixops devs:

  1. Is this the correct place to define a new state?
  2. The bootstrapping phase doesn't remove the old bootstrapping tools, it just overwrites them. I see there is a check on whether there is enough space, so with this change a redeploy after an error can fail on very small instances (where the total space in the rescue tmpfs is less than 700 MB). However, this only affects the "partitioning exception" scenario, so no tests would be affected.
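The free-space concern above could be checked up front. A rough sketch (the 700 MB threshold comes from this comment; the mount point and helper names are illustrative, not the actual nixops code):

```python
import os

# Rough sketch: verify the rescue tmpfs has enough free space before
# re-running the bootstrapping phase. The 700 MB figure comes from the
# comment above; the mount point path is illustrative.
REQUIRED_BYTES = 700 * 1024 * 1024

def tmpfs_free_bytes(path="/"):
    # statvfs reports free blocks available to unprivileged users
    # (f_bavail) times the fundamental block size (f_frsize).
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

def enough_space_for_bootstrap(path="/"):
    return tmpfs_free_bytes(path) >= REQUIRED_BYTES
```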

part of #429

There may be errors during the partitioning phase (incorrect configuration, problems
with devices, etc.), but they are rarely critical enough to require rebooting the rescue shell.
With this change, the time to redeploy after an error is reduced by one minute.
@rbvermaa
Member

@aszlig Do you have any opinions on this?

@@ -591,7 +597,11 @@ def create(self, defn, check, allow_reboot, allow_recreate):

if not self.vm_id:
self.log("installing machine...")
self.reboot_rescue(install=True, partitions=defn.partitions)
Member

@aszlig aszlig May 18, 2016


The same can be achieved by just checking for the RESCUE state here.
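A minimal sketch of the state check aszlig suggests, i.e. skipping the rescue reboot when the machine is already in the RESCUE state while still re-running bootstrapping (class, state names, and methods here are illustrative stand-ins, not the real nixops backend API):

```python
# Illustrative sketch: skip the reboot into the rescue system when the
# machine is already there (e.g. after a failed partitioning attempt).
# State names and methods are hypothetical, not the actual nixops code.

UP, RESCUE = "up", "rescue"

class HetznerLike:
    def __init__(self, state=UP):
        self.state = state
        self.rebooted = False
        self.bootstrapped = False

    def reboot_rescue(self, install=False, partitions=None):
        # Only reboot into rescue if we are not already in it.
        if self.state != RESCUE:
            self.rebooted = True
            self.state = RESCUE
        # Bootstrapping still runs either way, because the nixpart
        # version may have changed since the last attempt.
        self.bootstrap()

    def bootstrap(self):
        self.bootstrapped = True

m = HetznerLike(state=RESCUE)
m.reboot_rescue(install=True)
print(m.rebooted)  # False: the reboot was skipped, bootstrap still ran
```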

Contributor Author


I was too cautious here =)

@aszlig
Member

aszlig commented May 18, 2016

I originally did this because sometimes when blivet failed it could end up in an unrecoverable state, so I'm not sure whether it makes sense to reuse the rescue system from that state. Maybe it would make sense to add a flag, or even just emit a warning telling the user what to do if they want to throw away the current state.

@danbst
Contributor Author

danbst commented May 20, 2016

@aszlig Thanks for the review!

1. > blivet has failed it could end up in an unrecoverable state

   What kind of unrecoverable state? Did a rescue reboot fix it? I ask because I've hit an "unrecoverable state" that even a rescue reboot didn't fix (caused by manually configured LVM over RAID). Even `nixops destroy` didn't fix it.

2. > or even just emit a warning telling the user what to do if the he/she wants to throw away the current state

   I do like this approach!

hetznerIron> installing machine...
hetznerIron> rescue system is present and accessible
hetznerIron> partitioning...
hetznerIron> Exception: partitioning failed! Try one of these before redeploy:
  1) Check your kickstart script.
  2) Run "destroy" command to restart rescue system.
  3) Login to rescue with password XXXX and try to fix problems manually.

3. This isn't related to my changes, but there is an easy way to brick the NixOps rescue mode: just press Ctrl-C during the rescue reboot.

hetznerIron> installing machine...
hetznerIron> rebooting machine ‘hetznerIron’ (MY_IP) into rescue system
hetznerIron> sending reboot command... done.
hetznerIron> waiting for rescue system...Connection to MY_IP closed by remote host.
[down]....^C
$

Now NixOps cannot reboot the instance (wrong password). I think this is because of these lines: https://github.com/danbst/nixops/blob/7c60f056f2438c58796c9dc5dd4bc1030bfddcdf/nixops/backends/hetzner.py#L357-L358

        self._wait_for_rescue(self.main_ipv4)
        self.rescue_passwd = rescue_passwd

I can file an issue for this if the reboot-on-bootstrap-problems behavior is kept.
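The Ctrl-C race described above stems from persisting the new password only *after* the wait completes, so an interrupt in between leaves the old password in the deployment state. A hedged sketch of the reordering that would avoid it (the class and function names are illustrative, not the real nixops backend):

```python
# Illustrative sketch of the Ctrl-C race: if the new rescue password is
# stored only after waiting for the rescue system, an interrupt during
# the wait strands the state file with a stale (now wrong) password.
# Names are hypothetical, not the actual nixops code.

class StateFile:
    """Stands in for nixops' persisted deployment state."""
    def __init__(self):
        self.rescue_passwd = "old-password"

def reboot_to_rescue(state, new_passwd, wait):
    # Persist the password *before* the interruptible wait, so a
    # Ctrl-C cannot leave a stale password behind.
    state.rescue_passwd = new_passwd
    wait()

def interrupted_wait():
    # Simulates the user pressing Ctrl-C while waiting for rescue.
    raise KeyboardInterrupt

state = StateFile()
try:
    reboot_to_rescue(state, "XXXX", wait=interrupted_wait)
except KeyboardInterrupt:
    pass
print(state.rescue_passwd)  # "XXXX": the new password survived the interrupt
```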

@domenkozar
Member

Maybe it could just be a flag to enable this?

@danbst
Copy link
Contributor Author

danbst commented Dec 6, 2016

@domenkozar unfortunately, I have no free Hetzner machines to further maintain this PR.

@danbst danbst closed this Dec 6, 2016

4 participants