Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/borgbackup: Borgbackup retry #99196

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from
Draft

Conversation

toonn
Copy link
Contributor

@toonn toonn commented Sep 30, 2020

Motivation for this change

I was lacking an option to specify borgbackup jobs should restart if
they failed. My remote borg repo isn't always available so setting up
weekly backups wasn't very useful. The backups failed every time so my
files were never backed up. Specifying Restart=on-failure and
RestartSec=X min should prevent spurious failures from ruining your
day when it comes time to restore your non-existent backups : )

Things done

I've run this patch (applied at nixos-19.09) for quite a while now and it's worked like a charm.

  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@bjornfor
Copy link
Contributor

Related: #85696

@stale
Copy link

stale bot commented Jul 4, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2021
@dotlambda
Copy link
Member

I suggest you set a shorter interval instead and use preHook to check how old the last backup is. If it's less than a week old, just call exit.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2021
@toonn
Copy link
Contributor Author

toonn commented Jul 4, 2021

@dotlambda, why? That sounds like a worse version of using the appropriate systemd features? Systemd knows whether or not the unit ran successfully so it doesn't require any superfluous checking of the remote repository.

@dotlambda dotlambda requested a review from mweinelt July 4, 2021 15:46
@mweinelt
Copy link
Member

mweinelt commented Jul 4, 2021

use preHook to check how old the last backup is

Is that even possible in an append-only scenario?

@toonn
Copy link
Contributor Author

toonn commented Oct 2, 2021

Would appreciate a review on this.

Copy link
Member

@aanderse aanderse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a high level this change LGTM... but I don't have borgbackup experience. What do we need to do to get this merged?

ping @Lassulus @mweinelt

❤️ @toonn

@toonn
Copy link
Contributor Author

toonn commented Nov 7, 2021

@dotlambda, your scheduled end of the week (very end of the week) ping : )

Copy link
Member

@dotlambda dotlambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a test case in which the server machine is started after the job is started for the first time in order to test whether the job is actually restarted.

nixos/modules/services/backup/borgbackup.nix Outdated Show resolved Hide resolved
nixos/modules/services/backup/borgbackup.nix Outdated Show resolved Hide resolved
@toonn
Copy link
Contributor Author

toonn commented Nov 13, 2021

@dotlambda, I have no experience with NixOS tests. How would I start a server after starting a job on it?

@Stunkymonkey
Copy link
Contributor

should this be closed in favor of #153346?

@toonn
Copy link
Contributor Author

toonn commented Jan 24, 2022

Have to say this is leaving a rather sour taste in my mouth.

I'd also still like the Restart option so a failed attempt doesn't necessarily mean a lost backup.

@Stunkymonkey
Copy link
Contributor

@toonn could you please resolve the merge conflicts? I would be happy to have this merged. Please have a look at: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/backup/borgbackup.nix#L104=

I was lacking an option to specify borgbackup jobs should restart if
they failed. My remote borg repo isn't always available so setting up
weekly backups wasn't very useful. The backups failed every time so my
files were never backed up. Specifying `Restart=on-failure` and
`RestartSec=X min` should prevent spurious failures from ruining your
day when it comes time to restore your non-existent backups : )

Might need a Persistent timer for more reliability, in case system
happens to be powered down every day at the time of backup.
toonn and others added 4 commits June 10, 2022 13:07
A persistent timer is useful if your machine is regularly turned off
around scheduled backup times.

It's useful to randomly delay the backup timers especially when they are
persistent because they tend to bunch together when powering up your
machine.

I've renamed the `persistentTimer` option to be in line with other NixOS
options like `nix.gc.persistent` and `systemd.timer` terminology.

I've renamed `mkBackupTimers` to the singular because the plural is
deceptive, the function only ever generates a config for a single timer.
I also alphabetized the attrsets here and for the options, excepting
`type` as seems to be the convention.
Co-authored-by: Robert Schütz <nix@dotlambda.de>
@toonn
Copy link
Contributor Author

toonn commented Jun 10, 2022

@Stunkymonkey, rebased.

I renamed mkBacupTimers to my original singular because the function never generates a config for more than one timer.

And I renamed persistentTimer to persistent, this is in line with other options, e.g. nix.gc.persistent, system.autoupgrade.persistent, networking.dhcpcd.persistent, and the systemd.timers terminology. I'm looking into how to generate a warning for this change now, so anyone who's set it knows to migrate.

@toonn
Copy link
Contributor Author

toonn commented Jun 10, 2022

Found mkRenamedOptionModule but still looking into how that would work for an option with a generic <name> in the middle.

@mweinelt
Copy link
Member

mweinelt commented Jun 10, 2022

@dotlambda, I have no experience with NixOS tests. How would I start a server after starting a job on it?

By replacing start_all() with client.start() and trying to backup once while the server is still down. Then server.start() and wait for the retry.

I agree that this change should come with a test.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 8, 2023
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
@wegank wegank marked this pull request as draft March 25, 2024 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants