-
-
Notifications
You must be signed in to change notification settings - Fork 15.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test-driver.py: delete VM state directory after test run #91046
Conversation
nixos/doc/manual/development/running-nixos-tests-interactively.xml
Outdated
Show resolved
Hide resolved
fc58905
to
34b3d16
Compare
nixos/doc/manual/development/running-nixos-tests-interactively.xml
Outdated
Show resolved
Hide resolved
34b3d16
to
a05d109
Compare
nixos/doc/manual/development/running-nixos-tests-interactively.xml
Outdated
Show resolved
Hide resolved
a05d109
to
7484afc
Compare
One thing I do not like about the Not saying the current default is better, I was bitten by that too. |
That's a fair point.
Just to be sure I understand what you're suggesting: you'd rather like
to delete a potential VM state during the machine startup instead of
cleaning that as soon as we shutdown the machine?
|
nixos/doc/manual/development/running-nixos-tests-interactively.xml
Outdated
Show resolved
Hide resolved
That would be my personal preference. Keep in mind that machine startup/shutdown does not always correspond exactly with script invocation/cleanup. We can have additional The current PR does this properly by removing state in |
7484afc
to
44d780e
Compare
Keeping the VM state test across several run sometimes lead to subtle and hard to spot errors in practice. We delete the VM state which contains (among other things) the qcow volume. We also introduce a -K (--keep-vm-state) flag making VM state to persist after the test run. This flag makes test-driver.py to match its previous behaviour.
44d780e
to
7e7aa52
Compare
I updated the script to remove the state on startup instead of doing that during the test teardown. @danielfullmer you're absolutely right. I took care of doing that during the test startup and not during the machine startup. I updated the doc accordingly to this behaviors change. |
It's fine from my PoV. @flokli can we merge? |
👍 |
Deleting virtual machine images without confirmation seems like the worst possible default imaginable. It's basically an unprompted |
From the discussions in #90663 (comment), cleaning up, plus a This is called As described in the linked issue, multiple people ran into annoying behaviour, until they realized it was the qcow file that was kept between restarts. If people want that behaviour, they have a way to opt in to it - and it's documented in the release notes, too. |
Citation needed.
If you mean This stray state issue bit me personally twice. Several people publicly stated being bitten by this in these two PRs. Some people also mentioned me being bitten by this at some point in some private side-channels. I know as a matter of fact some people are wrapping the test runner in a bash script removing any stray state before running the actual VM test, some other people are moving
The whole point of this PR is to make the precise opposite behavior the default one. I'm open to discuss this and change my mind. As a matter of fact, I actually incorporated almost all the requested changes as part of this PR. That being said, you need to back up your affirmations with some practical arguments if you want to move this discussion ahead. |
As far as I can tell, this change is entirely inert: os.path.isfile will always return diff --git a/third_party/nixpkgs/nixos/lib/test-driver/test-driver.py b/third_party/nixpkgs/nixos/lib/test-driver/test-driver.py
index 7b8d5803a..43281d5e8 100644
--- a/third_party/nixpkgs/nixos/lib/test-driver/test-driver.py
+++ b/third_party/nixpkgs/nixos/lib/test-driver/test-driver.py
@@ -778,7 +778,7 @@ class Machine:
def cleanup_statedir(self) -> None:
self.log("delete the VM state directory")
- if os.path.isfile(self.state_dir):
+ if os.path.isdir(self.state_dir):
shutil.rmtree(self.state_dir)
def shutdown(self) -> None: …it breaks everything:
|
As an aside, a piece of software logging "delete the VM state directory" (ie, in the imperative case) makes me consider whether I—the human—am supposed to delete the directory, and definitely confused me for a bit given that the software certainly didn't seem to be deleting the directory itself. It probably should log "deleted the VM state directory", but only after it has actually successfully done so. |
Thanks for the heads up. I realized this recently and worked on a fix, I totally forgot to upstream that. I'm still a bit unsure about the fix, I need to spend a few brain cycles to be sure it's not making things worse. I'll try to PR that soon together with the message improvement you're suggesting. In the meantime, here's the diff I'm currently applying. diff --git a/nixos/lib/test-driver/test-driver.py b/nixos/lib/test-driver/test-driver.py
index 612bb02aaa9..bc19033277b 100644
--- a/nixos/lib/test-driver/test-driver.py
+++ b/nixos/lib/test-driver/test-driver.py
@@ -162,6 +162,11 @@ class Machine:
if match:
self.name = match.group(1)
+ self.logger = MachineLogAdapter(
+ logger,
+ extra=dict(machine=self.name, colour_code=next(machine_colours_iter)),
+ )
+
self.script = args.get("startCommand", self.create_startcommand(args))
tmp_dir = os.environ.get("TMPDIR", tempfile.gettempdir())
@@ -171,7 +176,10 @@ class Machine:
os.makedirs(path, mode=0o700, exist_ok=True)
return path
- self.state_dir = create_dir("vm-state-{}".format(self.name))
+ self.state_dir = os.path.join(tmp_dir, f"vm-state-{self.name}")
+ if not args["keepVmState"]:
+ self.cleanup_statedir()
+ os.makedirs(self.state_dir, mode=0o700, exist_ok=True)
self.shared_dir = create_dir("shared-xchg")
self.booted = False
@@ -180,10 +188,6 @@ class Machine:
self.socket = None
self.monitor: Optional[socket.socket] = None
self.allow_reboot = args.get("allowReboot", False)
- self.logger = MachineLogAdapter(
- logger,
- extra=dict(machine=self.name, colour_code=next(machine_colours_iter)),
- )
@staticmethod
def create_startcommand(args: Dict[str, str]) -> str:
@@ -714,8 +718,9 @@ class Machine:
self.logger.info(f"QEMU running (pid {self.pid})")
def cleanup_statedir(self) -> None:
- self.logger.info("delete the VM state directory")
- if os.path.isfile(self.state_dir):
+ self.logger.info(self.state_dir)
+ if os.path.isdir(self.state_dir):
+ self.logger.info(f"delete the VM state directory")
shutil.rmtree(self.state_dir)
def shutdown(self) -> None:
@@ -870,10 +875,10 @@ def main() -> None:
for nr, vde_socket, _, _ in vde_sockets:
os.environ["QEMU_VDE_SOCKET_{}".format(nr)] = vde_socket
- machines = [create_machine({"startCommand": s}) for s in vm_scripts]
- for machine in machines:
- if not cli_args.keep_vm_state:
- machine.cleanup_statedir()
+ machines = [
+ create_machine({"startCommand": s, "keepVmState": cli_args.keep_vm_state})
+ for s in vm_scripts
+ ]
machine_eval = [
"global {0}; {0} = machines[{1}]".format(m.name, idx)
for idx, m in enumerate(machines) |
Thanks for the quick response, and glad to hear you're aware of the issue.
Not sure why the f-string, and we should definitely say either "deleted" (past tense, after deleting) or "deleting" (present continuous, printed before deleting), rather than "delete" (imperative case, asking the user to do something). |
@NinjaTrappeur did you get a chance to PR the fix in the meantime? |
The previous version of the code would only kick in if the state directory path pointed at a *file*, which never occurs. Making that codepath actually work reveals an ordering bug, which this patch fixes as well. It also replaces the confusing, imperative case log message "delete VM state directory" with "deleting VM state directory". Finally, we hint the user about how to prevent this deletion. IE. by passing the --keep-vm-state flag. Bug report: #91046 (comment) Credit goes to Edef for the rebase on top of a recent nixpkgs commit and for writing most of this commit message. Co-authored-by: edef <edef@edef.eu>
The previous version of the code would only kick in if the state directory path pointed at a *file*, which never occurs. Making that codepath actually work reveals an ordering bug, which this patch fixes as well. It also replaces the confusing, imperative case log message "delete VM state directory" with "deleting VM state directory". Finally, we hint the user about how to prevent this deletion. IE. by passing the --keep-vm-state flag. Bug report: #91046 (comment) Credit goes to Edef for the rebase on top of a recent nixpkgs commit and for writing most of this commit message. Co-authored-by: edef <edef@edef.eu> (cherry picked from commit ecb73fd)
Motivation for this change
Continuation of #90663, read the discussion there first to get the full context of this PR.
Keeping the VM state test across several run sometimes lead to subtle
and hard to spot errors in practice.
Things done
We delete the VM state which
contains (among other things) the qcow volume.
We also introduce a -K (--keep-vm-state) flag making VM state to
persist after the test run. This flag makes test-driver.py to match
its previous behaviour.
Note: some diffs in the
running-nixos-tests-interactively.xml
file are coming from the docbook autoformater after runningmake
. I decided to bit the bullet and these diffs, preventing any future reformatting. I can undo this if necessary.sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)nix path-info -S
before and after)