Add video recording to NixOS VM tests #41165

Draft
wants to merge 9 commits into
base: master

aszlig
Member

@aszlig aszlig commented May 28, 2018

Sometimes, it's a bit difficult to debug VM tests post-mortem, especially when GUIs are involved. So this adds a new nixos-test QEMU UI module to qemu_test, which writes raw video frame deltas to an intermediate file (the reasons for this are detailed in the commit message of db8e70f), so that it can be encoded to a video file at a later stage.

Encoding of the video is done using a helper tool called nixos-test-encode-video, which encodes the intermediate format into a more commonly recognized one determined by the filename given (for example nixos-test-encode-video foo.video bar.webm will encode the intermediate foo.video into bar.webm using a WebM container with VP9).

NixOS VM test runners now have another attribute called videos, which will gather all the build products from the normal test runner, encode the videos to WebM and add them to the build products. This could see a bit of improvement in Hydra so that those videos are directly displayed in the browser instead of a download.

I've also added usage of the .videos attribute in release.nix, so that this is done by default but doesn't increase test run time when building the tests directly without going through release.nix.

A Hydra jobset of this branch is available at https://headcounter.org/hydra/jobset/aszlig/nixos-tests.

Partial example GIF from the enlightenment test:

[GIF: 0vd1c18y8jw1a460ynwdfyhx1gyv4zha-test]

Converted back to draft since there are still a few issues to solve:

  • Add version field for intermediate format
  • Test whether endian conversion is done correctly
  • Move encoder and QEMU driver into separate repository
  • Improve frame timing/pixel format options in encoder

@GrahamcOfBorg

Success on x86_64-linux (full log)

Attempted: qemu_test

Partial log (click to expand)

patching script interpreter paths in /nix/store/g5cz1s2rwc74ws3x29x1zhvbl629h22q-qemu-for-nixos-tests-2.12.0
checking for references to /build in /nix/store/g5cz1s2rwc74ws3x29x1zhvbl629h22q-qemu-for-nixos-tests-2.12.0...
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
strip is /nix/store/92d2ifxcni4n3zx9s8wnkcjlvnx5ajlc-binutils-2.30/bin/strip
/nix/store/g5cz1s2rwc74ws3x29x1zhvbl629h22q-qemu-for-nixos-tests-2.12.0

@GrahamcOfBorg

Failure on x86_64-darwin (full log)

Attempted: qemu_test

Partial log (click to expand)

cannot build derivation '/nix/store/wn4vwaa25msqww4l6slh96v7rnhvc2qr-graphite2-1.3.6.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/7zswqah6rx30dkcm6rvryzkj02hx39wh-snappy-1.1.7.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/nk9fhqi8gpdj4k3sshzhmax5k860y83i-soxr-0.1.3.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/hkihdgf951715cdmk7p33i4jx9aw4rkz-x265-2.7.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/gmd19xlkyqm5y3j5lxa30gk8bizqgn16-harfbuzz-1.7.6.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/4v0qxcxrh2ridnaljpijx5v0nzpxfd1y-libass-0.14.0.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/j8845bkb9lwycvvdxqm2xyvx70nc8lcw-ffmpeg-3.4.2.drv': 3 dependencies couldn't be built
cannot build derivation '/nix/store/y44na8h4sq0nynxz52aynsk2sh88z1nz-nixos-test-encode-video.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/q1c9kkry2cx33sw0jh762fvqjdq9n5an-qemu-for-nixos-tests-2.12.0.drv': 2 dependencies couldn't be built
error: build of '/nix/store/q1c9kkry2cx33sw0jh762fvqjdq9n5an-qemu-for-nixos-tests-2.12.0.drv' failed

@dotlambda
Member

Related: #33299

@GrahamcOfBorg

Failure on aarch64-linux (full log)

Attempted: qemu_test

Partial log (click to expand)

36 makeWrapper /nix/store/pzdxl08js43135hc4c28id1a6fna4sqr-hook/nix-support/setup-hook
99 _callImplicitHook /nix/store/7b2p80jk29w7jz2d907nsbpqgxr1zk98-stdenv-linux/setup
117 _eval /nix/store/7b2p80jk29w7jz2d907nsbpqgxr1zk98-stdenv-linux/setup
44 runHook /nix/store/7b2p80jk29w7jz2d907nsbpqgxr1zk98-stdenv-linux/setup
1088 installPhase /nix/store/7b2p80jk29w7jz2d907nsbpqgxr1zk98-stdenv-linux/setup
1283 genericBuild /nix/store/7b2p80jk29w7jz2d907nsbpqgxr1zk98-stdenv-linux/setup
2 main /nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh

builder for '/nix/store/9kjkkxzdj3n3imw1k1mpg73axzrip86z-qemu-for-nixos-tests-2.12.0.drv' failed with exit code 1
error: build of '/nix/store/9kjkkxzdj3n3imw1k1mpg73axzrip86z-qemu-for-nixos-tests-2.12.0.drv' failed

@aszlig
Member Author

aszlig commented May 28, 2018

@dotlambda: This is basically a follow-up to that PR and what I suggested back then. Having a custom UI module in QEMU also has the advantage that we can watch regions for changes, which is particularly useful for making tests with keyboard/mouse input more reliable. Right now we handle input blindly, based for example on the assumption that a particular application has started, but we don't actually know whether an input field has focus or even whether the application is accepting input at all (we only check whether there is an X window, which could be empty). With access to frame deltas we can simply watch for a certain region to change, press a key, wait for another change (and if it doesn't change, repeat the keypress), press the next key and so on.
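
The press/wait/repeat loop described above could be sketched roughly as follows. This is only an illustration: `send_key` and `region_changed` are hypothetical callbacks (nothing in this PR provides them), standing in for "send a keypress to the VM" and "has the watched region changed since the last check".

```python
import time
from typing import Callable


def press_until_changed(
    send_key: Callable[[str], None],
    region_changed: Callable[[], bool],
    key: str,
    attempts: int = 5,
    timeout: float = 1.0,
    poll_interval: float = 0.05,
) -> bool:
    """Press `key` and wait for the watched region to change.

    If nothing changes within `timeout` seconds, the keypress is repeated,
    up to `attempts` times. Returns True as soon as a change is seen and
    False if every attempt timed out.
    """
    for _ in range(attempts):
        send_key(key)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if region_changed():
                return True
            time.sleep(poll_interval)
    return False
```

A test would call this once per key, so that each keypress is only considered delivered after the screen has actually reacted to it.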

@aszlig
Member Author

aszlig commented May 28, 2018

There is however still an issue with the videos, because not all frames are written to the intermediate file. The number of frames received from QEMU is correct, but the UI module doesn't seem to get all the frame updates from the console. I'm adding a WIP label until I've resolved that.

Member

@grahamc grahamc left a comment

Nice!

@aszlig
Member Author

aszlig commented May 29, 2018

I needed to fix up the first commit because it would have broken QEMU on non-x86_64-linux systems.

@aszlig
Member Author

aszlig commented May 29, 2018

@GrahamcOfBorg build qemu_test

@GrahamcOfBorg

Failure on x86_64-darwin (full log)

Attempted: qemu_test

Partial log (click to expand)

CC      libavcodec/aacenctab.o
HOSTCC  libavcodec/aacps_fixed_tablegen.o
/nix/store/i9g6883w05pyj9aa4sl8hn945qyy6j53-bash-4.4-p19/bin/bash: gcc: command not found
make: *** [ffbuild/common.mak:152: libavcodec/aacps_fixed_tablegen.o] Error 127
make: *** Waiting for unfinished jobs....
2 warnings generated.
builder for '/nix/store/xsvcfml2m4imhagn2pcwphr2fhh0zkq2-ffmpeg-3.4.2.drv' failed with exit code 2
cannot build derivation '/nix/store/c1vahagnxmg7kzy28l214j3298x92cx4-nixos-test-encode-video.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/s9p0rl04rgw215hwfvn4sdmz7gjvinky-qemu-for-nixos-tests-2.12.0.drv': 1 dependencies couldn't be built
error: build of '/nix/store/s9p0rl04rgw215hwfvn4sdmz7gjvinky-qemu-for-nixos-tests-2.12.0.drv' failed

@GrahamcOfBorg

Success on x86_64-linux (full log)

Attempted: qemu_test

Partial log (click to expand)

patching script interpreter paths in /nix/store/xcv4a8d27msnn2wialrp26wf8rcr2dnk-qemu-for-nixos-tests-2.12.0
checking for references to /build in /nix/store/xcv4a8d27msnn2wialrp26wf8rcr2dnk-qemu-for-nixos-tests-2.12.0...
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
strip is /nix/store/92d2ifxcni4n3zx9s8wnkcjlvnx5ajlc-binutils-2.30/bin/strip
/nix/store/xcv4a8d27msnn2wialrp26wf8rcr2dnk-qemu-for-nixos-tests-2.12.0

@GrahamcOfBorg

Success on aarch64-linux (full log)

Attempted: qemu_test

Partial log (click to expand)

patching script interpreter paths in /nix/store/wbfdcr44a92wzb0nd0q94nnr7mxim79p-qemu-for-nixos-tests-2.12.0
checking for references to /build in /nix/store/wbfdcr44a92wzb0nd0q94nnr7mxim79p-qemu-for-nixos-tests-2.12.0...
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
strip is /nix/store/8yfik687kfccisxnad42j19lfb7ij9b4-binutils-2.30/bin/strip
/nix/store/wbfdcr44a92wzb0nd0q94nnr7mxim79p-qemu-for-nixos-tests-2.12.0

@aszlig
Member Author

aszlig commented May 29, 2018

@GrahamcOfBorg build qemu_test

@GrahamcOfBorg

Success on x86_64-linux (full log)

Attempted: qemu_test

Partial log (click to expand)

patching script interpreter paths in /nix/store/2rkz845i1j1a8p9wzc5ydyvw56c4sl00-qemu-for-nixos-tests-2.12.0
checking for references to /build in /nix/store/2rkz845i1j1a8p9wzc5ydyvw56c4sl00-qemu-for-nixos-tests-2.12.0...
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
strip is /nix/store/92d2ifxcni4n3zx9s8wnkcjlvnx5ajlc-binutils-2.30/bin/strip
/nix/store/2rkz845i1j1a8p9wzc5ydyvw56c4sl00-qemu-for-nixos-tests-2.12.0

@GrahamcOfBorg

Failure on x86_64-darwin (full log)

Attempted: qemu_test

Partial log (click to expand)

CC      libavcodec/aacenc_tns.o
CC      libavcodec/aacenctab.o
HOSTCC  libavcodec/aacps_fixed_tablegen.o
/nix/store/i9g6883w05pyj9aa4sl8hn945qyy6j53-bash-4.4-p19/bin/bash: gcc: command not found
make: *** [ffbuild/common.mak:152: libavcodec/aacps_fixed_tablegen.o] Error 127
make: *** Waiting for unfinished jobs....
builder for '/nix/store/xsvcfml2m4imhagn2pcwphr2fhh0zkq2-ffmpeg-3.4.2.drv' failed with exit code 2
cannot build derivation '/nix/store/y8kp09p11rzs70z7yi09w14ycxcnh3x1-nixos-test-encode-video.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/wldfgfaiydxq8mskh8ac0j7kimpxf2wq-qemu-for-nixos-tests-2.12.0.drv': 1 dependencies couldn't be built
error: build of '/nix/store/wldfgfaiydxq8mskh8ac0j7kimpxf2wq-qemu-for-nixos-tests-2.12.0.drv' failed

@GrahamcOfBorg

Success on aarch64-linux (full log)

Attempted: qemu_test

Partial log (click to expand)

patching script interpreter paths in /nix/store/lmr1pg50ls8h80xfqvlnngpj4jir9b1n-qemu-for-nixos-tests-2.12.0
checking for references to /build in /nix/store/lmr1pg50ls8h80xfqvlnngpj4jir9b1n-qemu-for-nixos-tests-2.12.0...
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
cannot find section .dynamic
strip is /nix/store/8yfik687kfccisxnad42j19lfb7ij9b4-binutils-2.30/bin/strip
/nix/store/lmr1pg50ls8h80xfqvlnngpj4jir9b1n-qemu-for-nixos-tests-2.12.0

av_dict_free(&opt);

return ecode;
}
Member

IMHO the Nixpkgs repo is really not the place to store large C source files.

Member Author

I could also put them into a separate repository if you'd prefer that.

@edolstra
Member

edolstra commented Jun 5, 2018

While this is cool, it significantly increases the complexity of our testing infrastructure, in particular adding a custom version of QEMU that might be a PITA to maintain.

How large are the video files generated by this for a NixOS release? Our storage costs are already pretty high...

@aszlig
Member Author

aszlig commented Jun 5, 2018

@edolstra: The intermediate video files are up to 2 MB (they contain only losslessly compressed frame deltas) and the encoded videos in WebM format are around 2-5 MB. If that's too much we could alternatively just not encode them and only make the intermediate format available, so that videos can be encoded on a per-case basis.

@stale

stale bot commented Jun 4, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 4, 2021
@RaitoBezarius
Member

I think it would be a shame to let this PR rot.
What I suggest is the following:

  • expose all the required knobs to have such a feature live out-of-tree?

Right now this is a module that encodes every frame delta from QEMU to
an intermediate format so that we can use FFmpeg to encode it into a
proper video.

In the end, the goal is to have videos of the graphical machine output
for the NixOS tests, which is especially useful in tests involving X
where we're basically blind when we run into a race condition or a
loaded Hydra node causes the test to fail.

The reason why I picked the approach to use an intermediate format and
encode it properly later is based on my benchmarks of some seemingly
simple approaches I tried before.

First of all I tried to search the web for solutions that already
existed and found a few, but they weren't really suitable:

  * Use the screendump QMP command to collect frames from the VM, which
    works to some degree but it misses frames.
  * Enable SPICE[1] and capture video from the server, which I actually
    tried to implement before the next option. However, existing
    solutions for capturing video off a SPICE server are rare and when
    testing with my own PoC implementation, I got frame drops as well
    and I didn't manage to capture early boot.
  * Try a patch[2] from the QEMU development mailing list, which adds an
    HMP command to capture and encode it directly to a video. This was
    the slowest option of all and it even led to test failures because
    we got a timeout during VM startup.
  * Capture video using VNC and VncProxy[3], which gave me results
    similar to SPICE.

So I dug through the QEMU code base and found out that UI modules get
frame deltas from Pixman, which is perfect for us, because we're not
losing frames and it also allows direct access to pixel data. It is also
fast: I couldn't even benchmark the overhead properly, as test run times
usually vary by a few seconds anyway.

Before actually writing our own intermediate format, I tried to use an
existing format that would be suitable for us. The requirements for this
format would be to support different frame sizes and variable
framerates, plus it needs to be very fast to encode.

When I asked in the #ffmpeg channel on freenode, the best format
suggested for these requirements was NUT[4] (thanks to "furq" for the
suggestion). However, while reading the format specification I came to
the realisation that our requirements are so simple that even NUT is
complicated in comparison, which is why I wrote our own format.

The specification is as follows:

  The first byte (the opcode) is either an 'S' (0x53, for "switch") or
  a 'U' (0x55, for "update") and determines the format of the following
  data.

  A "switch" is a surface change, e.g. a resize of the display, and the
  data following the opcode are the dimensions (width and height, both
  unsigned 32 bit integers), format (unsigned 8 bit integer) and bytes
  per pixel (unsigned 8 bit integer, currently either 2 or 4) of the
  surface.

  An "update" is a portion of the region that has changed since the last
  update and it's followed by X, Y, width and height (all 32 bit
  unsigned integers) coordinates of the updated region, the absolute
  time (64 bit unsigned integer) and the raw frame data afterwards. Note
  that we don't provide a length here, because we can infer that from
  the bytes per pixel of the last "switch" packet and the coordinates.

  All of the data is in the native endian format of the host processor
  architecture, which is not a problem, because encoding of the final
  video will take place on the same processor architecture.

All of the data is also gzip compressed, so that we don't accumulate
gigabytes of frame data during test runs.
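
To make the specification above more concrete, here is a minimal reader sketch in Python. It is not the encoder from this series. It assumes that the fields are written back to back without alignment padding (hence the "=" prefix in the struct formats), which the text above does not state, and the names `parse_packets` and `read_video` are made up for this example.

```python
import gzip
import struct
from typing import BinaryIO, Iterator, Tuple

# "=" means native byte order with no padding between fields (an assumption).
SWITCH = struct.Struct("=IIBB")   # width, height, format, bytes per pixel
UPDATE = struct.Struct("=IIIIQ")  # x, y, width, height, absolute time


def read_exact(stream: BinaryIO, size: int) -> bytes:
    """Read exactly `size` bytes or raise EOFError."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("truncated packet")
    return data


def parse_packets(stream: BinaryIO) -> Iterator[Tuple]:
    """Yield ("switch", ...) and ("update", ...) tuples from a decompressed stream."""
    bytes_per_pixel = None
    while True:
        opcode = stream.read(1)
        if not opcode:
            return  # clean end of stream
        if opcode == b"S":
            width, height, fmt, bytes_per_pixel = SWITCH.unpack(
                read_exact(stream, SWITCH.size))
            yield ("switch", width, height, fmt, bytes_per_pixel)
        elif opcode == b"U":
            if bytes_per_pixel is None:
                raise ValueError("update before first switch")
            x, y, w, h, timestamp = UPDATE.unpack(read_exact(stream, UPDATE.size))
            # No length field: the payload size follows from the region size
            # and the bytes per pixel of the last "switch" packet.
            pixels = read_exact(stream, w * h * bytes_per_pixel)
            yield ("update", x, y, w, h, timestamp, pixels)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


def read_video(path: str) -> Iterator[Tuple]:
    """Open a gzip-compressed intermediate file and yield its packets."""
    with gzip.open(path, "rb") as stream:
        yield from parse_packets(stream)
```

Because everything is in native endianness, such a reader only works on the same kind of machine that wrote the file, as noted above.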

I also moved the qemu_test expression out of the default.nix of the main
qemu expression, so that when we improve this we don't accidentally
break stuff for users of the normal QEMU.

[1]: https://www.spice-space.org/
[2]: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00865.html
[3]: https://github.com/amitbet/vncproxy
[4]: https://ffmpeg.org/~michael/nut.txt

Signed-off-by: aszlig <aszlig@nix.build>
This tool converts the intermediate format that we now get from QEMU
when running with --nixos-test outfile into any video format that can be
played by most video players, particularly web browsers when watching
the videos on Hydra.

The video format to be used is determined by the output file name, so
it's easy to change it to a different format rather than WebM (which is
the format this was designed for) by simply changing the file name.

I also have to give credit to "kepstin" from the #ffmpeg channel on
freenode for helping with the rescaling of the presentation timestamp.

Signed-off-by: aszlig <aszlig@nix.build>
This now encodes the raw video frames for every single VM test in our
internal format. Due to compression the average video file size is about
3 MB, so I think it's a good idea to enable it by default so we can see
post-mortem what went wrong.

Instead of killing the machines with the SIGKILL signal during cleanup,
we now kill them with SIGTERM because otherwise the atexit() handler
doesn't run and we get a video file without the gzip buffer flushed.
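
As an illustration of the difference, a cleanup routine along these lines asks the process to exit first and only then resorts to SIGKILL. This is a sketch in Python, not the test driver code from this series. The function name `stop_gracefully` and the SIGKILL fallback after a grace period are assumptions added for the example.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_period: float = 10.0) -> int:
    """Ask `proc` to exit with SIGTERM and return its exit status.

    A process that handles SIGTERM can exit normally and run its cleanup
    code (such as flushing a compression buffer). SIGKILL cannot be caught,
    so it is only used if the process is still alive after `grace_period`.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_period)
    except subprocess.TimeoutExpired:
        proc.kill()   # sends SIGKILL on POSIX
        return proc.wait()
```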

I'm also adding a helper attribute called 'videos' to every test, so
that all these videos can be encoded on Hydra. The derivation propagates
the build products from the actual test runner and also adds videos to
the Hydra build products.

Signed-off-by: aszlig <aszlig@nix.build>
This is mainly for Hydra so that the videos show up in the build
products. Encoding this to WebM takes a while especially for long test
runs, but it also helps debugging tests without the need to manually run
the encoding process on the actual output path of the test runner's
videos.

Signed-off-by: aszlig <aszlig@nix.build>
When writing with header_size - 1 we shouldn't check whether header_size
has been written. I also added an exit(1) to make sure this is fatal,
because otherwise things such as the issue here go completely unnoticed.

Signed-off-by: aszlig <aszlig@nix.build>
For some tests such as runInMachine, there is neither a .videos
attribute nor does it actually need to record videos.

However, right now it still does record videos so we need to fix that
soon.

Signed-off-by: aszlig <aszlig@nix.build>
It is actually quite common for a signal to interrupt a call to write()
while video frames are being written. The first option that came to
mind was to use sigprocmask() to make sure that we don't get
interrupted while writing the frame data. However this would also
introduce some overhead.

So instead, we're now just ignoring a frame update/switch whenever we
reach the end of the file, so we don't have that overhead and also only
lose one frame at the end of the video stream because we do actually
flush the gzip buffer in the atexit() handler.
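
The idea of ignoring an incomplete packet at the end of the file can be sketched like this in Python. It is a simplified illustration rather than the encoder's code: only "update" records are handled, the opcode byte is left out, the fields are assumed to be packed without padding, and `iter_updates` is a made-up name.

```python
import struct
from typing import BinaryIO, Iterator, Optional, Tuple

# x, y, width, height, absolute time; native byte order, no padding (assumed).
HEADER = struct.Struct("=IIIIQ")


def read_or_none(stream: BinaryIO, size: int) -> Optional[bytes]:
    """Return exactly `size` bytes, or None if the stream ends early."""
    data = stream.read(size)
    return data if len(data) == size else None


def iter_updates(
    stream: BinaryIO, bytes_per_pixel: int
) -> Iterator[Tuple[Tuple[int, int, int, int, int], bytes]]:
    """Yield (header, pixels) per complete update and stop at a truncated one."""
    while True:
        raw = read_or_none(stream, HEADER.size)
        if raw is None:
            return  # end of file, possibly in the middle of a header
        x, y, w, h, timestamp = HEADER.unpack(raw)
        pixels = read_or_none(stream, w * h * bytes_per_pixel)
        if pixels is None:
            return  # last frame was cut short: drop it instead of failing
        yield (x, y, w, h, timestamp), pixels
```

With this approach at most the final, partially written frame is lost.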

Signed-off-by: aszlig <aszlig@nix.build>
When building with Nix there is a NIX_BUILD_CORES environment variable
set to either the default, which should be equal to av_cpu_count(), or a
user-set value, so let's respect that.

If the value is 0 or unset (not inside a Nix build), we use
av_cpu_count() instead.
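
The selection logic described here is small enough to show as a sketch. The following Python version is only an illustration of the rule, with `os.cpu_count()` standing in for FFmpeg's av_cpu_count(). The function name `encoder_threads` and the handling of non-numeric values are assumptions.

```python
import os
from typing import Mapping, Optional


def encoder_threads(environ: Optional[Mapping[str, str]] = None) -> int:
    """Pick the thread count from NIX_BUILD_CORES, else from the CPU count."""
    env = os.environ if environ is None else environ
    try:
        cores = int(env.get("NIX_BUILD_CORES", "0"))
    except ValueError:
        cores = 0  # treat garbage like "unset" (an assumption)
    if cores > 0:
        return cores
    # 0 or unset: fall back to the number of CPUs, with 1 as a last resort.
    return os.cpu_count() or 1
```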

Signed-off-by: aszlig <aszlig@nix.build>
If we only want to run tests without actually encoding the videos, it
doesn't really make sense to pull in the FFmpeg dependency.

When debugging tests we can still either use the encoder manually or
simply append the .videos attribute in order to get the encoded videos
(which then of course will pull in the FFmpeg dependency).

Signed-off-by: aszlig <aszlig@nix.build>
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 11, 2023
@aszlig
Member Author

aszlig commented Sep 11, 2023

Rebased against current master.

Contributor

@flokli flokli left a comment

As much as I appreciate the work put into this, I still don't think patching qemu downstream is the right way forward. Quoting my original comment from three years ago:

Did you try approaching upstream qemu with the problems you're facing? I'm not sure if we do ourselves much of a favor shipping our own custom fork of qemu, and tooling to produce a somewhat ffmpeg-specific intermediate format etc. We certainly can't be the only ones facing this. I'd rather see this somewhere inside or alongside qemu, or spice being fixed to not drop frames (if the underlying architecture allows this).

Please approach upstream to get either this or another reliable way of video recording merged. I don't think this is suitable to ship in nixpkgs only.

@aszlig
Member Author

aszlig commented Sep 18, 2023

Please approach upstream to get either this or another reliable way of video recording merged. I don't think this is suitable to ship in nixpkgs only.

I'd like to avoid putting such a module upstream since currently it's too Nix-specific and I also have plans for using this same module to add UI test functionality (e.g. selecting/matching regions for change or constraining OCR). Furthermore, upstream already rejected a similar implementation (although that one used ffmpeg, while ours does not).

@aszlig
Member Author

aszlig commented Sep 18, 2023

Addendum: Since QEMU 7.0 there is a new D-Bus display, which as far as I can see calls a D-Bus listener for every frame update. This is a lot closer to what we want, but it seems to always write the full frame image even if only a subset has changed. It's not a show-stopper though, since we can easily work around this limitation if we need to.

I'll experiment with that a bit to see whether it's a viable option or whether it has drawbacks similar to SPICE/VNC.
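
One possible shape of the workaround for full-frame updates mentioned above is to compare consecutive frames and keep only the bounding box of what changed. The Python sketch below is hypothetical and not tied to the D-Bus display API. It assumes tightly packed rows (stride equals width times bytes per pixel), and `changed_region` is a made-up name.

```python
from typing import Optional, Tuple


def changed_region(
    prev: bytes, curr: bytes, width: int, height: int, bpp: int
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) bounding all differing pixels, or None if equal."""
    stride = width * bpp
    if len(prev) != stride * height or len(curr) != stride * height:
        raise ValueError("frame size does not match the given dimensions")
    min_x, min_y, max_x, max_y = width, height, -1, -1
    for y in range(height):
        row_prev = prev[y * stride:(y + 1) * stride]
        row_curr = curr[y * stride:(y + 1) * stride]
        if row_prev == row_curr:
            continue  # fast path: the whole row is unchanged
        # Indices of the pixels in this row that differ between the frames.
        diff = [
            x for x in range(width)
            if row_prev[x * bpp:(x + 1) * bpp] != row_curr[x * bpp:(x + 1) * bpp]
        ]
        min_x, max_x = min(min_x, diff[0]), max(max_x, diff[-1])
        min_y, max_y = min(min_y, y), max(max_y, y)
    if max_x < 0:
        return None
    return (min_x, min_y, max_x - min_x + 1, max_y - min_y + 1)
```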

@wegank wegank added 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 2.status: merge conflict labels Mar 19, 2024