
libvirtd: Add support for remote libvirt URIs #824

Merged: 8 commits merged into NixOS:master on Jul 17, 2019

Conversation

@erosennin (Contributor) commented Dec 28, 2017

This PR adds support for deploying to remote libvirtd hosts via qemu+ssh://... or qemu+tcp://... URIs.

  • Make the URI configurable instead of hardcoding it to qemu:///system.
  • Use the libvirt API to upload the disk image instead of writing into the local image directory.
  • Use the libvirt API to determine the QEMU executable instead of hardcoding /run/current-system/sw/bin/qemu-system-x86_64.
  • Improve error handling.
  • Preserve backward compatibility.
  • Update documentation and examples.
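As a rough illustration of what this enables (a sketch, not code from this PR), the libvirt Python bindings can talk to such a URI directly; the host in the URI below is a placeholder:

import libvirt

# Any URI form from https://libvirt.org/uri.html should work here, e.g.
# qemu+ssh://user@host/system or qemu+tcp://host/system.
uri = "qemu+ssh://root@example-libvirt-host/system"

conn = libvirt.open(uri)  # raises libvirt.libvirtError if the connection fails
try:
    print("connected to", conn.getHostname())
finally:
    conn.close()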

@@ -47,6 +47,15 @@ in
'';
};

options = {
deployment.libvirtd.URI = mkOption {
type = types.str;
Contributor:

Could this benefit from types.enum?

Contributor:

This may or may not reveal how little I know about libvirt :-)

Contributor (author):

> Could this benefit from types.enum?

Probably not, this option can be an arbitrary URI string: https://libvirt.org/uri.html.

It would be nice to have types.URL and types.URI though. :)

@@ -217,10 +224,15 @@ def read_file(stream, nbytes, f):
stream.sendAll(read_file, f)
stream.finish()

def _qemu_executable(self):
domaincaps_xml = self.conn.getDomainCapabilities(
emulatorbin=None, arch='x86_64', machine=None, virttype='kvm',
Member:

Hardcoding the arch could be a problem, couldn't it?

Contributor (author):

The current version has qemu-system-x86_64 hardcoded, so I decided to let it be for now and maybe address it later in a separate PR. I didn't want to include too many unrelated changes in a single PR.

qemu_executable = "qemu-system-x86_64"
qemu = spawn.find_executable(qemu_executable)
assert qemu is not None, "{} executable not found. Please install QEMU first.".format(qemu_executable)
qemu = self._qemu_executable()
Member:

Wouldn't it be better to keep the assert? I would miss the error message.

Contributor (author):

You are right: if QEMU is not installed, the call to libvirt.open() will fail with an unclear exception like "no connection driver available for qemu:///system". I'll try to handle the error and output a useful message to the user.
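A sketch of the approach being discussed here (not the exact code that was merged): query the domain capabilities through an already-open connection and turn a failure into a readable error. The function name and error text are illustrative only.

from xml.etree import ElementTree

import libvirt

def qemu_executable(conn, arch="x86_64"):
    """Return the emulator path reported by the (possibly remote) libvirtd."""
    try:
        caps_xml = conn.getDomainCapabilities(None, arch, None, "kvm", 0)
    except libvirt.libvirtError as e:
        raise Exception(
            "Unable to determine the QEMU executable for {0} on the target host; "
            "is QEMU/KVM installed there? ({1})".format(arch, e))
    # The emulator path is the top-level <path> element of the capabilities XML.
    return ElementTree.fromstring(caps_xml).find("./path").text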

@teto (Member) commented Jan 17, 2018

Sounds interesting. Do I need nixops on the remote node, or is just libvirt enough?

@erosennin (Contributor, author):

Only libvirtd is needed on the remote host. I'm currently testing it with remote libvirtd on Debian and it works fine.

@teto (Member) commented Jan 17, 2018

I fear this might be time-consuming since I've often had permission problems with libvirt/qemu, but I want to give it a try anyway (btw, any advice?). https://libvirt.org/remote.html looks like a good resource for remote usage. I quickly tried, but it failed to connect. Does it need to be a hostname?
URI Utils error : internal error: Unable to parse URI qemu+ssh://<ipv6>/system

@erosennin (Contributor, author):

Should probably be qemu+ssh://[<ipv6>]/system.

:)
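For reference, a small helper along these lines (hypothetical, not part of the PR) that adds the brackets automatically for literal IPv6 addresses; Python 3 is assumed for the ipaddress module:

import ipaddress

def libvirt_ssh_uri(host):
    """Build a qemu+ssh URI, wrapping literal IPv6 addresses in brackets."""
    try:
        if isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address):
            host = "[{}]".format(host)
    except ValueError:
        pass  # not a literal IP address; leave host names untouched
    return "qemu+ssh://{}/system".format(host)

print(libvirt_ssh_uri("2001:db8::1"))  # qemu+ssh://[2001:db8::1]/system
print(libvirt_ssh_uri("my-host"))      # qemu+ssh://my-host/system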

@teto (Member) commented Jan 18, 2018

When deploying with this PR, I run into the following (I don't with master nixops):

Information: You may need to update /etc/fstab.                           
client> 
mknod: /dev/vda1: File exists     

I am on a slightly modified nixos-unstable, so this could be the reason why. Hopefully someone else can try, or I might try on stock nixos-unstable. What version of nixpkgs do you use?

@erosennin (Contributor, author):

@teto Rebased the PR against recent master, hope that helps.

@teto (Member) commented Jan 19, 2018

Before I test again, which version of nixpkgs do you use, so that I can try on that one? :)

@flokli (Contributor) commented Jan 21, 2018

@teto The issue you saw was most likely fixed in #826, which got merged into master 23 days ago (but still isn't included in a release).

@teto (Member) commented Jan 22, 2018

@flokli thanks for the pointer, it seems the problem is indeed gone.

with current nixos-unstable 5402412b97247bcc0d2b693e159d55d114d1327b and your PR rebased on nixops a232fc5 (https://github.com/teto/nixops/tree/libvirt_uri), I get

client> connecting to qemu:///system...
server> connecting to qemu+ssh://my_vm/system...
client> preparing disk image...
server> preparing disk image...
server> qemu-img: Could not open '/run/user/1000/nixops-tmpNo4lo8/disk.qcow2': Failed to get "write" lock
server> Is another process using the image?
client> uploading disk image...
libvirt: Storage Driver error : internal error: creation of non-raw file images is not supported without qemu-img.
error: Multiple exceptions: command ‘['qemu-img', 'rebase', '-f', 'qcow2', '-b', '', '/run/user/1000/nixops-tmpNo4lo8/disk.qcow2']’ failed on machine ‘server’ (exit code 1), internal error: creation of non-raw file images is not supported without qemu-img.
------------------------------
Traceback (most recent call last):
  File "/home/teto/nixops/nixops/parallel.py", line 41, in thread_fun
    result_queue.put((worker_fun(t), None))
  File "/home/teto/nixops/nixops/deployment.py", line 948, in worker
    r.create(self.definitions[r.name], check=check, allow_reboot=allow_reboot, allow_recreate=allow_recreate)
  File "/home/teto/nixops/nixops/backends/libvirtd.py", line 157, in create
    self._prepare_storage_volume()
  File "/home/teto/nixops/nixops/backends/libvirtd.py", line 186, in _prepare_storage_volume
    "", temp_disk_path])
  File "/home/teto/nixops/nixops/backends/__init__.py", line 324, in _logged_exec
    return nixops.util.logged_exec(command, self.logger, **kwargs)
  File "/home/teto/nixops/nixops/util.py", line 151, in logged_exec
    raise CommandFailed(err, res)
CommandFailed: command ‘['qemu-img', 'rebase', '-f', 'qcow2', '-b', '', '/run/user/1000/nixops-tmpNo4lo8/disk.qcow2']’ failed on machine ‘server’ (exit code 1)
------------------------------
Traceback (most recent call last):
  File "/home/teto/nixops/nixops/parallel.py", line 41, in thread_fun
    result_queue.put((worker_fun(t), None))
  File "/home/teto/nixops/nixops/deployment.py", line 948, in worker
    r.create(self.definitions[r.name], check=check, allow_reboot=allow_reboot, allow_recreate=allow_recreate)
  File "/home/teto/nixops/nixops/backends/libvirtd.py", line 157, in create
    self._prepare_storage_volume()
  File "/home/teto/nixops/nixops/backends/libvirtd.py", line 190, in _prepare_storage_volume
    self._vol = self._create_volume(image_info['virtual-size'], image_info['actual-size'])
  File "/home/teto/nixops/nixops/backends/libvirtd.py", line 212, in _create_volume
    vol = self.pool.createXML(xml)
  File "/nix/store/zj4dq8ipqiwakj0hql5kdrsql11laf0g-python2.7-libvirt-3.10.0/lib/python2.7/site-packages/libvirt.py", line 3185, in createXML
    if ret is None:raise libvirtError('virStorageVolCreateXML() failed', pool=self)
libvirtError: internal error: creation of non-raw file images is not supported without qemu-img.

I discovered the lslocks command, but even with the -u or -J flags I could not get the full paths; I assume the libvirtd one is the relevant lock.

$ lslocks                                                                                                                                                                                                                                                    
COMMAND           PID  TYPE SIZE MODE  M      START        END PATH
.libvirtd-wrapp  1051 POSIX      WRITE 0          0          0 /run...
.firefox-wrappe  1693 POSIX      READ  0 1073741826 1073742335 
.firefox-wrappe  1693 POSIX      READ  0        128        128 

My deployment consists of 2 identical machines, with just one supposed to be remote.

  server = libvirtd-remote;
  client = libvirtd-local;

Maybe building the 2 identical drives is the problem?

@erosennin (Contributor, author):

> libvirtError: internal error: creation of non-raw file images is not supported without qemu-img.

This is a bug in the current nixos-unstable; there is a PR for it: NixOS/nixpkgs#34052. Hope it gets merged soon.

@teto (Member) commented Jan 22, 2018

Ok I give up ^^ tell me when planets align :>

@flokli (Contributor) commented Jan 22, 2018

@teto planets aligned, and the PR landed in nixpkgs master. Can you test again?

@teto (Member) commented Jan 23, 2018 via email

@flokli (Contributor) commented Jan 31, 2018

@teto NixOS/nixpkgs#34052 should have landed in channels/nixos-unstable. Could you give it another try?

@teto (Member) commented Feb 1, 2018

I have nixos-unstable with #34052 in ~/nixpkgs3, ran ./dev-shell -I nixpkgs="$HOME/nixpkgs3", and ran into
qemu-img: Could not open '/run/user/1000/nixops-tmpyNAvGS/disk.qcow2': Failed to get "write" lock.
I also tried
export NIX_PATH=nixpkgs=/home/teto/nixpkgs3:nixos-config=/home/teto/dotfiles/nixpkgs/configuration.nix:/nix/var/nix/profiles/per-user/root/channels ./dev-shell with the same result.
Not sure if the nixpkgs patch is not enough or if nixops picks the wrong nixpkgs.

@flokli (Contributor) commented Feb 2, 2018

Not sure what broke there. Maybe this file already exists, and/or is in use somehow?

I ran ./dev-shell -I nixpkgs=/path/to/my/nixpkgs, with this folder pointing to the latest nixos-unstable (2e4aded3669), and got nixops to build the VM image on a local machine, upload it to the libvirt machine (Debian), and boot the VM up.

What still needs to be tested is the code after the initial setup (the ssh access part)

Nixops tries to get a 'primary ip' to ssh into by using virNetworkGetDHCPLeases.
This, however, doesn't seem to work for bridge interfaces or with DHCP servers not started by libvirt.
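For context, the lookup in question looks roughly like this through the Python bindings (a sketch; the URI is a placeholder and the guest is assumed to be on a libvirt-managed network named "default"):

import libvirt

conn = libvirt.open("qemu+ssh://root@example-libvirt-host/system")
net = conn.networkLookupByName("default")
# DHCPLeases() returns an empty list when the guest sits on a bridge or the
# DHCP server is not managed by libvirt, which is the limitation described above.
for lease in net.DHCPLeases():
    print(lease["mac"], lease["ipaddr"])
conn.close()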

I was only able to test it on hardware with this configuration, so nixops couldn't find an IP to ssh into and kept waiting for an IP address. When using the nixops-generated ssh private key, I was however able to log in, so everything before that seems to have worked.

Additionally, for convenience, we might need to tunnel the subsequent ssh connection via the remote libvirt host, as the discovered IPs might be private and not reachable from anywhere other than the libvirt host (or at least try to use a public IPv6 address if available). If we get the tunneling working, IPv6 link-local might be the best choice, as we can calculate it from the MAC address.

I'm not sure if it's possible to abuse the libvirt connection as a jumphost to internal IPs. @erosennin, do you have an idea?
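A sketch of the link-local idea mentioned above, assuming the guest derives its address from the MAC via EUI-64 (the common SLAAC default):

def mac_to_ipv6_link_local(mac):
    """Derive the EUI-64 based IPv6 link-local address from a MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    groups = ["{:x}".format((eui64[i] << 8) | eui64[i + 1]) for i in range(0, 8, 2)]
    return "fe80::" + ":".join(groups)

print(mac_to_ipv6_link_local("52:54:00:12:34:56"))  # fe80::5054:ff:fe12:3456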

@teto (Member) commented Feb 2, 2018

I upgraded my remote VM to nixos-unstable (even though I don't think that's necessary), but deployment still fails locally. When using master, it works.

server> connecting to qemu+ssh://some_vm/system...
client> connecting to qemu:///system...
client> preparing disk image...
server> preparing disk image...
server> qemu-img: Could not open '/run/user/1000/nixops-tmpr1bwoI/disk.qcow2': Failed to get "write" lock
server> Is another process using the image?
client> qemu-img: Error while writing to COW image: No space left on device
error: Multiple exceptions: command ‘['qemu-img', 'rebase', '-f', 'qcow2', '-b', '', '/run/user/1000/nixops-tmpr1bwoI/disk.qcow2']’ failed on machine ‘server’ (exit code 1), command ‘['qemu-img', 'rebase', '-f', 'qcow2', '-b', '', '/run/user/1000/nixops-tmpr1bwoI/disk.qcow)
------------------------------
Traceback (most recent call last):
  File "/home/teto/nixops/nixops/parallel.py", line 41, in thread_fun
    result_queue.put((worker_fun(t), None))
  File "/home/teto/nixops/nixops/deployment.py", line 948, in worker
    r.create(self.definitions[r.name], check=check, allow_reboot=allow_reboot, allow_recreate=allow_recreate)
  File "/home/teto/nixops/nixops/backends/libvirtd.py", line 157, in create
    self._prepare_storage_volume()
  File "/home/teto/nixops/nixops/backends/libvirtd.py", line 186, in _prepare_storage_volume
    "", temp_disk_path])
  File "/home/teto/nixops/nixops/backends/__init__.py", line 324, in _logged_exec
    return nixops.util.logged_exec(command, self.logger, **kwargs)
  File "/home/teto/nixops/nixops/util.py", line 151, in logged_exec
    raise CommandFailed(err, res)
CommandFailed: command ‘['qemu-img', 'rebase', '-f', 'qcow2', '-b', '', '/run/user/1000/nixops-tmpr1bwoI/disk.qcow2']’ failed on machine ‘server’ (exit code 1)
------------------------------
Traceback (most recent call last):
  File "/home/teto/nixops/nixops/parallel.py", line 41, in thread_fun
    result_queue.put((worker_fun(t), None))
  File "/home/teto/nixops/nixops/deployment.py", line 948, in worker
    r.create(self.definitions[r.name], check=check, allow_reboot=allow_reboot, allow_recreate=allow_recreate)
  File "/home/teto/nixops/nixops/backends/libvirtd.py", line 157, in create
    self._prepare_storage_volume()
  File "/home/teto/nixops/nixops/backends/libvirtd.py", line 186, in _prepare_storage_volume
    "", temp_disk_path])
  File "/home/teto/nixops/nixops/backends/__init__.py", line 324, in _logged_exec
    return nixops.util.logged_exec(command, self.logger, **kwargs)
  File "/home/teto/nixops/nixops/util.py", line 151, in logged_exec
    raise CommandFailed(err, res)

Have you tried a deployment of 2 VMs with a similar config?

@flokli (Contributor) commented Feb 2, 2018

Yes, I rebased on master.
Good catch - it seems like the VM name is not baked into the image path.

I added two commits below that clean up the imports and fix the above issue:
https://gist.github.com/flokli/869308cf0e64c6d657fbcd3abf7bbc96

@flokli (Contributor) commented Feb 2, 2018

@erosennin, could you cherry-pick those in here as well, and rebase to latest master?

@erosennin (Contributor, author):

@flokli thanks, added your changes and rebased.

@mbrgm (Member) commented Feb 19, 2018

Is there anything that can be done to move this PR forward? Really looking forward to having it merged.

@erosennin (Contributor, author):

@mbrgm I'll try to find some time for it this week.

@AmineChikhaoui (Member):

@teto Hi, no worries :) Sorry, I'm not really familiar with the libvirt backend, so I can't really help here. @rbvermaa is the official maintainer of NixOps. I'm sure he'll give this a round of review and merge once he gets time :)

@teto (Member) commented Jun 25, 2018

bump (sorry xD).

@rbvermaa (Member) commented Jul 3, 2018

@teto the PR is not really backward compatible, right? (the imageDir option was removed) Is there a trivial fix for people, or can we support both storagePool and imageDir to maintain backward compatibility?

I don't know much about the libvirtd backend, so the changes look fine to me except for the remark above. If the tests for the libvirtd backend succeed, I don't see any reason (other than the above) against merging.

@teto (Member) commented Jul 23, 2018

I suppose that when imageDir is set, _make_domain_xml could create the pool?
Or one could use the API: https://libvirt.org/html/libvirt-libvirt-storage.html#virStoragePoolCreate

@teto (Member) commented Aug 16, 2018

@erosennin any intent to complete this? That would be great!

I tried to keep backwards compatibility via the following patch, which adds a path attribute on storage creation when imageDir is set (docs on the path attribute: https://libvirt.org/formatstorage.html#StoragePoolTarget). I have permission problems with it; the libvirtd docs seem wrong about how permissions are assigned to the qcow2.

diff --git a/nix/libvirtd.nix b/nix/libvirtd.nix
index dad19a1..4c2ba60 100644
--- a/nix/libvirtd.nix
+++ b/nix/libvirtd.nix
@@ -45,6 +45,14 @@ in
   ###### interface
 
   options = {
+    deployment.libvirtd.imageDir = mkOption {
+      type = types.nullOr types.path;
+      default = null;
+      description = ''
+        Directory to store VM image files. Note that it should be writable both by you and by libvirtd daemon.
+      '';
+    };
+
     deployment.libvirtd.storagePool = mkOption {
       type = types.str;
       default = "default";
diff --git a/nixops/backends/libvirtd.py b/nixops/backends/libvirtd.py
index 644d75e..aa7bcd6 100644
--- a/nixops/backends/libvirtd.py
+++ b/nixops/backends/libvirtd.py
@@ -35,6 +35,9 @@ class LibvirtdDefinition(MachineDefinition):
         self.extra_devices = x.find("attr[@name='extraDevicesXML']/string").get("value")
         self.extra_domain = x.find("attr[@name='extraDomainXML']/string").get("value")
         self.headless = x.find("attr[@name='headless']/bool").get("value") == 'true'
+        self.image_dir = x.find("attr[@name='imageDir']/string")
+        if self.image_dir:
+            self.image_dir = self.image_dir.get("value")
         self.domain_type = x.find("attr[@name='domainType']/string").get("value")
         self.kernel = x.find("attr[@name='kernel']/string").get("value")
         self.initrd = x.find("attr[@name='initrd']/string").get("value")
@@ -160,7 +163,7 @@ class LibvirtdState(MachineState):
             (self.client_private_key, self.client_public_key) = nixops.util.create_key_pair()
 
         if self.storage_volume_name is None:
-            self._prepare_storage_volume()
+            self._prepare_storage_volume(defn)
             self.storage_volume_name = self.vol.name()
 
         self.domain_xml = self._make_domain_xml(defn)
@@ -178,7 +181,7 @@ class LibvirtdState(MachineState):
         self.start()
         return True
 
-    def _prepare_storage_volume(self):
+    def _prepare_storage_volume(self, defn):
         self.logger.log("preparing disk image...")
         newEnv = copy.deepcopy(os.environ)
         newEnv["NIXOPS_LIBVIRTD_PUBKEY"] = self.client_public_key
@@ -200,14 +203,14 @@ class LibvirtdState(MachineState):
 
         self.logger.log("uploading disk image...")
         image_info = self._get_image_info(temp_disk_path)
-        self._vol = self._create_volume(image_info['virtual-size'], image_info['actual-size'])
+        self._vol = self._create_volume(image_info['virtual-size'], image_info['actual-size'], defn.image_dir)
         self._upload_volume(temp_disk_path, image_info['actual-size'])
 
     def _get_image_info(self, filename):
         output = self._logged_exec(["qemu-img", "info", "--output", "json", filename], capture_stdout=True)
         return json.loads(output)
 
-    def _create_volume(self, virtual_size, actual_size):
+    def _create_volume(self, virtual_size, actual_size, path=None):
         xml = '''
         <volume>
           <name>{name}</name>
@@ -215,12 +218,14 @@ class LibvirtdState(MachineState):
           <allocation>{actual_size}</allocation>
           <target>
             <format type="qcow2"/>
+            {eventual_path}
           </target>
         </volume>
         '''.format(
             name="{}.qcow2".format(self._vm_id()),
             virtual_size=virtual_size,
             actual_size=actual_size,
+            eventual_path= "<path >%s</path>" % path if path else ""
         )
         vol = self.pool.createXML(xml)
         self._vol = vol
@@ -267,9 +272,10 @@ class LibvirtdState(MachineState):
                 '    <type arch="x86_64">hvm</type>',
                 "    <kernel>%s</kernel>" % defn.kernel,
                 "    <initrd>%s</initrd>" % defn.initrd if len(defn.kernel) > 0 else "",
-                "    <cmdline>%s</cmdline>" % defn.cmdline if len(defn.kernel) > 0 else "",
+                "    <cmdline>%s</cmdline>"% defn.cmdline if len(defn.kernel) > 0 else "",
                 '</os>']
 
+
         domain_fmt = "\n".join([
             '<domain type="{5}">',
             '  <name>{0}</name>',

@teto (Member) commented Aug 20, 2018

I solved the permission problem by hardcoding the permissions in the XML as teto:users for now, and it worked (according to the docs, libvirtd should use the parent's permissions when no permission is specified, but that was not the case for me). For the curious, here is my messy branch https://github.com/teto/nixops/tree/qemu_agent with fixes for this PR and #922. I hope that @erosennin can complete this PR; otherwise I might take a try at it when I get time.
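For illustration, the workaround boils down to an explicit <permissions> block in the volume XML (a sketch; the uid/gid values below are placeholders, not the ones used on that branch):

volume_xml = """
<volume>
  <name>example.qcow2</name>
  <capacity>1073741824</capacity>
  <target>
    <format type="qcow2"/>
    <permissions>
      <owner>1000</owner>  <!-- uid of the deploying user (placeholder) -->
      <group>100</group>   <!-- gid, e.g. "users" (placeholder) -->
      <mode>0644</mode>
    </permissions>
  </target>
</volume>
"""
# vol = pool.createXML(volume_xml)  # pool: a libvirt storage pool handle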

@jokogr commented Nov 8, 2018

@erosennin @teto is anyone willing to complete this PR? Could I do anything to help?

@teto (Member) commented Nov 9, 2018

@jokogr that would be great! I don't have the time to untangle the two or three PRs I merged into https://github.com/teto/nixops/tree/qemu_agent, but with the info I've given in this thread, you should be able to complete the PR without too much hassle.

@w4 commented Feb 9, 2019

Does anyone know how much work is left here to get this ready to merge?

@azazel75 commented Mar 6, 2019

I've tried this out by rebasing it onto master and it works well, good job! The only caveat seems to be that libvirt no longer comes with a default storage pool, so the image upload task failed the first time I tried. To create the storage I used the command virsh pool-create my_pool_definition.xml (I didn't find another way) with a dir-type pool configuration that points to /var/lib/libvirt/images.

The custom network that is present in the documentation seems unnecessary with a vanilla libvirt config, because the default network is configured appropriately.

To connect to the VM network (192.168.122.0/24) I used the sshuttle utility.
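For anyone hitting the same thing, a sketch of the equivalent pool creation through the Python bindings (assumptions: the URI is a placeholder and /var/lib/libvirt/images already exists on the target host):

import libvirt

pool_xml = """
<pool type="dir">
  <name>default</name>
  <target>
    <path>/var/lib/libvirt/images</path>
  </target>
</pool>
"""

conn = libvirt.open("qemu:///system")  # or the remote qemu+ssh:// URI
# Transient pool, equivalent to `virsh pool-create`; use storagePoolDefineXML()
# followed by create() for a persistent one.
pool = conn.storagePoolCreateXML(pool_xml, 0)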

@makefu commented Mar 7, 2019

I'd love to see this PR merged in order to do a remote deployment!

@azazel75 commented Mar 7, 2019

@makefu in the meantime I use the following code to get a specialized version of NixOps with this patch included:

let
  localPkgs = import <nixpkgs> {};
  nixopsPkg = builtins.fetchTarball {
    url = "https://github.com/azazel75/nixops/archive/remote-libvirt.tar.gz";
    sha256 = "1mg1bsgwrydyk2i0g5pzc35kr6ybjrihvfns8kd8d6b74qxnyh40";
  };
  libvirtNixops = (import "${nixopsPkg}/release.nix" {}).build.${localPkgs.system};
in 
  # rest of shell.nix

@flokli (Contributor) commented Mar 8, 2019

@azazel75 libvirt being without a default storage pool is not a problem limited to remote URIs, but a generic bug in the libvirt backend, right? So I guess this should be fixed in another PR.

What else is preventing this PR from being merged? @domenkozar @AmineChikhaoui

@nixos-discourse:

This pull request has been mentioned on Nix community. There might be relevant details there:

https://discourse.nixos.org/t/what-am-i-doing-wrong-here/2517/1

@ngerstle:

Any progress? This would be really nice to get included.

@AmineChikhaoui merged commit 5cc73e2 into NixOS:master on Jul 17, 2019
@AmineChikhaoui (Member):

Thanks everybody and sorry that it took a while to get merged.

@teto (Member) commented Jul 17, 2019

Wow! That's good news! It may motivate me to share/update some PRs.

@azazel75:

Wow, @AmineChikhaoui, thanks! Keep up the good trend and check out #1123 too; it's a one-liner but it's very useful!
