
Feat: aws nvme support #965

Merged
merged 9 commits on Sep 6, 2018

Conversation

Contributor

@srghma srghma commented Jun 2, 2018

#935
#846

Contributor Author

srghma commented Jun 4, 2018

Work done:

before:

  • block_device_mapping for AWS is stored in xvd* format, even if it is specified in sd* format in blockDeviceMapping.XXX.device

now:

  • block_device_mapping for AWS is stored exactly as specified in blockDeviceMapping.XXX.device, without changes

P.S.
I think it's fine and non-breaking, because:

  1. the manual uses the xvd* format for blockDeviceMapping.XXX.device https://nixos.org/nixops/manual/#opt-deployment.ec2.blockDeviceMapping
  2. based on my understanding, something like
{
      .....
      fileSystems."/data" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/sdf";
        ec2.disk = resources.ebsVolumes.foo-disk;
      };
    };
}

is not possible and would fail on OS start with a systemd mount job error: it can't find the "/dev/sdf" device (because it is attached as "/dev/xvdf").
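For reference, here is a minimal sketch of what the sd*/xvd* renaming amounts to, assuming helpers along the lines of the _sd_to_xvd/_xvd_to_sd functions that show up in the diff below (illustrative only, not the exact nixops implementation):

# Illustrative sketch; the real helpers in nixops may differ in detail.
def _sd_to_xvd(device_name):
    # "/dev/sdf" -> "/dev/xvdf"; other names (e.g. "/dev/nvme1n1") pass through unchanged
    return device_name.replace("/dev/sd", "/dev/xvd")

def _xvd_to_sd(device_name):
    # "/dev/xvdf" -> "/dev/sdf"
    return device_name.replace("/dev/xvd", "/dev/sd")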

DRAWBACKS:

this will work

{
  machine = {
    deployment.ec2.blockDeviceMapping."/dev/nvme1n1".size = 1;
    deployment.ec2.blockDeviceMapping."/dev/nvme2n1".size = 1;

    deployment.ec2.blockDeviceMapping."/dev/nvme1n1".deleteOnTermination = true;
    deployment.ec2.blockDeviceMapping."/dev/nvme2n1".deleteOnTermination = true;

    deployment.autoRaid0.raid.devices = [ "/dev/nvme1n1" "/dev/nvme2n1" ];

    fileSystems."/data" = {
      autoFormat = true;
      device = "/dev/raid/raid";
      fsType = "ext4";
    };
  };
}

but this will not

{
  machine = {
    deployment.ec2.blockDeviceMapping."/dev/nvme1n1".size = 1;
    deployment.ec2.blockDeviceMapping."/dev/nvme3n1".size = 1; # will be attached as /dev/nvme2n1

    deployment.ec2.blockDeviceMapping."/dev/nvme1n1".deleteOnTermination = true;
    deployment.ec2.blockDeviceMapping."/dev/nvme3n1".deleteOnTermination = true;

    deployment.autoRaid0.raid.devices = [ "/dev/nvme1n1" "/dev/nvme3n1" ];

    fileSystems."/data" = {
      autoFormat = true;
      device = "/dev/raid/raid";
      fsType = "ext4";
    };
  };
}

Contributor Author

srghma commented Jun 4, 2018

@AmineChikhaoui, could you take a look?

Contributor Author

srghma commented Jun 5, 2018

Hi @danbst, since this PR is related to https://github.com/NixOS/nixops/issues/569 as a first step toward adding NVMe support for Hetzner, could you please review this PR?

@srghma srghma changed the title from "Feat: aws nvme support (WIP)" to "Feat: aws nvme support" on Jun 10, 2018
Contributor

danbst commented Jun 10, 2018

Sorry @srghma, I don't use Hetzner right now, so I can't help you here.

Contributor Author

srghma commented Jun 11, 2018

@danbst, I didn't add Hetzner support (and probably never will; I don't have a server to experiment on, and it costs a lot).

Could you please review the code?

Contributor Author

srghma commented Jun 11, 2018

I don't know how to request a review; https://help.github.com/articles/requesting-a-pull-request-review/ is not working for me.

Contributor Author

srghma commented Jun 11, 2018

@aszlig @AmineChikhaoui @moretea @rbvermaa could you review it too?

Contributor

@danbst danbst left a comment

Alright, made a shallow stylistic review. Big +++ for adding tests!

@@ -1,16 +1,16 @@
{
network.description = "NixOS terminal server";

machine =
machine =
Contributor

please remove this file from the PR, because those are only whitespace changes

Contributor Author

@srghma srghma Jun 11, 2018

OK, I think we can do this with #968.

nix/ec2.nix Outdated
@@ -476,7 +476,6 @@ in
type = config.deployment.ec2.instanceType or "unknown";
mapping = import ./ec2-properties.nix;
in attrByPath [ type ] null mapping;

Contributor

please remove this file, because those are only whitespace changes

@@ -57,7 +57,7 @@ def __init__(self, xml, config):
self.associate_public_ip_address = config["ec2"]["associatePublicIpAddress"]
self.use_private_ip_address = config["ec2"]["usePrivateIpAddress"]
self.security_group_ids = config["ec2"]["securityGroupIds"]
self.block_device_mapping = {_xvd_to_sd(k): v for k, v in config["ec2"]["blockDeviceMapping"].iteritems()}
self.block_device_mapping = config["ec2"]["blockDeviceMapping"]
Contributor

@danbst danbst Jun 11, 2018

while it generally looks fine, have you checked whether there are backwards compatibility problems?

I think the situation is like this: early NixOps versions used /dev/sdX to name devices, but Amazon later renamed those to /dev/xvdX, so NixOps added a hack to replace /dev/sdX -> /dev/xvdX at runtime for the nixops.state config. So in theory, there may be NixOps installations with /dev/sdX entries in their blockDeviceMapping config.

This and further usages of self.block_device_mapping may work incorrectly in this scenario. Though I haven't checked this thoroughly, I'll leave it to you :)

Contributor Author

@srghma srghma Jun 11, 2018

Using the configuration from https://gist.github.com/srghma/be1f6ce596b406bed3137aa948508a5a

I do:

 ~/projects/nixops   fix-aws-5gen-disk-attach  gco origin/master
Note: checking out 'origin/master'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 3b2751e Merge pull request #933 from amemni/gce-labels-disks-snaps
 ~/projects/nixops  ➦ 3b2751e 
~/projects/vd-rails-deploy   master ✚  nix-shell
....build

[nix-shell:~/projects/vd-rails-deploy]$ make nixops_create
nixops create '<base.nix>'
created deployment ‘daeb8e51-6dab-11e8-8272-02425e3939e4’
daeb8e51-6dab-11e8-8272-02425e3939e4


[nix-shell:~/projects/vd-rails-deploy]$ nixops deploy
......
backend.............> attaching volume ‘vol-09b9b99d682509147’ as ‘/dev/xvdf’... [attaching] [attached]
building all machine configurations...
.............

backend.............> A dependency job for local-fs.target failed. See 'journalctl -xe' for details.
backend.............> error: Traceback (most recent call last):
  File "/nix/store/6pzj7d648mksqdmk1npd6ljrk59xbz2p-python2.7-nixops-1.6.1pre0_abcdef/lib/python2.7/site-packages/nixops/deployment.py", line 731, in worker
    raise Exception("unable to activate new configuration (exit code {})".format(res))
Exception: unable to activate new configuration (exit code 4)

error: activation of 1 of 1 machines failed (namely on ‘backend’)

[nix-shell:~/projects/vd-rails-deploy]$ nixops ssh backend

[root@backend:~]# systemctl status local-fs.target
● local-fs.target - Local File Systems
   Loaded: loaded (/nix/store/6dkz6azki0bag13m6vccy7bi41n9dak3-systemd-237/example/systemd/system/local-fs.target; enabled; vendor preset: enabled)
  Drop-In: /nix/store/r8m7q7j1sd854cqh50yjq6xymmldbrlq-system-units/local-fs.target.d
           └─overrides.conf
   Active: inactive (dead) since Mon 2018-06-11 19:19:37 UTC; 4min 53s ago
     Docs: man:systemd.special(7)

Jun 11 19:19:37 ip-172-31-73-78.ec2.internal systemd[1]: Stopped target Local File Systems.
Jun 11 19:21:14 backend systemd[1]: Dependency failed for Local File Systems.
Jun 11 19:21:14 backend systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Jun 11 19:21:14 backend systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Jun 11 19:21:14 backend systemd[1]: local-fs.target: Failed to enqueue OnFailure= job: No such file or directory

[root@backend:~]# systemctl status dev-sdf.device
● dev-sdf.device
   Loaded: loaded
   Active: inactive (dead)

Jun 11 19:21:14 backend systemd[1]: dev-sdf.device: Job dev-sdf.device/start timed out.
Jun 11 19:21:14 backend systemd[1]: Timed out waiting for device dev-sdf.device.
Jun 11 19:21:14 backend systemd[1]: dev-sdf.device: Job dev-sdf.device/start failed with result 'timeout'.

change to

      fileSystems."/data" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/xvdf";
        ec2.disk = resources.ebsVolumes.foo-disk;
      };
[nix-shell:~/projects/vd-rails-deploy]$ nixops deploy
building all machine configurations...
these derivations will be built:
....

backend.............> activation finished successfully
foo> deployment finished successfully

~/projects/nixops  ➦ 3b2751e  git checkout fix-aws-5gen-disk-attach
Previous HEAD position was 3b2751e Merge pull request #933 from amemni/gce-labels-disks-snaps
Switched to branch 'fix-aws-5gen-disk-attach'
Your branch is up to date with 'origin/fix-aws-5gen-disk-attach'.

~/projects/vd-rails-deploy   master ●✚  nix-shell

[nix-shell:~/projects/vd-rails-deploy]$ nixops deploy
backend.............> attaching volume ‘vol-09b9b99d682509147’ as ‘/dev/xvdf’...
building all machine configurations...
backend.............> copying closure...
foo> closures copied successfully
backend.............> updating GRUB 2 menu...
backend.............> activating the configuration...
backend.............> setting up /etc...
backend.............> setting up tmpfiles
backend.............> activation finished successfully
backend.............> detaching device ‘/dev/sdf’...
backend.............> umount: /dev/sdf: no mount point specified.
foo> deployment finished successfully

[nix-shell:~/projects/vd-rails-deploy]$ nixops ssh backend
Last login: Mon Jun 11 19:22:45 2018 from 185.29.253.219

[root@backend:~]# lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   5G  0 disk
└─xvda1 202:1    0   5G  0 part /
xvdf    202:80   0   5G  0 disk /data


P.S.

I wrote about this here #965 (comment)

Contributor

I see, yes, no problem here. I was talking about the following configuration:

let

  region = "us-west-1";
  accessKeyId = "default";
  ec2 =
    { resources, ... }:
    { deployment.targetEnv = "ec2";
      deployment.ec2.accessKeyId = accessKeyId;
      deployment.ec2.region = region;
      deployment.ec2.instanceType = "m3.medium";
      deployment.ec2.keyPair = resources.ec2KeyPairs.my-key-pair;

      deployment.ec2.blockDeviceMapping."/dev/sdc" = {
          disk = "ephemeral0";
      };
    };

in
{ proxy    = ec2;

  # Provision an EC2 key pair.
  resources.ec2KeyPairs.my-key-pair =
    { inherit region accessKeyId; };
}

And it turns out that, due to how boto works, it is fine:

[root@proxy:~]# df -h
Filesystem                Size  Used Avail Use% Mounted on
devtmpfs                  189M     0  189M   0% /dev
tmpfs                     1.9G     0  1.9G   0% /dev/shm
tmpfs                     941M  3.9M  937M   1% /run
tmpfs                     1.9G  300K  1.9G   1% /run/wrappers
/dev/disk/by-label/nixos  3.0G  1.6G  1.3G  56% /
/dev/xvdc                 3.9G  8.1M  3.7G   1% /disk0
tmpfs                     1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs                     377M     0  377M   0% /run/user/0

even though /dev/sdX names don't work anymore on EC2

Contributor Author

Yep, it's still good that we tested it.

for k, v in self.block_device_mapping.items():
if devices == [] or _sd_to_xvd(k) in devices:
# because name of the nvme device depends on the order it attached to maching
sorted_block_device_mapping = sorted(self.block_device_mapping.items())
Contributor

this code and comment are used in two places. What about moving it closer to initialization, so the walk order is unified and deterministic?

Contributor

also, there is a typo in the comment: maching -> machine. But even with the typo fixed I don't understand the comment. Maybe rephrase it: "Volumes should be attached in lexicographic order (ordered by device name). This preserves nvme devices' names."
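For illustration, here is a standalone sketch of the deterministic walk being suggested here (names are hypothetical, not the exact PR code):

def iter_block_devices_sorted(block_device_mapping):
    # Walk the mapping in lexicographic order of device names, so NVMe volumes
    # (whose kernel names depend on attach order) come up with predictable names.
    for device_name, volume_spec in sorted(block_device_mapping.items()):
        yield device_name, volume_spec

mapping = {"/dev/nvme2n1": {"size": 1}, "/dev/nvme1n1": {"size": 1}}
for name, _spec in iter_block_devices_sorted(mapping):
    print(name)  # /dev/nvme1n1 first, then /dev/nvme2n1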

for k, v in defn.block_device_mapping.iteritems():
if re.match("/dev/sd[a-e]", k) and not v['disk'].startswith("ephemeral"):
raise Exception("non-ephemeral disk not allowed on device ‘{0}’; use /dev/xvdf or higher".format(_sd_to_xvd(k)))
is_root_device = re.match("/dev/sd[a-e]", k) or re.match("/dev/xvd[a-e]", k) or re.match("/dev/nvme0n1", k)
Contributor

There is similar code above:

is_root_device = dev.startswith("/dev/sda") or dev.startswith("/dev/xvda") or dev.startswith("/dev/nvme0")

looks like the variable is misnamed here

Contributor Author

srghma commented Jun 11, 2018

@danbst refactored, big thanks!

Contributor

@danbst danbst left a comment

Please add the stuff from the DRAWBACKS section to the docs (the deployment.ec2.blockDeviceMapping attribute description). Also specify which index it should start from: nvme0n1 or nvme1n1. And also add that device-naming link to the docs.

if re.match("/dev/sd[a-e]", k) and not v['disk'].startswith("ephemeral"):
raise Exception("non-ephemeral disk not allowed on device ‘{0}’; use /dev/xvdf or higher".format(_sd_to_xvd(k)))
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
device_name_recommended_for_ebs_volumes = re.match("/dev/sd[a-e]", k) or re.match("/dev/xvd[a-e]", k) or re.match("/dev/nvme0n1", k)
Contributor

misnamed again; it should be device_names_not_recommended instead :)
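As an illustration of the rename being suggested, here is a hedged sketch of such a check over the device-name patterns quoted above (the helper name and shape are hypothetical, not the PR's final code):

import re

def is_device_name_not_recommended_for_ebs(device_name):
    # sd[a-e]/xvd[a-e] and the root NVMe device /dev/nvme0n1 are reserved or not
    # recommended for additional EBS volumes per the AWS device-naming docs.
    return bool(
        re.match("/dev/sd[a-e]", device_name)
        or re.match("/dev/xvd[a-e]", device_name)
        or re.match("/dev/nvme0n1", device_name)
    )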

Contributor Author

oops

Contributor Author

actually, I think the or re.match("/dev/nvme0n1", k) case is not possible; I'd better remove it from the list

Contributor Author

srghma commented Jun 12, 2018

@danbst done, is that fine?

Contributor

@danbst danbst left a comment

nice!
@AmineChikhaoui @domenkozar @rbvermaa ping for final review/merge

@AmineChikhaoui
Member

LGTM, thanks!
I guess a broader question is how we want to handle this on non-NixOps-provisioned machines. I remember some discussion about maybe using udev rules? So I'm not sure whether we want to go with a fix in NixOps only and handle NixOS on EC2 in general later, or do one fix in the EC2 AMI generation instead.
cc @edolstra @rbvermaa @copumpkin

@copumpkin
Member

@AmineChikhaoui can you elaborate on what doesn't work on non-NixOps NixOS machines? I'm now using NVMe NixOS boxes on AWS that aren't provisioned with NixOps and haven't had any trouble. If there is trouble, it would be good to get an issue filed against the nixpkgs repo!

@AmineChikhaoui
Member

@copumpkin Oh wait, I think I confused things; this is probably only a NixOps issue.

@rbvermaa
Member

I'll try to review this tomorrow.

nix/ec2.nix Outdated

<filename>/dev/sd[a-e]</filename> or <filename>/dev/xvd[a-e]</filename> must be ephemeral devices.

nvme devices should have name like <filename>/dev/nvme[1-26]n1</filename>, the number in device name should not be skipped.
Member

Can you clarify this sentence a bit? It's not immediately clear what is meant by it.

Contributor Author

With the following instances, EBS volumes are exposed as NVMe block devices: C5, C5d, i3.metal, M5, and M5d (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html). For this instances volumes should be attached as <filename>/dev/nvme[1-26]n1</filename>, the number in device name should not be skipped.

@rbvermaa

Contributor Author

@rbvermaa is the reworded sentence above better? Can I commit it?

Member

@srghma With "skipped", do you mean that there should be no 'hole' in the numbering of the devices? If so, are you only allowed to detach volumes in 'descending' order, i.e. only the last one each time?

Also, in the above sentence, 'For this instances' should be 'For these instances'.

Contributor Author

@srghma srghma Jun 18, 2018

@rbvermaa

  1. yes, there should be no hole in the numbering, I will change that

detach the volumes in 'descending' order

I think the user can detach any volume they want.

Let's consider an example:

{
  machine =
    { resources, pkgs, ... }:
    {
      fileSystems."/disk1" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/nvme1n1";
        ec2.disk = resources.ebsVolumes.disk1;
      };

      # user wants to detach it
      fileSystems."/disk2" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/nvme2n1";
        ec2.disk = resources.ebsVolumes.disk2;
      };

      fileSystems."/disk3" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/nvme3n1";
        ec2.disk = resources.ebsVolumes.disk3;
      };
    };
}

If the user just comments out the second device:

{
  machine =
    { resources, pkgs, ... }:
    {
      fileSystems."/disk1" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/nvme1n1";
        ec2.disk = resources.ebsVolumes.disk1;
      };

      # user wants to detach it
      # fileSystems."/disk2" = {
      #   autoFormat = true;
      #   fsType = "ext4";
      #   device = "/dev/nvme2n1";
      #   ec2.disk = resources.ebsVolumes.disk2;
      # };

      fileSystems."/disk3" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/nvme3n1"; # will be attached as /dev/nvme2n1 and will cause a mount error "can't find device /dev/nvme3n1"
        ec2.disk = resources.ebsVolumes.disk3;
      };
    };
}

The right solution is:

{
  machine =
    { resources, pkgs, ... }:
    {
      fileSystems."/disk1" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/nvme1n1";
        ec2.disk = resources.ebsVolumes.disk1;
      };

      # user wants to detach it
      # fileSystems."/disk2" = {
      #   autoFormat = true;
      #   fsType = "ext4";
      #   device = "/dev/nvme2n1";
      #   ec2.disk = resources.ebsVolumes.disk2;
      # };

      fileSystems."/disk3" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/nvme2n1"; # will be attached as /dev/nvme2n1
        ec2.disk = resources.ebsVolumes.disk3;
      };
    };
}

@rbvermaa
Member

OK, I finally managed to run the tests (the original account was EC2-Classic; for these instance types a VPC is needed). Will resume reviewing tomorrow.

Contributor Author

srghma commented Jun 20, 2018

@rbvermaa have you tested it? Can we merge the PR?

@AmineChikhaoui
Member

@srghma This seems to break existing deployments; you can reproduce it by deploying with a current NixOps release and then re-deploying with this patch.
Here is an example of what I get while redeploying:

frontend> attaching volume ‘vol-036c4f859db44031f’ as ‘/dev/xvdf’... 
data0...> attaching volume ‘vol-0dc9036be9c548701’ as ‘/dev/xvdf’... error: Multiple exceptions (2): 
  * data0: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidParameterValue</Code><Message>Invalid value '/dev/sdf' for unixDevice. Attachment point /dev/sdf is already in use</Message></Error></Errors><RequestID>ab12ef46-fa9f-4272-bdca-438b92ea5119</RequestID></Response>
  * frontend: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidParameterValue</Code><Message>Invalid value '/dev/sdf' for unixDevice. Attachment point /dev/sdf is already in use</Message></Error></Errors><RequestID>dba6b156-5dfc-4fe7-861e-d51a3d3e04b7</RequestID></Response>

Contributor Author

srghma commented Jun 25, 2018

Hi @AmineChikhaoui,

can you provide the configuration?

I thought it wasn't possible to use something like this https://gist.github.com/srghma/be1f6ce596b406bed3137aa948508a5a#file-base-nix-L55

      fileSystems."/data" = {
        autoFormat = true;
        fsType = "ext4";
        device = "/dev/sdf";
        ec2.disk = resources.ebsVolumes.foo-disk;
      };

I thought it would fail with the error

A dependency job for local-fs.target failed. See 'journalctl -xe' for details.

as it did here #965 (comment)

@AmineChikhaoui
Member

@srghma this is what I have for the filesystem

fileSystems."/data" = { 
  fsType = "xfs";
  options = [ "noatime" "nodiratime" ];
  device = "/dev/xvdf";
  autoFormat = true;
  ec2.size = lib.mkDefault volumeSize;
  ec2.volumeType = "gp2";
};

Contributor Author

srghma commented Jun 27, 2018

reproduced

 ✘  ~/projects/nixops   master  git rev-parse HEAD
3b2751e997a245ea5fff6917ee62f51384bce197

 ~/projects/nixops-aws-nvme-repro   master  git rev-parse HEAD
8f3770f755d7832bfd9a9987865d4fd49ba23c7f

#### it's here
#### https://github.com/srghma/nixops-aws-nvme-repro/commit/8f3770f755d7832bfd9a9987865d4fd49ba23c7f

[nix-shell:~/projects/nixops-aws-nvme-repro]$ nixops create '<base.nix>'
created deployment ‘5ecf52b5-7a3e-11e8-86a3-02423e907994’
5ecf52b5-7a3e-11e8-86a3-02423e907994

[nix-shell:~/projects/nixops-aws-nvme-repro]$ nixops deploy
[nix-shell:~/projects/nixops-aws-nvme-repro]$ nixops deploy
backendSecurityGroup> creating EC2 security group ‘charon-5ecf52b5-7a3e-11e8-86a3-02423e907994-backendSecurityGroup’...
backendKeyPair......> uploading EC2 key pair ‘charon-5ecf52b5-7a3e-11e8-86a3-02423e907994-backendKeyPair’...
backendSecurityGroup> adding new rules to EC2 security group ‘charon-5ecf52b5-7a3e-11e8-86a3-02423e907994-backendSecurityGroup’...
stage...............> creating EC2 instance (AMI ‘ami-ff0d1d9f’, type ‘c4.large’, region ‘us-west-1’)...
stage...............> waiting for IP address... [pending] [pending] [running] 54.183.197.244 / 172.31.12.89
stage...............> waiting for SSH..........................................
stage...............> replacing temporary host key...
stage...............> creating EBS volume of 5 GiB...
stage...............> waiting for volume ‘vol-0e97b3b477ef79bcf’ to become available... [creating] [available]
stage...............> attaching volume ‘vol-0e97b3b477ef79bcf’ as ‘/dev/xvdf’... [attaching] [attached]
stage...............> setting state version to 18.03
building all machine configurations...
these derivations will be built:
  /nix/store/gp8bhsh2mgq0xrhavg45jqmsrppwg8g6-system-path.drv
  /nix/store/20f0grz8plrqirkg49ai3iszkrbqbdpv-unit-polkit.service.drv
  /nix/store/2b9i7ksxbyvv5v8ppp87q72rbh7afjcn-etc-os-release.drv

...........
foo> closures copied successfully
stage...............> updating GRUB 2 menu...
stage...............> stopping the following units: apply-ec2-data.service, audit.service, kmod-static-nodes.service, network-local-commands.service, network-setup.service, nix-daemon.service, nix-daemon.socket, nscd.service, print-host-key.service, rngd.service, systemd-journal-catalog-update.service, systemd-modules-load.service, systemd-sysctl.service, systemd-timesyncd.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service, systemd-update-done.service
stage...............> setting up /etc...
stage...............> NOT restarting the following changed units: amazon-init.service, systemd-journal-flush.service, systemd-logind.service, systemd-random-seed.service, systemd-remount-fs.service, systemd-tmpfiles-setup.service, systemd-update-utmp.service, systemd-user-sessions.service, user@0.service
stage...............> activating the configuration...
stage...............> restarting systemd...
stage...............> setting up tmpfiles
stage...............> reloading the following units: dbus.service, dev-hugepages.mount, dev-mqueue.mount, firewall.service, sys-fs-fuse-connections.mount, sys-kernel-debug.mount
stage...............> restarting the following units: dhcpcd.service, sshd.service, systemd-journald.service
stage...............> starting the following units: apply-ec2-data.service, audit.service, kmod-static-nodes.service, network-local-commands.service, network-setup.service, nix-daemon.socket, nscd.service, print-host-key.service, rngd.service, systemd-journal-catalog-update.service, systemd-modules-load.service, systemd-sysctl.service, systemd-timesyncd.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-update-done.service
stage...............> the following new units were started: data.mount
stage...............> activation finished successfully
foo> deployment finished successfully

[nix-shell:~/projects/nixops-aws-nvme-repro]$ nixops ssh stage

[root@stage:~]# lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  50G  0 disk
└─xvda1 202:1    0  50G  0 part /
xvdf    202:80   0   5G  0 disk /data

[root@stage:~]# logout
Shared connection to 54.183.197.244 closed.

 ~/projects/nixops   master  gco fix-aws-5gen-disk-attach
Switched to branch 'fix-aws-5gen-disk-attach'
Your branch is up to date with 'origin/fix-aws-5gen-disk-attach'.
 ~/projects/nixops   fix-aws-5gen-disk-attach  git rev-parse HEAD
9ce5df02469c6f111a650cd07c1461932b619c4d

[nix-shell:~/projects/nixops-aws-nvme-repro]$ exit
 ~/projects/nixops-aws-nvme-repro   master ●  nix-shell
...........
running test
......................................
----------------------------------------------------------------------
Ran 38 tests in 0.007s

OK

[nix-shell:~/projects/nixops-aws-nvme-repro]$ nixops deploy
stage...............> creating EBS volume of 5 GiB...
stage...............> waiting for volume ‘vol-01c2e03bfda097684’ to become available... [creating] [available]
stage...............> attaching volume ‘vol-01c2e03bfda097684’ as ‘/dev/xvdf’... error: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidParameterValue</Code><Message>Invalid value '/dev/sdf' for unixDevice. Attachment point /dev/sdf is already in use</Message></Error></Errors><RequestID>07706533-16dd-4c49-bbe4-9393e7cc9967</RequestID></Response>

Contributor Author

srghma commented Jun 29, 2018

https://pastebin.com/BtBRK6HV - python2 tests.py tests.functional.test_backups

https://pastebin.com/j4aMz0F6 - python2 tests.py tests.functional.test_starting_starts -A ec2
https://pastebin.com/tnbADVxd - python2 tests.py tests.functional.test_ec2_with_nvme_device_mapping

the issue @AmineChikhaoui found with /dev/xvd devices - https://pastebin.com/g6D2sxUM

@AmineChikhaoui, I have fixed the issue, can you try again?

@AmineChikhaoui
Member

@srghma yeah I confirm it works now.

@rbvermaa mind having a final look?

Contributor Author

srghma commented Jul 10, 2018

Hi @rbvermaa, I'm still interested in this PR. If you can't review it, could you delegate it to someone else?

@AmineChikhaoui
Member

@srghma can you rebase, please?

Contributor Author

srghma commented Sep 5, 2018

@AmineChikhaoui I've merged instead of rebasing so you can see how I resolved the conflict: https://pastebin.com/wJxrxBcY

with this commit 98dc2de

I've substituted

if devices == [] or _sd_to_xvd(k) in devices:

with

            device_real = device_name_stored_to_real(device_stored)

            if devices == [] or device_real in devices:

P.S. You can find the same pattern in the restore method.
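For context, here is a rough sketch of what a device_name_stored_to_real-style translation could look like, under the assumption that stored names may still use the legacy /dev/sd* spelling while xvd* and NVMe names map through unchanged (purely illustrative; the actual function in this PR may differ):

def device_name_stored_to_real(device_stored):
    # Assumption: legacy "/dev/sd*" entries in the state correspond to "/dev/xvd*"
    # devices on the instance; "/dev/xvd*" and "/dev/nvme*" names are kept as-is.
    if device_stored.startswith("/dev/sd"):
        return device_stored.replace("/dev/sd", "/dev/xvd", 1)
    return device_stored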
