Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NixOS EC2 AMI: Support IMDSv2 #104193

Merged
merged 4 commits into from Nov 19, 2020
Merged

Conversation

grahamc
Copy link
Member

@grahamc grahamc commented Nov 18, 2020

Motivation for this change

AWS's metadata service has two versions. Version 1 allowed plain HTTP requests to get metadata. However, this was frequently abused when a user could trick an AWS-hosted server in to proxying requests to the metadata service. Since the metadata service is frequently used to generate AWS access keys, this is pretty gnarly. Version two is identical except it requires the caller to request a token and provide it on each request.

Today, starting a NixOS AMI in EC2 where the metadata service is configured to only allow v2 requests fails: the user's SSH key is to placed, and configuration provided by the user-data is not applied. The server is useless. This patch addresses that.

OpenStack "borrowed" this metadata fetcher when it happened to be the same API, but this new version is different enough, imo, to warrant undoing this borrow.

This PR is not yet tested in an actual deployment, and I'll mark it as ready once I have. It should be backported to stable and new AMIs should be built.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

These two APIs have diverged over time and are no longer compatible.
Copy link
Member

@andir andir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thank you for digging into this 👍

When you are done can you ensure that the commit messages have an equal amount of information as the PR description has right now? This will make it much easier for someone to debug this in the future as most editors support showing commit messages inline.

@grahamc grahamc force-pushed the ec2-metadata-imdsv2 branch 2 times, most recently from 6badc53 to b1c1087 Compare November 18, 2020 21:37
@grahamc grahamc marked this pull request as ready for review November 18, 2020 22:08
@grahamc
Copy link
Member Author

grahamc commented Nov 18, 2020

Done, @andir. I've tested this with real AWS machines and it works correctly, fixing the issue preventing our AMI from working with IMDSv2.

Copy link
Member

@andir andir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't run this but the solution really looks straight forward. Given that this is for AWS instances and not for my kettle's IoT interface the additional 30M do not seem too bad. If, in fact, this becomes an issue we could still use a slimmer version of curl or revisit our options here.

There are a few comments that I'd like to see clarified before I can "approve" this.

@worldofpeace
Copy link
Contributor

Oh, I see my review was requested. Is the goal to have this in the stable distro?
Anyways, I think graham is cool etc. and he do cool thing. that's my disposition on this pull request. Strongly in favor to put smiley faces, hearts, and sparkles.

@grahamc
Copy link
Member Author

grahamc commented Nov 18, 2020

Yeah, I want to backport it so NixOS on AWS users can improve the security of their instances.

@grahamc grahamc added the 9.needs: port to stable A PR needs a backport to the stable release. label Nov 18, 2020
@grahamc
Copy link
Member Author

grahamc commented Nov 19, 2020

Got a +1 from @andir on IRC.

@AmineChikhaoui
Copy link
Member

FYI @endgame and @jonringer , this implements #96593 a bit differently I think.

@grahamc
Copy link
Member Author

grahamc commented Nov 19, 2020

I provided Amine a copy of the AMI to test with NixOps, so let's merge once that is ready.

AWS's metadata service has two versions. Version 1 allowed plain HTTP
requests to get metadata. However, this was frequently abused when a
user could trick an AWS-hosted server in to proxying requests to the
metadata service. Since the metadata service is frequently used to
generate AWS access keys, this is pretty gnarly. Version two is
identical except it requires the caller to request a token and provide
it on each request.

Today, starting a NixOS AMI in EC2 where the metadata service is
configured to only allow v2 requests fails: the user's SSH key is not
placed, and configuration provided by the user-data is not applied.
The server is useless. This patch addresses that.

Note the dependency on curl is not a joyful one, and it expand the
initrd by 30M. However, see the added comment for more information
about why this is needed. Note the idea of using `echo` and `nc` are
laughable. Don't do that.
Comment on lines +54 to +60
try=1
while [ $try -le 10 ]; do
echo "(attempt $try/10) validating the EC2 instance metadata service v2 token..."
preflight_imds_token && break
try=$((try + 1))
sleep 1
done
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a preflight check of the IMDS token, as apparently sometimes the instance metadata service takes a second to figure itself out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we do not need more of these fallbacks.. It will soon escape the scope of where a shell script appears to be the right thing to me.

The commit message indicates it can take minutes for the metadata service to respond properly. Yet we are only waiting for to 30s for a response. Is that a trade-off between slow boot / unusable machines?

I wonder if the metadata service might also fail to hand out the "current" token during startup. Since we are already retrieving the instance-id should be validate something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we talked on IRC and it is moment not minutes. I guess 30s are fine then?!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"minutes" was a mistake on my part. I updated it to say "moments". I too hope we don't need more, it is already growing to the limits of comfort. I think once we get evidence of a working token, we're good to go, but I'm happy to be shown wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andir I'm not sure there's a "current" token:

https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/ under "what's new in IMDSv2":

The token is never stored by IMDSv2 and can never be retrieved by subsequent calls, so a session and its token are effectively destroyed when the process using the token terminates. There’s no limit on the number of requests within a single session, and there’s no limit on the number of IMDSv2 sessions.

So guess it generates some cryptographic doohickey each time you PUT?

According to Freenode's ##AWS, the metadata server can sometimes
take a few moments to get its shoes on, and the very first boot
of a machine can see failed requests for a few moments.
preflight_imds_token() {
# retry-delay of 1 selected to give the system a second to get going,
# but not add a lot to the bootup time
${curl}/bin/curl \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point we might consider setting PATH=${curl}/bin:$PATH or something. This will safe use ~100 bytes or so ;-)

Comment on lines +54 to +60
try=1
while [ $try -le 10 ]; do
echo "(attempt $try/10) validating the EC2 instance metadata service v2 token..."
preflight_imds_token && break
try=$((try + 1))
sleep 1
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we do not need more of these fallbacks.. It will soon escape the scope of where a shell script appears to be the right thing to me.

The commit message indicates it can take minutes for the metadata service to respond properly. Yet we are only waiting for to 30s for a response. Is that a trade-off between slow boot / unusable machines?

I wonder if the metadata service might also fail to hand out the "current" token during startup. Since we are already retrieving the instance-id should be validate something?

Copy link
Member

@AmineChikhaoui AmineChikhaoui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thanks @grahamc

Copy link
Contributor

@endgame endgame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the ping. There are a couple of small things that I think need to change. Fixing IMDSv2 on nixos was actually a yakshave; I need the public IPv4 address from IMDS for something else I'm working on, and tried to fix IMDS while I was there.

Also, the metadata dir should probably cleared on each boot, because otherwise it can become outdated w.r.t actions people can take in the management console.

I think there's enough difference in #96593 that it's probably worth @grahamc having a look and seeing if there's anything he wants to lift: I constructed the PUT request using busybox nc, so I don't pull additional packages into initramfs, and the fetcher is parameterised so the EC2/openstack common parts have been DRYed up. Some of the comments might also be worth pulling across. I don't have a preflight check on the IMDS, though.

Sidenote: I must confess that I'm a little frustrated by what feels like a process failure here: I'd been trying to get help testing my PR for months, and there's so much to do on NixOS that duplicating work isn't great. But at the end of the day, progress is excellent and I'm glad to see this about to land 👍 🎉

Comment on lines +8 to +21
if ! [ -e "$metaDir/ami-manifest-path" ]; then
wget ${wgetExtraOptions} -O "$metaDir/ami-manifest-path" http://169.254.169.254/1.0/meta-data/ami-manifest-path
fi

if ! [ -e "$metaDir/user-data" ]; then
wget ${wgetExtraOptions} -O "$metaDir/user-data" http://169.254.169.254/1.0/user-data && chmod 600 "$metaDir/user-data"
fi

if ! [ -e "$metaDir/hostname" ]; then
wget ${wgetExtraOptions} -O "$metaDir/hostname" http://169.254.169.254/1.0/meta-data/hostname
fi

if ! [ -e "$metaDir/public-keys-0-openssh-key" ]; then
wget ${wgetExtraOptions} -O "$metaDir/public-keys-0-openssh-key" http://169.254.169.254/1.0/meta-data/public-keys/0/openssh-key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please start using http://169.254.169.254/2009-04-04 and download the public IPv4 address? This is a version that openstack and ec2 both support (I checked the docs), and supports the http://169.254.169.254/2009-04-04/public-ipv4 endpoint which i need for some other stuff.

wget ${wgetExtraOptions} -O "$metaDir/ami-manifest-path" http://169.254.169.254/1.0/meta-data/ami-manifest-path
fi

if ! [ -e "$metaDir/user-data" ]; then
Copy link
Contributor

@endgame endgame Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both ec2 and openstack, I think these calls should not be guarded, and that it is correct to unconditionally replace fetched files. It might even be correct to clear the contents of metaDir on boot and re-fetch everything.

Example: currently, I can stop an EC2 instance in the management console, change its user data, and reboot. Then the updated user data is not applied. I think it's also possible to change image key pairs, and if I change the machine's subnet to start or stop assigning public IP addresses, then those files could appear or disappear.

{ curl, targetRoot, wgetExtraOptions }:
# Note: be very cautious about dependencies, each dependency grows
# the closure of the initrd. Ideally we would not even require curl,
# but there is no reasonable way to send an HTTP PUT request without
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not be "reasonable", but you can construct a PUT with busybox nc:

echo -ne 'PUT /latest/api/token HTTP/1.1\r\nHost: 169.254.169.254\r\nX-aws-ec2-metadata-token-ttl-seconds: 60\r\n\r\n' | nc 169.254.169.254 80 | tail -n 1

@grahamc
Copy link
Member Author

grahamc commented Nov 19, 2020

I need the public IPv4 address from IMDS

I wonder why it needs to be part of the very early boot process, and not fetched as part of a service pre-start or something? We could add it, but I fear growing what is already a fairly gnarly script in a gnarly and unfortunate location.

Also, the metadata dir should probably cleared on each boot, because otherwise it can become outdated w.r.t actions people can take in the management console.

This sounds like a good idea, but I'm a bit unsure about backporting to stable. It would be a pretty big change in behavior, potentially breaking workflows. Maybe an additional PR which only hits master, plus release notes. What do you think?

I constructed the PUT request using busybox nc, so I don't pull additional packages into initramfs

I think the HTTP protocol is complicated enough to deserve using curl, and that implementing it with nc to save a few bytes is just going to cause pain for users / admins later on, and that it is well worth the extra ~20M it adds to the initrd. Even if it isn't so preferable to be doing this in the first place.

the fetcher is parameterised so the EC2/openstack common parts have been DRYed up

Since they are different APIs with different implementations, I don't like the idea of trying to make it "dry" -- they're doing different things with APIs that look similar but evolve differently. I thought it made sense to split out the two, creating a less dynamic script, as opposed to a highly parameterized script in stage-1. I'd rather a bit of duplication here, as opposed to risking subtle bugs from dryness.

Sidenote: I must confess that I'm a little frustrated by what feels like a process failure here: I'd been trying to get help testing my PR for months, and there's so much to do on NixOS that duplicating work isn't great. But at the end of the day, progress is excellent and I'm glad to see this about to land.

Yes. I agree with you.

@endgame
Copy link
Contributor

endgame commented Nov 19, 2020

Thanks for the quick response @grahamc .

IPv4 from IMDS - does it need to happen here?

I have a service that needs to know its public IP address, and was setting it up as a NixOS module. So I want that file present on disk so that I can refer to it in configuration.nix aka instance user data, before the image boot script does nixos-rebuild switch as part of normal boot. If there is a better place to achieve what I want?

Clear metadata on boot - is this actually a scary change?

Agree with your suggestion - this should happen later and go into release notes, so we don't blow up someone's cloud deployment by accident. I don't think that's that likely, but likelihood × severity = risk, and it's high enough that it's worth being cautious. I can do a separate PR for this.

DRYing openstack/ec2 metadata fetchers

Openstack explicitly advertises their IMDS as EC2-compatible. Though I'm not sure whether they have a token service on their IMDS, and maybe we say that means they've failed in that goal? In any case, maybe add a comment to keep the two fetchers in "reasonable" sync?

@grahamc
Copy link
Member Author

grahamc commented Nov 19, 2020

re the IP address: would it be possible to define your service like this? :

systemd.services.foobar.script = ''
  ip=$(get-from-imds public_ipv4)
  exec my-program "$ip"
'';

If you can send that PR shortly after this one, we can merge it soon while we're all on top of it. BTW I've started #nixos-cloud on Freenode if you're interested in that sort of thing.

w.r.t. Openstack, I've found it really hit or miss if a deployment has an up to date set of components, or has even configured and deployed all the components. The variability there leaves me grayer than I was before ... 😬

@grahamc
Copy link
Member Author

grahamc commented Nov 19, 2020

Pending the discussion here, a backport PR: #104302

@endgame
Copy link
Contributor

endgame commented Nov 19, 2020

re the IP address: would it be possible to define your service [to fetch the ip in a script]?

Probably; I hadn't thought about doing it that way. It's a low-pressure side project so I'll give it a go at some point. Definitely should not hold up this PR.

I'll look at clearing out the metadata directory on boot this weekend, hopefully.

That's all my concerns addressed, I think. Should we ship this, then?

@grahamc
Copy link
Member Author

grahamc commented Nov 19, 2020

Thanks, @endgame! Let's go!

@grahamc grahamc merged commit 7fa7bf2 into NixOS:master Nov 19, 2020
@grahamc grahamc deleted the ec2-metadata-imdsv2 branch November 19, 2020 21:11
grahamc added a commit to grahamc/nixpkgs that referenced this pull request Nov 19, 2020
fixup breakage from NixOS#104193

(cherry picked from commit 1ef139f)
grahamc added a commit that referenced this pull request Nov 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants