Ceph bluestore #107822

Draft
wants to merge 3 commits into
base: master

Conversation

aij
Contributor

@aij aij commented Dec 28, 2020

Motivation for this change

Test Ceph's Bluestore on NixOS and include necessary setup for ceph-volume to work.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Contributor

@jonringer jonringer left a comment

Instead of prefixing the commit messages with ceph:, I would use nixosTests.ceph:, as you're not affecting the underlying package, just the NixOS tests.

Also, just copy the OSD bootstrap keyring to the OSD hosts
instead of the admin keyring.
@flokli
Contributor

flokli commented Dec 28, 2020

Is non-bluestore still a thing? If so, could we keep one non-bluestore node in the cluster?

@lejonet
Contributor

lejonet commented Dec 29, 2020

Is non-bluestore still a thing? If so, could we keep one non-bluestore node in the cluster?

Yes, it is still very much a thing, even though the team behind Ceph is pushing bluestore hard (which is understandable for many reasons, but in my own experience they've maybe pushed it a bit too hard at times).

@flokli
Contributor

flokli commented Dec 29, 2020

What I mean is, the test should probably test both, if we can easily.

@lejonet
Contributor

lejonet commented Dec 29, 2020

What I mean is, the test should probably test both, if we can easily.

Yeah, since at the OSD level it really shouldn't matter what type of store is being used, testing both in the same test should definitely be doable.

Contributor

@lejonet lejonet left a comment

LGTM

One thing we should probably think about is whether we want to add a ceph-volume test, which would basically be the single-node test but using ceph-volume in different ways. Honestly, it all depends on how comprehensively we want to cover the ceph-volume tool. Also worth noting: when using filestore, ceph-volume (at least previously) required a journal partition to be defined.

@aij
Contributor Author

aij commented Dec 29, 2020

The single-node test is still testing the "old style" non-LVM volume for osd0. I was originally planning to test ceph-volume with both filestore and bluestore, but I ran into the required separate journal too. Now that I think about it, that should be pretty easy to add to the single-node test.

The multi-node test is currently only using bluestore because I wanted to test bluestore without the xfs module loaded, and I wasn't sure if I should refactor the node configurations to make them less uniform.

How much do we need to worry about resource usage in these tests? I noticed they weren't blocking nixos-unstable updates from happening when ceph was broken, and I kind of want to write a more thorough / bigger HA cluster test, but I also wouldn't want to decrease the chances of tests getting included in regular hydra runs.

ceph-volume insists on using an external XFS journal when using
filestore.
@stale

stale bot commented Jun 28, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 28, 2021
@flokli
Contributor

flokli commented Jun 28, 2021

@aij any chance you can pick this up?

As far as resource usage goes, even if the tests can't be included in the regular Hydra runs, having them to validate that version bumps don't break anything is still super valuable.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 28, 2021
@aij
Contributor Author

aij commented Jun 29, 2021

Since bfc11c6 added bluestore tests, I'm not sure how useful this MR is any more.

This does use ceph-volume, but it doesn't seem optimal, since it has to be run with --no-systemd to stop it from trying to edit systemd files, which on NixOS are managed by Nix...

Is it worth trying to update this, or should I just close it? I've been a bit low on time, but should have some time next week when I might be able to revisit it.

@flokli
Contributor

flokli commented Jul 3, 2021

Yeah, I wasn't aware bfc11c6 had landed.

In general, it seems the ceph-volume commands are way less boilerplate-y, avoiding all the manual state poking in /var/lib/ceph/osd/$osd-name.

Maybe this could be rebased, updating ceph-single-node-bluestore.nix and ceph-single-node.nix to use ceph-volume?

It could even join the two files, introducing a withBluestore boolean flag - the code seems mostly redundant.
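
To make the idea concrete, here is a minimal Python sketch (NixOS test scripts are written in Python) of how a single withBluestore-style flag could select the ceph-volume invocation. The function name, parameter names and device paths are hypothetical and not taken from this PR. The options used (--no-systemd, --data, --bluestore, --filestore, --journal) are assumed to behave as described in the ceph-volume documentation and have not been checked against the Ceph version under test.

```python
from typing import List, Optional


def ceph_volume_create_cmd(
    data_dev: str,
    with_bluestore: bool,
    journal_dev: Optional[str] = None,
) -> List[str]:
    """Build a `ceph-volume lvm create` argument list for one OSD.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    # --no-systemd: on NixOS the systemd units are managed by Nix,
    # so ceph-volume should not try to create or enable them.
    cmd = ["ceph-volume", "lvm", "create", "--no-systemd", "--data", data_dev]
    if with_bluestore:
        cmd.append("--bluestore")
    else:
        # As noted in the thread, filestore needs a separate journal.
        if journal_dev is None:
            raise ValueError("filestore needs a separate journal device")
        cmd += ["--filestore", "--journal", journal_dev]
    return cmd


# Example device paths are placeholders for whatever disks the test VM has.
print(" ".join(ceph_volume_create_cmd("/dev/vdb", with_bluestore=True)))
print(" ".join(ceph_volume_create_cmd("/dev/vdb", False, "/dev/vdc")))
```

In a merged test, the resulting command string would presumably be passed to the test driver (for example via machine.succeed), with the boolean coming from the Nix side of the test definition.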

@stale

stale bot commented Jan 3, 2022

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 3, 2022
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
@wegank wegank marked this pull request as draft March 25, 2024 16:12
5 participants