Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ceph: 12.2.7->13.2.4 #49866

Closed
wants to merge 4 commits into from
Closed

ceph: 12.2.7->13.2.4 #49866

wants to merge 4 commits into from

Conversation

krav
Copy link
Contributor

@krav krav commented Nov 7, 2018

Motivation for this change

This upgrades ceph to the latest version. Closes #42830.

There are caveats:

  • The dashboard is not included. This seems to be in line with other distributions, upstream master has a less byzantine build system for it coming.
  • The package builds its own boost, I couldn't get it to locate ours.
  • I ended up making changes to generic.nix that probably would break ceph 12, and I didn't see a reason to maintain several simultaneous versions, so I merged it with default.nix (in it's own commit).

Additionally I added a client output containing just enough to manage rbd mounts (modelled on debian's ceph-common) and fixed some permission and path stuff in the module (see commit).

Ping @nh2, @wkennington, @adev, @ak3n.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

nixos/modules/services/network-filesystems/ceph.nix Outdated Show resolved Hide resolved
nixos/modules/services/network-filesystems/ceph.nix Outdated Show resolved Hide resolved
nixos/modules/services/network-filesystems/ceph.nix Outdated Show resolved Hide resolved
pkgs/tools/filesystems/ceph/default.nix Outdated Show resolved Hide resolved
pkgs/tools/filesystems/ceph/default.nix Outdated Show resolved Hide resolved
pkgs/tools/filesystems/ceph/default.nix Outdated Show resolved Hide resolved
@srhb
Copy link
Contributor

srhb commented Nov 9, 2018

cc @lejonet if you're still Ceph-knowledgeable. :)

@lejonet
Copy link
Contributor

lejonet commented Nov 9, 2018

I've gone through the PR and see no shenaningans you haven't addressed. I fully agree with the name changes on the variables in the service module and thank you for removing the mgr requirement that I have forgotten to fix with a PR of my own (@srhb and I have talked about that one before, it was a naive assumption from my part when making the module, because I was only testing single node configuration).

I'd ofc also like an answer of the extraUsers vs users, because from my understanding, they should be the same and equivalent to each other but I never found any docs telling which should be preferred, which is why I chose the ones without extra.

Copy link
Contributor

@lejonet lejonet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Biggest thing to note regarding maintaining only one version is that every other major ceph release is a LTS release. This means that all security fixes get backported to that release until the next LTS release. So there is an argument for having 1 LTS package and one "current" package for ceph, but as they are going through some changes with the build-env and such, to make it much easier to package and maintain, just having one release atm makes sense.

@krav krav requested a review from infinisil as a code owner November 12, 2018 08:16
@krav krav changed the title ceph: 12.2.7->13.2.2 wip: ceph: 12.2.7->13.2.2 Nov 22, 2018
@krav
Copy link
Contributor Author

krav commented Nov 22, 2018

I'm putting a wip on this. Just discovered that ceph-volume has disappeared from the output, and I bet some people find that tool useful.

@krav krav changed the title wip: ceph: 12.2.7->13.2.2 ceph: 12.2.7->13.2.2 Nov 26, 2018
@srhb srhb self-assigned this Feb 19, 2019
@srhb
Copy link
Contributor

srhb commented Feb 21, 2019

Outstanding issues:

  • Check that things that depend on ceph still work after the changes to all-packages
  • Rebase on master/unstable (and remove the extraneous argparse dep)
  • ?
  • Squash all the things

@srhb srhb changed the title ceph: 12.2.7->13.2.2 ceph: 12.2.7->13.2.4 Feb 21, 2019
@srhb
Copy link
Contributor

srhb commented Feb 25, 2019

@krav Do you want to address the points above, or would you rather one of lejonet or I do it? :) I don't want to force push to this branch without your go-ahead.

@lejonet
Copy link
Contributor

lejonet commented Mar 5, 2019

This (sledgehammer) appears to work, but I'd really prefer to figure out why cherrypy reports a wrong version, but only in the (weird?) ceph env.

srhb@ef3a380

I will see if I can get some digging done on it tonight, I agree with you that even if that sledgehammer works, it feels appropriate to find the root cause of it. So that we can fix the actual problem instead of hiding the symptoms.

@azazel75
Copy link
Contributor

azazel75 commented May 8, 2019

guys, @lejonet: any news on this?

@lejonet
Copy link
Contributor

lejonet commented May 9, 2019

I sadly was hit with a lot of real life so I haven't been able to do more digging than we did at this time, @srhb might have done more but iirc theu also were hit with real life

@johanot
Copy link
Contributor

johanot commented Jun 4, 2019

@srhb @lejonet I finally managed to get some time at work allocated for ceph. Where do you suggest I start? I guess we might just jump directly to Ceph 14 now, but question is whether we should try and get this PR merged first thing?

@lordcirth
Copy link
Contributor

lordcirth commented Jun 4, 2019

While I personally would prefer to have Nautilus (14), Mimic(13) is an LTS release, so there's an argument to be made for focusing on getting that working instead.

@johanot
Copy link
Contributor

johanot commented Jun 4, 2019

@lordcirth Oh.. I thought Nautilus was LTS. Appently not. In that case, I agree with you.

@lordcirth
Copy link
Contributor

lordcirth commented Jun 4, 2019

@lordcirth Oh.. I thought Nautilus was LTS. Appently not. In that case, I agree with you.

Yes, it's noted here in the .0 changelogs:

http://docs.ceph.com/docs/master/releases/nautilus/#v14-2-0-nautilus
http://docs.ceph.com/docs/master/releases/mimic/#v13-2-0-mimic

@azazel75
Copy link
Contributor

azazel75 commented Jun 4, 2019

I've spent some time in testing this PR out and it builds issue, but then I haven't tried a real setup and I stopped looking into this ....I was trying to understand if that problem with the dashboard is still there but I'm kind of a newbie on ceph and there is a ton of stuff in here ;-).. anyway, if I can be of any help, let me know

@lordcirth
Copy link
Contributor

lordcirth commented Jun 4, 2019

v13.2.6 just landed:

Notable Changes
---------------
* Ceph v13.2.6 now packages python bindings for python3.6 instead of
  python3.4, because EPEL7 recently switched from python3.4 to
  python3.6 as the native python3. See the announcement[1] _`
  for more details on the background of this change.

For a detailed changelog, please refer to the official blog post entry
at https://ceph.com/releases/v13-2-6-mimic-released/

[1]: https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO

@lejonet
Copy link
Contributor

lejonet commented Jun 4, 2019

@johanot as @lordcirth noted, mimic is a LTS and thus there is value to ensure that this PR gets merged first. Mainly because it comes with some needed "upgrades" to the service module, in my own opinion at least, and secondly because its more or less done.

"The only thing" that needs to be fixed iirc is the dashboard plugin. The problem there is that they are using an embedded python interprenter, which they instrument through the CPython API. This means that our munging with env vars and so, to get stuff to find the relevant python env for it, doesn't have any effect.
This in turn results in the env being lost between the loading of a module and executing it, which leads to the dashboard plugins usage of cherrypy failing because it cannot find the .egg file to do a version check that enables some monkeypatching on very old versions of cherrypy.

With that said, in 14.2 they have refactored the dashboard plugin to be its own thing, and thus isn't "a part of" standard ceph anymore so I would suggest that minor work should be done with this PR to get it to 13.2.6. Potentially using the sledgehammer approach that @srhb found in #49866 (comment)

Work should then be done to get us up to using python3 and potentially 14.2.x (might be possible in the same PR, I haven't scrutinized the changes between 13 and 14).

Edit:
If we focus to get this to 13.2.6 and getting it merged, then there might be time to take the "should we have both LTS and latest ceph or just one of them" discussion.
I can see a value of having both, and hopefully they won't do any more major overhauls to the build-system that makes having both versions a nightmare to maintain.

@johanot
Copy link
Contributor

johanot commented Jun 16, 2019

Ok.. I've managed to bump to 13.2.6 here: master...johanot:ceph-13 .. But unfortunately it required applying the sledgehammer a couple of times again, removing two more cherrypy version checks. :( (johanot@1534794)

Without hesitation I've deployed this branch to our production env. Will try to test it sufficiently during the coming week.

@devhell
Copy link
Contributor

devhell commented Jul 18, 2019

Is there any news on this? Old ceph is currently marked as broken.

@johanot
Copy link
Contributor

johanot commented Jul 18, 2019

The news from my side is that we are now on ceph 14 in production. The ceph 14 upgrade (which I have on another branch somewhere) is largely based on the changes in this PR, and while we ran ceph 13 (johanot@1534794) in prod, we saw no issues. So basically, I'm ok with my changes getting merged, sledgehammering included, but... I neither have permissions to push nor merge.

@devhell
Copy link
Contributor

devhell commented Jul 20, 2019

@johanot Maybe, if you have time, it would be good to open a separate PR for Ceph 14 and reference this PR?

@@ -1,4 +1,4 @@
import ./make-test.nix ({pkgs, ...}: rec {
import ./make-test.nix ({pkgs, lib, ...}: rec {
name = "All-in-one-basic-ceph-cluster";
meta = with pkgs.stdenv.lib.maintainers; {
maintainers = [ lejonet ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to be a maintainer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I should add myself to maintainers. So yes, I want to be a maintainer.


rgwMimeTypesFile = mkOption {
type = with types; nullOr path;
default = "${pkgs.mime-types}/etc/mime.types";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use defaultText:

default = pkgs.krb5Full;
defaultText = "pkgs.krb5Full";

@@ -51,6 +51,9 @@ import ./make-test.nix ({pkgs, ...}: rec {
enable = true;
daemons = [ "0" "1" ];
};

# So that we don't have to battle systemd when bootstraping
systemd.targets.ceph.wantedBy = lib.mkForce [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean bootstrapping a new cluster is tricky?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both yes and no. The main reason I force the target off in the test is to not get a bunch of errors related to daemons not being able to find their files, because they haven't been bootstrapped yet, to clutter the test output. The errors are harmless but made it significantly harder to find actual errors in the output.

@johanot
Copy link
Contributor

johanot commented Jul 25, 2019

@grahamc @srhb Could one of you please rebase this branch with master and perhaps my ceph-13 branch as well? Then we can do another round of integration-testing and a collective review of all the changes.

@srhb
Copy link
Contributor

srhb commented Jul 25, 2019

I can do that tonight if you don't just want to open a separate PR with those changes. :)

@johanot
Copy link
Contributor

johanot commented Jul 25, 2019

@srhb as we talked about in another channel, I'll try to assemble a new branch and PR it separately. I can say already that ceph 13.2.6 doesn't build after rebasing with master :P I'll be back at some point once I get that sorted.

@johanot johanot mentioned this pull request Jul 27, 2019
10 tasks
@johanot
Copy link
Contributor

johanot commented Jul 27, 2019

TLDR; there's a collision on python2 backports on master, preventing ceph from building. I decided that instead of wasting time fiddling around with the python2-env, perhaps we should try to have a go with ceph 14, which uses python3. Please see #65470 and please comment and review over there.

@srhb
Copy link
Contributor

srhb commented Sep 4, 2019

Included in #65470 -- thanks again 😄

@srhb srhb closed this Sep 4, 2019
@nh2
Copy link
Contributor

nh2 commented Jun 21, 2020

"The only thing" that needs to be fixed iirc is the dashboard plugin. The problem there is that they are using an embedded python interprenter, which they instrument through the CPython API. This means that our munging with env vars and so, to get stuff to find the relevant python env for it, doesn't have any effect.
This in turn results in the env being lost between the loading of a module and executing it

@lejonet Can you explain this a bit further?

I'm trying to get the dashboard to work with Ceph 15.2.3, and get:

10 mgr[py] Computed sys.path '/nix/store/ahn8d1bnkrvj7gbxc90cwqq037g740rs-ceph-15.2.3/lib/python3.7/site-packages:/nix/store/4naw242f2izj1rs55hxcc66zifqcqbn0-python3-3.7.6-env/lib/python3.7/site-packages:/nix/store/55b9ip7xkpimaccw9pa0vacy5q94f5xa-python3-3.7.6/lib/python37.zip:/nix/store/55b9ip7xkpimaccw9pa0vacy5q94f5xa-python3-3.7.6/lib/python3.7:/nix/store/55b9ip7xkpimaccw9pa0vacy5q94f5xa-python3-3.7.6/lib/python3.7/lib-dynload:/nix/store/icpiyydgbnp62pvdh96ff0g4by2zy145-ceph-unwrapped-15.2.3/share/ceph/mgr:/nix/store/55b9ip7xkpimaccw9pa0vacy5q94f5xa-python3-3.7.6/lib/python3.7/site-packages'
-1 mgr[py] Module not found: 'cephadm'
-1 mgr[py] Traceback (most recent call last):
  File "/nix/store/icpiyydgbnp62pvdh96ff0g4by2zy145-ceph-unwrapped-15.2.3/share/ceph/mgr/cephadm/__init__.py", line 6, in <module>
    from .module import CephadmOrchestrator
  File "/nix/store/icpiyydgbnp62pvdh96ff0g4by2zy145-ceph-unwrapped-15.2.3/share/ceph/mgr/cephadm/module.py", line 31, in <module>
    from ceph.deployment import inventory, translate
ModuleNotFoundError: No module named 'ceph'

-1 mgr[py] Class not found in module 'cephadm'

which is this.

I do not understand why it cannot find the ceph module, because it uses PySys_SetPath(get_site_packages()) which correctly includes /nix/store/...-ceph-15.2.3/lib/python3.7/site-packages/ (which contains ceph-1.0.0-py3.7.egg/ceph/__init__.py), and if I set PYTHONPATH to this for my python3 and import ceph, that works just fine.

What does the CPython API do different that results in this breakage?

@nh2
Copy link
Contributor

nh2 commented Jun 21, 2020

I can see in strace that the maximum it does with ceph-1.0.0-py3.7.egg is to read the contents of this dir:

getdents64(22</nix/store/...-ceph-15.2.3/lib/python3.7/site-packages/ceph-1.0.0-py3.7.egg>, [{d_ino=1317894, d_off=783223069217015699, d_reclen=32, d_type=DT_DIR, d_name="EGG-INFO"}, {d_ino=1194113, d_off=3081453319264304844, d_reclen=24, d_type=DT_DIR, d_name=".."}, {d_ino=1317901, d_off=5351582700327092632, d_reclen=24, d_type=DT_DIR, d_name="ceph"}, {d_ino=1317893, d_off=9223372036854775807, d_reclen=24, d_type=DT_DIR, d_name="."}], 32768) = 104
getdents64(22</nix/store/...-ceph-15.2.3/lib/python3.7/site-packages/ceph-1.0.0-py3.7.egg>, [], 32768) = 0
close(22</nix/store/...-ceph-15.2.3/lib/python3.7/site-packages/ceph-1.0.0-py3.7.egg>) = 0

But then it never does more than that, it never goes into ceph-1.0.0-py3.7.egg/ceph/.

@nh2
Copy link
Contributor

nh2 commented Jun 21, 2020

Looks like I can work around it by putting $(toPythonPath $out)/ceph-1.0.0-py3.7.egg straight onto the PYTHONPATH, I'm not sure why that's necessary though, and it's ugly.

WIth that I get a successful

2020-06-21T01:47:43.210+0000 7fa9c759ff40  1 mgr[py] Loading python module 'dashboard'

@nh2
Copy link
Contributor

nh2 commented Jun 21, 2020

Also I found that some nixpkgs python packages do

rm -f "$out/lib/${python.libPrefix}"/site-packages/site.py*

but i don't know why and none of them have comments on it. @FRidh @jonringer Do you know?

Maybe it's related?

@nh2
Copy link
Contributor

nh2 commented Jun 21, 2020

When I run (inspection tip from @energizah / energizer on IRC):

# PYTHONPATH=/nix/store/1fy23hzpll8kx160z1mb502bq7rrirzc-ceph-15.2.3/lib/python3.7/site-packages/ python3 -m site
sys.path = [
    '/root',
    '/nix/store/1fy23hzpll8kx160z1mb502bq7rrirzc-ceph-15.2.3/lib/python3.7/site-packages',
    '/nix/store/1fy23hzpll8kx160z1mb502bq7rrirzc-ceph-15.2.3/lib/python3.7/site-packages/ceph_volume-1.0.0-py3.7.egg',
    '/nix/store/1fy23hzpll8kx160z1mb502bq7rrirzc-ceph-15.2.3/lib/python3.7/site-packages/ceph-1.0.0-py3.7.egg',
    '/nix/store/55b9ip7xkpimaccw9pa0vacy5q94f5xa-python3-3.7.6/lib/python37.zip',
    '/nix/store/55b9ip7xkpimaccw9pa0vacy5q94f5xa-python3-3.7.6/lib/python3.7',
    '/nix/store/55b9ip7xkpimaccw9pa0vacy5q94f5xa-python3-3.7.6/lib/python3.7/lib-dynload',
    '/nix/store/55b9ip7xkpimaccw9pa0vacy5q94f5xa-python3-3.7.6/lib/python3.7/site-packages',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.7/site-packages' (doesn't exist)
ENABLE_USER_SITE: True

That looks like what it needs (putting ceph-1.0.0-py3.7.egg onto the path), but I don't understand why that's not in effect, and why it would be bad.

I can also see:

# cat /nix/store/1fy23hzpll8kx160z1mb502bq7rrirzc-ceph-15.2.3/lib/python3.7/site-packages/easy-install.pth 
./ceph_volume-1.0.0-py3.7.egg
./ceph-1.0.0-py3.7.egg

But I'm not sure what that does.

@lejonet
Copy link
Contributor

lejonet commented Jun 21, 2020

@nh2 It was a while since I did the deep dive into the whole mess that is cephs handling of python modules with Cpython, but if I remember correctly, due to how they initialized the actual interprenter, the PYTHONPATH env variable did not propagate to the actual interprenter that it then starts all modules with. This is a "working as intended" of the Cpython API, iirc it even said in the documentation around the functions they used that if you wanted to preserve env vars, like PYTHONPATH, you had to explicitly set them ahead of invoking the Cpython interprenter.
And that is ofc not done by cephs code, but its interesting that you could get $(toPythonPath $out)/ceph-1.0.0-py3.7.egg to work with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to Ceph 13
10 participants