New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ceph: 12.2.7->13.2.4 #49866
ceph: 12.2.7->13.2.4 #49866
Conversation
cc @lejonet if you're still Ceph-knowledgeable. :) |
I've gone through the PR and see no shenaningans you haven't addressed. I fully agree with the name changes on the variables in the service module and thank you for removing the mgr requirement that I have forgotten to fix with a PR of my own (@srhb and I have talked about that one before, it was a naive assumption from my part when making the module, because I was only testing single node configuration). I'd ofc also like an answer of the extraUsers vs users, because from my understanding, they should be the same and equivalent to each other but I never found any docs telling which should be preferred, which is why I chose the ones without extra. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Biggest thing to note regarding maintaining only one version is that every other major ceph release is a LTS release. This means that all security fixes get backported to that release until the next LTS release. So there is an argument for having 1 LTS package and one "current" package for ceph, but as they are going through some changes with the build-env and such, to make it much easier to package and maintain, just having one release atm makes sense.
I'm putting a |
Outstanding issues:
|
@krav Do you want to address the points above, or would you rather one of lejonet or I do it? :) I don't want to force push to this branch without your go-ahead. |
I will see if I can get some digging done on it tonight, I agree with you that even if that sledgehammer works, it feels appropriate to find the root cause of it. So that we can fix the actual problem instead of hiding the symptoms. |
guys, @lejonet: any news on this? |
I sadly was hit with a lot of real life so I haven't been able to do more digging than we did at this time, @srhb might have done more but iirc theu also were hit with real life |
While I personally would prefer to have Nautilus (14), Mimic(13) is an LTS release, so there's an argument to be made for focusing on getting that working instead. |
@lordcirth Oh.. I thought Nautilus was LTS. Appently not. In that case, I agree with you. |
Yes, it's noted here in the .0 changelogs: http://docs.ceph.com/docs/master/releases/nautilus/#v14-2-0-nautilus |
I've spent some time in testing this PR out and it builds issue, but then I haven't tried a real setup and I stopped looking into this ....I was trying to understand if that problem with the dashboard is still there but I'm kind of a newbie on ceph and there is a ton of stuff in here ;-).. anyway, if I can be of any help, let me know |
v13.2.6 just landed:
|
@johanot as @lordcirth noted, mimic is a LTS and thus there is value to ensure that this PR gets merged first. Mainly because it comes with some needed "upgrades" to the service module, in my own opinion at least, and secondly because its more or less done. "The only thing" that needs to be fixed iirc is the dashboard plugin. The problem there is that they are using an embedded python interprenter, which they instrument through the CPython API. This means that our munging with env vars and so, to get stuff to find the relevant python env for it, doesn't have any effect. With that said, in 14.2 they have refactored the dashboard plugin to be its own thing, and thus isn't "a part of" standard ceph anymore so I would suggest that minor work should be done with this PR to get it to 13.2.6. Potentially using the sledgehammer approach that @srhb found in #49866 (comment) Work should then be done to get us up to using python3 and potentially 14.2.x (might be possible in the same PR, I haven't scrutinized the changes between 13 and 14). Edit: |
Ok.. I've managed to bump to 13.2.6 here: master...johanot:ceph-13 .. But unfortunately it required applying the sledgehammer a couple of times again, removing two more cherrypy version checks. :( (johanot@1534794) Without hesitation I've deployed this branch to our production env. Will try to test it sufficiently during the coming week. |
Is there any news on this? Old ceph is currently marked as broken. |
The news from my side is that we are now on ceph 14 in production. The ceph 14 upgrade (which I have on another branch somewhere) is largely based on the changes in this PR, and while we ran ceph 13 (johanot@1534794) in prod, we saw no issues. So basically, I'm ok with my changes getting merged, sledgehammering included, but... I neither have permissions to push nor merge. |
@johanot Maybe, if you have time, it would be good to open a separate PR for Ceph 14 and reference this PR? |
@@ -1,4 +1,4 @@ | |||
import ./make-test.nix ({pkgs, ...}: rec { | |||
import ./make-test.nix ({pkgs, lib, ...}: rec { | |||
name = "All-in-one-basic-ceph-cluster"; | |||
meta = with pkgs.stdenv.lib.maintainers; { | |||
maintainers = [ lejonet ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you like to be a maintainer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I should add myself to maintainers. So yes, I want to be a maintainer.
|
||
rgwMimeTypesFile = mkOption { | ||
type = with types; nullOr path; | ||
default = "${pkgs.mime-types}/etc/mime.types"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use defaultText:
nixpkgs/nixos/modules/config/krb5/default.nix
Lines 86 to 87 in 2e869f7
default = pkgs.krb5Full; | |
defaultText = "pkgs.krb5Full"; |
@@ -51,6 +51,9 @@ import ./make-test.nix ({pkgs, ...}: rec { | |||
enable = true; | |||
daemons = [ "0" "1" ]; | |||
}; | |||
|
|||
# So that we don't have to battle systemd when bootstraping | |||
systemd.targets.ceph.wantedBy = lib.mkForce []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean bootstrapping a new cluster is tricky?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both yes and no. The main reason I force the target off in the test is to not get a bunch of errors related to daemons not being able to find their files, because they haven't been bootstrapped yet, to clutter the test output. The errors are harmless but made it significantly harder to find actual errors in the output.
I can do that tonight if you don't just want to open a separate PR with those changes. :) |
@srhb as we talked about in another channel, I'll try to assemble a new branch and PR it separately. I can say already that ceph 13.2.6 doesn't build after rebasing with master :P I'll be back at some point once I get that sorted. |
TLDR; there's a collision on python2 backports on master, preventing ceph from building. I decided that instead of wasting time fiddling around with the python2-env, perhaps we should try to have a go with ceph 14, which uses python3. Please see #65470 and please comment and review over there. |
Included in #65470 -- thanks again 😄 |
@lejonet Can you explain this a bit further? I'm trying to get the dashboard to work with Ceph 15.2.3, and get:
which is this. I do not understand why it cannot find the What does the CPython API do different that results in this breakage? |
I can see in getdents64(22</nix/store/...-ceph-15.2.3/lib/python3.7/site-packages/ceph-1.0.0-py3.7.egg>, [{d_ino=1317894, d_off=783223069217015699, d_reclen=32, d_type=DT_DIR, d_name="EGG-INFO"}, {d_ino=1194113, d_off=3081453319264304844, d_reclen=24, d_type=DT_DIR, d_name=".."}, {d_ino=1317901, d_off=5351582700327092632, d_reclen=24, d_type=DT_DIR, d_name="ceph"}, {d_ino=1317893, d_off=9223372036854775807, d_reclen=24, d_type=DT_DIR, d_name="."}], 32768) = 104
getdents64(22</nix/store/...-ceph-15.2.3/lib/python3.7/site-packages/ceph-1.0.0-py3.7.egg>, [], 32768) = 0
close(22</nix/store/...-ceph-15.2.3/lib/python3.7/site-packages/ceph-1.0.0-py3.7.egg>) = 0 But then it never does more than that, it never goes into |
Looks like I can work around it by putting WIth that I get a successful
|
Also I found that some nixpkgs python packages do rm -f "$out/lib/${python.libPrefix}"/site-packages/site.py* but i don't know why and none of them have comments on it. @FRidh @jonringer Do you know? Maybe it's related? |
When I run (inspection tip from @energizah /
That looks like what it needs (putting I can also see:
But I'm not sure what that does. |
@nh2 It was a while since I did the deep dive into the whole mess that is cephs handling of python modules with Cpython, but if I remember correctly, due to how they initialized the actual interprenter, the PYTHONPATH env variable did not propagate to the actual interprenter that it then starts all modules with. This is a "working as intended" of the Cpython API, iirc it even said in the documentation around the functions they used that if you wanted to preserve env vars, like PYTHONPATH, you had to explicitly set them ahead of invoking the Cpython interprenter. |
Motivation for this change
This upgrades ceph to the latest version. Closes #42830.
There are caveats:
Additionally I added a
client
output containing just enough to managerbd
mounts (modelled on debian'sceph-common
) and fixed some permission and path stuff in the module (see commit).Ping @nh2, @wkennington, @adev, @ak3n.
Things done
sandbox
innix.conf
on non-NixOS)nix-shell -p nox --run "nox-review wip"
./result/bin/
)nix path-info -S
before and after)