New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mesos: 1.0.1 -> 1.1.0 #21416
mesos: 1.0.1 -> 1.1.0 #21416
Conversation
I've fixed a few of the problems outlined in #19064, but I still need to figure out how to fix the python bindings. |
Regarding the python problems: I think the crash might have been introduced when we moved to protobuf 2.6 (from 2.5) when we moved to Mesos 1.0.1. With the protobuf 2.6 python bindings, we pass I don't know what the fix is, but I want to try the following:
|
So, I've built 0.28.2 to compare (from 7b1efbf), and I've also taken a look at the 1.1.0 deb package. Here's how things look: 0.28.2
1.1.0 (NixOS)
1.1.0 (deb pkg)
One thing that stands out is that in both the 0.28.2 package and the debian package of 1.1.0: libprotobuf is statically linked. I suspect this is due to telling the mesos build system to use our package of protobuf, as opposed to using the bundled version. I'll see if building with the bundled version resolves things, and if that works, I'll see if I can write a patch to have mesos always statically link the lib (and then I'll see if I can upstream that patch). |
33dd855
to
a6cbaec
Compare
@kamilchm I fixed the libprotobuf+python problems by statically linking libprotobuf:
I'm running one more build to sanity check. However, I'm pretty sure that the package is now in a state that I feel comfortable asking for testing. So for anyone following this PR, please do try it out and let me know if you run into any issues. In the meantime, I'm going to try to write some tests. |
a6cbaec
to
2b40278
Compare
@kamilchm I'm getting an error that, from looking through the IRC logs, I see you too were running into:
Interestingly, if I override the docker path in the following ways and I still get the same "Aborted" error:
So it's clearly not a problem invoking I've sprinkled in some LOG statements, and I'll see if I can figure out where precisely it's failing, and then I'll try to step through with gdb. |
So, it's definitely failing at the |
Ah, that's because, per the problems mentioned in the previous PR, I removed the patch to set For the curious, here's what gets invoked:
|
fba5a15
to
e053900
Compare
@cstrahan It was some time ago when I was trying to run different NixOS configuration. However, I would stick with nixos tests which I introduced later. If we find new bugs these needs to be added to the test module. |
I resolved the I'm still adapting @kamilchm's tests, as I'm running into issues with the "docker" containerizer (it really wants to connect to the internet to fetch the test image, despite having already |
Oh, so I made some bad assumptions, and I need to patch out some more paths. I'm going to let mesos build overnight, and then I'll update this tomorrow morning. Hopefully that's the last big change before I have the new tests working. |
Hm... So I'm having a hard time deciding what to do about Another option, and the one I've been experimenting with up to now, is to patch the source so When creating images with our |
I think, for now, I'm going to add |
It reminds me my confusion about |
I might switch to wrappers... I'll have to give it some thought. I don't like the indirection, but the simplicity of that approach might outweigh any objections to wrappers. |
We shouldn't place additional restrictions on containers, outside of what Mesos does itself. We should always be able to run the same containers on a NixOS Mesos as on a Mesosphere Mesos, etc. I agree that forcing the container shell to |
e053900
to
d5273dc
Compare
@philip-wernersbach Thanks for the feedback! Are you using Mesos on NixOS? Curious how things have been working for you, if so. |
As of the last push, @kamilchm's test cases pass (with some minor edits, e.g. using mesos executor instead of docker). There are also a handful of new module options that I've found useful while testing Mesos locally and in the NixOS tests. |
d5273dc
to
62b530d
Compare
@cstrahan No, we're using Mesos 1.0.1 on CentOS, installed via Nix and Disnix (@svanderburg). Except for the problems noted here and here, which I've fixed with hacks, everything is working well for our use case. Which is using Marathon with the Mesos containerizer. |
@philip-wernersbach gotcha. The second issue(s) should be fixed now, but I'll need to check the SSL/curl problem. What was the hack you had to do to fix it? |
62b530d
to
dab9464
Compare
dab9464
to
7ebcada
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested my use cases and it works for me.
Great work that makes Mesos on NixOS fully usable as it was before 1.0!
master = "master:5050"; | ||
dockerRegistry = registry; | ||
executorEnvironmentVariables = { | ||
PATH = "/run/current-system/sw/bin"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be the default to make Mesos on NixOS behaves the sames as in other distros? @philip-wernersbach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kamilchm In the test framework in mesos_test.py
we set the command to cat test.result
, but that'll only work if cat
is accessible on $PATH
(in the original source (which remains unchanged, btw), the default mesos agent path is the typical /bin:/usr/bin:...etc..., so on normal distros cat
and such would be on $PATH
). So, to keep our test simple (that is, to avoid patching in the full path to cat
in mesos_test.py
) we mimic how mesos would run on other distros by using /run/current-system/sw/bin
in this test case.
Alternatively, we could set the default to /run/current-system/sw/bin
(e.g. via the executorEnvironmentVariables.PATH
) so stuff like coreutils
are accessible by default, and then we could remove this line.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think it's reasonable to keep it explicit.
self.download_uri = sys.argv[2] | ||
|
||
def resourceOffers(self, driver, offers): | ||
log("XXX got resource offer") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XXX -> NixosTestScheduler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll make that change before merging.
}; | ||
|
||
patches = [ | ||
# https://reviews.apache.org/r/36610/ | ||
# TODO: is this still needed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this patch is no longer needed since https://reviews.apache.org/r/46730/ was merged
Should this be merged? Or is it a WIP? |
I think this is, at this point, strictly an improvement on what we had before, and shouldn't introduce any regressions. So it should be safe to merge, though I do want to make some minor changes here and there. |
Ok, since you have merge rights I'll just leave it to you :) |
Alright, think I've waited long enough; merging. |
Motivation for this change
Update Mesos to 1.1.0.
Things done
(nix.useSandbox on NixOS,
or option
build-use-sandbox
innix.conf
on non-NixOS)
nix-shell -p nox --run "nox-review wip"
./result/bin/
)/cc @philip-wernersbach @kamilchm @benley @kevincox @ibrahimsag