Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python2: backport fix for pyc race condition, part 2 #64900

Merged
merged 1 commit into from Jul 17, 2019

Conversation

abbradar
Copy link
Member

@abbradar abbradar commented Jul 16, 2019

Motivation for this change

Turns out fixing this only in importlib is not sufficient and we
need to backport CPython part of the fix too.

This patch is based on https://hg.python.org/cpython/rev/c16063765d3a
but because the code around is different there are some changes (C-strings
instead of Python objects etc.)

With this patch Tensorflow builds successfully on many-core machine.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

I'd appreciate review of the patch given that I basically rewrote it.

@abbradar
Copy link
Member Author

I should also mention #64067 where the first part was introduced, but contrary to it this one isn't backported by any upstream distributions, reason unknown.

@timokau
Copy link
Member

timokau commented Jul 16, 2019

I'd appreciate review of the patch given that I basically rewrote it.

I can't give a comprehensive review due to a lack of time and I'm far from a C-developer myself anyway. But out of interest, why didn't you use my patch at https://github.com/NixOS/nixpkgs/pull/63616/files#diff-7af205fc79201d40af30ad4983bcc777?

@abbradar
Copy link
Member Author

Huh, I guess I... missed it? I'll compare them, thanks!

@timokau
Copy link
Member

timokau commented Jul 16, 2019

Well I did tell you that I pushed it, but its easy to miss 🤷‍♀️

Anyway, we definitely have to be a little more careful here than with #64067, since its's C, malloc/free is involved and there is no other-distro precedent.

@FRidh
Copy link
Member

FRidh commented Jul 16, 2019

Could we apply this patch only when building tensorflow?

@abbradar
Copy link
Member Author

@FRidh We could do that but it's a bug nevertheless, I feel it could show up in other massively parallel builds too. If there are any we'll have until the EOL that is ;)

@abbradar
Copy link
Member Author

I minimized the patch, also bringing it as close to the spirit of original patch as possible. Hopefully this helps with the review.

@abbradar
Copy link
Member Author

abbradar commented Jul 16, 2019

@FRidh Thinking more about it, could be problematic because Tensorflow will then link to its custom Python 2 library and then we'll have a conflicting Python library loaded into our usual interpreter.

EDIT: Fixable via patchelf though.

@abbradar
Copy link
Member Author

Reading more into the original patch: unless I'm mistaken there is actually a race condition in it too! Namely, in C (but not in Python) file is opened exclusively to prevent concurrent writes but then fclose()d before rename(), so another Python process can open .tmp file in this short period of time and begin writing while the file has already been renamed. This code isn't in final Python 3.4 release AFAICS so I suppose it has been replaced with something else.

Did I overlook something there? The fix is quite easy (move fclose() a bit further down) so I propose to do just that.

@timokau
Copy link
Member

timokau commented Jul 16, 2019

Could very well be a bug, probably wouldn't happen much in practice and the exclusive open was added as an afterthought.

@abbradar
Copy link
Member Author

Hehe, actually the bug that I described is present in Python right now but in a very light form - because id() is used the chance of this actually happening is pretty slim. Because rename() works differently on POSIX and Windows simply moving rename into with block wouldn't work for Windows here I think, so it's not that simple to fix. I'll report it but don't think it's nearly serious enough to fix it by ourselves given id() is 64-bit address of an object so the chances of a clash actually happening in such conditions are pretty slim.

Turns out fixing this only in importlib is not sufficient and we
need to backport CPython part of the fix too.

This patch is based on https://hg.python.org/cpython/rev/c16063765d3a
but because the code around is different there are some changes (C-strings
instead of Python objects etc.)

With this patch Tensorflow builds successfully on many-core machine.
@abbradar
Copy link
Member Author

Patch updated, tested this one through the night running Tensorflow compilations.

@FRidh FRidh self-assigned this Jul 17, 2019
@FRidh
Copy link
Member

FRidh commented Jul 17, 2019

could be problematic because Tensorflow will then link to its custom Python 2 library and then we'll have a conflicting Python library loaded into our usual interpreter.

EDIT: Fixable via patchelf though.

Yes, did the same with tkinter. Well, let's just see how it turns out. If we get trouble that may be the way forward.

@FRidh FRidh merged commit da295a1 into NixOS:staging Jul 17, 2019
@abbradar
Copy link
Member Author

Checked out tkinter - nice hack! I wonder if something similar could be done for things like Java (headless + GUI libraries as modules of some kind)...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants