
Remove limits on output file size. #256

Merged
merged 3 commits into NixOS:master on Sep 13, 2021

Conversation

andrewla
Contributor

@andrewla andrewla commented Dec 5, 2020

The use of self-pointers into a vector<> meant that the vector could never be relocated, so any growth in the file had to fit within the initial reservation.

Now, instead of storing a pointer to the header and a pointer to the contents, those addresses are computed on demand from the underlying vector.
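
Roughly, the shape of the change is as follows (an illustrative sketch with simplified stand-in names, not the actual patchelf.cc code):

// Sketch only: ElfFileSketch and its members are simplified stand-ins, not the real patchelf types.
#include <cstddef>
#include <vector>
#include <elf.h>

struct ElfFileSketch {
    std::vector<unsigned char> fileContents;   // may reallocate as the file grows

    // Before: a cached Elf64_Ehdr * member would dangle once fileContents reallocated.
    // After: recompute the header address from the vector on every access.
    Elf64_Ehdr * hdr() {
        return reinterpret_cast<Elf64_Ehdr *>(fileContents.data());
    }

    void growFile(std::size_t newSize) {
        if (newSize > fileContents.size())
            fileContents.resize(newSize, 0);   // safe: nothing holds pointers into the old buffer
    }
};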

I made an attempt at writing a test that would provide coverage, but couldn't make it work. If you have suggestions for writing a test that exercises this functionality, please let me know and I'll give it a shot.

@andrewla
Contributor Author

andrewla commented Feb 2, 2021

Fixes #88
Fixes #62

@domenkozar
Member

@andrewla I know this is probably not helpful, but PRs with tests have a higher chance of getting merged.

I don't have any ideas for how to create a dummy large output, though.

@andrewla
Contributor Author

andrewla commented Feb 5, 2021

I'm going to put this on hold. As a stress test, I used this as an overlay for patchelf in nix, and I'm getting errors trying to build gcc.

It's not clear whether this is patchelf-related or some configuration problem, but for the moment I don't consider this safe to merge.

@dstahlke

FWIW, this patch seems to work for me. I tried it, and also tried just boosting the buffer size to 256MB; the output of each method was identical. Unfortunately I am not able to share my test case, for IP reasons. I haven't reviewed the changes to make sure there are no lingering pointers, but at least it passes valgrind.

@andrewla
Contributor Author

I'm all but positive that the errors encountered are related to how nix does bootstrapping of the compiler, rather than related to this change specifically.

@Mic92
Member

Mic92 commented Aug 21, 2021

Hey. I recently joined the patchelf maintainer team; if you could resolve the merge conflicts, I'll give this a test.

@andrewla
Contributor Author

andrewla commented Sep 3, 2021

I applied the latest changes. I get a strange test failure locally because it can't find the "strip" utility, but that appears to be an autoconf issue that is probably local to my machine; running strip manually (to produce libbig-dynstr.debug) makes the test pass.

@Mic92
Member

Mic92 commented Sep 4, 2021

Does make STRIP=strip fix that? It might be because of

STRIP ?= strip

@Mic92
Member

Mic92 commented Sep 4, 2021

So one way to test this is to link in a large binary blob (which can be created with dd): https://balau82.wordpress.com/2012/02/19/linking-a-binary-blob-with-gcc/ and then create a case where this section needs to be replaced.
I am currently working on a deadline for a different project, so I might look at this later unless you come up with a test first.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-work-around-patchelf-maximum-file-size-exceeded-error/14920/2

@trofi

trofi commented Sep 8, 2021

I think the problem is not the original binary size, but the limited growth that patchelf allows for. We can pick a tiny hello world and try to add 32MB of data to it. Say, 32MB of NEEDED libraries:

$ nix-shell -p patchelf

# very clumsy way to achieve many long "--add-needed AAAAAAAAA...." strings.
[nix-shell:/tmp]$ echo 'int main(){}' | gcc -x c - -o a -fPIC -pie; ./a; while patchelf --add-needed $(perl -e 'print "a" x 100000;') --add-needed $(perl -e 'print "b" x 100000;') --add-needed $(perl -e 'print "c" x 100000;') --add-needed $(perl -e 'print "d" x 100000;') --add-needed $(perl -e 'print "e" x 100000;') --add-needed $(perl -e 'print "f" x 100000') --add-needed $(perl -e 'print "g" x 100000') --add-needed $(perl -e 'print "h" x 100000') --add-needed $(perl -e 'print "i" x 100000') --add-needed $(perl -e 'print "j" x 100000') --add-needed $(perl -e 'print "k" x 100000') --add-needed $(perl -e 'print "l" x 100000') --add-needed $(perl -e 'print "m" x 100000') --add-needed $(perl -e 'print "m" x 100000') --add-needed $(perl -e 'print "o" x 100000') --add-needed $(perl -e 'print "p" x 100000') --add-needed $(perl -e 'print "q" x 100000') --add-needed $(perl -e 'print "r" x 100000') --add-needed $(perl -e 'print "s" x 100000') --add-needed $(perl -e 'print "t" x 100000') a; do echo again; done
again
again
again
again
again
again
again
again
again
again
again
again
again
again
again
again
again
patchelf: maximum file size exceeded

@samuela
Member

samuela commented Sep 8, 2021

Ran into this issue while attempting to package tensorflow 2.6.0:

Rewriting #!/nix/store/1g54cwl6m78rgrqsnn1pp29dbvldpvsd-python3-3.9.6/bin/python3.9 to #!/nix/store/1g54cwl6m78rgrqsnn1pp29dbvldpvsd-python3-3.9.6
wrapping `/nix/store/hcghxmsg8smr0fmdpq9nr2pfifgcz586-python3.9-tensorflow-gpu-2.6.0/bin/tflite_convert'...
patchelf: maximum file size exceeded
builder for '/nix/store/gkp1bdbkya9p5zgxrbbp70qks31zmsz3-python3.9-tensorflow-gpu-2.6.0.drv' failed with exit code 1

@samuela
Member

samuela commented Sep 8, 2021

Oddly enough, I've not been able to reproduce the failure outside of the nix build environment; I've been trying to provide a test case, but patchelf isn't failing outside of nix-build. In my postFixup:

for lib in ...; do
      echo "processing patchelf"
      cp "$lib" "$lib.orig"
      echo "$lib"
      echo $(patchelf --print-rpath "$lib")
      patchelf --set-rpath "${cudatoolkit}/lib:${cudatoolkit.lib}/lib:${cudnn}/lib:${nccl}/lib:$(patchelf --print-rpath "$lib")" "$lib"
done

And I get output:

processing patchelf
/nix/store/i5ywd496x0ppp1lifkfwncsmixdsvm9s-python3.9-tensorflow-gpu-2.6.0/lib/python3.9/site-packages/tensorflow/python/_pywrap_toco_api.so
/run/opengl-driver/lib:$ORIGIN/../../_solib_local/_U_S_Stensorflow_Spython_C_Upywrap_Utoco_Uapi.so___Utensorflow:$ORIGIN/../../_solib_local/_U_S_Stensorflow_Spython_C_Upywrap_Utensorflow_Uinternal_Ulinux___Utensorflow_Spython:$ORIGIN/:$ORIGIN/..:/nix/store/g3h82za6g1j3gwz1br0spzpx1f20hc3c-tensorflow-gpu-2.6.0/lib:/nix/store/9bh3986bpragfjmr32gay8p95k91q4gy-glibc-2.33-47/lib:/nix/store/ql152marawkw62kvkjskndxccffdfns1-gcc-9.3.0-lib/lib
patchelf: maximum file size exceeded
exitHandler
_callImplicitHook 0 failureHook
builder for '/nix/store/1cvkfzsnpiz31qp0wpimqgiqhgj95xf3-python3.9-tensorflow-gpu-2.6.0.drv' failed with exit code 1

(You can find the full code to reproduce in NixOS/nixpkgs#137069, nix-build -A tensorflowWithCuda.)

But when I try to test patchelf outside of nix-build:

lib="/nix/store/i5ywd496x0ppp1lifkfwncsmixdsvm9s-python3.9-tensorflow-gpu-2.6.0/lib/python3.9/site-packages/tensorflow/python/_pywrap_toco_api.so.orig"
cp "$lib" /tmp/deleteme.so
patchelf --set-rpath "/nix/store/7z8w0varw65bfvgqb82lm4isggvfap23-cudatoolkit-11.0.3/lib:/nix/store/94nh9ffcg1fxx9915nqyfv1928q3k1g0-cudatoolkit-11.0.3-lib/lib:/nix/store/hcddc6qcb7mdv61qd4i5cgjz0yb43cf6-cudatoolkit-11.0-cudnn-8.1.1/lib:/nix/store/mqnx8hl70sras9szypmbn6y8bq6fcb92-nccl-2.7.8-1-cuda-11.2/lib:$(patchelf --print-rpath "/tmp/deleteme.so")" "/tmp/deleteme.so"

But then... it works oddly enough 😕

But @trofi's test case looks simpler anyhow!

samuela added a commit to samuela/nixpkgs that referenced this pull request Sep 8, 2021
@samuela
Member

samuela commented Sep 9, 2021

Ok, I tried running this patched version of patchelf on my build, and I'm still seeing the same error. So either there's a bug in this PR, or I'm experiencing a separate failure mode altogether. I find the bug more likely, since the error message is exactly the same as what's reported here.

Here's a branch with a full reproduction: https://github.com/samuela/nixpkgs/tree/samuela/pr-256-does-not-work. Just run nix-build -A python3Packages.tensorflowWithCuda to see the failure. Beware that building tensorflow is not for the faint of heart. I would recommend leveraging existing builds from my public cachix cache: https://app.cachix.org/cache/ploop. This way you can avoid having to build tensorflow from scratch entirely.

EDIT: I'm thoroughly confused how I'm still seeing this error message at all considering the word "maximum" doesn't even show up anywhere in patchelf.cc after this commit...

EDIT 2: Mystery solved... the subtlety was that it was failing in addOpenGLRunpath, not in the explicit patchelf call. The .so that it's failing on is 781M, but I'd be happy to upload it if someone thinks this test case would still be useful somehow.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-work-around-patchelf-maximum-file-size-exceeded-error/14920/3

@Mic92
Member

Mic92 commented Sep 9, 2021

> I think the problem is not the original binary size, but the limited growth that patchelf allows for. We can pick a tiny hello world and try to add 32MB of data to it. [...]

Thanks. I'll try to add this test case as soon as I can.

@andrewla
Contributor Author

andrewla commented Sep 10, 2021

To be honest I'm surprised that the test works, because each resize is for the same amount, which is less than 32MB. I guess the shifting or alignment makes it a cumulative effect.

Regardless, it takes a long time for this to run, and the produced executable doesn't run because the needed libraries aren't found.

If we use add-rpath instead, we get the same effect (overflow), and we can run the produced executable, but we can only add one rpath per command line so it's a little less efficient.

I'm considering a patch that would allow text arguments to be prefixed with @ to indicate that the argument should be read from a file instead of directly from the command line; that way we can just pass in a 40MB rpath string in one go.
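
For illustration, expanding such an @-prefixed argument could look roughly like this (a sketch of the idea only; the actual syntax and implementation live in #316):

// Sketch of the proposal only; expandArg is a hypothetical helper, see #316 for the real behaviour.
#include <fstream>
#include <sstream>
#include <string>

// If an argument starts with '@', read its real value from the named file.
static std::string expandArg(const std::string & arg) {
    if (arg.empty() || arg[0] != '@')
        return arg;
    std::ifstream in(arg.substr(1));
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}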

@andrewla
Contributor Author

Created #316 for the @file syntax. That makes testing this much simpler, and is also nice because these rpaths can get crazy.

@Mic92
Member

Mic92 commented Sep 11, 2021

The @-pr was merged.

@andrewla
Contributor Author

andrewla commented Sep 13, 2021

Added a test that attempts to grow the file. I've confirmed that this fails on the current master:

$ git checkout origin-master
...
$ git cherry-pick f9ebafc7a7cb964af62d53bf3ae8e7c6c20b7d1a
...
$ make
$ cd tests
$ ./grow-file.sh 
patchelf: maximum file size exceeded

@Mic92
Member

Mic92 commented Sep 13, 2021

We now also build with clang for regression testing. I fixed the tests for that.

@Mic92 Mic92 merged commit f40a896 into NixOS:master Sep 13, 2021
@domenkozar domenkozar mentioned this pull request Sep 13, 2021
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-work-around-patchelf-maximum-file-size-exceeded-error/14920/5
