Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ghc: Add enableDwarf flag, off by default. #52255

Closed
wants to merge 7 commits into from

Conversation

nh2
Copy link
Contributor

@nh2 nh2 commented Dec 15, 2018

This allows our GHCs to build programs with DWARF debug information when -g is passed, see https://ghc.haskell.org/trac/ghc/wiki/DWARF.

On by default on non-Darwin only because elfutils (which GHC uses for this) works only for ELF executables. Off by default until GHC ticket #15960 - Using -g causes differences in generated core is fixed.

PR is against staging so that the impact can be properly evaluated.

Motivation for this change

This can make debugging of many Haskell bugs and performance issues significantly easier.

Common use cases are:

  • load the core dumps in gdb and see from what Haskell segfaulting C code is called
  • set breakpoints and use valgrind to debug malloc issues
  • you can also use perf and kcachegrind to debug performance issues but results here are mixed

~This change is expected to increase the size of GHC and Haskell libraries, as debugging symbols are added to the object files.

The size of the eventual Haskell binaries that are not explicitly built with ghc -g should remain unaffected when the standard statish linking is used (that links Haskell libraries statically, and system libraries dynamically, which is nixpkgs' default).

Closure size should rise minimally to the addition of elfutils that provides libdw.

I believe that the slightly increased download size for developers will be tremendously offset by the benefits of being able to trouble-shoot crashes and performance issues without having to recompile the entire ecosystem.
(This is from my own perspective of using Haskell for work every day for many years, but I'm relatively sure that others feel the same.)

CC @bgamari @angerman @Gabriel439 @vaibhavsagar @basvandijk @roelvandijk

Things done

Much of the below isn't ticked yet because building takes a while and I added it for all GHC versions.

I have tested the change on 8.2.2 on top of release-18.03 so far, where I managed to produce dwarf symbols.

  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

TODO list

@domenkozar
Copy link
Member

+1

@cartazio
Copy link

this wont work on darwin fyi :(

@@ -17,6 +18,9 @@
# library instead of the faster but GPLed integer-gmp library.
enableIntegerSimple ? !(stdenv.lib.any (stdenv.lib.meta.platformMatch stdenv.hostPlatform) gmp.meta.platforms), gmp

, # Allows this GHC to produce binaries that have DWARF debug symbols.
enableDwarf ? (!stdenv.isDarwin) # elfutils libdw supports only ELF
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better expression than !stdenv.isDarwin I could use that gives me "all ELF platforms"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angerman on #ghc chat said:

I doubt there is any consumer of Mach-o that is not macOS or iOS or really exotic. Same for Windows. Everyone else just uses elf.

@nh2
Copy link
Contributor Author

nh2 commented Dec 15, 2018

this wont work on darwin fyi :(

Yes just did a force push to correct that and updated the PR description.

@cartazio
Copy link

additionally, theres some sort limit that building base with -g tickles in the macho format (at least when i tried to )

@bgamari
Copy link
Contributor

bgamari commented Dec 15, 2018

It seems to me that this would go nicely with the DWARF-enabled package set I proposed in #51987.

@matthewbauer
Copy link
Member

Are ghc binaries built with debugging information by default? How does this interact with cabal's -O1 setting?

@nh2
Copy link
Contributor Author

nh2 commented Dec 15, 2018

@matthewbauer

Are ghc binaries built with debugging information by default?

Do you mean in general, or with this PR?

In general, GHC HQ provides bindists in both a normal and a -dwarf flavour.

For this PR:

  • --enable-dwarf-unwind only enables the stack unwinding functionality (printing of the curent stack trace if the program/libraries are compiled with -g), currently triggered by either of
    • The GHC RTS crashing (you get a stack trace when Please report this as a GHC bug happens in the RTS)
    • if you send SIGQUIT to GHC
  • we compile all the bundled libraries used by ghc (base, bytestring etc.) with -g, as well as the RTS C code, same as shown here
    • but I forgot to put that in (it isn't enabled by by --enable-dwarf-unwind), I'll fix that in a moment
  • when the above two are done, the final GHC binary itself will have debugging information in, it is not stripped, as the linker that builds it will simply copy all the symbols from all the .o files into the binary
    • so if the nix installPhase just calls make install instead of make install-strip (which I think is the case), the GHC binary itself will show a nice stack trace with all info when it crashes in the RTS (as of writing, for GHC itself, only the RTS panic function rtsFatalInternalErrorFn() prints a stack trace, Haskell panics don't)
  • when GHC is built with --enable-dwarf-unwind, you can use the GHC.ExecutionStack module in your built programs, in particular the getStackTrace/showStackTrace functions
  • An important note is:

    While this sort of debugging does not require all libraries and RTS to be built with -g, it certainly is a very good idea. After all, currently stack unwinding stops at the first closure that doesn't have DWARF information associated with it.
    This is the reason why we want to have -g in GHC's bundled libraries, because GHC's bundled libraries (e.g. the version of bytestring provided by GHC) are used unless you explicitly add a different version to your GHC package database on top

How does this interact with cabal's -O1 setting?

Cabal may decide to not pass -g to GHC when building things, or to strip away debug symbols at the end of the build.

That is controlled by the cabal / Setup.hs --configure CLI flags:

--enable-executable-stripping / disable-executable-stripping
--enable-library-stripping / disable-library-stripping
--enable-debug-info[=n] / --disable-debug-info

This PR is only about providing GHC and its bundled libraries with -g and enabling the stack printing functionality; the effort to convince Cabal to produce haskellPackages package sets with -g is in the other issue #51987 (comment) (which needs this PR).

Edit: I've added much of this information to a new Features section I created on https://ghc.haskell.org/trac/ghc/wiki/DWARF#Features.

@nh2
Copy link
Contributor Author

nh2 commented Dec 15, 2018

I just learned from @bgamari of a new GHC bug that shows that -g can actually change the generated code and thus potentially make things slower:

that being said, I do know that currently enabling -g does change compilation
I'm pretty sure that the change doesn't compromise correctness but it's not clear to me whether it affects performance
https://ghc.haskell.org/trac/ghc/ticket/15960
I have not investigated further
but the reduction in the join point count is concerning
and I have fixed cases in the past where source notes definitely inhibited optimisation

In the face of this, I'll change this to enableDwarf = false by default until the bug is fixed upstream.

But I'd really like to have this built by Hydra as well.

Can anybody instruct me how to achive that?

@nh2 nh2 changed the title ghc: Add enableDwarf flag, on by default. ghc: Add enableDwarf flag, off by default. Dec 15, 2018
@nh2 nh2 force-pushed the ghc-dwarf-staging branch 2 times, most recently from 47a4d99 to 9b1b37e Compare December 16, 2018 01:32
@nh2
Copy link
Contributor Author

nh2 commented Dec 16, 2018

After having added the forgotten GhcLibHcOpts = -g3 and GhcRtsHcOpts = -g3 entries, I get this error when building GHC 8.2.2 with nix's 8.2.2-binary:

/build/ghc1518_0/ghc_4.s: Assembler messages:

/build/ghc1518_0/ghc_4.s:1719:0: error:
     Error: invalid operands (.debug_frame and .note.GNU-stack sections) for `-'
     |
1719 |         .uleb128 1f-.-1
     | ^

@nh2
Copy link
Contributor Author

nh2 commented Dec 16, 2018

I get this error

@bgamari said:

IIRC it's a binutils bug
but we worked around it in 8.4.3
indeed it looks like it's fixed upstream: https://sourceware.org/bugzilla/show_bug.cgi?id=23040
but the fix hasn't made it into a release yet

nh2:

ah. Does "worked around" mean "in the GHC tree"? If yes, I could just skip building dwarf for 8.2.2

@bgamari:

yes
as of 8.4.3 we produce assembler that binutils will accept despite the bug

I think the issue is https://ghc.haskell.org/trac/ghc/ticket/15068

@nh2
Copy link
Contributor Author

nh2 commented Dec 16, 2018

Hmm, I got the assembler error even when building /nix/store/62qbyv9d40kzaz6mn9v28q5l7m35gng5-ghc-8.5.20180118.drv.

It looks like I only get it on ghc/head.nix which is currently at 8.5.20180118 - surprisingly old

, version ? "8.5.20180118"
. Probably too old.

8.4.4 and the 8.6.* releases all built.

@nh2
Copy link
Contributor Author

nh2 commented Dec 16, 2018

Here are some numbers for binary sizes:

For the Haskell hello application built on this branch with NIX_PATH=nixpkgs=. nix-build -E 'with (import <nixpkgs> {}); haskell.packages.dwarf.ghc844.hello'.

I've checked the presence of debugging symbols using objdump -Wi /nix/store/x1l9gjvk5jhdn9rf6cvbryilqz22ij7l-hello-1.0.0.2/bin/hello.

The program has "statish" linking:

% ldd /nix/store/x1l9gjvk5jhdn9rf6cvbryilqz22ij7l-hello-1.0.0.2/bin/hello
	linux-vdso.so.1 =>  (0x00007fff9713f000)
	libm.so.6 => /nix/store/fa2sgggx44vak9fsw3zgzv8xbh99a8hc-glibc-2.27/lib/libm.so.6 (0x00007fe100ffe000)
	libgmp.so.10 => /nix/store/f5wq2jnpi80pax6n63asyj095f2c1lp1-gmp-6.1.2/lib/libgmp.so.10 (0x00007fe100d6a000)
	librt.so.1 => /nix/store/fa2sgggx44vak9fsw3zgzv8xbh99a8hc-glibc-2.27/lib/librt.so.1 (0x00007fe100b62000)
	libdl.so.2 => /nix/store/fa2sgggx44vak9fsw3zgzv8xbh99a8hc-glibc-2.27/lib/libdl.so.2 (0x00007fe10095e000)
	libelf.so.1 => /nix/store/jjmdb0cg43l0b1abpphgi4ikxvych6hc-elfutils-0.175/lib/libelf.so.1 (0x00007fe100745000)
	libdw.so.1 => /nix/store/jjmdb0cg43l0b1abpphgi4ikxvych6hc-elfutils-0.175/lib/libdw.so.1 (0x00007fe1004f3000)
	libc.so.6 => /nix/store/fa2sgggx44vak9fsw3zgzv8xbh99a8hc-glibc-2.27/lib/libc.so.6 (0x00007fe10013f000)
	libpthread.so.0 => /nix/store/fa2sgggx44vak9fsw3zgzv8xbh99a8hc-glibc-2.27/lib/libpthread.so.0 (0x00007fe0fff20000)
	/nix/store/fa2sgggx44vak9fsw3zgzv8xbh99a8hc-glibc-2.27/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fe101393000)
	libz.so.1 => /nix/store/hplrkgq6qikida9zjbnlm5xk4r35d53w-zlib-1.2.11/lib/libz.so.1 (0x00007fe0ffd04000)
	liblzma.so.5 => /nix/store/xkn2jyrs31ddiyv04lypg814qqq8pmw2-xz-5.2.4/lib/liblzma.so.5 (0x00007fe0ffadd000)
	libbz2.so.1 => /nix/store/7pqz1wamgajar7c61g1dsg5475bjfvf8-bzip2-1.0.6.0.1/lib/libbz2.so.1 (0x00007fe0ff8cd000)

Sizes (normal, with strip run on it, and strip -S, which strips debug symbols only and is nixpkgs' current default, run on it; each compressed with gzip and xz default params on Ubuntu 16.04):

hello                   985 KB
hello-stripped-S        970 KB
hello-stripped          781 KB

hello.gz                286 KB
hello-stripped-S.gz     280 KB
hello-stripped.gz       235 KB

hello.xz                223 KB
hello-stripped-S.xz     219 KB
hello-stripped.xz       184 KB

From this I conclude that the size overhead of full DWARF symbols is very low compared to nixpkgs' default of strip -S for this hello program.

@nh2
Copy link
Contributor Author

nh2 commented Jan 16, 2019

(I'm not sure if hostPlatform or targetPlatform or buildPlatform is wanted here though)

@angerman Do you know that?

I've for put in a TODO as I don't know which one is right and so that we can get this merged already.

Once cross compilation to Darwin works, those that are experts in that will be able to go the last mile easily by spotting the TODO.

This is ready to get merged to staging from my side.

Copy link
Contributor

@bgamari bgamari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking forward to having this. Thanks @nh2!

@bgamari
Copy link
Contributor

bgamari commented Jan 23, 2019

That being said, I wonder if a note in the users guide is in order? Otherwise I'm a bit worried that the new package set will be quite undiscoverable.

Copy link
Contributor

@bgamari bgamari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should work out the overriding story before we commit to this since the current situation is unfortunately entirely non-composable. Perhaps instead of adding the dwarf package sets here we could just expose something like,

let withDwarfEnabled = name: args: packages."${name}".override (args // {
  ghc = bh.compiler.dwarf."${name}";
  buildHaskellPackages = bh.packages.dwarf."${name}";
  overrides =
    let
      dwarfOverrides =
        self : super : {
        # Override the mkDerivation function so that the GHC flags that
        # are needed for debugging symbols are passed to all Cabal invocations.
        mkDerivation = old: super.mkDerivation (old // {
          configureFlags = (old.configureFlags or []) ++ [
            "--enable-debug-info=3"
            "--disable-library-stripping"
            "--disable-executable-stripping"
          ];
          # Disable nixpkgs' own stripping.
          dontStrip = true;
        });
      };
    in composeExtensions dwarfOverrides args.overrides;
});

This significantly less elegant and less convenient than the current package sets but is nevertheless sufficient and avoids committing us to an approach which is fundamentally inextensible.

Perhaps a more elegant (albeit still not ideomatic) approach would be to expose an attribute set of partial applications of withDwarfEnabled as defined above:

dwarf = pkgs.lib.genAttrs normalGhcCompilers withDwarfEnabled

The user would then have to apply an empty overrides set to an attribute of dwarf to get a Haskell packages set.

@domenkozar
Copy link
Member

@nh2 any chance to rebase this? I'd love to have it in 19.03 - even if overriding is broken for now.

@nh2
Copy link
Contributor Author

nh2 commented Feb 27, 2019

@domenkozar I agree, even if the package set isn't super useful, it's good to be able to have at least the pre-built compiler downloadable from cache.nixos.org, and the ability to build many Haskell executables with DWARF.

I think we should put it into 19.03 but put a warning that it may be changed in the future.

I just pushed https://github.com/nh2/nixpkgs/tree/ghc-dwarf-19.03 but it's still building, so not sure yet if it works.

@matthewbauer
Copy link
Member

Why not just use enableSeparateDebugInfo for all compilers? That way we don’t have to build every compiler 2 times. Those patches should be okay even when you don’t want dwarf symbols.

@nh2
Copy link
Contributor Author

nh2 commented Feb 28, 2019

I just pushed https://github.com/nh2/nixpkgs/tree/ghc-dwarf-19.03 but it's still building, so not sure yet if it works.

I've built stack with it now, so the rebase worked. I also pushed it to the static-haskell-nix cachix (commit d9bda1f, build command was nix-build $(NIX_PATH=nixpkgs=. nix-instantiate -E 'with (import <nixpkgs> {}); haskell.packages.dwarf.ghc863.stack')).

@nh2
Copy link
Contributor Author

nh2 commented Feb 28, 2019

Why not just use enableSeparateDebugInfo for all compilers? That way we don’t have to build every compiler 2 times. Those patches should be okay even when you don’t want dwarf symbols.

@matthewbauer Can you elaborate a bit? I'm not super familiar with how separateDebugInfo works, and if it can easily be used with ghc.

Also, what do you mean with "those patches"?

@matthewbauer
Copy link
Member

My main concern is this doubles the number of GHC compilers we build. I like enabling debug info, but that should be in a separate debug output not a whole new compiler. separateDebugInfo strips out all of the debug symbols and puts the symbols in the debug identified by the build id. I'd really like to do this everywhere (#18530) but there are concerns with how expensive it would become.

I would propose doing this:

  • Set enableSeparateDebugInfo = true for all ghc compilers along with the -g flags and any other patches needed to get ghc to include debugging symbols.
  • Maybe set --enable-dwarf-unwind by default for compilers on Linux as long as it doesn't lead to issues in output. It definitely makes sense to disable it on some compilers where it doesn't work.
  • Add enableSeparateDebugInfo to the enableDebugging function. Possibly enable debugging for all haskellPackages, if it's not too expensive. /cc @vcunat @edolstra

As a side note, I also really dislike having integer-simple as a separate package set. It has many of the same problems as a "dwarf" package set, but there's not a good alternative available because GHC requires you to choose an integer library at build time .

@vcunat
Copy link
Member

vcunat commented Mar 3, 2019

We could estimate price for our haskellPackages specifically, if you produce code. See: #18530 (comment)

@nh2
Copy link
Contributor Author

nh2 commented Mar 3, 2019

@vcunat I left a comment with some basic statistics in #18530 (comment)

@vcunat
Copy link
Member

vcunat commented Mar 3, 2019

I admit I've read almost nothing from this thread so far :-/ but I tried this PR with ? true everywhere, and ghc won't even compile. I'm getting file number 1 already allocated; perhaps you know more, as I noticed this thread. Maybe I'm doing something wrong, though...

@nh2
Copy link
Contributor Author

nh2 commented Mar 3, 2019

@vcunat #52255 (comment) shows a nix invocation that should work on the branch here in the PR, which builds one large Haskell executable and its dependency closure (which may be interesting to compare with the dependency closure of the one you get if you leave the .dwarf away).

I can also offer to collect some info for you, if you let me know what it shall be :)

@vcunat
Copy link
Member

vcunat commented Mar 3, 2019

That evaluates to the same ghc which fails for me, three times now (two different machines). That is, after merging this PR with my branch discussed on the other thread: 0173520.

@vcunat
Copy link
Member

vcunat commented Mar 4, 2019

I can confirm on pure commit 5899036 it works for me.

@nixos-discourse
Copy link

This pull request has been mentioned on Nix community. There might be relevant details there:

https://discourse.nixos.org/t/what-are-your-goals-for-19-09/2875/23

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/ghc-with-libdw-support/6654/2

@nh2
Copy link
Contributor Author

nh2 commented Apr 13, 2020

Relevant for GHC 8.10: #69552

@cdepillabout
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet