Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backtrace line numbers #3303

Merged

Conversation

ysbaddaden
Copy link
Contributor

@ysbaddaden ysbaddaden commented Sep 14, 2016

Alternative to #3170 which implements ELF, Mach-O and DWARF parsers (instead on relying on the external addr2line executable) to parse the running program, looking for the .debug_line section to be decompressed to print a backtrace which includes the originating file line and columns for each line of the backtrace.

For example:

0x452124: ??? at /home/github/crystal/test_addr2line.cr 4:5
0x42be4e: ??? at /home/github/crystal/test_addr2line.cr 13:13
0x428043: ??? at /home/github/crystal/src/signal.cr 129:1
0x42bd29: main at /home/julien/.cache/crystal/macro67505680.cr 12:15
0x7fcffd0b9f45: __libc_start_main at ??
0x4279f9: ??? at ??
0x0: ??? at ??

As you may notice, the symbol names are sometimes not properly found. Work as begun to parse the DWARF address to symbol debug information. Since the binaries generated on Linux don't contain the .debug_aranges section (neither with gcc or clang), I begun to reconstruct it from the .debug_info (which relies on the .debug_abbrev and .debug_str sections) instead of creating a parser for DWARF aranges.

TODO:

Hopefully OSX now uses DWARF instead of STABS, so we won't have to add support for another debug format to support OSX. OSX keep the DWARF sections in the .o files but will NEVER put them into the executable 😭

POSTPONED (follow up pull request):

  • print the function name from the DWARF info (then fallback on seeking the symbol with dladdr)
  • parse symtabs section of ELF files (when DWARF info is missing)
  • parse symtabs section of Mach-O files (when DWARF info is missing)

Sorry, something went wrong.

elf.read_section?(".debug_line") do |sh, io|
Debug::DWARF::LineNumbers.new(io, sh.size)
end
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to memoize the decompressed DWARF line numbers section in memory. This consumes some memory (around 8MB for the Crystal compiler) but this is only generated when printing the backtraces, and it will save time to avoid decompressing the whole section each time an exception is raised, for a service that rescues exceptions and logs them.

@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Sep 16, 2016

Here comes support for parsing Mach-O files... except that the DWARF sections aren't located in the executable Mach-O file itself (unlike ELF), but in its companion dSYM file (which is also a Mach-O file) which isn't even generated unless we ask for it using dsymutil (or so it seems).

I'm not exactly sure what to do.

  • The dumbest idea would be for crystal to call dsymutil and save a foo.dsym file alongside a foo executable
  • The craziest idea would be to call dsymutil, parse th resulting dSYM, and... reinject the relevant __debug_* sections back into the executable :neckbeard:

@ysbaddaden
Copy link
Contributor Author

Note that Go actually generates its own sections (__gopclntab for pc -> line numbers and __gosymtab for symbols) using a Plan9 inspired format, and injects them somehow into the compiled executable (for all platforms it seems).

@ysbaddaden
Copy link
Contributor Author

Injecting segments and sections at link time is relatively easy. We just have to pass -createsect segname sectname file. Hence I believe this is what Go does. I couldn't find tools to inject sections after the executable is linked.

Calling dsymutil --flat foo creates the foo.dwarf dSYM Mach-O companion file with the wanted __DWARF segment.

@ysbaddaden
Copy link
Contributor Author

@waj I investigated parsing the DWARF directly from the .o files: it's hard!

The .o files are Mach-O object files (easy) and they indeed have a __DWARF segment with the expected sections (eg: __debug_line, yay). The DWARF sections are now segmented into many files instead of having a single section for the executable, and we need to manually map/offset them (argh). I found the debug map in the executable as stabs entries in the __symtab section, but this is a mapping of symbols for each .o file, but I got lost understanding how to remap the __debug_line section... (which maps address ranges to file:line:column, not symbols).

The best information I could find (not actual documentation):

I propose to:

  1. run dsymutil to generate a .dwarf file in crystal's cache folder (alongside the .o files);
  2. look for a DSYM file, respecting Apple's standard, that is look for a .dSYM folder a .dwarf file (checking the UUID to invalidate older copies), first alongside the binary (if the user ran dsymutil herself, for shipping OSX software for example), then fallback to look inside the crystal cache folder.

I'm too lazy to invest more time in redoing what the dsymutil tool is doing (for a platform I don't use), so I propose to merely use it, open a new issue about mapping the DWARF from the .o files. Maybe someone will be motivated enough to work on that.

Line Numbers in optimized code

As per the LLVM documentation, the DWARF information should be as proper in optimized builds as in non-optimized builds, so line number information (and everything else) should still be relevant when shipping software built with --release. See http://llvm.org/docs/SourceLevelDebugging.html#debugging-optimized-code

@ysbaddaden
Copy link
Contributor Author

Now with support for OSX... somewhat... that is only if there is an up-to-date dSYM bundle generated with dsymutil, either as foo.dwarf (--flat) or as foo.dSYM/Contents/Resources/DWARF/foo (default).

Example:

$ crystal build --debug foo.cr
$ dsymutil --flat foo
$ foo
OH NO (DEF)! (Exception)
0x10b260c0b: *Kls#callee1:NoReturn at /Users/julien/test_addr2line.cr 4:5
0x10b2396f1: *callee3:NoReturn at /Users/julien/test_addr2line.cr 13:13
0x10b235c5f: __crystal_main at /Users/julien/crystal/src/signal.cr 129:1
0x10b2395c8: main at /Users/julien/.cache/crystal/macro4378508480.cr 12:15

Maybe Crystal could run dsymutil by itself whenever compiling with --debug (either in build, run or eval), or maybe we could call it when decoding a backtrace? Thought we'll have the same issue than in #3170 (comment)

@asterite
Copy link
Member

It seems for 32 bits a type is wrong. After that I think this should be green :-)

@luislavena
Copy link
Contributor

Hello @ysbaddaden, this is pure gold! 😄

I did a quick test and compared with lldb (how I use it right now) and your changes:

def foo
  bar
end

def bar
  baz
end

def baz
  raise "BOOM"
end

foo
$ bin/crystal build --debug foo.cr
Using compiled compiler at .build/crystal

Current: https://gist.github.com/luislavena/b323cbd50600171f355cad4c31e1e0f1#file-run-foo-log
LLDB: https://gist.github.com/luislavena/b323cbd50600171f355cad4c31e1e0f1#file-lldb-foo-log
This: https://gist.github.com/luislavena/b323cbd50600171f355cad4c31e1e0f1#file-core-trace-foo-log

Noticed this misses some of the symbols from the callstack (foo & bar), but got the last one correctly.

Wondering if there are chances we can learn something from Rust and backtraces-rs

@ysbaddaden
Copy link
Contributor Author

@luislavena I don't decode the function names from the DWARF sections (yet). It's still using dladdr, which fails to get some symbols. Hence the difference in *baz:NoReturn (symbol) in the backtrace vs baz (dwarf) in LLDB.

@ysbaddaden
Copy link
Contributor Author

I still have more work to achieve because of ASLR on some platforms. I already fixed it manually for OSX (using _dyld_image_slice) but OpenBSD also has ASLR which I fixed with dladdr. I hoped it would be a cross platform solution, but I don't have the same behavior in Linux (weak ASLR) than OpenBSD. I'll have to test on Alpine which has a patched kernel with strong ASLR, and do other round of tests on the different platforms...

@ysbaddaden ysbaddaden force-pushed the core-backtrace-line-numbers branch from 030c4f7 to fb14b1c Compare October 12, 2016 08:43
@RX14 RX14 mentioned this pull request Oct 25, 2016
@ysbaddaden ysbaddaden force-pushed the core-backtrace-line-numbers branch from d6e0161 to e2442fd Compare November 5, 2016 23:20
@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Nov 6, 2016

Almost done:

I fixed the ASLR issue on Linux with the PaX patch applied (e,g, Alpine Linux): just use dladdr to offset the address, like in OpenBSD and other platforms it would seem. ~~Maybe OSX doesn't need a special treatment too. I must verify that (EDIT: it needs special treatment). I must also check that FreeBSD and OpenBSD didn't break.

I also found an issue in DWARF::LineNumbers which wrongly expected the matrix' sequences to have growing addresses —found on Alpine Linux.

Symbols will come in another pull request. OSX prints all symbols, but linux is missing a lot, and on ARM, it doesn't print a single one :'(

@ysbaddaden ysbaddaden force-pushed the core-backtrace-line-numbers branch from 191de70 to a3ce5b6 Compare November 6, 2016 09:54
@ysbaddaden
Copy link
Contributor Author

It looks like Travis eventually fails because crystal tool format wants to format some generated macro files in .crystal 😕

@ysbaddaden ysbaddaden changed the title Core backtrace line numbers [WIP] Backtrace line numbers Nov 6, 2016
@ysbaddaden
Copy link
Contributor Author

@asterite done!

@asterite
Copy link
Member

asterite commented Nov 6, 2016

So cool!!! I probably won't have time to try this until the next week (I'll be speaking at oredev), so feel free to ping others to try this, maybe @waj would like to give it a shot.

Supports all currently supported platforms: Darwin, FreeBSD, Linux
and OpenBSD. Takes care to offset the unwinded addresses to match
the fixed addresses of the DWARF sections whtn the platform uses
ASLR.
Skips any call in the callstack that relates to creating the
exception, unwinding the stack or raising the exception, so the
first backtrace entry is the call that did raise an exception.

closes crystal-lang#3161
@ysbaddaden ysbaddaden force-pushed the core-backtrace-line-numbers branch from d8f696f to 04a529c Compare November 7, 2016 10:27
@ysbaddaden
Copy link
Contributor Author

Rebased on master with a clean commit history, and removed Debug::DWARF::Info and Debug::DWARF::Abbrev parsers, that will come in an upcoming pull request, where I'll replace the raw symbols for the nice function name stored in the .debug_aranges sections or from .debug_info if the former is missing (it happens).

require "tempfile"

describe "Backtrace" do
it "prints file line:colunm" do
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line:column 😉

@ysbaddaden ysbaddaden merged commit bdd3170 into crystal-lang:master Nov 18, 2016
@ysbaddaden ysbaddaden deleted the core-backtrace-line-numbers branch November 18, 2016 16:13
@Sija
Copy link
Contributor

Sija commented Dec 7, 2016

Backtrace lines for Proc are not consistent with the rest.

Instead of showing the file lineno:colno pattern after at, it shows @file:lineno string after method signature. See below:

0x106371d56: ~procProc(Nil)@/usr/local/Cellar/crystal-lang/0.20.1/src/http/server.cr:148 at ??
0x106361819: ~proc2Proc(Fiber, (IO::FileDescriptor | Nil))@/usr/local/Cellar/crystal-lang/0.20.1/src/fiber.cr:29 at ??
0x10636e3fb: ~proc4Proc(HTTP::Server::Context, String)@lib/kemal/src/kemal/route.cr:11 at ??

@asterite
Copy link
Member

asterite commented Dec 7, 2016

That is actually the name of the proc that is being generated, and not part of the backtrace that @ysbaddaden is generating

@Sija
Copy link
Contributor

Sija commented Dec 7, 2016

@asterite Would it be possible to move it into the backtrace part of the line?

@RX14
Copy link
Member

RX14 commented Dec 7, 2016

@Sija But it's part of the function symbol name, and we should display the exact function symbol names to aid using C debuggers and the likes.

@ysbaddaden
Copy link
Contributor Author

@Sija I suspect the compiler doesn't set the location when generating the LLVM IR for procs, because since it's a symbol, it should have a location in DWARF.

@RX14 actually we should display the function name from DWARF, instead of the mangled name. I don't know what name are given to procs, though.

@asterite
Copy link
Member

asterite commented Dec 7, 2016

Some time ago the name given to procs like ->{ ... } or captured blocks was ~proc1, ~proc2, etc. Not very descriptive. Now, since that's not very helpful, I decided to add their full location to the name. That way when they are shown in the backtrace you can trace back their location, even without compiling with -d. For example:

def foo
  a = ->{
    raise "OH NO!"
  }
  a.call
end

foo

Running it we get:

OH NO! (Exception)
0x108bbe662: *CallStack::unwind:Array(Pointer(Void)) at ??
0x108bbe601: *CallStack#initialize:Array(Pointer(Void)) at ??
0x108bbe5d8: *CallStack::new:CallStack at ??
0x108bba7a1: *raise<Exception>:NoReturn at ??
0x108bba781: *raise<String>:NoReturn at ??
0x108bbdcb3: ~procProc(NoReturn)@foo.cr:2 at ??
0x108bbdc87: *foo:NoReturn at ??
0x108bba348: __crystal_main at ??
0x108bbdb28: main at ??

Note how we can go back from the stack trace to every line in our program, except the line that effectively threw the exception. For methods, we'll, we just search def foo in our program in the corresponding type. For a proc, the info is right there.

We can change it back to ~procN, but then this info will be lost even without compiling with -d, so I'm not sure. I don't think those names bother much (on the contrary, I think they are very useful)

@RX14
Copy link
Member

RX14 commented Dec 7, 2016

@ysbaddaden I think we should have the option to display mangled names at least

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants