-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize bc+obj codegen, focus: devel rebuilds. #2591
Conversation
ozra
commented
May 13, 2016
•
edited
Loading
edited
- Renders bc-file in RAM
- Compares cached bc-file with ram-buffer
- Writes out if changed
- Generates obj if changed
- File writes always go via a temp-file -> rename now
- bc-flags changes incorporated in cached files names
- Removes forking for each render
- Only single threaded concurrency now (could be improved further using core-count threads with one part of the list each)
- On my machine I consistently get about 2s bc-obj-phase in re-compile (when working on the parser in the compiler for instance) compared to about 40s before - (for the bc-obj-phase alone).
- Full compile in re-compile is about 20s vs ca 60s before (on my slow HDD machine).
- Again: the focus is improving compile time for re-compile in devel mode, when there's obviously changes in just a few files each iteration.
- Please try and bench it with similar test-case on your machine, comparing with main-compiler.
- Bonus: I can actually do other things while compiling now - before my machine completely froze during compile.
I think it failed because Travis still uses llvm 3.5... but the compiler doesn't use that version anymore and should work with 3.6. This needs to be fixed somewhere... |
It uses llvm 3.5 because we tell it so: https://github.com/crystal-lang/crystal/blob/master/bin/ci#L68 |
@asterite the other failure is caused by format changes in |
Ah, sorry, didn't run format! |
7088880
to
3361ef0
Compare
3361ef0
to
b7a111b
Compare
Now format should be there X-) |
32bits error might be a fluke of Travis worker memory while OSX error seems to be related to LLVM version Ary mentioned. @asterite any particular reason we don't build against 3.6 now? What is the status of OSX and LLVM in this matter? |
@luislavena Everything should work well with LLVM 3.6. We could even simplify omnibus-crystal to use it, I think right now it uses LLVM 3.5. The reason why it's using an old version is because it works and it's always easy to keep thing unchanged when they work ¯_(ツ)_/¯ |
Ok, so I'll let this "hang" meanwhile, since it won't work with <3.something? I'll see what I can do about it, but I'm kinda keen on getting the codegen working against 3.8+, I might look into making that. |
@ozra I was planning on testing this PR next week, and updating travis and others if it works well |
Alright, cool. |
@ozra Did you try this on linux or mac? |
Linux is all I run, are there weird problems? I'll be happy to do what I can to make it work, because it's quite nice with faster compiles :) |
I believe the performance you get is similar to the ones here #745 . Basically, forking a lot in linux is slow, but in mac it seems to be fast enough. |
I say because I tried your changes on Mac and it's always is slower than before. |
Aha, interesting. What are the speeds comparatively in Mach? |
@ozra I can't remember, but it was like two or three times slower than what we have now. Probably the best thing to do is to use fork in mac and use spawn in linux, using your PR. |
@asterite, I'll re-work this to use a ifdef, and also look in to the "just create n-threads workers handling a list each" and update. I googled around, but found no benches other than pagefaults being slower in Darwin then in Linux kernel (but they were dated so it might have changed a lot), so I haven't figured out what could be the main culprit (and on "what side"). Two different approaches will no doubt be simplest, but utilizing as many cores but with only using initial forking before looping should benefit both. |
@asterite - What's the status on LLVM-versions and such now - I was thinking about re-implementing this with the platform-conditional - the speed up is badly needed for us Linux folks. |
@ozra We are still using LLVM 3.5~3.6 for releasing, because LLVM 3.8 needs an entirely different way to be built. |
Alright. |
@ozra I finally combined ideas from #745 and your PR and put them all in c792139 . I used memory buffers from LLVM and compare that to the old .bc before writing them, which is slightly faster than writing to disk. Now also only a limited number of forks (8) is done, so this should be considerably faster on linux. And on mac it's already faster. So thank you for this PR! :-) |
Awesome! :-) |