Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize bc+obj codegen, focus: devel rebuilds. #2591

Closed

Conversation

ozra
Copy link
Contributor

@ozra ozra commented May 13, 2016

  • Renders bc-file in RAM
  • Compares cached bc-file with ram-buffer
  • Writes out if changed
  • Generates obj if changed
  • File writes always go via a temp-file -> rename now
  • bc-flags changes incorporated in cached files names
  • Removes forking for each render
  • Only single threaded concurrency now (could be improved further using core-count threads with one part of the list each)
  • On my machine I consistently get about 2s bc-obj-phase in re-compile (when working on the parser in the compiler for instance) compared to about 40s before - (for the bc-obj-phase alone).
    • Full compile in re-compile is about 20s vs ca 60s before (on my slow HDD machine).
  • Again: the focus is improving compile time for re-compile in devel mode, when there's obviously changes in just a few files each iteration.
  • Please try and bench it with similar test-case on your machine, comparing with main-compiler.
  • Bonus: I can actually do other things while compiling now - before my machine completely froze during compile.

@asterite
Copy link
Member

I think it failed because Travis still uses llvm 3.5... but the compiler doesn't use that version anymore and should work with 3.6. This needs to be fixed somewhere...

@asterite
Copy link
Member

It uses llvm 3.5 because we tell it so: https://github.com/crystal-lang/crystal/blob/master/bin/ci#L68

@luislavena
Copy link
Contributor

@asterite the other failure is caused by format changes in compiler.cr.

@ozra
Copy link
Contributor Author

ozra commented May 13, 2016

Ah, sorry, didn't run format!

@ozra ozra force-pushed the codegen-bc-obj-phase-optimization branch from 7088880 to 3361ef0 Compare May 13, 2016 14:38
@ozra ozra force-pushed the codegen-bc-obj-phase-optimization branch from 3361ef0 to b7a111b Compare May 13, 2016 14:47
@ozra
Copy link
Contributor Author

ozra commented May 13, 2016

Now format should be there X-)

@luislavena
Copy link
Contributor

32bits error might be a fluke of Travis worker memory while OSX error seems to be related to LLVM version Ary mentioned.

@asterite any particular reason we don't build against 3.6 now? What is the status of OSX and LLVM in this matter?

@asterite
Copy link
Member

@luislavena Everything should work well with LLVM 3.6. We could even simplify omnibus-crystal to use it, I think right now it uses LLVM 3.5. The reason why it's using an old version is because it works and it's always easy to keep thing unchanged when they work ¯_(ツ)_/¯

@ozra
Copy link
Contributor Author

ozra commented May 15, 2016

Ok, so I'll let this "hang" meanwhile, since it won't work with <3.something? I'll see what I can do about it, but I'm kinda keen on getting the codegen working against 3.8+, I might look into making that.

@asterite
Copy link
Member

@ozra I was planning on testing this PR next week, and updating travis and others if it works well

@ozra
Copy link
Contributor Author

ozra commented May 15, 2016

Alright, cool.

@asterite
Copy link
Member

@ozra Did you try this on linux or mac?

@ozra
Copy link
Contributor Author

ozra commented May 17, 2016

Linux is all I run, are there weird problems? I'll be happy to do what I can to make it work, because it's quite nice with faster compiles :)

@asterite
Copy link
Member

I believe the performance you get is similar to the ones here #745 . Basically, forking a lot in linux is slow, but in mac it seems to be fast enough.

@asterite
Copy link
Member

asterite commented May 17, 2016

I say because I tried your changes on Mac and it's always is slower than before.

@ozra
Copy link
Contributor Author

ozra commented May 17, 2016

Aha, interesting. What are the speeds comparatively in Mach?

@asterite
Copy link
Member

@ozra I can't remember, but it was like two or three times slower than what we have now.

Probably the best thing to do is to use fork in mac and use spawn in linux, using your PR.

@ozra
Copy link
Contributor Author

ozra commented May 20, 2016

@asterite, I'll re-work this to use a ifdef, and also look in to the "just create n-threads workers handling a list each" and update. I googled around, but found no benches other than pagefaults being slower in Darwin then in Linux kernel (but they were dated so it might have changed a lot), so I haven't figured out what could be the main culprit (and on "what side"). Two different approaches will no doubt be simplest, but utilizing as many cores but with only using initial forking before looping should benefit both.

@ozra
Copy link
Contributor Author

ozra commented Sep 2, 2016

@asterite - What's the status on LLVM-versions and such now - I was thinking about re-implementing this with the platform-conditional - the speed up is badly needed for us Linux folks.

@asterite
Copy link
Member

asterite commented Sep 2, 2016

@ozra We are still using LLVM 3.5~3.6 for releasing, because LLVM 3.8 needs an entirely different way to be built.

@ozra
Copy link
Contributor Author

ozra commented Sep 2, 2016

Alright.

@asterite
Copy link
Member

@ozra I finally combined ideas from #745 and your PR and put them all in c792139 . I used memory buffers from LLVM and compare that to the old .bc before writing them, which is slightly faster than writing to disk. Now also only a limited number of forks (8) is done, so this should be considerably faster on linux. And on mac it's already faster. So thank you for this PR! :-)

@asterite asterite closed this Dec 16, 2016
@ozra
Copy link
Contributor Author

ozra commented Mar 2, 2017

Awesome! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants