Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: faster compilation for JSON::MappingError #6111

Merged
merged 1 commit into from May 22, 2018

Conversation

asterite
Copy link
Member

Passing Class as an argument is a really bad idea: the method will be instantiated once for each class that invokes JSON.mapping. This makes compilation really slow. Also, so many overloads are not necessary, the class is only used in that file in four places.

Related to #6082 (comment)

@asterite asterite mentioned this pull request May 20, 2018
@RX14
Copy link
Contributor

RX14 commented May 20, 2018

Would be nice to have benchmarks for compiling your mapping test using the old stdlib, the master stdlib, and the stdlib in this PR using the same compiler build.

@asterite
Copy link
Member Author

It doesn't matter if it's master or not. Try compiling the code I posted in the other issue with this PR and without it, and you'll see the difference.

@asterite
Copy link
Member Author

It would be nice to find other places in the standard library and compiler where simple changes like this improve compilation speed, but I don't know how to detect that. Maybe a tool that would show how many instantiations a method has, sorted by that.

@oprypin
Copy link
Member

oprypin commented May 20, 2018

@asterite, I've been messing with this piece of code that shows all instantiations, and sorting the results by count seems easily doable.
The Location can be used as a key, except when macros are involved...

@asterite
Copy link
Member Author

@oprypin Yup, I was going to mention you because I know you've been playing with it. I think DefInstanceKey#def_object_id can be used as a key, though the location is probably good too.

Here's a diff to quickly try this out

Here's the result when trying out the code before this PR:

$ bin/crystal build json_test_old.cr --no-codegen
Using compiled compiler at `.build/crystal'
JSON::MappingError:Class - new:cause (300)
JSON::MappingError - initialize (300)
JSON::MappingError:Class - new (200)
String::Builder - << (114)
Foo0 - class (111)

Hehe, it seems that was the culprit after all. 300 instantiations of new with a cause named argument, plus 300 of initialize, plus another 200 for the new. You can bet that will be slow.

With this PR:

JSON::MappingError:Class - new (3)
JSON::MappingError - initialize (3)

But for some reason I also see:

Foo0 - class (111)
Foo0:Class - name (100)
Foo0:Class - to_s (100)
Foo0:Class - to_s (100)
Foo0:Class - from_json (100)

I'll investigate that a bit, though now I'm curious to know which methods are troublesome in the compiler...

@asterite
Copy link
Member Author

I updated the gist and it now shows percentages. So now we can see:

TOTAL: 1620
JSON::MappingError:Class - new:cause (300) (18.52%)
JSON::MappingError - initialize (300) (18.52%)
JSON::MappingError:Class - new (200) (12.35%)
String::Builder - << (114) (7.04%)

That represents half of the total instantiations in the program. Now wonder it was 1 second slower...

String::Builder#<< is another thing too, but it's a small method that immediately delegates to obj.to_s, so it might not be that terrible (the problem mostly happens with big methods).

With the compiler I see:

TOTAL: 10870
Crystal::Nop - raise (796) (7.32%)
Pointer(UInt8) - []= (549) (5.05%)
Pointer(UInt8) - value= (524) (4.82%)
Crystal::FrozenTypeException:Class - for_node (522) (4.8%)
Crystal::ToSVisitor - end_visit (463) (4.26%)
Crystal::Nop - accept (400) (3.68%)
Crystal::ToSVisitor - end_visit_any (283) (2.6%)
Pointer(UInt8) - + (278) (2.56%)
Array(LLVM::BasicBlock) - push (275) (2.53%)
Pointer(UInt8) - [] (271) (2.49%)
String::Builder - << (267) (2.46%)
Pointer(UInt8) - value (249) (2.29%)
Pointer(UInt8) - + (243) (2.24%)
Pointer(UInt8):Class - malloc (242) (2.23%)
Pointer(UInt8):Class - malloc (238) (2.19%)
Crystal::Nop - at (219) (2.01%)
Crystal::Expressions - set_type_from (218) (2.01%)
String - == (215) (1.98%)
Array(LLVM::BasicBlock) - << (209) (1.92%)
Crystal::Nop - to_s (206) (1.9%)
Pointer(UInt8) - realloc (199) (1.83%)
Pointer(UInt8) - realloc (198) (1.82%)
Hash::Entry(String, Crystal::Type) - initialize (197) (1.81%)
Hash::Entry(String, Crystal::Type):Class - new (197) (1.81%)
String - object_id (195) (1.79%)
Hash(String, String) - insert_in_bucket (177) (1.63%)
Crystal::Formatter - visit_any (177) (1.63%)
Hash(String, String) - []= (177) (1.63%)
Crystal::Nop - doc= (174) (1.6%)
Crystal::ClassVarsInitializerVisitor - nesting_exp? (174) (1.6%)
Crystal::ClassVarsInitializerVisitor - end_visit_any (174) (1.6%)
Hash::Entry(String, Crystal::Type) - next= (168) (1.55%)
Hash(String, String) - bucket_index (153) (1.41%)
Pointer(UInt8):Class - new (152) (1.4%)
Pointer(UInt8):Class - null (152) (1.4%)
Array(LLVM::BasicBlock):Class - allocate (150) (1.38%)
Crystal::Call - initialize:name_column_number (148) (1.36%)
Crystal::MainVisitor - visit_any (146) (1.34%)
Crystal::Nop - add_observer (142) (1.31%)
Crystal::Not - initialize (141) (1.3%)
Crystal::Not:Class - new (141) (1.3%)
Hash::Entry(String, Crystal::Type) - value= (137) (1.26%)
Crystal::MacroInterpreter::ReplaceBlockVarsTransformer - before_transform (134) (1.23%)
Crystal::Nop - transform (134) (1.23%)
Array(LLVM::BasicBlock) - size (131) (1.21%)
Hash(String, String) - find_entry_in_bucket (130) (1.2%)
Hash(String, String) - find_entry (130) (1.2%)
Array(Crystal::ASTNode) - unsafe_at (125) (1.15%)
Crystal::MacroInterpreter::ReplaceBlockVarsTransformer - after_transform (122) (1.12%)
Crystal::Nop - end_location= (121) (1.11%)
Array(LLVM::BasicBlock) - resize_to_capacity (113) (1.04%)
Array(LLVM::BasicBlock) - double_capacity (113) (1.04%)
Array(LLVM::BasicBlock) - check_needs_resize (113) (1.04%)
Crystal::Return - initialize (111) (1.02%)
Crystal::Return:Class - new (111) (1.02%)

Looots of redundant code generated, I guess.

I'm also seeing many primitives being instantiated, that's something that might not be correct, I'll investigate that too.

I don't know what to think about this. Would it be good to have such tool, so when compile times become big you'll refactor your code so that it compiles faster? I don't know of another language where you'd have to do that, and I don't know how annoying it could be... or how to preserve those "nice" compile times later and not have someone accidentally screw all of that.

@RX14
Copy link
Contributor

RX14 commented May 20, 2018

How much harder would it be to measure the time spent on each instantiation as well?

@asterite
Copy link
Member Author

@RX14 That's much harder. An instantiation will instantiate other stuff... I don't know how to measure that.

@asterite
Copy link
Member Author

asterite commented May 20, 2018

It would be sooo much easier if all methods were typed. Then there would be just one instantiation per method definition, unless the method is "generic" (forall T...).

@RX14
Copy link
Contributor

RX14 commented May 20, 2018

@asterite is there one "entry point" method for the start of an instantiation? In that case since we're single-threaded we can store the current instantiation in a stack in a global variable, and associate the instantiation at the top of the stack with a time. So at the start of the instantiation entrypoint, we:

  • work out the time since the current instantiation at the top of the stack started and compute the time spent, add it to the accumulated time for that expansion.
  • record the current instantation in the top of the stack as starting now and having an accumulated time of zero
    on the method exit do the opposite (pop the expansion descriptor off the stack, put it in a "finished expansions" array, set the start time of the expansion at the top of the stack as now, ret)

And yes, it's much easier if all methods were typed. That's how every other language does it and it's why crystal is an order of magnitude more expressive than those languages.

@asterite
Copy link
Member Author

asterite commented May 20, 2018

@RX14 Yeah, that probably can be done. Not sure what we'll use it for, though.

One thing I see that could be improved, probably without a language change, is how methods are instantiated regarding self.

With this code:

class Foo
  def foo
  end
end

class Bar < Foo
end

Foo.new.foo
Bar.new.foo

The (updated, it had a bug because instantiations didn't take into account the real type that hold the method, that's why there were 38 instantiations of Pointer(UInt8)#value) tool tells:

TOTAL: 7
Foo:Class - allocate (1) (14.29%)
Foo:Class - new (1) (14.29%)
Bar:Class - allocate (1) (14.29%)
Bar:Class - new (1) (14.29%)
Bar - foo (1) (14.29%)
Foo - foo (1) (14.29%)
<Program> - main (1) (14.29%)

Now that there's Bar#foo and Foo#foo, even though Bar doesn't override foo. The compiler will instantiate that superclass method specifically for the subclass, to possibly avoid doing "virtual" calls if there are overloads of other called method.

For example:

class Foo
  def foo
    virtual
  end

  def virtual
    1
  end
end

class Bar < Foo
  def virtual
    "foo"
  end
end

p typeof(Foo.new.foo) # => Int32
p typeof(Bar.new.foo) # => String

That's closer to Ruby, because if you do Bar.new.foo there's no way the virtual method from Foo will be called.

The disadvantage is that if there's a huge class hierarchy (like that of ASTNode, or even Exception) then a lot of redundant methods will be generated.

Maybe in this case the compiler should generate the method once, passing self as Foo+, but of course the semantic changes a bit and now the return type will be Int32 | String. Maybe that's fine though, I don't know, we'll have to try. But I think that will reduce the number of instantiations quite a bit (though it will always be linear relative to the total code size).

@Sija
Copy link
Contributor

Sija commented May 21, 2018

Maybe in this case the compiler should generate the method once, passing self as Foo+, but of course the semantic changes a bit and now the return type will be Int32 | String.

Ouch, sounds like a big disadvantage to me.

@sdogruyol sdogruyol merged commit 8c24616 into master May 22, 2018
@sdogruyol sdogruyol added this to the Next milestone May 22, 2018
chris-huxtable pushed a commit to chris-huxtable/crystal that referenced this pull request Jun 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants