Improve JSON and YAML exception locations #4855

asterite · 2017-08-19T15:21:18Z

This PR improves the error location reported when parsing invalid or unexpected JSON/YAML sources. Many places in the current code were using {0, 0} as a location, which probably made debugging invalid sources a bit hard.

For example, in this snippet:

require "yaml"

class Foo
  YAML.mapping({
    foo: Int32,
  })
end

source = <<-YAML
  foo: hello
  YAML

Foo.from_yaml(source)

Before we got:

Invalid Int32: hello (YAML::ParseException)

Now we get:

Invalid Int32: hello at line 1, column 6 (YAML::ParseException)

Similarly, when an attribute is missing or it's unknown (in strict mode) the reported error will be correct this time.

asterite · 2017-08-19T20:20:57Z

It seems the compiler can't compile the specs in linux 64 bits because of lack of memory (or the compiler eating it all up).

One of the reasons memory grew in the last time is because the codegen now uses one LLVM::Context pero LLVM::Module, with one module per crystal type, more or less. I could merge this and then send a PR to revert that change and use a single LLVM::Context like before, because we still don't need it (we'll need it once we have parallelism).

Or maybe we can run separately:

make crystal
make std_spec
make compiler_spec

That should consume a bit less memory...

Thoughts?

refi64 · 2017-08-19T20:29:25Z

I could merge this and then send a PR to revert that change and use a single LLVM::Context like before, because we still don't need it (we'll need it once we have parallelism).

Isn't that just delaying the inevitable, though? At some point it'll need to be resolved.

Also it's really neat to see a PR from you again!

mverzilli · 2017-08-19T20:34:25Z

Glad you took a look at that :). I was looking at it because I want to unblock #4707. Both solutions are temporal, I'd go with splitting the runnables for now, but no strong opinion. What I know is we have to apply one of them now because a long term solution will not be available anytime soon.

asterite · 2017-08-19T20:40:51Z

@kirbyfan64 Yes, this is delaying the inevitable. This is why almost every day I think "It's fun to program in Crystal, but it is scalable?". I don't have an answer for that, and from time to time I lose hope and quit. Then I think "maybe someone some day will come with a solution" and continue helping Crystal grow in areas where solutions exist. But I don't know, I personally can't see a way to make compilation modular (without dramatically changing the language) to allow compiling more and more code, and I've expressed this feeling many times now. That's why I think focusing on solving this first is the most important thing to do right now. Not Windows, not parallelism, not adding more features or fixing small bugs, but finding a way to make the language scale, and doing it, not just believing it can or will be done.

People at Manas told me there is a solution (they believe there is a solution) so in the meantime I'll continue programming stuff that I find interesting/fun without worrying about whether all of it makes sense. Much like playing a video game doesn't have a purpose other than having fun.

akzhan · 2017-08-19T20:45:01Z

@asterite what about designing of new language? Something more scalable with union types and modules.

And thread aware from start.

asterite · 2017-08-19T21:01:38Z

@akzhan I have no idea what that language could look like without being one of the existing languages (Go, Rust, Pony, Nim, D). Crystal has unique features, but those features don't exist in other statically typed languages for a very good reason (or that's at least what I think now, though I have no proof for that): they don't scale to big programs.

asterite · 2017-08-19T21:27:03Z

On the other hand, the compiler is where this issue happens most. I didn't see any other program to present this problem. The compiler has a big AST hierarchy and many, many types and lines of codes. I don't think the typical Crystal program has many lines or such a complex type hierarchy, so the problem won't happen there. By running all the specs in halves the problem will be kind of fixed.

RX14 · 2017-08-19T22:44:19Z

I don't think anyone has actually tried to take an optimiser to the compiler and perform non-architectural performance tweaks. Sure, having a compiler architecture which is inherently fast is great but I think there may be some performance low hanging fruit in the compiler, especially in memory usage, which will help scale the compiler some more.

Regarding scaling in general, the worst case possible solution is that we add a concept of "modules" which break the program into pieces, each of which can be compiled separately. That's the worst case, and to be perfectly honest, it wouldn't be the end of the world. I believe there's a solution which doesn't involve doing that, but I feel comfortable knowing there's an ugly solution which will allow us to scale. Also, microservices are cool, we could always tell people who hit compilation time problems to split up their monolith!

I also think that we can't bury our heads in the sand and give up because we don't have a solution to scaling. The truth is, the more popular Crystal gets, the more smart people are likely to work on the compiler and tackle the scaling problem for themselves. Perhaps there's some neat hack, or compiler theory, we haven't thought of to enable incremental complication which is easy to implement. Who knows.

I've been thinking about this problem and Crystal's community problem for a while and I think that growing a larger community and getting 5x, 10x the number of people involved, and most importantly the coolness factor, is likely to solve at least some of our problems.

mverzilli · 2017-08-19T22:58:15Z

In situations like this I like to think of the worst case scenario. Worst case, we reach to a point where we are convinced that Crystal doesn't scale beyond the size of its own compiler. So we end up with a great language which you can't use to write programs over 40kLOC and which forces you to split you projects in services beyond that. I don't know about the rest of the community, but for a worst case, I'd be quite damn happy with that :).

asterite · 2017-08-19T23:00:04Z

@RX14 @mverzilli Well said!

By the way, I changed bin/ci to do make crystal std_spec compiler_spec doc instead of make crystal spec doc and now the specs pass 🎉

If approved, I'll merge this PR with that change.

RX14 · 2017-08-19T23:00:12Z

@mverzilli I think it's pretty clear that you can scale beyond that inside a single executable as long as you can define seperable parts which you know will have a consistent binary ABI. And I don't see any reason why explicitly defining those modules wouldn't be possible. So I like to think the worst case is slightly better than you think!

RX14

I don't like having to change this to reduce memory usage but it's been holding up PRs and being intermittent for a while so we should just use this workaround. @asterite you mentioned another possible memory optimization, it would be nice to get that merged in addition to this if that's sane.

asterite · 2017-08-19T23:39:03Z

@RX14 Hmm, I tried using a single LLVM::Context instead of many and the memory usage, I was probably mistaken.

straight-shoota · 2017-08-20T12:53:10Z

@asterite It looks like you missed a significant verb... ;) What did the memory usage do?

asterite · 2017-08-20T13:34:13Z

Ah, the memory usage remained the same. Right now on my machine it takes about 2GB to compile the compiler. Brutal.

refi64 · 2017-08-20T13:37:06Z

A little off-topic, but would a custom GC help? I know of GCs that have options for being more memory efficient. Assuming the compiler is creating a crapton of objects, switching from Boehm would probably give either decent speed gains or memory gains or both.

refi64 · 2017-08-20T13:37:56Z

e.g. http://rpython.readthedocs.io/en/latest/garbage_collection.html#minimark-gc

asterite · 2017-08-20T13:40:27Z

@kirbyfan64 The problem is that the compiler is creating a crapton of objects that can't be released with the current compiler's design. It's not about the GC.

refi64 · 2017-08-20T13:47:04Z

It can be, though!

A custom GC allows a custom allocator and more intimate language knowledge. You can implement language-specific optimizations and adjust the internal memory management structures accordingly. Even a savings of a word per object can seriously add up when thousands of objects are allocated.

In addition, generational GCs work better in languages where objects are more frequently allocated like Crystal.

I mean, at some point Crystal's going to outgrow Boehm anyway. It might be at least work some slight investigation.

RX14 · 2017-08-20T15:20:30Z

A different GC isn't going to change which object can or cannot be GCed.

RX14 · 2017-08-20T15:21:28Z

@asterite 2GB is nothing compared to all_spec which takes 8gb of ram to compile.

Improve JSON and YAML exception locations

65d30e9

asterite added kind:feature pr:needs-review topic:stdlib labels Aug 19, 2017

asterite self-assigned this Aug 19, 2017

RX14 approved these changes Aug 19, 2017

View reviewed changes

ci: run std and compiler specs separately

1147a21

mverzilli approved these changes Aug 19, 2017

View reviewed changes

RX14 approved these changes Aug 19, 2017

View reviewed changes

asterite merged commit e811cf9 into master Aug 19, 2017

asterite deleted the feature/improve-json-yaml-error-location branch August 19, 2017 23:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve JSON and YAML exception locations #4855

Improve JSON and YAML exception locations #4855

asterite commented Aug 19, 2017

asterite commented Aug 19, 2017

refi64 commented Aug 19, 2017

mverzilli commented Aug 19, 2017

asterite commented Aug 19, 2017

akzhan commented Aug 19, 2017 •

edited

Loading

asterite commented Aug 19, 2017

asterite commented Aug 19, 2017 •

edited

Loading

RX14 commented Aug 19, 2017 •

edited

Loading

mverzilli commented Aug 19, 2017

asterite commented Aug 19, 2017

RX14 commented Aug 19, 2017

RX14 left a comment •

edited

Loading

asterite commented Aug 19, 2017

straight-shoota commented Aug 20, 2017

asterite commented Aug 20, 2017

refi64 commented Aug 20, 2017

refi64 commented Aug 20, 2017

asterite commented Aug 20, 2017

refi64 commented Aug 20, 2017

RX14 commented Aug 20, 2017

RX14 commented Aug 20, 2017

Improve JSON and YAML exception locations #4855

Improve JSON and YAML exception locations #4855

Conversation

asterite commented Aug 19, 2017

asterite commented Aug 19, 2017

refi64 commented Aug 19, 2017

mverzilli commented Aug 19, 2017

asterite commented Aug 19, 2017

akzhan commented Aug 19, 2017 • edited Loading

asterite commented Aug 19, 2017

asterite commented Aug 19, 2017 • edited Loading

RX14 commented Aug 19, 2017 • edited Loading

mverzilli commented Aug 19, 2017

asterite commented Aug 19, 2017

RX14 commented Aug 19, 2017

RX14 left a comment • edited Loading

Choose a reason for hiding this comment

asterite commented Aug 19, 2017

straight-shoota commented Aug 20, 2017

asterite commented Aug 20, 2017

refi64 commented Aug 20, 2017

refi64 commented Aug 20, 2017

asterite commented Aug 20, 2017

refi64 commented Aug 20, 2017

RX14 commented Aug 20, 2017

RX14 commented Aug 20, 2017

akzhan commented Aug 19, 2017 •

edited

Loading

asterite commented Aug 19, 2017 •

edited

Loading

RX14 commented Aug 19, 2017 •

edited

Loading

RX14 left a comment •

edited

Loading