Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve JSON and YAML exception locations #4855

Merged
merged 2 commits into from
Aug 19, 2017

Conversation

asterite
Copy link
Member

This PR improves the error location reported when parsing invalid or unexpected JSON/YAML sources. Many places in the current code were using {0, 0} as a location, which probably made debugging invalid sources a bit hard.

For example, in this snippet:

require "yaml"

class Foo
  YAML.mapping({
    foo: Int32,
  })
end

source = <<-YAML
  foo: hello
  YAML

Foo.from_yaml(source)

Before we got:

Invalid Int32: hello (YAML::ParseException)

Now we get:

Invalid Int32: hello at line 1, column 6 (YAML::ParseException)

Similarly, when an attribute is missing or it's unknown (in strict mode) the reported error will be correct this time.

@asterite
Copy link
Member Author

It seems the compiler can't compile the specs in linux 64 bits because of lack of memory (or the compiler eating it all up).

One of the reasons memory grew in the last time is because the codegen now uses one LLVM::Context pero LLVM::Module, with one module per crystal type, more or less. I could merge this and then send a PR to revert that change and use a single LLVM::Context like before, because we still don't need it (we'll need it once we have parallelism).

Or maybe we can run separately:

make crystal
make std_spec
make compiler_spec

That should consume a bit less memory...

Thoughts?

@refi64
Copy link
Contributor

refi64 commented Aug 19, 2017

I could merge this and then send a PR to revert that change and use a single LLVM::Context like before, because we still don't need it (we'll need it once we have parallelism).

Isn't that just delaying the inevitable, though? At some point it'll need to be resolved.

Also it's really neat to see a PR from you again!

@mverzilli
Copy link

Glad you took a look at that :). I was looking at it because I want to unblock #4707. Both solutions are temporal, I'd go with splitting the runnables for now, but no strong opinion. What I know is we have to apply one of them now because a long term solution will not be available anytime soon.

@asterite
Copy link
Member Author

@kirbyfan64 Yes, this is delaying the inevitable. This is why almost every day I think "It's fun to program in Crystal, but it is scalable?". I don't have an answer for that, and from time to time I lose hope and quit. Then I think "maybe someone some day will come with a solution" and continue helping Crystal grow in areas where solutions exist. But I don't know, I personally can't see a way to make compilation modular (without dramatically changing the language) to allow compiling more and more code, and I've expressed this feeling many times now. That's why I think focusing on solving this first is the most important thing to do right now. Not Windows, not parallelism, not adding more features or fixing small bugs, but finding a way to make the language scale, and doing it, not just believing it can or will be done.

People at Manas told me there is a solution (they believe there is a solution) so in the meantime I'll continue programming stuff that I find interesting/fun without worrying about whether all of it makes sense. Much like playing a video game doesn't have a purpose other than having fun.

@akzhan
Copy link
Contributor

akzhan commented Aug 19, 2017

@asterite what about designing of new language? Something more scalable with union types and modules.

And thread aware from start.

@asterite
Copy link
Member Author

@akzhan I have no idea what that language could look like without being one of the existing languages (Go, Rust, Pony, Nim, D). Crystal has unique features, but those features don't exist in other statically typed languages for a very good reason (or that's at least what I think now, though I have no proof for that): they don't scale to big programs.

@asterite
Copy link
Member Author

asterite commented Aug 19, 2017

On the other hand, the compiler is where this issue happens most. I didn't see any other program to present this problem. The compiler has a big AST hierarchy and many, many types and lines of codes. I don't think the typical Crystal program has many lines or such a complex type hierarchy, so the problem won't happen there. By running all the specs in halves the problem will be kind of fixed.

@RX14
Copy link
Member

RX14 commented Aug 19, 2017

I don't think anyone has actually tried to take an optimiser to the compiler and perform non-architectural performance tweaks. Sure, having a compiler architecture which is inherently fast is great but I think there may be some performance low hanging fruit in the compiler, especially in memory usage, which will help scale the compiler some more.

Regarding scaling in general, the worst case possible solution is that we add a concept of "modules" which break the program into pieces, each of which can be compiled separately. That's the worst case, and to be perfectly honest, it wouldn't be the end of the world. I believe there's a solution which doesn't involve doing that, but I feel comfortable knowing there's an ugly solution which will allow us to scale. Also, microservices are cool, we could always tell people who hit compilation time problems to split up their monolith!

I also think that we can't bury our heads in the sand and give up because we don't have a solution to scaling. The truth is, the more popular Crystal gets, the more smart people are likely to work on the compiler and tackle the scaling problem for themselves. Perhaps there's some neat hack, or compiler theory, we haven't thought of to enable incremental complication which is easy to implement. Who knows.

I've been thinking about this problem and Crystal's community problem for a while and I think that growing a larger community and getting 5x, 10x the number of people involved, and most importantly the coolness factor, is likely to solve at least some of our problems.

@mverzilli
Copy link

In situations like this I like to think of the worst case scenario. Worst case, we reach to a point where we are convinced that Crystal doesn't scale beyond the size of its own compiler. So we end up with a great language which you can't use to write programs over 40kLOC and which forces you to split you projects in services beyond that. I don't know about the rest of the community, but for a worst case, I'd be quite damn happy with that :).

@asterite
Copy link
Member Author

@RX14 @mverzilli Well said!

By the way, I changed bin/ci to do make crystal std_spec compiler_spec doc instead of make crystal spec doc and now the specs pass 🎉

If approved, I'll merge this PR with that change.

@RX14
Copy link
Member

RX14 commented Aug 19, 2017

@mverzilli I think it's pretty clear that you can scale beyond that inside a single executable as long as you can define seperable parts which you know will have a consistent binary ABI. And I don't see any reason why explicitly defining those modules wouldn't be possible. So I like to think the worst case is slightly better than you think!

Copy link
Member

@RX14 RX14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like having to change this to reduce memory usage but it's been holding up PRs and being intermittent for a while so we should just use this workaround. @asterite you mentioned another possible memory optimization, it would be nice to get that merged in addition to this if that's sane.

@asterite asterite merged commit e811cf9 into master Aug 19, 2017
@asterite asterite deleted the feature/improve-json-yaml-error-location branch August 19, 2017 23:04
@asterite
Copy link
Member Author

@RX14 Hmm, I tried using a single LLVM::Context instead of many and the memory usage, I was probably mistaken.

@straight-shoota
Copy link
Member

@asterite It looks like you missed a significant verb... ;) What did the memory usage do?

@asterite
Copy link
Member Author

Ah, the memory usage remained the same. Right now on my machine it takes about 2GB to compile the compiler. Brutal.

@refi64
Copy link
Contributor

refi64 commented Aug 20, 2017

A little off-topic, but would a custom GC help? I know of GCs that have options for being more memory efficient. Assuming the compiler is creating a crapton of objects, switching from Boehm would probably give either decent speed gains or memory gains or both.

@refi64
Copy link
Contributor

refi64 commented Aug 20, 2017

@asterite
Copy link
Member Author

@kirbyfan64 The problem is that the compiler is creating a crapton of objects that can't be released with the current compiler's design. It's not about the GC.

@refi64
Copy link
Contributor

refi64 commented Aug 20, 2017

It can be, though!

A custom GC allows a custom allocator and more intimate language knowledge. You can implement language-specific optimizations and adjust the internal memory management structures accordingly. Even a savings of a word per object can seriously add up when thousands of objects are allocated.

In addition, generational GCs work better in languages where objects are more frequently allocated like Crystal.

I mean, at some point Crystal's going to outgrow Boehm anyway. It might be at least work some slight investigation.

@RX14
Copy link
Member

RX14 commented Aug 20, 2017

A different GC isn't going to change which object can or cannot be GCed.

@RX14
Copy link
Member

RX14 commented Aug 20, 2017

@asterite 2GB is nothing compared to all_spec which takes 8gb of ram to compile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants