Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Truffle] Lazily initialised, shared string literals. #3687

Merged
merged 5 commits into from
Feb 22, 2016

Conversation

chrisseaton
Copy link
Contributor

We want to be able to create a Ruby string with a literal value as easily as possible in our Java code, but we only want to create one rope (our equivalent of the char[]) per VM, as these are immutable and sharing them improves caching elsewhere.

@nirvdrum

@chrisseaton chrisseaton added this to the truffle-dev milestone Feb 19, 2016
@@ -32,7 +32,7 @@ public Object execute(VirtualFrame frame) {

@Override
public Object isDefined(VirtualFrame frame) {
return create7BitString("nil", UTF8Encoding.INSTANCE);
return stringLiterals().NIL.getString();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the motivation for this work - we want to be able to refer to a compilation final immutable rope which we can package up in a tiny, escape analysable string object very quickly.

@nirvdrum
Copy link
Contributor

This looks good. My only real gripe is the naming. We have a StringLiteralNode that has nothing to do with this StringLiterals class. And there's confusing terminology around Java vs Ruby strings.


return context.getRopeTable().getRope(
literal.getBytes(StandardCharsets.US_ASCII),
USASCIIEncoding.INSTANCE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want this to UTF8Encoding.INSTANCE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've constrained to 7-bit ASCII so we can use the advantageous 7-bit rope (see the assert in the constructor). We aren't likely to need unicode string literals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be for correctness, however. In the defined? case, for instance, MRI would return a UTF-8 string. We can have a 7-bit UTF-8 rope (that's the dominant case in normal operation).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take that back. In the defined? case, MRI actually returns ASCII-8BIT, which would be ASCIIEncoding.INSTANCE (NB: that's different yet from USASCIIEncoding.INSTANCE).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll check for more cases as I go. Others include exception message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it looks like defined? strings are all Encoding::ASCII_8BIT, whatever the magic comment or internal/external encodings. I'll switch to that for this use case and I'll check as I start to use it for other use cases.

@chrisseaton
Copy link
Contributor Author

I renamed it CoreStrings and CoreString. Do you think that's better?

@chrisseaton
Copy link
Contributor Author

Can you take a look again please @nirvdrum

@nirvdrum
Copy link
Contributor

Other than my cautionary comment, this looks good to me.

# Conflicts:
#	truffle/src/main/java/org/jruby/truffle/language/RubyNode.java
chrisseaton added a commit that referenced this pull request Feb 22, 2016
[Truffle] Lazily initialised, shared string literals.
@chrisseaton chrisseaton merged commit e386346 into master Feb 22, 2016
@chrisseaton chrisseaton deleted the truffle-string-literals branch February 22, 2016 13:35
@eregon
Copy link
Member

eregon commented Feb 23, 2016

Is this worth the extra complexity to initialize lazily?
If it's mostly for defined? then that's a fairly small set of strings which I believe should not impact startup significantly (or we need to revisit why it is so expensive to create such strings in the first place).

@eregon
Copy link
Member

eregon commented Feb 23, 2016

Also, maybe the leaf ropes should be weakly cached like Symbols so this explicit global state is not necessary?
There is something similar (the fstring table) for MRI for literals and such.

@enebo enebo added this to the Non-Release milestone Dec 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants