Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Truffle] Encoding#replicate #4161

Merged
merged 6 commits into from Sep 19, 2016
Merged

Conversation

bjfish
Copy link
Contributor

@bjfish bjfish commented Sep 18, 2016

Work In Progress.

I've got one known failure here which I think is due to the following from encoding.rb not recognizing the encodings added at runtime. I'm still thinking about a fix for this. I'll probably replace these with a primitive for the few places these are used.

  EncodingList = Encoding.list.freeze
  EncodingMap = build_encoding_map

Please review @nirvdrum

@bjfish bjfish added the truffle label Sep 18, 2016
@eregon
Copy link
Member

eregon commented Sep 18, 2016

I'll probably replace these with a primitive for the few places these are used.

👍

private final DynamicObject[] ENCODING_LIST_BY_ENCODING_LIST_INDEX = new DynamicObject[EncodingDB.getEncodings().size()];
private final DynamicObject[] ENCODING_LIST_BY_ENCODING_INDEX = new DynamicObject[EncodingDB.getEncodings().size()];
private final List<DynamicObject> ENCODING_LIST_BY_ENCODING_LIST_INDEX = new ArrayList<DynamicObject>(EncodingDB.getEncodings().size());
private final Map<Integer, DynamicObject> ENCODING_LIST_BY_ENCODING_INDEX = new HashMap<Integer,DynamicObject>(EncodingDB.getEncodings().size());
private final Map<String, DynamicObject> LOOKUP = new HashMap<>();
Copy link
Member

@eregon eregon Sep 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will probably need some synchronization if encodings can be added after startup.
Maybe a CopyOnWriteArrayList and a ConcurrentHashMap, or synchronize methods accessing those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the synchronized keyword as we discussed.

@bjfish
Copy link
Contributor Author

bjfish commented Sep 19, 2016

It is pretty difficult to make updates in the Encoding area because we have lists/maps of Encodings in the EncodingDB, EncodingService, EncodingManager, and in the Encoding::EncodingList/Map.

Also, I don't think JRuby replicates correctly currently because I don't see the Encoding appear in the Encoding.name_list or Encoding.list.

@nirvdrum
Copy link
Contributor

You've probably worked this out by now, but EncodingDB is jcodings's way of storing the underlying encoding objects, EncodingService is JRuby's way of bridging Ruby and jcodings, EncodingManager is our way of bridging Ruby and jcodings, and Encoding::EncodingList/Map is our way of bridging Rubinius and jcodings.

@nirvdrum
Copy link
Contributor

@eregon would know better than me, but I think CopyOnWriteArrayList is better than synchronized here. In virtually all expected cases, the encoding list is processed during bootstrap and then effectively becomes read-only. Synchronizing those read methods seems unnecessarily heavy to me.

@bjfish
Copy link
Contributor Author

bjfish commented Sep 19, 2016

@nirvdrum, @eregon would know better than me too here, correct me if I am wrong ...

I discussed choosing synchronized over CopyOnWriteArrayList with @eregon here and he agreed but we didn't really discuss why.

I thought CopyOnWriteArrayList\ConcurrentHashMap here would be too granular because our methods update/read more than one of these data structures at once. So, I thought synchronized on every method would work to ensure that our reads/writes to multiple data structures are consistent.

I agree that synchronized may be more synchronization than we need because we probably don't need to synchronize reads when no writing is occurring which is where we could use a ReadWriteLock to only sync reads when writes are happening. For the moment, I just chose to use the blunt approach synchronized because I didn't think read contention here would be a common scenario for us right now but maybe it is?

@nirvdrum
Copy link
Contributor

I don't think you really need to worry about consistency all that much in general. Replicating an encoding doesn't affect the indices or name mappings of existing encodings in EncodingDB or EncodingManager. So, if a thread is looking at an older version of one of the already bootstrapped data structures, they should still get valid data. The contention is really around one thread replicating an encoding and another thread trying to use it, which should be handled by user code in my opinion. As long as a single thread's view doesn't become inconsistent, I think we're okay.

return ENCODING_LIST_BY_ENCODING_LIST_INDEX;
}

public DynamicObject getRubyEncoding(Encoding encoding) {
DynamicObject rubyEncoding = ENCODING_LIST_BY_ENCODING_INDEX[encoding.getIndex()];
@TruffleBoundary
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashMap.get caused Too deep inlining, probably caused by recursive inlining.

@@ -55,44 +57,46 @@ public static DynamicObject newRubyEncoding(RubyContext context, Encoding encodi
return Layouts.ENCODING.createEncoding(context.getCoreLibrary().getEncodingFactory(), encoding, string, dummy);
}

public DynamicObject[] getUnsafeEncodingList() {
public synchronized List<DynamicObject> getUnsafeEncodingList() {
Copy link
Member

@eregon eregon Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This synchronized is useless though :p
Since the full list is returned, there could be ConcurrentModificationException when iterating while it's being modified.
So either use a CopyOnWriteArrayList here or replace with a helper that just returns one encoding and not the whole list.

@eregon
Copy link
Member

eregon commented Sep 19, 2016

@nirvdrum synchronized is not so heavy but here I'm of the opinion using concurrent data-structures might be enough and actually easier to follow.
synchronized would ensure all three are kept consistent but I would suspect only one of them is used for reading at a time, so inconsistencies should be unnoticeable.

@bjfish I'd like to have a look in my IDE and make this thread-safe after your PR, could you remove the synchronized commit? Then let's merge this without mixing with concurrency :)


}

@Primitive(name = "encoding_get_object_encoding_by_index", needsSelf = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just encoding_get_encoding_by_index ?
Since anyway primitives return Ruby-level values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed at 79e5a8a

@bjfish
Copy link
Contributor Author

bjfish commented Sep 19, 2016

@eregon I've reverted the synchronized commit at 4222dc8

@eregon
Copy link
Member

eregon commented Sep 19, 2016

Let's get this merged as soon as it's green!

@bjfish bjfish merged commit cd5efd5 into truffle-head Sep 19, 2016
@bjfish bjfish deleted the truffle-encoding-replicate branch September 19, 2016 19:20
private final DynamicObject[] ENCODING_LIST_BY_ENCODING_LIST_INDEX = new DynamicObject[EncodingDB.getEncodings().size()];
private final DynamicObject[] ENCODING_LIST_BY_ENCODING_INDEX = new DynamicObject[EncodingDB.getEncodings().size()];
private final List<DynamicObject> ENCODING_LIST_BY_ENCODING_LIST_INDEX = new ArrayList<DynamicObject>(EncodingDB.getEncodings().size());
private final Map<Integer, DynamicObject> ENCODING_LIST_BY_ENCODING_INDEX = new HashMap<Integer,DynamicObject>(EncodingDB.getEncodings().size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this become a map? That's fairly expensive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the List API doesn't support adding at an index > size I guess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aliases resolve to their aliased encoding:

irb(main):002:0> 'abc'.encode('ASCII-8BIT').encoding.name
=> "ASCII-8BIT"
irb(main):003:0> 'abc'.encode('BINARY').encoding.name
=> "ASCII-8BIT"

Since we associate only the jcodings Encoding with the rope, we need to look up the associated DynamicObject. Prior to this change, there was essentially a 1:1 mapping, since aliases always resolved to their aliased encoding.

Encoding#replicate, however, works differently.

irb(main):004:0> 'abc'.encode('BINARY').encoding.name
=> "ASCII-8BIT"
irb(main):005:0> 'abc'.encode(Encoding::BINARY.replicate('blah')).encoding.name
=> "blah"

I'm pretty sure he made it a map so he could store multiple DynamicObjects with the same encoding index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, this works, but I am not sure why:

jt ruby -e 'Encoding::BINARY.replicate("blah"); p e=Encoding.find("blah"); p "".encode(e).encoding; p "".encode(Encoding::BINARY).encoding'
#<Encoding:blah>
#<Encoding:blah>
#<Encoding:ASCII-8BIT>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It creates another Encoding instance, lazily, see Encoding.replicate.

@bjfish
Copy link
Contributor Author

bjfish commented Sep 20, 2016

@eregon

Because the List API doesn't support adding at an index > size I guess.

This is correct. This could probably be done with an ArrayList.set as long as we ensure that the index we are setting is index < size which may be done by adding a null every time an encoding is defined.

@nirvdrum

I'm pretty sure he made it a map so he could store multiple DynamicObjects with the same encoding index.

My understanding was that EncodingDB.replicate registers a new Encoding with it's own index. So, I did not intend to "store multiple DynamicObjects with the same encoding index."

@nirvdrum
Copy link
Contributor

@bjfish Thanks for the clarification.

@enebo enebo modified the milestone: truffle-dev Nov 9, 2016
@enebo enebo added this to the Invalid or Duplicate milestone Dec 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants