getting jars work with Dir globs for all classloaders #1823

Closed
mkristian opened this issue Jul 17, 2014 · 20 comments

@mkristian
Member

directory globs do not work with some classloaders, since there is no way to retrieve the list of files in a directory via the classloader API

that is a problem for some J2EE classloaders and probably some/all OSGi classloaders. the problems are the missing directory info and that there is no way to "find" the jar location.

that is quite a problem since the default gems also do not work #1761
with all its consequences.

proposal to get at least jruby-stdlib.jar working on ALL classloaders:

add the missing directory info explicitly to the jar

jruby-stdlib already has
META-INF/jruby.home
which works, so adding the missing directory in
META-INF/jruby.meta
will also work.
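
To illustrate the proposal, here is a minimal Java sketch of reading such an explicit listing through nothing but the classloader API. The file format (one entry per line) and the class name `ListingDemo` are assumptions made for the example; the thread does not specify how META-INF/jruby.meta would be laid out, and this is not JRuby's implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ListingDemo {
    // Resource name taken from the proposal; the one-entry-per-line format is an assumption.
    static final String LISTING = "META-INF/jruby.meta";

    // Reads the listing via ClassLoader.getResourceAsStream, which every classloader supports.
    static List<String> readListing(ClassLoader loader) throws IOException {
        List<String> entries = new ArrayList<>();
        InputStream in = loader.getResourceAsStream(LISTING);
        if (in == null) {
            return entries; // no listing shipped: nothing we can enumerate
        }
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String entry = line.trim();
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }

    // Builds a throwaway classpath root with a listing file and reads it back.
    static List<String> demo() throws IOException {
        Path root = Files.createTempDirectory("listing-demo");
        Files.createDirectories(root.resolve("META-INF"));
        Files.write(root.resolve(LISTING),
                    "lib/a.rb\nlib/b.rb\n".getBytes(StandardCharsets.UTF_8));
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { root.toUri().toURL() }, null)) {
            return readListing(loader);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

The point of the sketch is that a single well-known resource name is enough: the classloader never has to answer "what is in this directory", it only has to hand back one file.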

this would allow tools like warbler and gem-maven-plugin to add that directory info when packing war files or packing jars with embedded gems. this would be a big step forward towards better classloader support than JRuby has right now.

any other solution outside of JRuby itself would mean "adjusting" the LOAD_PATH before "launching" the application. even that will not work with gems like 'backport', which use Dir globs to require a bunch of files depending on the ruby version.

@headius @ratnikov

@mkristian
Member Author

to illustrate the need I added an osgi testcase which uses a hack to get gems embedded into an osgi bundle:

a7d900e#diff-521ec0c91fa4f679d2e8dbe01d7596cfR13

this takes the lib directory of each gem and copies it to the root of the jar/bundle. this fails for too many gems: some gems load resources from somewhere inside the gem but not under the require_path of that gem.

another option to get those gems working would be to prepend all the require_paths of the gems to the LOAD_PATH. still, not all gems work, for other reasons:

other gems use gem 'multi_json', '~> 1.0.3' to enforce version constraints, which only works if the gem gets activated by rubygems, i.e. such a gem cannot be unwrapped (but one could redefine def gem( *args ); end).

that rubygems works is also tested here
a7d900e#diff-c5eb4ad5fb6d4e25f1a6dcc0e8663ad9R97

currently you can load default gems (see rake and openssl in that testcase) but they never get "activated" or recognized as default gems, i.e. to use another version of rake or jruby-openssl you need to make sure you prepend it to the $LOAD_PATH so jruby finds it first.

the other issues of this testcase are related to #1841

@mkristian mkristian added this to the JRuby 1.7.14 milestone Jul 27, 2014
@mkristian
Member Author

fixed c04c50f and 3fc67ba

with this, default gems work as usual in OSGi containers: c04c50f#diff-ef4ef45ccfd90a731f49c5d9b9046926L79 and c04c50f#diff-c5eb4ad5fb6d4e25f1a6dcc0e8663ad9L95

it will probably work with J2EE classloaders as well. if jboss-wildfly can be used during maven integration tests, I will add it in a way which currently fails on jruby-1.7.13 (running the war packed/unexploded)

integration tests using embedded gems which supply this directory info are coming as well.

@kares
Member

kares commented Aug 11, 2014

@mkristian good job, getting globbing/resolving working under "embed" scenarios is I guess an 80% success!

a couple of hints: I'm wondering whether it is possible to avoid the "ugly" .jrubydir file, possibly by using/hacking around existing .jar tooling to handle the job. 2 options come into play (maybe the first one is not appropriate - not 100% sure) :

  • META-INF/INDEX.LIST allows listing the (package) content of a .jar; class-loaders might use those, but maybe we can resort to reading it as well ... "lazily" http://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#JAR_Index
  • second option would be to generate (and compile) package-info.java files with annotations that list the package's resources - reading those at runtime should then be pretty fast

I would probably vote for the first option since it is less invasive and more conventional, without requiring third-party jar gems to do anything that "special" to get globbing working above them ... anyway +1 for all the work (we can definitely do those improvements on top of it) !

@mkristian
Member Author

cool - I did not know about that META-INF/INDEX.LIST that is definitely the better way to go ;)

@kares
Member

kares commented Aug 11, 2014

it only lists packages ... but I was thinking that if it is not enough maybe it can be "hacked", or if not, another .LIST could be invented .. I think I've seen something like this done previously in JRuby to speed up load times ...

@ratnikov
Contributor

Is META-INF/INDEX.LIST something that is accessible via classpath? If so, does it merge the indexes of different jars or not really?

Just indexing jar contents doesn't seem too useful, since I already build a cache to resolve directories.

@kares
Member

kares commented Aug 11, 2014

yeap, that was what I was about to point out: #1546
... I think that will need to be replaced / updated to work best if there's any "index" generated

@mkristian
Member Author

this was originally meant to be used for uri: protocol and URLResource
where the only API used is URL (coming from a classloader) or
Thread.currentThread.contextClassLoader (J2EE case)

merging is no problem since you can load an Enumeration of resources which
gives back all the files (as inputstream) under a given resource path - for
ALL classloaders - J2EE and OSGi cases
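
A minimal Java sketch of that merging step, under stated assumptions: the resource name `META-INF/listing.txt` and the class `MergedListingDemo` are hypothetical, and two temporary directories stand in for two jars/bundles. The only classloader call involved is `ClassLoader.getResources`, which returns every resource with the given name rather than just the first match.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;

final class MergedListingDemo {
    // Hypothetical listing name, used only for this illustration.
    static final String LISTING = "META-INF/listing.txt";

    // Collects the lines of every resource called `name` that the loader can see.
    static Set<String> mergedEntries(ClassLoader loader, String name) throws IOException {
        Set<String> merged = new TreeSet<>();
        Enumeration<URL> urls = loader.getResources(name);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String entry = line.trim();
                    if (!entry.isEmpty()) {
                        merged.add(entry); // duplicates across roots collapse in the set
                    }
                }
            }
        }
        return merged;
    }

    // Creates one classpath root containing a listing with the given content.
    static URL rootWithListing(String content) throws IOException {
        Path root = Files.createTempDirectory("merge-demo");
        Files.createDirectories(root.resolve("META-INF"));
        Files.write(root.resolve(LISTING), content.getBytes(StandardCharsets.UTF_8));
        return root.toUri().toURL();
    }

    static Set<String> demo() throws IOException {
        URL[] roots = {
            rootWithListing("lib/a.rb\n"),
            rootWithListing("lib/b.rb\nlib/a.rb\n")
        };
        try (URLClassLoader loader = new URLClassLoader(roots, null)) {
            return mergedEntries(loader, LISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```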

@ratnikov
Contributor

Nice! Didn't know about the Enumeration call.

There's a slight concern with cache invalidation (in case during runtime
someone adds some bundle/whatever to the classpath), but hopefully
getResources is cheap enough to be able to call it each time before we load
a directory resource after failing to load it directly.

But assuming the main bundles do the /META-INF/INDEX.LIST that sounds like
the way to go.


@mkristian
Member Author

well. adding an INDEX.LIST to an OSGi bundle seems to be a bit tricky:

https://stackoverflow.com/questions/12996182/indexing-a-jar-using-maven-bundle-plugin

but that trick could work with other build systems like ant, gradle, etc as well.

@mkristian
Member Author

in case during runtime someone adds some bundle/whatever to the classpath

hmm - that sounds strange to me. actually the only classloader I know of (with my limited knowledge) where you can add jars during runtime is the JRubyClassLoader. BUT I hope we are not talking about the JRubyClassLoader, since that is the thing which takes care of the jars which are added to it via require 'myjar.jar'

classpath is also a strange concept to me, since it is not really clear what it means. classpath != application classloader. the classpath: protocol has nothing to do with $CLASSPATH, and classpath: != the 'java.class.path' system property

let's look:

  • $CLASSPATH seems to be JRubyClassLoader.getURLs
  • classpath: seems to look at least in the parent classloader of JRubyClassLoader and/or Thread.currentThread.getContextClassLoader
  • java.class.path elements are put into $LOAD_PATH
  • what does classpath: mean in the context of having the ruby app distributed over several classloaders ?

since I hardly use jruby via commandline (recently I use it more often) and mostly use it via java application:

two of my applications build up a classloader space for the "application" which runs jruby with all its ruby resources inside that classloader space. BUT somehow the "classpath" which was used to launch the java application is part of the $LOAD_PATH, i.e. all those jars are added there. that is wrong, since the jruby part has nothing to do with the classpath of the java application. I saw the same thing with OSGi, where the osgi framework jars appeared in the $LOAD_PATH. it also appears that jruby-rack is fiddling with $LOAD_PATH to clean up or so - not sure to what extent. to remove this unwanted effect I introduced what I called IsolatedScriptingContainer (https://github.com/jruby/jruby/blob/test-uri-protocol/core/src/main/java/org/jruby/embed/IsolatedScriptingContainer.java), where the ScriptingContainer gets isolated from the outer environment as well as possible, or as far as I understand things.

classpath: is very close to uri:classloader: but uri:classloader: is defined as loading a resource from Thread.currentThread.contextClassLoader and has no meaning in a setup which does not use the contextClassLoader ! see #1872

classpath: has nothing to do with $CLASSPATH

classpath is a concept for launching jruby via org.jruby.Main ! and it is an unclear concept for running jruby inside a java application.

currently I have the feeling classpath is immutable and the only thing which is not is the JRubyClassLoader which is NOT used for loading ruby resources but to give a home for the embedded jars which are required along the way.

the JRubyClassLoader also has a lot of code to deal with "certain" cases where the super URLClassLoader fails, i.e. it reimplements parts of the classloading in case super.findClass fails. and JRubyClassLoader.getResource delivers jar: urls as internal protocol even though the urls added via addURL are just URLs - it looks like the JRubyClassLoader does not really behave well regarding the classloader API.

IMO the jruby commandline should be a special case of "a java application" running jruby.

plugin frameworks use classloaders to separate different plugins from each other and from the underlying framework - OSGi, J2EE, ClassWorld framework, etc.

conclusion: the classloader is the central place to load resources from even for the commandline case.

@kares
Member

kares commented Aug 12, 2014

@mkristian probably no need to worry about .jars/classes added at runtime to CLs (other than JRuby's)

I think classpath: means anywhere in the CP ... it is not the same as $CLASSPATH (in JRuby) since that is the custom CP of the JRubyClassLoader (only seen by the Ruby part) ... otherwise classpath: should mean "42" :) ... that of course includes stuff at the context class-loader (as far as I understand) and all the parent loaders.

I would not consider a CP immutable in general esp. with OSGi (although I have little experience with OSGi I think there's a lot of CP work hidden underneath to support module isolation/versioning etc.) ... even with servlets just consider hot-reload - JVM's CP is changing (this case of course is a bad example since JRuby gets rebooted thus the new app that comes up sees a new "immutable" CP).

@ratnikov
Contributor

Interesting. I assumed there'd be a JVM description of what classpath: URI
handling is, but couldn't find it, so I guess it's not base java (and
seemingly a spring framework thing).

Anyhow, the way I think about classpath: loading is:

  • given a path of classpath:/somepath/foo.rb used in a File context (or loading
    context, since it's essentially the same thing), ruby will attempt to get
    its inputstream and file characteristics by doing
    ClassLoader.getResource("/somepath/foo.rb").
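
A small Java sketch of that lookup, as an illustration rather than JRuby's actual code (the class `ClasspathUriDemo` and its methods are made up for the example). One detail worth noting: `ClassLoader.getResource` expects a '/'-separated name without a leading slash, unlike `Class.getResource`, so the sketch strips the scheme and any leading slashes before the lookup.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ClasspathUriDemo {
    static final String SCHEME = "classpath:";

    // Turns "classpath:/somepath/foo.rb" into the resource name "somepath/foo.rb".
    static String toResourceName(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("not a classpath: URI: " + uri);
        }
        String name = uri.substring(SCHEME.length());
        while (name.startsWith("/")) {
            name = name.substring(1);
        }
        return name;
    }

    // Returns the URL of the resource, or null if the loader cannot see it.
    static URL resolve(ClassLoader loader, String uri) {
        return loader.getResource(toResourceName(uri));
    }

    // Sets up a temporary classpath root containing somepath/foo.rb and resolves it.
    static URL demo() throws IOException {
        Path root = Files.createTempDirectory("classpath-demo");
        Files.createDirectories(root.resolve("somepath"));
        Files.write(root.resolve("somepath/foo.rb"),
                    "puts 'hello'\n".getBytes(StandardCharsets.UTF_8));
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { root.toUri().toURL() }, null)) {
            return resolve(loader, "classpath:/somepath/foo.rb");
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Which classloader to pass in (context classloader, JRubyClassLoader parent, or something else) is exactly the open question discussed above; the sketch leaves that choice to the caller.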

I kinda feel that all that $CLASSPATH and weird $LOAD_PATH specifics (which
I'm not sure still applies, but I guess maybe during startup JRuby copies
everything in java.class.path into $LOAD_PATH) is silly and we would be
good to do away with it, since they don't have direct counter-parts in java
world. Trying to introduce ways to be able to mess with those things from
Ruby makes them quirky and incomplete. =/


@kares
Member

kares commented Aug 12, 2014

@ratnikov exactly ... it's invented by Spring - they did a lot of work to support it (~ similar to what JRuby does)

@mkristian
Member Author

@kares since INDEX.LIST does not work (see #1872 (comment)), I was looking into your package-info.java suggestion. I see something like this is done in other places:
http://tech.puredanger.com/2007/02/28/package-annotations/

i.e. have a jruby package annotation which contains a list of files of that "package" ? maybe you had something else in mind ?!

@kares
Member

kares commented Aug 13, 2014

yy ... was just a "blind" shot. I'm really not sure which approach is better: to have a "FILES.LIST" under META-INF, or package-info like .class files with @Files({"foo.rb", "bar.rb"}) annotations generated ... the first one might be easier for gem authors with .jar extensions to adopt (the second one requires them to use a custom annotation that is not "backwards" compatible), but I'm not sure about the drawbacks ...

@ratnikov
Contributor

I think I prefer FILES.LIST since a lot of what I build that contains
ruby scripts is done by jar directly, so not sure how I'd go about
generating a proper package-info whereas jar tf is easy.

@mkristian
Member Author

just thought about the tricky part when I have embedded gems in several jars:
I cannot have the same class more than once, but resources, i.e. regular
files, can have duplicates.


@mkristian
Member Author

yes the idea of just taking the output of jar tf ... seems easy enough to use with different build tools.
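
For build tools that cannot shell out to jar tf, roughly the same list can be produced with java.util.jar. A hedged sketch follows: the class `JarListingDemo` and the sample jar contents are invented for the example, and writing the names into a FILES.LIST inside the jar is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

final class JarListingDemo {
    // Returns the entry names of a jar in archive order, similar to `jar tf`.
    static List<String> listEntries(Path jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Writes a small sample jar so the example is self-contained.
    static Path writeSampleJar() throws IOException {
        Path jar = Files.createTempFile("sample", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            for (String name : new String[] { "lib/", "lib/foo.rb", "lib/foo/bar.rb" }) {
                out.putNextEntry(new JarEntry(name));
                if (!name.endsWith("/")) {
                    out.write("# ruby\n".getBytes(StandardCharsets.UTF_8));
                }
                out.closeEntry();
            }
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        for (String name : listEntries(writeSampleJar())) {
            System.out.println(name);
        }
    }
}
```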

@kares I do not expect gem authors to provide those FILES.LIST. I had tools like warbler and gem-maven-plugin more in mind, i.e. when you pack gems into a jar/war, or the situation @ratnikov described.

anyways, I will add a script to JRuby::Commands, which is already part of jruby, to do the job.

@enebo enebo removed this from the JRuby 1.7.14 milestone Aug 27, 2014
@enebo enebo added this to the JRuby 1.7.15 milestone Aug 27, 2014
@mkristian mkristian modified the milestones: JRuby 1.7.16, JRuby 1.7.15 May 5, 2015
@mkristian
Member Author

that is done . . .
