Java proxies sometimes not bound quickly enough #3166

Closed
cheald opened this issue Jul 22, 2015 · 4 comments

@cheald
Contributor

cheald commented Jul 22, 2015

It appears that in some cases, Java class proxies can be made available before they're fully bound. We are using the Stanford NLP libraries to perform entity extraction on text, with the process:

  1. Create an Annotator
  2. Pass text to the Annotator, which produces an Annotation instance
  3. Call .get(TokensAnnotation) to get a List<CoreLabel>
  4. Iterate the List<CoreLabel> and, for each CoreLabel, call #original_text

      class EntityExtractor < Component::Service
        java_import "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation"

        include_package "edu.stanford.nlp.pipeline"
        java_import "java.util.Properties"
        java_import "intoxicant.analytics.coreNlp.StopwordAnnotator"

        # Binds an RMQ consumer that receives jobs
        process do |msg|
          if msg["data"]["type"] == "html"
            page = Page.id_or_url(msg["id"], msg["url"])
            process msg, page if page
          end
        end

        @semaphore = Mutex.new

        def self.pipeline
          @semaphore.synchronize do
            @pipeline ||= begin
              props = Properties.new.tap do |p|
                p.set_property "annotators", "tokenize, ssplit, stopword, pos, lemma, ner"
                p.set_property "ner.applyNumericClassifiers", "false"
                p.set_property "ner.useSUTime", "false"
                p.set_property "parse.maxlen", "80" # See https://mailman.stanford.edu/pipermail/java-nlp-user/2014-November/006483.html
                p.set_property "customAnnotatorClass.stopword", "intoxicant.analytics.coreNlp.StopwordAnnotator"
              end
              StanfordCoreNLP.new props
            end
          end
        end

        def process(msg, page)
          entities = extract_entities annotate(msg["data"]["article"])
        end

        def annotate(article)
          Annotation.new(article).tap {|doc| self.class.pipeline.annotate(doc) }
        end

        def extract_entities(document)
          document.get(TokensAnnotation).map {|e| [e.original_text, e.ner] } # <-- undefined method `original_text' for #<Java::EduStanfordNlpLing::CoreLabel:0x412ce5ad>
          # ...
        end
      end

However, once per app restart (across several dozen instances of the application), 1-2 instances produce an error like:

undefined method `original_text' for #<Java::EduStanfordNlpLing::CoreLabel:0x412ce5ad>

Future invocations work just fine.

Per IRC discussion, this sounds like an error with the proxy class being made available before it's fully bound.
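
For illustration, the suspected failure mode can be sketched in plain Ruby. This is not JRuby's proxy code: `REGISTRY`, `publish_then_bind`, and the two queues are hypothetical names used only to force the ordering. A class is made visible to other threads before its methods are defined, so the first call fails and a later call on the same object succeeds.

```ruby
# Illustrative only: plain Ruby, not JRuby's actual proxy implementation.
# A stand-in "proxy" class is published to a shared registry before its
# methods are bound, so a caller arriving in that window gets NoMethodError.

REGISTRY = {}

def publish_then_bind(published, may_bind)
  proxy = Class.new                 # the class exists but has no methods yet
  REGISTRY[:core_label] = proxy     # published too early
  published << true                 # signal: the class is now visible
  may_bind.pop                      # wait until the caller has made its first call
  proxy.send(:define_method, :original_text) { "token" } # binding finishes late
end

published = Queue.new
may_bind  = Queue.new
binder    = Thread.new { publish_then_bind(published, may_bind) }

published.pop                       # the class is visible, methods are not bound
instance = REGISTRY[:core_label].new

first_call =
  begin
    instance.original_text          # method is not defined yet
  rescue NoMethodError => e
    e.class
  end

may_bind << true                    # let the binder finish defining the method
binder.join

second_call = instance.original_text # the same object now responds
```

Here the first call raises `NoMethodError` and the second returns normally, which matches the symptom of one failure followed by working invocations. The queues only make the window deterministic for the sketch; in the real case the window would depend on thread timing.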

@kares
Member

kares commented Jul 28, 2015

These are tricky to fix, and being able to reproduce them helps, which it seems we won't be able to do in this case ;(
... I assume all instances in the List<CoreLabel> are actually of class CoreLabel (no subclasses)?
... also, for the record, what JRuby version are you using? If it's not the latest 1.7.x (1.7.21), please try that.

@cheald
Contributor Author

cheald commented Jul 28, 2015

Yes, they are actually CoreLabel instances. This is deployed against jruby-1.7.19 - I'll try an upgrade to .21 tomorrow.

@kares
Member

kares commented Jul 28, 2015

@cheald that is actually good news ... this should be fixed then, since 1.7.20, due to #2014 and #1621, so please do upgrade.

@cheald
Contributor Author

cheald commented Jul 29, 2015

I upgraded to 1.7.21 last night and haven't seen this recur yet. I'll reopen the issue if it does.

@cheald cheald closed this as completed Jul 29, 2015
@kares kares added this to the Invalid or Duplicate milestone Jul 29, 2015