Enumerator out of memory error #2577

Open
iconara opened this issue Feb 7, 2015 · 5 comments

@iconara
Contributor

iconara commented Feb 7, 2015

I've found that it's possible to stress JRuby into crashing with an out of memory error with the following code.

The code zips the bytes of two strings together. Their lengths don't matter at all; a single character is sufficient. It then checks whether any of the bytes are nil, which should be impossible but happens anyway. Originally I didn't check for nil; the first indication of a problem was errors from calling methods on nil in places where there couldn't be any nil. When I put a begin…rescue around it to see what was nil, I got an out of memory error instead.

s1 = 'a'
s2 = 'b'

100000.times do
  b1 = s1.each_byte
  b2 = s2.each_byte
  bytes = b1.zip(b2).flatten
  if bytes.any? { |b| b.nil? }
    puts('this can never happen')
  end
end

This prints the following in JRuby 1.7.18, 1.7.19 and HEAD (probably all other versions too):

this can never happen
this can never happen
this can never happen
this can never happen
this can never happen
Error: Your application used more memory than the safety cap of 500M.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace

The exact number of "this can never happen" lines differs between runs.

The full stack trace of the OutOfMemoryError is:

java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:713)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
    at org.jruby.RubyEnumerator$ThreadedNexter.ensureStarted(RubyEnumerator.java:700)
    at org.jruby.RubyEnumerator$ThreadedNexter.next(RubyEnumerator.java:654)
    at org.jruby.RubyEnumerator.next(RubyEnumerator.java:461)
    at org.jruby.RubyEnumerator$INVOKER$i$0$0$next.call(RubyEnumerator$INVOKER$i$0$0$next.gen)
    at org.jruby.RubyClass.finvoke(RubyClass.java:616)
    at org.jruby.runtime.Helpers.invoke(Helpers.java:593)
    at org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:359)
    at org.jruby.RubyEnumerable.zipEnumNext(RubyEnumerable.java:1679)
    at org.jruby.RubyEnumerable$50.call(RubyEnumerable.java:1635)
    at org.jruby.runtime.CallBlock.doYield(CallBlock.java:80)
    at org.jruby.runtime.BlockBody.yield(BlockBody.java:82)
    at org.jruby.runtime.Block.yield(Block.java:147)
    at org.jruby.RubyString.enumerateBytes(RubyString.java:5468)
    at org.jruby.RubyString.each_byte19(RubyString.java:5275)
    at org.jruby.RubyString$INVOKER$i$0$0$each_byte19.call(RubyString$INVOKER$i$0$0$each_byte19.gen)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroBlock.call(JavaMethod.java:472)
    at org.jruby.RubyClass.finvoke(RubyClass.java:541)
    at org.jruby.runtime.Helpers.invoke(Helpers.java:589)
    at org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:394)
    at org.jruby.RubyEnumerator.each(RubyEnumerator.java:294)
    at org.jruby.RubyEnumerator$INVOKER$i$each.call(RubyEnumerator$INVOKER$i$each.gen)
    at org.jruby.RubyClass.finvoke(RubyClass.java:520)
    at org.jruby.runtime.Helpers.invoke(Helpers.java:577)
    at org.jruby.RubyEnumerable.callEach(RubyEnumerable.java:96)
    at org.jruby.RubyEnumerable.zipCommonEnum(RubyEnumerable.java:1626)
    at org.jruby.RubyEnumerable.zipCommon19(RubyEnumerable.java:1547)
    at org.jruby.RubyEnumerable.zip19(RubyEnumerable.java:1491)
    at org.jruby.RubyEnumerable$INVOKER$s$0$0$zip19.call(RubyEnumerable$INVOKER$s$0$0$zip19.gen)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:161)
    at tmp.jruby_issue.invokeOther3:zip(tmp/jruby_issue.rb)
    at tmp.jruby_issue.\=tmp\|jruby_issue\,rb_CLOSURE_1__tmp\|jruby_issue\,rb_0(tmp/jruby_issue.rb:7)
    at org.jruby.runtime.CompiledIRBlockBody.commonYieldPath(CompiledIRBlockBody.java:66)
    at org.jruby.runtime.IRBlockBody.yieldSpecific(IRBlockBody.java:84)
    at org.jruby.runtime.Block.yieldSpecific(Block.java:116)
    at org.jruby.RubyFixnum.times(RubyFixnum.java:300)
    at org.jruby.RubyFixnum$INVOKER$i$0$0$times.call(RubyFixnum$INVOKER$i$0$0$times.gen)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:303)
    at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:141)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:145)
    at tmp.jruby_issue.invokeOther13:times(tmp/jruby_issue.rb)
    at tmp.jruby_issue.__script__(tmp/jruby_issue.rb:4)
    at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:636)
    at org.jruby.ir.Compiler$1.load(Compiler.java:112)
    at org.jruby.Ruby.runScript(Ruby.java:827)
    at org.jruby.Ruby.runScript(Ruby.java:820)
    at org.jruby.Ruby.runNormally(Ruby.java:750)
    at org.jruby.Ruby.runFromMain(Ruby.java:572)
    at org.jruby.Main.doRunFromMain(Main.java:404)
    at org.jruby.Main.internalRun(Main.java:299)
    at org.jruby.Main.run(Main.java:226)
    at org.jruby.Main.main(Main.java:198)
@headius
Member

headius commented Mar 12, 2015

This may just be a downside of our having to use threads for all Enumerator#next logic.

You are creating two enumerators and then zipping the one against the other. This will probably create at least one thread, and possibly two. Once those threads reach the end of the data, they should shut down. If you walk away from them before they're complete, they should also shut down. What you're seeing here is that too many threads have been created and not cleaned up (perhaps due to GC delays) and so we can't create any more.
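
One way to sidestep external enumeration in a case like the one reported is to zip arrays rather than enumerators. The following is a minimal sketch, not a confirmed fix: it assumes that `zip` only falls back to `Enumerator#next` when its argument is not an array, which is what the stack trace above suggests. `zip_bytes` is a hypothetical helper name.

```ruby
# Hypothetical helper: pair up the bytes of two strings without
# handing an Enumerator to zip.
def zip_bytes(s1, s2)
  # each_byte.to_a drives the enumerator through each (internal
  # iteration), so no call to Enumerator#next is needed here.
  a1 = s1.each_byte.to_a
  a2 = s2.each_byte.to_a
  a1.zip(a2).flatten
end

# Same loop as in the report, using the array-based helper.
100_000.times do
  bytes = zip_bytes('a', 'b')
  raise 'unexpected nil byte' if bytes.any?(&:nil?)
end
```

The trade-off is that both byte sequences are materialized in memory, which is harmless for short strings but may matter for very large ones.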

@headius
Member

headius commented Mar 12, 2015

I attempted to make it force a GC when it fails to create a new thread, but it doesn't seem to help here.

@headius
Member

headius commented Mar 12, 2015

It looks like our best bet would be to finally start making non-threaded enumerator logic similar to what we already have for Array#each (RubyEnumerator.ArrayNexter for example). That should make it possible for us to handle more core-class cases without threads, which should make your case work.

@iconara
Contributor Author

iconara commented Mar 13, 2015

Thanks for looking into this. From my casual understanding of the problem it feels like as long as the underlying collection is sequential or in some other way externally enumerable there should be no need to use threads for enumeration.

It looks, for example, like RubyString#each_byte creates RubyEnumerators and passes a size function, so could it also pass a function that enumerates the string (essentially a RubyEnumerator.Nexter)? I'm basically just trying to see if I'm understanding the underlying code correctly; I'm not familiar enough with it to see the whole picture or the downsides of a solution like that.

@iconara
Contributor Author

iconara commented Mar 13, 2015

What I proposed is kind of what happens for RubyArray, but instead of having RubyEnumerator check the type of the underlying collection and decide on the best "nexter" strategy, the collection creates the strategy when it creates the RubyEnumerator.
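
To make the idea concrete, here is a rough Ruby sketch of a thread-free "nexter" for a string's bytes. It is purely illustrative: JRuby's real nexters are Java classes inside RubyEnumerator, and the class name `ByteNexter` and its methods are hypothetical, chosen only to mirror the contract of `Enumerator#next`.

```ruby
# Hypothetical cursor-style "nexter" for a string's bytes. It keeps an
# index instead of suspending a producer, so no thread is required.
class ByteNexter
  def initialize(string)
    @string = string
    @index = 0
  end

  # Return the next byte and advance, or raise StopIteration at the
  # end, mirroring the contract of Enumerator#next.
  def next
    byte = peek
    @index += 1
    byte
  end

  # Return the next byte without consuming it.
  def peek
    raise StopIteration, 'iteration reached an end' if @index >= @string.bytesize
    @string.getbyte(@index)
  end

  # Reset the cursor to the first byte.
  def rewind
    @index = 0
    self
  end
end

nexter = ByteNexter.new('ab')
first = nexter.next   # byte value of 'a'
second = nexter.next  # byte value of 'b'
```

Because the underlying collection is indexable, the whole iteration state is a single integer, which is why a sequential collection should not need a thread for external enumeration.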

abargnesi pushed a commit to OpenBEL/openbel-api that referenced this issue Jan 12, 2016
The JRuby enumerator uses a thread per next object in an enumerator
which proves costly. Hundreds of threads are created (tested with
yourkit) when batch-creating evidence due to the "each_slice(500)" of
the enumerator.

This issue is logged in JRuby:
jruby/jruby#2577

The solution employed was to yield each evidence directly to the block
and batch 500 into an array at a time. This should avoid the OOM
exception received:

java.lang.OutOfMemoryError: unable to create new native thread

Indeed the thread count was observed to be lower in yourkit.
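
The batching approach described in this commit message can be sketched roughly as follows. This is not the actual openbel-api code; `each_batch` and its parameters are hypothetical names, and the only assumption is that the source responds to `each`.

```ruby
# Hypothetical helper: group items into arrays of batch_size using only
# internal iteration (each), so Enumerator#next is never called.
def each_batch(source, batch_size = 500)
  batch = []
  source.each do |item|
    batch << item
    if batch.size == batch_size
      yield batch
      batch = []
    end
  end
  # Flush the final, possibly short, batch.
  yield batch unless batch.empty?
end

sizes = []
each_batch(1..1200) { |batch| sizes << batch.size }
```

Each full batch is handed to the block as soon as it fills up, so at most `batch_size` items are held at a time.
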
abargnesi pushed a commit to OpenBEL/openbel-api that referenced this issue Jan 13, 2016
The JRuby enumerator uses a thread per next object in an enumerator
which proves costly. Hundreds of threads are created (tested with
yourkit) when batch-creating evidence due to the "each_slice(500)" of
the enumerator.

This issue is logged in JRuby:
jruby/jruby#2577

The solution employed was to yield each evidence directly to the block
and batch 500 into an array at a time. This should avoid the OOM
exception received:

java.lang.OutOfMemoryError: unable to create new native thread

Indeed the thread count was observed to be lower in yourkit.
abargnesi pushed a commit to OpenBEL/openbel-api that referenced this issue Mar 16, 2016
Fixed:

- Facets are not created for evidence uploaded through a dataset.
- Facets are empty while uploading a dataset.
- Dataset evidence collection is missing annotation/namespace URIs (#95).

Changed:

- Mongo schema redesign for evidence.facets and evidence facet cache.
- Bumped MongoDB requirement to 3.2.0. We now use the $slice operator
  for facet aggregation operations.

Added:

- Export evidence using BEL translator plugins (#44).
- Export dataset evidence using BEL translator plugins (#99).
- Mongo migration scripts for existing installations of openbel-api.
- Upgrading guide.
- 0.6.0 changelog notes.

Squashed commit of the following:

commit be2e6e1
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 15:07:24 2016 -0400

    replace method for BEL.keys_to_symbols

    additional style alignment

commit fbf5368
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 09:25:06 2016 -0400

    return 404 when translating empty evidence results

    refs #44

commit ac61baf
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 08:32:37 2016 -0400

    added storage.engine note for UPGRADING to 0.6.0

commit 3f4f700
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 08:27:14 2016 -0400

    added UPGRADING guide

commit 29f86e8
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 08:05:01 2016 -0400

    added document for 0.6.0 mongodb migration

commit 0e22354
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 06:30:26 2016 -0400

    add configuration check for MongoDB 3.2

    Check will fail to start OpenBEL API if MongoDB is < 3.2

commit 45e5e39
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 06:17:57 2016 -0400

    added missing arg to render evidence collection

commit 1edb037
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 14 14:45:43 2016 -0400

    set mongo operation timeouts to unbounded

    The operation timeout is the number of seconds that can pass before
    subsequent reads from a mongo operation. This change makes this read
    timeout unbounded in order to satisfy long evidence and facet creation
    queries.

commit 39524ca
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 14 13:46:25 2016 -0400

    remove cache facets during dataset load

    Cached facets were removed at the end of a dataset load. Now they are
    additionally removed at the start of the load as well as every increment
    of 10k nanopubs loaded.

commit 68c2107
Merge: de9a500 61a291d
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 14 12:50:35 2016 -0400

    Merge branch 'next' into rewrite_references

commit 61a291d
Merge: 1b4dbb7 1bdf14e
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Mon Mar 14 12:20:40 2016 -0400

    Merge pull request #101 from nbargnesi/issue100

    Issue100

commit 1bdf14e
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Mon Mar 14 12:05:43 2016 -0400

    document auth.enabled, auth.secret

commit 0e900f6
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Feb 2 13:56:15 2016 -0500

    include only auth enabled/secret in default config

    for #100

commit fbb8b06
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Feb 2 13:55:54 2016 -0500

    simplify authenticate route to enabled/disabled

commit fe724ff
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Feb 2 13:54:30 2016 -0500

    remove rest-client dependency

commit de9a500
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Mar 10 14:29:16 2016 -0500

    set mongo connection pool size to 30

    This number was chosen in order to have at most 30 long-running queries
    simultaneously executing. This would then fail the 31st query unless a
    connection could be obtained with a timeout of 5 seconds.

commit 8d46fc1
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Mar 9 14:54:15 2016 -0500

    do not index value of experiment_context/metadata

    annotation values can be large amount of text that will not fit into an
    index key of 1024, if it's attempted you may see an error:

      WiredTigerIndex::insert: key too large to index...

commit 4426582
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 8 23:01:46 2016 -0500

    flatten translator arrays so we return one, if any

commit 4d42c35
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 8 20:38:41 2016 -0500

    bump puma to 3.1.0

commit 5081567
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 8 20:36:41 2016 -0500

    remove unnecessary local variables

commit 32c5e56
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Tue Mar 8 16:59:38 2016 -0500

    Update README.md

commit 53ea95f
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Tue Mar 8 16:51:59 2016 -0500

    Update README.md

commit 53653c0
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 7 23:06:27 2016 -0500

    correct references when serialization evidence

    using rewrite references work in bel.rb

commit 1b4dbb7
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 16:11:02 2016 -0500

    convert /api/evidence to BEL using translators

    factored out rendering of evidence_resource_collection to evidence
    helper

    refs #44

commit 3500811
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 15:20:01 2016 -0500

    factored out filters validation into helper

    functional decomposition of filter validation for better
    understanding and maintenance; now reporting multiple JSON errors when
    responding with 400.

commit 83935aa
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 15:18:27 2016 -0500

    added doc for opening ::Sinatra::Helpers::Stream

    It is important to convey why methods were added to this class. The
    methods are a convenience so RDF.rb's writers can expect to call them.

commit c984f8a
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 15:08:44 2016 -0500

    bump version dependencies for bel-rdf-jena / rdf

    rdf bumped to 1.99.1

    bel-rdf-jena bumped to 0.4.2

commit e4eb5dd
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Feb 1 14:50:34 2016 -0500

    dataset serialization to all bel.rb translators

    updated dependencies to support all bel.rb translators

    refs #99

commit b1243d8
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Jan 26 15:57:16 2016 -0500

    aggregate on full-text search; avoids Mongo limits

    A full-text search filter to /api/evidence with a sort on bel_statement
    only used the text index. This means that the bel_statement sort had to
    be done in memory.

    This reaches the 32 MB sort limit with only several tens of thousands of
    documents.

    The solution employed here was to use cursored aggregation allowing disk
    use for sort stages.

    The solution was introduced as an alternative code path if a FTS filter
    was included in the HTTP request. Although this did minimize the risk of
    regression there is a fair bit to clean up in the mongo
    access layer.

    closes #96

commit 5d44fd0
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 25 21:48:12 2016 -0500

    return annotation/namespace defs in BEL Script

    removed normalization of experiment_context annotation keywords. The
    normalized names were inconsistent with references.annotations
    definitions.

    integrate next version of bel.rb (0.4.3) to get fixes for
    annotation/namespace formats.

    refs #95

commit 92f7e7e
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 25 15:51:14 2016 -0500

    require MongoDB 3.2; closes #98

commit 0507714
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 25 14:57:28 2016 -0500

    added 0.6.0 mongo migration helper, details follow

    The clear_evidence_facets_cache.rb mongo migration will clear out new
    evidence facet cache storage in case searches were built before
    migrating all documents in the "evidence" collection.

commit 7707a92
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Jan 14 14:16:24 2016 -0500

    fix /api/datasets/{id}/evidence for facet changes

    Now facets correctly in light of evidence facet changes and respects
    "max_values_per_facet".

commit 19eedef
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Jan 14 13:10:57 2016 -0500

    add scripts for Mongo data migrations in 0.6.0

    - Drops evidence_facets since it has been replaced by
      evidence_facet_cache plus individual "evidence_facet_cache_{UUID}"
      collections.
    - Updates each evidence document to have "facets" field contain JSON
      objects instead of JSON strings.

commit 21a7bc4
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Jan 14 13:08:32 2016 -0500

    bumped next version to 0.6.0

    Minor release looking to include:
    - New evidence facet storage in mongo.
    - Improve dataset import for large documents (occasional OOM).
    - Evidence streaming.
    - Evidence export to multiple formats.

commit bb2ac16
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jan 13 16:44:47 2016 -0500

    facet cache collection creation and removal

    This design builds individual facet_cache collections based on the
    filters applied to the evidence collection. Each filtered evidence
    collection will get its own "evidence_facet_cache_{UUID}" mongo
    collection. The facets values are grouped by category, name so it's
    trivial to cursor out the facets (still need to set the filter string
    though).

    This alleviates the max document size issue for large evidence
    collections. A max of 1000 facet values can be added to each category,
    name pair in order to stay within the size limit.

    Facet cache eviction isn't great here:

    - Individual evidence changes require removal of facet caches for the
      empty filter search as well as any overlapping filter/facet.
    - Creation or removal of a dataset will remove all facet caches. The
      thought is that for large dataset imports it is more effective to
      regenerate than cache vs. trying to synchronize it with new data.

    This includes a breaking change to evidence document schema. The
    evidence "facets" array stores the full category, name, value json
    objects instead of flat strings. This is done to make it possible to
    separate values into category, name groupings. We should include an
    upgrade note for this and possibly a script.

commit f5a08a3
Merge: f038be2 a515587
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jan 13 16:42:24 2016 -0500

    Merge branch 'master' into next

commit f038be2
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 11 22:58:47 2016 -0500

    batch evidence to an array, avoid JRuby enumerator

    The JRuby enumerator uses a thread per next object in an enumerator
    which proves costly. Hundreds of threads are created (tested with
    yourkit) when batch-creating evidence due to the "each_slice(500)" of
    the enumerator.

    This issue is logged in JRuby:
    jruby/jruby#2577

    The solution employed was to yield each evidence directly to the block
    and batch 500 into an array at a time. This should avoid the OOM
    exception received:

    java.lang.OutOfMemoryError: unable to create new native thread

    Indeed the thread count was observed to be lower in yourkit.
abargnesi pushed a commit to abargnesi/bel_parser that referenced this issue Jun 17, 2016
Analyzing stacktraces indicated many threads were being created with
calls to Enumerator.next on JRuby. These threads did not complete
and ultimately resulted in an out of memory error.

The solution employed is to process all lines yielded to the block but
expand, in a stateful manner, when a line continuator is encountered.

JRuby bug: jruby/jruby#2577
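
A stateful, block-driven way of handling line continuations might look like the sketch below. It is not the bel_parser implementation: `each_logical_line` is a hypothetical name, and it assumes the line continuator is a trailing backslash.

```ruby
require 'stringio'

# Hypothetical helper: yield logical lines, joining any physical line
# that ends with a backslash to the line that follows it. State is kept
# in a local variable, so no look-ahead via Enumerator#next is needed.
def each_logical_line(io)
  pending = nil
  io.each_line do |line|
    line = line.chomp
    line = pending + line if pending
    if line.end_with?('\\')
      # Continuation: drop the backslash and keep accumulating.
      pending = line[0...-1]
    else
      pending = nil
      yield line
    end
  end
  # Emit a trailing continuation that never received a following line.
  yield pending if pending
end

lines = []
each_logical_line(StringIO.new("a \\\nb\nc\n")) { |l| lines << l }
```
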
abargnesi pushed a commit to OpenBEL/bel_parser that referenced this issue Jun 17, 2016
Analyzing stacktraces indicated many threads were being created with
calls to Enumerator.next on JRuby. These threads did not complete
and ultimately resulted in an out of memory error.

The solution employed is to process all lines yielded to the block but
expand, in a stateful manner, when a line continuator is encountered.

JRuby bug: jruby/jruby#2577

remove line_continuator mixin; orphaned

closes OpenBEL/bel.rb#126
abargnesi pushed a commit to OpenBEL/openbel-api that referenced this issue Jul 5, 2016
Squashed commit of the following:

commit 804c313
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jun 8 05:35:52 2016 -0400

    bump versions; published 1.0.1

commit a01f56f
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jun 8 03:29:17 2016 -0400

    bumped bel to version 1.0.0

commit c28b74d
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Jun 7 15:35:11 2016 -0400

    set language version as configured in OpenBEL API

commit d023769
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jun 6 11:24:05 2016 -0400

    /api/version route; exposes API semantic version

commit 12af9ce
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jun 6 10:42:55 2016 -0400

    refactored /api/language routes into one class

commit 5005429
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jun 1 14:44:46 2016 -0400

    remove explicit statement parse for nanopub

    statement parsing is encapsulated within Nanopub state

commit 335a982
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jun 1 14:38:09 2016 -0400

    create Annotation model before unification

commit 24f3cdf
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue May 31 10:57:11 2016 -0400

    json-format filters; thanks @wshayes!

commit d15f0e3
Merge: b36876e be3bba1
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 27 20:19:06 2016 -0400

    Merge branch 'next' of github.com:OpenBEL/openbel-api into next

commit b36876e
Author: Nick <nick@>
Date:   Fri May 27 14:15:18 2016 -0400

    change nanopubs_store to nanopub_store

    The latter is what is used in code.

commit be3bba1
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 27 19:45:49 2016 -0400

    Fixed some typo's

commit 9e15b4f
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 27 19:37:09 2016 -0400

    Updating configuration and API documentation

commit 5452f09
Author: Nick <nick@>
Date:   Fri May 27 14:15:18 2016 -0400

    change nanopubs_store to nanopub_store

    The latter is what is used in code.

commit 13fe4d2
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri May 27 02:05:42 2016 -0400

    fix reference to BELParser default resources

    refs OpenBEL/bel_parser#44

commit 07ee8d5
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri May 27 02:00:58 2016 -0400

    functional validation API for expressions

    closes OpenBEL/bel_parser#44

commit 38dad57
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri May 27 01:57:27 2016 -0400

    added validation API doc within /api/expressions

commit e0aa6fb
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Thu May 26 21:12:04 2016 -0400

    Added /api back to all routes

commit 69e07c2
Merge: 1d90827 8dc3089
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Wed May 25 13:30:16 2016 -0400

    Merge branch 'next' of github.com:OpenBEL/openbel-api into next

commit 1d90827
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Wed May 25 13:30:10 2016 -0400

    Updated RAML file - schemas and examples are now embedded

commit 8dc3089
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed May 25 01:34:23 2016 -0400

    [wip] Result for expression validation.

commit 7ab1680
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed May 25 00:47:47 2016 -0400

    config the default URI reader to ref TDB directory

    The default URI reader is established as the TDB directory that the
    biological concepts come from.

    The default URL reader will be ResourceURLReader and will only be used
    when the URI cannot be determined for a resource.

commit f710ac5
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed May 25 00:20:52 2016 -0400

    pluralize the "nanopubs" route; /api/nanopubs/...

    renamed route file, route class name, paths, and references

commit f109a3c
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue May 24 09:50:43 2016 -0400

    datasetload; serialize statement from hash

    The bel_statement is serialized after hash conversion in order to be
    saved to Mongo.

commit d0a71c0
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 23 16:16:00 2016 -0400

    refactor generate_uuid as instance method in mixin

commit 2217192
Merge: 7e454a3 fcb8d52
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Mon May 23 11:29:49 2016 -0400

    Merge branch 'next' of github.com:OpenBEL/openbel-api into next

commit 7e454a3
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Mon May 23 11:29:41 2016 -0400

    Fixed nanopub renaming issue

commit fcb8d52
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 16 13:11:45 2016 -0400

    refactor expression components api for bel_parser

    use the BELParser::Expression::Model as parsed objects

    removed unused classes that leveraged libbel APIs; the libbel API
    will be removed from bel.rb when bel_parser is fully integrated.

commit 81a79db
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Sat May 14 15:07:06 2016 -0400

    TmpFix for BEL language version (text/plain) issue

    Would only return the text/plain version never the application/json version.  I changed it to only return the JSON formatted data and commented out the Accept header option code.

commit f2066aa
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 13 20:22:39 2016 -0400

    Missed a nanopub -> Nanopub edit

commit 07edd9d
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 13 10:40:00 2016 -0400

    Refactor naming and language paths

    Refactored naming:  evidence to nanopub, summary text to support
    Moved /api/{functions|relations|version} to /api/language/...

commit dda76e9
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed May 11 15:11:08 2016 -0400

    rename for Nanopub model; refs OpenBEL/bel.rb#121

commit a1dafde
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue May 10 15:54:56 2016 -0400

    set bel & bel plugins to version, ~> 1.0.0.beta

commit 9e60c51
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Tue May 3 11:22:43 2016 -0400

    Remove sinatra reloader - no longer needed

commit b0a6058
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 15:50:11 2016 -0400

    return first for annotation/namespace properties

commit 27ce1e4
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 15:35:09 2016 -0400

    guard when item does not respond to match_text

    annotation_value/namespace_value resources

commit 937b3f2
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 15:27:47 2016 -0400

    correct inScheme (in_scheme accessor) in namespace

commit 665f18a
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 15:20:12 2016 -0400

    fix fromSpecies accessor (from_species)

    refs OpenBEL/bel.rb#120

commit 9446578
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 14:14:29 2016 -0400

    bumped bel.rb dependency to version 1.0.0

    1.0.0 is the version of bel.rb on the next branch. This will be the next
    major release of bel.rb. OpenBEL API needs version 1.0.0 in order to get
    bel_parser and translator plugin changes.

    refs OpenBEL/bel_parser#43

commit e57b936
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 12:21:41 2016 -0400

    remove return_type from relationship resource

    included some cleanup in route

    closes #48

commit d790081
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri Apr 29 12:12:14 2016 -0400

    Partial update for /api/relationships

    Waiting on https://waffle.io/OpenBEL/bel_parser/cards/572386c9d39509b000f2b31b

commit 0da2c0e
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 11:25:36 2016 -0400

    fix vocab references due to rdf/rdf-vocab upgrade

commit b52355a
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 11:18:35 2016 -0400

    fix pref_label accessor in routes/resources

    closes #47

commit 01e3060
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 02:41:43 2016 -0400

    bumped bel-rdf-jena plugin version to 0.4.3.beta

    Transitively includes 0.4.0.beta version of rdf-jena.

commit e696785
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 02:10:15 2016 -0400

    pass configured BEL version to Completion API

    update RDF serialization gems to version 2.0.0

    remove dependency on 'rdf' gem; already a dependency for bel.rb

    closes OpenBEL/bel_parser#45

commit 041174e
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Apr 19 14:15:31 2016 -0400

    don't check cookie form if not using jwt=

commit 0e837bc
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Apr 19 10:42:48 2016 -0400

    spec test auth capabilities

commit ec6a143
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Apr 19 09:20:07 2016 -0400

    cleanup auth lint warnings

commit 863a3de
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Apr 19 09:19:45 2016 -0400

    fix token query string access in auth middleware

commit b2607e9
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 8 13:38:15 2016 -0400

    refactored /api/functions for BEL 1.0 / 2.0

    The functions route now uses the configured BEL specification to
    return functions. So far the short, long, description, and return type
    are provided.

    Updated functions resources to match object model.

    refs OpenBEL/bel_parser#33

commit 2dfe73f
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 8 13:35:51 2016 -0400

    added "bel.version" setting to configuration

    added bel_parser gem as runtime dependency in .gemspec

    validate bel.version is set in configuration and that it is a defined
    BEL specification (BELParser::Language.defines_version?)

commit c9c29f5
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 8 12:48:45 2016 -0400

    bumped version to 1.0.0; prepped CHANGELOG

    1.0.0 will be a major version bump to support a configurable BEL
    specification. This will bring support for BEL 2.0 into OpenBEL API.

commit 74517e2
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Mar 30 20:41:01 2016 -0400

    bumped version to 0.6.3; added changelog item

    refs #108

commit 31a27b9
Merge: 22eed27 29eb920
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Mar 30 20:33:06 2016 -0400

    Merge branch 'master' into next

commit 22eed27
Merge: 386c2ea 8d79b26
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Wed Mar 30 20:28:27 2016 -0400

    Merge pull request #108 from nbargnesi/param_auth

    look for tokens as parameters as well

commit 8d79b26
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Wed Mar 30 16:51:58 2016 -0400

    look for tokens as parameters as well

commit 386c2ea
Merge: b2abcdf ca2c733
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Mar 23 09:21:37 2016 -0400

    Merge branch 'master' into next

    fixed conflicts in CHANGELOG.md, UPGRADING.md, and VERSION by keeping
    master's changes.

commit b2abcdf
Merge: be2e6e1 85cd7a3
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Tue Mar 22 22:21:41 2016 -0400

    Merge pull request #106 from nbargnesi/issue105

    fixes #105

commit 85cd7a3
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Mar 22 18:17:44 2016 -0400

    fixes #105

commit be2e6e1
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 15:07:24 2016 -0400

    replace method for BEL.keys_to_symbols

    additional style alignment

commit fbf5368
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 09:25:06 2016 -0400

    return 404 when translating empty evidence results

    refs #44

commit ac61baf
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 08:32:37 2016 -0400

    added storage.engine note for UPGRADING to 0.6.0

commit 3f4f700
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 08:27:14 2016 -0400

    added UPGRADING guide

commit 29f86e8
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 08:05:01 2016 -0400

    added document for 0.6.0 mongodb migration

commit 0e22354
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 06:30:26 2016 -0400

    add configuration check for MongoDB 3.2

    Check will fail to start OpenBEL API if MongoDB is < 3.2

commit 45e5e39
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 06:17:57 2016 -0400

    added missing arg to render evidence collection

commit 1edb037
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 14 14:45:43 2016 -0400

    set mongo operation timeouts to unbounded

    The operation timeout is the number of seconds that can pass before
    subsequent reads from a mongo operation. This change makes this read
    timeout unbounded in order to satisfy long evidence and facet creation
    queries.

commit 39524ca
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 14 13:46:25 2016 -0400

    remove cache facets during dataset load

    Cached facets were removed at the end of a dataset load. Now they are
    additionally removed at the start of the load as well as every increment
    of 10k nanopubs loaded.
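
The clearing cadence described above (at the start, every 10k nanopubs, and at the end) can be sketched roughly as follows. This is only an illustration: `load_with_cache_clearing` and the `clear_cache` callable are hypothetical names, and the real loader stores each nanopub in MongoDB rather than just counting it.

```ruby
# Hypothetical sketch: clear cached facets at the start of a load,
# after every `interval` nanopubs, and once more at the end.
def load_with_cache_clearing(nanopubs, clear_cache, interval = 10_000)
  clear_cache.call                     # start of the load
  loaded = 0
  nanopubs.each do |_nanopub|
    # ... the real loader would store the nanopub here ...
    loaded += 1
    clear_cache.call if (loaded % interval).zero?
  end
  clear_cache.call                     # end of the load
  loaded
end

clears = 0
total = load_with_cache_clearing(1..25_000, -> { clears += 1 })
```

With 25,000 items the callable fires four times: once at the start, at 10k, at 20k, and once at the end.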

commit 68c2107
Merge: de9a500 61a291d
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 14 12:50:35 2016 -0400

    Merge branch 'next' into rewrite_references

commit 61a291d
Merge: 1b4dbb7 1bdf14e
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Mon Mar 14 12:20:40 2016 -0400

    Merge pull request #101 from nbargnesi/issue100

    Issue100

commit 1bdf14e
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Mon Mar 14 12:05:43 2016 -0400

    document auth.enabled, auth.secret

commit 0e900f6
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Feb 2 13:56:15 2016 -0500

    include only auth enabled/secret in default config

    for #100

commit fbb8b06
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Feb 2 13:55:54 2016 -0500

    simplify authenticate route to enabled/disabled

commit fe724ff
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Feb 2 13:54:30 2016 -0500

    remove rest-client dependency

commit de9a500
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Mar 10 14:29:16 2016 -0500

    set mongo connection pool size to 30

    This number was chosen in order to have at most 30 long-running queries
    simultaneously executing. This would then fail the 31st query unless a
    connection could be obtained with a timeout of 5 seconds.
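
As a rough illustration of the settings described above, the options could look like this with the mongo Ruby driver 2.x. The constant name, host, and database name are assumptions for the example; only the values 30 and 5 come from the commit message.

```ruby
# Assumed option names from the mongo Ruby driver 2.x:
#   :max_pool_size      - maximum connections held in the pool
#   :wait_queue_timeout - seconds to wait for a free connection
MONGO_POOL_OPTIONS = {
  max_pool_size: 30,
  wait_queue_timeout: 5
}.freeze

# With the mongo gem loaded, the options could be passed roughly as
# (host and database name are placeholders):
#   Mongo::Client.new(['127.0.0.1:27017'],
#                     MONGO_POOL_OPTIONS.merge(database: 'openbel'))
```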

commit 8d46fc1
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Mar 9 14:54:15 2016 -0500

    do not index value of experiment_context/metadata

    Annotation values can be large amounts of text that will not fit into an
    index key of 1024 bytes. If indexing is attempted you may see an error:

      WiredTigerIndex::insert: key too large to index...

commit 4426582
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 8 23:01:46 2016 -0500

    flatten translator arrays so we return one, if any

commit 4d42c35
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 8 20:38:41 2016 -0500

    bump puma to 3.1.0

commit 5081567
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 8 20:36:41 2016 -0500

    remove unnecessary local variables

commit 32c5e56
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Tue Mar 8 16:59:38 2016 -0500

    Update README.md

commit 53ea95f
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Tue Mar 8 16:51:59 2016 -0500

    Update README.md

commit 53653c0
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 7 23:06:27 2016 -0500

    correct references when serializing evidence

    using rewrite references work in bel.rb

commit 1b4dbb7
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 16:11:02 2016 -0500

    convert /api/evidence to BEL using translators

    factored out rendering of evidence_resource_collection to evidence
    helper

    refs #44

commit 3500811
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 15:20:01 2016 -0500

    factored out filters validation into helper

    functional decomposition of filter validation for better
    understanding and maintenance; now reporting multiple JSON errors when
    responding with 400.
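
A minimal sketch of the idea of collecting every filter problem before responding with 400, instead of stopping at the first one. The helper name and the assumed filter shape (JSON objects with `category`, `name`, and `value` keys) are illustrative and not taken from the actual helper.

```ruby
require 'json'

# Assumed filter shape; the real helper may check other things.
REQUIRED_KEYS = %w[category name value].freeze

# Returns an array of error messages; an empty array means all filters passed.
def validate_filters(raw_filters)
  errors = []
  raw_filters.each_with_index do |raw, index|
    begin
      filter = JSON.parse(raw)
    rescue JSON::ParserError
      errors << "filter #{index}: not valid JSON"
      next
    end
    unless filter.is_a?(Hash)
      errors << "filter #{index}: expected a JSON object"
      next
    end
    missing = REQUIRED_KEYS - filter.keys
    errors << "filter #{index}: missing #{missing.join(', ')}" unless missing.empty?
  end
  errors
end

errors = validate_filters(['{"category":"fts"}', 'oops'])
```

A route could then halt with status 400 and the whole `errors` array as the JSON body whenever it is non-empty.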

commit 83935aa
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 15:18:27 2016 -0500

    added doc for opening ::Sinatra::Helpers::Stream

    It is important to convey why methods were added to this class. The
    methods are a convenience so RDF.rb's writers can expect to call them.

commit c984f8a
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 15:08:44 2016 -0500

    bump version dependencies for bel-rdf-jena / rdf

    rdf bumped to 1.99.1

    bel-rdf-jena bumped to 0.4.2

commit e4eb5dd
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Feb 1 14:50:34 2016 -0500

    dataset serialization to all bel.rb translators

    updated dependencies to support all bel.rb translators

    refs #99

commit b1243d8
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Jan 26 15:57:16 2016 -0500

    aggregate on full-text search; avoids Mongo limits

    A full-text search filter to /api/evidence with a sort on bel_statement
    only used the text index. This means that the bel_statement sort had to
    be done in memory.

    This reaches the 32 MB sort limit with only several tens of thousands of
    documents.

    The solution employed here was to use cursored aggregation allowing disk
    use for sort stages.

    The solution was introduced as an alternative code path if a FTS filter
    was included in the HTTP request. Although this did minimize the risk of
    regression, there is a fair bit to clean up in the mongo
    access layer.

    closes #96
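
The aggregation approach described above might look roughly like this. `fts_pipeline` is a hypothetical helper, not the project's code; the `bel_statement` sort field comes from the commit message, and the driver call in the trailing comment assumes the mongo Ruby driver 2.x.

```ruby
# Build an aggregation pipeline for a full-text search sorted by a field.
# A $text match has to be the first stage of the pipeline.
def fts_pipeline(search_text, sort_field = 'bel_statement')
  [
    { '$match' => { '$text' => { '$search' => search_text } } },
    { '$sort'  => { sort_field => 1 } }
  ]
end

pipeline = fts_pipeline('apoptosis')

# With the mongo gem, the pipeline could be run with disk use allowed for
# the sort stage, roughly as:
#   collection.aggregate(pipeline, allow_disk_use: true).each { |doc| ... }
```

Allowing disk use is what lets the sort stage exceed the in-memory sort limit mentioned above.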

commit 5d44fd0
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 25 21:48:12 2016 -0500

    return annotation/namespace defs in BEL Script

    removed normalization of experiment_context annotation keywords. The
    normalized names were inconsistent with references.annotations
    definitions.

    integrate next version of bel.rb (0.4.3) to get fixes for
    annotation/namespace formats.

    refs #95

commit 92f7e7e
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 25 15:51:14 2016 -0500

    require MongoDB 3.2; closes #98

commit 0507714
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 25 14:57:28 2016 -0500

    added 0.6.0 mongo migration helper, details follow

    The clear_evidence_facets_cache.rb mongo migration will clear out new
    evidence facet cache storage in case searches were built before
    migrating all documents in the "evidence" collection.

commit 7707a92
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Jan 14 14:16:24 2016 -0500

    fix /api/datasets/{id}/evidence for facet changes

    Faceting now works correctly in light of evidence facet changes and
    respects "max_values_per_facet".

commit 19eedef
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Jan 14 13:10:57 2016 -0500

    add scripts for Mongo data migrations in 0.6.0

    - Drops evidence_facets since it has been replaced by
      evidence_facet_cache plus individual "evidence_facet_cache_{UUID}"
      collections.
    - Updates each evidence document to have "facets" field contain JSON
      objects instead of JSON strings.
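
The second migration step (JSON strings to JSON objects in "facets") can be sketched in plain Ruby as below. `migrate_facets` is a hypothetical helper operating on an in-memory hash, and the sample facet is made up; the actual migration scripts run against MongoDB.

```ruby
require 'json'

# Convert each facet stored as a JSON string into a parsed object.
# Facets that are already objects are left alone, so the step is safe to rerun.
def migrate_facets(document)
  facets = document.fetch('facets', []).map do |facet|
    facet.is_a?(String) ? JSON.parse(facet) : facet
  end
  document.merge('facets' => facets)
end

old_doc = {
  'facets' => ['{"category":"metadata","name":"dataset","value":"example"}']
}
new_doc = migrate_facets(old_doc)
```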

commit 21a7bc4
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Jan 14 13:08:32 2016 -0500

    bumped next version to 0.6.0

    Minor release looking to include:
    - New evidence facet storage in mongo.
    - Improve dataset import for large documents (occasional OOM).
    - Evidence streaming.
    - Evidence export to multiple formats.

commit bb2ac16
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jan 13 16:44:47 2016 -0500

    facet cache collection creation and removal

    This design builds individual facet_cache collections based on the
    filters applied to the evidence collection. Each filtered evidence
    collection will get its own "evidence_facet_cache_{UUID}" mongo
    collection. The facets values are grouped by category, name so it's
    trivial to cursor out the facets (still need to set the filter string
    though).

    This alleviates the max document size issue for large evidence
    collections. A max of 1000 facet values can be added to each category,
    name pair in order to stay within the size limit.

    Facet cache eviction isn't great here:

    - Individual evidence changes require removal of facet caches for the
      empty filter search as well as any overlapping filter/facet.
    - Creation or removal of a dataset will remove all facet caches. The
      thought is that for large dataset imports it is more effective to
      regenerate the cache vs. trying to synchronize it with new data.

    This includes a breaking change to evidence document schema. The
    evidence "facets" array stores the full category, name, value json
    objects instead of flat strings. This is done to make it possible to
    separate values into category, name groupings. We should include an
    upgrade note for this and possibly a script.
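
To make the grouping concrete, here is an in-memory sketch of splitting facet objects into (category, name) groups with the 1000-value cap mentioned above. `group_facets` and the sample data are hypothetical; the design described in this commit does the grouping in MongoDB collections, not in Ruby hashes.

```ruby
MAX_VALUES = 1000 # cap per (category, name) pair, from the commit message

# Group facet objects by [category, name], keeping at most max_values
# values per group.
def group_facets(facets, max_values = MAX_VALUES)
  groups = Hash.new { |hash, key| hash[key] = [] }
  facets.each do |facet|
    values = groups[[facet['category'], facet['name']]]
    values << facet['value'] if values.size < max_values
  end
  groups
end

facets = [
  { 'category' => 'experiment_context', 'name' => 'Species', 'value' => '9606' },
  { 'category' => 'experiment_context', 'name' => 'Species', 'value' => '10090' },
  { 'category' => 'metadata', 'name' => 'dataset', 'value' => 'example' }
]
grouped = group_facets(facets)
```

Storing facets as objects rather than flat strings is what makes this kind of grouping possible without parsing each value.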

commit f5a08a3
Merge: f038be2 a515587
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jan 13 16:42:24 2016 -0500

    Merge branch 'master' into next

commit f038be2
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 11 22:58:47 2016 -0500

    batch evidence to an array, avoid JRuby enumerator

    The JRuby enumerator uses a thread per next object in an enumerator
    which proves costly. Hundreds of threads are created (tested with
    yourkit) when batch-creating evidence due to the "each_slice(500)" of
    the enumerator.

    This issue is logged in JRuby:
    jruby/jruby#2577

    The solution employed was to yield each evidence directly to the block
    and batch 500 into an array at a time. This should avoid the OOM
    exception received:

    java.lang.OutOfMemoryError: unable to create new native thread

    Indeed the thread count was observed to be lower in yourkit.
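
A minimal sketch of the workaround this commit message describes: yield each item directly to a block and collect batches of 500 in a plain array, rather than slicing an enumerator. `each_batch` is a hypothetical name and not the project's actual method.

```ruby
# Collect items from any #each-able source into arrays of batch_size and
# yield each full array; no Enumerator is created along the way.
def each_batch(source, batch_size = 500)
  batch = []
  source.each do |item|
    batch << item
    if batch.size == batch_size
      yield batch
      batch = []
    end
  end
  yield batch unless batch.empty? # trailing partial batch
end

sizes = []
each_batch(1..1200) { |batch| sizes << batch.size }
# sizes is now [500, 500, 200]
```

Because iteration stays internal (a block passed to `each`), no `Enumerator#next` call is involved, which is the external-enumeration path the issue above shows running out of native threads.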
@headius headius mentioned this issue Mar 27, 2018
15 tasks