Should have done these patches ages ago. I've long trumpeted that it is faster to read an SDfile as follows in CDK 1.5/2.0:
This is fast but unfortunately not fault tolerant. In real life SDfiles have junk in them, so we fall back to the slower IteratingSDFReader.
ChEBI 149 with MDLV2000Reader
ChEBI 149 with IteratingSDFReader
IteratingSDFReader actually contains some of the first patches I made to CDK. Looking at it five years wiser, I realised some simple changes could bring it to the same speed.
Step 1. Replace the synchronized StringBuffer with StringBuilder:
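As a rough illustration of the swap (not the actual CDK patch), the sketch below accumulates the lines of one record in a `StringBuilder`. The class name `RecordBuffer` and the method `nextRecord` are hypothetical names made up for this example. The buffer never leaves the method, so the per-call locking that `StringBuffer` does buys nothing here.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper for illustration only; not part of CDK.
public class RecordBuffer {

    // Collects lines up to (but not including) the next "$$$$" separator.
    // StringBuilder is unsynchronized, which is safe because the buffer
    // is local to this method and only ever touched by one thread.
    static String nextRecord(Iterator<String> lines) {
        StringBuilder sb = new StringBuilder();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.startsWith("$$$$")) {
                break; // end of this record
            }
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Iterator<String> lines =
            Arrays.asList("first", "record", "$$$$", "second").iterator();
        System.out.print(nextRecord(lines)); // lines of the first record
        System.out.print(nextRecord(lines)); // lines of the second record
    }
}
```

The two classes have the same `append`/`toString` API, so the change is a drop-in replacement wherever the buffer is not shared between threads.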
Step 2. Avoid a redundant memcpy (string.getBytes()), use string prefix matching instead of regex, and only check for V2000/V3000 on the line it must be on:
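A minimal sketch of the idea, assuming the version tag only needs to be looked for on the counts line (the fourth line of a molfile, index 3 counting from zero). The class `SdfLineChecks` and its methods are hypothetical names for this example and are not the CDK code; treating `"> "` as the data header prefix is likewise an assumption.

```java
// Hypothetical helper for illustration only; not part of CDK.
public class SdfLineChecks {

    // Zero-based index of the counts line within a molfile:
    // three header lines come first.
    static final int COUNTS_LINE_INDEX = 3;

    // Plain prefix tests: no Pattern compilation, no Matcher allocation.
    static boolean isRecordSeparator(String line) {
        return line.startsWith("$$$$");
    }

    static boolean isMolEnd(String line) {
        return line.startsWith("M  END");
    }

    static boolean isDataHeader(String line) {
        return line.startsWith("> ");
    }

    // Returns "V2000" or "V3000" when the tag is found on the counts line,
    // otherwise null. Every other line is rejected with one int comparison.
    static String version(String line, int lineIndexInRecord) {
        if (lineIndexInRecord != COUNTS_LINE_INDEX) {
            return null;
        }
        if (line.contains("V2000")) {
            return "V2000";
        }
        if (line.contains("V3000")) {
            return "V3000";
        }
        return null;
    }

    public static void main(String[] args) {
        String counts = "  2  1  0  0  0  0  0  0  0  0999 V2000";
        System.out.println(version(counts, 3));
        System.out.println(isRecordSeparator("$$$$"));
    }
}
```

None of these checks convert the line to bytes, so the `getBytes()` copy is not needed for them.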
I then tweaked the data header/block reading a little. Hopefully Greg Landrum's push towards an Open Molfile will help here. Technically you can have an SDF record separator not be counted as such when it's in a data value:
However, I cannot see a genuine use case for that, whilst I can see this happening by accident: