New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AtomPairs2DFingerprint #417
Conversation
I'll have to review this offline as the diff is too big for github. For starters can you fix the build (don't use LATEST in the parent version of cdk-fingerprint/pom.xml) also delete the OS X .DS_store files from finder. |
Okay
|
Oh and also license file needs to be LGPL. |
@egonw will be able to clarify if we can add include the comment about it being public domain as well, open sources licenses are complicated with this respect. |
Got it. I'll make the changes. Unfortunately, it does need to match PaDEL exactly -- all of my collab's historical QSAR models use it, but their client wants them to 1) stop using the original PaDEL software, 2) use CDK, and 3) still use that same fingerprint from PaDEL even though it's not in CDK...and then compare its performance to things that are in CDK. |
Okay I might add an option to improve the hashing then... running on the Briem-Lessel activity benchmark (https://link.springer.com/article/10.1023%2FA%3A1008793325522) it does quite poorly. Here are the results, for comparison Daylight Path (Fingeprint.java in CDK) get ~65%, ECFP4 (CircularFingerprint in CDK) gets about ~78%.
|
Yeah...pretty much what I would figure.
…On Mon, Feb 5, 2018 at 3:29 PM, John Mayfield ***@***.***> wrote:
Okay I might add an option to improve the hashing then... running on the
Briem-Lessel activity benchmark (https://link.springer.com/
article/10.1023%2FA%3A1008793325522) it does quite poorly. Here are the
results, for comparison Daylight Path (Fingeprint.java in CDK) get ~65%,
ECFP4 (CircularFingerprint in CDK) gets about ~78%.
$ duster benchmark -p atompair_fp.fps
[duster/benchmark] k=10, measure=Tanimoto,
[duster/benchmark] Loaded fingerprints in 0s9ms
[duster/benchmark] 5 classes: ACE (39) TXA2 (48) HMG (109) PAF (132) 5HT3 (47)
[duster/benchmark] Scored fingerprints in 0s45ms
ACE: 51%
Score(i): 0.63, 0.60, 0.53, 0.55, 0.50, 0.43, 0.48, 0.60, 0.43, 0.38
CumuScore(i): 0.63, 0.61, 0.58, 0.58, 0.56, 0.54, 0.53, 0.54, 0.52, 0.51
Dist(i) Min: 0.66, 0.65, 0.74, 0.62, 0.72, 0.76, 0.75, 0.60, 0.76, 0.67
Dist(i) Max: 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.97, 0.95, 0.95
Dist(i) Avg: 0.94, 0.92, 0.92, 0.90, 0.91, 0.94, 0.92, 0.87, 0.91, 0.90
TXA2: 50%
Score(i): 0.63, 0.53, 0.59, 0.61, 0.45, 0.35, 0.47, 0.57, 0.47, 0.37
CumuScore(i): 0.63, 0.58, 0.59, 0.59, 0.56, 0.53, 0.52, 0.53, 0.52, 0.50
Dist(i) Min: 0.66, 0.66, 0.68, 0.66, 0.56, 0.65, 0.54, 0.62, 0.53, 0.53
Dist(i) Max: 1.00, 1.00, 0.91, 0.90, 0.89, 0.89, 0.89, 0.88, 0.87, 0.86
Dist(i) Avg: 0.91, 0.86, 0.83, 0.83, 0.84, 0.81, 0.80, 0.79, 0.75, 0.76
HMG: 77%
Score(i): 0.84, 0.80, 0.76, 0.78, 0.71, 0.78, 0.81, 0.74, 0.79, 0.73
CumuScore(i): 0.84, 0.82, 0.80, 0.80, 0.78, 0.78, 0.78, 0.78, 0.78, 0.77
Dist(i) Min: 0.74, 0.74, 0.68, 0.65, 0.63, 0.63, 0.63, 0.66, 0.66, 0.44
Dist(i) Max: 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.96, 0.96, 0.96
Dist(i) Avg: 0.95, 0.93, 0.91, 0.90, 0.89, 0.88, 0.88, 0.87, 0.87, 0.86
PAF: 48%
Score(i): 0.68, 0.55, 0.50, 0.41, 0.46, 0.47, 0.45, 0.45, 0.41, 0.44
CumuScore(i): 0.68, 0.61, 0.57, 0.53, 0.52, 0.51, 0.50, 0.50, 0.49, 0.48
Dist(i) Min: 0.66, 0.66, 0.67, 0.66, 0.66, 0.61, 0.61, 0.58, 0.58, 0.58
Dist(i) Max: 1.00, 0.96, 0.96, 0.96, 0.94, 0.94, 0.94, 0.94, 0.93, 0.93
Dist(i) Avg: 0.91, 0.88, 0.87, 0.84, 0.84, 0.81, 0.80, 0.81, 0.79, 0.78
5HT3: 53%
Score(i): 0.65, 0.60, 0.58, 0.54, 0.54, 0.50, 0.56, 0.48, 0.38, 0.44
CumuScore(i): 0.65, 0.63, 0.61, 0.59, 0.58, 0.57, 0.57, 0.56, 0.54, 0.53
Dist(i) Min: 0.80, 0.75, 0.71, 0.73, 0.72, 0.70, 0.74, 0.71, 0.68, 0.67
Dist(i) Max: 1.00, 1.00, 1.00, 0.91, 0.91, 0.88, 0.87, 0.87, 0.84, 0.84
Dist(i) Avg: 0.94, 0.90, 0.89, 0.85, 0.84, 0.82, 0.82, 0.80, 0.78, 0.77
[duster/benchmark] null: 57.8% (55.9% avg-of-avg)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#417 (comment)>, or mute the
thread
<https://github.com/notifications/unsubscribe-auth/ABRfS1OI75L662uHN-Kg81xdmNfGUjL6ks5tR2SxgaJpZM4R529w>
.
|
RE: LGPL and Public Domain: My understanding is that due to the nature of public domain software, it can be included in anything (b/c it has 0 restrictions). So, we can include public domain code into a LGPL licensed product, so long as the public domain part is identified as such. What we can't do is go the other way and put LGPL into public domain and then treat it as public domain. This link discusses this with respect to GPL, but I wouldn't think the differences would be significant in this respect: https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#GPLUSGovAdd |
Almost there, could you squish down the commits (see http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html) then I could annotate specific comments via GitHUb. Also have attach a patch file optimising and fixing the comments below. Patch File: apfp.diff
|
68c5092
to
e2d1cf6
Compare
It's fine if the code is Public Domain (it must still have a clear header the authors and that it is PD). But any PD code can be used in any LGPL project. We have more public domain bits in the CDK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor things left, I think... and what John said, please remove the .DS_Store/ folders :)
I support the idea of the squashing of commits.
* Note: I adapted this fingerprint from Yap Chun Wei's PaDEL source code, which can be found here: | ||
* http://www.yapcwsoft.com/dd/padeldescriptor/ | ||
* | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a clear copyright/license statement. See many other source code files, but it must include:
- who wrote the code (so, that includes at least Lyle and Yap)
- the original code's license applies, but since that is public domain (PaDEL website: "The source code is released as public domain."), you actually in this rare occasion the right to set a license (but PD is fine)
return dist + "_" + a.getSymbol() + "_" + b.getSymbol(); | ||
} | ||
|
||
private static String encodeHalPath(int dist, IAtom a, IAtom b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally like to have some brief documentation on private methods too, to support maintenance of the code...
/* This work is the product of a US Government employee as part of his/her regular duties | ||
* and is thus in the public domain. | ||
* | ||
* Author: Lyle D. Burgoon, Ph.D. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a community standard to add your email address too. The reason for that is, if someone would even like to change the license (or resolve some copyright question), there is some guidance how to reach the author. There are legal obligations to do ones best to contact original authors, and this custom is to help out a bit.
I'm making changes now, and then I'll squash the commits again. Probably have something up later today/this evening. |
Sorry been busy.. if not already done could you fix the test failure as well. I introduced it when I change the counting fingerprint implementation to be more inline with what the API intends. Otherwise just the bits Egon's said then all good to go in: ``
|
Sounds good. Busy on this end, too, but will get to it hopefully tomorrow.
…On Feb 12, 2018 5:43 AM, "John Mayfield" ***@***.***> wrote:
Sorry been busy.. if not already done could you fix the test failure as
well. I introduced it when I change the counting fingerprint implementation
to be more inline with what the API intends. Otherwise just the bits Egon's
said then all good to go in:
``
[ERROR] testGetCountFingerprint(org.openscience.cdk.fingerprint.AtomPairs2DFingerprintTest)
Time elapsed: 0.012 s <<< FAILURE!
java.lang.AssertionError: expected:<780> but was:<9>
at org.openscience.cdk.fingerprint.AtomPairs2DFingerprintTest.
testGetCountFingerprint(AtomPairs2DFingerprintTest.java:78)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#417 (comment)>, or mute the
thread
<https://github.com/notifications/unsubscribe-auth/ABRfSw4hd6Iik43s6SRktkk9fhuT2g9eks5tUBW9gaJpZM4R529w>
.
|
Made the change in the test case. Made Egon's changes. |
Thanks, looks good. |
Adds new fingerprint from PaDEL -- AtomPairs2DFingerprint. My collaborators want to use CDK, but they have a need for this particular fingerprint to be present. In order to get this to build and test under maven I had to change the version in the pom.xml to LATEST. That also required downloading of other items -- which are here. I'm more than happy to drop some of these unnecessary files from the repo if you want.