Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SF#364 type promotion in whistogram is based upon the index, not the weight #45

Closed
perldl-bot opened this issue Mar 2, 2015 · 4 comments

Comments

@perldl-bot
Copy link

perldl-bot commented Mar 2, 2015

From http://sourceforge.net/p/pdl/bugs/364 (from @djerius)

@zmughal zmughal added this to the PDL v2.009 milestone Mar 15, 2015
@zmughal zmughal modified the milestones: PDL v2.012, PDL v2.013 Jul 29, 2015
@zmughal zmughal modified the milestones: PDL v2.013, PDL v2.014 Aug 11, 2015
@mohawk2
Copy link
Member

mohawk2 commented Jan 1, 2022

The input was never an "index", it was just input data. The operation's "type-promotion" (really, determination) was never based on only the first parameter. The output's type is written "float+", which is what happened here (the output got promoted to float), which can be seen easily by using this updated code:

use PDL;

srand(372);
my $wt = random( PDL::double, 1000 );
my $lidx = zeroes( PDL::long, 1000 );
my $fidx = zeroes( PDL::float, 1000 );
my $didx = zeroes( PDL::double, 1000 );

my $lh = whistogram( $lidx, $wt, 1, 0, 1 );
my $fh = whistogram( $fidx, $wt, 1, 0, 1 );
my $dh = whistogram( $didx, $wt, 1, 0, 1 );

my $exp = $wt->dsum;
print "$_->[0]=", $_->[1]->info, "\n" for ['lh',$lh], ['fh',$fh], ['dh',$dh], ['exp',$exp];

print "delta, long index: ", $lh - $exp, "\n";
print "delta, float index: ", $fh - $exp, "\n";
print "delta, double index: ", $dh - $exp, "\n";

You get an identical difference between a "long" input type, and a "float" input type. The difference between those and double is simply the result of the difference between single and double precision. There is no bug here.

@mohawk2 mohawk2 closed this as completed Jan 1, 2022
@mohawk2
Copy link
Member

mohawk2 commented Apr 15, 2022

@djerius writes:

Sorry for the late reply, but I believe that there's still an issue here.

Here's the output of your script (With a slight fix on PDL 2.038 setting $exp = PDL($wt->dsum), as dsum doesn't return a piddle):

lh=PDL: Float D [1]
fh=PDL: Float D [1]
dh=PDL: Double D [1]
exp=PDL: Double D []
delta, long index: [0.00069109649]
delta, float index: [0.00069109649]
delta, double index: [0]

As you point out, the difference is due to the difference in precision between float and double, but that's the actual problem. In each case the weight parameter is double precision, but the determined type of the output histogram does not take that into account, leading to a catastrophic loss of precision.

You state "the operation's "type-promotion" (really, determination) was never based on only the first parameter", but that seems to be exactly what is happening. There are only two parameters of import, the input data and the weight, and it doesn't seem to pay attention to the latter.

@mohawk2
Copy link
Member

mohawk2 commented Apr 15, 2022

(With a slight fix on PDL 2.038 setting $exp = PDL($wt->dsum), as dsum doesn't return a piddle)

pdl> p sequence(2)->dsum->info
PDL: Double D []

Looks quite a lot like an ndarray (not piddle) to me?

@mohawk2
Copy link
Member

mohawk2 commented Apr 15, 2022

You state:

There are only two parameters of import, the input data and the weight, and it doesn't seem to pay attention to the latter.

The signature of whistogram, with two ndarray input parameters, one of which has a type-qualifier of float+:

Signature: (in(n); float+ wt(n);float+[o] hist(m); double step; double min; int msize => m)

Now, a quote from https://metacpan.org/pod/PDL::PP#Type-conversions-and-the-signature (you did look in the docs?):

As we had already seen for the int, float and double qualifiers, a pdl marked with a type+ qualifier does not influence the datatype of the pdl operation.

If you consider getting back a value that is less than double-precision a "catastrophic loss", then you'd better pass the in parameter as double-precision. This is how PDL has operated for really quite a long time (at least 1.99987, from 1998). I am open to the idea that there is indeed "an issue here", but so far I am not clever enough to see it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants