
Add arbitrary-precision integer to std #1081

Merged
merged 1 commit into master from bigint on Jun 10, 2018

Conversation

@tiehuis (Member) commented Jun 8, 2018

This is required for the self-hosted implementation, so here it is. The main remaining item that would be useful (besides general performance improvements) is randomized testing against another library (GMP). All branches have been covered, however, and since we don't employ any complex algorithms this isn't a huge concern just yet.

A few notes on the implementation:

  • Any unsigned power-of-two integer type less than 64 bits in size is supported
    as a Limb type.
  • The algorithms used are kept simple for the moment. More complicated
    algorithms generally only pay off as integer sizes grow very large, and I
    don't expect our current usage to reach that scale just yet.
  • All branches (practically) have been covered by tests.

See https://github.com/tiehuis/zig-bn/tree/986a2b3243d0454b8430a6adf4ad48611850c1b8/bench
for rough performance comparison numbers.

Closes #364.
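
Here is a rough usage sketch. The Int.init/deinit names and signatures below are assumptions for illustration; set and mul are the operations discussed in this PR.

const std = @import("std");
const Int = std.math.big.Int;

test "big.Int multiply sketch" {
    var a = try Int.init(std.debug.global_allocator);
    defer a.deinit();
    var b = try Int.init(std.debug.global_allocator);
    defer b.deinit();
    var r = try Int.init(std.debug.global_allocator);
    defer r.deinit();

    try a.set(0xffffffffffffff);
    try b.set(1000);

    // rma = a * b; wrapInt also lets primitive integers be passed directly.
    try r.mul(&a, &b);
}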

@@ -464,6 +464,7 @@ set(ZIG_STD_FILES
"math/atan.zig"
"math/atan2.zig"
"math/atanh.zig"
"math/big/int.zig"
@tiehuis (Member Author) commented:

Just following a standard structure, same as Go.

// Converts primitive integer values into a stack-based big integer, or passes through existing
// Int types with no modifications. This can fail at runtime if using a very large dynamic
// integer but it is very unlikely and is considered a user error.
fn wrapInt(allocator: *Allocator, bn: var) *const Int {
@tiehuis (Member Author) commented:

The reason I have this be unconditional (not return an error) is that something like cmp should work without error. This does mean there is an upper limit on the size of comptime integers passed as arguments. That's fine in practice; a user can simply call Bn.set first in this case.

The wrapper_buffer_size is larger than it needs to be as well. Technically 512 gives us space for 512 * 8 = 4096-bit comptime arguments, but the maximum size is actually smaller, 512. I've probably just overlooked something, or the allocator is over-allocating (it shouldn't be).
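
For context, the wrapping described above boils down to splitting a primitive value into limbs held in a small fixed-size (stack) buffer, so no heap allocation is needed for primitive operands. A simplified sketch with illustrative names, not the actual implementation:

// Split a primitive value into 32-bit limbs stored in a caller-provided buffer,
// returning the number of limbs written (zero is represented by no limbs here).
fn toLimbs(value: u64, buffer: []u32) usize {
    var v = value;
    var i: usize = 0;
    while (v != 0) : (i += 1) {
        buffer[i] = @truncate(u32, v);
        v >>= 32;
    }
    return i;
}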

@andrewrk (Member) commented:

I think this makes sense. We should be able to get a compile error (not runtime error) when the value cannot fit, right?

@andrewrk (Member) commented Jun 8, 2018:

If it starts to get too complicated, we can always cast the wrapInt value to u64/i64 (compile error if it does not fit) and then document that one should use Bn.set for values outside this range.
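
A minimal illustration of that fallback (not part of this diff): coercing an out-of-range comptime integer to a fixed-width type is already a compile error, so misuse would be caught at compile time.

comptime {
    const ok: u64 = 1 << 63;     // fits in u64, accepted at compile time
    // const bad: u64 = 1 << 64; // compile error: value does not fit in a u64
    _ = ok;
}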

return !r.isOdd();
}

fn bitcount(self: *const Int) usize {
@tiehuis (Member Author) commented:

Loosely, but did we want to expose the LLVM ctpop intrinsic?

@andrewrk (Member) commented:

Yes, I believe that is #767

self.len += 1;

// TODO: shift == 64 at compile-time fails. Fails on u128 limbs.
w_value >>= Limb.bit_count / 2;
@tiehuis (Member Author) commented:

There are a few deficiencies in the stage1 compiler's big integer library. I'll make a pass through at some point and fix up the edge cases, as these issues typically result in silent comptime errors, which are the worst kind.

@andrewrk (Member) commented Jun 8, 2018

Great timing! I think I'll be needing this within the next couple of weeks for the self-hosted compiler.

// rma = a * b
//
// For greatest efficiency, ensure rma does not alias a or b.
pub fn mul(rma: *Int, av: var, bv: var) !void {
@tiehuis (Member Author) commented Jun 8, 2018:

I have actually implemented Karatsuba (and Toom-3) variants for this, but they were a bit unclean to implement well (on low-level limbs with temporary allocations), so I haven't pushed them anywhere just yet. They are also only useful for balanced operand sizes, so I don't expect the main use case right now would get any benefit. Unbalanced multiplication requires other algorithms which I haven't explored in detail.
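
For reference, the balanced-operand win comes from the Karatsuba identity: with each operand split in half at B (half the limbs),

    a = a1*B + a0,  b = b1*B + b0
    a*b = z2*B^2 + z1*B + z0, where
    z2 = a1*b1
    z0 = a0*b0
    z1 = (a1 + a0)*(b1 + b0) - z2 - z0

so three half-size multiplications replace four, giving roughly O(n^1.58) instead of O(n^2) once the operands are large enough to amortize the extra additions.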

@tiehuis (Member Author) commented Jun 8, 2018

One additional note: Did we want a rational big number type at this point for anything? This is pretty easy to implement atop this.

}
}

pub const Int = struct {
@andrewrk (Member) commented Jun 8, 2018:

I have some ideas that might be worth exploring, but maybe you already considered them and have some thoughts?

Also, I don't consider any of these merge blockers.

allocator in function params instead of Int struct

Downsides: it introduces a new kind of footgun where you could use one allocator for initializing and then a different one for addition or something, and that would be undefined behavior. This could be mitigated in several ways:

  • Int could include the allocator as a field, but only in debug mode, and assert that the passed allocator matches (see the sketch after this list).
  • More generally, allocators can include safety checks in debug mode that record the *Allocator address as metadata alongside allocated data, and assert that a free/realloc uses the same address.
  • Even more generally, the Allocator interface itself could do this on top of every allocator implementation.
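
A minimal sketch of the first mitigation (illustrative only, not the actual std code): keep the initializing allocator only in debug builds and assert that later calls pass the same one.

const std = @import("std");
const builtin = @import("builtin");
const Allocator = std.mem.Allocator;

const want_safety = builtin.mode == builtin.Mode.Debug;

const Int = struct {
    limbs: []usize,
    // Zero-sized in release builds, so there is no memory cost there.
    safety_allocator: if (want_safety) *Allocator else void,

    fn checkAllocator(self: *const Int, allocator: *Allocator) void {
        if (want_safety) {
            std.debug.assert(self.safety_allocator == allocator);
        }
    }
};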

Another downside: it is less convenient from an API perspective.

Benefits:

  • less memory usage
  • better cache performance?

union for limbs to handle the case where the int is <= @maxValue(Limb)

This is for higher performance and to avoid using memory allocation when possible. In general we would like to use big ints when the value could possibly be larger than one of the builtin int sizes, but even in these cases it is likely that a usize could store it after all.

try to use less memory

One example source file is std/special/compiler_rt/udivmodti4_test.zig. It's 65300 lines and each line has 4 bigints. At 40 bytes per big int, this comes out to 10MB. If big ints were 16 bytes instead, this comes out to 4MB, which is a pretty significant difference.

Related: #471

Here are some ideas to conserve memory:

comptime {
    // status quo
    // 40 bytes on x86_64
    // 20 bytes on i386
    @compileLog(@sizeOf(struct {
        allocator: *u8,
        positive: bool,
        limbs: []i32,
        len: usize,
    }));
    // no allocator field
    // 32 bytes on x86_64
    // 16 bytes on i386
    @compileLog(@sizeOf(struct {
        positive: bool,
        limbs: []i32,
        len: usize,
    }));
    // Use len to manage heap allocation
    // 24 bytes on x86_64
    // 12 bytes on i386
    @compileLog(@sizeOf(struct {
        limbs: packed union {
            Heap: [*]u8,
            Single: usize,
        },
        len: usize,
        positive: bool,
    }));
    // u32 for len
    // 16 bytes on x86_64
    // 12 bytes on i386
    @compileLog(@sizeOf(struct {
        limbs: packed union {
            Heap: [*]u8,
            Single: usize,
        },
        len: u32,
        positive: bool,
    }));
    // use the most significant bit of len for positive
    // 16 bytes on x86_64
    // 8 bytes on i386
    @compileLog(@sizeOf(packed struct {
        limbs: packed union {
            Heap: [*]u8,
            Single: usize,
        },
        positive: bool,
        len: @IntType(false, usize.bit_count - 1),
    }));
}

@tiehuis (Member Author) commented:

I suppose this all comes down to the difference in usage of arbitrary-precision integers. For mathematical/cryptographic purposes you generally don't have a large number of objects and they are typically very large, so you don't need to specialize small sizes. The compiler, on the other hand, uses it as its default integer type and allocates many of them at once.

Let me think about this a bit and I'll get back to you. I'm still inclined to prefer the allocator field stored with the type as-is, since it simplifies usage and avoids the possible use errors you suggested. I'll see how much the implementation changes when special-casing small limbs and/or changing the length representation. At the least, packing the sign into the length is a worthwhile change.

@andrewrk (Member) commented:

> Let me think about this a bit and I'll get back to you.

Sounds good.

Another argument in favor of how you did it here is that the compiler is the one which has weird memory requirements, and thus can be the one to do the tricks. For example, in the compiler the type can look something like this:

const Value = union(enum) {
    Pointer: *PointerInfo,
    Bool: bool,
    SmallSignedInt: i64,
    SmallUnsignedInt: u64,
    Int: *std.math.big.Int,
};

So here we've managed to keep the base Value type down to @sizeOf(usize), at the cost of a little complexity in initializing and dealing with some of the extra info that requires more bytes to store.

@tiehuis (Member Author) commented:

Alright, I think for the moment I'll use the type:

struct {
    allocator: *u8,
    limbs: []u8,
    metadata: usize,
}

where the top bit of metadata is the sign bit and the remaining bits are the length of the limbs. This is 32 bytes, so a small 8-byte saving. We can drop this down to 24 bytes in the future by using a u31/u32 for the capacity and length if we want a bit more space (limiting the max Int size to 16GB, however).
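
A minimal sketch of that packing (illustrative helper names and limb type, not the final API):

const std = @import("std");
const Allocator = std.mem.Allocator;
const Limb = u64;

const sign_bit: usize = 1 << (usize.bit_count - 1);

const Int = struct {
    allocator: *Allocator,
    limbs: []Limb,
    metadata: usize,

    // All bits of metadata except the top one hold the number of limbs in use.
    fn len(self: *const Int) usize {
        return self.metadata & ~sign_bit;
    }

    // The top bit of metadata is the sign; zero means positive.
    fn isPositive(self: *const Int) bool {
        return self.metadata & sign_bit == 0;
    }

    fn setMetadata(self: *Int, positive: bool, length: usize) void {
        self.metadata = length;
        if (!positive) {
            self.metadata |= sign_bit;
        }
    }
};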

Also note that general-purpose memory allocators may not allocate certain small bin sizes (jemalloc, for instance, only does 16, 32, 48, ...), so certain byte sizes may not be as useful. This isn't always the case though; ptmalloc2 (libc), for instance, looks like it does 8-byte strides, so it doesn't matter there.

@tiehuis (Member Author) commented:

Since this is an internal change/optimization and I don't expect the API to change, I'll merge now and get around to doing this later, when it is actually used.

@andrewrk (Member) commented Jun 8, 2018

> One additional note: Did we want a rational big number type at this point for anything? This is pretty easy to implement atop this.

I think it would make sense to use that for the comptime_float type. In stage1 we have a framework for this, but the implementation just uses all the f128 APIs. https://github.com/ziglang/zig/blob/master/src/bigfloat.cpp

@tiehuis merged commit dc8bda7 into master Jun 10, 2018
@tiehuis deleted the bigint branch June 10, 2018 06:25