Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add f16 type #1159

Merged
merged 4 commits into from Jun 27, 2018
Merged

add f16 type #1159

merged 4 commits into from Jun 27, 2018

Conversation

bnoordhuis
Copy link
Contributor

Add support for half-precision floating point operations.

Introduce __extendhfsf2 and __truncsfhf2 in std/special/compiler_rt
and add __gnu_h2f_ieee and __gnu_f2h_ieee as aliases; they are used
in Windows builds.

The logic in std/special/compiler_rt/extendXfYf2.zig has been reworked
and can now operate on 16 bits floating point types.

closes #1122


// Various constants whose values follow from the type parameters.
// Any reasonable optimizer will fold and propagate all of these.
const srcBits: comptime_int = @sizeOf(src_t) * CHAR_BIT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can omit all these explicit comptime_int declarations. I think it is clear enough that all the values are comptime implicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Just a minor thing and then I think it's good to go.

return extendXfYf2(f32, f16, @bitCast(f16, a));
}

pub extern fn __gnu_h2f_ieee(a: u16) f32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be better as an actual symbol alias like this:

@export("__extenddftf2", @import("extendXfYf2.zig").__extenddftf2, linkage);
@export("__extendsftf2", @import("extendXfYf2.zig").__extendsftf2, linkage);

@@ -22,6 +22,11 @@ comptime {
@export("__floatuntidf", @import("floatuntidf.zig").__floatuntidf, linkage);
@export("__extenddftf2", @import("extendXfYf2.zig").__extenddftf2, linkage);
@export("__extendsftf2", @import("extendXfYf2.zig").__extendsftf2, linkage);
@export("__extendhfsf2", @import("extendXfYf2.zig").__extendhfsf2, linkage);
@export("__gnu_h2f_ieee", @import("extendXfYf2.zig").__gnu_h2f_ieee, linkage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check this out, you can make this an actual symbol alias:

@export("__gnu_h2f_ieee", @import("extendXfYf2.zig").__extendhfsf2, linkage);

This avoids an extra function call overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had that in a prior version but it was giving me duplicate symbol definition errors in a couple of tests. It was complaining __gnu_h2f_ieee conflicted with __extendhfsf2, presumably because they're one and the same address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's curious. I'd like to look at the LLVM IR if you got that error message. It's supposed to work for every target: http://llvm.org/docs/LangRef.html#aliases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity, this is the error it emits:

$ build/zig test std/special/compiler_rt/index.zig
lld: error: duplicate symbol: __gnu_h2f_ieee
>>> defined at extendXfYf2.zig:13 (/home/bnoordhuis/src/zig/std/special/compiler_rt/extendXfYf2.zig:13)
>>>            ./zig-cache/test.o:(__extendhfsf2)
>>> defined at extendXfYf2.zig:13
>>>            ./zig-cache/compiler_rt.o:(__extendhfsf2)

lld: error: duplicate symbol: __truncsfhf2
>>> defined at truncXfYf2.zig:3 (/home/bnoordhuis/src/zig/std/special/compiler_rt/truncXfYf2.zig:3)
>>>            ./zig-cache/test.o:(__gnu_f2h_ieee)
>>> defined at truncXfYf2.zig:3
>>>            ./zig-cache/compiler_rt.o:(__gnu_f2h_ieee)

That's with the functions removed from extendXfYf2.zig and truncXfYf2.zig.

Giving the symbols different linkage didn't help. Defining the gnu symbols only when !is_test does and otherwise seems to have no ill effects so that's what I'll do.

@andrewrk
Copy link
Member

I'll look into the windows failure.

@bnoordhuis
Copy link
Contributor Author

I've managed to reproduce it locally under wine. I'll poke at it more tomorrow if you don't get to it first.

@bnoordhuis
Copy link
Contributor Author

I figured out the Windows failures, kind of; it seems to be a stack alignment issue. This fixes it for me:

diff --git a/std/special/compiler_rt/extendXfYf2.zig b/std/special/compiler_rt/extendXfYf2.zig
index c2721be0..a419eefd 100644
--- a/std/special/compiler_rt/extendXfYf2.zig
+++ b/std/special/compiler_rt/extendXfYf2.zig
@@ -20,7 +20,7 @@ pub extern fn __gnu_h2f_ieee(a: u16) f32 {
 
 const CHAR_BIT = 8;
 
-pub fn extendXfYf2(comptime dst_t: type, comptime src_t: type, a: src_t) dst_t {
+inline fn extendXfYf2(comptime dst_t: type, comptime src_t: type, a: src_t) dst_t {
     const src_rep_t = @IntType(false, @typeInfo(src_t).Float.bits);
     const dst_rep_t = @IntType(false, @typeInfo(dst_t).Float.bits);
     const srcSigBits = std.math.floatMantissaBits(src_t);
diff --git a/std/special/compiler_rt/truncXfYf2.zig b/std/special/compiler_rt/truncXfYf2.zig
index 8491db9a..6e463aba 100644
--- a/std/special/compiler_rt/truncXfYf2.zig
+++ b/std/special/compiler_rt/truncXfYf2.zig
@@ -10,7 +10,7 @@ pub extern fn __truncsfhf2(a: f32) u16 {
 
 const CHAR_BIT = 8;
 
-pub fn truncXfYf2(comptime dst_t: type, comptime src_t: type, a: src_t) dst_t {
+inline fn truncXfYf2(comptime dst_t: type, comptime src_t: type, a: src_t) dst_t {
     const src_rep_t = @IntType(false, @typeInfo(src_t).Float.bits);
     const dst_rep_t = @IntType(false, @typeInfo(dst_t).Float.bits);
     const srcSigBits = std.math.floatMantissaBits(src_t);

Anyone have ideas on whether a better fix is possible?

@andrewrk
Copy link
Member

Here are 2 clues:

@setAlignStack(16);

const result: dst_rep_t align(@alignOf(dst_t)) = absResult | dst_rep_t(sign) << @intCast(DstShift, dstBits - srcBits);
return @bitCast(dst_t, result);

maybe try aligning result before bitcasting it in truncXfYf2, like the other function does:

+    const result: dst_rep_t = absResult | @truncate(dst_rep_t, sign >> @intCast(SrcShift, srcBits - dstBits));
+    return @bitCast(dst_t, result); 

@andrewrk
Copy link
Member

andrewrk commented Jun 26, 2018

If you can find out the assembly line it's crashing on and it looks something like

=> 0x00000000006a53b4 <+676>:	movaps -0xa8(%rbp),%xmm0

then I think that is #1148

@bnoordhuis
Copy link
Contributor Author

bnoordhuis commented Jun 26, 2018

Aw, I'm terribly sorry; I forgot to mention that I adopted that change (edit: align(@alignOf(...))) locally but that alone didn't fix the crash. I'll look into @setAlignStack tomorrow.

@andrewrk
Copy link
Member

andrewrk commented Jun 26, 2018

I think the Windows ABI expects the stack to be 16 bytes aligned always. If that fixes it we might consider implicitly doing @setAlignStack(16) on all exported functions by default when targeting Windows.

Come to think of it I wonder if this is related to our 32-bit windows test failures.

Fixes a bug where the result of a @floatCast wasn't actually checked; it
was checking the result from the previous @floatCast.
Add support for half-precision floating point operations.

Introduce `__extendhfsf2` and `__truncsfhf2` in std/special/compiler_rt.

Add `__gnu_h2f_ieee` and `__gnu_f2h_ieee` as aliases that are used in
Windows builds.

The logic in std/special/compiler_rt/extendXfYf2.zig has been reworked
and can now operate on 16 bits floating point types.

`extendXfYf2()` and `truncXfYf2()` are marked `inline` to work around
a not entirely understood stack alignment issue on Windows when calling
the f16 versions of the builtins.

closes #1122
Replace a conditional ceil/floor call with an unconditional trunc call.
@bnoordhuis
Copy link
Contributor Author

I've tried various incarnations of @setAlignStack() but to no effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add support for f16 floating point type
3 participants