Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Arithmetic with Overflow #6223

Closed
wants to merge 30 commits into from

Conversation

bcardiff
Copy link
Member

@bcardiff bcardiff commented Jun 19, 2018

This PR adds overflow to integer arithmetics, adds new language constructs to allow user
to choose where overflowing (checked) or wrapping (unchecked) semantic is desired in a given lexical scope.

It also updates the STD where unchecked semantic is required. Or at least, where specs began to fail if run with --overflow-checked.

Updates in the compiler options

There are two compiler options --overflow-checked and --overflow-unchecked to change the
default overflow policy to be used. Currently the default is --overflow-unchecked to keep the current
semantic and allow code to be updated more easily. In future versions I think the default should be --overflow-checked in favor of safety.

After building a crystal compiler you build with --overflow-checked option and run specs as follows.

$ ./bin/crystal foo.cr --overflow-checked
$ ./bin/crystal build foo.cr --overflow-checked
$ ./bin/crystal spec --overflow-checked

Updates in the language

The operations affected by the overflow policy are only +,-,* between integers.

The new language constructs are:

checked { <expr> }
unchecked { <expr> }

The type and value of the whole [un]checked { <expr> } matches the type and value of <expr> or it raise an OverflowError.

When an OverflowError is raised the location advertised by the exception is/should be the location of the operator that generate the overflow.

Lexical Scope

The intention of using a lexical scope is that it would make as easy as possible to know how the operation is performed.

When compiled with --overflow-checked,

def foo(v)
  unchecked { 
    # the * will not raise on overflow
    bar(v) * 2 
  }
end

def bar(v)
  # the + will raise on overflow
  v + 2
end

Regarding blocks the idea is the same, when compiled with --overflow-unchecked,

def bar(v)
  # all operations here would be unchecked due to default policy
  yield v
end

def foo
  checked {
    bar(127_i8) do |v| 
      # Although the block is yielded from bar, the + is
      # performed as checked since it is written lexically
      # inside a checked section
      v + 1_i8
    end
  }
  1
rescue OverflowError
  0
end

foo # => 0

Implementation details

Given that +,-,* for Int are primitives they are always inlined and the codegen stage knows what to do along the way.
It is easy to change the semantic of the operation give some context, in this case the lexical scope information.

The current behaviour and under the unchecked policy the + is translated to add IR.

def inc(v)
  v + 1
end
define i32 @"*inc<Int32>:Int32"(i32 %v) #0 {
entry:
  %0 = add i32 %v, 1
  ret i32 %0
}

Under the checked policy the + is translated to llvm.sadd.with.overflow.* a LLVM with Overflow Intrinsics.

define i32 @"*inc<Int32>:Int32"(i32 %v) #0 {
entry:
  %0 = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %v, i32 1)
  %1 = extractvalue { i32, i1 } %0, 0
  %2 = extractvalue { i32, i1 } %0, 1
  br i1 %2, label %overflow, label %normal

overflow:                                         ; preds = %entry
  %3 = call %OverflowError* @"*OverflowError::new:OverflowError"()
  call void @"*raise<OverflowError>:NoReturn"(%OverflowError* %3)
  unreachable

normal:                                           ; preds = %entry
  ret i32 %1
}

Note: In both situations when the operands do not belong to the same type some expansion, truncation and additional checks for overflow are done. This is because the operation is carried on the "bigger" type, and the result is always translated to the type of the left operand. This is the current semantic.

Limitations of the scopes

While developing the feature I came across two operations that would be nice to overflow along with +,-,*.

  1. Int#**(exponent : Int)
  2. Int#-()

This operation are not primitives as the others, but they are operators. I think it would be nicer if their overflow semantic is changed together with the current overflow scope policy.

Maybe for now, Int#**(exponent : Int) got its counterpart Int#unchecked_pow(exponent : Int) since the former will match the compiler policy that will eventually be checked.

The second one was used in one single place
as -info.value.to_{{method}} and changed to unchecked { 0.to_{{method}} - info.value.to_{{method}} }.

These two cases are methods and they are not inlined, so the mechanism described in "Implementation details" does not longer apply.

If we want them to be affected by the overflow policy the possible paths to go are:

  1. Make them @[AlwaysInline] and make the inline methods be affected by the policy.
  2. Add another annotation to specify the method is affected by the overflow policy. If not inlined then the code gen should make the call to the checked or the unchecked version directly.

Inlining the ** might not be that nice since it's ~10 lines and used it in many places. Adding the annotation seems a bit to much for a first round.

Another usage of forwarding the policy to other methods I can imagine is for example in matrix operations: If * is to be applied between matrices or matrix-scalar the user might have fine-grained control beyond the global policy (--overflow-[un]checked). But I found that either the packages always wrap silently or use floats.

So, all in all, up to here the only real candidate was **, hence the unchecked_pow method.

Standard Library changes

As a migration technique a temporal macro __next_unchecked is added and should be replaced with unchecked on the release after merge.

These changes are marked as Random: Mark required unchecked code blocks or different std-lib sections. All of these change can be changed later specially Time and JSON. They are focused of keeping the current behavior mostly. In the case of Time, there was some manual overflow check.

All of these changes were pushed due to existing specs that failed otherwise.

Pending

  • Add [un]checked do; end notation also. I like that { } fits better to be used inside expressions.
  • Overflow on type casts (eg: 256.to_u8 should raise in checked context)
  • Fix/Test some cases of variables shadowing if they are first declared in a [un]checked scope
  • Fix/Test lexical scope and anonymous functions declaration
  • Formatter (Same rules as method, but with the OverflowCheckScope ast node)
  • Track highlighters to add the new keywords

Feedback is welcome

since it's done in codegen_primitive and others codegen_binary avoids setting it
More linear code for operators
for exact location information of the exception in the caller context
Make default unchecked to keep current semantics but schedule change on the default policy
Move OverflowCheck to OverflowCheckScope::Policy
Changing overflow_check should invalidate cache
Removing any of these prevent grisu3_specs to pass
Int#** avoid k *= k in last iteration due to possible overflow
Otherwise generated enum values might overflow (even if they are not used due to manual assignment)
@bcardiff
Copy link
Member Author

bcardiff commented Jul 2, 2018

@wontruefree The only symbols that are left besides &... are ~+ ~- ~*, but at least one other language is using &..., we are not alone.

Unless your keyword layout has something I am not seeing, or we go to with heavy signs ➕➖✖➗(they are not easy to type!), the option to go seems to be &...

@asterite
Copy link
Member

asterite commented Jul 2, 2018

When I first thought about these operators, I thought about +?, -?, etc. It could even maybe be +^, -^, etc... the ^ meaning "overflow" (it goes up :-P). There are plenty of possibilities to choose from (and that symbol doesn't have to come before the operator, it can come after it).

@asterite
Copy link
Member

asterite commented Jul 2, 2018

(that said, I'm fine with &+, etc.)

@bcardiff
Copy link
Member Author

bcardiff commented Jul 2, 2018

^ is associated with exponents. I know we use ** for that, but still..
? suffix might conflict with some expr.op without argument and the ternary operator. I would prefer to leave some space in the operators to not require ... some space. 🥁

I'm fine with & prefix also.

BTW: I found it confusing that swift names overflowing operators to the one that after an overflow will wrap around &+. And we are calling overflowing when an exception might be raised +. #NamesAreHard

@Sija
Copy link
Contributor

Sija commented Jul 2, 2018

BTW: I found it confusing that swift names overflowing operators to the one that after an overflow will wrap around &+. And we are calling overflowing when an exception might be raised +. #NamesAreHard

I'd also assume that overflowing in this case means that's allowed to overflow.

@bew
Copy link
Contributor

bew commented Jul 2, 2018

Following up on what @asterite said, it could be a char after the operator:
What about <operator>! for overflow checking disabled overflow checks ? (dangerous, as @straight-shoota said below)
It feels good because it's known that some methods ending with ! can raise (like not_nil!).

So like: +! *!

@jkthorne
Copy link
Contributor

jkthorne commented Jul 2, 2018

thank @bcardiff that makes a lot of sense.

@straight-shoota
Copy link
Member

@bew But the standard operators +, -, * should raise on overflow. Disabling overflow checks is the exception and needs different operators. Still, ! could make sense because it marks these operators as dangerous.

But then, ! is also an unary operator, so maybe not a good choice here.

@bew
Copy link
Contributor

bew commented Jul 2, 2018

Oh right thanks @straight-shoota, I've updated my message accordingly.

@jhass
Copy link
Member

jhass commented Jul 2, 2018

Yeah, let's avoid something where a wrong space is still valid syntax with a different meaning even if it then would fail the semantic phase, as in 1 +! 2 vs 1 + !2

@Sija
Copy link
Contributor

Sija commented Jul 2, 2018

Maybe @? It visually "wraps around" and has no prior meaning.

@funny-falcon
Copy link
Contributor

If use suffix instead of prefix, then why not same & character?
+&, -&, *&, >>&, /&, %& .
Will it collide with some existing syntax?

It could be read as "add with and", "sub with and" etc, that is close to logical semantic:
a.to_u8 +& b.to_u8 == ((a + b) & 0xff).to_u8

d &+= a &+ b &* c
vs
d +&= a +& b *& c

Though, prefix looks a bit prettier.

@jhass
Copy link
Member

jhass commented Jul 2, 2018

a @+ b could work I guess. a +@ b is bad due to a + @b. (minor: +@ is unary + in Ruby)

@ysbaddaden
Copy link
Contributor

I think &+ is quite clever. It reads as "wrapping add" or (a + b) & 0xFF, which is exactly what we do in languages without wrapping integers (e.g. Ruby).

@ysbaddaden
Copy link
Contributor

I.e. no need to bother to be smart or clever. Swift paved the way. Let's follow.

@megatux
Copy link
Contributor

megatux commented Jul 3, 2018 via email

@sempervictus
Copy link

@megatux: I think the road to malbolge is paved with something like that :)
The Swift way seems the most straightforward (likely to be understood by new developers) and safe approach, reducing the pebcak->equifax scenarios down the line.

@jkthorne
Copy link
Contributor

jkthorne commented Sep 4, 2018

I am curious what is the progress on this?

@RX14
Copy link
Contributor

RX14 commented Sep 10, 2018

@bcardiff ?

@bcardiff
Copy link
Member Author

The amp ops will have the right semantics in 0.27.
The plain ops will change their semantics in 0.28.

I will extract those changes from old branch probably next week.

I’m glad there was interest in the numerical features in the last days!

@jkthorne
Copy link
Contributor

I just saw the amp ops PR go threw!
The 0.27 release will not have the amp ops but they will not be implemented right?

@RX14
Copy link
Contributor

RX14 commented Oct 10, 2018

@wontruefree no, 0.26 had the operators but no implementation. 0.27 will have &+ behave the same as +, 0.28 will change + to raise on overflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet