[WIP] Arithmetic with Overflow #6223

bcardiff · 2018-06-19T17:31:43Z

This PR adds overflow to integer arithmetics, adds new language constructs to allow user
to choose where overflowing (checked) or wrapping (unchecked) semantic is desired in a given lexical scope.

It also updates the STD where unchecked semantic is required. Or at least, where specs began to fail if run with --overflow-checked.

Updates in the compiler options

There are two compiler options --overflow-checked and --overflow-unchecked to change the
default overflow policy to be used. Currently the default is --overflow-unchecked to keep the current
semantic and allow code to be updated more easily. In future versions I think the default should be --overflow-checked in favor of safety.

After building a crystal compiler you build with --overflow-checked option and run specs as follows.

$ ./bin/crystal foo.cr --overflow-checked
$ ./bin/crystal build foo.cr --overflow-checked
$ ./bin/crystal spec --overflow-checked

Updates in the language

The operations affected by the overflow policy are only +,-,* between integers.

The new language constructs are:

checked { <expr> }
unchecked { <expr> }

The type and value of the whole [un]checked { <expr> } matches the type and value of <expr> or it raise an OverflowError.

When an OverflowError is raised the location advertised by the exception is/should be the location of the operator that generate the overflow.

Lexical Scope

The intention of using a lexical scope is that it would make as easy as possible to know how the operation is performed.

When compiled with --overflow-checked,

def foo(v)
  unchecked { 
    # the * will not raise on overflow
    bar(v) * 2 
  }
end

def bar(v)
  # the + will raise on overflow
  v + 2
end

Regarding blocks the idea is the same, when compiled with --overflow-unchecked,

def bar(v)
  # all operations here would be unchecked due to default policy
  yield v
end

def foo
  checked {
    bar(127_i8) do |v| 
      # Although the block is yielded from bar, the + is
      # performed as checked since it is written lexically
      # inside a checked section
      v + 1_i8
    end
  }
  1
rescue OverflowError
  0
end

foo # => 0

Implementation details

Given that +,-,* for Int are primitives they are always inlined and the codegen stage knows what to do along the way.
It is easy to change the semantic of the operation give some context, in this case the lexical scope information.

The current behaviour and under the unchecked policy the + is translated to add IR.

def inc(v)
  v + 1
end

define i32 @"*inc<Int32>:Int32"(i32 %v) #0 {
entry:
  %0 = add i32 %v, 1
  ret i32 %0
}

Under the checked policy the + is translated to llvm.sadd.with.overflow.* a LLVM with Overflow Intrinsics.

define i32 @"*inc<Int32>:Int32"(i32 %v) #0 {
entry:
  %0 = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %v, i32 1)
  %1 = extractvalue { i32, i1 } %0, 0
  %2 = extractvalue { i32, i1 } %0, 1
  br i1 %2, label %overflow, label %normal

overflow:                                         ; preds = %entry
  %3 = call %OverflowError* @"*OverflowError::new:OverflowError"()
  call void @"*raise<OverflowError>:NoReturn"(%OverflowError* %3)
  unreachable

normal:                                           ; preds = %entry
  ret i32 %1
}

Note: In both situations when the operands do not belong to the same type some expansion, truncation and additional checks for overflow are done. This is because the operation is carried on the "bigger" type, and the result is always translated to the type of the left operand. This is the current semantic.

Limitations of the scopes

While developing the feature I came across two operations that would be nice to overflow along with +,-,*.

Int#**(exponent : Int)
Int#-()

This operation are not primitives as the others, but they are operators. I think it would be nicer if their overflow semantic is changed together with the current overflow scope policy.

Maybe for now, Int#**(exponent : Int) got its counterpart Int#unchecked_pow(exponent : Int) since the former will match the compiler policy that will eventually be checked.

The second one was used in one single place
as -info.value.to_{{method}} and changed to unchecked { 0.to_{{method}} - info.value.to_{{method}} }.

These two cases are methods and they are not inlined, so the mechanism described in "Implementation details" does not longer apply.

If we want them to be affected by the overflow policy the possible paths to go are:

Make them @[AlwaysInline] and make the inline methods be affected by the policy.
Add another annotation to specify the method is affected by the overflow policy. If not inlined then the code gen should make the call to the checked or the unchecked version directly.

Inlining the ** might not be that nice since it's ~10 lines and used it in many places. Adding the annotation seems a bit to much for a first round.

Another usage of forwarding the policy to other methods I can imagine is for example in matrix operations: If * is to be applied between matrices or matrix-scalar the user might have fine-grained control beyond the global policy (--overflow-[un]checked). But I found that either the packages always wrap silently or use floats.

So, all in all, up to here the only real candidate was **, hence the unchecked_pow method.

Standard Library changes

As a migration technique a temporal macro __next_unchecked is added and should be replaced with unchecked on the release after merge.

These changes are marked as Random: Mark required unchecked code blocks or different std-lib sections. All of these change can be changed later specially Time and JSON. They are focused of keeping the current behavior mostly. In the case of Time, there was some manual overflow check.

All of these changes were pushed due to existing specs that failed otherwise.

Pending

Add [un]checked do; end notation also. I like that { } fits better to be used inside expressions.
Overflow on type casts (eg: 256.to_u8 should raise in checked context)
Fix/Test some cases of variables shadowing if they are first declared in a [un]checked scope
Fix/Test lexical scope and anonymous functions declaration
Formatter (Same rules as method, but with the OverflowCheckScope ast node)
Track highlighters to add the new keywords

Feedback is welcome

since it's done in codegen_primitive and others codegen_binary avoids setting it

More linear code for operators

for exact location information of the exception in the caller context

Make default unchecked to keep current semantics but schedule change on the default policy

Move OverflowCheck to OverflowCheckScope::Policy

intrinsics are added on demand

Changing overflow_check should invalidate cache

Removing any of these prevent grisu3_specs to pass

Int#** avoid k *= k in last iteration due to possible overflow

Otherwise generated enum values might overflow (even if they are not used due to manual assignment)

bcardiff · 2018-07-02T18:35:43Z

@wontruefree The only symbols that are left besides &... are ~+ ~- ~*, but at least one other language is using &..., we are not alone.

Unless your keyword layout has something I am not seeing, or we go to with heavy signs ➕➖✖➗(they are not easy to type!), the option to go seems to be &...

asterite · 2018-07-02T18:38:27Z

When I first thought about these operators, I thought about +?, -?, etc. It could even maybe be +^, -^, etc... the ^ meaning "overflow" (it goes up :-P). There are plenty of possibilities to choose from (and that symbol doesn't have to come before the operator, it can come after it).

asterite · 2018-07-02T18:39:31Z

(that said, I'm fine with &+, etc.)

bcardiff · 2018-07-02T18:45:44Z

^ is associated with exponents. I know we use ** for that, but still..
? suffix might conflict with some expr.op without argument and the ternary operator. I would prefer to leave some space in the operators to not require ... some space. 🥁

I'm fine with & prefix also.

BTW: I found it confusing that swift names overflowing operators to the one that after an overflow will wrap around &+. And we are calling overflowing when an exception might be raised +. #NamesAreHard

Sija · 2018-07-02T19:00:12Z

BTW: I found it confusing that swift names overflowing operators to the one that after an overflow will wrap around &+. And we are calling overflowing when an exception might be raised +. #NamesAreHard

I'd also assume that overflowing in this case means that's allowed to overflow.

bew · 2018-07-02T19:48:39Z

Following up on what @asterite said, it could be a char after the operator:
What about <operator>! for ~~overflow checking~~ disabled overflow checks ? (dangerous, as @straight-shoota said below)
It feels good because it's known that some methods ending with ! can raise (like not_nil!).

So like: +! *!

jkthorne · 2018-07-02T19:48:50Z

thank @bcardiff that makes a lot of sense.

straight-shoota · 2018-07-02T20:27:25Z

@bew But the standard operators +, -, * should raise on overflow. Disabling overflow checks is the exception and needs different operators. Still, ! could make sense because it marks these operators as dangerous.

But then, ! is also an unary operator, so maybe not a good choice here.

bew · 2018-07-02T20:56:00Z

Oh right thanks @straight-shoota, I've updated my message accordingly.

jhass · 2018-07-02T21:09:55Z

Yeah, let's avoid something where a wrong space is still valid syntax with a different meaning even if it then would fail the semantic phase, as in 1 +! 2 vs 1 + !2

Sija · 2018-07-02T21:13:37Z

Maybe @? It visually "wraps around" and has no prior meaning.

funny-falcon · 2018-07-02T21:16:58Z

If use suffix instead of prefix, then why not same & character?
+&, -&, *&, >>&, /&, %& .
Will it collide with some existing syntax?

It could be read as "add with and", "sub with and" etc, that is close to logical semantic:
a.to_u8 +& b.to_u8 == ((a + b) & 0xff).to_u8

d &+= a &+ b &* c
vs
d +&= a +& b *& c

Though, prefix looks a bit prettier.

jhass · 2018-07-02T21:18:56Z

a @+ b could work I guess. a +@ b is bad due to a + @b. (minor: +@ is unary + in Ruby)

ysbaddaden · 2018-07-02T21:20:23Z

I think &+ is quite clever. It reads as "wrapping add" or (a + b) & 0xFF, which is exactly what we do in languages without wrapping integers (e.g. Ruby).

ysbaddaden · 2018-07-02T21:21:23Z

I.e. no need to bother to be smart or clever. Swift paved the way. Let's follow.

megatux · 2018-07-03T21:59:50Z

Sorry to add more confusion. Is it possible to use "wrapping" characters? e.g. [+] |+| <+> _+_ . They are longer but we already have <=> and === . (Ignore me if this does not make any sense to you :D)

…

On Mon, Jul 2, 2018 at 6:21 PM Julien Portalier ***@***.***> wrote: I.e. no need to bother to be smart or clever. Swift paved the way. Let's follow. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#6223 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAz4ZGlgyTpoMOGFetsz5AGF15N1HL8lks5uCo7YgaJpZM4Ut7us> .

-- ----------------------------------------------------- .^. In an open world, who needs windows or gates? /V\ Cristian Molina <http://www.megatux.me/> // \\ GNU/Linux User #73047, Ubuntu User # 14733 /( _ )\ San Luis - Argentina ^^ ^^ ---------------------------------------------

sempervictus · 2018-07-04T08:19:38Z

@megatux: I think the road to malbolge is paved with something like that :)
The Swift way seems the most straightforward (likely to be understood by new developers) and safe approach, reducing the pebcak->equifax scenarios down the line.

jkthorne · 2018-09-04T20:41:03Z

I am curious what is the progress on this?

RX14 · 2018-09-10T10:09:11Z

@bcardiff ?

bcardiff · 2018-09-10T12:09:14Z

The amp ops will have the right semantics in 0.27.
The plain ops will change their semantics in 0.28.

I will extract those changes from old branch probably next week.

I’m glad there was interest in the numerical features in the last days!

jkthorne · 2018-10-10T18:42:52Z

I just saw the amp ops PR go threw!
The 0.27 release will not have the amp ops but they will not be implemented right?

RX14 · 2018-10-10T18:50:07Z

@wontruefree no, 0.26 had the operators but no implementation. 0.27 will have &+ behave the same as +, 0.28 will change + to raise on overflow.

bcardiff added 30 commits June 18, 2018 17:52

Add OverflowError exception

2ca9135

Rename args & variables due to new checked keyword

a89f3d4

Refactor: avoid setting ivar in codegen_binary_op

969dcba

since it's done in codegen_primitive and others codegen_binary avoids setting it

Refactor: extract codegen_binary_op_add, _sub and _mul methods

acd0f90

Refactor: move truncate code to each operator

e050734

More linear code for operators

Forward AST call operator location to codegen_binary_op

f64630e

for exact location information of the exception in the caller context

Keep IntegerType used as a result of codegen_binary_extend_int

723b260

Add --overflow-checked and --overflow-unchecked compiler options

ae5984e

Make default unchecked to keep current semantics but schedule change on the default policy

Add checked / unchecked keywords

4456370

Move OverflowCheck to OverflowCheckScope::Policy

Add typing rules for OverflowCheckScope

9ff91be

Add migration helper macro __next_unchecked

4399d5e

Hasher: Mark required unchecked code blocks

9c0d02d

Random: Mark required unchecked code blocks

5c64f54

Code gen: Use llvm overflow intrinsics for checked int addition

c45b4bb

intrinsics are added on demand

Code gen: Add overflow check before truncation

30f9e35

Ensure lexical scope of OverflowCheckScope

6744d68

Add overflow check policy to bc_flags

ed7ab29

Changing overflow_check should invalidate cache

Code gen: Use llvm overflow intrinsics for checked int subtraction

e6cb6cb

Code gen: Use llvm overflow intrinsics for checked int multiplication

8c54b65

Dwarf: Mark required unchecked code blocks

385bc00

FloatPrinter: Mark required unchecked code blocks

ef7baa8

Removing any of these prevent grisu3_specs to pass

String: Mark required unchecked code blocks

380ae97

Int#** avoid k *= k in last iteration due to possible overflow

Hash: Mark required unchecked code blocks

72f556c

Crypto: Mark required unchecked code blocks

7d50f83

MD5: Mark required unchecked code blocks

683042d

JSON: Mark required unchecked code blocks

1a3a1b2

SHA1: Mark required unchecked code blocks

03bf1b0

Time: Mark required unchecked code blocks

a0488b9

Compiler: Mark required unchecked code blocks

0ee4f1f

Otherwise generated enum values might overflow (even if they are not used due to manual assignment)

CI: Make specs run with --overflow-checked

5ff7481

bcardiff mentioned this pull request Jul 4, 2018

Add &+ &- &* &** operators parsing #6329

Merged

bcardiff mentioned this pull request Jul 31, 2018

Parse integer division operator #6470

Merged

RX14 mentioned this pull request Aug 21, 2018

BUG: multiplication error when switching order of variables #6584

Closed

bcardiff mentioned this pull request Sep 28, 2018

Add wrapping arithmetics operators #6890

Merged

RX14 mentioned this pull request Nov 5, 2018

Memory consumption compared to D and Go #7025

Closed

bcardiff mentioned this pull request Dec 20, 2018

Add overflow detection with preview opt-in #7206

Merged

bcardiff closed this in #7206 Jan 23, 2019

bcardiff mentioned this pull request Nov 7, 2019

Fix warning on using Channel.select #8454

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Arithmetic with Overflow #6223

[WIP] Arithmetic with Overflow #6223

bcardiff commented Jun 19, 2018 •

edited

bcardiff commented Jul 2, 2018

asterite commented Jul 2, 2018

asterite commented Jul 2, 2018

bcardiff commented Jul 2, 2018

Sija commented Jul 2, 2018

bew commented Jul 2, 2018 •

edited

jkthorne commented Jul 2, 2018

straight-shoota commented Jul 2, 2018

bew commented Jul 2, 2018

jhass commented Jul 2, 2018

Sija commented Jul 2, 2018

funny-falcon commented Jul 2, 2018

jhass commented Jul 2, 2018

ysbaddaden commented Jul 2, 2018

ysbaddaden commented Jul 2, 2018

megatux commented Jul 3, 2018 via email

sempervictus commented Jul 4, 2018

jkthorne commented Sep 4, 2018

RX14 commented Sep 10, 2018

bcardiff commented Sep 10, 2018

jkthorne commented Oct 10, 2018

RX14 commented Oct 10, 2018

[WIP] Arithmetic with Overflow #6223

[WIP] Arithmetic with Overflow #6223

Conversation

bcardiff commented Jun 19, 2018 • edited

Updates in the compiler options

Updates in the language

Lexical Scope

Implementation details

Limitations of the scopes

Standard Library changes

Pending

bcardiff commented Jul 2, 2018

asterite commented Jul 2, 2018

asterite commented Jul 2, 2018

bcardiff commented Jul 2, 2018

Sija commented Jul 2, 2018

bew commented Jul 2, 2018 • edited

jkthorne commented Jul 2, 2018

straight-shoota commented Jul 2, 2018

bew commented Jul 2, 2018

jhass commented Jul 2, 2018

Sija commented Jul 2, 2018

funny-falcon commented Jul 2, 2018

jhass commented Jul 2, 2018

ysbaddaden commented Jul 2, 2018

ysbaddaden commented Jul 2, 2018

megatux commented Jul 3, 2018 via email

sempervictus commented Jul 4, 2018

jkthorne commented Sep 4, 2018

RX14 commented Sep 10, 2018

bcardiff commented Sep 10, 2018

jkthorne commented Oct 10, 2018

RX14 commented Oct 10, 2018

bcardiff commented Jun 19, 2018 •

edited

bew commented Jul 2, 2018 •

edited