Use slice instead of StaticArray for String::CHAR_TO_DIGIT #5616

S-YOU · 2018-01-20T18:38:38Z

By using Pointer and Slice instead of StaticArray,
less LLVM IR generation for String::CHAR_TO_DIGIT (1054 hits to 76)

ysbaddaden · 2018-01-20T22:52:50Z

But requires a HEAP allocation.

S-YOU · 2018-01-21T02:34:01Z

Is there any performance benefit (or something) over having this in stack?

asterite · 2018-01-21T04:11:02Z

What does hits mean here?

S-YOU · 2018-01-21T05:49:21Z

I meant "hits" as number of occurrences of CHAR_TO_DIGIT including CHAR_TO_DIGIT62 in outputted .ll file. But I now realized, that number depends on I use it or not, and how many times I used.

S-YOU · 2018-01-21T05:51:53Z

This is minimal code I used to generate .ll file in release mode

puts String::CHAR_TO_DIGIT

I see like following in .ll file

%.unpack90 = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 1), align 1
  %.unpack91 = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 2), align 2
  %.unpack92 = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 3), align 1
  %.unpack93 = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 4), align 4
  %.unpack94 = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 5), align 1
  %.unpack95 = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 6), align 2
  %.unpack96 = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 7), align 1
  %.unpack97 = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 8), align 8
  %.unpack98 = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 9), align 1
  %
... x 256 times

%obj1.repack1.i.i.i = getelementptr inbounds [256 x i8], [256 x i8]* %obj1.i.i.i, i64 0, i64 1
  store i8 %.unpack90, i8* %obj1.repack1.i.i.i, align 1
  %obj1.repack3.i.i.i = getelementptr inbounds [256 x i8], [256 x i8]* %obj1.i.i.i, i64 0, i64 2
  store i8 %.unpack91, i8* %obj1.repack3.i.i.i, align 1
  %obj1.repack5.i.i.i = getelementptr inbounds [256 x i8], [256 x i8]* %obj1.i.i.i, i64 0, i64 3
  store i8 %.unpack92, i8* %obj1.repack5.i.i.i, align 1
  %obj1.repack7.i.i.i = getelementptr inbounds [256 x i8], [256 x i8]* %obj1.i.i.i, i64 0, i64 4
  store i8 %.unpack93, i8* %obj1.repack7.i.i.i, align 1
  %obj1.repack9.i.i.i = getelementptr inbounds [256 x i8], [256 x i8]* %obj1.i.i.i, i64 0, i64 5
  store i8 %.unpack94, i8* %obj1.repack9.i.i.i, align 1
  %obj1.repack11.i.i.i = getelementptr inbounds [256 x i8], [256 x i8]* %obj1.i.i.i, i64 0, i64 6
... x 256 times

 %table.repack7 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 1
  %table.repack9 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 2
  %table.repack11 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 3
  %table.repack13 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 4
  %table.repack15 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 5
  %table.repack17 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 6
  %table.repack19 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 7
  %table.repack21 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 8
  %table.repack23 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 9
  %table.repack25 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 10
  %table.repack27 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 11
  %table.repack29 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 12
  %table.repack31 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 13
  %table.repack33 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 14
  %table.repack35 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 15
  %table.repack37 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 16
... x 256 times

store i8 %.unpack519, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 1), align 1
  store i8 %.unpack521, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 2), align 2
  store i8 %.unpack523, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 3), align 1
  store i8 %.unpack525, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 4), align 4
  store i8 %.unpack527, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 5), align 1
  store i8 %.unpack529, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 6), align 2
  store i8 %.unpack531, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 7), align 1
  store i8 %.unpack533, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 8), align 8
  store i8 %.unpack535, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 9), align 1
  store i8 %.unpack537, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 10), align 2
  store i8 %.unpack539, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 11), align 1
  store i8 %.unpack541, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 12), align 4
  store i8 %.unpack543, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 13), align 1
  store i8 %.unpack545, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 14), align 2
  store i8 %.unpack547, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 15), align 1
  store i8 %.unpack549, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 16), align 16
  store i8 %.unpack551, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 17), align 1
  store i8 %.unpack553, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 18), align 2
  x 256 times

%array.i.sroa.0.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 0), align 16
  %array.i.sroa.4.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 1), align 1
  %array.i.sroa.5.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 2), align 2
  %array.i.sroa.6.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 3), align 1
  %array.i.sroa.7.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 4), align 4
  %array.i.sroa.8.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 5), align 1
  %array.i.sroa.9.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 6), align 2
  %array.i.sroa.10.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 7), align 1
  %array.i.sroa.11.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 8), align 8
  %array.i.sroa.12.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 9), align 1
  %array.i.sroa.13.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 10), align 2
  %array.i.sroa.14.0.copyload = load i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @"String::CHAR_TO_DIGIT", i64 0, i64 11), align 1
  %arra
... x 256 times

%table.repack2 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 1
  store i8 %array.i.sroa.4.0.copyload, i8* %table.repack2, align 1
  %table.repack4 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 2
  store i8 %array.i.sroa.5.0.copyload, i8* %table.repack4, align 1
  %table.repack6 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 3
  store i8 %array.i.sroa.6.0.copyload, i8* %table.repack6, align 1
  %table.repack8 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 4
  store i8 %array.i.sroa.7.0.copyload, i8* %table.repack8, align 1
  %table.repack10 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 5
  store i8 %array.i.sroa.8.0.copyload, i8* %table.repack10, align 1
  %table.repack12 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 6
  store i8 %array.i.sroa.9.0.copyload, i8* %table.repack12, align 1
  %table.repack14 = getelementptr inbounds [256 x i8], [256 x i8]* %table, i64 0, i64 7
  store i8 %array.i.sroa.10.0.copyload, i8* %table.repack14, align 1
  ... x 256 times

%.unpack514 = load i8, i8* %table.repack2, align 1
  %.unpack516 = load i8, i8* %table.repack4, align 1
  %.unpack518 = load i8, i8* %table.repack6, align 1
  %.unpack520 = load i8, i8* %table.repack8, align 1
  %.unpack522 = load i8, i8* %table.repack10, align 1
  %.unpack524 = load i8, i8* %table.repack12, align 1
  %.unpack526 = load i8, i8* %table.repack14, align 1
  %.unpack528 = load i8, i8* %table.repack16, align 1
  %.unpack530 = load i8, i8* %table.repack18, align 1
  %.unpack532 = load i8, i8* %table.repack20, align 1
  %.unpack534 = load i8, i8* %table.repack22, align 1
  %.unpack536 = load i8, i8* %table.repack24, align 1
  %.unpack538 = load i8, i8* %table.repack26, align 1
  ... x 256 times

and some more

Comparing by line numbers in .ll file

with stack: 41438
with heap: 40203 (1235 less)

By filesize

2725276   char_to_digit_stack.ll
2544674   char_to_digit_heap.ll (180k less)

lbguilherme · 2018-01-21T09:41:35Z

The way LLVM generates code for static array is quite poor. But as I understand, the code is generated in a way that is easy to be optimized in the later phase. Perhaps you should compare the optimized version of the ll, or even the final optimized binary. Using heap here is clearly more runtime work and if it is better then staticarray is wrong all the way down.

ysbaddaden · 2018-01-21T09:54:42Z

The difference is that StaticArray (and Slices) is bound checked, but the changed code uses direct pointer accesses.

Keep the static array and skip the malloc but operate on #to_unsafe to directly access pointers, skipping bound checks and I'm pretty sure LLVM will optimize in the same way.

But you get unsafe code, and must be perfectly sure to never have out of bounds value.

lbguilherme · 2018-01-21T10:03:04Z

@ysbaddaden the previous code already had to_unsafe

S-YOU · 2018-01-21T11:14:55Z

How to get optimized version of ll?

S-YOU · 2018-01-21T11:51:36Z

I tried to compare in assembly dump, but I am not sure those really refer to CHAR_TO_DIGITS array, because every lines differ in comparison.

But I do see similar in objdump output too.

8 84 24 00 01 00 00 	movb	%al, 256(%rsp)
44 88 94 24 01 01 00 00 	movb	%r10b, 257(%rsp)
8a 84 24 f8 00 00 00 	movb	248(%rsp), %al
88 84 24 02 01 00 00 	movb	%al, 258(%rsp)
8a 84 24 f9 00 00 00 	movb	249(%rsp), %al
88 84 24 03 01 00 00 	movb	%al, 259(%rsp)
44 88 9c 24 04 01 00 00 	movb	%r11b, 260(%rsp)
44 88 8c 24 05 01 00 00 	movb	%r9b, 261(%rsp)
40 88 bc 24 06 01 00 00 	movb	%dil, 262(%rsp)
8a 84 24 fa 00 00 00 	movb	250(%rsp), %al
88 84 24 07 01 00 00 	movb	%al, 263(%rsp)
8a 84 24 fc 00 00 00 	movb	252(%rsp), %al
88 84 24 08 01 00 00 	movb	%al, 264(%rsp)
8a 84 24 fb 00 00 00 	movb	251(%rsp), %al
88 84 24 09 01 00 00 	movb	%al, 265(%rsp)
8a 84 24 ff 00 00 00 	movb	255(%rsp), %al
88 84 24 0a 01 00 00 	movb	%al, 266(%rsp)
40 88 b4 24 0b 01 00 00 	movb	%sil, 267(%rsp)
44 88 84 24 0c 01 00 00 	movb	%r8b, 268(%rsp)
8a 84 24 fe 00 00 00 	movb	254(%rsp), %al
88 84 24 0d 01 00 00 	movb	%al, 269(%rsp)
8a 84 24 fd 00 00 00 	movb	253(%rsp), %al
88 84 24 0e 01 00 00 	movb	%al, 270(%rsp)
44 88 a4 24 0f 01 00 00 	movb	%r12b, 271(%rsp)
44 88 bc 24 10 01 00 00 	movb	%r15b, 272(%rsp)
40 88 ac 24 11 01 00 00 	movb	%bpl, 273(%rsp)
8a 84 24 d4 00 00 00 	movb	212(%rsp), %al
88 84 24 12 01 00 00 	movb	%al, 274(%rsp)
8a 84 24 cf 00 00 00 	movb	207(%rsp), %al
88 84 24 13 01 00 00 	movb	%al, 275(%rsp)
8a 84 24 c5 00 00 00 	movb	197(%rsp), %al
88 84 24 14 01 00 00 	movb	%al, 276(%rsp)
8a 84 24 c4 00 00 00 	movb	196(%rsp), %al
.. many more lines

a 44 24 7f 	movb	127(%rsp), %al
88 84 24 5c 01 00 00 	movb	%al, 348(%rsp)
8a 44 24 7e 	movb	126(%rsp), %al
88 84 24 5d 01 00 00 	movb	%al, 349(%rsp)
8a 44 24 7d 	movb	125(%rsp), %al
88 84 24 5e 01 00 00 	movb	%al, 350(%rsp)
8a 44 24 7c 	movb	124(%rsp), %al
88 84 24 5f 01 00 00 	movb	%al, 351(%rsp)
8a 44 24 7b 	movb	123(%rsp), %al
88 84 24 60 01 00 00 	movb	%al, 352(%rsp)
8a 44 24 7a 	movb	122(%rsp), %al
88 84 24 61 01 00 00 	movb	%al, 353(%rsp)
8a 44 24 79 	movb	121(%rsp), %al
88 84 24 62 01 00 00 	movb	%al, 354(%rsp)
8a 44 24 78 	movb	120(%rsp), %al
88 84 24 63 01 00 00 	movb	%al, 355(%rsp)
8a 44 24 77 	movb	119(%rsp), %al
88 84 24 64 01 00 00 	movb	%al, 356(%rsp)
8a 44 24 76 	movb	118(%rsp), %al
88 84 24 65 01 00 00 	movb	%al, 357(%rsp)
8a 44 24 75 	movb	117(%rsp), %al
88 84 24 66 01 00 00 	movb	%al, 358(%rsp)
8a 44 24 74 	movb	116(%rsp), %al
... many more lines

84 24 cd 01 00 00 	movb	%al, 461(%rsp)
8a 84 24 c6 00 00 00 	movb	198(%rsp), %al
88 84 24 ce 01 00 00 	movb	%al, 462(%rsp)
8a 84 24 c7 00 00 00 	movb	199(%rsp), %al
88 84 24 cf 01 00 00 	movb	%al, 463(%rsp)
8a 84 24 c8 00 00 00 	movb	200(%rsp), %al
88 84 24 d0 01 00 00 	movb	%al, 464(%rsp)
8a 84 24 ca 00 00 00 	movb	202(%rsp), %al
88 84 24 d1 01 00 00 	movb	%al, 465(%rsp)
8a 84 24 cb 00 00 00 	movb	203(%rsp), %al
88 84 24 d2 01 00 00 	movb	%al, 466(%rsp)
8a 84 24 cc 00 00 00 	movb	204(%rsp), %al
88 84 24 d3 01 00 00 	movb	%al, 467(%rsp)
8a 84 24 cd 00 00 00 	movb	205(%rsp), %al
88 84 24 d4 01 00 00 	movb	%al, 468(%rsp)
8a 84 24 d0 00 00 00 	movb	208(%rsp), %al
88 84 24 d5 01 00 00 	movb	%al, 469(%rsp)
8a 84 24 d1 00 00 00 	movb	209(%rsp), %al
88 84 24 d6 01 00 00 	movb	%al, 470(%rsp)
8a 84 24 d2 00 00 00 	movb	210(%rsp), %al
88 84 24 d7 01 00 00 	movb	%al, 471(%rsp)
8a 84 24 d3 00 00 00 	movb	211(%rsp), %al
88 84 24 d8 01 00 00 	movb	%al, 472(%rsp)
8a 84 24 d5 00 00 00 	movb	213(%rsp), %al
88 84 24 d9 01 00 00 	movb	%al, 473(%rsp)
8a 84 24 d6 00 00 00 	movb	214(%rsp), %al
88 84 24 da 01 00 00 	movb	%al, 474(%rsp)
8a 84 24 d7 00 00 00 	movb	215(%rsp), %al
88 84 24 db 01 00 00 	movb	%al, 475(%rsp)
8a 84 24 d8 00 00 00 	movb	216(%rsp), %al
... many more lines

S-YOU · 2018-01-21T13:24:14Z

The core issue is StaticArray using a loop to fill a value to all items.

def self.new(&block : Int32 -> T)
    array = uninitialized self
    N.times do |i|
      array.to_unsafe[i] = yield i
    end
    array
  end

but I think those can actually done with one memset (for Int8 | UInt8) call for initialization.
~~I could try to send a PR to use memset, instead of this change, if that is preferable.~~

EDIT: I tried to change StaticArray and used memset instead of Loop but that does not change the result, so uninitialized self causing all those stuff.

S-YOU · 2018-01-21T16:22:23Z

Not a typical usage, but doing an assignment on benchmark went a bit slower.
Closing, Thanks for taking your time.

require "benchmark"

Benchmark.ips do |x|
  x.report("CHAR_TO_DIGIT2") { b = String::CHAR_TO_DIGIT2 }
  x.report("CHAR_TO_DIGIT") { a = String::CHAR_TO_DIGIT }
end

CHAR_TO_DIGIT2 468.55M (  2.13ns) (± 8.64%)  0 B/op   1.18× slower
CHAR_TO_DIGIT 552.43M (  1.81ns) (± 8.64%)  0 B/op        fastest

Use slice instead of StaticArray

ce6a7b1

S-YOU closed this Jan 21, 2018

S-YOU deleted the chore_char_to_digit branch May 4, 2018 14:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use slice instead of StaticArray for String::CHAR_TO_DIGIT #5616

Use slice instead of StaticArray for String::CHAR_TO_DIGIT #5616

S-YOU commented Jan 20, 2018

ysbaddaden commented Jan 20, 2018

S-YOU commented Jan 21, 2018

asterite commented Jan 21, 2018

S-YOU commented Jan 21, 2018

S-YOU commented Jan 21, 2018 •

edited

Loading

lbguilherme commented Jan 21, 2018

ysbaddaden commented Jan 21, 2018

lbguilherme commented Jan 21, 2018

S-YOU commented Jan 21, 2018

S-YOU commented Jan 21, 2018

S-YOU commented Jan 21, 2018 •

edited

Loading

S-YOU commented Jan 21, 2018

Use slice instead of StaticArray for String::CHAR_TO_DIGIT #5616

Use slice instead of StaticArray for String::CHAR_TO_DIGIT #5616

Conversation

S-YOU commented Jan 20, 2018

ysbaddaden commented Jan 20, 2018

S-YOU commented Jan 21, 2018

asterite commented Jan 21, 2018

S-YOU commented Jan 21, 2018

S-YOU commented Jan 21, 2018 • edited Loading

lbguilherme commented Jan 21, 2018

ysbaddaden commented Jan 21, 2018

lbguilherme commented Jan 21, 2018

S-YOU commented Jan 21, 2018

S-YOU commented Jan 21, 2018

S-YOU commented Jan 21, 2018 • edited Loading

S-YOU commented Jan 21, 2018

S-YOU commented Jan 21, 2018 •

edited

Loading

S-YOU commented Jan 21, 2018 •

edited

Loading