Optimise IO::Buffered for reads 4096 <= size < 8192

When count is at least half the buffer size, the buffer can hold at most one full request per refill, so at least every other read call ends up reading from the underlying IO anyway. Buffering such reads is therefore unlikely to be worth the extra copy out of the buffer. Benchmarks confirm this (iterations per second reading a 1 GB file; higher is better):

Before:

    2048  7.74 (± 0.46%)  1.24× slower
    4096  7.39 (± 0.45%)  1.30× slower
    6144  7.52 (± 0.57%)  1.28× slower
    8192  9.59 (± 0.79%)  fastest

After:

    2048  7.73 (± 0.64%)  1.24× slower
    4096  8.11 (± 0.76%)  1.18× slower
    6144  8.61 (± 0.66%)  1.11× slower
    8192  9.6  (± 0.89%)  fastest

Sizes 2048 and 8192 take the same code path before and after this commit, so their speeds are unchanged. Reads of 4096 and 6144 bytes, however, are about 10% and 14% faster (7.39 → 8.11 and 7.52 → 8.61), because they now read directly into the caller's slice instead of copying through the buffer.

Benchmark code (the file variable is named `file` because `in` is a reserved keyword in current Crystal):

```
require "benchmark"

macro benchmark(size)
  b.report("{{size}}") do
    # Read a 1 gigabyte file in chunks of {{size}} bytes
    File.open("1gb", "r") do |file|
      buffer = uninitialized UInt8[{{size}}]
      while file.read(buffer.to_slice) > 0
      end
    end
  end
end

Benchmark.ips do |b|
  benchmark(2048)
  benchmark(4096)
  benchmark(6144)
  benchmark(8192)
end
```
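To make the trade-off concrete, here is a rough Crystal model of the old and new thresholds. It is a hypothetical sketch, not the real IO::Buffered code: `simulate`, `Stats` and the assumption that every underlying read returns as many bytes as requested are all made up for illustration. It only counts reads that reach the underlying IO and bytes copied out of the internal buffer.

```crystal
# Simplified, hypothetical model of the read path (NOT IO::Buffered itself).
# Assumes every underlying read returns as many bytes as requested until
# the file runs out.
BUFFER_SIZE = 8192

# calls:            read calls made by the caller
# underlying_reads: reads that hit the underlying IO (refills or direct reads)
# copied:           bytes copied out of the buffer into the caller's slice
record Stats, calls : Int32, underlying_reads : Int32, copied : Int64

# Reads `total` bytes in requests of `count` bytes. When the buffer is
# empty, requests of at least `threshold` bytes bypass the buffer.
def simulate(total : Int64, count : Int32, threshold : Int32) : Stats
  file_left = total # bytes not yet read from the underlying file
  rem = 0_i64       # bytes still sitting in the buffer
  calls = 0
  underlying = 0
  copied = 0_i64

  while file_left > 0 || rem > 0
    calls += 1
    if rem == 0
      underlying += 1
      if count >= threshold
        # Direct read into the caller's slice: no intermediate copy.
        file_left -= Math.min(count.to_i64, file_left)
        next
      end
      # Refill the buffer with one underlying read.
      rem = Math.min(BUFFER_SIZE.to_i64, file_left)
      file_left -= rem
    end
    # Serve the request from the buffer (possibly a short read).
    n = Math.min(count.to_i64, rem)
    rem -= n
    copied += n
  end

  Stats.new(calls, underlying, copied)
end

total = 1_i64 << 20 # 1 MiB keeps the model fast
[2048, 4096, 6144, 8192].each do |count|
  before = simulate(total, count, BUFFER_SIZE)     # old: count >= BUFFER_SIZE
  after = simulate(total, count, BUFFER_SIZE // 2) # new: count >= BUFFER_SIZE / 2
  puts "#{count} before: #{before.underlying_reads} reads, #{before.copied} bytes copied"
  puts "#{count} after:  #{after.underlying_reads} reads, #{after.copied} bytes copied"
end
```

For 6144-byte requests over 1 MiB, the model gives 128 underlying reads and 1,048,576 copied bytes with the old threshold, and 171 reads with no copies with the new one: somewhat more reads, but no memcpy. Requests of 2048 and 8192 bytes come out identical under both thresholds, since they take the same path either way.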
crystal-lang · Oct 13, 2016 · bb208bc
1 parent 89da3a7
Showing 1 changed file with 4 additions and 3 deletions.
diff --git a/src/io/buffered.cr b/src/io/buffered.cr
@@ -156,9 +156,10 @@ module IO::Buffered
     return 0 if count == 0
 
     if @in_buffer_rem.empty?
-      # If we are asked to read more than the buffer's size,
-      # read directly into the slice.
-      if count >= BUFFER_SIZE
+      # If we are asked to read more than half the buffer's size,
+      # read directly into the slice, as it's not worth the extra
+      # memory copy.
+      if count >= BUFFER_SIZE / 2
         return unbuffered_read(slice[0, count]).to_i
       else
         fill_buffer