
[WIP] Vectored I/O #6078

Closed · wants to merge 6 commits

@carlhoerberg (Contributor)

Optimize writes by using writev

WIP, feedback appreciated

@RX14 (Contributor) left a comment:

This needs specs.

I'm a little concerned that the cost of the heap allocation for the `to_a` call will be larger than the performance gained from using scatter-gather IO. Looking at `writev_syscall_helper` makes me think that `write_syscall_helper` might have been the wrong abstraction and we should have encapsulated only the loop and the errno check, not any of the slice accounting. (i.e. I'd like to use `write_syscall_helper` for writev as well.)

I think every effort should be made to avoid heap allocations if this is a performance-sensitive syscall.
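
For illustration, the narrower abstraction suggested here (just the retry loop and the errno check, with no slice accounting) might look something like the following sketch. The name `syscall_with_retry` and the error handling are assumptions, not code from the PR; they follow the `Errno` and `wait_writable` conventions Crystal used at the time:

# Hypothetical sketch: encapsulate only the syscall retry loop and the
# errno check, leaving all slice accounting to the caller.
private def syscall_with_retry(errno_msg : String)
  loop do
    result = yield
    if result == -1
      if Errno.value == Errno::EAGAIN
        wait_writable # suspend the fiber until the fd is writable again
      else
        raise Errno.new(errno_msg)
      end
    else
      return result
    end
  end
end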

private def unbuffered_write(*slices : Bytes)
  writev_syscall_helper(slices.to_a, "Error writing file") do |slices|
    iovecs = slices.map do |slice|
      iovec = LibC::IoVec.new
A Contributor commented:

`LibC::IoVec.new(iov_base: slice.to_unsafe, iov_len: slice.size)` should work and be a lot shorter.
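
Applied to the loop above, the mapping collapses to a single expression (a sketch based on the suggestion):

iovecs = slices.map do |slice|
  # Build one LibC::IoVec per slice in a single constructor call.
  LibC::IoVec.new(iov_base: slice.to_unsafe, iov_len: slice.size)
end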

return unbuffered_write(*slices)
end

count = slices.sum &.size
A Contributor commented:

Nitpick: this is changing terminology from `size` to `count`, how about just `total_size`?

@@ -38,4 +43,5 @@ lib LibC
  fun sysconf(x0 : Int) : Long
  fun unlink(x0 : Char*) : Int
  fun write(x0 : Int, x1 : Void*, x2 : SizeT) : SSizeT
  fun writev(x0 : Int, x1 : IoVec*, x2 : SizeT) : SSizeT
A Contributor commented:

This will obviously need to be copied to all the other libcs.
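
For context, the binding above relies on a `LibC::IoVec` struct mirroring POSIX `struct iovec`; the PR's definition presumably looks something like this sketch:

lib LibC
  # Mirrors POSIX struct iovec: a base pointer plus a byte length.
  struct IoVec
    iov_base : Void*
    iov_len : SizeT
  end
end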

@RX14 changed the title from "Vectored I/O" to "[WIP] Vectored I/O" on May 8, 2018
@@ -26,6 +26,22 @@ module Crystal::System::FileDescriptor
    end
  end

  private def unbuffered_write(*slices : Bytes)
    writev_syscall_helper(slices.to_a, "Error writing file") do |slices|
A Member commented:

This `to_a` defeats the purpose of the method (it allocates). Does it work without it?

@carlhoerberg (Contributor, Author) replied:

No, but if I could convert a Tuple to a StaticArray it would.

@carlhoerberg (Contributor, Author) added:

I guess we could add `Tuple#static_array`, like `Number#static_array`:

crystal/src/number.cr (lines 65 to 71 at 80a975c):

macro static_array(*nums)
  %array = uninitialized StaticArray({{@type}}, {{nums.size}})
  {% for num, i in nums %}
    %array.to_unsafe[{{i}}] = {{@type}}.new({{num}})
  {% end %}
  %array
end
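
A hypothetical `Tuple#static_array` along those lines might look like the following sketch (not part of the PR or the stdlib; `Union(*T)` is an assumption, matching how `Tuple#to_a` types its elements):

struct Tuple
  # Hypothetical: copy the tuple's elements into a stack-allocated
  # StaticArray, avoiding the heap allocation that #to_a incurs.
  def static_array
    array = uninitialized StaticArray(Union(*T), {{T.size}})
    {% for i in 0...T.size %}
      array.to_unsafe[{{i}}] = self[{{i}}]
    {% end %}
    array
  end
end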

A Member commented:

Maybe a splat argument with a type, like `*slices : Bytes`, should just be a StaticArray?

@carlhoerberg (Contributor, Author)

Yes, to my understanding `write_syscall_helper` does for writes basically what `read_fully` does for reads. Should `IO#write` always write everything, and in that case, why doesn't `IO#read` read everything? I guess there are few cases where a user would not want to read or write everything?

@RX14 (Contributor) commented May 8, 2018:

@carlhoerberg write should always write everything because partial writes are always the wrong behaviour; read shouldn't read everything because full reads are often the wrong behaviour (you can't implement most protocols efficiently without partial reads).
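
To illustrate the point about partial reads (an editorial sketch; `socket` and the `parse` step are hypothetical): a protocol parser reads whatever bytes are available and loops, rather than demanding an exact count:

buffer = Bytes.new(4096)
# IO#read returns as soon as some data is available, which is what a
# streaming parser wants; read_fully would block until the whole
# buffer was filled.
while (bytes_read = socket.read(buffer)) > 0
  parse(buffer[0, bytes_read]) # hypothetical parsing step
end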

@ysbaddaden (Contributor)

It would be nice to have a longer description in the pull request:

  • What's the problem?
  • What solution(s) are you proposing?
  • Is using writev(2) noticeably faster for small, medium and/or large collections than a naive and simpler solution, given that you must allocate 2 arrays (#to_a then #map to build the IoVec)?

def write(slices : Enumerable(Bytes))
  slices.each { |slice| write(slice) }
end

def write(*slices : Bytes)
  write(slices)
end

@straight-shoota (Member)

@ysbaddaden `Tuple#map` actually returns a Tuple, so no allocation there.

The array allocation can also be skipped if the splat argument is converted to a StaticArray. `StaticArray#map` also returns a StaticArray.

@ysbaddaden (Contributor) commented May 8, 2018:

Still, a naive solution is 6 lines of code, doesn't need external bindings, accepts any Enumerable without allocating, ... Using writev(2) had better be worth it :)

@RX14 (Contributor) commented May 8, 2018:

writev must be worth it for some use cases to have been added as a syscall.

@carlhoerberg what is your use case exactly, and can you benchmark it for us?

Before obsessing over the implementation of this PR it'd be nice to see some benchmarks.

@asterite (Member) commented May 8, 2018:

Worth reading: https://stackoverflow.com/a/10520793/4990850

The use cases are very specific.

In Ruby there's a gem for that: https://github.com/tenderlove/writev

If it's not too difficult to implement, we can have it in the standard library.

@RX14 (Contributor) commented May 8, 2018:

Hence why I think it'd be interesting to hear @carlhoerberg's use case :)

@carlhoerberg (Contributor, Author)

Benchmark:

require "benchmark"

data_set = {
  small: Tuple.new("a" * 5, "b" * 5, "c" * 5, "d" * 5, "e" * 5, "f" * 5, "g" * 5, "h" * 5, "i" * 5, "j" * 5).map &.to_slice,
  medium: Tuple.new("a" * 500, "b" * 500, "c" * 500, "d" * 500, "e" * 500, "f" * 500, "g" * 500, "h" * 500, "i" * 500, "j" * 500).map &.to_slice,
  large: Tuple.new("a" * 5000, "b" * 5000, "c" * 5000, "d" * 5000, "e" * 5000, "f" * 5000, "g" * 5000, "h" * 5000, "i" * 5000, "j" * 5000).map &.to_slice,
}

buffered = File.open("/tmp/write-buf", "w")
synced = File.open("/tmp/write-sync", "w").tap { |f| f.sync = true }
writev = File.open("/tmp/writev", "w").tap { |f| f.sync = true }

data_set.each do |type, data|
  Benchmark.ips do |x|
    x.report("write buffered #{type}") do
      data.each do |d|
        buffered.write d
      end
      buffered.flush
    end
    x.report("write sync #{type}") do
      data.each do |d|
        synced.write d
      end
    end
    x.report("writev #{type}") do
      writev.write *data
    end
  end
  puts
end

Output:

write buffered small 144.25k (  6.93µs) (± 6.05%)  0 B/op        fastest
    write sync small  15.52k ( 64.45µs) (± 6.49%)  0 B/op   9.30× slower
        writev small 127.73k (  7.83µs) (± 7.45%)  0 B/op   1.13× slower

write buffered medium  66.11k ( 15.13µs) (± 6.05%)  0 B/op        fastest
    write sync medium   11.6k ( 86.18µs) (±11.17%)  0 B/op   5.70× slower
        writev medium  62.87k ( 15.91µs) (± 7.01%)  0 B/op   1.05× slower

write buffered large   6.36k (157.11µs) (± 7.59%)  0 B/op   2.10× slower
    write sync large   6.42k (155.74µs) (± 8.60%)  0 B/op   2.08× slower
        writev large  13.34k ( 74.96µs) (±12.05%)  0 B/op        fastest

Once the combined writes exceed the buffer's 4096-byte limit, writev becomes relatively faster.

@carlhoerberg (Contributor, Author)

Benchmarking TCP writes:

require "benchmark"
require "socket"

server = TCPServer.new("localhost", 1234)

fork do
  client = TCPSocket.new("localhost", 1234)
  client.skip_to_end
end

data_set = {
  small: Tuple.new("a" * 5, "b" * 5, "c" * 5, "d" * 5, "e" * 5, "f" * 5, "g" * 5, "h" * 5, "i" * 5, "j" * 5).map &.to_slice,
  medium: Tuple.new("a" * 500, "b" * 500, "c" * 500, "d" * 500, "e" * 500, "f" * 500, "g" * 500, "h" * 500, "i" * 500, "j" * 500).map &.to_slice,
  large: Tuple.new("a" * 5000, "b" * 5000, "c" * 5000, "d" * 5000, "e" * 5000, "f" * 5000, "g" * 5000, "h" * 5000, "i" * 5000, "j" * 5000).map &.to_slice,
}

socket = server.accept
data_set.each do |type, data|
  Benchmark.ips do |x|
    socket.sync = false
    x.report("write buffered #{type}") do
      data.each do |d|
        socket.write d
      end
      socket.flush
    end
    socket.sync = true
    x.report("write sync #{type}") do
      data.each do |d|
        socket.write d
      end
    end
    x.report("writev #{type}") do
      socket.write *data
    end
  end
  puts
end

Output:

write buffered small  28.57k (  35.0µs) (± 7.25%)  0 B/op   7.18× slower
    write sync small  28.53k ( 35.06µs) (± 5.52%)  0 B/op   7.19× slower
        writev small 205.15k (  4.87µs) (± 6.60%)  0 B/op        fastest

write buffered medium  23.65k ( 42.29µs) (± 6.56%)  0 B/op   6.86× slower
    write sync medium   23.9k ( 41.85µs) (± 4.51%)  0 B/op   6.79× slower
        writev medium 162.24k (  6.16µs) (± 7.51%)  0 B/op        fastest

write buffered large  20.91k ( 47.82µs) (± 7.41%)  0 B/op   1.21× slower
    write sync large  20.98k ( 47.66µs) (± 5.44%)  0 B/op   1.20× slower
        writev large  25.25k (  39.6µs) (± 5.91%)  0 B/op        fastest

@@ -81,6 +81,34 @@ module IO::Syscall
    end
  end

  def writev_syscall_helper(slices, errno_msg : String) : Nil
A Contributor commented:

The `errno_msg` argument here is IMHO pointless, why not just use `Error writing to IO`?

@carlhoerberg (Contributor, Author)

Benchmarking TCP writes, comparing against using an IO::Memory as a buffer:

require "benchmark"
require "socket"

server = TCPServer.new("localhost", 1234)

fork do
  client = TCPSocket.new("localhost", 1234)
  client.skip_to_end
end

data_set = {
  small: StaticArray(Bytes, 1).new { ("0" * 5).to_slice },
  medium: StaticArray(Bytes, 10).new { ("0" * 5).to_slice },
  large: StaticArray(Bytes, 100).new { ("0" * 5).to_slice },
}

socket = server.accept
socket.tcp_nodelay = true
socket.sync = true
data_set.each do |type, data|
  Benchmark.ips do |x|
    x.report("write w/ io::memory #{type}") do
      mem = IO::Memory.new
      data.each do |d|
        mem.write d
      end
      socket.write mem.to_slice
    end
    x.report("write sync #{type}") do
      data.each do |d|
        socket.write d
      end
    end
    x.report("writev #{type}") do
      socket.write data
    end
  end
  puts
end

Output:

write w/ io::memory small 116.99k (  8.55µs) (± 5.37%)  177 B/op        fastest
         write sync small 105.84k (  9.45µs) (± 5.25%)    0 B/op   1.11× slower
             writev small 113.92k (  8.78µs) (± 4.29%)    0 B/op   1.03× slower

write w/ io::memory medium 102.97k (  9.71µs) (± 1.54%)  176 B/op        fastest
         write sync medium  10.17k ( 98.29µs) (± 9.34%)    0 B/op  10.12× slower
             writev medium  98.14k ( 10.19µs) (± 3.78%)    0 B/op   1.05× slower

write w/ io::memory large  86.69k ( 11.54µs) (± 1.36%)  1168 B/op        fastest
         write sync large   1.08k (928.12µs) (± 6.78%)     0 B/op  80.45× slower
             writev large  40.02k ( 24.99µs) (± 0.85%)     0 B/op   2.17× slower

@@ -26,6 +26,19 @@ module Crystal::System::FileDescriptor
    end
  end

  private def unbuffered_write(slices : Indexable(Bytes))
A Contributor commented:

Shouldn't it be `Enumerable` instead of `Indexable`? (ditto all below)

@ysbaddaden (Contributor)

Thanks for taking the time to benchmark!

@wooster0 (Contributor) commented Feb 5, 2019:

Should someone else maybe continue this PR?

@yxhuvud (Contributor) commented Nov 16, 2019:

FWIW, looking at the glibc implementation, it looks like they already use a solution quite similar to what RX14 talked about in the puts thread: https://github.com/lattera/glibc/blob/master/sysdeps/posix/writev.c. That one should work fine, and atomically, with buffered IO, I think.

The variant provided by the kernel, on the other hand, is used by musl: https://github.com/davidlazar/musl/blob/master/src/unistd/writev.c. That variant should probably not be mixed with any buffered IO, as it writes directly to the underlying file descriptor, so any competing buffered writes may end up being applied in an unexpected order.

(I accidentally pinged ysbaddaden in an earlier edit.)
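
For illustration, the glibc-style coalescing fallback described above translates to roughly this Crystal sketch (the `writev_fallback` name and signature are hypothetical):

# Coalesce all slices into one temporary buffer and issue a single
# write. This composes safely with buffered IO, at the cost of one
# heap allocation and an extra copy.
def writev_fallback(io : IO, slices : Enumerable(Bytes))
  total_size = slices.sum(&.size)
  buffer = Bytes.new(total_size)
  offset = 0
  slices.each do |slice|
    slice.copy_to(buffer.to_unsafe + offset, slice.size)
    offset += slice.size
  end
  io.write(buffer)
end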

@RX14 (Contributor) commented Nov 21, 2019:

@yxhuvud writev is a Linux syscall; I'm 100% certain that glibc implementation is just a fallback for ancient kernels with no writev support.
