Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Array#to_slice and Array#bytesize #4327

Closed
wants to merge 3 commits into from

Conversation

Papierkorb
Copy link
Contributor

@Papierkorb Papierkorb commented Apr 22, 2017

Hi,

This PR adds Array#to_slice and Array#bytesize including specs, and also adds a simple spec for Array#to_unsafe.

Use-cases are commonly found in I/O code, e.g., to quickly write an array of data into an IO:

player_positions_ary = Game.player_positions # Array(Int32)
client_socket.write player_positions_ary.to_slice

I think it's much better to offer a canonical way of turning an array into a slice for a moment, than having the user to fiddle with ary.to_unsafe.to_slice(ary.size). It may not be obvious to some if to_slice in this case requires the byte size, or the count of elements.

Sometimes you also just want to know the current size in memory of the whole array. Array#bytesize mirrors Slice#bytesize, and is much nicer to read than ary.size * sizeof(MyT), especially as the user has to figure out the T. This also makes it easier to write generic code.

Cheers!

src/array.cr Outdated
#
# This method is **unsafe** because when the array shrinks or grows, it might
# change the position of the internal buffer. The slice would then point to an
# uninialized memory region.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: uninitialized

@ysbaddaden
Copy link
Contributor

Slices are supposed to be safe pointers. Sadly, exposing an Array's internal pointer as a slice is unsafe, because the internal pointer may be reallocated, and accessing any exported slice would segfault.

Array#to_slice doesn't exist to not introduce a safety a feeling to the developer, when it's actually not. Use #to_unsafe and #size instead, so it's clear that you are dealing with something unsafe.

@bcardiff
Copy link
Member

Maybe we could add Array#to_unsafe_slice?

@RX14
Copy link
Contributor

RX14 commented Apr 22, 2017

#to_slice would be feasible, however it would have to copy the underlying memory.

@Papierkorb
Copy link
Contributor Author

What about this? Renamed #to_slice to #to_unsafe_slice, and made the documentation more clear on dangerous usage.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Apr 24, 2017

IMHO a Slice ain't just a Pointer with a size, but also caries the idea of a safe struct. It's obvious a Pointer is unsafe, it's less obvious a Slice is, because we sell Slices as a safe alternative to Pointers. Adding Array#to_unsafe_slice in stdlib would push the idea that a Slice can be unsafe, which I'm not comfortable with.

You must be very careful with the exported, unsafe Slice, too; if you are writing the slice to an IO, the IO will likely switch fibers, and... if something mutates the Array in another Fiber: 💥 segfault. You may be confident when you write the code that this won't happen... but what about a few weeks or months later? when you optimize Game.player_positions to reuse an Array instead of allocating one on each call?

I support @RX14 solution: allocate and copy memory (that won't trigger the event loop).

class Array(T)
  def to_slice
    pointer = Pointer(T).malloc(size)
    pointer.copy_from(to_unsafe, size)
    Slice(T).new(pointer, size)
  end
end

@bew
Copy link
Contributor

bew commented Apr 24, 2017

@ysbaddaden totally agree

If we're going with the allocate & copy solution, the documentation should mention that it's slow (copy the array buffer), and even slower on an Array of struct.

@mverzilli
Copy link

In that same vein, wouldn't it be better to be a little more explicit and call it copy_to_slice? Nobody reads the documentation of to_slice given it's so pervasive in the std

@ysbaddaden
Copy link
Contributor

Are you sure? to_ already means an export, we do x.to_u32 not x.copy_to_u32. I agree that to_slice usually exposes internals directly, but isn't it a side-effect of objects having a stable internal pointer, which allows optimizations?

@Papierkorb
Copy link
Contributor Author

@ysbaddaden It is not-obvious that a call to #to_slice on an array would copy anything. I wouldn't expect it to. E.g. String#to_slice doesn't copy. Why it doesn't isn't that important, the user knows it's fast, the internal detail became a feature. That assumption would break with Array, causing headache if a production application suddenly slows down to a crawl.

@ysbaddaden
Copy link
Contributor

Last argument, then I shut up: naming should be done consistently on what is expected to be returned, not internal details optimizations.

If a method could accept an Array, StaticArray, Slice of String or whatever, I would have to make a special case for Array, because it decided to use the unexpected #copy_to_slice instead of the expected #to_slice.

Silly example: what about a What#copy_to_s method?

@asterite
Copy link
Member

Array is a high-level type, and taking slices from it is super unsafe because later the array can change and discard a whole slice (when realloc happens). I think it's better to use the to_unsafe way explicitly to show that something unsafe is happening.

@asterite asterite closed this Sep 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants