Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec String#unsafe_byte_at #5500

Closed
wants to merge 1 commit into from

Conversation

chastell
Copy link
Contributor

@chastell chastell commented Jan 1, 2018

I noticed the String#unsafe_byte_at#unsafe_chr pattern used in a few places, so this adds String#unsafe_chr_at.

Question: should String#unsafe_byte_at indeed return 0_u8 on out-of-bounds index?

@oprypin
Copy link
Member

oprypin commented Jan 1, 2018

Sorry for the harsh message below. Nothing personal, I just don't think that more time should be spent in this direction. But also feel free to ignore me.

So the behavior is
Get the n-th byte, and take the Unicode character corresponding to its value. Don't mind that for most characters this does not work and returns arbitrary gibberish.
The reason that you found use for it in standard library is due to specific optimizations, but replacing the existing explicit actions with this magical method doesn't make things clearer. There is no use for this otherwise, it's clutter at best.

There is no precedent for naming things as chr_at.
There is precedent for char_at which means something completely different.

@oprypin
Copy link
Member

oprypin commented Jan 1, 2018

As for your question: the behavior of String#unsafe_byte_at is undefined for an out-of-bounds index.

In practice, it returns whatever is in the memory after the string (also note that there's one guaranteed 0 byte at the end of the string)

@chastell
Copy link
Contributor Author

chastell commented Jan 1, 2018

I didn’t find this harsh at all, I fully understand your reasoning!

I cut this PR down to speccing the (defined…) String#unsafe_byte_at behaviour.

@chastell chastell changed the title Add (and use) String#unsafe_chr_at Spec String#unsafe_byte_at Jan 1, 2018
@RX14
Copy link
Contributor

RX14 commented Jan 1, 2018

Not sure about this, all the other string specs heavily test this internal method so it has coverage already. unsafe_byte_at should probably be protected...

@asterite
Copy link
Member

asterite commented Jan 1, 2018

I agree here, I don't think we need to add general unsafe methods, specially when they are not used a lot. At most they should be protected. Same goes with unsafe_byte_at, should be protected or undocumented, can be easily achieved with to_unsafe[I]

@chastell
Copy link
Contributor Author

chastell commented Jan 1, 2018

Welllll, I tried making String#unsafe_byte_at protected, but it’s also used in IO#gets, HTTP::Params.parse, URI.unescape and URI.unescape_one. 😄 But I do agree with your arguments, closing!

@chastell chastell closed this Jan 1, 2018
@chastell chastell deleted the String#unsafe_chr_at branch January 1, 2018 18:23
@chastell
Copy link
Contributor Author

chastell commented Jan 1, 2018

(Let me know if you think adjusting IO#gets, HTTP::Params.parse, URI.unescape and URI.unescape_one to use String#to_unsafe and making String#unsafe_byte_at protected would be welcome, I can happily do that.)

@straight-shoota
Copy link
Member

What purpose has unsafe_byte_at as a dedicated method anyway? It's just an alias for to_unsafe[] which is actually shorter...

@RX14
Copy link
Contributor

RX14 commented Jan 1, 2018

unsafe_byte_{slice,string} should also be made either :nodoc: or protected. Heavilly prefer the latter.

@straight-shoota
Copy link
Member

unsafe_byte_slice_string is already protected and unsafe_byte_slice is only used in file.cr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants