-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify String and Char to integer/float conversion API #3025
Conversation
@jhass So much ❤️ for this!! I didn't know how to make |
|
||
# Same as `#to_f?`. | ||
def to_f64? | ||
return nil unless '0' <= self[0] <= '9' || self[0] == '-' || self[0] == '+' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this check needed? I imagine because leading whitespace is allowed by strtod
.
Eventually we should have some of the options as to_i
:
- whitespace (defaults to
true
): if true, leading and trailing whitespaces are allowed - strict (defaults to
true
): if true, extraneous characters past the end of the number are disallowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, to bypass the whitespace allowance. strtod
/strtof
also always handles/guesses hexadecimal, I think in the long term we need our own implementation to support all bases and have more efficient ways to check this.
I didn't want to do too much in this PR, it does already quite a few things, I'd like to see the options added in follow up contributions. As said in the OP this is to cater for the common usages first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the whitespace
is allowed by default for String#to_i
, what about allowing it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh is it? I didn't even notice and it comes a bit unexpected to me tbh, whitespace in front or the back first of all is garbage to me, like anything else. I wouldn't be opposed at all to changing the default for to_i
there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ignoring whitespace is save and probably expected.
For example in competition programs they give you a file with numbers in each line, and then you usually split and invoke to_i
. This way the spaces between numbers, and newlines, don't bother.
Or, if you ask for a number in the command line, ignoring whitespace is probably OK. Or if you read it from a file, etc.
But of course, this can be discussed :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I met this a few days ago, and was pleased to see that " 123 "
was a valid number string, it's harmless enough (compared to "123foo"
). I assume this is the same reason why the strtod
specification allows for them too.
In the interest of not blocking this PR on this discussion, I added the |
def to_i(base, or_else = 0) | ||
to_i(base) { or_else } | ||
def to_f | ||
to_f32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default Float is Float64
, and to_f
is always equivalent to to_f64
* Add Char#to_i*, Char#to_i*?, Char#to_u* and Char#to_u*? * Add Char#to_f, Char#to_f?, Char#to_f* and Char#to_f*? * Add String#to_f? and String#to_f*? * Make to_f behavior consistent with to_i behavior * Remove Char#to_i with a block and second argument
end | ||
end | ||
|
||
endptr >= string_end ? v : nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can be more strict, since endptr
must point to the last char, and we may add strict
at the same time, since you already do check for it, by not allowing anything after except for whitespace:
return v unless strict
if whitespace
while endptr < string_end && endptr.value.chr.whitespace?
endptr += 1
end
end
v if endptr == string_end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would allow "123 x"
, with whitespace: true, strict: true
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also just ignoring endptr
with strict: true
would return 0.0
in the error case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, there was a few gotchas. The following passes:
private def to_f_impl(whitespace = true, strict = true)
return unless whitespace || '0' <= self[0] <= '9' || self[0] == '-' || self[0] == '+'
ret, endptr = yield
string_end = to_unsafe + bytesize
if strict
if whitespace
while endptr < string_end && endptr.value.chr.whitespace?
endptr += 1
end
end
# reached the end of the string
ret if endptr == string_end
else
ptr = to_unsafe
if whitespace
while ptr < string_end && ptr.value.chr.whitespace?
ptr += 1
end
end
# consumed some chars
ret if endptr > ptr
end
end
assert do
"0".to_f32.should eq(0_f32)
"0.0".to_f32.should eq(0_f32)
"123".to_f32.should eq(123_f32)
"1.2".to_f32.should eq(1.2_f32)
"1.2 ".to_f32.should eq(1.2_f32)
" 1.2".to_f32.should eq(1.2_f32)
"1.2 ".to_f32(whitespace: false).should be_nil
" 1.2".to_f32(whitespace: false).should be_nil
"123x".to_f32.should be_nil
"123x".to_f32(strict: false).should eq(123_f32)
"123 x".to_f32(strict: false).should eq(123_f32)
" 123 x".to_f32.should be_nil
" 123 x".to_f32(strict: false).should eq(123_f32)
" 123 x ".to_f32(whitespace: false, strict: false).should be_nil
"123 x".to_f32(whitespace: false, strict: false).should eq(123_f32)
"x123".to_f32.should be_nil
"x123".to_f32(strict: false).should be_nil
" x 123".to_f32.should be_nil
" x 123".to_f32(strict: false).should be_nil
" x 123".to_f32(whitespace: false, strict: false).should be_nil
# edge case (failed expectation)
"123x ".to_f32(whitespace: false, strict: false).should be_nil
end
Except for the last edge case, where I'm not sure if it should return nil
or 123
? there is a trailing space and we refuse it, so it should return nil
, but strict is false so whatever is found after "123"
is acceptable so it should return 123_f32
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we merge this PR and you can work on that in a follow up? :)
The char integer and all float stuff still lacks the amount of options the string integer stuff has, but this should be a good first step and unify for the common usages.
Closes #3024