Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify String and Char to integer/float conversion API #3025

Merged
merged 2 commits into from
Jul 22, 2016

Conversation

jhass
Copy link
Member

@jhass jhass commented Jul 21, 2016

  • Add Char#to_i*, Char#to_i*?, Char#to_u* and Char#to_u*?
  • Add Char#to_f, Char#to_f?, Char#to_f* and Char#to_f*?
  • Add String#to_f? and String#to_f*?
  • Make to_f behaviour consistent with to_i behaviour
  • Remove Char#to_i with a block and second argument

The char integer and all float stuff still lacks the amount of options the string integer stuff has, but this should be a good first step and unify for the common usages.

Closes #3024

@asterite
Copy link
Member

@jhass So much ❤️ for this!!

I didn't know how to make to_f raise, because with atof it seemed a bit hard/impossible. I didn't know there was strtof in libc.


# Same as `#to_f?`.
def to_f64?
return nil unless '0' <= self[0] <= '9' || self[0] == '-' || self[0] == '+'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check needed? I imagine because leading whitespace is allowed by strtod.

Eventually we should have some of the options as to_i:

  • whitespace (defaults to true): if true, leading and trailing whitespaces are allowed
  • strict (defaults to true): if true, extraneous characters past the end of the number are disallowed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, to bypass the whitespace allowance. strtod/strtof also always handles/guesses hexadecimal, I think in the long term we need our own implementation to support all bases and have more efficient ways to check this.

I didn't want to do too much in this PR, it does already quite a few things, I'd like to see the options added in follow up contributions. As said in the OP this is to cater for the common usages first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the whitespace is allowed by default for String#to_i, what about allowing it here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh is it? I didn't even notice and it comes a bit unexpected to me tbh, whitespace in front or the back first of all is garbage to me, like anything else. I wouldn't be opposed at all to changing the default for to_i there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ignoring whitespace is save and probably expected.

For example in competition programs they give you a file with numbers in each line, and then you usually split and invoke to_i. This way the spaces between numbers, and newlines, don't bother.

Or, if you ask for a number in the command line, ignoring whitespace is probably OK. Or if you read it from a file, etc.

But of course, this can be discussed :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I met this a few days ago, and was pleased to see that " 123 " was a valid number string, it's harmless enough (compared to "123foo"). I assume this is the same reason why the strtod specification allows for them too.

@jhass
Copy link
Member Author

jhass commented Jul 21, 2016

In the interest of not blocking this PR on this discussion, I added the whitespace option to String#to_f and companions.

def to_i(base, or_else = 0)
to_i(base) { or_else }
def to_f
to_f32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default Float is Float64, and to_f is always equivalent to to_f64

* Add Char#to_i*, Char#to_i*?, Char#to_u* and Char#to_u*?
* Add Char#to_f, Char#to_f?, Char#to_f* and Char#to_f*?
* Add String#to_f? and String#to_f*?
* Make to_f behavior consistent with to_i behavior
* Remove Char#to_i with a block and second argument
end
end

endptr >= string_end ? v : nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can be more strict, since endptr must point to the last char, and we may add strict at the same time, since you already do check for it, by not allowing anything after except for whitespace:

return v unless strict

if whitespace
  while endptr < string_end && endptr.value.chr.whitespace?
    endptr += 1
  end
end

v if endptr == string_end

Copy link
Member Author

@jhass jhass Jul 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would allow "123 x", with whitespace: true, strict: true, no?

Copy link
Member Author

@jhass jhass Jul 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also just ignoring endptr with strict: true would return 0.0 in the error case.

Copy link
Contributor

@ysbaddaden ysbaddaden Jul 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, there was a few gotchas. The following passes:

  private def to_f_impl(whitespace = true, strict = true)
    return unless whitespace || '0' <= self[0] <= '9' || self[0] == '-' || self[0] == '+'

    ret, endptr = yield
    string_end = to_unsafe + bytesize

    if strict
      if whitespace
        while endptr < string_end && endptr.value.chr.whitespace?
          endptr += 1
        end
      end
      # reached the end of the string
      ret if endptr == string_end
    else
      ptr = to_unsafe
      if whitespace
        while ptr < string_end && ptr.value.chr.whitespace?
          ptr += 1
        end
      end
      # consumed some chars
      ret if endptr > ptr
    end
  end

assert do
  "0".to_f32.should eq(0_f32)
  "0.0".to_f32.should eq(0_f32)
  "123".to_f32.should eq(123_f32)
  "1.2".to_f32.should eq(1.2_f32)
  "1.2 ".to_f32.should eq(1.2_f32)
  " 1.2".to_f32.should eq(1.2_f32)
  "1.2 ".to_f32(whitespace: false).should be_nil
  " 1.2".to_f32(whitespace: false).should be_nil
  "123x".to_f32.should be_nil
  "123x".to_f32(strict: false).should eq(123_f32)
  "123 x".to_f32(strict: false).should eq(123_f32)
  " 123 x".to_f32.should be_nil
  " 123 x".to_f32(strict: false).should eq(123_f32)
  " 123 x ".to_f32(whitespace: false, strict: false).should be_nil
  "123 x".to_f32(whitespace: false, strict: false).should eq(123_f32)
  "x123".to_f32.should be_nil
  "x123".to_f32(strict: false).should be_nil
  " x 123".to_f32.should be_nil
  " x 123".to_f32(strict: false).should be_nil
  "  x 123".to_f32(whitespace: false, strict: false).should be_nil

  # edge case (failed expectation)
  "123x ".to_f32(whitespace: false, strict: false).should be_nil
end

Except for the last edge case, where I'm not sure if it should return nil or 123? there is a trailing space and we refuse it, so it should return nil, but strict is false so whatever is found after "123" is acceptable so it should return 123_f32...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we merge this PR and you can work on that in a follow up? :)

@ysbaddaden ysbaddaden merged commit da3c38e into crystal-lang:master Jul 22, 2016
@asterite asterite added this to the 0.19.0 milestone Jul 22, 2016
@jhass jhass deleted the to_num branch July 25, 2016 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants