Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error messages for invalid values of Char #2811

Merged
merged 1 commit into from
Jun 12, 2016

Conversation

jhass
Copy link
Member

@jhass jhass commented Jun 12, 2016

The other question is if .chr shouldn't raise already?

@asterite
Copy link
Member

chr is just a way to transform an int value to a char without performing any type checks. You can think of it as a cast in C. We could maybe have another method to raise if the integer isn't a valid char.

@jhass
Copy link
Member Author

jhass commented Jun 12, 2016

I think I'd prefer chr to do the check and have unsafe_chr or so. Or perhaps we can even just allow .to_u32.as(Char)?

@jhass jhass merged commit 2d57f73 into crystal-lang:master Jun 12, 2016
@jhass jhass deleted the char_error branch June 12, 2016 15:25
@asterite
Copy link
Member

The check will just be checking that the integer is in 0..Int32::MAX, right?

I did a couple of benchmarks, in non-release mode it's twice as slow, but in release mode there's no difference. I guess then it's worth improving this.

@asterite
Copy link
Member

The benchmark I did is:

require "benchmark"

struct Int
  def chr!
    unless 0 <= self <= Int32::MAX
      raise ArgumentError.new("#{self} out of char range")
    end
    chr
  end
end

a = 0
Benchmark.ips do |x|
  x.report("chr") do
    "foo bar".each_byte do |byte|
      a += byte.chr.ord
    end
  end
  x.report("chr!") do
    "foo bar".each_byte do |byte|
      a += byte.chr!.ord
    end
  end
end
puts a

Of course we'd rename chr to unsafe_chr, and chr! to chr (should be very easy not that we define primitives in Crystal code 😸)

@asterite
Copy link
Member

Hmm... actually, the name is still hardcoded in the compiler:

when "chr" then mod.char
. I'll try to improve that first.

@jhass
Copy link
Member Author

jhass commented Jun 12, 2016

You're by now more of a UTF-8 expert than me by now, but it looks like 0x10ffff? My goal would be that if foo.chr does not raise, so it'll be impossible for foo.chr.bytesize and foo.chr.each_byte to raise.

@asterite
Copy link
Member

Mmm... on second though, never mind. We can add an "unsafe_chr" there too, and in codegen too. The @[Primitive] think is just so that we can add docs to it, but the primtiive names still need to be hardcoded.

Right, 0x10ffff should be the maximum value, I don't know why I put Int32::MAX there. I re-ran the benchmark, numbers are the same.

Do you want to send a PR for this? We'd need to check Crystal::VERSION to define chr or unsafe_chr in primitives. In main visitor we can check for both names (the chr condition will be never met in 0.18.0, later we can remove it). And I think chr can be added to Int, no need to add it to each specific integer.

Otherwise I can do it too :-)

@jhass
Copy link
Member Author

jhass commented Jun 12, 2016

Please do, you have a better grasp of which chrs in the stdlib would need to change to unsafe_chrs, if any.

@jhass
Copy link
Member Author

jhass commented Jun 12, 2016

Oh and perhaps we should extract 0x10ffff as Char::MAX...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants