Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: String#gsub replacing ascii chars to work in non-ascii string #5350

Merged
merged 3 commits into from Jan 1, 2018

Conversation

straight-shoota
Copy link
Member

Fixes #5348

@straight-shoota
Copy link
Member Author

straight-shoota commented Dec 5, 2017

This is a fix for the immediate issue.
It would probably be better if gsub_ascii_char worked with non-ascii strings, it is still just replacing an ascii char for another.

UPDATED: I've implemented the better solution.

@straight-shoota
Copy link
Member Author

straight-shoota commented Dec 5, 2017

This would do as a proper solution in gsub_ascii_char, but I'm not fully confident about that pointer copying:

if my_char == char
  buffer[i] = replacement.ord.to_u8
else
  (buffer + i).copy_from(to_unsafe + i, my_char.bytesize)
end

@larubujo
Copy link
Contributor

larubujo commented Dec 5, 2017

solution is to use each_byte_with_index, not go by char when char is ascii. faster and better. no need for string to be full ascii.

@straight-shoota straight-shoota changed the title Fix: String#gsub should use shortcut only for ascii-only string Fix: String#gsub replacing ascii chars to work in non-ascii string Dec 5, 2017
@straight-shoota
Copy link
Member Author

There is no each_byte_with_index, but to_slice.each_with_index will do.

@straight-shoota
Copy link
Member Author

Can I get a review? It should be a good solution now, not just a workaround as initially announced.

src/string.cr Outdated
each_char_with_index do |my_char, i|
buffer[i] = if my_char == char
to_slice.each_with_index do |byte, i|
buffer[i] = if char === byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use ===, use char.ord == byte.

Copy link
Contributor

@RX14 RX14 Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also i'd much prefer

if char.ord == byte
  buffer[i] = replacement.ord.to_u8
else
  buffer[i] = byte
end

to avoid the ugly indent.

Copy link
Contributor

@Sija Sija Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RX14 I vote to change formatter rule for this indent to uniformly used 2 spaces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's another issue, and I disagree with you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of the curiosity, how's that you don't want to change the rule, while in the same time you suggest against using it? Is there sth I don't get?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because changing the rule would mean it looks like

buffer[i] = if char.ord == byte
  replacement.ord.to_u8
else
  byte
end

which i think is even more ugly than indenting it all.

My preference goes:

  1. what I suggested (assign in the if)
  2. what's implemented in this PR (indent the if)
  3. what you suggested (don't indent the if)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! for my taste it'd be 3,1,2 - in that order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. is what it currently looks like. Until now this PR doesn't change anything about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well it won't get changed if it's not in this PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RX14 now it is =)

@chastell
Copy link
Contributor

chastell commented Jan 1, 2018

make spec passes for me on this PR, so the TCPSocket failures on Travis are probably unrelated; could someone with access restart the Travis jobs?

@RX14
Copy link
Contributor

RX14 commented Jan 1, 2018

This PR needs to be rebased onto master to pick up the latest travis.yml changes to pass CI.

@straight-shoota
Copy link
Member Author

Rebased.

@RX14 RX14 modified the milestones: 0.24.2, Next Jan 1, 2018
@RX14 RX14 merged commit d7273c7 into crystal-lang:master Jan 1, 2018
@RX14
Copy link
Contributor

RX14 commented Jan 1, 2018

@bcardiff this should also probably be cherry-picked into 0.24.2 but i've milestoned it Next until that happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants