Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String: optimize gsub and tr for ascii char and single byte string #4978

Merged
merged 1 commit into from Sep 16, 2017
Merged

String: optimize gsub and tr for ascii char and single byte string #4978

merged 1 commit into from Sep 16, 2017

Conversation

asterite
Copy link
Member

Fixes #4586

The main performance benefit is that when replacing a single byte char with another single byte char the new string bytesize is known to be the same as the current one, so we can just allocate that. The general gsub implementation uses a String::Builder with an initial approximate capacity.

Here I also check for single byte strings and invoke the same method (different overload) with Char, which should be more efficient in general.

I used this as a benchmark:

require "benchmark"

str = "hello world"

Benchmark.ips do |x|
  x.report("gsub('l', '_')") do
    str.gsub('l', '_')
  end
  x.report("gsub(\"l\", \"_\")") do
    str.gsub("l", "_")
  end
  x.report("tr(\"l\", \"_\")") do
    str.tr("l", "_")
  end
  x.report("gsub(\"wo\", \"wooow\")") do
    str.gsub("wo", "wooow")
  end
  x.report("tr(\"eo\", \"ax\")") do
    str.tr("eo", "ax")
  end
end

Old output:

     gsub('l', '_')   3.63M (275.37ns) (± 9.49%)  1.10× slower
     gsub("l", "_")   3.98M (251.09ns) (± 5.67%)       fastest
       tr("l", "_")   3.92M (255.08ns) (± 4.79%)  1.02× slower
gsub("wo", "wooow")   3.43M (291.68ns) (± 2.37%)  1.16× slower
     tr("eo", "ax")   3.82M (262.09ns) (± 2.16%)  1.04× slower

New output:

     gsub('l', '_')  12.71M ( 78.66ns) (± 3.42%)       fastest
     gsub("l", "_")  12.31M ( 81.22ns) (± 4.30%)  1.03× slower
       tr("l", "_")  12.26M ( 81.57ns) (± 3.21%)  1.04× slower
gsub("wo", "wooow")   3.34M ( 299.1ns) (± 7.31%)  3.80× slower
     tr("eo", "ax")   3.95M (253.12ns) (± 4.47%)  3.22× slower

(the last two entries are just there to see that those checks don't ruin the performance when the optimization can't be applied)

if replacement.is_a?(Char) && char.ascii? && replacement.ascii?
return gsub_ascii_char(char, replacement)
end

gsub { |my_char| char == my_char ? replacement : my_char }
else
self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly unrelated but there is no direct coverage of this line in the specs. I wanted to double check the optimized overload have specs, and they do. But there is no String#gsub(Char, replacement) invocation that will stress this path. Other specs fails if this line is changed though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a spec.

Another discussion: replacement can be anything, like a number, array, string or char. Maybe we should limit it to only chars and strings? (like in Ruby)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave it unbounded until there is another overload. I think is clear from the class and the name of the method what to expect.

@crisward
Copy link
Contributor

👏 FWIW I used gsub way too much. This should have a big impact on the performance of some of my code. Are the benchmarks done with --release ?

@asterite
Copy link
Member Author

Yes, this is with release. Note that this only improves the performance if you replace one ascii char with another one. In all other cases the performance doesn't improve. And this was already in the order of nanoseconds so it's probably not noticeable in a real app. But you can tell us later :-)

@mverzilli mverzilli added this to the Next milestone Sep 16, 2017
@mverzilli mverzilli merged commit b14d8c7 into crystal-lang:master Sep 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants