-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #3107 #3226
Fixes #3107 #3226
Conversation
def gsub(hash : Hash(String, _)) | ||
string = self | ||
hash.each do |key, value| | ||
string = string.gsub(key, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that iterating the string for every replacement is not the best performing solution. You probably want to find the last character of every key, and place them in an array, then iterate the string checking against each character. If a match is found perform strcmp calls to determine if any of the full strings match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RX14 👍 I just left a note at the same time. I was not excited about this change it did not seem very performant to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, String#gsub
addition is not related in any way to the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a sufficiently small diff that it could get merged as 1 PR with 2 commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious to see what the difference between the additional String#gsub(Hash(String, _))
versus the already existing String#gsub(Hash(Char, _))
require "benchmark"
string = "Apples " * 10000
hash = { "p" => 'a', "l" => 's' }
inverse = hash.invert
Benchmark.bm do |x|
x.report("Gsub with Hash String => Char 2000 times") do
2000.times do
string.gsub(hash)
end
end
x.report("Gsub with Hash Char => String 2000 times") do
2000.times do
string.gsub(inverse)
end
end
end
I have run this four times and so far my implementation seems faster on my MacBook. Keep in mind I am new to crystal maybe there is a better way to do this benchmark.
user system total real
Gsub with Hash String => Char 2000 times 1.400000 0.270000 1.670000 ( 1.499639)
Gsub with Hash Char => String 2000 times 1.620000 0.160000 1.780000 ( 1.669893)
user system total real
Gsub with Hash String => Char 2000 times 1.390000 0.270000 1.660000 ( 1.482408)
Gsub with Hash Char => String 2000 times 1.610000 0.150000 1.760000 ( 1.669062)
user system total real
Gsub with Hash String => Char 2000 times 1.380000 0.270000 1.650000 ( 1.471932)
Gsub with Hash Char => String 2000 times 1.620000 0.150000 1.770000 ( 1.677077)
user system total real
Gsub with Hash String => Char 2000 times 1.390000 0.270000 1.660000 ( 1.482940)
Gsub with Hash Char => String 2000 times 1.610000 0.150000 1.760000 ( 1.668073)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not as slow as I would have thought:
require "benchmark"
string = "Apples Oranges " * 100
hash = { "p" => 'a', "l" => 's', "es" => 'c', "Oranges" => 'f' }
inverse = hash.invert
string_inverse = string.gsub(hash)
class String
def gsub(hash : Hash(String, _))
string = self
hash.each do |key, value|
string = string.gsub(key, value)
end
string
end
end
Benchmark.ips do |x|
x.report("Hash(String => Char)") do
string.gsub(hash)
end
x.report("Hash(Char => String)") do
string_inverse.gsub(inverse)
end
end
$ ./test
Hash(String => Char) 37.59k (± 0.59%) 1.82× slower
Hash(Char => String) 68.4k (± 0.86%) fastest
With single char replacements:
$ ./test
Hash(String => Char) 40.93k (± 0.45%) 1.45× slower
Hash(Char => String) 59.46k (± 2.19%) fastest
I reduced the string length, and used Benchmark.ips
because it has a much nicer output. it's a bit slower but not by as much as I would have thought.
I am not sure how I feel about the |
@amedeiros Thank you! But this is missing encoded entities like |
It also misses hundreds of other named entities. Basically, this implementation can undo Crystal's |
Ok. I will go back to it when I get a little time. |
Does this also mean that the escape method that is in the standard library is not complete either? |
No. The thing is that to escape HTML some entities must absolutely be escaped, for example For the inverse operation, all encoded entities must be decoded back. You can't leave a For example:
Note that we unescaped |
Awesome thank you for the explanation. I was just looking over the Ruby implementation which is indeed much more involved. I can close this pull request and make a new branch start over so it's clean if that works best. |
may be translate it from ruby https://github.com/threedaymonk/htmlentities as shard |
No description provided.