-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement HTML.escape(string) in terms of HTML.escape(string, io) #4674
Conversation
The overhead seems to come from using |
I've taken a look at the link where the samples come from... are these benchmarks seriously showing that in Rust a regex is faster than a plain loop with simple replacements?? |
@straight-shoota big improvement by byte vs char replacement. good catch. |
It might also come from the longer callstack of It might be also that in the particular case of replacing HTML (single-byte) special characters with (multibyte) entities the
Yup. 🤷 |
I'm really curious to learn how come Rust is faster in Regex @chastell ? |
@sdogruyol See the linked reddit thread here. |
thank you @RX14 👍 |
Sooo… anything else I should do with this PR? 😉 (I’ll happily look into |
@@ -30,7 +30,9 @@ module HTML | |||
# HTML.escape("Crystal & You") # => "Crystal & You" | |||
# ``` | |||
def self.escape(string : String) : String | |||
string.gsub(SUBSTITUTIONS) | |||
io = IO::Memory.new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better (more idiomatic) to use String.build { |io| escape(string, io) }
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IO::Memory
is somehow faster as the OP shows...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String.build
is the idiomatic way to build strings. If alternatives are faster, then String.build
should be improved (global overall benefit), instead of using alternatives in stdlib (duplications).
I'm getting different results:
This is all on a MacBook Pro. |
Same here (linux, i5-3570k):
|
Closing this because the PR doesn't seem to optimize the code (at least for me and @RX14 it results in slower code). It's also longer code. |
While working on my PolyConf slides 😉 I discovered that
HTML.escape(string)
is implemeted in terms ofString#gsub
, rather than in terms ofHTML.escape(string, io)
. The below benchmark (with string examples straight from this wonderful source) suggests implementingHTML.escape(string)
in terms ofHTML.escape(string, io)
is significantly faster while not sacrificing much readability.On current master (d24f79c), realigned for readability:
I’m not 100% sure what are the ramifications of choosing
IO::Memory#to_s
overString#gsub
’s solution based onString.build(bytesize)
(I’d guess potentially using up to twice the memory?), though, so do feel free to correct me on the usefulness of this, of course. ❤️