-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds ability to use custom separators in CSV.build #5998
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs some specs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good! I've just noted 3 little syntax/performance enhancements.
src/csv/builder.cr
Outdated
@io << %("") | ||
when @quote_char | ||
@io << @quote_char | ||
@io << @quote_char |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO @io << @quote_char << @quote_char
in one line is clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But @Sija wouldn't @io << @quote_char << @quote_char
be faster? Because it's one operation then instead of two. I also made a benchmark:
require "benchmark"
Benchmark.ips do |x|
x.report "2 operations" {
io1 = IO::Memory.new
io1 << "hello"
io1 << "hello"
}
x.report "1 operation" {
io2 = IO::Memory.new
io2 << "hello" << "hello"
}
end
crystal file.cr --release
This gives the results:
2 operations 4.45M (224.91ns) (±10.23%) 1.06× slower
1 operation 4.72M (211.66ns) (± 9.68%) fastest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair 'nuff, changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no????
the two are exactly the same, <<
just returns self
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
God I wish I had a desktop to benchmark on instead of a laptop, the results are always terribly noisy, and the last result always wins. We should do some order randomization and result splicing in the benchmark harness to attempt to combat this
def self.build(io : IO) | ||
builder = Builder.new(io) | ||
def self.build(io : IO, separator : Char = DEFAULT_SEPARATOR, quote_char : Char = DEFAULT_QUOTE_CHAR) | ||
builder = Builder.new(io, separator, quote_char) | ||
yield builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the one line yield Builder.new(io, separator, quote_char)
works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is not about that, sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, let's stop commenting about syntax. Let's focus on semantic. Same line, different line, it's the same behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not about syntax here, it's about avoiding an uneeded memory allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably wrong sorry, but builder =
allocates, right? Avoiding this and directly yield
can improve performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Builder.new
allocates heap memory and puts a reference to it on the stack. That piece on the stack is just called builder
. You can call it whatever you want or don't give it any name - it makes no real difference.
@@ -138,7 +139,7 @@ class CSV::Builder | |||
private def needs_quotes?(value) | |||
value.each_byte do |byte| | |||
case byte.unsafe_chr | |||
when ',', '\n', '"' | |||
when @separator, @quote_char, '\n' | |||
return true | |||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A case
condition is a bit overkill here.
Smaller to write return true if [@separator, @quote_char, '\n'].includes? byte.unsafe_chr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... but slower
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @straight-shoota I didn't know that, noted! Why case
is more perfomant, because the includes use an Array
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array#includes?
needs to setup and execute a loop and several function calls... not even considering the array allocation. While you could use a tuple instead and LLVM will probably optimize some of this, there is still a considerable overhead over a case ... when
which simply translates to a few comparisons.
No specs? :-( |
@sdogruyol originally requested specs (#5998 (review)) but merged anyway? @Sija can you make a follow-up? |
@straight-shoota yeah, that was incoherent... 😕 I'll make a followup PR with them if that's alright. |
Closes #5987