Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds ability to use custom separators in CSV.build #5998

Merged
merged 1 commit into from Apr 25, 2018

Conversation

Sija
Copy link
Contributor

@Sija Sija commented Apr 24, 2018

Closes #5987

Copy link
Member

@sdogruyol sdogruyol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs some specs

Copy link
Contributor

@j8r j8r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good! I've just noted 3 little syntax/performance enhancements.

@io << %("")
when @quote_char
@io << @quote_char
@io << @quote_char
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO @io << @quote_char << @quote_char in one line is clearer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But @Sija wouldn't @io << @quote_char << @quote_char be faster? Because it's one operation then instead of two. I also made a benchmark:

require "benchmark"

Benchmark.ips do |x|
  x.report "2 operations" {
    io1 = IO::Memory.new 
    io1 << "hello"
    io1 << "hello"
  }

  x.report "1 operation" {
    io2 = IO::Memory.new 
    io2 << "hello" << "hello"
  }
end

crystal file.cr --release

This gives the results:

2 operations   4.45M (224.91ns) (±10.23%)  1.06× slower
 1 operation   4.72M (211.66ns) (± 9.68%)       fastest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair 'nuff, changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no????

the two are exactly the same, << just returns self.

Copy link
Contributor

@RX14 RX14 Apr 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

God I wish I had a desktop to benchmark on instead of a laptop, the results are always terribly noisy, and the last result always wins. We should do some order randomization and result splicing in the benchmark harness to attempt to combat this

def self.build(io : IO)
builder = Builder.new(io)
def self.build(io : IO, separator : Char = DEFAULT_SEPARATOR, quote_char : Char = DEFAULT_QUOTE_CHAR)
builder = Builder.new(io, separator, quote_char)
yield builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the one line yield Builder.new(io, separator, quote_char) works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is not about that, sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, let's stop commenting about syntax. Let's focus on semantic. Same line, different line, it's the same behaviour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not about syntax here, it's about avoiding an uneeded memory allocation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably wrong sorry, but builder = allocates, right? Avoiding this and directly yield can improve performance?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Builder.new allocates heap memory and puts a reference to it on the stack. That piece on the stack is just called builder. You can call it whatever you want or don't give it any name - it makes no real difference.

@@ -138,7 +139,7 @@ class CSV::Builder
private def needs_quotes?(value)
value.each_byte do |byte|
case byte.unsafe_chr
when ',', '\n', '"'
when @separator, @quote_char, '\n'
return true
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A case condition is a bit overkill here.
Smaller to write return true if [@separator, @quote_char, '\n'].includes? byte.unsafe_chr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but slower

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @straight-shoota I didn't know that, noted! Why case is more perfomant, because the includes use an Array?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array#includes? needs to setup and execute a loop and several function calls... not even considering the array allocation. While you could use a tuple instead and LLVM will probably optimize some of this, there is still a considerable overhead over a case ... when which simply translates to a few comparisons.

@sdogruyol sdogruyol merged commit 382d07e into crystal-lang:master Apr 25, 2018
@sdogruyol sdogruyol added this to the Next milestone Apr 25, 2018
@asterite
Copy link
Member

No specs? :-(

@straight-shoota
Copy link
Member

straight-shoota commented Apr 25, 2018

@sdogruyol originally requested specs (#5998 (review)) but merged anyway?

@Sija can you make a follow-up?

@Sija
Copy link
Contributor Author

Sija commented Apr 25, 2018

@straight-shoota yeah, that was incoherent... 😕 I'll make a followup PR with them if that's alright.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants