Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include Enumerable in CSV #3465

Closed
wants to merge 1 commit into from
Closed

Conversation

tristil
Copy link
Contributor

@tristil tristil commented Oct 24, 2016

Include Enumerable in CSV, and add a test for map_with_index.

@tristil
Copy link
Contributor Author

tristil commented Oct 24, 2016

Let me know if I should add comments about justification to the commit message.

@chaniks
Copy link

chaniks commented Oct 24, 2016

You can write it here. You can also report an issue first, and then reference it in both the commit message and the PR, which usually makes it easier to track things.

Include Enumerable in CSV in order to be able to use `each_with_index`,
`map_with_index`, etc. Includes a test for `map_with_index`.
@tristil
Copy link
Contributor Author

tristil commented Oct 25, 2016

Updated commit message to explain justification and made test more robust.

@asterite
Copy link
Member

@tristil Thank you! But CSV is not an enumerable. An instance of CSV is more like a cursor that always points to a row and you can fetch any of the columns by index, header name or regex. That's why each doesn't return the row (your question in Google Groups). In fact a CSV instance has just one instance variable for storing all rows, but it just stores one at a time, for efficiency (to avoid allocating new arrays for each row). You can invoke to_a on a CSV to get a copy of that row.

For an Enumerable (well, Iterator), you can use CSV.each_row:

CSV.each_row(string_or_io).with_index.each { |row, index| ... }

@asterite asterite closed this Oct 25, 2016
@tristil
Copy link
Contributor Author

tristil commented Oct 25, 2016

@asterite Okay, I understand now, and I see that this is well documented on the class. What confused me, besides not reading the documentation well, is that there is an each method which just calls next for you. My thinking was that anything that implements each should be an enumerable, but I guess the idea is that this could create problems because e.g. if you did CSV.new(string_or_io).each(&.itself) you would get back the same instance over and over again? I'm still vague on the principle that governs when something that implements each should include Enumerable, but I am satisfied that I can use the other interface, which cleans up my code. Thanks!

While I'm here, would it make sense to implement the headers functionality in CSV Parser? If so, I can take a shot at it.

@asterite
Copy link
Member

@tristil The headers functionality is already there. What do you need to do?

@tristil
Copy link
Contributor Author

tristil commented Oct 25, 2016

@asterite So my new code is

require "csv"

def get_rows
  text = File.read("vendors.csv")

  CSV.each_row(text).map_with_index do |row, index|
    next if index == 0

    row
  end.compact
end

rows = get_rows

The only reason I need the index is to skip the first row, which I would get for free with headers: true if I were using the standard interface. Right now it seems like you have to choose between using the iterator/parsing interface but you don't get the headers switch or using the standard interface. Seems like it would be good to have CSV.each_row(text, headers: true) for consistency?

@tristil
Copy link
Contributor Author

tristil commented Oct 25, 2016

@asterite I see now that wouldn't work unless it was returning hashes instead of Array(String), which is a bigger change than I was talking about. Looking at the Ruby interface they switch to returning Row objects when you set headers: true. OTOH it's a bit confusing that you have a CSV::Row class but each_row yields an Array(String).

@chaniks
Copy link

chaniks commented Oct 25, 2016

Just for note, you can also use Iterator#skip to skip header, like:

CSV.each_row(text).skip(1).to_a

And I think CSV.each_row is a lightweight convenient(utility) method for those who don't want to use the full CSV implementation. In that case, returning an array might be reasonable.

@tristil
Copy link
Contributor Author

tristil commented Oct 25, 2016

@chaniks That skip is nice, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants