Handle File.extname edge case #6234

bcardiff · 2018-06-21T12:49:01Z

Fix #6215. Isn't it nice how many ways it can be written?

#6216 and #6217 were closed before merging.

This PR takes the approach of iterating the string in reverse until a . or a separator is found, and making all the decisions from there. It is similar to what @straight-shoota did in #5635 as Path#extension, but with a more linear flow IMO.

straight-shoota · 2018-06-21T13:15:37Z

src/file.cr

+    return "" unless reader.has_previous?
+
+    # otherwise we are not at the beginning, and there is a previous char.
+    # if current is '/', then the patern is prefix/foo and has no extension


straight-shoota

Actually, there is a mistake in the last line.

straight-shoota · 2018-06-21T13:19:24Z

src/file.cr

+    # we are not at the beginning,
+    # the previous char is not a '/',
+    # and we have an extension
+    return filename[reader.pos + 1, filename.size - 1]


This needs to use filename.byte_slice because reader.pos is a byte index, not a character index. Also, it's faster than String#[].
By the way, the semantics of the current method call are wrong: it's String#[start, count], not String#[start, end].

I guess just calling filename.byte_slice(reader.pos + 1) would be enough, because the extension always ends at the end of the string.
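A small sketch of why the byte index matters, assuming a filename that contains a multi-byte character before the dot. The sample strings are made up for illustration; only Char::Reader, String#[], String#rindex and String#byte_slice from the standard library are used.

```crystal
filename = "na\u00EFve.cr" # the second 'i' has a diaeresis and takes two bytes in UTF-8

# Walk back to the last '.', in the style of the Char::Reader version of this PR.
reader = Char::Reader.new(at_end: filename)
while reader.current_char != '.' && reader.has_previous?
  reader.previous_char
end

byte_pos = reader.pos           # byte offset of '.'
char_pos = filename.rindex('.') # character index of '.'

puts byte_pos # 6
puts char_pos # 5

# Feeding a byte offset to the character-indexed String#[] skips a character.
puts filename[byte_pos, filename.size] # "cr"
puts filename.byte_slice(byte_pos)     # ".cr"

# The second argument of String#[] is a count, not an end index.
puts "extension"[2, 4] # "tens"
```

For pure ASCII filenames both indexes coincide, which is why the mismatch is easy to miss in tests.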

bcardiff · 2018-06-21T15:49:58Z

Thanks for the extra pair of 👀. PR updated.

RX14 · 2018-06-22T16:35:30Z

# Index of the first character of the basename (0 when there is no separator)
start_index = (filename.rindex(SEPARATOR) || -1) + 1
dot_index = filename.rindex('.')

# If there was no dot, return an empty string
return "" unless dot_index

# If the dot wasn't found in `basename(filename)`, ignore it
return "" if dot_index < start_index

# If the dot was at the start or end of the basename, ignore it
# Examples: "foo.", ".gitignore"
return "" if dot_index == start_index || dot_index == filename.size - 1

filename[dot_index..-1]

This is the most readable way I could write it. Not the most performant, but hey. Isn't it fun to bikeshed?

RX14

Apart from those nitpicks, this is the best version for the stdlib that keeps the current semantics. (I question the requirement to handle all these edge cases.)

RX14 · 2018-06-22T16:38:49Z

src/file.cr

+
+    # position the reader at the last . or SEPARATOR
+    while (current_char = reader.current_char) &&
+          (current_char != SEPARATOR && current_char != '.') &&


Why not use reader.current_char != SEPARATOR && reader.current_char != '.' here? The current code makes me think that reader.current_char returning nil is an exit condition for the loop. It's not: reader.current_char cannot return nil.

Actually, Char::Reader is not needed at all for this. You can simply use string.to_slice and go backwards from there, byte by byte. '.', '/' and '\' each fit in a single byte in UTF-8, and every byte of a multi-byte UTF-8 sequence has its high bit set, so they can't be mistaken for something else. Going byte by byte is both faster and simpler (no need to decode UTF-8).

This is also how it's done in Go.
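A minimal sketch of the byte-wise approach described above. The helper name extname_sketch and its separator parameter are hypothetical, introduced only for illustration; this is not the code merged in this PR. It assumes that "foo." and ".gitignore" should yield an empty extension, as in RX14's examples.

```crystal
# Hypothetical helper for illustration only; not the stdlib implementation.
def extname_sketch(filename : String, separator : Char = '/') : String
  bytes = filename.to_slice
  return "" if bytes.empty?

  sep = separator.ord.to_u8
  dot = '.'.ord.to_u8

  # Walk backwards over the raw bytes until a '.' or the separator is found.
  pos = bytes.size - 1
  while pos >= 0 && bytes[pos] != sep && bytes[pos] != dot
    pos -= 1
  end

  # No dot in the basename: ran off the start or hit the separator first.
  return "" if pos < 0 || bytes[pos] == sep
  # The dot is the first byte of the basename (".gitignore", "dir/.bashrc").
  return "" if pos == 0 || bytes[pos - 1] == sep
  # The dot is the last byte ("foo.").
  return "" if pos == bytes.size - 1

  # pos is a byte offset, so slice by bytes rather than by characters.
  filename.byte_slice(pos)
end

puts extname_sketch("a/b.tar.gz") # ".gz"
```

No UTF-8 decoding happens anywhere in the loop; the only string operation is the final byte_slice.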

RX14 · 2018-06-22T16:41:23Z

src/file.cr

+    # we are not at the beginning,
+    # the previous char is not a '/',
+    # and we have an extension
+    return filename.byte_slice(reader.pos + 1)


unnecessary return.

asterite · 2018-06-22T17:18:38Z

src/file.cr

@@ -282,35 +282,37 @@ class File < IO::FileDescriptor
  def self.extname(filename) : String
    filename.check_no_null_byte

-    reader = Char::Reader.new(at_end: filename)
+    bytes = filename.to_slice


return "" if bytes.empty?

(and add a test for File.extname(""), which I expect raises with the current code)
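A sketch of what such a test might look like, assuming the standard spec library and that the empty filename should produce an empty extension, as the suggested guard implies. The describe and it labels are made up for illustration.

```crystal
require "spec"

describe "File.extname" do
  it "returns an empty string for an empty filename" do
    # With the guard in place, the empty slice is never indexed.
    File.extname("").should eq("")
  end
end
```

Without the guard, a byte-wise implementation that starts at bytes.size - 1 would presumably index an empty slice, which is the likely source of the exception.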

Fix crystal-lang#6215. Handle File.extname edge case

fbec6bd

bcardiff added kind:bug topic:stdlib labels Jun 21, 2018

bcardiff added this to the 0.25.1 milestone Jun 21, 2018

straight-shoota approved these changes Jun 21, 2018

straight-shoota requested changes Jun 21, 2018

bcardiff added 2 commits June 21, 2018 17:48

Fix typo

8c326ef

Use byte_slice

70276fc

straight-shoota approved these changes Jun 21, 2018

sdogruyol approved these changes Jun 22, 2018

RX14 requested changes Jun 22, 2018

bcardiff added 3 commits June 22, 2018 18:59

Remove local variable

9b00b6e

On less return in the world

1d0466d

Iterate bytes

6298a0a

asterite reviewed Jun 22, 2018

Empty case

46cf84e

RX14 approved these changes Jun 25, 2018

RX14 merged commit 9d99700 into crystal-lang:master Jun 25, 2018

bcardiff deleted the fix/6215-file-extname branch June 26, 2018 18:49
