Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement Path #3603

Closed
wants to merge 5 commits into from
Closed

[WIP] Implement Path #3603

wants to merge 5 commits into from

Conversation

krissi
Copy link

@krissi krissi commented Nov 28, 2016

I started implementing Pathname. I plan to have almost no own code in here, besides some glue.

Is there a convention how to prefix macros in documentation? Like .class_method and #instance_method I used %macro for now.

Any wishes or criticism is welcome. Do not fear, rebase and style will come later ;)

@asterite
Copy link
Member

Thanks! But if Pathname simply delegates to File or FileUtils, in its entirety, then I don't see a lot of reasons to include this. I believe Ruby's standard library grew organically and has a lot of duplicated functionality (for example Process and Open3), and in Crystal we try not to have too much of that.

@refi64
Copy link
Contributor

refi64 commented Nov 29, 2016

FWIW, [Python recently added something extremely similar]. It's just a lot more convenient to do stuff with objects instead of overly nested calls, and it also lets you do stuff like path1 / path2 for joining paths.

@krissi
Copy link
Author

krissi commented Nov 29, 2016

Pathname does in fact delegate many of its methods, but these are all just convenience methods. Its role is the representation of a path. This is a concept, which does not exist when using the functions from Dir, File and FileUtils.

In ruby AFAIK, the File- and Dir functions which are working with filenames as arguments are trying to convert them with #to_path first. Instead of the convenience call Pathname("foo.txt").read, File.read(Pathname("foo.txt")) should work as well. This way around the role might be clearer.

The benefit, as @kirbyfan64 pointed out, is the dealing with objects and methods instead of strings and functions. For visualization image a method #auto_annotate(input : IO, cache : IO, output : IO):

# Settings
input_file = "./src/test.cr"
annotation_dir = "./annotations"
cache_dir = "/tmp"

# Classic
input_io = File.open(input_file)
cache_io = File.open(File.join(cache_dir, File.basename(input_file)))
annotation_io = File.open(File.join(annotation_dir, File.basename(input_file, ".cr") + ".html"))

# Pathname (assuming the 3 settings vars are Pathnames)
file = file.open
cache_io = (cache_dir / input_file.basename).open
annotation_io = (annotation_dir / input_file.basename.ext(".html")).open

The only function which is handling files in the example is #open, all other function are related to paths, which are used by directories as well.

Maybe its a idea to implement a class Path which is buildable like rubys Pathname and move only the path related stuff from File to it. The core methods should then rely on this class instead of plain strings.

As further convrnience we could add another class or shard which combines Path, Dir, File and FileUtils, like Pathname does.

input_file = input_file.to_path
input_io = File.open(input_file.to_path)
cache_io = File.open(cache_dir.to_path / input_file.basename)
annotation_io = File.open(annotation_dir.to_path / input_file.basename.ext(".html"))

@ysbaddaden
Copy link
Contributor

Looking at the implementation, Pathname does nothing but delegate to other types. This falls into our "don't duplicate" policy. We want to avoid having many ways to deal with a single problem.

We already have File + Dir and FileUtils which are different because one is more idiomatic and deals with single resources, when the other looks like a shell (cosmetic) and allows operating on many files at once (feature).

This explains why we may be reluctant to add a 3rd solution, which is purely cosmetic and doesn't bring something new.

@bcardiff
Copy link
Member

I think there is value in Pathname abstraction. I like the type safety it can offer avoiding Strings if you want.

But since there are some doubts to add it I would suggest to create a shard, have a mature implementation and if that end up better or more used we can think of adding it to std lib or to the organization if the author want it.

@asterite
Copy link
Member

I am personally not against it, for example the / and + methods are nice, but since this PR was only about forwarding my initial reaction was "no". I'd start with a PR that adds value. Then the delegating methods will be OK, though I wouldn't use macros here. I'd use proper signatures and docs so users understand what arguments need to be passed.

@RX14
Copy link
Contributor

RX14 commented Nov 29, 2016

In java there's a Path class which is somewhat similar to this, and I find myself using it a lot. being able to treat the path as a series of nodes seperated by the separator character is really quite handy. I found myself using methods like relativize a lot. I would highly recommending just calling this class Path though instead of Pathname.

@luislavena
Copy link
Contributor

I will personally about methods like #open and instead add overloads to existing File.open that accepts a Pathname instance instead.

That way, we avoid having to go over and update method signatures all over the place.

@krissi krissi changed the title [WIP] Implement Pathname [WIP] Implement Path Dec 8, 2016
@@ -3,6 +3,7 @@ require "c/stdio"
require "c/stdlib"
require "c/sys/stat"
require "c/unistd"
require "path"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fond that a core class like File have to depend on another class to provide basic functionality like dirname and others.

It also creates multiple instances of Path which are discarded immediately.

I believe past comment in relation to rely was made in the opposite direction: that Path relies on File and Dir classes and not the other way around.

@krissi
Copy link
Author

krissi commented Dec 8, 2016

Thanks for the feedback. Ill address it. Currently this is more an example for a Path core class, not the initial Pathname class.

The idea behind this is to replace strings when handling files and directories for comfort, type safety and maybe security (e.g. injection protection). Following this idea, the path-related File and Dir functions whicn are then methods of Path can be removed in the future.

What do you think about this?

@@ -3,10 +3,9 @@ require "c/stdio"
require "c/stdlib"
require "c/sys/stat"
require "c/unistd"
require "path"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the require and then using it to define SEPARATOR from Path::SEPARATOR is not what I meant with my previous comment.

You're adding Path as dependency of File, not just by the require, but also all the methods owned by File now depend on an instance of Path that is discarded when used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what you mean. I'm not sure if you have seen my last comment on the PR. I am proposing to move some of the File methods which are currently just Path.new(path).xyz.to_s to path and remove the source methods and the SEPARATOR* constants from the source

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Dec 9, 2016

I'm not very fond of this implementation. I like moving path related methods from File to Path, it makes sense.

But, Path becomes required to join or expand paths (among others). I repeat myself, but it often just instantiates a class to throw it away immediately, to return a String. Same for some methods that are wrapping a single String into an Array, just to iterate over it and throw it away immediately. but this is bad for performance, since it requires to allocate HEAP memory that the GC will have to collect.

It also loses some null bytes check, which reopens some very important security holes. Please keep them until we are sure we can remove them (spoiler: we will most certainly keep them).

We can't expect a developer to have to instantiate a Path just to return the extname of a file (GC pressure), or to have to wrap a String into a Path so we can open a file. Treating paths as simple String still makes sense.

I would prefer a simpler solution that moves the File class methods verbatim to Path, so we'd have Path.join(*names) : String and Path.expand(path, dir = nil) : String methods, and so on. The Path instance methods could then take advantage of these. Then File and Dir methods would simply call to_s on paths it receives, accepting both a String (noop) or a Path, instead of wrapping the String into a Path, at least for the time being.

@ysbaddaden ysbaddaden closed this Dec 9, 2016
@ysbaddaden ysbaddaden reopened this Dec 9, 2016
@ysbaddaden
Copy link
Contributor

(sorry, wrong button)

@asterite
Copy link
Member

We'll probably implement Path or Pathname in the future, as a struct, but this PR doesn't seem to be what we have in mind, so I'm closing this.

@asterite asterite closed this Sep 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants