Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setup.sh: rewrite stripHash #66161

Merged
merged 1 commit into from Nov 12, 2019
Merged

Conversation

lilyball
Copy link
Member

@lilyball lilyball commented Aug 6, 2019

Motivation for this change

It really bugged me when I discovered that stripHash --foo failed with an error so I decided to fix it. While I was at it, I figured I could get rid of the external command invocation as Bash is perfectly capable of doing the necessary string manipulation itself.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

I tested the new stripHash function in isolation using nixpkgs.bash and in all cases I tested it matches behavior with the old one (except for the case where the argument starts with --, which simply failed in the old one but works in the new).

I have not actually run any builds with this as it requires rebuilding the entire world (due to changing stdenv) and I don't have the patience to do that locally. I'm not sure if there's any packages that can serve as a simple testbed for stdenv changes.

I realize the changes outside of basename -- may be a little arcane, but it just bugged me to run two external commands when I could get rid of all additional processes outside of basename altogether. On my machine direct manipulation is ~70x faster than invoking an external command.

Rewrite the `stripHash` helper function with 2 differences:

* Paths starting with `--` will no longer produce an error.
* Use Bash string manipulation instead of shelling out to `grep` and
  `cut`. This should be faster.
@@ -802,14 +802,17 @@ dumpVars() {
# Utility function: echo the base name of the given path, with the
# prefix `HASH-' removed, if present.
stripHash() {
local strippedName
local strippedName casematchOpt=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use oldOpts=$(shopt -p nocasematch) to match how runHook works above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then eval $oldOpts at the end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but this way is faster. I realize that the performance difference hardly matters here, but it just bugged me to fork the shell unnecessarily.

If I'm going to accept the cost of a fork, then it would actually be simpler overall to skip this and do

if ( shopt -u nocasematch; [[ "$strippedName" =~ ^[a-z0-9]{32}- ]] ); then

which just unsets it in the subshell.

Copy link
Member Author

@lilyball lilyball Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I measured it and shopt -s nocasematch; shopt -u nocasematch is somewhere around 80x faster (on my machine) than ( shopt -s nocasematch ). More generally (again on my machine), shopt -s nocasematch; shopt -u nocasematch takes <10µs whereas spawning a subshell takes at least 650µs.

This seems like a tiny difference, but imagine if I'm running stripHash on 1000 files. That 650µs turns into 650ms. And we're already spawning a separate process for basename as it is.

For that matter, I'm tempted to replace the call to basename with more string manipulation. It seems relatively straightforward except for one thing, which is that the behavior of basename // is documented as being platform-specific, depending on whether the platform in question treats // as semantically distinct from /. I don't know what platforms do this (does Cygwin do this?) but the inability to replicate this edge case makes me leery about replacing the call.

EDIT:
I dug into the basename source and it is indeed Cygwin (also something called Apollo DomainOS that's so old that basename doesn't even support it, and something else called z/OS). It actually tests for support normally by checking the inodes of / and //, but hardcodes those OSes when cross-compiling.

I didn't see this in the manpage but basename also has support for drive letters, which it enables for Cygwin, Win32, OS/2, and DOS.

This is all to say that I'm not going to try and get rid of the explicit call to basename.

echo "$strippedName" | cut -c34-
strippedName="$(basename -- "$1")"
shopt -q nocasematch && casematchOpt=1
shopt -u nocasematch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often would this be set? By default we should have it unset. But, I guess it's always good to cover every possible case in utility functions like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unset by default, but a package that sets it as part of one of their build scripts shouldn't cause the function to change behavior.

We could just elect to skip this entirely. The results of doing so are that if you do shopt -s nocasematch then stripHash will start stripping hashes containing uppercase characters too, whereas the default behavior is to only strip hashes containing lowercase characters. Ultimately not a big deal, but it was easy enough to avoid.

Copy link
Member

@matthewbauer matthewbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides a few nits, this looks like a good improvement on stripHash. We don't have tests for this currently, so we need to be a little extra careful with it.

@lilyball
Copy link
Member Author

Is there anything I have to do to move this forward?

@lheckemann lheckemann added this to the 20.03 milestone Sep 12, 2019
@lilyball lilyball closed this Oct 18, 2019
@lilyball lilyball deleted the stripHash-dashdash branch October 18, 2019 22:01
@lilyball
Copy link
Member Author

Oops I did not mean to delete this.

@lilyball lilyball restored the stripHash-dashdash branch October 19, 2019 21:02
@lilyball lilyball reopened this Oct 19, 2019
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/prs-ready-for-review-may-2019/3032/69

@veprbl veprbl added this to WIP in Staging via automation Nov 7, 2019
@veprbl veprbl moved this from WIP to Needs review in Staging Nov 7, 2019
@veprbl
Copy link
Member

veprbl commented Nov 7, 2019

I've read through this and I still don't get what real problem does this solve?

@lilyball
Copy link
Member Author

lilyball commented Nov 7, 2019

@veprbl stripHash is effectively a replacement for basename sans switches with the additional behavior of stripping the hash if present, except the current behavior doesn't treat all possible arguments as a filename. In particular, if the argument starts with --, stripHash assumes it's a filename but basename treats it as a switch. I realize this is a rather esoteric issue, but it really bugged me.

The core fix here is just

-    strippedName="$(basename "$1")"
+    strippedName="$(basename -- "$1")"

The rest of the changes seek to optimize stripHash by getting rid of unnecessary process spawns. The old implementation invokes both grep and cut to do stuff that bash can do entirely in shell script. This optimization makes stripHash significantly faster, which will be useful if you're calling this on a lot of filenames. I measured the cost of spawning a subshell alone at being ~70x the cost of direct option manipulation; I didn't even measure how long it takes to run grep and cut but it must be even more expensive than the mere fork.

@lilyball
Copy link
Member Author

lilyball commented Nov 7, 2019

I guess that means the performance optimization is by far the most significant part of this PR, and the fact that it fixes an edge case is just more of a freebie.

@FRidh FRidh merged commit d45d620 into NixOS:staging Nov 12, 2019
Staging automation moved this from Needs review to Done Nov 12, 2019
@lilyball lilyball deleted the stripHash-dashdash branch November 12, 2019 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Staging
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants