
maintainers/scripts/update.nix: Add support for auto-committing changes #59372

Closed
jtojnar wants to merge 6 commits

Conversation

@jtojnar (Contributor) commented Apr 12, 2019

No description provided.

@NeQuissimus (Member) commented:

Could you then go through all the update.sh files and remove the git commands? :)
I like this; I've been trying to streamline the update process for the packages I am interested in.
We should probably also have the ability to attach derivations to be built and tests to be run as part of the updating process, and revert any changes if the builds/tests fail (or maybe it can already do that and I am using it wrong :D).
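
A minimal sketch of what such a build-and-revert step might look like (not part of this PR; the attribute list, file list, and worktree path are hypothetical inputs):

    import subprocess

    def build_and_maybe_revert(attr_names, files, worktree):
        """Hypothetical helper: build the given attributes and roll back the
        changed files if any build fails."""
        try:
            for attr in attr_names:
                # nix-build exits with a non-zero status when the build fails
                subprocess.run(['nix-build', '-A', attr], check=True, cwd=worktree)
        except subprocess.CalledProcessError:
            # restore the files to their last committed state
            subprocess.run(['git', 'checkout', '--'] + files, check=True, cwd=worktree)
            return False
        return True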

@jtojnar (Contributor, Author) commented Apr 13, 2019

> Could you then go through all the update.sh files and remove the git commands? :)

I think I removed most of them when we parallelized the updater.

> We should probably also have the ability to attach derivations to be built and tests to be run as part of the updating process, and revert any changes if the builds/tests fail (or maybe it can already do that and I am using it wrong :D).

In GNOME, we use this for the menial parts of the update process, expecting that the package might not build. The failures (most often dependency changes) need to be fixed manually, but we still want to keep the commit.

@jtojnar (Contributor, Author) commented Apr 13, 2019

I have managed to get it pretty much to the state I wanted: it will create a new git worktree for each thread in the pool and run the update script there. Then it will commit the change in the worktree and cherry-pick it in the main repo, releasing the worktree for the next change (rough sketch below).

I have two gripes with this:

  1. It feels slow, though I am not sure that it is not just in my head.
  2. The worktree tempdirs are created in /run/user/1000/, quickly running out of space – I had to run it under env TMPDIR=/tmp.
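
A rough sketch of the per-thread flow described above, using the same git commands the updater ends up running (worktree, files, and commit_message stand in for values the thread already has):

    import os
    import subprocess

    def commit_and_cherry_pick(worktree, files, commit_message):
        # commit the update inside the thread's worktree
        subprocess.run(['git', 'add'] + files, check=True, cwd=worktree)
        subprocess.run(['git', 'commit', '-m', commit_message], check=True, cwd=worktree)
        # `git worktree add <dir>` created a branch named after the directory,
        # so cherry-picking that branch's tip brings the commit into the main repo
        subprocess.run(['git', 'cherry-pick', os.path.basename(worktree)], check=True)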

jtojnar requested a review from romildo April 13, 2019
@jtojnar (Contributor, Author) commented Apr 13, 2019

One thing to consider: do we want updaters to have to add commit to their supported features, or should we parse the old and new versions from the expression in update.nix?

thread_name = package['thread']
worktree, lock = temp_dirs[thread_name]
changes = json.loads(p.stdout)
for change in changes:
A reviewer (Member) commented:

too much nesting. Consider a small separate function for the following lines

@jtojnar (Contributor, Author) replied:

I would say a much more pressing issue is how tightly coupled everything is with console input/output, and that interruption does not work very well.

Something like this looks better, but interruption is still not handled gracefully. Perhaps using queue.Queue would make it nicer.

--- a/maintainers/scripts/update.py
+++ b/maintainers/scripts/update.py
@@ -9,6 +9,7 @@
 import threading
 
 updates = {}
+temp_dirs = {}
 
 thread_name_prefix='UpdateScriptThread'
 
@@ -28,6 +29,41 @@
 
     return subprocess.run(package['updateScript'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True, cwd=worktree)
 
+def update_packages(packages, max_workers, commit):
+    with contextlib.ExitStack() as stack, concurrent.futures.ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix=thread_name_prefix) as executor:
+
+        if commit:
+            for i in range(max_workers):
+                temp_dirs[f'{thread_name_prefix}_{str(i)}'] = stack.enter_context(tempfile.TemporaryDirectory()), threading.Lock()
+
+            for wt, _lock in temp_dirs.values():
+                subprocess.run(['git', 'worktree', 'add', wt], check=True)
+
+        for package in packages:
+            updates[executor.submit(run_update_script, package, commit)] = package
+
+        for future in concurrent.futures.as_completed(updates):
+            package = updates[future]
+
+            try:
+                p = future.result()
+                if commit and 'commit' in package['supportedFeatures']:
+                    thread_name = package['thread']
+                    worktree, lock = temp_dirs[thread_name]
+                    changes = json.loads(p.stdout)
+                    for change in changes:
+                        subprocess.run(['git', 'add'] + change['files'], check=True, cwd=worktree)
+                        commit_message = '{attrName}: {oldVersion} → {newVersion}'.format(**change)
+                        subprocess.run(['git', 'commit', '-m', commit_message], check=True, cwd=worktree)
+                        subprocess.run(['git', 'cherry-pick', os.path.basename(worktree)], check=True)
+                yield package, True, None
+            except subprocess.CalledProcessError as e:
+                yield package, False, e.stdout
+            finally:
+                if commit and 'commit' in package['supportedFeatures']:
+                    lock.release()
+
+
 def main(max_workers, keep_going, commit, packages):
     with open(sys.argv[1]) as f:
         packages = json.load(f)
@@ -43,49 +79,22 @@
         eprint()
         eprint('Running update for:')
 
-        with contextlib.ExitStack() as stack, concurrent.futures.ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix=thread_name_prefix) as executor:
-            global temp_dirs
-
-            if commit:
-                temp_dirs = {f'{thread_name_prefix}_{str(i)}': (stack.enter_context(tempfile.TemporaryDirectory()), threading.Lock()) for i in range(max_workers)}
-
-                for wt, _lock in temp_dirs.values():
-                    subprocess.run(['git', 'worktree', 'add', wt], check=True)
-
-            for package in packages:
-                updates[executor.submit(run_update_script, package, commit)] = package
-
-            for future in concurrent.futures.as_completed(updates):
-                package = updates[future]
-
-                try:
-                    p = future.result()
-                    if commit and 'commit' in package['supportedFeatures']:
-                        thread_name = package['thread']
-                        worktree, lock = temp_dirs[thread_name]
-                        changes = json.loads(p.stdout)
-                        for change in changes:
-                            subprocess.run(['git', 'add'] + change['files'], check=True, cwd=worktree)
-                            commit_message = '{attrName}: {oldVersion} → {newVersion}'.format(**change)
-                            subprocess.run(['git', 'commit', '-m', commit_message], check=True, cwd=worktree)
-                            subprocess.run(['git', 'cherry-pick', os.path.basename(worktree)], check=True)
-                    eprint(f" - {package['name']}: DONE.")
-                except subprocess.CalledProcessError as e:
-                    eprint(f" - {package['name']}: ERROR")
-                    eprint()
-                    eprint(f"--- SHOWING ERROR LOG FOR {package['name']} ----------------------")
-                    eprint()
-                    eprint(e.stdout.decode('utf-8'))
-                    with open(f"{package['pname']}.log", 'wb') as f:
-                        f.write(e.stdout)
-                    eprint()
-                    eprint(f"--- SHOWING ERROR LOG FOR {package['name']} ----------------------")
-
-                    if not keep_going:
-                        sys.exit(1)
-                finally:
-                    if commit and 'commit' in package['supportedFeatures']:
-                        lock.release()
+        for package, status, output in update_packages(packages, max_workers, commit):
+            if status:
+                eprint(f" - {package['name']}: DONE.")
+            else:
+                eprint(f" - {package['name']}: ERROR")
+                eprint()
+                eprint(f"--- SHOWING ERROR LOG FOR {package['name']} ----------------------")
+                eprint()
+                eprint(output.decode('utf-8'))
+                with open(f"{package['pname']}.log", 'wb') as f:
+                    f.write(output)
+                eprint()
+                eprint(f"--- SHOWING ERROR LOG FOR {package['name']} ----------------------")
+
+                if not keep_going:
+                    sys.exit(1)
 
         eprint()
         eprint('Packages updated!')

@jtojnar (Contributor, Author) replied:

I moved it to a separate function. I am also thinking about switching from ThreadPoolExecutor to asyncio and https://pypi.org/project/asyncio-pool/
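
A minimal sketch of that asyncio direction (not the PR's code): a pool of worktrees in an asyncio.Queue naturally limits concurrency, and pending tasks can be cancelled. package['updateScript'] and the pre-created worktree directories are assumed to look like they do in update.py.

    import asyncio

    async def run_update_async(package, worktrees):
        # take a free worktree from the pool; return it when done
        worktree = await worktrees.get()
        try:
            proc = await asyncio.create_subprocess_exec(
                *package['updateScript'],
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
                cwd=worktree)
            stdout, stderr = await proc.communicate()
            if proc.returncode != 0:
                raise RuntimeError(f"{package['name']} failed:\n{stderr.decode()}")
            return stdout
        finally:
            worktrees.put_nowait(worktree)

    async def update_all_async(packages, worktree_dirs):
        worktrees = asyncio.Queue()
        for wt in worktree_dirs:
            worktrees.put_nowait(wt)
        tasks = [asyncio.create_task(run_update_async(p, worktrees)) for p in packages]
        # unlike with threads, pending tasks can be cancelled cleanly
        return await asyncio.gather(*tasks, return_exceptions=True)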

@hedning (Contributor) commented Apr 23, 2019

Tested this out and it seems to work; it didn't seem to take a huge amount of time compared to the previous version. It did get stuck on one update for a while, though I think that could happen before too.

One issue: I had to run git worktree prune after aborting an update once, as the updater tried to reuse a non-existent worktree. Are we cleaning up properly on exit/start?

@jtojnar (Contributor, Author) commented Apr 23, 2019

> It did get stuck on one update for a while, though I think that could happen before too.

Do you mean that only a single update script was running at the time? I guess we could have the threads waiting for their worktree lock to be released, but that should not take a long time.

The context manager cleans up the worktrees once it finishes and should do the same even when an exception occurs. But maybe exceptions triggered by signals are somewhat different, especially with regard to threads.

Maybe we need to handle the signals separately: https://stackoverflow.com/questions/18499497/how-to-process-sigterm-signal-gracefully

Or we could even use asyncio instead of threads. The main benefit is that, unlike threads, the subprocesses can be stopped easily. Perhaps we could even have a pool of worktrees, simplifying the code even more: https://stackoverflow.com/questions/41677434/how-to-make-a-asyncio-pool-cancelable

This series offers a nice description of how Python does these things: https://pymotw.com/3/concurrency.html
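
For reference, a minimal sketch of handling the signals separately so the worktree cleanup still runs (illustrative names; this is not what the script currently does):

    import signal
    import subprocess

    def install_signal_handlers():
        def handle(signum, frame):
            # turn the signal into a normal exception so that context managers
            # (ExitStack, TemporaryDirectory) still get a chance to clean up;
            # this must be registered from the main thread
            raise SystemExit(128 + signum)

        signal.signal(signal.SIGTERM, handle)
        signal.signal(signal.SIGINT, handle)

    def prune_stale_worktrees():
        # drop git's references to worktree directories that no longer exist,
        # e.g. after a previously aborted run
        subprocess.run(['git', 'worktree', 'prune'], check=True)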

@jtojnar (Contributor, Author) commented Apr 25, 2019

Some more questions:

We probably need to use the worktrees for the non-commit use case as well, right? Otherwise, the packages.nix (for example Vala's) could be broken, since update-source-version will race.

How would we merge the changes then, though? git add -A → git commit (versions would be obtained from the derivation name before and after the change) → git cherry-pick --no-commit? What would happen when there are conflicts git cannot resolve? (I really need to focus on possible failure modes in the current solution as well.)

Or should we still update the files in-tree and try to solve this by locking? How would we know what files to lock? Another attribute listing the locked files in the updateScript attrset?

And since I am already considering always using the worktree, we could get the commit feature for free (the merge procedure would only differ by the presence of the --no-commit flag when cherry-picking). We would have to assume that each updateScript only updates a single attribute, but we already do that – we only cherry-pick the last commit. Should we then get rid of the explicit commit feature? Do we remove --print-changes from update-source-version as well?

Finally, should we reset the worktree after each merge? It would make conflicts always manifest, not only when they occur in different worker threads, which might be preferable.
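
A rough sketch of that merge procedure, where commit mode and non-commit mode differ only in the --no-commit flag (the version strings are assumed to come from the derivation names before and after the change):

    import os
    import subprocess

    def merge_from_worktree(worktree, attr_name, old_version, new_version, commit):
        # stage and commit everything the update script changed in the worktree
        subprocess.run(['git', 'add', '-A'], check=True, cwd=worktree)
        message = f'{attr_name}: {old_version} → {new_version}'
        subprocess.run(['git', 'commit', '-m', message], check=True, cwd=worktree)

        # the branch created by `git worktree add` is named after the directory
        branch = os.path.basename(worktree)
        cherry_pick = ['git', 'cherry-pick']
        if not commit:
            # apply the change to the main tree without recording a commit
            cherry_pick.append('--no-commit')
        subprocess.run(cherry_pick + [branch], check=True)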

@jtojnar (Contributor, Author) commented Apr 25, 2019

Also, should the commit flag default to true?

@hedning (Contributor) commented Apr 29, 2019

> Do you mean that only a single update script was running at the time?

I think all the other threads were done; it was just one update that was stuck for a while, probably unrelated.

> The context manager cleans up the worktrees once it finishes and should do the same even when an exception occurs.

Yeah, the directories in tmp were gone, but git still had references to them; it looked like the update script tried to reuse them and failed since they were no longer there. I might've pressed ctrl-c twice, killing the cleanup before it was entirely done, though.

> Also, should the commit flag default to true?

I wouldn't mind; it's just as easy (if not easier) to deal with the commits as with uncommitted changes in the worktree.

worldofpeace pushed a commit to worldofpeace/nixpkgs that referenced this pull request May 1, 2019
It doesn't have to be that complicated with the major and minor versions
being split. We have library functions that handle it fine.

passthru.updateScript can be enabled once NixOS#59372 is merged
@jtojnar (Contributor, Author) commented Oct 6, 2019

Yeah, hopefully the Pool implementation will default to creating instances on demand.

@hedning (Contributor) commented Oct 7, 2019

Hmm, it looks like it creates four instances on my machine even if there's only one package:

Going to be running update for following packages:
 - gnome-builder-3.34.0

Press Enter key to continue...

Running update for:
Preparing worktree (new branch 'update-tmphlopysqe')
Updating files: 100% (19962/19962), done.
HEAD is now at 5610389e1ff fixup! gnome3.gdm: 3.34.0 → 3.34.1
Preparing worktree (new branch 'update-tmpbk29udcw')
HEAD is now at 5610389e1ff fixup! gnome3.gdm: 3.34.0 → 3.34.1
Preparing worktree (new branch 'update-tmpy36__ucd')
HEAD is now at 5610389e1ff fixup! gnome3.gdm: 3.34.0 → 3.34.1
Preparing worktree (new branch 'update-tmpip9rlkpa')
HEAD is now at 5610389e1ff fixup! gnome3.gdm: 3.34.0 → 3.34.1
 - gnome-builder-3.34.0: UPDATING ...

@jtojnar (Contributor, Author) commented Oct 7, 2019

Yeah, I have not pushed the new Pool implementation yet. Hence the WIP status.

@stale (bot) commented Sep 4, 2020

Hello, I'm a bot and I thank you in the name of the community for your contributions.

Nixpkgs is a busy repository, and unfortunately sometimes PRs get left behind for too long. Nevertheless, we'd like to help committers reach the PRs that are still important. This PR has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

If this is still important to you and you'd like to remove the stale label, we ask that you leave a comment. Your comment can be as simple as "still important to me". But there's a bit more you can do:

If you received an approval by an unprivileged maintainer and you are just waiting for a merge, you can @ mention someone with merge permissions and ask them to help. You might be able to find someone relevant by using Git blame on the relevant files, or via GitHub's web interface. You can see if someone's a member of the nixpkgs-committers team, by hovering with the mouse over their username on the web interface, or by searching them directly on the list.

If your PR wasn't reviewed at all, it might help to find someone who's perhaps a user of the package or module you are changing, or alternatively, ask once more for a review by the maintainer of the package/module this is about. If you don't know any, you can use Git blame on the relevant files, or GitHub's web interface to find someone who touched the relevant files in the past.

If your PR has had reviews and nevertheless got stale, make sure you've responded to all of the reviewer's requests / questions. Usually when PR authors show responsibility and dedication, reviewers (privileged or not) show dedication as well. If you've pushed a change, it's possible the reviewer wasn't notified about your push via email, so you can always officially request them for a review, or just @ mention them and say you've addressed their comments.

Lastly, you can always ask for help at our Discourse Forum, or more specifically, at this thread or at #nixos' IRC channel.

stale bot added the 2.status: stale label (https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md) Sep 4, 2020
Not sure why I chose ProcessPoolExecutor in the first place.
Printing the changed file and new version can be used to commit the changes to git.
Update scripts can now declare features using

	passthru.updateScript = {
	  command = [ ../../update.sh pname ];
	  supportedFeatures = [ "commit" ];
	};

A `commit` feature means that when the update script finishes successfully,
it will print a JSON list like the following:

	[
	  {
	    "attrName": "volume_key",
	    "oldVersion": "0.3.11",
	    "newVersion": "0.3.12",
	    "files": [
	      "/path/to/nixpkgs/pkgs/development/libraries/volume-key/default.nix"
	    ]
	  }
	]

and data from that will be used when update.nix is run with --argstr commit true
to create commits.
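
For illustration, the producing side could be as simple as this (a hypothetical Python update script; the attribute name and file paths are placeholders):

    import json

    def print_changes(attr_name, old_version, new_version, changed_files):
        # update.nix run with --argstr commit true parses this JSON from the
        # script's stdout and turns each entry into a git commit
        print(json.dumps([{
            'attrName': attr_name,
            'oldVersion': old_version,
            'newVersion': new_version,
            'files': changed_files,
        }]))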

We will create a new git worktree for each thread in the pool and run the update
script there. Then we will commit the change and cherry-pick it in the main repo,
releasing the worktree for the next change.
Get rid of some globals, split main into smaller functions, rename some variables, add type hints.
stale bot removed the 2.status: stale label (https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md) Sep 18, 2020
@jtojnar (Contributor, Author) commented Sep 19, 2020

Replaced by #98304

jtojnar closed this Sep 19, 2020
jtojnar deleted the updateScript-commit branch September 19, 2020
alyssais added a commit to alyssais/nixpkgs that referenced this pull request Jun 17, 2021
NixOS#59372 was replaced with NixOS#98304, which was merged as
74c5472, so I'm following the
instructions in the comment and enabling the updateScript. Seems to work.
alyssais mentioned this pull request Jun 17, 2021
alyssais added a commit that referenced this pull request Jun 17, 2021
#59372 was replaced with #98304, which was merged as
74c5472, so I'm following the
instructions in the comment and enabling the updateScript. Seems to work.