Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for EarlyTasks. #1632

Closed
wants to merge 8 commits into from
Closed

Support for EarlyTasks. #1632

wants to merge 8 commits into from

Conversation

felixfontein
Copy link
Contributor

The earlytask_impl branch implements support for EarlyTasks in Nikola, which are tasks that are executed before Nikola scans for posts. See #1562 for more details. After some testing (sorry it took so long) I think it works rather well now, so it's time to start a PR :-)

What this branch improves over current Nikola:

  • there's a better separation between Tasks and LateTasks: LateTasks won't be processed before Tasks aren't done; this makes in particular a difference when multithreading/multiprocessing is used;
  • scanning for posts is now collected to happen at precisely one place: in stage 0, which is before the stage where all Task objects are processed, and after all EarlyTask objects are processed;

What this branch requires:

  • an up-to-date version of doit

What this branch cannot deliver at the moment:

  • nikola build <filename> does not work without further support in doit (see specify target of a DelayedTask on command line pydoit/doit#20, haven't had time to really work on it yet);
  • nikola build <taskname> only works for tasks named after the plugin itself, and not for (potential) otherwise named tasks created by the plugins;
  • when running nikola in parallel (via -n 4 for example), one has to tell doit (via -P thread) to use threads instead of processes.

(fixes #1553 and #1562)

Review on Reviewable

@felixfontein felixfontein added this to the Whenever milestone Mar 8, 2015
@Kwpolska
Copy link
Member

Kwpolska commented Mar 8, 2015

How up-to-date is up-to-date? Change the dependency in requirements.txt to match.

Also, can't we tell doit to use threads from code?

@Kwpolska
Copy link
Member

Kwpolska commented Mar 8, 2015

Also, the tests fail on Travis, which is a very bad thing you absolutely need to fix, especially because there are dependency errors in integration tests.

@felixfontein
Copy link
Contributor Author

As up-to-date as it can be ;-) I updated the requirement, but before merging we probably have to update it again.

Tell doit to use threads from code: yes, that's the plan. I'll check for a more elegant way later (instead of just inserting -P thread into the command line arguments).

Tests: they probably fail because the new post list rendering code requires the post to be already rendered. I updated the dependencies.

@felixfontein
Copy link
Contributor Author

Hmm, at the moment I don't see a completely satisfactory way to force the use of threads via code, except by injecting -P thread into the command line arguments. Overwriting _execute is not a good idea (in case the signature of doit's Run._execute every changes), but that's essentially the place where the default value comes from. @schettino72 or do I miss something?

@schettino72
Copy link
Member

It needs doit 0.27 or above. Apart from setting the minimum version on setup.py and requirements files, please also set minversion.

Please refresh my memory... Why parallel execution with multi-process doesnt work?

You can set (almost) any command line attribute on DOIT_CONFIG, you need to check the source code for the key name as it might be different from the command line parameter name. In your case it is:

DOIT_CONFIG = {
    'num_process': 4,
    'par_type': "thread",
}

@felixfontein
Copy link
Contributor Author

Ah, I forgot about DOIT_CONFIG. I now added 'par_type' as well as 'minversion'. Thanks for mentioning that!

About multi-processing: the implementation of multi-processing in doit requires to pickle task objects, which doesn't work well with closures (search for "pickled" in #1562). Since Nikola, default plugins and 3rd party plugins use closures in tasks, this could be somewhat problematic. At least for me, it always fails quite early when using multi-processing.

@Kwpolska
Copy link
Member

You set minversion incorrectly. It should be (0, 27, 0) (a tuple and not a string)

@felixfontein
Copy link
Contributor Author

Hmm, this looks like a bug in doit (or a Python string problem) to me; after all, the documentation (http://pydoit.org/cmd_run.html#minversion) says you should use a string. I guess the problem is in version_tuple in doit/cmd_base.py:

parts = ver_in.split('.') if isinstance(ver_in, str) else ver_in

in combination with:

from __future__ import print_function, unicode_literals

in nikola/__main__.py, where later DOIT_CONFIG['minversion'] = '0.27.0' is set.

A simple fix (which ignores the main problem) would be to use a tuple, but that somewhat bothers me since it is not clear if that's officially allowed (the documentation only mentions strings, not tuples). @schettino72: should using tuples be officially allowed?

@felixfontein
Copy link
Contributor Author

(The problem doesn't happen with Python 3, BTW, that's why I never experienced it locally...)

@schettino72
Copy link
Member

It should support both string and tuple. The documentation on the site is incomplete, docs on source code are good: https://github.com/pydoit/doit/blob/master/doit/cmd_base.py#L14

And there is a bug on python2 it should check for six.string_types not for plain str.
https://github.com/pydoit/doit/blob/master/doit/cmd_base.py#L19

It is an easy fix (patches welcome), but I suggest you guys just use a tuple or you will have to wait for next doit release.

@schettino72
Copy link
Member

FYI. I fixed the the minversion problem. I also improved the docs saying both strings and tuples are supported. To be released on doit 0.28

pydoit/doit@1b8f8b4

@felixfontein
Copy link
Contributor Author

Thanks a lot! I'm now using a tuple.

@ralsina
Copy link
Member

ralsina commented Apr 24, 2015

So doit 0.28 is out. What's the status of this? Does it work? Does it have any limitations?

@Kwpolska
Copy link
Member

It cannot work yet, because we need to fix things to actually support doit v0.28.0. See #1655.

@felixfontein
Copy link
Contributor Author

The main limitation is still that it is currently not possible to specify targets on the command line, like nikola build output/index.html, and also it is not possible to specify task names which are different from their plugin's name. While I think the second might not be that important, the first one still bothers me quite a lot. How this will resolve depends a lot on what doit will do about it; the current discussion can be found here: pydoit/doit#46

@Kwpolska Kwpolska mentioned this pull request May 12, 2015
26 tasks
@felixfontein
Copy link
Contributor Author

I now rebased the earlytask_impl branch to current master; this also includes support for doit 0.28. (The old branch for doit 0.27 is preserved as https://github.com/getnikola/nikola/tree/earlytask_impl_old, just in case.)

Note that with doit 0.28, you still cannot specify targets via nikola build <targetname>, but what you can do (at least with the current doit master) nikola build <taskname>:<targetname>, for example nikola build render_tags:output/categories/demo.xml. (Simply running nikola build render_tags works too, of course.)

@ralsina
Copy link
Member

ralsina commented May 20, 2015

@felixfontein in what ways does this break backwards compatibility? If it really does, then I'll do a 7.4.2 before merging it.

@felixfontein
Copy link
Contributor Author

Well, currently what doesn't work is nikola build <file name to build>, i.e. specifying a target directly (see my comment from 3 days ago for more details). Also, running nikola build <name of task> doesn't work when <name of task> doesn't equals the plugin's name. Whether the first one will change (to be backwards compatible) depends on what will be implemented in doit; the second can work if the other task names are explicitly given to doit, there's no support for that implemented in earltask_impl at the moment.

There might be some other minor incompatibilities, in particular with other task plugins than the default ones.

Also, I'd prefer to rebase/merge with master again after when you merged the scanposts refactoring, since these two branches will have some conflicts.

@ralsina
Copy link
Member

ralsina commented May 23, 2015

The task/target specification limitations, while a change, I don't think is too bad. We could even check that plugins generate tasks with the right name.

@felixfontein
Copy link
Contributor Author

Hmm, a problem I just noted: if a task has more than one target, there are two problems:
a) if you write nikola build <taskname>:<targetname>, it won't find the task if you don't specify the first target;
b) nikola check -f marks all other targets except the first one as an unknown file.
Both points also affect current Nikola. Since most task plugins never generate more than one file per task, this is hardly ever noticed. Also, one can get around of a) by leaving the task name away.
(In fact, I only noticed this since one of my plugins generates two files per task -- a HTML file and a CSS file --, and I was surprised to find the second file among the orphans.)

I don't know what to do about a) (@schettino72: any idea?). For b), this is currently a result because doit's list --all command returns a list of tasks in the form <taskname>:<target>, where target is the first target in the list if more than one is given. One has to manually call doit info <taskname>:<target> for every taskname and (first) target to get a list of all targets. Hmm, now that I think about it, both problems (a and b) are somewhat related.

@ralsina
Copy link
Member

ralsina commented May 25, 2015

I suppose nikola check could do something smarter than reading the output of the list command. It just never seemed necessary so far.

If you file a bug I'll get around to it eventually.

@felixfontein felixfontein force-pushed the earlytask_impl branch 3 times, most recently from 01699f4 to dadb91e Compare August 18, 2016 04:55
@ralsina ralsina modified the milestones: v8.0.0, 7.8.1 Aug 29, 2016
… Task and LateTask.

Scanning posts is done in additional stage between EarlyTask and Task.
Using delayed loading per plugin task object when cmd.execute_tasks is True, which allows to use `nikola build <task_name>`.
Setting doit parallelism mechanism to threading, as multi-processing does not work due to pickling.
@felixfontein
Copy link
Contributor Author

At least some of the current failures seem to be caused by the problem described in pydoit/doit#152. I'll take another look when that's fixed.

@Kwpolska Kwpolska modified the milestones: v7.8.1, v7.8.2 Oct 13, 2016
@Kwpolska Kwpolska modified the milestones: v7.8.2, v7.8.3 Jan 8, 2017
@felixfontein
Copy link
Contributor Author

Does anyone care to keep this? I've stopped using and maintaining this a long time ago (some eight months or so), and it doesn't work well anyway (since doit doesn't implement some required features).

@Kwpolska
Copy link
Member

👍 for killing it

@Kwpolska Kwpolska closed this Mar 18, 2017
@Kwpolska Kwpolska deleted the earlytask_impl branch March 18, 2017 17:46
@Kwpolska Kwpolska moved this from In Progress to Ideas in Version 8 Jun 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Version 8
  
Cancelled
Development

Successfully merging this pull request may close these issues.

Improve tools for making an auto-generated post
4 participants