Create directory-specific sitemaps for directories with 200+ files #2030
Conversation
Mitigates issue #1683 by making the main index smaller. Also reduces bandwidth usage by making smaller sitemaps overall.
@Aeyoun you can make your task depend on scan_locs_task like write_sitemap does:
If I understand correctly, here is what we need to do.
I have looked at the code, and I don't know how to make this work correctly:
So, I can see 2 ways to make this work:
@schettino72 are any of those things possible?
The easiest solution is to handle this logic yourself inside the task, on step 4 ("Run those tasks, generate all the small local sitemaps"). Remember that inside your action you can manipulate the task metadata. And as @felixfontein said, early_task might be another way to solve this.
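A minimal sketch of what "manipulate the task metadata inside the action" could look like. This is hypothetical illustration, not the PR's actual code: the function name, `file_counts` parameter, and threshold logic are invented; the only doit-specific assumption is that a python-action with a parameter named `task` receives the task instance.

```python
from types import SimpleNamespace

THRESHOLD = 200  # the arbitrary cutoff used in this PR

def write_folder_sitemaps(task, file_counts):
    """Hypothetical action: doit passes the task instance to an action
    parameter named `task`, so the action can append the per-folder
    sitemap paths it decides to generate to its own targets list."""
    for folder, count in sorted(file_counts.items()):
        if count >= THRESHOLD:
            task.targets.append(folder + "/sitemap.xml")
            # ... actually render the folder sitemap here ...
```

The catch, discussed further down the thread, is that targets appended this way only exist *after* the action runs, so anything that inspects task metadata up front never sees them.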
Ahhh, changing targets inside one task that generates all the sitemaps looks like the easy way out. Yes, early tasks would fix this, but that's a backwards-incompatible change for us, so it has to wait for v8 :-(
Well, we actually don’t know what all the folders will be either. Will investigate these new ideas. Thanks, guys!
@Aeyoun just have one task, with a calc_dep on scan_locs; decide there what folders need sitemaps, and have it set its own targets list accordingly.
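The suggested shape might look roughly like this. All names here are invented for illustration (this is not Nikola's task code); a doit task-creator is just a `task_*` function returning a dict, so no import is needed to show the structure:

```python
def task_calc_sitemap_deps():
    """Hypothetical calc_dep task: decide which folders need sitemaps."""
    def compute():
        folders = []  # would be derived from the scan_locs results
        # doit merges the returned dict into any task that lists this
        # task in its calc_dep. Note: calc_dep results cover dependency
        # keys (file_dep, task_dep, ...); the targets list would still
        # have to be updated from inside the action itself, which is
        # the sticking point discussed in this thread.
        return {"file_dep": [f + "/index.html" for f in folders]}
    return {"actions": [compute]}

def task_folder_sitemaps():
    """Hypothetical consumer task wired to the calc_dep task above."""
    return {
        "calc_dep": ["calc_sitemap_deps"],
        "actions": [lambda: None],  # would write the per-folder sitemaps
    }
```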
Hm, updating targets after running actions doesn’t really work. nikola.gen_tasks(), as used by check.py, is unaware of metadata that is updated from within a task’s action, so check.py will break. Other than that, it seems to work to update the metadata as part of running the task. Though it also introduces some build instability, as scan_locs() and the tasks that depend on it are unaware of any sitemap files by design.
Changing the metadata later probably also breaks
So, @Aeyoun is this stalled? Impossible? |
I can’t figure out how to do it, at least. The task is always created too early to know what other files will exist, so it can’t accurately say which sitemaps will be generated when. I’ve tried some variations of the original idea, but they all run into the same problem at some point or another.
This would have to generate its tasks and run only after everything else has already finished.
Ok, so basically this never worked, and it's for a minor corner-feature (if that's a thing). I say we close it. |
Folders with 200+ HTML files get their own sitemaps, which are included in the main sitemapindex.xml. These URLs are not included in the main sitemap.xml file as they already exist in the folder sitemaps. Because all URLs for a folder reside within that same folder, they’re valid sitemaps as far as search engines are concerned. Works nicely for sites with many tags (TAG_PATH fills up quickly) and should work for sectioned sites as well. The number 200 was chosen entirely arbitrarily, in an attempt to keep the sitemaps somewhat small without having to create one for every folder with just a few files.
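The partitioning described above can be sketched in a few lines. This is a simplified stand-in for the PR's logic, with an invented helper name and URL-string handling in place of Nikola's real path handling:

```python
from collections import defaultdict

THRESHOLD = 200  # the arbitrary cutoff chosen in this PR

def partition_urls(urls, threshold=THRESHOLD):
    """Split URLs into a main sitemap list and per-folder sitemaps for
    folders holding `threshold`+ pages (hypothetical helper)."""
    by_folder = defaultdict(list)
    for url in urls:
        by_folder[url.rsplit("/", 1)[0]].append(url)
    main, per_folder = [], {}
    for folder, members in by_folder.items():
        if len(members) >= threshold:
            # big folder: gets its own sitemap, listed in sitemapindex.xml
            per_folder[folder + "/sitemap.xml"] = sorted(members)
        else:
            # small folder: URLs stay in the main sitemap.xml
            main.extend(members)
    return sorted(main), per_folder
```

The sitemapindex.xml would then list the main sitemap.xml plus each key of `per_folder`.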
The motivation here was trying to make sitemaps smaller so sitemap consumers don’t need to download the entire growing main sitemap.xml file every time I add one page. It eats a lot of bandwidth. Plus it mitigates #1683 somewhat.
New sitemapindex follows best practices and passes sitemap tests at Bing, Yandex, Google, and Baidu. Well, at least I think it passes at Baidu. Big green checkmark and Chinese text means it’s okay, right?