New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve blogger directory structure #157
Conversation
self.write_content( | ||
os.path.join(self.output_folder, out_folder, slug + '.html'), | ||
content) | ||
regex = re.compile('/(?P<year>\d{4})/(?P<month>\d{2})/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other URL structures possible with Blogger? This looks like a hack nevertheless. Also, the slug is very likely to still be incorrect (YYYYMMrest-of-slug)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is default structure. Not sure if any other structures are possible.
regex = re.compile('/(?P<year>\d{4})/(?P<month>\d{2})/(?P<slug>.*)') | ||
match = regex.search(link_path) | ||
if match: | ||
year, month, slug = match.group('year'), match.group('month'), match.group('slug') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don’t need to name your groups in this case and you could just use match.groups()
Please check if there are other structures available — and perhaps refrain from using regular expressions, instead basing yourself on splitting on /
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using split seems much simpler. Blogger allows custom slugs but url structure is fixed.
f144d91
to
4251d62
Compare
os.path.join(self.output_folder, out_folder, slug + '.html'), | ||
content) | ||
link_fragments = link_path.split('/') | ||
if len(link_fragments) == 4: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- this check is unnecessary; however, you should have
slug
pre-determined and set to the correct value much earlier (line 161) - remove the unnecessary
re
import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about drafts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detecting drafts does not depend on this as far as I can see. You need to write the correct path to the URL map, for example.
self.write_content( | ||
os.path.join(self.output_folder, out_folder, slug + '.html'), | ||
content) | ||
if is_draft: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
@@ -154,8 +156,11 @@ def import_item(self, item, out_folder=None): | |||
|
|||
if link_path.lower().endswith('.html'): | |||
link_path = link_path[:-5] | |||
link_path = link_path.lstrip('/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this unconditionally.
LGTM, but I’d like a second opinion just in case (we went through so many changes here.) @punchagan , @ralsina? (thanks for listening to all my complaints. In the future, it might be better not to amend and force-push) |
@@ -182,21 +187,19 @@ def import_item(self, item, out_folder=None): | |||
else: | |||
is_draft = False | |||
|
|||
self.url_map[link] = self.context['SITE_URL'] + '/' + \ | |||
out_folder + '/' + slug + '.html' | |||
self.url_map[link] = os.path.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can’t use os.path.join
for URLs (that would give backslashes on Windows), just use slashes manually or do os.path.join().replace(os.sep, '/')
out_folder + '/' + slug + '.html' | ||
|
||
self.url_map[link] = self.context['SITE_URL'] + out_folder + \ | ||
'/' + link_path + '.html' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better style (flake8 complains about indent):
self.url_map[link] = (self.context['SITE_URL'] + out_folder +
'/' + link_path + '.html')
self.url_map[link] = self.context['SITE_URL'] + '/' + \ | ||
out_folder + '/' + slug + '.html' | ||
|
||
self.url_map[link] = (self.context['SITE_URL'] + out_folder + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be safer to ensure that SITE_URL ends with a '/' when populating the context. Some blogs probably don't return a url ending in '/' and that may be why we had it in our code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Merging, thanks for your contribution! |
No description provided.