Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wordpress code blocks (fixes #1186) #1187

Closed
wants to merge 4 commits into from

Conversation

asmeurer
Copy link
Contributor

This is a basic fix for #1186. If you guys have tests I haven't modified them yet.

All it does is replace

[code language="stuff"]
code here
[/code]

with

code here

(no idea how to render ``` inside of a code block on GitHub) in the Wordpress importer.

@asmeurer
Copy link
Contributor Author

I guess there are tests :)

How should I go about adding some [code] examples to the test wordpress export?

@asmeurer
Copy link
Contributor Author

Should I make the sourcecode method also use this?

@asmeurer
Copy link
Contributor Author

I think the lxml code that is used to write the document is negating my replacement of > with >.

@Kwpolska
Copy link
Member

So ignore those replaces and be done with it.

As for the rest of the code, it looks good, though it fails the tests and you should fix it. I’d suggest merging it under transform_sourcecode and handling both at the same time (and make sure to end up with a staticmethod)

@asmeurer
Copy link
Contributor Author

asmeurer commented Apr 1, 2014

Is a staticmethod important? It makes it harder to use a compiled regular expression. I'm personally not a fan of staticmethods. They're too often used when a regular method should be used, just because you didn't happen to need self.

@asmeurer
Copy link
Contributor Author

asmeurer commented Apr 1, 2014

So ignore those replaces and be done with it.

Yeah, except I have a lot of these in my own blog I'm importing (a lot of >>> Python examples mostly). The whole point of this PR was to contribute my wordpress cleanups back to the importer so that they help other people, rather than just doing a one-off.

@asmeurer
Copy link
Contributor Author

asmeurer commented Apr 1, 2014

Oh, and sorry if I wasn't clear, but I'm not really clear how to test this. Should I write some dummy wordpress blog post and export it? Maybe whoever has the test blog for the current test cases should add them.

The latter didn't really to a transformation, it just marked the problematic
code.

Also, don't use a new variable in transform_content, as you might forget to
write new_content = func(new_content) instead of new_content = func(content).
@Kwpolska
Copy link
Member

Kwpolska commented Apr 1, 2014

Is a staticmethod important? It makes it harder to use a compiled regular expression. I'm personally not a fan of staticmethods. They're too often used when a regular method should be used, just because you didn't happen to need self.

staticmethod is used here mostly for organizational reasons. It’s important, because we use the “no self required” thing sometimes.

I think the lxml code that is used to write the document is negating my replacement of > with >.

But what does the output .wp file look like, if you have the replacements on? If it contains regular > characters, all is well (and the replacements should be in the code, misunderstood you there). You cannot have raw <> characters in HTML, that’s why it’s replaced, but browsers and everyone should handle it well — provided that the input file contains <> and not entities (otherwise pygments would go crazy, and someone along the line might even turn it into &amp;gt;)

Yeah, except I have a lot of these in my own blog I'm importing (a lot of >>> Python examples mostly). The whole point of this PR was to contribute my wordpress cleanups back to the importer so that they help other people, rather than just doing a one-off.

I understand. I was misled and thought those replacements are not necessary after all. Sorry if you felt offended there.

Oh, and sorry if I wasn't clear, but I'm not really clear how to test this. Should I write some dummy wordpress blog post and export it? Maybe whoever has the test blog for the current test cases should add them.

You may add stuff to the XML (pay attention to its correctness), or import it to a WordPress blog, make changes and export that out. For the reference, Niko Wenselowski contributed this export in b1661d4 over a year ago — I doubt he still has that blog on WordPress, and even if he did, it’s still easier to modify it manually or with WordPress’ assistance..

@asmeurer
Copy link
Contributor Author

asmeurer commented Apr 1, 2014

Things that look like html tags (like <class 'int'>) are left alone. But >>> are all converted back to &gt;&gt;&gt;.

@Kwpolska
Copy link
Member

Kwpolska commented Apr 1, 2014

culprit: https://github.com/asmeurer/nikola/blob/wordpress_code_blocks/nikola/plugins/basic_import.py#L116

we should find a different way to do it, as using lxml with markdown is nuts.

@asmeurer
Copy link
Contributor Author

asmeurer commented Apr 1, 2014

So I think having &gt; just in bare markdown is fine, in fact, here is one in GitHub markdown: > (it renders like >). The issue is that this doesn't happen in a code block. Code blocks are kind of like like raw strings. Nothing is escaped, and nothing is escapable.

@asmeurer
Copy link
Contributor Author

asmeurer commented Apr 1, 2014

Oh yeah, that's the bad code. Sorry if that wasn't clear. I should have pointed to that function from the get go.

@Kwpolska
Copy link
Member

Kwpolska commented Apr 1, 2014

Precisely. We need to find a replacement for rewrite_links(replacer) (which should not be too hard) and not use lxml there.

@ralsina
Copy link
Member

ralsina commented Apr 4, 2014

@Kwpolska in fact, it makes a lot of sense because WP imported posts have lots of HTML links in them, and that's acceptable for markdown, and they need replacing to link to the downloaded assets instead.

One way to "fix" this would be to temporarily wrap code blocks in <pre> tags and then remove them, maybe.

Another would be to replace links by using str.replace() but that's rather scary too.

@ralsina
Copy link
Member

ralsina commented Apr 4, 2014

You may want to take a look at #369 while fiddling with replacer

@asmeurer
Copy link
Contributor Author

asmeurer commented Apr 4, 2014

I think you're right. As far as I know, the wordpress format is a superset of HTML.

I'll probably have to rearrange some of the importer code to fix this. The lxml stuff happens at the very end, just before the content is written. I guess I just need to factor it out into its own method that is overridden in the wordpress importer class.

@ralsina ralsina modified the milestones: v8.0.0, v7.3.0 Jan 13, 2015
@ralsina
Copy link
Member

ralsina commented Apr 24, 2015

This branch didn't apply cleanly, and I can't push here so I took the diff and did #1666, thanks for the fix!

@ralsina ralsina closed this Apr 24, 2015
@asmeurer
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants