Wordpress code blocks (fixes #1186) #1187

asmeurer · 2014-03-30T20:02:19Z

This is a basic fix for #1186. If you guys have tests, I haven't modified them yet.

All it does is replace

[code language="stuff"]
code here
[/code]

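with a Markdown fenced code block: an opening fence of three backticks followed by the language name (stuff), then code here, then a closing fence of three backticks (I had no idea how to render the backticks inside of a code block on GitHub). This happens in the Wordpress importer.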
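A minimal sketch of that replacement, assuming the attribute is always written as language="..." with double quotes and that [code] blocks are never nested. The pattern and the name code_shortcode_to_fence are hypothetical, not the code in this PR:

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the sketch readable

# Hypothetical pattern: [code language="..."] ... [/code], spanning lines.
CODE_SHORTCODE = re.compile(
    r'\[code language="(?P<lang>[^"]*)"\](?P<body>.*?)\[/code\]',
    re.DOTALL,
)


def code_shortcode_to_fence(content):
    """Rewrite WordPress [code] shortcodes as Markdown fenced code blocks."""
    def repl(match):
        # Drop the newlines right after [code ...] and right before [/code].
        body = match.group("body").strip("\n")
        return "{0}{1}\n{2}\n{0}".format(FENCE, match.group("lang"), body)

    return CODE_SHORTCODE.sub(repl, content)
```

Text without shortcodes passes through unchanged, since re.sub only touches matches.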


asmeurer · 2014-03-30T20:04:44Z

I guess there are tests :)

How should I go about adding some [code] examples to the test wordpress export?

asmeurer · 2014-03-30T20:07:45Z

Should I make the sourcecode method also use this?

asmeurer · 2014-03-31T05:19:03Z

I think the lxml code that is used to write the document is undoing my replacement of &gt; with >.

Kwpolska · 2014-03-31T13:04:53Z

So ignore those replaces and be done with it.

As for the rest of the code, it looks good, though it fails the tests and you should fix it. I’d suggest merging it under transform_sourcecode and handling both at the same time (and make sure to end up with a staticmethod)

asmeurer · 2014-04-01T00:01:40Z

Is a staticmethod important? It makes it harder to use a compiled regular expression. I'm personally not a fan of staticmethods. They're too often used when a regular method should be used, just because you didn't happen to need self.
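For what it's worth, a compiled regular expression and a staticmethod can coexist: the pattern can live at module level. The sketch below (hypothetical names, not Nikola's actual code) also merges [code] and [sourcecode] handling into one method, as suggested above, and decodes the HTML entities WordPress stores inside the blocks:

```python
import html
import re

FENCE = "`" * 3

# Module-level compiled pattern, usable from a staticmethod without self.
# Covers [code ...] and [sourcecode ...] with an optional language="..."
# attribute; the (?P=tag) backreference makes the closing tag match the opening.
SHORTCODE_RE = re.compile(
    r'\[(?P<tag>code|sourcecode)(?:\s+language="(?P<lang>[^"]*)")?\]'
    r'(?P<body>.*?)'
    r'\[/(?P=tag)\]',
    re.DOTALL,
)


class WordpressImporterSketch:
    """Hypothetical stand-in for the importer class."""

    @staticmethod
    def transform_code(content):
        def repl(match):
            lang = match.group("lang") or ""
            # WordPress keeps &gt;, &lt;, &amp; escaped inside the shortcode,
            # but a fenced block is displayed verbatim, so decode them here.
            body = html.unescape(match.group("body")).strip("\n")
            return "{0}{1}\n{2}\n{0}".format(FENCE, lang, body)

        return SHORTCODE_RE.sub(repl, content)
```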

asmeurer · 2014-04-01T00:02:56Z

> So ignore those replaces and be done with it.

Yeah, except I have a lot of these in my own blog I'm importing (a lot of >>> Python examples mostly). The whole point of this PR was to contribute my wordpress cleanups back to the importer so that they help other people, rather than just doing a one-off.

asmeurer · 2014-04-01T00:04:22Z

Oh, and sorry if I wasn't clear, but I'm not really clear how to test this. Should I write some dummy wordpress blog post and export it? Maybe whoever has the test blog for the current test cases should add them.


Kwpolska · 2014-04-01T13:40:27Z

> Is a staticmethod important? It makes it harder to use a compiled regular expression. I'm personally not a fan of staticmethods. They're too often used when a regular method should be used, just because you didn't happen to need self.

staticmethod is used here mostly for organizational reasons. It’s important, because we use the “no self required” thing sometimes.

> I think the lxml code that is used to write the document is negating my replacement of &gt; with >.

But what does the output .wp file look like if you have the replacements on? If it contains regular > characters, all is well (and the replacements should stay in the code; I misunderstood you there). You cannot have raw <> characters in HTML, that’s why they’re replaced, but browsers and everyone else should handle it well, provided that the input file contains <> and not entities (otherwise pygments would go crazy, and someone along the line might even double-escape them into &amp;gt;).
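To illustrate the difference between escaping once and escaping twice, here is a small example using Python's standard html module (the sample string is made up):

```python
import html

raw = ">>> 1 < 2"

# Escaping once gives what HTML needs; a browser decodes it back to raw text.
once = html.escape(raw, quote=False)
# once == "&gt;&gt;&gt; 1 &lt; 2"

# Escaping text that already holds entities escapes the "&" itself,
# so readers end up seeing "&gt;" on the page.
twice = html.escape(once, quote=False)
# twice == "&amp;gt;&amp;gt;&amp;gt; 1 &amp;lt; 2"

assert html.unescape(once) == raw
assert html.unescape(twice) == once  # one unescape only undoes one escape
```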

> Yeah, except I have a lot of these in my own blog I'm importing (a lot of >>> Python examples mostly). The whole point of this PR was to contribute my wordpress cleanups back to the importer so that they help other people, rather than just doing a one-off.

I understand. I was misled and thought those replacements are not necessary after all. Sorry if you felt offended there.

> Oh, and sorry if I wasn't clear, but I'm not really clear how to test this. Should I write some dummy wordpress blog post and export it? Maybe whoever has the test blog for the current test cases should add them.

You may add stuff to the XML (pay attention to its correctness), or import it into a WordPress blog, make changes and export it again. For reference, Niko Wenselowski contributed this export in b1661d4 over a year ago; I doubt he still has that blog on WordPress, and even if he did, it’s still easier to modify it manually or with WordPress’ assistance.

asmeurer · 2014-04-01T15:05:14Z

Things that look like HTML tags (like <class 'int'>) are left alone. But the >>> sequences are all converted back to &gt;&gt;&gt;.
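The round trip can be reproduced with Python's built-in xml.etree.ElementTree, used here as a stand-in for lxml (lxml's serializer also writes > in text as &gt;, as far as I know). Parsing keeps the literal > characters, and serializing escapes them again:

```python
import xml.etree.ElementTree as ET

# Text after the importer replaced &gt; with a literal ">", wrapped in markup
# the way it would be handed to an HTML/XML library.
fragment = "<div>>>> type(1)</div>"

elem = ET.fromstring(fragment)
# Parsing keeps the literal characters: elem.text == ">>> type(1)"

serialized = ET.tostring(elem, encoding="unicode")
# Serializing escapes them again: "<div>&gt;&gt;&gt; type(1)</div>"
```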

Kwpolska · 2014-04-01T15:08:24Z

culprit: https://github.com/asmeurer/nikola/blob/wordpress_code_blocks/nikola/plugins/basic_import.py#L116

we should find a different way to do it, as using lxml with markdown is nuts.

asmeurer · 2014-04-01T15:09:04Z

So I think having &gt; in bare Markdown is fine; in fact, here is one in GitHub Markdown: &gt; (it renders as >). The issue is that this doesn't happen in a code block. Code blocks are kind of like raw strings: nothing is escaped, and nothing is escapable.
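A toy model of that point, assuming a renderer that HTML-escapes everything inside a code block (as Markdown engines generally do). render_code_block is hypothetical and much simpler than a real engine:

```python
import html


def render_code_block(code):
    """Toy model of how a Markdown engine outputs a code block.

    The contents are literal text, so every special character is escaped,
    including the "&" that would otherwise start an entity.
    """
    return "<pre><code>{}</code></pre>".format(html.escape(code, quote=False))


# An entity typed inside a code block is not interpreted: the browser
# displays "&gt;&gt;" literally rather than ">>".
html_out = render_code_block("&gt;&gt;")
# html_out == "<pre><code>&amp;gt;&amp;gt;</code></pre>"
```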

asmeurer · 2014-04-01T15:09:29Z

Oh yeah, that's the bad code. Sorry if that wasn't clear. I should have pointed to that function from the get go.

Kwpolska · 2014-04-01T15:15:03Z

Precisely. We need to find a replacement for rewrite_links(replacer) (which should not be too hard) and not use lxml there.
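One possible lxml-free replacement, sketched under the assumption that links only appear as href="..." or src="..." attributes. rewrite_links_text is a hypothetical helper that borrows the callback style of lxml's rewrite_links(replacer) but works on the raw text, so nothing else in the document is re-serialized:

```python
import re

# Matches href="..." or src="..." with double or single quotes. This does not
# parse HTML, so it would also rewrite matching text inside code blocks.
LINK_ATTR_RE = re.compile(r'''\b(?P<attr>href|src)=(?P<q>["'])(?P<url>.*?)(?P=q)''')


def rewrite_links_text(content, replacer):
    """Apply replacer(url) -> new_url to link attributes, without lxml."""
    def repl(match):
        new_url = replacer(match.group("url"))
        # Keep the attribute name and the original quote character.
        return "{0}={1}{2}{1}".format(match.group("attr"), match.group("q"), new_url)

    return LINK_ATTR_RE.sub(repl, content)
```

Unlike lxml's version, this sketch has no way to drop a link when the replacer returns None; it only substitutes URLs.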

ralsina · 2014-04-04T09:15:31Z

@Kwpolska in fact, it makes a lot of sense, because posts imported from WP have lots of HTML links in them, which is acceptable in Markdown, and those links need to be replaced so they point to the downloaded assets instead.

One way to "fix" this would be to temporarily wrap code blocks in <pre> tags and then remove them, maybe.

Another would be to replace links by using str.replace() but that's rather scary too.
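The str.replace() route becomes less scary if code blocks are shielded first. This sketch uses a different technique from the <pre> wrapping idea: it swaps each fenced code block for a placeholder, runs the replacement, then restores the blocks. The helper name and placeholder format are hypothetical, and it assumes the search string never overlaps the placeholder marker:

```python
import re

FENCE = "`" * 3
# A fenced block: a line starting with the fence, up to a line that is only the fence.
FENCED_BLOCK_RE = re.compile(
    r"^{0}.*?^{0}[ \t]*$".format(FENCE), re.DOTALL | re.MULTILINE
)


def replace_outside_code(content, old, new):
    """str.replace() applied only outside fenced code blocks."""
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        # NUL characters on both sides keep placeholder 1 distinct from 10.
        return "\x00CODEBLOCK{}\x00".format(len(blocks) - 1)

    protected = FENCED_BLOCK_RE.sub(stash, content)
    protected = protected.replace(old, new)
    for i, block in enumerate(blocks):
        protected = protected.replace("\x00CODEBLOCK{}\x00".format(i), block)
    return protected
```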

ralsina · 2014-04-04T09:40:53Z

You may want to take a look at #369 while fiddling with replacer

asmeurer · 2014-04-04T15:43:29Z

I think you're right. As far as I know, the wordpress format is a superset of HTML.

I'll probably have to rearrange some of the importer code to fix this. The lxml stuff happens at the very end, just before the content is written. I guess I just need to factor it out into its own method that is overridden in the wordpress importer class.
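A rough sketch of that refactoring, with hypothetical class and method names: the base importer exposes a hook that runs just before writing, and the WordPress importer overrides it with a string-based link rewrite instead of the lxml round trip.

```python
class BasicImporterSketch:
    """Hypothetical base importer; method names are illustrative only."""

    def postprocess_content(self, content):
        # Default hook: leave the content untouched.
        return content

    def write_content(self, content):
        # Stand-in for the real writer: returns the text instead of saving it.
        return self.postprocess_content(content)


class WordpressImporterSketch(BasicImporterSketch):
    def __init__(self, url_map):
        self.url_map = url_map  # old URL -> new URL of the downloaded asset

    def postprocess_content(self, content):
        # Plain string replacement, so ">" in code blocks is never re-escaped.
        for old, new in self.url_map.items():
            content = content.replace(old, new)
        return content
```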

ralsina · 2015-04-24T14:31:28Z

This branch didn't apply cleanly, and I can't push here so I took the diff and did #1666, thanks for the fix!

asmeurer · 2015-04-24T14:59:17Z

Thanks!

asmeurer added 3 commits March 30, 2014 14:54

First pass at converting [code] blocks in the Wordpress importer

0c9adf8

Add support for replacing html escaped characters in Wordpress [code] blocks

9e3c0ce

Make flake8 happy

84c2471

asmeurer mentioned this pull request Mar 30, 2014

Wordpress importer doesn't translate [code] #1186

Closed

Replace transform_sourcecode with transform_code

04e58cd

The latter didn't really do a transformation, it just marked the problematic code. Also, don't use a new variable in transform_content, as you might forget to write new_content = func(new_content) instead of new_content = func(content).
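The advice about reusing one variable, sketched with two hypothetical transformation steps: each step consumes the previous step's output, so there is no stale variable to pass by mistake.

```python
def strip_trailing_spaces(content):
    # Hypothetical step: remove trailing whitespace from every line.
    return "\n".join(line.rstrip() for line in content.split("\n"))


def tabs_to_spaces(content):
    # Hypothetical step: expand tabs to four spaces.
    return content.replace("\t", "    ")


def transform_content(content):
    """Apply each transformation to the output of the previous one.

    With a separate new_content variable it is easy to write
    new_content = f(content) by accident and silently drop earlier steps;
    rebinding the single name `content` avoids that mistake.
    """
    content = strip_trailing_spaces(content)
    content = tabs_to_spaces(content)
    return content
```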

ralsina added this to the v7.0.0 milestone Apr 18, 2014

Kwpolska modified the milestones: v7.1.0, v7.0.0 May 16, 2014

Kwpolska force-pushed the master branch from 3549ebe to b1c16b0 October 29, 2014 17:50

Kwpolska modified the milestones: v7.2.0, v7.3.0 Nov 2, 2014

Kwpolska force-pushed the master branch from fefe9b6 to 1ee0ed7 November 3, 2014 18:57

ralsina modified the milestones: v8.0.0, v7.3.0 Jan 13, 2015

ralsina closed this Apr 24, 2015
