Skip to content

Pass language to slugify and unslugify. (WIP) #2284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 10, 2016
Merged

Conversation

felixfontein
Copy link
Contributor

First try on implementing passing of language to slugify and unslugify; see discussion in #2273.

if _tag_slugified in slugged_tags[lang]:
if tag not in self.posts_per_tag:
# Tags that differ only in case
other_tag = [existing for existing in self.posts_per_tag.keys() if utils.slugify(existing) == _tag_slugified][0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing , lang here.

@ralsina
Copy link
Member

ralsina commented Mar 6, 2016

Since slugify could be used by plugins, unless you want this to have to
wait until v8, you should give lang a default value.

El dom., mar. 6, 2016 14:15, Felix Fontein notifications@github.com
escribió:

First try on implementing passing of language to slugify and unslugify;

see discussion in #2273 #2273.

You can view, comment on, or merge this pull request online at:

#2284
Commit Summary

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#2284.

@Kwpolska
Copy link
Member

Kwpolska commented Mar 6, 2016

I’m 👎 for this. You just wasted your time implementing a tiny obscure feature.

@felixfontein
Copy link
Contributor Author

@ralsina: I added a warning in that case. This warning will be useful while this is WIP.

@Kwpolska: I really want to have this feature for my blog, for me it's neither tiny nor obscure. And at least @davidak and @akrist seem to agree since they bothered to open an issue resp. write code for this.

@davidak
Copy link
Contributor

davidak commented Mar 7, 2016

It is important for a german site since it is the grammatically correct representation.

Wordpress supports it since Version 3.6:
http://www.perun.net/2013/04/17/wordpress-3-6-umlaute-im-titel-ohne-plugin-entschaerfen/

before, most people used a plugin.

To implement this as a plugin would be ok for me.

there is a python module that does what we want:
https://pypi.python.org/pypi/translitcodec/

>>> import translitcodec
>>> import codecs
>>> codecs.encode(u'Schöne Grüße aus Köln', 'translit/long')
'Schoene Gruesse aus Koeln'

@felixfontein
Copy link
Contributor Author

@davidak: this Python module cannot be used for all languages since for most it would be simply wrong. Also, I wonder how well it works for languages such as Chinese and Japanese.

My goal is to not change Nikola's default behavior, but to just enable this to be changed by plugins without writing a really insane plugin (which does post preprocessing etc.). Such a plugin could use this Python module for the German language, though.

About WordPress: I wonder why they only support de_DE and not also de_CH and de_AT. Also, the current version of WordPress uses similar translations for Danish (da_DK).

@davidak
Copy link
Contributor

davidak commented Mar 7, 2016

Such a plugin could use this Python module for the German language, though.

that was my intention.

@ralsina
Copy link
Member

ralsina commented Mar 7, 2016

This looks ok to me, and changes no behaviour. If noone sees a bug in the next 24 hours, I'll merge.

@felixfontein
Copy link
Contributor Author

@ralsina: maybe we should leave it open a bit longer, I'd like to do some more intense tests (like with a real blog :) ). I'm not sure yet when I'll have time for this, but probably not during the next 24h.

@Kwpolska
Copy link
Member

Kwpolska commented Mar 7, 2016

(Note we currently use unidecode which is a module that does the same as translitcodec, just with different opinions, and I’m pretty sure CJK text is complete garbage with both)

@@ -26,6 +26,7 @@ def test_getting_metadata_from_content(self):
post = dummy()
post.source_path = 'file_with_metadata'
post.metadata_path = 'file_with_metadata.meta'
post.default_lang = 'en'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nicer to make a special post_dummy object with default_lang = 'en' already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using dummy now, since that's precisely used in these situations.

@ralsina
Copy link
Member

ralsina commented Mar 7, 2016

@felixfontain ok, let's give it a few days. I want to do a release before friday because I will then be unavailable for a few weeks. Then again, if it doesn't go in this release, it will go in the next one.

@felixfontein
Copy link
Contributor Author

Seems to work fine. I created a mini plugin for testing, for those who are interested, which can be put into the plugins/ folder of a blog:
The description file (t.plugin):

[Core]
Name = t
Module = t

The code (t.py):

# -*- coding: utf-8 -*-

from nikola.plugin_categories import SignalHandler
from nikola import utils
from unidecode import unidecode
import re


_slugify_strip_re = re.compile(r'[^+\w\s-]')
_slugify_hyphenate_re = re.compile(r'[-\s]+')


def my_slugify(value, lang=None, force=False):
    if not isinstance(value, utils.unicode_str):
        raise ValueError("Not a unicode object: {0}".format(value))
    value = value.replace('ä', 'ae').replace('ö', 'oe').replace('ü', 'ue').replace('ß', 'ss')
    value = value.replace('Ä', 'ae').replace('Ö', 'oe').replace('Ü', 'ue')
    value = utils.unicode_str(unidecode(value))
    value = _slugify_strip_re.sub('', value, re.UNICODE).strip().lower()
    return 'slugified-' + _slugify_hyphenate_re.sub('-', value, re.UNICODE)


class T(SignalHandler):  # Could also be any other plugin type which is initialized early enough.
    def set_site(self, site):
        super(T, self).set_site(site)
        utils.slugify = my_slugify

I recommend to remove the 'slugified-' prefix for production use ;-)

So from my side, merging is fine.

@felixfontein
Copy link
Contributor Author

@ralsina: do you want to merge, or should I just do it myself?

Kwpolska added a commit that referenced this pull request Mar 10, 2016
Pass language to slugify and unslugify. (WIP)
@Kwpolska Kwpolska merged commit 4db06f1 into master Mar 10, 2016
@Kwpolska
Copy link
Member

Merged, let's hope it works.

@Kwpolska Kwpolska deleted the pass-lang-to-slugify branch March 10, 2016 16:42
@felixfontein
Copy link
Contributor Author

Thanks for merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants