Html metadata (#2825)
* Extract metadata from HTML meta tags like Pelican does (Issue #1923)

* updated changelog

* docs, use html_metadata map

* doc tweak

* doc tweak
ralsina committed Jun 6, 2017
1 parent c9e4caa commit 156277a
Showing 3 changed files with 64 additions and 2 deletions.
8 changes: 8 additions & 0 deletions CHANGES.txt
@@ -1,3 +1,11 @@
New in master
=============

Features
--------

* Extract metadata from HTML meta and title tags like Pelican (Issue #1923)

New in v7.8.7
=============

26 changes: 26 additions & 0 deletions docs/manual.txt
@@ -418,6 +418,7 @@ other static site generators. The currently supported metadata formats are:
* TOML, between ``+++`` (Hugo)
* reST docinfo (Pelican)
* Markdown metadata extension (Pelican)
* HTML meta tags (Pelican)

You can add arbitrary meta fields in any format.

@@ -529,6 +530,30 @@ the `markdown metadata extension docs <https://pythonhosted.org/Markdown/extensi

Note that keys are converted to lowercase automatically.

HTML meta tags
``````````````

For HTML source files, metadata is extracted from ``meta`` tags, and the title from the ``title`` tag.
Following Pelican's behaviour, tags can be given in either a ``tags`` meta tag or a ``keywords`` meta tag. Example:

.. code:: html

    <html>
        <head>
            <title>My super title</title>
            <meta name="tags" content="thats, awesome" />
            <meta name="date" content="2012-07-09 22:28" />
            <meta name="modified" content="2012-07-10 20:14" />
            <meta name="category" content="yeah" />
            <meta name="authors" content="Conan Doyle" />
            <meta name="summary" content="Short version for index and feeds" />
        </head>
        <body>
            This is the content of my super blog post.
        </body>
    </html>


Mapping metadata from other formats
```````````````````````````````````

@@ -541,6 +566,7 @@ For Pelican, use:
    METADATA_MAPPING = {
        "rest_docinfo": {"summary": "description", "modified": "updated"},
        "markdown_metadata": {"summary": "description", "modified": "updated"},
        "html_metadata": {"summary": "description", "modified": "updated"}
    }

For Hugo, use:
32 changes: 30 additions & 2 deletions nikola/plugins/compile/html.py
@@ -28,12 +28,14 @@

from __future__ import unicode_literals

-import os
+import io
+import os

import lxml.html

from nikola import shortcodes as sc
from nikola.plugin_categories import PageCompiler
-from nikola.utils import makedirs, write_metadata
+from nikola.utils import LocaleBorg, makedirs, map_metadata, write_metadata


class CompileHtml(PageCompiler):
@@ -84,3 +86,29 @@ def create_post(self, path, **kw):
fd.write(write_metadata(metadata))
fd.write('-->\n\n')
fd.write(content)

    def read_metadata(self, post, file_metadata_regexp=None, unslugify_titles=False, lang=None):
        """Read the metadata from a post's meta tags, and return a metadata dict."""
        if lang is None:
            lang = LocaleBorg().current_lang
        source_path = post.translated_source_path(lang)

        with io.open(source_path, 'r', encoding='utf-8') as inf:
            data = inf.read()

        metadata = {}
        doc = lxml.html.document_fromstring(data)
        title_tag = doc.find('*//title')
        if title_tag is not None:
            metadata['title'] = title_tag.text
        meta_tags = doc.findall('*//meta')
        for tag in meta_tags:
            # Meta tags without a name attribute (e.g. <meta charset="utf-8">)
            # return None from tag.get('name'), so guard before lowercasing.
            k = (tag.get('name') or '').lower()
            if not k:
                continue
            elif k == 'keywords':
                k = 'tags'
            metadata[k] = tag.get('content', '')
        map_metadata(metadata, 'html_metadata', self.site.config)
        return metadata
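
The extraction logic in ``read_metadata`` can be tried outside Nikola with a small standalone sketch. This is not the plugin's code: it swaps lxml for the standard library's ``html.parser`` so it runs without extra dependencies, and ``MetaTagParser``, ``extract_html_metadata`` and ``apply_mapping`` are hypothetical names introduced only for illustration. ``apply_mapping`` is an assumed stand-in for what a key-renaming step like ``map_metadata`` might do, not Nikola's actual implementation.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <title> text and named <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also reach this method.
        if tag == 'title':
            self._in_title = True
        elif tag == 'meta':
            attrs = dict(attrs)
            # Skip meta tags without a name, e.g. <meta charset="utf-8">.
            key = (attrs.get('name') or '').lower()
            if not key:
                return
            # Following Pelican, "keywords" is treated as "tags".
            if key == 'keywords':
                key = 'tags'
            self.metadata[key] = attrs.get('content') or ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata['title'] = self.metadata.get('title', '') + data


def extract_html_metadata(html_text):
    """Return a metadata dict built from the document's title and meta tags."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


def apply_mapping(metadata, mapping):
    """Rename foreign keys in place (assumed behaviour, for illustration only)."""
    for foreign, ours in mapping.items():
        if foreign in metadata:
            metadata[ours] = metadata.pop(foreign)
    return metadata


SAMPLE = """<html>
<head>
<meta charset="utf-8">
<title>My super title</title>
<meta name="Keywords" content="thats, awesome" />
<meta name="summary" content="Short version for index and feeds" />
</head>
<body>This is the content of my super blog post.</body>
</html>"""

metadata = apply_mapping(
    extract_html_metadata(SAMPLE),
    {"summary": "description", "modified": "updated"},
)
print(metadata)
```

The sample includes a nameless ``<meta charset="utf-8">`` tag on purpose: it shows why the name lookup needs a guard before ``.lower()`` is called.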
