
@gjtorikian
Last active August 29, 2015 14:01

Hi, my name is @gjtorikian: http://miadzin.org/

How many of you work with DITA/XML? (wait for people to raise their hands)

OK, and how many of you use a lightweight markup language like Markdown, reStructuredText, wikis, etc.? (wait for more people to raise hands)

Keep your hands raised. I want people from the first group to go and talk to people from the second group. Share ideas. Exchange words.

I speak to you as a writer who transitioned into a doc build person for a large DITA enterprise. I speak to you as someone who has written not one, not two, but three documentation build systems. I've written Ant tasks and XSLTs and DTDs for DITA. I've built a tool to validate documentation links and images, and an implementation of conrefs that is very fast. And I've also written a bunch of documentation.

I think we, as writers, deserve better tools. I think that while the ideas of DITA are good and well-intentioned--that is, structured documentation for reader consistency--the result is a nightmare of build scripts, garbled XML tags, and a team of engineers that the writers depend on.

Take a look at this file:

<reference id="pwd_command">
  <title>The <cmdname>pwd</cmdname> command</title>
 
  <refbody>
    <refsyn><cmdname>pwd</cmdname></refsyn>
 
    <section><title>DESCRIPTION</title><p>Print the full filename of the
    current working directory.</p><note>Your shell may have its own version of
    <cmdname>pwd</cmdname>, which usually supersedes the version described
    here.</note></section>
 
    <section><title>AUTHOR</title><p>Written by John Doe. </p></section>
  </refbody>
</reference>

I can guarantee that your reader does not care at all that pwd is a <cmdname>. They do not care about refsyn or refbody or any other special tags. These tags also require bulky programs (like Arbortext) to present sanely to writers, and they make content very difficult to diff and edit.

What if we simplified the content?

---
type: reference
title: The `pwd` command
author: John Doe
---
 
Print the full filename of the current working directory.
 
Note: Your shell may have its own version of `pwd`, which usually supersedes the version described here.

This is a reference document--maybe it has a different layout, or additional validation for content that can be inserted. Instead of a <note> tag, we write the word Note:, like a human. This sort of content is easy to read and easy to write.

But it's difficult to structure. How can you ensure that, for example, no document marked as Reference has procedural steps in it?
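One possible answer, as a minimal sketch: read the `type` from the front matter and flag numbered steps in the body of reference documents. The function names (`parse_front_matter`, `check_reference`) and the "no numbered lines" rule are assumptions made up for illustration; they do not come from any existing tool.

```python
import re


def parse_front_matter(text):
    """Split a document into (metadata dict, body).

    Handles only flat `key: value` lines between two `---` fences;
    a real tool would hand this part to a YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing fence found: treat the whole text as body.
    return {}, text


# A line that starts with "1." or "1)" is assumed to be a procedural step.
STEP = re.compile(r"^\s*\d+[.)]\s+")


def check_reference(text):
    """Return a list of error strings; an empty list means the document passes."""
    meta, body = parse_front_matter(text)
    errors = []
    if meta.get("type") == "reference":
        for number, line in enumerate(body.splitlines(), start=1):
            if STEP.match(line):
                errors.append(
                    f"body line {number}: numbered step in a reference document"
                )
    return errors
```

The writer still writes plain text; the check runs in the build and only speaks up when a reference document starts to look like a procedure.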

Right now, systems like Read the Docs make it easy to provide documentation for your open-source projects. These sorts of tools are great, allowing writers to write in a plain-text language, and have it built to HTML and other formats. But we need to go a step further. We need to focus on building tools that provide some level of sophistication, without overwhelming the writer or placing a support burden on engineers in a company.

Businesses are always looking to streamline and improve processes. While engineers get better and better tools, writers have historically been left with a lump of "whatever works."

Writing is, or should be, what we're good at. 90% of our time should be spent writing. If we care about our jobs, it means worrying about sentences, punctuation, and rendering to the reader. We don't want to wonder how to insert a new paragraph, when to give something an ID, when to implement a menu cascade. We shouldn't have to care about that; the user certainly doesn't. It's important that our menu items all look the same; what's not important is that that's actually a <menucascade> with two <menuitem>s inside, or whatever other semantic nonsense.

The hardest part of a writer's job should not be struggling with their tools. It should be crafting the right message. Better tools will help us become better writers, which ultimately will be better for businesses.

I'm really interested in creating documentation tools to make this dream a reality. Please contact me if you too are interested.

@jbisso

jbisso commented May 7, 2014

Interesting squib, G. I agree with you that better tools for writers are needed; I'm just not sure how you get something as "easy" as Markdown to be structured enough to keep writers' bad habits from surfacing. Also, I think marking up things like apiname or filepath does not mean much to writer or reader now, but the promise was one of using that semantic markup to do things with. Myself, I've been hankering after a codeblock-like thing that not only does syntax highlighting but also pulls code out of sample apps in a repo somewhere and makes sure they still build and run unit tests on them.

At my current job, I was part of the team that helped move from RST/Sphinx to DITA. Like you, I found myself writing less doc and munging around in build scripts and DTDs. I'm not sure I'm ready to go back to the old way, but something needs to be done with the new one. The big problem with DITA and the DITA-OT is the lack of decent conceptual documentation. I think that is an easier problem to fix than going with something new. Maybe another layer on top of DITA: something that uses JSON and Markdown but outputs DITA (and runs that through the OT).
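A rough sketch of what such a layer might look like, assuming JSON metadata with `id` and `title` keys and a tiny Markdown subset (blank-line-separated paragraphs, a `Note:` prefix, backticks for inline code). The function names and the choice to map backticks to `<codeph>` are made up for illustration, and it only handles reference topics:

```python
import json
import xml.etree.ElementTree as ET


def add_inline(parent, text):
    """Set parent's text, turning `backtick` spans into <codeph> children."""
    parts = text.split("`")
    parent.text = parts[0]
    last = None
    for i, part in enumerate(parts[1:], start=1):
        if i % 2 == 1:
            # Odd pieces sit between backticks: inline code.
            last = ET.SubElement(parent, "codeph")
            last.text = part
        else:
            # Even pieces are plain text following the previous <codeph>.
            last.tail = part


def to_dita_reference(meta_json, markdown_body):
    """Build a DITA <reference> topic string from JSON metadata and a Markdown body."""
    meta = json.loads(meta_json)
    root = ET.Element("reference", id=meta["id"])
    add_inline(ET.SubElement(root, "title"), meta["title"])
    section = ET.SubElement(ET.SubElement(root, "refbody"), "section")
    for block in markdown_body.strip().split("\n\n"):
        block = " ".join(block.split())  # collapse hard line wraps
        if not block:
            continue
        if block.startswith("Note:"):
            add_inline(ET.SubElement(section, "note"), block[len("Note:"):].strip())
        else:
            add_inline(ET.SubElement(section, "p"), block)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    meta = '{"id": "pwd_command", "title": "The `pwd` command"}'
    body = "Print the full filename.\n\nNote: Your shell may have its own `pwd`."
    print(to_dita_reference(meta, body))
```

The output would still need a DOCTYPE and validation against the DITA DTDs before going through the OT; this only shows the shape of the mapping.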

Let's talk.

@mbakeranalecta

If you want real structure, you need a way to specify structure for a particular document type. In other words, you need a schema language.

Markdown, etc., don't have a schema language, so their semantics are predefined. There is no way to define structure beyond what is already there. XML has several schema languages. Its semantics are not predefined, so you can define structure using a schema. However, it is not an author-friendly syntax. XML was created without any attention being paid to making it author-friendly (that was supposed to be left to SGML -- RIP).

What we need is a format with a more author-friendly syntax, but which also has a schema language to define semantics for different document types.

XML was also designed to be very general. Any more author-friendly markup could afford to be far less general in its aims. XML's generality comes with verbosity and complexity, which has not served it well. Both JSON and HTML5 have taken chunks out of its territory. There is no reason we can't come up with a Markdown-like syntax and a supporting schema language specifically for authoring. Less ambition means less complexity and less cost.

I've been working on this for a while now and I am a good way through defining a syntax and a schema language. Contact me if you would like to discuss.
