Pandoc – A Document Conversion Tool For Linux
Pandoc is a command line tool that converts text from one markup language to another. Supported formats include HTML, plain text, Markdown, LaTeX, ConTeXt, RTF, DocBook XML, OpenDocument XML, MediaWiki markup, and S5 (an HTML slide show format similar to Slidy). However, the version covered here doesn’t convert to PDF on its own; newer releases can write PDF, but only with the help of an external engine such as LaTeX. So if you need PDF output without that setup, take a look at other document conversion tools (txt2tags, wkhtmltopdf, etc.).
If you are willing to put up with a command line tool for document conversion and you are not dependent on PDF, then pandoc could do the job for you.
How to install Pandoc in Linux
In most mainstream Linux distributions, pandoc is available in the official repositories. On Ubuntu, for example, you can install it with the apt-get command as follows.
$ sudo apt-get install pandoc
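Before installing, it can be worth checking whether pandoc is already on your system. The snippet below is a small sketch of that check; the variable name pandoc_status is my own and not part of pandoc.

```shell
# Check whether pandoc is already on the PATH before installing it.
if command -v pandoc >/dev/null 2>&1; then
    pandoc_status="installed"
    # The first line of --version output names the installed release.
    pandoc --version | head -n 1
else
    pandoc_status="missing"
    echo "pandoc not found; install it with: sudo apt-get install pandoc"
fi
```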
Pandoc Usage
The best way to learn to use this tool is through examples. So here goes.
$ pandoc test.txt
The command above converts the text to HTML and prints the result on your screen. To send the output to a file instead, name the file with the -o option as follows.
$ pandoc -o output.html test.txt
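If you don't have a test.txt at hand, here is a minimal sketch that creates one in a scratch directory and runs both commands on it. The file contents and the demo_dir variable are made up for illustration, and the conversion is skipped when pandoc is not installed.

```shell
# Work in a throwaway directory so no existing files are overwritten.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# A tiny sample input (contents are illustrative only).
cat > test.txt <<'EOF'
My first document
=================

Some *emphasised* text and a [link](http://example.com).
EOF

if command -v pandoc >/dev/null 2>&1; then
    pandoc test.txt                  # HTML fragment printed on the screen
    pandoc -o output.html test.txt   # same fragment written to a file
fi
```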
To create an HTML file with smart quotes and a table of contents, use the following syntax.
$ pandoc -s -S --toc -o output.html test.txt
By default, pandoc produces its output as a fragment. If you want a standalone file (be it HTML, LaTeX, RTF etc) with a header and footer, use the -s option. Compare the source of output.html with and without -s to see the difference.
The -S option directs pandoc to use smart quotes, dashes, and ellipses. This option is significant only when the input format is markdown. It is selected automatically when the output format is LaTeX or ConTeXt. (Note that -S was removed in pandoc 2.0; in newer releases the same behaviour is controlled by the smart extension, for example -f markdown+smart.)
The --toc option inserts a “Table of Contents” in the output.html file.
Now check out the following command.
$ pandoc -c beautify.css -o output.html test1.txt test2.txt
In the above case, the -c option tells pandoc to link to the beautify.css stylesheet from within the output.html file.
And as shown above, you can pass multiple input files (test1.txt, test2.txt etc) to pandoc and everything will be included in the output.
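To try the stylesheet example end to end, something like the following sketch should do. The CSS rules and file contents are invented for illustration. I have also added -s, on the assumption that your pandoc release only writes the stylesheet link when it produces a standalone document, since the link lives in the document header.

```shell
# Work in a throwaway directory so no existing files are overwritten.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# A hypothetical stylesheet; any valid CSS will do.
cat > beautify.css <<'EOF'
body { font-family: sans-serif; max-width: 40em; margin: 2em auto; }
h1   { color: #336699; }
EOF

# Two small input files that pandoc will join in the order given.
printf 'Part one\n========\n\nText from the first file.\n' > test1.txt
printf 'Part two\n========\n\nText from the second file.\n' > test2.txt

if command -v pandoc >/dev/null 2>&1; then
    # -s is my addition here; see the note above.
    pandoc -s -c beautify.css -o output.html test1.txt test2.txt
fi
```

The stylesheet is linked rather than copied into the page, so beautify.css has to stay reachable from wherever output.html ends up.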
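Another very useful feature is inserting extra content into the output file. The -H option includes the contents of a file in the document header (for HTML output, that means inside the head element), and the -A option includes the contents of a file after the document body, as a footer. If you want visible content at the top of the page, the -B option includes a file before the body. The following example uses -H and -A.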
$ pandoc --toc -H header.html -A footer.html -o output.html test1.txt test2.txt
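Here is a rough sketch of what header.html and footer.html might contain. Both files are hypothetical. Keep in mind that whatever -H includes ends up in the head of the HTML document, so it suits metadata, styles or scripts rather than visible text.

```shell
# Work in a throwaway directory so no existing files are overwritten.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Included verbatim inside <head> by -H (illustrative content).
cat > header.html <<'EOF'
<meta name="author" content="Example Author">
<style>footer { color: gray; font-size: small; }</style>
EOF

# Included verbatim after the document body by -A (illustrative content).
cat > footer.html <<'EOF'
<hr>
<footer>Generated with pandoc.</footer>
EOF

printf 'Part one\n========\n\nText from the first file.\n' > test1.txt
printf 'Part two\n========\n\nText from the second file.\n' > test2.txt

if command -v pandoc >/dev/null 2>&1; then
    pandoc --toc -H header.html -A footer.html -o output.html test1.txt test2.txt
fi
```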
Pandoc is smart enough to guess the output format from the extension of the output file. So to convert a file to RTF, you can do this:
$ pandoc -o output.rtf test1.txt
If you want to create presentations like those created using Slidy, you are in luck. Pandoc supports a slideshow markup language called S5 and you can convert your files to the S5 format.
$ pandoc -s -i -w s5 test1.txt -o output.html
Here, -w tells pandoc which format to write (s5, an HTML and JavaScript slide show), and -i is a slide show option that makes list items appear one at a time in the output. The input file, test1.txt, should be written with the slide structure in mind.
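As a starting point, here is a sketch of a slide source file. It assumes that each level-one heading starts a new slide and that the lines beginning with % form the title slide, which is how pandoc's slide writers generally behave; the exact rules can vary between versions, so check the manual for yours. The slide text itself is made up.

```shell
# Work in a throwaway directory so no existing files are overwritten.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Title block first, then one level-one heading per slide (illustrative content).
cat > test1.txt <<'EOF'
% Pandoc in five minutes
% Example Author

# Why pandoc

- One source file
- Many output formats

# Getting started

- Install it
- Write plain text
- Convert
EOF

if command -v pandoc >/dev/null 2>&1; then
    # With -i, the bullet points are revealed one at a time.
    pandoc -s -i -w s5 test1.txt -o output.html
fi
```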
A few tips
Knowing a few markup rules helps in converting a simple text file into different formats like HTML, RTF, S5 slideshow, etc.
- Enclosing a group of words into ` ` will be treated as code.
- If you want to enclose some text in <pre> tag, use the tab delimiter.
- All underlined text will be treated as heading. ===== is level 1 heading and ----- is level 2 heading.
- Pandoc automatically creates lists. Use 1., 2. etc for ordered lists and * for unordered lists. You can create a sub-list by using a TAB prior to the list item in your text file.
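The rules above can be tried out with a small sample file. The sketch below writes one with printf, so that the tab characters survive copy and paste; the file name tips.txt and its contents are only examples.

```shell
# Work in a throwaway directory so no existing files are overwritten.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# \t in the printf format strings produces real tab characters.
{
    printf 'Level 1 heading\n===============\n\n'
    printf 'Level 2 heading\n---------------\n\n'
    printf 'Inline `code` uses backticks.\n\n'
    printf '\tThis line starts with a tab, so it becomes a pre block.\n\n'
    printf '1. First ordered item\n2. Second ordered item\n\n'
    printf '* First unordered item\n\t* Sub-item, indented with a tab\n* Second unordered item\n'
} > tips.txt

if command -v pandoc >/dev/null 2>&1; then
    pandoc -o tips.html tips.txt
fi
```

Open tips.html in a browser, or read its source, to see how each rule was translated.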
For more information, read the man page that comes with Pandoc, or visit Pandoc’s home page.