How to Build a Markdown-to-HTML Pipeline with Pandoc and Bash

Stylish office workspace featuring dual monitors, a keyboard, notebooks, and decorative plant.
Stylish office workspace featuring dual monitors, a keyboard, notebooks, and decorative plant.

How to Build a Markdown-to-HTML Pipeline with Pandoc and Bash

Markdown has become the de-facto standard for writing plain text content that’s easy to read and simple to format. But for sharing on the web, HTML is king. Bridging this gap efficiently is where a good Markdown-to-HTML pipeline comes in.

This post will show you how to leverage two incredibly powerful tools: Pandoc, the “Swiss Army knife” of document conversion, and Bash, your trusty shell, to build a flexible and automated pipeline for converting your Markdown files into beautiful HTML.

Why a Markdown-to-HTML Pipeline?

You might be asking, “Why bother with a custom pipeline when I can just use a static site generator?” Here’s why:

  • Simplicity: For smaller projects, documentation, or single-page sites, a full-blown static site generator can be overkill. A simple script is often all you need.
  • Control: You have absolute control over the conversion process, styling, and output structure.
  • Automation: Integrate it into your CI/CD, build processes, or local development workflows for instant updates.
  • Flexibility: Pandoc isn’t just Markdown to HTML; it can convert from and to dozens of formats. This pipeline can be adapted for EPUB, PDF, DocX, and more.

Let’s dive in.

Prerequisites

Before we start, you’ll need two main components:

  1. Bash: If you’re on Linux or macOS, you already have it. For Windows, you can use Git Bash, WSL (Windows Subsystem for Linux), or Cygwin.
  2. Pandoc: This is the heavy lifter.

Installing Pandoc

Pandoc is cross-platform and easy to install.

On Debian/Ubuntu (Linux):

sudo apt update
sudo apt install pandoc

On Fedora (Linux):

sudo dnf install pandoc

On macOS (with Homebrew):

brew install pandoc

On Windows: Download the installer from the Pandoc releases page.

After installation, verify Pandoc is working by checking its version:

pandoc --version
pandoc 3.1.9
# ... (other details like API version, Haskell version, etc.)

Pandoc Basics: Your First Conversion

Let’s start with the simplest possible conversion. Create a file named my_document.md:

# My Awesome Document

This is a **sample** Markdown file.

- Item 1
- Item 2

Here's some `inline code`.

```python
def hello_world():
    print("Hello, Pandoc!")

Now, convert it to HTML:

```bash
pandoc my_document.md -o output.html

This command takes my_document.md as input and writes the HTML output to output.html.

Let’s inspect output.html:

head -n 20 output.html
<h1 id="my-awesome-document">My Awesome Document</h1>
<p>This is a <strong>sample</strong> Markdown file.</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
</ul>
<p>Here’s some <code>inline code</code>.</p>
<pre><code class="language-python">def hello_world():
    print("Hello, Pandoc!")
</code></pre>

As you can see, Pandoc generates clean HTML. However, it’s just the body content. It lacks a full HTML structure, CSS, or JavaScript. Let’s fix that.

Enhancing HTML Output with Pandoc

Pandoc offers a plethora of options to customize your HTML output.

1. Self-Contained HTML

For a complete, ready-to-open HTML file with basic styling and structure, use the --standalone (or -s) flag. This embeds a default CSS theme and structure directly into the HTML file.

pandoc -s my_document.md -o my_document_standalone.html

Now, let’s peek into the <head> of the my_document_standalone.html file. Notice the default CSS and meta tags.

grep -A 10 '<head>' my_document_standalone.html
<head>
  <meta charset="utf-8">
  <meta name="generator" content="pandoc">
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
  <title>My Awesome Document</title>
  <style>
    code{white-space: pre-wrap;}
    span.smallcaps{font-variant: small-caps;}
    div.columns{display: flex; gap: min(4vw, 1.5em);}
    div.column{flex: auto; overflow-x: auto;}
    div.hanging-indent{margin-left: 1.8em; text-indent: -1.8em;}
    ul.task-list{list-style: none;}
    ul.task-list li input[type="checkbox"] {
      width: 0.8em;
      margin: 0 0.8em 0.2em -1.6em;
      vertical-align: middle;
    }
  </style>
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>

This is already much more usable!

2. Custom CSS

While the default styling is okay, you’ll likely want your own. You can link external CSS files using the --css flag.

First, create a styles.css file:

/* styles.css */
body {
    font-family: Arial, sans-serif;
    line-height: 1.6;
    color: #333;
    max-width: 800px;
    margin: 2em auto;
    padding: 1em;
    background-color: #f9f9f9;
    border: 1px solid #ddd;
    box-shadow: 0 0 10px rgba(0,0,0,0.1);
}

h1, h2, h3 {
    color: #0056b3;
}

pre {
    background-color: #eee;
    padding: 1em;
    border-radius: 5px;
    overflow-x: auto;
}

code {
    background-color: #e0e0e0;
    padding: 0.2em 0.4em;
    border-radius: 3px;
}

Now, convert your Markdown, linking the custom CSS:

pandoc -s --css styles.css my_document.md -o my_document_custom_css.html

Check the <head> again:

grep '<link rel="stylesheet"' my_document_custom_css.html
  <link rel="stylesheet" href="styles.css">

Pandoc correctly linked your styles.css file. Make sure your CSS file is in the same directory as the output HTML or provide a correct relative/absolute path.

3. Metadata (YAML Front Matter)

Pandoc can parse YAML metadata blocks at the beginning of your Markdown file. This is fantastic for adding a title, author, date, and other document-specific information.

Update my_document.md to include YAML front matter:

---
title: My Super Awesome Document
author: Dev Blogger
date: 2025-06-17T13:05:16.383Z
description: This document demonstrates Pandoc's capabilities.
keywords: [pandoc, markdown, html, bash]
---

# My Awesome Document

This is a **sample** Markdown file.

- Item 1
- Item 2

Here's some `inline code`.

```python
def hello_world():
    print("Hello, Pandoc!")

Now, convert it again using the standalone flag:

```bash
pandoc -s my_document.md -o my_document_with_meta.html

Pandoc automatically uses the title for the <title> tag and sometimes renders other metadata as part of the body, depending on the template.

grep -E '<title>|<h1>' my_document_with_meta.html
  <title>My Super Awesome Document</title>
<h1 id="my-awesome-document">My Awesome Document</h1>

Note: Pandoc’s default HTML template will use the YAML title for the HTML <title> tag. The first # My Awesome Document in the Markdown itself becomes the <h1> heading in the body. If you omit the H1 in the Markdown and rely solely on the YAML title, Pandoc will still use it for the <title> tag, but the body won’t have an <h1> unless specified otherwise or via a custom template.

4. Table of Contents

For longer documents, a table of contents (TOC) is indispensable. Pandoc can automatically generate one from your headings. Use the --toc or --table-of-contents flag.

Update my_document.md with more headings:

---
title: My Super Awesome Document
author: Dev Blogger
date: 2025-06-17T13:05:16.383Z
description: This document demonstrates Pandoc's capabilities.
keywords: [pandoc, markdown, html, bash]
---

# My Awesome Document

This is a **sample** Markdown file.

## Introduction
A brief introduction to the document.

### Sub-section A
More details here.

## Main Content
Here's the core of the document.

- Item 1
- Item 2

Here's some `inline code`.

```python
def hello_world():
    print("Hello, Pandoc!")

Conclusion

Wrapping things up.


Now convert with `--toc`:

```bash
pandoc -s --toc --css styles.css my_document.md -o my_document_toc.html

Observe the generated table of contents at the beginning of the HTML body:

grep -A 15 '<nav id="TOC">' my_document_toc.html
<body>
<nav id="TOC" role="doc-toc">
<h2 id="contents">Contents</h2>
<ul>
<li><a href="#my-awesome-document">My Awesome Document</a>
  <ul>
  <li><a href="#introduction">Introduction</a>
    <ul>
    <li><a href="#sub-section-a">Sub-section A</a></li>
    </ul></li>
  <li><a href="#main-content">Main Content</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
  </ul></li>
</ul>
</nav>
<h1 id="my-awesome-document">My Awesome Document</h1>
<p>This is a <strong>sample</strong> Markdown file.</p>
<!-- ... rest of the document ... -->

This is incredibly powerful! Pandoc builds an accessible navigation menu with anchor links based on your Markdown headings.

Building the Bash Pipeline

Now that we understand Pandoc’s capabilities, let’s use Bash to automate conversions.

1. Single File Conversion Script

Let’s create a script convert_single_md.sh that takes a Markdown file as an argument and converts it to HTML in an html_output/ directory.

#!/bin/bash

# convert_single_md.sh: Converts a single Markdown file to HTML.
# Usage: ./convert_single_md.sh <markdown_file>

# --- Configuration ---
OUTPUT_DIR="html_output"
CSS_FILE="styles.css" # Optional: Path to your custom CSS file
PANDOC_ARGS="-s --toc" # Standard arguments for standalone HTML with TOC

# --- Script Logic ---

# Check if a Markdown file was provided
if [ -z "$1" ]; then
    echo "Usage: $0 <markdown_file>"
    echo "Example: $0 my_document.md"
    exit 1
fi

MD_FILE="$1"

# Check if the Markdown file exists
if [ ! -f "$MD_FILE" ]; then
    echo "Error: Markdown file '$MD_FILE' not found!"
    exit 1
fi

# Ensure the output directory exists
mkdir -p "$OUTPUT_DIR"

# Generate the output HTML filename
# Basename gets the filename without path, sed replaces .md with .html
HTML_FILE="${OUTPUT_DIR}/$(basename "$MD_FILE" .md).html"

# Add CSS arg if the file exists
if [ -f "$CSS_FILE" ]; then
    PANDOC_ARGS+=" --css \"$CSS_FILE\""
fi

echo "Converting '$MD_FILE' to '$HTML_FILE'..."
eval pandoc "$MD_FILE" -o "$HTML_FILE" $PANDOC_ARGS

if [ $? -eq 0 ]; then
    echo "Conversion successful: '$HTML_FILE'"
else
    echo "Conversion failed for '$MD_FILE'"
    exit 1
fi

Make the script executable:

chmod +x convert_single_md.sh

Now, let’s test it. Ensure my_document.md and styles.css are in the same directory as the script.

./convert_single_md.sh my_document.md
Converting 'my_document.md' to 'html_output/my_document.html'...
Conversion successful: 'html_output/my_document.html'

Verify the output:

ls html_output/
my_document.html

2. Batch Conversion Script

A single file conversion is useful, but what if you have a directory full of Markdown files, like docs/? Let’s create a script batch_convert.sh to handle this.

First, set up a sample docs/ directory:

mkdir -p docs
echo "# Doc A" > docs/doc_a.md
echo "# Doc B" > docs/doc_b.md
echo "# Doc C" > docs/doc_c.md
ls docs/
doc_a.md  doc_b.md  doc_c.md

Now, the batch_convert.sh script:

#!/bin/bash

# batch_convert.sh: Converts all Markdown files in a source directory to HTML.
# Usage: ./batch_convert.sh <source_markdown_dir>

# --- Configuration ---
OUTPUT_DIR="build_html"       # Where to store generated HTML files
CSS_FILE="styles.css"         # Optional: Path to your custom CSS file
PANDOC_ARGS="-s --toc"        # Standard arguments for standalone HTML with TOC

# --- Script Logic ---

# Check if source directory was provided
if [ -z "$1" ]; then
    echo "Usage: $0 <source_markdown_dir>"
    echo "Example: $0 docs/"
    exit 1
fi

SOURCE_DIR="$1"

# Check if the source directory exists
if [ ! -d "$SOURCE_DIR" ]; then
    echo "Error: Source directory '$SOURCE_DIR' not found!"
    exit 1
fi

# Ensure the output directory exists and is clean
echo "Cleaning existing output directory: '$OUTPUT_DIR/'"
rm -rf "$OUTPUT_DIR"
mkdir -p "$OUTPUT_DIR"

# Add CSS arg if the file exists
if [ -f "$CSS_FILE" ]; then
    PANDOC_ARGS+=" --css \"$CSS_FILE\""
    echo "Using custom CSS: '$CSS_FILE'"
fi

echo "Starting batch conversion from '$SOURCE_DIR/' to '$OUTPUT_DIR/'..."

# Find all Markdown files in the source directory and convert them
find "$SOURCE_DIR" -type f -name "*.md" | while IFS= read -r md_file; do
    # Get the relative path of the markdown file from the source directory
    relative_path="${md_file#$SOURCE_DIR/}"

    # Construct the full path for the output HTML file
    # Replace .md with .html and prepend with OUTPUT_DIR
    html_file="${OUTPUT_DIR}/${relative_path%.md}.html"

    # Ensure subdirectories exist in the output for nested structures
    mkdir -p "$(dirname "$html_file")"

    echo "Converting '$md_file' to '$html_file'..."
    pandoc "$md_file" -o "$html_file" $PANDOC_ARGS
    if [ $? -ne 0 ]; then
        echo "Warning: Conversion failed for '$md_file'"
    fi
done

echo "Batch conversion complete."
echo "Generated files are in: '$OUTPUT_DIR/'"

Make the script executable:

chmod +x batch_convert.sh

Now run the batch conversion:

./batch_convert.sh docs/
Cleaning existing output directory: 'build_html/'
Using custom CSS: 'styles.css'
Starting batch conversion from 'docs/' to 'build_html/'...
Converting 'docs/doc_a.md' to 'build_html/doc_a.html'...
Converting 'docs/doc_b.md' to 'build_html/doc_b.html'...
Converting 'docs/doc_c.md' to 'build_html/doc_c.html'...
Batch conversion complete.
Generated files are in: 'build_html/'

Verify the generated files:

ls build_html/
doc_a.html  doc_b.html  doc_c.html

This script can handle nested directories too! If you had docs/subdir/another.md, it would create build_html/subdir/another.html.

3. Watchdog / Auto-Rebuild (Advanced Concept)

For a truly streamlined development workflow, you can set up a “watchdog” that automatically re-converts Markdown files whenever they are saved. This gives you instant feedback.

On Linux, inotifywait (part of inotify-tools) is excellent for this. On macOS, fswatch is a good alternative.

Here’s a conceptual snippet for inotifywait to watch the docs/ directory and trigger the batch_convert.sh script:

#!/bin/bash

# auto_build.sh: Watches a directory for changes and triggers batch conversion.

WATCH_DIR="docs/"
BUILD_SCRIPT="./batch_convert.sh" # Path to your batch conversion script

if ! command -v inotifywait &> /dev/null; then
    echo "Error: inotifywait not found. Please install inotify-tools."
    echo "  e.g., sudo apt install inotify-tools"
    exit 1
fi

echo "Watching directory '$WATCH_DIR' for changes. Press Ctrl+C to stop."

# Initial build
"$BUILD_SCRIPT" "$WATCH_DIR"

# Loop indefinitely, watching for create, modify, or delete events on .md files
inotifywait -m -r -e create,modify,delete --format '%w%f' --include '\.md$' "$WATCH_DIR" | while read -r changed_file; do
    echo "Change detected in: '$changed_file'. Rebuilding..."
    "$BUILD_SCRIPT" "$WATCH_DIR"
    echo "Build complete. Waiting for more changes..."
done

This script continuously monitors the docs/ directory. Any change to a Markdown file (.md$) triggers a full rebuild using batch_convert.sh. This is how many static site generators provide their “live reload” feature.

Advanced Considerations and Tips

Templates (--template)

Pandoc uses default templates to generate output formats. For ultimate control over your HTML structure (e.g., adding custom headers, footers, navigation, or meta tags not covered by basic options), you can provide your own HTML template.

pandoc --print-default-template html > my_custom_template.html

Edit my_custom_template.html and then use it with:

pandoc -s --template my_custom_template.html my_document.md -o my_document_templated.html

This is where things can get very powerful, allowing you to inject variables from your Markdown’s YAML front matter directly into the HTML structure.

Markdown Extensions

Pandoc supports a superset of standard Markdown, including various extensions. For example, GitHub Flavored Markdown (GFM) features like task lists, strikethrough, and fenced code blocks with language highlighting are often enabled by default or can be explicitly enabled with --from markdown_github.

Explore pandoc --list-extensions markdown for a full list.

Image Paths

When embedding images in your Markdown, use relative paths. Pandoc will include these paths in the generated HTML. Ensure that your output HTML files can correctly locate these images. If your images are in a images/ directory relative to your Markdown, you might want to copy that images/ directory to your build_html/ directory as part of your Bash pipeline.

Example (add to batch_convert.sh after mkdir -p "$OUTPUT_DIR"):

# Copy static assets (e.g., images)
STATIC_ASSETS_DIR="assets" # Assuming your images/files are here
if [ -d "$STATIC_ASSETS_DIR" ]; then
    echo "Copying static assets from '$STATIC_ASSETS_DIR/' to '$OUTPUT_DIR/'"
    cp -r "$STATIC_ASSETS_DIR" "$OUTPUT_DIR/"
fi

Version Control

Always keep your Markdown source files under version control (e.g., Git), not the generated HTML files. The HTML files are artifacts of your build process and can be easily regenerated. Add html_output/ or build_html/ to your .gitignore.

# .gitignore
html_output/
build_html/

Integration with CI/CD

This Bash-Pandoc pipeline is ideal for CI/CD environments. You can run batch_convert.sh as part of your build step, then deploy the build_html/ directory to a static hosting service (Netlify, Vercel, GitHub Pages, S3, etc.).

You now have a solid foundation for building a Markdown-to-HTML pipeline using Pandoc and Bash. We’ve covered:

  • Basic Pandoc conversions.
  • Enhancing HTML output with standalone mode, custom CSS, metadata, and tables of contents.
  • Automating single and batch conversions with Bash scripts.
  • Conceptualizing an auto-rebuild watcher.
  • Tips for advanced usage and integration.

This setup is lightweight, highly customizable, and extremely powerful for managing documentation, personal blogs, or any content that benefits from being written in Markdown and published as HTML.

Experiment with Pandoc’s vast array of options (pandoc --help) and tailor the Bash scripts to fit your specific workflow. Happy building!

Last updated on