Building a Wiki from Markdown with Pandoc and CLI Tools

Minimalist image of a hand holding a Black Friday sale tag, ideal for promotions.
Minimalist image of a hand holding a Black Friday sale tag, ideal for promotions.

Building a Wiki from Markdown with Pandoc and CLI Tools

In an age of ever-more complex web applications and cloud-based solutions, sometimes the simplest tools offer the most robust and enduring solutions. For personal knowledge management, project documentation, or even a modest team wiki, the combination of plain Markdown files and powerful command-line utilities presents a compelling alternative to heavyweight wiki software. This approach champions simplicity, portability, version control, and longevity.

This post will guide you through building a static wiki using Markdown as your content source, Pandoc for conversion, and a handful of CLI tools to automate the process.

Why a Markdown-Based CLI Wiki?

You might wonder, “Why go through the trouble when there are dedicated wiki platforms like MediaWiki, Confluence, or even Notion?” Here’s why this approach shines:

  1. Simplicity & Portability: Your content is plain text. It’s incredibly easy to move, backup, and view on any device. No databases, no proprietary formats, no complex server setups.
  2. Version Control: Markdown files are text files, making them ideal for Git. Every change can be tracked, reverted, and collaboratively managed with ease. This is arguably the most powerful advantage for any evolving knowledge base.
  3. Longevity: Plain text formats rarely become obsolete. Your wiki will be accessible decades from now, independent of specific software versions or dependencies.
  4. Security: A static HTML wiki has a minimal attack surface compared to dynamic web applications.
  5. Control: You have absolute control over every aspect of your wiki’s generation and presentation.
  6. “Future-Proofing”: As long as Markdown exists (and it’s not going anywhere soon), your content is accessible.

This setup is particularly suited for personal notes, project documentation, internal knowledge bases, or small team wikis where the emphasis is on content creation and ease of maintenance, rather than dynamic features like user authentication or complex editing interfaces.

Prerequisites: Your CLI Toolkit

Before we dive in, ensure you have the following tools installed on your system (Linux, macOS, or WSL on Windows):

  • Pandoc: The Swiss Army knife for document conversion. It’s the core of our wiki builder, capable of converting Markdown to HTML with incredible flexibility.
  • Git: Essential for version control, allowing you to track changes, collaborate, and revert to previous states of your wiki.
  • Standard Unix Utilities:
    • find: For locating Markdown files.
    • sed: For stream editing, particularly useful for transforming wiki-style links.
    • grep: For searching within your wiki content.
    • mkdir, cp, rsync: For managing directories and copying files.
  • A Text Editor: Any editor you prefer for writing Markdown (VS Code, Sublime Text, Vim, Emacs, etc.).

Designing Your Wiki Structure

A well-organized directory structure is crucial for a maintainable wiki. Here’s a common pattern:

wiki-root/
├── src/                      # Your Markdown source files
│   ├── index.md
│   ├── concepts/
│   │   ├── markdown-basics.md
│   │   └── cli-tools.md
│   ├── projects/
│   │   ├── project-x.md
│   │   └── project-y.md
│   └── images/               # Images and other assets linked from Markdown
│       └── diagram.png
├── templates/                # Custom HTML templates for Pandoc
│   └── default.html
├── assets/                   # Global CSS, JS, favicons, etc.
│   ├── style.css
│   └── script.js
├── build/                    # The generated static HTML wiki (will be created by script)
└── build.sh                  # The script to build your wiki

Writing Your Markdown Content

Within the src/ directory, each Markdown file (.md) will represent a wiki page.

Front Matter: Pandoc can leverage YAML metadata blocks at the top of your Markdown files. This is invaluable for defining page titles, descriptions, and other page-specific data.

---
title: Understanding Pandoc Templates
description: "A deep dive into creating and using custom HTML templates with Pandoc for consistent wiki styling."
date: 2025-06-17T11:22:34.549Z
tags: [Pandoc, Templates, HTML, Customization]
---

# Understanding Pandoc Templates

Pandoc's templating system allows you to...

Internal Links: This is where wiki functionality often diverges from standard Markdown. Common wiki software uses [[Page Name]] syntax for internal links. Standard Markdown uses [Link Text](path/to/page.md).

For this CLI-based wiki, we have a few options:

  1. Pandoc’s Native Relative Paths: The simplest way is to use standard Markdown relative links: [Concepts](concepts/markdown-basics.md). This works directly with Pandoc. The challenge is ensuring these link to .html files in the build/ directory and that your file names are consistent.

    • Best practice for this approach: Keep all Markdown filenames lowercase and use hyphens (e.g., my-cool-page.md) to easily map to my-cool-page.html.
  2. Pre-processing Wiki-style Links: If you prefer the [[Page Name]] syntax for writing, you’ll need a pre-processing step using sed or awk to convert them into standard Markdown links before Pandoc sees them. This is more flexible but adds complexity.

    Let’s assume we want to convert [[My Cool Page]] to [My Cool Page](my-cool-page.html). This requires a consistent slugification strategy (e.g., My Cool Page becomes my-cool-page).

    # Example sed command (simplified, real slugification is more complex)
    # This example converts [[Page Name]] to [Page Name](page-name.html)
    # It assumes page-name.html can be derived by lowercasing and replacing spaces with hyphens.
    # In a real scenario, you'd likely map titles to filenames more robustly.
    
    # Read the markdown content
    content=$(cat "your-page.md")
    
    # Use sed to find [[...]] and replace with [...](slug.html)
    # This is a very basic example and might not handle all edge cases.
    # A more robust solution might involve a Python script.
    processed_content=$(echo "$content" | sed -E 's/\[\[([^\]]+)\]\]/\[\1\](\L\1.html)/g' | sed -E 's/\[([^\]]+)\]\(([^)]+) (.*?).html\)/[\1](\2-\3.html)/g')
    
    # The above sed is an attempt at converting spaces to hyphens and lowercasing the link target.
    # A cleaner approach for complex slugification often involves a short helper script
    # or function that takes the page title and returns a clean filename.

    Note: Achieving robust slugification purely with sed can be tricky. For a production-ready system, a short Python or Node.js script that maps [[Title]] to [Title](slugified-title.html) and handles all edge cases (special characters, duplicates) is highly recommended.

Images and Assets: Place images within your src/ directory (e.g., src/images/). Link to them using relative paths in Markdown: ![Alt Text](../images/diagram.png). During the build, these will be copied to the build/ directory, maintaining their relative paths.

The Pandoc Powerhouse: Conversion and Templating

Pandoc is the workhorse. Its basic command for converting a single Markdown file to HTML is:

pandoc input.md -o output.html

Custom HTML Templates

To give your wiki a consistent look and feel, Pandoc supports custom HTML templates. These are regular HTML files with special variables that Pandoc replaces with content and metadata.

templates/default.html (Example Snippet):

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>$title$ - My CLI Wiki</title>
    $if(description)$<meta name="description" content="$description$">$endif$
    <link rel="stylesheet" href="../assets/style.css">
    <!-- You can add other assets here, e.g., favicons -->
</head>
<body>
    <header>
        <h1><a href="/index.html">My CLI Wiki</a></h1>
        <nav>
            <!-- Basic navigation example, could be dynamically generated -->
            <a href="/concepts/index.html">Concepts</a>
            <a href="/projects/index.html">Projects</a>
        </nav>
    </header>

    <main>
        <article>
            <h1 id="page-title">$title$</h1>
            $body$
        </article>
    </main>

    <footer>
        <p>&copy; $date$ My CLI Wiki. Built with Pandoc.</p>
        $if(tags)$
        <p>Tags: $tags$</p>
        $endif$
    </footer>
</body>
</html>
  • $title$: Replaced by the title from your Markdown front matter.
  • $body$: This is where the converted Markdown content goes.
  • $if(variable)$...$endif$: Conditional blocks, useful for including elements only if a certain variable is present (e.g., description, tags).
  • $date$: Can be pulled from front matter or Pandoc’s default processing date.

You tell Pandoc to use a template with the --template flag:

pandoc --template=templates/default.html -s input.md -o output.html

The -s (or --standalone) flag is crucial; it tells Pandoc to produce a complete HTML document, not just a fragment.

The Build Script: build.sh

This is the heart of your CLI wiki. It will orchestrate finding Markdown files, converting them, copying assets, and organizing everything into the build/ directory.

build.sh (Example Script):

#!/bin/bash

# Configuration
SRC_DIR="src"
BUILD_DIR="build"
TEMPLATES_DIR="templates"
ASSETS_DIR="assets" # Directory for global CSS, JS, etc.

TEMPLATE_FILE="$TEMPLATES_DIR/default.html" # Your main HTML template

# --- Utility Functions ---

# Function to convert page title to a URL-friendly slug
# This is a simple example. For production, consider a more robust slugification.
# For example, a Python script or a dedicated utility like 'slugify' (pip install python-slugify)
# could be used in a more complex setup.
slugify() {
    echo "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9]+/-/g' | sed -E 's/^-+|-+$//g'
}

# --- Main Build Process ---

echo "--- Building Wiki ---"

# 1. Clean previous build
echo "1. Cleaning $BUILD_DIR..."
rm -rf "$BUILD_DIR"
mkdir -p "$BUILD_DIR"

# 2. Copy global assets (CSS, JS, etc.)
echo "2. Copying global assets from $ASSETS_DIR to $BUILD_DIR/$ASSETS_DIR..."
mkdir -p "$BUILD_DIR/$ASSETS_DIR"
cp -r "$ASSETS_DIR"/* "$BUILD_DIR/$ASSETS_DIR/"

# 3. Process Markdown files
echo "3. Processing Markdown files from $SRC_DIR..."

# Use find to locate all Markdown files, then loop through them
find "$SRC_DIR" -name "*.md" | while read -r md_file; do
    # Calculate relative path from SRC_DIR (e.g., concepts/page.md)
    relative_path="${md_file#$SRC_DIR/}"

    # Determine output HTML file path
    # Replace .md with .html
    # Maintain directory structure
    output_html_file="$BUILD_DIR/${relative_path%.md}.html"

    # Ensure output directory exists for the HTML file
    mkdir -p "$(dirname "$output_html_file")"

    echo "  - Processing $md_file -> $output_html_file"

    # --- Pre-processing for Wiki-style Links (Optional but powerful) ---
    # If you use [[Page Name]] syntax, uncomment and adapt this section.
    # This is a very basic replacement. For proper slugification from [[Page Title]] to (page-title.html),
    # you'd likely need a more sophisticated script (e.g., Python) that reads all titles
    # and maps them to their respective slugs/filenames.
    #
    # temp_md_file=$(mktemp)
    # sed -E 's/\[\[([^\]]+)\]\]/\[\1\](\L\1.html)/g' "$md_file" | \
    # sed -E 's/\]\(([^\)]+) ([^\)]+)\.html\)/](\1-\2.html)/g' > "$temp_md_file"
    # INPUT_FILE_FOR_PANDOC="$temp_md_file"
    #
    # If not using pre-processing, Pandoc directly processes the original MD file
    INPUT_FILE_FOR_PANDOC="$md_file"


    # --- Pandoc Conversion ---
    # pandoc options:
    # -s: standalone (complete HTML document)
    # --template: use our custom HTML template
    # -f markdown+yaml_metadata_block: specify input format, enabling YAML front matter
    # --metadata title:"Fallback Title": provide a default title if not in front matter
    # --css: link to specific CSS (relative to output HTML, often handled by template)
    # --toc (optional): adds a table of contents if you want one
    pandoc "$INPUT_FILE_FOR_PANDOC" \
           -s \
           --template="$TEMPLATE_FILE" \
           -f markdown+yaml_metadata_block \
           -o "$output_html_file"

    # Clean up temp file if used
    # rm -f "$temp_md_file"

done

# 4. Copy images and other assets from SRC_DIR (e.g., src/images/)
echo "4. Copying source assets (images, etc.)..."
# Find all files in src/ that are NOT .md files
find "$SRC_DIR" -type f ! -name "*.md" | while read -r asset_file; do
    relative_asset_path="${asset_file#$SRC_DIR/}"
    dest_asset_path="$BUILD_DIR/$relative_asset_path"
    mkdir -p "$(dirname "$dest_asset_path")"
    cp "$asset_file" "$dest_asset_path"
    echo "  - Copied $asset_file -> $dest_asset_path"
done


echo "--- Wiki Build Complete! ---"
echo "Your wiki is ready in the '$BUILD_DIR' directory."
echo "You can now open '$BUILD_DIR/index.html' in your browser."
echo "Or serve it with a simple web server like 'python3 -m http.server' from '$BUILD_DIR'."

Making the script executable: chmod +x build.sh

Running the build: ./build.sh

Explanation of the Build Script

  • Configuration: Defines paths for source, build, templates, and assets.
  • Cleaning: Removes the previous build/ directory to ensure a fresh build.
  • Copying Global Assets: Copies assets/style.css (and any other files in assets/) directly into build/assets/. Your Pandoc template should link to these.
  • Processing Markdown Files:
    • find "$SRC_DIR" -name "*.md": Recursively finds all Markdown files within your src/ directory.
    • while read -r md_file; do ... done: Loops through each found Markdown file.
    • relative_path="${md_file#$SRC_DIR/}": Extracts the path relative to src/ (e.g., concepts/markdown-basics.md).
    • output_html_file="$BUILD_DIR/${relative_path%.md}.html": Constructs the output HTML path, replacing .md with .html and placing it in build/, maintaining the directory structure.
    • mkdir -p "$(dirname "$output_html_file")": Creates any necessary subdirectories in build/ for the HTML file.
    • Pandoc Command: Uses pandoc to convert the Markdown file to HTML, applying your template and enabling YAML front matter parsing.
  • Copying Source Assets: Crucially, this step finds any files in src/ that aren’t Markdown (like images) and copies them to the build/ directory, maintaining their relative paths. This ensures your ![Alt Text](../images/diagram.png) links work in the final HTML.

Advanced Considerations & Tips

Version Control with Git

Initialize a Git repository at the root of your wiki-root directory.

cd wiki-root
git init
echo "build/" >> .gitignore # Ignore the generated build directory
git add .
git commit -m "Initial wiki setup"

Regularly commit your changes, push to a remote repository (GitHub, GitLab, Bitbucket), and enjoy the full power of version control.

Adding Basic Search

For a static wiki, a full-text search engine can be tricky. However, some options:

  • grep: For CLI-based searching of your source Markdown files.
    grep -r "your search term" src/
  • Client-Side Search: For a browser-based search, solutions like Lunr.js or FlexSearch.js can index your content in the browser, offering a surprisingly robust search experience for static sites. This would require an additional build step to generate the search index.

Deployment

Since your wiki is entirely static HTML, deployment is incredibly straightforward:

  • Web Server: Simply copy the contents of your build/ directory to any web server (Apache, Nginx).
  • rsync: A great tool for efficient deployment.
    rsync -avz --delete build/ user@your-server:/var/www/your-wiki/
  • GitHub Pages / GitLab Pages: Excellent free hosting options for static sites. Just push your build/ directory to a specific branch (gh-pages for GitHub Pages) or configure your CI/CD to publish it.

Continuous Integration/Continuous Deployment (CI/CD)

For larger wikis or collaborative projects, you can automate the build process using CI/CD pipelines (e.g., GitHub Actions, GitLab CI/CD, Jenkins). Every time a change is pushed to your Git repository, the pipeline can automatically run build.sh and deploy the updated wiki.

Styling and Interactivity

  • CSS: Integrate your assets/style.css into your Pandoc template for custom styling.
  • JavaScript: Add assets/script.js for minor interactivity (e.g., toggle menus, search UI).

Broken Link Checking

Tools like linkchecker can scan your generated HTML files for broken internal and external links, helping you maintain a high-quality wiki.

Limitations (Honest Assessment)

While powerful, this approach isn’t for every use case:

  • No Dynamic Features: There’s no built-in commenting system, user authentication, live editing, or complex permissions. It’s a read-only wiki.
  • Steeper Learning Curve for Complex Builds: While simple builds are easy, implementing sophisticated features like cross-page slugification or custom shortcodes requires comfort with shell scripting, sed, awk, or potentially a small amount of Python/Node.js.
  • No WYSIWYG Editor: Content creation is purely Markdown in a text editor. This is a feature for many, but a limitation for those who prefer a visual editor.
  • Scalability for Extreme Cases: For a wiki with tens of thousands of pages, the find and pandoc loop might become slow, although static serving itself scales extremely well.

Conclusion

Building a wiki from Markdown and CLI tools like Pandoc offers a powerful, portable, and enduring solution for knowledge management. It emphasizes plain text, version control, and simplicity, giving you ultimate control over your content and its presentation. While it may require a bit more initial scripting than a ready-made platform, the long-term benefits in terms of maintainability, longevity, and freedom from proprietary systems are immeasurable.

Embrace the power of the command line, and start building your robust, future-proof wiki today.

Last updated on