Writing Logging Scripts That Save Your Weekend

The clock ticks past midnight. You’re staring at a blank console or an inscrutable error message from a script that should be running smoothly. Another production issue, another lost evening, another weekend potentially sacrificed to the debugging gods. Sound familiar?

This nightmare scenario is precisely why robust, well-thought-out logging isn’t just a “nice-to-have” – it’s a fundamental pillar of reliable software and system administration. It’s the difference between blindly poking around in the dark and having a clear trail of breadcrumbs leading you directly to the problem. More importantly, it’s the difference between firefighting all weekend and enjoying your well-deserved time off.

This post isn’t about slapping a few print() statements into your code. It’s about building a logging strategy that provides actionable insights, prevents issues before they escalate, and ultimately, saves your sanity.

Beyond print(): Why Logging Truly Matters

Many developers start with print() or echo for debugging. While useful for quick checks during development, they fall short in production environments:

  • No Context: A simple print("Error!") tells you nothing about what failed, when, why, or with what data.
  • Ephemeral: Output often disappears after script execution or is buried in console buffers.
  • Lack of Levels: You can’t differentiate between informational messages, warnings, or critical errors easily.
  • Difficult to Parse: Ad-hoc outputs are hard for machines to process, making automation, alerting, and analysis far more difficult.
  • Performance Impact: Excessive printing can slow down applications, especially in high-throughput systems.

Effective logging addresses these issues by providing:

  • Debugging in Production: Trace issues in live systems without attaching a debugger.
  • Auditing and Compliance: Maintain a historical record of system activities for security and regulatory requirements.
  • Performance Monitoring: Identify bottlenecks and unusual resource consumption.
  • Security Insights: Detect suspicious activities or failed login attempts.
  • Proactive Issue Detection: Trigger alerts based on critical errors or unusual patterns before users are impacted.

Core Principles of Effective Logging

Before diving into code, let’s establish some universal principles:

  1. Clarity and Context: Every log message should clearly state what happened, when (timestamp), where (module/function), and why (error type, reason). Include relevant data (e.g., user ID, transaction ID, file path).

  2. Appropriate Log Levels: Not all messages are equally important. Use standard logging levels to categorize your messages:

    • DEBUG: Detailed information, typically only of interest when diagnosing problems.
    • INFO: Confirmation that things are working as expected.
    • WARNING: An indication that something unexpected happened, or indicative of a problem that may occur if no action is taken. The software is still working as expected.
    • ERROR: Due to a more serious problem, the software has not been able to perform some function.
    • CRITICAL: A serious error, indicating that the program itself may be unable to continue running.

    Using levels allows you to filter logs based on severity.
  3. Structured Logging: Instead of free-form text, log data in a structured format like JSON or key-value pairs. This makes logs easily parsable by machines, enabling powerful querying, analysis, and visualization tools.

    {
      "timestamp": "2023-10-27T10:30:00Z",
      "level": "ERROR",
      "message": "Failed to process payment",
      "component": "payment_service",
      "transaction_id": "tx_abc123",
      "user_id": "user_456",
      "error_code": "E007",
      "details": "Insufficient funds"
    }
  4. Actionable Information: Can an alert be triggered from this log? Does it contain enough information for an engineer to start troubleshooting without first hunting for more context?

  5. Performance Considerations: Logging adds overhead. While often negligible, for high-throughput applications, consider asynchronous logging to avoid blocking your main process.

  6. Sensitive Data Handling: Crucially, never log sensitive information directly, such as passwords, API keys, credit card numbers, or personally identifiable information (PII). Mask or redact sensitive fields before logging. This is a common security pitfall.
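
To make principle 6 concrete, here is a minimal redaction sketch using only Python's standard library. The redact() helper and the SENSITIVE_KEYS set are illustrative names, not part of any framework; adapt the key list and masking rules to your own data.

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Hypothetical field names to mask; adjust to match your own payloads.
SENSITIVE_KEYS = {"password", "api_key", "credit_card_number"}

def redact(payload):
    """Return a copy of a dict with sensitive fields masked."""
    return {k: ("***REDACTED***" if k in SENSITIVE_KEYS else v) for k, v in payload.items()}

request = {"user_id": "user_456", "password": "hunter2", "amount": 49.95}

# Log the redacted copy, never the raw payload.
logger.info("Payment request received: %s", redact(request))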

Logging in Practice: Tools and Examples

Let’s look at practical examples in Python and Bash, two common scripting environments.

Python’s logging Module: Your Best Friend

Python’s built-in logging module is incredibly powerful and flexible. It supports multiple handlers, formatters, and filters, allowing you to route logs to files, consoles, network sockets, and more.

1. Basic Setup (Console and File)

import logging
import sys

# 1. Get a logger instance
logger = logging.getLogger(__name__) # __name__ provides context (module name)
logger.setLevel(logging.DEBUG)     # Set the minimum level to log

# 2. Create handlers (where logs go)
# Console handler
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(logging.INFO) # Only INFO and above to console

# File handler
file_handler = logging.FileHandler('app.log')
file_handler.setLevel(logging.DEBUG)  # Log all DEBUG and above to file

# 3. Create formatters (how logs look)
# Basic formatter for console
console_formatter = logging.Formatter('%(levelname)s: %(message)s')

# Detailed formatter for file (includes timestamp, module, line number)
file_formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s')

# 4. Add formatters to handlers
console_handler.setFormatter(console_formatter)
file_handler.setFormatter(file_formatter)

# 5. Add handlers to the logger
logger.addHandler(console_handler)
logger.addHandler(file_handler)

# --- Now, log messages! ---
logger.debug("This is a debug message.")
logger.info("Application started successfully.")
try:
    result = 10 / 0
except ZeroDivisionError as e:
    logger.error("A critical error occurred during calculation!", exc_info=True) # exc_info=True adds stack trace
    logger.critical("Application is shutting down due to unrecoverable error.")

logger.warning("Configuration file not found, using default settings.")

Key takeaways from the Python example:

  • getLogger(__name__): Best practice for creating loggers specific to modules.
  • setLevel(): Controls the minimum severity level that this specific handler or logger will process.
  • StreamHandler vs. FileHandler: Directs output to console (sys.stdout/sys.stderr) or a file.
  • Formatter: Defines the structure of your log messages. Common format fields include %(asctime)s, %(levelname)s, %(name)s, %(message)s, %(filename)s, %(lineno)d, %(funcName)s. For more details, refer to the Python logging module documentation.
  • exc_info=True: Crucial for logging exceptions, as it automatically includes the traceback.

2. Structured Logging with extra

For structured logging (e.g., JSON), you can pass extra context via the extra dictionary. While you can write a custom formatter, for production use cases, it’s often better to use a dedicated library that handles JSON output more robustly (e.g., python-json-logger or the popular Loguru library).

import logging
import json
import sys

# Custom JSON formatter (simplified example)
class JsonFormatter(logging.Formatter):
    def format(self, record):
        log_record = {
            "timestamp": self.formatTime(record, self.datefmt),
            "level": record.levelname,
            "message": record.getMessage(),
            "component": record.name,
            "function": record.funcName,
            "line": record.lineno,
        }
        # Fields passed via `extra=` become attributes on the LogRecord itself,
        # so collect anything that isn't a standard LogRecord attribute.
        standard_attrs = logging.LogRecord('', 0, '', 0, '', (), None).__dict__.keys()
        for key, value in record.__dict__.items():
            if key not in standard_attrs and key not in ('message', 'asctime'):
                log_record[key] = value

        # Add exception info if present
        if record.exc_info:
            log_record["exception"] = self.formatException(record.exc_info)
        
        return json.dumps(log_record)

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter(datefmt='%Y-%m-%dT%H:%M:%SZ')) # ISO 8601 timestamps for consistency
logger.addHandler(handler)

# Example structured logging
user_id = "user_123"
order_id = "order_XYZ"
logger.info("Order processed successfully", extra={'user_id': user_id, 'order_id': order_id})

try:
    raise ValueError("Invalid input data")
except ValueError as e:
    logger.error("Data validation failed", exc_info=True, extra={'input_value': "bad_data"})

Note: A more robust JSON logging solution would typically leverage a dedicated library like python-json-logger or Loguru for production environments, as they handle more edge cases and provide richer features.
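
As a rough sketch of that library-based approach (assuming the python-json-logger package is installed; exact import paths can vary between versions), the same structured output can be produced without a hand-rolled formatter:

import logging
import sys

from pythonjsonlogger import jsonlogger  # pip install python-json-logger

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)
# Fields named in the format string become keys in the JSON output;
# anything passed via extra= is added automatically.
handler.setFormatter(jsonlogger.JsonFormatter('%(asctime)s %(levelname)s %(name)s %(message)s'))
logger.addHandler(handler)

logger.info("Order processed successfully", extra={'user_id': 'user_123', 'order_id': 'order_XYZ'})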

3. Log Rotation with RotatingFileHandler

To prevent log files from growing indefinitely and consuming all disk space, use RotatingFileHandler or TimedRotatingFileHandler.

import logging
from logging.handlers import RotatingFileHandler
import os

log_file = 'my_app_rotated.log'

# Ensure the log directory exists
log_dir = os.path.dirname(log_file)
if log_dir and not os.path.exists(log_dir):
    os.makedirs(log_dir)

# Max file size 1MB (1024 * 1024 bytes), keep 5 backup files
rot_handler = RotatingFileHandler(log_file, maxBytes=1024*1024, backupCount=5)
rot_handler.setLevel(logging.INFO)

formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
rot_handler.setFormatter(formatter)

logger_rot = logging.getLogger('my_app')
logger_rot.setLevel(logging.INFO)
logger_rot.addHandler(rot_handler)

for i in range(20000): # Write enough messages to trigger rotation
    logger_rot.info(f"This is log message number {i}.")

This will create my_app_rotated.log, my_app_rotated.log.1, my_app_rotated.log.2, etc., deleting the oldest when the limit is reached.
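
If you prefer rotating on a schedule rather than by size, TimedRotatingFileHandler (also in logging.handlers) works the same way. A minimal sketch, with an illustrative file name:

import logging
from logging.handlers import TimedRotatingFileHandler

# Rotate at midnight each day and keep the last 7 days of logs.
timed_handler = TimedRotatingFileHandler('my_app_timed.log', when='midnight', backupCount=7)
timed_handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))

logger_timed = logging.getLogger('my_app_timed')
logger_timed.setLevel(logging.INFO)
logger_timed.addHandler(timed_handler)

logger_timed.info("This message goes to a log file that rotates daily.")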

Bash Scripting: Simple but Effective Logging

Bash scripts are often used for automation, cron jobs, and system tasks, where logging is equally important.

1. Basic Timestamped Logging

#!/bin/bash

LOG_FILE="/var/log/my_script.log"
# Ensure the log directory exists
mkdir -p "$(dirname "$LOG_FILE")" || { echo "Failed to create log directory."; exit 1; }

# Function to log messages with timestamp and level
log_message() {
    local level="$1"
    local message="$2"
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$level] $message" | tee -a "$LOG_FILE"
}

# Merge stderr (file descriptor 2) into stdout (file descriptor 1) for the rest of the script,
# so error output from any command appears in order with normal output.
exec 2>&1

log_message INFO "Script started."

# Simulate some work
sleep 2

# Example of a successful command
echo "Listing current directory:"
ls -l

# Example of a command that might fail
echo "Attempting to copy a non-existent file..."
cp /tmp/nonexistent_file.txt /tmp/destination 2>&1 # Explicitly redirect stderr for specific command if not using exec 2>&1

if [ $? -ne 0 ]; then
    log_message ERROR "File copy failed. Check error message above."
fi

log_message INFO "Script finished."

  • $(date '+%Y-%m-%d %H:%M:%S'): Adds a timestamp to each log message.
  • tee -a "$LOG_FILE": Outputs to both stdout (console) and appends (-a) to the specified log file. This is useful for seeing live output while also persisting it. If you only want to log to file, use >> "$LOG_FILE".
  • exec 2>&1: Redirects stderr to stdout for the remainder of the script, so any command’s error output appears in order with the script’s normal output. Note that only messages written via log_message (and its tee) land in the log file; to persist everything the script prints, redirect the whole script instead, e.g. exec > >(tee -a "$LOG_FILE") 2>&1.
  • if [ $? -ne 0 ]; then: Checks the exit status of the last executed command. A non-zero status typically indicates an error.

2. Using the logger Command (for Syslog)

For system-level scripts, you might want to send logs to the system’s syslog daemon (e.g., rsyslog, syslog-ng). This integrates your script’s logs with the OS’s central logging system, making them accessible via journalctl or /var/log files.

#!/bin/bash

# Identifies the source of the log messages in syslog
SCRIPT_NAME="my_cron_script"

log_syslog() {
    local level="$1"
    local message="$2"
    # -i: Log the PID of the logger process
    # -t: Tag the message with SCRIPT_NAME for easy filtering
    # -p: Specify facility.level (e.g., user.info, daemon.err)
    logger -i -t "$SCRIPT_NAME" -p "user.$level" "$message"
}

log_syslog info "Cron job started."

# Simulate an error
INVALID_COMMAND_RESULT=$(ls /nonexistent_directory 2>&1)

if [ $? -ne 0 ]; then
    log_syslog err "Failed to list directory: $INVALID_COMMAND_RESULT"
fi

log_syslog info "Cron job finished."

You can then view these logs using journalctl -t my_cron_script on systemd-based systems, or by checking /var/log/syslog, /var/log/messages, etc., depending on your Linux distribution’s syslog configuration.

3. Log Rotation with logrotate (Linux/Unix)

For standalone log files created by Bash scripts (or any application), logrotate is the standard Linux utility. It’s configured via files in /etc/logrotate.d/ and runs periodically (often daily) via cron. logrotate operates entirely externally to your script, managing the files independently.

Example /etc/logrotate.d/my_script:

/var/log/my_script.log {
    # Rotate daily
    daily
    # Don't error if the log file is missing
    missingok
    # Keep 7 old log files
    rotate 7
    # Compress old log files (gzip by default)
    compress
    # Compress on the next rotation cycle, not immediately
    delaycompress
    # Don't rotate if the log file is empty
    notifempty
    # Create the new log file with the specified mode, owner, and group
    create 0644 root root
    # Commands to run after rotation completes
    postrotate
        # If your script is a long-running daemon that keeps the log file open,
        # you might need to send it a signal (e.g., HUP) to make it re-open its log file.
        # This is typically done by reading the daemon's PID from a .pid file.
        # Example: kill -HUP `cat /var/run/my_script.pid 2>/dev/null`
    endscript
}

For comprehensive details on logrotate options, consult the official logrotate Man Page.

Advanced Concepts & Best Practices

  1. Centralized Logging: For complex applications, microservices architectures, or multiple servers, shipping logs to a centralized system is indispensable. Tools like the ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Graylog, Loki, or cloud-native services (AWS CloudWatch, Google Cloud Logging, Azure Monitor) allow you to:

    • Aggregate logs from all sources.
    • Search and filter across petabytes of data quickly.
    • Create dashboards for monitoring trends and performance.
    • Set up alerts for specific error patterns.

    This is where structured logging really shines, as these tools can easily parse JSON or key-value pairs, enabling powerful analytics.
  2. Error Handling and Retries: When writing robust scripts, you’ll often implement retries for transient failures (e.g., network issues, temporary service unavailability).

    • Log the initial error: Before retrying, log what went wrong.
    • Log retry attempts: Indicate that a retry is happening (e.g., “Attempt 1 of 3 to connect to database…”).
    • Log final success or failure: Conclude the operation’s status (e.g., “Database connection successful after 2 retries” or “Failed to connect to database after 3 attempts, giving up.”).

    This provides a clear audit trail of an operation’s lifecycle, which is vital for debugging and understanding system resilience; a minimal sketch of this pattern appears after this list.
  3. Asynchronous Logging: In performance-critical applications or services generating high volumes of logs, writing logs synchronously to disk or network can introduce latency and block your main application thread. Libraries like Python’s Loguru (which often handles this internally) or custom asynchronous handlers (e.g., using QueueHandler in Python’s standard logging module with a separate thread for processing) can offload log writing, ensuring your main application flow remains unblocked. Loguru Library is an excellent choice for a modern, batteries-included logging experience in Python.

  4. Logging Levels in Production: In production, you typically set the default logging level to INFO or WARNING to avoid excessive log volume, which can consume disk space and obscure important messages. DEBUG logs are usually enabled only during active troubleshooting or for specific components. Centralized logging systems can help here by allowing you to dynamically adjust logging levels or filter by level on the fly, reducing noise.

  5. Testing Your Logs: Yes, you can (and should) test your logging! In unit and integration tests, you can capture log output to ensure that expected messages are generated, that sensitive data is not logged, and that errors trigger appropriate logging. This verifies that your logging strategy works as intended and prevents future logging regressions.
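
To make the retry pattern from point 2 concrete, here is a minimal sketch using only the standard library; connect_to_database() is a hypothetical stand-in for whatever transient operation your script performs:

import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def connect_to_database():
    """Hypothetical operation that may fail transiently."""
    raise ConnectionError("connection refused")

MAX_ATTEMPTS = 3

for attempt in range(1, MAX_ATTEMPTS + 1):
    try:
        connect_to_database()
        logger.info("Database connection successful on attempt %d", attempt)
        break
    except ConnectionError as exc:
        # Log the failure for this attempt before deciding whether to retry.
        logger.warning("Attempt %d of %d failed: %s", attempt, MAX_ATTEMPTS, exc)
        if attempt == MAX_ATTEMPTS:
            logger.error("Giving up after %d attempts.", MAX_ATTEMPTS, exc_info=True)
        else:
            time.sleep(2 ** attempt)  # simple exponential backoff before the next attempt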

The Weekend-Saving Payoff

Imagine this: On a Friday evening, an automated alert pops up: “CRITICAL: Payment service failed to process transaction for user_456, error E007: Insufficient funds”. Because your logs are structured and actionable, you don’t need to SSH into a server, tail countless log files, or try to reproduce the issue.

You immediately know:

  • What happened: Payment failed.
  • When: Right now.
  • Where: Payment service component.
  • Who was affected: user_456.
  • The specific error: E007, “Insufficient funds”.

With this information, you can quickly:

  1. Check the user’s balance or account status via an admin tool.
  2. Review recent successful payments to see if this is an isolated incident or part of a larger trend.
  3. Potentially even trigger an automated fix or notification to the user/support team.

This level of insight transforms reactive firefighting into proactive problem-solving. Instead of spending hours deciphering cryptic messages, you spend minutes pinpointing the root cause. This efficiency compounds, leading to fewer incidents, faster resolutions, and ultimately, more undisturbed weekends.

Conclusion

Good logging is an investment. It takes time to design, implement, and maintain. But the returns are immense: clearer insights into your systems, faster debugging, enhanced security, and significantly improved operational stability.

By embracing structured logging, appropriate log levels, robust error handling, and leveraging tools for log management and rotation, you’re not just writing better code; you’re building more resilient systems and, most importantly, protecting your precious personal time. So go forth, log wisely, and reclaim your weekends!
