How to Run a Link Checker on Your Marmite Website

Maintaining working links is crucial for website quality and user experience. Lychee is a fast, asynchronous link checker that can validate all links in your Marmite-generated website, helping you identify broken links, invalid URLs, and other link-related issues.

What is Lychee?

Lychee is a command-line tool written in Rust that checks links in Markdown files, HTML files, and web pages. It is particularly useful for static sites like those generated by Marmite: pointed at the output directory, it scans every generated file and verifies that internal and external links resolve. Note that when given a URL, Lychee checks the links on that page only; it does not crawl a site recursively.

Installation

Using Cargo (Rust Package Manager)

cargo install lychee

Using Package Managers

macOS (Homebrew):

brew install lychee

Arch Linux:

pacman -S lychee

Ubuntu/Debian:

# Download from GitHub releases
wget https://github.com/lycheeverse/lychee/releases/latest/download/lychee-x86_64-unknown-linux-gnu.tar.gz
tar -xzf lychee-x86_64-unknown-linux-gnu.tar.gz
sudo mv lychee /usr/local/bin/

For other installation methods, check the official installation guide.
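
After installing, it can help to confirm that the binary is actually on your PATH before wiring it into scripts. The sketch below uses a hypothetical helper, require_cmd, which is not part of Lychee or Marmite:

```shell
# require_cmd is a hypothetical helper (not part of Lychee or Marmite).
# It returns 0 when the named command is on PATH and 1 otherwise.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH" >&2
  return 1
}

# Print the installed version only when the binary is available.
if require_cmd lychee; then
  lychee --version
fi
```

The same helper can guard marmite in a build script.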

Basic Usage with Marmite

To check links in your Marmite website, build the site first, then run Lychee either against a locally served copy or directly against the generated files.

Step 1: Build and Serve Your Site

# Build your Marmite site and serve it locally
marmite your-content-dir ./public --serve

This will start a local server (typically on http://localhost:8000) with your generated website.

Step 2: Run Lychee

In another terminal, run Lychee against your served site:

# Basic link checking (covers the links found on the served page itself)
lychee http://localhost:8000

# More verbose output
lychee --verbose http://localhost:8000 --extensions html

# Check every built HTML file directly (no server needed)
lychee --verbose ./public --extensions html
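
If you only care about links between your own pages, Lychee's --offline flag restricts the run to local files and skips network requests. A minimal sketch, assuming the site was built into ./public; the check_local_links name is made up for this example:

```shell
# check_local_links is a hypothetical wrapper; the default ./public matches
# the output directory used in the commands above.
check_local_links() {
  site_dir="${1:-./public}"
  # --offline: only check local files, skip network requests
  lychee --offline --extensions html "$site_dir"
}

# Example (not run here):
#   check_local_links ./public
```

This tends to be fast enough to run before every commit, leaving the slower external checks to CI.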

Advanced Usage

Excluding Problematic Links

Some links may be problematic for automated checking (rate-limited APIs, private content, etc.). You can exclude them:

lychee --verbose ./public \
    --extensions html \
    --exclude "linkedin\.com|twitter\.com|facebook\.com" \
    --exclude-mail
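
Hand-writing the alternation gets error-prone as the list grows, and an unescaped dot matches any character. The sketch below builds the pattern from plain domain names; build_exclude_regex is a hypothetical helper, not something Lychee provides:

```shell
# build_exclude_regex is a hypothetical helper (not part of Lychee).
# It escapes the dots in each domain and joins the results with "|".
build_exclude_regex() {
  pattern=""
  for domain in "$@"; do
    escaped=$(printf '%s' "$domain" | sed 's/\./\\./g')
    if [ -n "$pattern" ]; then
      pattern="$pattern|$escaped"
    else
      pattern="$escaped"
    fi
  done
  printf '%s\n' "$pattern"
}

# Example (not run here):
#   lychee ./public --exclude "$(build_exclude_regex linkedin.com twitter.com)"
```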

Common Exclusion Patterns

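# Exclude social media and local development links.
# Patterns listed in a .lycheeignore file in the working directory are
# picked up automatically, so no extra flag is needed for that file.
lychee --verbose ./public \
    --extensions html \
    --exclude "linkedin|twitter|facebook|localhost|127\.0\.0\.1"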
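
A .lycheeignore file holds one regular expression per line. As a rough sketch, the patterns from the commands above could be written out like this; write_lycheeignore is a hypothetical helper, and the default file location assumes you run Lychee from the project root:

```shell
# write_lycheeignore is a hypothetical helper; the patterns mirror the
# --exclude examples above. Each line is one regular expression.
write_lycheeignore() {
  target="${1:-.lycheeignore}"
  cat > "$target" <<'EOF'
linkedin\.com
twitter\.com
facebook\.com
localhost
127\.0\.0\.1
EOF
}

# Example (not run here):
#   write_lycheeignore            # creates ./.lycheeignore
```

Keeping the patterns in a file makes them easier to review in version control than a long command-line flag.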

Configuration File

Create a lychee.toml configuration file in your project root:

# lychee.toml
verbose = true
no_progress = false
max_redirects = 5
timeout = 30

# File extensions to check
extensions = ["html", "md"]

# URLs to exclude (regex patterns)
exclude = [
    "^mailto:",
    "linkedin\\.com",
    "twitter\\.com",
    "facebook\\.com",
    "localhost",
    "127\\.0\\.0\\.1"
]

# Exclude private IP address ranges (e.g. 10.0.0.0/8, 192.168.0.0/16) and mail addresses
exclude_private = true
exclude_mail = true

# Headers for requests (useful for APIs)
headers = [
    "Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "User-Agent=Mozilla/5.0 (compatible; lychee/0.13.0)"
]

Then run simply:

lychee ./public

GitHub Actions Integration

Automate link checking in your CI/CD pipeline by adding Lychee to your GitHub Actions workflow.

Create .github/workflows/link-check.yml:

name: Link Check

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    # Run weekly on Sundays at 00:00 UTC
    - cron: '0 0 * * 0'

jobs:
  link-check:
    runs-on: ubuntu-latest
    
    steps:
    - name: Checkout repository
      uses: actions/checkout@v4
      
    - name: Install Rust
      uses: actions-rs/toolchain@v1
      with:
        toolchain: stable
        
    - name: Install Marmite
      run: cargo install marmite
      
    - name: Build website
      run: marmite content ./public
      
    - name: Link Checker
      uses: lycheeverse/lychee-action@v1.10.0
      with:
        args: >
          --verbose
          --no-progress
          --extensions html
          --exclude "linkedin|twitter|facebook|localhost"
          ./public
        fail: true
        
    - name: Create Issue on Failure
      if: failure()
      uses: peter-evans/create-issue-from-file@v5
      with:
        title: Link Check Failed
        content-filepath: ./lychee/out.md
        labels: |
          bug
          links

Advanced GitHub Actions Setup

For more control, you can use the manual approach:

name: Comprehensive Link Check

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # Weekly on Mondays

jobs:
  link-check:
    runs-on: ubuntu-latest
    
    steps:
    - name: Checkout
      uses: actions/checkout@v4
      
    - name: Setup Rust
      uses: actions-rs/toolchain@v1
      with:
        toolchain: stable
        
    - name: Install dependencies
      run: |
        cargo install marmite
        cargo install lychee
        
    - name: Build site
      run: marmite content ./public
      
    - name: Check links
      run: |
        lychee --verbose ./public \
          --extensions html,md \
          --exclude "linkedin|twitter|facebook|mailto:" \
          --format markdown \
          --output ./link-check-results.md
          
    - name: Upload results
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: link-check-results
        path: ./link-check-results.md

Docker Usage

You can also run Lychee using Docker, which is useful for consistent environments:

# Build your site first
marmite content ./public

# Run Lychee in Docker
docker run --rm -v "$(pwd)":/workspace lycheeverse/lychee \
  --verbose /workspace/public \
  --extensions html \
  --exclude "linkedin|twitter"

Best Practices

1. Regular Automated Checks

Set up GitHub Actions to run link checks:

  • On every push to main branch
  • On pull requests
  • Weekly scheduled runs to catch external link rot

2. Reasonable Exclusions

Exclude links that are expected to fail in automated environments:

  • Social media sites that block bots
  • Authentication-required content
  • Rate-limited APIs
  • Local development URLs

3. Handle Rate Limiting

Some sites may rate-limit your requests. Lower the number of concurrent requests and leave a wait between retries:

# In lychee.toml
timeout = 30
max_concurrency = 8
retry_wait_time = 5

4. Monitor External Dependencies

Keep track of which external sites your content links to, as these are outside your control and may break over time.
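
One low-tech way to see which external hosts the built site depends on is to count them in the generated HTML. This is only a sketch: it assumes links appear as double-quoted href attributes, and list_external_hosts is a made-up name.

```shell
# list_external_hosts is a hypothetical helper. It prints each external
# host found in href="http(s)://..." attributes under the given directory,
# prefixed by how many times it is linked, most frequent first.
list_external_hosts() {
  find "$1" -type f -name '*.html' \
      -exec grep -hoE 'href="https?://[^"/]+' {} + \
    | sed -E 's|^href="https?://||' \
    | sort | uniq -c | sort -rn
}

# Example (not run here):
#   list_external_hosts ./public
```

Hosts near the top of the list are the ones whose outages or redesigns would break the most links.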

Troubleshooting

Common Issues

SSL Certificate Errors:

lychee --insecure ./public  # Skip SSL verification (use carefully)

Rate Limiting:

lychee --max-concurrency 2 --retry-wait-time 5 ./public  # Fewer parallel requests, at least 5 seconds between retries

Timeout Issues:

lychee --timeout 60 ./public  # Increase timeout to 60 seconds

Debugging Failed Links

Raise the verbosity level to see detailed information about failed links:

lychee -vv ./public

This will show you exactly why each link failed, helping you decide whether to fix the link or exclude it.

Integration with Marmite Workflow

Here's a complete workflow for maintaining link quality in your Marmite site:

#!/bin/bash
# build-and-check.sh
set -euo pipefail

# Build the site (the script stops here if the build fails)
echo "Building Marmite site..."
marmite content ./public

# Check links and branch on Lychee's exit status directly
echo "Checking links..."
if lychee --verbose ./public \
  --extensions html \
  --exclude "linkedin|twitter|facebook|localhost|127\.0\.0\.1" \
  --exclude-mail \
  --output link-check-results.txt; then
    echo "✅ All links are working!"
else
    echo "❌ Some links are broken. Check link-check-results.txt"
    exit 1
fi

Make it executable and use it in your build process:

chmod +x build-and-check.sh
./build-and-check.sh
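
A non-zero exit status does not always mean broken links; it can also signal a bad input path or a configuration problem. The sketch below maps exit codes to messages. The mapping is an assumption based on the exit codes listed in Lychee's README, and describe_lychee_exit is a hypothetical helper, so check both against the version you have installed.

```shell
# describe_lychee_exit is a hypothetical helper. The code-to-meaning mapping
# is an assumption taken from the exit codes listed in Lychee's README.
describe_lychee_exit() {
  case "$1" in
    0) echo "all links OK" ;;
    1) echo "runtime or input error" ;;
    2) echo "broken links found" ;;
    3) echo "config file error" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Example (not run here):
#   lychee ./public && status=0 || status=$?
#   describe_lychee_exit "$status"
```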

Conclusion

Lychee provides a robust solution for maintaining link quality in your Marmite-generated websites. By integrating it into your development workflow and CI/CD pipeline, you can catch broken links early and maintain a high-quality user experience.

The combination of Marmite's fast site generation and Lychee's efficient link checking creates a powerful workflow for maintaining professional static websites with confidence in their link integrity.

