
How to Mirror a Website Exactly Using Wget on Ubuntu

This guide explains how to use the wget command-line tool on Ubuntu to create a complete local mirror of a website. You will learn the specific flags required to download all pages, images, and assets recursively while maintaining the original directory structure. By following these steps, you can browse the mirrored site offline as it appeared online. Wget does not run JavaScript, so content that a page loads dynamically with scripts may not be captured.

Prerequisites

Ensure you have wget installed on your Ubuntu system. It is typically pre-installed, but you can verify or install it using the terminal:

sudo apt update
sudo apt install wget
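Before installing anything, you can check whether wget is already available. The sketch below defines a hypothetical helper named check_wget. It prints the first line of the installed version information if wget is found, and otherwise prints the install command without running it, so it makes no changes to your system.

```shell
#!/bin/sh
# Hypothetical helper: report whether wget is installed, without changing anything.
check_wget() {
  if command -v wget >/dev/null 2>&1; then
    # The first line of --version output names the installed release.
    wget --version | head -n 1
  else
    echo "wget not found; install it with: sudo apt install wget"
  fi
}

check_wget
```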

The Mirror Command

To mirror a website exactly, you need to combine several flags that handle recursion, file conversion, and asset downloading. Open your terminal and use the following command structure:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com
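If you mirror sites often, you might wrap this command in a shell function. The sketch below defines a hypothetical mirror_site function that passes any extra arguments straight through to wget. The WGET variable is an assumption of this sketch, not a wget feature: setting it to echo turns the call into a dry run that prints the full command instead of downloading anything.

```shell
#!/bin/sh
# Hypothetical wrapper around the mirror command.
# WGET defaults to the real wget binary; set WGET=echo to print the
# command instead of running it (a dry run).
mirror_site() {
  local url="$1"
  shift  # any remaining arguments are passed through to wget
  "${WGET:-wget}" --mirror --convert-links --adjust-extension \
    --page-requisites --no-parent "$@" "$url"
}

# Usage (the first two lines require network access):
#   mirror_site https://example.com
#   mirror_site https://example.com --wait=2 --random-wait
#   WGET=echo mirror_site https://example.com   # dry run
```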

Understanding the Flags

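Each option in the command serves a specific purpose to ensure the local copy functions correctly:

--mirror (-m): turns on recursion and time-stamping, sets an unlimited recursion depth, and keeps FTP directory listings. It is equivalent to -r -N -l inf --no-remove-listing.

--convert-links (-k): after the download finishes, rewrites the links in the downloaded pages so that links to downloaded files become relative links to the local copies. Links to files that were not downloaded are rewritten as full URLs pointing to the live site.

--adjust-extension (-E): adds a .html suffix to HTML pages (and a .css suffix to stylesheets) whose URLs lack one, so the files open correctly from disk.

--page-requisites (-p): downloads every file needed to display each page, such as inline images and referenced stylesheets.

--no-parent (-np): never ascends above the starting directory, so mirroring https://example.com/docs/ stays within /docs/.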

Respecting Server Rules

When mirroring a website, respect the server's resources and rules. Check the robots.txt file at https://example.com/robots.txt to see which paths the site owner asks crawlers to avoid; wget obeys these rules by default during recursive downloads. To avoid overloading the server, add a delay between requests:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --wait=2 --random-wait https://example.com

The --wait=2 option makes wget pause for 2 seconds between requests, and --random-wait varies each pause between 0.5 and 1.5 times that value (1 to 3 seconds here), so requests do not arrive at perfectly regular intervals.
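To avoid retyping these options, you could store them in a wget startup file and point wget at it with --config. The sketch below writes a file named mirror.wgetrc (the name is an assumption; any path works) containing the wgetrc equivalents of the flags used above. Keep in mind that --config makes wget read this file instead of its default startup files.

```shell
#!/bin/sh
# Write a startup file holding the mirroring and polite-crawling settings.
# Each setting is the wgetrc equivalent of one command-line flag.
cat > mirror.wgetrc <<'EOF'
# --mirror --convert-links --adjust-extension --page-requisites --no-parent
mirror = on
convert_links = on
adjust_extension = on
page_requisites = on
no_parent = on
# --wait=2 --random-wait
wait = 2
random_wait = on
EOF

# Then mirror with those settings (requires network access):
#   wget --config=mirror.wgetrc https://example.com
```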

Viewing the Mirrored Site

Once the command finishes, a folder named after the website domain will appear in your current directory. Navigate into this folder and open the index.html file with your web browser to browse the site offline.

cd example.com
firefox index.html

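Links to pages and assets that were downloaded now point to your local files, so you can navigate the mirrored structure without an internet connection. Links to anything that was not downloaded, such as pages on other domains, are rewritten as full URLs and still require a connection.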
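To see which pages still reference the live site, you could search the mirror for absolute URLs containing the original domain. The sketch below defines a hypothetical helper, find_unconverted_links, that lists HTML files still containing "://" followed by the domain. It assumes GNU grep (standard on Ubuntu) for the --include option, and it only inspects .html files, so a match is a hint worth checking rather than proof of a broken link.

```shell
#!/bin/sh
# Hypothetical helper: list HTML files in a mirror that still contain
# absolute URLs for the original domain (for example, links to pages
# that were not downloaded and so could not be pointed at a local file).
find_unconverted_links() {
  local dir="$1" domain="$2"
  # -r: search recursively, -l: print matching file names only,
  # -F: treat the pattern as a fixed string, not a regular expression.
  # sort gives a stable order when several files match.
  grep -rlF --include='*.html' "://$domain" "$dir" | sort
}

# Usage, from the directory that contains the mirror
# (first argument: mirror folder, second: original domain):
#   find_unconverted_links example.com example.com
```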