How to Mirror a Website Exactly Using Wget on Ubuntu
This guide explains how to use the wget command-line tool on Ubuntu to create a complete local mirror of a website. You will learn the specific flags required to download all pages, images, and assets recursively while maintaining the original directory structure. By following these steps, you can browse the mirrored site offline exactly as it appeared online.
Prerequisites
Ensure wget is installed on your Ubuntu system. It is typically
pre-installed; running wget --version confirms this. If the command is
missing, install it from the terminal:
sudo apt update
sudo apt install wget
The Mirror Command
To mirror a website exactly, you need to combine several flags that handle recursion, file conversion, and asset downloading. Open your terminal and use the following command structure:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com
Understanding the Flags
Each option in the command serves a specific purpose to ensure the local copy functions correctly:
--mirror (-m): Turns on options suitable for mirroring, including recursion with unlimited depth and time-stamping.
--convert-links (-k): Once the download finishes, converts links in the downloaded files so they point to the local copies rather than the live website.
--adjust-extension (-E): Saves HTML and CSS files with the matching extension (e.g., .html) when the URL does not already end in one.
--page-requisites (-p): Downloads the files needed to display each page properly, such as CSS, images, and JavaScript.
--no-parent (-np): Ensures wget never ascends to the parent directory, restricting the download to the starting directory and everything below it.
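The short forms listed above can be combined into a more compact command. The sketch below only builds and prints two spellings of the same command: one with the short flags, and one with --mirror replaced by the options the wget manual currently gives as its equivalent. The variable names are illustrative, and https://example.com is the placeholder URL used in this guide.

```shell
# Placeholder URL used throughout the guide.
url="https://example.com"

# Short-flag spelling of the mirror command.
short_cmd="wget -m -k -E -p -np $url"

# Same command with --mirror written out as the options the wget manual
# lists as its current equivalent.
expanded_cmd="wget -r -N -l inf --no-remove-listing -k -E -p -np $url"

# Print the commands rather than running them, so nothing is downloaded.
echo "$short_cmd"
echo "$expanded_cmd"
```

Either printed line can be pasted into the terminal to start the actual download.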
Respecting Server Rules
When mirroring a website, it is crucial to respect the server’s
resources and rules. Check the robots.txt file located at
https://example.com/robots.txt to see if mirroring is
allowed. To prevent overloading the server, you can add a delay between
requests:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --wait=2 --random-wait https://example.com
The --wait=2 option adds a 2-second delay between requests, and
--random-wait varies each delay between 0.5 and 1.5 times that value so
requests do not arrive at a fixed interval.
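Reading a long robots.txt by eye can be tedious. As a rough sketch, the helper below prints every Disallow path found in robots.txt content passed on standard input. The name list_disallow is hypothetical and not part of wget, and the function deliberately ignores User-agent groups, so it shows all restricted paths rather than only those that apply to wget.

```shell
# Print the Disallow paths from robots.txt content supplied on stdin.
# Simplification: User-agent grouping is ignored; every Disallow rule is listed.
list_disallow() {
    grep -i '^[[:space:]]*disallow:' | sed 's/^[^:]*:[[:space:]]*//'
}

# Usage against the live file (example.com is the guide's placeholder):
#   wget -qO- https://example.com/robots.txt | list_disallow
```

During recursive downloads wget also honors robots.txt by default, so this check is mainly useful for seeing in advance which parts of the site will be skipped.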
Viewing the Mirrored Site
Once the command finishes, a folder named after the website domain
will appear in your current directory. Navigate into this folder and
open the index.html file with your web browser to browse
the site offline.
cd example.com
firefox index.html
All links should now point to your local files, allowing you to navigate the mirrored structure without an internet connection.
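When the mirror is started from a script, the folder name can be derived from the URL instead of typed by hand. This is a minimal sketch, assuming a simple URL of the form scheme://host/path with no credentials; mirror_dir is a hypothetical helper name.

```shell
# Derive the directory name wget uses for a mirrored URL.
# Assumes a simple URL of the form scheme://host/path with no credentials.
mirror_dir() {
    host=${1#*://}     # strip the scheme, e.g. "https://"
    host=${host%%/*}   # strip the path after the host name
    printf '%s\n' "$host"
}

# Prints: example.com
mirror_dir "https://example.com/docs/index.html"
```

The result can then be used as cd "$(mirror_dir https://example.com)". On a desktop system, xdg-open index.html opens the page in the default browser if Firefox is not installed.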