Download Entire Website Recursively with Wget on Ubuntu
This guide explains how to use the wget command-line utility on Ubuntu to download entire websites recursively. You will learn the specific syntax required to mirror site structures, follow links, and save pages for offline browsing without manual intervention.
To begin, open a terminal on your Ubuntu system and ensure wget is installed by running sudo apt install wget. The basic recursive download is built around the --mirror flag, which enables infinite-depth recursion and timestamping.
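For example, you can install wget and confirm it is available like this:

sudo apt update
sudo apt install wget
wget --version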
The most effective command syntax is:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com

Here is the breakdown of the flags used in this command:

--mirror: Turns on options suitable for mirroring, including infinite recursion and timestamping.
--convert-links: Converts links in downloaded HTML files to make them suitable for local viewing.
--adjust-extension: Saves files with the proper extension (e.g., .html) if the server does not provide it.
--page-requisites: Downloads all necessary files to display the page properly, such as CSS and images.
--no-parent: Ensures wget does not ascend to the parent directory, keeping the download restricted to the specified path (see the example after this list).
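To see the effect of --no-parent, point wget at a subdirectory rather than the site root (the /docs/ path below is only a placeholder):

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com/docs/

Here wget follows links within /docs/ but never ascends to the rest of example.com.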
Replace https://example.com with the actual URL of the website you wish to download. The files will be saved in a directory named after the domain, created in your current working directory.
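For instance, mirroring https://example.com produces a layout along these lines (the actual file names depend entirely on the site):

example.com/
    index.html
    css/style.css
    images/logo.png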
It is important to respect website owners and avoid overloading their servers. Add a wait interval between requests so the crawl does not strain the site; to insert a 1-second delay between each request, include the --wait=1 flag in your command. Always check the site's robots.txt file first to confirm you are permitted to crawl and download the content.
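Putting it all together, a polite mirror command looks like this:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --wait=1 https://example.com

To review a site's robots.txt before crawling, you can print it to the terminal with wget itself (-q suppresses progress output, -O- writes the file to standard output):

wget -qO- https://example.com/robots.txt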