Download Entire Website Recursively with Wget on Ubuntu
This guide explains how to use the wget command-line utility on Ubuntu to download entire websites recursively. You will learn the specific syntax required to mirror site structures, follow links, and save pages for offline browsing without manual intervention.
To begin, open a terminal on your Ubuntu system and ensure wget is installed by running sudo apt install wget. The basic recursive download is built around the --mirror flag, which enables infinite-depth recursion and timestamping.
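For example, you can install wget and confirm it is available like this:

sudo apt update
sudo apt install wget
wget --version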
The most effective command syntax is:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com

Here is the breakdown of the flags used in this command:

--mirror: Turns on options suitable for mirroring, including infinite recursion and timestamping.
--convert-links: Converts links in downloaded HTML files to make them suitable for local viewing.
--adjust-extension: Saves files with the proper extension (e.g., .html) if the server does not provide it.
--page-requisites: Downloads all necessary files to display the page properly, such as CSS and images.
--no-parent: Ensures wget does not ascend to the parent directory, keeping the download restricted to the specified path (see the example after this list).
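To see the effect of --no-parent, point wget at a subdirectory rather than the site root (the /docs/ path below is only a placeholder):

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com/docs/

Here wget follows links within /docs/ but never ascends to the rest of example.com.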
Replace https://example.com with the actual URL of the website you wish to download. The files will be saved in a directory named after the domain, created in your current working directory.
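For instance, mirroring https://example.com produces a layout along these lines (the actual file names depend entirely on the site):

example.com/
    index.html
    css/style.css
    images/logo.png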
It is important to respect website owners and avoid overloading their servers. Add a wait interval between requests so the crawl does not strain the site; to insert a 1-second delay between each request, include the --wait=1 flag in your command. Always check the site's robots.txt file first to confirm you are permitted to crawl and download the content.
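Putting it all together, a polite mirror command looks like this:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --wait=1 https://example.com

To review a site's robots.txt before crawling, you can print it to the terminal with wget itself (-q suppresses progress output, -O- writes the file to standard output):

wget -qO- https://example.com/robots.txt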