Exclude File Types with Wget Recursive Download on Ubuntu
This guide explains how to use the wget command on Ubuntu to recursively download website content while filtering out unwanted file extensions. You will learn the flags that reject or accept particular file types, so you save only the data you need and avoid cluttering your storage with images, videos, or documents you do not want.
To begin, ensure wget is installed on your Ubuntu system. Open your terminal and run the following commands to install it if it is not already present:
sudo apt update
sudo apt install wget
The core functionality relies on the -r flag for recursive downloading and the -R flag to reject specific file types. The -R flag accepts a comma-separated list of file extensions you wish to ignore. For example, if you want to download a site but exclude all PDF and JPG files, use this command:
wget -r -R pdf,jpg https://example.com
You can also use the -A flag to accept only specific file types, which effectively excludes everything else. If you only want to download HTML and CSS files, the command would look like this:
wget -r -A html,css https://example.com
It is important to note that extension matching is case-sensitive by default. If the server uses uppercase extensions like .PDF or .JPG, a reject list of pdf,jpg will not match them. You can pass the --ignore-case option to make -A and -R matching case-insensitive, or list both cases explicitly:
wget -r -R pdf,jpg,PDF,JPG https://example.com
When combining these flags, wget will process the links found on the page and download only the files that match your criteria. This method prevents unnecessary bandwidth usage and keeps your local directory clean during large site mirroring operations.
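If you maintain long reject lists, typing every extension in both cases gets tedious. The following is a minimal sketch of a shell helper that generates the lowercase and uppercase variants for you. The function names both_cases and fetch_without are hypothetical, chosen for illustration; they are not part of wget.

```shell
#!/bin/sh
# Hypothetical helper: print each given extension in lower and upper case
# as a single comma-separated list suitable for wget's -R or -A flag.
both_cases() {
    list=""
    for ext in "$@"; do
        lower=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
        upper=$(printf '%s' "$ext" | tr '[:lower:]' '[:upper:]')
        # Add a separating comma only when the list already has entries.
        list="${list:+$list,}$lower,$upper"
    done
    printf '%s\n' "$list"
}

# Hypothetical wrapper: recursively download a URL while rejecting
# the given extensions in both cases.
# Usage: fetch_without URL EXT [EXT...]
fetch_without() {
    url="$1"
    shift
    wget -r -R "$(both_cases "$@")" "$url"
}

# Example call (commented out so that sourcing this file downloads nothing):
# fetch_without https://example.com pdf jpg
```

The helper only covers all-lowercase and all-uppercase spellings, so a mixed-case extension such as .Pdf would still slip through; the --ignore-case option of wget handles that situation more generally.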