How to Limit Wget Recursion Depth on Ubuntu Linux
This article provides a concise guide to limiting how many levels of links wget follows while downloading websites on Ubuntu. It covers the essential command-line flags required to set a specific recursion limit, ensuring you only retrieve the necessary data without consuming excessive bandwidth or storage space.
The Core Command Flags
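To control how far wget follows links from the starting page,
you need two specific flags. The -r flag enables recursive
downloading, and the -l flag sets the maximum depth
level. For HTTP and HTTPS URLs, the depth counts link hops from the
starting URL rather than nested directories. If you use -r without
-l, wget defaults to a maximum depth of 5.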
The basic syntax is:
wget -r -l [depth] [URL]
Replace [depth] with an integer representing the number
of levels you wish to download. Replace [URL] with the
target website address.
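The syntax above can be wrapped in a small helper that checks the depth before anything is downloaded. This is a minimal sketch: build_wget_cmd is a hypothetical function name, not part of wget, and it only prints the command so you can review it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): prints the wget command for a
# given depth and URL instead of running it.
build_wget_cmd() {
    depth="$1"
    url="$2"

    # Accept the literal "inf" or a non-negative integer; reject anything else.
    case "$depth" in
        inf) ;;
        ''|*[!0-9]*)
            echo "depth must be a non-negative integer or 'inf'" >&2
            return 1
            ;;
    esac

    if [ -z "$url" ]; then
        echo "usage: build_wget_cmd DEPTH URL" >&2
        return 1
    fi

    printf 'wget -r -l %s %s\n' "$depth" "$url"
}

build_wget_cmd 2 https://example.com
# prints: wget -r -l 2 https://example.com
```

Printing rather than executing keeps the sketch safe to try; once the command looks right, paste it into the terminal.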
Setting a Specific Depth Limit
If you want to download the homepage and only the links directly found on that page, set the level to 1. To also include the pages linked from those first-level pages, set the level to 2.
Execute the following command in your Ubuntu terminal:
wget -r -l 2 https://example.com
In this scenario, wget downloads the starting page, the pages it links to (level 1), and the pages those link to (level 2). It does not follow links found on the level-2 pages, which keeps the download from growing to cover the entire site.
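To make the depth counting concrete, here is a toy model of a breadth-first walk with a depth limit. It is a conceptual sketch only, not wget's implementation: the link table in links_of describes a made-up site, and crawl is a hypothetical function that prints which pages would be reached at each level.

```shell
#!/bin/sh
# Toy link table for a made-up site: prints the links found on a page.
links_of() {
    case "$1" in
        /)    echo "/a /b" ;;
        /a)   echo "/a/1" ;;
        /b)   echo "/b/1" ;;
        /a/1) echo "/a/1/x" ;;
        *)    echo "" ;;
    esac
}

# Breadth-first walk from a start page, stopping after max_depth link hops.
crawl() {
    start="$1"
    max_depth="$2"
    frontier="$start"
    seen=" $start "
    depth=0
    echo "depth 0: $start"
    while [ "$depth" -lt "$max_depth" ] && [ -n "$frontier" ]; do
        depth=$((depth + 1))
        next=""
        for page in $frontier; do
            for link in $(links_of "$page"); do
                case "$seen" in
                    *" $link "*) ;;   # already visited, skip
                    *) seen="$seen$link "; next="$next $link" ;;
                esac
            done
        done
        next="${next# }"
        [ -n "$next" ] && echo "depth $depth: $next"
        frontier="$next"
    done
}

crawl / 2
# prints:
# depth 0: /
# depth 1: /a /b
# depth 2: /a/1 /b/1
```

The page /a/1/x is linked only from a level-2 page, so a limit of 2 never reaches it.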
Preventing Parent Directory Access
When downloading a specific section of a site, you often want to
prevent wget from moving up to parent directories. Combine the depth
limit with the -np (no-parent) flag.
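wget -r -l 2 -np https://example.com/documents/
This ensures wget stays within the /documents/ directory
and respects the two-level depth restriction. Keep the trailing slash
on the URL: without it, wget treats "documents" as a file in the site
root, so -np no longer confines the download to /documents/.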
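The no-parent rule can be approximated with a simple prefix check. The sketch below is a simplified illustration, not wget's actual logic; stays_below is a hypothetical helper that treats everything up to the last slash of the starting URL as the directory to stay within.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when url sits at or below the directory
# of start_url (everything up to and including its last slash).
stays_below() {
    start_url="$1"
    url="$2"
    base="${start_url%/*}/"   # drop whatever follows the last slash
    case "$url" in
        "$base"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# With the trailing slash, the base is https://example.com/documents/
stays_below https://example.com/documents/ https://example.com/documents/a/b.html \
    && echo "followed"
stays_below https://example.com/documents/ https://example.com/index.html \
    || echo "skipped"
```

Calling it with https://example.com/documents (no trailing slash) yields the base https://example.com/, which is why the slash matters when using -np.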
Unlimited Recursion
If you intend to download the entire site without any depth
restrictions, you can set the level to inf (a level of 0 has the same
effect). Use this with caution, as it may download a large amount of data.
wget -r -l inf https://example.com
Saving Files for Local Viewing
When mirroring a site with limited recursion, the links may still
point to the live website. To adjust the links so they work locally on
your Ubuntu machine, add the -k flag to your command.
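wget -r -l 2 -k https://example.com
After the download finishes, this rewrites links to files that were downloaded so they point to the local directory structure created by wget. Links to pages that were not downloaded, such as those beyond the depth limit, are rewritten as absolute URLs pointing to the live site.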
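After a depth-limited download, it can be useful to see how many links in the saved pages still point to the live site. The following is a rough sketch: count_live_links is a hypothetical helper, it assumes GNU grep as shipped with Ubuntu, and it treats the host name as a regular expression, so the dots match any character.

```shell
#!/bin/sh
# Hypothetical helper: counts href/src attributes under dir whose value
# is an absolute http(s) URL on the given host.
count_live_links() {
    dir="$1"
    host="$2"
    grep -rhoE "(href|src)=[\"']https?://$host[^\"']*" "$dir" | wc -l | tr -d ' '
}

# Example usage, assuming wget saved the site under ./example.com:
# count_live_links ./example.com example.com
```

A non-zero count is expected when the depth limit cut the crawl short, since those links lead to pages that were never saved.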