
How to Limit Wget Recursion Depth on Ubuntu Linux

This article provides a concise guide on restricting the directory levels wget traverses while downloading websites on Ubuntu. It covers the essential command-line flags required to set a specific recursion limit, ensuring you only retrieve the necessary data without consuming excessive bandwidth or storage space.

The Core Command Flags

To control how deep wget goes into a website’s link structure, you need two flags. The -r flag enables recursive downloading, and the -l flag sets the maximum depth level. If you pass -r without -l, wget uses a default maximum depth of 5.

The basic syntax is:

wget -r -l [depth] [URL]

Replace [depth] with an integer representing the number of levels you wish to download. Replace [URL] with the target website address.
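As a minimal sketch of the syntax above, the following shell function checks the depth before assembling the command. The name build_wget_cmd is hypothetical and not part of wget, and the function only prints the command instead of running it, so nothing is downloaded.

```shell
# build_wget_cmd DEPTH URL: print the wget command for a depth-limited download.
# build_wget_cmd is a hypothetical helper name used only for this illustration.
build_wget_cmd() {
  depth=$1
  url=$2
  # Reject empty values and anything containing a non-digit character.
  case $depth in
    ''|*[!0-9]*)
      echo "depth must be a positive integer, got: '$depth'" >&2
      return 1
      ;;
  esac
  # This sketch only accepts explicit limits of 1 or more.
  if [ "$depth" -lt 1 ]; then
    echo "depth must be at least 1 in this sketch" >&2
    return 1
  fi
  echo "wget -r -l $depth $url"
}

build_wget_cmd 2 https://example.com   # prints: wget -r -l 2 https://example.com
```

To run the download rather than print it, the final echo line could be replaced with a direct call to wget using the same arguments.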

Setting a Specific Depth Limit

If you want to download the homepage and only the pages it links to directly, set the level to 1. For a slightly broader download that also includes the pages linked from those pages, set the level to 2.

Execute the following command in your Ubuntu terminal:

wget -r -l 2 https://example.com

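In this scenario, wget downloads the starting page, the pages it links to, and the pages those link to, then stops. Links found on pages two levels away from the start are not followed, which keeps the download from spreading across the rest of the site.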
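To make the depth counting concrete, here is a toy model in shell that walks a small, made-up link map level by level, the way a depth-limited crawl would. It does not call wget or touch the network. The pages listed in links_on and both function names are assumptions for illustration only, and wget's real traversal also applies other rules (such as host and robots restrictions) that this sketch ignores.

```shell
# links_on PAGE: print the links found on PAGE in a hypothetical site.
links_on() {
  case $1 in
    /)   echo "/a /b" ;;
    /a)  echo "/a1" ;;
    /b)  echo "/b1" ;;
    /a1) echo "/a2" ;;
    *)   echo "" ;;
  esac
}

# crawl START MAX_DEPTH: print "depth page" for every page that would be fetched.
crawl() {
  max_depth=$2
  current=$1        # pages at the current depth
  seen=" $1 "       # space-delimited list of pages already queued
  depth=0
  while [ -n "$current" ]; do
    next_level=""
    for page in $current; do
      echo "$depth $page"
      # Links on pages at the maximum depth are not followed.
      [ "$depth" -ge "$max_depth" ] && continue
      for link in $(links_on "$page"); do
        case $seen in
          *" $link "*) ;;   # already queued, skip it
          *) seen="$seen$link "; next_level="$next_level $link" ;;
        esac
      done
    done
    current=$next_level
    depth=$((depth + 1))
  done
}

crawl / 2
# prints:
# 0 /
# 1 /a
# 1 /b
# 2 /a1
# 2 /b1
```

The page /a2 is three links away from the start, so a limit of 2 leaves it out.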

Preventing Parent Directory Access

When downloading a specific section of a site, you often want to prevent wget from moving up to parent directories. Combine the depth limit with the -np (no-parent) flag.

wget -r -l 2 -np https://example.com/documents/

This ensures wget stays within the /documents/ directory and respects the two-level depth restriction.
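One detail worth guarding against: over HTTP, wget relies on the trailing slash to tell that /documents/ is a directory. Without it, documents is treated as a file name and -np restricts the crawl to its parent instead. The sketch below normalizes a URL before it is passed to wget. The helper name with_trailing_slash is hypothetical, and it assumes the URL really does point at a directory (no file name or query string at the end).

```shell
# with_trailing_slash URL: print URL, adding a final "/" if it is missing.
# Assumes URL refers to a directory, not a file or a URL with a query string.
with_trailing_slash() {
  case $1 in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

with_trailing_slash https://example.com/documents   # prints: https://example.com/documents/
```

The result can then be used as the target, for example: wget -r -l 2 -np "$(with_trailing_slash https://example.com/documents)"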

Unlimited Recursion

If you intend to download the entire site without any depth restriction, set the level to inf. Use this with caution, as it can download a very large amount of data.

wget -r -l inf https://example.com
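If you do remove the depth limit, wget's --wait, --limit-rate, and --quota options offer one way to keep the run in check. The sketch below only prints such a command. The cap values and the helper name unlimited_cmd are assumptions chosen for illustration, not recommendations.

```shell
# Hypothetical caps for illustration; tune them for the site and your disk.
WAIT_SECONDS=1      # pause between requests (--wait)
RATE_LIMIT=200k     # bandwidth cap per second (--limit-rate)
QUOTA=500m          # total download quota for the recursive run (--quota)

# unlimited_cmd URL: print an unlimited-depth wget command with the caps above.
unlimited_cmd() {
  echo "wget -r -l inf --wait=$WAIT_SECONDS --limit-rate=$RATE_LIMIT --quota=$QUOTA $1"
}

unlimited_cmd https://example.com
# prints: wget -r -l inf --wait=1 --limit-rate=200k --quota=500m https://example.com
```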

Saving Files for Local Viewing

When you mirror a site with limited recursion, the links inside the downloaded pages still point to the live website. To adjust them so they work locally on your Ubuntu machine, add the -k (convert-links) flag to your command.

wget -r -l 2 -k https://example.com

This converts the links in the downloaded files to point to the local directory structure created by wget.
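With a depth limit in place, some linked pages are never downloaded, and wget leaves links to those pages pointing at the live site as absolute URLs. A rough way to see which saved files still reference the live host is sketched below. The helper name files_linking_to_live_site is hypothetical, and the sketch assumes a grep that supports -r, as GNU grep on Ubuntu does.

```shell
# files_linking_to_live_site DIR HOST: list files under DIR that still contain
# absolute links to HOST. Hypothetical helper; DIR is the directory wget
# created, which by default is named after the host.
files_linking_to_live_site() {
  # -r: search recursively, -l: print matching file names only,
  # -F: treat the pattern as a fixed string rather than a regular expression.
  grep -rlF "://$2" "$1"
}

# Example call after a download (assumes the directory example.com exists):
# files_linking_to_live_site example.com example.com
```

Any file it lists links to at least one page beyond the depth limit, which is expected and not an error.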