How to Download First Level Links with Wget on Ubuntu
This guide explains how to use the wget command on Ubuntu to download only the first level of links from a specific webpage. You will learn the flags needed to limit recursion depth, so that you capture directly linked pages without downloading the entire site structure. This shallow mirroring saves bandwidth and storage space.
Install Wget
Most Ubuntu installations come with wget pre-installed. To verify installation or install the tool if it is missing, open your terminal and run the following command:
sudo apt update
sudo apt install wget
The Core Command
To download a webpage and only follow the links found directly on that page without going deeper, use the recursive flag combined with the level flag. Execute the following command in your terminal, replacing the URL with your target website:
wget -r -l 1 https://example.com
Understanding the Flags
- -r: Enables recursive downloading, allowing wget to follow links.
- -l 1: Sets the recursion depth to 1. This tells wget to download the starting page and any pages linked directly from it, but not to follow links found on those subsequent pages.
- -p: (Optional) Download all files necessary to properly display the page, such as images and CSS.
- -k: (Optional) Convert links in downloaded files to make them suitable for local viewing.
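One caveat worth knowing: by default wget stays on the starting host, so first-level links that point to other domains are not followed. If you do want external links fetched as well, wget's span-hosts option enables this; the command below is a sketch against the same example URL, and note that spanning hosts can greatly increase the download size.

```shell
# Follow first-level links even when they lead to other domains.
# -H (--span-hosts) is off by default; use with care.
wget -r -l 1 -H https://example.com
```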
Complete Example
For a complete local mirror of the first level of links including assets, use this command:
wget -r -l 1 -p -k https://example.com
Important Considerations
Always respect the website’s robots.txt file and terms of service. Aggressive downloading can strain server resources; if you run into problems, add a delay between requests with the --wait flag (it takes a number of seconds) to reduce server load.
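As a sketch, here is the complete command from above made more polite: --wait=2 pauses two seconds between successive requests, and --random-wait varies that delay so the traffic looks less mechanical. The two-second value is an arbitrary choice for illustration.

```shell
# First-level mirror with page assets and local link conversion,
# pausing a randomized ~2 seconds between requests:
wget -r -l 1 -p -k --wait=2 --random-wait https://example.com
```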