Batch downloading with wget (Linux/Unix)
Download from a URL list
# Create a URL list file, urls.txt, with one URL per line
# Then run:
wget -i urls.txt
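Batch download by pattern
# Download a numbered sequence of files
# (the {1..100} range is expanded by the shell, e.g. bash, not by wget)
wget http://example.com/images/{1..100}.jpg
# wget does not expand wildcards in HTTP URLs (globbing only works for FTP).
# For HTTP, crawl the directory index one level deep and keep only the .zip files;
# this requires the server to expose a listing page with links to the files.
wget -r -l1 -np -nd -A "*.zip" http://example.com/files/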
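When the shell in use has no brace expansion, or the list should be saved for `wget -i`, the numbered URLs can be generated with a few lines of Python. This is only a minimal sketch: the URL template and the helper names `build_urls` and `write_url_list` are illustrative assumptions, not part of wget.

```python
from pathlib import Path


def build_urls(template: str, start: int, stop: int) -> list[str]:
    """Expand a template such as '.../{n}.jpg' for n = start..stop (inclusive)."""
    return [template.format(n=n) for n in range(start, stop + 1)]


def write_url_list(urls: list[str], path: str) -> None:
    """Write one URL per line, the format `wget -i` reads."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")


# Hypothetical template mirroring the numbered example above
urls = build_urls("http://example.com/images/{n}.jpg", 1, 100)
print(len(urls), urls[0], urls[-1])
```

Calling `write_url_list(urls, "urls.txt")` and then running `wget -i urls.txt` would fetch the same sequence.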
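curl batch download script
#!/bin/bash
# download_images.sh
for i in $(seq 1 100); do
    url="https://example.com/images/image_$i.jpg"
    # -f: treat HTTP errors as failures; -L: follow redirects; -O: keep the remote file name
    if curl -fLO "$url"; then
        echo "Downloaded image $i"
    else
        echo "Failed to download image $i" >&2
    fi
    # Pause between requests to avoid being blocked
    sleep 1
done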
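Batch downloading with aria2 (recommended)
Installation
# Ubuntu/Debian
sudo apt install aria2
# macOS
brew install aria2
Usage
# Download from a file list
aria2c -i urls.txt
# Speed up with multiple connections:
# -x 16: up to 16 connections per server, -s 16: split each file into 16 pieces,
# -j 5: run up to 5 downloads concurrently
aria2c -x 16 -s 16 -j 5 -i urls.txt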
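The input file read by `aria2c -i` can also carry per-download options: each URI line may be followed by option lines that begin with whitespace, such as `dir=` and `out=`. The sketch below generates such a file. The URLs, the output names, and the helper `build_aria2_input` are hypothetical and serve only as an illustration.

```python
def build_aria2_input(entries: list[tuple[str, str]], out_dir: str) -> str:
    """Return aria2 input-file text: a URI line followed by indented option lines."""
    lines = []
    for url, filename in entries:
        lines.append(url)
        lines.append(f"  dir={out_dir}")   # option lines must start with whitespace
        lines.append(f"  out={filename}")  # local file name for this download
    return "\n".join(lines) + "\n"


# Hypothetical URLs and file names, for illustration only
entries = [
    ("https://example.com/file1.pdf", "report-1.pdf"),
    ("https://example.com/file2.pdf", "report-2.pdf"),
]
text = build_aria2_input(entries, "downloads")
print(text)
```

Saving `text` as urls.txt and running `aria2c -i urls.txt` should then store each file under the chosen name.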
Batch downloading with a Python script
import os
import urllib.parse
from concurrent.futures import ThreadPoolExecutor

import requests


def download_file(url, save_path):
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        # Make sure the target directory exists
        os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)
        with open(save_path, 'wb') as f:
            f.write(response.content)
        print(f"Downloaded: {url}")
        return True
    except Exception as e:
        print(f"Failed: {url} - {e}")
        return False


def batch_download(urls, save_dir, max_workers=5):
    """Download files in bulk using a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for url in urls:
            # Extract the file name from the URL path
            parsed = urllib.parse.urlparse(url)
            filename = os.path.basename(parsed.path)
            save_path = os.path.join(save_dir, filename)
            executor.submit(download_file, url, save_path)


# Usage example
urls = [
    "https://example.com/file1.pdf",
    "https://example.com/file2.pdf",
    # more URLs...
]
batch_download(urls, "./downloads", max_workers=3)
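Taking the file name straight from the URL path can yield an empty name (for URLs ending in `/`) or collisions when two URLs share a base name. One possible way to handle both cases is sketched here. The helpers `filename_from_url` and `unique_path` and the fallback name `download.bin` are assumptions made for illustration.

```python
import os
from urllib.parse import unquote, urlparse


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Derive a local file name from the URL path, ignoring query and fragment."""
    name = os.path.basename(unquote(urlparse(url).path))
    # Fall back to a default when the path has no usable last component
    return default if name in ("", ".", "..") else name


def unique_path(save_dir: str, filename: str, taken: set[str]) -> str:
    """Return a path in save_dir, appending _1, _2, ... if the name is already taken."""
    stem, ext = os.path.splitext(filename)
    candidate, counter = filename, 1
    while candidate in taken:
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    taken.add(candidate)
    return os.path.join(save_dir, candidate)


taken_names: set[str] = set()
for u in ["https://example.com/a/file1.pdf?token=abc", "https://example.com/b/file1.pdf"]:
    print(unique_path("downloads", filename_from_url(u), taken_names))
```

The `taken` set only tracks names within one run; checking for files that already exist on disk would be a separate step.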
Batch downloading with PowerShell (Windows)
# download_files.ps1
$urls = @(
    "https://example.com/file1.pdf",
    "https://example.com/file2.pdf"
)
$outputDir = "C:\Downloads"
# Create the output directory if it does not exist yet
New-Item -ItemType Directory -Force -Path $outputDir | Out-Null
foreach ($url in $urls) {
    $filename = Split-Path $url -Leaf
    $outputPath = Join-Path $outputDir $filename
    Write-Host "Downloading $filename..."
    Invoke-WebRequest -Uri $url -OutFile $outputPath
    # Pause between requests
    Start-Sleep -Seconds 1
}
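Scripts for special scenarios
Batch image download (with page parsing)
# Scrape the images referenced by a page
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def download_images_from_page(page_url, save_dir):
    response = requests.get(page_url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    os.makedirs(save_dir, exist_ok=True)
    # Find all image tags
    for img in soup.find_all('img'):
        src = img.get('src')
        if not src:
            continue
        # Resolve relative paths against the page URL
        img_url = urljoin(page_url, src)
        if not img_url.startswith('http'):
            continue  # skip data: URIs and other non-HTTP sources
        # Extract the file name from the URL path (ignoring any query string)
        filename = os.path.basename(urlparse(img_url).path)
        if not filename:
            continue
        save_path = os.path.join(save_dir, filename)
        # Download the image
        img_data = requests.get(img_url, timeout=30).content
        with open(save_path, 'wb') as f:
            f.write(img_data)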
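Image `src` attributes are frequently relative (`/static/a.png`, `b.jpg`), so they need to be resolved against the page URL before downloading. The sketch below shows only that extraction step and runs without network access. It swaps BeautifulSoup for the standard library's `html.parser` to stay dependency-free; the class `ImgSrcParser`, the function `image_urls`, and the sample HTML are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_urls(html: str, page_url: str) -> list[str]:
    """Return absolute image URLs, resolving relative src values against page_url."""
    parser = ImgSrcParser()
    parser.feed(html)
    return [urljoin(page_url, src) for src in parser.sources]


# Made-up markup covering root-relative, relative, absolute and missing src
sample = (
    '<img src="/static/a.png"><img src="b.jpg">'
    '<img src="https://cdn.example.com/c.gif"><img alt="no src">'
)
found = image_urls(sample, "https://example.com/gallery/index.html")
print(found)
```

The resulting list can be handed to any of the download routines in this article.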
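Utility scripts
Resumable download + multithreading
# Splits the file into byte ranges (the same HTTP Range mechanism used for
# resuming) and downloads the ranges in parallel threads.
import os
import threading

import requests


class Downloader:
    def __init__(self, url, filename, num_threads=4):
        self.url = url
        self.filename = filename
        self.num_threads = num_threads

    def download(self):
        # Get the file size (HEAD does not follow redirects unless asked to)
        response = requests.head(self.url, allow_redirects=True, timeout=30)
        file_size = int(response.headers.get('content-length', 0))
        if file_size == 0:
            raise RuntimeError("Server did not report Content-Length; cannot split into ranges")
        # Download in chunks
        chunk_size = file_size // self.num_threads
        threads = []
        for i in range(self.num_threads):
            start = i * chunk_size
            # The last chunk takes whatever remains
            end = start + chunk_size - 1 if i < self.num_threads - 1 else file_size - 1
            thread = threading.Thread(
                target=self._download_chunk,
                args=(start, end, f"{self.filename}.part{i}")
            )
            threads.append(thread)
            thread.start()
        # Wait for all threads to finish
        for thread in threads:
            thread.join()
        # Merge the parts
        self._merge_files()

    def _download_chunk(self, start, end, temp_file):
        headers = {'Range': f'bytes={start}-{end}'}
        response = requests.get(self.url, headers=headers, timeout=30)
        response.raise_for_status()
        with open(temp_file, 'wb') as f:
            f.write(response.content)

    def _merge_files(self):
        # Concatenate the parts in order, then delete them
        with open(self.filename, 'wb') as out:
            for i in range(self.num_threads):
                part = f"{self.filename}.part{i}"
                with open(part, 'rb') as f:
                    out.write(f.read())
                os.remove(part)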
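Resuming an interrupted download uses the same HTTP `Range` header: ask the server only for the bytes that are not yet on disk and append them to the partial file. The following is a minimal sketch under a few assumptions. It uses `urllib.request` from the standard library rather than `requests`, the function names are hypothetical, and it presumes the server answers ranged requests with status 206.

```python
import os
import urllib.request


def resume_request(url: str, path: str) -> urllib.request.Request:
    """Build a request that asks only for the bytes not yet stored in path."""
    request = urllib.request.Request(url)
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    if offset > 0:
        request.add_header("Range", f"bytes={offset}-")
    return request


def resume_download(url: str, path: str, block_size: int = 64 * 1024) -> None:
    """Append the missing bytes to path; start over if the server ignores Range."""
    request = resume_request(url, path)
    with urllib.request.urlopen(request, timeout=30) as response:
        # 206 means the Range header was honoured; 200 means the full body was sent
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as f:
            while True:
                block = response.read(block_size)
                if not block:
                    break
                f.write(block)
```

If the local file is already complete, a server will typically reply with 416 (Range Not Satisfiable), which `urlopen` raises as an `HTTPError`; a caller may want to catch that case and treat it as done.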
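Usage recommendations
- Rate limiting: add time.sleep() between requests to avoid triggering anti-scraping measures
- Error retries: add a retry mechanism
- Logging: record which files downloaded successfully and which failed
- Resuming: skip files that have already been downloaded
- Proxy support: use a proxy when necessary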
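Several of these recommendations can be combined in one small wrapper. The sketch below shows retries with exponential backoff, logging, skipping existing files, and a pause between requests. The actual download is passed in as a `fetch(url, path)` callable, which is a hypothetical hook and not an API of any library mentioned above; `with_retries` and `fetch_all` are likewise illustrative names.

```python
import logging
import os
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("batch_download")


def with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action(); on failure wait base_delay * 2**n and retry, up to `attempts` tries."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts - 1:
                log.error("giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** attempt
            log.warning("attempt %d failed (%s); retrying in %.1fs", attempt + 1, exc, delay)
            sleep(delay)


def fetch_all(urls, fetch, save_dir, pause=1.0, sleep=time.sleep):
    """Download each URL via fetch(url, path), skipping existing files; return failed URLs."""
    failed = []
    for url in urls:
        path = os.path.join(save_dir, os.path.basename(url))
        if os.path.exists(path):
            log.info("skipping existing file: %s", path)
            continue
        try:
            with_retries(lambda: fetch(url, path), sleep=sleep)
            log.info("downloaded: %s", url)
        except Exception:
            failed.append(url)
        sleep(pause)  # rate limit between requests
    return failed
```

Passing `sleep` as a parameter keeps the delays easy to disable in tests; in normal use the default `time.sleep` applies.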
Choose the approach that fits your specific needs: the type of files to download, their source, and your system environment.