How Do You Write a Batch Download Script?

wen Practical Scripts 2026-06-05 82

Contents:

wget batch download (Linux/Unix)
curl batch download script
aria2 batch download (recommended)
Python batch download script
PowerShell batch download (Windows)
Scripts for special cases
Utility scripts
Usage tips

wget batch download (Linux/Unix)

Download from a URL list

# Create a URL list file urls.txt with one URL per line
# Then run:
wget -i urls.txt

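Batch download by pattern

# Download a numbered range (the {1..100} expansion is done by bash, not by wget)
wget http://example.com/images/{1..100}.jpg
# wget does not expand * in HTTP URLs (globbing only works for FTP).
# To fetch every .zip linked from a directory page, recurse one level and filter by suffix:
wget -r -l1 -np -nd -A '*.zip' http://example.com/files/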
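
Brace expansion is a shell feature, so it is not available in every environment (plain sh or Windows cmd, for example). As a rough sketch, the same numbered URLs can be written to a list file and passed to wget -i. The helper name write_url_list and the URL template are illustrative assumptions, not part of wget.

```python
from pathlib import Path


def write_url_list(template, start, stop, path):
    """Write template.format(n) for n in start..stop (inclusive), one URL per line."""
    urls = [template.format(n) for n in range(start, stop + 1)]
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return urls


if __name__ == "__main__":
    # Hypothetical template mirroring the {1..100}.jpg example
    write_url_list("http://example.com/images/{}.jpg", 1, 100, "urls.txt")
```

The resulting urls.txt can then be used with wget -i urls.txt or aria2c -i urls.txt.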

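curl batch download script

#!/bin/bash
# download_images.sh
for i in $(seq 1 100); do
    url="https://example.com/images/image_$i.jpg"
    # -f: treat HTTP errors as failures; -L: follow redirects; -O: keep the remote file name
    if curl -fsSL -O "$url"; then
        echo "Downloaded image $i"
    else
        echo "Failed image $i" >&2
    fi
    # Pause between requests to avoid being blocked
    sleep 1
done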

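aria2 batch download (recommended)

Installation

# Ubuntu/Debian
sudo apt install aria2
# macOS
brew install aria2

Usage

# Download from a URL list file
aria2c -i urls.txt
# Speed up: up to 16 connections per server (-x), 16 segments per file (-s), 5 files at a time (-j)
aria2c -x 16 -s 16 -j 5 -i urls.txt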
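
The input file read by aria2c -i can also carry per-download options: each URL line may be followed by indented option lines such as dir= and out=. The sketch below generates such a file. The function name build_aria2_input and the sample file names are assumptions for illustration.

```python
def build_aria2_input(items, directory):
    """Return aria2 input-file text: each URL followed by indented option lines."""
    lines = []
    for url, out_name in items:
        lines.append(url)
        # Option lines must begin with whitespace to attach to the URL above them
        lines.append(f"  dir={directory}")
        lines.append(f"  out={out_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_aria2_input(
        [("https://example.com/file1.pdf", "report-1.pdf"),
         ("https://example.com/file2.pdf", "report-2.pdf")],
        "downloads",
    )
    print(text, end="")
```

Saving the output as urls.txt lets aria2c -i urls.txt store each file under the chosen name.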

Python batch download script

import os
import urllib.parse
from concurrent.futures import ThreadPoolExecutor

import requests


def download_file(url, save_path):
    """Download one URL to save_path; return True on success."""
    try:
        # Stream the body so large files are not held in memory
        with requests.get(url, timeout=30, stream=True) as response:
            response.raise_for_status()
            with open(save_path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=65536):
                    f.write(chunk)
        print(f"Downloaded: {url}")
        return True
    except Exception as e:
        print(f"Failed: {url} - {e}")
        return False


def batch_download(urls, save_dir, max_workers=5):
    """Download a list of URLs concurrently."""
    # Make sure the target directory exists
    os.makedirs(save_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for url in urls:
            # Take the file name from the URL path (fall back to index.html if the path is empty)
            parsed = urllib.parse.urlparse(url)
            filename = os.path.basename(parsed.path) or "index.html"
            save_path = os.path.join(save_dir, filename)
            executor.submit(download_file, url, save_path)


# Example usage
urls = [
    "https://example.com/file1.pdf",
    "https://example.com/file2.pdf",
    # more URLs...
]
batch_download(urls, "./downloads", max_workers=3)
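
Deriving the file name from the URL path can go wrong when the path ends in a slash or contains percent-encoded characters. One possible way to handle those cases is sketched below. The name filename_from_url and the download.bin fallback are assumptions chosen for illustration.

```python
import os
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="download.bin"):
    """Derive a local file name from the URL path, ignoring query and fragment."""
    path = urlparse(url).path
    # Decode %20 and similar escapes before taking the last path component
    name = os.path.basename(unquote(path))
    return name or default


print(filename_from_url("https://example.com/docs/file%201.pdf?token=abc"))
print(filename_from_url("https://example.com/docs/"))
```

Two different URLs can still map to the same name, so a real script may want to add a counter or a hash when a name is already taken.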

PowerShell batch download (Windows)

# download_files.ps1
$urls = @(
    "https://example.com/file1.pdf",
    "https://example.com/file2.pdf"
)
$outputDir = "C:\Downloads"
# Create the output directory if it does not exist
New-Item -ItemType Directory -Path $outputDir -Force | Out-Null
foreach ($url in $urls) {
    $filename = Split-Path $url -Leaf
    $outputPath = Join-Path $outputDir $filename
    Write-Host "Downloading $filename..."
    Invoke-WebRequest -Uri $url -OutFile $outputPath
    # Pause between requests
    Start-Sleep -Seconds 1
}

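Scripts for special cases

Batch image download (with page parsing)

# Download the images referenced by a page
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def download_images_from_page(page_url, save_dir):
    response = requests.get(page_url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    os.makedirs(save_dir, exist_ok=True)
    # Find every <img> tag
    for img in soup.find_all('img'):
        src = img.get('src')
        if not src:
            continue
        # Resolve relative links against the page URL
        img_url = urljoin(page_url, src)
        if not img_url.startswith('http'):
            continue  # skip data: URIs and other schemes
        # Take the file name from the URL path (drops any query string)
        filename = os.path.basename(urlparse(img_url).path)
        if not filename:
            continue
        save_path = os.path.join(save_dir, filename)
        # Download the image
        img_data = requests.get(img_url, timeout=30).content
        with open(save_path, 'wb') as f:
            f.write(img_data)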
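
Many pages use relative src values such as /img/a.png, which urllib.parse.urljoin can resolve against the page URL. The sketch below shows that resolution step offline, using only the standard library's html.parser instead of BeautifulSoup. The class name ImgCollector and the sample HTML are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect absolute image URLs from <img src=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Relative and root-relative paths are resolved against the page URL
                self.urls.append(urljoin(self.base_url, src))


collector = ImgCollector("https://example.com/gallery/page.html")
collector.feed(
    '<img src="a.png"><img src="/img/b.jpg">'
    '<img src="https://cdn.example.com/c.gif"><img alt="no src">'
)
print(collector.urls)
```

The collected URLs can then be handed to whichever download function is in use.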

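Utility scripts

Range requests (the basis of resuming) + multi-threading

import os
import threading

import requests


class Downloader:
    def __init__(self, url, filename, num_threads=4):
        self.url = url
        self.filename = filename
        self.num_threads = num_threads

    def download(self):
        # Get the file size; the server must report Content-Length
        response = requests.head(self.url, allow_redirects=True, timeout=30)
        file_size = int(response.headers.get('content-length', 0))
        if file_size <= 0:
            raise RuntimeError("server did not report a file size")
        # Split the file into byte ranges, one per thread
        chunk_size = file_size // self.num_threads
        threads = []
        for i in range(self.num_threads):
            start = i * chunk_size
            end = start + chunk_size - 1 if i < self.num_threads - 1 else file_size - 1
            thread = threading.Thread(
                target=self._download_chunk,
                args=(start, end, f"{self.filename}.part{i}")
            )
            threads.append(thread)
            thread.start()
        # Wait for all threads to finish
        for thread in threads:
            thread.join()
        # Merge the parts into the final file
        self._merge_files()

    def _download_chunk(self, start, end, temp_file):
        headers = {'Range': f'bytes={start}-{end}'}
        response = requests.get(self.url, headers=headers, timeout=30)
        # 206 Partial Content means the server honored the Range header
        if response.status_code != 206:
            raise RuntimeError(f"range request not honored: HTTP {response.status_code}")
        with open(temp_file, 'wb') as f:
            f.write(response.content)

    def _merge_files(self):
        # Concatenate the parts in order, then remove them
        with open(self.filename, 'wb') as out:
            for i in range(self.num_threads):
                part = f"{self.filename}.part{i}"
                with open(part, 'rb') as f:
                    out.write(f.read())
                os.remove(part)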
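
The byte-range arithmetic is easy to get off by one, so it can help to check it in isolation. The sketch below reproduces the same splitting rule as a pure function. The name split_ranges is an assumption, and capping the part count at the file size is an added safeguard rather than something the class above does.

```python
def split_ranges(file_size, parts):
    """Split bytes 0..file_size-1 into inclusive (start, end) ranges."""
    if file_size <= 0 or parts <= 0:
        raise ValueError("file_size and parts must be positive")
    parts = min(parts, file_size)  # never create empty ranges
    chunk = file_size // parts
    ranges = []
    for i in range(parts):
        start = i * chunk
        # The last range absorbs the remainder of the division
        end = start + chunk - 1 if i < parts - 1 else file_size - 1
        ranges.append((start, end))
    return ranges


print(split_ranges(10, 4))  # [(0, 1), (2, 3), (4, 5), (6, 9)]
```

Each pair maps directly onto a Range: bytes=start-end header, where both ends are inclusive.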

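Usage tips

Rate limiting: add time.sleep() between requests to avoid triggering anti-scraping measures
Retry on error: add a retry mechanism for failed downloads
Logging: record which files succeeded and which failed
Resuming: skip files that have already been downloaded
Proxy support: use a proxy when necessary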
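
Several of these tips can be combined in one small wrapper. The following is a minimal sketch, assuming the actual download is done by a callable fetch(url, save_path) that raises on failure. The name fetch_with_retry, the logger name, and the backoff schedule are illustrative choices.

```python
import logging
import os
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("batch_download")


def fetch_with_retry(fetch, url, save_path, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url, save_path) up to `attempts` times; return True on success.

    `fetch` is any callable that downloads one file and raises on failure.
    """
    # Skip files that are already present
    if os.path.exists(save_path):
        log.info("skip (already downloaded): %s", save_path)
        return True
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, save_path)
            log.info("ok: %s", url)
            return True
        except Exception as exc:
            log.warning("attempt %d/%d failed for %s: %s", attempt, attempts, url, exc)
            if attempt < attempts:
                # Exponential backoff: delay, 2*delay, 4*delay, ...
                sleep(delay * 2 ** (attempt - 1))
    log.error("giving up: %s", url)
    return False
```

The existence check treats any file on disk as complete, so fetch should write to a temporary name and rename it only after a successful download. If a proxy is needed and fetch is built on requests, the proxies argument of requests.get is one place to set it.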

Choose the approach that fits your needs: the type of files you are downloading, where they come from, and the system you are running on.