How do you write a batch download script?

wen · Practical Scripts · 46

Contents:

  1. wget batch download (Linux/Unix)
  2. curl batch download script
  3. aria2 batch download (recommended)
  4. Python batch download script
  5. PowerShell batch download (Windows)
  6. Scripts for special cases
  7. Utility scripts
  8. Usage tips

wget batch download (Linux/Unix)

Downloading from a URL list

# Create a URL list file, urls.txt, with one URL per line,
# then run:
wget -i urls.txt
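When the files follow a numbered pattern, the list file itself can be generated rather than typed by hand. The following is a minimal sketch of that idea; the helper names `build_url_list` and `write_url_list` and the `{n}` placeholder convention are assumptions made for illustration, not part of wget.

```python
from pathlib import Path


def build_url_list(template, start, end):
    """Return one URL per number in start..end (inclusive).

    `template` is a hypothetical pattern containing an {n} placeholder.
    """
    return [template.format(n=n) for n in range(start, end + 1)]


def write_url_list(urls, path="urls.txt"):
    """Write one URL per line, the layout that `wget -i` reads."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")


if __name__ == "__main__":
    urls = build_url_list("https://example.com/images/image_{n}.jpg", 1, 100)
    write_url_list(urls)
```

The resulting urls.txt can then be passed to `wget -i urls.txt` or to any other tool that accepts one URL per line.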

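Batch download by pattern

# Download a numbered sequence. The {1..100} range is expanded by the shell
# (bash/zsh) before wget runs, so wget receives 100 separate URLs.
wget http://example.com/images/{1..100}.jpg
# wget does not expand * wildcards in HTTP URLs (globbing only works for FTP).
# For HTTP, crawl the directory index one level deep and keep only .zip files
# (this requires the server to return a page that links to the files):
wget -r -l 1 -np -nd -A '*.zip' http://example.com/files/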

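curl batch download script

#!/bin/bash
# download_images.sh
for i in $(seq 1 100); do
    url="https://example.com/images/image_$i.jpg"
    # -f: fail on HTTP errors instead of saving the error page; -L: follow redirects
    if curl -fsSL -O "$url"; then
        echo "Downloaded image $i"
    else
        echo "Failed to download image $i" >&2
    fi
    # Pause between requests to avoid being rate-limited or blocked
    sleep 1
done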

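aria2 batch download (recommended)

Installation

# Ubuntu/Debian
sudo apt install aria2
# macOS
brew install aria2

Usage

# Download from a URL list file
aria2c -i urls.txt
# Speed things up: up to 16 connections per server (-x), 16 segments per file (-s),
# and 5 files downloading in parallel (-j)
aria2c -x 16 -s 16 -j 5 -i urls.txt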
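The input file read by `aria2c -i` can also carry per-download options: each URL line may be followed by indented lines such as `dir=` and `out=`. The sketch below builds such a file from (URL, file name) pairs; the function name `aria2_input_entries` and the default directory are assumptions chosen for illustration.

```python
def aria2_input_entries(items, download_dir="downloads"):
    """Build the text of an aria2 input file.

    `items` is an iterable of (url, output_filename) pairs. Option lines
    must start with whitespace so aria2 attaches them to the URL above.
    """
    lines = []
    for url, name in items:
        lines.append(url)
        lines.append(f"  dir={download_dir}")
        lines.append(f"  out={name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = aria2_input_entries([
        ("https://example.com/file1.pdf", "report-1.pdf"),
        ("https://example.com/file2.pdf", "report-2.pdf"),
    ])
    with open("urls.txt", "w", encoding="utf-8") as f:
        f.write(text)
```

This is handy when the name in the URL is not the name you want on disk.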

Python batch download script

import os
import urllib.parse
from concurrent.futures import ThreadPoolExecutor

import requests


def download_file(url, save_path):
    """Download one URL to save_path; return True on success."""
    try:
        # Stream the body so large files are not held in memory
        with requests.get(url, timeout=30, stream=True) as response:
            response.raise_for_status()
            # Make sure the target directory exists
            os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)
            with open(save_path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=65536):
                    f.write(chunk)
        print(f"Downloaded: {url}")
        return True
    except Exception as e:
        print(f"Failed: {url} - {e}")
        return False


def batch_download(urls, save_dir, max_workers=5):
    """Download files concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for url in urls:
            # Derive the file name from the URL path
            parsed = urllib.parse.urlparse(url)
            filename = os.path.basename(parsed.path) or "index.html"
            save_path = os.path.join(save_dir, filename)
            executor.submit(download_file, url, save_path)


# Usage example
urls = [
    "https://example.com/file1.pdf",
    "https://example.com/file2.pdf",
    # more URLs...
]
batch_download(urls, "./downloads", max_workers=3)
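Taking the file name straight from the URL path can go wrong in two ways: the path may end in a slash (empty name), and two URLs may share the same name and overwrite each other. One possible way to handle both is sketched below; `filename_from_url`, `unique_names`, and the fallback name `download.bin` are hypothetical helpers, not part of any library.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of the URL, ignoring the query string."""
    name = posixpath.basename(unquote(urlparse(url).path))
    return name or default


def unique_names(urls):
    """Map each URL to a file name, adding _1, _2, ... to repeated names."""
    seen = {}
    result = {}
    for url in urls:
        name = filename_from_url(url)
        count = seen.get(name, 0)
        seen[name] = count + 1
        if count:
            stem, ext = posixpath.splitext(name)
            name = f"{stem}_{count}{ext}"
        result[url] = name
    return result
```

The suffix scheme is deliberately simple; it does not check whether a name like `file_1.pdf` already occurs in the list on its own.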

PowerShell batch download (Windows)

# download_files.ps1
$urls = @(
    "https://example.com/file1.pdf",
    "https://example.com/file2.pdf"
)
$outputDir = "C:\Downloads"
# Create the output directory if it does not exist yet
New-Item -ItemType Directory -Force -Path $outputDir | Out-Null
foreach ($url in $urls) {
    $filename = Split-Path $url -Leaf
    $outputPath = Join-Path $outputDir $filename
    Write-Host "Downloading $filename..."
    Invoke-WebRequest -Uri $url -OutFile $outputPath
    # Pause between requests
    Start-Sleep -Seconds 1
}

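Scripts for special cases

Batch image download (with page parsing)

# Download the images referenced by a web page
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def download_images_from_page(page_url, save_dir):
    response = requests.get(page_url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    os.makedirs(save_dir, exist_ok=True)
    # Find every img tag
    for img in soup.find_all('img'):
        src = img.get('src')
        if not src:
            continue
        # Resolve relative paths against the page URL
        img_url = urljoin(page_url, src)
        if not img_url.startswith('http'):
            continue  # skip data: URIs and other non-HTTP schemes
        # Derive the file name from the URL path (ignores any query string)
        filename = os.path.basename(urlparse(img_url).path)
        if not filename:
            continue
        save_path = os.path.join(save_dir, filename)
        # Download the image
        img_data = requests.get(img_url, timeout=30).content
        with open(save_path, 'wb') as f:
            f.write(img_data)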

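Utility scripts

Resumable, multi-threaded download

import os
import threading

import requests


class Downloader:
    def __init__(self, url, filename, num_threads=4):
        self.url = url
        self.filename = filename
        self.num_threads = num_threads

    def download(self):
        # Get the file size (HEAD does not follow redirects by default)
        response = requests.head(self.url, allow_redirects=True, timeout=30)
        file_size = int(response.headers.get('content-length', 0))
        if file_size <= 0:
            raise ValueError("Server did not report a Content-Length")
        # Split the file into one byte range per thread
        chunk_size = file_size // self.num_threads
        threads = []
        for i in range(self.num_threads):
            start = i * chunk_size
            end = start + chunk_size - 1 if i < self.num_threads - 1 else file_size - 1
            thread = threading.Thread(
                target=self._download_chunk,
                args=(start, end, f"{self.filename}.part{i}")
            )
            threads.append(thread)
            thread.start()
        # Wait for all threads to finish
        for thread in threads:
            thread.join()
        # Merge the part files
        self._merge_files()

    def _download_chunk(self, start, end, temp_file):
        # Resume: skip a part that an earlier run already downloaded completely
        if os.path.exists(temp_file) and os.path.getsize(temp_file) == end - start + 1:
            return
        headers = {'Range': f'bytes={start}-{end}'}
        response = requests.get(self.url, headers=headers, timeout=30)
        # 206 Partial Content means the server honored the Range header
        if response.status_code != 206:
            raise RuntimeError(f"Range request not supported: HTTP {response.status_code}")
        with open(temp_file, 'wb') as f:
            f.write(response.content)

    def _merge_files(self):
        # Not in the original listing: a minimal merge that concatenates the parts in order
        with open(self.filename, 'wb') as out:
            for i in range(self.num_threads):
                part = f"{self.filename}.part{i}"
                with open(part, 'rb') as f:
                    out.write(f.read())
                os.remove(part)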
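The byte-range arithmetic in `download` is easy to get wrong by one, so it can help to look at it in isolation. The sketch below reproduces the same split as a standalone function; `split_ranges` is a hypothetical name, and capping the number of parts at the file size is an added assumption so that tiny files do not produce empty ranges.

```python
def split_ranges(file_size, parts):
    """Split file_size bytes into inclusive (start, end) ranges.

    Every range has file_size // parts bytes except the last one,
    which also takes the remainder.
    """
    if file_size <= 0 or parts <= 0:
        raise ValueError("file_size and parts must be positive")
    parts = min(parts, file_size)  # never create an empty range
    chunk = file_size // parts
    ranges = []
    for i in range(parts):
        start = i * chunk
        end = file_size - 1 if i == parts - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # A 10-byte file split four ways
    print(split_ranges(10, 4))  # [(0, 1), (2, 3), (4, 5), (6, 9)]
```

Each pair maps directly to a `Range: bytes=start-end` header, and the ranges cover the file exactly once with no overlap.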

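Usage tips

  1. Rate limiting: add time.sleep() between requests to avoid triggering anti-scraping measures
  2. Retries: add a retry mechanism for failed requests
  3. Logging: record which files downloaded successfully and which failed
  4. Resuming: skip files that have already been downloaded
  5. Proxy support: use a proxy when necessary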
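Several of these tips can be combined in one small wrapper. The following is a minimal sketch, assuming a caller-supplied `fetch(url)` function that returns the file contents as bytes (for example a thin wrapper around `requests.get`); the name `download_with_retry`, the retry count, and the doubling delay are illustrative choices, not a prescribed method.

```python
import logging
import os
import time

logger = logging.getLogger("batch_download")


def download_with_retry(fetch, url, save_path, retries=3, base_delay=1.0,
                        sleep=time.sleep):
    """Download url to save_path with retries, logging, and skip-if-exists.

    `fetch` is a hypothetical callable: fetch(url) -> bytes, raising on error.
    """
    # Skip files that an earlier run already saved
    if os.path.exists(save_path) and os.path.getsize(save_path) > 0:
        logger.info("Skipped (already exists): %s", save_path)
        return True
    for attempt in range(1, retries + 1):
        try:
            data = fetch(url)
        except Exception as exc:
            logger.warning("Attempt %d/%d failed for %s: %s",
                           attempt, retries, url, exc)
            if attempt < retries:
                # Wait 1x, 2x, 4x, ... base_delay before the next attempt
                sleep(base_delay * 2 ** (attempt - 1))
            continue
        with open(save_path, "wb") as f:
            f.write(data)
        logger.info("Downloaded: %s", url)
        return True
    logger.error("Giving up on %s after %d attempts", url, retries)
    return False
```

Calling `logging.basicConfig(filename="download.log", level=logging.INFO)` once at startup would send these messages to a log file. Passing `sleep` as a parameter keeps the function easy to test without real delays.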

Choose the approach that fits your needs: the type of files you are downloading, where they come from, and the system you are running on.
