
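Certainly! Batch summarization combines scripts with suitable tools. Below are several practical approaches for different scenarios:
Batch summarization of text files (Python)
Suitable for plain-text files such as .txt/.md: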
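import os
from transformers import pipeline

# Load the summarization model (downloaded on first use).
# Note: facebook/bart-large-cnn is trained on English news; Chinese text needs a different model.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def batch_summarize(folder_path, output_file="summary.txt"):
    with open(output_file, 'w', encoding='utf-8') as out:
        for filename in sorted(os.listdir(folder_path)):
            if filename.endswith(('.txt', '.md')):
                filepath = os.path.join(folder_path, filename)
                with open(filepath, 'r', encoding='utf-8') as f:
                    text = f.read()
                if not text.strip():
                    continue  # skip empty files
                # Generate the summary (tune max_length/min_length to your texts);
                # truncation=True cuts input beyond the model's 1024-token limit instead of failing
                summary = summarizer(text, max_length=150, min_length=30,
                                     do_sample=False, truncation=True)[0]['summary_text']
                out.write(f"=== {filename} ===\n{summary}\n\n")
                print(f"Processed: {filename}")

if __name__ == "__main__":
    batch_summarize("./docs")  # example folder path (hypothetical)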
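facebook/bart-large-cnn accepts at most 1024 tokens of input, so a long file either fails or loses everything past that point. A common alternative is to split the text into overlapping windows and summarize each one separately. The sketch below assumes a plain character count is an acceptable proxy for token count; chunk_text and its default sizes are hypothetical, chosen only for illustration.

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters, so text near a
    boundary appears in both neighbouring windows. The defaults are
    illustrative, not taken from any model's documentation.
    """
    if not 0 <= overlap < max_chars:
        raise ValueError("need 0 <= overlap < max_chars")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk can then be passed to summarizer(...) on its own and the partial summaries joined, the same chunk-then-join pattern the PDF script uses.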
Batch summarization of PDF documents
Uses PyMuPDF plus the OpenAI API (the API is paid; it can be swapped for a local model):
pip install pymupdf openai
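import os
import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def pdf_batch_summary(pdf_folder, output_file="pdf_summary.txt"):
    with open(output_file, 'w', encoding='utf-8') as out:
        for pdf in sorted(os.listdir(pdf_folder)):
            if pdf.lower().endswith('.pdf'):
                path = os.path.join(pdf_folder, pdf)
                with fitz.open(path) as doc:
                    text = "".join(page.get_text() for page in doc)
                # Split long documents into 3000-character chunks
                chunks = [text[i:i+3000] for i in range(0, len(text), 3000)]
                summaries = []
                for chunk in chunks[:5]:  # only the first 5 chunks, to cap API cost
                    resp = client.chat.completions.create(
                        model="gpt-3.5-turbo",
                        messages=[{"role": "user", "content": f"Summarize the following content in Chinese: {chunk}"}]
                    )
                    summaries.append(resp.choices[0].message.content)
                # Join every per-chunk summary (previously 2 of the 5 paid calls were discarded)
                final_summary = "\n".join(summaries)
                out.write(f"### {pdf}\n{final_summary}\n\n")
                print(f"Processed: {pdf}")

if __name__ == "__main__":
    pdf_batch_summary("./pdfs")  # example folder path (hypothetical)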
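The PDF script joins the per-chunk summaries as-is. A common refinement is a map-reduce pass: summarize each chunk, then summarize the joined partial summaries into one final text. The sketch below takes the summarizer as a plain function so it can wrap either the OpenAI call or a local model; map_reduce_summary and its max_chunks cap are hypothetical names, not part of the original script.

```python
from typing import Callable, List


def map_reduce_summary(chunks: List[str],
                       summarize: Callable[[str], str],
                       max_chunks: int = 5) -> str:
    """Summarize each chunk, then condense the partial summaries.

    `summarize` is any str -> str function (assumed), e.g. a wrapper
    around an LLM API call or a local summarization pipeline.
    """
    # Map step: one partial summary per chunk, capped to limit cost
    partials = [summarize(c) for c in chunks[:max_chunks]]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]  # nothing to combine
    # Reduce step: summarize the joined partial summaries once more
    return summarize("\n".join(partials))
```

For example, summarize could be lambda c: client.chat.completions.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": c}]).choices[0].message.content. The reduce step costs one extra call per document but yields a single coherent summary instead of a list of fragments.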
Batch summarization of web pages (scraping + AI)
Fetch a list of URLs and generate a summary for each page:
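import requests
from bs4 import BeautifulSoup
from transformers import pipeline

urls = ["https://example.com/article1", "https://example.com/article2"]
# With no model given, the pipeline falls back to a default English model (and warns); pass model=... in practice
summarizer = pipeline("summarization")

for url in urls:
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()  # treat HTTP errors (404, 500, ...) as failures
        soup = BeautifulSoup(resp.text, 'html.parser')
        # Join the text of all <p> tags and keep the first 5000 characters
        article = " ".join(p.get_text() for p in soup.find_all('p'))[:5000]
        summary = summarizer(article, max_length=100, truncation=True)[0]['summary_text']
        print(f"{url} -> {summary}")
    except Exception as e:
        print(f"Failed: {url}, error: {e}")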
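If BeautifulSoup is not available, the same "text of every <p> tag" extraction can be approximated with the standard library's html.parser. This is a sketch under simplifying assumptions: it ignores text outside <p>, treats a new <p> as implicitly closing an open one, and does not filter <script> or <style> content nested in a paragraph. ParagraphExtractor and extract_paragraphs are hypothetical names.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements, roughly like
    [p.get_text() for p in soup.find_all('p')] in BeautifulSoup."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self._in_p = False
        self._current = []
        self.paragraphs = []

    def _flush(self):
        # Finish the open paragraph, if any, and keep it when non-empty
        if self._in_p:
            text = "".join(self._current).strip()
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # a new <p> implicitly closes an unclosed one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._current.append(data)  # includes text of nested <b>, <a>, ...

    def close(self):
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed


def extract_paragraphs(html, limit=5000):
    """Join all paragraph texts with spaces and cut to `limit` characters."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.paragraphs)[:limit]
```

In the loop, article = extract_paragraphs(resp.text) would replace the two BeautifulSoup lines; real-world pages with messy markup are still handled more robustly by BeautifulSoup.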
Command-line tool (no programming required)
Using the sumy package (statistics-based extractive summarization):
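# Install
pip install sumy
# Summarize every .txt file under ./docs into 3 sentences (the path is passed as $1 rather than spliced into the script)
find ./docs -name "*.txt" -exec sh -c 'echo "=== $1 ==="; sumy lex-rank --length=3 --file="$1" --language=chinese' _ {} \;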
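sumy's lex-rank ranks sentences by their centrality in a sentence-similarity graph. To show the general idea of statistics-based extractive summarization without any dependency, here is a simpler technique, plain word-frequency scoring (not LexRank): each sentence is scored by the average document-wide frequency of its words, and the top n are kept in their original order. It assumes space-delimited text; Chinese would first need word segmentation. frequency_summary is a hypothetical helper.

```python
import re
from collections import Counter


def frequency_summary(text: str, n_sentences: int = 3) -> str:
    """Extractive summary by word-frequency scoring (not LexRank).

    Assumes space-delimited words; Chinese needs a word segmenter first.
    """
    # Split after sentence-ending punctuation (ASCII and full-width CJK)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        # Average, not sum, so long sentences are not automatically favoured
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Stable sort: ties keep their original order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

Without a stop-word list, very common words such as "the" dominate the scores; that is one of the problems graph-based methods like LexRank are designed to soften.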
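Ready-made batch tools
Cross-platform desktop tools (recommended)
- PDF Summarizer (free): drag and drop several PDFs at once and generate summaries in one click
- TextAnalyzer: batch output in multiple formats
Online services
- Resoomer: paste in text for batch summarization through the web interface
- QuillBot: supports batch processing of up to 100,000 characters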
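Notes
- API costs: AI-based approaches are billed by token consumption
- Local models: you can switch to free models such as Microsoft's Phi-3-mini (e.g. microsoft/Phi-3-mini-4k-instruct on Hugging Face)
- Text length: long documents must be split into chunks (typically no more than about 3,000 characters each)
- Output format: results can be exported automatically as Markdown or CSV reports
- Privacy: run models locally when the data is sensitive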
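For the CSV reports mentioned above, Python's built-in csv module is enough. A minimal sketch, assuming each result is a (source file, summary) pair; write_summary_csv is a hypothetical helper name.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_summary_csv(rows: Iterable[Tuple[str, str]], stream: TextIO) -> None:
    """Write (source, summary) pairs as CSV with a header row.

    The csv module quotes fields containing commas, quotes or newlines,
    so multi-line summaries stay in a single cell.
    """
    writer = csv.writer(stream)
    writer.writerow(["source", "summary"])
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_summary_csv([("a.txt", "First summary"), ("b.pdf", "Has, a comma")], buf)
    print(buf.getvalue())
```

When writing to a real file, open it with newline="" as the csv documentation recommends, and with encoding="utf-8-sig" if the report will be opened in Excel, so the byte-order mark lets Excel detect UTF-8 and display Chinese summaries correctly.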
Which scenario do you need a complete script for? I can provide a more targeted implementation.