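Contents:

- Batch processing of text files (sed/awk)
- Batch regex rules in Python
- Batch image/PDF processing (Pillow/PDFPlumber)
- Batch conversion rules for Excel/CSV
- Batch rules for network requests (crawlers + APIs)
- Config-file-driven rules (the most flexible)
- Common scenarios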

Yes. A utility script can handle batch rule processing. Depending on your scenario, the common approaches are:

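Batch processing of text files (sed/awk)

    # Apply several replacement rules to every .txt file under the current directory.
    # Note: a bare -i is GNU sed syntax; BSD/macOS sed expects -i '' instead.
    find . -type f -name "*.txt" -exec sed -i 's/old/new/g; s/foo/bar/g' {} \;
    # awk rule: relabel the first field as "High" when it exceeds 100, then print every line
    awk '{ if ($1 > 100) $1 = "High"; print }' input.txt > output.txt
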
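For readers who prefer to keep everything in one language, the awk rule above can be approximated in Python. This is only a sketch: `label_high` and its `threshold` parameter are hypothetical names, and it deliberately leaves lines alone when the first field is not numeric, whereas awk falls back to a string comparison in that case.

```python
def label_high(line, threshold=100.0):
    """Replace the first whitespace-separated field with "High" when it exceeds threshold."""
    fields = line.split()
    if not fields:
        return line  # blank line: nothing to inspect
    try:
        value = float(fields[0])
    except ValueError:
        return line  # non-numeric first field: left untouched (differs from awk)
    if value > threshold:
        fields[0] = "High"
        # Like awk, rebuilding the record joins the fields with single spaces
        return " ".join(fields)
    return line


lines = ["150 apples", "42 pears", "abc 1"]
labelled = [label_high(line) for line in lines]
```

Here `labelled` keeps one entry per input line, so it can be written back out with `"\n".join(labelled)`.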
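Batch regex rules in Python

    import os
    import re

    # Ordered (pattern, replacement) rules; they are applied top to bottom
    rules = [
        (r'\d{4}-\d{2}-\d{2}', '[DATE]'),   # hide dates
        (r'1[3-9]\d{9}', '[PHONE]'),        # mask mobile phone numbers
        (r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', '[EMAIL]'),  # mask email addresses
    ]

    def apply_rules(text):
        for pattern, replacement in rules:
            text = re.sub(pattern, replacement, text)
        return text

    # Process every regular file in data/ and write the result to output/
    os.makedirs('output', exist_ok=True)
    for filename in os.listdir('data/'):
        src = os.path.join('data', filename)
        if not os.path.isfile(src):
            continue  # skip subdirectories
        with open(src, encoding='utf-8') as f:  # UTF-8 is assumed here
            content = f.read()
        with open(os.path.join('output', filename), 'w', encoding='utf-8') as f:
            f.write(apply_rules(content))
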
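One weakness of the phone rule above is that it also fires inside longer digit runs such as order numbers. The sketch below is one possible tightening, not the only one: it precompiles the same three patterns, adds digit lookarounds (an assumption about what counts as a standalone number), and uses `re.subn` to report how many times each rule matched. The names `RULES` and `mask` are illustrative.

```python
import re

# Same three rules, precompiled; the (?<!\d) / (?!\d) guards are an added assumption
RULES = [
    (re.compile(r'(?<!\d)\d{4}-\d{2}-\d{2}(?!\d)'), '[DATE]'),
    (re.compile(r'(?<!\d)1[3-9]\d{9}(?!\d)'), '[PHONE]'),
    (re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'), '[EMAIL]'),
]


def mask(text):
    """Apply every rule in order and return (masked_text, hits_per_rule)."""
    counts = {}
    for pattern, replacement in RULES:
        text, n = pattern.subn(replacement, text)
        counts[replacement] = n
    return text, counts


sample = ("2024-01-15 user 13812345678 wrote to alice@example.com, "
          "order 9913812345678001")
masked, counts = mask(sample)
print(masked)
# [DATE] user [PHONE] wrote to [EMAIL], order 9913812345678001
```

The per-rule counts are handy in a batch run: a rule that never matches across a whole folder is often a sign that its pattern is wrong.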
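Batch image/PDF processing (Pillow/PDFPlumber)

    # Resize images and add a watermark in one pass (a rule chain)
    import os
    from PIL import Image, ImageDraw

    os.makedirs('processed', exist_ok=True)
    for img_file in os.listdir('photos/'):
        img = Image.open(os.path.join('photos', img_file))
        # Rule 1: scale to a width of 800 px, keeping the aspect ratio
        img = img.resize((800, round(800 * img.height / img.width)))
        # Rule 2: draw a text watermark in the top-left corner
        draw = ImageDraw.Draw(img)
        draw.text((10, 10), "Sample Watermark", fill='red')
        img.save(os.path.join('processed', img_file))
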
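The width-based scaling rule is plain arithmetic, so it can be pulled out and checked without any image library. A minimal sketch, assuming a hypothetical helper named `scaled_size`; the clamp to a minimum height of 1 is an addition, since truncating the height of a very wide, very short image would give zero, which is not a usable image size.

```python
def scaled_size(width, height, target_width=800):
    """Return (new_width, new_height) that keeps the original aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Round to the nearest pixel and never go below 1 px in height
    new_height = max(1, round(target_width * height / width))
    return target_width, new_height


landscape = scaled_size(1600, 1200)   # (800, 600)
panorama = scaled_size(10000, 3)      # (800, 1): clamped instead of collapsing to 0
```

The returned tuple has the shape that `Image.resize` takes as its size argument.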
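Batch conversion rules for Excel/CSV

    # Apply per-column rules in bulk with pandas
    import pandas as pd

    df = pd.read_csv('data.csv')
    # Rule dictionary: column name -> function applied to each value
    # (assumes no missing values; a NaN would break the 'name' rule)
    rules = {
        'age': lambda x: 'Adult' if x >= 18 else 'Minor',  # categorize
        'salary': lambda x: x * 1.1,                       # 10% raise
        'name': lambda x: x.strip().title(),               # normalize names
    }
    df = df.assign(**{col: df[col].apply(rule) for col, rule in rules.items()})
    # Writing .xlsx requires an Excel engine such as openpyxl
    df.to_excel('processed.xlsx', index=False)
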
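When pandas is not available, the same rule-dictionary idea can be sketched with the standard-library `csv` module. This is a swapped-in technique rather than the pandas approach above: every value arrives as a string, so each rule converts its own input, and empty cells are skipped instead of raising. The inline sample data and the name `transform_rows` are made up for illustration.

```python
import csv
import io

# Column name -> rule; values are strings here, so each rule parses what it needs
RULES = {
    "age": lambda v: "Adult" if int(v) >= 18 else "Minor",
    "salary": lambda v: f"{float(v) * 1.1:.2f}",   # 10% raise, two decimals
    "name": lambda v: v.strip().title(),
}


def transform_rows(reader, rules):
    """Yield each row with the matching column rules applied; empty cells are left as-is."""
    for row in reader:
        for column, rule in rules.items():
            value = row.get(column)
            if value:
                row[column] = rule(value)
        yield row


# In-memory stand-in for data.csv
source = io.StringIO("name,age,salary\n alice smith ,30,1000\nbob,17,\n")
rows = list(transform_rows(csv.DictReader(source), RULES))
```

The resulting dictionaries can be written back with `csv.DictWriter`, using the same field names as the input.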
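Batch rules for network requests (crawlers + APIs)

    import requests
    from concurrent.futures import ThreadPoolExecutor

    urls = ['url1', 'url2', 'url3']  # placeholders; replace with real URLs

    # Rule: extract the title and price
    def process_url(url):
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()  # fail on HTTP 4xx/5xx instead of parsing an error page
        data = resp.json()       # assumes the endpoint returns a JSON object; parse it once
        return {'title': data.get('title'), 'price': data.get('price')}

    with ThreadPoolExecutor(max_workers=5) as executor:
        results = list(executor.map(process_url, urls))
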
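With `executor.map`, the first request that raises aborts the collection of results when the list is built. For a batch job it is often more useful to record failures per URL and keep going. The sketch below shows one way to do that with `submit` and `as_completed`; `run_batch` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real network call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(fetch, urls, max_workers=5):
    """Call fetch(url) for every URL in parallel; return (results, errors) keyed by URL."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failure should not stop the batch
                errors[url] = repr(exc)
    return results, errors


def fake_fetch(url):
    """Offline stand-in for a request function (illustrative only)."""
    if url.endswith("/bad"):
        raise ValueError("simulated failure")
    return {"title": url.rsplit("/", 1)[-1], "price": 1.0}


results, errors = run_batch(
    fake_fetch, ["https://example.com/a", "https://example.com/bad"]
)
```

In real use, a function such as `process_url` would be passed in place of `fake_fetch`, and the `errors` dictionary can feed a retry pass.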
Config-file-driven rules (the most flexible)

    # rules.yaml
    rules:
      - type: replace
        pattern: '\d{4}-\d{2}-\d{2}'
        replacement: '[DATE]'
      - type: lowercase
      - type: trim

Load the file and apply the rules in order (requires PyYAML):

    import re
    import yaml

    with open('rules.yaml', encoding='utf-8') as f:
        config = yaml.safe_load(f)

    def apply_rules(text):
        for rule in config['rules']:
            if rule['type'] == 'replace':
                text = re.sub(rule['pattern'], rule['replacement'], text)
            elif rule['type'] == 'lowercase':
                text = text.lower()
            elif rule['type'] == 'trim':
                text = text.strip()
        return text

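As the number of rule types grows, the if/elif chain can be replaced by a lookup table, which also makes it easy to reject a misspelled `type` instead of silently ignoring it. A minimal sketch, assuming the config has the same shape that `yaml.safe_load` would return for the file above; here it is written as a plain dict so the example has no third-party dependency. `HANDLERS` and `apply_config` are illustrative names.

```python
import re

# Rule type -> handler(text, rule); adding a rule type means adding one entry
HANDLERS = {
    "replace": lambda text, rule: re.sub(rule["pattern"], rule["replacement"], text),
    "lowercase": lambda text, rule: text.lower(),
    "trim": lambda text, rule: text.strip(),
}


def apply_config(text, config):
    """Apply the configured rules in order; unknown rule types raise ValueError."""
    for rule in config["rules"]:
        handler = HANDLERS.get(rule["type"])
        if handler is None:
            raise ValueError(f"unknown rule type: {rule['type']!r}")
        text = handler(text, rule)
    return text


# Same structure as rules.yaml, expressed as a dict
config = {"rules": [
    {"type": "replace", "pattern": r"\d{4}-\d{2}-\d{2}", "replacement": "[DATE]"},
    {"type": "lowercase"},
    {"type": "trim"},
]}
result = apply_config("  Shipped ON 2024-01-15  ", config)
print(result)  # shipped on [date]
```

Note that the placeholder itself comes out lowercased because `lowercase` runs after `replace`. Moving `lowercase` to the top of the list would keep `[DATE]` intact, which is why the order of rules in the config matters.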
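Common scenarios

| Scenario | Rule type | Recommended tools |
|---|---|---|
| Log masking | Regex replacement | sed / Python |
| Batch renaming | File operations | rename / Python os.rename |
| Data cleaning | Conditional filtering | awk / pandas |
| Image batch processing | Format conversion | ImageMagick / Pillow |
| PDF extraction | Extraction rules | PDFPlumber / camelot |
| Web crawling | Parsing rules | BeautifulSoup / Scrapy |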
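
The batch-renaming row has no code elsewhere in this answer, so here is a small sketch of it. It uses `pathlib` rather than `os.rename`, splits the work into a plan step and an apply step so the plan can be reviewed first, and skips any rename that would overwrite an existing file. The function names and the `draft_` / `final_` prefixes are made up for the demo, which runs in a temporary directory.

```python
import tempfile
from pathlib import Path


def plan_renames(folder, old, new, pattern="*.txt"):
    """Return (source, target) pairs for files whose name contains `old`."""
    plan = []
    for path in sorted(Path(folder).glob(pattern)):
        target = path.with_name(path.name.replace(old, new))
        # Skip unchanged names and never overwrite an existing file
        if target != path and not target.exists():
            plan.append((path, target))
    return plan


def apply_renames(plan):
    """Perform the renames produced by plan_renames."""
    for src, dst in plan:
        src.rename(dst)


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("draft_a.txt", "draft_b.txt", "keep.txt"):
        (Path(tmp) / name).touch()
    plan = plan_renames(tmp, "draft_", "final_")
    apply_renames(plan)
    renamed = sorted(p.name for p in Path(tmp).iterdir())

print(renamed)  # ['final_a.txt', 'final_b.txt', 'keep.txt']
```

Printing the plan before calling `apply_renames` gives a simple dry run.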
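
As long as a rule can be expressed as logic (a regex, a condition, arithmetic, or a string operation), a script can apply it in bulk. The key is to decide the rule types and their order first, then hand them to a loop or iterator. Would you like a code example for a specific scenario?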