Can Utility Scripts Apply Rules in Batch?

wen Utility Scripts 52

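Contents:

Can Utility Scripts Apply Rules in Batch?

  1. Batch processing of text files (sed/awk)
  2. Batch rules with Python regular expressions
  3. Batch processing of images and PDFs (Pillow/PDFPlumber)
  4. Batch conversion rules for Excel/CSV
  5. Batch rules for network requests (crawlers + APIs)
  6. Config-file-driven rules (the most flexible)
  7. Common scenarios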

Yes. Utility scripts are well suited to applying rules in batch. Depending on your scenario, these are the common ways to do it:

Batch processing of text files (sed/awk)

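# Apply several replacement rules to every .txt file under the current directory
# (GNU sed syntax; BSD/macOS sed needs -i '' instead of -i)
find . -name "*.txt" -exec sed -i 's/old/new/g; s/foo/bar/g' {} \;
# awk rule: filter plus replace (label the first field "High" when it exceeds 100)
awk '{ if ($1 > 100) $1 = "High"; print }' input.txt > output.txt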
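If you would rather keep everything in one language, the awk rule above can be approximated in Python. This is only a sketch: the function name `label_high` and the sample lines are made up for illustration, and it deliberately leaves lines with a non-numeric first field untouched, whereas awk may fall back to a string comparison for such fields.

```python
def label_high(line, threshold=100.0):
    """Replace the first field with "High" when it is a number above threshold."""
    fields = line.split()
    if not fields:
        return line
    try:
        value = float(fields[0])
    except ValueError:
        return line  # non-numeric first field: left unchanged (a choice made in this sketch)
    if value > threshold:
        fields[0] = "High"
        # As in awk, a rebuilt record joins its fields with single spaces
        return " ".join(fields)
    return line  # unmodified lines keep their original spacing


lines = ["150 cpu  node1", "42 cpu  node2", "n/a cpu node3"]
for line in lines:
    print(label_high(line))
```

Only the first sample line is rewritten, to `High cpu node1`; the other two are printed as they came in.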

Batch rules with Python regular expressions

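import os
import re

# (pattern, replacement) pairs, applied in list order
rules = [
    (r'\d{4}-\d{2}-\d{2}', '[DATE]'),  # mask dates
    (r'1[3-9]\d{9}', '[PHONE]'),        # mask mobile phone numbers
    (r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', '[EMAIL]')  # mask email addresses
]

def apply_rules(text):
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# Process every file in data/ and write the result to output/
os.makedirs('output', exist_ok=True)  # the output folder must exist before writing
for filename in os.listdir('data/'):
    src = os.path.join('data', filename)
    if not os.path.isfile(src):
        continue  # skip subdirectories
    # UTF-8 is assumed here; adjust the encoding to match your files
    with open(src, encoding='utf-8') as f:
        content = f.read()
    with open(os.path.join('output', filename), 'w', encoding='utf-8') as f:
        f.write(apply_rules(content))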
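Before overwriting a whole folder, it can help to see how often each rule actually fires. The sketch below reuses the three patterns from the rules list and swaps `re.sub` for `re.subn`, which also returns the number of replacements. The function name `apply_rules_with_counts` and the sample string are illustrative only.

```python
import re

# The same three patterns as in the rules list, precompiled once
RULES = [
    (re.compile(r'\d{4}-\d{2}-\d{2}'), '[DATE]'),
    (re.compile(r'1[3-9]\d{9}'), '[PHONE]'),
    (re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'), '[EMAIL]'),
]


def apply_rules_with_counts(text):
    """Apply every rule in order; return the new text and the hits per placeholder."""
    counts = {}
    for pattern, replacement in RULES:
        text, hits = pattern.subn(replacement, text)
        counts[replacement] = hits
    return text, counts


sample = "2024-01-15 user 13812345678 wrote to a.b@example.com and c@example.org"
masked, counts = apply_rules_with_counts(sample)
print(masked)   # [DATE] user [PHONE] wrote to [EMAIL] and [EMAIL]
print(counts)   # {'[DATE]': 1, '[PHONE]': 1, '[EMAIL]': 2}
```

A rule that reports zero hits across a whole folder is usually a sign that the pattern needs another look.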

Batch processing of images and PDFs (Pillow/PDFPlumber)

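# Batch-resize images and add a watermark (a rule chain)
import os
from PIL import Image, ImageDraw
os.makedirs('processed', exist_ok=True)  # the output folder must exist before saving
for img_file in os.listdir('photos/'):
    if not img_file.lower().endswith(('.jpg', '.jpeg', '.png')):
        continue  # skip anything that is not an image
    img = Image.open(f'photos/{img_file}')
    # Rule 1: scale to a width of 800 px, keeping the aspect ratio
    img = img.resize((800, int(800 * img.height / img.width)))
    # Rule 2: add a watermark
    draw = ImageDraw.Draw(img)
    draw.text((10, 10), "Sample Watermark", fill='red')
    img.save(f'processed/{img_file}')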

Batch conversion rules for Excel/CSV

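# Apply per-column rules in bulk with pandas
import pandas as pd
df = pd.read_csv('data.csv')
# Rule dictionary: column name -> function applied to each value
rules = {
    'age': lambda x: 'Adult' if x >= 18 else 'Minor',  # classify
    'salary': lambda x: x * 1.1,                        # 10% raise
    'name': lambda x: x.strip().title()                 # normalize names
}
df = df.assign(**{col: df[col].apply(rule) for col, rule in rules.items()})
# Writing .xlsx needs an Excel engine such as openpyxl; index=False omits the row index
df.to_excel('processed.xlsx', index=False)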
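Where pandas is not available, the same "column name to rule" idea can be sketched with the standard library's `csv` module. Note the swap: this is not the pandas approach, `csv` yields every value as a string, so the numeric rules convert first. The function name `transform` and the in-memory sample data are assumptions made for this example.

```python
import csv
import io

# Column name -> rule. csv yields strings, so numeric rules convert first.
RULES = {
    'age': lambda x: 'Adult' if int(x) >= 18 else 'Minor',
    'salary': lambda x: f'{float(x) * 1.1:.2f}',
    'name': lambda x: x.strip().title(),
}


def transform(reader_file, writer_file):
    """Read CSV rows, apply the rule for each known column, write the rows back out."""
    reader = csv.DictReader(reader_file)
    writer = csv.DictWriter(writer_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col, rule in RULES.items():
            if col in row:
                row[col] = rule(row[col])
        writer.writerow(row)


# In-memory sample standing in for data.csv
source = io.StringIO("name,age,salary\n alice smith ,30,1000\nbob,17,200.5\n")
target = io.StringIO()
transform(source, target)
print(target.getvalue())
```

With real files, open them with `newline=''` as the `csv` documentation recommends, and pass the file objects to `transform` in place of the `StringIO` buffers.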

Batch rules for network requests (crawlers + APIs)

import requests
from concurrent.futures import ThreadPoolExecutor
urls = ['url1', 'url2', 'url3']  # placeholders; replace with real URLs
# Rule: extract the title and the price
def process_url(url):
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # raise on HTTP error statuses
    data = resp.json()       # assumes the endpoint returns JSON; parse it only once
    return {'title': data.get('title'), 'price': data.get('price')}
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(process_url, urls))
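One caveat with `executor.map`: when a call raises, the exception is re-raised as soon as that result is retrieved, so a single bad URL can cost you the rest of the batch. A possible way around it is `submit` plus `as_completed`, collecting failures separately. In the sketch below, `fetch_stub` is a hypothetical stand-in for the real HTTP call so that the example runs offline, and `fetch_all` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_stub(url):
    """Hypothetical stand-in for a real HTTP call, so the sketch runs offline."""
    if 'bad' in url:
        raise ConnectionError(f'cannot reach {url}')
    return {'title': f'Title of {url}', 'price': 9.9}


def fetch_all(urls, fetch=fetch_stub, max_workers=5):
    """Return (results, errors) keyed by URL; one failure does not stop the batch."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = str(exc)
    return results, errors


results, errors = fetch_all(['site/a', 'site/bad', 'site/c'])
print(sorted(results))  # ['site/a', 'site/c']
print(errors)           # {'site/bad': 'cannot reach site/bad'}
```

To use it for real requests, pass a function such as `process_url` as the `fetch` argument.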

Config-file-driven rules (the most flexible)

# rules.yaml
rules:
  - type: replace
    pattern: '\d{4}-\d{2}-\d{2}'
    replacement: '[DATE]'
  - type: lowercase
  - type: trim

# Python script that loads rules.yaml and applies the rules in order
import re
import yaml
with open('rules.yaml') as f:
    config = yaml.safe_load(f)
def apply_rules(text):
    for rule in config['rules']:
        if rule['type'] == 'replace':
            text = re.sub(rule['pattern'], rule['replacement'], text)
        elif rule['type'] == 'lowercase':
            text = text.lower()
        elif rule['type'] == 'trim':
            text = text.strip()
    return text
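As the number of rule types grows, the if/elif chain can be replaced by a lookup table from rule type to handler. The sketch below is one way to do that. To stay within the standard library it reads the same rules from JSON rather than YAML, and it raises on unknown rule types instead of silently skipping them. `HANDLERS`, `CONFIG_JSON` and `apply_config` are names invented for this example.

```python
import json
import re

# Rule type -> handler taking (text, rule). A new rule type is one new entry.
HANDLERS = {
    'replace': lambda text, rule: re.sub(rule['pattern'], rule['replacement'], text),
    'lowercase': lambda text, rule: text.lower(),
    'trim': lambda text, rule: text.strip(),
}

# JSON rendering of the rules in rules.yaml (backslashes are doubled in JSON)
CONFIG_JSON = r'''
{"rules": [
  {"type": "replace", "pattern": "\\d{4}-\\d{2}-\\d{2}", "replacement": "[DATE]"},
  {"type": "lowercase"},
  {"type": "trim"}
]}
'''


def apply_config(text, config):
    """Apply each configured rule in order, failing loudly on unknown types."""
    for rule in config['rules']:
        handler = HANDLERS.get(rule['type'])
        if handler is None:
            raise ValueError(f"unknown rule type: {rule['type']}")
        text = handler(text, rule)
    return text


config = json.loads(CONFIG_JSON)
print(apply_config('  Released ON 2024-01-15  ', config))  # released on [date]
```

Because `lowercase` runs after `replace`, the placeholder itself comes out as `[date]`. Move `lowercase` to the top of the list if the placeholder should keep its capitals.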

Common scenarios

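Scenario                | Rule type             | Recommended tools
Log masking             | Regex replacement     | sed / Python
Batch renaming          | File operations       | rename / Python os.rename
Data cleaning           | Conditional filtering | awk / pandas
Image batch processing  | Format conversion     | ImageMagick / Pillow
PDF extraction          | Rule-based extraction | PDFPlumber / camelot
Web scraping            | Parsing rules         | BeautifulSoup / Scrapy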
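For the batch renaming scenario, a cautious pattern is to compute the full rename plan first and only then touch the disk. The sketch below does just the planning step. `plan_renames` and the sample file names are illustrative, and the collision check is deliberately conservative: it rejects any new name that is already in use.

```python
import re
from pathlib import PurePath


def plan_renames(names, pattern, replacement):
    """Return (old, new) pairs for names the rule changes; nothing on disk is touched."""
    plan = []
    seen = set(names)
    for old in names:
        path = PurePath(old)
        # Apply the rule to the stem only, so the extension is preserved
        new_stem = re.sub(pattern, replacement, path.stem)
        new = str(path.with_name(new_stem + path.suffix))
        if new == old:
            continue
        if new in seen:
            raise ValueError(f'{old} -> {new} would overwrite an existing name')
        seen.add(new)
        plan.append((old, new))
    return plan


files = ['IMG 001.jpg', 'IMG 002.jpg', 'notes.txt']
for old, new in plan_renames(files, r'\s+', '_'):
    print(old, '->', new)
```

Once the printed plan looks right, each pair can be handed to `os.rename(old, new)`.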

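As long as a rule can be described as logic (a regular expression, a condition, arithmetic, or a string operation), a script can apply it in batch. The key is to define the rule types and their order first, then hand them to a loop or an iterator. Need a code example for a specific scenario?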
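To see why the order of rules matters, here is a small sketch in which each rule is a plain function and a generator applies the rules lazily, line by line. The helper names `make_replace` and `run`, and the second pattern that masks any run of digits, are assumptions introduced for this example.

```python
import re


def make_replace(pattern, replacement):
    """Build a rule: a function from text to text."""
    compiled = re.compile(pattern)
    return lambda text: compiled.sub(replacement, text)


mask_phone = make_replace(r'1[3-9]\d{9}', '[PHONE]')
mask_digits = make_replace(r'\d+', '[NUM]')


def run(rules, lines):
    """Lazily apply the rules, in list order, to each line."""
    for line in lines:
        for rule in rules:
            line = rule(line)
        yield line


lines = ['call 13812345678 before 18:00']
print(list(run([mask_phone, mask_digits], lines)))  # ['call [PHONE] before [NUM]:[NUM]']
print(list(run([mask_digits, mask_phone], lines)))  # ['call [NUM] before [NUM]:[NUM]']
```

With the general digit rule first, the phone number is consumed before the more specific rule ever sees it, so specific rules usually belong ahead of general ones.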
