Python Examples: How Do You Compute the Difference Between Two Data Sets?

wen Python Examples 2026-06-09 55

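Contents:

Using Set Difference Operations
Using a List Comprehension (Preserves Duplicates and Order)
Using the filter() Function
Working with Dictionaries
Working with Custom Objects
Practical Example: Comparing File Contents
Performance Comparison
Summary and Recommendations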

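Python offers several ways to compute a difference, that is, the elements that appear in one collection but not in another. The most common approaches are shown below.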

Using Set Difference Operations

The `-` operator

list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]
set1 = set(list1)
set2 = set(list2)
# Elements that are in list1 but not in list2
diff = set1 - set2
print(diff)  # {1, 2}
# Elements that are in list2 but not in list1
diff2 = set2 - set1
print(diff2)  # {6, 7}

The `difference()` method

list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]
set1 = set(list1)
set2 = set(list2)
# set1.difference(set2) is equivalent to set1 - set2
diff = set1.difference(set2)
print(diff)  # {1, 2}
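
One practical distinction between the two forms is worth a short sketch: `difference()` accepts any iterable (and more than one of them), whereas the `-` operator requires both operands to be sets. The names `tuple3` and `diff_multi` below are made up for illustration.

```python
list1 = [1, 2, 3, 4, 5]
list2 = [3, 4]
tuple3 = (5, 6)  # illustrative extra input

# difference() accepts any iterable, so only the left-hand side must be a set
diff = set(list1).difference(list2)
print(diff)  # {1, 2, 5}

# Several iterables can be passed at once; elements found in any of them are removed
diff_multi = set(list1).difference(list2, tuple3)
print(diff_multi)  # {1, 2}

# The - operator, by contrast, raises TypeError when the right-hand side is a list
try:
    set(list1) - list2
except TypeError as exc:
    print(f"TypeError: {exc}")
```

This means the right-hand lists do not have to be converted with `set()` first when the method form is used.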

Using a List Comprehension (Preserves Duplicates and Order)

list1 = [1, 2, 3, 4, 5, 3, 2]
list2 = [3, 4, 5]
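# Keep the elements of list1 that are not in list2 (duplicates included)
exclude = set(list2)  # build the lookup set once instead of once per element
diff = [x for x in list1 if x not in exclude]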
print(diff)  # [1, 2, 2]
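
The comprehension above removes every occurrence of a value that appears in list2. If instead each element of list2 should cancel only one occurrence in list1 (a multiset difference), a possible variation uses `collections.Counter`. This is a sketch of a different technique, not part of the original example; the function name `multiset_difference` is hypothetical.

```python
from collections import Counter

def multiset_difference(items, exclude):
    """Remove one occurrence from items per occurrence in exclude, keeping order."""
    budget = Counter(exclude)  # how many times each value may still be removed
    result = []
    for x in items:
        if budget[x] > 0:
            budget[x] -= 1  # consume one removal instead of keeping x
        else:
            result.append(x)
    return result

list1 = [1, 2, 3, 4, 5, 3, 2]
list2 = [3, 4, 5]
print(multiset_difference(list1, list2))  # [1, 2, 3, 2]
```

Here the second 3 in list1 survives, because list2 contains only a single 3.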

Using the `filter()` Function

list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]
set2 = set(list2)
diff = list(filter(lambda x: x not in set2, list1))
print(diff)  # [1, 2]
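
As a possible alternative to the lambda, the standard library's `itertools.filterfalse` keeps the elements for which a predicate returns false, so the set's own membership test can be passed in directly. This is a minimal sketch using the same sample data.

```python
from itertools import filterfalse

list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]
set2 = set(list2)

# filterfalse keeps the elements for which the predicate is false;
# set2.__contains__ is the bound method behind the `in` operator
diff = list(filterfalse(set2.__contains__, list1))
print(diff)  # [1, 2]
```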

Working with Dictionaries

dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 2, 'c': 3, 'd': 4}
# Difference of the keys
key_diff = set(dict1.keys()) - set(dict2.keys())
print(key_diff)  # {'a'}
# Difference of the key-value pairs
items_diff = set(dict1.items()) - set(dict2.items())
print(items_diff)  # {('a', 1)}
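
Two details may be useful here. Key views already support set operators, so the `set()` conversion is optional for keys, and `set(dict.items())` only works when every value is hashable. The sketch below, with sample values changed for illustration, splits the comparison into removed, added, and changed keys and compares values directly, so it does not depend on hashable values.

```python
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 20, 'c': 3, 'd': 4}  # 'b' has a different value in this sample

# Key views behave like sets, so no set() conversion is needed
removed = dict1.keys() - dict2.keys()  # keys only in dict1
added = dict2.keys() - dict1.keys()    # keys only in dict2

# Keys present in both dictionaries whose values differ
changed = {k for k in dict1.keys() & dict2.keys() if dict1[k] != dict2[k]}

print(removed)  # {'a'}
print(added)    # {'d'}
print(changed)  # {'b'}
```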

Working with Custom Objects

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __repr__(self):
        return f"Person({self.name}, {self.age})"
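    def __eq__(self, other):
        # Comparing with a non-Person object should not raise AttributeError
        if not isinstance(other, Person):
            return NotImplemented
        return self.name == other.name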
    def __hash__(self):
        return hash(self.name)
# Create two lists
people1 = [Person("Alice", 25), Person("Bob", 30), Person("Charlie", 35)]
people2 = [Person("Bob", 30), Person("David", 40)]
# Compute the difference (objects are compared by their name attribute)
set1 = set(people1)
set2 = set(people2)
diff = set1 - set2
print(diff)  # {Person(Alice, 25), Person(Charlie, 35)} (display order may vary)
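
When the class cannot be modified, or its instances are not hashable (a plain `@dataclass` is unhashable by default), a key-based comparison is one possible workaround. The sketch below assumes a hypothetical `Employee` class and a hypothetical helper `difference_by`; neither comes from the original example.

```python
from dataclasses import dataclass

@dataclass
class Employee:  # hypothetical class without custom __eq__/__hash__ logic
    name: str
    age: int

def difference_by(items, others, key):
    """Return the items whose key does not occur among the keys of others, in order."""
    excluded = {key(o) for o in others}
    return [item for item in items if key(item) not in excluded]

staff1 = [Employee("Alice", 25), Employee("Bob", 30), Employee("Charlie", 35)]
staff2 = [Employee("Bob", 31), Employee("David", 40)]

# Compare by name only, so Bob is excluded even though the ages differ
diff = difference_by(staff1, staff2, key=lambda e: e.name)
print(diff)  # [Employee(name='Alice', age=25), Employee(name='Charlie', age=35)]
```

Unlike the set-based version, this keeps the original order and returns a list.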

Practical Example: Comparing File Contents

# Read two files and compute the differences between their lines
def file_diff(file1_path, file2_path):
    with open(file1_path, 'r', encoding='utf-8') as f1, \
         open(file2_path, 'r', encoding='utf-8') as f2:
        lines1 = set(line.strip() for line in f1)
        lines2 = set(line.strip() for line in f2)
    # Lines that appear only in file1
    only_in_file1 = lines1 - lines2
    # Lines that appear only in file2
    only_in_file2 = lines2 - lines1
    return only_in_file1, only_in_file2
# Usage example
diff1, diff2 = file_diff('file1.txt', 'file2.txt')
print(f"Lines only in file1: {diff1}")
print(f"Lines only in file2: {diff2}")
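
Because sets discard order and duplicates, the function above cannot report where a line occurred. A possible variation, sketched here with a hypothetical function name `ordered_file_diff` and made-up file contents, keeps the order of the first file and reports line numbers.

```python
import os
import tempfile

def ordered_file_diff(file1_path, file2_path):
    """Return (line_number, text) pairs for lines of file1 that never occur in file2."""
    with open(file2_path, 'r', encoding='utf-8') as f2:
        seen = {line.strip() for line in f2}
    with open(file1_path, 'r', encoding='utf-8') as f1:
        return [(n, line.strip()) for n, line in enumerate(f1, start=1)
                if line.strip() not in seen]

# Demo with two throwaway files (contents are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    path1 = os.path.join(tmp, 'file1.txt')
    path2 = os.path.join(tmp, 'file2.txt')
    with open(path1, 'w', encoding='utf-8') as f:
        f.write("apple\nbanana\ncherry\nbanana\n")
    with open(path2, 'w', encoding='utf-8') as f:
        f.write("cherry\ndate\n")
    result = ordered_file_diff(path1, path2)

print(result)  # [(1, 'apple'), (2, 'banana'), (4, 'banana')]
```

Only the second file is held in memory as a set; the first file is read line by line.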

Performance Comparison

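import time
# Test with large data sets
large_list1 = list(range(100000))
large_list2 = list(range(50000, 150000))
# Method 1: set operation (runs entirely in C, but loses order and duplicates)
start = time.perf_counter()
diff1 = set(large_list1) - set(large_list2)
print(f"Set operation took: {time.perf_counter() - start:.4f} s")
# Method 2: list comprehension with a prebuilt set (keeps order; usually slower than method 1)
start = time.perf_counter()
set2 = set(large_list2)
diff2 = [x for x in large_list1 if x not in set2]
print(f"List comprehension took: {time.perf_counter() - start:.4f} s")
# Method 3: filter() with a lambda (keeps order; adds one function call per element)
start = time.perf_counter()
set2 = set(large_list2)
diff3 = list(filter(lambda x: x not in set2, large_list1))
print(f"filter() took: {time.perf_counter() - start:.4f} s")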
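
A single timed run like the one above is sensitive to noise from the rest of the system. As a hedged alternative, the standard `timeit` module can repeat each candidate and report the best run. The sketch below assumes the lookup set is built once outside the timed code, so it measures slightly different work than the script above; the dictionary names `candidates` and `results` are illustrative.

```python
import timeit

large_list1 = list(range(100000))
large_list2 = list(range(50000, 150000))
set2 = set(large_list2)  # built once, outside the timed statements

candidates = {
    "set difference": lambda: set(large_list1) - set2,
    "list comprehension": lambda: [x for x in large_list1 if x not in set2],
    "filter + lambda": lambda: list(filter(lambda x: x not in set2, large_list1)),
}

results = {}
for name, func in candidates.items():
    # repeat() returns one total per run; the minimum is the least noisy figure
    best = min(timeit.repeat(func, number=5, repeat=3))
    results[name] = best / 5  # average seconds per call in the best run
    print(f"{name}: {results[name] * 1000:.2f} ms per call")
```

Absolute numbers depend on the machine and Python version, so no expected output is shown.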