Python Example: How to Compute the Union of Data

wen · Python Examples · 2026-06-09 · 58

Contents:

Union of lists
Union of sets
Union of multiple lists
Union of dictionaries (merging dictionaries)
Union of custom objects
Union of Pandas DataFrames

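Python offers several ways to compute a union, and the right one depends mainly on your data structure. The sections below cover the common cases, each with a code example.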

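Union of lists

Method 1: convert to sets

list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]
# Convert to sets, take the union, then convert back to a list.
# Sets are unordered, so the order of the resulting list is not guaranteed.
union_list = list(set(list1) | set(list2))
print(sorted(union_list))  # [1, 2, 3, 4, 5, 6, 7, 8]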

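Method 2: preserve the original order

list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]
# dict.fromkeys keeps the first occurrence of each element in insertion order (Python 3.7+)
union_list = list(dict.fromkeys(list1 + list2))
print(union_list)  # [1, 2, 3, 4, 5, 6, 7, 8]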
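
The two list methods differ mainly in what they promise about ordering, which is easier to see with strings than with small integers. The following is a minimal sketch; the `fruits1` and `fruits2` lists are made-up sample data.

```python
fruits1 = ["pear", "apple", "fig"]
fruits2 = ["fig", "kiwi", "apple"]

# Set-based union: the right elements, but in an arbitrary order.
unordered = list(set(fruits1) | set(fruits2))

# Sorting gives a reproducible result when first-seen order does not matter.
sorted_union = sorted(set(fruits1) | set(fruits2))
print(sorted_union)  # ['apple', 'fig', 'kiwi', 'pear']

# dict.fromkeys keeps the first occurrence of each element, in order.
ordered_union = list(dict.fromkeys(fruits1 + fruits2))
print(ordered_union)  # ['pear', 'apple', 'fig', 'kiwi']
```

If the output has to be stable across runs, prefer either the sorted or the `dict.fromkeys` variant over a bare `list(set(...))`.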

Union of sets

set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
# Method 1: the | operator
union_set = set1 | set2
print(union_set)  # {1, 2, 3, 4, 5, 6, 7, 8}
# Method 2: the union() method
union_set = set1.union(set2)
print(union_set)  # {1, 2, 3, 4, 5, 6, 7, 8}
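
The operator and the method are not fully interchangeable: `union()` accepts any iterables, while `|` requires sets on both sides. The sketch below illustrates that difference and the in-place variants; the variable names are only for illustration.

```python
set1 = {1, 2, 3}

# union() accepts any number of iterables, not only sets.
combined = set1.union([3, 4], (5, 6), range(6, 8))
print(combined == {1, 2, 3, 4, 5, 6, 7})  # True

# The | operator raises TypeError when the right operand is not a set.
try:
    set1 | [3, 4]
except TypeError as exc:
    print("TypeError:", exc)

# In-place union: |= and update() modify the set instead of building a new one.
seen = {1, 2}
seen |= {2, 3}
seen.update([3, 4])
print(seen == {1, 2, 3, 4})  # True
```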

Union of multiple lists

from functools import reduce

list1 = [1, 2, 3]
list2 = [3, 4, 5]
list3 = [5, 6, 7]
# Option 1: fold the lists together with reduce, starting from an empty set
union_list = list(reduce(lambda acc, lst: acc | set(lst), [list1, list2, list3], set()))
print(sorted(union_list))  # [1, 2, 3, 4, 5, 6, 7]
# Option 2: chain the | operator directly
union_list = list(set(list1) | set(list2) | set(list3))
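
When the number of lists is not known in advance, `set().union(*lists)` avoids `reduce` entirely, and `itertools.chain` combined with `dict.fromkeys` does the same while keeping first-seen order. A minimal sketch follows; `ordered_union` is a hypothetical helper name, not a standard-library function.

```python
from itertools import chain

def ordered_union(*iterables):
    """Return the union of all iterables as a list, keeping first-seen order."""
    return list(dict.fromkeys(chain.from_iterable(iterables)))

lists = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]

# Unordered union of any number of lists.
unordered = set().union(*lists)
print(unordered == {1, 2, 3, 4, 5, 6, 7})  # True

# Order-preserving union of any number of lists.
print(ordered_union(*lists))                   # [1, 2, 3, 4, 5, 6, 7]
print(ordered_union([3, 1], [2, 3], [1, 0]))   # [3, 1, 2, 0]
```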

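Union of dictionaries (merging dictionaries)

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
# Python 3.9+: the | operator
union_dict = dict1 | dict2
print(union_dict)  # {'a': 1, 'b': 3, 'c': 4}; the value of 'b' from dict2 overwrites the one from dict1
# Python 3.5+: unpacking gives the same result, and duplicate keys are overwritten as well
union_dict = {**dict1, **dict2}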
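
Both merges above let the right-hand dictionary win when a key appears twice. If the earlier value should be kept, or if no value should be lost, something along the following lines may work; this is a sketch, and the names `keep_first` and `collected` are illustrative only.

```python
from collections import defaultdict

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}

# Keep dict1's value for shared keys by unpacking dict1 last.
keep_first = {**dict2, **dict1}
print(keep_first)  # {'b': 2, 'c': 4, 'a': 1}

# Keep every value by collecting them into a list per key.
collected = defaultdict(list)
for d in (dict1, dict2):
    for key, value in d.items():
        collected[key].append(value)
print(dict(collected))  # {'a': [1], 'b': [2, 3], 'c': [4]}
```

Note that `keep_first` follows the key order of `dict2`, because a key keeps the position where it was first inserted.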

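Union of custom objects

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __hash__(self):
        # Equal objects must have equal hashes, so hash the same fields that __eq__ compares
        return hash((self.name, self.age))
    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)
    def __repr__(self):
        return f"Person({self.name}, {self.age})"
persons1 = [Person("Alice", 25), Person("Bob", 30)]
persons2 = [Person("Bob", 30), Person("Charlie", 35)]
# Once __hash__ and __eq__ are defined, the objects can be placed in a set
union_persons = list(set(persons1) | set(persons2))
# Set order is arbitrary, so sort before printing for a stable result
print(sorted(union_persons, key=lambda p: p.name))  # [Person(Alice, 25), Person(Bob, 30), Person(Charlie, 35)]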
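
Writing `__hash__` and `__eq__` by hand is one option. As an alternative sketch (not the approach used above), a frozen dataclass generates both methods from the declared fields, and `dict.fromkeys` then gives an order-preserving union.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen=True (with the default eq=True) makes instances hashable
class Person:
    name: str
    age: int

persons1 = [Person("Alice", 25), Person("Bob", 30)]
persons2 = [Person("Bob", 30), Person("Charlie", 35)]

# Equal instances collapse into one key; first-seen order is kept.
union_persons = list(dict.fromkeys(persons1 + persons2))
print(union_persons)
# [Person(name='Alice', age=25), Person(name='Bob', age=30), Person(name='Charlie', age=35)]
```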

Union of Pandas DataFrames

import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
df2 = pd.DataFrame({'A': [3, 4, 5], 'B': ['c', 'd', 'e']})
# concat keeps duplicate rows by default, so drop_duplicates() is needed to get a true union
union_df = pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True)
print(union_df)
#    A  B
# 0  1  a
# 1  2  b
# 2  3  c
# 3  4  d
# 4  5  e

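| Data structure | Method | Preserves order | Removes duplicates |
| --- | --- | --- | --- |
| List → Set | `set(list1) \| set(list2)` | No | Yes |
| List → dict | `list(dict.fromkeys(list1 + list2))` | Yes | Yes |
| Set | `set1 \| set2` or `set1.union(set2)` | No (sets are unordered) | Yes (automatic) |
| Dict | `dict1 \| dict2` (Python 3.9+) | Yes (3.7+) | Yes (for duplicate keys, the later value wins) |
| Pandas DataFrame | `pd.concat([df1, df2]).drop_duplicates()` | Yes | Yes |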

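Which method to choose depends on your requirements: whether order must be preserved, whether the elements are hashable, and how duplicates should be handled.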
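
Hashability matters because both `set` and `dict.fromkeys` raise `TypeError` for unhashable elements such as lists. A minimal sketch of two workarounds is shown below; `union_unhashable` is a hypothetical helper, and its quadratic cost makes it suitable only for small inputs.

```python
def union_unhashable(*sequences):
    """Union for items that support == but not hash(); uses O(n^2) comparisons."""
    result = []
    for seq in sequences:
        for item in seq:
            if item not in result:  # linear search instead of a hash lookup
                result.append(item)
    return result

rows1 = [[1, 2], [3, 4]]
rows2 = [[3, 4], [5, 6]]
print(union_unhashable(rows1, rows2))  # [[1, 2], [3, 4], [5, 6]]

# If the items can be converted to a hashable form, the usual approach is faster.
as_tuples = list(dict.fromkeys(tuple(row) for row in rows1 + rows2))
print(as_tuples)  # [(1, 2), (3, 4), (5, 6)]
```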