Python example: how do you count how often values occur?

wen | Python examples | 2026-06-09

Several ways to count value frequencies in Python

Below are several common ways to count frequencies, ordered from simple to more involved:

Counting manually with a dictionary

# Basic approach: a plain dictionary
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
freq_dict = {}
for item in data:
    if item in freq_dict:
        freq_dict[item] += 1  # seen before: increment the count
    else:
        freq_dict[item] = 1   # first occurrence: start at 1
print(freq_dict)
# Output: {'apple': 3, 'banana': 2, 'orange': 1}
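
The if/else branch above can be folded away. The following is a minimal sketch of two standard-library variants, `dict.get` with a default and `collections.defaultdict`; the variable names `freq_get` and `freq_default` are only illustrative.

```python
from collections import defaultdict

data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

# Variant 1: dict.get returns 0 for keys that have not been seen yet
freq_get = {}
for item in data:
    freq_get[item] = freq_get.get(item, 0) + 1

# Variant 2: defaultdict(int) creates missing keys with the value 0
freq_default = defaultdict(int)
for item in data:
    freq_default[item] += 1

print(freq_get)
print(dict(freq_default))
```

Both variants produce the same counts as the explicit loop. `defaultdict` is convenient inside larger loops, while `dict.get` keeps the result a plain `dict`.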

Using `collections.Counter` (recommended)

from collections import Counter
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
counter = Counter(data)
print(counter)
# Output: Counter({'apple': 3, 'banana': 2, 'orange': 1})
# Get the 2 most common items
print(counter.most_common(2))
# Output: [('apple', 3), ('banana', 2)]
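
Beyond `most_common()`, a `Counter` has a few behaviors that are handy when counts arrive in batches. This is a small sketch on made-up data, not an exhaustive tour: missing keys read as 0, `update()` adds counts from another iterable, and two counters can be added together.

```python
from collections import Counter

counter = Counter(['apple', 'banana', 'apple'])

# A missing key reads as 0 instead of raising KeyError,
# and the lookup does not insert the key
missing = counter['kiwi']

# update() adds the counts from another iterable to the existing ones
counter.update(['banana', 'orange', 'apple'])

# Adding two counters sums the counts key by key
combined = counter + Counter({'orange': 1})

print(missing)
print(counter)
print(combined)
```

Because lookups of unknown keys return 0, code that reads from a `Counter` usually needs no `in` check first.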

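Using `pandas.value_counts()` (suited to data analysis)

import pandas as pd
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
series = pd.Series(data)
freq = series.value_counts()  # sorted by count, highest first
print(freq)
# Output:
# apple     3
# banana    2
# orange    1
# dtype: int64
# (pandas 2.0 and later print "Name: count, dtype: int64" as the last line)
# Sort by index (the values themselves) instead of by count
print(series.value_counts().sort_index())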

Counting numeric values

# Count how often each number occurs
from collections import Counter
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
count = Counter(numbers)
print(count)
# Output: Counter({4: 4, 3: 3, 2: 2, 1: 1})
# Count how the numbers are distributed across ranges
histogram = {}
for num in numbers:
    range_key = num // 2 * 2  # bins of width 2 (0-1, 2-3, 4-5), keyed by lower bound
    if range_key not in histogram:
        histogram[range_key] = 0
    histogram[range_key] += 1
print(histogram)
# Output: {0: 1, 2: 5, 4: 4}
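
The range grouping above hard-codes a bin width of 2. As a sketch of how it might be generalized, the hypothetical helper `bin_counts` below (not part of any library) takes the width as a parameter and lets `Counter` do the bookkeeping.

```python
from collections import Counter

def bin_counts(values, width):
    """Count integer values per bin; each bin is keyed by its lower bound."""
    return Counter(v // width * width for v in values)

numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
by_two = bin_counts(numbers, 2)
by_three = bin_counts(numbers, 3)

# Print the width-2 bins in ascending order of their lower bound
for lower in sorted(by_two):
    print(f"{lower}-{lower + 1}: {by_two[lower]}")
```

Keying each bin by its lower bound keeps the result easy to sort and label; for float data or uneven bin edges a different scheme would be needed.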

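Using `numpy.unique()` with `return_counts`

import numpy as np
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
# unique_values comes back sorted; counts[i] is the frequency of unique_values[i]
unique_values, counts = np.unique(data, return_counts=True)
# .tolist() converts NumPy scalars to plain str/int so the printed dict
# looks the same regardless of NumPy version
freq_dict = dict(zip(unique_values.tolist(), counts.tolist()))
print(freq_dict)
# Output: {'apple': 3, 'banana': 2, 'orange': 1}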

Practical example: word frequency in text

from collections import Counter
import re
text = "Python is awesome. Python is easy to learn. Python is powerful."
# Normalize to lowercase and split into words
words = re.findall(r'\b\w+\b', text.lower())
# Count word frequencies
word_freq = Counter(words)
print(word_freq)
# Output: Counter({'python': 3, 'is': 3, 'awesome': 1, 'easy': 1, 'to': 1, 'learn': 1, 'powerful': 1})
# Show the 5 most frequent words
for word, count in word_freq.most_common(5):
    print(f"{word}: {count} times")
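
In the result above, a function word such as "is" ranks as high as "python". One common refinement is to filter out such words before counting. The sketch below assumes a tiny hand-written `stop_words` set, which is purely illustrative; real projects typically use a longer list.

```python
from collections import Counter
import re

text = "Python is awesome. Python is easy to learn. Python is powerful."
stop_words = {'is', 'to'}  # hypothetical, illustrative stop-word list

words = re.findall(r'\b\w+\b', text.lower())
# Count only the words that are not in the stop-word set
filtered = Counter(w for w in words if w not in stop_words)

print(filtered.most_common(1))
# → [('python', 3)]
```

Passing a generator expression to `Counter` avoids building an intermediate filtered list.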

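Counting by condition

# Count the values that meet a given condition
scores = [85, 92, 78, 90, 85, 88, 92, 95, 85]
# Count per score band; each band is (low, high) with low inclusive, high exclusive
grade_ranges = {'A': (90, 101), 'B': (80, 90), 'C': (70, 80)}
frequency = {}
for score in scores:
    for grade, (low, high) in grade_ranges.items():
        if low <= score < high:
            frequency[grade] = frequency.get(grade, 0) + 1
            break
print(frequency)
# Output: {'B': 4, 'A': 4, 'C': 1}
# (keys appear in the order each grade is first encountered)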
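
The nested loop can also be written as a classifier function fed into a `Counter`. This is one possible sketch: `grade_of` is a hypothetical helper, and unlike the loop above it puts any score of 90 or more in 'A' (no upper bound of 101) and labels scores under 70 as 'below C' instead of skipping them.

```python
from collections import Counter

def grade_of(score):
    """Map a score to a grade label (hypothetical helper)."""
    if score >= 90:
        return 'A'
    if score >= 80:
        return 'B'
    if score >= 70:
        return 'C'
    return 'below C'  # assumption: the loop version silently ignores these

scores = [85, 92, 78, 90, 85, 88, 92, 95, 85]
by_grade = Counter(grade_of(s) for s in scores)

# A single condition can be counted directly with sum()
at_least_90 = sum(1 for s in scores if s >= 90)

print(by_grade)
print(at_least_90)
```

Separating the classification rule from the counting keeps each piece easy to test and change on its own.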

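Recommended use cases

Method	Best suited for
`Counter`	General-purpose counting; quick and simple
`pandas.value_counts()`	Data analysis that needs visualization or further processing
`numpy.unique()`	Array work where you need the unique values and their counts together
Manual dictionary	Cases that need more control or custom counting logic
`Counter.most_common()`	Sorted results, such as top-N counts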
