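Several Ways to Count Occurrence Frequencies in Python
Below are several common ways to count frequencies, ordered from simple to more involved: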

Manual counting with a dictionary
# Basic approach: use a plain dict
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
freq_dict = {}
for item in data:
    if item in freq_dict:
        freq_dict[item] += 1
    else:
        freq_dict[item] = 1
print(freq_dict)
# Output: {'apple': 3, 'banana': 2, 'orange': 1}
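The if/else branch in the manual loop can be shortened. The sketch below shows two common variants using only the standard library: dict.get with a default of 0, and collections.defaultdict(int). The names freq_get and freq_default are illustrative and not part of the original example.

```python
from collections import defaultdict

data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

# Variant 1: dict.get supplies 0 for keys that have not been seen yet
freq_get = {}
for item in data:
    freq_get[item] = freq_get.get(item, 0) + 1

# Variant 2: defaultdict(int) creates missing keys with a value of 0
freq_default = defaultdict(int)
for item in data:
    freq_default[item] += 1

print(freq_get)
print(dict(freq_default))
```

Both variants build the same mapping as the explicit if/else version. The defaultdict form is handy when the counting happens in several places, since no call site has to handle a missing key.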
Using collections.Counter (recommended)
from collections import Counter
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
counter = Counter(data)
print(counter)
# Output: Counter({'apple': 3, 'banana': 2, 'orange': 1})
# Get the 2 most common items
print(counter.most_common(2))
# Output: [('apple', 3), ('banana', 2)]
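Beyond most_common, a Counter can be updated incrementally and queried for keys it has never seen. The following is a minimal sketch of that behaviour; the two batches and the names missing and shares are made up for illustration.

```python
from collections import Counter

# Two hypothetical batches of data arriving at different times
batch_1 = ['apple', 'banana', 'apple']
batch_2 = ['orange', 'banana', 'apple']

counter = Counter(batch_1)
counter.update(batch_2)  # adds the counts from the second batch in place

# A key that was never counted returns 0 instead of raising KeyError
missing = counter['grape']

# Relative frequency of each item
total = sum(counter.values())
shares = {item: count / total for item, count in counter.items()}

print(counter)
print(missing)
print(shares)
```

Looking up a missing key does not insert it, so the Counter stays free of zero-count entries.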
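Using pandas Series.value_counts() (suited to data analysis)
import pandas as pd
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
series = pd.Series(data)
freq = series.value_counts()
print(freq)
# Output (pandas 2.x prints "Name: count, dtype: int64" on the last line):
# apple     3
# banana    2
# orange    1
# dtype: int64
# Sort by index (the values themselves) instead of by count
print(series.value_counts().sort_index())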
Counting numbers
# Count how often each number occurs
from collections import Counter
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
count = Counter(numbers)
print(count)
# Output: Counter({4: 4, 3: 3, 2: 2, 1: 1})
# Count how the numbers are distributed across ranges
histogram = {}
for num in numbers:
    range_key = num // 2 * 2  # bins of width 2: 0-1, 2-3, 4-5
    if range_key not in histogram:
        histogram[range_key] = 0
    histogram[range_key] += 1
print(histogram)
# Output: {0: 1, 2: 5, 4: 4}
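The range-grouping loop generalizes to any bin width. Below is a small sketch that wraps the same floor-division idea in a helper; bin_counts is a hypothetical name, and the approach assumes numeric values for which floor division by the width is meaningful.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall into each bin [start, start + width)."""
    # Floor division maps every value to the lower edge of its bin
    return Counter((value // width) * width for value in values)

numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
bins = bin_counts(numbers, 2)

# Print the bins in ascending order of their lower edge
for start in sorted(bins):
    print(f"[{start}, {start + 2}): {bins[start]}")
```

Using a Counter here removes the need to initialize each bin to 0 before incrementing it.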
Using numpy.unique() with return_counts
import numpy as np
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
unique_values, counts = np.unique(data, return_counts=True)
# tolist() converts NumPy scalars to plain Python str and int
freq_dict = dict(zip(unique_values.tolist(), counts.tolist()))
print(freq_dict)
# Output: {'apple': 3, 'banana': 2, 'orange': 1}
Practical example: word frequencies in text
from collections import Counter
import re
text = "Python is awesome. Python is easy to learn. Python is powerful."
# Normalize to lowercase and split into words
words = re.findall(r'\b\w+\b', text.lower())
# Count word frequencies
word_freq = Counter(words)
print(word_freq)
# Output: Counter({'python': 3, 'is': 3, 'awesome': 1, 'easy': 1, 'to': 1, 'learn': 1, 'powerful': 1})
# Show the 5 most common words
for word, count in word_freq.most_common(5):
    print(f"{word}: {count} times")
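In word-frequency counts, very common function words such as "is" often crowd out the more informative ones. One possible refinement, sketched here under the assumption that a small stop-word set is acceptable, filters those words before counting. STOP_WORDS and top_words are hypothetical names chosen only for this illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, chosen only for this illustration
STOP_WORDS = {"is", "to"}

def top_words(text, n, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, skipping stop words."""
    words = re.findall(r"\b\w+\b", text.lower())
    return Counter(w for w in words if w not in stop_words).most_common(n)

text = "Python is awesome. Python is easy to learn. Python is powerful."
print(top_words(text, 3))
```

A real project would likely take its stop-word list from a maintained source rather than a hand-written set.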
Counting by condition
# Count items that satisfy a condition
scores = [85, 92, 78, 90, 85, 88, 92, 95, 85]
# Count scores per grade band; each band is the half-open range [low, high)
grade_ranges = {'A': (90, 101), 'B': (80, 90), 'C': (70, 80)}
frequency = {}
for score in scores:
    for grade, (low, high) in grade_ranges.items():
        if low <= score < high:
            frequency[grade] = frequency.get(grade, 0) + 1
            break
print(frequency)
# Output: {'B': 4, 'A': 4, 'C': 1}
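When the bands are contiguous, the inner loop over grade_ranges can be replaced by a binary search over the band boundaries. The sketch below uses bisect_right from the standard library. It makes two assumptions beyond the original: scores under 70 get a label called 'below C', and there is no upper limit on band A (the original caps it at 101).

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of bands C, B and A, in ascending order
breakpoints = [70, 80, 90]
# 'below C' is an assumed label for scores under 70
labels = ['below C', 'C', 'B', 'A']

def grade(score):
    """Map a score to its band label via binary search on the boundaries."""
    return labels[bisect_right(breakpoints, score)]

scores = [85, 92, 78, 90, 85, 88, 92, 95, 85]
frequency = Counter(grade(score) for score in scores)
print(frequency)
```

bisect_right places a score equal to a boundary in the higher band, which matches the low <= score < high test in the loop version.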
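Recommended use cases

| Method | When to use |
|---|---|
| Counter | General-purpose counting; quick and simple |
| pandas Series.value_counts() | Data analysis that needs visualization or further processing |
| numpy.unique() | Array work that needs both the unique values and their counts |
| Manual dictionary counting | When you need more control or custom counting logic |
| Counter.most_common() | When you need sorted results, such as Top N statistics |

Simplest recommendation: from collections import Counter. A single call such as Counter(data) covers most counting needs.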