Python example: how do you count how often values occur?

wen Python examples 17

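Several ways to count value frequencies in Python

Below are several common ways to count frequencies, ordered roughly from simple to more involved: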

Counting manually with a dictionary

# Basic approach: a plain dictionary
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
freq_dict = {}
for item in data:
    if item in freq_dict:
        freq_dict[item] += 1
    else:
        freq_dict[item] = 1
print(freq_dict)
# Output: {'apple': 3, 'banana': 2, 'orange': 1}
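
The if/else branch in the loop above can be folded away. As a minimal sketch on the same sample data (freq_get and freq_default are illustrative names, not from the original), both dict.get and collections.defaultdict supply the starting value of 0 for a key that has not been seen yet:

```python
from collections import defaultdict

data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

# Variant 1: dict.get returns 0 for keys not seen yet
freq_get = {}
for item in data:
    freq_get[item] = freq_get.get(item, 0) + 1

# Variant 2: defaultdict(int) creates missing keys with value 0
freq_default = defaultdict(int)
for item in data:
    freq_default[item] += 1

print(freq_get)
print(dict(freq_default))
```

Both variants produce the same counts as the explicit if/else loop; which one reads better is largely a matter of taste.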

Using collections.Counter (recommended)

from collections import Counter
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
counter = Counter(data)
print(counter)
# Output: Counter({'apple': 3, 'banana': 2, 'orange': 1})
# Get the 2 most common items
print(counter.most_common(2))
# Output: [('apple', 3), ('banana', 2)]
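
Beyond most_common(), a Counter behaves like a dictionary with a few counting-specific conveniences. The following is a small sketch with made-up sample data, showing three of them: lookups of missing keys, update(), and addition of two Counters.

```python
from collections import Counter

counter = Counter(['apple', 'banana', 'apple'])

# Missing keys read as 0 instead of raising KeyError
missing = counter['kiwi']

# update() adds counts from another iterable to the existing ones
counter.update(['banana', 'orange'])

# Adding two Counters sums the counts per key
combined = counter + Counter({'apple': 1})

print(missing)
print(counter)
print(combined)
```

This makes it practical to count data that arrives in batches: keep one Counter and call update() for each batch.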

Using pandas Series.value_counts() (suited to data analysis)

import pandas as pd
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
series = pd.Series(data)
freq = series.value_counts()
print(freq)
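# Output:
# apple     3
# banana    2
# orange    1
# Name: count, dtype: int64
# (pandas versions before 2.0 print only "dtype: int64" on the last line)
# Sort by index label instead of by count
print(series.value_counts().sort_index())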

Counting numbers

# Count how often each number occurs
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
from collections import Counter
count = Counter(numbers)
print(count)
# Output: Counter({4: 4, 3: 3, 2: 2, 1: 1})
# Count how the numbers are distributed across ranges
histogram = {}
for num in numbers:
    range_key = num // 2 * 2  # buckets of width 2: key 0 covers 0-1, key 2 covers 2-3, key 4 covers 4-5
    if range_key not in histogram:
        histogram[range_key] = 0
    histogram[range_key] += 1
print(histogram)
# Output: {0: 1, 2: 5, 4: 4}
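
The grouping loop above can also be written with Counter and a generator expression. This is only a sketch: bucket_counts and its width parameter are hypothetical names introduced here, and each key is the lower bound of its bucket, as in the loop above.

```python
from collections import Counter

def bucket_counts(values, width):
    """Count integer values per bucket; each key is the bucket's lower bound."""
    return Counter(v // width * width for v in values)

numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
width = 2
histogram = bucket_counts(numbers, width)

# Print buckets in ascending order as half-open ranges [low, high)
for low in sorted(histogram):
    print(f"[{low}, {low + width}): {histogram[low]}")
```

Passing the width as an argument makes it easy to try coarser or finer buckets without touching the counting logic.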

Using numpy.unique() with return_counts=True

import numpy as np
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
unique_values, counts = np.unique(data, return_counts=True)
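# tolist() converts NumPy scalars to plain str/int, so the dict prints cleanly on NumPy 2.x as well
freq_dict = dict(zip(unique_values.tolist(), counts.tolist()))
print(freq_dict)
# Output: {'apple': 3, 'banana': 2, 'orange': 1}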

Practical example: word frequency in text

from collections import Counter
import re
text = "Python is awesome. Python is easy to learn. Python is powerful."
# Lowercase the text and split it into words
words = re.findall(r'\b\w+\b', text.lower())
# Count word frequencies
word_freq = Counter(words)
print(word_freq)
# Output: Counter({'python': 3, 'is': 3, 'awesome': 1, 'easy': 1, 'to': 1, 'learn': 1, 'powerful': 1})
# Show the 5 most common words
for word, count in word_freq.most_common(5):
    print(f"{word}: {count} times")
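
In the result above, a function word such as "is" ranks as high as "python". A common refinement is to drop such words before counting. The sketch below assumes a tiny hand-written stop_words set, which is hypothetical and would need to be much larger for real text.

```python
from collections import Counter
import re

text = "Python is awesome. Python is easy to learn. Python is powerful."

# Hypothetical stop-word list, for illustration only
stop_words = {'is', 'to'}

words = re.findall(r'\b\w+\b', text.lower())

# Count only the words that are not in the stop-word set
content_freq = Counter(w for w in words if w not in stop_words)
print(content_freq.most_common(3))
```

Using a set for stop_words keeps each membership check constant-time, which matters once the text grows.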

Counting by condition

# Count how many values satisfy each condition
scores = [85, 92, 78, 90, 85, 88, 92, 95, 85]
# Score bands as half-open ranges [low, high)
grade_ranges = {'A': (90, 101), 'B': (80, 90), 'C': (70, 80)}
frequency = {}
for score in scores:
    for grade, (low, high) in grade_ranges.items():
        if low <= score < high:
            frequency[grade] = frequency.get(grade, 0) + 1
            break
print(frequency)
# Output: {'B': 4, 'A': 4, 'C': 1}
# (keys appear in the order each grade is first seen; 90, 92, 92 and 95 all fall in band A)
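
When the bands are contiguous, the inner loop over grade_ranges can be replaced by a binary search. The following is a sketch under stated assumptions: bounds, labels, and grade are names introduced here, 'below C' is an assumed catch-all for scores under 70, and unlike the ranges above there is no upper cap on band A.

```python
from bisect import bisect_right
from collections import Counter

scores = [85, 92, 78, 90, 85, 88, 92, 95, 85]

# Lower bounds of the bands in ascending order, plus one label per gap.
# 'below C' is an assumed catch-all for scores under 70.
bounds = [70, 80, 90]
labels = ['below C', 'C', 'B', 'A']

def grade(score):
    # bisect_right returns how many bounds are <= score,
    # which is exactly the index of the matching label
    return labels[bisect_right(bounds, score)]

frequency = Counter(grade(s) for s in scores)
print(frequency)
```

With only three bands the speed difference is negligible; the main benefit is that adding a band means editing two lists rather than a dictionary of ranges.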

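Recommended use cases

Method                          When to use
Counter                         General-purpose counting; quick and simple
pandas Series.value_counts()    Data analysis, when results feed plotting or further processing
numpy.unique()                  Array work, when you need the unique values and their counts together
Manual dictionary               When you need more control or custom counting logic
Counter.most_common()           When you need sorted results, such as a top-N ranking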

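The simplest recommendation is collections.Counter: after from collections import Counter, a single call such as Counter(data) covers most counting needs.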
