Pandas - 10.1 Aggregation: groupby with agg/aggregate
2023-07-27 18:35

Methods and functions that can be used with groupby

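  • count / np.count_nonzero: number of values, NaN excluded (np.count_nonzero actually counts non-zero values, so it skips zeros but counts NaN)
  • size: number of rows, NaN included
  • mean / np.mean: mean
  • std / np.std: sample standard deviation (pandas std uses ddof=1 by default; calling np.std directly uses ddof=0)
  • min / np.min: minimum
  • quantile(q=0.25) / np.percentile(q=25): lower quartile (25th percentile)
  • quantile(q=0.5) / np.percentile(q=50): median
  • quantile(q=0.75) / np.percentile(q=75): upper quartile (75th percentile)
  • max / np.max: maximum
  • sum / np.sum: sum
  • var / np.var: unbiased variance (pandas var uses ddof=1 by default; calling np.var directly uses ddof=0)
  • sem / scipy.stats.sem: standard error of the mean
  • describe / scipy.stats.describe: descriptive summary statistics
  • first: first non-null value in each group
  • last: last non-null value in each group
  • nth: the n-th row of each group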
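To see how count, size, first, last and nth differ when a group contains a missing value, here is a minimal sketch on a small hypothetical DataFrame (the column names mimic gapminder, the values are made up). It also shows why np.count_nonzero is not a drop-in replacement for count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the Asia group contains one missing lifeExp value
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "lifeExp": [60.0, np.nan, 70.0, 75.0, 80.0],
})
grouped = df.groupby("continent")["lifeExp"]

counts = grouped.count()      # non-NaN values per group: Asia -> 2, Europe -> 2
sizes = grouped.size()        # all rows per group, NaN included: Asia -> 3, Europe -> 2
firsts = grouped.first()      # first non-null value: Asia -> 60.0, Europe -> 75.0
lasts = grouped.last()        # last non-null value: Asia -> 70.0, Europe -> 80.0
second_rows = grouped.nth(1)  # row at position 1 of each group; the NaN row is not skipped

print(counts, sizes, firsts, lasts, second_rows, sep="\n\n")

# np.count_nonzero treats NaN as "non-zero" and skips 0.0, so it differs from count
print(np.count_nonzero(np.array([0.0, np.nan, 1.0])))  # 2
```

The values returned by nth are the same across pandas versions, but the index is not: since pandas 2.0, nth acts as a filter and keeps the original row labels, while older versions index the result by group key.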

```python
import pandas as pd

df = pd.read_csv('data/gapminder.tsv', sep='\t')
continent_describe = df.groupby('continent').lifeExp.describe()
print(continent_describe)
'''
           count       mean        std     min       25%      50%       75%  \
continent
Africa     624.0  48.865330   9.150210  23.599  42.37250  47.7920  54.41150
Americas   300.0  64.658737   9.345088  37.579  58.41000  67.0480  71.69950
Asia       396.0  60.064903  11.864532  28.801  51.42625  61.7915  69.50525
Europe     360.0  71.903686   5.433178  43.585  69.57000  72.2410  75.45050
Oceania     24.0  74.326208   3.795611  69.120  71.20500  73.6650  77.55250

              max
continent
Africa     76.442
Americas   80.653
Asia       82.603
Europe     81.757
Oceania    81.235
'''
```
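The 25% column of describe() is the same lower quartile you get from quantile or np.percentile, but the two libraries scale q differently: pandas expects a fraction between 0 and 1, NumPy a percentage between 0 and 100. A small sketch on hypothetical data (not the gapminder file) comparing the three:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for gapminder's continent/lifeExp columns
df = pd.DataFrame({
    "continent": ["Asia"] * 5 + ["Europe"] * 3,
    "lifeExp": [40.0, 50.0, 60.0, 70.0, 80.0, 70.0, 75.0, 80.0],
})
grouped = df.groupby("continent")["lifeExp"]

from_describe = grouped.describe()["25%"]   # the 25% column produced by describe()
from_quantile = grouped.quantile(q=0.25)    # pandas: q is a fraction in [0, 1]
from_numpy = grouped.agg(lambda v: np.percentile(v, q=25))  # NumPy: q is a percentage in [0, 100]

# With the default linear interpolation all three give Asia -> 50.0, Europe -> 72.5
print(from_describe.tolist(), from_quantile.tolist(), from_numpy.tolist())
```

Passing q=0.25 to np.percentile would silently return the 0.25th percentile instead, which is close to the minimum.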

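Aggregation functions

Besides the methods listed above, you can call the agg method (or its alias aggregate) and pass in whatever aggregation function you need:

  • a function from another library
  • a custom function you write yourself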
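As a minimal sketch on made-up data: agg and aggregate are the same method, and besides function objects they also accept the name of a built-in aggregation as a string.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "continent": ["Africa", "Africa", "Europe", "Europe"],
    "lifeExp": [40.0, 50.0, 70.0, 80.0],
})
grouped = df.groupby("continent")["lifeExp"]

by_agg = grouped.agg("mean")              # built-in aggregation referenced by name
by_aggregate = grouped.aggregate("mean")  # aggregate is an alias of agg

print(by_agg.tolist())  # [45.0, 75.0]
```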

Passing a function from another library

```python
import numpy as np

# agg and aggregate give the same result. Recent pandas versions may emit a
# FutureWarning for np.mean and suggest passing the string 'mean' instead.
cont_le_agg = df.groupby('continent').lifeExp.agg(np.mean)
print(cont_le_agg)
'''
continent
Africa      48.865330
Americas    64.658737
Asia        60.064903
Europe      71.903686
Oceania     74.326208
Name: lifeExp, dtype: float64
'''

cont_le_agg2 = df.groupby('continent').lifeExp.aggregate(np.mean)
print(cont_le_agg2)
'''
continent
Africa      48.865330
Americas    64.658737
Asia        60.064903
Europe      71.903686
Oceania     74.326208
Name: lifeExp, dtype: float64
'''
```
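The function does not have to come from NumPy. In this sketch on made-up data, the standard library's statistics.median works the same way: agg calls it once per group with that group's values and expects a single value back.

```python
import statistics

import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Europe", "Europe"],
    "lifeExp": [40.0, 50.0, 90.0, 70.0, 80.0],
})
grouped = df.groupby("continent")["lifeExp"]

# statistics.median receives each group's Series of values
stdlib_median = grouped.agg(statistics.median)
# pandas' own median, referenced by name, for comparison
pandas_median = grouped.agg("median")

print(stdlib_median.tolist())  # [50.0, 75.0]
```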

Custom functions

```python
def my_mean(values):
    # values holds the lifeExp values of one continent
    n = len(values)
    total = 0  # avoid shadowing the built-in sum
    for value in values:
        total += value
    return total / n

agg_my_mean = df.groupby('continent').lifeExp.aggregate(my_mean)
print(agg_my_mean)
'''
continent
Africa      48.865330
Americas    64.658737
Asia        60.064903
Europe      71.903686
Oceania     74.326208
Name: lifeExp, dtype: float64
'''
```
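The same mechanism covers statistics that have no built-in name. In this sketch, value_range is a hypothetical helper (not part of pandas) that returns the spread of each group; the data are made up.

```python
import pandas as pd

def value_range(values):
    """Hypothetical custom aggregation: largest minus smallest value in the group."""
    return values.max() - values.min()

# Hypothetical toy data
df = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Europe", "Europe"],
    "lifeExp": [40.0, 50.0, 90.0, 70.0, 80.0],
})
grouped = df.groupby("continent")["lifeExp"]

spread = grouped.agg(value_range)
# A lambda works too when the function is short and used only once
spread_lambda = grouped.agg(lambda v: v.max() - v.min())

print(spread.tolist())  # [50.0, 10.0]
```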

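A custom aggregation function can also take several parameters: the first parameter receives the group's values, and the remaining ones are passed to agg as keyword arguments.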

```python
def my_mean_diff(values, diff_value):
    # Mean of one group minus a reference value supplied through agg
    n = len(values)
    total = 0
    for value in values:
        total += value
    mean = total / n
    return mean - diff_value

global_mean = df.lifeExp.mean()
print(global_mean)  # 59.47443936619713

agg_mean_diff = df.groupby('year').lifeExp.agg(my_mean_diff, diff_value=global_mean)
print(agg_mean_diff)
'''
year
1952   -10.416820
1957    -7.967038
1962    -5.865190
1967    -3.796150
1972    -1.827053
1977     0.095718
1982     2.058758
1987     3.738173
1992     4.685899
1997     5.540237
2002     6.220483
2007     7.532983
Name: lifeExp, dtype: float64
'''
```
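Here is a compact sketch of the same idea on made-up data, with mean_diff as a hypothetical, shortened variant of my_mean_diff. The extra argument can be supplied as a keyword to agg, or bound inside a lambda; both give the same result.

```python
import pandas as pd

def mean_diff(values, diff_value):
    """Hypothetical helper: group mean minus a reference value."""
    return values.mean() - diff_value

# Hypothetical toy data
df = pd.DataFrame({
    "year": [1952, 1952, 2007, 2007],
    "lifeExp": [40.0, 50.0, 70.0, 80.0],
})
global_mean = df["lifeExp"].mean()  # 60.0
grouped = df.groupby("year")["lifeExp"]

via_kwargs = grouped.agg(mean_diff, diff_value=global_mean)          # keyword forwarded by agg
via_lambda = grouped.agg(lambda v: mean_diff(v, global_mean))       # argument bound in a lambda

print(via_kwargs.tolist())  # [-15.0, 15.0]
```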

Passing multiple functions at once

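  • To compute several aggregations on one series, put the functions in a Python list and pass the list to agg.
  • To apply a different aggregation to each of several columns, pass agg a dictionary that maps column names to functions.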

Computing several aggregations on one series

```python
# Each function in the list becomes a column named after the function
gdf = df.groupby('year').lifeExp.agg([np.mean, np.std, np.count_nonzero])
print(gdf)
'''
           mean        std  count_nonzero
year
1952  49.057620  12.225956          142.0
1957  51.507401  12.231286          142.0
1962  53.609249  12.097245          142.0
1967  55.678290  11.718858          142.0
1972  57.647386  11.381953          142.0
1977  59.570157  11.227229          142.0
1982  61.533197  10.770618          142.0
1987  63.212613  10.556285          142.0
1992  64.160338  11.227380          142.0
1997  65.014676  11.559439          142.0
2002  65.694923  12.279823          142.0
2007  67.007423  12.073021          142.0
'''

# Rename the result columns and turn the year index back into a regular column
gdf = (
    df.groupby('year').lifeExp
    .agg([np.mean, np.std, np.count_nonzero])
    .rename(columns={'mean': 'avg', 'count_nonzero': 'count', 'std': 'std_dev'})
    .reset_index()
)
print(gdf)
'''
    year        avg    std_dev  count
0   1952  49.057620  12.225956  142.0
1   1957  51.507401  12.231286  142.0
2   1962  53.609249  12.097245  142.0
3   1967  55.678290  11.718858  142.0
4   1972  57.647386  11.381953  142.0
5   1977  59.570157  11.227229  142.0
6   1982  61.533197  10.770618  142.0
7   1987  63.212613  10.556285  142.0
8   1992  64.160338  11.227380  142.0
9   1997  65.014676  11.559439  142.0
10  2002  65.694923  12.279823  142.0
11  2007  67.007423  12.073021  142.0
'''
```
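As an alternative to agg([...]) followed by rename, pandas 0.25 and later also accept named aggregation: each keyword passed to agg becomes the name of an output column. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "year": [1952, 1952, 2007, 2007],
    "lifeExp": [40.0, 50.0, 70.0, 80.0],
})

# Keyword = output column name, value = aggregation to apply
gdf = (
    df.groupby("year")["lifeExp"]
    .agg(avg="mean", std_dev="std", count="count")
    .reset_index()
)
print(gdf)
```

Unlike np.count_nonzero, the string "count" yields an integer column and skips NaN values.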

Applying a different aggregation function to each column (on a DataFrame)

```python
gdf_dict = df.groupby('year').agg({
    'lifeExp': 'mean',
    'pop': 'median',
    'gdpPercap': 'median'})
print(gdf_dict)
'''
        lifeExp         pop    gdpPercap
year
1952  49.057620   3943953.0  1968.528344
1957  51.507401   4282942.0  2173.220291
1962  53.609249   4686039.5  2335.439533
1967  55.678290   5170175.5  2678.334741
1972  57.647386   5877996.5  3339.129407
1977  59.570157   6404036.5  3798.609244
1982  61.533197   7007320.0  4216.228428
1987  63.212613   7774861.5  4280.300366
1992  64.160338   8688686.5  4386.085502
1997  65.014676   9735063.5  4781.825478
2002  65.694923  10372918.5  5319.804524
2007  67.007423  10517531.0  6124.371109
'''
```
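A dictionary value can itself be a list of functions. The result then has two-level (column, function) column labels, which this sketch on made-up data flattens into single names such as lifeExp_mean; the underscore naming is just one convention, assumed here for illustration.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "year": [1952, 1952, 2007, 2007],
    "lifeExp": [40.0, 50.0, 70.0, 80.0],
    "pop": [100, 300, 200, 600],
})

summary = df.groupby("year").agg({
    "lifeExp": ["mean", "max"],  # a list produces one column per function
    "pop": "median",             # a single name produces one column
})

# Columns are a MultiIndex of (column, function) pairs; join them into flat names
summary.columns = ["_".join(col) for col in summary.columns]
print(summary)
```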
