Pandas >> usage of agg(), aggregate()
In this article, we will talk about the usage of agg and aggregate in Pandas.
You can use the agg() and aggregate() methods to aggregate the columns or rows of a DataFrame. agg() is an alias for aggregate().
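As a quick check of the alias relationship, the sketch below compares the two methods on a small throwaway frame. The frame `nums` and its values are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical two-column frame used only for this comparison
nums = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Both names are expected to refer to the same underlying method
same_method = pd.DataFrame.agg is pd.DataFrame.aggregate

# Calling either name should give identical results
via_agg = nums.agg("sum")
via_aggregate = nums.aggregate("sum")

print(same_method)
print(via_agg.equals(via_aggregate))
```

Since both names give the same result, the rest of this article uses the shorter agg().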
Firstly, we will prepare test data.
import pandas as pd
# class, name, height, weight
data = [("A", "Kevin", 170, 60),
("A", "Jack", 168, 59),
("A", "Mary", 160, 50),
("B", "Tom", 175, 65),
("B", "Annie", 162, 51)]
df = pd.DataFrame(data=data, columns=["class", "name", "height", "weight"])
df
Result
  | class | name | height | weight |
---|---|---|---|---|
0 | A | Kevin | 170 | 60 |
1 | A | Jack | 168 | 59 |
2 | A | Mary | 160 | 50 |
3 | B | Tom | 175 | 65 |
4 | B | Annie | 162 | 51 |
Basic usage of agg()
Pass a function name as a string, a callable, or a list of these as the argument of agg() to specify the operation to apply. Here, we will use a list of function-name strings.
Definition of agg()
DataFrame.agg(func=None, axis=0, *args, **kwargs)
func can be a function, a function name given as a string, a list of functions and/or function names (e.g. [np.sum, 'mean']), or a dict that maps axis labels to functions, function names, or lists of them. Examples of functions: np.sum, np.mean. Examples of function names: 'sum', 'mean', 'count'.
# specify a list of functions
df.agg(['sum', 'mean', 'min', 'max'])
Result
  | class | name | height | weight |
---|---|---|---|---|
sum | AAABB | KevinJackMaryTomAnnie | 835.0 | 285.0 |
min | A | Annie | 160.0 | 50.0 |
max | B | Tom | 175.0 | 65.0 |
mean | NaN | NaN | 167.0 | 57.0 |
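The table above includes the string columns class and name. How 'mean' treats string columns depends on the pandas version: older releases return NaN as shown, while newer ones may raise a TypeError (an assumption worth checking against your installed version). A minimal sketch that avoids the issue is to select the numeric columns first:

```python
import pandas as pd

# Same test data as above: class, name, height, weight
data = [("A", "Kevin", 170, 60),
        ("A", "Jack", 168, 59),
        ("A", "Mary", 160, 50),
        ("B", "Tom", 175, 65),
        ("B", "Annie", 162, 51)]
df = pd.DataFrame(data=data, columns=["class", "name", "height", "weight"])

# Restrict the aggregation to the numeric columns before calling agg()
numeric_summary = df[["height", "weight"]].agg(["sum", "mean", "min", "max"])
print(numeric_summary)
```

The result has one row per function name and one column per selected column.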
If a list is specified, a DataFrame is returned. If a single function name is specified, a Series is returned.
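# Select the numeric columns, because 'mean' is not defined for the string columns
numeric = df[["height", "weight"]]
# The return value is a DataFrame
this_is_a_dataframe = numeric.agg(['mean'])
# The return value is a Series
this_is_a_series = numeric.agg('mean')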
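To confirm the two return types, the following sketch checks them with isinstance. It rebuilds the test data so it can run on its own and uses only the numeric columns; the names `as_frame` and `as_series` are chosen here for illustration.

```python
import pandas as pd

data = [("A", "Kevin", 170, 60),
        ("A", "Jack", 168, 59),
        ("A", "Mary", 160, 50),
        ("B", "Tom", 175, 65),
        ("B", "Annie", 162, 51)]
df = pd.DataFrame(data=data, columns=["class", "name", "height", "weight"])
numeric = df[["height", "weight"]]

# A list (even with one element) gives a DataFrame with one row per function
as_frame = numeric.agg(["mean"])
# A single function name gives a Series indexed by column name
as_series = numeric.agg("mean")

print(isinstance(as_frame, pd.DataFrame), as_frame.shape)
print(isinstance(as_series, pd.Series), as_series.shape)
```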
If we want to apply different aggregations to different columns, we can pass a dict whose keys are column names and whose values are the aggregation functions to apply to each column.
# specify a dict that maps each column to a list of functions
df.agg({"height": ['sum', 'mean'], "weight": ['min', 'max']})
Result
  | height | weight |
---|---|---|
sum | 835.0 | NaN |
mean | 167.0 | NaN |
min | NaN | 50.0 |
max | NaN | 65.0 |
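The NaN cells appear because each column received a different list of functions. As a variation, the sketch below maps each column to a single function name, which returns a Series with one value per column and no NaN padding. The name `per_column` is chosen here for illustration.

```python
import pandas as pd

data = [("A", "Kevin", 170, 60),
        ("A", "Jack", 168, 59),
        ("A", "Mary", 160, 50),
        ("B", "Tom", 175, 65),
        ("B", "Annie", 162, 51)]
df = pd.DataFrame(data=data, columns=["class", "name", "height", "weight"])

# One function name per column: the result is a Series keyed by column name
per_column = df.agg({"height": "mean", "weight": "max"})
print(per_column)
```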
Aggregation is performed on each column by default. To aggregate across each row instead, specify axis=1 or axis='columns'.
axis: If 0 or 'index', apply the function to each column. If 1 or 'columns', apply the function to each row.
# Calculate sum of height and weight in row direction
df[["height", "weight"]].agg("sum", axis=1)
# or
df[["height", "weight"]].agg("sum", axis='columns')
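The row-wise calls above do not show their output, so here is a self-contained sketch that prints the per-row totals and checks that both spellings of the axis argument agree. The names `row_totals` and `same_result` are chosen here for illustration.

```python
import pandas as pd

data = [("A", "Kevin", 170, 60),
        ("A", "Jack", 168, 59),
        ("A", "Mary", 160, 50),
        ("B", "Tom", 175, 65),
        ("B", "Annie", 162, 51)]
df = pd.DataFrame(data=data, columns=["class", "name", "height", "weight"])

# Sum height and weight within each row
row_totals = df[["height", "weight"]].agg("sum", axis=1)
# axis='columns' is the string form of axis=1
same_result = row_totals.equals(df[["height", "weight"]].agg("sum", axis="columns"))

print(row_totals.tolist())  # [230, 227, 210, 240, 213]
print(same_result)
```

The result keeps the original row index, so it can be assigned back to the DataFrame as a new column if needed.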
Aggregation function usage examples
- We can specify a function name as a string.
df["height"].agg("mean")
Result
167.0
- We can also pass a function object.
import numpy as np
df["height"].agg(np.mean)
Result
167.0
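- We can also pass a lambda function. Note that this lambda returns one value per row instead of a single summary value, so the result is a transformed Series rather than an aggregate. Recent pandas versions may emit a deprecation warning for this use.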
df["height"].agg(lambda x: x/10)
Result
0 | 17.0 |
1 | 16.8 |
2 | 16.0 |
3 | 17.5 |
4 | 16.2 |
Name: height, dtype: float64
- We can also define our own function. Here myfunc is applied to each element, so the result is again a Series with one value per row.
def myfunc(h):
    return "Height: " + str(h)
df["height"].agg(myfunc)
Result
0 | Height: 170 |
1 | Height: 168 |
2 | Height: 160 |
3 | Height: 175 |
4 | Height: 162 |
Name: height, dtype: object
- We can also pass a list of functions.
df["height"].agg([lambda x: x/10, myfunc])
Result
  | <lambda> | myfunc |
---|---|---|
0 | 17.0 | Height: 170 |
1 | 16.8 | Height: 168 |
2 | 16.0 | Height: 160 |
3 | 17.5 | Height: 175 |
4 | 16.2 | Height: 162 |
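The lambda and myfunc examples return one value per row, so they are element-wise transformations rather than aggregations. A minimal sketch of the same results with transform() and map(), which are intended for that purpose, is shown below. The helper `label_height` is a hypothetical stand-in that mirrors myfunc, and this substitution is a suggestion rather than part of the original examples.

```python
import pandas as pd

data = [("A", "Kevin", 170, 60),
        ("A", "Jack", 168, 59),
        ("A", "Mary", 160, 50),
        ("B", "Tom", 175, 65),
        ("B", "Annie", 162, 51)]
df = pd.DataFrame(data=data, columns=["class", "name", "height", "weight"])

def label_height(h):
    # Hypothetical helper mirroring myfunc: format one height value
    return "Height: " + str(h)

# transform() keeps the original index and returns one value per row
scaled = df["height"].transform(lambda x: x / 10)
# map() applies the function to each element of the Series
labelled = df["height"].map(label_height)

print(scaled.tolist())
print(labelled.tolist())
```

Keeping agg() for functions that reduce a column to a single value makes the intent of each call easier to read.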