Pandas >> Usage of agg() and aggregate()

2021-10-16 Pandas


In this article, we will look at how to use agg() and aggregate() in pandas.

[Pandas] Usage of agg() and aggregate()

You can use the agg() and aggregate() methods to aggregate the columns or rows of a DataFrame. agg() is an alias for aggregate().
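To see the alias relationship in practice, the minimal sketch below calls both methods with the same argument and compares the results. It builds its own small numeric DataFrame (an assumption made for this check only), and the variable names are illustrative.

```python
import pandas as pd

# Small numeric DataFrame used only for this check
df = pd.DataFrame({"height": [170, 168, 160, 175, 162],
                   "weight": [60, 59, 50, 65, 51]})

# agg() is an alias of aggregate(), so both calls should give the same result
via_agg = df.agg(["min", "max"])
via_aggregate = df.aggregate(["min", "max"])

print(via_agg.equals(via_aggregate))  # True
```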

First, let's prepare some test data.

import pandas as pd

# class, name, height, weight
data = [("A", "Kevin", 170, 60), 
        ("A", "Jack", 168, 59), 
        ("A", "Mary", 160, 50), 
        ("B", "Tom", 175, 65), 
        ("B", "Annie", 162, 51)]
df = pd.DataFrame(data=data, columns=["class", "name", "height", "weight"])
df

Result

   class   name  height  weight
0      A  Kevin     170      60
1      A   Jack     168      59
2      A   Mary     160      50
3      B    Tom     175      65
4      B  Annie     162      51

Basic usage of agg()

Pass a function, a function name string, a list of these, or a dict as the argument of agg() to specify the aggregation to apply. Here, we will use a list of function names.

Definition of agg()

DataFrame.agg(func=None, axis=0, *args, **kwargs)

func: a function (e.g. np.sum, np.mean), a function name string (e.g. 'sum', 'mean', 'count'), a list of functions and/or function names (e.g. [np.sum, 'mean']), or a dict mapping axis labels to a function, a function name, or a list of such.
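As a minimal sketch of the first three forms, assuming a numeric-only copy of the test data so that every call succeeds on current pandas versions: a function name and a callable computing the same thing return the same Series, while a list returns a DataFrame with one row per function. The variable names are only illustrative.

```python
import pandas as pd

# Numeric-only copy of the test data (assumption: string columns left out)
df = pd.DataFrame({"height": [170, 168, 160, 175, 162],
                   "weight": [60, 59, 50, 65, 51]})

by_name = df.agg("sum")                      # function name string -> Series
by_callable = df.agg(lambda col: col.sum())  # callable, called once per column
by_list = df.agg(["sum", "mean"])            # list -> DataFrame, one row per function

print(by_name.tolist())      # [835, 285]
print(by_callable.tolist())  # [835, 285]
print(by_list)
```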

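# specify a list of function names
# (newer pandas versions (2.0+) may raise a TypeError here because 'mean' fails
# on the string columns; select the numeric columns first, e.g. df[["height", "weight"]])
df.agg(['sum', 'mean', 'min', 'max'])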

Result

      class                   name  height  weight
sum   AAABB  KevinJackMaryTomAnnie   835.0   285.0
min       A                  Annie   160.0    50.0
max       B                    Tom   175.0    65.0
mean    NaN                    NaN   167.0    57.0

If a list is passed, a DataFrame is returned. If a single function name is passed, a Series is returned.

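# The return value is a DataFrame
# (numeric columns only, since 'mean' cannot be computed on strings)
this_is_a_dataframe = df[["height", "weight"]].agg(['mean'])

# The return value is a Series
this_is_a_series = df[["height", "weight"]].agg('mean')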

To apply different aggregations to different columns, pass a dict whose keys are column names and whose values are the function (or list of functions) to apply to that column.

# specify a dict of column name -> list of function names
df.agg({"height": ['sum', 'mean'], "weight": ['min', 'max']})

Result

      height  weight
sum    835.0     NaN
mean   167.0     NaN
min      NaN    50.0
max      NaN    65.0
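When every column gets a single function name instead of a list, the result is a Series indexed by column name. If the dict mixes single names and lists, pandas treats each single name as a one-element list and returns a DataFrame, with NaN wherever a function was not requested for a column. A small sketch, again assuming a numeric-only DataFrame (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"height": [170, 168, 160, 175, 162],
                   "weight": [60, 59, 50, 65, 51]})

# One function name per column -> Series indexed by column name
per_column = df.agg({"height": "mean", "weight": "max"})
print(per_column)

# Mixing a single name and a list -> DataFrame (NaN where not requested)
mixed = df.agg({"height": "mean", "weight": ["min", "max"]})
print(mixed)
```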

By default, the aggregation is applied to each column. To apply it to each row instead, specify axis=1 or axis='columns'.

axis: 0 or 'index' applies the function to each column; 1 or 'columns' applies it to each row.

# Calculate sum of height and weight in row direction
df[["height", "weight"]].agg("sum", axis=1)
# or
df[["height", "weight"]].agg("sum", axis='columns')
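For this data the row-wise sum is simply height + weight, so the expected values are easy to check by hand. The sketch below (numeric columns only, as in the example above; variable names are illustrative) confirms that axis=1 and axis='columns' agree, and shows that a list of function names also works row-wise, producing one column per function.

```python
import pandas as pd

df = pd.DataFrame({"height": [170, 168, 160, 175, 162],
                   "weight": [60, 59, 50, 65, 51]})

row_sum = df.agg("sum", axis=1)
row_sum_named = df.agg("sum", axis="columns")
print(row_sum.tolist())               # [230, 227, 210, 240, 213]
print(row_sum.equals(row_sum_named))  # True

# A list with axis=1: one row per original row, one column per function
row_min_max = df.agg(["min", "max"], axis=1)
print(row_min_max)
```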

Aggregation function usage examples

  1. We can pass a function name as a string.
df["height"].agg("mean")

Result

167.0
  2. We can also pass a function object.
import numpy as np
# pandas 2.1+ may emit a FutureWarning here suggesting the string "mean" instead
df["height"].agg(np.mean)

Result

167.0
  3. We can also pass a lambda function.
df["height"].agg(lambda x: x/10)

Result

0    17.0
1    16.8
2    16.0
3    17.5
4    16.2
Name: height, dtype: float64
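The lambda above does not actually aggregate: it is applied to each element, so the result has one value per row. Recent pandas versions (2.1 and later) warn about such non-reducing callables in Series.agg() and point to Series.transform() instead. The sketch below contrasts the two intents on a numeric-only DataFrame: transform() for an element-wise result, and a lambda passed to DataFrame.agg(), which receives each column as a Series and reduces it to one value (here the range, max - min). Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"height": [170, 168, 160, 175, 162],
                   "weight": [60, 59, 50, 65, 51]})

# Element-wise result (same length as the input): transform states the intent
scaled = df["height"].transform(lambda x: x / 10)
print(scaled.tolist())  # [17.0, 16.8, 16.0, 17.5, 16.2]

# Reducing result: DataFrame.agg passes each column to the lambda as a Series
value_range = df.agg(lambda col: col.max() - col.min())
print(value_range.tolist())  # [15, 15]
```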

  4. We can also define our own function and pass it.
def myfunc(h):
    return "Height: " + str(h)

df["height"].agg(myfunc)

Result

0    Height: 170
1    Height: 168
2    Height: 160
3    Height: 175
4    Height: 162
Name: height, dtype: object

  5. We can also pass a list of several functions.
df["height"].agg([lambda x: x/10, myfunc])

Result

   <lambda>       myfunc
0      17.0  Height: 170
1      16.8  Height: 168
2      16.0  Height: 160
3      17.5  Height: 175
4      16.2  Height: 162
