Groupby_agg : Shortcut for assigning a groupby-transform to a new column.¶

Background¶

This notebook serves to show how to use the groupby_agg method from pyjanitor's general functions submodule.

The groupby_agg method allows us to add the result of an aggregation from a grouping, as a new column, back to the dataframe.

Currently in pandas, to append a column back to a dataframe, you do it in three steps:

Groupby a column or columns
Apply the transform method with an aggregate function on the grouping, and finally
Assign the result of the transform to a new column in the dataframe.

The groupby_agg allows you to replicate the same process in one step and with sensible arguments. The example below illustrates this function

In [1]:

# load modules
import pandas as pd
import numpy as np
from janitor import groupby_agg

In [2]:

data = {
    'item': ['shoe', 'shoe', 'bag', 'shoe', 'bag'],
    'MRP': [220, 450, 320, 200, 305],
    'number_sold': [100, 40, 56, 38, 25]
}

df = pd.DataFrame(data)

df

Out[2]:

	item	MRP	number_sold
0	shoe	220	100
1	shoe	450	40
2	bag	320	56
3	shoe	200	38
4	bag	305	25

Use grouby_agg to find average price for each item and append column to dataframe¶

In [3]:

df = df.groupby_agg(
    by='item',
    agg='mean',
    agg_column_name='MRP',
    new_column_name='Avg_MRP'
)

df

Out[3]:

	item	MRP	number_sold	Avg_MRP
0	shoe	220	100	290.0
1	shoe	450	40	290.0
2	bag	320	56	312.5
3	shoe	200	38	290.0
4	bag	305	25	312.5

Null cells are retained as well :

In [4]:

df = pd.DataFrame(
        {
            "name": ("black", "black", "black", "red", "red"),
            "type": ("chair", "chair", "sofa", "sofa", "plate"),
            "num": (4, 5, 12, 4, 3),
            "nulls": (1, 1, np.nan, np.nan, 3),
        }
    )

df

Out[4]:

	name	type	num	nulls
0	black	chair	4	1.0
1	black	chair	5	1.0
2	black	sofa	12	NaN
3	red	sofa	4	NaN
4	red	plate	3	3.0

In [5]:

filtered_df = df.groupby_agg(by=['nulls'],
                             agg='size',
                             agg_column_name='type',
                             new_column_name='counter'
                             )
filtered_df

Out[5]:

	name	type	num	nulls	counter
0	black	chair	4	1.0	2
1	black	chair	5	1.0	2
2	black	sofa	12	NaN	2
3	red	sofa	4	NaN	2
4	red	plate	3	3.0	1

The groupby_agg method can be extended for different purposes. One of these is groupwise filtering, where only groups that meet a condition are retained. Let's explore this with an example:

In [6]:

filtered_df = (df.groupby_agg(by=['name', 'type'],
                             agg='size',
                             agg_column_name='type',
                             new_column_name='counter'
                             )
                .query('counter > 1'))
filtered_df

Out[6]:

	name	type	num	nulls	counter
0	black	chair	4	1.0	2
1	black	chair	5	1.0	2

In [ ]: