Pandas: expanding_apply with groupby for unique counts of string type


I have a DataFrame:

    import pandas as pd

    id = [0, 0, 0, 0, 1, 1, 1, 1]
    color = ['red', 'blue', 'red', 'black', 'blue', 'red', 'black', 'black']
    test = pd.DataFrame(list(zip(id, color)), columns=['id', 'color'])

I want to create a column holding the running count of unique colors, grouped by id, so that the final DataFrame looks like this:

       id  color  expanding_unique_count
    0   0    red                       1
    1   0   blue                       2
    2   0    red                       2
    3   0  black                       3
    4   1   blue                       1
    5   1    red                       2
    6   1  black                       3
    7   1  black                       3

I tried the simple way:

    import numpy as np

    def len_unique(x):
        return len(np.unique(x))

    test['expanding_unique_count'] = test.groupby('id')['color'].apply(lambda x: pd.expanding_apply(x, len_unique))

and got ValueError: could not convert string to float: black
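The message points at a string-to-float conversion rather than at len_unique itself. A minimal check, assuming only NumPy, shows that np.unique counts strings without trouble while coercing the same strings to float64 fails. That float64 coercion is assumed here to be what the windowed apply does with the column before it calls len_unique, which is consistent with the error above.

```python
import numpy as np

def len_unique(x):
    # Number of distinct values in the window
    return len(np.unique(x))

colors = ['red', 'blue', 'red', 'black']

# np.unique works directly on strings
print(len_unique(np.array(colors)))

# Coercing the same strings to float64 raises ValueError
try:
    np.asarray(colors, dtype=np.float64)
    conversion_failed = False
except ValueError:
    conversion_failed = True
print(conversion_failed)
```

So the function is fine; the column just has to reach it as numbers.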

If I change the colors to integers:

    color = [1, 2, 1, 3, 2, 1, 3, 3]
    test = pd.DataFrame(list(zip(id, color)), columns=['id', 'color'])

then running the same code as above produces the desired result. Is there a way to make this work while keeping the color column as strings?

It looks like expanding_apply and rolling_apply only work on numeric values. You could try creating a numeric column that maps each color string to a numeric code (this can be done by making the color column categorical), and then run expanding_apply on that column.

    # processing
    # ===================================
    # create a numeric label for each color
    test['numeric_label'] = pd.Categorical(test['color']).codes
    # output: array([2, 1, 2, 0, 1, 2, 0, 0], dtype=int8)

    # expanding function on the numeric labels
    test['expanding_unique_count'] = test.groupby('id')['numeric_label'].apply(lambda x: pd.expanding_apply(x, len_unique))

    # drop the auxiliary column (drop returns a new DataFrame, so reassign)
    test = test.drop('numeric_label', axis=1)
    print(test)

which gives:

       id  color  expanding_unique_count
    0   0    red                       1
    1   0   blue                       2
    2   0    red                       2
    3   0  black                       3
    4   1   blue                       1
    5   1    red                       2
    6   1  black                       3
    7   1  black                       3
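pd.expanding_apply is no longer available in current pandas releases. The same categorical-code idea can be sketched with the method-based API, Series.expanding().apply(..., raw=True), inside a groupby transform. This is a sketch assuming a pandas version that accepts the raw argument; the helper name expanding_unique is hypothetical.

```python
import numpy as np
import pandas as pd

ids = [0, 0, 0, 0, 1, 1, 1, 1]
colors = ['red', 'blue', 'red', 'black', 'blue', 'red', 'black', 'black']
test = pd.DataFrame({'id': ids, 'color': colors})

# Encode each color as an integer code (categories sorted: black=0, blue=1, red=2)
codes = pd.Series(pd.Categorical(test['color']).codes, index=test.index)

def expanding_unique(s):
    # Hypothetical helper: at each row, count distinct codes seen so far in this group
    return s.expanding().apply(lambda a: len(np.unique(a)), raw=True)

# transform keeps the result aligned with test's original index;
# expanding().apply returns floats, so cast back to int
test['expanding_unique_count'] = (
    codes.groupby(test['id']).transform(expanding_unique).astype(int)
)
print(test)
```

raw=True passes each window to the lambda as a NumPy array, which avoids building a Series for every window.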

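Edit:

A simpler approach avoids expanding_apply entirely: flag the first occurrence of each color within each group and take the cumulative sum of those flags.

    def func(group):
        # index labels of the first row of each color in this group
        first_rows = group.groupby('color').head(1).index
        # 1 on first occurrences, 0 elsewhere, then a running total
        return pd.Series(1, index=first_rows).reindex(group.index).fillna(0).cumsum().astype(int)

    test['expanding_unique_count'] = test.groupby('id', group_keys=False).apply(func)
    print(test)

which gives:

       id  color  expanding_unique_count
    0   0    red                       1
    1   0   blue                       2
    2   0    red                       2
    3   0  black                       3
    4   1   blue                       1
    5   1    red                       2
    6   1  black                       3
    7   1  black                       3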
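The same first-occurrence idea can also be written without a per-group apply. This sketch swaps in DataFrame.duplicated to build the first-occurrence flags and a grouped cumsum for the running total; it is an alternative formulation, not the answer's code.

```python
import pandas as pd

ids = [0, 0, 0, 0, 1, 1, 1, 1]
colors = ['red', 'blue', 'red', 'black', 'blue', 'red', 'black', 'black']
test = pd.DataFrame({'id': ids, 'color': colors})

# True the first time each (id, color) pair appears, False on repeats
is_first = ~test.duplicated(subset=['id', 'color'])

# Running total of "new color" flags, restarting for each id
test['expanding_unique_count'] = is_first.astype(int).groupby(test['id']).cumsum()
print(test)
```

Because duplicated looks at (id, color) pairs, a color seen under one id still counts as new under another; the groupby on id is only needed so the cumulative sum restarts too.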
