Pandas: expanding_apply with groupby for unique counts of string type
I have this DataFrame:
import pandas as pd

id = [0, 0, 0, 0, 1, 1, 1, 1]
color = ['red', 'blue', 'red', 'black', 'blue', 'red', 'black', 'black']
test = pd.DataFrame(list(zip(id, color)), columns=['id', 'color'])
and I want to create a column with the running count of unique colors, grouped by id. The final DataFrame should look like this:
   id  color  expanding_unique_count
0   0    red                       1
1   0   blue                       2
2   0    red                       2
3   0  black                       3
4   1   blue                       1
5   1    red                       2
6   1  black                       3
7   1  black                       3
I tried a simple approach:
import numpy as np

def len_unique(x):
    return len(np.unique(x))

test['expanding_unique_count'] = test.groupby('id')['color'].apply(
    lambda x: pd.expanding_apply(x, len_unique))
and got ValueError: could not convert string to float: black.
If I change the colors to integers:
color = [1, 2, 1, 3, 2, 1, 3, 3]
test = pd.DataFrame(list(zip(id, color)), columns=['id', 'color'])
then running the same code as above produces the desired result. Is there a way to make this work while keeping the color column as strings?
It looks like expanding_apply and rolling_apply only work on numeric values. One workaround is to create an auxiliary numeric column that encodes the color strings as numbers (this can be done by making the color column categorical and using its codes), and then run expanding_apply on that column.
# processing
# ===================================
# create a numeric label for each color
test['numeric_label'] = pd.Categorical(test['color']).codes
# output: array([2, 1, 2, 0, 1, 2, 0, 0], dtype=int8)

# run the expanding unique count on the numeric labels
test['expanding_unique_count'] = test.groupby('id')['numeric_label'].apply(
    lambda x: pd.expanding_apply(x, len_unique))

# drop the auxiliary column
test.drop('numeric_label', axis=1)

   id  color  expanding_unique_count
0   0    red                       1
1   0   blue                       2
2   0    red                       2
3   0  black                       3
4   1   blue                       1
5   1    red                       2
6   1  black                       3
7   1  black                       3
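As a side note, the module-level pd.expanding_apply is not available in current pandas releases, where the same idea is expressed through the .expanding() accessor. Below is a minimal sketch of the same categorical-codes approach, my own adaptation rather than part of the original answer, assuming a pandas version recent enough to support raw=True in expanding().apply():

# same idea on newer pandas: encode strings as codes, expanding count per group
import numpy as np
import pandas as pd

id = [0, 0, 0, 0, 1, 1, 1, 1]
color = ['red', 'blue', 'red', 'black', 'blue', 'red', 'black', 'black']
test = pd.DataFrame({'id': id, 'color': color})

def len_unique(x):
    return len(np.unique(x))

codes = pd.Series(pd.Categorical(test['color']).codes, index=test.index)
test['expanding_unique_count'] = (
    codes.groupby(test['id'])
         .expanding()
         .apply(len_unique, raw=True)       # raw=True passes a plain numpy array
         .reset_index(level=0, drop=True)   # drop the group level from the index
         .astype(int)
)
print(test)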
Edit: here is an alternative that works directly on the string column:
def func(group):
    # mark the first row of each color within the group with 1, reindex back
    # to the full group, fill the rest with 0 and take the cumulative sum
    return pd.Series(1, index=group.groupby('color').head(1).index)\
             .reindex(group.index).fillna(0).cumsum()

test['expanding_unique_count'] = test.groupby('id', group_keys=False).apply(func)
print(test)

   id  color  expanding_unique_count
0   0    red                       1
1   0   blue                       2
2   0    red                       2
3   0  black                       3
4   1   blue                       1
5   1    red                       2
6   1  black                       3
7   1  black                       3
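If you prefer to avoid the helper function, a similar effect can be had with duplicated() and a groupby transform. This is a small sketch of my own, not part of the original answer, and it assumes the same test DataFrame as above:

# count each color the first time it appears within its id group
test['expanding_unique_count'] = (
    test.groupby('id')['color']
        .transform(lambda s: (~s.duplicated()).cumsum())
)

This keeps the color column as strings throughout, since duplicated() works on any dtype.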