python - Pandas split column name -
i have test dataframe looks this:
data = pd.dataframe([[0,0,0,3,6,5,6,1],[1,1,1,3,4,5,2,0],[2,1,0,3,6,5,6,1],[3,0,0,2,9,4,2,1]], columns=["id", "sex", "split", "group0low", "group0high", "group1low", "group1high", "trim"]) grouped = data.groupby(['sex','split']).mean() stacked = grouped.stack().reset_index(level=2) stacked.columns = ['group_level', 'mean']
next, want separate out group_level , stack 2 new factors:
stacked['group'] = stacked.group_level.str[:6] stacked['level'] = stacked.group_level.str[6:]
this works fine. question this:
this works if column names ("group0low", "group0high", "group1low", "group1high") have in common each other.
what if instead column names more "routelow", "routehigh", "landmarklow", "landmarkhigh"? how use str split group_level in case?
this question similar 1 posted here: slice/split string series @ various positions
the difference of column subnames different , have no commonality (whereas in other post had group or class in name). there regex string, or other method, can use stacking?
here way. assumes low/high group ends words low
, high
respectively, can use .str.endswith()
identify rows low/high.
here sample data
df = pd.dataframe('group0low group0high group1low group1high routelow routehigh landmarklow landmarkhigh'.split(), columns=['group_level']) df group_level 0 group0low 1 group0high 2 group1low 3 group1high 4 routelow 5 routehigh 6 landmarklow 7 landmarkhigh
use np.where
, can following
df['level'] = np.where(df['group_level'].str.endswith('low'), 'low', 'high') df['group'] = np.where(df['group_level'].str.endswith('low'), df['group_level'].str[:-3], df['group_level'].str[:-4]) df group_level level group 0 group0low low group0 1 group0high high group0 2 group1low low group1 3 group1high high group1 4 routelow low route 5 routehigh high route 6 landmarklow low landmark 7 landmarkhigh high landmark
Comments
Post a Comment