R: Count the daily number of distinct values of a variable per ID
I've asked a similar question before (here is the link), but this time I want to calculate the number of distinct v per day and per id. Here "distinct" does not only mean different v within one day; it also means different from the v already seen on that id's earlier days.
For example, if v1 appears on the second day but also appeared on the day before, I don't count v1 for the second day.
id1:
day1: v1/v2 -----> 2 for day1
day2: v1/v3 -----> 1 for day2
day3: v3 -----> 0 for day3
id2:
day1: v4 -----> 1 for day1
day2: v5/v4/v1 -----> 2 for day2
day3: v3/v4 -----> 1 for day3
Here is the data:
    id  day v
    id1 1   v1
    id1 1   v1
    id1 1   v2
    id1 2   v1
    id1 2   v3
    id1 3   v3
    id1 3   v3
    id1 3   v3
    id2 1   v4
    id2 2   v5
    id2 2   v5
    id2 2   v4
    id2 2   v1
    id2 3   v3
    id2 3   v4
With the data above, I want a result like:
    id  day v  daily_v_distinguish_id
    id1 1   v1 2
    id1 1   v1 NA
    id1 1   v2 NA
    id1 2   v1 1
    id1 2   v3 NA
    id1 3   v3 0
    id1 3   v3 NA
    id1 3   v3 NA
    id2 1   v4 1
    id2 2   v5 2
    id2 2   v5 NA
    id2 2   v4 NA
    id2 2   v1 NA
    id2 3   v3 1
    id2 3   v4 NA
If I use setDT(df1)[, daily_v_id := c(uniqueN(v), rep(NA, .N-1)), by = .(id, day)], v is only compared within each day; it is not compared against the v of the earlier days.
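To make the problem concrete, here is a small base R sketch (not the data.table call itself) of what counting unique v within each day alone produces for id1. The names ex and per_day are made up for illustration; ex simply copies the id1 rows from the data above.

```r
# Illustrative copy of the id1 rows (hypothetical object name 'ex')
ex <- data.frame(
  id  = rep("id1", 8),
  day = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  v   = c("v1", "v1", "v2", "v1", "v3", "v3", "v3", "v3"),
  stringsAsFactors = FALSE
)

# Unique values of v counted within each day only, ignoring earlier days
per_day <- tapply(ex$v, ex$day, function(x) length(unique(x)))
per_day
# day 1: 2, day 2: 2, day 3: 1, whereas the wanted result is 2, 1, 0
```

The v1 on day 2 and the v3 on day 3 are counted again because each day is looked at in isolation.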
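We could use data.table to create 'daily_v_distinguish_id'. Convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'id', create a logical index ('indx') that is TRUE for the elements of 'v' that are not duplicated, i.e. the first time each value appears for that id. In the next step, we group by the 'id' and 'day' columns, get the sum of 'indx', concatenate it with 'NA' to fill the rest of the elements in each group, and assign (:=) the result to 'daily_v_distinguish_id'.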
    library(data.table)
    # indx is TRUE on the first occurrence of each v within an id
    setDT(df1)[, indx := !duplicated(v), by = .(id)
      ][, daily_v_distinguish_id := c(sum(indx), rep(NA, .N - 1)), by = .(id, day)
      ][, indx := NULL]   # drop the helper column
    df1
    #      id day  v daily_v_distinguish_id
    #  1: id1   1 v1                      2
    #  2: id1   1 v1                     NA
    #  3: id1   1 v2                     NA
    #  4: id1   2 v1                      1
    #  5: id1   2 v3                     NA
    #  6: id1   3 v3                      0
    #  7: id1   3 v3                     NA
    #  8: id1   3 v3                     NA
    #  9: id2   1 v4                      1
    # 10: id2   2 v5                      2
    # 11: id2   2 v5                     NA
    # 12: id2   2 v4                     NA
    # 13: id2   2 v1                     NA
    # 14: id2   3 v3                      1
    # 15: id2   3 v4                     NA
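As a rough illustration of the first step, the intermediate index can be looked at on its own. This is a base R sketch rather than the data.table code, and it assumes the rows are ordered by day within each id (as in the example data), since duplicated() flags first occurrences in row order. The names ex and first_seen are made up for illustration.

```r
# Illustrative copy of the id1 rows (hypothetical object name 'ex')
ex <- data.frame(
  id  = rep("id1", 8),
  day = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  v   = c("v1", "v1", "v2", "v1", "v3", "v3", "v3", "v3"),
  stringsAsFactors = FALSE
)

# TRUE only the first time a value of v shows up for an id
ex$indx <- !duplicated(ex[c("id", "v")])
ex$indx
# TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE

# Summing the index per day gives the number of newly seen values
first_seen <- tapply(ex$indx, ex$day, sum)
first_seen
# day 1: 2, day 2: 1, day 3: 0
```

If the real data is not sorted by day, it would need to be ordered first, otherwise a value could be credited to a later day than the one where it first appeared.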
A similar option using dplyr is

    library(dplyr)
    df1 %>%
      group_by(id) %>%
      mutate(ind = !duplicated(v)) %>%
      group_by(day, .add = TRUE) %>%   # dplyr versions before 1.0.0 spell this add = TRUE
      mutate(daily_v_distinguish_id = c(sum(ind), rep(NA, n() - 1))) %>%
      select(-ind)
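Or using ave from base R

    # df1[-2] drops 'day', so duplicates are detected on (id, v);
    # this assumes df1 holds only the original id, day and v columns
    with(df1, ave(!duplicated(df1[-2]), id, day,
                  FUN = function(x) c(sum(x), rep(NA, length(x) - 1))))
    # [1]  2 NA NA  1 NA  0 NA NA  1  2 NA NA NA  1 NA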
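If one row per id and day is enough (instead of padding each group with NA), a possible base R sketch uses aggregate(). This is a variation on the same idea, not part of the answer above; the object names df, first_seen and res are made up for illustration, and df copies the example data so that the block runs on its own. It again assumes rows are ordered by day within each id.

```r
# Stand-alone copy of the example data (hypothetical object name 'df')
df <- data.frame(
  id  = rep(c("id1", "id2"), times = c(8, 7)),
  day = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  v   = c("v1", "v1", "v2", "v1", "v3", "v3", "v3", "v3",
          "v4", "v5", "v5", "v4", "v1", "v3", "v4"),
  stringsAsFactors = FALSE
)

# Flag the first occurrence of each v within an id
df$first_seen <- !duplicated(df[c("id", "v")])

# One row per (id, day) holding the number of newly seen values
res <- aggregate(first_seen ~ id + day, data = df, FUN = sum)
res <- res[order(res$id, res$day), ]
res$first_seen
# 2 1 0 1 2 1
```

The counts match the non-NA entries of daily_v_distinguish_id, just without the filler rows.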
Data

    df1 <- structure(list(
      id = c("id1", "id1", "id1", "id1", "id1", "id1", "id1", "id1",
             "id2", "id2", "id2", "id2", "id2", "id2", "id2"),
      day = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
      v = c("v1", "v1", "v2", "v1", "v3", "v3", "v3", "v3",
            "v4", "v5", "v5", "v4", "v1", "v3", "v4")),
      .Names = c("id", "day", "v"), class = "data.frame",
      row.names = c(NA, -15L))