Find a list of strings across a data table in R
I have a vector of strings (candidates), each of which I want to find within a data table (fbgn_dmels), and return the first-column entry if a match is found within a row (e.g. cg2175 should return "1-dec").
> head(candidates)
[1] "cg2175" "cg31196" "cg3169" "cg15168" "cg2252" "cg2019"
> fbgn_dmels
        v1_01 v1_02 v1_03 v1_04 v1_05 v1_06 v1_07 v1_08 v1_09 v1_10 v1_11 v1_12 v1_13 v1_14
    1: 1-dec fbgn0000427 fbgn0000645 cg2175 na na na na na na na na na na
    2: 1-sep fbgn0011710 fbgn0005665 fbgn0013404 fbgn0014082 fbgn0024226 cg1403 na na na na na na na
    3: 128up fbgn0010339 fbgn0010196 cg8340 na na na na na na na na na na
    4: 14-3-3epsilon fbgn0020238 fbgn0011329 fbgn0016739 fbgn0016743 fbgn0046456 fbgn0051196 fbgn0064146 fbgn0066007 cg31196 na na na na
    5: 14-3-3zeta fbgn0004907 fbgn0010635 fbgn0019723 fbgn0023038 fbgn0046306 fbgn0064146 cg17870 na na na na na na
   ---
17743: zw10 fbgn0004643 fbgn0000016 fbgn0002765 fbgn0029627 cg9900 na na na na na na na na
17744: zwilch fbgn0061476 fbgn0036933 fbgn0042214 cg18729 cg18639 na na na na na na na na
17745: zyd fbgn0265767 fbgn0243503 fbgn0025689 fbgn0058147 fbgn0040030 cg2893 cg40147 na na na na na na
17746: zye fbgn0036985 cg5847 na na na na na na na na na na na
17747: zyx fbgn0011642 fbgn0047225 fbgn0052018 cg32018 na na na na na na na na na
I can solve this using loops on a data frame, but that seems quite slow and inefficient. I was wondering if there is a straightforward way of doing it with data tables.
Many thanks in advance for any suggestions on how to tackle this.
-geo
This is pretty hacky, but it seems to work. I'm assuming your data is called fbgn_dmels:
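candidates <- c("cg2175", "cg31196", "cg3169", "cg15168", "cg2252", "cg2019")

# Return the first-column entry of the first row that contains `string`
getthem <- function(string) {
  # Anchor the pattern so that only whole-cell matches count
  pattern <- paste0("^", string, "$")
  # Logical matrix of hits, one column per column of fbgn_dmels
  hits <- apply(fbgn_dmels, 2, function(x) grepl(pattern, x, perl = TRUE))
  # Row index of the first hit (NA if there is none)
  row <- which(hits, arr.ind = TRUE)[1]
  as.character(fbgn_dmels[["v1_01"]][row])
}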
sapply(candidates, getthem)
First I defined a function (getthem) that gets the first occurrence for a single candidate, then I used sapply to apply it to all of the candidates.
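As a possible alternative that avoids scanning the whole table once per candidate, the lookup could be sketched with data.table's melt() followed by a single match(). This is only a sketch under assumptions: the toy fbgn_dmels below is a hypothetical three-row subset of the table in the question, missing cells are assumed to be real NA values, and the names long, id and result are made up for illustration.

```r
library(data.table)

# Hypothetical toy stand-in for fbgn_dmels (subset of the rows in the question)
fbgn_dmels <- data.table(
  v1_01 = c("1-dec", "1-sep", "14-3-3epsilon"),
  v1_02 = c("fbgn0000427", "fbgn0011710", "fbgn0020238"),
  v1_03 = c("fbgn0000645", "cg1403", "cg31196"),
  v1_04 = c("cg2175", NA, NA)
)
candidates <- c("cg2175", "cg31196", "cg3169")

# Reshape to long format: one row per (first-column entry, identifier) pair,
# dropping the NA padding cells
long <- melt(fbgn_dmels, id.vars = "v1_01", value.name = "id", na.rm = TRUE)

# match() gives the position of the first exact hit, or NA when there is none
result <- setNames(long$v1_01[match(candidates, long$id)], candidates)
result
```

Candidates that do not occur anywhere come back as NA. Unlike getthem, this sketch does not search the first column itself, so it only finds identifiers stored in the remaining columns.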