performance - Hive: Is there a better way to percentile rank a column?


Currently, to percentile rank a column in Hive, I am using something like the following. I am trying to rank the items in a column by the percentile they fall under, assigning a value from 0 to 1 to each item. The code below assigns a value from 0 to 9, so an item with a char_percentile_rank of 0 is in the bottom 10% of items, and an item with a value of 9 is in the top 10%. Is there a better way of doing this?

select item
     , characteristic
     , case when characteristic <= char_perc[0] then 0
            when characteristic <= char_perc[1] then 1
            when characteristic <= char_perc[2] then 2
            when characteristic <= char_perc[3] then 3
            when characteristic <= char_perc[4] then 4
            when characteristic <= char_perc[5] then 5
            when characteristic <= char_perc[6] then 6
            when characteristic <= char_perc[7] then 7
            when characteristic <= char_perc[8] then 8
            else 9
       end as char_percentile_rank
from (
    -- unpack each 'item-characteristic' string back into two columns
    select split(item_id, '-')[0] as item
         , split(item_id, '-')[1] as characteristic
         , char_perc
    from (
        -- one row: all items packed into a set, plus the nine decile thresholds
        select collect_set(concat_ws('-', item, characteristic)) as item_set
             , percentile(bigint(characteristic), array(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)) as char_perc
        from (
            select item
                 , sum(characteristic) as characteristic
            from table  -- placeholder; substitute the real table name
            group by item
        ) t1
    ) t2
    lateral view explode(item_set) explodetable as item_id
) t3

Note: I had to use collect_set in order to avoid a self join, because the percentile function implicitly performs a group by.

I've gathered that the percentile function is horribly slow (at least in this usage). Perhaps it would be better to calculate the percentile manually?

Try removing one of the derived tables by computing the percentiles as a window function:

select item
     , characteristic
     , case when characteristic <= char_perc[0] then 0
            when characteristic <= char_perc[1] then 1
            when characteristic <= char_perc[2] then 2
            when characteristic <= char_perc[3] then 3
            when characteristic <= char_perc[4] then 4
            when characteristic <= char_perc[5] then 5
            when characteristic <= char_perc[6] then 6
            when characteristic <= char_perc[7] then 7
            when characteristic <= char_perc[8] then 8
            else 9
       end as char_percentile_rank
from (
    -- over () attaches the same threshold array to every row,
    -- so no collect_set / explode round trip is needed
    select item
         , characteristic
         , percentile(bigint(characteristic), array(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)) over () as char_perc
    from (
        select item
             , sum(characteristic) as characteristic
        from table  -- placeholder; substitute the real table name
        group by item
    ) t1
) t2
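
If exact percentile thresholds are not required, another option worth trying is Hive's ntile window function, which splits the ordered rows into a given number of roughly equal-sized groups. The query below is a minimal sketch under that assumption, not a drop-in replacement. It reuses the placeholder name `table` from the question and has not been benchmarked against the percentile version.

```sql
-- Sketch: decile bucket per item using ntile instead of percentile().
-- ntile(10) returns 1..10, so subtract 1 to get the 0..9 scale used above.
select item
     , characteristic
     , ntile(10) over (order by characteristic) - 1 as char_percentile_rank
from (
    select item
         , sum(characteristic) as characteristic
    from table  -- placeholder; substitute the real table name
    group by item
) t1
```

Note that ntile assigns buckets by row position, so items with equal characteristic values can end up in adjacent buckets, and items near a boundary may get a different rank than with the percentile thresholds. A window with an order by and no partition by is also typically handled by a single reducer, so it is worth measuring on real data before switching.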
