oracle11g - Why is Oracle using full table scan when it should use an index? -


i'm doing experimentation query plans in oracle, , have following table:

--create table use create table skewed_data(   emp_id int,   dept int,   col2 int,   constraint skewed_data_pk primary key (emp_id) ); --add index on dept create index skewed_data_index1 on skewed_data(dept); 

i insert 1 million rows of data 999,999 rows have dept id 1, , 1 row has dept id 99.

before calculating statistics on table, oracle autotrace shows when running following queries, using index scan both:

select avg(col2) skewed_data d dept = 1; select avg(col2) skewed_data d dept = 99; 

it's understanding more efficient in case use full table scan dept id 1, , index scan dept id 2.

i run following command generate statistics table:

execute dbms_stats.gather_table_stats ('harry','skewed_data');  

and querying dba_tab_statistics , user_tab_col_statistics confirms stats , histograms have been gathered.

running autotrace on following queries shows full table scan both!

select avg(col2) skewed_data d dept = 1; select avg(col2) skewed_data d dept = 99; 

my question is: why oracle using full table scan dept id 99 when there 1 row value?

update

i tried running query dept 99 hint force oracle use index, , whilst autotrace believes less efficient, time takes 0.001 seconds, compared 0.03 seconds when using full table scan, proving (i think?) theory oracle should using index in instance.

select /*+ index(d skewed_data_index1) */ avg(col2) skewed_data d dept = 99; 

ok, think might have solved it. when had 999,999 rows dept 1 , 1 row dept 99, inspected number of histogram buckets running following query:

select column_name, histogram, num_buckets, num_distinct user_tab_col_statistics  table_name = 'skewed_data'; 

this showed there 2 distinct values 1 bucket. if change stats gathering this:

execute dbms_stats.gather_table_stats('harry','skewed_data',estimate_percent=>100); 

it correctly comes 2 buckets, , autotrace shows 'correct' execution plans. so, guess it's because of extreme 'skewness' of data oracle cannot generate correct stats unless estimate_percent massive.

interestingly if have less skewed data (say 2-3% of records dept id of 99) oracle treat correctly when leave estimate_percent default.

so, moral of story seems be: if have ridiculously skewed data , oracle not using correct execution plan, try playing around estimate_percent parameter.


Comments

Popular posts from this blog

yii2 - Yii 2 Running a Cron in the basic template -

asp.net - 'System.Web.HttpContext' does not contain a definition for 'GetOwinContext' Mystery -

mercurial graft feature, can it copy? -