oracle11g - Why is Oracle using full table scan when it should use an index? -
i'm doing experimentation query plans in oracle, , have following table:
--create table use create table skewed_data( emp_id int, dept int, col2 int, constraint skewed_data_pk primary key (emp_id) ); --add index on dept create index skewed_data_index1 on skewed_data(dept);
i insert 1 million rows of data 999,999 rows have dept id 1, , 1 row has dept id 99.
before calculating statistics on table, oracle autotrace shows when running following queries, using index scan both:
select avg(col2) skewed_data d dept = 1; select avg(col2) skewed_data d dept = 99;
it's understanding more efficient in case use full table scan dept id 1, , index scan dept id 2.
i run following command generate statistics table:
execute dbms_stats.gather_table_stats ('harry','skewed_data');
and querying dba_tab_statistics
, user_tab_col_statistics
confirms stats , histograms have been gathered.
running autotrace on following queries shows full table scan both!
select avg(col2) skewed_data d dept = 1; select avg(col2) skewed_data d dept = 99;
my question is: why oracle using full table scan dept id 99 when there 1 row value?
update
i tried running query dept 99 hint force oracle use index, , whilst autotrace believes less efficient, time takes 0.001 seconds, compared 0.03 seconds when using full table scan, proving (i think?) theory oracle should using index in instance.
select /*+ index(d skewed_data_index1) */ avg(col2) skewed_data d dept = 99;
ok, think might have solved it. when had 999,999 rows dept 1 , 1 row dept 99, inspected number of histogram buckets running following query:
select column_name, histogram, num_buckets, num_distinct user_tab_col_statistics table_name = 'skewed_data';
this showed there 2 distinct values 1 bucket. if change stats gathering this:
execute dbms_stats.gather_table_stats('harry','skewed_data',estimate_percent=>100);
it correctly comes 2 buckets, , autotrace shows 'correct' execution plans. so, guess it's because of extreme 'skewness' of data oracle cannot generate correct stats unless estimate_percent massive.
interestingly if have less skewed data (say 2-3% of records dept id of 99) oracle treat correctly when leave estimate_percent default.
so, moral of story seems be: if have ridiculously skewed data , oracle not using correct execution plan, try playing around estimate_percent
parameter.
Comments
Post a Comment