Please tell me the difference between rule-based and cost-based optimization of SQL queries, in detail. Thank you.
A long time ago, the only optimizer in the Oracle database was the Rule-Based Optimizer (RBO). Basically, the RBO used a set of rules to determine how to execute a query. If an index was available on a table, the RBO rules said to always use the index. There are some cases where the use of an index slowed down a query. For example, assume someone put an index on the GENDER column, which holds one of two values, MALE and FEMALE. Then someone issues the following query:
SELECT * FROM emp WHERE gender='FEMALE';
If the above query returned approximately 50% of the rows, then using an index would actually slow things down. It would be faster to read the entire table and throw away all rows that have MALE values. Experts in Oracle query optimization have come to a rule of thumb that says if the number of rows returned is more than 5-10% of the total table volume, using an index would slow things down. The RBO would always use an index if present because its rules said to.
It became obvious that the RBO, armed with its set of discrete rules, did not always make great decisions. The biggest problem with the RBO was that it did not take the data distribution into account. So the Cost-Based Optimizer (CBO) was born. The CBO uses statistics about the table, its indexes and the data distribution to make better informed decisions. Using our previous example, assume that the company has employees that are 95% female and 5% male. If you query for females, then you do not want to use the index. If you query for males, then you would like to use the index. The CBO has information at hand to help make these kind of determinations that were not available in the old RBO.