On numerous occasions, I have been required to pull the top or the bottom 20% or 30% (or any other value) from...
the dataset. The Oracle 8i analytical function CUME_DIST (cumulative distribution) is a nice and easy way to do this.
A brief description
CUME_DIST is an analytic function. It computes the relative position of a specified value in a group of values. For a row R, assuming ascending ordering, the CUME_DIST of R is the number of rows with values lower than or equal to the value of R, divided by the number of rows being evaluated (the entire query result set or a partition). The range of values returned by CUME_DIST is >0 to <=1. Tie values always evaluate to the same cumulative distribution value.
How to use it
Suppose we have a table TEST with only two columns:
ORACLE_8.1.7>desc test; Name Type ------ ------- ID NUMBER VALUE NUMBER
The table has the following data:
ORACLE_8.1.7>select * from test; ID VALUE ------ ------ 1 180 2 100 3 1000 4 10 5 1
The problem is to find the top 20% of the data (i.e. one -- the top -- record out of the five records in the TEST table). Here is the solution:
ORACLE_8.1.7> select id,value from (select id, value,cume_dist() over (order by value desc) cd from test) where cd <= 0.20 ID VALUE ----- ------ 3 1000
Note that the cd value can be changed to 30% or 40% or any other value.
For More Information
- What do you think about this tip? E-mail us at [email protected] with your feedback.
- The Best Oracle Web Links: tips, tutorials, scripts, and more.
- Have an Oracle tip to offer your fellow DBA's? The best tips submitted will receive a cool prize--submit your tip today!
- Ask your technical Oracle questions--or help out your peers by answering them--in our live discussion forums.
- Check out our Ask the Experts feature: Our SQL, Database Design, Oracle, SQL Server, and DB2 gurus are waiting to answer your toughest questions.