Q
Problem solve Get help with specific problems with your technologies, process and projects.

Weighted percentiles, part 1

From this table:

SELECT    item_id
     ,    units
     ,    price
FROM      items
ORDER BY  price
;

  ITEM_ID      UNITS      PRICE
--------- ---------- ----------
        1          4       1.99
        3          6       2.29
        4         11       2.49
        2          1       2.99
...

...and so on, I can find the average price, min, max, etc. What I need to know is the fifth percentile and the 95th percentile of the price data, weighted by units. I have noticed that there are functions like percentile_disc. But my problem is that I have to weight the price by the units sold.

Do you have any suggestions in Oracle SQL?

Let's assume your data consists of just the four rows shown above. To restate the problem, you want to find the price such that 5% of the units sold had a lower (or equal) price, and the price such that 95% of the units sold had a lower (or equal) price. Since the total number of units is 22, you want the prices of the "1.1st" unit (.05 * 22 = 1.1) and the "20.9th" unit.

The analytic function percentile_disc was introduced in Oracle 9 to answer questions just like yours on unweighted data. Unfortunately, like you, I don't see how it can be used on weighted data. There is, however, another analytic function, sum, that can help by telling you how many units sold at a lower (or equal) price for each row of your table. With that number, and the grand total of the units sold, you can compute the percentiles.

SELECT    item_id            -- Query 1
     ,    units
     ,    price
     ,    SUM (units)
              OVER  (ORDER BY price
                     RANGE UNBOUNDED PRECEDING)
                  AS running_total
     ,    (
              SELECT  SUM (units)
              FROM    items
          )       AS grand_total
FROM      items
ORDER BY  price
;

   ITEM_ID      UNITS      PRICE RUNNING_TOTAL GRAND_TOTAL
---------- ---------- ---------- ------------- -----------
         1          4       1.99             4          22
         3          6       2.29            10          22
         4         11       2.49            21          22
         2          1       2.99            22          22

You can query the result set above to get the information you need, one item at a time, as shown below. Notice that the in-line view (the subquery) is (almost) exactly the same as Query 1 shown above.

SELECT  price   AS price_05          -- Query 2
FROM    (   --  begin Sub-query 2a, same as Query 1
            SELECT    item_id
                 ,    units
                 ,    price
                 ,    SUM (units)
                          OVER  (ORDER BY price
                                 RANGE UNBOUNDED PRECEDING)
                              AS running_total
                 ,    (
                          SELECT  SUM (units)
                          FROM    items
                      )       AS grand_total
            FROM      items
        )   --  end sub-query 2a
WHERE   (.05 * grand_total)  BETWEEN  (running_total - units)
                                 AND  units
;

  PRICE_05
----------
      1.99

Click to jump to part 2.

Dig Deeper on Using Oracle PL-SQL

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide.com

SearchDataCenter

SearchContentManagement

SearchHRSoftware

Close