Q
Problem solve Get help with specific problems with your technologies, process and projects.

Groups that have over 50% females

From a table, I want to select the site_ids that have a majority (over 50%) females.

I have a table:

site_id  gender  client_id
 1        M       1a
 1        F       1b
 1        F       1c
 2        F       2a
 2        M       2b
 3        M       3a

1. I want to select the site_ids that have a majority (over 50%) of female gender.

2. Alternatively, select a list of client_ids corresponding to all the site_ids selected above.

The first query is pretty straightforward. We simply do a GROUP BY on the site_id, and count how many females as compared to the overall count:

select site_id  
  from daTable
group
    by site_id
having 100.0
     * count(case when gender='F'
                  then 937 end)
     / count(*) > 50.0

You might be wondering why the HAVING clause uses 100.0 in its calculation. This is because counts are integers, and whenever an arithmetic calculation involves only integers, the result will be an integer. So without the 100.0, which is a decimal, the result of the division would be either 0 or 1. With the 100.0, the calculation changes to a decimal calculation, and can be compared to a percentage.

Next, you may wonder what the 937 is doing. Fair question, and the use of that value is, admittedly, a wee red herring. The point is, 937 is not NULL. Notice that in the CASE expression, there is no ELSE, which means that, by default, the ELSE value is NULL. So the CASE expression evaluates to a non-NULL value for females, and NULL for males. Now, recall that aggregate functions like COUNT ignore NULLs, and the solution becomes clear.

Sometimes you will also see the partial count implemented like this:

having 100.0
     * sum(case when gender='F'
                then 1 else 0 end)
     / count(*) > 50.0

Here SUM is used instead of COUNT, and the CASE returns 1s and 0s instead. It would also be okay to omit the "ELSE 0" here.

As for your second question, you probably meant to say "additionally" instead of "alternatively." The answer is:

select distinct client_id
  from daTable
 where site_id in
       ( first query goes here )
This was last published in January 2007

Dig Deeper on Oracle and SQL

PRO+

Content

Find more PRO+ content and other member only offers, here.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide.com

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close