I am having a difficult time with a "one-to-every" relationship. I am trying to find the employees who worked on EVERY database project. Employees exist on the EmpProj table with EmpNo (Employee number) and ProjNo (Project number). The Project table has ProjNo and project type. I need to find all projects that exist on the Project table that have a ProjType of 'database' and then find only the employees that have worked on EVERY one of them.
The first thing we need to do is find those employees who have worked on any database project:
select EmpProj.EmpNo, count(*) as DatabaseProjects
  from EmpProj
inner
  join Project
    on EmpProj.ProjNo = Project.ProjNo
   and Project.ProjType = 'database'
group
    by EmpProj.EmpNo
The count that is returned in the DatabaseProjects column will be one of two values:
the number of database projects, assuming the employee can only work on a single project once, i.e. the combination of values in EmpNo and ProjNo in the EmpProj table is unique
the number of assignments to database projects, if the employee can work on a single project more than once
In the first case, COUNT(*) will suffice, but in the second case, to satisfy the original question, we need to use COUNT(DISTINCT ProjNo).
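To see the difference between the two counts, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. SQLite is only a stand-in for whatever engine you use, the sample rows are invented for illustration, and the join assumes that EmpProj and Project are related on ProjNo, as the table descriptions in the question suggest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (ProjNo INTEGER PRIMARY KEY, ProjType TEXT);
    -- No unique constraint here, so repeat assignments are allowed (the second case).
    CREATE TABLE EmpProj (EmpNo INTEGER, ProjNo INTEGER);

    -- Invented sample data: two database projects and one web project.
    INSERT INTO Project VALUES (1, 'database'), (2, 'database'), (3, 'web');
    -- Employee 10 is assigned to project 1 twice; employee 20 works on 1 and 2.
    INSERT INTO EmpProj VALUES (10, 1), (10, 1), (10, 3), (20, 1), (20, 2);
""")

rows = conn.execute("""
    SELECT EmpProj.EmpNo,
           COUNT(*)                       AS Assignments,
           COUNT(DISTINCT EmpProj.ProjNo) AS DatabaseProjects
      FROM EmpProj
     INNER JOIN Project
        ON EmpProj.ProjNo = Project.ProjNo
       AND Project.ProjType = 'database'
     GROUP BY EmpProj.EmpNo
     ORDER BY EmpProj.EmpNo
""").fetchall()

print(rows)  # [(10, 2, 1), (20, 2, 2)]
```

Both employees have two assignments to database projects, but only employee 20 has worked on two distinct ones. With COUNT(*) alone, employee 10 would look as though they had covered both projects.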
Let's assume the second case, that an employee can work on the same project more than once. If that doesn't sound reasonable, just change "employee" to "consultant" and "project" to "client" and suddenly the need to count distinct clients is more obvious.
select EmpProj.EmpNo
  from EmpProj
inner
  join Project
    on EmpProj.ProjNo = Project.ProjNo
   and Project.ProjType = 'database'
group
    by EmpProj.EmpNo
having count(distinct EmpProj.ProjNo)
     = ( select count(*)
           from Project
          where ProjType = 'database' )
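The query can be tried out with a small sketch like the one below, again using Python's sqlite3 module as a stand-in engine. The helper name employees_on_every_project and the sample rows are hypothetical, and the join assumes that the two tables are related on ProjNo.

```python
import sqlite3

def employees_on_every_project(conn, proj_type):
    """Return the EmpNo values that appear on every project of the given type."""
    sql = """
        SELECT EmpProj.EmpNo
          FROM EmpProj
         INNER JOIN Project
            ON EmpProj.ProjNo = Project.ProjNo
           AND Project.ProjType = ?
         GROUP BY EmpProj.EmpNo
        HAVING COUNT(DISTINCT EmpProj.ProjNo) =
               (SELECT COUNT(*) FROM Project WHERE ProjType = ?)
         ORDER BY EmpProj.EmpNo
    """
    return [row[0] for row in conn.execute(sql, (proj_type, proj_type))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (ProjNo INTEGER PRIMARY KEY, ProjType TEXT);
    CREATE TABLE EmpProj (EmpNo INTEGER, ProjNo INTEGER);

    -- Invented sample data.
    INSERT INTO Project VALUES (1, 'database'), (2, 'database'), (3, 'web');
    INSERT INTO EmpProj VALUES
        (10, 1), (10, 1), (10, 3),   -- project 1 twice, never project 2
        (20, 1), (20, 2), (20, 2),   -- both database projects
        (30, 3),                     -- web only
        (40, 1), (40, 2), (40, 3);   -- everything
""")

database_people = employees_on_every_project(conn, "database")
web_people = employees_on_every_project(conn, "web")
nobody = employees_on_every_project(conn, "mobile")

print(database_people)  # [20, 40]
print(web_people)       # [10, 30, 40]
print(nobody)           # []
```

Note that when no project of the requested type exists, the join produces no rows and so no employee is returned, even though the subquery count is zero.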
In the subquery, we can use COUNT(*) because the Project table has one row per project. Presumably, ProjNo is the primary key and therefore unique and not null. Thus COUNT(DISTINCT ProjNo) would return the same number, but it is unnecessary and may be less efficient, because the engine is asked to eliminate duplicates that cannot exist. COUNT(ProjNo) counts only the non-null values of ProjNo; it too is unnecessary here and, depending on the engine, may be no faster. Use COUNT(*) whenever the situation allows you to.
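The three forms of COUNT only agree when the column is unique and not null. The sketch below illustrates this with Python's sqlite3 module; the Scratch table is a hypothetical table, invented here purely to hold duplicates and a null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ProjNo is the primary key here: unique and, in this data, never null.
    CREATE TABLE Project (ProjNo INTEGER PRIMARY KEY, ProjType TEXT);
    INSERT INTO Project VALUES (1, 'database'), (2, 'database'), (3, 'web');

    -- Hypothetical table with a duplicate value and a null, for contrast.
    CREATE TABLE Scratch (ProjNo INTEGER);
    INSERT INTO Scratch VALUES (1), (1), (2), (NULL);
""")

count_sql = "SELECT COUNT(*), COUNT(ProjNo), COUNT(DISTINCT ProjNo) FROM {}"

project_counts = conn.execute(count_sql.format("Project")).fetchone()
scratch_counts = conn.execute(count_sql.format("Scratch")).fetchone()

print(project_counts)  # (3, 3, 3)
print(scratch_counts)  # (4, 3, 2)
```

On Project all three counts are the same, so COUNT(*) is the simplest choice. On Scratch they diverge: COUNT(*) counts rows, COUNT(ProjNo) skips the null, and COUNT(DISTINCT ProjNo) also collapses the duplicate.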
The subquery is not correlated, which simply means that it does not reference any column of the main query and is therefore independent of it. Its result, which is a single number, can be calculated once, ahead of time, by the database engine, and this number can then be compared to the group count for each employee in the HAVING clause. Queries that use uncorrelated subqueries like this are generally efficient, because the subquery does not have to be re-evaluated for each group.
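One way to convince yourself that the subquery is independent is to run it by hand first and feed its result into the main query. The sketch below does this with Python's sqlite3 module and invented sample rows. It only illustrates the idea; how a given engine actually evaluates the subquery depends on its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (ProjNo INTEGER PRIMARY KEY, ProjType TEXT);
    CREATE TABLE EmpProj (EmpNo INTEGER, ProjNo INTEGER);

    -- Invented sample data.
    INSERT INTO Project VALUES (1, 'database'), (2, 'database'), (3, 'web');
    INSERT INTO EmpProj VALUES (10, 1), (10, 1), (10, 3), (20, 1), (20, 2);
""")

main_query = """
    SELECT EmpProj.EmpNo
      FROM EmpProj
     INNER JOIN Project
        ON EmpProj.ProjNo = Project.ProjNo
       AND Project.ProjType = 'database'
     GROUP BY EmpProj.EmpNo
    HAVING COUNT(DISTINCT EmpProj.ProjNo) = {target}
     ORDER BY EmpProj.EmpNo
"""
subquery = "SELECT COUNT(*) FROM Project WHERE ProjType = 'database'"

# Step 1: evaluate the uncorrelated subquery once, on its own.
(target,) = conn.execute(subquery).fetchone()

# Step 2: bind the precomputed number into the main query.
two_step = conn.execute(main_query.format(target="?"), (target,)).fetchall()

# For comparison: the single statement with the subquery left in place.
one_step = conn.execute(main_query.format(target="(" + subquery + ")")).fetchall()

print(target)    # 2
print(two_step)  # [(20,)]
print(one_step)  # [(20,)]
```

A correlated subquery could not be split out this way, because its result would depend on the current row or group of the outer query.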