What’s the difference between SEMI-JOIN and ANTI-JOIN?

Dear Patrick,

What is an ANTI-JOIN? And what is the difference between the SEMI-JOIN and the ANTI-JOIN?

Lillian Sturdey

Dear Lillian,

First of all, both SEMI-JOIN and ANTI-JOIN are not in the SQL syntax but they are more a pattern. You might expect to be able to write something like:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               SEMI JOIN emp e ON (e.deptno = d.deptno)
             /

to get all the departments that have at least one employee.
Or:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               ANTI JOIN emp e ON (e.deptno = d.deptno)
             /

to get the departments with no employees. But all you get is an error saying your command is not properly ended, which can be read as a syntax error.

ERROR at line 3:
ORA-00933: ORA-00933 SQL command not properly ended.

Maybe your first idea would be to use a normal join to get all the departments with at least one employee:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               JOIN emp e ON (e.deptno = d.deptno)
             /

But this results in a record for every row in the EMP table. And we only wanted every unique department.

    DEPTNO DNAME          LOC
---------- -------------- -------------
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        10 ACCOUNTING     NEW YORK
        30 SALES          CHICAGO
        20 RESEARCH       DALLAS

    DEPTNO DNAME          LOC
---------- -------------- -------------
        30 SALES          CHICAGO
        20 RESEARCH       DALLAS
        10 ACCOUNTING     NEW YORK

14 rows selected.

Well, that’s easy enough, you think, just add a DISTINCT to the statement:

[PATRICK]SQL>SELECT DISTINCT d.deptno, d.dname, d.loc
               FROM dept d
               JOIN emp e ON (e.deptno = d.deptno) 
             /

Exactly the result we are looking for:

    DEPTNO DNAME          LOC
---------- -------------- -------------
        20 RESEARCH       DALLAS
        10 ACCOUNTING     NEW YORK
        30 SALES          CHICAGO

But what if the EMP table contains hundreds, thousands or maybe millions of rows. That would mean the database has to do a lot of work to filter out the distinct values.
A different, and probably better, approach would be to use the SEMI-JOIN pattern. You can use the IN operator like this:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
              FROM dept d
              WHERE d.deptno IN (SELECT e.deptno
                                   FROM emp e)
             /
    DEPTNO DNAME          LOC
---------- -------------- -------------
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        10 ACCOUNTING     NEW YORK

This is exactly what we want to see but for big tables this is not the correct way to go. For every record in the dept table all the records in the EMP table are checked. Again, if we have a lot of employees, this means a lot of work for the database.
A better SEMI-JOIN to use is the EXISTS operator:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
              WHERE EXISTS (SELECT 1
                              FROM emp e
                             WHERE e.deptno = d.deptno)
             /

Please note that with the current optimizer in the database Oracle will rewrite your query to use the best approach for the task. If the inner table (in our example EMP) is rather small, then the IN approach might be the best, in other cases it might be better to use the EXISTS approach. Where in earlier versions you had to think about which way to go (IN is better for small tables, EXISTS is better for big ones), you can now rely on the optimizer to make the correct choice.
If you would want to see exactly the opposite of this query, i.e. all departments with no employees, you use an ANTI-JOIN pattern, which is pretty much the same but in this case you use NOT IN or NOT EXISTS. A different approach, which I think is pretty nice is to use an OUTER JOIN and check for the non-existence of values in column for the OUTER JOINED table.

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               LEFT OUTER JOIN emp e ON (e.deptno = d.deptno)
              WHERE e.empno IS NULL
             /
    DEPTNO DNAME          LOC
---------- -------------- -------------
        40 OPERATIONS     BOSTON

Hope this gives you a bit more insight in this subject and gives you a better understanding of the wonders of the SQL language. Notice there are many ways to reach the same result, but one approach might be more economical than the other.

Happy Oracle’ing,
Patrick Barel

Hope this answers your question.
Happy Oracle’ing,
Patrick Barel

If you have any comments on this subject or you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.

This question has been published in OTech Magazine of Winter 2014.