Do you know the difference between the UNION and the UNION ALL part of a statement? In short, UNION does a sort-unique on both collections. UNION ALL just retrieves all rows, regardless of any double values.
I came across a (similar) piece of code during my work at BT:
select distinct * from ...
union
select distinct * from ...;
It looks like there is a distinct done twice here.
To play around with this a bit, I created a simple table:
create table union_test ( value number );
Then I inserted a couple of record (with duplicate values):
insert into union_test(value) values (1);
insert into union_test(value) values (1);
insert into union_test(value) values (2);
insert into union_test(value) values (2);
insert into union_test(value) values (3);
insert into union_test(value) values (3);
insert into union_test(value) values (4);
insert into union_test(value) values (4);
insert into union_test(value) values (5);
insert into union_test(value) values (5);
Now I can run some queries to see the differences:
select * from union_test where value <= 3;
The result for this query is as follows:
VALUE
----------
1
1
2
2
3
3
6 rows selected
Note, the duplicate values. To get only unique values I need to add the distinct operator.
select distinct * from union_test where value <= 3;
The result is:
VALUE
----------
1
2
3
3 rows selected
Now I want to get the other part of the values (with some duplicate values with regards to the previous query)
select * from union_test where value >= 3;
VALUE
----------
3
3
4
4
5
5
6 rows selected
I can remove the duplicate values using the distinct again:
select distinct * from union_test where value >= 3;
VALUE
----------
4
5
3
3 rows selected
If I want to join both result sets then I should use the union operator:
select distinct * from union_test where value <= 3
union
select distinct * from union_test where value >= 3;
VALUE
----------
1
2
3
4
5
5 rows selected
Note that I am not getting duplicate values. Is this because of the distinct clause in both select statements? This clause is applied to each result set individually so the union operator must be doing a distinct as well. Let’s try removing the distinct clause from the select statements:
select * from union_test where value <= 3
union
select * from union_test where value >= 3;
VALUE
----------
1
2
3
4
5
5 rows selected
We get the same results as before. This should mean that the union operator performs a distinct on the entire result set. There is a way to circumvent this behavior. This is done by adding the all clause to the union operator. This tells the SQL-engine to just add both result sets regardless of any duplicates.
select distinct * from union_test where value <= 3
union all
select distinct * from union_test where value >= 3;
VALUE
----------
1
2
3
4
5
3
6 rows selected
This way both sets of result are returned. Note that there is no sorting done either. The union operator apparently also performs sorting on the result set. The distinct clause makes sure the different sets of data give me the unique values. If I omit this clause then the result is like this:
select * from union_test where value <= 3
union all
select * from union_test where value >= 3;
VALUE
----------
1
1
2
2
3
3
3
3
4
4
5
5
12 rows selected
Note that performing the distinct and sort operation on the result costs time, so if you are absolutely sure the result set for both parts of the union are mutually exclusive (that is, there will never be any duplicates in the sets) then use the union all clause to have the query perform (a lot) faster.
Another issue is that both select statements should retrieve the same number of columns. Also the datatypes of all the retrieved columns must be the same.
A few comments:
>Note that there is no sorting done either. The union operator apparently also performs sorting on the result set.
NO, NO. Never ever rely on any specific order without order by clause! This time it is just because Oracle is using sort unique opeartion to sort out distinct values, but as soon as you are using parralel operations, partitions and/or use hash unique operation to sort out distinct values it won’t be true anymore.
>Also the datatypes of all the retrieved columns must be the same.
Let’s say they have to be such that Oracle at least can do implicit conversion
[shameless plug] 🙂
And see more about sort operators in my article http://www.gplivna.eu/papers/sql_set_operators.htm
[/shameless plug]