DISTINCT operator in a
SELECT statement filters the result set to
-- Returns the unique values from one column. -- NULL is included in the set of values if any rows have a NULL in this column. select distinct c_birth_country from customer; -- Returns the unique combinations of values from multiple columns. select distinct c_salutation, c_last_name from customer;
You can use
DISTINCT in combination with an aggregation function, typically
COUNT(), to find how many different values a column contains:
-- Counts the unique values from one column. -- NULL is not included as a distinct value in the count. select count(distinct c_birth_country) from customer; -- Counts the unique combinations of values from multiple columns. select count(distinct c_salutation, c_last_name) from customer;
One construct that Impala SQL does not support is using
DISTINCT in more than one
aggregation function in the same query. For example, you could not have a single query with both
COUNT(DISTINCT c_first_name) and
COUNT(DISTINCT c_last_name) in the
Zero-length strings: For purposes of clauses such as
BY, Impala considers zero-length strings (
NULL, and space
to all be different values.
By default, Impala only allows a single
expression in each query.
If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by
NDV(column); a query can contain multiple instances of
NDV(column). To make Impala automatically rewrite
COUNT(DISTINCT) expressions to
NDV(), enable the
APPX_COUNT_DISTINCT query option.
To produce the same result as multiple
COUNT(DISTINCT) expressions, you can use the
following technique for queries involving a single table:
select v1.c1 result1, v2.c1 result2 from (select count(distinct col1) as c1 from t1) v1 cross join (select count(distinct col2) as c1 from t1) v2;
CROSS JOIN is an expensive operation, prefer to use the
technique wherever practical.
In contrast with some database systems that always return
DISTINCT values in sorted order,
Impala does not do any ordering of
DISTINCT values. Always include an
BY clause if you need the values in alphabetical or numeric sorted order.