DELETE Statement (Impala 2.8 or higher only)

Deletes an arbitrary number of rows from a table. This statement only works for Kudu (from Impala 2.8) and Iceberg tables (from Impala 4.3).

Syntax:


DELETE [FROM] [database_name.]table_name [ WHERE where_conditions ]

DELETE table_ref FROM [joined_table_refs] [ WHERE where_conditions ]

The first form evaluates rows from one table against an optional WHERE clause, and deletes all the rows that match the WHERE conditions, or all rows if WHERE is omitted.

The second form evaluates one or more join clauses, and deletes all matching rows from one of the tables. The join clauses can include tables of any kind, but the table from which the rows are deleted must be a Kudu or Iceberg table. The FROM keyword is required in this case, to separate the name of the table whose rows are being deleted from the table names of the join clauses.

Usage notes:

The conditions in the WHERE clause are the same ones allowed for the SELECT statement. See SELECT Statement for details.

The conditions in the WHERE clause can refer to any combination of primary key columns or other columns. Referring to primary key columns in the WHERE clause is more efficient than referring to non-primary key columns (in case of Kudu tables).

If the WHERE clause is omitted, all rows are removed from the table.

Because Kudu currently does not enforce strong consistency during concurrent DML operations, be aware that the results after this statement finishes might be different than you intuitively expect:

Iceberg also allows concurrent DELETE operations, in which case the concurrent DELETEs might remove the same rows. This won't corrupt the table as Iceberg allows such behavior.

The number of affected rows is reported in an impala-shell message and in the query profile.

Statement type: DML

Important: After adding or replacing data in a table used in performance-critical queries, issue a COMPUTE STATS statement to make sure all statistics are up-to-date. Consider updating statistics for a table after any INSERT, LOAD DATA, or CREATE TABLE AS SELECT statement in Impala, or after loading data through Hive and doing a REFRESH table_name in Impala. This technique is especially important for tables that are very large, used in join queries, or both.

Examples:

The following examples show how to delete rows from a specified table, either all rows or rows that match a WHERE clause:


-- Deletes all rows. The FROM keyword is optional.
DELETE FROM table;
DELETE table;

-- Deletes 0, 1, or more rows.
-- (If c1 is a single-column primary key, the statement could only
-- delete 0 or 1 rows.)
DELETE FROM table WHERE c1 = 100;

-- Deletes all rows that match all the WHERE conditions.
DELETE FROM table WHERE
  (c1 > c2 OR c3 IN ('hello','world')) AND c4 IS NOT NULL;
DELETE FROM t1 WHERE
  (c1 IN (1,2,3) AND c2 > c3) OR c4 IS NOT NULL;
DELETE FROM time_series WHERE
  year = 2016 AND month IN (11,12) AND day > 15;

-- WHERE condition with a subquery.
DELETE FROM t1 WHERE
  c5 IN (SELECT DISTINCT other_col FROM other_table);

-- Does not delete any rows, because the WHERE condition is always false.
DELETE FROM table WHERE 1 = 0;

The following examples show how to delete rows that are part of the result set from a join:


-- Remove _all_ rows from t1 that have a matching X value in t2.
DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x;

-- Remove _some_ rows from t1 that have a matching X value in t2.
DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x
  WHERE t1.y = FALSE and t2.z > 100;

-- Delete from a table based on a join with another table.
DELETE t1 FROM table t1 JOIN other_table t2 ON t1.x = t2.x;

-- The tables can be joined in any order as long as the Kudu or Iceberg table
-- is specified as the deletion target.
DELETE t2 FROM non_kudu_non_ice_table t1 JOIN kudu_or_ice_table t2 ON t1.x = t2.x;

Related information:

Using Impala with Iceberg Tables, Using Impala to Query Kudu Tables, INSERT Statement, UPDATE Statement (Impala 2.8 or higher only), UPSERT Statement (Impala 2.8 or higher only)