Identifiers are the names of databases, tables, or columns that you specify in a SQL statement. The rules for identifiers govern what names you can give to things you create, the notation for referring to names containing unusual characters, and other aspects such as case sensitivity.
The minimum length of an identifier is 1 character.
The maximum length of an identifier is currently 128 characters except for column names which can contain 767 characters, enforced by the metastore database.
An identifier must start with an alphanumeric or underscore character except for column names which can start with any unicode characters. Quoting the identifier with backticks has no effect on the allowed characters in the name.
An identifier can contain only ASCII characters except for column names which can contain unicode characters.
To use an identifier name that matches one of the Impala reserved keywords (listed in
Impala Reserved Words), surround the identifier with ``
characters (backticks). Quote the reserved word even if it is part of a fully qualified name.
The following example shows how a reserved word can be used as a column name if it is quoted
with backticks in the CREATE TABLE
statement, and how the column name
must also be quoted with backticks in a query:
[localhost:21000] > create table reserved (`data` string);
[localhost:21000] > select data from reserved;
ERROR: AnalysisException: Syntax error in line 1:
select data from reserved
^
Encountered: DATA
Expected: ALL, CASE, CAST, DISTINCT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, STRAIGHT_JOIN, TRUE, IDENTIFIER
CAUSED BY: Exception: Syntax error
[localhost:21000] > select reserved.data from reserved;
ERROR: AnalysisException: Syntax error in line 1:
select reserved.data from reserved
^
Encountered: DATA
Expected: IDENTIFIER
CAUSED BY: Exception: Syntax error
[localhost:21000] > select reserved.`data` from reserved;
[localhost:21000] >
Impala identifiers are always case-insensitive. That is, tables named t1
and
T1
always refer to the same table, regardless of quote characters. Internally, Impala
always folds all specified table and column names to lowercase. This is why the column headers in query
output are always displayed in lowercase.
See Overview of Impala Aliases for how to define shorter or easier-to-remember aliases if the original names are long or cryptic identifiers. Aliases follow the same rules as identifiers when it comes to case insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can include additional characters such as spaces and dashes when they are quoted using backtick characters.
Another way to define different names for the same tables or columns is to create views. See Overview of Impala Views for details.