PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)
Causes Impala INSERT and CREATE TABLE AS SELECT statements to write Parquet files that use the UTF-8 annotation for STRING columns.
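As a minimal sketch of how the option might be used in an impala-shell session: the table names t_src and t_utf8 are hypothetical and are not part of the original documentation.

```sql
-- Enable the UTF-8 annotation for STRING columns in this session.
SET PARQUET_ANNOTATE_STRINGS_UTF8=true;

-- t_src and t_utf8 are hypothetical table names used for illustration.
-- STRING columns written by this statement carry the UTF-8 annotation.
CREATE TABLE t_utf8 STORED AS PARQUET AS
  SELECT * FROM t_src;

-- Later INSERT statements in the same session are affected the same way.
INSERT INTO t_utf8 SELECT * FROM t_src;
```

The setting applies only to data files written while it is in effect. Parquet files written earlier keep whatever annotation they were written with.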
Usage notes:
By default, Impala represents a STRING column in Parquet as an unannotated binary field.
Impala always uses the UTF-8 annotation when writing CHAR and VARCHAR columns to Parquet files. An alternative to using the query option is to cast STRING values to VARCHAR.
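The cast-based alternative could look like the following sketch. The table names t_src and t_varchar, the column names id and name, and the length 255 are assumptions chosen for illustration.

```sql
-- Hypothetical tables and columns. The query option is left at its default.
-- Casting the STRING column to VARCHAR makes the resulting column a VARCHAR,
-- which Impala writes to Parquet with the UTF-8 annotation.
CREATE TABLE t_varchar STORED AS PARQUET AS
  SELECT id, CAST(name AS VARCHAR(255)) AS name
  FROM t_src;
```

Unlike the query option, this approach changes the column type in the new table, and VARCHAR requires a maximum length, so pick one large enough for the longest value you expect to store.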
This option helps make Impala-written data more interoperable with other data processing engines. Impala itself currently does not support all operations on UTF-8 data. Although data processed by Impala is typically represented in ASCII, it is valid to designate the data as UTF-8 when storing it on disk, because ASCII is a subset of UTF-8.
Type: Boolean; recognized values are 1 and 0, or true and false; any other value is interpreted as false
Default: false (shown as 0 in the output of the SET statement)
Added in: Impala 2.6.0
Related information: