PARQUET_FILE_SIZE Query Option
Specifies the maximum size of each Parquet data file produced by Impala INSERT
statements.
Syntax:
Specify the size in bytes, or with a trailing m or g character to indicate
megabytes or gigabytes. For example:
-- 128 megabytes.
set PARQUET_FILE_SIZE=134217728;
INSERT OVERWRITE parquet_table SELECT * FROM text_table;
-- 512 megabytes.
set PARQUET_FILE_SIZE=512m;
INSERT OVERWRITE parquet_table SELECT * FROM text_table;
-- 1 gigabyte.
set PARQUET_FILE_SIZE=1g;
INSERT OVERWRITE parquet_table SELECT * FROM text_table;
Usage notes:
With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB
in Impala 2.0 and later) could be much larger than needed for each data file. For INSERT
operations into such tables, you can increase parallelism by specifying a smaller PARQUET_FILE_SIZE
value, resulting in more HDFS blocks that can be processed by different nodes.
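As a rough sketch of the case described above, the following statements lower the file size before loading a finely partitioned table. The table names (sales_parquet, sales_staging), the column names, and the 64 MB value are hypothetical and chosen only for illustration; a suitable size depends on the data volume in each partition.

```sql
-- Hypothetical tables and columns, for illustration only.
-- 64 megabytes per data file; an assumed value, not a recommendation.
set PARQUET_FILE_SIZE=64m;
INSERT OVERWRITE sales_parquet PARTITION (sale_year, sale_month)
  SELECT id, amount, sale_year, sale_month FROM sales_staging;
-- Return to the default setting (0, which targets 256 MB files).
set PARQUET_FILE_SIZE=0;
```

Because a query option stays in effect for the rest of the session, setting it back to 0 afterward keeps later INSERT statements from also producing smaller files.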
Type: numeric, with optional unit specifier
Currently, the maximum value for this setting is 1 gigabyte (1g).
Setting a value higher than 1 gigabyte could result in errors during
an INSERT operation.
Default: 0 (produces files with a target size of 256 MB; files might be larger for very wide tables)
Because ADLS does not expose the block sizes of data files the way HDFS does,
any Impala INSERT or CREATE TABLE AS SELECT statements
use the PARQUET_FILE_SIZE query option setting to define the size of
Parquet data files. (Using a large block size is more important for Parquet tables than
for tables that use other file formats.)
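A minimal sketch of the CREATE TABLE AS SELECT case follows. It assumes that the hypothetical table events_parquet is created in a database whose storage location is on ADLS, and that events_text is an existing source table; neither name comes from the Impala documentation.

```sql
-- Hypothetical table names; events_parquet is assumed to reside on ADLS.
-- 256 megabytes per data file.
set PARQUET_FILE_SIZE=256m;
CREATE TABLE events_parquet STORED AS PARQUET
  AS SELECT * FROM events_text;
```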
Isilon considerations:
The PARQUET_FILE_SIZE query option has no effect when Impala inserts data into a table or partition
residing on Isilon storage. Use the isi command to set the
default block size globally on the Isilon device. For example, to set the
Isilon default block size to 256 MB, the recommended size for Parquet
data files for Impala, issue the following command:
isi hdfs settings modify --default-block-size=256MB
Related information:
For information about the Parquet file format, and how the number and size of data files affect query performance, see Using the Parquet File Format with Impala Tables.