S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)
Speeds up INSERT operations on tables or partitions residing on the Amazon S3 filesystem. The tradeoff is the possibility of inconsistent data left behind if an error occurs partway through the operation.
Without this option, Impala write operations to S3 tables and partitions involve a two-stage process. Impala writes intermediate files to S3, then (because S3 does not provide a "rename" operation) copies those intermediate files to their final location, making the process more expensive than on a filesystem that supports renaming or moving files. This query option makes Impala skip the intermediate files and instead write the new data directly to the final destination.
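As a minimal sketch of the direct-write behavior described above, the following impala-shell session enables the option before an INSERT. The table names s3_events (assumed to reside on S3) and hdfs_events (assumed to be an existing source table) are hypothetical and used only for illustration.

```sql
-- Hypothetical tables: s3_events is stored on S3, hdfs_events is the source.
SET S3_SKIP_INSERT_STAGING=1;

-- With the option enabled, the new data files are written directly to the
-- final S3 location of s3_events, skipping the intermediate files.
INSERT INTO s3_events SELECT * FROM hdfs_events;
```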
If a host that is participating in the INSERT operation fails partway through the query, you might be left with a table or partition that contains some but not all of the expected data files. Therefore, this option is most appropriate for a development or test environment where you have the ability to reconstruct the table if a problem during INSERT leaves the data in an inconsistent state.
The timing of file deletion during an INSERT OVERWRITE operation makes it impractical to write new files to S3 and delete the old files in a single operation. Therefore, this query option only affects regular INSERT statements that add to the existing data in a table, not INSERT OVERWRITE statements. Use TRUNCATE TABLE if you need to remove all contents from an S3 table before performing a fast INSERT with this option enabled.
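The TRUNCATE TABLE approach mentioned above might look like the following sketch, which replaces the full contents of an S3 table without relying on INSERT OVERWRITE. The table names s3_events and hdfs_events are hypothetical.

```sql
-- Hypothetical tables: s3_events is stored on S3, hdfs_events is the source.
SET S3_SKIP_INSERT_STAGING=1;

-- Remove the existing contents first, because INSERT OVERWRITE
-- is not affected by this query option.
TRUNCATE TABLE s3_events;

-- A regular INSERT then writes directly to the final S3 location.
INSERT INTO s3_events SELECT * FROM hdfs_events;
```

Note that the table is empty between the two statements, and a failure during the INSERT can leave it only partially populated, so this pattern assumes the data can be reloaded from the source.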
Performance improvements with this option enabled can be substantial. The speed increase might be more noticeable for non-partitioned tables than for partitioned tables.
Type: Boolean; recognized values are 1 and 0, or true and false; any other value interpreted as false
Default: true (shown as 1 in output of SET statement)
Added in: Impala 2.6.0
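For illustration, the option can be turned off and checked within an impala-shell session as sketched below. This assumes the usual impala-shell behavior in which SET with no arguments lists the query options and their current values.

```sql
-- Turn the option off for the current session, for example where
-- consistency after a failed INSERT matters more than speed.
SET S3_SKIP_INSERT_STAGING=0;

-- List the query options and their current values to confirm the setting.
SET;
```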