PARQUET_READ_PAGE_INDEX Query Option

Use the PARQUET_READ_PAGE_INDEX query option to enable or disable use of the Parquet page index during scans. The page index contains min/max statistics at page-level granularity. Impala can use these statistics to skip pages and rows that cannot match the conditions in the WHERE clause.

This option enables the same optimization as the PARQUET_READ_STATISTICS query option, but at the finer-grained page level.
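
The idea of skipping pages by their min/max statistics can be sketched as follows. This is a minimal illustration and not Impala's implementation: the PageStats class and the page_may_match and pages_to_read functions are hypothetical names, and the page values are made up for the example. A page is skipped only when its statistics prove that no row in it can satisfy the predicate.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical stand-in for one page-index entry of a single column."""
    first_row: int
    num_rows: int
    min_value: int
    max_value: int


def page_may_match(stats, op, constant):
    """Return False only if min/max prove that no row satisfies `slot <op> constant`."""
    if op == "EQ":
        return stats.min_value <= constant <= stats.max_value
    if op == "LT":
        return stats.min_value < constant
    if op == "LE":
        return stats.min_value <= constant
    if op == "GT":
        return stats.max_value > constant
    if op == "GE":
        return stats.max_value >= constant
    # Unsupported operator: nothing can be proven, so the page must be read.
    return True


def pages_to_read(pages, op, constant):
    """Keep only the pages that might contain matching rows."""
    return [p for p in pages if page_may_match(p, op, constant)]


# Illustrative data: three pages of 100 rows each.
pages = [
    PageStats(first_row=0, num_rows=100, min_value=1, max_value=50),
    PageStats(first_row=100, num_rows=100, min_value=51, max_value=120),
    PageStats(first_row=200, num_rows=100, min_value=121, max_value=300),
]

# WHERE col > 100: the first page (max 50) cannot contain a match.
selected = pages_to_read(pages, "GT", 100)
print([p.first_row for p in selected])  # prints [100, 200]
```

Rows in the pages that remain still have to be checked against the predicate, because min/max statistics can only rule a page out, never confirm that every row in it matches.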

Impala supports filtering based on Parquet statistics:

  • Of the types: Boolean, Integer, Decimal, String, Timestamp
  • For simple predicates of the forms: <slot> <op> <constant> or <constant> <op> <slot>, where <op> is one of LT, LE, GE, GT, or EQ

The supported values for the query option are:
  • true (1): Read the page-level statistics from the Parquet page index during query processing and filter out pages based on the statistics.
  • false (0): Do not use the Parquet page index.
  • Any other value is treated as false.

Type: Boolean

Default: TRUE