Apache Impala Guide
Introducing Apache Impala
Concepts and Architecture
Components
Developing Applications
Role in the Hadoop Ecosystem
Deployment Planning
Requirements
Designing Schemas
Installing Impala
Managing Impala
Post-Installation Configuration for Impala
Upgrading Impala
Starting Impala
Modifying Impala Startup Options
Tutorials
Administration
Setting Timeouts
Load-Balancing Proxy for HA
Managing Disk Space
Impala Security
Security Guidelines for Impala
Securing Impala Data and Log Files
Installation Considerations for Impala Security
Securing the Hive Metastore Database
Securing the Impala Web User Interface
Configuring TLS/SSL for Impala
Impala Authorization
Impala Authentication
Enabling Kerberos Authentication for Impala
Enabling LDAP Authentication for Impala
Using Multiple Authentication Methods with Impala
Configuring Impala Delegation for Clients
Auditing
Viewing Lineage Info
SQL Reference
Comments
Data Types
ARRAY Complex Type (Impala 2.3 or higher only)
BIGINT
BOOLEAN
CHAR
DATE
DECIMAL
DOUBLE
FLOAT
INT
MAP Complex Type (Impala 2.3 or higher only)
REAL
SMALLINT
STRING
STRUCT Complex Type (Impala 2.3 or higher only)
TIMESTAMP
Customizing Time Zones
TINYINT
VARCHAR
Complex Types (Impala 2.3 or higher only)
Literals
SQL Operators
Schema Objects and Object Names
Aliases
Databases
Functions
Identifiers
Tables
Views
Transactions
SQL Statements
DDL Statements
DML Statements
ALTER DATABASE
ALTER TABLE
ALTER VIEW
COMMENT
COMPUTE STATS
CREATE DATABASE
CREATE FUNCTION
CREATE ROLE
CREATE TABLE
CREATE VIEW
DELETE
DESCRIBE
DROP DATABASE
DROP FUNCTION
DROP ROLE
DROP STATS
DROP TABLE
DROP VIEW
EXPLAIN
GRANT
INSERT
INVALIDATE METADATA
LOAD DATA
REFRESH
REFRESH AUTHORIZATION
REFRESH FUNCTIONS
REVOKE
SELECT
Joins
ORDER BY Clause
GROUP BY Clause
HAVING Clause
LIMIT Clause
OFFSET Clause
UNION Clause
Subqueries
TABLESAMPLE Clause
WITH Clause
DISTINCT Operator
SET
Query Options for the SET Statement
ABORT_ON_ERROR
ALLOW_ERASURE_CODED_FILES
ALLOW_UNSUPPORTED_FORMATS
APPX_COUNT_DISTINCT
BATCH_SIZE
BROADCAST_BYTES_LIMIT
BUFFER_POOL_LIMIT
COMPRESSION_CODEC
COMPUTE_STATS_MIN_SAMPLE_SIZE
DEBUG_ACTION
DECIMAL_V2
DEFAULT_FILE_FORMAT
DEFAULT_HINTS_INSERT_STATEMENT
DEFAULT_JOIN_DISTRIBUTION_MODE
DEFAULT_SPILLABLE_BUFFER_SIZE
DEFAULT_TRANSACTIONAL_TYPE
DELETE_STATS_IN_TRUNCATE
DISABLE_CODEGEN
DISABLE_CODEGEN_ROWS_THRESHOLD
DISABLE_HBASE_NUM_ROWS_ESTIMATE
DISABLE_ROW_RUNTIME_FILTERING
DISABLE_STREAMING_PREAGGREGATIONS
DISABLE_UNSAFE_SPILLS
ENABLE_EXPR_REWRITES
EXEC_SINGLE_NODE_ROWS_THRESHOLD
EXEC_TIME_LIMIT_S
EXPLAIN_LEVEL
MAX_NUM_RUNTIME_FILTERS
FETCH_ROWS_TIMEOUT_MS
JOIN_ROWS_PRODUCED_LIMIT
HBASE_CACHE_BLOCKS
HBASE_CACHING
IDLE_SESSION_TIMEOUT
KUDU_READ_MODE
LIVE_PROGRESS
LIVE_SUMMARY
MAX_ERRORS
MAX_MEM_ESTIMATE_FOR_ADMISSION
MAX_RESULT_SPOOLING_MEM
MAX_ROW_SIZE
MAX_SCAN_RANGE_LENGTH
MAX_SPILLED_RESULT_SPOOLING_MEM
MEM_LIMIT
MIN_SPILLABLE_BUFFER_SIZE
MT_DOP
NUM_NODES
NUM_ROWS_PRODUCED_LIMIT
NUM_SCANNER_THREADS
OPTIMIZE_PARTITION_KEY_SCANS
PARQUET_COMPRESSION_CODEC
PARQUET_ANNOTATE_STRINGS_UTF8
PARQUET_ARRAY_RESOLUTION
PARQUET_DICTIONARY_FILTERING
PARQUET_FALLBACK_SCHEMA_RESOLUTION
PARQUET_FILE_SIZE
PARQUET_OBJECT_STORE_SPLIT_SIZE
PARQUET_PAGE_ROW_COUNT_LIMIT
PARQUET_READ_STATISTICS
PARQUET_READ_PAGE_INDEX
PARQUET_WRITE_PAGE_INDEX
PREFETCH_MODE
QUERY_TIMEOUT_S
REFRESH_UPDATED_HMS_PARTITIONS
REPLICA_PREFERENCE
REQUEST_POOL
RESOURCE_TRACE_RATIO
RETRY_FAILED_QUERIES
RUNTIME_BLOOM_FILTER_SIZE
RUNTIME_FILTER_MAX_SIZE
RUNTIME_FILTER_MIN_SIZE
RUNTIME_FILTER_MODE
RUNTIME_FILTER_WAIT_TIME_MS
S3_SKIP_INSERT_STAGING
SCAN_BYTES_LIMIT
SCHEDULE_RANDOM_REPLICA
SCRATCH_LIMIT
SHUFFLE_DISTINCT_EXPRS
SPOOL_QUERY_RESULTS
SUPPORT_START_OVER
SYNC_DDL
THREAD_RESERVATION_AGGREGATE_LIMIT
THREAD_RESERVATION_LIMIT
TIMEZONE
TOPN_BYTES_LIMIT
UTF8_MODE
EXPAND_COMPLEX_TYPES
SHOW
SHUTDOWN
TRUNCATE TABLE
UPDATE
UPSERT
USE
VALUES
Optimizer Hints
Built-In Functions
Mathematical Functions
Bit Functions
Type Conversion Functions
Date and Time Functions
Conditional Functions
String Functions
Miscellaneous Functions
Aggregate Functions
APPX_MEDIAN
AVG
COUNT
GROUP_CONCAT
MAX
MIN
NDV
STDDEV, STDDEV_SAMP, STDDEV_POP
SUM
VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP
Analytic Functions
User-Defined Functions (UDFs)
SQL Differences Between Impala and Hive
Porting SQL
UTF-8 Support
Performance Tuning
Performance Best Practices
Join Performance
Table and Column Statistics
Benchmarking
Controlling Resource Usage
Runtime Filtering
HDFS Caching
HDFS Block Skew
Data Cache for Remote Reads
Testing Impala Performance
EXPLAIN Plans and Query Profiles
Scalability Considerations
Scaling Limits and Guidelines
Dedicated Coordinators Optimization
Metadata Management
Resource Management
Admission Control and Query Queuing
Configuring Admission Control
Partitioning
File Formats
Text Data Files
Parquet Data Files
ORC Data Files
Avro Data Files
Hudi Data Files
RCFile Data Files
SequenceFile Data Files
Using Impala to Query Kudu Tables
HBase Tables
Iceberg Tables
S3 Tables
ADLS Tables
Isilon Storage
Ozone Storage
Logging
Client Access
The Impala Shell
Configuration Options
Connecting to impalad
Running Commands and SQL Statements
Command Reference
Configuring Impala to Work with ODBC
Configuring Impala to Work with JDBC
Spooling Impala Query Results
Fault Tolerance
Impala Transparent Query Retries
Impala Node Blacklisting
Troubleshooting Impala
Web User Interface
Breakpad Minidumps
Ports Used by Impala
Impala Reserved Words
Impala Frequently Asked Questions
Impala Release Notes
New Features in Apache Impala
Incompatible Changes and Limitations in Apache Impala
Known Issues and Workarounds in Impala
Fixed Issues in Apache Impala