Impala: A Modern, Open-Source SQL Engine for Hadoop

Presented at The Conference on Innovative Data Systems Research (CIDR) 2015.

ABSTRACT

Cloudera Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. Impala provides the low latency and high concurrency that BI/analytic read-mostly queries on Hadoop require, which batch frameworks such as Apache Hive do not deliver. This paper presents Impala from a user's perspective, gives an overview of its architecture and main components, and briefly demonstrates its superior performance compared to other popular SQL-on-Hadoop systems.