Impala includes a fine-grained authorization framework for Hadoop, based on Apache Ranger. Ranger authorization was added in Impala 3.3.0. Together with the Kerberos authentication framework, Ranger takes Hadoop security to a new level needed for the requirements of highly regulated industries such as healthcare, financial services, and government. Impala also includes an auditing capability which was added in Impala 1.1.1; Impala generates the audit data which can be consumed, filtered, and visualized by cluster-management components focused on governance.
The Impala security features have several objectives. At the most basic level, security prevents accidents or mistakes that could disrupt application processing, delete or corrupt data, or reveal data to unauthorized users. More advanced security features and practices can harden the system against malicious users trying to gain unauthorized access or perform other disallowed operations. The auditing feature provides a way to confirm that no unauthorized access occurred, and detect whether any such attempts were made. This is a critical set of features for production deployments in large organizations that handle important or sensitive data. It sets the stage for multi-tenancy, where multiple applications run concurrently and are prevented from interfering with each other.
The material in this section presumes that you are already familiar with administering secure Linux systems. That is, you should know the general security practices for Linux and Hadoop, and their associated commands and configuration files. For example, you should know how to create Linux users and groups, manage Linux group membership, set Linux and HDFS file permissions and ownership, and designate the default permissions and ownership for new files. You should be familiar with the configuration of the nodes in your Hadoop cluster, and know how to apply configuration changes or run a set of commands across all the nodes.
The security features are divided into these broad categories:
impala
user, which is suitable for a development/test environment but not for a secure production environment.
When authorization is enabled, Impala uses the OS user ID of the user who runs
impala-shell or other client program, and associates various privileges with each
user. See Impala Authorization for details about setting up and managing
authorization.