Enabling Kerberos Authentication for Impala
Impala supports an enterprise-grade authentication system called Kerberos. Kerberos provides strong security benefits including capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of impersonation by never sending a user's credentials in cleartext over the network. For more information on Kerberos, visit the MIT Kerberos website.
The rest of this topic assumes you have a working Kerberos Key Distribution Center (KDC) set up. To enable Kerberos, you first create a Kerberos principal for each host running impalad or statestored.
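Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files owned by the same user (typically impala). To implement user-level access to different databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in Enabling Sentry Authorization for Impala.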
An alternative form of authentication you can use is LDAP, described in Enabling LDAP Authentication for Impala.
Requirements for Using Impala with Kerberos
sudo yum install python-devel openssl-devel python-pip
sudo pip-python install ssl
If you plan to use Impala in your cluster, you must configure your KDC to allow tickets to be renewed, and you must configure krb5.conf to request renewable tickets. Typically, you can do this by adding the max_renewable_life setting to your realm in kdc.conf, and by adding the renew_lifetime parameter to the libdefaults section of krb5.conf. For more information about renewable tickets, see the Kerberos documentation.
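To check the client side of this requirement on a host, you can look for renew_lifetime in the [libdefaults] section of krb5.conf. The following is a minimal shell and awk sketch, assuming a conventional INI-style krb5.conf; the function name renew_lifetime_of and the 7-day values in the comments are hypothetical illustrations, not settings that Impala mandates.

```shell
# Hypothetical helper: print the renew_lifetime value from the [libdefaults]
# section of a krb5.conf-style file, or nothing if it is not set there.
#
# Illustrative fragments (the 7d value is only an example):
#   krb5.conf:  [libdefaults]
#                 renew_lifetime = 7d
#   kdc.conf:   [realms]
#                 TEST.EXAMPLE.COM = {
#                   max_renewable_life = 7d
#                 }
renew_lifetime_of() {
  awk '
    # A line starting with "[" opens a new section; remember whether it is libdefaults.
    /^[ \t]*\[/ { in_lib = ($0 ~ /^[ \t]*\[libdefaults]/); next }
    # Inside [libdefaults], strip everything up to "=" and print the value.
    in_lib && /^[ \t]*renew_lifetime[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      print
      exit
    }
  ' "$1"
}
```

For example, renew_lifetime_of /etc/krb5.conf prints the configured value; an empty result suggests the client is not requesting renewable tickets. The KDC side (max_renewable_life in kdc.conf) still needs to be checked separately.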
Currently, you cannot use the resource management feature on a cluster that has Kerberos authentication enabled.
Start all impalad and statestored daemons with the --principal and --keytab_file flags set to the principal and the full path name of the keytab file containing the credentials for the principal.
To enable Kerberos in the Impala shell, start the impala-shell command using the
To enable Impala to work with Kerberos security on your Hadoop cluster, make sure you perform the installation and configuration steps in Authentication in Hadoop. Note that when Kerberos security is enabled in Impala, a web browser that supports Kerberos HTTP SPNEGO is required to access the Impala web console (for example, Firefox, Internet Explorer, or Chrome).
If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, ResourceManager, NodeManagers, HttpFS, Oozie, Impala, or Impala statestore services are configured to use Kerberos HTTP SPNEGO authentication, and two or more of these services are running on the same host, then all of the running services must use the same HTTP principal and keytab file used for their HTTP endpoints.
Configuring Impala to Support Kerberos Security
Enabling Kerberos authentication for Impala involves steps that can be summarized as follows:
- Creating service principals for Impala and the HTTP service. Principal names take the form serviceName/fully.qualified.domain.name@KERBEROS.REALM.
- Creating, merging, and distributing keytab files for these principals.
- Editing /etc/default/impala to accommodate Kerberos authentication.
Note: In Impala 2.0 and later, the user() function returns the full Kerberos principal string, such as email@example.com, in a Kerberized environment.
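Each service principal in this section follows the serviceName/fully.qualified.domain.name@KERBEROS.REALM shape, and the HTTP component must be uppercase, so a quick format check before running kadmin can catch typos. This is a hypothetical shell helper (check_service_principal); the dot-in-hostname test is only a heuristic for a fully qualified domain name, not a full DNS validation.

```shell
# Hypothetical helper: succeed only if the argument looks like
#   serviceName/fully.qualified.domain.name@KERBEROS.REALM
# and, for HTTP principals, the service component is uppercase "HTTP".
check_service_principal() {
  local principal=$1 service rest host realm lower
  case $principal in
    */*@*) ;;
    *) echo "expected service/host@REALM: $principal" >&2; return 1 ;;
  esac
  service=${principal%%/*}   # component before the first "/"
  rest=${principal#*/}       # host@REALM
  host=${rest%@*}            # text before the last "@"
  realm=${rest##*@}          # text after the last "@"
  if [ -z "$service" ] || [ -z "$host" ] || [ -z "$realm" ]; then
    echo "empty component in: $principal" >&2
    return 1
  fi
  case $host in
    *.*) ;;                  # heuristic: an FQDN contains at least one dot
    *) echo "host is not fully qualified: $host" >&2; return 1 ;;
  esac
  # Reject "http/..." or "Http/..."; only "HTTP" is accepted.
  lower=$(printf '%s' "$service" | tr '[:upper:]' '[:lower:]')
  if [ "$lower" = http ] && [ "$service" != HTTP ]; then
    echo "the HTTP component must be uppercase: $principal" >&2
    return 1
  fi
}
```

For example, check_service_principal HTTP/impala_host.example.com@TEST.EXAMPLE.COM succeeds, while the same principal spelled with a lowercase http/ prefix is rejected with a message on standard error.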
Enabling Kerberos for Impala
Create an Impala service principal, specifying the name of the OS user that the Impala daemons run under, the fully qualified domain name of each node running impalad, and the realm name. For example:
$ kadmin
kadmin: addprinc -requires_preauth -randkey impala/impala_host.example.com@TEST.EXAMPLE.COM
Create an HTTP service principal. For example:
kadmin: addprinc -randkey HTTP/impala_host.example.com@TEST.EXAMPLE.COM
Note: The HTTP component of the service principal must be uppercase as shown in the preceding example.
Create keytab files for both principals. For example:
kadmin: xst -k impala.keytab impala/impala_host.example.com
kadmin: xst -k http.keytab HTTP/impala_host.example.com
kadmin: quit
Use ktutil to read the contents of the two keytab files and then write those contents to a new file. For example:
$ ktutil
ktutil: rkt impala.keytab
ktutil: rkt http.keytab
ktutil: wkt impala-http.keytab
ktutil: quit
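If you prefer not to type the ktutil session interactively, the same rkt, wkt, and quit commands can be generated and piped into ktutil. This sketch assumes MIT ktutil reads commands from standard input when it is not attached to a terminal, which is a common usage pattern; the helper name ktutil_merge_commands is hypothetical.

```shell
# Hypothetical helper: print the ktutil commands that read each input keytab
# and write all loaded entries to one output keytab.
# Arguments: OUTPUT_KEYTAB INPUT_KEYTAB...
ktutil_merge_commands() {
  local out=$1 kt
  shift
  for kt in "$@"; do
    printf 'rkt %s\n' "$kt"   # read_kt: load entries from an existing keytab
  done
  printf 'wkt %s\n' "$out"    # write_kt: write the loaded entries
  printf 'quit\n'
}

# Usage (assumption: ktutil accepts piped commands):
#   ktutil_merge_commands impala-http.keytab impala.keytab http.keytab | ktutil
```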
(Optional) Test that the credentials in the merged keytab file are valid, and that the "renew until" date is in the future. For example:
$ klist -e -k -t impala-http.keytab
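When inspecting the merged keytab, it helps to confirm that both the impala and HTTP principals are present. The hypothetical helper below (keytab_listing_has) reads the klist listing on standard input; it assumes each entry line shows the principal name separated from the neighboring columns by spaces, as the MIT klist -e -k -t listing does.

```shell
# Hypothetical helper: read `klist -e -k -t KEYTAB` output on standard input
# and succeed only if every principal passed as an argument appears in it.
keytab_listing_has() {
  local listing p
  # Append a space to every line so a principal at the end of a line
  # (for example when -e is omitted) is still followed by a space.
  listing=$(sed 's/$/ /')
  for p in "$@"; do
    # -F: fixed-string match; " $p " avoids matching a longer principal
    # that merely contains $p.
    if ! printf '%s\n' "$listing" | grep -Fq -- " $p "; then
      echo "missing from keytab: $p" >&2
      return 1
    fi
  done
}

# Usage:
#   klist -e -k -t impala-http.keytab | keytab_listing_has \
#     impala/impala_host.example.com@TEST.EXAMPLE.COM \
#     HTTP/impala_host.example.com@TEST.EXAMPLE.COM
```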
Copy the impala-http.keytab file to the Impala configuration directory. Change the permissions so that only the file owner can read the file, and change the file owner to the impala user. By default, the Impala user and group are both named impala. For example:
$ cp impala-http.keytab /etc/impala/conf
$ cd /etc/impala/conf
$ chmod 400 impala-http.keytab
$ chown impala:impala impala-http.keytab
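To verify the result of the chmod and chown commands, the sketch below compares the file's mode and owner against 400 and impala. It assumes GNU coreutils stat (the -c format option is GNU-specific); the function name keytab_perms_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if FILE has mode 400 (owner read-only)
# and is owned by OWNER (default: impala).
# Assumes GNU stat, where -c '%a %U' prints the octal mode and owner name.
keytab_perms_ok() {
  local file=$1 owner=${2:-impala} actual
  actual=$(stat -c '%a %U' -- "$file") || return 1
  if [ "$actual" != "400 $owner" ]; then
    echo "$file: expected '400 $owner', found '$actual'" >&2
    return 1
  fi
}

# Usage: keytab_perms_ok /etc/impala/conf/impala-http.keytab impala
```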
Add Kerberos options to the Impala defaults file, /etc/default/impala. Add the options for both the impalad and statestored daemons, using the IMPALA_STATE_STORE_ARGS variable for statestored and the corresponding arguments variable for impalad. For example, you might add:
-kerberos_reinit_interval=60 -principal=impala/impala_host.example.com@TEST.EXAMPLE.COM -keytab_file=/etc/impala/conf/impala-http.keytab
For more information on changing the Impala defaults specified in /etc/default/impala, see Modifying Impala Startup Options.
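Both daemons take the same three Kerberos settings, so the flag string can be generated once and pasted into each arguments variable in /etc/default/impala. This hypothetical helper (kerberos_flags) only prints the string. The usage values are the impala principal and the merged keytab path from the earlier steps, and the default reinit interval of 60 mirrors the example flags in this section.

```shell
# Hypothetical helper: print the Kerberos flags for one Impala daemon.
# Arguments: PRINCIPAL KEYTAB_PATH [REINIT_INTERVAL (default 60)]
kerberos_flags() {
  local principal=$1 keytab=$2 reinit=${3:-60}
  printf -- '-kerberos_reinit_interval=%s -principal=%s -keytab_file=%s\n' \
    "$reinit" "$principal" "$keytab"
}

# Usage:
#   kerberos_flags impala/impala_host.example.com@TEST.EXAMPLE.COM \
#     /etc/impala/conf/impala-http.keytab
```

The printed line is meant to be appended inside the quoted value of IMPALA_STATE_STORE_ARGS and of the impalad arguments variable; the defaults file itself is left as plain variable assignments rather than calling the helper from it.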
Enabling Kerberos for Impala with a Proxy Server
A common configuration for Impala with High Availability is to use a proxy server to submit requests to the actual impalad daemons on different hosts in the cluster. This configuration avoids connection problems in case of machine failure, because the proxy server can route new requests through one of the remaining hosts in the cluster. This configuration also helps with load balancing, because the additional overhead of being the "coordinator node" for each query is spread across multiple hosts.
Although you can set up a proxy server with or without Kerberos authentication, typically users set up a secure Kerberized configuration. For information about setting up a proxy server for Impala, including Kerberos-specific steps, see Using Impala through a Proxy for High Availability.
Using a Web Browser to Access a URL Protected by Kerberos HTTP SPNEGO
Your web browser must support Kerberos HTTP SPNEGO, as Chrome, Firefox, and Internet Explorer do.
To configure Firefox to access a URL protected by Kerberos HTTP SPNEGO:
Open the advanced settings Firefox configuration page by loading the
Use the Filter text box to find the network.negotiate-auth.trusted-uris preference, and enter the hostname or the domain of the web server that is protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with a comma.
- Click OK.
Enabling Impala Delegation for Kerberos Users
See Configuring Impala Delegation for Hue and BI Tools for details about the delegation feature that lets certain users submit queries using the credentials of other users.
Using TLS/SSL with Business Intelligence Tools
You can use Kerberos authentication, TLS/SSL encryption, or both to secure connections from JDBC and ODBC applications to Impala. See Configuring Impala to Work with JDBC and Configuring Impala to Work with ODBC for details.
Prior to Impala 2.5, the Hive JDBC driver did not support connections that use both Kerberos authentication and SSL encryption. If your cluster is running an older release that has this restriction, use an alternative JDBC driver that supports both of these security features.
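For JDBC clients that use the Hive (hive2) driver, Kerberos is usually selected through the principal property of the connection URL, and TLS/SSL through ssl=true. The sketch below assembles such a URL. It assumes the Impala HiveServer2 port 21050 by default, and a real TLS/SSL setup typically also needs truststore properties (such as sslTrustStore) that this hypothetical helper (impala_jdbc_url) does not add.

```shell
# Hypothetical helper: build a hive2 JDBC URL for a Kerberized Impala host.
# Arguments: HOST PRINCIPAL [USE_SSL: true|false (default false)] [PORT (default 21050)]
impala_jdbc_url() {
  local host=$1 principal=$2 use_ssl=${3:-false} port=${4:-21050}
  local url="jdbc:hive2://${host}:${port}/;principal=${principal}"
  if [ "$use_ssl" = true ]; then
    url="${url};ssl=true"     # request an encrypted connection
  fi
  printf '%s\n' "$url"
}

# Usage:
#   impala_jdbc_url impala_host.example.com \
#     impala/impala_host.example.com@TEST.EXAMPLE.COM true
```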
Enabling Access to Internal Impala APIs for Kerberos Users
For applications that need direct access to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can specify a list of Kerberos users who are allowed to call those APIs. By default, the hdfs users are the only ones authorized for this kind of access. Any users not explicitly authorized through the configuration setting are blocked from accessing the APIs. This setting applies to all the Impala-related daemons, although currently it is primarily used for HDFS to control the behavior of the catalog server.
Mapping Kerberos Principals to Short Names for Impala
Impala recognizes the auth_to_local setting, specified through the HDFS configuration setting hadoop.security.auth_to_local. This feature is disabled by default, to avoid an unexpected change in security-related behavior. To enable it:
- Specify --load_auth_to_local_rules=true in the impalad and catalogd configuration settings.
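As a rough illustration of what a mapping rule does, Hadoop's built-in DEFAULT rule maps a principal in the default realm to its first component, so impala/impala_host.example.com@TEST.EXAMPLE.COM becomes impala. The hypothetical shell function below (default_short_name) approximates only that DEFAULT behavior. It does not parse or apply the RULE:[...] entries that a hadoop.security.auth_to_local value can contain, so treat it as a way to reason about mappings, not as a predictor of Impala's exact results.

```shell
# Hypothetical helper approximating only Hadoop's DEFAULT auth_to_local rule:
# a principal in the default realm maps to its first component
# (impala/host@REALM -> impala, alice@REALM -> alice).
# Principals from other realms, or without a realm, fail (no match).
default_short_name() {
  local principal=$1 default_realm=$2 name realm
  case $principal in
    *@*) ;;
    *) return 1 ;;               # no realm component: treated as no match
  esac
  realm=${principal##*@}         # text after the last "@"
  name=${principal%@*}           # text before the last "@"
  [ "$realm" = "$default_realm" ] || return 1
  printf '%s\n' "${name%%/*}"    # keep only the first component
}
```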
Kerberos-Related Memory Overhead for Large Clusters
On a Kerberized cluster with high memory utilization, the kinit command that Impala runs at each 'kerberos_reinit_interval' may cause out-of-memory errors, because executing the command involves a fork of the Impala process. The error looks similar to the following:
Failed to obtain Kerberos ticket for principal: principal_details Failed to execute shell cmd: 'kinit -k -t keytab_details', error was: Error(12): Cannot allocate memory
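A workaround is to set vm.overcommit_memory to 1. You can change the vm.overcommit_memory setting immediately on a running host with the following command, run as root; however, this setting is reset when the host is restarted.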
echo 1 > /proc/sys/vm/overcommit_memory
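To make the setting persist across restarts, add vm.overcommit_memory=1 to /etc/sysctl.conf, and then run sysctl -p. No reboot is needed.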
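The persistent part of the workaround can be made idempotent so that repeated runs do not keep appending the same line. These are hypothetical shell helpers (configured_overcommit, ensure_overcommit); they assume a simple key = value sysctl file in which later entries override earlier ones.

```shell
# Hypothetical helper: print the last vm.overcommit_memory value set in a
# sysctl-style file (later lines override earlier ones), or nothing.
configured_overcommit() {
  awk -F= '
    { gsub(/[ \t]/, "") }                     # drop spaces around "="; fields are re-split
    $1 == "vm.overcommit_memory" { v = $2 }   # "#..." comment lines do not match
    END { if (v != "") print v }
  ' "$1"
}

# Hypothetical helper: append "vm.overcommit_memory = 1" unless FILE already
# sets the value to 1.
ensure_overcommit() {
  local file=$1
  if [ "$(configured_overcommit "$file")" != 1 ]; then
    printf 'vm.overcommit_memory = 1\n' >> "$file"
  fi
}

# Usage (as root):
#   ensure_overcommit /etc/sysctl.conf && sysctl -p
#   cat /proc/sys/vm/overcommit_memory   # value currently used by the kernel
```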