Securing Impala Data and Log Files

One aspect of security is to protect files from unauthorized access at the filesystem level. For example, if you store sensitive data in HDFS, you specify permissions on the associated files and directories in HDFS to restrict read and write permissions to the appropriate users and groups.

If you issue queries containing sensitive values in the WHERE clause, such as financial account numbers, those values are stored in Impala log files in the Linux filesystem and you must secure those files also. For the locations of Impala log files, see Using Impala Logging.

All Impala read and write operations are performed under the filesystem privileges of the impala user. The impala user must be able to read all directories and data files that you query, and write into all the directories and data files for INSERT and LOAD DATA statements. At a minimum, make sure the impala user is in the hive group so that it can access files and directories shared between Impala and Hive. See User Account Requirements for more details.

Setting file permissions is necessary for Impala to function correctly, but is not an effective security practice by itself: