Configuring Impala to Work with JDBC
Impala supports the standard JDBC interface, allowing access from commercial Business Intelligence tools and custom software written in Java or other programming languages. The JDBC driver allows you to access Impala from a Java program that you write, or a Business Intelligence or similar tool that uses JDBC to communicate with various database products.
Setting up a JDBC connection to Impala involves the following steps:
- Verifying the communication port where the Impala daemons in your cluster are listening for incoming JDBC requests.
- Installing the JDBC driver on every system that runs the JDBC-enabled application.
- Specifying a connection string for the JDBC application to access one of the servers running the impalad daemon, with the appropriate security settings.
Configuring the JDBC Port
The default port used by JDBC 2.0 and later (as well as ODBC 2.x) is 21050. Impala server accepts JDBC connections through this same port 21050 by default. Make sure this port is available for communication with other hosts on your network, for example, that it is not blocked by firewall software. If your JDBC client software connects to a different port, specify that alternative port number with the --hs2_port option when starting impalad. See Starting Impala for details about Impala startup options. See Ports Used by Impala for information about all ports used for communication between Impala and clients or between Impala components.
Choosing the JDBC Driver
In Impala 2.0 and later, you can use the Hive 0.13 JDBC driver. If you are already using JDBC applications with an earlier Impala release, you should update your JDBC driver, because the Hive 0.12 driver that was formerly the only choice is not compatible with Impala 2.0 and later.
The Hive JDBC driver provides a substantial speed increase for JDBC applications with Impala 2.0 and higher, for queries that return large result sets.
Complex type considerations:
The Impala complex types (STRUCT, ARRAY, or MAP) are available in Impala 2.3 and higher. Using these types with JDBC requires version 2.5.28 or higher of the JDBC Connector for Impala; using them with ODBC requires version 2.5.30 or higher of the ODBC Connector for Impala. Consider upgrading all JDBC and ODBC drivers at the same time you upgrade to Impala 2.3 or higher.
Although the result sets from queries involving complex types consist of all scalar values, the queries involve join notation and column references that might not be understood by a particular JDBC or ODBC connector. Consider defining a view that represents the flattened version of a table containing complex type columns, and pointing the JDBC or ODBC application at the view. See Complex Types (Impala 2.3 or higher only) for details.
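For illustration, such a view could even be created through JDBC itself. The following is only a sketch: the customers table, its phones ARRAY column, and the connection URL are hypothetical, but the join notation (referencing the array column in the FROM clause and its item pseudocolumn) is the flattening technique the paragraph above describes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FlattenedViewSetup {
    public static void main(String[] args) throws Exception {
        // Placeholder host and port; see the connection string forms later in this page.
        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // customers and its ARRAY<STRING> column phones are hypothetical.
            // The view joins the table to its own array column, so the
            // result set the application sees contains only scalar values.
            stmt.execute(
                "CREATE VIEW customers_flat AS "
              + "SELECT c.id, c.name, p.item AS phone "
              + "FROM customers c, c.phones p");
        }
    }
}
```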
Enabling Impala JDBC Support on Client Systems
Using the Hive JDBC Driver
You install the Hive JDBC driver (hive-jdbc package) through the Linux package manager, on hosts within the cluster. The driver consists of several Java JAR files. The same driver can be used by Impala and Hive.
To get the JAR files, install the Hive JDBC driver on each host in the cluster that will run JDBC applications.
If you are using JDBC-enabled applications on hosts outside the cluster, you cannot use the same install procedure on those hosts. Install the JDBC driver on at least one cluster host using the preceding procedure, then download the JAR files to each client machine that will use JDBC with Impala:
commons-logging-X.X.X.jar
hadoop-common.jar
hive-common-X.XX.X.jar
hive-jdbc-X.XX.X.jar
hive-metastore-X.XX.X.jar
hive-service-X.XX.X.jar
httpclient-X.X.X.jar
httpcore-X.X.X.jar
libfb303-X.X.X.jar
libthrift-X.X.X.jar
log4j-X.X.XX.jar
slf4j-api-X.X.X.jar
slf4j-log4j12-X.X.X.jar
To enable JDBC support for Impala on the system where you run the JDBC application:
- Download the JAR files listed above to each client machine.
  Note: For Maven users, see this sample github page for an example of the dependencies you could add to a pom file instead of downloading the individual JARs.
- Store the JAR files in a location of your choosing, ideally a directory already referenced in your CLASSPATH setting. For example:
  - On Linux, you might use a location such as /opt/jars/.
  - On Windows, you might use a subdirectory underneath C:\Program Files.
- To successfully load the Impala JDBC driver, client programs must be able to locate the associated JAR files. This often means setting the CLASSPATH for the client process to include the JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers, but some examples of how to set CLASSPATH variables include:
  - On Linux, if you extracted the JARs to /opt/jars/, you might issue the following command to prepend the JAR files path to an existing classpath:
    export CLASSPATH=/opt/jars/*.jar:$CLASSPATH
  - On Windows, use the System Properties control panel item to modify the Environment Variables for your system. Modify the environment variables to include the path to which you extracted the files.
  Note: If the existing CLASSPATH on your client machine refers to some older version of the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files earlier in the listings, or delete the other references to Hive JAR files.
Establishing JDBC Connections
The JDBC driver class depends on which driver you select.
Note: If your JDBC or ODBC application connects to Impala through a load balancer such as haproxy, be cautious about reusing the connections. If the load balancer has set up connection timeout values, either check the connection frequently so that it never sits idle longer than the load balancer timeout value, or check the connection validity before using it and create a new one if the connection has been closed.
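A client-side validity check might look like the following sketch. The trivial SELECT 1 probe and the decision to discard on failure are assumptions about your environment, not requirements; a probe query is used here because some driver versions may not implement Connection.isValid().

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ConnectionCheck {
    // Returns true if the connection still works by running a trivial query.
    static boolean stillUsable(Connection conn) {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            return rs.next();
        } catch (SQLException e) {
            // The load balancer (or server) closed the connection; the
            // caller should discard it and open a new one.
            return false;
        }
    }
}
```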
Using the Hive JDBC Driver
For example, with the Hive JDBC driver, the class name is org.apache.hive.jdbc.HiveDriver.
Once you have configured Impala to work with JDBC, you can establish connections between the two. To do so for a cluster that does not use Kerberos authentication, use a connection string of the form jdbc:hive2://host:port/;auth=noSasl. For example, you might use:
jdbc:hive2://myhost.example.com:21050/;auth=noSasl
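As a minimal sketch of such a connection from Java, the program below loads the Hive driver class named above and runs one query; the host name comes from the example connection string, while my_table is a placeholder table name.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcExample {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver. With JDBC 4.0+ drivers this often
        // happens automatically, but the explicit call does no harm.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // my_table is a hypothetical table name.
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM my_table")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```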
To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the form jdbc:hive2://host:port/;principal=principal_name. The principal must be the same user principal you used when starting Impala. For example, you might use:
jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM
To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form jdbc:hive2://host:port/db_name;user=ldap_userid;password=ldap_password. For example, you might use:
jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123
Prior to Impala 2.5, the Hive JDBC driver did not support connections that use both Kerberos authentication and SSL encryption. If your cluster is running an older release that has this restriction, use an alternative JDBC driver that supports both of these security features.
Notes about JDBC and ODBC Interaction with Impala SQL Features
Most Impala SQL features work equivalently through the impala-shell interpreter or the JDBC or ODBC APIs. The following are some exceptions to keep in mind when switching between the interactive shell and applications using the APIs:
- Complex type considerations:
  - Queries involving the complex types (ARRAY, STRUCT, and MAP) require notation that might not be available in all levels of JDBC and ODBC drivers. If you have trouble querying such a table due to the driver level or inability to edit the queries used by the application, you can create a view that exposes a "flattened" version of the complex columns and point the application at the view. See Complex Types (Impala 2.3 or higher only) for details.
  - The complex types available in Impala 2.3 and higher are supported by the JDBC getColumns() API. Both MAP and ARRAY are reported as the JDBC SQL Type ARRAY, because this is the closest matching Java SQL type. This behavior is consistent with Hive. STRUCT types are reported as the JDBC SQL Type STRUCT.

    To be consistent with Hive's behavior, the TYPE_NAME field is populated with the primitive type name for scalar types, and with the full toSql() for complex types. The resulting type names are somewhat inconsistent, because nested types are printed differently than top-level types. For example, the following list shows how toSql() for Impala types is translated to TYPE_NAME values (a metadata sketch follows this list):

    DECIMAL(10,10) becomes DECIMAL
    CHAR(10) becomes CHAR
    VARCHAR(10) becomes VARCHAR
    ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)>
    ARRAY<CHAR(10)> becomes ARRAY<CHAR(10)>
    ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)>
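To inspect these TYPE_NAME values yourself, a short illustrative snippet can call the standard DatabaseMetaData.getColumns() API against a table; the connection URL and my_complex_table are placeholders.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ShowColumnTypes {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl"; // placeholder
        try (Connection conn = DriverManager.getConnection(url)) {
            DatabaseMetaData md = conn.getMetaData();
            // null catalog/schema patterns match any; "%" matches all columns.
            try (ResultSet cols = md.getColumns(null, null, "my_complex_table", "%")) {
                while (cols.next()) {
                    System.out.printf("%s: JDBC type %d, TYPE_NAME %s%n",
                        cols.getString("COLUMN_NAME"),
                        cols.getInt("DATA_TYPE"),
                        cols.getString("TYPE_NAME"));
                }
            }
        }
    }
}
```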
- Kudu Considerations for DML Statements:

  Currently, Impala INSERT, UPDATE, or other DML statements issued through the JDBC interface against a Kudu table do not return JDBC error codes for conditions such as duplicate primary key columns. Therefore, for applications that issue a high volume of DML statements, prefer to use the Kudu Java API directly rather than a JDBC application.
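As a hedged sketch of that direct route using the kudu-client library (the master address, table name, and columns below are placeholders), the Kudu Java API surfaces per-row errors, such as a duplicate primary key, that the JDBC path does not report:

```java
import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.OperationResponse;
import org.apache.kudu.client.PartialRow;

public class KuduDmlExample {
    public static void main(String[] args) throws Exception {
        // Placeholder Kudu master address.
        KuduClient client =
            new KuduClient.KuduClientBuilder("kudu-master.example.com:7051").build();
        try {
            KuduTable table = client.openTable("my_kudu_table"); // placeholder table
            KuduSession session = client.newSession();

            Insert insert = table.newInsert();
            PartialRow row = insert.getRow();
            row.addInt("id", 1);           // hypothetical columns
            row.addString("name", "one");

            // Unlike the JDBC path, the response carries row-level errors,
            // such as an attempt to insert a duplicate primary key.
            OperationResponse resp = session.apply(insert);
            if (resp.hasRowError()) {
                System.err.println("Row error: " + resp.getRowError());
            }
            session.close();
        } finally {
            client.close();
        }
    }
}
```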