Impala supports the standard JDBC interface, allowing access from commercial Business Intelligence tools and custom software written in Java or other programming languages. The JDBC driver allows you to access Impala from a Java program that you write, or a Business Intelligence or similar tool that uses JDBC to communicate with various database products.
Setting up a JDBC connection to Impala involves verifying the port where the impalad daemon is listening, installing the JDBC driver on each client system, and specifying the driver class and connection string in your application. By default, impalad accepts JDBC connections through the following ports:
| Protocol   | Default Port | Flag to Specify an Alternate Port |
|------------|--------------|-----------------------------------|
| HTTP       | 28000        | --hs2_http_port                   |
| Binary TCP | 21050        | --hs2_port                        |
Make sure the port for the protocol you are using is available for communication with clients, for example, that it is not blocked by firewall software.
If your JDBC client software connects to a different port, specify that alternative port number with the corresponding flag from the table above when starting impalad.
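For illustration, a minimal Java sketch of a client targeting an impalad started with a nonstandard binary port; the host name and port 21051 here are hypothetical, and the jdbc:hive2 connection string format is covered later in this section:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class AltPortCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder host; 21051 matches an impalad started with
        // --hs2_port=21051 instead of the default 21050.
        String url = "jdbc:hive2://myhost.example.com:21051/;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected on alternate port 21051");
        }
    }
}
```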
In Impala 2.0 and later, you can use the Hive 0.13 or higher JDBC driver. If you are already using JDBC applications with an earlier Impala release, you should update your JDBC driver, because the Hive 0.12 driver that was formerly the only choice is not compatible with Impala 2.0 and later.
The Hive JDBC driver provides a substantial speed increase for JDBC applications with Impala 2.0 and higher, for queries that return large result sets.
You install the Hive JDBC driver (the hive-jdbc package) through the Linux package manager on hosts within the cluster. The driver consists of several JAR files. The same driver can be used by Impala and Hive.
To get the JAR files, install the Hive JDBC driver on each host in the cluster that will run JDBC applications.
If you are using JDBC-enabled applications on hosts outside the cluster, you cannot use the same install procedure on those hosts. Install the JDBC driver on at least one cluster host using the preceding procedure, then download the JAR files to each client machine that will use JDBC with Impala:
commons-logging-X.X.X.jar
hadoop-common.jar
hive-common-X.XX.X.jar
hive-jdbc-X.XX.X.jar
hive-metastore-X.XX.X.jar
hive-service-X.XX.X.jar
httpclient-X.X.X.jar
httpcore-X.X.X.jar
libfb303-X.X.X.jar
libthrift-X.X.X.jar
log4j-X.X.XX.jar
slf4j-api-X.X.X.jar
slf4j-log4j12-X.X.X.jar
To enable JDBC support for Impala on the system where you run the JDBC application:
1. Download the JAR files listed above to each client machine. If you use Maven, you can declare the corresponding dependencies in a pom file instead of downloading the individual JARs.
2. Store the JAR files in a location of your choosing, ideally a directory already referenced in your CLASSPATH setting, for example /opt/jars/.
3. Set the CLASSPATH for the client process to include the JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers, but some examples of how to set CLASSPATH variables include the following: if the JARs are in /opt/jars/, you might issue the following command to prepend the JAR files path to an existing classpath:

export CLASSPATH=/opt/jars/*.jar:$CLASSPATH

If the existing CLASSPATH on your client machine refers to some older version of the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files earlier in the listings, or delete the other references to Hive JAR files.

The JDBC driver class depends on which driver you select.
If your JDBC or ODBC application connects to Impala through a load balancer such as haproxy, be cautious about reusing the connections. If the load balancer has set up connection timeout values, either check the connection frequently so that it never sits idle longer than the load balancer timeout value, or check the connection validity before using it and create a new one if the connection has been closed.
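One way to guard against silently dropped connections is a small wrapper, sketched below, that validates the connection before handing it out. This assumes a driver version that implements Connection.isValid(); older Hive drivers may not, in which case you can issue a trivial query instead. The balancer host name is a placeholder.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class GuardedConnection {
    // Hypothetical load-balancer endpoint.
    private static final String URL =
        "jdbc:hive2://balancer.example.com:21050/;auth=noSasl";

    private Connection conn;

    // Returns a usable connection, replacing any connection the load
    // balancer has closed. isValid() probes the server and gives up
    // after the specified number of seconds.
    public synchronized Connection get() throws SQLException {
        if (conn == null || conn.isClosed() || !conn.isValid(5)) {
            if (conn != null) {
                try { conn.close(); } catch (SQLException ignored) { }
            }
            conn = DriverManager.getConnection(URL);
        }
        return conn;
    }
}
```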
For example, with the Hive JDBC driver, the class name is org.apache.hive.jdbc.HiveDriver. Once you have configured Impala to work with JDBC, you can establish connections between the two. To do so for a cluster that does not use Kerberos authentication, use a connection string of the form jdbc:hive2://host:port/;auth=noSasl.
For example, you might use:
jdbc:hive2://myhost.example.com:21050/;auth=noSasl
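Putting the pieces together, a minimal self-contained Java program that loads the driver, connects without Kerberos, and runs a query might look like this sketch (the host name is the hypothetical one from the example above):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcExample {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver class named earlier.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // version() reports the impalad build, a handy smoke test.
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```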
To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the form jdbc:hive2://host:port/;principal=principal_name. The principal must be the same user principal you used when starting Impala. For example, you might use:

jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM
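In Java, the Kerberos case looks the same apart from the URL. The sketch below assumes the JVM can already pick up Kerberos credentials, for example from a ticket cache populated by kinit; some environments additionally need JAAS or Hadoop UserGroupInformation setup. Host and principal are the placeholder values from above.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class KerberizedExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder host and principal from the example above.
        String url = "jdbc:hive2://myhost.example.com:21050/"
                + ";principal=impala/myhost.example.com@H2.EXAMPLE.COM";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected with Kerberos authentication");
        }
    }
}
```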
To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form jdbc:hive2://host:port/db_name;user=ldap_userid;password=ldap_password. For example, you might use:

jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123
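The driver also accepts the user name and password as separate arguments to DriverManager.getConnection(), which keeps credentials out of the URL and out of anything that logs it. A sketch reusing the placeholder credentials above:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class LdapExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://myhost.example.com:21050/test_db";
        // "fred" / "xyz123" are the placeholder LDAP credentials
        // from the connection-string example above.
        try (Connection conn =
                 DriverManager.getConnection(url, "fred", "xyz123")) {
            System.out.println("Connected with LDAP authentication");
        }
    }
}
```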
To connect to Impala over the HTTP transport, specify transportMode=http in the connection string. For example:

jdbc:hive2://myhost.example.com:28000/;transportMode=http
Prior to Impala 2.5, the Hive JDBC driver did not support connections that use both Kerberos authentication and SSL encryption. If your cluster is running an older release that has this restriction, use an alternative JDBC driver that supports both of these security features.
Most Impala SQL features work equivalently through the impala-shell interpreter or the JDBC and ODBC APIs. The following are some exceptions to keep in mind when switching between the interactive shell and applications using the APIs:
Complex type considerations:
Queries involving the complex types (ARRAY, STRUCT, and MAP) require notation that might not be available in all levels of JDBC and ODBC drivers. If you have trouble querying such a table due to the driver level or inability to edit the queries used by the application, you can create a view that exposes a "flattened" version of the complex columns and point the application at the view, as in the sketch below. See Complex Types (Impala 2.3 or higher only) for details.
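As a sketch of the view approach, assume a hypothetical table complex_t(id INT, tags ARRAY<STRING>). The view is created once through any connection, and the application then reads only scalar columns:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FlattenComplexColumns {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Joining the table with its own array column flattens one
            // row per array element; drivers that cannot handle
            // complex-type syntax can then query flat_t directly.
            stmt.execute("CREATE VIEW flat_t AS "
                    + "SELECT t.id, tag.item AS tag "
                    + "FROM complex_t t, t.tags tag");
        }
    }
}
```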
The complex types available in Impala 2.3 and higher are supported by the JDBC getColumns() API. Both MAP and ARRAY are reported as the JDBC SQL Type ARRAY, because this is the closest matching Java SQL type. This behavior is consistent with Hive. STRUCT types are reported as the JDBC SQL Type STRUCT.
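A short sketch of inspecting these reported types through DatabaseMetaData, again with the hypothetical complex_t table in the default database:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class DescribeColumns {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url);
             ResultSet cols = conn.getMetaData()
                     .getColumns(null, "default", "complex_t", "%")) {
            while (cols.next()) {
                // DATA_TYPE is the java.sql.Types constant (ARRAY for
                // both MAP and ARRAY columns); TYPE_NAME carries the
                // toSql() string discussed below.
                System.out.printf("%s -> %d (%s)%n",
                        cols.getString("COLUMN_NAME"),
                        cols.getInt("DATA_TYPE"),
                        cols.getString("TYPE_NAME"));
            }
        }
    }
}
```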
The TYPE_NAME value reported through getColumns() is produced by calling toSql() for complex types. The resulting type names are somewhat inconsistent, because nested types are printed differently than top-level types. For example, the following list shows how toSql() for Impala types is translated to TYPE_NAME values:

DECIMAL(10,10) becomes DECIMAL
CHAR(10) becomes CHAR
VARCHAR(10) becomes VARCHAR
ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)>
ARRAY<CHAR(10)> becomes ARRAY<CHAR(10)>
ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)>
Currently, Impala INSERT, UPDATE, or other DML statements issued through the JDBC interface against a Kudu table do not return JDBC error codes for conditions such as duplicate primary key columns. Therefore, for applications that issue a high volume of DML statements, prefer to use the Kudu Java API directly rather than a JDBC application, as sketched below.
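A minimal sketch of the direct route, assuming the Kudu Java client is on the classpath and a hypothetical table my_kudu_table(id INT PRIMARY KEY, name STRING); unlike the JDBC path, each applied operation returns a response that surfaces per-row errors such as duplicate keys:

```java
import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.OperationResponse;

public class KuduDirectInsert {
    public static void main(String[] args) throws Exception {
        // Placeholder Kudu master address and table name.
        try (KuduClient client = new KuduClient
                .KuduClientBuilder("kudu-master.example.com:7051").build()) {
            KuduTable table = client.openTable("my_kudu_table");
            KuduSession session = client.newSession();
            Insert insert = table.newInsert();
            insert.getRow().addInt("id", 1);
            insert.getRow().addString("name", "example");
            // In the default synchronous flush mode, apply() returns a
            // response whose row error reveals duplicate-key conditions.
            OperationResponse resp = session.apply(insert);
            if (resp.hasRowError()) {
                System.err.println("Row error: " + resp.getRowError());
            }
            session.close();
        }
    }
}
```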