Impala
Impala is the open source, native analytic database for Apache Hadoop.
com.cloudera.impala.catalog.HBaseTable Class Reference

Public Member Functions

void load (Table oldValue, HiveMetaStoreClient client, org.apache.hadoop.hive.metastore.api.Table msTbl) throws TableLoadingException
 
synchronized Pair< Long, Long > getEstimatedRowStats (byte[] startRowKey, byte[] endRowKey)
 
long getHdfsSize (HRegionInfo info) throws IOException
 
ArrayList< Column > getColumnsInHiveOrder ()
 
TTableDescriptor toThriftDescriptor (Set< Long > referencedPartitions)
 
String getHBaseTableName ()
 
HTable getHTable ()
 
int getNumNodes ()
 
TCatalogObjectType getCatalogObjectType ()
 
TTable toThrift ()
 
String getStorageHandlerClassName ()
 
TResultSet getTableStats ()
 
void addColumn (Column col)
 
void clearColumns ()
 
void updateLastDdlTime (long ddlTime)
 
void validate () throws TableLoadingException
 
TCatalogObject toTCatalogObject ()
 
Db getDb ()
 
String getName ()
 
String getFullName ()
 
TableName getTableName ()
 
String getOwner ()
 
ArrayList< Column > getColumns ()
 
List< String > getColumnNames ()
 
List< Column > getNonClusteringColumns ()
 
Column getColumn (String name)
 
org.apache.hadoop.hive.metastore.api.Table getMetaStoreTable ()
 
int getNumClusteringCols ()
 
TableId getId ()
 
long getNumRows ()
 
ArrayType getType ()
 
long getCatalogVersion ()
 
void setCatalogVersion (long catalogVersion)
 
boolean isLoaded ()
 

Static Public Member Functions

static Path getRootDir (final Configuration c) throws IOException
 
static Configuration getHBaseConf ()
 
static List< HRegionLocation > getRegionsInRange (HTable hbaseTbl, final byte[] startKey, final byte[] endKey) throws IOException
 
static boolean isHBaseTable (org.apache.hadoop.hive.metastore.api.Table msTbl)
 
static Table fromMetastoreTable (TableId id, Db db, org.apache.hadoop.hive.metastore.api.Table msTbl)
 
static Table fromThrift (Db parentDb, TTable thriftTable) throws TableLoadingException
 

Static Public Attributes

static final String DEFAULT_PREFIX = "default."
 
static final int ROW_COUNT_ESTIMATE_BATCH_SIZE = 10
 

Protected Member Functions

 HBaseTable (TableId id, org.apache.hadoop.hive.metastore.api.Table msTbl, Db db, String name, String owner)
 
void loadFromThrift (TTable table) throws TableLoadingException
 
List< String > getColumnNamesWithHmsStats ()
 
void loadAllColumnStats (HiveMetaStoreClient client)
 
Type parseColumnType (FieldSchema fs) throws TableLoadingException
 

Static Protected Member Functions

static long getRowCount (Map< String, String > parameters)
 

Protected Attributes

HBaseColumn rowKey_
 
String hbaseTableName_
 
final org.apache.hadoop.hive.metastore.api.Table msTable_
 
final TableId id_
 
final Db db_
 
final String name_
 
final String owner_
 
TTableDescriptor tableDesc_
 
List< FieldSchema > fields_
 
TAccessLevel accessLevel_ = TAccessLevel.READ_WRITE
 
int numClusteringCols_
 
long numRows_ = -1
 
final ArrayType type_ = new ArrayType(new StructType())
 
long lastDdlTime_
 

Static Protected Attributes

static EnumSet< TableType > SUPPORTED_TABLE_TYPES
 

Private Member Functions

void parseColumnMapping (boolean tableDefaultStorageIsBinary, String columnsMappingSpec, List< FieldSchema > fieldSchemas, List< String > columnFamilies, List< String > columnQualifiers, List< Boolean > colIsBinaryEncoded) throws SerDeException
 
boolean supportsBinaryEncoding (FieldSchema fs)
 
String getHBaseTableName (org.apache.hadoop.hive.metastore.api.Table tbl)
 
Pair< Long, Long > getEstimatedRowStatsForRegion (HRegionLocation location, boolean isCompressed) throws IOException
 
THBaseTable getTHBaseTable ()
 

Private Attributes

HTable hTable_ = null
 
HColumnDescriptor[] columnFamilies_ = null
 

Static Private Attributes

static final double DELTA_FROM_AVERAGE = 0.15
 
static final Logger LOG = Logger.getLogger(HBaseTable.class)
 
static final int MIN_NUM_REGIONS_TO_CHECK = 5
 
static final String HBASE_INPUT_FORMAT
 
static final String HBASE_SERIALIZATION_LIB
 
static final String HBASE_STORAGE_HANDLER
 
static final String ROW_KEY_COLUMN_FAMILY = ":key"
 
static final Configuration hbaseConf_ = HBaseConfiguration.create()
 

Detailed Description

Impala representation of HBase table metadata, as loaded from Hive's metastore. This implies that we inherit the metastore's limitations related to HBase, for example the lack of support for composite HBase row keys. We sort the HBase columns (cols) by family/qualifier to simplify the retrieval logic in the backend, since HBase returns data ordered by family/qualifier. This implies that a "select *"-query on an HBase table will not have the columns ordered as they were declared in the DDL. They will be ordered by family/qualifier.

Definition at line 76 of file HBaseTable.java.
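
The column ordering described above can be illustrated with a minimal sketch. The class ColumnOrderSketch and its Col type are hypothetical stand-ins, not Impala's HBaseColumn, and the sketch compares Java strings where HBase compares raw bytes. Row key handling is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: shows why "select *" column order can differ from DDL order.
final class ColumnOrderSketch {
  // Hypothetical stand-in for an HBase-backed column; not Impala's HBaseColumn.
  static final class Col {
    final String name;
    final String family;
    final String qualifier;

    Col(String name, String family, String qualifier) {
      this.name = name;
      this.family = family;
      this.qualifier = qualifier;
    }
  }

  // Returns column names ordered by column family, then by qualifier.
  // The input list (DDL declaration order) is left untouched.
  static List<String> selectStarOrder(List<Col> declared) {
    List<Col> sorted = new ArrayList<>(declared);
    sorted.sort(Comparator.comparing((Col c) -> c.family)
        .thenComparing(c -> c.qualifier));
    List<String> names = new ArrayList<>();
    for (Col c : sorted) {
      names.add(c.name);
    }
    return names;
  }
}
```

A column declared first in the DDL but mapped to a later family/qualifier therefore appears later in the result.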

Constructor & Destructor Documentation

com.cloudera.impala.catalog.HBaseTable.HBaseTable ( TableId  id,
org.apache.hadoop.hive.metastore.api.Table  msTbl,
Db  db,
String  name,
String  owner 
)
inline protected

Definition at line 123 of file HBaseTable.java.

Member Function Documentation

void com.cloudera.impala.catalog.Table.clearColumns ( )
inline inherited

static Table com.cloudera.impala.catalog.Table.fromMetastoreTable ( TableId  id,
Db  db,
org.apache.hadoop.hive.metastore.api.Table  msTbl 
)
inline static inherited

Creates a table of the appropriate type based on the given hive.metastore.api.Table object.

Definition at line 207 of file Table.java.

References com.cloudera.impala.catalog.DataSourceTable.isDataSourceTable(), com.cloudera.impala.catalog.HBaseTable.isHBaseTable(), and com.cloudera.impala.catalog.HdfsFileFormat.isHdfsFormatClass().

static Table com.cloudera.impala.catalog.Table.fromThrift ( Db  parentDb,
TTable  thriftTable 
) throws TableLoadingException
inline static inherited

Factory method that creates a new Table from its Thrift representation. Determines the type of table to create based on the Thrift table provided.

Definition at line 231 of file Table.java.

TCatalogObjectType com.cloudera.impala.catalog.HBaseTable.getCatalogObjectType ( )
inline

Implements com.cloudera.impala.catalog.CatalogObject.

Definition at line 605 of file HBaseTable.java.

Column com.cloudera.impala.catalog.Table.getColumn ( String  name)
inline inherited

List<String> com.cloudera.impala.catalog.Table.getColumnNamesWithHmsStats ( )
inline protected inherited

ArrayList<Column> com.cloudera.impala.catalog.HBaseTable.getColumnsInHiveOrder ( )
inline

Hive returns the columns in order of their declaration for HBase tables.

Definition at line 572 of file HBaseTable.java.

References com.cloudera.impala.catalog.Table.getColumns().

synchronized Pair<Long, Long> com.cloudera.impala.catalog.HBaseTable.getEstimatedRowStats ( byte[]  startRowKey,
byte[]  endRowKey 
)
inline

Get an estimate of the number of rows and bytes per row in regions between startRowKey and endRowKey.

This number is calculated by incrementally checking as many region servers as necessary until we observe a relatively constant row size per region on average. Depending on the skew of data in the regions this can either mean that we need to check only a minimal number of regions or that we will scan all regions.

The accuracy of this number is determined by the number of rows that are written and kept in the memstore and have not been flushed until now. A large number of key-value pairs in the memstore will lead to bad estimates as this number is not reflected in the file size on HDFS that is used to estimate this number.

Currently, the algorithm does not consider the case that the key range used as a parameter might be generally of different size than the rest of the region.

The values computed here should be cached so that the HDFS NameNode is not overwhelmed in high-QPS workloads; this could be done in load(). The method is synchronized to ensure that only one thread at a time uses the HTable.

Parameters
startRowKey  First row key in the range
endRowKey  Last row key in the range
Returns
The estimated number of rows in the regions between the row keys (first) and the estimated row size in bytes (second).

Definition at line 488 of file HBaseTable.java.

References com.cloudera.impala.catalog.HBaseTable.columnFamilies_, com.cloudera.impala.catalog.HBaseTable.DELTA_FROM_AVERAGE, com.cloudera.impala.catalog.HBaseTable.getEstimatedRowStatsForRegion(), com.cloudera.impala.catalog.HBaseTable.getHdfsSize(), com.cloudera.impala.catalog.HBaseTable.getRegionsInRange(), com.cloudera.impala.catalog.HBaseTable.hTable_, and com.cloudera.impala.catalog.HBaseTable.MIN_NUM_REGIONS_TO_CHECK.
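
The incremental sampling described for getEstimatedRowStats() can be sketched as follows. This is one plausible reading of the description, not the code in HBaseTable.java: the stopping rule (latest row size within DELTA_FROM_AVERAGE of the running mean, after at least MIN_NUM_REGIONS_TO_CHECK regions) and the extrapolation to unsampled regions are assumptions, and the per-region inputs are supplied as plain arrays instead of being fetched from HBase.

```java
// Illustrative only: samples regions until the mean row size looks stable.
final class RowStatsSketch {
  static final double DELTA_FROM_AVERAGE = 0.15;  // value listed in this reference
  static final int MIN_NUM_REGIONS_TO_CHECK = 5;  // value listed in this reference

  // regionRowCounts[i] and regionRowSizes[i] are the per-region estimates that
  // would otherwise come from getEstimatedRowStatsForRegion().
  // Returns {estimatedTotalRows, estimatedBytesPerRow, regionsSampled}.
  static long[] estimate(long[] regionRowCounts, long[] regionRowSizes) {
    int numRegions = regionRowCounts.length;
    if (numRegions == 0) {
      return new long[] {0, 0, 0};
    }
    long rowsSeen = 0;
    double sizeSum = 0;
    int sampled = 0;
    for (int i = 0; i < numRegions; i++) {
      rowsSeen += regionRowCounts[i];
      sizeSum += regionRowSizes[i];
      sampled++;
      double mean = sizeSum / sampled;
      // Assumed stopping rule: enough regions checked and the latest region's
      // row size is close to the running average.
      boolean stable = Math.abs(regionRowSizes[i] - mean) <= DELTA_FROM_AVERAGE * mean;
      if (sampled >= MIN_NUM_REGIONS_TO_CHECK && stable) {
        break;
      }
    }
    long bytesPerRow = Math.round(sizeSum / sampled);
    // Assumed extrapolation: unsampled regions hold the sampled average row count.
    long totalRows = Math.round((double) rowsSeen / sampled * numRegions);
    return new long[] {totalRows, bytesPerRow, sampled};
  }
}
```

With evenly sized regions the loop stops after the minimum number of regions; with strongly skewed row sizes it ends up visiting every region, which matches the two extremes mentioned in the description.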

Pair<Long, Long> com.cloudera.impala.catalog.HBaseTable.getEstimatedRowStatsForRegion ( HRegionLocation  location,
boolean  isCompressed 
) throws IOException
inline private

Estimates the number of rows for a single region and returns a pair with the estimated row count and the estimated size in bytes per row.

Definition at line 399 of file HBaseTable.java.

References com.cloudera.impala.catalog.HBaseTable.getHdfsSize(), and com.cloudera.impala.catalog.HBaseTable.ROW_COUNT_ESTIMATE_BATCH_SIZE.

Referenced by com.cloudera.impala.catalog.HBaseTable.getEstimatedRowStats(), and com.cloudera.impala.catalog.HBaseTable.getTableStats().
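
A minimal sketch of a per-region estimate, assuming the estimate divides the region's on-disk size by the mean size of a small batch of sampled rows. The class RegionEstimateSketch and its parameters are hypothetical, and compression handling (the isCompressed flag) is deliberately omitted because this reference does not describe it.

```java
// Illustrative only: derives a row count from on-disk size and sampled row sizes.
final class RegionEstimateSketch {
  static final int ROW_COUNT_ESTIMATE_BATCH_SIZE = 10;  // value listed in this reference

  // sampledRowSizes: sizes in bytes of rows read from the region (assumed input).
  // regionBytesOnDisk: the region's size on HDFS (assumed input).
  // Returns {estimatedRowCount, estimatedBytesPerRow}.
  static long[] estimate(long[] sampledRowSizes, long regionBytesOnDisk) {
    // Use at most one batch of sampled rows.
    int n = Math.min(sampledRowSizes.length, ROW_COUNT_ESTIMATE_BATCH_SIZE);
    if (n == 0) {
      return new long[] {0, 0};
    }
    long total = 0;
    for (int i = 0; i < n; i++) {
      total += sampledRowSizes[i];
    }
    // Guard against division by zero for degenerate samples.
    long bytesPerRow = Math.max(1, total / n);
    return new long[] {regionBytesOnDisk / bytesPerRow, bytesPerRow};
  }
}
```

Because the row count is derived from the file size on HDFS, rows that are still in the memstore are not counted, which is the source of inaccuracy noted for getEstimatedRowStats().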

static Configuration com.cloudera.impala.catalog.HBaseTable.getHBaseConf ( )
inline static

String com.cloudera.impala.catalog.HBaseTable.getHBaseTableName ( org.apache.hadoop.hive.metastore.api.Table  tbl)
inline private

String com.cloudera.impala.catalog.HBaseTable.getHBaseTableName ( )
inline

long com.cloudera.impala.catalog.HBaseTable.getHdfsSize ( HRegionInfo  info) throws IOException
inline

HTable com.cloudera.impala.catalog.HBaseTable.getHTable ( )
inline

Definition at line 590 of file HBaseTable.java.

References com.cloudera.impala.catalog.HBaseTable.hTable_.

TableId com.cloudera.impala.catalog.Table.getId ( )
inline inherited

List<Column> com.cloudera.impala.catalog.Table.getNonClusteringColumns ( )
inline inherited

Returns the list of all columns excluding any partition columns.

Definition at line 385 of file Table.java.

References com.cloudera.impala.catalog.Table.numClusteringCols_.

Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze(), and com.cloudera.impala.catalog.Table.getColumnsInHiveOrder().

int com.cloudera.impala.catalog.HBaseTable.getNumNodes ( )
inline

Definition at line 599 of file HBaseTable.java.

long com.cloudera.impala.catalog.Table.getNumRows ( )
inline inherited

String com.cloudera.impala.catalog.Table.getOwner ( )
inline inherited

Definition at line 348 of file Table.java.

References com.cloudera.impala.catalog.Table.owner_.

static List<HRegionLocation> com.cloudera.impala.catalog.HBaseTable.getRegionsInRange ( HTable  hbaseTbl,
final byte[]  startKey,
final byte[]  endKey 
) throws IOException
inline static

Gets the regions corresponding to an arbitrary range of keys. This is copied from org.apache.hadoop.hbase.client.HTable; the only difference is that it does not use the cache when calling getRegionLocation. TODO: Remove this function and use HTable.getRegionsInRange when the non-cache version has been ported to CDH (DISTRO-477).

Parameters
startRow  Starting row in range, inclusive
endRow  Ending row in range, exclusive
Returns
A list of HRegionLocations corresponding to the regions that contain the specified range
Exceptions
IOException  if a remote or network exception occurs

Definition at line 650 of file HBaseTable.java.

Referenced by com.cloudera.impala.catalog.HBaseTable.getEstimatedRowStats().
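
The range lookup can be illustrated without an HBase cluster. The sketch below is a simulation under stated assumptions: regions are modeled as a TreeMap from start key to a region label, keys are Java strings instead of byte arrays, and an empty end key is taken to mean "to the end of the table". The class RegionsInRangeSketch is hypothetical and does not call HTable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: finds the regions overlapping [startKey, endKey).
final class RegionsInRangeSketch {
  // regionsByStartKey maps each region's start key to a label. The first region
  // is assumed to use the empty string as its start key.
  static List<String> regionsInRange(
      TreeMap<String, String> regionsByStartKey, String startKey, String endKey) {
    List<String> result = new ArrayList<>();
    // The region containing startKey is the one with the greatest start key <= startKey.
    Map.Entry<String, String> region = regionsByStartKey.floorEntry(startKey);
    // Walk forward while the region starts before the exclusive end key.
    while (region != null
        && (endKey.isEmpty() || region.getKey().compareTo(endKey) < 0)) {
      result.add(region.getValue());
      region = regionsByStartKey.higherEntry(region.getKey());
    }
    return result;
  }
}
```

Because the end of the range is exclusive, a region whose start key equals endKey is not returned.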

static Path com.cloudera.impala.catalog.HBaseTable.getRootDir ( final Configuration  c) throws IOException
inline static

Returns HBase's root directory, i.e. hbase.rootdir from the given configuration, as a qualified Path. Method copied from HBase FSUtils.java to avoid depending on HBase server.

Definition at line 562 of file HBaseTable.java.

Referenced by com.cloudera.impala.catalog.HBaseTable.getHdfsSize().

static long com.cloudera.impala.catalog.Table.getRowCount ( Map< String, String >  parameters)
inline static protected inherited

String com.cloudera.impala.catalog.HBaseTable.getStorageHandlerClassName ( )
inline

Returns the storage handler class for HBase tables read by Hive.

Definition at line 676 of file HBaseTable.java.

References com.cloudera.impala.catalog.HBaseTable.HBASE_STORAGE_HANDLER.

TResultSet com.cloudera.impala.catalog.HBaseTable.getTableStats ( )
inline

Returns statistics on this table as a tabular result set. Used for the SHOW TABLE STATS statement. The schema of the returned TResultSet is set inside this method.

Definition at line 685 of file HBaseTable.java.

References com.cloudera.impala.catalog.Type.BIGINT, com.cloudera.impala.catalog.HBaseTable.getEstimatedRowStatsForRegion(), com.cloudera.impala.catalog.HBaseTable.getHdfsSize(), com.cloudera.impala.catalog.HBaseTable.hTable_, com.cloudera.impala.catalog.Type.STRING, and com.cloudera.impala.catalog.ScalarType.toThrift().

ArrayType com.cloudera.impala.catalog.Table.getType ( )
inline inherited

static boolean com.cloudera.impala.catalog.HBaseTable.isHBaseTable ( org.apache.hadoop.hive.metastore.api.Table  msTbl)
inline static

Returns true if the given Metastore Table represents an HBase table. Versions of Hive/HBase are inconsistent about which HBase-related fields are set (e.g., HIVE-6548 changed the input format to null). For maximum compatibility, all known fields that indicate an HBase table are considered.

Definition at line 739 of file HBaseTable.java.

References com.cloudera.impala.catalog.HBaseTable.HBASE_INPUT_FORMAT, com.cloudera.impala.catalog.HBaseTable.HBASE_SERIALIZATION_LIB, and com.cloudera.impala.catalog.HBaseTable.HBASE_STORAGE_HANDLER.

Referenced by com.cloudera.impala.catalog.Table.fromMetastoreTable().
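
The "consider all known fields" approach can be sketched as a check that succeeds if any one indicator matches. The three constants use the values listed in this reference; the class HBaseDetectionSketch and its method, which takes plain nullable strings instead of a Metastore Table, are hypothetical, and how the real method reads these fields from the Metastore object is not shown here.

```java
// Illustrative only: any single matching indicator marks the table as HBase-backed.
final class HBaseDetectionSketch {
  static final String HBASE_INPUT_FORMAT =
      "org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat";
  static final String HBASE_SERIALIZATION_LIB =
      "org.apache.hadoop.hive.hbase.HBaseSerDe";
  static final String HBASE_STORAGE_HANDLER =
      "org.apache.hadoop.hive.hbase.HBaseStorageHandler";

  // Each argument may be null, since different Hive/HBase versions leave
  // different fields unset.
  static boolean looksLikeHBaseTable(
      String inputFormat, String serializationLib, String storageHandler) {
    // Calling equals() on the constant keeps the check null-safe.
    return HBASE_INPUT_FORMAT.equals(inputFormat)
        || HBASE_SERIALIZATION_LIB.equals(serializationLib)
        || HBASE_STORAGE_HANDLER.equals(storageHandler);
  }
}
```

A table whose input format is null but whose serialization library is the HBase SerDe is still detected.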

void com.cloudera.impala.catalog.Table.loadAllColumnStats ( HiveMetaStoreClient  client)
inline protected inherited

void com.cloudera.impala.catalog.HBaseTable.parseColumnMapping ( boolean  tableDefaultStorageIsBinary,
String  columnsMappingSpec,
List< FieldSchema >  fieldSchemas,
List< String >  columnFamilies,
List< String >  columnQualifiers,
List< Boolean >  colIsBinaryEncoded 
) throws SerDeException
inline private

Type com.cloudera.impala.catalog.Table.parseColumnType ( FieldSchema  fs) throws TableLoadingException
inline protected inherited

Gets the ColumnType from the given FieldSchema by using Impala's SqlParser. Throws a TableLoadingException if the FieldSchema could not be parsed. The type can either be:

  • Supported by Impala, in which case the type is returned.
  • Understood by Impala but not yet implemented (e.g. date), in which case the type is returned but type.IsSupported() returns false.
  • Not understood by Impala at all, in which case a TableLoadingException is thrown.

Definition at line 331 of file Table.java.

References com.cloudera.impala.catalog.Table.getName().

Referenced by com.cloudera.impala.catalog.View.load(), com.cloudera.impala.catalog.HBaseTable.load(), com.cloudera.impala.catalog.DataSourceTable.loadColumns(), com.cloudera.impala.catalog.HdfsTable.loadColumns(), and com.cloudera.impala.catalog.HBaseTable.supportsBinaryEncoding().

void com.cloudera.impala.catalog.Table.setCatalogVersion ( long  catalogVersion)
inline inherited

boolean com.cloudera.impala.catalog.HBaseTable.supportsBinaryEncoding ( FieldSchema  fs)
inline private

TCatalogObject com.cloudera.impala.catalog.Table.toTCatalogObject ( )
inline inherited

TTable com.cloudera.impala.catalog.HBaseTable.toThrift ( )
inline

void com.cloudera.impala.catalog.Table.updateLastDdlTime ( long  ddlTime)
inline inherited

Updates the lastDdlTime for this Table, if the new value is greater than the existing value. Does nothing if the new value is less than or equal to the existing value.

Definition at line 132 of file Table.java.

References com.cloudera.impala.catalog.Table.lastDdlTime_.
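
The update rule amounts to a monotonic maximum. A minimal sketch, using a hypothetical DdlTimeSketch class and an assumed initial value of -1 (the initial value of lastDdlTime_ is not given in this reference):

```java
// Illustrative only: the stored time never moves backwards.
final class DdlTimeSketch {
  private long lastDdlTime = -1;  // assumed initial value

  // Stores ddlTime only if it is strictly greater than the current value.
  void updateLastDdlTime(long ddlTime) {
    if (ddlTime > lastDdlTime) {
      lastDdlTime = ddlTime;
    }
  }

  long getLastDdlTime() {
    return lastDdlTime;
  }
}
```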

void com.cloudera.impala.catalog.Table.validate ( ) throws TableLoadingException
inline inherited

Checks preconditions for this table to function as expected. Currently only checks that all entries in colsByName_ use lower case keys.

Definition at line 279 of file Table.java.

References com.cloudera.impala.catalog.Table.colsByName_.
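
The lower-case precondition can be sketched as a scan over the map keys. The class ValidateSketch is hypothetical, and IllegalStateException stands in for TableLoadingException so that the sketch stays self-contained.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative only: rejects any column-name key that is not already lower case.
final class ValidateSketch {
  static void validate(Map<String, ?> colsByName) {
    for (String name : colsByName.keySet()) {
      // Locale.ROOT avoids locale-specific case mappings.
      if (!name.equals(name.toLowerCase(Locale.ROOT))) {
        throw new IllegalStateException("Column name is not lower case: " + name);
      }
    }
  }
}
```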

Member Data Documentation

HColumnDescriptor [] com.cloudera.impala.catalog.HBaseTable.columnFamilies_ = null
private

final String com.cloudera.impala.catalog.HBaseTable.DEFAULT_PREFIX = "default."
static

final double com.cloudera.impala.catalog.HBaseTable.DELTA_FROM_AVERAGE = 0.15
static private

List<FieldSchema> com.cloudera.impala.catalog.Table.fields_
protected inherited

final String com.cloudera.impala.catalog.HBaseTable.HBASE_INPUT_FORMAT
static private
Initial value:
= "org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat"

Definition at line 101 of file HBaseTable.java.

Referenced by com.cloudera.impala.catalog.HBaseTable.isHBaseTable().

final String com.cloudera.impala.catalog.HBaseTable.HBASE_SERIALIZATION_LIB
static private
Initial value:
= "org.apache.hadoop.hive.hbase.HBaseSerDe"

Definition at line 105 of file HBaseTable.java.

Referenced by com.cloudera.impala.catalog.HBaseTable.isHBaseTable().

final String com.cloudera.impala.catalog.HBaseTable.HBASE_STORAGE_HANDLER
static private
Initial value:
= "org.apache.hadoop.hive.hbase.HBaseStorageHandler"

Definition at line 109 of file HBaseTable.java.

Referenced by com.cloudera.impala.catalog.HBaseTable.getStorageHandlerClassName(), and com.cloudera.impala.catalog.HBaseTable.isHBaseTable().

final Configuration com.cloudera.impala.catalog.HBaseTable.hbaseConf_ = HBaseConfiguration.create()
static private

long com.cloudera.impala.catalog.Table.lastDdlTime_
protected inherited

final Logger com.cloudera.impala.catalog.HBaseTable.LOG = Logger.getLogger(HBaseTable.class)
static private

Definition at line 81 of file HBaseTable.java.

final int com.cloudera.impala.catalog.HBaseTable.MIN_NUM_REGIONS_TO_CHECK = 5
static private

final org.apache.hadoop.hive.metastore.api.Table com.cloudera.impala.catalog.Table.msTable_
protected inherited

final String com.cloudera.impala.catalog.Table.owner_
protected inherited

final int com.cloudera.impala.catalog.HBaseTable.ROW_COUNT_ESTIMATE_BATCH_SIZE = 10
static

final String com.cloudera.impala.catalog.HBaseTable.ROW_KEY_COLUMN_FAMILY = ":key"
static private

Definition at line 113 of file HBaseTable.java.

Referenced by com.cloudera.impala.catalog.HBaseTable.load().

HBaseColumn com.cloudera.impala.catalog.HBaseTable.rowKey_
protected

Definition at line 94 of file HBaseTable.java.

EnumSet<TableType> com.cloudera.impala.catalog.Table.SUPPORTED_TABLE_TYPES
static protected inherited
Initial value:
= EnumSet.of(TableType.EXTERNAL_TABLE, TableType.MANAGED_TABLE, TableType.VIRTUAL_VIEW)

Definition at line 88 of file Table.java.

TTableDescriptor com.cloudera.impala.catalog.Table.tableDesc_
protected inherited

Definition at line 64 of file Table.java.


The documentation for this class was generated from the following file: