Impala
Impala is the open source, native analytic database for Apache Hadoop.
com.cloudera.impala.analysis.ComputeStatsStmt Class Reference

Public Member Functions

void analyze (Analyzer analyzer) throws AnalysisException
 
String getTblStatsQuery ()
 
String getColStatsQuery ()
 
String toSql ()
 
TComputeStatsParams toThrift ()
 
void setIsExplain ()
 
boolean isExplain ()
 

Protected Member Functions

 ComputeStatsStmt (TableName tableName)
 
 ComputeStatsStmt (TableName tableName, boolean isIncremental, PartitionSpec partSpec)
 

Protected Attributes

final TableName tableName_
 
Table table_
 
String tableStatsQueryStr_
 
String columnStatsQueryStr_
 
boolean isExplain_ = false
 

Private Member Functions

void addPartitionCols (HdfsTable table, List< String > selectList, List< String > groupByCols)
 
List< String > getBaseColumnStatsQuerySelectList (Analyzer analyzer)
 
void checkIncompleteAvroSchema (HdfsTable table) throws AnalysisException
 

Private Attributes

boolean isIncremental_ = false
 
boolean expectAllPartitions_ = false
 
final List< TPartitionStats > validPartStats_ = Lists.newArrayList()
 
final List< List< String > > expectedPartitions_ = Lists.newArrayList()
 
PartitionSpec partitionSpec_ = null
 

Static Private Attributes

static final Logger LOG = Logger.getLogger(ComputeStatsStmt.class)
 
static String AVRO_SCHEMA_MSG_PREFIX
 
static String AVRO_SCHEMA_MSG_SUFFIX
 
static final boolean COUNT_NULLS = false
 
static final int MAX_INCREMENTAL_PARTITIONS = 1000
 

Detailed Description

Represents a COMPUTE STATS <table> and COMPUTE INCREMENTAL STATS <table> [PARTITION <part_spec>] statement for statistics collection. The former statement gathers all table and column stats for a given table and stores them in the Metastore via the CatalogService. All existing stats for that table are replaced and no existing stats are reused. The latter, incremental form similarly computes stats for the whole table, but does so by re-using stats from partitions which have 'valid' statistics. Statistics are currently 'valid' if they exist; in the future they may be expired based on recency etc.

TODO: Allow more coarse/fine grained (db, column)
TODO: Compute stats on complex types.
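The reuse rule for the incremental form can be sketched as follows. This is only an illustration of the rule stated above (stats are 'valid' if they exist), not the class's actual code; the class name, method name, and the map of partition names to a "has stats" flag are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IncrementalSelectionSketch {
  /**
   * Returns the partitions whose stats must be recomputed.
   * Assumption: a partition's stats are 'valid' exactly when they exist.
   */
  static List<String> partitionsToRecompute(Map<String, Boolean> hasStatsByPartition) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Boolean> e : hasStatsByPartition.entrySet()) {
      // Partitions with existing stats are reused; the rest are recomputed.
      if (!e.getValue()) result.add(e.getKey());
    }
    return result;
  }
}
```

With an insertion-ordered map, the result lists the partitions without stats in the order they were supplied.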

Definition at line 52 of file ComputeStatsStmt.java.

Constructor & Destructor Documentation

com.cloudera.impala.analysis.ComputeStatsStmt.ComputeStatsStmt ( TableName  tableName)
inline, protected

Constructor for the non-incremental form of COMPUTE STATS.

Definition at line 111 of file ComputeStatsStmt.java.

com.cloudera.impala.analysis.ComputeStatsStmt.ComputeStatsStmt ( TableName  tableName,
boolean  isIncremental,
PartitionSpec  partSpec 
)
inline, protected

Constructor for the incremental form of COMPUTE STATS. If isIncremental is true, statistics will be recomputed incrementally; if false they will be recomputed for the whole table. The partition spec partSpec can specify a single partition whose stats should be recomputed.

Definition at line 121 of file ComputeStatsStmt.java.

References com.cloudera.impala.authorization.Privilege.ALTER, and com.cloudera.impala.analysis.ComputeStatsStmt.partitionSpec_.

Member Function Documentation

void com.cloudera.impala.analysis.ComputeStatsStmt.addPartitionCols ( HdfsTable  table,
List< String >  selectList,
List< String >  groupByCols 
)
inline, private

Utility method for constructing the child queries. Adds the table's partition columns to both a select list and a group-by list; in the select list, each column is wrapped in a cast to string.
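A minimal sketch of this behaviour, assuming the partition columns are available as a plain list of names (the real method takes an HdfsTable, which is not modelled here). The class name and the exact CAST spelling are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionColsSketch {
  /** Hypothetical stand-in: appends every partition column to both lists. */
  static void addPartitionCols(List<String> partitionColNames,
      List<String> selectList, List<String> groupByCols) {
    for (String col : partitionColNames) {
      // Select-list entries are cast to string; group-by entries stay bare.
      selectList.add("CAST(" + col + " AS STRING)");
      groupByCols.add(col);
    }
  }
}
```

Both output lists are appended to in place, so a caller can seed the select list with aggregate expressions before adding the partition columns.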

Definition at line 139 of file ComputeStatsStmt.java.

Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze().

void com.cloudera.impala.analysis.ComputeStatsStmt.analyze ( Analyzer  analyzer) throws AnalysisException
inline

Constructs two queries to compute statistics for 'tableName_', provided that the table exists. If analysis can detect that a query has no work to do, that query is set to 'null' and is not executed.

The first query computes the number of rows (on a per-partition basis if the table is partitioned) and has the form "SELECT COUNT(*) FROM tbl GROUP BY part_col1, part_col2...", with an optional WHERE clause for incremental computation (see below).
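The shape of the row count query can be sketched like this. It follows the form quoted above and is not the class's actual string-building code; the class name, parameters, and the nullable wherePredicate argument are hypothetical.

```java
import java.util.List;

final class RowCountQuerySketch {
  /**
   * Builds "SELECT COUNT(*) FROM tbl [WHERE ...] [GROUP BY ...]".
   * wherePredicate may be null (assumed to mean: no incremental restriction).
   */
  static String build(String tbl, List<String> partCols, String wherePredicate) {
    StringBuilder sb = new StringBuilder("SELECT COUNT(*) FROM ").append(tbl);
    if (wherePredicate != null) {
      sb.append(" WHERE ").append(wherePredicate);
    }
    if (!partCols.isEmpty()) {
      // Per-partition counts: group by every partition column.
      sb.append(" GROUP BY ").append(String.join(", ", partCols));
    }
    return sb.toString();
  }
}
```

For an unpartitioned table the partition column list is empty, so the query reduces to a single COUNT(*) over the whole table.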

The second query computes the NDV estimate, the average width, the maximum width and, optionally, the number of nulls for each column. For non-partitioned tables (or non-incremental computations), the query is simple:

SELECT NDV(col), COUNT(<nulls>), MAX(length(col)), AVG(length(col)) FROM tbl

(For non-string columns, the widths are hard-coded as they are known at query construction time).
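One way to picture the per-column select list is the sketch below. The Col descriptor, its fixedWidthBytes field (null for string columns), and the null-counting expression are assumptions for illustration; when null counting is off, this sketch simply omits the entry, which may differ from what the real code emits.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnStatsSelectListSketch {
  /** Hypothetical column descriptor; fixedWidthBytes is null for strings. */
  static final class Col {
    final String name;
    final Integer fixedWidthBytes;
    Col(String name, Integer fixedWidthBytes) {
      this.name = name;
      this.fixedWidthBytes = fixedWidthBytes;
    }
  }

  static List<String> build(List<Col> cols, boolean countNulls) {
    List<String> out = new ArrayList<>();
    for (Col c : cols) {
      out.add("NDV(" + c.name + ")");
      if (countNulls) {
        out.add("COUNT(IF(" + c.name + " IS NULL, 1, NULL))");
      }
      if (c.fixedWidthBytes == null) {
        // String column: widths must be measured by the query.
        out.add("MAX(length(" + c.name + "))");
        out.add("AVG(length(" + c.name + "))");
      } else {
        // Fixed-width column: max and average width are known constants.
        out.add(String.valueOf(c.fixedWidthBytes));
        out.add(String.valueOf(c.fixedWidthBytes));
      }
    }
    return out;
  }
}
```

Emitting literals for fixed-width columns avoids scanning those columns just to learn a width that is already known when the query is built.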

If computation is incremental (i.e. the original statement was COMPUTE INCREMENTAL STATS and the underlying table is a partitioned HdfsTable), some modifications are made to the non-incremental per-column query. First, a different UDA, NDV_NO_FINALIZE(), is used to retrieve and serialize the intermediate state for each column. Second, the results are grouped by partition, as with the row count query, so that the intermediate NDV computation state can be stored per partition. The number of rows per partition is also recorded.

For both the row count query and the column stats query, the WHERE clause restricts execution to the partitions that actually require new statistics to be computed.

SELECT NDV_NO_FINALIZE(col), <nulls, max, avg>, COUNT(col) FROM tbl
WHERE ((part_col1 = p1_val1) AND (part_col2 = p1_val2)) OR
      ((part_col1 = p2_val1) AND (part_col2 = p2_val2)) OR ...
GROUP BY part_col1, part_col2, ...
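The partition predicate in this query is an OR over partitions of an AND over partition columns. A sketch of building it is shown below, assuming the partition values are already rendered as SQL literals and appear in the same order as the column names. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionFilterSketch {
  /**
   * partCols: partition column names.
   * partitions: one list of literal values per partition to recompute,
   * positionally aligned with partCols (an assumption of this sketch).
   */
  static String build(List<String> partCols, List<List<String>> partitions) {
    List<String> disjuncts = new ArrayList<>();
    for (List<String> values : partitions) {
      List<String> conjuncts = new ArrayList<>();
      for (int i = 0; i < partCols.size(); i++) {
        conjuncts.add("(" + partCols.get(i) + " = " + values.get(i) + ")");
      }
      // One parenthesised conjunction per partition.
      disjuncts.add("(" + String.join(" AND ", conjuncts) + ")");
    }
    return String.join(" OR ", disjuncts);
  }
}
```

The resulting string is meant to follow the WHERE keyword in both the row count query and the column stats query.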

Implements com.cloudera.impala.analysis.ParseNode.

Definition at line 242 of file ComputeStatsStmt.java.

References com.cloudera.impala.analysis.ComputeStatsStmt.addPartitionCols(), com.cloudera.impala.authorization.Privilege.ALTER, com.cloudera.impala.analysis.ComputeStatsStmt.checkIncompleteAvroSchema(), com.cloudera.impala.analysis.ComputeStatsStmt.columnStatsQueryStr_, com.cloudera.impala.analysis.ComputeStatsStmt.expectAllPartitions_, com.cloudera.impala.analysis.ComputeStatsStmt.expectedPartitions_, com.cloudera.impala.analysis.ComputeStatsStmt.getBaseColumnStatsQuerySelectList(), com.cloudera.impala.catalog.Table.getNonClusteringColumns(), com.cloudera.impala.catalog.Table.getNumClusteringCols(), com.cloudera.impala.catalog.HdfsTable.getPartitions(), com.cloudera.impala.analysis.PartitionSpec.getPartitionSpecKeyValues(), com.cloudera.impala.catalog.Table.getTableName(), com.cloudera.impala.catalog.HdfsTable.isAvroTable(), com.cloudera.impala.analysis.ComputeStatsStmt.isIncremental_, com.cloudera.impala.analysis.ComputeStatsStmt.MAX_INCREMENTAL_PARTITIONS, com.cloudera.impala.analysis.ComputeStatsStmt.partitionSpec_, com.cloudera.impala.analysis.ComputeStatsStmt.table_, com.cloudera.impala.analysis.ComputeStatsStmt.tableName_, com.cloudera.impala.analysis.ComputeStatsStmt.tableStatsQueryStr_, and com.cloudera.impala.analysis.ComputeStatsStmt.toSql().

void com.cloudera.impala.analysis.ComputeStatsStmt.checkIncompleteAvroSchema ( HdfsTable  table) throws AnalysisException
inline, private

Checks whether the column definitions from the CREATE TABLE stmt match the columns in the Avro schema. If there is a mismatch, then COMPUTE STATS cannot update the statistics in the Metastore's backend DB due to HIVE-6308. Throws an AnalysisException for such ill-created Avro tables. Does nothing if the column definitions match the Avro schema exactly.
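The check can be pictured with the sketch below. It compares two plain lists of column definitions and uses IllegalStateException as a stand-in for AnalysisException. The list-of-strings representation, the class name, and joining the two message parts with a newline are assumptions; the message texts are the initial values documented for AVRO_SCHEMA_MSG_PREFIX and AVRO_SCHEMA_MSG_SUFFIX.

```java
import java.util.List;

final class AvroSchemaCheckSketch {
  static final String MSG_PREFIX = "Cannot COMPUTE STATS on Avro table "
      + "'%s' because its column definitions do not match those in the Avro schema.";
  static final String MSG_SUFFIX = "Please re-create the table with "
      + "column definitions, e.g., using the result of 'SHOW CREATE TABLE'";

  /** Does nothing when the definitions match exactly; throws otherwise. */
  static void check(String tableName, List<String> tableColDefs,
      List<String> avroColDefs) {
    if (!tableColDefs.equals(avroColDefs)) {
      // Separator between prefix and suffix is an assumption of this sketch.
      throw new IllegalStateException(
          String.format(MSG_PREFIX, tableName) + "\n" + MSG_SUFFIX);
    }
  }
}
```

List equality covers both a differing column count and a differing definition at any position, which matches the "exactly" requirement in the description.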

Definition at line 437 of file ComputeStatsStmt.java.

References com.cloudera.impala.analysis.ComputeStatsStmt.AVRO_SCHEMA_MSG_PREFIX, com.cloudera.impala.analysis.ComputeStatsStmt.AVRO_SCHEMA_MSG_SUFFIX, and com.cloudera.impala.catalog.Column.getType().

Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze().

String com.cloudera.impala.analysis.ComputeStatsStmt.getColStatsQuery ( )
inline

String com.cloudera.impala.analysis.ComputeStatsStmt.getTblStatsQuery ( )
inline

boolean com.cloudera.impala.analysis.StatementBase.isExplain ( )
inline, inherited

void com.cloudera.impala.analysis.StatementBase.setIsExplain ( )
inline, inherited

String com.cloudera.impala.analysis.ComputeStatsStmt.toSql ( )
inline

Member Data Documentation

String com.cloudera.impala.analysis.ComputeStatsStmt.AVRO_SCHEMA_MSG_PREFIX
static, private
Initial value:
= "Cannot COMPUTE STATS on Avro table " +
"'%s' because its column definitions do not match those in the Avro schema."

Definition at line 55 of file ComputeStatsStmt.java.

Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.checkIncompleteAvroSchema().

String com.cloudera.impala.analysis.ComputeStatsStmt.AVRO_SCHEMA_MSG_SUFFIX
static, private
Initial value:
= "Please re-create the table with " +
"column definitions, e.g., using the result of 'SHOW CREATE TABLE'"

Definition at line 57 of file ComputeStatsStmt.java.

Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.checkIncompleteAvroSchema().

String com.cloudera.impala.analysis.ComputeStatsStmt.columnStatsQueryStr_
protected

final boolean com.cloudera.impala.analysis.ComputeStatsStmt.COUNT_NULLS = false
static, private

boolean com.cloudera.impala.analysis.ComputeStatsStmt.expectAllPartitions_ = false
private

final List<List<String> > com.cloudera.impala.analysis.ComputeStatsStmt.expectedPartitions_ = Lists.newArrayList()
private

final Logger com.cloudera.impala.analysis.ComputeStatsStmt.LOG = Logger.getLogger(ComputeStatsStmt.class)
static, private

Definition at line 53 of file ComputeStatsStmt.java.

final int com.cloudera.impala.analysis.ComputeStatsStmt.MAX_INCREMENTAL_PARTITIONS = 1000
static, private

final TableName com.cloudera.impala.analysis.ComputeStatsStmt.tableName_
protected

String com.cloudera.impala.analysis.ComputeStatsStmt.tableStatsQueryStr_
protected

final List<TPartitionStats> com.cloudera.impala.analysis.ComputeStatsStmt.validPartStats_ = Lists.newArrayList()
private

The documentation for this class was generated from the following file: