Impala
Impala is the open source, native analytic database for Apache Hadoop.
com.cloudera.impala.analysis.ComputeStatsStmt Class Reference
Public Member Functions | |
void | analyze (Analyzer analyzer) throws AnalysisException |
String | getTblStatsQuery () |
String | getColStatsQuery () |
String | toSql () |
TComputeStatsParams | toThrift () |
void | setIsExplain () |
boolean | isExplain () |
Protected Member Functions | |
ComputeStatsStmt (TableName tableName) | |
ComputeStatsStmt (TableName tableName, boolean isIncremental, PartitionSpec partSpec) | |
Protected Attributes | |
final TableName | tableName_ |
Table | table_ |
String | tableStatsQueryStr_ |
String | columnStatsQueryStr_ |
boolean | isExplain_ = false |
Private Member Functions | |
void | addPartitionCols (HdfsTable table, List< String > selectList, List< String > groupByCols) |
List< String > | getBaseColumnStatsQuerySelectList (Analyzer analyzer) |
void | checkIncompleteAvroSchema (HdfsTable table) throws AnalysisException |
Private Attributes | |
boolean | isIncremental_ = false |
boolean | expectAllPartitions_ = false |
final List< TPartitionStats > | validPartStats_ = Lists.newArrayList() |
final List< List< String > > | expectedPartitions_ = Lists.newArrayList() |
PartitionSpec | partitionSpec_ = null |
Static Private Attributes | |
static final Logger | LOG = Logger.getLogger(ComputeStatsStmt.class) |
static String | AVRO_SCHEMA_MSG_PREFIX |
static String | AVRO_SCHEMA_MSG_SUFFIX |
static final boolean | COUNT_NULLS = false |
static final int | MAX_INCREMENTAL_PARTITIONS = 1000 |
Represents a COMPUTE STATS <table> and COMPUTE INCREMENTAL STATS <table> [PARTITION <part_spec>] statement for statistics collection. The former statement gathers all table and column stats for a given table and stores them in the Metastore via the CatalogService. All existing stats for that table are replaced and no existing stats are reused. The latter, incremental form similarly computes stats for the whole table, but does so by re-using stats from partitions which have 'valid' statistics. Statistics are currently 'valid' if they exist; in the future they may be expired based on recency, etc.
TODO: Allow more coarse/fine grained (db, column)
TODO: Compute stats on complex types.
Definition at line 52 of file ComputeStatsStmt.java.
ComputeStatsStmt (TableName tableName) | inline, protected |
Constructor for the non-incremental form of COMPUTE STATS.
Definition at line 111 of file ComputeStatsStmt.java.
ComputeStatsStmt (TableName tableName, boolean isIncremental, PartitionSpec partSpec) | inline, protected |
Constructor for the incremental form of COMPUTE STATS. If isIncremental is true, statistics will be recomputed incrementally; if false they will be recomputed for the whole table. The partition spec partSpec can specify a single partition whose stats should be recomputed.
Definition at line 121 of file ComputeStatsStmt.java.
References com.cloudera.impala.authorization.Privilege.ALTER, and com.cloudera.impala.analysis.ComputeStatsStmt.partitionSpec_.
void addPartitionCols (HdfsTable table, List< String > selectList, List< String > groupByCols) | inline, private |
Utility method for constructing the child queries to add partition columns to both a select list and a group-by list; the former are wrapped in a cast to a string.
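The behaviour described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class name PartitionColsSketch is hypothetical, partition column names are passed in as plain strings rather than read from an HdfsTable, and the exact SQL text of the cast is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the partition-column handling described above.
class PartitionColsSketch {
  // Appends each partition column to both lists. Select-list entries are
  // wrapped in a cast to STRING; group-by entries are the bare column names.
  static void addPartitionCols(List<String> partitionColNames,
      List<String> selectList, List<String> groupByCols) {
    for (String col : partitionColNames) {
      selectList.add("CAST(" + col + " AS STRING)"); // assumed SQL spelling
      groupByCols.add(col);
    }
  }

  public static void main(String[] args) {
    List<String> selectList = new ArrayList<>();
    selectList.add("COUNT(*)");
    List<String> groupByCols = new ArrayList<>();
    addPartitionCols(Arrays.asList("year", "month"), selectList, groupByCols);
    // Assemble a per-partition row count query from the two lists.
    System.out.println("SELECT " + String.join(", ", selectList)
        + " FROM tbl GROUP BY " + String.join(", ", groupByCols));
  }
}
```

Casting the partition values to strings in the select list lets a consumer of the result set treat every partition key uniformly, whatever its column type.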
Definition at line 139 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze().
void analyze (Analyzer analyzer) throws AnalysisException | inline |
Constructs two queries to compute statistics for 'tableName_', if that table exists (although if we can detect that no work needs to be done for either query, that query will be 'null' and not executed).
The first query computes the number of rows (on a per-partition basis if the table is partitioned) and has the form "SELECT COUNT(*) FROM tbl GROUP BY part_col1, part_col2...", with an optional WHERE clause for incremental computation (see below).
The second query computes the NDV estimate, the average width, the maximum width and, optionally, the number of nulls for each column. For non-partitioned tables (or non-incremental computations), the query is simple:
SELECT NDV(col), COUNT(<nulls>), MAX(length(col)), AVG(length(col)) FROM tbl
(For non-string columns, the widths are hard-coded as they are known at query construction time).
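A rough sketch of how such a per-column select list might be assembled is shown below. Everything in it is illustrative: the Col type, the selectList helper, the null-count expression and the "-1" placeholder emitted when null counting is disabled are assumptions, not the method's actual output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnStatsSelectSketch {
  // Hypothetical column descriptor: name, whether the type is a string type,
  // and the fixed byte width used for non-string types.
  static final class Col {
    final String name;
    final boolean isString;
    final int fixedWidth;
    Col(String name, boolean isString, int fixedWidth) {
      this.name = name;
      this.isString = isString;
      this.fixedWidth = fixedWidth;
    }
  }

  // Mirrors the COUNT_NULLS = false constant listed for this class.
  static final boolean COUNT_NULLS = false;

  // Emits four select-list entries per column: NDV, null count, max width, avg width.
  static List<String> selectList(List<Col> cols) {
    List<String> out = new ArrayList<>();
    for (Col c : cols) {
      out.add("NDV(" + c.name + ")");
      // Assumption: a constant placeholder stands in when nulls are not counted.
      out.add(COUNT_NULLS ? "COUNT(IF(" + c.name + " IS NULL, 1, NULL))" : "-1");
      if (c.isString) {
        // String widths depend on the data, so they are computed by the query.
        out.add("MAX(length(" + c.name + "))");
        out.add("AVG(length(" + c.name + "))");
      } else {
        // Widths of non-string types are known up front and emitted as literals.
        out.add(Integer.toString(c.fixedWidth));
        out.add(Integer.toString(c.fixedWidth));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Col> cols = Arrays.asList(new Col("name", true, 0), new Col("id", false, 4));
    System.out.println("SELECT " + String.join(", ", selectList(cols)) + " FROM tbl");
  }
}
```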
If computation is incremental (i.e. the original statement was COMPUTE INCREMENTAL STATS ..., and the underlying table is a partitioned HdfsTable), some modifications are made to the non-incremental per-column query. First, a different UDA, NDV_NO_FINALIZE(), is used to retrieve and serialise the intermediate state from each column. Second, the results are grouped by partition, as with the row count query, so that the intermediate NDV computation state can be stored per partition. The number of rows per partition is also recorded.
For both the row count query and the column stats query, the query's WHERE clause restricts execution to the partitions that actually require new statistics to be computed.
SELECT NDV_NO_FINALIZE(col), <nulls, max, avg>, COUNT(col) FROM tbl WHERE ((part_col1 = p1_val1) AND (part_col2 = p1_val2)) OR ((part_col1 = p2_val1) AND (part_col2 = p2_val2)) OR ... GROUP BY part_col1, part_col2, ...
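The partition filter in the incremental query is a disjunction of per-partition conjunctions. A minimal sketch of building such a predicate is given below; the class PartitionFilterSketch and the buildFilter helper are hypothetical, and partition values are assumed to be already rendered as SQL literals (NULL partition keys, which would need IS NULL, are not handled).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionFilterSketch {
  // Builds "((c1 = v1) AND (c2 = v2)) OR (...)" with one disjunct per partition.
  // Each inner list holds the values for one partition, in partition-column order.
  static String buildFilter(List<String> partCols, List<List<String>> partitions) {
    List<String> disjuncts = new ArrayList<>();
    for (List<String> values : partitions) {
      List<String> conjuncts = new ArrayList<>();
      for (int i = 0; i < partCols.size(); i++) {
        conjuncts.add("(" + partCols.get(i) + " = " + values.get(i) + ")");
      }
      disjuncts.add("(" + String.join(" AND ", conjuncts) + ")");
    }
    return String.join(" OR ", disjuncts);
  }

  public static void main(String[] args) {
    List<String> partCols = Arrays.asList("year", "month");
    List<List<String>> stale = Arrays.asList(
        Arrays.asList("2014", "1"),
        Arrays.asList("2014", "2"));
    // Only the listed partitions are scanned by the resulting query.
    System.out.println("SELECT COUNT(*) FROM tbl WHERE " + buildFilter(partCols, stale)
        + " GROUP BY " + String.join(", ", partCols));
  }
}
```

A class constant MAX_INCREMENTAL_PARTITIONS = 1000 is listed above and referenced by analyze(); it may bound how many partitions are handled this way, though the page does not say how it is applied.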
Implements com.cloudera.impala.analysis.ParseNode.
Definition at line 242 of file ComputeStatsStmt.java.
References com.cloudera.impala.analysis.ComputeStatsStmt.addPartitionCols(), com.cloudera.impala.authorization.Privilege.ALTER, com.cloudera.impala.analysis.ComputeStatsStmt.checkIncompleteAvroSchema(), com.cloudera.impala.analysis.ComputeStatsStmt.columnStatsQueryStr_, com.cloudera.impala.analysis.ComputeStatsStmt.expectAllPartitions_, com.cloudera.impala.analysis.ComputeStatsStmt.expectedPartitions_, com.cloudera.impala.analysis.ComputeStatsStmt.getBaseColumnStatsQuerySelectList(), com.cloudera.impala.catalog.Table.getNonClusteringColumns(), com.cloudera.impala.catalog.Table.getNumClusteringCols(), com.cloudera.impala.catalog.HdfsTable.getPartitions(), com.cloudera.impala.analysis.PartitionSpec.getPartitionSpecKeyValues(), com.cloudera.impala.catalog.Table.getTableName(), com.cloudera.impala.catalog.HdfsTable.isAvroTable(), com.cloudera.impala.analysis.ComputeStatsStmt.isIncremental_, com.cloudera.impala.analysis.ComputeStatsStmt.MAX_INCREMENTAL_PARTITIONS, com.cloudera.impala.analysis.ComputeStatsStmt.partitionSpec_, com.cloudera.impala.analysis.ComputeStatsStmt.table_, com.cloudera.impala.analysis.ComputeStatsStmt.tableName_, com.cloudera.impala.analysis.ComputeStatsStmt.tableStatsQueryStr_, and com.cloudera.impala.analysis.ComputeStatsStmt.toSql().
void checkIncompleteAvroSchema (HdfsTable table) throws AnalysisException | inline, private |
Checks whether the column definitions from the CREATE TABLE stmt match the columns in the Avro schema. If there is a mismatch, then COMPUTE STATS cannot update the statistics in the Metastore's backend DB due to HIVE-6308. Throws an AnalysisException for such ill-created Avro tables. Does nothing if the column definitions match the Avro schema exactly.
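The kind of comparison described above might look like the following sketch. The Col type, the findMismatch helper, the case-insensitive name comparison and the use of IllegalStateException (in place of AnalysisException) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class AvroSchemaCheckSketch {
  // Hypothetical minimal column: a name and a type rendered as a string.
  static final class Col {
    final String name;
    final String type;
    Col(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  // Returns null when both column lists match position by position,
  // otherwise a short description of the first mismatch found.
  static String findMismatch(List<Col> tableCols, List<Col> avroCols) {
    if (tableCols.size() != avroCols.size()) {
      return "column count differs: " + tableCols.size() + " vs " + avroCols.size();
    }
    for (int i = 0; i < tableCols.size(); i++) {
      Col t = tableCols.get(i);
      Col a = avroCols.get(i);
      if (!t.name.equalsIgnoreCase(a.name)) {
        return "name mismatch at position " + i + ": " + t.name + " vs " + a.name;
      }
      if (!t.type.equals(a.type)) {
        return "type mismatch for column " + t.name + ": " + t.type + " vs " + a.type;
      }
    }
    return null;
  }

  // Does nothing on an exact match; throws otherwise.
  static void check(List<Col> tableCols, List<Col> avroCols) {
    String mismatch = findMismatch(tableCols, avroCols);
    if (mismatch != null) {
      throw new IllegalStateException("Cannot compute stats: " + mismatch);
    }
  }

  public static void main(String[] args) {
    List<Col> declared = Arrays.asList(new Col("id", "INT"), new Col("name", "STRING"));
    List<Col> fromAvro = Arrays.asList(new Col("id", "INT"), new Col("name", "STRING"));
    check(declared, fromAvro); // matching definitions: no exception
    System.out.println("schemas match");
  }
}
```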
Definition at line 437 of file ComputeStatsStmt.java.
References com.cloudera.impala.analysis.ComputeStatsStmt.AVRO_SCHEMA_MSG_PREFIX, com.cloudera.impala.analysis.ComputeStatsStmt.AVRO_SCHEMA_MSG_SUFFIX, and com.cloudera.impala.catalog.Column.getType().
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze().
List< String > getBaseColumnStatsQuerySelectList (Analyzer analyzer) | inline, private |
Definition at line 150 of file ComputeStatsStmt.java.
References com.cloudera.impala.analysis.ComputeStatsStmt.COUNT_NULLS, com.cloudera.impala.catalog.Table.getNumClusteringCols(), com.cloudera.impala.catalog.Column.getType(), com.cloudera.impala.analysis.ComputeStatsStmt.isIncremental_, com.cloudera.impala.catalog.Type.isStringType(), com.cloudera.impala.catalog.Type.isValid(), and com.cloudera.impala.analysis.ComputeStatsStmt.table_.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze().
String getColStatsQuery () | inline |
Definition at line 498 of file ComputeStatsStmt.java.
References com.cloudera.impala.analysis.ComputeStatsStmt.columnStatsQueryStr_.
Referenced by com.cloudera.impala.analysis.AnalyzeDDLTest.checkComputeStatsStmt().
String getTblStatsQuery () | inline |
Definition at line 497 of file ComputeStatsStmt.java.
References com.cloudera.impala.analysis.ComputeStatsStmt.tableStatsQueryStr_.
Referenced by com.cloudera.impala.analysis.AnalyzeDDLTest.checkComputeStatsStmt().
boolean isExplain () | inline, inherited |
Definition at line 43 of file StatementBase.java.
References com.cloudera.impala.analysis.StatementBase.isExplain_.
void setIsExplain () | inline, inherited |
Definition at line 42 of file StatementBase.java.
References com.cloudera.impala.analysis.StatementBase.isExplain_.
String toSql () | inline |
Implements com.cloudera.impala.analysis.ParseNode.
Definition at line 501 of file ComputeStatsStmt.java.
References com.cloudera.impala.analysis.ComputeStatsStmt.isIncremental_, com.cloudera.impala.analysis.ComputeStatsStmt.partitionSpec_, and com.cloudera.impala.analysis.PartitionSpec.toSql().
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze().
TComputeStatsParams toThrift () | inline |
Definition at line 510 of file ComputeStatsStmt.java.
References com.cloudera.impala.analysis.ComputeStatsStmt.columnStatsQueryStr_, com.cloudera.impala.analysis.ComputeStatsStmt.expectAllPartitions_, com.cloudera.impala.analysis.ComputeStatsStmt.expectedPartitions_, com.cloudera.impala.catalog.Table.getDb(), com.cloudera.impala.catalog.Table.getName(), com.cloudera.impala.analysis.ComputeStatsStmt.isIncremental_, com.cloudera.impala.analysis.ComputeStatsStmt.table_, com.cloudera.impala.analysis.ComputeStatsStmt.tableStatsQueryStr_, and com.cloudera.impala.analysis.ComputeStatsStmt.validPartStats_.
String AVRO_SCHEMA_MSG_PREFIX | static, private |
Definition at line 55 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.checkIncompleteAvroSchema().
String AVRO_SCHEMA_MSG_SUFFIX | static, private |
Definition at line 57 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.checkIncompleteAvroSchema().
String columnStatsQueryStr_ | protected |
Definition at line 75 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze(), com.cloudera.impala.analysis.ComputeStatsStmt.getColStatsQuery(), and com.cloudera.impala.analysis.ComputeStatsStmt.toThrift().
final boolean COUNT_NULLS = false | static, private |
Definition at line 67 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.getBaseColumnStatsQuerySelectList().
boolean expectAllPartitions_ = false | private |
Definition at line 84 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze(), and com.cloudera.impala.analysis.ComputeStatsStmt.toThrift().
final List< List< String > > expectedPartitions_ = Lists.newArrayList() | private |
Definition at line 95 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze(), and com.cloudera.impala.analysis.ComputeStatsStmt.toThrift().
boolean isExplain_ = false | protected, inherited |
boolean isIncremental_ = false | private |
Definition at line 78 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze(), com.cloudera.impala.analysis.ComputeStatsStmt.getBaseColumnStatsQuerySelectList(), com.cloudera.impala.analysis.ComputeStatsStmt.toSql(), and com.cloudera.impala.analysis.ComputeStatsStmt.toThrift().
final Logger LOG = Logger.getLogger(ComputeStatsStmt.class) | static, private |
Definition at line 53 of file ComputeStatsStmt.java.
final int MAX_INCREMENTAL_PARTITIONS = 1000 | static, private |
Definition at line 106 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze().
PartitionSpec partitionSpec_ = null | private |
Definition at line 99 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze(), com.cloudera.impala.analysis.ComputeStatsStmt.ComputeStatsStmt(), and com.cloudera.impala.analysis.ComputeStatsStmt.toSql().
Table table_ | protected |
final TableName tableName_ | protected |
Definition at line 60 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze().
String tableStatsQueryStr_ | protected |
Definition at line 71 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.analyze(), com.cloudera.impala.analysis.ComputeStatsStmt.getTblStatsQuery(), and com.cloudera.impala.analysis.ComputeStatsStmt.toThrift().
final List< TPartitionStats > validPartStats_ = Lists.newArrayList() | private |
Definition at line 88 of file ComputeStatsStmt.java.
Referenced by com.cloudera.impala.analysis.ComputeStatsStmt.toThrift().