Impala
Impala is the open source, native analytic database for Apache Hadoop.
Classes | |
enum | AggPhase |
Static Public Member Functions | |
static AggregateInfo | create (ArrayList< Expr > groupingExprs, ArrayList< FunctionCallExpr > aggExprs, TupleDescriptor tupleDesc, Analyzer analyzer) throws AnalysisException |
static <T extends Expr> boolean | requiresIntermediateTuple (List< T > aggExprs) |
Protected Member Functions | |
String | tupleDebugName () |
void | createTupleDescs (Analyzer analyzer) |
Protected Attributes | |
ExprSubstitutionMap | intermediateTupleSmap_ = new ExprSubstitutionMap() |
ExprSubstitutionMap | outputTupleSmap_ = new ExprSubstitutionMap() |
final ExprSubstitutionMap | outputToIntermediateTupleSmap_ |
ArrayList< Expr > | groupingExprs_ |
ArrayList< FunctionCallExpr > | aggregateExprs_ |
TupleDescriptor | intermediateTupleDesc_ |
TupleDescriptor | outputTupleDesc_ |
ArrayList< Integer > | materializedSlots_ = Lists.newArrayList() |
Private Member Functions | |
AggregateInfo (ArrayList< Expr > groupingExprs, ArrayList< FunctionCallExpr > aggExprs, AggPhase aggPhase) | |
void | createDistinctAggInfo (ArrayList< Expr > origGroupingExprs, ArrayList< FunctionCallExpr > distinctAggExprs, Analyzer analyzer) throws AnalysisException |
void | createMergeAggInfo (Analyzer analyzer) |
Expr | createCountDistinctAggExprParam (int firstIdx, int lastIdx, ArrayList< SlotDescriptor > slots) |
void | createSecondPhaseAggInfo (ArrayList< Expr > origGroupingExprs, ArrayList< FunctionCallExpr > distinctAggExprs, Analyzer analyzer) throws AnalysisException |
void | createSecondPhaseAggSMap (AggregateInfo inputAggInfo, ArrayList< FunctionCallExpr > distinctAggExprs) |
Private Attributes | |
AggregateInfo | mergeAggInfo_ |
AggregateInfo | secondPhaseDistinctAggInfo_ |
final AggPhase | aggPhase_ |
List< Expr > | partitionExprs_ |
Static Private Attributes | |
static final Logger | LOG = LoggerFactory.getLogger(AggregateInfo.class) |
Encapsulates all the information needed to compute the aggregate functions of a single Select block, including a possible 2nd phase aggregation step for DISTINCT aggregate functions and merge aggregation steps needed for distributed execution.
The latter requires a tree structure of AggregateInfo objects which express the original aggregate computations as well as the necessary merging aggregate computations. TODO: get rid of this by transforming SELECT COUNT(DISTINCT a, b, ...) GROUP BY x, y, ... into an equivalent query with an inline view: SELECT COUNT(*) FROM (SELECT DISTINCT a, b, ..., x, y, ...) GROUP BY x, y, ...
The tree structure looks as follows:
In general, merging aggregate computations are idempotent; in other words, aggInfo.mergeAggInfo == aggInfo.mergeAggInfo.mergeAggInfo.
TODO: move the merge construction logic from SelectStmt into AggregateInfo.
TODO: add query tests for aggregation with intermediate tuples with num_nodes=1.
Definition at line 66 of file AggregateInfo.java.
|
AggregateInfo | inline, private |
Definition at line 104 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.aggPhase_.
Referenced by com.cloudera.impala.analysis.AggregateInfo.create(), and com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo().
|
checkConsistency | inline |
Validates the internal state of this agg info: Checks that the number of materialized slots of the output tuple corresponds to the number of materialized aggregate functions plus the number of grouping exprs. Also checks that the return types of the aggregate and grouping exprs correspond to the slots in the output tuple.
Definition at line 596 of file AggregateInfo.java.
References com.cloudera.impala.analysis.Expr.getType(), and com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_.
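The two conditions in this check can be pictured with a small, self-contained model. This is a sketch under stated assumptions, not Impala's code: `ConsistencyCheckSketch` and its `Slot` class are hypothetical, types are plain strings instead of Impala type objects, and we assume the grouping slots come first in the output tuple, followed by the aggregate slots.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the output-tuple consistency check; not Impala's API. */
final class ConsistencyCheckSketch {
  /** Hypothetical stand-in for a SlotDescriptor: a type name and a materialized flag. */
  static final class Slot {
    final String type;
    final boolean materialized;

    Slot(String type, boolean materialized) {
      this.type = type;
      this.materialized = materialized;
    }
  }

  /**
   * Assumes grouping slots precede aggregate slots. Returns true if the
   * materialized slots match the grouping exprs plus the materialized aggregate
   * exprs in number, and each slot type equals the corresponding return type.
   */
  static boolean isConsistent(List<String> groupingTypes,
      List<String> materializedAggTypes, List<Slot> outputSlots) {
    List<Slot> materialized = new ArrayList<>();
    for (Slot s : outputSlots) {
      if (s.materialized) materialized.add(s);
    }
    // Condition 1: slot count equals grouping exprs plus materialized aggregates.
    if (materialized.size() != groupingTypes.size() + materializedAggTypes.size()) {
      return false;
    }
    // Condition 2: types line up in slot order (grouping exprs first, then aggregates).
    List<String> expected = new ArrayList<>(groupingTypes);
    expected.addAll(materializedAggTypes);
    for (int i = 0; i < expected.size(); ++i) {
      if (!expected.get(i).equals(materialized.get(i).type)) return false;
    }
    return true;
  }
}
```

Either a wrong slot count or a single mismatched type makes the sketch return false; unmaterialized slots are ignored, as the description implies.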
|
create | inline, static |
Creates complete AggregateInfo for groupingExprs and aggExprs, including aggTupleDesc and aggTupleSMap. If parameter tupleDesc != null, sets aggTupleDesc to that instead of creating a new descriptor (after verifying that the passed-in descriptor is correct for the given aggregation). Also creates mergeAggInfo and secondPhaseDistinctAggInfo, if needed. If an aggTupleDesc is created, also registers eq predicates between the grouping exprs and their respective slots with 'analyzer'.
Definition at line 122 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.AggregateInfo(), and com.cloudera.impala.analysis.AggregateInfo.AggPhase.FIRST.
|
createCountDistinctAggExprParam | inline, private |
Creates an IF function call that returns NULL if any of the slots at indexes [firstIdx, lastIdx] is NULL. For example, for 3 slots the resulting IF function would look like this: IF(IsNull(slot1), NULL, IF(IsNull(slot2), NULL, slot3)). Returns null if firstIdx is greater than lastIdx. Returns a SlotRef to the last slot if there is only one slot in range.
Definition at line 370 of file AggregateInfo.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo().
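The nesting is easiest to see with a string-building sketch. It is not Impala's implementation, which builds expression trees; `CountDistinctParamSketch.buildParam` is a hypothetical helper that renders the same shape as text, working from the last slot outward. A plausible reason for this shape is that `COUNT` skips NULL inputs, so a single argument that becomes NULL whenever any slot is NULL lets a multi-column DISTINCT count ignore such rows.

```java
import java.util.List;

/** Hypothetical text-based model of the nested IF built for COUNT(DISTINCT ...). */
final class CountDistinctParamSketch {
  /**
   * Renders the NULL-propagating IF chain over slots[firstIdx..lastIdx].
   * Returns null for an empty range and the bare slot for a single-slot range.
   */
  static String buildParam(int firstIdx, int lastIdx, List<String> slots) {
    if (firstIdx > lastIdx) return null;
    // Innermost value: the last slot itself.
    String expr = slots.get(lastIdx);
    // Wrap outward so that each earlier slot yields NULL if it is NULL.
    for (int i = lastIdx - 1; i >= firstIdx; --i) {
      expr = "IF(IsNull(" + slots.get(i) + "), NULL, " + expr + ")";
    }
    return expr;
  }
}
```

Calling `buildParam(0, 2, List.of("slot1", "slot2", "slot3"))` produces the three-slot example from the description, and a range of one slot returns just that slot's name.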
|
createDistinctAggInfo | inline, private |
Create aggregate info for select block containing aggregate exprs with DISTINCT clause. This creates:
At the moment, we require that all distinct aggregate functions be applied to the same set of exprs (i.e., we cannot do something like SELECT COUNT(DISTINCT id), COUNT(DISTINCT address)). Aggregation happens in two successive phases:
Example: SELECT a, COUNT(DISTINCT b, c), MIN(d), COUNT(*) FROM T GROUP BY a
TODO: expand the implementation to cover the general case; this will require a different execution strategy.
Definition at line 188 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSmaps(), com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs(), and com.cloudera.impala.analysis.Expr.equalLists().
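The example query can be traced with an in-memory sketch of a two-phase plan. The exact phase definitions are not spelled out in this entry, so the breakdown is an assumption for illustration: phase 1 groups by `a` plus the DISTINCT arguments `b, c` and evaluates the non-distinct aggregates; phase 2 groups by `a` alone, counts the phase-1 groups whose `b` and `c` are both non-NULL, merges `MIN(d)` with `MIN`, and merges `COUNT(*)` by summing. `TwoPhaseDistinctSketch` and its nested classes are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of a two-phase DISTINCT aggregation plan. */
final class TwoPhaseDistinctSketch {
  /** One row of a hypothetical table T(a, b, c, d); b, c and d may be NULL. */
  static final class Row {
    final String a;
    final Integer b, c, d;

    Row(String a, Integer b, Integer c, Integer d) {
      this.a = a; this.b = b; this.c = c; this.d = d;
    }
  }

  /** Phase-1 state per (a, b, c) group: MIN(d) and COUNT(*). */
  static final class Phase1 {
    Integer minD;
    long countStar;
  }

  /** Final row per a: COUNT(DISTINCT b, c), MIN(d), COUNT(*). */
  static final class Result {
    long countDistinctBC;
    Integer minD;
    long countStar;
  }

  static Map<String, Result> run(List<Row> rows) {
    // Phase 1: GROUP BY a, b, c (grouping exprs plus the DISTINCT arguments).
    Map<List<Object>, Phase1> phase1 = new LinkedHashMap<>();
    for (Row r : rows) {
      Phase1 p = phase1.computeIfAbsent(Arrays.<Object>asList(r.a, r.b, r.c), k -> new Phase1());
      if (r.d != null && (p.minD == null || r.d < p.minD)) p.minD = r.d;
      p.countStar++;
    }
    // Phase 2: GROUP BY a. Each phase-1 group is one distinct (b, c) pair.
    Map<String, Result> out = new LinkedHashMap<>();
    for (Map.Entry<List<Object>, Phase1> e : phase1.entrySet()) {
      List<Object> key = e.getKey();
      Phase1 p = e.getValue();
      Result res = out.computeIfAbsent((String) key.get(0), k -> new Result());
      // COUNT(DISTINCT b, c) ignores pairs in which any column is NULL.
      if (key.get(1) != null && key.get(2) != null) res.countDistinctBC++;
      // MIN merges with MIN; COUNT(*) merges by summing the partial counts.
      if (p.minD != null && (res.minD == null || p.minD < res.minD)) res.minD = p.minD;
      res.countStar += p.countStar;
    }
    return out;
  }
}
```

The design point the sketch shows is that phase 1 turns the DISTINCT into ordinary grouping, so phase 2 only needs plain counting plus merge functions for the other aggregates.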
|
createMergeAggInfo | inline, private |
Create the info for an aggregation node that merges its pre-aggregated inputs:
The returned AggregateInfo shares its descriptor and smap with the input info; createAggTupleDesc() must not be called on it.
Definition at line 328 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.aggPhase_, com.cloudera.impala.analysis.AggregateInfo.AggPhase.FIRST, com.cloudera.impala.analysis.AggregateInfo.AggPhase.FIRST_MERGE, com.cloudera.impala.analysis.AggregateInfoBase.getAggregateExprs(), com.cloudera.impala.analysis.AggregateInfoBase.getGroupingExprs(), com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, com.cloudera.impala.analysis.AggregateInfo.intermediateTupleSmap_, com.cloudera.impala.analysis.AggregateInfoBase.materializedSlots_, com.cloudera.impala.analysis.AggregateInfo.mergeAggInfo_, com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_, com.cloudera.impala.analysis.AggregateInfo.outputTupleSmap_, and com.cloudera.impala.analysis.AggregateInfo.AggPhase.SECOND_MERGE.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().
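The phase bookkeeping behind merge steps can be sketched with the `AggPhase` constants this reference names (`FIRST`, `FIRST_MERGE`, `SECOND`, `SECOND_MERGE`). The mapping is an assumption consistent with the references listed for this method and with the class-level note that merging is idempotent; `MergePhaseSketch` and `mergePhaseOf` are hypothetical names.

```java
/** Hypothetical model of how a merge step's phase relates to its input's phase. */
final class MergePhaseSketch {
  /** Same constant names as the AggPhase enum in the class summary. */
  enum AggPhase { FIRST, FIRST_MERGE, SECOND, SECOND_MERGE }

  /** Assumed rule: merging a phase-1 step (or its merge) stays in phase-1 merging. */
  static AggPhase mergePhaseOf(AggPhase p) {
    switch (p) {
      case FIRST:
      case FIRST_MERGE:
        return AggPhase.FIRST_MERGE;
      default:
        // SECOND and SECOND_MERGE both merge into SECOND_MERGE.
        return AggPhase.SECOND_MERGE;
    }
  }
}
```

Applying `mergePhaseOf` twice gives the same result as applying it once, which mirrors `aggInfo.mergeAggInfo == aggInfo.mergeAggInfo.mergeAggInfo` at the level of phases.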
|
createSecondPhaseAggInfo | inline, private |
Create the info for an aggregation node that computes the second phase of DISTINCT aggregate functions. (Refer to createDistinctAggInfo() for an explanation of the phases.)
This call also creates the tuple descriptor and smap for the returned AggregateInfo.
Definition at line 404 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfo.AggregateInfo(), com.cloudera.impala.analysis.AggregateInfo.createCountDistinctAggExprParam(), com.cloudera.impala.analysis.AggregateInfoBase.getGroupingExprs(), com.cloudera.impala.analysis.TupleDescriptor.getSlots(), com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, com.cloudera.impala.analysis.AggregateInfo.intermediateTupleSmap_, com.cloudera.impala.analysis.AggregateInfo.AggPhase.SECOND, and com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().
|
createSecondPhaseAggSMap | inline, private |
Create smap to map original grouping and aggregate exprs onto output of secondPhaseDistinctAggInfo.
Definition at line 475 of file AggregateInfo.java.
|
createSmaps | inline |
Populates the output and intermediate smaps based on the output and intermediate tuples that are assumed to be set. If an intermediate tuple is required, also populates the output-to-intermediate smap and registers auxiliary equivalence predicates between the grouping slots of the two tuples.
Definition at line 515 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.TupleDescriptor.getSlots(), com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, com.cloudera.impala.analysis.AggregateInfo.intermediateTupleSmap_, com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_, com.cloudera.impala.analysis.AggregateInfo.outputTupleSmap_, and com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple().
Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().
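A simplified picture of the three maps follows. `SmapSketch` is hypothetical, exprs and slot references are plain strings rather than expression objects, and we assume grouping exprs occupy the first slots of each tuple, followed by the aggregate exprs in order. The auxiliary equivalence predicates are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical string-based model of the output/intermediate substitution maps. */
final class SmapSketch {
  /** Maps each grouping expr, then each aggregate expr, to consecutive slots of a tuple. */
  static Map<String, String> buildSmap(List<String> groupingExprs,
      List<String> aggExprs, String tuple) {
    Map<String, String> smap = new LinkedHashMap<>();
    int slot = 0;
    for (String e : groupingExprs) smap.put(e, tuple + ".slot" + slot++);
    for (String e : aggExprs) smap.put(e, tuple + ".slot" + slot++);
    return smap;
  }

  /** Pairs each output slot with the intermediate slot computed for the same expr. */
  static Map<String, String> outputToIntermediate(Map<String, String> outputSmap,
      Map<String, String> intermediateSmap) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : outputSmap.entrySet()) {
      result.put(e.getValue(), intermediateSmap.get(e.getKey()));
    }
    return result;
  }
}
```

Building both smaps from the same expr lists is what makes the output-to-intermediate pairing a simple slot-by-slot correspondence in this model.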
|
createTupleDescs | inline, protected, inherited |
Creates the intermediate and output tuple descriptors. If no agg expr has an intermediate type different from its output type, then only the output tuple descriptor is created and the intermediate tuple is set to the output tuple.
Definition at line 70 of file AggregateInfoBase.java.
References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfoBase.createTupleDesc(), com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_, and com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple().
Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().
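The descriptor-sharing rule can be sketched as follows. `TupleDescsSketch`, `AggFn`, and `TupleDesc` are hypothetical stand-ins for Impala's function-call and descriptor classes; the generic bound on `requiresIntermediateTuple` imitates the `<T extends Expr>` signature in the class summary, and the type names used with it are placeholders.

```java
import java.util.List;

/** Hypothetical model of choosing between one shared tuple and two separate tuples. */
final class TupleDescsSketch {
  /** Hypothetical aggregate: its intermediate (state) type and its final return type. */
  static class AggFn {
    final String intermediateType;
    final String returnType;

    AggFn(String intermediateType, String returnType) {
      this.intermediateType = intermediateType;
      this.returnType = returnType;
    }
  }

  /** Hypothetical stand-in for a TupleDescriptor. */
  static final class TupleDesc {
    final String name;

    TupleDesc(String name) { this.name = name; }
  }

  /** True if any aggregate's intermediate type differs from its return type. */
  static <T extends AggFn> boolean requiresIntermediateTuple(List<T> aggFns) {
    for (T f : aggFns) {
      if (!f.intermediateType.equals(f.returnType)) return true;
    }
    return false;
  }

  /** Returns {intermediate, output}; both entries are the same object if no separate tuple is needed. */
  static TupleDesc[] createTupleDescs(List<AggFn> aggFns) {
    TupleDesc output = new TupleDesc("agg-output");
    TupleDesc intermediate =
        requiresIntermediateTuple(aggFns) ? new TupleDesc("agg-intermediate") : output;
    return new TupleDesc[] {intermediate, output};
  }
}
```

With only a `BIGINT`-to-`BIGINT` aggregate, both entries point at the same descriptor; adding an aggregate whose (placeholder) state type differs from its `DOUBLE` result yields a separate intermediate descriptor.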
|
debugString | inline |
Definition at line 645 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.aggPhase_, com.cloudera.impala.analysis.AggregateInfo.mergeAggInfo_, and com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.
Referenced by com.cloudera.impala.planner.AggregationNode.debugString().
|
getAggPhase | inline |
Definition at line 229 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.aggPhase_.
|
getAggregateExprs | inline, inherited |
Definition at line 157 of file AggregateInfoBase.java.
References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_.
Referenced by com.cloudera.impala.analysis.SelectStmt.analyze(), com.cloudera.impala.analysis.AnalyzerTest.checkSelectToThrift(), com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.planner.AggregationNode.getNodeExplainString(), and com.cloudera.impala.analysis.AggregateInfo.hasAggregateExprs().
|
getGroupingExprs | inline, inherited |
Definition at line 156 of file AggregateInfoBase.java.
References com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_.
Referenced by com.cloudera.impala.analysis.AnalyzerTest.checkSelectToThrift(), com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo(), and com.cloudera.impala.planner.AggregationNode.getNodeExplainString().
|
getIntermediateSmap | inline |
Definition at line 232 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.intermediateTupleSmap_.
|
getIntermediateTupleDesc | inline, inherited |
Definition at line 159 of file AggregateInfoBase.java.
References com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_.
|
inline, inherited |
Definition at line 160 of file AggregateInfoBase.java.
Referenced by com.cloudera.impala.planner.AggregationNode.toThrift().
|
getMaterializedAggregateExprs | inline |
Definition at line 252 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfoBase.materializedSlots_.
Referenced by com.cloudera.impala.planner.AggregationNode.toThrift().
|
getMergeAggInfo | inline |
Definition at line 225 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.mergeAggInfo_.
|
getOutputSmap | inline |
Definition at line 233 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.outputTupleSmap_.
|
getOutputToIntermediateSmap | inline |
Definition at line 234 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.outputToIntermediateTupleSmap_.
|
getOutputTupleDesc | inline, inherited |
Definition at line 158 of file AggregateInfoBase.java.
References com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_.
Referenced by com.cloudera.impala.planner.AnalyticPlanner.collectWindowGroups().
|
getOutputTupleId | inline, inherited |
Definition at line 161 of file AggregateInfoBase.java.
Referenced by com.cloudera.impala.planner.AggregationNode.AggregationNode(), com.cloudera.impala.planner.AnalyticPlanner.AnalyticPlanner(), com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(), com.cloudera.impala.analysis.AggregateInfo.getResultTupleId(), com.cloudera.impala.planner.AggregationNode.setIntermediateTuple(), and com.cloudera.impala.planner.AggregationNode.toThrift().
|
getPartition | inline |
Returns the DataPartition derived from the grouping exprs, or an unpartitioned spec if there are no grouping exprs. TODO: this won't work when we start supporting range partitions, because we could derive both hash and order-based partitions.
Definition at line 636 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, and com.cloudera.impala.planner.DataPartition.UNPARTITIONED.
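A sketch of the rule, assuming hash partitioning on the grouping exprs (the TODO's contrast between hash and order-based partitions suggests this), so that rows with equal grouping values meet on the same node. `PartitionSketch` and its `Partition` class are hypothetical and are not the planner's `DataPartition` API.

```java
import java.util.List;

/** Hypothetical model of deriving a data partition from grouping exprs. */
final class PartitionSketch {
  /** Hypothetical stand-in for DataPartition: unpartitioned, or hashed on some exprs. */
  static final class Partition {
    static final Partition UNPARTITIONED = new Partition(List.of());

    final List<String> hashExprs;  // empty means unpartitioned

    Partition(List<String> hashExprs) { this.hashExprs = List.copyOf(hashExprs); }

    @Override
    public String toString() {
      return hashExprs.isEmpty() ? "UNPARTITIONED" : "HASH(" + String.join(", ", hashExprs) + ")";
    }
  }

  /** No grouping exprs: unpartitioned; otherwise hash-partition on all of them. */
  static Partition getPartition(List<String> groupingExprs) {
    if (groupingExprs.isEmpty()) return Partition.UNPARTITIONED;
    return new Partition(groupingExprs);
  }
}
```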
|
getPartitionExprs | inline |
Definition at line 110 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.partitionExprs_.
|
getRefdSlots | inline |
Append ids of all slots that are being referenced in the process of performing the aggregate computation described by this AggregateInfo.
Definition at line 264 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, and com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_.
|
getResultTupleId | inline |
Return the tuple id produced in the final aggregation step.
Definition at line 247 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfoBase.getOutputTupleId(), and com.cloudera.impala.analysis.AggregateInfo.isDistinctAgg().
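The two references listed here (`getOutputTupleId()` and `isDistinctAgg()`) suggest a simple rule, sketched below with ints as tuple ids. `ResultTupleSketch.AggInfo` is a hypothetical miniature of the agg-info tree, and the rule is inferred from those references rather than copied from the source.

```java
/** Hypothetical two-node model of the AggregateInfo tree; not Impala's class. */
final class ResultTupleSketch {
  static final class AggInfo {
    final int outputTupleId;
    final AggInfo secondPhaseDistinctAggInfo;  // null unless there is a DISTINCT aggregate

    AggInfo(int outputTupleId, AggInfo secondPhaseDistinctAggInfo) {
      this.outputTupleId = outputTupleId;
      this.secondPhaseDistinctAggInfo = secondPhaseDistinctAggInfo;
    }

    boolean isDistinctAgg() { return secondPhaseDistinctAggInfo != null; }

    /** With DISTINCT, the final step is the second phase, so its output tuple is the result. */
    int getResultTupleId() {
      return isDistinctAgg() ? secondPhaseDistinctAggInfo.outputTupleId : outputTupleId;
    }
  }
}
```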
|
getSecondPhaseDistinctAggInfo | inline |
Definition at line 226 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.
Referenced by com.cloudera.impala.analysis.SelectStmt.analyzeAggregation().
|
hasAggregateExprs | inline |
Definition at line 238 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfoBase.getAggregateExprs(), and com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.
|
isDistinctAgg | inline |
Definition at line 231 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.
Referenced by com.cloudera.impala.planner.SingleNodePlanner.createAggregationPlan(), com.cloudera.impala.analysis.AggregateInfo.getResultTupleId(), and com.cloudera.impala.analysis.AggregateInfo.materializeRequiredSlots().
|
inline |
Definition at line 230 of file AggregateInfo.java.
|
materializeRequiredSlots | inline |
Mark slots required for this aggregation as materialized:
Definition at line 556 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, com.cloudera.impala.analysis.AggregateInfo.isDistinctAgg(), and com.cloudera.impala.analysis.SlotDescriptor.isMaterialized().
|
requiresIntermediateTuple | inline, inherited |
Definition at line 162 of file AggregateInfoBase.java.
References com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, and com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createSmaps(), and com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs().
|
requiresIntermediateTuple | inline, static, inherited |
Returns true if evaluating the given aggregate exprs requires an intermediate tuple, i.e., whether one of the aggregate functions has an intermediate type different from its output type.
Definition at line 173 of file AggregateInfoBase.java.
|
setPartitionExprs | inline |
Definition at line 111 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfo.partitionExprs_.
|
substitute | inline |
Substitute all the expressions (grouping expr, aggregate expr) and update our substitution map according to the given substitution map:
Definition at line 295 of file AggregateInfo.java.
References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, and com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.
|
tupleDebugName | inline, protected |
Definition at line 663 of file AggregateInfo.java.
|
aggPhase_ | private |
Definition at line 84 of file AggregateInfo.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.AggregateInfo(), com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.debugString(), and com.cloudera.impala.analysis.AggregateInfo.getAggPhase().
|
aggregateExprs_ | protected, inherited |
Definition at line 31 of file AggregateInfoBase.java.
Referenced by com.cloudera.impala.analysis.AggregateInfoBase.AggregateInfoBase(), com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSmaps(), com.cloudera.impala.analysis.AggregateInfoBase.createTupleDesc(), com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs(), com.cloudera.impala.analysis.AggregateInfoBase.debugString(), com.cloudera.impala.analysis.AggregateInfoBase.getAggregateExprs(), com.cloudera.impala.analysis.AggregateInfo.getRefdSlots(), and com.cloudera.impala.analysis.AggregateInfo.substitute().
|
groupingExprs_ | protected, inherited |
Definition at line 26 of file AggregateInfoBase.java.
Referenced by com.cloudera.impala.analysis.AggregateInfoBase.AggregateInfoBase(), com.cloudera.impala.analysis.AggregateInfo.checkConsistency(), com.cloudera.impala.analysis.AggregateInfo.createSmaps(), com.cloudera.impala.analysis.AggregateInfoBase.createTupleDesc(), com.cloudera.impala.analysis.AggregateInfoBase.debugString(), com.cloudera.impala.analysis.AggregateInfoBase.getGroupingExprs(), com.cloudera.impala.analysis.AggregateInfo.getPartition(), com.cloudera.impala.analysis.AggregateInfo.getRefdSlots(), com.cloudera.impala.analysis.AggregateInfo.materializeRequiredSlots(), and com.cloudera.impala.analysis.AggregateInfo.substitute().
|
intermediateTupleDesc_ | protected, inherited |
Definition at line 40 of file AggregateInfoBase.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSmaps(), com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs(), com.cloudera.impala.analysis.AggregateInfoBase.debugString(), com.cloudera.impala.analysis.AggregateInfoBase.getIntermediateTupleDesc(), com.cloudera.impala.analysis.AnalyticInfo.getRefdSlots(), and com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple().
|
intermediateTupleSmap_ | protected |
Definition at line 89 of file AggregateInfo.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSmaps(), and com.cloudera.impala.analysis.AggregateInfo.getIntermediateSmap().
|
LOG | static, private |
Definition at line 67 of file AggregateInfo.java.
|
materializedSlots_ | protected, inherited |
Definition at line 53 of file AggregateInfoBase.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), and com.cloudera.impala.analysis.AggregateInfo.getMaterializedAggregateExprs().
|
mergeAggInfo_ | private |
Definition at line 76 of file AggregateInfo.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.debugString(), and com.cloudera.impala.analysis.AggregateInfo.getMergeAggInfo().
|
outputToIntermediateTupleSmap_ | protected |
Definition at line 97 of file AggregateInfo.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.getOutputToIntermediateSmap().
|
outputTupleDesc_ | protected, inherited |
Definition at line 47 of file AggregateInfoBase.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSmaps(), com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs(), com.cloudera.impala.analysis.AggregateInfoBase.debugString(), com.cloudera.impala.analysis.AggregateInfoBase.getOutputTupleDesc(), com.cloudera.impala.analysis.AggregateInfo.getRefdSlots(), and com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple().
|
outputTupleSmap_ | protected |
Definition at line 93 of file AggregateInfo.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSmaps(), and com.cloudera.impala.analysis.AggregateInfo.getOutputSmap().
|
partitionExprs_ | private |
Definition at line 101 of file AggregateInfo.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.getPartitionExprs(), and com.cloudera.impala.analysis.AggregateInfo.setPartitionExprs().
|
secondPhaseDistinctAggInfo_ | private |
Definition at line 82 of file AggregateInfo.java.
Referenced by com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo(), com.cloudera.impala.analysis.AggregateInfo.debugString(), com.cloudera.impala.analysis.AggregateInfo.getSecondPhaseDistinctAggInfo(), com.cloudera.impala.analysis.AggregateInfo.hasAggregateExprs(), com.cloudera.impala.analysis.AggregateInfo.isDistinctAgg(), and com.cloudera.impala.analysis.AggregateInfo.substitute().