Classes
enum	AggPhase

Public Member Functions
List< Expr >	getPartitionExprs ()

void	setPartitionExprs (List< Expr > exprs)

AggregateInfo	getMergeAggInfo ()

AggregateInfo	getSecondPhaseDistinctAggInfo ()

AggPhase	getAggPhase ()

boolean	isMerge ()

boolean	isDistinctAgg ()

ExprSubstitutionMap	getIntermediateSmap ()

ExprSubstitutionMap	getOutputSmap ()

ExprSubstitutionMap	getOutputToIntermediateSmap ()

boolean	hasAggregateExprs ()

TupleId	getResultTupleId ()

ArrayList< FunctionCallExpr >	getMaterializedAggregateExprs ()

void	getRefdSlots (List< SlotId > ids)

void	substitute (ExprSubstitutionMap smap, Analyzer analyzer) throws InternalException

void	createSmaps (Analyzer analyzer)

void	materializeRequiredSlots (Analyzer analyzer, ExprSubstitutionMap smap)

void	checkConsistency ()

DataPartition	getPartition ()

String	debugString ()

ArrayList< Expr >	getGroupingExprs ()

ArrayList< FunctionCallExpr >	getAggregateExprs ()

TupleDescriptor	getOutputTupleDesc ()

TupleDescriptor	getIntermediateTupleDesc ()

TupleId	getIntermediateTupleId ()

TupleId	getOutputTupleId ()

boolean	requiresIntermediateTuple ()

Static Public Member Functions
static AggregateInfo	create (ArrayList< Expr > groupingExprs, ArrayList< FunctionCallExpr > aggExprs, TupleDescriptor tupleDesc, Analyzer analyzer) throws AnalysisException

static <T extends Expr> boolean	requiresIntermediateTuple (List< T > aggExprs)

Protected Member Functions
String	tupleDebugName ()

void	createTupleDescs (Analyzer analyzer)

Protected Attributes
ExprSubstitutionMap	intermediateTupleSmap_ = new ExprSubstitutionMap()

ExprSubstitutionMap	outputTupleSmap_ = new ExprSubstitutionMap()

final ExprSubstitutionMap	outputToIntermediateTupleSmap_

ArrayList< Expr >	groupingExprs_

ArrayList< FunctionCallExpr >	aggregateExprs_

TupleDescriptor	intermediateTupleDesc_

TupleDescriptor	outputTupleDesc_

ArrayList< Integer >	materializedSlots_ = Lists.newArrayList()

Private Member Functions
	AggregateInfo (ArrayList< Expr > groupingExprs, ArrayList< FunctionCallExpr > aggExprs, AggPhase aggPhase)

void	createDistinctAggInfo (ArrayList< Expr > origGroupingExprs, ArrayList< FunctionCallExpr > distinctAggExprs, Analyzer analyzer) throws AnalysisException

void	createMergeAggInfo (Analyzer analyzer)

Expr	createCountDistinctAggExprParam (int firstIdx, int lastIdx, ArrayList< SlotDescriptor > slots)

void	createSecondPhaseAggInfo (ArrayList< Expr > origGroupingExprs, ArrayList< FunctionCallExpr > distinctAggExprs, Analyzer analyzer) throws AnalysisException

void	createSecondPhaseAggSMap (AggregateInfo inputAggInfo, ArrayList< FunctionCallExpr > distinctAggExprs)

Private Attributes
AggregateInfo	mergeAggInfo_

AggregateInfo	secondPhaseDistinctAggInfo_

final AggPhase	aggPhase_

List< Expr >	partitionExprs_

Static Private Attributes
static final Logger	LOG = LoggerFactory.getLogger(AggregateInfo.class)

Detailed Description

Encapsulates all the information needed to compute the aggregate functions of a single Select block, including a possible 2nd phase aggregation step for DISTINCT aggregate functions and merge aggregation steps needed for distributed execution.

The latter requires a tree structure of AggregateInfo objects that express the original aggregate computations as well as the necessary merging aggregate computations.

TODO: get rid of this by transforming SELECT COUNT(DISTINCT a, b, ...) GROUP BY x, y, ... into an equivalent query with an inline view: SELECT COUNT(*) FROM (SELECT DISTINCT a, b, ..., x, y, ...) GROUP BY x, y, ...

The tree structure looks as follows:

for non-distinct aggregation:
- aggInfo: contains the original aggregation functions and grouping exprs
- aggInfo.mergeAggInfo: contains the merging aggregation functions (grouping exprs are identical)
for distinct aggregation (for an explanation of the phases, see AggregateInfo.createDistinctAggInfo()):
- aggInfo: contains the phase 1 aggregate functions and grouping exprs
- aggInfo.2ndPhaseDistinctAggInfo: contains the phase 2 aggregate functions and grouping exprs
- aggInfo.mergeAggInfo: contains the merging aggregate functions for the phase 1 computation (grouping exprs are identical)
- aggInfo.2ndPhaseDistinctAggInfo.mergeAggInfo: contains the merging aggregate functions for the phase 2 computation (grouping exprs are identical)

In general, merging aggregate computations are idempotent; in other words, aggInfo.mergeAggInfo == aggInfo.mergeAggInfo.mergeAggInfo.
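As a toy illustration of this idempotence, consider only the aggregate function names. The sketch below assumes the rule stated for merge aggregation under createSecondPhaseAggInfo() (count is mapped to sum, everything else stays the same) and covers just COUNT, SUM, MIN and MAX; the class MergeFnIdempotence and its mergeFn() helper are hypothetical, not part of Impala. Applying mergeFn twice yields the same function as applying it once, which mirrors aggInfo.mergeAggInfo == aggInfo.mergeAggInfo.mergeAggInfo.

```java
import java.util.List;

final class MergeFnIdempotence {
  // Hypothetical mapping from an aggregate function to the function that merges
  // its partial results: COUNT partials are merged with SUM; SUM, MIN and MAX
  // merge with themselves.
  static String mergeFn(String fn) {
    return fn.equals("COUNT") ? "SUM" : fn;
  }

  public static void main(String[] args) {
    for (String fn : List.of("COUNT", "SUM", "MIN", "MAX")) {
      String once = mergeFn(fn);
      String twice = mergeFn(once);
      // Idempotence: merging the merge step changes nothing.
      System.out.println(fn + " -> " + once + " -> " + twice);
    }
  }
}
```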

TODO: move the merge construction logic from SelectStmt into AggregateInfo.

TODO: Add query tests for aggregation with intermediate tuples with num_nodes=1.

Definition at line 66 of file AggregateInfo.java.

Constructor & Destructor Documentation

com.cloudera.impala.analysis.AggregateInfo.AggregateInfo ( ArrayList< Expr > groupingExprs, ArrayList< FunctionCallExpr > aggExprs, AggPhase aggPhase )

inline, private

Definition at line 104 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.aggPhase_.

Referenced by com.cloudera.impala.analysis.AggregateInfo.create(), and com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo().

Member Function Documentation

void com.cloudera.impala.analysis.AggregateInfo.checkConsistency ( )

inline

Validates the internal state of this agg info: Checks that the number of materialized slots of the output tuple corresponds to the number of materialized aggregate functions plus the number of grouping exprs. Also checks that the return types of the aggregate and grouping exprs correspond to the slots in the output tuple.

Definition at line 596 of file AggregateInfo.java.

References com.cloudera.impala.analysis.Expr.getType(), and com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_.
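A minimal sketch of these two checks, assuming types are modeled as plain strings and that the output tuple lays out the grouping slots before the aggregate slots (an assumption of this sketch). AggConsistencyCheck and check() are hypothetical names; the real method operates on Impala's Expr, Type and SlotDescriptor classes.

```java
import java.util.List;

final class AggConsistencyCheck {
  // Types are plain strings here, a simplification of Impala's Type class.
  // Assumed slot layout: grouping slots first, then materialized aggregate slots.
  static void check(List<String> groupingExprTypes, List<String> materializedAggExprTypes,
      List<String> materializedOutputSlotTypes) {
    int numGrouping = groupingExprTypes.size();
    int expected = numGrouping + materializedAggExprTypes.size();
    // Check 1: slot count equals grouping exprs plus materialized agg fns.
    if (materializedOutputSlotTypes.size() != expected) {
      throw new IllegalStateException("expected " + expected
          + " materialized slots but found " + materializedOutputSlotTypes.size());
    }
    // Check 2: each expr's return type matches its slot's type.
    for (int i = 0; i < expected; ++i) {
      String exprType = i < numGrouping
          ? groupingExprTypes.get(i)
          : materializedAggExprTypes.get(i - numGrouping);
      String slotType = materializedOutputSlotTypes.get(i);
      if (!exprType.equals(slotType)) {
        throw new IllegalStateException(
            "slot " + i + " has type " + slotType + " but its expr returns " + exprType);
      }
    }
  }

  public static void main(String[] args) {
    check(List.of("STRING"), List.of("BIGINT"), List.of("STRING", "BIGINT"));
    System.out.println("consistent");
  }
}
```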

static AggregateInfo com.cloudera.impala.analysis.AggregateInfo.create ( ArrayList< Expr > groupingExprs, ArrayList< FunctionCallExpr > aggExprs, TupleDescriptor tupleDesc, Analyzer analyzer ) throws AnalysisException

inline, static

Creates a complete AggregateInfo for groupingExprs and aggExprs, including aggTupleDesc and aggTupleSMap. If parameter tupleDesc != null, sets aggTupleDesc to that descriptor instead of creating a new one (after verifying that the passed-in descriptor is correct for the given aggregation). Also creates mergeAggInfo and secondPhaseDistinctAggInfo, if needed. If an aggTupleDesc is created, also registers eq predicates between the grouping exprs and their respective slots with 'analyzer'.

Definition at line 122 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.AggregateInfo(), and com.cloudera.impala.analysis.AggregateInfo.AggPhase.FIRST.

Expr com.cloudera.impala.analysis.AggregateInfo.createCountDistinctAggExprParam ( int firstIdx, int lastIdx, ArrayList< SlotDescriptor > slots )

inline, private

Creates an IF function call that returns NULL if any of the slots at indexes [firstIdx, lastIdx] returns NULL. For example, for 3 slots the resulting IF function would look like this: IF(IsNull(slot1), NULL, IF(IsNull(slot2), NULL, slot3)). Returns null if firstIdx is greater than lastIdx. Returns a SlotRef to the last slot if there is only one slot in range.

Definition at line 370 of file AggregateInfo.java.

Referenced by com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo().
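The nesting can be reproduced with plain strings. In this sketch (CountDistinctParam and build() are hypothetical), slots are represented by their names and the result is the SQL text rather than an Expr tree; the three cases mirror the description: an empty range yields null, a single slot yields that slot, and a longer range wraps the last slot in one IF per preceding slot.

```java
import java.util.List;

final class CountDistinctParam {
  // Slots are modeled by their names; the return value is the expression's SQL text.
  static String build(int firstIdx, int lastIdx, List<String> slots) {
    if (firstIdx > lastIdx) return null;  // empty range
    String result = slots.get(lastIdx);   // innermost: the last slot itself
    // Wrap outward from the second-to-last slot, so the first slot ends up outermost.
    for (int i = lastIdx - 1; i >= firstIdx; --i) {
      result = "IF(IsNull(" + slots.get(i) + "), NULL, " + result + ")";
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(build(0, 2, List.of("slot1", "slot2", "slot3")));
  }
}
```

For example, build(0, 2, List.of("slot1", "slot2", "slot3")) returns the three-slot expression shown in the description.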

void com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo ( ArrayList< Expr > origGroupingExprs, ArrayList< FunctionCallExpr > distinctAggExprs, Analyzer analyzer ) throws AnalysisException

inline, private

Creates aggregate info for a select block containing aggregate exprs with a DISTINCT clause. This creates:

- aggTupleDesc
- a complete secondPhaseDistinctAggInfo
- mergeAggInfo

At the moment, we require that all distinct aggregate functions be applied to the same set of exprs (i.e., we can't do something like SELECT COUNT(DISTINCT id), COUNT(DISTINCT address)). Aggregation happens in two successive phases:

- the first phase aggregates by all grouping exprs plus all parameter exprs of DISTINCT aggregate functions
- the second phase aggregates the output of the first phase by the original grouping exprs only

Example: SELECT a, COUNT(DISTINCT b, c), MIN(d), COUNT(*) FROM T GROUP BY a

- 1st phase grouping exprs: a, b, c
- 1st phase agg exprs: MIN(d), COUNT(*)
- 2nd phase grouping exprs: a
- 2nd phase agg exprs: COUNT(*), MIN(<MIN(d) from 1st phase>), SUM(<COUNT(*) from 1st phase>)

TODO: expand implementation to cover the general case; this will require a different execution strategy

Definition at line 188 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSmaps(), com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs(), and com.cloudera.impala.analysis.Expr.equalLists().
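The two phases of the example query can be simulated over in-memory rows. This is a sketch under simplifying assumptions, not the planner's implementation: TwoPhaseDistinctAgg and its records are hypothetical, and in phase 2 COUNT(DISTINCT b, c) is computed by counting the phase-1 groups whose b and c are both non-NULL (standard SQL semantics for a multi-argument COUNT(DISTINCT)).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TwoPhaseDistinctAgg {
  record Row(String a, Integer b, Integer c, int d) {}
  // Phase-1 grouping key: the original grouping expr a plus the DISTINCT params b, c.
  record Phase1Key(String a, Integer b, Integer c) {}
  // Phase-1 aggregates per group: MIN(d) and COUNT(*).
  record Phase1Agg(int minD, long count) {}
  // Final result per value of a: COUNT(DISTINCT b, c), MIN(d), COUNT(*).
  record Result(long countDistinctBC, int minD, long countStar) {}

  static Map<String, Result> evaluate(List<Row> rows) {
    // Phase 1: GROUP BY a, b, c computing MIN(d) and COUNT(*).
    Map<Phase1Key, Phase1Agg> phase1 = new LinkedHashMap<>();
    for (Row r : rows) {
      phase1.merge(new Phase1Key(r.a(), r.b(), r.c()), new Phase1Agg(r.d(), 1),
          (x, y) -> new Phase1Agg(Math.min(x.minD(), y.minD()), x.count() + y.count()));
    }
    // Phase 2: GROUP BY a over the phase-1 output.
    Map<String, Result> phase2 = new LinkedHashMap<>();
    for (Map.Entry<Phase1Key, Phase1Agg> e : phase1.entrySet()) {
      Phase1Key k = e.getKey();
      Phase1Agg p = e.getValue();
      // Each (b, c) group counts once, unless one of the DISTINCT params is NULL.
      long distinct = (k.b() == null || k.c() == null) ? 0 : 1;
      // MIN of phase-1 minimums; SUM of phase-1 counts.
      phase2.merge(k.a(), new Result(distinct, p.minD(), p.count()),
          (x, y) -> new Result(x.countDistinctBC() + y.countDistinctBC(),
              Math.min(x.minD(), y.minD()), x.countStar() + y.countStar()));
    }
    return phase2;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row("x", 1, 1, 5), new Row("x", 1, 1, 3), new Row("x", 2, null, 7),
        new Row("x", 2, 2, 9), new Row("y", 3, 3, 4));
    evaluate(rows).forEach((a, res) -> System.out.println(a + ": " + res));
  }
}
```

For the five rows in main(), group x has two distinct non-NULL (b, c) pairs, MIN(d) = 3 and COUNT(*) = 4, which matches evaluating the query directly.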

void com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo ( Analyzer analyzer )

inline, private

Creates the info for an aggregation node that merges its pre-aggregated inputs:

- pre-aggregation is computed by 'this'
- tuple desc and smap are the same as that of the input (we're materializing the same logical tuple)
- grouping exprs: slotrefs to the input's grouping slots
- aggregate exprs: aggregation of the input's aggregateExprs slots

The returned AggregateInfo shares its descriptor and smap with the input info; createAggTupleDesc() must not be called on it.

Definition at line 328 of file AggregateInfo.java.

Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().
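The merge step can be pictured as combining per-node partial results that share the same grouping keys. In this hypothetical sketch (MergeAggSketch, Partial), each node's pre-aggregation produced COUNT(*) and MIN(d) per key; merging sums the counts and takes the minimum of the minimums, and feeding an already merged result through merge() again leaves it unchanged.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MergeAggSketch {
  // Partial state for one group, as produced by one node's pre-aggregation.
  record Partial(long count, int min) {}

  // Merge step: grouping keys are unchanged; COUNT partials are summed and
  // MIN partials are combined with MIN.
  static Map<String, Partial> merge(List<Map<String, Partial>> perNodeResults) {
    Map<String, Partial> merged = new TreeMap<>();
    for (Map<String, Partial> node : perNodeResults) {
      node.forEach((key, p) -> merged.merge(key, p,
          (x, y) -> new Partial(x.count() + y.count(), Math.min(x.min(), y.min()))));
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Partial> node1 = Map.of("x", new Partial(2, 3), "y", new Partial(1, 4));
    Map<String, Partial> node2 = Map.of("x", new Partial(3, 1));
    System.out.println(merge(List.of(node1, node2)));
  }
}
```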

void com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo ( ArrayList< Expr > origGroupingExprs, ArrayList< FunctionCallExpr > distinctAggExprs, Analyzer analyzer ) throws AnalysisException

inline, private

Creates the info for an aggregation node that computes the second phase of DISTINCT aggregate functions. (Refer to createDistinctAggInfo() for an explanation of the phases.)

- 'this' is the phase 1 aggregation
- grouping exprs are those of the original query (param origGroupingExprs)
- aggregate exprs for the DISTINCT agg fns: these aggregate the grouping slots that were added to the original grouping slots in phase 1; count is mapped to count(*) and sum is mapped to sum
- other aggregate exprs: same as the non-DISTINCT merge case (count is mapped to sum, everything else stays the same)

This call also creates the tuple descriptor and smap for the returned AggregateInfo.

Definition at line 404 of file AggregateInfo.java.

Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().

void com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggSMap ( AggregateInfo inputAggInfo, ArrayList< FunctionCallExpr > distinctAggExprs )

inline, private

Creates an smap that maps the original grouping and aggregate exprs onto the output of secondPhaseDistinctAggInfo.

Definition at line 475 of file AggregateInfo.java.

void com.cloudera.impala.analysis.AggregateInfo.createSmaps ( Analyzer analyzer )

inline

Populates the output and intermediate smaps based on the output and intermediate tuples that are assumed to be set. If an intermediate tuple is required, also populates the output-to-intermediate smap and registers auxiliary equivalence predicates between the grouping slots of the two tuples.

Definition at line 515 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.TupleDescriptor.getSlots(), com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, com.cloudera.impala.analysis.AggregateInfo.intermediateTupleSmap_, com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_, com.cloudera.impala.analysis.AggregateInfo.outputTupleSmap_, and com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple().

Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().
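A simplified picture of these smaps: each grouping expr and aggregate expr is paired, in order, with a slot of the target tuple, and when a separate intermediate tuple exists, the output slots are paired positionally with the intermediate slots. This sketch assumes string stand-ins for exprs and slots and a grouping-first slot order; SmapSketch, zip() and createOutputSmap() are hypothetical. The intermediate smap would be built the same way against the intermediate tuple's slots.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SmapSketch {
  // Pairs keys.get(i) with values.get(i); both lists must have the same length.
  static Map<String, String> zip(List<String> keys, List<String> values) {
    if (keys.size() != values.size()) {
      throw new IllegalArgumentException(keys.size() + " keys vs " + values.size() + " values");
    }
    Map<String, String> smap = new LinkedHashMap<>();
    for (int i = 0; i < keys.size(); ++i) {
      smap.put(keys.get(i), values.get(i));
    }
    return smap;
  }

  // Output smap: grouping exprs, then aggregate exprs, onto the output slots in order.
  static Map<String, String> createOutputSmap(
      List<String> groupingExprs, List<String> aggExprs, List<String> outputSlots) {
    List<String> exprs = new ArrayList<>(groupingExprs);
    exprs.addAll(aggExprs);
    return zip(exprs, outputSlots);
  }

  public static void main(String[] args) {
    List<String> outputSlots = List.of("agg.0", "agg.1", "agg.2");
    Map<String, String> outputSmap =
        createOutputSmap(List.of("t.a"), List.of("count(*)", "min(t.d)"), outputSlots);
    // Output-to-intermediate smap: output slot i -> intermediate slot i.
    Map<String, String> outToInter =
        zip(outputSlots, List.of("inter.0", "inter.1", "inter.2"));
    System.out.println(outputSmap + " " + outToInter);
  }
}
```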

void com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs ( Analyzer analyzer )

inline, protected, inherited

Creates the intermediate and output tuple descriptors. If no agg expr has an intermediate type different from its output type, then only the output tuple descriptor is created and the intermediate tuple is set to the output tuple.

Definition at line 70 of file AggregateInfoBase.java.

References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfoBase.createTupleDesc(), com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_, and com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple().

Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().
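The decision can be sketched as follows, assuming each aggregate function is described by hypothetical intermediate and output type names (the type AVG_STATE used in the test is invented for illustration). When no type differs, the sketch returns the same TupleDesc object for both roles, modeling "the intermediate tuple is set to the output tuple"; TupleDescSketch and its records are hypothetical, not Impala's TupleDescriptor API.

```java
import java.util.ArrayList;
import java.util.List;

final class TupleDescSketch {
  // Hypothetical model of an aggregate function and of a tuple descriptor.
  record AggFn(String name, String intermediateType, String outputType) {}
  record TupleDesc(String label, List<String> slotTypes) {}
  record Descs(TupleDesc intermediate, TupleDesc output) {}

  // True if some function's intermediate type differs from its output type.
  static boolean requiresIntermediateTuple(List<AggFn> aggFns) {
    return aggFns.stream().anyMatch(f -> !f.intermediateType().equals(f.outputType()));
  }

  static Descs createTupleDescs(List<String> groupingTypes, List<AggFn> aggFns) {
    List<String> outputTypes = new ArrayList<>(groupingTypes);
    aggFns.forEach(f -> outputTypes.add(f.outputType()));
    TupleDesc output = new TupleDesc("output", outputTypes);
    if (!requiresIntermediateTuple(aggFns)) {
      // No type differs: the intermediate tuple is the output tuple itself.
      return new Descs(output, output);
    }
    List<String> intermediateTypes = new ArrayList<>(groupingTypes);
    aggFns.forEach(f -> intermediateTypes.add(f.intermediateType()));
    return new Descs(new TupleDesc("intermediate", intermediateTypes), output);
  }

  public static void main(String[] args) {
    Descs d = createTupleDescs(List.of("STRING"), List.of(new AggFn("COUNT", "BIGINT", "BIGINT")));
    System.out.println(d.intermediate() == d.output() ? "shared" : "separate");
  }
}
```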

String com.cloudera.impala.analysis.AggregateInfo.debugString ( )

inline

Definition at line 645 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.aggPhase_, com.cloudera.impala.analysis.AggregateInfo.mergeAggInfo_, and com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.

Referenced by com.cloudera.impala.planner.AggregationNode.debugString().

AggPhase com.cloudera.impala.analysis.AggregateInfo.getAggPhase ( )

inline

Definition at line 229 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.aggPhase_.

ArrayList<FunctionCallExpr> com.cloudera.impala.analysis.AggregateInfoBase.getAggregateExprs ( )

inline, inherited

Definition at line 157 of file AggregateInfoBase.java.

References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_.

Referenced by com.cloudera.impala.analysis.SelectStmt.analyze(), com.cloudera.impala.analysis.AnalyzerTest.checkSelectToThrift(), com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.planner.AggregationNode.getNodeExplainString(), and com.cloudera.impala.analysis.AggregateInfo.hasAggregateExprs().

ArrayList<Expr> com.cloudera.impala.analysis.AggregateInfoBase.getGroupingExprs ( )

inline, inherited

Definition at line 156 of file AggregateInfoBase.java.

References com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_.

Referenced by com.cloudera.impala.analysis.AnalyzerTest.checkSelectToThrift(), com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo(), and com.cloudera.impala.planner.AggregationNode.getNodeExplainString().

ExprSubstitutionMap com.cloudera.impala.analysis.AggregateInfo.getIntermediateSmap ( )

inline

Definition at line 232 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.intermediateTupleSmap_.

TupleDescriptor com.cloudera.impala.analysis.AggregateInfoBase.getIntermediateTupleDesc ( )

inline, inherited

Definition at line 159 of file AggregateInfoBase.java.

References com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_.

TupleId com.cloudera.impala.analysis.AggregateInfoBase.getIntermediateTupleId ( )

inline, inherited

Definition at line 160 of file AggregateInfoBase.java.

Referenced by com.cloudera.impala.planner.AggregationNode.toThrift().

ArrayList<FunctionCallExpr> com.cloudera.impala.analysis.AggregateInfo.getMaterializedAggregateExprs ( )

inline

Definition at line 252 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.materializedSlots_.

Referenced by com.cloudera.impala.planner.AggregationNode.toThrift().

AggregateInfo com.cloudera.impala.analysis.AggregateInfo.getMergeAggInfo ( )

inline

Definition at line 225 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.mergeAggInfo_.

ExprSubstitutionMap com.cloudera.impala.analysis.AggregateInfo.getOutputSmap ( )

inline

Definition at line 233 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.outputTupleSmap_.

ExprSubstitutionMap com.cloudera.impala.analysis.AggregateInfo.getOutputToIntermediateSmap ( )

inline

Definition at line 234 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.outputToIntermediateTupleSmap_.

TupleDescriptor com.cloudera.impala.analysis.AggregateInfoBase.getOutputTupleDesc ( )

inline, inherited

Definition at line 158 of file AggregateInfoBase.java.

References com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_.

Referenced by com.cloudera.impala.planner.AnalyticPlanner.collectWindowGroups().

TupleId com.cloudera.impala.analysis.AggregateInfoBase.getOutputTupleId ( )

inline, inherited

Definition at line 161 of file AggregateInfoBase.java.

Referenced by com.cloudera.impala.planner.AggregationNode.AggregationNode(), com.cloudera.impala.planner.AnalyticPlanner.AnalyticPlanner(), com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(), com.cloudera.impala.analysis.AggregateInfo.getResultTupleId(), com.cloudera.impala.planner.AggregationNode.setIntermediateTuple(), and com.cloudera.impala.planner.AggregationNode.toThrift().

DataPartition com.cloudera.impala.analysis.AggregateInfo.getPartition ( )

inline

Returns the DataPartition derived from the grouping exprs, or an unpartitioned spec if there are no grouping exprs.

TODO: this won't work when we start supporting range partitions, because we could derive both hash and order-based partitions.

Definition at line 636 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, and com.cloudera.impala.planner.DataPartition.UNPARTITIONED.
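A sketch of the derivation, assuming hash partitioning on the grouping exprs (the TODO implies the current spec is hash-based). PartitionSketch, Kind and Partition are hypothetical stand-ins for DataPartition, not its API.

```java
import java.util.List;

final class PartitionSketch {
  // Hypothetical stand-in for DataPartition, limited to the two cases described.
  enum Kind { UNPARTITIONED, HASH_PARTITIONED }
  record Partition(Kind kind, List<String> exprs) {}

  static Partition getPartition(List<String> groupingExprs) {
    if (groupingExprs.isEmpty()) {
      // No grouping: everything is aggregated in a single stream.
      return new Partition(Kind.UNPARTITIONED, List.of());
    }
    // Rows with equal grouping values must reach the same aggregation instance.
    return new Partition(Kind.HASH_PARTITIONED, List.copyOf(groupingExprs));
  }

  public static void main(String[] args) {
    System.out.println(getPartition(List.of("t.a", "t.b")));
    System.out.println(getPartition(List.of()));
  }
}
```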

List<Expr> com.cloudera.impala.analysis.AggregateInfo.getPartitionExprs ( )

inline

Definition at line 110 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.partitionExprs_.

void com.cloudera.impala.analysis.AggregateInfo.getRefdSlots ( List< SlotId > ids )

inline

Append ids of all slots that are being referenced in the process of performing the aggregate computation described by this AggregateInfo.

Definition at line 264 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, and com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_.

TupleId com.cloudera.impala.analysis.AggregateInfo.getResultTupleId ( )

inline

Return the tuple id produced in the final aggregation step.

Definition at line 247 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.getOutputTupleId(), and com.cloudera.impala.analysis.AggregateInfo.isDistinctAgg().
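Given the tree structure in the detailed description, the result tuple comes from the second phase when one exists. This sketch assumes that isDistinctAgg() amounts to a non-null secondPhaseDistinctAggInfo_, which the References list suggests but does not state; AggInfoNode is a hypothetical record with integer tuple ids.

```java
final class ResultTupleSketch {
  // Hypothetical node: its own output tuple id and, for DISTINCT aggregation,
  // the second-phase info (null otherwise).
  record AggInfoNode(int outputTupleId, AggInfoNode secondPhaseDistinctAggInfo) {
    boolean isDistinctAgg() {
      return secondPhaseDistinctAggInfo != null;
    }

    // The final aggregation step is the second phase when one exists.
    int getResultTupleId() {
      return isDistinctAgg() ? secondPhaseDistinctAggInfo.outputTupleId() : outputTupleId;
    }
  }

  public static void main(String[] args) {
    AggInfoNode plain = new AggInfoNode(1, null);
    AggInfoNode distinct = new AggInfoNode(1, new AggInfoNode(2, null));
    System.out.println(plain.getResultTupleId() + " " + distinct.getResultTupleId());
  }
}
```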

AggregateInfo com.cloudera.impala.analysis.AggregateInfo.getSecondPhaseDistinctAggInfo ( )

inline

Definition at line 226 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.

Referenced by com.cloudera.impala.analysis.SelectStmt.analyzeAggregation().

boolean com.cloudera.impala.analysis.AggregateInfo.hasAggregateExprs ( )

inline

Definition at line 238 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.getAggregateExprs(), and com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.

boolean com.cloudera.impala.analysis.AggregateInfo.isDistinctAgg ( )

inline

Definition at line 231 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.

Referenced by com.cloudera.impala.planner.SingleNodePlanner.createAggregationPlan(), com.cloudera.impala.analysis.AggregateInfo.getResultTupleId(), and com.cloudera.impala.analysis.AggregateInfo.materializeRequiredSlots().

boolean com.cloudera.impala.analysis.AggregateInfo.isMerge ( )

inline

Definition at line 230 of file AggregateInfo.java.

void com.cloudera.impala.analysis.AggregateInfo.materializeRequiredSlots ( Analyzer analyzer, ExprSubstitutionMap smap )

inline

Marks slots required for this aggregation as materialized:

- all grouping output slots as well as grouping exprs
- for non-distinct aggregation: the aggregate exprs of materialized aggregate slots; this assumes that the output slots corresponding to aggregate exprs have already been marked by the consumer of this select block
- for distinct aggregation: all aggregate output slots, in order to keep things simple

Also computes materializedAggregateExprs. This call must be idempotent because it may be called more than once for a Union stmt.

Definition at line 556 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, com.cloudera.impala.analysis.AggregateInfo.isDistinctAgg(), and com.cloudera.impala.analysis.SlotDescriptor.isMaterialized().
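The idempotence requirement is easiest to see on the list of materialized aggregate slots. In this hypothetical sketch, grouping slots are not modeled; the method clears the list before recomputing it, so a second call (as for a Union stmt) produces the same indices. materializeAggSlots() and the boolean-array representation are assumptions, not Impala's API.

```java
import java.util.ArrayList;
import java.util.List;

final class MaterializeSketch {
  // aggSlotMaterialized[i]: whether a consumer already marked the output slot of
  // aggregate expr i. Grouping slots are not modeled in this sketch.
  static void materializeAggSlots(
      boolean[] aggSlotMaterialized, boolean isDistinctAgg, List<Integer> materializedSlots) {
    materializedSlots.clear(); // recompute from scratch so repeated calls agree
    for (int i = 0; i < aggSlotMaterialized.length; ++i) {
      if (isDistinctAgg) aggSlotMaterialized[i] = true; // DISTINCT: mark every agg slot
      if (aggSlotMaterialized[i]) materializedSlots.add(i);
    }
  }

  public static void main(String[] args) {
    boolean[] flags = {true, false, true};
    List<Integer> slots = new ArrayList<>();
    materializeAggSlots(flags, false, slots);
    materializeAggSlots(flags, false, slots); // second call: same result
    System.out.println(slots);
  }
}
```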

boolean com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple ( )

inline, inherited

Definition at line 162 of file AggregateInfoBase.java.

References com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, and com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_.

Referenced by com.cloudera.impala.analysis.AggregateInfo.createSmaps(), and com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs().

static <T extends Expr> boolean com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple ( List< T > aggExprs )

inline, static, inherited

Returns true if evaluating the given aggregate exprs requires an intermediate tuple, i.e., whether one of the aggregate functions has an intermediate type different from its output type.

Definition at line 173 of file AggregateInfoBase.java.

void com.cloudera.impala.analysis.AggregateInfo.setPartitionExprs ( List< Expr > exprs )

inline

Definition at line 111 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.partitionExprs_.

void com.cloudera.impala.analysis.AggregateInfo.substitute ( ExprSubstitutionMap smap, Analyzer analyzer ) throws InternalException

inline

Substitutes all the expressions (grouping exprs, aggregate exprs) and updates our substitution map according to the given substitution map:

- smap typically maps from tuple t1 to tuple t2 (example: the smap of an inline view maps the virtual table ref t1 into a base table ref t2)
- our grouping and aggregate exprs need to be substituted with the given smap so that they also reference t2
- aggTupleSMap needs to be recomputed to map exprs based on t2 onto our aggTupleDesc (i.e., the left-hand side needs to be substituted with smap)
- mergeAggInfo: this is not affected, because
  - its grouping and aggregate exprs only reference aggTupleDesc_
  - its smap is identical to aggTupleSMap_
- 2ndPhaseDistinctAggInfo:
  - its grouping and aggregate exprs also only reference aggTupleDesc_ and are therefore not affected
  - its smap needs to be recomputed to map exprs based on t2 to its own aggTupleDesc
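The effect on aggTupleSMap can be sketched with a toy expression tree. Everything here (Expr, Ref, Call, substitute, substituteLhs) is hypothetical and much simpler than Impala's Expr and ExprSubstitutionMap classes; it only models the rule that the left-hand side of the agg smap is rewritten through the given smap while the right-hand side, which points into aggTupleDesc, is left alone.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SmapSubstitution {
  // Minimal expression tree: a column/slot reference or a function call.
  sealed interface Expr permits Ref, Call {}
  record Ref(String name) implements Expr {}
  record Call(String fn, List<Expr> args) implements Expr {}

  // Replaces every (sub)expression that is a key of smap by its mapped value.
  static Expr substitute(Expr e, Map<Expr, Expr> smap) {
    Expr mapped = smap.get(e);
    if (mapped != null) return mapped;
    if (e instanceof Call c) {
      List<Expr> args = c.args().stream().map(a -> substitute(a, smap)).toList();
      return new Call(c.fn(), args);
    }
    return e;
  }

  // Rewrites only the left-hand side of aggSmap through viewSmap; targets stay unchanged.
  static Map<Expr, Expr> substituteLhs(Map<Expr, Expr> aggSmap, Map<Expr, Expr> viewSmap) {
    Map<Expr, Expr> result = new LinkedHashMap<>();
    aggSmap.forEach((lhs, rhs) -> result.put(substitute(lhs, viewSmap), rhs));
    return result;
  }

  public static void main(String[] args) {
    Expr t1x = new Ref("t1.x"), t1z = new Ref("t1.z");
    // Inline-view smap: virtual table ref t1 onto base table ref t2.
    Map<Expr, Expr> viewSmap = Map.of(t1x, new Ref("t2.y"), t1z, new Ref("t2.w"));
    Map<Expr, Expr> aggSmap = new LinkedHashMap<>();
    aggSmap.put(t1x, new Ref("agg.slot0"));
    aggSmap.put(new Call("count", List.of(t1z)), new Ref("agg.slot1"));
    System.out.println(substituteLhs(aggSmap, viewSmap));
  }
}
```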