Impala
Impala is the open source, native analytic database for Apache Hadoop.
com.cloudera.impala.analysis.AggregateInfo Class Reference

Classes

enum  AggPhase
 

Public Member Functions

List< Expr > getPartitionExprs ()
 
void setPartitionExprs (List< Expr > exprs)
 
AggregateInfo getMergeAggInfo ()
 
AggregateInfo getSecondPhaseDistinctAggInfo ()
 
AggPhase getAggPhase ()
 
boolean isMerge ()
 
boolean isDistinctAgg ()
 
ExprSubstitutionMap getIntermediateSmap ()
 
ExprSubstitutionMap getOutputSmap ()
 
ExprSubstitutionMap getOutputToIntermediateSmap ()
 
boolean hasAggregateExprs ()
 
TupleId getResultTupleId ()
 
ArrayList< FunctionCallExpr > getMaterializedAggregateExprs ()
 
void getRefdSlots (List< SlotId > ids)
 
void substitute (ExprSubstitutionMap smap, Analyzer analyzer) throws InternalException
 
void createSmaps (Analyzer analyzer)
 
void materializeRequiredSlots (Analyzer analyzer, ExprSubstitutionMap smap)
 
void checkConsistency ()
 
DataPartition getPartition ()
 
String debugString ()
 
ArrayList< Expr > getGroupingExprs ()
 
ArrayList< FunctionCallExpr > getAggregateExprs ()
 
TupleDescriptor getOutputTupleDesc ()
 
TupleDescriptor getIntermediateTupleDesc ()
 
TupleId getIntermediateTupleId ()
 
TupleId getOutputTupleId ()
 
boolean requiresIntermediateTuple ()
 

Static Public Member Functions

static AggregateInfo create (ArrayList< Expr > groupingExprs, ArrayList< FunctionCallExpr > aggExprs, TupleDescriptor tupleDesc, Analyzer analyzer) throws AnalysisException
 
static< T extends Expr > boolean requiresIntermediateTuple (List< T > aggExprs)
 

Protected Member Functions

String tupleDebugName ()
 
void createTupleDescs (Analyzer analyzer)
 

Protected Attributes

ExprSubstitutionMap intermediateTupleSmap_ = new ExprSubstitutionMap()
 
ExprSubstitutionMap outputTupleSmap_ = new ExprSubstitutionMap()
 
final ExprSubstitutionMap outputToIntermediateTupleSmap_
 
ArrayList< Expr > groupingExprs_
 
ArrayList< FunctionCallExpr > aggregateExprs_
 
TupleDescriptor intermediateTupleDesc_
 
TupleDescriptor outputTupleDesc_
 
ArrayList< Integer > materializedSlots_ = Lists.newArrayList()
 

Private Member Functions

 AggregateInfo (ArrayList< Expr > groupingExprs, ArrayList< FunctionCallExpr > aggExprs, AggPhase aggPhase)
 
void createDistinctAggInfo (ArrayList< Expr > origGroupingExprs, ArrayList< FunctionCallExpr > distinctAggExprs, Analyzer analyzer) throws AnalysisException
 
void createMergeAggInfo (Analyzer analyzer)
 
Expr createCountDistinctAggExprParam (int firstIdx, int lastIdx, ArrayList< SlotDescriptor > slots)
 
void createSecondPhaseAggInfo (ArrayList< Expr > origGroupingExprs, ArrayList< FunctionCallExpr > distinctAggExprs, Analyzer analyzer) throws AnalysisException
 
void createSecondPhaseAggSMap (AggregateInfo inputAggInfo, ArrayList< FunctionCallExpr > distinctAggExprs)
 

Private Attributes

AggregateInfo mergeAggInfo_
 
AggregateInfo secondPhaseDistinctAggInfo_
 
final AggPhase aggPhase_
 
List< Expr > partitionExprs_
 

Static Private Attributes

static final Logger LOG = LoggerFactory.getLogger(AggregateInfo.class)
 

Detailed Description

Encapsulates all the information needed to compute the aggregate functions of a single Select block, including a possible 2nd phase aggregation step for DISTINCT aggregate functions and merge aggregation steps needed for distributed execution.

The latter requires a tree structure of AggregateInfo objects which express the original aggregate computations as well as the necessary merging aggregate computations.

TODO: get rid of this by transforming SELECT COUNT(DISTINCT a, b, ..) GROUP BY x, y, ... into an equivalent query with an inline view: SELECT COUNT(*) FROM (SELECT DISTINCT a, b, ..., x, y, ...) GROUP BY x, y, ...

The tree structure looks as follows:

  • for non-distinct aggregation:
    • aggInfo: contains the original aggregation functions and grouping exprs
    • aggInfo.mergeAggInfo: contains the merging aggregation functions (grouping exprs are identical)
  • for distinct aggregation (for an explanation of the phases, see SelectStmt.createDistinctAggInfo()):
    • aggInfo: contains the phase 1 aggregate functions and grouping exprs
    • aggInfo.2ndPhaseDistinctAggInfo: contains the phase 2 aggregate functions and grouping exprs
    • aggInfo.mergeAggInfo: contains the merging aggregate functions for the phase 1 computation (grouping exprs are identical)
    • aggInfo.2ndPhaseDistinctAggInfo.mergeAggInfo: contains the merging aggregate functions for the phase 2 computation (grouping exprs are identical)

In general, merging aggregate computations are idempotent; in other words, aggInfo.mergeAggInfo == aggInfo.mergeAggInfo.mergeAggInfo.
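As a rough illustration of the tree described above, the following sketch models only the links between the objects. The class AggInfoNode, its fields and its labels are hypothetical stand-ins, not Impala's actual types, and making a merge node its own merge node is just one assumed way to obtain the idempotence property.

```java
/** Hypothetical, simplified stand-in for AggregateInfo: models only the tree links. */
final class AggInfoNode {
  final String label;
  AggInfoNode mergeAggInfo;               // merging step for distributed execution
  AggInfoNode secondPhaseDistinctAggInfo; // stays null for non-distinct aggregation

  AggInfoNode(String label) { this.label = label; }

  /** Attaches a merge node that is its own merge node (assumed way to get idempotence). */
  AggInfoNode withMerge() {
    AggInfoNode merge = new AggInfoNode(label + ".merge");
    merge.mergeAggInfo = merge;
    this.mergeAggInfo = merge;
    return this;
  }

  /** Tree shape for a non-distinct aggregation: aggInfo plus its merge step. */
  static AggInfoNode nonDistinct() {
    return new AggInfoNode("aggInfo").withMerge();
  }

  /** Tree shape for a distinct aggregation: phase 1, phase 2, and one merge step each. */
  static AggInfoNode distinct() {
    AggInfoNode phase1 = new AggInfoNode("phase1").withMerge();
    phase1.secondPhaseDistinctAggInfo = new AggInfoNode("phase2").withMerge();
    return phase1;
  }

  public static void main(String[] args) {
    AggInfoNode agg = distinct();
    // Merging a merge step yields the same merge step.
    System.out.println(agg.mergeAggInfo == agg.mergeAggInfo.mergeAggInfo);
    System.out.println(agg.secondPhaseDistinctAggInfo.mergeAggInfo.label);
  }
}
```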

TODO: move the merge construction logic from SelectStmt into AggregateInfo
TODO: Add query tests for aggregation with intermediate tuples with num_nodes=1.

Definition at line 66 of file AggregateInfo.java.

Constructor & Destructor Documentation

com.cloudera.impala.analysis.AggregateInfo.AggregateInfo ( ArrayList< Expr > groupingExprs,
ArrayList< FunctionCallExpr > aggExprs,
AggPhase aggPhase
)
inline private

Member Function Documentation

void com.cloudera.impala.analysis.AggregateInfo.checkConsistency ( )
inline

Validates the internal state of this agg info: Checks that the number of materialized slots of the output tuple corresponds to the number of materialized aggregate functions plus the number of grouping exprs. Also checks that the return types of the aggregate and grouping exprs correspond to the slots in the output tuple.
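The kind of check described here can be sketched as follows. This is not the actual implementation: types are represented as plain strings, the class and method names are hypothetical, and the slot layout (grouping slots first, then aggregate slots) is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;

final class ConsistencyCheckSketch {
  /**
   * Throws IllegalStateException unless the materialized output slot types are
   * exactly the grouping expr types followed by the materialized aggregate
   * expr types. The "grouping first" layout is an assumption of this sketch.
   */
  static void check(List<String> groupingTypes, List<String> aggTypes,
      List<String> slotTypes) {
    int expected = groupingTypes.size() + aggTypes.size();
    if (slotTypes.size() != expected) {
      throw new IllegalStateException(
          "expected " + expected + " materialized slots, found " + slotTypes.size());
    }
    for (int i = 0; i < expected; i++) {
      String exprType = i < groupingTypes.size()
          ? groupingTypes.get(i)
          : aggTypes.get(i - groupingTypes.size());
      if (!exprType.equals(slotTypes.get(i))) {
        throw new IllegalStateException(
            "slot " + i + " has type " + slotTypes.get(i) + " but expr returns " + exprType);
      }
    }
  }

  public static void main(String[] args) {
    // One grouping expr (INT) and one aggregate (BIGINT): two matching slots pass.
    check(Arrays.asList("INT"), Arrays.asList("BIGINT"), Arrays.asList("INT", "BIGINT"));
    System.out.println("consistent");
  }
}
```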

Definition at line 596 of file AggregateInfo.java.

References com.cloudera.impala.analysis.Expr.getType(), and com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_.

static AggregateInfo com.cloudera.impala.analysis.AggregateInfo.create ( ArrayList< Expr > groupingExprs,
ArrayList< FunctionCallExpr > aggExprs,
TupleDescriptor tupleDesc,
Analyzer analyzer
) throws AnalysisException
inline static

Creates complete AggregateInfo for groupingExprs and aggExprs, including aggTupleDesc and aggTupleSMap. If parameter tupleDesc != null, sets aggTupleDesc to that instead of creating a new descriptor (after verifying that the passed-in descriptor is correct for the given aggregation). Also creates mergeAggInfo and secondPhaseDistinctAggInfo, if needed. If an aggTupleDesc is created, also registers eq predicates between the grouping exprs and their respective slots with 'analyzer'.

Definition at line 122 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.AggregateInfo(), and com.cloudera.impala.analysis.AggregateInfo.AggPhase.FIRST.

Expr com.cloudera.impala.analysis.AggregateInfo.createCountDistinctAggExprParam ( int firstIdx,
int lastIdx,
ArrayList< SlotDescriptor > slots
)
inline private

Creates an IF function call that returns NULL if any of the slots at indexes [firstIdx, lastIdx] return NULL. For example, the resulting IF function would look like this for 3 slots: IF(IsNull(slot1), NULL, IF(IsNull(slot2), NULL, slot3)). Returns null if firstIdx is greater than lastIdx. Returns a SlotRef to the last slot if there is only one slot in range.
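The nesting rule can be illustrated with a small sketch that builds the expression as text. It works on slot names rather than SlotDescriptor and Expr objects, and the class and method names are hypothetical; only the shape of the result follows the description above.

```java
import java.util.Arrays;
import java.util.List;

final class NullIfAnyNullSketch {
  /**
   * Builds the nested IF expression (as text) for slots[firstIdx..lastIdx].
   * Returns null for an empty range and the bare slot name for a single slot.
   */
  static String build(int firstIdx, int lastIdx, List<String> slots) {
    if (firstIdx > lastIdx) return null;
    // Start from the innermost operand (the last slot) and wrap outward.
    String expr = slots.get(lastIdx);
    for (int i = lastIdx - 1; i >= firstIdx; i--) {
      expr = "IF(IsNull(" + slots.get(i) + "), NULL, " + expr + ")";
    }
    return expr;
  }

  public static void main(String[] args) {
    List<String> slots = Arrays.asList("slot1", "slot2", "slot3");
    // prints IF(IsNull(slot1), NULL, IF(IsNull(slot2), NULL, slot3))
    System.out.println(build(0, 2, slots));
  }
}
```

The last slot needs no IsNull wrapper of its own: if it is NULL, the innermost branch already evaluates to NULL.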

Definition at line 370 of file AggregateInfo.java.

Referenced by com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo().

void com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo ( ArrayList< Expr > origGroupingExprs,
ArrayList< FunctionCallExpr > distinctAggExprs,
Analyzer analyzer
) throws AnalysisException
inline private

Create aggregate info for select block containing aggregate exprs with DISTINCT clause. This creates:

  • aggTupleDesc
  • a complete secondPhaseDistinctAggInfo
  • mergeAggInfo

At the moment, we require that all distinct aggregate functions be applied to the same set of exprs (i.e., we can't do something like SELECT COUNT(DISTINCT id), COUNT(DISTINCT address)). Aggregation happens in two successive phases:

  • the first phase aggregates by all grouping exprs plus all parameter exprs of DISTINCT aggregate functions
  • the second phase aggregates the output of the first phase by the original grouping exprs only

Example: SELECT a, COUNT(DISTINCT b, c), MIN(d), COUNT(*) FROM T GROUP BY a

  • 1st phase grouping exprs: a, b, c
  • 1st phase agg exprs: MIN(d), COUNT(*)
  • 2nd phase grouping exprs: a
  • 2nd phase agg exprs: COUNT(*), MIN(<MIN(d) from 1st phase>), SUM(<COUNT(*) from 1st phase>)
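To make the example concrete, the two phases can be simulated over in-memory rows. This is only a sketch of the rewrite's semantics, not of Impala's execution: rows are int arrays {a, b, c, d}, all values are assumed non-NULL, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TwoPhaseDistinctSketch {
  /** Phase 1: group by (a, b, c); value is {MIN(d), COUNT(*)} per group. */
  static Map<List<Integer>, long[]> phase1(List<int[]> rows) {
    Map<List<Integer>, long[]> groups = new LinkedHashMap<>();
    for (int[] r : rows) {
      List<Integer> key = Arrays.asList(r[0], r[1], r[2]);
      long[] acc = groups.computeIfAbsent(key, k -> new long[] {Long.MAX_VALUE, 0});
      acc[0] = Math.min(acc[0], r[3]);
      acc[1] += 1;
    }
    return groups;
  }

  /** Phase 2: group by a; value is {COUNT(DISTINCT b, c), MIN(d), COUNT(*)}. */
  static Map<Integer, long[]> phase2(Map<List<Integer>, long[]> phase1Out) {
    Map<Integer, long[]> result = new LinkedHashMap<>();
    for (Map.Entry<List<Integer>, long[]> e : phase1Out.entrySet()) {
      long[] acc = result.computeIfAbsent(
          e.getKey().get(0), k -> new long[] {0, Long.MAX_VALUE, 0});
      acc[0] += 1;                                // COUNT(*) over phase-1 groups
      acc[1] = Math.min(acc[1], e.getValue()[0]); // MIN(<MIN(d) from 1st phase>)
      acc[2] += e.getValue()[1];                  // SUM(<COUNT(*) from 1st phase>)
    }
    return result;
  }

  public static void main(String[] args) {
    List<int[]> rows = Arrays.asList(
        new int[] {1, 10, 100, 5}, new int[] {1, 10, 100, 3},
        new int[] {1, 11, 100, 7}, new int[] {2, 10, 100, 9});
    phase2(phase1(rows)).forEach(
        (a, r) -> System.out.println("a=" + a + " " + Arrays.toString(r)));
  }
}
```

Each phase-1 group is one distinct (b, c) combination within a value of a, which is why a plain COUNT(*) in phase 2 yields the distinct count.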

TODO: expand implementation to cover the general case; this will require a different execution strategy

Definition at line 188 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo(), com.cloudera.impala.analysis.AggregateInfo.createSmaps(), com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs(), and com.cloudera.impala.analysis.Expr.equalLists().

void com.cloudera.impala.analysis.AggregateInfo.createMergeAggInfo ( Analyzer analyzer)
inline private

void com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggInfo ( ArrayList< Expr > origGroupingExprs,
ArrayList< FunctionCallExpr > distinctAggExprs,
Analyzer analyzer
) throws AnalysisException
inline private

Create the info for an aggregation node that computes the second phase of DISTINCT aggregate functions. (Refer to createDistinctAggInfo() for an explanation of the phases.)

  • 'this' is the phase 1 aggregation
  • grouping exprs are those of the original query (param origGroupingExprs)
  • aggregate exprs for the DISTINCT agg fns: these are aggregating the grouping slots that were added to the original grouping slots in phase 1; count is mapped to count(*) and sum is mapped to sum
  • other aggregate exprs: same as the non-DISTINCT merge case (count is mapped to sum, everything else stays the same)

This call also creates the tuple descriptor and smap for the returned AggregateInfo.
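The mapping used for the non-DISTINCT merge case (count becomes sum, everything else stays the same) can be sketched as below. Function names are plain strings here, only count, sum, min and max are handled, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

final class MergeFnSketch {
  /** Function used to merge partial results: count becomes sum, others are unchanged. */
  static String mergeFnFor(String fnName) {
    String lower = fnName.toLowerCase(Locale.ROOT);
    return lower.equals("count") ? "sum" : lower;
  }

  /** Merges per-node partial results of the given aggregate function. */
  static long merge(String fnName, long[] partials) {
    String fn = mergeFnFor(fnName);
    if (!Arrays.asList("sum", "min", "max").contains(fn) || partials.length == 0) {
      throw new IllegalArgumentException("unsupported input for " + fnName);
    }
    long acc = partials[0];
    for (int i = 1; i < partials.length; i++) {
      if (fn.equals("sum")) acc += partials[i];
      else if (fn.equals("min")) acc = Math.min(acc, partials[i]);
      else acc = Math.max(acc, partials[i]);
    }
    return acc;
  }

  public static void main(String[] args) {
    // Two nodes counted 3 and 4 rows; the merged count is their sum.
    System.out.println(merge("count", new long[] {3, 4}));
  }
}
```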

Definition at line 404 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfo.AggregateInfo(), com.cloudera.impala.analysis.AggregateInfo.createCountDistinctAggExprParam(), com.cloudera.impala.analysis.AggregateInfoBase.getGroupingExprs(), com.cloudera.impala.analysis.TupleDescriptor.getSlots(), com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, com.cloudera.impala.analysis.AggregateInfo.intermediateTupleSmap_, com.cloudera.impala.analysis.AggregateInfo.AggPhase.SECOND, and com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.

Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().

void com.cloudera.impala.analysis.AggregateInfo.createSecondPhaseAggSMap ( AggregateInfo inputAggInfo,
ArrayList< FunctionCallExpr > distinctAggExprs
)
inline private

Create smap to map original grouping and aggregate exprs onto output of secondPhaseDistinctAggInfo.

Definition at line 475 of file AggregateInfo.java.

void com.cloudera.impala.analysis.AggregateInfo.createSmaps ( Analyzer analyzer)
inline

void com.cloudera.impala.analysis.AggregateInfoBase.createTupleDescs ( Analyzer analyzer)
inline protected inherited

Creates the intermediate and output tuple descriptors. If no agg expr has an intermediate type different from its output type, then only the output tuple descriptor is created and the intermediate tuple is set to the output tuple.

Definition at line 70 of file AggregateInfoBase.java.

References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfoBase.createTupleDesc(), com.cloudera.impala.analysis.AggregateInfoBase.intermediateTupleDesc_, com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_, and com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple().

Referenced by com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo().

AggPhase com.cloudera.impala.analysis.AggregateInfo.getAggPhase ( )
inline

ExprSubstitutionMap com.cloudera.impala.analysis.AggregateInfo.getIntermediateSmap ( )
inline

TupleDescriptor com.cloudera.impala.analysis.AggregateInfoBase.getIntermediateTupleDesc ( )
inline inherited

TupleId com.cloudera.impala.analysis.AggregateInfoBase.getIntermediateTupleId ( )
inline inherited

ArrayList<FunctionCallExpr> com.cloudera.impala.analysis.AggregateInfo.getMaterializedAggregateExprs ( )
inline

AggregateInfo com.cloudera.impala.analysis.AggregateInfo.getMergeAggInfo ( )
inline

ExprSubstitutionMap com.cloudera.impala.analysis.AggregateInfo.getOutputSmap ( )
inline

ExprSubstitutionMap com.cloudera.impala.analysis.AggregateInfo.getOutputToIntermediateSmap ( )
inline

TupleDescriptor com.cloudera.impala.analysis.AggregateInfoBase.getOutputTupleDesc ( )
inline inherited

DataPartition com.cloudera.impala.analysis.AggregateInfo.getPartition ( )
inline

Returns DataPartition derived from grouping exprs. Returns unpartitioned spec if no grouping.

TODO: this won't work when we start supporting range partitions, because we could derive both hash and order-based partitions
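A minimal sketch of the rule for deriving the partition follows. It returns a textual description instead of a DataPartition object, the class and method names are hypothetical, and the use of hash partitioning on the grouping exprs is an assumption suggested by the TODO rather than something stated outright.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PartitionSketch {
  /** Describes the partition implied by the grouping exprs (assumed hash-based). */
  static String partitionFor(List<String> groupingExprs) {
    if (groupingExprs.isEmpty()) return "UNPARTITIONED";
    return "HASH_PARTITIONED: " + String.join(", ", groupingExprs);
  }

  public static void main(String[] args) {
    System.out.println(partitionFor(Arrays.asList("x", "y")));
    System.out.println(partitionFor(Collections.<String>emptyList()));
  }
}
```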

Definition at line 636 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, and com.cloudera.impala.planner.DataPartition.UNPARTITIONED.

List<Expr> com.cloudera.impala.analysis.AggregateInfo.getPartitionExprs ( )
inline

void com.cloudera.impala.analysis.AggregateInfo.getRefdSlots ( List< SlotId > ids)
inline

Append ids of all slots that are being referenced in the process of performing the aggregate computation described by this AggregateInfo.

Definition at line 264 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, and com.cloudera.impala.analysis.AggregateInfoBase.outputTupleDesc_.

TupleId com.cloudera.impala.analysis.AggregateInfo.getResultTupleId ( )
inline

Return the tuple id produced in the final aggregation step.

Definition at line 247 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.getOutputTupleId(), and com.cloudera.impala.analysis.AggregateInfo.isDistinctAgg().

AggregateInfo com.cloudera.impala.analysis.AggregateInfo.getSecondPhaseDistinctAggInfo ( )
inline

boolean com.cloudera.impala.analysis.AggregateInfo.hasAggregateExprs ( )
inline

boolean com.cloudera.impala.analysis.AggregateInfo.isMerge ( )
inline

Definition at line 230 of file AggregateInfo.java.

void com.cloudera.impala.analysis.AggregateInfo.materializeRequiredSlots ( Analyzer  analyzer,
ExprSubstitutionMap  smap 
)
inline

Mark slots required for this aggregation as materialized:

  • all grouping output slots as well as grouping exprs
  • for non-distinct aggregation: the aggregate exprs of materialized aggregate slots; this assumes that the output slots corresponding to aggregate exprs have already been marked by the consumer of this select block
  • for distinct aggregation, we mark all aggregate output slots in order to keep things simple

Also computes materializedAggregateExprs. This call must be idempotent because it may be called more than once for Union stmt.

Definition at line 556 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, com.cloudera.impala.analysis.AggregateInfo.isDistinctAgg(), and com.cloudera.impala.analysis.SlotDescriptor.isMaterialized().

static <T extends Expr> boolean com.cloudera.impala.analysis.AggregateInfoBase.requiresIntermediateTuple ( List< T > aggExprs)
inline static inherited

Returns true if evaluating the given aggregate exprs requires an intermediate tuple, i.e., whether one of the aggregate functions has an intermediate type different from its output type.
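The test described here amounts to a scan over the aggregate functions. The sketch below uses a hypothetical AggFn descriptor with string type names; the concrete types shown for avg are illustrative only and are not taken from Impala's function catalog.

```java
import java.util.Arrays;
import java.util.List;

final class IntermediateTupleSketch {
  /** Hypothetical aggregate function descriptor; type names are illustrative. */
  static final class AggFn {
    final String name;
    final String intermediateType;
    final String outputType;

    AggFn(String name, String intermediateType, String outputType) {
      this.name = name;
      this.intermediateType = intermediateType;
      this.outputType = outputType;
    }
  }

  /** True if any function's intermediate type differs from its output type. */
  static boolean requiresIntermediateTuple(List<AggFn> aggFns) {
    for (AggFn fn : aggFns) {
      if (!fn.intermediateType.equals(fn.outputType)) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    AggFn sum = new AggFn("sum", "BIGINT", "BIGINT");
    AggFn avg = new AggFn("avg", "STRING", "DOUBLE"); // assumed types
    System.out.println(requiresIntermediateTuple(Arrays.asList(sum)));
    System.out.println(requiresIntermediateTuple(Arrays.asList(sum, avg)));
  }
}
```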

Definition at line 173 of file AggregateInfoBase.java.

void com.cloudera.impala.analysis.AggregateInfo.setPartitionExprs ( List< Expr > exprs)
inline

void com.cloudera.impala.analysis.AggregateInfo.substitute ( ExprSubstitutionMap smap,
Analyzer analyzer
) throws InternalException
inline

Substitute all the expressions (grouping expr, aggregate expr) and update our substitution map according to the given substitution map:

  • smap typically maps from tuple t1 to tuple t2 (example: the smap of an inline view maps the virtual table ref t1 into a base table ref t2)
  • our grouping and aggregate exprs need to be substituted with the given smap so that they also reference t2
  • aggTupleSMap needs to be recomputed to map exprs based on t2 onto our aggTupleDesc (i.e., the left-hand side needs to be substituted with smap)
  • mergeAggInfo: this is not affected, because
    • its grouping and aggregate exprs only reference aggTupleDesc_
    • its smap is identical to aggTupleSMap_
  • 2ndPhaseDistinctAggInfo:
    • its grouping and aggregate exprs also only reference aggTupleDesc_ and are therefore not affected
    • its smap needs to be recomputed to map exprs based on t2 to its own aggTupleDesc

Definition at line 295 of file AggregateInfo.java.

References com.cloudera.impala.analysis.AggregateInfoBase.aggregateExprs_, com.cloudera.impala.analysis.AggregateInfoBase.groupingExprs_, and com.cloudera.impala.analysis.AggregateInfo.secondPhaseDistinctAggInfo_.

String com.cloudera.impala.analysis.AggregateInfo.tupleDebugName ( )
inline protected

Definition at line 663 of file AggregateInfo.java.

Member Data Documentation

final Logger com.cloudera.impala.analysis.AggregateInfo.LOG = LoggerFactory.getLogger(AggregateInfo.class)
static private

Definition at line 67 of file AggregateInfo.java.

ArrayList<Integer> com.cloudera.impala.analysis.AggregateInfoBase.materializedSlots_ = Lists.newArrayList()
protected inherited

final ExprSubstitutionMap com.cloudera.impala.analysis.AggregateInfo.outputToIntermediateTupleSmap_
protected
Initial value:
= new ExprSubstitutionMap()

Definition at line 97 of file AggregateInfo.java.

Referenced by com.cloudera.impala.analysis.AggregateInfo.getOutputToIntermediateSmap().

List<Expr> com.cloudera.impala.analysis.AggregateInfo.partitionExprs_
private

The documentation for this class was generated from the following file: