Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros
com.cloudera.impala.planner.HashJoinNode Class Reference
Inheritance diagram for com.cloudera.impala.planner.HashJoinNode:
Collaboration diagram for com.cloudera.impala.planner.HashJoinNode:

Classes

enum  DistributionMode
 

Public Member Functions

 HashJoinNode (PlanNode outer, PlanNode inner, TableRef tblRef, List< BinaryPredicate > eqJoinConjuncts, List< Expr > otherJoinConjuncts)
 
List< BinaryPredicategetEqJoinConjuncts ()
 
JoinOperator getJoinOp ()
 
TableRef getTableRef ()
 
DistributionMode getDistributionMode ()
 
void setDistributionMode (DistributionMode distrMode)
 
void setAddProbeFilters (boolean b)
 
void init (Analyzer analyzer) throws InternalException
 
void computeStats (Analyzer analyzer)
 
void computeCosts (TQueryOptions queryOptions)
 
PlanNodeId getId ()
 
void setId (PlanNodeId id)
 
long getLimit ()
 
boolean hasLimit ()
 
long getPerHostMemCost ()
 
long getCardinality ()
 
int getNumNodes ()
 
float getAvgRowSize ()
 
void setFragment (PlanFragment fragment)
 
PlanFragment getFragment ()
 
List< ExprgetConjuncts ()
 
ExprSubstitutionMap getOutputSmap ()
 
void setOutputSmap (ExprSubstitutionMap smap)
 
Set< ExprIdgetAssignedConjuncts ()
 
void setAssignedConjuncts (Set< ExprId > conjuncts)
 
void setLimit (long limit)
 
void unsetLimit ()
 
ArrayList< TupleIdgetTupleIds ()
 
ArrayList< TupleIdgetTblRefIds ()
 
void setTblRefIds (ArrayList< TupleId > ids)
 
Set< TupleIdgetNullableTupleIds ()
 
void addConjuncts (List< Expr > conjuncts)
 
void transferConjuncts (PlanNode recipient)
 
String getExplainString ()
 
TPlan treeToThrift ()
 
boolean isBlockingNode ()
 
long getInputCardinality ()
 

Static Public Member Functions

static long addCardinalities (long a, long b)
 
static long multiplyCardinalities (long a, long b)
 

Protected Member Functions

String debugString ()
 
void toThrift (TPlanNode msg)
 
String getDisplayLabelDetail ()
 
String getNodeExplainString (String prefix, String detailPrefix, TExplainLevel detailLevel)
 
final String getExplainString (String rootPrefix, String prefix, TExplainLevel detailLevel)
 
String getExplainString (List<?extends Expr > exprs)
 
void setDisplayName (String s)
 
final String getDisplayLabel ()
 
String getOffsetExplainString (String prefix)
 
void assignConjuncts (Analyzer analyzer)
 
ExprSubstitutionMap getCombinedChildSmap ()
 
void createDefaultSmap (Analyzer analyzer)
 
long capAtLimit (long cardinality)
 
void markSlotsMaterialized (Analyzer analyzer, List< Expr > exprs)
 
void computeMemLayout (Analyzer analyzer)
 
double computeSelectivity ()
 
boolean hasValidStats ()
 

Protected Attributes

String displayName_
 
PlanNodeId id_
 
long limit_
 
ArrayList< TupleIdtupleIds_
 
ArrayList< TupleIdtblRefIds_
 
Set< TupleIdnullableTupleIds_ = Sets.newHashSet()
 
List< Exprconjuncts_ = Lists.newArrayList()
 
PlanFragment fragment_
 
ExprSubstitutionMap outputSmap_
 
Set< ExprIdassignedConjuncts_
 
long cardinality_
 
int numNodes_
 
float avgRowSize_
 
long perHostMemCost_ = -1
 

Static Protected Attributes

static final int DEFAULT_BATCH_SIZE = 1024
 

Private Member Functions

long getJoinCardinality (Analyzer analyzer)
 
long getSemiJoinCardinality ()
 
long getNdv (Expr expr)
 
String eqJoinConjunctsDebugString ()
 

Private Attributes

final TableRef tblRef_
 
final JoinOperator joinOp_
 
DistributionMode distrMode_
 
List< BinaryPredicateeqJoinConjuncts_
 
List< ExprotherJoinConjuncts_
 
boolean addProbeFilters_
 

Static Private Attributes

static final Logger LOG = LoggerFactory.getLogger(HashJoinNode.class)
 
static final long DEFAULT_PER_HOST_MEM = 2L * 1024L * 1024L * 1024L
 

Detailed Description

Hash join between left child (outer) and right child (inner). One child must be the plan generated for a table ref. Typically, that is the right child, but due to join inversion (for outer/semi/cross joins) it could also be the left child.

Definition at line 50 of file HashJoinNode.java.

Constructor & Destructor Documentation

Member Function Documentation

static long com.cloudera.impala.planner.PlanNode.addCardinalities ( long  a,
long  b 
)
inlinestaticinherited
void com.cloudera.impala.planner.PlanNode.addConjuncts ( List< Expr conjuncts)
inlineinherited

Definition at line 209 of file PlanNode.java.

void com.cloudera.impala.planner.PlanNode.createDefaultSmap ( Analyzer  analyzer)
inlineprotectedinherited
String com.cloudera.impala.planner.HashJoinNode.debugString ( )
inlineprotected
String com.cloudera.impala.planner.HashJoinNode.eqJoinConjunctsDebugString ( )
inlineprivate
Set<ExprId> com.cloudera.impala.planner.PlanNode.getAssignedConjuncts ( )
inlineinherited
ExprSubstitutionMap com.cloudera.impala.planner.PlanNode.getCombinedChildSmap ( )
inlineprotectedinherited
List<Expr> com.cloudera.impala.planner.PlanNode.getConjuncts ( )
inlineinherited
DistributionMode com.cloudera.impala.planner.HashJoinNode.getDistributionMode ( )
inline
List<BinaryPredicate> com.cloudera.impala.planner.HashJoinNode.getEqJoinConjuncts ( )
inline
final String com.cloudera.impala.planner.PlanNode.getExplainString ( String  rootPrefix,
String  prefix,
TExplainLevel  detailLevel 
)
inlineprotectedinherited
String com.cloudera.impala.planner.PlanNode.getExplainString ( List<?extends Expr exprs)
inlineprotectedinherited

Definition at line 506 of file PlanNode.java.

PlanFragment com.cloudera.impala.planner.PlanNode.getFragment ( )
inlineinherited
long com.cloudera.impala.planner.PlanNode.getInputCardinality ( )
inlineinherited

The input cardinality is the sum of output cardinalities of its children. For scan nodes the input cardinality is the expected number of rows scanned.

Definition at line 570 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.addCardinalities().

long com.cloudera.impala.planner.HashJoinNode.getJoinCardinality ( Analyzer  analyzer)
inlineprivate

Returns the estimated cardinality of an inner or outer join. For a join between child(0) and child(1), we look for join conditions "L.c = R.d" (with L being from child(0) and R from child(1)) and use as the cardinality estimate the maximum of |child(0)| * |R| / NDV(R.d) * |child(1)| / |R| across all suitable join conditions, which simplifies to |child(0)| * |child(1)| / NDV(R.d) The reasoning is that

  • each row in child(0) joins with |R| / NDV(R.d) rows in R
  • each row in R is 'present' in |child(1)| / |R| rows in child(1)

This handles the very frequent case of a fact table/dimension table join (aka foreign key/primary key join) if the primary key is a single column, with possible additional predicates against the dimension table. An example: FROM FactTbl F JOIN Customers C D ON (F.cust_id = C.id) ... WHERE C.region = 'US'

  • if there are 5 regions, the selectivity of "C.region = 'US'" would be 0.2 and the output cardinality of the Customers scan would be 0.2 * # rows in Customers
  • # rows in Customers == # of distinct values for Customers.id
  • the output cardinality of the join would be F.cardinality * 0.2

Definition at line 222 of file HashJoinNode.java.

References com.cloudera.impala.planner.PlanNode.cardinality_, com.cloudera.impala.planner.HashJoinNode.eqJoinConjuncts_, com.cloudera.impala.catalog.ColumnStats.hasNumDistinctValues(), com.cloudera.impala.planner.HashJoinNode.joinOp_, and com.cloudera.impala.planner.PlanNode.multiplyCardinalities().

Referenced by com.cloudera.impala.planner.HashJoinNode.computeStats().

JoinOperator com.cloudera.impala.planner.HashJoinNode.getJoinOp ( )
inline
long com.cloudera.impala.planner.PlanNode.getLimit ( )
inlineinherited
long com.cloudera.impala.planner.HashJoinNode.getNdv ( Expr  expr)
inlineprivate

Unwraps the SlotRef in expr and returns the NDVs of it. Returns -1 if the NDVs are unknown or if expr is not a SlotRef.

Definition at line 349 of file HashJoinNode.java.

References com.cloudera.impala.catalog.ColumnStats.hasNumDistinctValues().

Referenced by com.cloudera.impala.planner.HashJoinNode.getSemiJoinCardinality().

Set<TupleId> com.cloudera.impala.planner.PlanNode.getNullableTupleIds ( )
inlineinherited
String com.cloudera.impala.planner.PlanNode.getOffsetExplainString ( String  prefix)
inlineprotectedinherited

Return the offset_ details, if applicable. This is available separately from 'getNodeExplainString' because we want to output 'limit: ...' (which can be printed from PlanNode) before 'offset: ...', which is only printed from SortNodes right now.

Definition at line 336 of file PlanNode.java.

Referenced by com.cloudera.impala.planner.PlanNode.getExplainString().

ExprSubstitutionMap com.cloudera.impala.planner.PlanNode.getOutputSmap ( )
inlineinherited

Definition at line 178 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.outputSmap_.

long com.cloudera.impala.planner.PlanNode.getPerHostMemCost ( )
inlineinherited
long com.cloudera.impala.planner.HashJoinNode.getSemiJoinCardinality ( )
inlineprivate

Returns the estimated cardinality of a semi join node. For a left semi join between child(0) and child(1), we look for equality join conditions "L.c = R.d" (with L being from child(0) and R from child(1)) and use as the cardinality estimate the minimum of |child(0)| * Min(NDV(L.c), NDV(R.d)) / NDV(L.c) over all suitable join conditions. The reasoning is that:

  • each row in child(0) is returned at most once
  • the probability of a row in child(0) having a match in R is Min(NDV(L.c), NDV(R.d)) / NDV(L.c)

For a left anti join we estimate the cardinality as the minimum of: |L| * Max(NDV(L.c) - NDV(R.d), NDV(L.c)) / NDV(L.c) over all suitable join conditions. The reasoning is that:

  • each row in child(0) is returned at most once
  • if NDV(L.c) > NDV(R.d) then the probability of row in L having a match in child(1) is (NDV(L.c) - NDV(R.d)) / NDV(L.c)
  • otherwise, we conservatively use |L| to avoid underestimation

We analogously estimate the cardinality for right semi/anti joins, and treat the null-aware anti join like a regular anti join

TODO: In order to take into account additional conjuncts in the child child subtrees adjust NDV(L.c) by |child(0)| / |L| and the NDV(R.d) by |child(1)| / |R|. The adjustment is currently too dangerous due to the other planner bugs compounding to bad plans causing perf regressions (IMPALA-976).

Definition at line 294 of file HashJoinNode.java.

References com.cloudera.impala.planner.PlanNode.cardinality_, com.cloudera.impala.planner.HashJoinNode.eqJoinConjuncts_, com.cloudera.impala.planner.HashJoinNode.getNdv(), com.cloudera.impala.planner.HashJoinNode.joinOp_, com.cloudera.impala.analysis.JoinOperator.RIGHT_ANTI_JOIN, and com.cloudera.impala.analysis.JoinOperator.RIGHT_SEMI_JOIN.

Referenced by com.cloudera.impala.planner.HashJoinNode.computeStats().

TableRef com.cloudera.impala.planner.HashJoinNode.getTableRef ( )
inline
ArrayList<TupleId> com.cloudera.impala.planner.PlanNode.getTblRefIds ( )
inlineinherited
boolean com.cloudera.impala.planner.PlanNode.hasValidStats ( )
inlineprotectedinherited
boolean com.cloudera.impala.planner.PlanNode.isBlockingNode ( )
inlineinherited

Returns true if this plan node can output its first row only after consuming all rows of all its children. This method is used to group plan nodes into pipelined units for resource estimation.

Definition at line 555 of file PlanNode.java.

Referenced by com.cloudera.impala.planner.PipelinedPlanNodeSet.computePlanNodeSets().

void com.cloudera.impala.planner.PlanNode.markSlotsMaterialized ( Analyzer  analyzer,
List< Expr exprs 
)
inlineprotectedinherited

Marks all slots referenced in exprs as materialized.

Definition at line 464 of file PlanNode.java.

Referenced by com.cloudera.impala.planner.HdfsScanNode.init().

static long com.cloudera.impala.planner.PlanNode.multiplyCardinalities ( long  a,
long  b 
)
inlinestaticinherited

Computes and returns the product of two cardinalities. If an overflow occurs, the maximum Long value is returned (Long.MAX_VALUE).

Definition at line 541 of file PlanNode.java.

Referenced by com.cloudera.impala.planner.CrossJoinNode.computeStats(), and com.cloudera.impala.planner.HashJoinNode.getJoinCardinality().

void com.cloudera.impala.planner.HashJoinNode.setAddProbeFilters ( boolean  b)
inline
void com.cloudera.impala.planner.PlanNode.setAssignedConjuncts ( Set< ExprId conjuncts)
inlineinherited
void com.cloudera.impala.planner.PlanNode.setDisplayName ( String  s)
inlineprotectedinherited

Definition at line 223 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.displayName_.

void com.cloudera.impala.planner.HashJoinNode.setDistributionMode ( DistributionMode  distrMode)
inline
void com.cloudera.impala.planner.PlanNode.setFragment ( PlanFragment  fragment)
inlineinherited

Definition at line 175 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.fragment_.

void com.cloudera.impala.planner.PlanNode.setId ( PlanNodeId  id)
inlineinherited

Definition at line 165 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.id_.

void com.cloudera.impala.planner.PlanNode.setLimit ( long  limit)
inlineinherited

Set the limit_ to the given limit_ only if the limit_ hasn't been set, or the new limit_ is lower.

Parameters
limit_

Definition at line 190 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.limit_.

void com.cloudera.impala.planner.PlanNode.setOutputSmap ( ExprSubstitutionMap  smap)
inlineinherited

Definition at line 179 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.outputSmap_.

void com.cloudera.impala.planner.PlanNode.setTblRefIds ( ArrayList< TupleId ids)
inlineinherited

Definition at line 202 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.tblRefIds_.

void com.cloudera.impala.planner.HashJoinNode.toThrift ( TPlanNode  msg)
inlineprotected
void com.cloudera.impala.planner.PlanNode.transferConjuncts ( PlanNode  recipient)
inlineinherited

Definition at line 214 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.conjuncts_.

TPlan com.cloudera.impala.planner.PlanNode.treeToThrift ( )
inlineinherited
void com.cloudera.impala.planner.PlanNode.unsetLimit ( )
inlineinherited

Definition at line 194 of file PlanNode.java.

References com.cloudera.impala.planner.PlanNode.limit_.

Member Data Documentation

boolean com.cloudera.impala.planner.HashJoinNode.addProbeFilters_
private
final int com.cloudera.impala.planner.PlanNode.DEFAULT_BATCH_SIZE = 1024
staticprotectedinherited

Definition at line 63 of file PlanNode.java.

final long com.cloudera.impala.planner.HashJoinNode.DEFAULT_PER_HOST_MEM = 2L * 1024L * 1024L * 1024L
staticprivate
PlanNodeId com.cloudera.impala.planner.PlanNode.id_
protectedinherited
final Logger com.cloudera.impala.planner.HashJoinNode.LOG = LoggerFactory.getLogger(HashJoinNode.class)
staticprivate

Definition at line 51 of file HashJoinNode.java.

Set<TupleId> com.cloudera.impala.planner.PlanNode.nullableTupleIds_ = Sets.newHashSet()
protectedinherited
final TableRef com.cloudera.impala.planner.HashJoinNode.tblRef_
private
ArrayList<TupleId> com.cloudera.impala.planner.PlanNode.tblRefIds_
protectedinherited

The documentation for this class was generated from the following file: