Impala
Impala is the open source, native analytic database for Apache Hadoop.
com.cloudera.impala.analysis.ColumnLineageGraph Class Reference

Public Member Functions

 ColumnLineageGraph ()
 
void computeLineageGraph (List< Expr > resultExprs, Analyzer rootAnalyzer)
 
void addDependencyPredicates (Collection< Expr > exprs)
 
String toJson ()
 
boolean equals (Object obj)
 
String debugString ()
 
void addTargetColumnLabels (Collection< String > columnLabels)
 
void addTargetColumnLabels (Table dstTable)
 

Static Public Member Functions

static ColumnLineageGraph createFromJSON (String json)
 

Private Member Functions

 ColumnLineageGraph (String stmt, String user, long timestamp)
 
void setVertices (Set< Vertex > vertices)
 
MultiEdge createMultiEdge (Set< String > targets, Set< String > sources, MultiEdge.EdgeType type)
 
Vertex createVertex (String label)
 
void init (Analyzer analyzer)
 
void computeProjectionDependencies (List< Expr > resultExprs)
 
void computeResultPredicateDependencies (Analyzer analyzer)
 
void getSourceBaseCols (Expr expr, Set< String > sourceBaseCols, List< Expr > directPredDeps, boolean traversePredDeps)
 
List< Expr > getProjectionDeps (Expr e)
 
List< Expr > getPredicateDeps (Expr e)
 
String getQueryHash (String queryStr)
 
MultiEdge createMultiEdgeFromJSONObj (JSONObject jsonEdge)
 
Set< Vertex > getVerticesFromJSONArray (JSONArray vertexIdArray)
 

Private Attributes

String queryStr_
 
String user_
 
final List< Expr > resultDependencyPredicates_ = Lists.newArrayList()
 
final List< MultiEdge > edges_ = Lists.newArrayList()
 
long timestamp_
 
final Map< String, Vertex > vertices_ = Maps.newHashMap()
 
final Map< VertexId, Vertex > idToVertexMap_ = Maps.newHashMap()
 
final List< String > targetColumnLabels_ = Lists.newArrayList()
 
DescriptorTable descTbl_
 
final IdGenerator< VertexId > vertexIdGenerator = VertexId.createGenerator()
 

Static Private Attributes

static final Logger LOG = LoggerFactory.getLogger(ColumnLineageGraph.class)
 

Detailed Description

Represents the column lineage graph of a query. This is a directed graph that tracks dependencies among the table/column entities that participate in a query. Two types of dependencies are represented as edges in the column lineage graph:

  a) Projection dependency: A dependency between a set of source columns (base table columns) and a single target (result expr or table column). It indicates that the values of the target depend on the values of the source columns.
  b) Predicate dependency: A dependency between a set of target columns (or exprs) and a set of source columns (base table columns). It indicates that the source columns restrict the values of their targets (e.g. by participating in WHERE clause predicates).

The following dependencies are generated for a query:

  • Exactly one projection dependency for every result expr / target column.
  • Exactly one predicate dependency that targets all result exprs / target cols and depends on all columns participating in a conjunct in the query.
  • Special case of analytic fns: One predicate dependency per result expr / target col whose value is directly or indirectly affected by an analytic function with a partition by and/or order by clause.
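
The first two rules above can be illustrated with a minimal sketch. It is not Impala's implementation: the class LineageRulesSketch, its nested Edge and EdgeType types, and the methods buildEdges and setOf are hypothetical names introduced only for illustration, and plain strings stand in for Impala's Expr and Vertex objects. The sketch also assumes that no predicate edge is emitted when no column participates in a conjunct. The special case for analytic functions is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only; all names here are hypothetical, not Impala's API. */
final class LineageRulesSketch {
  enum EdgeType { PROJECTION, PREDICATE }

  /** Hypothetical stand-in for a lineage edge; this is not Impala's MultiEdge. */
  static final class Edge {
    final Set<String> sources;
    final Set<String> targets;
    final EdgeType type;

    Edge(Set<String> sources, Set<String> targets, EdgeType type) {
      this.sources = sources;
      this.targets = targets;
      this.type = type;
    }

    @Override
    public String toString() {
      return type + " " + sources + " -> " + targets;
    }
  }

  /**
   * projectionSources maps each result label to the base columns its value is
   * computed from; conjunctColumns holds the base columns used in conjuncts.
   */
  static List<Edge> buildEdges(
      Map<String, Set<String>> projectionSources, Set<String> conjunctColumns) {
    List<Edge> edges = new ArrayList<>();
    // Rule 1: exactly one projection edge per result expr / target column.
    for (Map.Entry<String, Set<String>> entry : projectionSources.entrySet()) {
      edges.add(new Edge(entry.getValue(), setOf(entry.getKey()), EdgeType.PROJECTION));
    }
    // Rule 2: one predicate edge from all conjunct columns to all targets.
    // Assumption of this sketch: the edge is skipped when there are no conjuncts.
    if (!conjunctColumns.isEmpty()) {
      Set<String> allTargets = new LinkedHashSet<>(projectionSources.keySet());
      edges.add(new Edge(conjunctColumns, allTargets, EdgeType.PREDICATE));
    }
    return edges;
  }

  /** Small helper that builds an insertion-ordered set. */
  static Set<String> setOf(String... values) {
    return new LinkedHashSet<>(Arrays.asList(values));
  }

  public static void main(String[] args) {
    // Models: SELECT a + b AS s, c FROM t WHERE d > 10
    Map<String, Set<String>> projections = new LinkedHashMap<>();
    projections.put("s", setOf("t.a", "t.b"));
    projections.put("c", setOf("t.c"));
    for (Edge edge : buildEdges(projections, setOf("t.d"))) {
      System.out.println(edge);
    }
  }
}
```

For the sample query in main, the sketch yields two projection edges (one for s, one for c) and a single predicate edge from t.d to both result labels.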

Definition at line 212 of file ColumnLineageGraph.java.

Constructor & Destructor Documentation

com.cloudera.impala.analysis.ColumnLineageGraph.ColumnLineageGraph ( )
inline

com.cloudera.impala.analysis.ColumnLineageGraph.ColumnLineageGraph ( String  stmt,
String  user,
long  timestamp
)
inline, private

Member Function Documentation

void com.cloudera.impala.analysis.ColumnLineageGraph.addDependencyPredicates ( Collection< Expr >  exprs)
inline

Definition at line 452 of file ColumnLineageGraph.java.

void com.cloudera.impala.analysis.ColumnLineageGraph.addTargetColumnLabels ( Collection< String >  columnLabels)
inline

Definition at line 568 of file ColumnLineageGraph.java.

void com.cloudera.impala.analysis.ColumnLineageGraph.addTargetColumnLabels ( Table  dstTable)
inline

void com.cloudera.impala.analysis.ColumnLineageGraph.computeLineageGraph ( List< Expr >  resultExprs,
Analyzer  rootAnalyzer
)
inline

Computes the column lineage graph of a query from the list of query result exprs. 'rootAnalyzer' is the Analyzer that was used for the analysis of the query.

Definition at line 300 of file ColumnLineageGraph.java.

References com.cloudera.impala.analysis.ColumnLineageGraph.computeProjectionDependencies(), com.cloudera.impala.analysis.ColumnLineageGraph.computeResultPredicateDependencies(), and com.cloudera.impala.analysis.ColumnLineageGraph.init().

void com.cloudera.impala.analysis.ColumnLineageGraph.computeResultPredicateDependencies ( Analyzer  analyzer)
inline, private

static ColumnLineageGraph com.cloudera.impala.analysis.ColumnLineageGraph.createFromJSON ( String  json)
inline, static

Creates a ColumnLineageGraph object from a serialized JSON record. The new ColumnLineageGraph object is returned. Used only during testing.

Definition at line 492 of file ColumnLineageGraph.java.

References com.cloudera.impala.analysis.ColumnLineageGraph.ColumnLineageGraph(), impala.hash, and gen_ir_descriptions.parser.

MultiEdge com.cloudera.impala.analysis.ColumnLineageGraph.createMultiEdge ( Set< String >  targets,
Set< String >  sources,
MultiEdge.EdgeType  type 
)
inline, private

Creates a new MultiEdge in the column lineage graph from the sets of 'sources' and 'targets' labels (representing column names or result expr labels). The new MultiEdge object is returned.

Definition at line 268 of file ColumnLineageGraph.java.

References com.cloudera.impala.analysis.ColumnLineageGraph.createVertex().

Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.computeProjectionDependencies(), and com.cloudera.impala.analysis.ColumnLineageGraph.computeResultPredicateDependencies().

MultiEdge com.cloudera.impala.analysis.ColumnLineageGraph.createMultiEdgeFromJSONObj ( JSONObject  jsonEdge)
inline, private

Vertex com.cloudera.impala.analysis.ColumnLineageGraph.createVertex ( String  label)
inline, private

Creates a new vertex in the column lineage graph. The new Vertex object is returned. If a Vertex with the same label already exists, reuse it.

Definition at line 287 of file ColumnLineageGraph.java.

References com.cloudera.impala.analysis.ColumnLineageGraph.vertexIdGenerator.

Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.createMultiEdge().
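
The reuse rule described for createVertex() (return the existing Vertex when the label is already known) can be sketched as a label-keyed lookup. This is an illustration under assumptions, not the code at line 287: VertexRegistrySketch, its nested Vertex type, getById and size are hypothetical names, and an int counter stands in for the IdGenerator< VertexId > held in vertexIdGenerator. The two maps loosely mirror the roles that the field names vertices_ and idToVertexMap_ suggest.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; all names here are hypothetical, not Impala's API. */
final class VertexRegistrySketch {
  /** Minimal vertex: a numeric id plus the column or result expr label. */
  static final class Vertex {
    final int id;
    final String label;

    Vertex(int id, String label) {
      this.id = id;
      this.label = label;
    }
  }

  private final Map<String, Vertex> verticesByLabel = new HashMap<>();
  private final Map<Integer, Vertex> verticesById = new HashMap<>();
  private int nextId = 0; // stand-in for an id generator

  /** Returns the vertex for 'label', creating and registering it on first use. */
  Vertex createVertex(String label) {
    Vertex existing = verticesByLabel.get(label);
    if (existing != null) return existing;
    Vertex created = new Vertex(nextId++, label);
    verticesByLabel.put(label, created);
    verticesById.put(created.id, created);
    return created;
  }

  /** Looks a vertex up by id; returns null if the id is unknown. */
  Vertex getById(int id) {
    return verticesById.get(id);
  }

  /** Number of distinct labels registered so far. */
  int size() {
    return verticesByLabel.size();
  }

  public static void main(String[] args) {
    VertexRegistrySketch graph = new VertexRegistrySketch();
    Vertex first = graph.createVertex("t.a");
    Vertex second = graph.createVertex("t.a");
    System.out.println("same instance: " + (first == second));
    System.out.println("distinct vertices: " + graph.size());
  }
}
```

Keying the lookup on the label means that edges created at different times for the same column share one vertex, so the graph stays free of duplicate nodes.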

String com.cloudera.impala.analysis.ColumnLineageGraph.debugString ( )
inline

boolean com.cloudera.impala.analysis.ColumnLineageGraph.equals ( Object  obj)
inline

List<Expr> com.cloudera.impala.analysis.ColumnLineageGraph.getPredicateDeps ( Expr  e)
inline, private

Retrieve the exprs that 'e' is directly predicate dependent on. TODO Handle conditional exprs (e.g. CASE, IF).

Definition at line 439 of file ColumnLineageGraph.java.

References com.cloudera.impala.analysis.AnalyticExpr.getOrderByElements().

Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.getSourceBaseCols().

List<Expr> com.cloudera.impala.analysis.ColumnLineageGraph.getProjectionDeps ( Expr  e)
inline, private

Retrieve the exprs that 'e' is directly projection dependent on. TODO Handle conditional exprs (e.g. CASE, IF).

Definition at line 422 of file ColumnLineageGraph.java.

References com.cloudera.impala.analysis.AnalyticExpr.getFnCall().

Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.getSourceBaseCols().

String com.cloudera.impala.analysis.ColumnLineageGraph.getQueryHash ( String  queryStr)
inline, private

void com.cloudera.impala.analysis.ColumnLineageGraph.getSourceBaseCols ( Expr  expr,
Set< String >  sourceBaseCols,
List< Expr >  directPredDeps,
boolean  traversePredDeps
)
inline, private

Identify the base table columns that 'expr' is connected to by recursively resolving all associated slots through inline views and materialization points to base-table slots. If 'directPredDeps' is not null, it is populated with the exprs that have a predicate dependency with 'expr' (e.g. partitioning and order by exprs for the case of an analytic function). If 'traversePredDeps' is false, not all the children exprs of 'expr' are used to identify the base columns that 'expr' is connected to. Which children are filtered depends on the type of 'expr' (e.g. for AnalyticFunctionExpr, grouping and sorting exprs are filtered out).

Definition at line 387 of file ColumnLineageGraph.java.

References com.cloudera.impala.analysis.ColumnLineageGraph.getPredicateDeps(), and com.cloudera.impala.analysis.ColumnLineageGraph.getProjectionDeps().

Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.computeProjectionDependencies(), and com.cloudera.impala.analysis.ColumnLineageGraph.computeResultPredicateDependencies().
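
The effect of the 'directPredDeps' and 'traversePredDeps' parameters of getSourceBaseCols() can be shown on a toy expression tree. This is a simplified sketch, not the code at line 387: BaseColumnSketch, Node and collectBaseCols are hypothetical names, each node lists its projection and predicate children directly instead of resolving slots through a descriptor table, and inline views and materialization points are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only; all names here are hypothetical, not Impala's API. */
final class BaseColumnSketch {
  /** Toy expression node, far simpler than Impala's Expr. */
  static final class Node {
    final String baseColumn;         // non-null only for base table columns
    final List<Node> projectionDeps; // children that feed the node's value
    final List<Node> predicateDeps;  // children that only restrict the value

    private Node(String baseColumn, List<Node> projectionDeps, List<Node> predicateDeps) {
      this.baseColumn = baseColumn;
      this.projectionDeps = projectionDeps;
      this.predicateDeps = predicateDeps;
    }

    static Node column(String name) {
      return new Node(name, new ArrayList<Node>(), new ArrayList<Node>());
    }

    static Node compound(List<Node> projectionDeps, List<Node> predicateDeps) {
      return new Node(null, projectionDeps, predicateDeps);
    }
  }

  /** Recursively collects the base columns that 'expr' is connected to. */
  static void collectBaseCols(Node expr, Set<String> sourceBaseCols,
      List<Node> directPredDeps, boolean traversePredDeps) {
    if (expr.baseColumn != null) {
      sourceBaseCols.add(expr.baseColumn);
      return;
    }
    // Report predicate-dependent children to the caller when requested.
    if (directPredDeps != null) directPredDeps.addAll(expr.predicateDeps);
    // Projection children are always followed; predicate children only on request.
    List<Node> toTraverse = new ArrayList<>(expr.projectionDeps);
    if (traversePredDeps) toTraverse.addAll(expr.predicateDeps);
    for (Node child : toTraverse) {
      collectBaseCols(child, sourceBaseCols, directPredDeps, traversePredDeps);
    }
  }

  public static void main(String[] args) {
    // Models: sum(a) OVER (PARTITION BY b ORDER BY c) on table t
    Node analytic = Node.compound(
        Arrays.asList(Node.column("t.a")),
        Arrays.asList(Node.column("t.b"), Node.column("t.c")));
    Set<String> cols = new TreeSet<>();
    List<Node> predDeps = new ArrayList<>();
    collectBaseCols(analytic, cols, predDeps, false);
    System.out.println(cols + ", direct predicate deps: " + predDeps.size());
  }
}
```

With traversePredDeps set to false, only t.a is collected for the analytic expression while the partitioning and ordering exprs are handed back through directPredDeps; with true, t.b and t.c are collected as well.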

Set<Vertex> com.cloudera.impala.analysis.ColumnLineageGraph.getVerticesFromJSONArray ( JSONArray  vertexIdArray)
inline, private

void com.cloudera.impala.analysis.ColumnLineageGraph.setVertices ( Set< Vertex >  vertices)
inline, private

Definition at line 256 of file ColumnLineageGraph.java.

Member Data Documentation

DescriptorTable com.cloudera.impala.analysis.ColumnLineageGraph.descTbl_
private

final List<MultiEdge> com.cloudera.impala.analysis.ColumnLineageGraph.edges_ = Lists.newArrayList()
private

final Map<VertexId, Vertex> com.cloudera.impala.analysis.ColumnLineageGraph.idToVertexMap_ = Maps.newHashMap()
private

Definition at line 232 of file ColumnLineageGraph.java.

final Logger com.cloudera.impala.analysis.ColumnLineageGraph.LOG = LoggerFactory.getLogger(ColumnLineageGraph.class)
static, private

Definition at line 213 of file ColumnLineageGraph.java.

final List<Expr> com.cloudera.impala.analysis.ColumnLineageGraph.resultDependencyPredicates_ = Lists.newArrayList()
private

final List<String> com.cloudera.impala.analysis.ColumnLineageGraph.targetColumnLabels_ = Lists.newArrayList()
private

final IdGenerator<VertexId> com.cloudera.impala.analysis.ColumnLineageGraph.vertexIdGenerator = VertexId.createGenerator()
private

final Map<String, Vertex> com.cloudera.impala.analysis.ColumnLineageGraph.vertices_ = Maps.newHashMap()
private

The documentation for this class was generated from the following file: