Impala
Impala is the open source, native analytic database for Apache Hadoop.
|
Public Member Functions | |
ColumnLineageGraph () | |
void | computeLineageGraph (List< Expr > resultExprs, Analyzer rootAnalyzer) |
void | addDependencyPredicates (Collection< Expr > exprs) |
String | toJson () |
boolean | equals (Object obj) |
String | debugString () |
void | addTargetColumnLabels (Collection< String > columnLabels) |
void | addTargetColumnLabels (Table dstTable) |
Static Public Member Functions | |
static ColumnLineageGraph | createFromJSON (String json) |
Private Member Functions | |
ColumnLineageGraph (String stmt, String user, long timestamp) | |
void | setVertices (Set< Vertex > vertices) |
MultiEdge | createMultiEdge (Set< String > targets, Set< String > sources, MultiEdge.EdgeType type) |
Vertex | createVertex (String label) |
void | init (Analyzer analyzer) |
void | computeProjectionDependencies (List< Expr > resultExprs) |
void | computeResultPredicateDependencies (Analyzer analyzer) |
void | getSourceBaseCols (Expr expr, Set< String > sourceBaseCols, List< Expr > directPredDeps, boolean traversePredDeps) |
List< Expr > | getProjectionDeps (Expr e) |
List< Expr > | getPredicateDeps (Expr e) |
String | getQueryHash (String queryStr) |
MultiEdge | createMultiEdgeFromJSONObj (JSONObject jsonEdge) |
Set< Vertex > | getVerticesFromJSONArray (JSONArray vertexIdArray) |
Private Attributes | |
String | queryStr_ |
String | user_ |
final List< Expr > | resultDependencyPredicates_ = Lists.newArrayList() |
final List< MultiEdge > | edges_ = Lists.newArrayList() |
long | timestamp_ |
final Map< String, Vertex > | vertices_ = Maps.newHashMap() |
final Map< VertexId, Vertex > | idToVertexMap_ = Maps.newHashMap() |
final List< String > | targetColumnLabels_ = Lists.newArrayList() |
DescriptorTable | descTbl_ |
final IdGenerator< VertexId > | vertexIdGenerator = VertexId.createGenerator() |
Static Private Attributes | |
static final Logger | LOG = LoggerFactory.getLogger(ColumnLineageGraph.class) |
Represents the column lineage graph of a query. This is a directed graph that is used to track dependencies among the table/column entities that participate in a query. Two types of dependencies are represented as edges in the column lineage graph:
a) Projection dependency: A dependency between a set of source columns (base table columns) and a single target (result expr or table column). It indicates that the values of the target depend on the values of the source columns.
b) Predicate dependency: A dependency between a set of target columns (or exprs) and a set of source columns (base table columns). It indicates that the source columns restrict the values of their targets (e.g. by participating in WHERE clause predicates).
The following dependencies are generated for a query:
Definition at line 212 of file ColumnLineageGraph.java.
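To make the two edge types concrete, the following is a minimal sketch of how they might look for the query SELECT a + b AS s FROM t WHERE c > 0. The class LineageEdgeSketch, its Edge and EdgeType types, and the labels are hypothetical stand-ins for illustration; they are not the MultiEdge and Vertex classes of ColumnLineageGraph.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the lineage graph; all names are illustrative only.
class LineageEdgeSketch {
  enum EdgeType { PROJECTION, PREDICATE }

  // One edge connects a set of source columns to a set of targets.
  static final class Edge {
    final Set<String> sources;
    final Set<String> targets;
    final EdgeType type;

    Edge(List<String> sources, List<String> targets, EdgeType type) {
      // Sorted sets keep the printed form deterministic.
      this.sources = new TreeSet<String>(sources);
      this.targets = new TreeSet<String>(targets);
      this.type = type;
    }

    @Override
    public String toString() {
      return type + " " + sources + " -> " + targets;
    }
  }

  // Edges assumed for: SELECT a + b AS s FROM t WHERE c > 0
  static List<Edge> exampleEdges() {
    List<Edge> edges = new ArrayList<Edge>();
    // The value of 's' is computed from t.a and t.b.
    edges.add(new Edge(Arrays.asList("t.a", "t.b"), Arrays.asList("s"),
        EdgeType.PROJECTION));
    // The WHERE clause on t.c restricts which values of 's' are returned.
    edges.add(new Edge(Arrays.asList("t.c"), Arrays.asList("s"),
        EdgeType.PREDICATE));
    return edges;
  }

  public static void main(String[] args) {
    for (Edge e : exampleEdges()) System.out.println(e);
  }
}
```

In this sketch the projection edge has a single target while the predicate edge could carry several, mirroring the description above.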
|
inline |
Definition at line 245 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.createFromJSON(), and com.cloudera.impala.analysis.ColumnLineageGraph.equals().
|
inlineprivate |
Private constructor, used only for testing.
Definition at line 250 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.queryStr_, com.cloudera.impala.analysis.ColumnLineageGraph.timestamp_, and com.cloudera.impala.analysis.ColumnLineageGraph.user_.
|
inline |
Definition at line 452 of file ColumnLineageGraph.java.
|
inline |
Definition at line 568 of file ColumnLineageGraph.java.
|
inline |
Definition at line 573 of file ColumnLineageGraph.java.
References com.cloudera.impala.catalog.Table.getColumnNames().
|
inline |
Computes the column lineage graph of a query from the list of query result exprs. 'rootAnalyzer' is the Analyzer that was used for the analysis of the query.
Definition at line 300 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.computeProjectionDependencies(), com.cloudera.impala.analysis.ColumnLineageGraph.computeResultPredicateDependencies(), and com.cloudera.impala.analysis.ColumnLineageGraph.init().
|
inlineprivate |
Definition at line 331 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.createMultiEdge(), com.cloudera.impala.analysis.ColumnLineageGraph.getSourceBaseCols(), and com.cloudera.impala.analysis.ColumnLineageGraph.targetColumnLabels_.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.computeLineageGraph().
|
inlineprivate |
Compute predicate dependencies for the query result, i.e. exprs that affect the possible values of the result exprs / target columns, such as predicates in a WHERE clause.
Definition at line 362 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.createMultiEdge(), com.cloudera.impala.analysis.ColumnLineageGraph.getSourceBaseCols(), com.cloudera.impala.analysis.ColumnLineageGraph.resultDependencyPredicates_, and com.cloudera.impala.analysis.ColumnLineageGraph.targetColumnLabels_.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.computeLineageGraph().
|
inlinestatic |
Creates a ColumnLineageGraph object from a serialized JSON record. The new ColumnLineageGraph object is returned. Used only during testing.
Definition at line 492 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.ColumnLineageGraph(), impala.hash, and gen_ir_descriptions.parser.
|
inlineprivate |
Creates a new MultiEdge in the column lineage graph from the sets of 'sources' and 'targets' labels (representing column names or result expr labels). The new MultiEdge object is returned.
Definition at line 268 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.createVertex().
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.computeProjectionDependencies(), and com.cloudera.impala.analysis.ColumnLineageGraph.computeResultPredicateDependencies().
|
inlineprivate |
Definition at line 525 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.getVerticesFromJSONArray().
|
inlineprivate |
Creates a new vertex in the column lineage graph and returns it. If a Vertex with the same label already exists, that Vertex is reused and returned instead.
Definition at line 287 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.vertexIdGenerator.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.createMultiEdge().
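The reuse-by-label behavior described for createVertex() can be sketched as a lookup in a label-to-vertex map before a new id is allocated. This is an illustration only: VertexReuseSketch, its nested Vertex class, and the plain int counter are assumptions standing in for the real Vertex, vertices_ and vertexIdGenerator members.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of label-based vertex reuse; not the Impala implementation.
class VertexReuseSketch {
  static final class Vertex {
    final int id;
    final String label;

    Vertex(int id, String label) {
      this.id = id;
      this.label = label;
    }
  }

  // Maps a label to its vertex, playing the role of a label-keyed vertex map.
  private final Map<String, Vertex> verticesByLabel = new HashMap<String, Vertex>();
  // Stand-in for an id generator: ids are handed out sequentially.
  private int nextId = 0;

  // Returns the existing vertex for 'label', or creates and registers a new one.
  Vertex createVertex(String label) {
    Vertex existing = verticesByLabel.get(label);
    if (existing != null) return existing;
    Vertex created = new Vertex(nextId++, label);
    verticesByLabel.put(label, created);
    return created;
  }

  int size() {
    return verticesByLabel.size();
  }

  public static void main(String[] args) {
    VertexReuseSketch graph = new VertexReuseSketch();
    Vertex first = graph.createVertex("t.a");
    Vertex again = graph.createVertex("t.a");
    System.out.println("same vertex reused: " + (first == again));
    System.out.println("vertex count: " + graph.size());
  }
}
```

Because the lookup happens before the id is allocated, a repeated label does not consume an id in this sketch.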
|
inline |
Definition at line 559 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.edges_, and com.cloudera.impala.analysis.ColumnLineageGraph.toJson().
|
inline |
Definition at line 548 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.ColumnLineageGraph(), and com.cloudera.impala.analysis.ColumnLineageGraph.vertices_.
Referenced by com.cloudera.impala.planner.PlannerTestBase.testColumnLineageOutput().
Retrieves the exprs that 'e' is directly predicate dependent on. TODO: Handle conditional exprs (e.g. CASE, IF).
Definition at line 439 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.AnalyticExpr.getOrderByElements().
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.getSourceBaseCols().
|
inlineprivate |
Retrieves the exprs that 'e' is directly projection dependent on. TODO: Handle conditional exprs (e.g. CASE, IF).
Definition at line 422 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.AnalyticExpr.getFnCall().
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.getSourceBaseCols().
|
inlineprivate |
Definition at line 482 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.toJson().
|
inlineprivate |
Identify the base table columns that 'expr' is connected to by recursively resolving all associated slots through inline views and materialization points to base-table slots. If 'directPredDeps' is not null, it is populated with the exprs that have a predicate dependency with 'expr' (e.g. partitioning and order by exprs for the case of an analytic function). If 'traversePredDeps' is false, not all the children exprs of 'expr' are used to identify the base columns that 'expr' is connected to. Which children are filtered depends on the type of 'expr' (e.g. for AnalyticFunctionExpr, grouping and sorting exprs are filtered out).
Definition at line 387 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.getPredicateDeps(), and com.cloudera.impala.analysis.ColumnLineageGraph.getProjectionDeps().
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.computeProjectionDependencies(), and com.cloudera.impala.analysis.ColumnLineageGraph.computeResultPredicateDependencies().
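The roles of 'directPredDeps' and 'traversePredDeps' can be illustrated with a simplified recursive traversal. The sketch below assumes a toy Node type in which every expression lists its projection children and predicate children explicitly; the real method works on Expr objects and resolves slots through inline views, which is not modeled here. All names (SourceColsSketch, Node, collect, exampleAnalytic) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of base-column resolution; not the Impala code.
class SourceColsSketch {
  static final class Node {
    final String baseColumn;          // non-null only for base table columns
    final List<Node> projectionDeps;  // children that feed the node's value
    final List<Node> predicateDeps;   // children that restrict it (e.g. ORDER BY)

    private Node(String baseColumn, List<Node> projectionDeps,
        List<Node> predicateDeps) {
      this.baseColumn = baseColumn;
      this.projectionDeps = projectionDeps;
      this.predicateDeps = predicateDeps;
    }

    static Node column(String name) {
      return new Node(name, new ArrayList<Node>(), new ArrayList<Node>());
    }

    static Node expr(List<Node> projectionDeps, List<Node> predicateDeps) {
      return new Node(null, projectionDeps, predicateDeps);
    }
  }

  // Collects the base columns reachable from 'node'. Predicate children are
  // recorded in 'directPredDeps' when it is non-null, and are only traversed
  // when 'traversePredDeps' is true.
  static void collect(Node node, Set<String> sourceBaseCols,
      List<Node> directPredDeps, boolean traversePredDeps) {
    if (node.baseColumn != null) {
      sourceBaseCols.add(node.baseColumn);
      return;
    }
    if (directPredDeps != null) directPredDeps.addAll(node.predicateDeps);
    List<Node> toTraverse = new ArrayList<Node>(node.projectionDeps);
    if (traversePredDeps) toTraverse.addAll(node.predicateDeps);
    for (Node child : toTraverse) {
      collect(child, sourceBaseCols, directPredDeps, traversePredDeps);
    }
  }

  // Models: sum(a) OVER (PARTITION BY b ORDER BY c) on table t.
  static Node exampleAnalytic() {
    return Node.expr(
        Arrays.asList(Node.column("t.a")),
        Arrays.asList(Node.column("t.b"), Node.column("t.c")));
  }

  public static void main(String[] args) {
    Set<String> cols = new TreeSet<String>();
    List<Node> preds = new ArrayList<Node>();
    collect(exampleAnalytic(), cols, preds, false);
    System.out.println("projection sources: " + cols);
    System.out.println("direct predicate deps: " + preds.size());
  }
}
```

With 'traversePredDeps' set to false, only the function argument contributes a base column, while the partitioning and ordering exprs are handed back through 'directPredDeps' so that a caller could turn them into a separate predicate edge.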
|
inlineprivate |
Definition at line 536 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.createMultiEdgeFromJSONObj().
|
inlineprivate |
Initialize the ColumnLineageGraph from the root analyzer of a query.
Definition at line 309 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.descTbl_, com.cloudera.impala.analysis.ColumnLineageGraph.queryStr_, com.cloudera.impala.analysis.ColumnLineageGraph.timestamp_, and com.cloudera.impala.analysis.ColumnLineageGraph.user_.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.computeLineageGraph().
|
inlineprivate |
Definition at line 256 of file ColumnLineageGraph.java.
|
inline |
Encodes the ColumnLineageGraph object to JSON.
Definition at line 459 of file ColumnLineageGraph.java.
References com.cloudera.impala.analysis.ColumnLineageGraph.edges_, com.cloudera.impala.analysis.ColumnLineageGraph.getQueryHash(), com.cloudera.impala.analysis.ColumnLineageGraph.queryStr_, com.cloudera.impala.analysis.ColumnLineageGraph.timestamp_, and com.cloudera.impala.analysis.ColumnLineageGraph.user_.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.debugString().
|
private |
Definition at line 241 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.init().
|
private |
Definition at line 222 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.debugString(), and com.cloudera.impala.analysis.ColumnLineageGraph.toJson().
|
private |
Definition at line 232 of file ColumnLineageGraph.java.
|
staticprivate |
Definition at line 213 of file ColumnLineageGraph.java.
|
private |
Definition at line 215 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.ColumnLineageGraph(), com.cloudera.impala.analysis.ColumnLineageGraph.init(), and com.cloudera.impala.analysis.ColumnLineageGraph.toJson().
|
private |
Definition at line 220 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.computeResultPredicateDependencies().
|
private |
Definition at line 237 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.computeProjectionDependencies(), and com.cloudera.impala.analysis.ColumnLineageGraph.computeResultPredicateDependencies().
|
private |
Definition at line 225 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.ColumnLineageGraph(), com.cloudera.impala.analysis.ColumnLineageGraph.init(), and com.cloudera.impala.analysis.ColumnLineageGraph.toJson().
|
private |
Definition at line 218 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.ColumnLineageGraph(), com.cloudera.impala.analysis.ColumnLineageGraph.init(), and com.cloudera.impala.analysis.ColumnLineageGraph.toJson().
|
private |
Definition at line 243 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.createVertex().
|
private |
Definition at line 228 of file ColumnLineageGraph.java.
Referenced by com.cloudera.impala.analysis.ColumnLineageGraph.equals().