Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-50983 Support nested correlated subqueries
  3. SPARK-51884

Subquery Definition Changes For Adding Support For Nested Correlated Subqueries

    XMLWordPrintableJSON

Details

    Description

      Newly added argument for SubqueryExpression: OuterScopeAttrs.

      `OuterScopeAttrs` contains attributes cannot be resolved in the containing query of the subquery or the subquery itself, but can be resolved in the whole query plan.

      The task design includes:

      • Add `OuterScopeAttrs` and related getter and setter methods for SubqueryExpression
      • All attributes in `OuterScopeAttrs` must be contained in the `OuterAttrs` AttributeSet of SubqueryExpression
      • Update the usage of SubqueryExpression and classes extending SubqueryExpression
      • Update SubqueryExpression.references to be `AttributeSet(outerAttrs) – AttributeSet(outerScopeAttrs)`

      Why we need the change?

      Spark only supports one layer of correlation now and does not support nested correlation.
      For example,
      SELECT col1 FROM VALUES (1, 2) t1 (col1, col2) WHERE EXISTS ( SELECT col1 FROM VALUES (1, 2) t2 (col1, col2) WHERE t2.col2 == MAX(t1.col2)
      )GROUP BY col1;
       
      is supported and
      SELECT col1 FROM VALUES (1, 2) t1 (col1, col2) WHERE EXISTS ( SELECT col1 FROM VALUES (1, 2) t2 (col1, col2) WHERE t2.col2 == ( SELECT MAX(t1.col2)
      )
      )GROUP BY col1;
       
      is not supported.

      The reason spark does not support it is because the Analyzer and Optimizer resolves and plans Subquery in a recursive way.

      The definition change for the SubqueryExpression adds the metadata OuterScopeAttrs which helps later rewrites for the Analyzer and Optimizer.

      High Level Design

      https://docs.google.com/document/d/1EGB48ArLQ04OZvb-zx_VVTJ8roIoCPwSRw4vTVuDY7o/edit?usp=sharing 

      Attachments

        Activity

          People

            avery_qi Avery Qi
            avery_qi Avery Qi
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: