  1. Spark
  2. SPARK-46382

XML: Capture values interspersed between elements


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      In XML, an element typically consists of a name and a value, with the value enclosed between the opening and closing tags. However, XML also allows arbitrary values to be interspersed between elements. To capture these values, we provide an option named `valueTags`, which is enabled by default. Consider the following example:

      ```
      <ROW>
        <a>1</a>
        value1
        <b>
          value2
          <c>2</c>
          value3
        </b>
      </ROW>
      ```
      In this example, `<a>`, `<b>`, and `<c>` are named elements whose values are enclosed within their tags. The values `value1`, `value2`, and `value3` are interspersed between the elements. Note that a single element can contain multiple such values (for example, `<b>` contains both `value2` and `value3`).

      We should parse the values between tags into the valueTags field. If an element contains multiple such values, its value tag field is converted to an array type.
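The intended result can be illustrated with a minimal sketch in plain Python. This is not Spark's implementation: it uses the standard library's `xml.etree.ElementTree`, the helper `to_record` is hypothetical, and the field name `_VALUE` is an assumption made for illustration. It only shows the shape of the output described above, where a single interspersed value becomes a scalar field and multiple values become an array.

```python
import xml.etree.ElementTree as ET

# Assumed name of the field that receives interspersed values (illustrative only).
VALUE_TAG = "_VALUE"


def to_record(elem):
    """Hypothetical helper: convert an element to a dict, collecting
    interspersed text under VALUE_TAG. All values stay strings, and
    repeated child tags are not handled (a later one would overwrite)."""
    children = list(elem)
    if not children:
        # Leaf element: its text is simply its value.
        return (elem.text or "").strip()

    record = {}
    values = []
    # Text that appears before the first child element.
    if elem.text and elem.text.strip():
        values.append(elem.text.strip())
    for child in children:
        record[child.tag] = to_record(child)
        # Text that follows this child, up to the next sibling or closing tag.
        if child.tail and child.tail.strip():
            values.append(child.tail.strip())

    if len(values) == 1:
        record[VALUE_TAG] = values[0]   # single occurrence: scalar
    elif len(values) > 1:
        record[VALUE_TAG] = values      # multiple occurrences: array
    return record


xml_doc = """
<ROW>
  <a>1</a>
  value1
  <b>
    value2
    <c>2</c>
    value3
  </b>
</ROW>
"""

row = to_record(ET.fromstring(xml_doc))
print(row)
```

Here `value1` ends up as a scalar field on the row, while `value2` and `value3` are collected into a list on `b`, mirroring the array conversion described above.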

      People

        Assignee: Shujing Yang (shujing.yang)
        Reporter: Shujing Yang (shujing.yang)