Spark / SPARK-52312

Caching AppendData plan causes data to be inserted twice

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.1.0
    • Component/s: SQL

    Description

      We’ve identified an issue where caching a DataFrame created from an INSERT SQL statement causes the INSERT to be executed twice. This happens because the logical plan for the INSERT (AppendData) doesn’t extend the IgnoreCachedData trait, so it isn’t ignored during caching as expected. Instead, the plan is cached and re-executed, and the data is inserted a second time. We should fix this by ensuring that all plans used by INSERT extend the IgnoreCachedData trait.
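
      The mechanism described above can be illustrated with a small toy model in plain Python. This is only a sketch under stated assumptions, not Spark code: the classes Table, Plan, CacheManager and FixedAppendData and the helper insert_then_cache are hypothetical stand-ins. The assumption that caching a plan executes it again is taken from the description above and is not derived from Spark's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for the target table; it only records appended rows."""
    rows: list = field(default_factory=list)


class Plan:
    """Hypothetical base class for a logical plan node."""

    def execute(self):
        raise NotImplementedError


class IgnoreCachedData:
    """Marker mixin modelled on the trait named in this issue."""


class AppendData(Plan):
    """Toy INSERT plan: executing it appends rows to the table (a side effect)."""

    def __init__(self, table, rows):
        self.table = table
        self.rows = rows

    def execute(self):
        self.table.rows.extend(self.rows)
        return []  # an INSERT produces no result rows


class FixedAppendData(AppendData, IgnoreCachedData):
    """The same toy plan, marked so that the toy cache manager skips it."""


class CacheManager:
    """Toy cache manager (assumption: caching a plan runs it again)."""

    def __init__(self):
        self.cached = {}

    def cache(self, plan):
        if isinstance(plan, IgnoreCachedData):
            return  # marked plans are never cached, so they are never re-run
        self.cached[id(plan)] = plan.execute()


def insert_then_cache(plan_cls):
    """Run the toy INSERT once, cache its plan, and return the table contents."""
    table = Table()
    plan = plan_cls(table, [1, 2])
    plan.execute()              # the INSERT itself
    CacheManager().cache(plan)  # caching the resulting DataFrame
    return table.rows
```

      In this model, calling insert_then_cache(AppendData) leaves the rows in the table twice, while insert_then_cache(FixedAppendData) leaves them once, which mirrors the proposed fix of marking INSERT plans with IgnoreCachedData.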

          People

            Assignee: Unassigned
            Reporter: Tom van Bussel (tomvanbussel)