Restore vulnerability statuses reset by semgrep 6.7.0 bug
What does this MR do and why?
This MR addresses a bug introduced by semgrep v6.7.0, which sorts the vulnerabilities[].identifiers[] array as a result of the change made in "Sort vulnerability links and identifiers" (gitlab-org/security-products/analyzers/report!116, merged). Sorting the vulnerabilities[].identifiers[] array changes which identifier comes first, and therefore changes the primary identifier, which corrupts vulnerability data.
This MR adds a batched background migration that repairs the corrupted vulnerability data.
What caused this bug?
Here's the sequence of events that led to this bug:
- semgrep v6.6.2 is the last release before the bug occurred. This release follows the normal convention of placing the semgrep_id as the first element in vulnerabilities[].identifiers[], thereby making it the primary identifier. See gl-sast-report-semgrep-6.6.2-multiple-vulnerabilities.json for an example.

  Report generated by semgrep v6.6.2:

      "identifiers": [
        { "type": "semgrep_id", "name": "bandit.B506", "value": "bandit.B506", "url": "https://semgrep.dev/r/gitlab.bandit.B506" },
        { "type": "cwe", "name": "CWE-502", "value": "502", "url": "https://cwe.mitre.org/data/definitions/502.html" },
        { "type": "owasp", "name": "A08:2021 - Software and Data Integrity Failures", "value": "A08:2021" },
        { "type": "owasp", "name": "A8:2017 - Insecure Deserialization", "value": "A8:2017" },
        { "type": "bandit_test_id", "name": "Bandit Test ID B506", "value": "B506" }
      ]

- semgrep v6.7.0 is then released, which introduces the bug:
  - Bumps the security report schema version from 15.1.4 to 15.2.2.
  - Sorts vulnerabilities[].identifiers[], and places cwe or owasp in the first element of the list, making it the new primary identifier. See gl-sast-report-semgrep-6.7.0-multiple-vulnerabilities-incorrect-primary-identifier.json.

    Report generated by semgrep v6.7.0:

        "identifiers": [
          { "type": "cwe", "name": "CWE-502", "value": "502", "url": "https://cwe.mitre.org/data/definitions/502.html" },
          { "type": "owasp", "name": "A08:2021 - Software and Data Integrity Failures", "value": "A08:2021" },
          { "type": "owasp", "name": "A8:2017 - Insecure Deserialization", "value": "A8:2017" },
          { "type": "bandit_test_id", "name": "Bandit Test ID B506", "value": "B506" },
          { "type": "semgrep_id", "name": "bandit.B506", "value": "bandit.B506", "url": "https://semgrep.dev/r/gitlab.bandit.B506" }
        ]

  This is the point at which the bug manifests, because semgrep_id is no longer the primary identifier, and cwe (or owasp) is the new primary identifier.

- semgrep v6.7.1 is released, which fixes this bug and ensures that semgrep_id is placed in the first element of vulnerabilities[].identifiers[], restoring it as the primary identifier. See gl-sast-report-semgrep-6.7.1-additional-vulnerabilities-correct-primary-identifier.json.

  Report generated by semgrep v6.7.1:

      "identifiers": [
        { "type": "semgrep_id", "name": "bandit.B506", "value": "bandit.B506", "url": "https://semgrep.dev/r/gitlab.bandit.B506" },
        { "type": "cwe", "name": "CWE-502", "value": "502", "url": "https://cwe.mitre.org/data/definitions/502.html" },
        { "type": "owasp", "name": "A08:2021 - Software and Data Integrity Failures", "value": "A08:2021" },
        { "type": "owasp", "name": "A8:2017 - Insecure Deserialization", "value": "A8:2017" },
        { "type": "bandit_test_id", "name": "Bandit Test ID B506", "value": "B506" }
      ]
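The effect of sorting on the primary identifier can be illustrated with a minimal Python sketch. It uses the identifiers from the sample reports above (URLs omitted for brevity). The helper name primary_identifier is hypothetical, and sorting by the "value" field is an assumption: it happens to reproduce the ordering seen in the v6.7.0 sample, but the actual sort key used by the report library is not stated in this description.

```python
# Identifiers in the order emitted by semgrep v6.6.2 (URLs omitted).
identifiers_v662 = [
    {"type": "semgrep_id", "name": "bandit.B506", "value": "bandit.B506"},
    {"type": "cwe", "name": "CWE-502", "value": "502"},
    {"type": "owasp", "name": "A08:2021 - Software and Data Integrity Failures", "value": "A08:2021"},
    {"type": "owasp", "name": "A8:2017 - Insecure Deserialization", "value": "A8:2017"},
    {"type": "bandit_test_id", "name": "Bandit Test ID B506", "value": "B506"},
]


def primary_identifier(identifiers):
    """Hypothetical helper: the first element of identifiers[] is the primary identifier."""
    return identifiers[0]


# Assumption: sort by "value". This reproduces the v6.7.0 sample ordering,
# but the real sort key may differ.
identifiers_sorted = sorted(identifiers_v662, key=lambda identifier: identifier["value"])

print(primary_identifier(identifiers_v662)["type"])    # primary identifier before sorting
print(primary_identifier(identifiers_sorted)["type"])  # primary identifier after sorting
print([identifier["type"] for identifier in identifiers_sorted])
```

The identifier set is unchanged by the sort; only the position of semgrep_id moves, and that alone is enough to change which identifier is treated as primary.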
The bug has been fixed in the analyzer code. However, changing the primary identifier of the vulnerabilities has caused the following issues:
- When semgrep v6.7.0 is executed, the primary identifier for existing vulnerabilities is updated to cwe, and the vulnerability state for resolved vulnerabilities is reset to detected. The state for confirmed and dismissed vulnerabilities is not changed.
- If semgrep v6.7.1 is executed, new vulnerabilities (and vulnerability findings) are created, and the state of all of these new vulnerabilities is set to detected. These new vulnerabilities are duplicates of the vulnerabilities that were last detected by semgrep v6.6.2.
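To check whether a given report carries the ordering that triggers these symptoms, the following Python sketch scans a report for vulnerabilities whose first identifier is not a semgrep_id. It is not part of the migration in this MR. The function name unexpected_primary_identifiers is hypothetical, and the embedded report is a trimmed, made-up sample that only keeps the fields needed for the check.

```python
import json


def unexpected_primary_identifiers(report, expected_type="semgrep_id"):
    """Return (index, primary type) for each vulnerability whose first
    identifier does not have the expected type."""
    flagged = []
    for index, vulnerability in enumerate(report.get("vulnerabilities", [])):
        identifiers = vulnerability.get("identifiers", [])
        if not identifiers:
            continue  # nothing to compare against
        primary_type = identifiers[0]["type"]
        if primary_type != expected_type:
            flagged.append((index, primary_type))
    return flagged


# Trimmed sample: the first vulnerability uses the v6.7.0 ordering,
# the second uses the v6.6.2 / v6.7.1 ordering.
sample_report = json.loads("""
{
  "version": "15.2.2",
  "vulnerabilities": [
    {"identifiers": [
      {"type": "cwe", "name": "CWE-502", "value": "502"},
      {"type": "semgrep_id", "name": "bandit.B506", "value": "bandit.B506"}
    ]},
    {"identifiers": [
      {"type": "semgrep_id", "name": "bandit.B506", "value": "bandit.B506"},
      {"type": "cwe", "name": "CWE-502", "value": "502"}
    ]}
  ]
}
""")

print(unexpected_primary_identifiers(sample_report))
```

An empty result suggests the report follows the semgrep_id-first convention; any flagged entry indicates a report with the v6.7.0 ordering.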
This merge request restores the vulnerability data and states to the values that were present when semgrep v6.6.2 was executed.
References
Investigate automatically restoring vulnerabili... (#577229) • Adam Cohen • 18.8