diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index ae9253380e4ba990e65fb7bc306d2c1c3a24e81a..cb9b2a698c7066e732b0a81b4847068e9f6fa11b 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -1,6 +1,6 @@ --- -stage: Application Security Testing -group: Static Analysis +stage: Security Risk Management +group: Security Insights info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments title: Vulnerability deduplication process description: Deduplication of security scanning results @@ -14,15 +14,19 @@ description: Deduplication of security scanning results {{< /details >}} When a pipeline contains jobs that produce multiple security reports of the same type, it is -possible that the same vulnerability finding is present in multiple reports. This duplication is +possible that the same vulnerability is present in multiple reports. This duplication is common when different scanners are used to increase coverage, but can also exist in a single report. -The deduplication process allows you to maximize the vulnerability scanning coverage while reducing -the number of findings you need to manage. +Vulnerability deduplication automatically consolidates duplicate vulnerabilities across scans, helping you +focus on unique vulnerabilities while maintaining full scanning coverage. -A finding is considered a duplicate of another finding when their +The logic for deduplicating vulnerabilities varies depending on the scan type: + +- SAST vulnerabilities are deduplicated using the [scope-offset algorithm](#scope-offset-signatures). 
+- Secret detection vulnerabilities are deduplicated [per value and file](../secret_detection/pipeline/_index.md#duplicate-vulnerability-tracking). +- Any other vulnerability is considered a duplicate of another vulnerability when their [scan type](../terminology/_index.md#scan-type-report-type), [location](../terminology/_index.md#location-fingerprint), and -[primary identifier](../../../development/integrations/secure.md#primary-identifier) are the same. +[identifiers](../terminology/_index.md#identifier) are the same. The scan type must match because each can have its own definition for the location of a vulnerability. For example, static analyzers are able to locate a file path and line number, whereas @@ -30,60 +34,123 @@ a container scanning analyzer uses the image name instead. When comparing identifiers, GitLab does not compare `CWE` and `WASC` during deduplication because they are "type identifiers" and are used to classify groups of vulnerabilities. Including these -identifiers would result in many findings being incorrectly considered duplicates. Two findings are +identifiers would result in many vulnerabilities being incorrectly considered duplicates. Two vulnerabilities are considered unique if none of their identifiers match. -In a set of duplicated findings, the first occurrence of a finding is kept and the remaining are -skipped. Security reports are processed in alphabetical file path order, and findings are processed +In a set of duplicated vulnerabilities, the first occurrence of a vulnerability is kept and the remaining are +skipped. Security reports are processed in alphabetical file path order, and vulnerabilities are processed sequentially in the order they appear in a report. +## Location definitions by scan type + +The location used for deduplication depends on the scan type. + +### Container scanning + +- Location is usually defined only by the Docker image name, not the image tag. 
+- However, the image tag is considered part of the location if the image tag matches semantic versioning (semver) syntax and doesn't look like a Git commit hash. For example: +- The following locations are treated as duplicates: + - `registry.gitlab.com/group-name/project-name/image1:12345019:libcrypto3` + - `registry.gitlab.com/group-name/project-name/image1:libcrypto3` +- The following locations are treated as unique: + - `registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3` + - `registry.gitlab.com/group-name/project-name/image1:libcrypto3` + +### Dynamic Application Security Testing (DAST) + +- Location is defined by the URL path, HTTP method, and HTTP parameters. +- Two vulnerabilities are considered duplicates if they occur at the same URL endpoint with the same HTTP method. + +### Dependency scanning + +- Location is defined by the package name and version. +- Two vulnerabilities are considered duplicates if they affect the same package version. + +## Scope-offset signatures + +When security scanners analyze your code, they sometimes report the same vulnerability multiple times, +especially when code is refactored or moved around. Advanced vulnerability tracking uses a smart deduplication +system to recognize when these are actually the same issue, not new ones. + +Imagine you have a security issue in a function. If a developer refactors the code and moves that function to a different line, +the scanner might report it as a new vulnerability. Without deduplication, you'd see duplicate alerts for the same problem, +making it harder to track what you actually need to fix. + +When using scope-offset signatures, GitLab creates a unique "fingerprint" for each vulnerability using the following information: + +- Filename: The file that contains the vulnerability. +- Scope: The code context where the vulnerability lives (like a function name or class name). +- Offset: The position relative to that scope. 
+ +This combination creates a signature that stays the same even when code moves around, as long as it stays within the same scope. + +### Example + +Say you have this Ruby code: + +```ruby +class OuterClass + class InnerClassA + def function_A(x) + puts "calling call1" + call1(x) # ← Vulnerability found here on line 5 + end + call2("calling call 2") + end +end +``` + +The scanner finds a vulnerability on line 5. GitLab needs to figure out whether the vulnerability is in `OuterClass`, `InnerClassA`, or `function_A`. +The scanner calculates which scope is the best fit by measuring the distance from the vulnerability to the beginning and to the end of each scope: + +- `OuterClass` (lines 1-9): Distance = (5-1) + (9-5) = 8 +- `InnerClassA` (lines 2-8): Distance = (5-2) + (8-5) = 6 +- `function_A` (lines 3-6): Distance = (5-3) + (6-5) = 3 + +The smallest distance wins, so GitLab identifies `function_A` as the scope. + +GitLab creates a signature like `lib/outer_class.rb|OuterClass[0]|InnerClassA[0]|function_A[0]:2` +to identify the location of the vulnerability. If the function or class that contains the vulnerability is moved +to a different location within its parent scope, the signature is unchanged and the vulnerability is not reported as new. +However, if `OuterClass` is renamed, the scope is different and a new vulnerability is created. + ## Deduplication examples -- Example 1: matching identifiers and location, mismatching scan type. - - Finding - - Scan type: `dependency_scanning` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CVE-2022-25510 - - Other Finding - - Scan type: `container_scanning` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CVE-2022-25510 - - Deduplication result: no deduplication occurs because the scan type is different. -- Example 2: matching location and scan type, mismatching type identifiers. 
- - Finding - - Scan type: `sast` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CWE-259 - - Other Finding - - Scan type: `sast` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CWE-798 - - Deduplication result: no duplication occurs because `CWE` identifiers are ignored. -- Example 3: matching scan type, location and an identifier. - - Finding - - Scan type: `container_scanning` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CVE-2019-12345, CVE-2022-25510, CWE-259 - - Other Finding - - Scan type: `container_scanning` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CVE-2022-25510, CWE-798 - - Deduplication result: duplication occurs because all criteria match, and type identifiers (CWE) are ignored. - Only one identifier needs to match, in this case CVE-2022-25510. - -You can find definitions for each scan type [`gitlab/lib/gitlab/ci/reports/security/locations`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/ci/reports/security/locations) -and [`gitlab/ee/lib/gitlab/ci/reports/security/locations`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/ee/lib/gitlab/ci/reports/security/locations). - -For instance, for `container_scanning` type the location is defined by the Docker image name without -tag. However, if the image tag matches a semver syntax and doesn't look like a Git commit hash, -it isn't considered a duplicate. 
- -For example, the following locations are treated as duplicates: - -- `registry.gitlab.com/group-name/project-name/image1:12345019:libcrypto3` -- `registry.gitlab.com/group-name/project-name/image1:libcrypto3` - -However, the following locations are considered different: - -- `registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3` -- `registry.gitlab.com/group-name/project-name/image1:libcrypto3` +Here are some examples of how vulnerability deduplication behaves. + +### Matching identifiers and location, mismatching scan type + +- First vulnerability: + - Scan type: `dependency_scanning` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CVE-2022-25510 +- Second vulnerability: + - Scan type: `container_scanning` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CVE-2022-25510 +- Deduplication result: no deduplication is performed because the scan type is different. + +### Matching location and scan type, mismatching type identifiers + +- First vulnerability: + - Scan type: `sast` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CWE-259 +- Second vulnerability: + - Scan type: `sast` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CWE-798 +- Deduplication result: no deduplication is performed because `CWE` identifiers are ignored. 
+ +### Matching scan type, location, and an identifier + +- First vulnerability: + - Scan type: `container_scanning` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CVE-2019-12345, CVE-2022-25510, CWE-259 +- Second vulnerability: + - Scan type: `container_scanning` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CVE-2022-25510, CWE-798 +- Deduplication result: the vulnerabilities are deduplicated because both vulnerabilities have the same scan type and location fingerprint, + and both are identified as CVE-2022-25510. diff --git a/doc/user/application_security/terminology/_index.md b/doc/user/application_security/terminology/_index.md index 159b54bf4c856c39bab88c9eb6c14ea525c974ee..3e882922e610f079104da6f8fde4fbbd617f8f65 100644 --- a/doc/user/application_security/terminology/_index.md +++ b/doc/user/application_security/terminology/_index.md @@ -107,6 +107,12 @@ A flexible and non-destructive way to visually organize vulnerabilities in group that are likely related but do not qualify for deduplication. For example, you can include findings that should be evaluated together, would be fixed by the same action, or come from the same source. +## Identifier + +An identifier is an ID for the vulnerability from an external database, such as Common Vulnerabilities and Exposures (CVE) +or Common Weakness Enumeration (CWE). A vulnerability may have multiple identifiers. +An identifier is composed of a type (like `CVE`) and an ID (like `CVE-2021-44228`). + ## Insignificant finding A legitimate finding that a particular customer doesn't care about. @@ -262,13 +268,8 @@ Examples: `DS_EXCLUDED_PATHS` should `Exclude files and directories from the sca ## Primary identifier -A finding's primary identifier is a value that is unique to each finding. 
The external type and external ID -of the finding's [first identifier](https://gitlab.com/gitlab-org/security-products/security-report-schemas/-/blob/v2.4.0-rc1/dist/sast-report-format.json#L228) -combine to create the value. - -An example primary identifier is `CVE`, which is used for Trivy. The identifier must be stable. -Subsequent scans must return the same value for the same finding, even if the location has slightly -changed. +The first [identifier](#identifier) is the primary identifier. The primary identifier must be stable. +Subsequent scans must return the same value for the same finding, even if the location of the vulnerability has changed. ## Processor