From 0a696aee4043038d425fa1067c9b2277c5ded827 Mon Sep 17 00:00:00 2001 From: Brian Williams Date: Wed, 10 Dec 2025 11:21:17 -0600 Subject: [PATCH 1/9] Improve vulnerability deduplication docs We should not expect users to read the source code to understand how the application should behave. This MR improves the documentation so that it describes the behavior directly instead of directing the reader to the code. --- .../detect/vulnerability_deduplication.md | 122 +++++++++++------- 1 file changed, 73 insertions(+), 49 deletions(-) diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index ae9253380e4ba9..801a72ebb7f1fa 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -19,7 +19,11 @@ common when different scanners are used to increase coverage, but can also exist The deduplication process allows you to maximize the vulnerability scanning coverage while reducing the number of findings you need to manage. -A finding is considered a duplicate of another finding when their +The logic for deduplicating vulnerabilities varies depending on the scan type: + +- SAST vulnerabilities are deduplicated using [vulnerability tracking](../../../development/sec/vulnerability_tracking.md). +- Secret detection vulnerabilities are deduplicated [per value and file](../secret_detection/pipeline/_index.md#duplicate-vulnerability-tracking) +- All other vulnerabilities are considered a duplicate of another vulnerability when their [scan type](../terminology/_index.md#scan-type-report-type), [location](../terminology/_index.md#location-fingerprint), and [primary identifier](../../../development/integrations/secure.md#primary-identifier) are the same. @@ -37,53 +41,73 @@ In a set of duplicated findings, the first occurrence of a finding is kept and t skipped. 
Security reports are processed in alphabetical file path order, and findings are processed sequentially in the order they appear in a report. +## Location definitions by scan type + +The location used for deduplication is dependent on the scan type. + +### Container Scanning + +- Location is defined by the Docker image name without the tag. +- However, if the image tag matches semantic versioning (semver) syntax and doesn't look like a Git commit hash, it is considered part of the location. +- For example, the following locations are treated as duplicates: + - `registry.gitlab.com/group-name/project-name/image1:12345019:libcrypto3` + - `registry.gitlab.com/group-name/project-name/image1:libcrypto3` +- However, the following locations are considered different: + - `registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3` + - `registry.gitlab.com/group-name/project-name/image1:libcrypto3` + +### Coverage Fuzzing + +- Location is defined by the file path and line number of the vulnerable code. +- Two findings are considered duplicates if they occur in the same file at the same line number. + +### DAST (Dynamic Application Security Testing) + +- Location is defined by the URL path and HTTP method. +- Two findings are considered duplicates if they occur at the same URL endpoint with the same HTTP method. + +### Dependency Scanning + +- Location is defined by the package name and version. +- Two findings are considered duplicates if they affect the same package version. + ## Deduplication examples -- Example 1: matching identifiers and location, mismatching scan type. - - Finding - - Scan type: `dependency_scanning` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CVE-2022-25510 - - Other Finding - - Scan type: `container_scanning` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CVE-2022-25510 - - Deduplication result: no deduplication occurs because the scan type is different. 
-- Example 2: matching location and scan type, mismatching type identifiers. - - Finding - - Scan type: `sast` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CWE-259 - - Other Finding - - Scan type: `sast` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CWE-798 - - Deduplication result: no duplication occurs because `CWE` identifiers are ignored. -- Example 3: matching scan type, location and an identifier. - - Finding - - Scan type: `container_scanning` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CVE-2019-12345, CVE-2022-25510, CWE-259 - - Other Finding - - Scan type: `container_scanning` - - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - - Identifiers: CVE-2022-25510, CWE-798 - - Deduplication result: duplication occurs because all criteria match, and type identifiers (CWE) are ignored. - Only one identifier needs to match, in this case CVE-2022-25510. - -You can find definitions for each scan type [`gitlab/lib/gitlab/ci/reports/security/locations`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/ci/reports/security/locations) -and [`gitlab/ee/lib/gitlab/ci/reports/security/locations`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/ee/lib/gitlab/ci/reports/security/locations). - -For instance, for `container_scanning` type the location is defined by the Docker image name without -tag. However, if the image tag matches a semver syntax and doesn't look like a Git commit hash, -it isn't considered a duplicate. 
- -For example, the following locations are treated as duplicates: - -- `registry.gitlab.com/group-name/project-name/image1:12345019:libcrypto3` -- `registry.gitlab.com/group-name/project-name/image1:libcrypto3` - -However, the following locations are considered different: - -- `registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3` -- `registry.gitlab.com/group-name/project-name/image1:libcrypto3` +Here are some examples of how vulnerability deduplication would behave. + +### Matching identifiers and location, mismatching scan type + +- Finding + - Scan type: `dependency_scanning` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CVE-2022-25510 +- Other Finding + - Scan type: `container_scanning` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CVE-2022-25510 +- Deduplication result: no deduplication occurs because the scan type is different. + +### Matching location and scan type, mismatching type identifiers + +- Finding + - Scan type: `sast` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CWE-259 +- Other Finding + - Scan type: `sast` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CWE-798 +- Deduplication result: no duplication occurs because `CWE` identifiers are ignored. + +### Matching scan type, location and an identifier + +- Finding + - Scan type: `container_scanning` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CVE-2019-12345, CVE-2022-25510, CWE-259 +- Other Finding + - Scan type: `container_scanning` + - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` + - Identifiers: CVE-2022-25510, CWE-798 +- Deduplication result: duplication occurs because all criteria match, and type identifiers (CWE) are ignored. + Only one identifier needs to match, in this case CVE-2022-25510. 
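A minimal Ruby sketch of the generic rule this patch describes for scan types other than SAST and secret detection. Two findings match when their scan type and location fingerprint are equal and they share at least one identifier that is not a type identifier (`CWE`, `WASC`). The sketch also models the stated processing order: reports are taken in alphabetical file path order, findings in report order, and the first occurrence is kept. The `Finding` struct, the identifier string format, and all method names are hypothetical and chosen for illustration. This is not GitLab's implementation. The example data mirrors the three deduplication examples above.

```ruby
# Illustrative only: Finding and these helpers are hypothetical, not GitLab classes.
Finding = Struct.new(:scan_type, :location_fingerprint, :identifiers, keyword_init: true)

# Type identifiers classify groups of vulnerabilities, so they are ignored.
TYPE_IDENTIFIERS = %w[CWE WASC].freeze

# Assumption: identifiers are strings such as "CVE-2022-25510", with the type as the prefix.
def comparable_identifiers(finding)
  finding.identifiers.reject { |id| TYPE_IDENTIFIERS.include?(id.split("-").first) }
end

# Same scan type, same location fingerprint, and at least one shared non-type identifier.
def duplicate?(a, b)
  a.scan_type == b.scan_type &&
    a.location_fingerprint == b.location_fingerprint &&
    (comparable_identifiers(a) & comparable_identifiers(b)).any?
end

# reports: Hash of report file path => findings in the order they appear in the report.
# Reports are visited in alphabetical path order; later duplicates are skipped.
def deduplicate(reports)
  kept = []
  reports.sort_by { |path, _findings| path }.each do |_path, findings|
    findings.each do |finding|
      kept << finding unless kept.any? { |existing| duplicate?(existing, finding) }
    end
  end
  kept
end

fingerprint = "adc83b19e793491b1c6ea0fd8b46cd9f32e592fc"
dependency = Finding.new(scan_type: "dependency_scanning", location_fingerprint: fingerprint,
                         identifiers: ["CVE-2022-25510"])
container = Finding.new(scan_type: "container_scanning", location_fingerprint: fingerprint,
                        identifiers: ["CVE-2022-25510"])
sast_a = Finding.new(scan_type: "sast", location_fingerprint: fingerprint, identifiers: ["CWE-259"])
sast_b = Finding.new(scan_type: "sast", location_fingerprint: fingerprint, identifiers: ["CWE-798"])
image_a = Finding.new(scan_type: "container_scanning", location_fingerprint: fingerprint,
                      identifiers: ["CVE-2019-12345", "CVE-2022-25510", "CWE-259"])
image_b = Finding.new(scan_type: "container_scanning", location_fingerprint: fingerprint,
                      identifiers: ["CVE-2022-25510", "CWE-798"])

puts duplicate?(dependency, container) # => false (different scan type)
puts duplicate?(sast_a, sast_b)        # => false (only CWE identifiers)
puts duplicate?(image_a, image_b)      # => true (shared CVE-2022-25510)
```

Comparing every new finding against all kept ones keeps the sketch close to the prose. A real implementation would likely index findings instead.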
-- GitLab From b4d45d58ea6f038c35841f6abf3c007c7a377052 Mon Sep 17 00:00:00 2001 From: Brian Williams Date: Wed, 10 Dec 2025 16:22:52 -0600 Subject: [PATCH 2/9] Apply reviewer suggestions --- .../detect/vulnerability_deduplication.md | 38 +++++++++---------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index 801a72ebb7f1fa..fac84c6c6b5c5a 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -1,6 +1,6 @@ --- -stage: Application Security Testing -group: Static Analysis +stage: Security Risk Management +group: Security Insights info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments title: Vulnerability deduplication process description: Deduplication of security scanning results @@ -45,67 +45,67 @@ sequentially in the order they appear in a report. The location used for deduplication is dependent on the scan type. -### Container Scanning +### Container scanning -- Location is defined by the Docker image name without the tag. -- However, if the image tag matches semantic versioning (semver) syntax and doesn't look like a Git commit hash, it is considered part of the location. -- For example, the following locations are treated as duplicates: +- Location is usually defined only by the Docker image name, not the image tag. +- However, the image tag is considered part of the location if the image tag matches semantic versioning (semver) syntax and doesn't look like a Git commit hash. 
For example: +- The following locations are treated as duplicates: - `registry.gitlab.com/group-name/project-name/image1:12345019:libcrypto3` - `registry.gitlab.com/group-name/project-name/image1:libcrypto3` -- However, the following locations are considered different: +- The following locations are treated as unique: - `registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3` - `registry.gitlab.com/group-name/project-name/image1:libcrypto3` -### Coverage Fuzzing +### Coverage fuzzing - Location is defined by the file path and line number of the vulnerable code. - Two findings are considered duplicates if they occur in the same file at the same line number. -### DAST (Dynamic Application Security Testing) +### Dynamic Application Security Testing (DAST) - Location is defined by the URL path and HTTP method. - Two findings are considered duplicates if they occur at the same URL endpoint with the same HTTP method. -### Dependency Scanning +### Dependency scanning - Location is defined by the package name and version. - Two findings are considered duplicates if they affect the same package version. ## Deduplication examples -Here are some examples of how vulnerability deduplication would behave. +Here are some examples of how vulnerability deduplication behaves. ### Matching identifiers and location, mismatching scan type -- Finding +- First finding: - Scan type: `dependency_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2022-25510 -- Other Finding +- Second Finding: - Scan type: `container_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2022-25510 -- Deduplication result: no deduplication occurs because the scan type is different. +- Deduplication result: no deduplication identified because the scan type is different. 
### Matching location and scan type, mismatching type identifiers -- Finding +- First finding: - Scan type: `sast` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CWE-259 -- Other Finding +- Second finding: - Scan type: `sast` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CWE-798 -- Deduplication result: no duplication occurs because `CWE` identifiers are ignored. +- Deduplication result: no duplication identified because `CWE` identifiers are ignored. ### Matching scan type, location and an identifier -- Finding +- First finding: - Scan type: `container_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2019-12345, CVE-2022-25510, CWE-259 -- Other Finding +- Second finding: - Scan type: `container_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2022-25510, CWE-798 -- GitLab From a79061d0981b1a2bacb609545906829ba70e5e0a Mon Sep 17 00:00:00 2001 From: Brian Williams Date: Thu, 11 Dec 2025 10:43:51 -0600 Subject: [PATCH 3/9] Reword description of deduplication --- .../detect/vulnerability_deduplication.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index fac84c6c6b5c5a..091c3484993b6f 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -16,8 +16,8 @@ description: Deduplication of security scanning results When a pipeline contains jobs that produce multiple security reports of the same type, it is possible that the same vulnerability finding is present in multiple reports. This duplication is common when different scanners are used to increase coverage, but can also exist in a single report. 
-The deduplication process allows you to maximize the vulnerability scanning coverage while reducing -the number of findings you need to manage. +Vulnerability deduplication automatically consolidates duplicate findings across scans, helping you +focus on unique vulnerabilities while maintaining full scanning coverage. The logic for deduplicating vulnerabilities varies depending on the scan type: -- GitLab From 8c25d8c8c20d8fee5b3a21de549d1e32f4133685 Mon Sep 17 00:00:00 2001 From: Brian Williams Date: Thu, 11 Dec 2025 11:16:13 -0600 Subject: [PATCH 4/9] Improve docs on identifier deduplication --- .../detect/vulnerability_deduplication.md | 10 +++++----- .../application_security/terminology/_index.md | 15 ++++++++------- 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index 091c3484993b6f..67773226451634 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -26,7 +26,7 @@ The logic for deduplicating vulnerabilities varies depending on the scan type: - All other vulnerabilities are considered a duplicate of another vulnerability when their [scan type](../terminology/_index.md#scan-type-report-type), [location](../terminology/_index.md#location-fingerprint), and -[primary identifier](../../../development/integrations/secure.md#primary-identifier) are the same. +[identifiers](../terminology/_index.md#identifier) are the same. The scan type must match because each can have its own definition for the location of a vulnerability. For example, static analyzers are able to locate a file path and line number, whereas @@ -85,7 +85,7 @@ Here are some examples of how vulnerability deduplication behaves. 
- Scan type: `container_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2022-25510 -- Deduplication result: no deduplication identified because the scan type is different. +- Deduplication result: no deduplication is performed because the scan type is different. ### Matching location and scan type, mismatching type identifiers @@ -97,7 +97,7 @@ Here are some examples of how vulnerability deduplication behaves. - Scan type: `sast` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CWE-798 -- Deduplication result: no duplication identified because `CWE` identifiers are ignored. +- Deduplication result: no deduplication is performed because `CWE` identifiers are ignored. ### Matching scan type, location and an identifier @@ -109,5 +109,5 @@ Here are some examples of how vulnerability deduplication behaves. - Scan type: `container_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2022-25510, CWE-798 -- Deduplication result: duplication occurs because all criteria match, and type identifiers (CWE) are ignored. - Only one identifier needs to match, in this case CVE-2022-25510. +- Deduplication result: the findings are deduplicated because both findings have the same scan type, location fingerprint, + and are identified as CVE-2022-25510. diff --git a/doc/user/application_security/terminology/_index.md b/doc/user/application_security/terminology/_index.md index 159b54bf4c856c..3e882922e610f0 100644 --- a/doc/user/application_security/terminology/_index.md +++ b/doc/user/application_security/terminology/_index.md @@ -107,6 +107,12 @@ A flexible and non-destructive way to visually organize vulnerabilities in group that are likely related but do not qualify for deduplication. For example, you can include findings that should be evaluated together, would be fixed by the same action, or come from the same source. 
+## Identifier + +An identifier is an ID for the vulnerability from an external database, such as Common Vulnerabilities and Exposures (CVE) +or Common Weakness Enumeration (CWE). A vulnerability can have multiple identifiers. +An identifier is composed of a type (like `CVE`) and an ID (like `CVE-2021-44228`). + ## Insignificant finding A legitimate finding that a particular customer doesn't care about. @@ -262,13 +268,8 @@ Examples: `DS_EXCLUDED_PATHS` should `Exclude files and directories from the sca ## Primary identifier -A finding's primary identifier is a value that is unique to each finding. The external type and external ID -of the finding's [first identifier](https://gitlab.com/gitlab-org/security-products/security-report-schemas/-/blob/v2.4.0-rc1/dist/sast-report-format.json#L228) -combine to create the value. - -An example primary identifier is `CVE`, which is used for Trivy. The identifier must be stable. -Subsequent scans must return the same value for the same finding, even if the location has slightly -changed. +A finding's first [identifier](#identifier) is its primary identifier. The primary identifier must be stable. +Subsequent scans must return the same value for the same finding, even if the location of the vulnerability has changed.
## Processor -- GitLab From 3e5180e07ce7f8b357c3612c3383a12d75f3d2ff Mon Sep 17 00:00:00 2001 From: Brian Williams Date: Thu, 11 Dec 2025 11:18:24 -0600 Subject: [PATCH 5/9] Use the word "vulnerability" instead of "finding" --- .../detect/vulnerability_deduplication.md | 32 +++++++++---------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index 67773226451634..724eddd61c03d4 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -14,9 +14,9 @@ description: Deduplication of security scanning results {{< /details >}} When a pipeline contains jobs that produce multiple security reports of the same type, it is -possible that the same vulnerability finding is present in multiple reports. This duplication is +possible that the same vulnerability is present in multiple reports. This duplication is common when different scanners are used to increase coverage, but can also exist in a single report. -Vulnerability deduplication automatically consolidates duplicate findings across scans, helping you +Vulnerability deduplication automatically consolidates duplicate vulnerabilities across scans, helping you focus on unique vulnerabilities while maintaining full scanning coverage. The logic for deduplicating vulnerabilities varies depending on the scan type: @@ -34,11 +34,11 @@ a container scanning analyzer uses the image name instead. When comparing identifiers, GitLab does not compare `CWE` and `WASC` during deduplication because they are "type identifiers" and are used to classify groups of vulnerabilities. Including these -identifiers would result in many findings being incorrectly considered duplicates. Two findings are +identifiers would result in many vulnerabilities being incorrectly considered duplicates. 
Two vulnerabilities are considered unique if none of their identifiers match. -In a set of duplicated findings, the first occurrence of a finding is kept and the remaining are -skipped. Security reports are processed in alphabetical file path order, and findings are processed +In a set of duplicated vulnerabilities, the first occurrence of a vulnerability is kept and the remaining are +skipped. Security reports are processed in alphabetical file path order, and vulnerabilities are processed sequentially in the order they appear in a report. ## Location definitions by scan type @@ -59,17 +59,17 @@ The location used for deduplication is dependent on the scan type. ### Coverage fuzzing - Location is defined by the file path and line number of the vulnerable code. -- Two findings are considered duplicates if they occur in the same file at the same line number. +- Two vulnerabilities are considered duplicates if they occur in the same file at the same line number. ### Dynamic Application Security Testing (DAST) - Location is defined by the URL path and HTTP method. -- Two findings are considered duplicates if they occur at the same URL endpoint with the same HTTP method. +- Two vulnerabilities are considered duplicates if they occur at the same URL endpoint with the same HTTP method. ### Dependency scanning - Location is defined by the package name and version. -- Two findings are considered duplicates if they affect the same package version. +- Two vulnerabilities are considered duplicates if they affect the same package version. ## Deduplication examples @@ -77,11 +77,11 @@ Here are some examples of how vulnerability deduplication behaves. 
### Matching identifiers and location, mismatching scan type -- First finding: +- First vulnerability: - Scan type: `dependency_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2022-25510 -- Second Finding: +- Second vulnerability: - Scan type: `container_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2022-25510 @@ -89,11 +89,11 @@ Here are some examples of how vulnerability deduplication behaves. ### Matching location and scan type, mismatching type identifiers -- First finding: +- First vulnerability: - Scan type: `sast` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CWE-259 -- Second finding: +- Second vulnerability: - Scan type: `sast` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CWE-798 @@ -101,13 +101,13 @@ Here are some examples of how vulnerability deduplication behaves. ### Matching scan type, location and an identifier -- First finding: +- First vulnerability: - Scan type: `container_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2019-12345, CVE-2022-25510, CWE-259 -- Second finding: +- Second vulnerability: - Scan type: `container_scanning` - Location fingerprint: `adc83b19e793491b1c6ea0fd8b46cd9f32e592fc` - Identifiers: CVE-2022-25510, CWE-798 -- Deduplication result: the findings are deduplicated because both findings have the same scan type, location fingerprint, - and are identified as CVE-2022-25510. +- Deduplication result: the vulnerabilities are deduplicated because both vulnerabilities have the same scan type and location fingerprint, + and both are identified as CVE-2022-25510.
-- GitLab From 23b2be92eedbb379fef7b39a3ff445045b2f8071 Mon Sep 17 00:00:00 2001 From: Brian Williams Date: Thu, 11 Dec 2025 11:43:06 -0600 Subject: [PATCH 6/9] Add documentation describing the scope-offset algorithm --- .../detect/vulnerability_deduplication.md | 50 ++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index 724eddd61c03d4..a35852383a3f97 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -21,7 +21,7 @@ focus on unique vulnerabilities while maintaining full scanning coverage. The logic for deduplicating vulnerabilities varies depending on the scan type: -- SAST vulnerabilities are deduplicated using [vulnerability tracking](../../../development/sec/vulnerability_tracking.md). +- SAST vulnerabilities are deduplicated using the [scope-offset algorithm](#scope-offset-signatures). - Secret detection vulnerabilities are deduplicated [per value and file](../secret_detection/pipeline/_index.md#duplicate-vulnerability-tracking) - All other vulnerabilities are considered a duplicate of another vulnerability when their [scan type](../terminology/_index.md#scan-type-report-type), @@ -71,6 +71,54 @@ The location used for deduplication is dependent on the scan type. - Location is defined by the package name and version. - Two vulnerabilities are considered duplicates if they affect the same package version. +## Scope-offset signatures + +When security scanners analyze your code, they sometimes report the same vulnerability multiple times, +especially when code is refactored or moved around. Advanced vulnerability tracking uses a deduplication +system that recognizes when these reports describe the same issue rather than new ones. + +Imagine you have a security issue in a function.
If a developer refactors the code and moves that function to a different line, +the scanner might report it as a new vulnerability. Without deduplication, you'd see duplicate alerts for the same problem, +making it harder to track what you actually need to fix. + +When using scope-offset signatures, GitLab creates a unique "fingerprint" for each vulnerability using three pieces of information: + +1. Filename: Which file contains the vulnerability +1. Scope: The code context where the vulnerability lives (like a function name or class name) +1. Offset: The position relative to that scope + +This combination creates a signature that stays the same even when code moves around, as long as it stays within the same scope. + +### A Real-World Example + +Let's say you have this Ruby code: + +```ruby +class OuterClass + class InnerClassA + def function_A(x) + puts "calling call1" + call1(x) # ← Vulnerability found here on line 5 + end + call2("calling call 2") + end +end +``` + +The scanner finds a vulnerability on line 5. GitLab needs to figure out: "Is this vulnerability in `OuterClass`, `InnerClassA`, or `function_A`?" +It calculates which scope is the "tightest fit" by measuring the distance from the vulnerability to the beginning and end of each scope: + +- `OuterClass` (lines 1-9): Distance = (5-1) + (9-5) = 8 +- `InnerClassA` (lines 2-8): Distance = (5-2) + (8-5) = 6 +- `function_A` (lines 3-6): Distance = (5-3) + (6-5) = 3 + +The smallest distance wins, so GitLab identifies `function_A` as the scope. + +GitLab creates a signature like `lib/outer_class.rb|OuterClass[0]|InnerClassA[0]|function_A[0]:2` +which is used to identify the location of the vulnerability. If the vulnerability moves around within the same scope, +it will not be considered a new vulnerability. However, perhaps `OuterClass` is renamed. This causes the scope to +change, and a new vulnerability is created. + ## Deduplication examples Here are some examples of how vulnerability deduplication behaves.
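The "tightest fit" selection that this patch describes can be sketched in Ruby as follows. The `Scope` struct and the method names are hypothetical, and the scope list is given directly instead of being parsed from source. This is an illustration of the distance rule from the `OuterClass` example, not GitLab's code.

```ruby
# Illustrative only: Scope and these helpers are hypothetical, not GitLab code.
Scope = Struct.new(:name, :start_line, :end_line)

# Distance from the vulnerable line to the beginning and to the end of a scope.
def scope_distance(scope, line)
  (line - scope.start_line) + (scope.end_line - line)
end

# Only scopes that enclose the line are candidates; the smallest distance wins.
def tightest_scope(scopes, line)
  scopes
    .select { |scope| scope.start_line <= line && line <= scope.end_line }
    .min_by { |scope| scope_distance(scope, line) }
end

scopes = [
  Scope.new("OuterClass", 1, 9),
  Scope.new("InnerClassA", 2, 8),
  Scope.new("function_A", 3, 6)
]

# Prints "OuterClass: 8", "InnerClassA: 6", "function_A: 3".
scopes.each { |scope| puts "#{scope.name}: #{scope_distance(scope, 5)}" }

best = tightest_scope(scopes, 5)
puts best.name           # function_A
puts 5 - best.start_line # 2, matching the offset in the example signature
```

For any enclosing scope, `(line - start) + (end - line)` simplifies to `end - start`. The rule therefore picks the shortest scope that contains the vulnerable line. Treating the offset as the line number relative to the scope's first line is an inference from the `:2` in the example signature.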
-- GitLab From beb47af3bde4dd67171626e75ec75b1c3ff8147a Mon Sep 17 00:00:00 2001 From: Brian Williams Date: Fri, 12 Dec 2025 08:35:50 -0600 Subject: [PATCH 7/9] Apply reviewer suggestions --- .../detect/vulnerability_deduplication.md | 23 ++++++++----------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index a35852383a3f97..cf0ddc782b7e6d 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -22,7 +22,7 @@ focus on unique vulnerabilities while maintaining full scanning coverage. The logic for deduplicating vulnerabilities varies depending on the scan type: - SAST vulnerabilities are deduplicated using the [scope-offset algorithm](#scope-offset-signatures). -- Secret detection vulnerabilities are deduplicated [per value and file](../secret_detection/pipeline/_index.md#duplicate-vulnerability-tracking) +- Secret detection vulnerabilities are deduplicated [per value and file](../secret_detection/pipeline/_index.md#duplicate-vulnerability-tracking). - All other vulnerabilities are considered a duplicate of another vulnerability when their [scan type](../terminology/_index.md#scan-type-report-type), [location](../terminology/_index.md#location-fingerprint), and @@ -56,14 +56,9 @@ The location used for deduplication is dependent on the scan type. - `registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3` - `registry.gitlab.com/group-name/project-name/image1:libcrypto3` -### Coverage fuzzing - -- Location is defined by the file path and line number of the vulnerable code. -- Two vulnerabilities are considered duplicates if they occur in the same file at the same line number. - ### Dynamic Application Security Testing (DAST) -- Location is defined by the URL path and HTTP method. 
+- Location is defined by the URL path, HTTP method, and HTTP parameters. - Two vulnerabilities are considered duplicates if they occur at the same URL endpoint with the same HTTP method. ### Dependency scanning @@ -81,11 +76,11 @@ Imagine you have a security issue in a function. If a developer refactors the co the scanner might report it as a new vulnerability. Without deduplication, you'd see duplicate alerts for the same problem, making it harder to track what you actually need to fix. -When using scope-offset signatures, GitLab creates a unique "fingerprint" for each vulnerability using three pieces of information: +When using scope-offset signatures, GitLab creates a unique "fingerprint" for each vulnerability using the following information: -1. Filename: Which file contains the vulnerability -1. Scope: The code context where the vulnerability lives (like a function name or class name) -1. Offset: The position relative to that scope +- Filename: Which file contains the vulnerability +- Scope: The code context where the vulnerability lives (like a function name or class name) +- Offset: The position relative to that scope This combination creates a signature that stays the same even when code moves around, as long as it stays within the same scope. @@ -115,9 +110,9 @@ It calculates which scope is the "tightest fit" by measuring the distance from t The smallest distance wins, so GitLab identifies `function_A` as the scope. GitLab creates a signature like `lib/outer_class.rb|OuterClass[0]|InnerClassA[0]|function_A[0]:2` -which is used to identify the location of the vulnerability. If the vulnerability moves around within the same scope, -it will not be considered a new vulnerability. However, perhaps `OuterClass` is renamed. This causes the scope to -change, and a new vulnerability is created. +to identify the location of the vulnerability. If the vulnerability moves around in the same scope +it's considered the same vulnerability. 
However, if `OuterClass` is renamed the scope is different and +a new vulnerability is created. ## Deduplication examples -- GitLab From 1dd223d32805acd4d1e476ea7e738ea429caac34 Mon Sep 17 00:00:00 2001 From: Brian Williams Date: Fri, 12 Dec 2025 15:07:34 -0600 Subject: [PATCH 8/9] Apply reviewer edits --- .../detect/vulnerability_deduplication.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index cf0ddc782b7e6d..c97f6121d2138e 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -78,15 +78,15 @@ making it harder to track what you actually need to fix. When using scope-offset signatures, GitLab creates a unique "fingerprint" for each vulnerability using the following information: -- Filename: Which file contains the vulnerability -- Scope: The code context where the vulnerability lives (like a function name or class name) -- Offset: The position relative to that scope +- Filename: The file that contains the vulnerability. +- Scope: The code context where the vulnerability lives (like a function name or class name). +- Offset: The position relative to that scope. This combination creates a signature that stays the same even when code moves around, as long as it stays within the same scope. -### A Real-World Example +### Example -Let's say you have this Ruby code: +Say you have this Ruby code: ```ruby class OuterClass @@ -100,8 +100,8 @@ class OuterClass end ``` -The scanner finds a vulnerability on line 5. GitLab needs to figure out: "Is this vulnerability in `OuterClass`, `InnerClassA`, or `function_A`?" -It calculates which scope is the "tightest fit" by measuring the distance from the vulnerability to the beginning and end of each scope: +The scanner finds a vulnerability on line 5. 
GitLab needs to determine whether the vulnerability is in `OuterClass`, `InnerClassA`, or `function_A`. +The scanner calculates which scope is the best fit by measuring the distance from the vulnerability to the beginning and to the end of each scope: - `OuterClass` (lines 1-9): Distance = (5-1) + (9-5) = 8 - `InnerClassA` (lines 2-8): Distance = (5-2) + (8-5) = 6 @@ -111,7 +111,7 @@ The smallest distance wins, so GitLab identifies `function_A` as the scope. GitLab creates a signature like `lib/outer_class.rb|OuterClass[0]|InnerClassA[0]|function_A[0]:2` to identify the location of the vulnerability. If the vulnerability moves around in the same scope -it's considered the same vulnerability. However, if `OuterClass` is renamed the scope is different and +it's considered the same vulnerability. However, if `OuterClass` is renamed, the scope is different and a new vulnerability is created. ## Deduplication examples -- GitLab From dc52bdf19ef01929e56e44c8d676fed55ca19686 Mon Sep 17 00:00:00 2001 From: Brian Williams Date: Mon, 15 Dec 2025 08:30:09 -0600 Subject: [PATCH 9/9] Clarify how scope-offset comparison works --- .../detect/vulnerability_deduplication.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/user/application_security/detect/vulnerability_deduplication.md b/doc/user/application_security/detect/vulnerability_deduplication.md index c97f6121d2138e..cb9b2a698c7066 100644 --- a/doc/user/application_security/detect/vulnerability_deduplication.md +++ b/doc/user/application_security/detect/vulnerability_deduplication.md @@ -110,9 +110,9 @@ The scanner calculates which scope is the best fit by measuring the distance fro The smallest distance wins, so GitLab identifies `function_A` as the scope. GitLab creates a signature like `lib/outer_class.rb|OuterClass[0]|InnerClassA[0]|function_A[0]:2` -to identify the location of the vulnerability. If the vulnerability moves around in the same scope -it's considered the same vulnerability.
However, if `OuterClass` is renamed, the scope is different and -a new vulnerability is created. +to identify the location of the vulnerability. If the function or class that contains the vulnerability is moved +to a different location in its parent scope, the vulnerability is not reported as new. +However, if `OuterClass` is renamed, the scope is different and a new vulnerability is created. ## Deduplication examples -- GitLab
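The move-versus-rename behavior described in this final patch can be illustrated with a small Ruby sketch. It builds signatures in the format shown in the example, `<file>|<scope>[<n>]|...:<offset>`. Several parts of it are assumptions inferred from that one example, not taken from GitLab's code:

- `[0]` is treated as an occurrence index for the scope name.
- The offset is the vulnerable line relative to the innermost scope's first line.
- `ScopeRef` and `scope_offset_signature` are hypothetical names.

```ruby
# Illustrative only: ScopeRef and scope_offset_signature are hypothetical, not GitLab code.
# index: assumed occurrence index of the scope name; start_line: first line of the scope.
ScopeRef = Struct.new(:name, :index, :start_line)

# scope_chain runs from the outermost scope to the innermost (tightest-fit) scope.
def scope_offset_signature(file, scope_chain, vuln_line)
  path = scope_chain.map { |scope| "#{scope.name}[#{scope.index}]" }.join("|")
  offset = vuln_line - scope_chain.last.start_line
  "#{file}|#{path}:#{offset}"
end

original = scope_offset_signature(
  "lib/outer_class.rb",
  [ScopeRef.new("OuterClass", 0, 1), ScopeRef.new("InnerClassA", 0, 2), ScopeRef.new("function_A", 0, 3)],
  5
)

# function_A moved further down inside InnerClassA: same scope chain, same offset.
moved = scope_offset_signature(
  "lib/outer_class.rb",
  [ScopeRef.new("OuterClass", 0, 1), ScopeRef.new("InnerClassA", 0, 2), ScopeRef.new("function_A", 0, 10)],
  12
)

# OuterClass renamed: the scope chain changes, so the signature changes.
renamed = scope_offset_signature(
  "lib/outer_class.rb",
  [ScopeRef.new("RenamedClass", 0, 1), ScopeRef.new("InnerClassA", 0, 2), ScopeRef.new("function_A", 0, 3)],
  5
)

puts original            # lib/outer_class.rb|OuterClass[0]|InnerClassA[0]|function_A[0]:2
puts original == moved   # true
puts original == renamed # false
```

Line numbers outside the innermost scope do not appear in the signature. That is why moving a function within its parent keeps the signature stable, while renaming any enclosing scope changes it.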