
Batch SyncPolicyWorker with delays to prevent worker saturation

What does this MR do and why?

When a security policy is created, updated, or deleted in a large namespace with many projects, the worker immediately enqueues sync jobs for all affected projects. This causes a spike in worker queue load, potentially saturating the worker pool and degrading system performance.

This MR prevents that saturation by splitting the affected projects into batches of 100 and giving each batch a staggered delay: batch N is scheduled N seconds after the sync is triggered. SyncPolicyWorker jobs are then spread over time instead of reaching the queue all at once.

Delay Calculation Table:

| Project count | Batches (100 projects each) | Delay of last batch |
|---|---|---|
| 1 | 1 | 1 second |
| 10 | 1 | 1 second |
| 100 | 1 | 1 second |
| 1,000 | 10 | 10 seconds |
| 10,000 | 100 | 100 seconds (1 min 40 sec) |
| 50,000 | 500 | 500 seconds (8 min 20 sec) |
| 100,000 | 1,000 | 1,000 seconds (16 min 40 sec) |
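The table follows from two constants: a batch size of 100 projects, implied by the rows and by the breakdown below, and a delay of one second per batch. The minimal Python sketch below reproduces the table under those assumptions. The names BATCH_SIZE, DELAY_PER_BATCH_SECONDS and batch_schedule are hypothetical and do not come from the MR.

```python
# Assumed constants, inferred from the delay table (hypothetical names).
BATCH_SIZE = 100
DELAY_PER_BATCH_SECONDS = 1


def batch_schedule(project_count: int) -> tuple[int, int]:
    """Return (batch_count, delay_of_last_batch_seconds) for a namespace."""
    if project_count <= 0:
        return 0, 0
    # Integer ceiling division: a partial last batch still counts as a batch.
    batch_count = (project_count + BATCH_SIZE - 1) // BATCH_SIZE
    # Batch i (1-based) is delayed by i * DELAY_PER_BATCH_SECONDS,
    # so the last batch carries the largest delay.
    return batch_count, batch_count * DELAY_PER_BATCH_SECONDS


for count in (1, 10, 100, 1_000, 10_000, 50_000, 100_000):
    batches, delay = batch_schedule(count)
    print(f"{count:>7} projects -> {batches:>5} batches, last batch at {delay} s")
```

The maximum delay grows linearly with project count. A namespace with 100 or fewer projects still gets a 1 second delay.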

Example Breakdown for 10,000 projects:

  • Batch 1 (projects 1-100): Delayed by 1 second
  • Batch 2 (projects 101-200): Delayed by 2 seconds
  • Batch 3 (projects 201-300): Delayed by 3 seconds
  • ...
  • Batch 100 (projects 9,901-10,000): Delayed by 100 seconds
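The breakdown amounts to slicing the project list into consecutive batches and scheduling each batch one second after the previous one. The sketch below models this pattern with a hypothetical enqueue callback. In the real worker, that callback would be a delayed-job scheduling call (in Sidekiq, something like perform_in). It illustrates the pattern and is not the MR's code.

```python
from typing import Callable, Sequence

BATCH_SIZE = 100  # assumed; matches "projects 1-100", "101-200", ...


def enqueue_in_batches(
    project_ids: Sequence[int],
    enqueue: Callable[[int, int], None],
    batch_size: int = BATCH_SIZE,
) -> None:
    """Call enqueue(delay_seconds, project_id) once for every project.

    Projects are sliced into consecutive batches; batch N (1-based) is
    delayed by N seconds, spreading the load over one second per batch.
    """
    starts = range(0, len(project_ids), batch_size)
    for batch_index, start in enumerate(starts, start=1):
        delay_seconds = batch_index
        for project_id in project_ids[start:start + batch_size]:
            enqueue(delay_seconds, project_id)


# Usage: record scheduled jobs in a list instead of talking to a real queue.
scheduled: list[tuple[int, int]] = []
enqueue_in_batches(list(range(1, 10_001)), lambda delay, pid: scheduled.append((delay, pid)))
print(scheduled[0], scheduled[99], scheduled[100], scheduled[-1])
# → (1, 1) (1, 100) (2, 101) (100, 10000)
```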

This staggered approach spreads the worker load over time and prevents queue saturation, while still syncing every project. Because batching delays policy sync in the largest namespaces (by more than 16 minutes at 100,000 projects), the change is gated behind the security_policies_batched_sync_delay feature flag so the extra delay is not introduced unconditionally.
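With the flag disabled, the worker presumably keeps the behavior described at the top of this MR and enqueues every project immediately. The sketch below shows that gating under stated assumptions. flag_enabled is a hypothetical boolean that stands in for the check of security_policies_batched_sync_delay. sync_delay_seconds is a hypothetical helper.

```python
BATCH_SIZE = 100  # assumed batch size, inferred from the delay table


def sync_delay_seconds(position: int, flag_enabled: bool, batch_size: int = BATCH_SIZE) -> int:
    """Delay for the project at 0-based `position` in the namespace's project list.

    When the (hypothetical) flag check is off, every project is enqueued
    immediately, i.e. the pre-change behavior.
    """
    if not flag_enabled:
        return 0
    # 0-based position -> 1-based batch number, which is also the delay in seconds.
    return position // batch_size + 1


print([sync_delay_seconds(p, True) for p in (0, 99, 100, 9_999)])   # → [1, 1, 2, 100]
print([sync_delay_seconds(p, False) for p in (0, 99, 100, 9_999)])  # → [0, 0, 0, 0]
```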

References

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #580036

Edited by Sashi Kumar Kumaresan
