Draft: Refactor Builder methods by collecting all variables first
What does this MR do and why?
This MR optimises variable collection performance in CI builds by introducing a new Collection.from_collections method and refactoring variable builders to use batch construction instead of multiple concatenation operations.
Why the new approach is faster:
- Reduced method call overhead: eliminates multiple `concat` calls.
- More efficient memory allocation: a single array allocation instead of multiple appends.
- Better cache locality: a single pass through the data instead of multiple iterations.
Solution:
- Added a `Collection.from_collections(*collections)` class method that gathers all variables first and instantiates the collection (and its internal hash index) once, after all variables have been collected.
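To make the idea concrete, here is a minimal, self-contained sketch of the pattern. A hypothetical `MiniCollection` stands in for `Gitlab::Ci::Variables::Collection`, which has far more behaviour; the names and internals here are illustrative only:

```ruby
# Simplified stand-in for a variables collection. Later entries override
# earlier ones by key, mirroring variable precedence.
class MiniCollection
  attr_reader :variables

  def initialize(variables = [])
    @variables = variables.dup
    # Build the key index once; Array#to_h keeps the last value for a
    # duplicate key, so later variables win.
    @index = @variables.to_h { |v| [v[:key], v] }
  end

  # Old pattern: mutate the array and the index once per merged collection.
  def concat(other)
    other.variables.each do |v|
      @variables << v
      @index[v[:key]] = v
    end
    self
  end

  # New pattern: flatten all inputs and build array + index in one pass,
  # instead of growing an empty collection with repeated concat calls.
  def self.from_collections(*collections)
    new(collections.flat_map(&:variables))
  end

  def [](key)
    @index[key]
  end
end
```

With repeated `concat`, the index is rebuilt incrementally once per source collection; `from_collections` allocates the backing array and builds the index in a single pass while producing the same last-wins result.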
Impact:
- Replaces multiple `concat()` calls with a single `from_collections()` call, making variable collection building more efficient.
- Delivers 10-13% performance improvements across different CI variable scenarios.
- Reduces method call overhead and improves memory allocation patterns while maintaining identical functionality and variable precedence.
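Because the claim above is "identical functionality and variable precedence", a quick plain-Ruby sanity check (hypothetical data; plain hashes stand in for variable collections) shows why batch construction preserves last-wins precedence:

```ruby
# Three "levels" of variables; later collections should override earlier ones.
collections = [
  [{ key: "CI_DEBUG", value: "instance" }],
  [{ key: "CI_DEBUG", value: "group" }, { key: "GROUP_VAR", value: "g" }],
  [{ key: "CI_DEBUG", value: "project" }]
]

# Old pattern: merge one collection at a time into an accumulator.
via_concat = {}
collections.each { |c| c.each { |v| via_concat[v[:key]] = v[:value] } }

# New pattern: flatten everything first, then index once. Array#to_h keeps
# the last value seen for a duplicate key, matching the precedence above.
via_batch = collections.flatten.to_h { |v| [v[:key], v[:value]] }

raise "precedence differs" unless via_concat == via_batch
```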
References
While working on #565862, I identified performance bottlenecks in CI variable collection methods due to inefficient hash operations. Based on profiling data (internal link), the following methods showed significant execution time:
- `Gitlab::Ci::Variables::Builder#scoped_variables` -> This MR refactors the `scoped_variables`, `unprotected_scoped_variables`, and `scoped_variables_for_pipeline_seed` methods. `scoped_variables` is the most problematic according to our logs; nevertheless, the other two methods also use the old pattern, so I have refactored them as well.
- `Gitlab::Ci::Variables::Builder::Pipeline#predefined_variables` -> This method, as well as our other variable methods that use the same pattern, should be refactored in follow-up MRs. I will create those once the solution proposed here is accepted and this MR is merged.
Most methods that process variables use the `concat` pattern, so I believe we could apply the `from_collections` pattern in other classes and methods as well.
In `Gitlab::Ci::Variables::Builder`, multiple methods follow a similar pattern and could benefit from this optimisation. I've focused on selected methods for now; I would like to get some feedback on this approach before refactoring additional variable assembly methods.
Screenshots or screen recordings
| Before | After |
|---|---|
How to set up and validate locally
- Create a project with multiple variable levels (instance, group, project, pipeline).
- Add 20+ variables at each level.
- Run a CI job and verify variables are resolved correctly.
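The validation steps above can be simulated in plain Ruby to see the expected resolution behaviour (illustrative only; real resolution happens in `Gitlab::Ci::Variables::Builder`, and the level names below are just the ones from the steps):

```ruby
# Four variable levels with 25 variables each, sharing the same keys.
# Resolution is last-level-wins: the most specific level overrides the rest.
levels = %w[instance group project pipeline]

collections = levels.map do |level|
  25.times.map { |i| { key: "VAR_#{i}", value: level } }
end

# Flatten all levels and index once; duplicate keys keep the last value.
resolved = collections.flatten.to_h { |v| [v[:key], v[:value]] }

# Every key should resolve to the most specific (pipeline) level.
puts resolved.values.uniq
```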
Performance Benchmark Analysis
To run the benchmark script below:
- Add a `scripts/ci_variables_performance_benchmark.rb` file to your local checkout and paste in the code below.
- Run `RAILS_ENV=development rails runner scripts/ci_variables_performance_benchmark.rb`
```ruby
# CI Variables Collection Performance Benchmark
require 'benchmark'

ActiveRecord::Base.logger = nil

# Build `source_count` collections of `vars_per_source` variables each,
# mimicking variables coming from different sources.
def create_variable_collections(source_count, vars_per_source)
  prefixes = %w[CI_ PROJECT_ GITLAB_ RUNNER_ ENV_]

  source_count.times.map do |source_index|
    variables = vars_per_source.times.map do |var_index|
      prefix = prefixes[source_index % prefixes.size]
      { key: "#{prefix}VAR_#{var_index + 1}", value: "value_#{source_index + 1}_#{var_index + 1}" }
    end

    Gitlab::Ci::Variables::Collection.new(variables)
  end
end

# Old pattern: start from an empty collection and concat each one in turn.
def run_original_approach(collections)
  Gitlab::Ci::Variables::Collection.new.tap do |variables|
    collections.each { |collection| variables.concat(collection) }
  end
end

# New pattern: collect everything first and build the collection once.
def run_new_approach(collections)
  Gitlab::Ci::Variables::Collection.from_collections(*collections)
end

def run_benchmark(iterations, &block)
  # Warmup
  3.times(&block)

  # Measure 5 runs
  times = []
  5.times do
    # Disable GC during timing so collection pauses don't skew the result,
    # then collect between runs.
    GC.disable
    time = Benchmark.realtime { iterations.times(&block) }
    GC.enable
    GC.start

    times << time
    print "."
  end

  puts " #{(times.sum / times.size * 1000).round(2)}ms avg"
  times.sum / times.size
end

def show_results(original_time, new_time)
  percentage_change = ((new_time - original_time) / original_time * 100).round(1)

  if percentage_change < 0
    puts "--> #{percentage_change.abs}% faster (#{(original_time / new_time).round(2)}x speedup)"
  else
    puts "--> #{percentage_change}% slower"
  end
end

scenarios = [
  { name: "Small Project", sources: 8, vars_per_source: 10 },
  { name: "Medium Project", sources: 9, vars_per_source: 15 },
  { name: "Large Project", sources: 10, vars_per_source: 25 },
  { name: "Enterprise Project", sources: 10, vars_per_source: 40 }
]

# Fixed iteration count for all scenarios
ITERATIONS = 50

scenarios.each do |scenario|
  puts "\n#{scenario[:name]} (#{scenario[:sources]} × #{scenario[:vars_per_source]} = #{scenario[:sources] * scenario[:vars_per_source]} vars)"

  collections = create_variable_collections(scenario[:sources], scenario[:vars_per_source])

  print "Original approach: "
  original_time = run_benchmark(ITERATIONS) do
    run_original_approach(collections)
  end

  print "New approach: "
  new_time = run_benchmark(ITERATIONS) do
    run_new_approach(collections)
  end

  show_results(original_time, new_time)
end
```
Results:
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

