
Run the RSpec test suite with large sequence ranges (bigint)

Problem

GDK seeding failed recently because a sequence ID (bigint) was being stored in an integer column; see gitlab-development-kit#2931 (comment 2718526062).

The legacy cell has the same problem, but only on large tables whose IDs exceed roughly 2 billion (2^31 - 1 = 2,147,483,647). In Cells it will show up immediately, because sequence ranges for a new cell will start from 1 trillion.

There is no easy way to find all such cases, but running our entire test suite as a cell would surface them.

Essentially, we want RSpec to run in another variant (similar to pg16 single-db, pg16 single-db-ci-connection, etc.) whose sequence ranges are altered, so that objects are created with IDs from those ranges.

Root Cause Analysis

The issue stems from integer column overflow when sequence values exceed the 32-bit integer limit (2^31 - 1 = 2,147,483,647). This manifests in two scenarios:

  1. Legacy environments: Large tables with IDs approaching 2 billion records
  2. Cells environments: New cells start with sequence ranges far above that limit (the cell-2 range simulated below begins at 100_000_000_001, about 100 billion), immediately exceeding integer column limits

The problem occurs when:

  • Database sequences generate bigint values (64-bit)
  • Application code assumes integer columns (32-bit)
  • Out-of-range values are rejected on insertion (PostgreSQL raises an "integer out of range" error), causing seeding failures
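
To make the arithmetic concrete, here is a minimal Ruby sketch comparing the two column widths with the sequence values discussed in this issue. The helper name `fits_in_integer?` and the constant names are made up for illustration; only the numeric limits and the 100_000_000_001 start value come from this issue.

```ruby
# Largest value a signed 32-bit PostgreSQL `integer` column can hold.
INT4_MAX = 2**31 - 1 # 2_147_483_647
# Largest value a signed 64-bit PostgreSQL `bigint` column can hold.
INT8_MAX = 2**63 - 1

# Hypothetical helper: true when `value` fits in an `integer` column.
def fits_in_integer?(value)
  value.between?(-(2**31), INT4_MAX)
end

# First ID of the simulated cell-2 range used in the proposed CI change.
CELL_SEQUENCE_START = 100_000_000_001

puts fits_in_integer?(2_000_000_000)       # legacy-sized ID: true
puts fits_in_integer?(CELL_SEQUENCE_START) # cell-sized ID: false
puts CELL_SEQUENCE_START <= INT8_MAX       # still fits in bigint: true
```

The very first ID handed out in a cell is already about 46 times larger than the integer limit, which is why the failure shows up on an empty database rather than only on large tables.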

Proposed Solutions

Solution 1: CI Sequence Range Simulation (Draft MR !204731)

Approach: Modify CI pipelines to simulate cell-like sequence ranges during database setup.

Implementation:

  • Extract reusable gitlab:db:alter_sequences_range rake task from cell-specific version
  • Configure db:migrate:reset to simulate cell-2 sequence ranges (100B-200B)
  • Apply to all CI pipelines to surface integer overflow issues early

Changes:

# In scripts/utils.sh setup_db function
run_timed_command_with_metric "bundle exec rake db:drop db:create db:schema:load db:migrate gitlab:db:lock_writes 'gitlab:db:alter_sequences_range[100_000_000_001,200_000_000_000]'" "setup_db"
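
For readers unfamiliar with what altering a sequence range involves, the sketch below builds the kind of PostgreSQL statement such a task could issue for one sequence. This is an assumption for illustration only: the helper `alter_sequence_sql` and the sequence name `issues_id_seq` are hypothetical, and the actual gitlab:db:alter_sequences_range implementation may differ.

```ruby
# Hypothetical helper: builds SQL that moves one sequence into the
# range [min, max] and restarts it at min. Not the real rake task.
def alter_sequence_sql(sequence_name, min, max)
  raise ArgumentError, "min must be less than max" unless min < max

  # Quote the identifier, doubling any embedded double quotes.
  quoted = %("#{sequence_name.gsub('"', '""')}")
  "ALTER SEQUENCE #{quoted} MINVALUE #{min} MAXVALUE #{max} RESTART WITH #{min}"
end

# Same bounds as the rake task arguments in the command above.
puts alter_sequence_sql("issues_id_seq", 100_000_000_001, 200_000_000_000)
```

Once a sequence is restarted this way, the next inserted row receives an ID of 100_000_000_001, so any integer column that stores that ID fails on the first write.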

Pros:

  • Catches bigint overflow issues in CI before production
  • No additional CI jobs required
  • Immediate feedback on MRs
  • Tests realistic cell conditions

Cons:

  • ⚠️ Potential for "master broken" due to configuration change
  • ⚠️ All tests run with non-standard sequence ranges

Solution 2: Dedicated CI Variant (Alternative)

Approach: Create a separate CI job variant (like pg16 single-db-bigint) with large sequence ranges.

Pros:

  • Isolated testing environment
  • Preserves existing test behavior

Cons:

  • Additional CI jobs increase pipeline complexity
  • Delayed feedback (separate job)
  • Higher CI resource usage

Impact Assessment

Immediate Impact

  • Development: Seeding failures in GDK when using cell configurations
  • Testing: Potential test failures when sequence values exceed integer limits
  • CI/CD: Pipeline failures due to database constraint violations

Long-term Impact

  • Production Risk: Failed writes and application errors in high-volume environments
  • Scalability: Blocks adoption of Cells architecture for large GitLab instances
  • Data Integrity: Risk of rejected inserts wherever an ID no longer fits its column

Affected Components

Based on the linked GDK issues, potential problem areas include:

  • Database models with integer ID columns that should be bigint
  • Seed data generation scripts
  • Migration files with incorrect column types
  • Application code making integer size assumptions
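
As a rough way to shortlist candidates from the first bullet, column metadata can be filtered for ID-like columns that are still 32 bits wide. The sketch below is an assumption, not an existing GitLab tool: the `Column` struct mirrors fields of PostgreSQL's information_schema.columns, the naming heuristic (`id` or `*_id`) is illustrative, and the table names are invented.

```ruby
# Hypothetical column metadata, shaped like rows from
# information_schema.columns (table_name, column_name, data_type).
Column = Struct.new(:table_name, :column_name, :data_type)

# Returns columns that look like they hold IDs but are only 32 bits wide.
# The naming heuristic is an assumption and will miss unconventional names.
def suspicious_integer_columns(columns)
  columns.select do |column|
    column.data_type == "integer" && column.column_name.match?(/\A(.+_)?id\z/)
  end
end

# Invented sample data for illustration.
sample_columns = [
  Column.new("issues", "id", "bigint"),
  Column.new("example_records", "issue_id", "integer"),
  Column.new("example_records", "position", "integer")
]

suspicious_integer_columns(sample_columns).each do |column|
  puts "#{column.table_name}.#{column.column_name}" # example_records.issue_id
end
```

A static scan like this only narrows the search. Running the test suite with large sequence ranges remains the way to confirm which columns actually receive sequence-generated values.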

Recommended Next Steps

  1. Immediate: Complete and merge Draft MR !204731 to enable CI detection
  2. Short-term: Run comprehensive test suite to identify all affected models/columns
  3. Medium-term: Create migration plan to convert integer columns to bigint where needed
  4. Long-term: Establish guidelines to prevent future integer/bigint mismatches

Documentation Needs

As suggested by @praba.m7n, we should document:

  • New CI behavior with large sequence ranges
  • Guidelines for choosing integer vs bigint column types
  • Migration patterns for existing integer columns
  • Testing procedures for cell-compatible code

This issue affects not just Cells but any high-volume GitLab instance approaching the 2 billion record limit, making it a critical infrastructure concern.

Edited by 🤖 GitLab Bot 🤖