Add text replacement functionality to bulk import API

Summary

Enhance the bulk import API to support text pattern replacement during migration, allowing users to automatically update references, URLs, and other text content when migrating between GitLab instances.

Problem to Solve

When migrating projects/groups from on-premises GitLab to GitLab.com (or between instances), many text references become outdated:

  • Issue/MR descriptions contain old instance URLs
  • Comments reference old project paths
  • Documentation links point to the previous instance
  • Team and person mentions are inaccurate
  • Manual post-migration cleanup is time-consuming and error-prone, and editing content afterwards alters its timestamps

Proposal

Add a text_replacements parameter to the bulk import API that accepts an array of find-and-replace patterns:

{
  "configuration": {
    "url": "https://source-gitlab.example.com",
    "access_token": "glpat-xxxxxxxxxxxxxxxxxxxx"
  },
  "entities": [
    {
      "source_type": "group_entity",
      "source_full_path": "my-group",
      "destination_name": "my-group",
      "destination_namespace": "my-org"
    }
  ],
  "text_replacements": [
    {
      "source_text": "https://on-prem.gitlab.example/group1",
      "target_text": "https://gitlab.com/organization1/group1"
    },
    {
      "source_text": "https://on-prem.gitlab.example/group2",
      "target_text": "https://gitlab.com/organization2/group1"
    },
    {
      "source_text": "@internal-team",
      "target_text": "@company-org/internal-team"
    },
    {
      "source_text": "JIRA-\\d+",
      "target_text": "https://company.atlassian.net/browse/$0",
      "regex": true
    }
  ]
}
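The intended semantics of the rules can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `apply_text_replacements` is a hypothetical helper, rules are assumed to be applied in the order given, and `$0` is assumed to mean "the whole match" for regex rules. None of this exists in GitLab today.

```ruby
# Hypothetical helper (not part of GitLab): applies an ordered list of
# replacement rules to a string. Rule keys mirror the proposed JSON.
def apply_text_replacements(text, rules)
  rules.reduce(text) do |result, rule|
    if rule["regex"]
      pattern = Regexp.new(rule["source_text"])
      # Block form keeps Ruby from interpreting backslashes in the target;
      # "$0" is assumed to stand for the whole match.
      result.gsub(pattern) { |match| rule["target_text"].gsub("$0") { match } }
    else
      # A String pattern is matched literally, so URLs need no escaping.
      result.gsub(rule["source_text"]) { rule["target_text"] }
    end
  end
end

rules = [
  { "source_text" => "https://on-prem.gitlab.example/group1",
    "target_text" => "https://gitlab.com/organization1/group1" },
  { "source_text" => "@internal-team",
    "target_text" => "@company-org/internal-team" },
  { "source_text" => 'JIRA-\d+',
    "target_text" => "https://company.atlassian.net/browse/$0",
    "regex" => true }
]

text = "See JIRA-123 at https://on-prem.gitlab.example/group1/app, cc @internal-team"
puts apply_text_replacements(text, rules)
```

Because rules run in sequence, the output of an earlier rule can be matched again by a later one, so the ordering of the array would matter to users.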

Scope

Apply text replacements to:

  • Issue descriptions and comments
  • Merge request descriptions and comments
  • Epic descriptions and comments
  • Milestone descriptions
  • Wiki content (open question)
  • Project/group descriptions
  • Release descriptions

Implementation Considerations

  • Add validation for the text_replacements parameter
  • Implement replacement logic in the relevant pipeline classes
  • Support regex patterns for advanced matching
  • Provide a dry-run option to preview changes
  • Add logging for replacement operations
  • Consider the performance impact on large migrations
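As a rough illustration of the validation step, the sketch below checks the shape of each rule and that regex rules compile. The function name, the error format, and the `MAX_RULES` limit are assumptions made for this example. A real implementation would presumably also need a regex engine with predictable running time for user-supplied patterns.

```ruby
# Hypothetical validator for the proposed text_replacements parameter.
# Returns an array of error messages; an empty array means the input is valid.
MAX_RULES = 50 # assumed limit, not taken from the proposal

def validate_text_replacements(rules)
  return [] if rules.nil? # the parameter is optional
  return ["text_replacements must be an array"] unless rules.is_a?(Array)

  errors = []
  errors << "too many rules (max #{MAX_RULES})" if rules.size > MAX_RULES

  rules.each_with_index do |rule, i|
    unless rule.is_a?(Hash)
      errors << "rule #{i}: must be an object"
      next
    end

    source = rule["source_text"]
    target = rule["target_text"]
    unless source.is_a?(String) && !source.empty?
      errors << "rule #{i}: source_text must be a non-empty string"
    end
    errors << "rule #{i}: target_text must be a string" unless target.is_a?(String)
    next unless rule["regex"] && source.is_a?(String)

    # Compile the pattern once up front so a bad regex fails the request
    # instead of failing midway through a migration.
    begin
      Regexp.new(source)
    rescue RegexpError => e
      errors << "rule #{i}: invalid regex (#{e.message})"
    end
  end

  errors
end
```

Rejecting the whole request when any rule is invalid keeps the behaviour predictable: either every rule is applied or the import never starts.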

Benefits

  • Reduces manual post-migration work
  • Ensures consistent reference updates
  • Improves migration experience for enterprise users
  • Maintains content integrity across instances

Current API Context

The bulk import API currently accepts the following structure according to the official documentation:

curl --request POST --header "PRIVATE-TOKEN: <your_access_token>" \
     --header "Content-Type: application/json" \
     --data '{
       "configuration": {
         "url": "https://source.gitlab.example.com",
         "access_token": "glpat-xxxxxxxxxxxxxxxxxxxx"
       },
       "entities": [
         {
           "source_type": "group_entity",
           "source_full_path": "my-group",
           "destination_name": "my-group",
           "destination_namespace": "my-org"
         }
       ]
     }' \
     "https://gitlab.example.com/api/v4/bulk_imports"

Technical Implementation Details

The text replacement feature should integrate with the existing BulkImports pipeline architecture:

Affected Pipeline Classes:

  • BulkImports::Groups::Pipelines::* - Group-level content
  • BulkImports::Projects::Pipelines::* - Project-level content
  • Specifically: IssuesPipeline, MergeRequestsPipeline, MilestonesPipeline, etc.

Processing Location: Text replacements should occur during the transformation phase of each pipeline, before data is written to the target instance.
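A transformation-phase hook might look roughly like the sketch below. It is a standalone class, not GitLab code: the `transform(context, data)` signature is assumed to resemble the existing BulkImports transformers, and the `TEXT_FIELDS` attribute names are guesses at where user-authored text lives in the exported data.

```ruby
# Hypothetical transformer: rewrites user-authored text fields in an entity
# hash before it is loaded into the destination instance.
class TextReplacementTransformer
  # Assumed attribute names that carry free-form text.
  TEXT_FIELDS = %w[description note].freeze

  def initialize(rules)
    @rules = rules
  end

  # Walks nested hashes and arrays; only string values under TEXT_FIELDS
  # are rewritten, everything else is returned unchanged.
  def transform(context, data)
    case data
    when Hash
      data.to_h do |key, value|
        text_field = TEXT_FIELDS.include?(key.to_s) && value.is_a?(String)
        [key, text_field ? replace_text(value) : transform(context, value)]
      end
    when Array
      data.map { |item| transform(context, item) }
    else
      data
    end
  end

  private

  def replace_text(text)
    @rules.reduce(text) do |result, rule|
      if rule["regex"]
        # "$0" is assumed to stand for the whole match.
        result.gsub(Regexp.new(rule["source_text"])) do |match|
          rule["target_text"].gsub("$0") { match }
        end
      else
        result.gsub(rule["source_text"]) { rule["target_text"] }
      end
    end
  end
end
```

Restricting the rewrite to an allow-list of fields keeps titles, paths, and identifiers untouched, which matches the scope listed earlier in this issue.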

Use Cases

Enterprise Migration Scenario:

{
  "text_replacements": [
    {
      "source_text": "https://gitlab.internal.company.com",
      "target_text": "https://gitlab.com/company-org"
    },
    {
      "source_text": "@internal-team",
      "target_text": "@company-org/internal-team"
    },
    {
      "source_text": "JIRA-\\d+",
      "target_text": "https://company.atlassian.net/browse/$0",
      "regex": true
    }
  ]
}

API Response Enhancement

The bulk import status response should include replacement statistics:

{
  "id": 1,
  "status": "finished",
  "text_replacements_applied": {
    "total_replacements": 247,
    "entities_affected": 89,
    "breakdown": {
      "issues": 45,
      "merge_requests": 23,
      "comments": 179
    }
  }
}
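One way the statistics above could be gathered is to count matches while replacing and feed the counts into a small collector. The sketch below is illustrative only: `replace_and_count` and `ReplacementStats` are hypothetical names, and it assumes `breakdown` counts replacements per content type while `entities_affected` counts records with at least one replacement.

```ruby
# Hypothetical helper: performs a literal replacement and reports how many
# occurrences were replaced.
def replace_and_count(text, source, target)
  count = 0
  replaced = text.gsub(source) do
    count += 1
    target
  end
  [replaced, count]
end

# Hypothetical collector producing the shape of the proposed
# text_replacements_applied object.
class ReplacementStats
  def initialize
    @breakdown = Hash.new(0)
    @entities_affected = 0
  end

  # kind: content type such as "issues"; count: replacements in one record.
  def record(kind, count)
    return if count.zero?

    @breakdown[kind] += count
    @entities_affected += 1
  end

  def to_h
    {
      "total_replacements" => @breakdown.values.sum,
      "entities_affected" => @entities_affected,
      "breakdown" => @breakdown.dup
    }
  end
end
```

The same counting path could back the dry-run option: run the replacements, report the statistics, and discard the rewritten text.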

Backward Compatibility

The text_replacements parameter should be optional to maintain compatibility with existing integrations.

Related Documentation

This enhancement would significantly improve the migration experience for organizations moving between GitLab instances, especially in enterprise environments where maintaining accurate cross-references is critical.

Edited by Elan Ruusamäe