Add text replacement functionality to bulk import API
Summary
Enhance the bulk import API to support text pattern replacement during migration, allowing users to automatically update references, URLs, and other text content when migrating between GitLab instances.
Problem to Solve
When migrating projects/groups from on-premises GitLab to GitLab.com (or between instances), many text references become outdated:
- Issue/MR descriptions contain old instance URLs
- Comments reference old project paths
- Documentation links point to the previous instance
- Team and user @mentions no longer match the names used on the new instance
- Manual post-migration cleanup is time-consuming and error-prone, and editing imported content after the fact changes its timestamps
Proposal
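Add a text_replacements parameter to the bulk import API that accepts an array of find-and-replace patterns. Each pattern has a source_text and a target_text; setting the optional regex flag treats source_text as a regular expression, with $0 in target_text standing for the whole match: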
```json
{
  "configuration": {
    "url": "https://source-gitlab.example.com",
    "access_token": "glpat-xxxxxxxxxxxxxxxxxxxx"
  },
  "entities": [
    {
      "source_type": "group_entity",
      "source_full_path": "my-group",
      "destination_name": "my-group",
      "destination_namespace": "my-org"
    }
  ],
  "text_replacements": [
    {
      "source_text": "https://on-prem.gitlab.example/group1",
      "target_text": "https://gitlab.com/organization1/group1"
    },
    {
      "source_text": "https://on-prem.gitlab.example/group2",
      "target_text": "https://gitlab.com/organization2/group1"
    },
    {
      "source_text": "@internal-team",
      "target_text": "@company-org/internal-team"
    },
    {
      "source_text": "JIRA-\\d+",
      "target_text": "https://company.atlassian.net/browse/$0",
      "regex": true
    }
  ]
}
```
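To make the proposed semantics concrete, here is a minimal Python sketch of how the rules above could be applied to one piece of text. GitLab itself is a Ruby on Rails application, so this is illustrative only. It assumes rules run in the order listed, that literal rules are plain substring replacements, and that $0 in a regex target_text means the whole match; the function name apply_replacements is hypothetical.

```python
import re
from typing import Iterable, Mapping, Tuple

# Example rules copied from the proposal above.
RULES = [
    {"source_text": "https://on-prem.gitlab.example/group1",
     "target_text": "https://gitlab.com/organization1/group1"},
    {"source_text": "https://on-prem.gitlab.example/group2",
     "target_text": "https://gitlab.com/organization2/group1"},
    {"source_text": "@internal-team",
     "target_text": "@company-org/internal-team"},
    {"source_text": r"JIRA-\d+",
     "target_text": "https://company.atlassian.net/browse/$0",
     "regex": True},
]


def apply_replacements(text: str, rules: Iterable[Mapping]) -> Tuple[str, int]:
    """Hypothetical helper: apply rules in order, return (new_text, count)."""
    total = 0
    for rule in rules:
        source, target = rule["source_text"], rule["target_text"]
        if rule.get("regex", False):
            # Escape literal backslashes, then map the assumed "$0"
            # (whole match) placeholder to Python's \g<0> syntax.
            template = target.replace("\\", "\\\\").replace("$0", r"\g<0>")
            text, n = re.subn(source, template, text)
        else:
            # Literal rule: plain, non-overlapping substring replacement.
            n = text.count(source)
            text = text.replace(source, target)
        total += n
    return text, total


new_text, count = apply_replacements(
    "See https://on-prem.gitlab.example/group1/app/-/issues/3 and JIRA-42, "
    "cc @internal-team",
    RULES,
)
print(count, new_text)
```

Two design points fall out of this sketch. Because literal rules match substrings, the group1 rule would also rewrite links to a group named group10, and re-running the JIRA rule on text that already contains https://company.atlassian.net/browse/JIRA-42 would nest the URL. Regex rules with anchors or lookarounds are one way to guard against both.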
Scope
Apply text replacements to:
- Issue descriptions and comments
- Merge request descriptions and comments
- Epic descriptions and comments
- Milestone descriptions
- Wiki content (open question)
- Project/group descriptions
- Release descriptions
Implementation Considerations
- Add validation for the text_replacements parameter
- Implement replacement logic in the relevant pipeline classes
- Support regex patterns for advanced matching
- Provide a dry-run option to preview changes
- Add logging for replacement operations
- Measure the performance impact on large migrations, since a naive implementation scans every text field once per pattern
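A possible shape for the validation step, again as a Python sketch with a hypothetical function name. It assumes the three keys used in the examples (source_text, target_text, and an optional boolean regex), treats a missing parameter as valid for backward compatibility, and checks that regex patterns compile. Python's re dialect differs from Ruby's regex engine, so a real implementation would need to validate against whichever engine actually performs the replacement.

```python
import re
from typing import Any, List, Optional

# Keys taken from the proposal's examples.
ALLOWED_KEYS = {"source_text", "target_text", "regex"}


def validate_text_replacements(value: Optional[Any]) -> List[str]:
    """Hypothetical validator: returns error messages, empty if valid."""
    if value is None:
        return []  # the parameter is optional
    if not isinstance(value, list):
        return ["text_replacements must be an array"]
    errors: List[str] = []
    for i, rule in enumerate(value):
        prefix = f"text_replacements[{i}]"
        if not isinstance(rule, dict):
            errors.append(f"{prefix} must be an object")
            continue
        unknown = set(rule) - ALLOWED_KEYS
        if unknown:
            errors.append(f"{prefix} has unknown keys: {sorted(unknown)}")
        source = rule.get("source_text")
        if not isinstance(source, str) or source == "":
            # An empty pattern would match between every character.
            errors.append(f"{prefix}.source_text must be a non-empty string")
        if not isinstance(rule.get("target_text"), str):
            errors.append(f"{prefix}.target_text must be a string")
        regex = rule.get("regex", False)
        if not isinstance(regex, bool):
            errors.append(f"{prefix}.regex must be a boolean")
        elif regex and isinstance(source, str) and source:
            try:
                re.compile(source)
            except re.error as exc:
                errors.append(f"{prefix}.source_text is not a valid regex: {exc}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets the API report every faulty rule in a single response.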
Benefits
- Reduces manual post-migration work
- Ensures consistent reference updates
- Improves migration experience for enterprise users
- Maintains content integrity across instances
Current API Context
The bulk import API currently accepts the following structure according to the official documentation:
```shell
curl --request POST --header "PRIVATE-TOKEN: <your_access_token>" \
  --header "Content-Type: application/json" \
  --data '{
    "configuration": {
      "url": "https://source.gitlab.example.com",
      "access_token": "glpat-xxxxxxxxxxxxxxxxxxxx"
    },
    "entities": [
      {
        "source_type": "group_entity",
        "source_full_path": "my-group",
        "destination_name": "my-group",
        "destination_namespace": "my-org"
      }
    ]
  }' \
  "https://gitlab.example.com/api/v4/bulk_imports"
```
Technical Implementation Details
The text replacement feature should integrate with the existing BulkImports pipeline architecture:
Affected Pipeline Classes:
- BulkImports::Groups::Pipelines::* - Group-level content
- BulkImports::Projects::Pipelines::* - Project-level content, specifically IssuesPipeline, MergeRequestsPipeline, MilestonesPipeline, etc.
Processing Location: Text replacements should occur during the transformation phase of each pipeline, before data is written to the target instance.
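As a sketch of where this could hook in, the hypothetical transformer below runs on each extracted record before it is loaded, rewrites its description and note fields, and tallies replacements per content type. It assumes comments arrive nested under their parent record in a notes array, and it takes the replacement function as a parameter so any rule engine can be plugged in. None of these names are existing GitLab classes, and the real pipelines are Ruby.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A replacer maps text -> (new_text, number_of_replacements).
Replacer = Callable[[str], Tuple[str, int]]


@dataclass
class TextReplacementTransformer:
    """Hypothetical transform step: runs after extract, before load."""
    replace: Replacer
    text_fields: Tuple[str, ...] = ("description", "note")
    counts: Dict[str, int] = field(default_factory=dict)

    def _replace_fields(self, content_type: str, record: dict) -> dict:
        updated = dict(record)  # leave the extracted record untouched
        for name in self.text_fields:
            value = updated.get(name)
            if isinstance(value, str):
                updated[name], n = self.replace(value)
                if n:
                    self.counts[content_type] = self.counts.get(content_type, 0) + n
        return updated

    def transform(self, content_type: str, record: dict) -> dict:
        updated = self._replace_fields(content_type, record)
        # Assumption: comments are nested under their parent as "notes".
        notes = record.get("notes")
        if isinstance(notes, list):
            updated["notes"] = [self._replace_fields("comments", n) for n in notes]
        return updated


def literal(old: str, new: str) -> Replacer:
    """Tiny single-rule replacer used for the example below."""
    return lambda text: (text.replace(old, new), text.count(old))


transformer = TextReplacementTransformer(literal("on-prem.gitlab.example", "gitlab.com"))
issue = {
    "iid": 1,
    "description": "See https://on-prem.gitlab.example/x",
    "notes": [{"note": "on-prem.gitlab.example and on-prem.gitlab.example"},
              {"note": "no links here"}],
}
migrated = transformer.transform("issues", issue)
```

Copying each record instead of mutating it keeps the extracted data available for a dry-run comparison or for logging what changed.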
Use Cases
Enterprise Migration Scenario:
```json
{
  "text_replacements": [
    {
      "source_text": "https://gitlab.internal.company.com",
      "target_text": "https://gitlab.com/company-org"
    },
    {
      "source_text": "@internal-team",
      "target_text": "@company-org/internal-team"
    },
    {
      "source_text": "JIRA-\\d+",
      "target_text": "https://company.atlassian.net/browse/$0",
      "regex": true
    }
  ]
}
```
API Response Enhancement
The bulk import status response should include replacement statistics:
```json
{
  "id": 1,
  "status": "finished",
  "text_replacements_applied": {
    "total_replacements": 247,
    "entities_affected": 89,
    "breakdown": {
      "issues": 45,
      "merge_requests": 23,
      "comments": 179
    }
  }
}
```
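These statistics could be derived from per-record counts collected during transformation. The sketch below assumes breakdown counts replacements per content type, so it sums to total_replacements (as 45 + 23 + 179 = 247 does in the example), and reads entities_affected as the number of distinct records changed. That second reading is an assumption, since "entities" could also mean the imported groups and projects. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# One event per transformed record: (content_type, record_key, replacements).
Event = Tuple[str, str, int]


def summarize_replacements(events: Iterable[Event]) -> dict:
    """Hypothetical: build the proposed text_replacements_applied payload."""
    breakdown: Counter = Counter()
    affected = set()
    for content_type, record_key, n in events:
        if n > 0:  # records with no replacements are not counted
            breakdown[content_type] += n
            affected.add(record_key)
    return {
        "total_replacements": sum(breakdown.values()),
        "entities_affected": len(affected),
        "breakdown": dict(breakdown),
    }


summary = summarize_replacements([
    ("issues", "issue:1", 2),
    ("issues", "issue:2", 0),
    ("merge_requests", "merge_request:3", 4),
    ("comments", "note:7", 1),
    ("comments", "note:8", 3),
])
```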
Backward Compatibility
The text_replacements parameter should be optional to maintain compatibility with existing integrations.
Related Documentation
This enhancement would significantly improve the migration experience for organizations moving between GitLab instances, especially in enterprise environments where maintaining accurate cross-references is critical.