Unified import process based on "schematics"
Background
Currently we have many different mechanisms for importing projects:
- Migrate from third-party providers
- Import/export for self-hosted <-> GitLab.com
- Project creation from templates
- Project transfer between namespaces
The Demo Test Data Working Group is also working on tooling to seed data based on the "Awesome Co" example group.
The WG's current proposal (!83656 (merged)) involves seeding data through rake tasks that call internal APIs.
This approach is similar to the one we've historically taken, i.e. hardcoding the process into the backend code. You can find plenty of examples where we are effectively maintaining full-blown client libraries for our competitors’ APIs in our core application code:
- https://gitlab.com/gitlab-org/gitlab/-/tree/v14.10.2-ee/lib/bitbucket/client.rb
- https://gitlab.com/gitlab-org/gitlab/-/tree/v14.10.2-ee/lib/gitlab/bitbucket_import
- https://gitlab.com/gitlab-org/gitlab/-/tree/v14.10.2-ee/lib/gitlab/github_import
- https://gitlab.com/gitlab-org/gitlab/-/tree/v14.10.2-ee/lib/gitlab/fogbugz_import
- etc...
Problem
Each of the project import mechanisms described above has caveats and limitations (including our own GitLab <-> GitLab import/export processes and project namespace transfer).
What they all share is a lack of customization. All of our import/export processes are black boxes that give users no opportunity to transform or filter the data as it is processed.
In summary, the current processes:
- Require us to maintain custom clients for each provider even when official libraries exist
- Share very little code between processes/features
- Cannot be modified or executed outside of the application and are thus difficult to contribute to
- Cannot be tweaked or configured by end-users at runtime
Proposed Solution
I think there is an opportunity to consolidate all these use cases into a unified, multi-step process. Note that everything up until the last step could potentially be implemented and executed client-side by the user.
- User provides a "schematic" as a data-only representation of the top-level group or project to be imported
- Hydrate the schematic by resolving nested group/project structures and git remotes, serializing all project settings, downloading all binary/text assets, evaluating any templates, and storing everything in data/ alongside the hydrated schematic.yml
- Future: Apply somewhat arbitrary transformations on the schematic, for example:
  - Resolving/mapping usernames
  - Modifying project/group names
  - Appending/prepending arbitrary text to issue descriptions
  - Transforming/mapping label names
  - Updating timestamps on issues, milestone dates, etc
- Assign IDs to issues/epics/etc and resolve any references between IDs in the imported data
- Execute import process from the hydrated schematic
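To make the transformation step more concrete, here is a minimal sketch of how transformations could compose over a hydrated schematic. It assumes the hydrated data has already been loaded into a plain dict; the `issues`, `author`, and `labels` keys, the function names, and the example label mapping are all hypothetical and not part of any existing GitLab API.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, List

# Assumption: a hydrated schematic is represented as a plain nested dict.
Schematic = Dict[str, Any]
Transform = Callable[[Schematic], Schematic]


def map_usernames(mapping: Dict[str, str]) -> Transform:
    """Build a transform that rewrites issue authors using `mapping`."""
    def apply(schematic: Schematic) -> Schematic:
        result = deepcopy(schematic)  # never mutate the caller's data
        for issue in result.get("issues", []):
            issue["author"] = mapping.get(issue["author"], issue["author"])
        return result
    return apply


def map_labels(mapping: Dict[str, str]) -> Transform:
    """Build a transform that renames labels, leaving unknown labels as-is."""
    def apply(schematic: Schematic) -> Schematic:
        result = deepcopy(schematic)
        for issue in result.get("issues", []):
            issue["labels"] = [mapping.get(name, name) for name in issue.get("labels", [])]
        return result
    return apply


def apply_transforms(schematic: Schematic, transforms: List[Transform]) -> Schematic:
    """Run each transform in order and return the final schematic."""
    for transform in transforms:
        schematic = transform(schematic)
    return schematic


hydrated = {"issues": [{"author": "@octocat", "labels": ["bug"]}]}
transformed = apply_transforms(
    hydrated,
    [map_usernames({"@octocat": "@marshall007"}), map_labels({"bug": "type::bug"})],
)
```

Because each transform is a pure function from schematic to schematic, users could run, reorder, or skip them client-side before the final import step.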
Schematic Layout
Hydrated Schematic
# project
schematic.yml
data/
  blob/
    # git data
    # any binary data/uploads referenced by
    # other data assets (issues, comments, MRs)
    # eventually include things like packages, container images, etc as well
  issues/
    <issue-id>/
      index.md # issue description
      comments/
        <comment-id>.md
  merge_requests/
    <mr-id>/
      index.md # MR description
      # same basic structure as issues
      # patches stored in blob/

# group
schematic.yml
data/
  schematics/
    <name>/
      schematic.yml # nested schematic for child group/project
      data/
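One benefit of a plain directory layout is that ordinary filesystem tooling can inspect it. The following sketch, using only the Python standard library, walks a hydrated schematic on disk. The helper names are hypothetical, and it assumes the directory structure is exactly as outlined above.

```python
from pathlib import Path
from typing import Dict, List


def list_issue_files(root: Path) -> Dict[str, List[str]]:
    """Map each issue ID under data/issues/ to its sorted comment file names."""
    issues_dir = root / "data" / "issues"
    result: Dict[str, List[str]] = {}
    if not issues_dir.is_dir():
        return result
    for issue_dir in sorted(p for p in issues_dir.iterdir() if p.is_dir()):
        comments_dir = issue_dir / "comments"
        if comments_dir.is_dir():
            result[issue_dir.name] = sorted(c.name for c in comments_dir.glob("*.md"))
        else:
            result[issue_dir.name] = []
    return result


def find_schematics(root: Path) -> List[Path]:
    """Return every schematic.yml at or below root, shallowest first."""
    return sorted(root.rglob("schematic.yml"), key=lambda p: (len(p.parts), str(p)))
```

For a group, `find_schematics` would return the top-level `schematic.yml` followed by each nested one under `data/schematics/<name>/`, which gives a natural parent-before-child processing order.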
schematic.yml
Example
kind: Group
spec:
  upstream:
    provider: "<gitlab|github|bitbucket|...>"
    host: "https://..."
    id: "<id>" # provider-specific group identifier
    path: "<group/subgroup>"
    settings:
      # provider-specific group settings
      description: "..."
  destination:
    provider: "gitlab" # always gitlab
    path: "<group/subgroup>"
    id: "<id>" # GitLab group identifier
    settings:
      # provider-specific settings mapped to GitLab settings
      description: "..."

kind: Project
spec:
  upstream:
    provider: "<gitlab|github|bitbucket|...>"
    id: "<id>" # provider-specific project identifier
    path: "<group/project>"
    git:
      repo: "<git remote>"
      ref: "<ref>"
      commit: "<sha>"
    settings: {}
  destination:
    provider: "gitlab" # always gitlab
    id: "<id>" # GitLab project identifier
    path: "<group/subgroup/project>"
    settings: {}
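Since a schematic is data-only, it could be checked before any import work begins. Below is a rough sketch of such a check over an already-parsed schematic (for example, the dict a YAML parser would return). Which fields are mandatory is my assumption here, not something this proposal has settled, and `validate_schematic` is a hypothetical name.

```python
from typing import Any, Dict, List

KNOWN_KINDS = {"Group", "Project"}


def validate_schematic(doc: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    errors: List[str] = []
    if doc.get("kind") not in KNOWN_KINDS:
        errors.append(f"kind must be one of {sorted(KNOWN_KINDS)}")

    spec = doc.get("spec") or {}
    for side in ("upstream", "destination"):
        section = spec.get(side)
        if not isinstance(section, dict):
            errors.append(f"spec.{side} is required")
            continue
        # Assumption: provider and path are the minimum needed on each side.
        for key in ("provider", "path"):
            if not section.get(key):
                errors.append(f"spec.{side}.{key} is required")

    destination = spec.get("destination")
    if isinstance(destination, dict):
        provider = destination.get("provider")
        if provider and provider != "gitlab":
            errors.append("spec.destination.provider must be 'gitlab'")
    return errors
```

Returning a list of problems rather than raising on the first one lets a client-side tool report everything wrong with a hand-edited schematic in a single pass.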
data/issues/<issue-id>/index.md
Example
Notice that all metadata about an issue/epic/etc is stored in frontmatter attributes. This gives us a provider-agnostic means of storing arbitrary metadata in a text format that is easily manipulated both manually and programmatically.
This is essential for schematics to be useful in both the "import/export" and "project template" contexts.
It also gives us a way to preserve any provider-specific metadata that we don't know how to natively process.
---
title: 'Unified import process based on "schematics"'
author: "@marshall007"
assignee:
labels: []
created: "2022-05-05"
due: "2022-05-05"
milestone: "15.0"
---
## Background
Currently we have many different mechanisms for importing projects:
...
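To illustrate how easily this format can be manipulated programmatically, here is a minimal standard-library sketch that separates the frontmatter from the markdown body. It only handles flat `key: value` lines and leaves each value as a raw, unparsed string (quotes included); a real implementation would presumably hand the frontmatter to a proper YAML parser. The function name is hypothetical.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a document into (raw frontmatter attributes, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: everything is body

    # Find the closing delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text  # unterminated frontmatter: treat everything as body

    attributes: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            attributes[key.strip()] = value.strip()  # value stays a raw string

    body = "\n".join(lines[end + 1:])
    return attributes, body
```

Attributes the importer does not recognize simply stay in the returned dict, which matches the goal of preserving provider-specific metadata we do not know how to process natively.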