Unified import process based on "schematics"

Background

Currently we have many different mechanisms for importing projects:

There is also the Demo Test Data Working Group that is working on tooling to seed data based on the "Awesome Co" example group.

The WG's current proposal (!83656 (merged)) involves seeding data through rake tasks that call internal APIs.

This approach is similar to the one we've historically taken, i.e. hardcoding the process into the backend code. You can find plenty of examples where we are effectively maintaining full-blown client libraries for our competitors’ APIs in our core application code:

Problem

Each of the project import mechanisms described above has caveats and limitations (including our own GitLab <-> GitLab import/export processes and project namespace transfer).

What they all share is a lack of customization. Every one of our import/export processes is a "black box" that gives users no opportunity to transform or filter the data as it is processed.

In summary, the current processes:

  1. Require that we maintain custom clients for each provider even when official libraries exist
  2. Share very little code between processes/features
  3. Cannot be modified or executed outside of the application and are thus difficult to contribute to
  4. Cannot be tweaked or configured by end-users at runtime

Proposed Solution

I think there is an opportunity to consolidate all these use cases into a unified, multi-step process. Note that everything up until the last step could potentially be implemented and executed client-side by the user.

  1. User provides a "schematic" as a data-only representation of the top-level group or project to be imported
  2. Hydrate the schematic by resolving nested group/project structures and git remotes, serializing all project settings, downloading all binary/text assets, evaluating any templates, and storing everything in data/ alongside the hydrated schematic.yml
  3. Future: Apply somewhat arbitrary transformations on the schematic, for example:
    • Resolving/mapping usernames
    • Modifying project/group names
    • Appending/prepending arbitrary text to issue descriptions
    • Transforming/mapping label names
    • Updating timestamps on issues, milestone dates, etc.
    • Assigning IDs to issues/epics/etc. and resolving any references between IDs in the imported data
  4. Execute import process from the hydrated schematic
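
The transformation step (3) could be modeled as a chain of small functions applied to the hydrated schematic before import. The following is only a minimal sketch under assumptions: the in-memory dict shape (an `issues` list standing in for the files under data/issues/) and the names `map_usernames`, `map_labels`, `prepend_to_descriptions`, and `run_transforms` are all hypothetical and not part of this proposal.

```python
from copy import deepcopy
from typing import Callable

# Hypothetical in-memory form of a hydrated schematic: a plain dict whose
# "issues" list stands in for the files under data/issues/.
Schematic = dict
Transform = Callable[[Schematic], Schematic]


def map_usernames(mapping: dict[str, str]) -> Transform:
    """Build a transform that rewrites issue authors found in `mapping`."""
    def apply(schematic: Schematic) -> Schematic:
        result = deepcopy(schematic)
        for issue in result.get("issues", []):
            author = issue.get("author")
            if author in mapping:
                issue["author"] = mapping[author]
        return result
    return apply


def map_labels(mapping: dict[str, str]) -> Transform:
    """Build a transform that renames labels, leaving unknown labels as-is."""
    def apply(schematic: Schematic) -> Schematic:
        result = deepcopy(schematic)
        for issue in result.get("issues", []):
            labels = issue.get("labels", [])
            issue["labels"] = [mapping.get(label, label) for label in labels]
        return result
    return apply


def prepend_to_descriptions(text: str) -> Transform:
    """Build a transform that prepends `text` to every issue description."""
    def apply(schematic: Schematic) -> Schematic:
        result = deepcopy(schematic)
        for issue in result.get("issues", []):
            issue["description"] = text + issue.get("description", "")
        return result
    return apply


def run_transforms(schematic: Schematic, transforms: list[Transform]) -> Schematic:
    """Apply each transform in order; the input schematic is never mutated."""
    for transform in transforms:
        schematic = transform(schematic)
    return schematic


hydrated = {
    "kind": "Project",
    "issues": [
        {"author": "@octocat", "labels": ["bug"], "description": "Crash on start"},
    ],
}
result = run_transforms(
    hydrated,
    [
        map_usernames({"@octocat": "@marshall007"}),
        map_labels({"bug": "type::bug"}),
        prepend_to_descriptions("Imported from upstream.\n\n"),
    ],
)
```

Because each transform returns a new dict, a user could inspect or diff the schematic between steps, which fits the goal of running everything before the final import client-side.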

Schematic Layout

Hydrated Schematic

# project

schematic.yml
data/
  blob/
    # git data

    # any binary data/uploads referenced by
    # other data assets (issues, comments, MRs)

    # eventually include things like packages, container images, etc as well
  issues/
    <issue-id>/
      index.md # issue description
      comments/
        <comment-id>.md
  merge_requests/
    <mr-id>/
      index.md # MR description
      # same basic structure as issues
      # patches stored in blob/

# group

schematic.yml
data/
schematics/
  <name>/
    schematic.yml # nested schematic for child group/project
    data/

Example schematic.yml

kind: Group
spec:
  upstream:
    provider: "<gitlab|github|bitbucket|...>"
    host: "https://..."
    id: "<id>" # provider-specific group identifier
    path: "<group/subgroup>"
    settings:
      # provider-specific group settings
      description: "..."
  destination:
    provider: "gitlab" # always gitlab
    path: "<group/subgroup>"
    id: "<id>" # GitLab group identifier
    settings:
      # provider-specific settings mapped to GitLab settings
      description: "..."

kind: Project
spec:
  upstream:
    provider: "<gitlab|github|bitbucket|...>"
    id: "<id>" # provider-specific project identifier
    path: "<group/project>"
    git:
      repo: "<git remote>"
      ref: "<ref>"
      commit: "<sha>"
    settings: {}
  destination:
    provider: "gitlab" # always gitlab
    id: "<id>" # GitLab project identifier
    path: "<group/subgroup/project>"
    settings: {}
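
Since a schematic is data-only, it can be checked before any hydration or import work begins. The sketch below validates an already-parsed schematic (a plain dict, however it was loaded). The function name `validate_schematic` is hypothetical, and treating `provider` and `path` as required fields is an assumption for illustration; only the rule that the destination provider is always "gitlab" comes from the example above.

```python
VALID_KINDS = {"Group", "Project"}


def validate_schematic(doc: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    kind = doc.get("kind")
    if kind not in VALID_KINDS:
        errors.append(f"kind must be one of {sorted(VALID_KINDS)}, got {kind!r}")

    spec = doc.get("spec") or {}
    for side in ("upstream", "destination"):
        section = spec.get(side)
        if not isinstance(section, dict):
            errors.append(f"spec.{side} is missing")
            continue
        # Assumption: provider and path are the minimum needed to locate a
        # group or project on either side.
        for key in ("provider", "path"):
            if not section.get(key):
                errors.append(f"spec.{side}.{key} is required")

    destination = spec.get("destination")
    if isinstance(destination, dict):
        provider = destination.get("provider")
        # From the example: the destination provider is always gitlab.
        if provider and provider != "gitlab":
            errors.append("spec.destination.provider must be 'gitlab'")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a user fix a hand-written schematic in one pass.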

Example data/issues/<issue-id>/index.md

Notice that all metadata about an issue/epic/etc is stored in frontmatter attributes. This gives us a provider-agnostic means of storing arbitrary metadata in a text format that is easily manipulated both manually and programmatically.

This is essential for schematics to be useful in both the "import/export" and "project template" contexts.

It also gives us a way to preserve any provider-specific metadata that we don't know how to natively process.

---
title: 'Unified import process based on "schematics"'
author: "@marshall007" 
assignee:
labels: []
created: "2022-05-05"
due: "2022-05-05"
milestone: "15.0"
---

## Background

Currently we have many different mechanisms for importing projects:

...
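
As a rough illustration of how easily this format can be manipulated programmatically, the sketch below splits an index.md file into its frontmatter fields and Markdown body using only the standard library. The function name `split_frontmatter` is hypothetical, and the parser is deliberately simplified: it only handles flat `key: value` lines and keeps values as raw strings (quotes included). A real implementation would presumably hand the frontmatter to a proper YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into (raw frontmatter fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all

    # Find the closing '---' delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text  # unterminated frontmatter: treat everything as body

    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Values stay unparsed, so unknown provider-specific metadata
            # passes through untouched.
            fields[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return fields, body
```

Keeping unrecognized keys in `fields` instead of discarding them matches the goal of preserving provider-specific metadata that cannot be processed natively.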