Investigate Runners as possible sandbox solution for importers
This is a spike issue to investigate Runners as a possible sandbox solution for importers.
Time-box: 3d
Questions
- Can we run import logic inside a Runner?
- What would be the effort to rearchitect Imports to run in a Runner?
- Would it address the risks we highlight in https://gitlab.com/gitlab-org/gitlab/-/issues/371770?
- How does the Rails app communicate with Runners? Is there a communication interface that can be leveraged?
- Is it possible to execute arbitrary commands from the Rails app on the Runner? Or does it only accept .gitlab-ci.yml?
- In order to perform an import, we need to insert into the database. Can Runners have access to the DB?
- In order to perform an import, we need the Rails app booted. Can Runners boot the GitLab app?
- If a booted Rails app and DB access are not an option, can we download/extract/decompress files on a Runner and then access those files from Sidekiq? Such files would presumably not be trustworthy, since someone could set up a malicious Runner, so we'd have to validate them again.
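To make the re-validation concern in the last question more concrete, here is a minimal sketch of the kind of checks that might be needed before trusting an archive handed back by a Runner. It is written in Python for illustration only and is not GitLab's importer code; `validate_archive`, `MAX_MEMBERS` and `MAX_TOTAL_BYTES` are hypothetical names and limits, and a real check would likely need more than this (content validation, checksums, and so on).

```python
import io
import tarfile
from pathlib import PurePosixPath

# Hypothetical limits, chosen only for this sketch.
MAX_MEMBERS = 10_000
MAX_TOTAL_BYTES = 50 * 1024 * 1024


def validate_archive(data: bytes) -> list[str]:
    """Return the problems found in a tar archive; an empty list means it passed."""
    problems = []
    total = 0
    # Only headers are inspected here; nothing is extracted to disk.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        for count, member in enumerate(archive, start=1):
            if count > MAX_MEMBERS:
                problems.append("too many entries")
                break
            path = PurePosixPath(member.name)
            # Reject absolute paths and parent-directory traversal.
            if path.is_absolute() or ".." in path.parts:
                problems.append(f"unsafe path: {member.name}")
            # Reject symlinks/hardlinks and device or FIFO entries.
            if member.issym() or member.islnk():
                problems.append(f"link entry: {member.name}")
            elif not (member.isfile() or member.isdir()):
                problems.append(f"special file: {member.name}")
            # Sizes come from the archive headers, so this caps declared size.
            total += member.size
            if total > MAX_TOTAL_BYTES:
                problems.append("declared size exceeds limit")
                break
    return problems
```

Even with checks like these, the files would still be untrusted input from the importer's point of view, so this only narrows the problem rather than replacing validation on the GitLab side.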
Security concerns
- One thing to note is that some customers may install Runner on the same host as GitLab, so moving import code to a Runner wouldn't solve the problem for them. But managing the security of Runners is a known and well-documented part of self-managing GitLab: https://docs.gitlab.com/runner/security/ (reference: https://gitlab.com/gitlab-org/gitlab/-/issues/371770#note_1111413211)
Notes
If I understand correctly, using Runners will not be the ultimate solution here as customers can still install them on the same host. In this case, I wonder if we should still consider this option at all?
Running Runners on the same host as GitLab is possible, but strongly discouraged for security reasons (1, 2).
Running GitLab Runner(s) on the same machine as GitLab can also have a significant performance impact. Under load, GitLab and the GitLab Runners all compete for the same limited resources (CPU, RAM, disk I/O). For this reason alone, GitLab Support advises our customers to run GitLab and GitLab Runner(s) on separate machines, linking to the docs as a source of truth.
If we do consider using Runners as an ultimate solution for self-managed customers, we'll want to think through:
- which executor(s) would be supported or required
  - Shell Executor == local file system access 😰
  - Docker Executor is the most versatile, but would require installing Docker
  - Docker + Machine executor (what GitLab.com uses): these are external Runners (a new VM is spun up/down for each job); running the Docker+Machine executor on the same box as a self-managed GitLab is almost guaranteed to cause performance degradation
  - Kubernetes executor (outside of Cloud Native GitLab instances) might not work for sandbox, as
- max or recommended job concurrency (with runner installed on same box as GitLab, each job consumes system resources that GitLab also needs to operate)