Critical Issues with Runners after GitLab Upgrade to v17.11.2
After upgrading our GitLab instance to version 17.11.2, we have encountered several critical issues related to the operation of GitLab Runners, which have significantly affected our development and deployment workflows.
Existing Runners Not Picking Up Jobs
Following the upgrade, a large number of our previously registered runners stopped picking up jobs, despite still appearing as active in the GitLab UI. This appears to be a known issue: multiple threads on your official forums describe the same behavior and suggest that affected runners need to be re-registered. Due to the impact on our business processes, we were forced to urgently re-register a substantial portion of our runners.
Runner Token Prefix Collision and Working Directory Conflicts
After re-registering our runners, we faced another issue related to the new runner token format. Each runner has a symbolic identifier or prefix (visible in the GitLab UI next to its numeric ID), which is also used internally to distinguish working directories between different logical runners on the same machine.
The path template in question is:
{builds_dir}/$RUNNER_TOKEN_KEY/$CONCURRENT_PROJECT_ID/$NAMESPACE/$PROJECT_NAME
where $RUNNER_TOKEN_KEY is derived from the first 8 characters of the runner token (excluding the common "glrt-" prefix).
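To make the derivation concrete, here is a minimal Python sketch of how the key and the working directory could be computed from the template above. It is not taken from the GitLab Runner source; the function names, the sample token, and the sample namespace and project are all hypothetical and chosen only for illustration.

```python
import posixpath

TOKEN_PREFIX = "glrt-"  # common prefix, excluded from the key
KEY_LENGTH = 8          # number of token characters used as the key


def runner_token_key(token: str) -> str:
    """Return the first 8 characters of the token after the common prefix."""
    body = token[len(TOKEN_PREFIX):] if token.startswith(TOKEN_PREFIX) else token
    return body[:KEY_LENGTH]


def build_dir(builds_dir: str, token: str, concurrent_project_id: int,
              namespace: str, project_name: str) -> str:
    """Fill in the path template described above (illustrative only)."""
    return posixpath.join(
        builds_dir,
        runner_token_key(token),
        str(concurrent_project_id),
        namespace,
        project_name,
    )


# Hypothetical token, invented for this example; not a real credential.
example_path = build_dir("/builds", "glrt-AbCdEfGh1234567890", 0,
                         "my-group", "my-project")
print(example_path)  # /builds/AbCdEfGh/0/my-group/my-project
```

Under this scheme, two runners on the same machine get separate directories only if the first 8 characters of their tokens differ.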
Previously, runner tokens were purely random strings, ensuring uniqueness of these prefixes. However, with the new routable token format, the token contains base64-encoded runner metadata followed by a random segment, payload length, and checksum. As a result, runners with similar metadata produce identical or similar $RUNNER_TOKEN_KEY values.
For example, all project_type runners in our instance now share the same prefix "bzoxCnA6", which Base64-decodes to "o:1", a line break, and "p:". This means every runner assigned to a project in organization ID 1 will use the same directory path on shared servers. As a result, multiple runners interfere with one another by attempting to use the same working directory, something that did not occur before the update.
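The decoding can be reproduced with a few lines of Python. The sketch below decodes the shared prefix and then shows the collision for two tokens; the token suffixes are hypothetical placeholders we made up, and only the "bzoxCnA6" prefix comes from our instance.

```python
import base64

shared_key = "bzoxCnA6"

# 8 Base64 characters decode to exactly 6 bytes, so no padding is needed.
decoded = base64.b64decode(shared_key)
print(repr(decoded))  # b'o:1\np:'

# Two hypothetical tokens for different project runners. The parts after
# the shared prefix are invented placeholders, not real token content.
token_a = "glrt-" + shared_key + "PLACEHOLDER-A"
token_b = "glrt-" + shared_key + "PLACEHOLDER-B"

# Key derivation as described above: drop "glrt-", keep 8 characters.
key_a = token_a[len("glrt-"):][:8]
key_b = token_b[len("glrt-"):][:8]

print(key_a == key_b)  # True: both runners map to the same directory key
```

Because the first 8 characters fall entirely inside the metadata portion of the token, the random segment never contributes to the key.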
Although we have identified temporary workarounds, the underlying problem persists and may surface in other contexts. We kindly request your assistance in addressing this issue and would appreciate any guidance or timeline on a potential fix.
Thank you in advance for your support.