[maintd] Move Maintenance process from taped into a new component: cta-maintd
Description
- Moved `common/processCap/ProcessCap.[h|c]pp` into `common/process/`
- Moved `common/threading` into `common/process/threading`
- New functionality in `common/Config.hpp` to:
  - Re-parse the config after object creation.
  - Log the parsed entries without needing to do it externally, as is done in frontend/common/FrontendService.cpp.
  - No changes have been made to the current usage (e.g. FrontendService).
- CmdLineParams is now part of the common library, as it is used by both taped and maintenance.
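The re-parse and self-logging behaviour described for `common/Config.hpp` could look roughly like the sketch below. This is not the actual interface: `ReloadableConfig`, `parse`, `get`, the `Logger` callback and the "key value" line format are all hypothetical names and assumptions chosen for illustration.

```cpp
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical sketch; NOT the actual interface of common/Config.hpp.
class ReloadableConfig {
public:
  using Logger = std::function<void(const std::string& key, const std::string& value)>;

  explicit ReloadableConfig(Logger logger) : m_logger(std::move(logger)) {}

  // Parses "key value" lines (an assumed format), replacing all previously
  // parsed entries. May be called again at any time after construction.
  void parse(std::istream& in) {
    std::map<std::string, std::string> fresh;
    std::string line;
    while (std::getline(in, line)) {
      std::istringstream fields(line);
      std::string key, value;
      if (!(fields >> key) || key[0] == '#') continue;  // skip blank lines and comments
      fields >> value;                                   // stays empty if no value is given
      fresh[key] = value;
    }
    m_entries.swap(fresh);
    logEntries();  // the config logs its own entries, no external loop needed
  }

  // Returns the parsed value for key, or fallback if the key is absent.
  std::string get(const std::string& key, const std::string& fallback) const {
    const auto it = m_entries.find(key);
    return it == m_entries.end() ? fallback : it->second;
  }

private:
  void logEntries() const {
    if (!m_logger) return;
    for (const auto& entry : m_entries) m_logger(entry.first, entry.second);
  }

  Logger m_logger;
  std::map<std::string, std::string> m_entries;
};
```

Calling `parse` a second time with a new stream drops entries that are no longer present, so callers fall back to their defaults for those keys.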
- Maintd
  - Renamed all runners to routines.
  - Moved relevant routine functionality to the `maintd` directory.
  - Created interface `IRoutine` that all routines implement.
  - Moved all hard-coded values into the maintd configuration file. When a routine is created, defaults are applied for any values that are not in the config. The default and example values are either the previously hard-coded values or the values we use in prod.
  - The `RoutineRunner` loop repeatedly executes routines. Once all routines have executed, we sleep for N seconds and start over.
    - The default sleep has been reduced from 10 secs to 1 for CI and dev environments.
  - The process supports reloading the config; the process' user/group and the log format will not be reloaded.
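As a rough illustration of the routine loop described above, here is a minimal sketch. The member names (`execute`, `getName`, `registerRoutine`, `stop`, `run`) and the `CountingRoutine` example class are assumptions made for this sketch; the real `IRoutine` and `RoutineRunner` may look different.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical interface; the real IRoutine may declare different members.
class IRoutine {
public:
  virtual ~IRoutine() = default;
  virtual void execute() = 0;
  virtual std::string getName() const = 0;
};

// Executes every registered routine in order, sleeps, then starts over.
class RoutineRunner {
public:
  explicit RoutineRunner(std::chrono::milliseconds sleepInterval)
      : m_sleepInterval(sleepInterval) {}

  void registerRoutine(std::unique_ptr<IRoutine> routine) {
    m_routines.push_back(std::move(routine));
  }

  // Asks the loop to exit; checked before each routine and before sleeping.
  void stop() { m_stopRequested = true; }

  void run() {
    while (!m_stopRequested) {
      for (auto& routine : m_routines) {
        if (m_stopRequested) break;
        routine->execute();
      }
      if (!m_stopRequested) std::this_thread::sleep_for(m_sleepInterval);
    }
  }

private:
  std::chrono::milliseconds m_sleepInterval;
  std::vector<std::unique_ptr<IRoutine>> m_routines;
  std::atomic<bool> m_stopRequested{false};
};

// Illustration only: counts its executions and stops the runner at a limit.
class CountingRoutine : public IRoutine {
public:
  CountingRoutine(RoutineRunner& runner, int limit) : m_runner(runner), m_limit(limit) {}
  void execute() override {
    if (++m_count >= m_limit) m_runner.stop();
  }
  std::string getName() const override { return "CountingRoutine"; }
  int count() const { return m_count; }

private:
  RoutineRunner& m_runner;
  int m_limit;
  int m_count = 0;
};
```

In this sketch the sleep interval is a constructor argument, which matches the idea of reading it from the maintd configuration and falling back to a default when it is absent.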
- `RepackRequestManager`
  - Removed the "RepackReportThreads" as the threading functionality was not actually being used; simplified the reporting code with a simple templated function.
- k8s deployment
  - For now we deploy a new pod with one maintd process running in it. Future improvements should allow testing X processes in Y pods.
- Created a SignalReader and SignalReactor in the common library.
  - The SignalReader reads all pending signals and returns a std::set with the received signals.
  - The SignalReactor follows the command pattern: based on the signals it sees from the SignalReader, it can run a given function (which function is configurable).
  - These functions could, for example, tell the MaintenanceDaemon to stop or to reload the config.
  - When systemd stops a process, it waits X seconds for the process to shut down gracefully. Since we specify (where possible) a timeout to limit how long a routine should run, setting systemd's shutdown time to double the maximum timeout of all the routines means we "can" guarantee a clean exit (if a routine is stuck the process will get killed, but that can also happen today).
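The reader/reactor split described above can be sketched as follows. This is an assumption-laden illustration, not the code from this MR: the method names `readPending`, `registerAction` and `react` are hypothetical, and the sketch assumes a Linux/POSIX system where signals are blocked and then drained with `sigtimedwait` using a zero timeout.

```cpp
#include <signal.h>
#include <time.h>

#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <utility>

// Hypothetical sketch of a reader that drains pending signals.
class SignalReader {
public:
  // Blocks the given signals so they stay pending until they are read.
  // A multi-threaded process would use pthread_sigmask before spawning threads.
  explicit SignalReader(const std::set<int>& signals) {
    sigemptyset(&m_mask);
    for (int signo : signals) sigaddset(&m_mask, signo);
    sigprocmask(SIG_BLOCK, &m_mask, nullptr);
  }

  // Returns every currently pending signal from the watched set without waiting.
  std::set<int> readPending() const {
    std::set<int> received;
    const timespec noWait{};  // zero timeout: poll instead of blocking
    int signo = 0;
    while ((signo = sigtimedwait(&m_mask, nullptr, &noWait)) > 0) {
      received.insert(signo);
    }
    return received;
  }

private:
  sigset_t m_mask;
};

// Command pattern: maps a signal number to a configurable action.
class SignalReactor {
public:
  explicit SignalReactor(SignalReader& reader) : m_reader(reader) {}

  void registerAction(int signo, std::function<void()> action) {
    m_actions[signo] = std::move(action);
  }

  // Runs the registered action for each pending signal.
  // Returns the number of actions that were run.
  std::size_t react() {
    std::size_t ran = 0;
    for (int signo : m_reader.readPending()) {
      const auto it = m_actions.find(signo);
      if (it != m_actions.end()) {
        it->second();
        ++ran;
      }
    }
    return ran;
  }

private:
  SignalReader& m_reader;
  std::map<int, std::function<void()>> m_actions;
};
```

A daemon loop could call `react()` between routines, with, say, a SIGTERM action that requests a stop and a SIGHUP action that reloads the config; which signals map to which actions is a choice of the caller in this sketch.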
- Stress Test Results
Related MRs
- CI Monitoring: https://gitlab.cern.ch/cta/sandbox/ci_monitoring/-/merge_requests/4
- Puppet Module: https://gitlab.cern.ch/ai/it-puppet-module-cta/-/merge_requests/313
- Puppet Hostgroup: https://gitlab.cern.ch/ai/it-puppet-hostgroup-cta/-/merge_requests/413
Additional Required Actions
- Requires manual tests in pre-production: YES
- Deployment and tests documented here: https://gitlab.cern.ch/cta/operations/-/issues/1819
- Requires a documentation update: YES
Checklist
- Documentation reflects the changes made.
- Merge Request title is clear, concise, and suitable as a changelog entry. See our contributing docs
References
Closes #1032 (closed)
Edited by Niels Alexander Buegel
