Resolve "The `populate_prod_db` is failing, leading to missing data in the Insights dashboards"
This improves the populate_prod_db job by:
- Fetching the API asynchronously in batches of 5 pages using `Concurrent::Future` (previously, we fetched pages synchronously, waiting for each page to return before requesting the next one).
- Inserting the records as we retrieve them, which minimizes the memory used by the job (previously, we waited until all the resources had been fetched from the API before inserting them).
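
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the job's actual code: it uses plain stdlib `Thread` objects as a stand-in for `Concurrent::Future` so that it runs without extra gems, and `FAKE_API`, `fetch_page`, and `each_batch_of_records` are hypothetical names invented for the example.

```ruby
# Hypothetical stand-in for the paginated API: 12 pages of 2 records each.
FAKE_API = (1..12).to_h { |page| [page, ["record-#{page}a", "record-#{page}b"]] }.freeze

# Hypothetical single-page fetch; returns [] once the pages run out.
def fetch_page(page)
  FAKE_API.fetch(page, [])
end

BATCH_SIZE = 5

# Fetches pages in concurrent batches and yields each batch's records as soon
# as the batch is complete, so the caller can insert them and drop them
# instead of accumulating every record in memory first.
def each_batch_of_records(batch_size: BATCH_SIZE)
  first_page = 1
  loop do
    pages = (first_page...(first_page + batch_size)).to_a
    # One thread per page (Concurrent::Future would play this role in the job);
    # Thread#value blocks until that page's fetch has returned.
    results = pages.map { |page| Thread.new { fetch_page(page) } }.map(&:value)
    records = results.flatten(1)
    yield records unless records.empty?
    break if results.any?(&:empty?) # an empty page means we ran past the end
    first_page += batch_size
  end
end

inserted = 0
each_batch_of_records do |records|
  # A real job would insert `records` into the database here.
  inserted += records.size
end
puts "inserted #{inserted} records"
```

Only one batch of records is referenced at a time, which is the property the second bullet relies on to keep memory usage down.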
Problems remaining to solve
Memory usage still grows steadily over the course of the job, reaching about 1.7 GB when processing gitlab-foss, as can be seen here: https://gitlab.com/gitlab-org/gitlab-insights/-/jobs/344683057
              total        used        free      shared  buff/cache   available
Mem:           7983        7573         126         221         282          24
Swap:             0           0           0
memsize_of_all: 1794.234375 MB, ObjectSpace.count_objects: {:TOTAL=>22342887, :FREE=>5247592, :T_OBJECT=>3124075, :T_CLASS=>99398, :T_MODULE=>1039, :T_FLOAT=>9, :T_STRING=>8967399, :T_REGEXP=>1946, :T_ARRAY=>2423839, :T_HASH=>1971465, :T_STRUCT=>7234, :T_BIGNUM=>9, :T_FILE=>14, :T_DATA=>275813, :T_MATCH=>11024, :T_COMPLEX=>1, :T_RATIONAL=>27611, :T_SYMBOL=>514, :T_IMEMO=>182002, :T_NODE=>214, :T_ICLASS=>1689}
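
For reference, a line like the one above can be produced with Ruby's `objspace` extension. The sketch below is an assumption about how such a report might be built, not the job's actual logging code; `memory_report` is a hypothetical helper name, and the exact formatting of the printed hash depends on the Ruby version.

```ruby
require 'objspace'

# Hypothetical helper: builds a one-line memory report similar to the log line above.
def memory_report
  # ObjectSpace.memsize_of_all returns the memory held by all live objects, in bytes.
  megabytes = ObjectSpace.memsize_of_all / 1024.0 / 1024.0
  # ObjectSpace.count_objects returns a hash of object counts keyed by internal type.
  "memsize_of_all: #{megabytes} MB, ObjectSpace.count_objects: #{ObjectSpace.count_objects}"
end

puts memory_report
```

Logging this after each inserted batch would show which object types (for example `:T_STRING` or `:T_HASH`) keep growing between batches.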
Data point checks
Closes #151.
Edited by Rémy Coutable