partial loss of activity history
Summary
- Customer upgraded from 8.6.3 to 10.8.7
- They reported losing activity data over a year old, which would be attributable to the housekeeping/pruning
- Activity more recent than a year has also gone missing
I've found two jobs which affected the events data
- The events table pruning job
- widely deployed and running regularly; a bug here would have high visibility
- The migration job that created push_event_payloads
- one-off job
- we've only looked in depth at one event, but it was event type: push
Their use case for activity data is at the project, not individual, level; so the year's retention for building developers' profiles is insufficient.
Asks
- If either job encountered issues, where would they be logged?
- Could the push_event_payloads migration job be used to repopulate the missing events: those pruned, and those which have vanished?
- See 'data recovery option' below.
- Might a backup and restore of repositories fix this? That depends on how the activity data is restored; this issue might provide clues.
Detail
Customer upgraded from 8.6.3 to 10.8.7 and reported (internal) that, in addition to losing activity entries over a year old, some activity within the last year had also vanished.
Loss of >12 months would be attributable to the housekeeping - code for 10.8.7 is here
- ref: gitlab-foss#52246 (closed)
- now at three years: !18399 (merged)
- long term fix; partitioning: #24538 (closed)
- also reported here: #20631 (closed)
But loss of activity data <12 months old should not be caused by the pruning.
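To make that reasoning concrete, here is a toy model of age-based pruning against an in-memory SQLite table. It is a sketch only, not the worker's actual code: the 365-day window, the `prune_old_events` helper and the two-column table are assumptions for illustration. The point is that a cutoff on created_at alone cannot remove rows newer than the retention window.

```python
import sqlite3
from datetime import datetime, timedelta


def prune_old_events(conn, now, retention_days=365):
    """Delete events older than the retention window; return rows deleted.

    Hypothetical helper: it models pruning as a single cutoff on created_at.
    """
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cur = conn.execute("DELETE FROM events WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Toy stand-in for the events table (only the columns this sketch needs).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")

now = datetime(2019, 11, 15)  # illustrative date only
rows = [
    (1, (now - timedelta(days=800)).isoformat()),  # well over a year old
    (2, (now - timedelta(days=400)).isoformat()),  # just over a year old
    (3, (now - timedelta(days=200)).isoformat()),  # within the last year
    (4, (now - timedelta(days=10)).isoformat()),   # recent
]
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

deleted = prune_old_events(conn, now)
remaining = [r[0] for r in conn.execute("SELECT id FROM events ORDER BY id")]
print(deleted, remaining)
```

In this model only events 1 and 2 are removed, so a missing event such as 3 would need a different explanation.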
A possible second change that might be relevant was the splitting of the events table, and the creation of push_event_payloads. This was done via the temporary events_for_migration table.
This might not be the only customer that saw this. This user might also have been reporting the same phenomenon.
The customer reports it in their production and non production environments, and in more than one project.
Focus has been on a specific event, which was the creation of a feature branch; the project's activity is 23 records, so it's easy to spot the gaps. It would have been a git/push event, so it would have been split by the migration.
Their upgrade steps:
- Upgrade from GitLab 8.6.3 to GitLab 8.13.4
- Upgrade from GitLab 8.13.4 to GitLab 8.17.7
- Upgrade from GitLab 8.17.7 to GitLab 9.5.10
- Upgrade from GitLab 9.5.10 to GitLab 10.8.7
Things we've tried
- The push event that we've identified is not in the events table; queried via SQL. Total row count was 20-25.
select * from events where project_id=123;
- events_for_migration table had gone.
- Have asked for a query that'll cross check the push_event_payloads and events tables, to see if there are orphaned records in the payloads table.
select count(*) from push_event_payloads where event_id not in (
select id from events
);
- I've also asked for db migration logs.
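A variant of the cross-check that lists the orphaned event ids instead of only counting them might make it easier to see whether the missing push event is among them. The sketch below runs an equivalent NOT EXISTS anti-join against a toy in-memory SQLite database; the table contents, the `ref` values and the `orphaned_payload_event_ids` helper are made up for illustration, and the customer's actual schema has more columns.

```python
import sqlite3

# Anti-join: payload rows whose event_id has no matching row in events.
ORPHANED_PAYLOADS_SQL = """
SELECT p.event_id
FROM push_event_payloads AS p
WHERE NOT EXISTS (SELECT 1 FROM events AS e WHERE e.id = p.event_id)
ORDER BY p.event_id
"""


def orphaned_payload_event_ids(conn):
    """Return the event_id of every payload row with no parent event."""
    return [row[0] for row in conn.execute(ORPHANED_PAYLOADS_SQL)]


# Toy data: event 3 is missing from events but still has a payload row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, project_id INTEGER);
CREATE TABLE push_event_payloads (event_id INTEGER, ref TEXT);
INSERT INTO events VALUES (1, 123), (2, 123), (4, 123);
INSERT INTO push_event_payloads VALUES
    (1, 'master'), (3, 'feature-a'), (4, 'master');
""")

orphans = orphaned_payload_event_ids(conn)
print(orphans)
```

If the real query returns no rows, the payloads went missing together with their events, which would point away from a partial write during the split.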
Data recovery option?
There's not a lot of activity on some of the customer's projects; the activity over a year old would clearly be of use to them.
If they recovered their events table from 8.x into 10.x as events_for_migration, could they use the migration to repopulate [a] the missing recent events and [b] the history?
Does the migration check for the existing key before splitting the event and writing to events? If it does, I imagine we could work around that by first pruning events_for_migration via keys that are already in events.
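The pruning step suggested above could look roughly like the following, shown against a toy in-memory SQLite database. This is only a sketch of the de-duplication idea: the `drop_already_migrated` helper and the sample rows are hypothetical, and whether the migration would then process the restored table is still the open question.

```python
import sqlite3


def drop_already_migrated(conn):
    """Remove rows from events_for_migration whose id already exists in events.

    Hypothetical helper; returns the number of rows removed.
    """
    cur = conn.execute(
        "DELETE FROM events_for_migration WHERE id IN (SELECT id FROM events)"
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Current 10.x table: only events 2 and 4 survived.
CREATE TABLE events (id INTEGER PRIMARY KEY, project_id INTEGER);
INSERT INTO events VALUES (2, 123), (4, 123);

-- Table restored from the 8.x backup under the temporary name.
CREATE TABLE events_for_migration (id INTEGER PRIMARY KEY, project_id INTEGER);
INSERT INTO events_for_migration VALUES (1, 123), (2, 123), (3, 123), (4, 123);
""")

removed = drop_already_migrated(conn)
to_migrate = [
    r[0] for r in conn.execute("SELECT id FROM events_for_migration ORDER BY id")
]
print(removed, to_migrate)
```

After this step only the ids absent from events remain as candidates, so a key-existence check in the migration would not reject them.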
Steps to reproduce
(How one can reproduce the issue - this is very important)
Example Project
- Doesn't lend itself to reproduction on gitlab.com
- Have checked revisions of prune_old_events_worker.rb and don't see any revision.
- Customer did the upgrades in 2019-11, and used the most recent migration code available at that time.
What is the current bug behavior?
The upgrade steps listed above result in activity data vanishing.
What is the expected correct behavior?
All activity data from the last year should be present and correct.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
(not provided)
(For installations with omnibus-gitlab package run and paste the output of: sudo gitlab-rake gitlab:env:info)
(For installations from source run and paste the output of: sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production)
Results of GitLab application Check
(not provided)
(For installations with omnibus-gitlab package run and paste the output of: sudo gitlab-rake gitlab:check SANITIZE=true)
(For installations from source run and paste the output of: sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true)
(we will only investigate if the tests are passing)
Possible fixes
I've linked to the housekeeping and migration code above.