Rethinking caching in Pages
It was noticed in https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/85#note_970729862 that the cache will keep growing as Pages gets more usage. This leads to scenarios where memory gets saturated and we need to increase the memory requests, as we did in https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/85.
It was suggested to add an LRU cache into the mix. That would let us cap memory consumption, and we would probably spend memory more efficiently. However, we would then need to monitor hit rates instead of memory saturation, and once hit rates drop we would need to expand the memory or find another solution. Such a change would require significant refactoring of the zip VFS to work with an LRU alongside the other caches in Pages.
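For illustration, the LRU idea can be sketched with nothing but the Go standard library. This is a minimal, non-thread-safe cache capped by entry count; the names are made up for this sketch, and a real version for Pages would cap by byte size, add locking, and plug into the zip VFS rather than stand alone:

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache keeps at most cap entries, evicting the least recently used one.
type lruCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element in order
}

type entry struct {
	key   string
	value interface{}
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		cap:   capacity,
		order: list.New(),
		items: make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks the entry as recently used.
func (c *lruCache) Get(key string) (interface{}, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a value, evicting the oldest entry when full.
func (c *lruCache) Put(key string, value interface{}) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).value = value
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRUCache(2)
	c.Put("gitlab.io", "config-a")
	c.Put("example.io", "config-b")
	c.Get("gitlab.io")           // touch, so example.io becomes the oldest
	c.Put("docs.io", "config-c") // capacity exceeded: example.io is evicted
	_, ok := c.Get("example.io")
	fmt.Println(ok)
}
```

The point of the sketch is the monitoring trade-off mentioned above: memory stays bounded by construction, but evictions turn into cache misses, so hit rate becomes the signal to watch instead of memory saturation.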
It was also noted that the library patrickmn/go-cache is no longer actively maintained; its last release dates from October 2017. It would be worth exploring a change of cache library as well, but we need to check whether that is worth the effort.
While rethinking caching in Pages, we should also look into a centralized caching mechanism like Redis. It introduces a new dependency, but it might be worth exploring this alternative considering the following:
- Single cache for all pods
- Shared domain configuration in all pods, reducing the number of requests to Rails
- Shared archive information for the Zip VFS, reducing the number of requests to GCS per pod
- Opens up an avenue for different modes of rate limiting, with a single source of truth (SSoT) for usage data in a centralized store like Redis
Perhaps we're now at a point in time where having a Redis cache makes sense.
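To make the "shared domain configuration" point concrete, here is a hedged sketch of the lookup path. Everything here is illustrative (the function and type names are not the actual Pages API): the pod consults a cache before falling back to Rails, and a Redis-backed implementation of the same interface would let all pods share one set of entries instead of each pod warming its own. An in-memory stub stands in for Redis so the sketch is self-contained:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// DomainCache is the interface a shared cache would satisfy. A Redis-backed
// implementation behind this interface would be shared by every Pages pod;
// the in-memory version below only stands in for it in this sketch.
type DomainCache interface {
	Get(domain string) (string, bool)
	Set(domain, config string, ttl time.Duration)
}

type memoryCache struct {
	mu    sync.Mutex
	items map[string]item
}

type item struct {
	config    string
	expiresAt time.Time
}

func newMemoryCache() *memoryCache {
	return &memoryCache{items: make(map[string]item)}
}

func (c *memoryCache) Get(domain string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	it, ok := c.items[domain]
	if !ok || time.Now().After(it.expiresAt) {
		return "", false // missing or expired
	}
	return it.config, true
}

func (c *memoryCache) Set(domain, config string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[domain] = item{config: config, expiresAt: time.Now().Add(ttl)}
}

// resolveDomain asks the cache first and calls Rails only on a miss. With a
// shared Redis cache, a miss in one pod would warm the cache for all pods,
// which is where the reduction in requests to Rails comes from.
func resolveDomain(c DomainCache, domain string, fetchFromRails func(string) string) string {
	if cfg, ok := c.Get(domain); ok {
		return cfg
	}
	cfg := fetchFromRails(domain)
	c.Set(domain, cfg, 30*time.Second) // TTL is an arbitrary placeholder
	return cfg
}

func main() {
	cache := newMemoryCache()
	calls := 0
	fetch := func(d string) string { calls++; return "config-for-" + d }

	resolveDomain(cache, "example.gitlab.io", fetch)
	resolveDomain(cache, "example.gitlab.io", fetch) // served from cache
	fmt.Println(calls)
}
```

The same interface shape would cover the zip VFS archive metadata as well, trading an extra network hop to Redis against fewer calls to Rails and GCS per pod.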