Rethinking caching in Pages
It was noticed in https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/85#note_970729862 that the cache will keep growing as Pages gets more usage. This leads to scenarios where memory gets saturated and we need to increase the memory requests, as we did in https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/85.
It was suggested to add an LRU cache into the mix. That would let us cap memory consumption, and we would probably spend memory more efficiently. However, we would then need to monitor hit rates instead of memory saturation, and once hit rates drop we would need to expand the memory or find another solution. Such a change would require significant refactoring of the zip VFS to work with an LRU alongside the other caches in Pages.
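For illustration, the LRU idea can be sketched with nothing but the Go standard library. This is a minimal, non-thread-safe cache capped by entry count; the names are made up for this sketch, and a real version for Pages would cap by byte size, add locking, and plug into the zip VFS rather than stand alone:

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache keeps at most cap entries, evicting the least recently used one.
type lruCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element in order
}

type entry struct {
	key   string
	value interface{}
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		cap:   capacity,
		order: list.New(),
		items: make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks the entry as recently used.
func (c *lruCache) Get(key string) (interface{}, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a value, evicting the oldest entry when full.
func (c *lruCache) Put(key string, value interface{}) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).value = value
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRUCache(2)
	c.Put("gitlab.io", "config-a")
	c.Put("example.io", "config-b")
	c.Get("gitlab.io")           // touch, so example.io becomes the oldest
	c.Put("docs.io", "config-c") // capacity exceeded: example.io is evicted
	_, ok := c.Get("example.io")
	fmt.Println(ok)
}
```

The point of the sketch is the monitoring trade-off mentioned above: memory stays bounded by construction, but evictions turn into cache misses, so hit rate becomes the signal to watch instead of memory saturation.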
It was also noted that the library patrickmn/go-cache is no longer actively maintained; its last release dates from October 2017. It would be worth exploring a change of cache library as well, but we need to check whether that is worth the effort.
While rethinking caching in Pages, we should also look into a centralized caching mechanism like Redis. It introduces a new dependency, but it might be worth exploring this alternative considering the following:
- Single cache for all pods
- Shared domain configuration in all pods, reducing the number of requests to Rails
- Shared archive information for the Zip VFS, reducing the number of requests to GCS per pod
- Opens up an avenue for different modes of rate limiting, with a single source of truth (SSoT) for usage data in a centralized store like Redis
Perhaps we're now at a point in time where having a Redis cache makes sense.
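To make the "shared domain configuration" point concrete, here is a hedged sketch of the lookup path. Everything here is illustrative (the function and type names are not the actual Pages API): the pod consults a cache before falling back to Rails, and a Redis-backed implementation of the same interface would let all pods share one set of entries instead of each pod warming its own. An in-memory stub stands in for Redis so the sketch is self-contained:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// DomainCache is the interface a shared cache would satisfy. A Redis-backed
// implementation behind this interface would be shared by every Pages pod;
// the in-memory version below only stands in for it in this sketch.
type DomainCache interface {
	Get(domain string) (string, bool)
	Set(domain, config string, ttl time.Duration)
}

type memoryCache struct {
	mu    sync.Mutex
	items map[string]item
}

type item struct {
	config    string
	expiresAt time.Time
}

func newMemoryCache() *memoryCache {
	return &memoryCache{items: make(map[string]item)}
}

func (c *memoryCache) Get(domain string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	it, ok := c.items[domain]
	if !ok || time.Now().After(it.expiresAt) {
		return "", false // missing or expired
	}
	return it.config, true
}

func (c *memoryCache) Set(domain, config string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[domain] = item{config: config, expiresAt: time.Now().Add(ttl)}
}

// resolveDomain asks the cache first and calls Rails only on a miss. With a
// shared Redis cache, a miss in one pod would warm the cache for all pods,
// which is where the reduction in requests to Rails comes from.
func resolveDomain(c DomainCache, domain string, fetchFromRails func(string) string) string {
	if cfg, ok := c.Get(domain); ok {
		return cfg
	}
	cfg := fetchFromRails(domain)
	c.Set(domain, cfg, 30*time.Second) // TTL is an arbitrary placeholder
	return cfg
}

func main() {
	cache := newMemoryCache()
	calls := 0
	fetch := func(d string) string { calls++; return "config-for-" + d }

	resolveDomain(cache, "example.gitlab.io", fetch)
	resolveDomain(cache, "example.gitlab.io", fetch) // served from cache
	fmt.Println(calls)
}
```

The same interface shape would cover the zip VFS archive metadata as well, trading an extra network hop to Redis against fewer calls to Rails and GCS per pod.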