This document provides details on caching options available for Cloud Storage FUSE and how each cache type can be configured.
To help increase the performance of data retrieval, Cloud Storage FUSE offers four types of optional caching. Use the following table to learn more about each type of caching:
Caching type | Description |
---|---|
File caching | Accelerates file data reads for read-heavy workloads that repeatedly access the same data, such as artificial intelligence and machine learning training jobs that read the same large files multiple times. Serving repeat reads from the cache significantly reduces latency. |
List caching | Accelerates directory listing operations for workloads that frequently list the entire contents of a directory, such as iterating over a large set of files at the beginning of a processing job, improving the speed of directory traversal. |
Stat caching | Accelerates file metadata operations for applications that frequently check file attributes, for example to detect whether a file has changed. Caching reduces the number of `GetMetadata` calls made to Cloud Storage. |
Type caching | Accelerates file or directory existence checks for workloads that perform many existence checks or path lookups, improving latency by reducing the number of requests made to Cloud Storage to check if a path exists. |
Considerations
Enabling caching can increase performance but reduce consistency. Reduced consistency usually occurs when multiple clients access the same bucket and the bucket's contents change frequently. To reduce the impact on consistency, we recommend mounting buckets as read-only. To learn more about caching behavior, see Cloud Storage FUSE semantics in the Cloud Storage FUSE GitHub documentation.
To avoid cache thrashing, ensure that your entire dataset fits within the cache capacity. Also, consider the maximum capacity and performance that your cache media can provide. If you reach the provisioned cache's maximum performance, capacity limit, or both, it's beneficial to read directly from Cloud Storage, which has much higher limits than Cloud Storage FUSE.
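The capacity check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of Cloud Storage FUSE: the helper names `dataset_size_bytes` and `fits_in_cache` are hypothetical, and the sketch assumes the dataset is reachable as a local directory tree (for example, through a mounted bucket) and that you know the capacity of your cache media in bytes.

```python
import os


def dataset_size_bytes(root):
    """Sum the sizes of all regular files under root (hypothetical helper)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # Skip files that disappear or cannot be read during the walk.
                continue
    return total


def fits_in_cache(dataset_bytes, cache_capacity_bytes):
    """Return True if the whole dataset fits within the cache capacity."""
    return dataset_bytes <= cache_capacity_bytes
```

If `fits_in_cache` returns `False`, the cache is likely to evict data that the workload still needs, which is the thrashing scenario this section warns about. Note that walking a mounted bucket itself issues list and metadata requests.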
Read path for cached data
The Cloud Storage FUSE cache accelerates repeat reads of data after the data has been ingested into the cache. Both first-time reads and cache misses go directly to Cloud Storage and are subject to normal Cloud Storage network latencies. To improve first-time read performance, see Pre-populate the metadata cache.
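The read path described above follows the general read-through caching pattern, which the following Python sketch models conceptually. It is not Cloud Storage FUSE's implementation: the `ReadThroughCache` class, the `fetch_from_backend` callable, and the in-memory dictionary standing in for a bucket are all assumptions made for illustration.

```python
class ReadThroughCache:
    """Conceptual read-through cache; not Cloud Storage FUSE's actual code."""

    def __init__(self, fetch_from_backend):
        # fetch_from_backend is a hypothetical callable that reads an object
        # from the backing store (the role Cloud Storage plays here).
        self._fetch = fetch_from_backend
        self._store = {}
        self.hits = 0
        self.misses = 0

    def read(self, name):
        if name in self._store:
            # Repeat read: served from the cache without a backend request.
            self.hits += 1
            return self._store[name]
        # First-time read or cache miss: go to the backend, then ingest.
        self.misses += 1
        data = self._fetch(name)
        self._store[name] = data
        return data


backend = {"train/shard-0": b"example bytes"}  # hypothetical stand-in for a bucket
cache = ReadThroughCache(backend.__getitem__)
cache.read("train/shard-0")  # first read: miss, fetched from the backend
cache.read("train/shard-0")  # repeat read: hit, served from the cache
print(cache.hits, cache.misses)  # prints "1 1"
```

Only the second read benefits from the cache, which is why first-time reads see normal backend latency.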
What's next
Learn more about each caching type: