
Intelligently handle case when same remote content repository is used multiple times in playbook

More than one documentation component can live in the same content repository. Using these components in the same site requires referencing the same content repository multiple times in the playbook's content sources.

If the same remote content repository (meaning the same URL) is used multiple times in a playbook, the content aggregator should handle this case intelligently. Currently, it writes the bare git files for both instances into the same cache directory. While libgit2 protects against collisions (using temporary files), the work performed is still redundant (the slower operation overwrites files from the faster operation).

The content aggregator should do one of two things:

  1. It should use a unique cache directory per content source, even when the URL matches another entry
  2. It should detect the duplicate URL and only perform the clone/fetch operation on the first instance
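
As a rough illustration of option (2), the sketch below remembers which URLs have already been retrieved and reuses the result for later entries. It is not the aggregator's actual code: the function name `aggregate_sources`, the `fetch_repo` callback, and the dictionary shape of a content source are all hypothetical and chosen only for this example.

```python
def aggregate_sources(sources, fetch_repo):
    """Pair each content source with a repository, fetching each URL once.

    `sources` is a list of dicts with a "url" key (hypothetical shape).
    `fetch_repo` is a hypothetical callback that clones or fetches a URL
    and returns a handle to the local repository.
    """
    repos_by_url = {}  # URL -> repository handle already retrieved
    results = []
    for source in sources:
        url = source["url"]
        if url not in repos_by_url:
            # First time this URL is seen: do the expensive clone/fetch.
            repos_by_url[url] = fetch_repo(url)
        # Later entries with the same URL reuse the handle.
        results.append((repos_by_url[url], source))
    return results
```

With this approach, two entries that share a URL but differ in another setting (a start path, for example) still trigger only one clone/fetch, and the original order of the content sources is preserved.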

Solution (1) would be easier to implement, but it means the repository has to be retrieved multiple times, which takes longer and consumes more disk space. Even if we don't do (1), I still think the cache directory name should be simplified.

Instead of trying to create a unique local path by cleaning the URL, I think we should name the cache folder <basename>-<sha1>.git, where <basename> is the last segment in the URL and <sha1> is the hash of the URL (possibly combined with the start path).
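
The naming scheme could look something like the following sketch. The function name `cache_dir_name`, the stripping of a trailing `.git` from the basename, and the way the start path is joined to the URL before hashing are all assumptions made for illustration; the proposal itself leaves those details open.

```python
import hashlib
import re


def cache_dir_name(url, start_path=None):
    """Build a cache folder name of the form <basename>-<sha1>.git.

    Assumption: <basename> is the last URL segment without a trailing
    ".git". Assumption: when a start path is given, it is appended to
    the URL (separated by "+") before hashing.
    """
    # Take the last segment; splitting on ":" as well covers scp-style
    # URLs such as git@host:repo.git.
    basename = re.split(r"[/:]", url.rstrip("/"))[-1]
    if basename.endswith(".git"):
        basename = basename[: -len(".git")]
    key = url if start_path is None else url + "+" + start_path
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return basename + "-" + digest + ".git"
```

Hashing only the URL gives every entry for the same repository the same folder, which suits option (2). Including the start path in the hash gives each entry its own folder, which corresponds to option (1).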