PyTorch and the PyPI supply chain

Posted Sep 11, 2023 19:49 UTC (Mon) by snnn (guest, #155862)
In reply to: PyTorch and the PyPI supply chain by irvingleonard
Parent article: PyTorch and the PyPI supply chain

I know why PyTorch is in a separate index:

1. Their packages are huge. One file could be 1-2 GB. But PyPI is free, and it cannot be generous enough to provide that much free storage for every PyPI project.
2. You may build PyTorch with different build configs, for example different CUDA versions. The PyTorch community wants to keep all of them under the same name: pytorch. Otherwise it would be harder for other packages to set up a dependency on PyTorch. Therefore, PyTorch chose two different approaches: (a) put all of them in the same index and use the local version to distinguish them; (b) put each of them into a different index. However, neither approach is supported by PyPI.
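
To make the "local version" idea in point 2 concrete, here is a minimal sketch of how a version string with a local label (the part after the "+") can be split and matched. The function names and the example version strings are my own illustration, not anything from PyTorch's tooling, and real installers use a full PEP 440 parser rather than a plain string split.

```python
from typing import List, Optional, Tuple


def split_local_version(version: str) -> Tuple[str, Optional[str]]:
    """Split 'X.Y.Z+label' into (public version, local label or None)."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def pick_build(candidates: List[str], public: str, wanted_local: Optional[str]) -> Optional[str]:
    """Return the first candidate matching both the public version and the local label.

    Hypothetical helper: it only illustrates that several builds can share
    one package name and one public version, differing only in the label.
    """
    for candidate in candidates:
        if split_local_version(candidate) == (public, wanted_local):
            return candidate
    return None


# Illustrative version strings for one release built three ways.
available = ["2.0.1+cpu", "2.0.1+cu117", "2.0.1+cu118"]
chosen = pick_build(available, "2.0.1", "cu118")
print(chosen)
```

The point is that all three builds answer to the same public version, so a dependent package can simply require that version while the user picks the build.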

This problem is a very general one. Almost all machine learning packages with GPU acceleration capabilities need to deal with it. I believe every non-casual user should set up their own private PyPI index. Even if the original problem is fixed, as long as you have multiple indexes, you are still at risk. You may think about the problem in a different way: how much can I trust Facebook's PyPI index servers? What if someone puts a fake "wheel" package in PyTorch's PyPI index? Don't assume that no Facebook employee's account can be hacked, if you can still remember that last year Nvidia lost their GPG key.
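
One way to reduce how much you have to trust any single index is to pin the hash of each file you install (pip has a --require-hashes mode for this). As a rough sketch of the underlying check, assuming you already recorded the expected SHA-256 digest from a source you trust, something like the following would do; the function names are hypothetical and this is not how pip itself is implemented.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks.

    Chunked reading matters here because wheels can be 1-2 GB.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: str, expected_hex: str) -> bool:
    """Return True only if the file's digest matches the pinned digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A file that fails this check should not be installed, whichever index it came from.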