Consider saving raw bytes from Rugged/Git into the database
Particularly the `merge_request_diff_files.diff` column. For now we're trying to encode it in UTF-8, possibly transcoding it from a guessed encoding.
The problem with this is that we ran into a Rugged bug (gitlab-ce#35371), and then we guessed the encoding wrongly, causing gitlab-ce#35098, and the data was corrupted.
Thinking about it, if we're supporting non-UTF-8 content, we should just save the original bytes, regardless of whether they're valid or not, because we might not be able to tell!
We should only encode or transcode when we're rendering it, so that we don't have to go back and fix the database if the data was corrupted there. For example, if we weren't saving the corrupted data, gitlab-ce!12990 could just fix gitlab-ce#35098.
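
To illustrate the idea, here is a minimal plain-Ruby sketch, not the actual GitLab code. The module `DiffBytes`, its two methods, and the `guessed_encoding` parameter are hypothetical names made up for this example. It assumes the column can hold arbitrary binary data. The point is that the stored bytes are never modified, and any guessing happens only on the way out:

```ruby
# Hypothetical helper, for illustration only (not part of GitLab or Rugged).
module DiffBytes
  # Tag the string as binary without touching its bytes, so whatever
  # Rugged/Git handed us is what ends up in the database.
  def self.for_storage(raw)
    raw.dup.force_encoding(Encoding::BINARY)
  end

  # Produce a UTF-8 string for rendering. The stored bytes are left alone,
  # so a wrong guess here can be fixed later without a data migration.
  def self.for_rendering(stored, guessed_encoding = nil)
    utf8 = stored.dup.force_encoding(Encoding::UTF_8)
    return utf8 if utf8.valid_encoding?

    if guessed_encoding
      begin
        # Reinterpret the same bytes in the guessed encoding, then transcode.
        return stored.dup.force_encoding(guessed_encoding).encode(Encoding::UTF_8)
      rescue EncodingError, ArgumentError
        # Unknown encoding name or bytes invalid for it: fall through.
      end
    end

    # Last resort: replace invalid bytes so the view can still render.
    utf8.scrub("\uFFFD")
  end
end

raw      = "caf\xE9".b                                     # Latin-1 bytes from Git
stored   = DiffBytes.for_storage(raw)                      # what we would save
rendered = DiffBytes.for_rendering(stored, "ISO-8859-1")   # only at render time
```

With this split, a bad guess only affects one rendered view. Fixing the guessing logic later fixes the output too, because the original bytes are still in the database.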
/cc @smcgivern