Make a checkpoint before rolling back to the previous version, because something is not working properly.
Also note: everything can be done 2x faster, at the cost of much more RAM, by exploiting the fact that A_in is symmetric (implemented here, but not in parallel).