Memory distribution of qindx_B in BSE is broken

It was implemented for calculations with very large k-meshes and few bands.

In such cases qindx_B is almost as large as the BSE matrix and needs to be distributed in memory. Entries that are not held locally are read from disk on the fly during the calculation. This is not very efficient, but it makes it possible to run in parallel.

Relevant lines of code:

       j_k_bz_mem=PAR_K_scheme%bz_index(j_k_bz)                         
       if (j_p_bz_last/=j_p_bz.or.j_k_bz_last/=j_k_bz) then             
         j_p_bz_last=j_p_bz                                             
         j_k_bz_last=j_k_bz                                             
         if (j_k_bz_mem==0) then                                        
           !DEV_OMP critical                                            
           qindx_tmp=qindx_B_load(j_p_bz,j_k_bz,qindx_ID_frag)          
           j_q_W_bz=qindx_tmp(1)                                        
           j_g_W   =qindx_tmp(2)                                        
           !DEV_OMP end critical                                        
         else                                                           
           j_q_W_bz=qindx_B(j_p_bz,j_k_bz_mem,1)                        
           j_g_W   =qindx_B(j_p_bz,j_k_bz_mem,2)                        
         endif                                                          
         j_q_W_bz_last=j_q_W_bz                                         
         j_g_W_last   =j_g_W                                            
       else                                                             
         j_q_W_bz=j_q_W_bz_last                                         
         j_g_W   =j_g_W_last                                            
       endif
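
The lookup pattern in the lines above can be illustrated in isolation with a minimal, serial sketch. It is not Yambo code: the module name, `mock_entry`, `mock_load`, `setup_mock`, `lookup` and the mock table are all hypothetical stand-ins, `mock_load` only imitates what `qindx_B_load` appears to do, and the OpenMP critical section is left out. The sketch assumes that a zero in the index map means "this k-point is not held in local memory".

```fortran
! Illustrative sketch only: mock data and hypothetical names, not Yambo code.
module qindx_lookup_sketch
  implicit none
  private
  public :: setup_mock, lookup, n_disk_reads

  integer, parameter :: nk = 4                 ! mock number of BZ k-points
  integer :: bz_index(nk)                      ! global k -> local slot, 0 if not held locally
  integer, allocatable :: qindx_local(:,:,:)   ! (p, local k slot, 1:2), the in-memory slice
  integer :: n_disk_reads = 0                  ! counts calls to the mock loader
  integer :: p_last = -1, k_last = -1          ! last (p,k) pair served
  integer :: q_last = 0, g_last = 0            ! indices returned for that pair

contains

  ! Reference values for the mock table: any deterministic function works.
  pure function mock_entry(p, k) result(qg)
    integer, intent(in) :: p, k
    integer :: qg(2)
    qg(1) = 10*p + k
    qg(2) = p + k
  end function mock_entry

  ! Stand-in for reading one (p,k) entry from a database fragment on disk.
  function mock_load(p, k) result(qg)
    integer, intent(in) :: p, k
    integer :: qg(2)
    n_disk_reads = n_disk_reads + 1
    qg = mock_entry(p, k)
  end function mock_load

  ! This mock task keeps only k = 1 and k = 3 in memory.
  subroutine setup_mock()
    integer :: p, k
    bz_index = [1, 0, 2, 0]
    if (allocated(qindx_local)) deallocate(qindx_local)
    allocate(qindx_local(nk, 2, 2))
    do k = 1, nk
      if (bz_index(k) == 0) cycle
      do p = 1, nk
        qindx_local(p, bz_index(k), :) = mock_entry(p, k)
      end do
    end do
    n_disk_reads = 0
    p_last = -1; k_last = -1
  end subroutine setup_mock

  ! Return (q, g) for the pair (p, k): reuse the previous result if the pair
  ! is unchanged, otherwise read from memory or fall back to the loader.
  subroutine lookup(p, k, q, g)
    integer, intent(in)  :: p, k
    integer, intent(out) :: q, g
    integer :: k_mem, tmp(2)
    if (p /= p_last .or. k /= k_last) then
      p_last = p
      k_last = k
      k_mem = bz_index(k)
      if (k_mem == 0) then
        tmp = mock_load(p, k)
        q_last = tmp(1)
        g_last = tmp(2)
      else
        q_last = qindx_local(p, k_mem, 1)
        g_last = qindx_local(p, k_mem, 2)
      end if
    end if
    q = q_last
    g = g_last
  end subroutine lookup

end module qindx_lookup_sketch
```

A driver would call `setup_mock()` once and then `lookup(p, k, q, g)` inside the kernel loop. In this mock setup, a request for k = 2 or k = 4 goes through the loader, and repeating the same (p, k) pair does not trigger a second read.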

This feature is activated in the input file with the variable

PAR_def_mode="KQmemory"

when performing a BSE run with distribution over k-points.

At present, however, it is broken.