[go: up one dir, main page]

Skip to content

Expensive SQL queries when distributed reads are enabled

In gitlab#227215 (comment 375428748), we observed that the Praefect Cloud SQL database (8-core, 30 GB RAM) pegged out at CPU at 100%, possibly leading to lots of stalled RPCs and job failures, particularly in the www-gitlab-com repo:

image

I think the query from https://gitlab.com/gitlab-org/gitaly/blob/26bae69b6bc2ddbcfa94811eb939441acabffe76/internal/praefect/datastore/queue.go#L396-405 may need to be optimized:

SELECT 
  storage 
FROM 
  (
    SELECT 
      DISTINCT ON (job ->> 'target_node_storage') job ->> 'target_node_storage' AS storage, 
      state 
    FROM 
      replication_queue 
    WHERE 
      job ->> 'virtual_storage' = 'praefect-file01' 
      AND job ->> 'relative_path' = '@hashed/fa/53/fa539965395b8382145f8370b34eab249cf610d2d6f2943c95b9b9d08a63d4a3.git' 
    ORDER BY 
      job ->> 'target_node_storage', 
      updated_at DESC NULLS FIRST
  ) t 
WHERE 
  state = 'completed';

EXPLAIN output: https://explain.depesz.com/s/tufT

Edited by Stan Hu
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information