[go: up one dir, main page]

Skip to content

Explore CitusDB as a sharding solution

CitusDB is an extension to scale-out native Postgres.

Reliability-wise, CitusDB tolerates worker failures (transparently) and suggests to use a standard Postgres HA setup for the coordinator cluster.

License for open-source edition: GNU v3

One use-case for CitusDB is a multi-tenant application. This already led the thinking about exploring a "Tenancy Model" for GitLab and the discussion about partitioning (Read: "Migrating an existing app").

This is a proposal to explore CitusDB as a database sharding solution for GitLab.com.

In GitLab's case, most of the data resides in a namespace (projects, their issues, MRs, CI data, other data etc.) and around users. A top-level namespace is considered a candidate to think of as a "tenant" or the distribution key. There are cross-cutting concerns around user-related data that we might need to extract.

Goal

Understand feasibility, limitations, road blocks and design choices when migrating to CitusDB. The goal is to know

  1. what we'll have to do in order to migrate the application to CitusDB and
  2. whether or not CitusDB meets our expectations.

Approach

  • Start with GDK and try to load the initial schema into CitusDB
  • How do we shard
  • How do we distribute the data
  • How do we handle joins that don't fit our sharding strategy
  • Set up a cluster for testing CitusDB
  • Use data from staging for testing

Caveats

Worth to note that there are Postgres features that CitusDB currently does not support to execute across shards and we make use of.

PG Feature Usage in GitLab
Correlated subqueries ?
Recursive CTEs Yes
Table sample Yes
SELECT … FOR UPDATE Yes
Grouping sets ?
Window functions that do not
include the distribution column in PARTITION BY
?

Decision

Our decision to not continue with Citus exploration is outlined here: https://about.gitlab.com/handbook/engineering/development/enablement/database/doc/citus.html

Edited by 🤖 GitLab Bot 🤖