[META] Snowplow Analytics for GitLab.com

Description

We've decided to use the open-source Snowplow Analytics to track events on GitLab.com. These events will be pushed to our data warehouse and visualized in Looker.

This meta issue is the single source of truth (SSOT) for the tasks needed. Our goal is to have a working pipeline by July 7th, tracking a small handful of events on GitLab.com and successfully visualizing them in Looker.

Setup

Snowplow's guide to their pipeline is here, visualized by the following diagram:

[Image: Snowplow pipeline diagram]

Since we have an existing data warehouse (PostgreSQL on Cloud SQL...?), our primary concern is setting up the pipeline through subsystem 4 and getting tracked event data stored somewhere: ideally Cloud SQL, possibly S3.

1: Tracking

2: Collection

3: Enriching

4: Storage

5: Modeling

6: Analytics
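
To make the hand-offs between subsystems 1 through 4 concrete, here is a minimal in-memory sketch in Python. It is not Snowplow's implementation: every function name and field name below is hypothetical, and a plain list stands in for the real storage target (S3 or Cloud SQL).

```python
import json
import uuid
from datetime import datetime, timezone


def track(category, action):
    """1: Tracking. Build a raw event payload (field names are illustrative)."""
    return {"event_id": str(uuid.uuid4()), "category": category, "action": action}


def collect(payload):
    """2: Collection. Stamp the payload with the time the collector received it."""
    return {**payload, "collector_tstamp": datetime.now(timezone.utc).isoformat()}


def enrich(event):
    """3: Enriching. Derive extra fields; here, only a normalized action name."""
    return {**event, "action_normalized": event["action"].strip().lower()}


def store(event, sink):
    """4: Storage. Append the event as one JSON line to the sink (a list here)."""
    sink.append(json.dumps(event, sort_keys=True))


def run_pipeline(category, action, sink):
    """Chain the four stages for a single event and return the sink."""
    store(enrich(collect(track(category, action))), sink)
    return sink
```

Calling `run_pipeline("issues", " Create ", [])` returns a list with one JSON line, which is the shape of data that subsystems 5 and 6 would then model and visualize.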

Related security review: https://gitlab.com/gitlab-com/security/issues/114

Once the cleaned and enriched data is in S3, we can ETL it into our data warehouse on a regular basis. We can then visualize it in Looker. Great success!
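
The ETL step could look roughly like the sketch below, assuming the enriched data arrives as tab-separated lines. The column list is a hypothetical subset (the real enriched-event schema is much wider), the `events` table name is made up, and the `%s` placeholders assume a PostgreSQL driver such as psycopg2. Fetching files from S3 and executing the statement are left out.

```python
import csv
import io

# Hypothetical subset of columns, for illustration only.
COLUMNS = ["app_id", "collector_tstamp", "event", "event_id"]


def parse_enriched(tsv_text, columns=COLUMNS):
    """Turn tab-separated event lines into dicts keyed by column name."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != len(columns):
            continue  # skip blank or malformed lines rather than load partial rows
        rows.append(dict(zip(columns, fields)))
    return rows


def to_insert_params(rows, columns=COLUMNS):
    """Build a parameterized INSERT statement and one parameter tuple per row."""
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO events ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, [tuple(row[c] for c in columns) for row in rows]
```

Keeping the values as parameters, rather than formatting them into the SQL string, lets the database driver handle quoting for event fields that originate from user activity.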

Edited by Jeremy Watson (ex-GitLab)