Set up a Snowplow log parser for stored event data

Description

We'd like to use Snowplow for tracking pageviews and events on GitLab.com. In Snowplow, trackers fire events, which are received and logged by collectors. Trackers send data to collectors by making a GET request for a tracking pixel.
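To make the pixel request concrete, here is a minimal sketch of the URL a tracker might request for one pageview. The collector host and the `build_pixel_url` helper are hypothetical. The `/i` path and the `e`, `url`, `page`, `aid` and `p` parameters follow the Snowplow tracker protocol as we understand it, and should be checked against the protocol documentation before anything relies on them.

```python
from urllib.parse import urlencode


def build_pixel_url(collector_host, page_url, page_title, app_id):
    """Return the GET URL a tracker would request to log one pageview."""
    params = {
        "e": "pv",           # event type: pageview
        "url": page_url,     # page being viewed
        "page": page_title,  # page title
        "aid": app_id,       # application ID (value here is an assumption)
        "p": "web",          # platform
    }
    # urlencode percent-escapes the values so they are safe in a query string.
    return f"https://{collector_host}/i?{urlencode(params)}"


# Hypothetical collector host, for illustration only.
print(build_pixel_url("collector.example.com",
                      "https://gitlab.com/explore", "Explore", "gitlab"))
```

In practice the Snowplow JavaScript tracker builds and sends this request itself; the sketch only shows what ends up in the collector's logs.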

Once log data reaches a collector, we need a runner to parse and clean it and write the result to S3. From S3, we can ETL it into our data warehouse, where it can be visualized in Looker.
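As a rough illustration of what "parse/clean" means here, the sketch below turns the request URI of one logged pixel hit into a flat dict of event fields. It assumes the URI has already been extracted from the log line; the real log layout depends on which collector we pick, and the actual parsing and enrichment would be done by Snowplow's own Enrich step rather than by hand-written code. `parse_pixel_request` is a hypothetical helper.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_pixel_request(request_uri):
    """Return the event fields from a logged pixel request, or None if unusable."""
    query = urlsplit(request_uri).query
    # parse_qsl percent-decodes values and drops parameters with blank values.
    fields = dict(parse_qsl(query))
    # Treat a hit without an event type as noise (bots, health checks, etc.).
    if "e" not in fields:
        return None
    return fields


event = parse_pixel_request("/i?e=pv&url=https%3A%2F%2Fgitlab.com%2Fexplore&aid=gitlab")
print(event)
```

Rows that survive this kind of filtering are what would be written to S3 for the warehouse load.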

Proposal

This step is referred to as "Enrich" in Snowplow's pipeline, which is detailed in their documentation and this meta issue. It seems likely that we'll use Snowplow's EmrEtlRunner for this.

This step depends on having a collector set up and on having logfiles to enrich, which means events must already be tracked and sent to that collector.

As noted in the setup documentation for EmrEtlRunner, we'll need to:

  • Install and host EmrEtlRunner somewhere.
  • Configure it to process and enrich data from the collector, and schedule it to run periodically.
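For the "run periodically" part, one thing to watch is that a scheduled run should not start while the previous one is still going. Below is a minimal sketch of a lock-file wrapper that a cron job could call. The `run_once` helper and the lock path are hypothetical, and the command is passed in as-is: the exact EmrEtlRunner command line should come from its setup guide and is deliberately not spelled out here.

```python
import os
import subprocess


def run_once(command, lock_path):
    """Run `command` unless a previous run still holds the lock.

    Returns the command's exit code, or None when the run was skipped.
    """
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))  # record the owner for debugging
        return subprocess.run(command).returncode
    finally:
        # Always release the lock, even if the command could not be started.
        os.remove(lock_path)
```

A cron entry would then invoke a small script that calls `run_once` with the runner command and a lock path of our choosing. A stale lock left behind by a crashed host would need manual cleanup; this sketch does not handle that case.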

Specific configuration/enrichments are TBD.

Links / references