
Update the timestamp used for Snowflake extraction and increase the load frequency

  1. usage_billing_enriched is currently extracted based on the value in the Timestamp column (i.e., we export a record if its Timestamp falls on the day before the day of extraction). I think we should switch the incremental extraction of usage_billing_enriched to use enriched_at instead. My understanding is that events are created first and enriched shortly afterward, though I'm not sure how long that delay is. If the extraction job filters on the event's creation Timestamp, we can miss events that were created before the extraction ran but only became enriched after it ran. By the next run, the extractor has already moved past that Timestamp window, so those rows are never exported. Filtering on EnrichedAt should avoid this issue.

  2. We currently export data from ClickHouse to Snowflake once a day. Given the high visibility of this data, I'd like to explore exporting it every 6 hours instead. This would make the data in Snowflake more timely and keep individual export files from growing too large as data volumes increase.
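The two points above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the actual extraction job: the `Event` model, the helper functions, and the field names `timestamp` / `enriched_at` are hypothetical stand-ins for the Timestamp and enriched_at columns. It assumes a row only becomes visible in usage_billing_enriched once it is enriched, that each run exports the most recent complete period (aligned to midnight), and that the period divides 24 hours evenly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    # Hypothetical model of a usage_billing_enriched row.
    event_id: str
    timestamp: datetime    # when the event was created (Timestamp column)
    enriched_at: datetime  # when the enriched row became visible (assumption)


def extraction_window(run_time: datetime, period: timedelta) -> tuple[datetime, datetime]:
    """Return the [start, end) window of the last complete period before run_time.

    Periods are aligned to midnight, so `period` must divide 24 hours evenly.
    """
    midnight = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    periods_elapsed = (run_time - midnight) // period  # timedelta // timedelta -> int
    end = midnight + periods_elapsed * period
    return end - period, end


def extract(events: list[Event], run_time: datetime, period: timedelta, column: str) -> set[str]:
    """Export rows that are visible at run_time and whose `column` falls in the window."""
    start, end = extraction_window(run_time, period)
    return {
        e.event_id
        for e in events
        if e.enriched_at <= run_time            # row does not exist until it is enriched
        and start <= getattr(e, column) < end   # incremental filter column
    }


def exported_ids(events: list[Event], first_run: datetime, runs: int,
                 period: timedelta, column: str) -> set[str]:
    """Union of everything exported across `runs` consecutive scheduled runs."""
    ids: set[str] = set()
    for i in range(runs):
        ids |= extract(events, first_run + i * period, period, column)
    return ids


if __name__ == "__main__":
    events = [
        # Created just before midnight, enriched just after the 00:05 daily run.
        Event("late", datetime(2024, 1, 1, 23, 50), datetime(2024, 1, 2, 0, 10)),
        # Enriched almost immediately.
        Event("ontime", datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 1)),
    ]
    day = timedelta(days=1)
    first_run = datetime(2024, 1, 2, 0, 5)
    print(sorted(exported_ids(events, first_run, 3, day, "timestamp")))    # ['ontime']
    print(sorted(exported_ids(events, first_run, 3, day, "enriched_at")))  # ['late', 'ontime']
```

With the Timestamp filter, the `late` event is skipped permanently: it is not yet visible at the 00:05 run that covers its creation day, and later runs only look at later Timestamp windows. Filtering on enriched_at picks it up on the following run. Passing `timedelta(hours=6)` as the period models the proposed 6-hourly cadence with the same logic; each run then covers a quarter of the data, which is what keeps individual export files smaller. Note that the enriched_at approach relies on enriched_at reflecting when a row actually becomes visible; if rows can be written with an enriched_at earlier than their commit time, some overlap or lag in the window would still be needed.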