How to Analyze 3M+ GitHub events with DBT and Snowflake

Alexander Bolaño Cervantes
5 min read · Jan 18, 2024

Encouraging data people to extract valuable insights from complex GitHub files effortlessly using SQL commands

created with DALL-E

Introduction

In the ever-evolving landscape of data analytics, the sheer volume and complexity of unstructured data can be intimidating. For analytics engineers and non-technical business professionals with SQL knowledge alike, navigating vast datasets, especially those from platforms like GitHub, might seem like a daunting task. The key to unlocking the potential within this sea of information lies in efficient tools and methodologies. By leveraging DBT (Data Build Tool) and Snowflake, I aim to showcase a streamlined and agile approach that empowers both analytics engineers and non-technical business people to extract meaningful insights from raw, unstructured data.

Target audience

Analytics Engineers, Data Engineers, Data Analysts, and any other business people with SQL knowledge

Assumptions

  1. Basic knowledge of SQL
  2. Familiarity with Snowflake
  3. A free Snowflake account
  4. GitHub’s events data has already been loaded into a Snowflake table (GH_RAW_FILE)

Hey 👋, my name is Alex. I’m a Senior Data Engineer passionate about automating tasks, Big Data, and cutting-edge technologies.