
Data lineage

Overview

One of the key features of Skypoint AI-configured dbt models is the ability to track data lineage. Data lineage refers to the origin, history, and transformations of data as it moves from its source to its destination. It gives a comprehensive view of how data travels through an organization, including how it is transformed and used.

View the data lineage

Data lineage is represented as a Directed Acyclic Graph (DAG) of relationships between models, which are the building blocks of a dbt project. Each model represents a set of transformations that are applied to one or more source tables, and the lineage is captured by recording the dependencies between models.
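As a rough illustration of how recorded dependencies form a DAG, the following Python sketch stores each model's direct dependencies in a dictionary and uses the standard library's graphlib module to derive a valid build order and to reject cycles. The model names (raw_orders, stg_orders, fct_orders) and the build_order helper are hypothetical and are not taken from Skypoint AI or dbt; this is not how either product stores lineage internally.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical models: each key depends on the nodes listed in its set.
dependencies = {
    "stg_orders": {"raw_orders"},
    "fct_orders": {"stg_orders"},
}


def build_order(deps):
    """Return the nodes so that each one appears after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)

# "Acyclic" means no model may depend on itself, directly or indirectly.
# graphlib raises CycleError when that rule is broken.
cyclic = {"model_a": {"model_b"}, "model_b": {"model_a"}}
try:
    build_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

Because every node is ordered after the nodes it depends on, the same structure that drives the lineage view can also determine the order in which models are run.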

Follow these steps to view the data lineage:

  1. Go to Dataflow > Transformations.
  2. Select the specific Output model name you want to view.

The Transformation details page appears.


The DAG view provides an overview of the relationships between models, making it easier to understand the structure of the data pipeline. For example, consider a dbt pipeline that processes data from an e-commerce website. Its data lineage can include a source table, ecommercecontact, and two models, stg_customers and dim_customers:

  • ecommercecontact: This is the source table that contains raw customer data from the e-commerce database.
  • stg_customers: This model depends on the ecommercecontact table and performs initial data cleaning and preparation operations, such as removing duplicates, fixing data types, and standardizing values.
  • dim_customers: This model depends on the stg_customers model and performs additional data cleaning and preparation operations, such as splitting names into separate first and last name fields and calculating the age of each customer. The dim_customers model also integrates data from other source tables, such as ecommerceloyaltypoints and ecommercepurchases, to create a comprehensive customer profile.

This DAG represents the relationships between the source table and the two models and shows how the dim_customers model is derived from the ecommercecontact table through the intermediate stg_customers model. Visualizing these relationships makes it easier to understand the flow of data and the transformations applied, and to troubleshoot issues in the data pipeline.
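The e-commerce lineage described above can be sketched in Python to show how a lineage graph answers two common troubleshooting questions: which nodes feed a model, and which models are affected by a change to a source. This is a minimal sketch; the upstream and downstream helpers are hypothetical, and connecting ecommerceloyaltypoints and ecommercepurchases directly to dim_customers is an assumption based on the description of that model.

```python
# Direct dependencies from the example: model -> nodes it reads from.
# Assumption: the two extra source tables feed dim_customers directly.
lineage = {
    "stg_customers": ["ecommercecontact"],
    "dim_customers": [
        "stg_customers",
        "ecommerceloyaltypoints",
        "ecommercepurchases",
    ],
}


def upstream(node, graph):
    """Collect every node that `node` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Source tables have no entry in the graph, so they add nothing.
            stack.extend(graph.get(current, []))
    return seen


def downstream(node, graph):
    """Collect every model that would be affected by a change in `node`."""
    return {model for model in graph if node in upstream(model, graph)}


ancestors = upstream("dim_customers", lineage)
impacted = downstream("ecommercecontact", lineage)
print(sorted(ancestors))
print(sorted(impacted))
```

Here ancestors holds everything dim_customers is derived from, including the ecommercecontact table reached through stg_customers, and impacted holds the models to recheck if ecommercecontact changes.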