That Blue Cloud

Building A Lakehouse: Implementing Medallion Architecture In Fabric

Let's use Medallion Architecture in Microsoft Fabric and build a Lakehouse using Pipelines and Dataflows. We'll also discuss the responsibilities and structure of the Bronze, Silver and Gold layers of OneLake.

Fabric brings you many different technologies and concepts to mix and match so that you can find the most optimised solution for your data requirements. One of those concepts is the Lakehouse, and we'll be looking into how we can build a proper one in Fabric using the Medallion Architecture.

As we've covered "What is a Lakehouse?" in a previous article and the Medallion Architecture in the "Designing Fabric Workspaces" post, I won't go into too much detail here. But to recap, Medallion Architecture is built on splitting your data into multiple layers with distinct responsibilities. If you're interested in reading further, see Databricks' Medallion Architecture article.

This article will demonstrate a best-practice implementation of Medallion Architecture in Fabric. Although we'll walk through the steps, we won't build the actual pipelines here; instead, I'll show you how everything connects to the Lakehouse in four steps:

  • Step 1: Designing the Lake
  • Step 2: Establishing the Tables
  • Step 3: Building the Pipelines
  • Step 4: Putting it all together

This article covers the overnight data-pulling scenario with Pipelines and Dataflows; future articles will cover streaming datasets and connecting to other Azure data resources.

Step 1: Designing The Lake

Before going into what we're going to use to process the data, let's define our zones/layers in our OneLake:

  • Landing: A layer for incoming data to arrive, ready to be picked up by our Lakehouse ingestion process. Data is kept in the original file format, with a folder structure reflecting arrival metadata. The data is kept here temporarily and deleted after the Lakehouse ingests it.
  • Bronze/Raw: A layer where incoming data is kept and archived for access. Data is stored as it arrives, converted only to Delta format, and organised in a hierarchy that makes it easy to find (most commonly by date of arrival and data type).
  • Silver/Trusted: A layer where raw data is translated into a more standardised format. You can split a single raw file into multiple files/tables to create a normalised relationship, or combine numerous raw files into a single table.
  • Gold/Curated: For business-level aggregations and analytics.
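To make the Bronze hierarchy above concrete, here's a minimal Python sketch of one possible folder convention organised by data type and arrival date. The `bronze_path` helper and the folder names are illustrative assumptions, not a Fabric API; adapt them to your own Lakehouse naming standards.

```python
from datetime import date
from pathlib import PurePosixPath

def bronze_path(data_type: str, arrival: date) -> str:
    """Build a Bronze-layer folder path for a given data type and arrival date.

    Hypothetical convention: Files/bronze/<data_type>/<YYYY>/<MM>/<DD>
    """
    return str(PurePosixPath(
        "Files", "bronze", data_type,
        f"{arrival:%Y}", f"{arrival:%m}", f"{arrival:%d}",
    ))

# Example: where a sales-orders file landing on 15 March 2024 would be archived
print(bronze_path("sales_orders", date(2024, 3, 15)))
# Files/bronze/sales_orders/2024/03/15
```

A convention like this keeps the ingestion process deterministic: given only the data type and the arrival date, any pipeline or notebook can reconstruct exactly where the raw copy lives.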


Harun Legoz

I'm a cloud solutions architect with a coffee obsession. I've been building apps and data platforms for over 18 years, and I blog about Azure & Microsoft Fabric. Feel free to say hi on Twitter/X!
