
Loading Data Into OneLake via ADLS Gen2 SFTP

ADLS Gen2 has a neat feature allowing you to expose an SFTP endpoint from your storage. Any files uploaded land on the storage account directly. This feature isn’t available in OneLake, but that doesn’t mean you can’t combine ADLS Gen2 and OneLake to achieve SFTP ingestion.
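Enabling the feature is a single property flip on the account, as long as the hierarchical namespace is turned on. Here's a minimal sketch using the azure-mgmt-storage Python SDK; the subscription, resource group and account names are placeholders, and the is_sftp_enabled property assumes a recent SDK version:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# SFTP support requires an ADLS Gen2 account (hierarchical namespace on).
client.storage_accounts.update(
    "rg-landing",         # placeholder resource group
    "stlandingaccount",   # placeholder storage account
    {"is_sftp_enabled": True},
)
```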

Microsoft recommends splitting the Landing layer of your Data Lake into a separate storage account, and for good reason: you don't need to expose your entire Data Lake to third parties just because you want them to send you their data. You can configure a Landing storage account with a different folder & permission structure, then move the landed datasets into your Data Lake when ready. In Fabric's case, you can use an ADLS Gen2 account as your Landing layer and then move the data onto OneLake.
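The per-third-party isolation comes from SFTP local users, each scoped to a home directory and a set of folder permissions. A hedged sketch of provisioning one, again via azure-mgmt-storage with placeholder names:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import LocalUser, PermissionScope, SshPublicKey

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# One local user per third party, confined to its own folder in the
# landing container and limited to read/create/write (no delete or list).
partner = LocalUser(
    home_directory="landing/partner-a",
    ssh_authorized_keys=[SshPublicKey(key="ssh-rsa AAAA...", description="partner-a")],
    permission_scopes=[
        PermissionScope(service="blob", resource_name="landing", permissions="rcw")
    ],
)

client.local_users.create_or_update(
    "rg-landing", "stlandingaccount", "partner-a", partner
)
```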

Depending on how frequently you would like to move the data, you have two options:

1. Immediate Transfer Using Blob Events and ADF

If you need the data transferred to OneLake immediately, you can use the blob events raised by the storage account to trigger ADF pipelines that copy your files to the OneLake target. The transfer is almost instantaneous.

[Diagram: Using Blob Events and Data Factory]

SFTP events are a tad different from regular blob events. You need to filter the blob events raised by your storage account on the data.api property and include only SftpCommit events; otherwise you'd also receive an event for the empty file SFTP creates when an upload starts. This way, you get an event only when SFTP finishes the upload and commits the file. Details are available in the articles here and here.
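If you wire the events up yourself, e.g. an Event Grid subscription feeding an Azure Function that starts the pipeline run, the filter boils down to one check on data.api. A rough Python sketch with placeholder resource names and a hypothetical pipeline name (pipelines.create_run is the ADF management SDK call that starts a run):

```python
import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

def main(event: func.EventGridEvent):
    data = event.get_json()

    # An SFTP upload raises two BlobCreated events: SftpCreate (the empty
    # file) and SftpCommit (upload finished). Only react to the commit.
    if data.get("api") != "SftpCommit":
        return

    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    adf.pipelines.create_run(
        "rg-landing",               # placeholder resource group
        "adf-landing",              # placeholder data factory
        "CopyLandingToOneLake",     # hypothetical pipeline name
        parameters={"sourceUrl": data.get("url", "")},
    )
```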

2. Scheduled Transfer Using Fabric Pipelines

Alternatively, you can transfer the data from your Landing account to OneLake on a schedule, such as daily jobs. This is useful if you don't need to process the data as it arrives and would rather process it in batches; it costs less and requires less monitoring effort. Fabric Pipelines and Dataflows can achieve this very easily.

[Diagram: Using Fabric Pipelines]

Sadly, Fabric Pipelines don’t support event triggers yet, so you can’t trigger your pipelines using blob events.
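Under the hood, a scheduled copy is just moving files between two ADLS-compatible endpoints, because OneLake speaks the same DFS API as ADLS Gen2. For illustration, here's roughly what the Copy activity does, sketched with the azure-storage-file-datalake SDK; the account, workspace and lakehouse names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()

landing = DataLakeServiceClient(
    "https://stlandingaccount.dfs.core.windows.net", credential=credential
)
onelake = DataLakeServiceClient(
    "https://onelake.dfs.fabric.microsoft.com", credential=credential
)

src = landing.get_file_system_client("landing")
# In OneLake, the "file system" is the workspace; items sit beneath it.
dst = onelake.get_file_system_client("MyWorkspace")

for item in src.get_paths(path="partner-a"):
    if item.is_directory:
        continue
    content = src.get_file_client(item.name).download_file().readall()
    target = dst.get_file_client(f"MyLakehouse.Lakehouse/Files/landing/{item.name}")
    target.upload_data(content, overwrite=True)
```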

Bonus: Use Dataflows and transform to Bronze directly

If you don’t want to keep the files in their original format by binary-copying them into OneLake, you can use Shortcuts to link the Landing account to your OneLake. That way, you can use Dataflows directly and transform the data in flight, writing straight into your Bronze layer in Delta format.

Or, if you don’t fancy Shortcuts, you can connect to the Landing account directly using the ADLS Gen2 connector within Dataflows. That achieves the same goal.

Keep in mind that this approach only works for scheduled jobs: you can’t trigger Fabric Pipelines or Dataflows using events. A sketch of the transformation itself follows below.
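Dataflows keep this low-code, but to make the transformation concrete, here's roughly the same Landing-to-Bronze step expressed as PySpark in a Fabric notebook; the shortcut path and table name are hypothetical:

```python
# Runs in a Fabric notebook, where `spark` is preconfigured for the lakehouse.
# Read the raw CSVs landed via SFTP (surfaced here through a Shortcut under
# Files/landing), then append them to a Delta table in the Bronze layer.
df = spark.read.option("header", True).csv("Files/landing/partner-a/*.csv")

df.write.format("delta").mode("append").saveAsTable("bronze_partner_a")
```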

[Diagram: ADLS Gen2, Shortcuts & Dataflows into OneLake]

Conclusion

There are many flavours of ingesting data into Fabric, but SFTP is a frequent use case if you work with third-party integrations. It’s secure and versatile, and with PGP encryption added on top, it’s hard to beat.

How do you plan to handle SFTP requirements in your Fabric tenant? Talk to me in the comments!

Harun Legoz

I’m a cloud solutions architect with a coffee obsession. I’ve been building apps and data platforms for over 18 years, and I also blog about Azure & Microsoft Fabric. Feel free to say hi on Twitter/X!
