Landing data with Dataflows Gen2 in Microsoft Fabric pipelines can transform the way you tackle data needs. Dataflows in Microsoft Fabric offer an easy, efficient way to move your data into your Data Warehouse: simply connect to the relevant data sources, prepare and transform the data, and land it directly in your Lakehouse, or use a data pipeline for other destinations.
But what is a dataflow? It's essentially a cloud-based ETL (Extract, Transform, Load) tool for building and executing scalable data transformation processes. Using Power Query Online, you can extract data from a wide range of sources, transform it with a rich set of operations, and load it into a destination. A dataflow bundles all of those transformations to reduce data prep time; its output can then be loaded into a new table, included in a data pipeline, or used as a data source by data analysts.
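To make that concrete, here is a minimal sketch of the kind of Power Query M a Dataflow Gen2 records behind the visual editor. It extracts customers from the public Northwind OData sample service and applies two simple transformations; the feed URL is a public demo service and the column names come from that sample, so treat the specifics as illustrative rather than prescriptive.

```powerquery-m
let
    // Extract: connect to the public Northwind OData sample feed
    Source = OData.Feed("https://services.odata.org/V4/Northwind/Northwind.svc/"),
    Customers = Source{[Name = "Customers"]}[Data],
    // Transform: keep only the columns the destination needs
    SelectedColumns = Table.SelectColumns(Customers, {"CustomerID", "CompanyName", "Country"}),
    // Transform: filter to a single market before loading
    FilteredRows = Table.SelectRows(SelectedColumns, each [Country] = "Germany")
in
    FilteredRows
```

In Power Query Online you would build these same steps through the visual editor, and the M is generated and stored with the dataflow for you.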
Dataflows Gen2 can revolutionize the way you handle data. As an efficient cloud-based ETL tool, it standardizes data, reduces data prep time, and promotes reusable ETL logic, all while offering a wide variety of transformations. Because dataflows can be horizontally partitioned, a data engineer can build one global dataflow and let analysts derive specialized datasets from it, keeping data conversions efficient. Although not a replacement for a data warehouse, Dataflows Gen2 has a lot of potential for handling complex data tasks.
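As one example of the standardizing, reusable logic described above, a dataflow can publish a shared date dimension that every downstream model consumes. The sketch below is a common Power Query M pattern rather than Fabric-specific output; the date range is an assumed parameter you would adjust.

```powerquery-m
let
    // Assumed range for the dimension; adjust to your reporting horizon
    StartDate = #date(2020, 1, 1),
    EndDate = #date(2030, 12, 31),
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    // Generate one row per calendar day
    Dates = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    AsTable = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
    // Derive the standard attributes every model can share
    AddYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth = Table.AddColumn(AddYear, "Month", each Date.Month([Date]), Int64.Type),
    AddMonthName = Table.AddColumn(AddMonth, "MonthName", each Date.MonthName([Date]), type text)
in
    AddMonthName
```

Building this once in a dataflow, rather than in every report, is exactly the kind of reusable ETL logic Dataflows Gen2 promotes.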
To summarize the workflow: with Dataflows Gen2 you connect to various data sources, prepare and transform the data, and either land it directly in your Lakehouse or hand it to a data pipeline bound for another destination. Power Query Online provides the visual interface for these tasks, recording every extraction, transformation, and load operation as a step in the dataflow. The objective of Dataflows Gen2 is to offer a simple, reusable way to perform ETL tasks using Power Query Online.
Operationally, instead of writing code in your preferred language, you can create a Dataflow Gen2 to extract and transform the data first and then load it into a Lakehouse or another destination. The dataflow preserves every transformation step, and adding a data destination to your dataflow is optional.
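Those preserved steps are visible in the underlying M query: each applied step becomes a named binding you can inspect, reorder, or edit later. Here is a sketch with editor-style step names; the CSV URL and column names are hypothetical placeholders.

```powerquery-m
let
    // Each applied step is preserved as a named binding;
    // the names below mirror the editor's defaults.
    Source = Csv.Document(Web.Contents("https://example.com/sales.csv"), [Delimiter = ",", Encoding = 65001]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"OrderDate", type date}, {"Amount", type number}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each [Amount] > 0)
in
    #"Filtered Rows"
```

Because the destination is a separate, optional setting, this query can stand alone as a staging step until you decide where its output should land.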
To perform other tasks or load the data to a different location after a transformation, create a data pipeline and add the Dataflow Gen2 activity to your orchestration. Alternatively, you can combine a data pipeline and a Dataflow Gen2 for an ELT (Extract, Load, Transform) process: the pipeline extracts the data and loads it into your preferred destination (such as the Lakehouse), and a Dataflow Gen2 then connects to the Lakehouse data to cleanse and transform it.
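In that ELT arrangement the pipeline has already landed the raw table, so the Dataflow Gen2 only needs to connect to the Lakehouse and transform. The sketch below follows the navigation pattern the Lakehouse connector typically generates; the workspace and lakehouse IDs and the raw_orders table are placeholders, and the exact navigation record fields should be treated as an assumption.

```powerquery-m
let
    // Connect to the Fabric lakehouses available to the signed-in user
    Source = Lakehouse.Contents(null),
    // Navigate to the workspace and lakehouse the pipeline loaded
    // (placeholder IDs; the editor fills in real GUIDs as you click through)
    Workspace = Source{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
    RawLakehouse = Workspace{[lakehouseId = "00000000-0000-0000-0000-000000000001"]}[Data],
    RawOrders = RawLakehouse{[Id = "raw_orders", ItemKind = "Table"]}[Data],
    // Transform: cleanse the raw rows before loading to the final destination
    RemovedNulls = Table.SelectRows(RawOrders, each [order_id] <> null),
    Deduplicated = Table.Distinct(RemovedNulls, {"order_id"})
in
    Deduplicated
```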