Disclaimer
This blog post contains information about CU usage in Microsoft Fabric. Since I am not a specialist in capacity usage and the Usage Metrics App is still evolving, please consider this information as indicative rather than definitive. Additionally, the choice of technology always depends on best practices, specific frameworks, and the customer’s unique situation.
Please also consider the limited amount of data: these extended tests are based on the WWI-Sample database and cover only a few tables from the Sales area.

Idea
Following my previous blog post, I’d like to explore alternative methods for data ingestion using Microsoft’s World Wide Importers sample database. My goal is to compare the CU usage across different ingestion technologies in Microsoft Fabric. In addition, I aim to offer recommendations and ratings for each technology. This post will provide a quick overview of various ingestion approaches based on an Azure SQL Database, with the hope that it will help you with your own projects.
Results
I aim to not only assess the CU usage of the different technologies but also evaluate the overall duration of the ingestion process, the flexibility, and the intuitiveness of each respective technology.
CU usage
Let’s address the elephant in the room first. Phew, I must admit that I found it quite challenging to get accurate numbers for the CU usage. For example, with ingestion via the notebook, the service took some time just to start the session. I couldn’t find an easy way to separate the CU usage for session startup from the actual execution. Despite this, I did some retesting and believe I got at least close to the correct figures. Of course, CU usage can vary depending on how it’s implemented in your framework and your specific projects. For instance, if you start a new session for each notebook to load data, the CU usage overhead might be higher.
Looking at the metrics app, I found that Dataflow Gen2 seems to consume the most CUs, while notebooks perform significantly better than Data Factory Pipelines with copy activities—this disparity is what motivated me to write this blog post. This fact brings us to an important takeaway: If you’re dealing with many small tables, consider switching from Pipelines to Notebooks. Pipeline Copy Activities always round up to a full minute (as I explained in my previous blog post), which can lead to unnecessary CU consumption. Moving to Notebooks can help you optimize both performance and costs.
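To make the effect of that rounding tangible, here is a purely illustrative back-of-the-envelope calculation. The table count and per-table durations are made up, and it ignores session start-up and node sizing; only the round-up-to-a-full-minute behaviour of Copy Activities comes from the observation above.

import math

# Hypothetical scenario: 20 small tables, each ingested in about 10 seconds.
tables = 20
seconds_per_table = 10

# A notebook is (roughly) billed for the actual runtime of the whole loop.
notebook_seconds = tables * seconds_per_table                        # 200 s

# Each Copy Activity is billed per table, rounded up to a full minute.
pipeline_seconds = tables * math.ceil(seconds_per_table / 60) * 60   # 1,200 s

print(notebook_seconds, pipeline_seconds)

With many small tables, the per-activity rounding quickly dominates the bill, which is exactly why the switch to notebooks pays off in this scenario.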
Database mirroring is a bit tricky to evaluate, since the measured CU usage includes the initial setup of the mirrored database. It is also difficult to determine which activities are strictly related to the mirroring, as numerous other operations were happening in the target lakehouse (where I loaded the data). I therefore focused on the relevant timestamps and took all lakehouse operations into account (WWI-Sample represents the mirrored database).

Nevertheless, here are the results for the CU usage:

Microsoft describes database mirroring as a low-cost technology, which is absolutely true—the initial mirroring process is very resource-efficient. However, using a mirrored database isn’t always an option in every environment. In my real-world scenario, which led me to this topic, mirroring wasn’t viable because I needed a complete capture of the entire day as Parquet files in my destination lakehouse for archival purposes. Achieving this efficiently requires full control over the ingestion process, which isn’t easily possible with mirroring.
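To illustrate what that archival requirement looks like in practice, here is a minimal sketch of a daily full snapshot written as Parquet. It assumes jdbc_url and connection_properties are already configured (as in the notebook snippet further below), and the Files/archive/<date>/ folder layout is purely my own example, not part of the original setup.

from datetime import date

# Minimal sketch: persist a full daily snapshot of one table as Parquet.
# jdbc_url and connection_properties are assumed to be set up beforehand.
snapshot_day = date.today().isoformat()

df_orders = spark.read.jdbc(
    url=jdbc_url, table="Sales.Orders", properties=connection_properties
)

df_orders.write.mode("overwrite").parquet(f"Files/archive/{snapshot_day}/Orders")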
Let’s have a look at the other evaluation factors.
Overall duration
In terms of overall duration, notebook execution was the fastest, completing the ingestion of all Sales tables from the WWI-Sample database in just 44 seconds. This was followed by Dataflows Gen2 at 49 seconds and Data Factory Pipelines at 88 seconds. The entire database mirroring process took about 5 minutes from setup to the first full synchronization, but once configured, it could be extremely fast for handling new data in near real-time.
Type | Duration [s] |
---|---|
Notebook | 44 |
Dataflow Gen 2 | 49 |
Data Factory Pipeline | 88 |
Mirrored DB | 300* |

*Measured from initial setup to the first full synchronization.
Flexibility
If we focus more on real-time projects—like mine originally was—both Dataflows and the Mirrored DB are immediately out of consideration. I’ve already mentioned the need for an “archive” function, but when it comes to Dataflows, the main drawback for me is the lack of programmability. I wasn’t able to easily ingest all tables within the Sales schema of the database. Yes, Power Query offers extensive transformation capabilities, but when it comes to pure data ingestion, the process feels too rigid for my needs.
That said, I still want to give some credit for how configurable each ingestion method is in adapting to specific requirements. This brings me to my (very personal) evaluation:
Type | Flexibility |
---|---|
Notebook | ★★★★★ |
Dataflow Gen 2 | ★★ |
Data Factory Pipeline | ★★★★ |
Mirrored DB | ★★ |
You can probably tell that I really like working with notebooks. I was amazed that I could ingest the data with just a few lines of code. The following snippet handles the actual ingestion of the tables using the SQL Server data dictionary; everything before it merely sets up the database connection and imports the required libraries.
Imagine what you can do with all the custom libraries in Python!
# Query the SQL Server data dictionary for all tables in the Sales schema
query = "(SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'Sales') AS qry"  # Replace with your schema

# Load the table list into a Spark DataFrame via JDBC
# (jdbc_url and connection_properties are defined earlier in the notebook)
df = spark.read.jdbc(url=jdbc_url, table=query, properties=connection_properties)

# Loop over the table list, read each table and write it as Parquet to the lakehouse
for tbl in df.collect():
    if not tbl[1].endswith("_Archive"):
        df_tbl = spark.read.jdbc(
            url=jdbc_url, table=f"{tbl[0]}.{tbl[1]}", properties=connection_properties
        )
        df_tbl.write.mode("overwrite").parquet(f"Files/raw_ntb/{tbl[1]}")
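As a small illustration of what a few extra lines of Python can do, the loop body could be extended, for example, to stamp every table with a load timestamp and save it as a Delta table instead of plain Parquet. This is only a sketch; the load_ts column name and the switch to Delta are my own choices, not part of the original setup.

from pyspark.sql import functions as F

for tbl in df.collect():
    if not tbl[1].endswith("_Archive"):
        df_tbl = spark.read.jdbc(
            url=jdbc_url, table=f"{tbl[0]}.{tbl[1]}", properties=connection_properties
        )
        # Add a technical load timestamp and save as a managed Delta table
        (
            df_tbl.withColumn("load_ts", F.current_timestamp())
            .write.mode("overwrite")
            .saveAsTable(f"raw_{tbl[1].lower()}")
        )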
Data Factory Pipelines offer a high degree of customization through loops, parameters, variables, and more. However, I see notebooks as the clear winner in this regard. As mentioned earlier, the mirroring process has very limited flexibility, and Dataflows remain quite static.
Intuitiveness
Let’s look at this from another perspective. What if your company doesn’t have any Python expertise? What if you want to enable employees to set up data ingestion processes on their own, without needing deep technical knowledge?
In this scenario, I see Dataflows as the clear winner. With its visual interface for transforming data, it provides the most user-friendly experience, making it the easiest ingestion method for non-technical users. Mirroring is also quite intuitive, but it’s more of a straightforward selection process—choosing which tables to ingest rather than configuring an entire workflow.
Data Factory Pipelines, on the other hand, require a deeper understanding of the technology, including how expressions and functions work within Azure Data Factory. Notebooks demand even more technical expertise, as a basic understanding of Python is essential. While they offer the most flexibility, they are not the ideal choice for teams without programming knowledge.
So, when ease of use is the priority, Dataflows stand out as the most accessible option.
Type | Intuitiveness |
---|---|
Notebook | ★ |
Dataflow Gen 2 | ★★★★★ |
Data Factory Pipeline | ★★★ |
Mirrored DB | ★★★★ |
Conclusion
As always, choosing the right technology depends on the specific needs of your project and customer scenario. In my case, I’m leaning towards using notebooks within our framework for data ingestion. But before you jump in, ask yourself: Are your developers ready for this technology?
Finding the sweet spot between capacity usage, ease of use, and flexibility is like trying to balance a three-legged stool—lean too far in one direction, and you might end up on the floor. My advice? Start with some Proof-of-Concept cases to test the waters. That way, you’ll have a solid foundation to pick your favorite data ingestion method in Fabric—without too many surprises along the way!