Issue
At several customer sites, we migrate existing BI platforms from Azure services such as Azure Synapse, Azure Data Factory, or Azure Databricks to Fabric. One key step in the migration process is transferring Azure Data Factory pipelines to Fabric Data Factory.
In some cases, we work with a large number of small source tables (e.g., from an Azure SQL Database). After the migration, I reviewed the Fabric Capacity Metrics report and was surprised to see that a single execution of the daily load process consumed nearly 30% of the available capacity on an F8 instance. The majority of this usage was attributed to pipeline operations.
Given the size of the capacity, I initially believed that an F8 instance would be more than sufficient for the customer, considering the relatively small amount of data and the complexity of the calculations. So, why was the capacity usage so high?
Test Environment Setup
To investigate, I set up a Fabric pipeline with a Copy Data activity that loads 12 tables from a test database into Parquet files. The Copy Data activity runs inside a ForEach loop. The goal was to explore ways to optimize the Capacity Unit (CU) usage of Copy Data activities.
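For reference, the test pipeline is essentially a single ForEach over the table list with one Copy Data activity inside. The sketch below mirrors an ADF-style pipeline definition as a Python dict; the names and the exact property layout are illustrative, not an export of the actual pipeline.

```python
# Abridged, illustrative sketch of the test pipeline (names are hypothetical;
# the JSON that Fabric actually exports may differ in detail).
test_pipeline = {
    "name": "PL_Load_TestTables",
    "activities": [
        {
            "name": "ForEachTable",
            "type": "ForEach",
            "typeProperties": {
                "items": {"value": "@pipeline().parameters.tableList", "type": "Expression"},
                "isSequential": False,   # run iterations in parallel
                "batchCount": 6,         # varied in the tests described below
                "activities": [
                    {
                        "name": "CopyTableToParquet",
                        "type": "Copy",
                        "typeProperties": {
                            "source": {"type": "AzureSqlSource"},
                            "sink": {"type": "ParquetSink"},
                            # intelligent throughput optimization: "Auto" vs. "Max" in the tests
                        },
                    }
                ],
            },
        }
    ],
}
```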
What does Microsoft say?
According to the pricing page for Fabric pipelines, the following statement is provided for “Data Movement” tasks (Copy Data activity):
“Data Movement service for Copy activity runs. You are charged based on the Capacity Units consumed during the Copy activity execution duration.”

In the pricing breakdown for how “Data Movement” is charged, Microsoft states:

“Metrics are based on the Copy activity run duration (in hours) and the intelligent optimization throughput resources used.”
Source: Pricing for data pipelines – Microsoft Fabric | Microsoft Learn, 18.12.2024

But what exactly is “intelligent optimization”? According to Microsoft’s “Copy Activity Performance and Scalability Guide”, several factors are involved, such as parallel copy for partitioned sources and intelligent throughput optimization.
Source: Copy activity performance and scalability guide – Microsoft Fabric | Microsoft Learn
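To make that metering description concrete, here is the formula as I read it: CU hours = run duration in hours × intelligent optimization throughput resources used × 1.5, with the Capacity Metrics App then reporting the result as CU seconds. The 1.5 CU per hour rate also appears in the forum quote further below; treating the Metrics App value as this figure converted to seconds is my own assumption.

```python
def expected_cu_seconds(duration_s: float, ito_resources: int) -> float:
    """Expected charge for one Copy activity run, under the reading that
    CU hours = duration (h) x throughput resources x 1.5, reported as CU seconds."""
    cu_hours = (duration_s / 3600) * ito_resources * 1.5
    return cu_hours * 3600  # the Metrics App shows CU (s)


# Example: a 30-second copy using 4 throughput resources
print(expected_cu_seconds(30, 4))  # 180.0 CU (s)
```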
To investigate further, I ran three tests with different settings: comparing the intelligent throughput optimization (ITO) option “Max” against “Auto”, and adjusting the batch count in the ForEach loop to 6. The results showed that the batch count significantly affects the execution duration, while the ITO setting has little to no effect.
Now, let’s turn our attention to the Fabric Metrics App to examine the consumed CUs. What insights does it reveal about the resource usage?
At first glance, all pipelines appear to be charged the same. However, by examining the details more closely, we can see how many CUs each individual activity consumes. According to Microsoft’s pricing calculation, the duration of the operations is a key factor in determining the cost.
Source: https://learn.microsoft.com/en-us/fabric/data-factory/pricing-pipelines
This suggests that the duration should directly impact the CU calculation and costs. However, when we examine the individual operations, they all consume 360 CUs, regardless of the runtime.
This was quite unexpected.
Referring to a statement from a forum post, this is also what I assumed to be the basis for the calculation:
In my eyes:
- 1.5 CU per hour gives 0,0004166 CU per second.
- Say 30 s duration. 30 * 0,0004166 = 0,0125.
- Now how many intelligent optimization throughput resources are used? Was set to auto, so unclear.
But even assuming a maximum of 256, we only get 256 * 0,0125 = 3,2 CU (s). Far from listed 360!

Source: Solved: Minimum CU (s) billing per copy? Or am I just bad … – Microsoft Fabric Community
Let’s take a look at the real-life scenario at the customer mentioned at the start of this post. When we examine the correlation between the duration of the operation and the CUs consumed, we find that nearly all data movement operations are consuming 360 CUs!
In fact, 99% of the operations at the customer result in 360 CUs.
When I look at the duration, it’s clear that the operations with higher CUs are generally the “long-running” ones as well, but there are only a few of them.
Here, we observe another interesting pattern: It appears that the CUs are calculated in 360-unit increments. This could potentially be linked to a time calculation in seconds, perhaps something like ((60 * 60) / 10)?
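Working backwards with the formula sketched earlier gives another candidate explanation (purely my assumption, not confirmed by Microsoft): if each copy is billed for at least one minute and “Auto” assigns a minimum of 4 throughput resources, the floor comes out at exactly 360 CU (s).

```python
# 360 CU (s) = 0.1 CU hours. Reversing the assumed formula
# (CU hours = duration_h * throughput_resources * 1.5):
billed_hours = (360 / 3600) / (4 * 1.5)   # 0.01666... h
print(billed_hours * 3600)                # 60.0 seconds, i.e. a one-minute floor

# Forward check: 1.5 * 4 resources * 60 s = 360 CU (s)
```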
Conclusion
Based on the findings, it appears that Microsoft’s pricing for data copy activities within Fabric pipelines may not accurately reflect the true consumption based on task duration.
It seems that very small copy tasks are rounded up to at least one minute of usage, leading to an inflated cost. This rounding effect significantly impacts customers with a large number of small objects, as even though the data is minimal, the usage calculation results in high consumption of Capacity Units (CUs).
The implication is that optimizing individual tasks may not have as much impact on billing as expected, while reducing the number of tasks to be processed could have a more substantial effect on overall costs. This discrepancy in how CUs are calculated warrants further clarification from Microsoft, particularly for scenarios involving many small data movements.
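A back-of-envelope comparison illustrates the point. All numbers here are hypothetical and assume the 360 CU (s) floor per copy observed above, plus the formula sketched earlier for longer-running consolidated copies.

```python
# Hypothetical scenario: 100 small tables loaded individually vs. consolidated jobs.
small_copies = 100
floor_per_copy = 360                                      # CU (s) observed per copy run
many_small = small_copies * floor_per_copy                # 36,000 CU (s)

consolidated_runs = 10                                    # e.g. 10 tables per copy
run_seconds = 120                                         # assumed runtime per consolidated run
consolidated = consolidated_runs * 1.5 * 4 * run_seconds  # 7,200 CU (s)

print(many_small, consolidated)                           # 36000 7200
```

Under these assumptions, consolidating the loads cuts the data movement consumption by a factor of five, whereas shaving a few seconds off each small copy changes nothing as long as the floor applies.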
Be cautious when working with Data Factory copy activities, especially during the migration of pipelines from Azure Data Factory to Fabric Data Factory. The way usage and costs are calculated differs significantly between the two platforms!
PS: This is a re-written post based on my contribution in Microsoft’s Data Factory forums: Is the pricing of Fabric pipeline data copy activi… – Microsoft Fabric Community