Power BI CI/CD & git integration


Git integration and Continuous Integration / Continuous Delivery (CI/CD for short) setup for Power BI reports and semantic models used to be quite a tedious task until very recently. Pseudo-solutions existed, involving a mix of tools like pbi-tools, .bim file export from Tabular Editor and custom tasks in Azure DevOps pipelines (like Power BI Actions). However, these approaches required a lot of effort and none were officially supported by Microsoft, so they were prone to issues from time to time.

But that was before the new .pbip format entered the spotlight! With this new format, dreams of CI/CD for Power BI objects are no longer so far away… Let’s dig in!

Why do we even need git integration and CI/CD?

If you’re coming from a data engineering or software development background and write SQL stored procedures or Python notebooks on a daily basis, you’re probably familiar with git integration and the CI/CD concept, and may even be wondering why the hell I even dare to ask “why do we even need git integration and CI/CD?”.

If you’re coming from a Power BI report developer background however, the question might be more legitimate. I mean, you’ve already developed tons of Power BI reports the usual way (i.e. using Power BI Desktop and the Publish button), and it worked perfectly well. So why should you need a new way?

Well, here’s why: if you’ve developed a few Power BI reports, you’ve probably encountered some of these frustrating situations:

  1. Several developers were working on the same report and you had quite a good time trying to merge the multiple reports into a single one
  2. A report was published in prod at some point and then everything stopped working. You thought you had plenty of backups for the previous versions of that report but apparently, you didn’t (the dog probably ate them). That was a Friday afternoon around 6 p.m. No beer that day.
  3. You wanted to do a bit of spring cleaning on a workspace and had no idea why some reports were there and who published them

Git integration and CI/CD try to solve these issues by providing a common location to store all versions of a given report, and ways of securing what gets published where, when and by whom. And git integration and CI/CD are made easy by the new Power BI format: .pbip

What is that .pbip format?

The classic .pbix format is a compressed, proprietary binary file format that holds both the data and the metadata of the semantic model, as well as the visual layout. The issue is that git can’t meaningfully diff or merge binary files, so it can’t track changes inside a .pbix. That’s where the .pbip comes in.

The .pbip format is just another way of storing a Power BI file. It splits the content of the Power BI file into several small json files, which git can read and diff.
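As a rough sketch, saving a report as .pbip produces a structure along these lines (folder names here follow the MyPBIFile example used later in this post; the exact file list may vary with your Power BI Desktop version):

```
MyPBIFile.pbip              <- pointer file tying everything together
MyPBIFile.Report/
    report.json             <- visual layout of the report
MyPBIFile.Dataset/
    model.bim               <- semantic model metadata
    ...                     <- other metadata and local settings files
```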

Let’s briefly explore the files that make up that .pbip structure. Two files are especially important: model.bim and report.json.

The model.bim file is already pretty famous. You’ve worked with it if you ever used SQL Server Analysis Services (SSAS) or its Azure counterpart, Azure Analysis Services (AAS). For Power BI files, the model.bim file could also be exported using tools like Tabular Editor. The model.bim file basically contains all the metadata of the semantic model: columns, measures, relationships, etc.

The report.json file is pretty new however and was introduced with the .pbip format. It contains the visual layout of the report: types of visuals used, positions, colors, etc. As of today, it’s still pretty much impossible to edit that file directly in a text editor (and it’s not recommended anyway). But that’s not too big a deal, as long as it’s there and it’s a text file.

The .pbip (or the .pbir) file is then able to reconstruct a Power BI file from these multiple json files. Note that the .pbip file itself does not hold any data. The data is located in a cache.abf file in the MyPBIFile.Dataset folder. That file should always be git ignored.

If you wish to know more about the files of the .pbip format, you can check the Microsoft GitHub repository as well as Haven & Rui Romano’s YouTube video.

Ok, so we’ve saved our report as a .pbip file and we have a nice understanding of the files in there. What now?

How do we go about publishing our report?

Before .pbip, things were easy. You just edited the report in Power BI Desktop, clicked the Publish button, and you were good to go. But we’ve discussed that this causes serious problems (no collaboration, no versioning, no check of what gets published, etc.).

Things are just a tiny wee bit harder now with the new way of publishing reports, but it offers so much that it’s definitely worth it.

First of all, we’re going to need two things if we want to integrate a Power BI report to a git repository: a Power BI report and … a git repository.

Take any Power BI report you want. I’m using a simple Power BI report with just a single Products dimension. As for the git repository, I’ll be using an Azure DevOps repo; I won’t cover how to create it here, as plenty of content already exists out there (like here in the Microsoft documentation for instance).

The first thing we’re going to have to do is clone the Azure DevOps repo onto our local machine. Click Clone in the upper right corner of your repo and grab that URL.

We’ve got plenty of choices to clone the repo locally and interact with it: Visual Studio, Visual Studio Code, git command line, etc. I like VS Code so I’m going to go for that option.

Create a folder on your local machine where the Azure DevOps repo will be saved and clone the repo into it. Once you’ve done this, you should see a single ReadMe file (you’re basically looking at the content of your Azure DevOps repo, which is often initialized with a ReadMe file).
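If you prefer the git command line over VS Code, the clone step boils down to the commands below. This is a sketch: the paths are examples, and a local bare repository stands in for the hosted Azure DevOps repo so the snippet runs anywhere. In practice you’d paste the HTTPS URL copied from the Clone button instead.

```shell
# Stand-in for the hosted Azure DevOps repo; in practice, use the
# HTTPS URL grabbed from the Clone button in Azure DevOps
git init --bare /tmp/devops-remote.git

# Clone the repo into a local folder of your choice
git clone /tmp/devops-remote.git /tmp/my-local-repo

# The clone is a full local copy of the repository
ls -a /tmp/my-local-repo
```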

That repo looks a bit empty doesn’t it? Let’s add some content!

Before we actually add the files, we need to talk about branches. When we initially cloned our repo, we were taken to the main branch directly. The main branch is where the final version of the code should appear, once it’s been verified by peers. You should never directly develop in the main branch (and you should even add branch policies in your Azure DevOps repo to make sure no one can). So before doing the copy of our .pbip format files, let’s create a branch. I’m going to call mine feature/add_power_bi_report.
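On the command line, creating and switching to that branch is a one-liner. The snippet below sets up a throwaway repo first so it can run anywhere; in practice you’d run just the checkout inside your local clone.

```shell
# Throwaway repo so the commands run anywhere; in practice you'd
# run the checkout inside your existing local clone
git init /tmp/branch-demo
cd /tmp/branch-demo

# Create the feature branch and switch to it in one step
git checkout -b feature/add_power_bi_report

# Shows the branch we're now working on
git branch --show-current
```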

And now we can actually copy over the files in that new branch. I’m copying all the .pbip format files to a folder called PowerBI inside my local repo in order to separate Power BI developments from other stuff I could have going on.

Some of the files (mainly the localSettings.json and cache.abf files) in the screenshot above should not end up in our Azure DevOps repo however. The localSettings.json files are, as their name implies, local and don’t need to be pushed to Azure DevOps. The cache.abf file is the file that contains the actual data. It’s a binary file, not text, so it should not be sent to the Azure DevOps repo either.
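A minimal .gitignore sketch to keep those files out of the repo could look like this (placed at the root of the repo; the patterns assume the default file names generated by Power BI Desktop):

```
# Local user settings - no need to share these
localSettings.json

# Cached data - binary, potentially large, and rebuilt on refresh
*.abf
```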

We’ve got all we need in our local branch. At this point however, everything is still only in our local repository and has not yet been pushed to Azure DevOps. For our colleagues to be able to work on the same report, we need to push it there. Before we can push to Azure DevOps, our changes need to be validated locally (or committed, in git parlance). So let’s commit and push our changes.

You’ll be prompted for a commit message, which is mandatory. Just explain briefly why you committed these changes and close the COMMIT_EDITMSG tab to complete the push. You might also receive a warning that the remote branch does not exist; just confirm you want to create it as part of the push.
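The same commit-and-push flow on the command line looks like the sketch below. Again, a local bare repo stands in for Azure DevOps and a placeholder file stands in for the .pbip files, so the snippet is runnable as-is; substitute your own clone, files and identity in practice.

```shell
# Throwaway setup: a local bare repo stands in for Azure DevOps
git init --bare /tmp/push-demo-remote.git
git clone /tmp/push-demo-remote.git /tmp/push-demo
cd /tmp/push-demo
git checkout -b feature/add_power_bi_report

# Placeholder identity for the commit; use your own in practice
git config user.email "dev@example.com"
git config user.name "Dev"

# Stage the Power BI files (here just a placeholder file)
mkdir -p PowerBI
echo '{}' > PowerBI/placeholder.json
git add PowerBI

# Commit with a short message explaining the change...
git commit -m "Add Power BI report files"

# ...and push; -u (--set-upstream) creates the remote branch
git push -u origin feature/add_power_bi_report
```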

Back in Azure DevOps now, you should see your branch created.

We’re now going to want to bring the changes from our feature/add_power_bi_report branch into the main branch. But why do we actually want to do so? What we initially wanted was simply to publish a Power BI report. How is getting our changes into the main branch going to help us accomplish that?

Well, here’s how: with Fabric git integration, you can link a Power BI workspace to an Azure DevOps branch. Everything that gets pushed to that branch gets synchronized into the Power BI workspace. Just like magic!

Just before merging (that’s the git term for bringing changes into another branch) into the main branch, let’s create an empty Power BI workspace (note that it has to be a workspace on a Fabric or Premium capacity). In the workspace settings, let’s configure git integration. I won’t cover this in detail, it’s pretty straightforward. Just note that I’m choosing PowerBI under Git folder so that all my Power BI files are saved in that folder of the Azure DevOps repo and don’t get mixed with other stuff I could already have there.

Let’s now go back to Azure DevOps and create a pull request to take our changes from the feature branch to the main branch. Here you can add reviewers, link the pull request to some Azure DevOps Board user stories you were working on, etc. We’ll see in a future blog post how we can automatically run code reviews as part of this pull request.

Let’s create and complete the pull request, then go back to our Power BI workspace. It should detect that new changes were pushed to the main branch. Let’s click on these updates and select Update All to pull in all changes from our Azure DevOps repo.

After a few seconds, the Power BI report and semantic model are added to the Power BI workspace.

Conclusion

And there you have it: we finally published our report to the Power BI Service, but without using the Publish button in Power BI Desktop! It takes a bit longer and can seem a bit overwhelming if you’ve never worked with git or VS Code before, but it just brings so much!

Now everything is set up to easily revert to previous versions of our report, have multiple developers collaborate on a single report, see which changes were published by whom, when and for what reasons, etc. What’s really cool is that our report can still be downloaded as a .pbix. The two formats (.pbip and .pbix) are easily interchangeable, which really makes this feature awesome!

We’ve only scratched the surface of what .pbip brings in this blog post, but I didn’t want it to be too long. I’m thinking about writing some more content on this topic:

  • How do we go about making changes to our published report?
  • How can multiple developers collaborate on a single report?
  • Which git workflow to adopt when working with multiple environments (e.g. what to do when you have Dev, Test and Prod workspaces)?
  • How to integrate code quality checks as part of this process?
  • How to work with the new TMDL format?

Stay tuned, thanks for reading!
