This blog is all about achieving a different mindset when you want to change business tools. If you used to work with Excel and want to take the path of Self-Service Power BI, you need to treat data in a different way. This blog is about the difference between the “lose” format Excel allows versus the more rigid set of normalized tables you want to work with in Power BI.
The difference between Excel, being a calculation program and Power BI, being a relational table program, is undeniably great. In Excel you are focused on (a calculation in) one cell, while working Power BI requires thinking in (filtering) columns and tables. Another big difference is in the way input of data works. In Excel you are free to add and change content in every single cell, while in Power BI transforming data is only possible in Power Query, again based on transforming columns and tables.
So how does Power BI work between tables? Let have a look on a small example. You can see two fact tables, named Stock and Sales and they are related to one filter (or dimension) table, named Products.
And with that I can create insights on my sales as well as on my stocks.
But let us look a bit closer to the tables. If you look at the filter and the fact tables, the relation has been done by columns containing text values.
That is quite unusual in relation databases. Since relationships in a relational database are promoted by constrains, those relationships are made between columns containing whole numbers values. That is not the case with Power BI. I do not know how relations in Power BI work, I do know that it works with almost every data type between columns in your Data model. And even columns with different data types values can be related!
But this is not the scenario you prefer. Relations in Power BI should be based on columns with the data type whole number. So, instead of text values in the above example you want to use a relationship between a normalized tables. So, the preferred relation scheme looks like this. Note that the relationship has been establish with columns, containing whole number values.
In a filtering table usually the value in the relational column is unique, while in the related fact table this value can be multiple times present; hence the “1 to many” relationship.
If you remove the text column in the “big boys” fact tables, compression will be much better, thus fastening your model. The other reason would be that typos are not that problematic: it is not the text, but numbers that will sustain the relationship. Hey, I know, this is rather arbitrary, but let us keep this practical.
So, before you even think of changing from Excel to Power BI, you need to change the “lose” data sets with all data in one Excel file to a set of different stored normalized tables. The biggest plus is having a set of independent tables, that can be used in every Power BI model, instead of creating new sets over and over again for every Excel model.
But how can you obtain normalized tables? If you can fetch tables from relational databases, your most of the time good to go. But what is happening if you have two different source databases that profits the fact tables. If you have an Enterprise Data Warehouse, you’re again good to go, because the EDHW will provide the normalized tables. But if you do not have this luxury option? Then you need to create your own set of normalized filter tables!
In my next blogs I want to discuss how to create simple filtering tables, based on different fact tables. Another blog will be spent on where to store the filtering tables.