Ambiguous model and USERELATIONSHIP

When a data model diagram contains cycles it’s ambiguous, because more than one path exists from the same two tables. When I began writing DAX code I thought that in case more than one path connected the same two tables, the result was the intersection of the filters. I was wrong.

A few days ago I answerd a question on stackoverflow.com about what happens when USERELATIONSHIP is used with an ambiguous model and I realized that the proposed model was the simplest case of ambiguity: three tables and three relationships.

There are two relationships on the Date column and an inactive one on the Name column between Product and ProductHistory. The last relationship is inactive, because otherwise it would create an ambiguity on the propagation of a filter from ‘Date'[Date] to ProductHistory[Date]. What would be the right path to follow? From Date directly to ProductHistory or from Date to Product and then to ProductHistory?

With the Product[Name] –> ProductHistory[Name] relationship inactivated, it’s evident that a filter applied to ‘Date'[Date] directly propagates to ProductHistory without crossing the Product table.

This can be seen by executing this query in DAX Studio, that per each ‘Date'[Date] shows the corresponding ProductHistory rows, according to the active relationship:

EVALUATE
GENERATE(
    ALL( 'Date'[Date] ),
    CALCULATETABLE(
        ProductHistory
    )
)

The result shows matching dates, as expected, since the ‘Date'[Date] –> ProductHistory[Date] relationship is the active one

But what happens when activating the Product[Name] –> ProductHistory[Name] relationship with USERELATIONSHIP ?

  • Both relationship are used and the result is the intersection
  • Only the relationship specified with USERELATIONSHIP is used and the other one is ignored

The answer is the latter: DAX uses only the path containing the relationship specified with USERELATIONSHIP and ignores any other existing path. The result is that the ‘Date'[Date] -> ‘ProductHistory'[Date] relationship becomes the inactive one for the duration of the CALCULATETABLE where USERELATIONSHIP is applied.

This can be seen by executing this other query in DAX Studio

EVALUATE
GENERATE(
    ALL( 'Date'[Date] ),
    CALCULATETABLE(
        ProductHistory,
        USERELATIONSHIP ( 'Product'[Name], ProductHistory[Name] )
    )
)

The result shows that the dates don’t match between ‘Date'[Date] and ProductHistory[Date].

‘Date'[Date] matches with the Product[Date], as can be seen in the code used to generate the three tables:

Product = 
DATATABLE( 
    "Name", STRING, "Date", DATETIME, "Quantity", INTEGER,
    {
        { "A", "2020-01-01", 1 },
        { "B", "2020-01-01", 2 }
    }
)

ProductHistory = 
DATATABLE(
    "Name", STRING, "Date", DATETIME, "Quantity", INTEGER,
    {
        { "A", "2020-01-02", 10 },
        { "A", "2020-01-03", 20 },
        { "B", "2020-01-01", 30 }
    }
)

Date = CALENDAR( "2020-01-01", "2020-01-31" )

The sample files for this article can be found on my github

Adding a Static Table with Types in Power Query With M Language

SQLBI’s guys recently published the Youtube video Create Static Tables in DAX Using the DATATABLE Function, where they show how to create a static table in Power BI.
At first using Power Query Enter data interface and then using the DAX language.
They also show that the M code generated by Enter data contains the table definition as an unreadable (by humans) encoded string, and that the only way to change it is through its specific editor.

Actually, it’s possible to create a static table using M readable code, using the syntax explained in this Chris Webb’s BI Blog post Creating Tables In Power BI/Power Query M Code Using #table(), where one of the proposed syntaxes also allows to specify the data types.

The M code to create the Segments table is the following

#table(
    type table
        [
            #"Price Range"=text, 
            #"Min Price"=number,
            #"Max Price"=number
        ], 
    {
        {"LOW", 0, 100},
        {"MEDIUM", 100, 1000},
        {"HIGH", 1000, 999999}
    }
)

This code uses the #table() function, specifying the type as the first parameter and the data as the second one.

To enter this code in Power Query first create a Blank Query

Then open the Advanced editor

And finally write the code in the Advanced editor window

selecting Done closes the Advanced editor window and shows the generated table

the #table() M function doesn’t have the same limitation of the corresponding DAX function, that only accept literals. With #table() variables are allowed.

Of course, defining static tables using M language has the disadvantage that a change triggers a refresh of the dataset.

The sample file can be downloaded from my GitHub

Circular Dependency in DAX caused by CALENDARAUTO

Working on a small demo, I was surprised by an unexpected circular dependency.

I built a repro to understand what was going on and I’m writing this post to fix what I found out for future memory.

The model is very simple, I have a table Employee with the Name and the Hire and Leave dates

I’ve added a Date calculated table, created by CALENDARAUTO

Then I generated a Employee Snapshot table, with all the dates between the Hire and Leave dates for each employee.

When I tried to create the relationship on the Name columns between the Employee and the Employee Snapshot tables, a Circular Dependency error was generated.

Trying the known techniques to solve the circular dependency on the Employee Snapshot DAX code didn’t work.

Eventually, the only solution I found was to replace the CALENDARAUTO with CALENDAR.

The Employee table in the repro is very simple, I created it directly with “Enter Data” button.

SNAG-0008

The code to create the Date table with CALENDARAUTO is straightforward

Date = CALENDARAUTO()

The code to create the Employee Snapshot, reduced to avoid context transitions and Time Intelligence function is

Employee Snapshot = 
SELECTCOLUMNS( 
    GENERATE(
        ALLNOBLANKROW( Employee ),
        FILTER (
            ALLNOBLANKROW( 'Date'[Date] ),
            'Date'[Date] >= 'Employee'[Hire Date] &&
            'Date'[Date] <= 'Employee'[Leave Date]
        )
    ),
    "Name", Employee[Name],
    "Date", 'Date'[Date]
)
The model now has the three tables, without any relationship
 
SNAG-0007
 
Then, when I try to create a relationship using the Name column
 
Relationship on Name setting
 
The circular dependency error is generated
SNAG-0006
The solution I found is to replace the code to generate the Date table
Date = 
VAR MaxEmployeeDate = MAX( MAX( Employee[Hire Date] ), MAX( Employee[Leave Date] ) )
VAR MinEmployeeDate = MIN( MIN( Employee[Hire Date] ), MIN( Employee[Leave Date] ) )
VAR MaxDate = DATE( YEAR( MaxEmployeeDate ), 12, 31 )
VAR MinDate = DATE( YEAR( MinEmployeeDate ), 1, 1 ) 
RETURN
    CALENDAR( MinDate, MaxDate )
With the new Date table it’s now possible to create the relationships
 
SNAG-0010
 
The explanation is that CALENDARAUTO is using all the dates in the model, including those in the calculated table.

 

The sample pbix can be downloaded from github

 

How To Set Up Power BI New Composite Models

After watching the new video from SQBI’s guys Unboxing new Power BI composite models I decided to try it.

Here is a quick guide on the steps to follow in order to set up a composite model.

Enable the Preview feature DirectQuery for Power BI datasets and Analysis Services.

In the Options configuration window in Power BI Desktop we need to check the “Power BI datasets and Analysis Services” Preview feature. After clicking OK,  Power BI requires to be restarted.

options with steps

Connect to the Power BI dataset to be used via Direct Query.

This creates a Live Connection with a model in the Power BI service. First click on the “Power BI datasets” button and then choose the Workspace. 

SNAG-0025

Add a local model and transform the Live Connection to Direct Query

Click on the “Transform Data” button and then “Add a local model”

Transform Data Direct Query connection

And that’s it

its’ now possible to add new tables using Power Query or DAX and to create calculated columns from the tables context menu in the Fields panel.

SNAG-0026

 

 

Variables in DAX and Common Mistakes

This post purpose is to show two common mistakes that happen when using DAX variables.

Variables in DAX are a powerful tool that improves the readability of DAX code. But their name is misleading, since DAX variables behave like constants.

The syntax to define a DAX variable is straightforward:

VAR myVariable = ... a DAX expression ...
RETURN ... a DAX expression using myVariable ...

For instance, this is valid DAX code, where we define the variable TotalSales and later we use it

[% Sales] = 
VAR TotalSales = CALCULATE( [Sales], REMOVEFILTERS( 'Date' ) )
RETURN DIVIDE( [Sales Amount[, TotalSales )

Once a variable has been declared, it cannot be changed like in a conventional programming language, where variables can be assigned multiple times.

To see what are the two common mistakes that newbies do when using variables, we create two tables in Power BI Desktop using this DAX code

Date = 
ADDCOLUMNS(
    CALENDAR( "2020-01-01", "2020-12-31" ),
    "Month", FORMAT( [Date], "mmm" ),
    "Month Number", MONTH( [Date] )
)

Sales = 
VAR MinDate =
    MIN( 'Date'[Date] )
RETURN
    SELECTCOLUMNS(
        GENERATESERIES( 1, 10000 ),
        "Date", CONVERT( MOD( [Value], 365 ) + MinDate, DATETIME ),
        "Amount", MOD( [Value], 100 )
    )

Then we create a one to many relationship between ‘Date'[Date] and Sales[Date] and we set the sort by column of ‘Date'[Month] to ‘Date'[Month Number].

This is our model:

We can create the measure [Sales Amount]

Sales Amount = SUM( Sales[Amount] )

And a matrix visual to show the [Sales Amount] per Month

Now we want to write a measure to compute the percentage of the sales over the total, and to do so we write this DAX code

% Sales Amount Wrong = 
VAR SalesAmount = [Sales Amount]
VAR TotalSalesAmount =
    CALCULATE( SalesAmount, REMOVEFILTERS( 'Date' ) )
RETURN
    DIVIDE( SalesAmount, TotalSalesAmount )

But when we add this measure to the matrix visual we get this result

We have 100% per every month, which is wrong. Our mistake was to assume that the SalesAmount variable would be evaluated inside the CALCULATE with the modified filter context.

The fact is that once SalesAmount is defined, it contains the value for the current row, and when it is referred later it behaves like a constant. For instance, in January SalesAmount assumes the value of 42690, therefore the following CALCULATE behaves like if it was written

VAR TotalSalesAmount =
    CALCULATE( 42690, REMOVEFILTERS( 'Date' ) )

The correct code for this measure uses the [Sales Amount] measure instead

% Sales Amount = 
VAR SalesAmount = [Sales Amount]
VAR TotalSalesAmount =
    CALCULATE( [Sales Amount], REMOVEFILTERS( 'Date' ) )
RETURN
    DIVIDE( SalesAmount, TotalSalesAmount )

And now the result is correct

The second mistake is less evident, since it happens when we define a variable containing a table.

We want to add a card visual to show the number of Sundays of January. We think it is a good idea (but it isn’t) to first prepare a table with just the Sundays and later to count its rows with a filter on January. So we write this code

# Sundays in January Wrong = 
VAR SundaysTable =
    FILTER( ALL( 'Date' ), WEEKDAY( 'Date'[Date] ) = 1 )
RETURN
    CALCULATE( COUNTROWS( SundaysTable ), 'Date'[Month Number] = 1 )

And we find out that January has more Sundays that days

This is because the SundayTable in CALCULATE is a constant, like if it was written with literals: { { “2020-01-05”, 1, “Jan” }, { “2020-01-12”, 1, “Jan” }, … { “2020-12-27”, 12, “Dec” } }, therefore it is not affected by the filter context modified by CALCULATE.

But since the columns in table variables keep the data lineage, table variables can be used as filter arguments and the previous measure can be fixed like follows

# Sundays in January = 
VAR SundaysTable =
    FILTER( ALL( 'Date' ), WEEKDAY( 'Date'[Date] ) = 1 )
RETURN
    CALCULATE( COUNTROWS( 'Date' ), 'Date'[Month Number] = 1, SundaysTable )

And now we see the correct number of 2020 January Sundays.

Variables are very useful and once we learn to avoid these mistakes we’ll use them to better organize our code, as a support for debugging and also, in some cases, as a tool to optimize our measure performances.

The DAX code in this post was written with the purpose to be as simple as possible to show the common mistakes.

The pbix file can be downloaded from my github profile