Brian Grant on BI

 DAX, Data Modeling, and Power Query


The Sub-Formulas of DAX

DAX is a language of sub-formulas.

To a new author, a Measure looks like one big formula; but to a trained DAX expert it’s a chain of interrelated sub-formulas, each of which can be isolated and understood as distinct.

While there are sub-formulas all over the place in DAX, most don’t require special training to understand. Even if an author doesn’t see the sub-formulas, they work just as expected, so there’s no need to obsess over them.

(The fancy term for these “sub-formulas” is “sub-expressions” which you may have heard before; however, I find for most folks the term “sub-formula” us much easier to understand and work with: it’s just a small formula inside a bigger formula. Eventually you will want to be able to use the more technical term, but starting with the simpler term can make learning the tricky related ideas much easier.)

There are two kinds of DAX sub-formulas you have to get good at understanding:

A Per Row Formula is a sub-formula typed into an Iterator (like SUMX or AVERAGEX) which is then run once per row of a Temp Table. Running these once per row produces the numbers that SUMX sums, AVERAGEX averages, COUNTX counts and so on.

A New Filters Formula is a sub-formula typed into a New Filters Function (mainly CALCULATE) which is then run with a new set of filters that have been changed in some way. The changes are determined by all the other arguments of the function.

Here are some examples of both types, with the sub-formula itself highlighted:

Each yellow bit is a sub-formula that will run once per row of a Temp Table. *Which* Temp Table is determined by the code just above the yellow bit (argument 1).

Each blue bit is a sub-formula that will be run with modified filters. *How* the filters should be different is determined by the code just below the blue bits (argument 2).

In this article we are going to walk through examples of both types; setting us up to learn about the often daunting concept of DAX Evaluation Context.

Let’s get to it.

The “Per Row Formula” and Iterators

The example we’re going to go through here is one we’ve gone through before in the Auto Snapshot series of posts; but this time we’re going to be focused less on the outcome and more on deeply understanding the sub-formula in our code.

In our example we have a Data Model that looks like this:

And a measure for Total Cost that looks like this:

Total Cost =
SUMX(
    Sale,
    Sale[Qty] * RELATED( Dish[Cost] )
)

Argument 2 of SUMX is where you type in the Per Row Formula. So what you see in argument 2 above is an example of a sub-formula; it’s the little formula that sits inside our larger formula.

Let’s pull it out the sub-formula so we can look at it in isolation:

Sale[Qty] * RELATED( Dish[Cost] )

So that is the Per Row Formula. It is what we have told SUMX to run for each row the Temp Table we specified in argument 1 (“Sale”). Our sub-formula says, “For each row, grab the current row’s Sale[Qty] number and multiply it by the current rows Related Dish[Cost] number.” (More on that in the original post).

Let’s imagine like before, that this Measure is on a card with some slicers for “Lunch” and “H. Special” (“the house special”, which for what it’s worth, is the burger):

I know I’m making your head spin but let me show you the Measure again so we can start to imagine it running.

Total Cost =
SUMX(
    Sale,
    Sale[Qty] * RELATED( Dish[Cost] )
)

So in our Measure, the first thing that happens is that SUMX fires up. It’s job is to take a Temp Table, run a Per Row Formula for each row of that Temp Table, and then sum up the results.

Argument 1 is the Temp Table we’re going to run our Per Row Formula for each row of. A *Temp* Table is what SUMX wants but what we typed in was a *Model* Table (the name of a table in our Data Model) so DAX does the conversion for us via the Auto Snapshot process, and we get the following Temp Table:

Because of the slicers for “Lunch” and “H. Special”, the Temp Table has been filtered down to just these two rows. (The three extra columns at the end were added during the Auto Snapshot process; this is done mainly to allow slicers to “work over relationships” and can also be use in our Per Row Formula like we’re about to do.)

Now SUMX is going to run the Per Row Formula for each row of that Temp Table:

The Per Row Formula runs for the top row, getting the numbers 1 and $5, then multiplying them together to get $5.

The same Per Row Formula then runs for the bottom row, getting the numbers 3 and $5, then multiplying them together to gets $15.

After that the Per Row Formula has been run for each row producing a bunch of numbers. To finish the job SUMX then sums up the results from each row, $5 + 15$ = $20.

The reason we get different answers for the different rows is only tricky if you think about it too hard. You’re running Per Row Formula on different rows, of course you get different results because each row has different numbers in it.

On the top row the qty is 1 and the cost is $5, and on the bottom row the qty is 3 and the cost is $5.

In fact, let me just list out all the values for the top row:

Sale[Shift] = “Lunch”
Sale[Qty]   = 1
Sale[Price] = $8
Sale[Dish]  = “Burger”
Dish[ID]    = “Burger”     (R)
Dish[Type]  = “H. Special” (R)
Dish[Cost]  = $5           (R)

So those are the visible values for the top tow. To be fancier we can say that the above list is the Row Context for the top row. Row Context is just a fancy way to say, “what values are in the current row?” I’m leaving out a few details, but not many. Row Context really is as simple as the values of the current row just listed out.

(The little “(R)”s are there to indicate which columns are the Related columns.)

What is the Row Context for the bottom row? It’s pretty close:

Sale[Shift] = “Lunch”
Sale[Qty]   = 3
Sale[Price] = $8
Sale[Dish]  = “Burger”
Dish[ID]    = “Burger”     (R)
Dish[Type]  = “H. Special” (R)
Dish[Cost]  = $5           (R)

If you were to ask the Per Row Formula for the bottom row “What’s in the current row?” it would write out the above list.

OK, so after running the Per Row Formula for each row, SUMX takes all the results (in our case $5 and $15) and sums them up.

The main difference between different Iterators is what the do with the resulting number. SUMX will sum them all up, AVERAGEX will find the average, MINX will find the smallest number, etc. Other fancier Iterators will do more complicated stuff, but it always begins with running the Per Row Formula for each row of a Temp Table.

Whenever you look at an Iterator written out in code though, it’s very much worth it to copy/paste *just* the Per Row Formula into something like Notepad to look at it for a second and say “This is the sub-formula that will run once per row.”

If you’re in a rush or don’t like Notepad, you can also highlight just the Per Row Formula part of your measure right in the formula editor. I will say that this isn’t quite as good, but I do get that it’s more practical. However you do it though, you have to drill onto your head that your big formula for the entire Measure contains it’s own small little sub-formula.

The “New Filters Formula” and CALCULATE

The other type of important sub-formula is the New Filters Formula. While technically there are several functions that use these, there’s really only one big one that does 95% of this kind of work out there and that’s CALCULATE.

Let’s look at what CALCULATE wants us to type into it:

CALCULATE(
    <New Filters Formula>,
    <Filter Modification>
)

Whatever you type into argument 1 of CALCULATE is the New Filters Formula. It is a sub-formula that will be run with a new set of filters. Argument 2 is whatever change you want to make to the *current* set of filters to create the *new* set of filters. Mostly this is either adding a filter or removing a filter.

Let’s say I want to write a Measure that will take the default filters, remove any filter on shift (“Lunch” or “Dinner”), and once that is done, take the Auto Snapshot of the “Sale” table and count how many rows are in it.

Here’s how I write that:

Total Transactions All Shifts = 
CALCULATE(
    COUNTROWS( Sale ),
    REMOVEFILTER( Sale[Shift] )
)

Argument 1 is the New Filters Formula. Let’s pull it out to look at in isolation:

COUNTROWS( Sale )

That is the sub-formula that will run once the filters have been changed.

How do we want to change the filters? That’s argument 2, where it says to remove any filter on the Sale[Shift] column.

Once that filter is removed, the new set of filters has been set up, and the New Filters Formula can run.

Does CALCULATE do things besides changing the filters and then running a New Filters Formula? No.

Wait, really?

Yes, that’s the only thing CALCULATE does. As hard is it can be to master, as scary as it first looks, CALCULATE really just does have that one job of changing the filters and running a sub-formula. (There’s few small nuances I’m leaving out, but they’re minor and/or semantic.)

Let’s walk through the above Measure with the same slicers as usual.

OK, so when the Measure starts, there are two filters in place (from the two slicers). They create a set of filters that looks like this:

The fancy name for “the set of filters that are in place right now” is the Filter Context. To be fancy, we can say that the above is the Measure’s initial Filter Context (sometimes called the “default Filter Context”).

With that in our heads let’s look again at our code:

Total Transactions All Shifts = 
CALCULATE(
    COUNTROWS( Sale ),
    REMOVEFILTER( Sale[Shift] )
)

When DAX starts executing this Measure the first thing that it hits is the CALCULATE function. This tells DAX that we want to run a New Filters Formula, but we need to change the filters first. How do we need to change them? Argument 2: we want to get rid of any filter on “Sale[Shift]”.

So CALCULATE looks in the default Filter Context and removes any filter on that column. Here’ what the Filter Context looks like before and after.

Notice the filter for “Lunch” is gone in the second version.

So with that new “Post Calculate” Filter Context set up, CALCULATE then runs the New Filters Formula using that new Filter Context.

Here, again, is that New Filters Formula pulled out so we can look at it in isolation:

COUNTROWS( Sale )

OK, what does the above do?

As we’ve discussed before, COUNTROWS is a function that counts the rows of a Temp Table. Since we’ve given it a Model Table instead of a Temp Table; DAX automatically does the conversion for us using the Auto Snapshot process.

Here’s that Auto Snapshot process drawn out; pay attention to the Auto Filtering step where only one filter is applied:

While the Measure’s initial Filter Context had two filters (“Lunch” and “H. Special”), CALCULATE created a *new* Filter Context where the “Lunch” filter got knocked out. Since the Auto Snapshot occurred in the New Filters Formula it used this new Filter Context, and hence only the single filter for “H. Special” got applied.

OK, so what is COUNTROWS counting the rows of? That last Temp Table at the bottom with 3 rows.

And what number does COUNTROWS come up with? If you guessed 3 then give yourself a cookie. That is the number that CALCULATE returns and is our final answer.

The key idea here though is being able to understand that argument 1 of CALCULATE is a sub-formula; it is a small formula that sits inside our larger formula. This sub formula runs with a new set of filters that are created for it by CALCULATE; because of this, our name for this sub-formula is the New Filters Formula.

Whenever you are writing a Measure that uses CALCULATE it can be helpful to pull out the sub-formula into something like Notepad and say to yourself: “This is the New Filters Formula I’m giving CALCULATE, it will run after the filters have been changed.”

To start, it really can be useful to literally pull the sub-formula out into something like Notepad. After a while, once the idea is starting to sink in, you can just highlight the sub-formula in the DAX editor (no copy/paste) and make sure that you think of it as a separate sub-formula. Eventually, your brain will be able to do this for you with zero highlighting and no copy/pasting. That’s the goal; my eyes do this automatically when looking at a DAX Measure and I’d guess that’s true of most seasoned pros as well. Until you can do it automatically however, do it manually.

(If in doing so, you find that you can copy and paste the code out into Notepad, and utter the magic words I’ve given to you, but it’s not clicking in your brain why I’m having you do it; don’t fret and don’t spent 10 minutes staring at Notepad waiting for things to click. Simply close Notepad and keep going about your business. It might take several attempts over several days before it clicks in your head and you know exactly what I mean when I say “What we’ve giving to the function to work with is not number, but a whole sub-formula to be run either once per row, or with different filters.” )

Formula Decomposition: You’re Tearing Me Apart

As you may have noticed, I find it very useful when starting out as a DAX author to pull out your Per Row Formulas and New Filters Formulas into something like Notepad to look at them in isolation.

It will help you understand “That’s the sub-formula that will run once per row.” or “That’s the sub-formula that will run under new filters.”

The fancy term for this pulling sub formulas out to look at them in isolation is formula decomposition. What I’ve done in the blog so far is the simplest version of it. Let me show you a slightly more sophisticated version.

Our first measure was:

Total Cost =
SUMX(
    Sale,
    Sale[Qty] * RELATED( Dish[Cost] )
)

Up till now I’ve simply pulled out the sub-formula onto a seperate line like this:

Sale[Qty] * RELATED( Dish[Cost] )

Which is fine, but here is a slightly fancier decomposition showing the whole thing:

We can see the Per Row Formula pulled apart out to the right, but we can also see where it goes in the original parent formula (the little “…”).

I don’t expect you to do this at home, but in this fancier rendition we can both see the sub-formula in isolation and see how it relates to the parent formula (the bigger formula the sub-formula is a part of).

What was that second example again?

Total Transactions All Shifts = 
CALCULATE(
    COUNTROWS( Sale ),
    REMOVEFILTER( Sale[Shift] )
)

OK, so argument 1 is the New Filters Formula. Here’s that Measure decomposed:

We can see the New Filters Formula pulled out into its own little box, but we can also see how it fits into the larger formula. In the first box this helps us focus on what CALCULATE is doing. Our eye can ignore the sub-formula for a moment and more easily see that CALCLUATE is removing filters on the Sale[Shift] column to get things ready for the New Filters Formula to run.

This is useful in small formulas like we saw above, but it can also be useful in larger formulas where the sub-formulas contain their own sub-formulas (“sub-sub-formulas” if you will). I won’t work through an entire example; you’ve been through enough in this post. But let me give you just a small taste.

I can take a slightly bigger, slightly scarier Measure like this:

Total Cost All Shifts =
CALCULATE(
    SUMX(
        Sale,
        Sale[Qty] * RELATED( Dish[Cost] )
    ),
    REMOVEFILTERS( Sale[Shift] )
)

And I can decompose it like this:

We can see that out main formula has a New Filter Formula which in turn contains it’s own Per Row Formula. Said otherwise, the sub-formula contains a sub-formula.

At first the above image probably feels overwhelming.

However, if we look at just the first box on its own:

Or just the middle box on its own:

Or just the last box on its own:

None of these individual boxes are that scary or particularly overwhelming. Each one only has at most one function.

Being able to break out a big formula like this is how a DAX expert is able to work through what otherwise seems like a giant overwhelming formula. A DAX expert can break the whole thing into little boxes and each little box usually only contains a few simple things in it; just like in the example above.

(The DAX expert might not describe these sub-formulas as “boxes within boxes”, but I guarantee that their mind is doing something *like* this.)

For now the above is just a smidge too complicated to fully work through. Measures where there are sub-formulas containing sub-formulas is definitely where the language starts to get interesting; but before your brain can effectively chain these concepts together, it has to be able to understand them deeply in isolation.

That means for new authors, just focus noticing the single layer Per Row Formulas or New Filters Formulas within Measure is a great place to start. When you see SUMX (or any other X function) pull argument 2 out into notepad and say:

“This is the Per Row Formula, DAX will run this once per row.”

and when you see CALCULATE pull argument 1 out into Notepad and say:

“This is the New Filters Formula, DAX will run this with a new set of filters.”

It’ll take a little while, but soon you’re be doing this reflexively and your DAX code will seem much less confusing and overwhelming.

With those ideas set up however, we have to burrow a layer deeper. In the next post we’ll be diving deep into the concept of Evaluation Context. It might sound scary, but most of the key ideas we’ve already set up; like usual, the hardest part is not getting intimidated by the technical sounding terms.

I hope you’re as excited as I am for it.

Till next time!
-BG