Understanding INCLUDE and EXCLUDE LODs
Kevin and I are pleased to welcome back Kirk Munroe for another guest post. Kirk lives in Halifax, Nova Scotia, Canada and is a business analytics and performance management expert. He is currently one of the owners and principal consultants at Paint with Data, a visual analytics consulting firm, along with his business partner and wife, Candice Munroe, a Tableau User Group Ambassador and former board member for Viz for Social Good.
Kirk is also the author of Data Modeling with Tableau, an extensive guide, complete with step-by-step explanations of essential concepts, practical examples, and hands-on exercises. The book details the role that Tableau Prep Builder and Tableau Desktop each play in data modeling. It also explores the components of Tableau Server and Cloud that make data modeling more robust, secure, and performant.
------------
There are many articles, community forums, blog posts, videos, etc. on Tableau’s level of detail expressions (LODs). In fact, Ken wrote a great one sharing 20 uses for LOD calcs. But most of the content is focused on the most common one, FIXED, even though there are two other types–EXCLUDE and INCLUDE. I expect this is a combination of the limited use cases for EXCLUDE and INCLUDE, as well as the nuance and complexity of two key Tableau concepts–the Order of Operations and the viz level-of-detail–and how they both impact query results.
The Tableau Order of Operations has
also been discussed in much detail elsewhere. So, for this blog, we are going
to focus specifically on context filters, dimension filters, and where LODs fit
with them.
First of all, what does it mean to be
“in the view” or “at the level of detail of the viz”? Any field that is on
rows, columns, or on the marks card (except for being solely on tooltip) is “on
the view.” Fields that are only on pages and/or filters are “on the canvas”,
but not in the view.
Where might this have tripped you up
before? If you try to “hide all unused fields” and then notice errors because
some of the fields you were using for filters are gone?!?! This is because they
were only on the filters shelf and not in the view anywhere. Hide all unused
fields really means, “all unused fields in the view on any worksheet!” As for
the pages shelf, it might have tripped you up during the short period of time
when bar charts races were the rage.
Level
of Detail & FIXED LODs
Why is being in the view so
important? Tableau will disaggregate your data to whatever is in the view. If
you want your data aggregated to a level of detail (see what Tableau did there)
that isn’t on the view, you need an LOD. In other words, level of detail
calculations are really “level of aggregation NOT based on the detail in the
view.” Marketing has a tough job!
Let’s explore this concept starting
with the basics. Using the Superstore dataset that comes with Tableau, create a
new worksheet and drag Sales onto
the Text card. This will give you a number. (Your number might be different
than mine depending on the version of Superstore data you are using):
What is this number, why “SUM” and
what does it mean? Again, Tableau will always aggregate your data to the level
of detail in the view. In this case, our default aggregation method for sales
is SUM. We can change this in the view or on the field. The number itself means
the sum of ALL the sales in your data because you have not given Tableau
anything to disaggregate it based on.
Next, drag Category to columns (or double-click on it!). Now, you notice that
the sum of sales doesn’t mean the same thing. It means “break down total sales
based on each category”. In SQL-speak, we are aggregating our sales grouped by category. Bring Sub-Category into the view by
double-clicking it or dragging it to the right of Category. Now we have the SUM of sales broken down by Category and Sub-Category.
But, what if we want to see sales
totals for each Category as well?
This is a great example to show how filters work with our query results. First,
go to the Analysis menu, select Totals,
then choose both Show Column Grand
Totals and Add All Subtotals.
Your view should now look like this:
Tableau is automatically creating
table calculations to give us the subtotals and totals. Let’s quickly review
how FIXED LODs work before moving to INCLUDE and EXCLUDE. Create the following
calculated field:
What does this calculation mean? It
means group the sum of sales by every category, regardless of what is in our
view.
Double-click on your new calculation
to bring it into the view. The result should be:
Notice that the new value matches the
sub-totals for each of the Categories.
Now, let’s look at how the table
Order of Operations come into play. Drag Sub-Category to the filters shelf and
remove Accessories, Appliances, and Bookcases. Now, the results should look
like:
Notice that the Category Totals no
longer match the sub-totals for each Category. Why? The reason is the Order of
Operations. FIXED LODs are calculated before
dimension filters. So Tableau is taking the entire data set and calculating
Category Total. After that’s done,
it filters the view on Sub-Category.
You might want this in some cases, but in others, you might want the Sub-Category filter to compute first.
To do this, we can change that filter to a Context
Filter, which computes before FIXED LODs. To do this, right-click the
filter and select “Add to Context”. The Category Total now matches the
subtotal.
INCLUDE
& EXCLUDE
But wasn’t this post supposed to be
focused on INCLUDE and EXCLUDE? Correct! This background was necessary, but now
we are ready to move on.
What do EXCLUDE and INCLUDE mean?
According to Tableau, “EXCLUDE level of detail expressions explicitly remove
dimensions from the expression—that is, they subtract dimensions from the view
level of detail.” And “INCLUDE level of detail expressions compute values using
the specified dimensions in addition to whatever dimensions are in the view.”
Huh? Let’s take a closer look…
Let’s take Sub-Category off the filters shelf and look at what this means. To
make it a little more obvious, let’s drag Ship
Mode to the right of Sub-Category
on the Rows shelf. Your view should now look like:
Our two columns contain the FIXED
Sales at the Category Level and the SUM(Sales) broken down by Category, Sub-Category, and Ship Mode.
Let’s write a new calculation:
Notice that we haven’t specified any
dimensions after “EXCLUDE”, meaning that we’re not excluding any dimensions
from the aggregation. Let’s double-click on the field to add it to the view and
see what we get:
We get the exact same thing as plain
old SUM(Sales), with the exception
that Tableau can’t total one of them, resulting in a * value instead. Why?
Because we didn’t tell Tableau to exclude anything from the calculation!
Let’s go back and edit EXCLUDE and add Sub-Category
to the calculation:
Now our numbers have changed!
What is going on here? Our EXCLUDE
calculation is now calculating the sum of sales aggregated to everything in the
view IGNORING (i.e. excluding) Sub-Category.
Essentially, each row is calculating the sum of Sales by Category and Ship Mode.
Understandably, the totals will be repetitive–notice that the First Class Ship Mode has the same value for all of
the Sub-Categories within the
Furniture Category.
We could verify this by creating a
FIXED which has all the dimensions in the view in it except for the ones in the
EXCLUDE. That is, in our case,
We can see that this produces the
same results as the EXCLUDE LOD.
One big difference between these
calculations is that we can decide if we want to filter before or after
applying our FIXED LOD by leveraging context filters. We cannot do this with
EXCLUDE and INCLUDE LODs because they are always calculated after both context and dimension filters
in the Order of Operations.
FIXED LODs, on the other hand, can
compute after a filter if we make it a context filter.
Let’s go back to our viz and
double-click our new calculation to the view. Now, let’s see the difference in
filtering. Add Sub-Category to the
filter shelf and deselect Accessories, Appliances, and Bookcases:
Hmmm…the calculations aren’t the same
anymore. This is because the FIXED calculated field has calculated the result
(i.e. aggregated sales) before the filter is applied and the EXCLUDE after. If
we add the filter to context the number will match.
I should say here that this is the
biggest confusion I see when training and coaching Tableau users–even users who
have advanced Tableau skills. People have an intuition that the missing Sub-Categories from the filter should
be ignored (excluded from consideration!) but it doesn’t work that way because
filters are not part of the view.
This is the single most important
thing to take away from this post in my experience–if your use case requires
you to control whether or not a filter is applied before or after a calculation
AND you require an aggregation that isn’t on the view, your only option is a
FIXED LOD. (If you want to control the filter before or after AND the fields
required for the level of aggregation are in the view, it is usually a table
calculation filter to the rescue, but that would be another blog post
entirely!).
When to Use EXCLUDE & INCLUDE
So, when should we use an EXCLUDE
LOD? I’ve been using Tableau (almost) daily for 7 years and I still can’t find
a good use case. Bethany Lyons has a great post on top
15 LOD expressions which she highlights using an
EXCLUDE for comparative sales analysis and Tableau Tim covers the same use case on
YouTube. I recommend checking those out.
However, I would personally use a FIXED LOD for that.
In theory, if you were publishing a data source that others would use and you
didn’t know what dimensions they would add to the view, you might want to
create an EXCLUDE, but it could be difficult to fully understand the results. I
would love to hear any use cases you regularly use, so please share them in the
comments?
What about INCLUDE? INCLUDE LODs let
you aggregate data at a level that is lower than the level of your viz. That
is, INCLUDE includes the viz level of detail PLUS whatever dimension(s) you
have in your calculation. This one always seems confusing. How can the person
viewing the viz understand a value that isn’t on the viz and is less aggregated
than what is in the viz?
I can only think of one example and
it is one I use frequently. To understand it, let’s look at the aggregation
options in Tableau, namely: SUM, MEDIAN, AVERAGE (MEAN), COUNT, COUNT DISTINCT,
MINIMUM, MAXIMUM, PERCENTILE, STANDARD DEVIATION, and VARIANCE. That is a lot
of aggregations! In my opinion, two important ones are missing–FIRST and LAST.
You might be thinking, “I’ve seen
LAST and FIRST in Tableau before!” Tableau does have those functions, but only
with table calculations. Table calculations need all the data in the view to be
used and sometimes we don’t want or need that.
How are FIRST and LAST different from
MINIMUM and MAXIMUM? Imagine you have a date field on the view and it is
aggregated to the month level. You also have sales in the view. To make this
thinking a bit easier, let’s also say that data is aggregated to the day level.
MINIMUM would return the sales of the day with the least amount of sales and
MAXIMUM would return the sales on the day with the highest sales in the month.
FIRST and LAST, on the other hand, would give us the sales on the first and
last days of the month, respectively. There is no (native) aggregation to get
you these values in Tableau and the level of day is lower than the level of
detail of the view. For this reason, the most efficient way to find those
numbers is with an INCLUDE LOD.
As a quick aside, there are a lot of
numbers we might want at the opening or closing of a period that cannot be
otherwise aggregated–headcount, stock values, many SaaS metrics like monthly
recurring revenue (MRR), and more. I’ve even had cases where clients have a
running sum of sales in their database at the day level–to find out monthly
growth of sales, you need to compare the single value at the end of each month.
Let’s create a new sheet with the
same Superstore data source.
Bring Order Date (as a continuous quarter) to Columns and Sales to Rows. You should have a sheet
that looks like this:
What we are seeing is the sum of
sales for each quarter in our data. What if we wanted to see sales on the last
day of each quarter?
First, create the following
calculation:
What is this calculation saying? It
is saying to get the maximum date at the level of detail of the viz. Since Order Date is a date (and not a
date-time) this means the day of the maximum date based on the level of the
viz.
If we wanted to get the first day of
the period, we would replace MAX with
MIN.
Next, let’s create another
calculation:
This calculation is getting the value
for sales when the Order Date equals
the last day of whatever aggregation Order
Date is on the view. Drag our new field over SUM(Sales) on rows to replace it. You should see:
My favorite part of INCLUDE
calculations is how they respect the level of the data in the view. Click on
the + in front of QUARTER(Order Date). Now we have the sum of sales for
the last day of each month! No complicated date calculations to get the level
of detail correct!
As with EXCLUDE LODs, INCLUDE LODs
happen after dimension filters in the Order of Operations, so context filters
have no impact. However, with INCLUDE, this always makes sense since they work
at a level lower than the level of detail in the view. If the data is filtered
out of the view, it should be filtered out of the calculation.
Wrap-Up
Phew! That was a long one. The
obvious takeaway from this post is that you probably don’t even EXCLUDE LODs
and there is one really powerful use case for INCLUDE LODs. But, I hope you got
more out of it than that! What these calculations, and this post, hopefully
demonstrated to you is a deep understanding of Tableau’s Order of Operations
and what it means to be “in the view.” In my opinion, these are the two key
concepts that need to be understood to get someone off the plateau between
“experienced” and “advanced” with Tableau.
Kirk Munroe, August 19, 2024
Suppose you create a bar chart for category and need manufacturer in the view to filter another chart/table with another data source with field manufacturers via filter action. Adding manufacturer on detial will chop the solid bar to multiple tiny parts. Doesn't look pretty good but exclude manufacturer : sum of sales calc and turning aggregation off in the analysis menu will do the job. Now clicking the category field will send your manufacturers in the filter action and you have solid bar again. Credit to Bethany as well
ReplyDeleteLove it!
DeleteI'm confused - did Ken write this, or did Kirk Munroe, because Ken's name is in the byline, but Kirk's is at the end of the article?
ReplyDeleteKirk did. I forgot to make that clear at the beginning, but that's fixed now.
Delete