A Beginner’s Guide to IF Statements in Tableau
IF statements are a fundamental part of Tableau, as well as other analytics platforms, programming languages, and even tools like spreadsheets. Understanding what they do and how they work is critical when building calculated fields. So, in this blog, I’m going to take you through the basics of IF statements, touching on their components, some rules that govern them, the order of operations, etc. While this blog is targeted at Tableau users, particularly new users, most of what I’ll be sharing is applicable to any other platform or programming language that uses IF statements—for the most part, the only difference will be the syntax.
Components
An IF statement in Tableau looks something like this:
// Group the sales into three categories.
IF [Sales]<=500 THEN
"Small"
ELSEIF [Sales]>500 AND [Sales]<=2000 THEN
"Medium"
ELSE
"Large"
END
If we break this down, we’ll see a handful of
components. Let’s start with the first two lines:
IF [Sales]<=500 THEN
"Small"
In plain English, this would read, “If the Sales are less
than or equal to 500 then return the string value, “Small”. Note: I like to
write the IF statement on one line then the return value on the next, prefixed
by an indent as I think this makes it much easier to read.
The IF keyword tells Tableau we’re about to
perform an IF statement. THEN indicates that we’re about to specify some
return value. In between IF and THEN is what is referred to as a conditional
expression. The conditional expression is essentially just a boolean statement—a
comparison that returns either TRUE or FALSE. In this case, that statement is [Sales]<=500.
This expression leverages a comparison operator to compare the two
values, Sales and 500. There are six comparison operators available
to us:
Operator |
Meaning |
Returns True If… |
= or == |
Equal
To |
Both terms
in the expression have exactly the same values. |
<> or != |
Not Equal To |
The terms have different values. |
> |
Greater
Than |
The
first term has a larger value than the second term. |
< |
Less Than |
The first term has a smaller value than the second
term. |
>= |
Greater
Than or Equal To |
The
first term has either a larger value or the same value as the second term. |
<= |
Less Than or Equal To |
The first term has either a smaller value or the
same value as the second term. |
In our example, we are checking to see if Sales
is less than or equal to 500. If it is, the IF statement will be TRUE. If not,
it will be FALSE. TRUE and FALSE are the only possible results. If the boolean statement
is true, then the value specified after THEN—in this case, “Small”—will be returned.
If not, then we’ll continue onto the next line:
ELSEIF [Sales]>500 AND [Sales]<=2000 THEN
"Medium"
We’ve already written IF so our next conditional
expression must be prefixed with ELSEIF. In the above example, our conditional
expression is [Sales]>500 AND [Sales]<=2000. Unlike the first
example, this expression contains two separate comparisons, [Sales]>500
and [Sales]<=2000, separated by AND. In this case, AND is
what’s called a logical operator. The most common logical operators are
shown below:
Operator |
Definition |
AND |
In order
for the entire conditional expression to be true, the comparisons on the left
and right side of the AND must both be true. If either of them is false, then
the entire statement is false. |
OR |
In order for the entire conditional expression to be
true, at least one of the comparisons on the left or right side of the OR must
be true. The entire statement will only be false if both comparisons are
false. |
NOT |
Unlike
AND and OR, NOT is not used to combine multiple expressions. Instead, it is
used on a single expression and basically returns the opposite of the
expression’s result. In other words, if the expression is TRUE, NOT will make
it FALSE and vice versa. NOT can often be avoided by using different types of
comparison operators. |
Note: It can get very complicated when you start to
chain ANDs and ORs together in conditional expressions. We’ll deal with this
later on in the Order of Operations section.
So, in order for our example statement to be true, the
Sales must be both greater than 500 and less than/equal to 2000. If both
are true, then the IF statement will return “Medium”. If not, it continues onto
the next statement.
ELSE
"Large"
When we use the keyword, ELSE, we do not need a
conditional expression. ELSE simply means that, if none of the earlier
conditional expressions were true, then we’ll return this value. So, in our
case, if the Sales are greater than 2000 then the statement will return “Large”.
Finally, in order for the IF statement to be
completed, it must be closed out with END.
Other than the basic operations detailed above, there
are a few key things you should understand about IF statements, which are
detailed below.
First Wins!
When it comes to IF statements, first always wins! In
other words, IF statements end as soon as a conditional expression is found to
be true. Let’s look at a modified version of our earlier statement.
// Group the sales into three categories.
IF [Sales]<=500 THEN
"Small"
ELSEIF [Sales]<=2000 THEN
"Medium"
ELSE
"Large"
END
We’ve taken the following:
ELSEIF [Sales]>500 AND [Sales]<=2000 THEN
and changed it to:
ELSEIF [Sales]<=2000 THEN
While it might seem like this will change the results,
it will not. To illustrate this, let’s assume that Sales = 100. Both IF
[Sales]<=500 THEN and ELSEIF [Sales]<=2000 THEN are
technically true, but because of the rule noted earlier, the statement will end
as soon as it reaches one true expression. Since the first expression is true,
the IF statement will return “Small” and end—it will never go on to evaluate
the second expression. If, for example, Sales = 600, then the first statement
would be false, so the IF statement would continue on to the second expression.
Since we now know that Sales is greater than 500, it’s redundant to include it
in the conditional expression, so we can exclude that comparison.
Single Row/Partition Only
Another important thing to understand is that an IF
statement will only operate on a single row. Okay, this is technically not
true as Tableau calculated fields and IF statements can compute over a larger
partition of data when some sort of aggregation is being used. In that case, we
then get into the viz level-of-detail and other more complex topics than I want
to discuss here. So, for the sake of keeping this simple, we’re not going to do
any sort of aggregation and, because of that, our IF statements will always operate
at the row level.
Take, for example, this small sample data set.
Actor |
Movie |
Category |
Adam
Sandler |
Billy
Madison |
Comedy |
Mark
Wahlberg |
Ted |
Comedy |
Mark
Wahlberg |
Shooter |
Action |
and the following calculated field:
New Category
// Create single value for actors doing both action and comedy.
IF [Category]="Comedy" AND [Category]="Action" THEN
"Comedy/Action"
ELSE
[Category]
END
You might be tempted to think that Mark Wahlberg will
result in a value of “Comedy/Action” since he has both the “Comedy” and “Action”
categories. But this is not the case, since this IF statement will operate at
the row-level only—it will only ever see the Category from that current row,
resulting in the following.
Actor |
Movie |
Category |
New Category |
Adam
Sandler |
Billy
Madison |
Comedy |
Comedy |
Mark
Wahlberg |
Ted |
Comedy |
Comedy |
Mark
Wahlberg |
Shooter |
Action |
Action |
You could, of course, create a calculated field to combine
the genres of movies that Mark Wahlberg stars in, but that would require some
more complex calculations than we’re going to get into here.
Same Data Type
IF statements must always return the same data type.
Take, for example, this version of our calculated field:
// Group the sales into three categories.
IF [Sales]<=500 THEN
1
ELSEIF [Sales]<=2000 THEN
"Medium"
END
If you were to try this in Tableau, it would give you
the error, “Expected type integer, found string. Result types from ‘IF’ expressions
must match.” While this is a somewhat cryptic error message, it’s simply
stating that the first IF statement returned an integer, but the following
ELSEIF returns a string. We can’t do this as both must be the same data type—we
must choose either an integer return value or a string, not both.
There’s Always an ELSE
A final thing you should understand is that IF
statements always include an ELSE, whether you explicitly include them or not. In
our earlier example, we explicitly included an ELSE, but what if we excluded it
as shown below?
// Group the sales into three categories.
IF [Sales]<=500 THEN
"Small"
ELSEIF [Sales]<=2000 THEN
"Medium"
END
In this case, any row with Sales greater than 2000
would return a value of NULL. Excluding the ELSE is essentially like writing ELSE
NULL.
// Group the sales into three categories.
IF [Sales]<=500 THEN
"Small"
ELSEIF [Sales]<=2000 THEN
"Medium"
ELSE
NULL
END
NULL is a special value that indicates that there is
no value. It can also be any data type, which means that it’s always a valid
return value.
Of course, there is nothing stopping you from
explicitly writing ELSE NULL. In fact, I do this sometimes to make it
easier for others to understand the logic of a particular IF statement or to
make note of the fact that I realize there are other potential values, but I
want those to return NULL.
Order of Operations
Now that we’ve explored the anatomy of IF statements
as well as some of the rules that govern their usage, let’s talk a bit about
some more complex logical issues we may encounter. Here’s an example I see
quite frequently:
IF [Category]="Furniture" OR [Category]="Office Supplies" AND [Order Date]>=#01/01/2020# THEN
You might think that this will give you anything where
the Order Date falls on/after January 1, 2020 and has a category of either “Furniture”
or “Office Supplies”. But you’d be wrong as these different logical operators
have a specific order of operations. This order of operations is just like the
one used in equations in mathematics. The following graphic shows the order of
precedence for logical statements:
In other words, the first thing to be computed in a
logical statement are the parentheses, which group comparisons together. Next
comes NOT, followed by AND, then OR. So, let’s use this to better understand
the IF statement above. Since AND computes before OR, the two comparisons
separated by AND will be evaluated first:
[Category]="Office Supplies" AND [Order Date]>=#01/01/2020#
So, this will give us anything with a category of “Office
Supplies” and an order date on/after January 1, 2020.
The OR will then act as if the above statement is a
singular expression. So, in English, the original statement will give us anything
where 1) Category is “Furniture” OR 2) Category is “Office Supplies” and order
date is on/after January 1, 2020. Furniture sales from 2019 will be
included since they meet criteria # 1.
You can probably see how, as stated earlier, IF
statements that mix different logical operators can get very confusing as you
need to think about the order in which the operators will execute. Fortunately,
we can use parentheses to group different comparisons. For example, if let’s
change this statement to return anything where the Order date falls on/after
January 1, 2020 and has a category of either “Furniture” or “Office Supplies”.
To do this, we can add parentheses (shown in red below) to group the category
conditions together.
([Category]=”Furniture” OR [Category]=”Office Supplies”) AND [Order Date]>=#01/01/2020#
These parentheses will force the IF statement to first
evaluate the conditions between them—is the category “Furniture” or “Office
Supplies”? Then it will evaluate this expression, as a whole, along with
the expression on the other side of the AND, which is exactly what we want.
Since it can be difficult to think through the order
of operations as you’re writing an IF statement, my personal recommendation is
to always use parentheses to group your conditions any time you’re mixing
logical operators. If all of your operators are ORs or all of them are ANDs,
then this is not necessary, but as soon as you mix ANDs, ORs, and NOTs, it’s a
good practice to explicitly insert parentheses so that you can be 100% of the
order in which each condition is evaluated.
Variants of IF
Before I wrap this blog, I want to point out a few
variants of IF that are available in Tableau and some other platforms/programming
languages.
One such variant is IIF, which allows you
perform a simple IF/ELSE statement within a concise function call. It takes three
parameters—the condition, the result if true, and the result if false.
For example, let’s look at the following statement.
IF [Sales]<=500 THEN
"Small"
ELSE
"Medium"
END
We could convert this to IIF as follows:
IIF([Sales]<=500, "Small", "Medium")
While the function only supports one IF and one ELSE,
you can use nested IIF statements to mimic the functionality of IF/ELSEIF/ELSE
statements. However, nested IIFs can be very difficult to read, so in my opinion,
you’d be much better off writing out the whole IF statement.
Note: If you come from an Excel background, then you’ll
recognize this as it’s pretty much exactly how you write IF statements in
Excel.
Another variant is the CASE statement. The
functionality of case statements varies quite a bit between different platforms.
In Tableau, it is simply a more concise, yet less powerful method for performing
multiple comparisons. However, in platforms such as SQL, CASE allows you to do
everything you can with an IF statement. In fact, pure ANSI standard SQL doesn’t
even include IF statements at all—you must use CASE.
Let’s look at the following IF statement.
IF [Sub-Category] = 'Accessories' THEN 'Group A'
ELSEIF [Sub-Category] = 'Appliances' THEN 'Group A'
ELSEIF [Sub-Category] = 'Art' THEN 'Group B'
ELSEIF [Sub-Category] = 'Binders' THEN 'Group B'
ELSEIF [Sub-Category] = 'Bookcases' THEN 'Group C'
ELSEIF [Sub-Category] = 'Chairs' THEN 'Group D'
ELSEIF [Sub-Category] = 'Copiers' THEN 'Group D'
ELSEIF [Sub-Category] = 'Envelopes' THEN 'Group D'
ELSEIF [Sub-Category] = 'Fasteners' THEN 'Group E'
ELSEIF [Sub-Category] = 'Furnishings' THEN 'Group E'
ELSEIF [Sub-Category] = 'Labels' THEN 'Group F'
ELSEIF [Sub-Category] = 'Machines' THEN 'Group G'
ELSEIF [Sub-Category] = 'Paper' THEN 'Group H'
ELSEIF [Sub-Category] = 'Phones' THEN 'Group I'
ELSEIF [Sub-Category] = 'Storage' THEN 'Group I'
ELSEIF [Sub-Category] = 'Supplies' THEN 'Group I'
ELSEIF [Sub-Category] = 'Tables' THEN 'Group J'
END
This is a simple IF statement in that each IF/ELSEIF
performs a simple equality comparison based on the same field, Sub-Category.
In this scenario, we can replace it with a CASE statement:
CASE [Sub-Category]
WHEN 'Accessories' THEN 'Group A'
WHEN 'Appliances' THEN 'Group A'
WHEN 'Art' THEN 'Group B'
WHEN 'Binders' THEN 'Group B'
WHEN 'Bookcases' THEN 'Group C'
WHEN 'Chairs' THEN 'Group D'
WHEN 'Copiers' THEN 'Group D'
WHEN 'Envelopes' THEN 'Group D'
WHEN 'Fasteners' THEN 'Group E'
WHEN 'Furnishings' THEN 'Group E'
WHEN 'Labels' THEN 'Group F'
WHEN 'Machines' THEN 'Group G'
WHEN 'Paper' THEN 'Group H'
WHEN 'Phones' THEN 'Group I'
WHEN 'Storage' THEN 'Group I'
WHEN 'Supplies' THEN 'Group I'
WHEN 'Tables' THEN 'Group J'
END
As you can see, this requires a bit less text and is a
little easier to write. However, as noted above, CASE statements in Tableau
require 1) you are comparing to a single field, which is listed after the CASE
keyword and 2) you are performing equality comparisons only—you cannot perform
any inequality comparisons (>, <, etc.) in a CASE statement.
Note: Be sure to check out Kevin's Case Statement Generator if you need to crank out CASE statements with lots of values.
Conclusion
IF statements (and their variants) are incredibly
powerful and an absolute requirement for anyone creating calculated fields in
Tableau. But, if you don’t come from a programming or analytics background,
they can be a bit daunting. While I’ve only just scratched the surface on what
you can do with IF statements in Tableau (and other platforms), I’m hopeful
that this short guide will help you to better understand how they work and will
help you to avoid many of the pitfalls that others have encountered when first
learning them.
Thanks for reading! If you have any comments, please
feel free to leave them in the comments section below.
Ken Flerlage, February 15, 2021
Hi Ken, good article, as usual.
ReplyDeleteHowever, I would include that IF THEN and IIF treat NULLs differently by default and you should know that before deciding which one to use.
The first part of this article summarizes it nicely.
https://interworks.com/blog/tmcconnell/2015/02/10/case-statements-vs-if-statements-tableau
One thing I would add that is not in the article is that the IIF function has a fourth argument to categorize the NULLS
Great point. I'll add that. Thanks for the feedback!
DeleteHola Ken. Felicitación como siempre tu ayuda es oportuna.
ReplyDeletePor favor para cuando tengas un tiempo, tendrás algún articulo para realizar actualización incremental deseo actualizar y remplazar solo los datos de la ultima semana. mi base de datos esta en SQL Server. y diariamente actualizo los dashboard, pero demora porque actualmente esta actualizando todos los años.
I agree about the null fields, it is important to know what formula to apply for a correct result. Excellent article.
ReplyDeleteGreetings from Argentina @mariano35
Great article - Thanks for sharing!
ReplyDeleteKind regards from France ! Alex
Its really great article, have cleared lot of my doubts but would be good to have different error covered while writing IF statements, especially by newbies
ReplyDeleteVery logically and neatly written, thanks!
ReplyDeleteI'm new to Tableau and this was extremely beneficial. Thank you.
ReplyDeleteOur pleasure!
Delete