Kirk Munroe: 14 Scenarios and When to Use Tableau Prep vs Desktop?
Kevin
and I are pleased to welcome back Kirk Munroe for the second blog in his series
about data modeling with Tableau. Kirk lives in Halifax, Nova Scotia, Canada
and is a business analytics and performance management expert. He is currently
one of the owners and principal consultants at Paint
with Data,
a visual analytics consulting firm, along with his business partner and wife,
Candice Munroe, a Tableau User Group Ambassador and former board member for Viz for Social Good.
Kirk is also the author of Data Modeling with Tableau, an extensive guide, complete with step-by-step explanations of essential concepts, practical examples, and hands-on exercises. The book details the role that Tableau Prep Builder and Tableau Desktop each play in data modeling. It also explores the components of Tableau Server and Cloud that make data modeling more robust, secure, and performant. This blog will give you a taste of what’s available in the book, so if you find this useful, go buy the book! Also be sure to check out Kirk’s previous blog, 4 Common Tableau Data Model Problems…and How to Fix Them.
Desktop vs Prep
When modeling
data in Tableau, one of the most common questions I hear is, “when do I use
Tableau Prep and when should I use Desktop?” In this post, I am going to do my
best to answer these questions in a pragmatic manner.
Why pragmatic?
Data modeling often ends up in a very theoretical, almost dogmatic,
conversation. In the context of Tableau, this often means that many of the
items that are discussed here would ideally be done earlier in the data
pipeline. I acknowledge that many of these items might be in the vein of “once
the horse is out of the barn” but, in my experience, data engineering teams are
regularly overtaxed and often have strict development processes that result in
weeks (or months) for change requests. Business users don’t want to wait that
long. So, we are going to make three assumptions before diving into our
scenarios:
1) It’s In Your Hands - You, the reader, are tasked with coming up with an analysis for decision-makers in your organization and you don’t have the ability, funding, or time to go back to an earlier point in the data pipeline.
2) You have a Tableau Creator License (which gives you access to both Desktop and Prep) and don’t have access to other data prep tools. In other words, we are going to solve these use cases using only Tableau products.
3) Your organization has Tableau Prep Conductor on Tableau Server or Cloud. Prep Conductor is part of the Data Management Add-On and, without it, you cannot easily schedule Prep workflows to refresh automatically. In that case, the choice of Prep vs Desktop becomes more difficult, as you can easily refresh data models created in Desktop, but Prep would require you to either refresh them manually or use unwieldy batch files.
We are going to
use the language of Prep and Desktop for brevity. Prep represents the Tableau Prep
Builder client software or New Flow capability
on the web. Desktop represents the
Tableau Desktop client software or the New
Workbook and New Published Data
Source capabilities on the web.
One other
important concept to always keep in mind is the trade-off between the cost of a
query and the user experience (UX) for your end users. What is meant by the
cost of a query? The simplest way to think of this is how long the query takes
to run. A “cheap” query runs quickly (e.g. subsecond) and an “expensive” query
takes longer to run. In the cloud computing era, this can often map exactly to
the monetary metaphor it represents if you are paying by server utilization.
The concept is still important with on-premises database servers, as a longer
query means longer queues for subsequent queries. We want to optimize our
queries to be as quick (cheap) as possible but it isn’t always possible. In
these cases, we want to pay the cost of these queries as infrequently as
possible.
What does all
this mean in the context of Prep vs Desktop? Prep will typically pay the cost
of the query once per output–commonly once a day. In the case of a live
connection, Desktop will pay the cost of the query every time someone interacts
with the viz. This means, the more expensive the query, the better suited it is
for Prep.
Of course, when
we’re working with extracts, that’s not exactly the case. In our scenario,
we’re most likely going to output our Prep flows to a hyper extract. You can,
of course, do the same thing with Desktop. By doing this, we’re reducing the
cost of the query to once, just like in Prep. Of course, an extract is, in
fact, a database, so Tableau is always executing queries against it. But, at
that point, the question of Prep vs Desktop becomes less about how many times a
query is run and more about the complexity of those queries, which we’ll touch
on in these scenarios.
What about the
end-user experience? In many situations, this is what you have to trade off
against query cost. Let’s take a common use case that we won’t cover in this
post. In the Designing
Efficient Workbooks whitepaper (which is an amazing
resource, so I highly recommend having it handy at all times!), the authors
talk about how expensive “only relevant values” filters are in Tableau. This is
a trade-off I would make for user experience every time (assuming I couldn’t
get rid of the quick filters on the dashboard altogether 😊). When a
business user comes across a quick filter with values that, if selected, result
in a viz with no data, the experience is frustrating and often appears to be a
bug. It can result in them not trusting or even using your viz. Worth the trade-off in performance! Of course, a
slow-running query is not good for user experience either, so there’s a
delicate balance for us to strike.
Scenarios
Now let’s jump
into our scenarios. For each, I’ll provide one of the following
recommendations:
Use Prep - Only Prep has the feature(s) needed.
Lean Prep - This is a case where it can be done in either, but it is better done in Prep. If you’re already building a Prep flow for other purposes, then I’d highly recommend pushing this particular functionality into that flow.
Either - Both products work just fine, so you’ll need to consider the cost of the query and user experience, then pick the tool that makes the most sense for your use case.
Lean Desktop - It can be done in either, but it is better done in Desktop. If you’re not already building a Prep flow for other purposes, then it’s probably not worth creating one specifically for this scenario.
Use Desktop - Only Desktop has the feature(s) needed.
I’ve sorted these
scenarios from Use Prep to Use Desktop as that is the order we
would typically encounter them in the data pipeline.
1) Pivot Rows to Columns - Use Prep
Why: While you can sort of force Desktop to do something similar to this, the Pivot Rows to Columns feature only exists in Prep.
When
might you encounter it: When your data is too narrow as
described in 4 Common Tableau Data
Model Problems…and How to Fix Them.
Exceptions:
None. Trying to deal with this in Desktop results in bad UX and costly queries.
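Prep’s Rows-to-Columns pivot turns each distinct value of a field into its own column. As a rough sketch of the reshaping involved (using pandas with made-up column names as an analogy, not anything Tableau-specific):

```python
import pandas as pd

# Narrow ("too tall") data: one row per measure, as described in the
# "4 Common Tableau Data Model Problems" post.
narrow = pd.DataFrame({
    "Order ID": [1, 1, 2, 2],
    "Measure":  ["Sales", "Profit", "Sales", "Profit"],
    "Value":    [100.0, 20.0, 250.0, 75.0],
})

# The equivalent of Prep's Rows-to-Columns pivot: each distinct value of
# "Measure" becomes its own column, one row per Order ID.
wide = narrow.pivot(index="Order ID", columns="Measure",
                    values="Value").reset_index()
```

After the pivot, Sales and Profit are ordinary columns that can be aggregated and compared directly, which is exactly what the narrow shape makes awkward.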
2) Cleaning Data - Use Prep
Why:
This feature is somewhat limited in Desktop. Prep has robust machine learning
algorithms to clean and profile data as you work.
When
might you encounter it: When the source systems that
produce your data don’t properly enforce text field input validation. Or, in
general, you’re working with some sort of messy data set. In other words, a
lot!
Exceptions:
There are a few scenarios where, if you have only a little bit of data cleaning
to do, you might be able to knock that out in Desktop. But, I’d recommend
against that as data profiling is built into Prep, allowing you to catch
cleanliness problems you would have never even noticed otherwise.
3) Date Scaffolding - Use Prep
Why:
While possible to do this in Desktop, it’s very complicated. Prep, on the other
hand, includes easy-to-use native functionality for date scaffolding.
When
might you encounter it: When your data is too short as
described in 4 Common Tableau Data
Model Problems…and How to Fix Them.
Exceptions:
None
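The idea behind date scaffolding is to join your sparse data onto a complete calendar so that dates with no activity still get a row. Here is a minimal pandas sketch of the concept, with invented sample data (Prep’s native New Rows step does this for you):

```python
import pandas as pd

# Sparse data: only days with sales appear, so date gaps are invisible.
sales = pd.DataFrame({
    "Date":  pd.to_datetime(["2023-01-01", "2023-01-03", "2023-01-06"]),
    "Sales": [100.0, 250.0, 80.0],
})

# Build the scaffold: one row for every day in the range...
scaffold = pd.DataFrame({
    "Date": pd.date_range(sales["Date"].min(), sales["Date"].max(), freq="D")
})

# ...then left-join the real data onto it, so missing days appear as
# explicit rows (filled with 0) instead of silently disappearing.
scaffolded = scaffold.merge(sales, on="Date", how="left").fillna({"Sales": 0.0})
```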
4) Row-Level Calculations - Lean Prep
Why:
A row-level calculation is always an option in Prep as it has all rows and the
calculation is not dependent on the viz level-of-detail. If you do these
calculations in Desktop, you might pay the price for the same query multiple
times. (As noted earlier, in the case of extracts, Desktop will materialize
these calculations. In other words, it will only pay the cost of the query when
it rebuilds the extract, the same as Prep. However, it is not always easy to
predict when Hyper materializes a calculation and when it doesn’t.)
When
might you encounter it: It’s pretty rare to not have any
need for row-level calculations, so it’s likely you’ll have to deal with this
almost every time.
Exceptions:
If you aren’t otherwise using Prep to build your data model and you test in
Desktop (or with Performance Recorder) and it runs quickly.
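For illustration, this is what a materialized row-level calculation looks like, sketched in pandas with made-up fields; the Tableau equivalent would be DATEDIFF('day', [Order Date], [Ship Date]):

```python
import pandas as pd

orders = pd.DataFrame({
    "Order Date": pd.to_datetime(["2023-01-01", "2023-01-04"]),
    "Ship Date":  pd.to_datetime(["2023-01-03", "2023-01-09"]),
})

# A row-level calculation depends only on fields in the same row, so it
# can be computed once upstream and stored with the data, rather than
# being re-run on every query.
orders["Days to Ship"] = (orders["Ship Date"] - orders["Order Date"]).dt.days
```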
5) Integration with Data Science Models - Either
Why:
If the data science model is intended to run once per refresh of the data
model, use Prep. If it is being used interactively based on user input, use
Desktop. For example, you might need to score a large amount of data for
customer sentiment; in this case, since it’s being done once, you can use Prep.
Another example is interactively deciding on a customer's propensity to churn
based on parameter inputs; in this case, the output is dependent upon user
interaction, so you must use Desktop.
When
might you encounter it: Any time you’re using an analytics
extension (R, Python, etc.) to process data.
Exceptions:
N/A
6) Filtering Data - Either
Why:
Both Prep and Desktop make it easy to filter both rows and columns of data.
When filtering is needed to create your data model (for example, maybe you only
need to show the last 3 years), then Prep is best. When filtering needs to be
interactive for your user, then you must use Desktop.
When
might you encounter it: Filters are used in almost
everything I create, so you’ll encounter this in most scenarios. When deciding
whether to use Prep or Desktop, you should focus on whether or not those
filters need to be available to users.
Exceptions:
N/A
7) Union Data - Either
Why:
Both Prep and Desktop make it easy to union data. My preference is to do this
in Prep if possible as it will greatly simplify your resulting Tableau Desktop
data model and could help with query performance.
When
might you encounter it: Fairly often, especially when
legacy systems dump data to flat files for ingestion into analytics tools like
Tableau.
Exceptions:
N/A
8) Joining Tables that Don’t Cause Aggregation Issues - Either
Why:
Both Prep and Desktop make it easy to join tables. Most joins do not cause
aggregation issues as you will be joining dimension tables to fact tables. An
example would be a Superstore-like scenario in the “real world.” Superstore is
already a highly denormalized table. It would have likely been created by
joining a fact table of sales with a few dimension tables. For instance, for product analysis, the fact (sales) table would have contained only a Product ID, which would have been joined to a product table with fields for Category, Subcategory, and Product Name. This is a pretty common type of join and does not cause aggregation issues, as the result of the join leaves the level of aggregation the same as it was before the join–individual sales at the product level.
When
might you encounter it: Frequently. It’s very common to
need to join multiple tables together.
Exceptions:
Not an exception, but this might have a slight “Lean Prep” because the Prep UI
makes the results of your join immediately obvious in a visual manner. However,
the nature of relationships might cause you to rethink this, as I’ll address
later in this post.
9) Pivot Columns to Rows - Either
Why:
Both Prep and Desktop make it easy to pivot columns to rows. As with unioning,
I prefer pushing this to Prep in order to simplify your Desktop data model.
Prep also has a little more functionality here, particularly the ability to use
wildcards when pivoting.
When
might you encounter it: When your data is too wide as
described in 4 Common Tableau Data
Model Problems…and How to Fix Them.
Exceptions:
N/A
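Prep’s wildcard pivot can grab every column whose name matches a pattern (for example, `Sales *`). A pandas analogue, with invented column names, is to select the columns by prefix and melt them:

```python
import pandas as pd

# Wide ("too wide") data: one column per month.
wide = pd.DataFrame({
    "Product":   ["A", "B"],
    "Sales Jan": [10, 40],
    "Sales Feb": [20, 50],
})

# Select every column matching the "Sales *" pattern, then pivot
# (melt) them down to rows -- one row per product per month.
value_cols = [c for c in wide.columns if c.startswith("Sales ")]
tall = wide.melt(id_vars="Product", value_vars=value_cols,
                 var_name="Month", value_name="Sales")
tall["Month"] = tall["Month"].str.replace("Sales ", "", regex=False)
```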
10) Aggregate Calculations - Use Desktop
Why:
Aggregate calculations are typically dependent on the fields on the viz, so
they should be done in Desktop. For example, consider profitability. You can
create a SUM(Profit)/SUM(Sales) calculation in Desktop then use it to find profitability
at the region level, customer level, or any level you choose–all using a single
calculation. In Prep, you’d have to know those levels of detail ahead of time,
then create individual fields for each one.
When
might you encounter it: Aggregate calculations are very
common and are typically used in most Tableau workbooks.
Exceptions:
MIN and MAX calculations could make sense in Prep. Imagine that you want to
know when a customer first bought a product, using Superstore as an example.
The calculation { FIXED [Customer Name] : MIN([Order Date])} is not dependent
on the level of the viz. Why not ensure you only pay the cost of that query
once and put it in Prep?
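The reason SUM(Profit)/SUM(Sales) belongs in Desktop is that a ratio of sums must be computed after aggregating to the level of the viz, and that level changes from view to view. A pandas sketch with made-up data makes the dependence on grouping level visible:

```python
import pandas as pd

sales = pd.DataFrame({
    "Region":   ["East", "East", "West", "West"],
    "Customer": ["A", "B", "C", "C"],
    "Sales":    [100.0, 200.0, 400.0, 100.0],
    "Profit":   [20.0, 30.0, 120.0, 30.0],
})

# One definition, evaluated at whatever level the "viz" needs:
# sum the parts first, then divide.
def profitability(df, level):
    g = df.groupby(level)[["Profit", "Sales"]].sum()
    return g["Profit"] / g["Sales"]

by_region = profitability(sales, "Region")      # a regional view
by_customer = profitability(sales, "Customer")  # a customer view
```

If the ratio were materialized in Prep, you would need a separate pre-aggregated field for every level you might ever view, which is exactly the limitation described above.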
11) “Conditional” Calculations - Use Desktop
Why:
These are calculations that are “conditional” depending on what is placed on
the view or based on some data input. Generally speaking, these tend to be
table calculations (which will always compute differently based on the
dimensions on the view) and some types of level-of-detail (LOD) calculations,
particularly INCLUDE and EXCLUDE. In such cases, Desktop must be used.
When
might you encounter it: It’s somewhat often that you’ll
need to use these types of calculations in your work.
Exceptions:
In some simple cases where a calculation is conditional on a single value, Prep
could be used (particularly using a parameter). There are also other scenarios
where the only option in Desktop is the use of a table calculation, but Prep
could be used to limit such a need. For a good example of such a scenario, see
the Get Data from a Prior Row/Partition
section of Ken’s blog, 3
Creative Uses for Relationships/Joins in Tableau.
The blog shows how to address the problem using Desktop, but we think it would
be more effective to handle this upfront in Prep. The solution pushes the work
to the data model, resulting in the elimination of any need for complicated
table calculations.
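To make the prior-row idea concrete, here is a pandas sketch (with invented data) of materializing the previous row’s value upstream, which removes the need for a LOOKUP() table calculation in Desktop:

```python
import pandas as pd

daily = pd.DataFrame({
    "Date":  pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
    "Sales": [100.0, 150.0, 120.0],
})

# "Get data from a prior row": sort, shift the measure down one row,
# and store both the prior value and the day-over-day change with the data.
daily = daily.sort_values("Date")
daily["Prior Day Sales"] = daily["Sales"].shift(1)
daily["Change"] = daily["Sales"] - daily["Prior Day Sales"]
```

Because these are now ordinary row-level fields, any sheet can use them without a table calculation that has to be configured (and re-verified) per view.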
12) Extending the Data Model (Sets, Folders, Hierarchies, Custom Formatting, etc.) - Use Desktop
Why:
There is so much you can do with your data model in Desktop that you cannot
create in Prep, including creating Sets and Hierarchies, organizing data into
folders, applying default formatting (and other settings) to data items, etc.
When
might you encounter it: All the time! This doesn’t mean
Prep can’t be your go-to modeling tool. Just enhance your model after Prep runs
the flow. The good news is that all your changes are maintained when the flow
is run in the future.
Exceptions:
N/A
13) “Joining” Tables at Different Aggregation Levels - Use Desktop
Why:
This isn’t possible in Prep. It is done in Desktop with relationships and not
joins, hence the “joining” in the use case title. Let’s take the case of
getting sales targets for each subcategory at the monthly level. Your consumers
are looking to see how sales performed against targets. If you join these data,
you are going to end up with issues because the sales data is at the
transactional level, but targets are at the monthly level. This means you will
either have to: (i) pre-aggregate the sales data before making the join (which limits detailed analysis of sales) or (ii) be really careful with the aggregation levels when creating sheets in Tableau. Data at different levels of aggregation are handled pretty seamlessly by relationships, which we will talk about shortly.
When
might you encounter it: Somewhat regularly. While this is
not needed in all analyses, it is not an uncommon requirement. When it occurs,
it can be difficult to determine the best approach.
Exceptions:
N/A
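The aggregation problem a plain join creates is easy to see in a small sketch. Using pandas and invented numbers: monthly targets joined onto transactional sales get duplicated, so summing the target double-counts it unless you pre-aggregate first (relationships in Desktop sidestep this entirely):

```python
import pandas as pd

# Transactional sales vs monthly targets -- two different grains.
sales = pd.DataFrame({
    "Month":       ["2023-01", "2023-01", "2023-02"],
    "Subcategory": ["Chairs", "Chairs", "Chairs"],
    "Sales":       [100.0, 200.0, 150.0],
})
targets = pd.DataFrame({
    "Month":       ["2023-01", "2023-02"],
    "Subcategory": ["Chairs", "Chairs"],
    "Target":      [250.0, 150.0],
})

# A plain join duplicates the monthly target onto every transaction...
joined = sales.merge(targets, on=["Month", "Subcategory"])
naive_target = joined["Target"].sum()   # January's target counted twice

# ...so with a join you must aggregate sales to the monthly level first.
monthly = sales.groupby(["Month", "Subcategory"], as_index=False)["Sales"].sum()
correct = monthly.merge(targets, on=["Month", "Subcategory"])
true_target = correct["Target"].sum()
```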
14) Live Connections to Your Data - Use Desktop
Why:
If you need a live connection to your data, then you cannot run it through a Prep flow ahead of time, as Prep always outputs the data from flows into new tables/files, which are essentially point-in-time snapshots.
When
might you encounter it: There are many use cases for live
connections, especially with the emergence of high-performance cloud data
warehouses. However, it’s important to only use live connections when you
really need them because this can be a big performance hit.
Exceptions:
There are no true exceptions as only Desktop can connect live but there may be
cases where you create flows in Prep, output to a database instead of a
published data source, and have Desktop connect live to that data. Or you may
be able to schedule Prep flows to run frequently enough to create a data source that is close enough to real time to meet your users’ requirements.
The Impact of Relationships
Before closing
out this blog, I think it’s important to briefly touch on the impact of
relationships, a feature added to Tableau Desktop in version 2020.2. As
discussed by Ken in his blog, Tableau's
New Data Model, relationships provide a more
flexible way to join your data that is dependent upon the dimensions and
measures used in your view. Essentially, relationships cause Tableau to perform
what Ken calls “Smart SQL”, only joining the tables that are needed to meet the
need of a specific view. When using a live connection, this can help to improve
query performance significantly by “culling” unneeded joins and tables. But,
since extracts are also a type of database, it has a similar impact when
connected to extracts.
Obviously, since
relationships are largely viz-dependent, they don’t make sense in Tableau Prep.
But their existence may cause you to reconsider how you build your data model
in Prep. Let’s say, for example, you have two tables that you want to combine. One has 10 thousand rows and will be used in every worksheet you
create in Tableau. The other has 10 million rows and will only be used on one
sheet. You could join the tables in Prep, but your resulting data model will
have at least 10 million rows. Once you’re working with that data in Tableau Desktop, every sheet will have to query all of those rows, even the sheets that never use the large table. Relationships, by contrast, cull the large table from queries that don’t need it. This, of course, is the beauty of relationships. But this does not mean that you should dispense with Prep.
Instead, you should just rethink your Prep workflow slightly. Instead of
joining these within the workflow, you can work with both separately. Clean up
and extend each table in the flow, but output them to their own tables/files.
Then, in Desktop, connect those outputs using relationships, so that it will
cull the large table when it’s not in use.
Summary
There you have
it–14 Tableau data modeling scenarios and which Tableau product to use for each
of them. As a companion to the blog, I’ve created a Tableau Public workbook to
act as a cheat sheet.
My book, Data
Modeling with Tableau, covers these and many other use
cases and scenarios. In addition to modeling in Tableau Desktop and Prep
Builder, it covers virtual connections, securing and sharing data models,
creating models in the Tableau web interfaces, data model considerations for
Ask and Explain Data, and data catalog and lineage. It is now available on
Amazon in e-book and print formats.
Thanks for
reading. If you have any questions or thoughts, feel free to drop a comment
below and we will be glad to address them.
Kirk Munroe
February 20, 2023