Build Your Tableau Cloud Admin Insights History
If you’re a Tableau Cloud user, then you’ve probably leveraged some of the amazing data available in Admin Insights. Admin Insights is a Tableau Cloud project that is pre-populated with curated published data sources and pre-built workbooks showing useful insights about your site. You can use the pre-built workbooks or build your own using the data sources provided.
As of the writing of this blog, the
project contains the following data sources:
Master Data
These data sources contain “master”
data about key content and resources on your Cloud site. Note: When information
is deleted from Cloud, it also gets removed from these tables.
1) TS Users – Tableau Cloud
users, their roles, etc.
2) Groups – Tableau Cloud groups and membership.
3) Site Content – Projects, data sources, workbooks, views, flows, etc.
4) Permissions – Effective
permissions for users and content on your site.
5) Subscriptions –
Subscriptions and their creators and content.
6) Tokens – Active tokens,
including personal access tokens, refresh tokens, and OAuth tokens.
Audit Data
Historical audit data that tracks
point-in-time events.
7) TS Events – Historical
events that occur on the site, including sign-ins, views, data source access,
publishing, etc.
8) Viz Load Times – Historical
tracking of workbook/view load times to help you better understand and address
workbook performance.
9) Job Performance –
Historical tracking of various background jobs, including extract refreshes
(including Bridge, as of May 2023) and Prep flow executions.
Source:
Admin Insights Data Sources
The audit data is particularly
valuable for better understanding and optimizing your environment. For example,
you can use TS Events to understand which workbooks and views are most
heavily used or which users are most active. Or you can use it to determine
which workbooks are stale or which users are not using their licenses. You can
use Viz Load Times to look at the load performance of slow workbooks and
optimize the end user experience. And you can use Job Performance to
optimize the performance of slow-running extracts or flows.
The Problem
But there’s a slight problem with the Admin Insights audit data. By default, Tableau Cloud only stores 90 days of history, or 365 days if you own the Advanced Management add-on. In either case, that may not be enough to perform a thorough longitudinal analysis of the data. For example, let’s say you find that a workbook has not been used at all in the 90 days of history you have. You may wish to
archive it, but it would be useful to go back further to understand exactly
what happened—When was it published? Was it ever used? If so, who used it? Does
that person still work at the company? It may be that the workbook was once
useful but some business process changed, which led to it no longer being
applicable. In such a case, it might only require a couple of tweaks to make it
useful again. But, with only 90 days of data, you can’t properly answer this
question.
You might be asking why Tableau doesn’t allow us to store more of this information. The answer is actually
quite simple—since they are managing the Tableau Cloud platform, they need to
exert some level of control over resource usage. If they allow you to store all
your audit data, then you could end up consuming multiple terabytes of data on
the platform. And reading that data requires computing power and RAM. This is
the way it works with every Cloud platform—for the platform to scale, they must
create some limits.
Of course, if you run your own
Tableau Server, then you can store as much history as you like. That’s because you
are managing the platform and the resources, so it’s not really an issue for
Tableau and it won’t impact other customers.
Another Problem
There is another problem with the
Admin Insights data. All the data is contained in Published Data Sources. Now, I
love Published Data Sources as a way of curating data for my users, but they do
have some limitations. Most importantly, you cannot relate/join them with each
other or with other data sources.
So, for example, let’s say you want
to link the audit data with some additional information about your employees,
such as their department, location, etc. You cannot, in a Tableau data model,
bring those data sources together since one is a Published Data Source. Your
only choice is to blend those data sources, which can lead to lots of
headaches and additional limitations.
Typically, we’re the ones building
published data sources and we do that by building a data model in Tableau that
connects multiple tables together. So, when we run into this limitation, we can
often work around it by creating a new published data source that uses the
original tables plus whatever additional tables we need. But the Admin Insights
data sources are different because we do not have access to those source
tables—the published data sources are all we get.
A Solution!
So, wouldn’t it be nice if we could
find a way to store all this data in a separate database? By doing that, we
could build up the history over time and, because these would be tables in a
database, we could relate/join them to other data sources as needed.
The good news is that we can!!!
I won’t take credit for this idea as
I learned about it from a client, Hover, a leader in
home renovation visualization and scoping used by both homeowners and pros. My
primary contact, Jenny Harris, Director, GTM
Analytics & Strategy, introduced me to Justin Grilli, Senior Data
Engineer. Several years ago, Justin realized that 90 days of data wasn’t really
enough for their analysis needs. So, he wrote a Python program that leverages
the Tableau REST API to extract the
data and write it to their data warehouse on a nightly basis. They implemented
this years ago, meaning they now have several years’ worth of audit data
available for their analysis! Justin even built a public Python package for managing
these data sources, among other things. So, if you’d like to build this in
Python, definitely check it out!!
So, writing code is one way we can do
this, but since we can access these published data sources via both Desktop and
Prep, a second method is to use Tableau Prep. That’s what I’m going to do
today. We’re going to build a Tableau Prep flow that pushes all the Admin
Insights data to a database.
The Prep Flow
I’m going to be creating a single
Tableau Prep workflow that copies the Admin Insights data to a PostgreSQL
database. In Tableau Cloud, we have the option to refresh the Admin Insights
data sources on a daily or weekly basis, so the intent of my workflow is that
it would also be run according to that schedule (more on this later). If you
have the Data Management Add-On, then you have access to Tableau Prep
Conductor and can schedule this flow to run automatically. If not, then you
can either schedule it using another scheduling tool and the command line interface or consider
another method, such as the code option mentioned earlier.
Remember that Admin Insights contains
two different types of tables—master and audit—and we’ll need to handle each of
them differently. The master tables are simple as they do not track any sort of
historical data over time. So, we’ll just need to reload these tables each time
we run the flow. Here’s an example using TS Users:
I’ve added one Clean step that adds a
timestamp field to the end of the table, populating it with NOW(). This is a best
practice in data warehousing as it can be useful to know when data was actually
written to the table. We’ll do this with all the tables in this workflow.
The output step writes the data to
the Postgres database using the “Replace data” option. This will truncate the
table (delete all data), then insert the new data. If the table does not exist,
it will be automatically created, so there is no need for us to create the
table in the database ahead of time.
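Under the hood, “Replace data” behaves like a truncate-and-reload. As a rough SQL sketch (the dw.tableau_ts_users table and its columns are illustrative names, not the exact Admin Insights schema):

    TRUNCATE TABLE dw.tableau_ts_users;   -- wipe whatever was loaded on the previous run
    INSERT INTO dw.tableau_ts_users ("User Id", "User Name", "Site Role", "Timestamp")
    SELECT "User Id", "User Name", "Site Role", NOW()
    FROM staged_ts_users;                 -- stand-in for the rows Prep extracted from TS Users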
The audit tables are a bit more
complicated. Admin Insights automatically truncates anything older than 90 days
(or 365 if you have the Advanced Management add-on), but we want to allow the
database to build up that history over time. Thus, we cannot do a complete
reload of these tables as that will also remove that older history. Instead, we
want to keep the old data in the database and only update and/or insert the
data that exists in Tableau. For example, using TS Events, the logic
looks something like this:
1) Extract – Extract the TS
Events data from Tableau Cloud.
2) Insert New Rows – If a
row’s Event ID does not exist in the database table, insert it.
3) Update Existing Rows – If a
row’s Event ID exists in the database table, update all fields in the
table using the extracted TS Events data (I’m not sure if historical
data actually changes, but it’s best to do this update just in case).
4) Leave Everything Else – If
a row exists in the database table, but not the extracted TS Events
data, then leave it as is—this is how we’ll build our history over time.
With other ETL tools, there would be
several different methods to perform this operation. For instance, we could
first delete anything in the Postgres database if it exists in the Admin
Insights data, then insert from the Admin Insights data. But that requires two
separate output steps, one right after the other—the DELETE and the INSERT—and
Tableau Prep cannot do that. OK, that’s not entirely true—if you have the Data
Management add-on, then you could create multiple Prep flows and set them up
using “Linked Tasks” so that one runs right after the other. But that method
requires some extra steps so we’re just going to do it in a single flow.
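For reference, that delete-then-insert pattern would look something like this in PostgreSQL, assuming the freshly extracted TS Events rows were staged in a table the database can see (staged_ts_events is a hypothetical name):

    -- Operation 1: remove any rows that also appear in the newly extracted data
    DELETE FROM dw.tableau_ts_events e
    USING staged_ts_events s
    WHERE e."Event Id" = s."Event Id";

    -- Operation 2: insert all of the newly extracted rows
    INSERT INTO dw.tableau_ts_events
    SELECT * FROM staged_ts_events;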
Another option is to do something
that is typically referred to as an “UPSERT”. This is an operation that will UPdate
any records that already exist and inSERT any records that do not
(based on the table’s primary key—in the case of TS Events, the Event
ID field). Some ETL tools do this natively, but Prep does not. However, we
can get tricky with some temporary tables and SQL to make this work, as
detailed by Tableau DataDev Ambassador and 2018 Iron Viz Champion, Timothy Vermeiren on the Biztory blog: How To UPSERT or
MERGE Data with Tableau Prep’s Write-back Functionality.
We’ll start by adding the Timestamp
field, just as we did for the master tables.
But then we’ll write the data to a temporary
table. In the example below, I’ve simply added the text “temp_” to the beginning
of the final database table name to get “temp_tableau_ts_events”. This is just
like the step for the master tables, using a “Replace data” option to
automatically create the table if it does not already exist.
This is our one output operation, so
we’ll use Timothy’s trick and add some SQL to the “After” Custom SQL on the
output step:
This SQL uses a MERGE statement. MERGE performs the UPSERT operation we described earlier. It’s been part of the ANSI SQL standard since SQL:2003, so it’s available in most modern databases (PostgreSQL added it in version 15; a few platforms, such as MySQL, use their own upsert syntax instead).
If you’re not a SQL expert, this
might look a little daunting, so let’s walk through it line by line.
1. MERGE
INTO dw.tableau_ts_events e
We’re upserting to the table,
dw.tableau_ts_events. We’re aliasing this table as “e” so that we can easily
refer to it later, without writing out the whole table name.
2. USING
dw.temp_tableau_ts_events u
We’re upserting based on the table,
dw.temp_tableau_ts_events. This is our source table. We’re aliasing this table
as “u”.
3. ON
e."Event Id" = u."Event Id"
We’ll perform the checks based on the
Event ID field. The database will use this field to check for the
existence of a previous record in the target table, so that it will know
whether to perform an Update or an Insert.
4. WHEN
MATCHED THEN
What follows is the action we’ll take
for each row that already exists in the table.
5-33.
UPDATE SET…
For “matched” records, an update
statement will be performed, updating every field in the table.
34. WHEN
NOT MATCHED THEN
What follows is the action we’ll take
for each row that is NOT already in the table.
35-36. INSERT
(…
For “unmatched” records, an insert
statement will be performed.
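Putting the walkthrough together, an abbreviated version of the statement looks something like this (only a handful of fields are shown, and every field name other than Event Id is illustrative; the real statement lists every column in the UPDATE and INSERT clauses):

    MERGE INTO dw.tableau_ts_events e
    USING dw.temp_tableau_ts_events u
      ON e."Event Id" = u."Event Id"
    WHEN MATCHED THEN
      UPDATE SET
        "Event Name" = u."Event Name",
        "Event Date" = u."Event Date",
        "Timestamp" = u."Timestamp"
        -- ...and so on for every remaining field
    WHEN NOT MATCHED THEN
      INSERT ("Event Id", "Event Name", "Event Date", "Timestamp")
      VALUES (u."Event Id", u."Event Name", u."Event Date", u."Timestamp");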
Note: The above SQL uses double-quotes
for “Delimited Identifiers” around field names, table names, etc. Double-quotes
are the ANSI standard for such identifiers and should be supported by most
modern database platforms. Other platforms may, however, use different
identifiers by default. SQL Server, for example, uses square brackets, while MySQL uses backticks (`). SQL Server also accepts standard double-quoted identifiers by default (its QUOTED_IDENTIFIER setting is on in most tools), and MySQL accepts them when the ANSI_QUOTES SQL mode is enabled, so this SQL should work in most platforms with little or no change.
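For example, here is the same join condition from the MERGE statement written with each platform’s default identifier style:

    ON e."Event Id" = u."Event Id"   -- ANSI standard (PostgreSQL and most modern platforms)
    ON e.[Event Id] = u.[Event Id]   -- SQL Server's default square brackets
    ON e.`Event Id` = u.`Event Id`   -- MySQL's default backticks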
Important Note: Since our
output steps are writing to the temp tables, we can’t use that step to
automatically create the final target table (e.g. tableau_ts_events). If desired, we could add CREATE TABLE statements to the Tableau Prep flow, in the “Before” Custom SQL:
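As a rough sketch of what such a “Before” statement could look like (heavily abbreviated, with column types that are reasonable guesses rather than the exact Admin Insights schema):

    CREATE TABLE IF NOT EXISTS dw.tableau_ts_events (
        "Event Id"    TEXT,
        "Event Name"  TEXT,
        "Event Date"  TIMESTAMP,
        -- ...one column per remaining TS Events field...
        "Timestamp"   TIMESTAMP
    );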
Because this uses a CREATE TABLE IF NOT EXISTS statement, it will only create the table if it does not already exist (i.e. the first time the flow runs). I should, however, note that this syntax is not supported by all database platforms—it won’t work in SQL Server, for example—so I think the best option is to create these tables ahead of time and omit them from the “Before SQL”. To help with this, I’ll share the CREATE TABLE statements later in this blog. You may need to do some slight modifications to make them work for your platform.
OK, now that we know how to handle
both the master and audit tables, we can build additional flows for all nine
tables. In the following Prep flow, I’ve grouped the audit tables at the top
and the master tables at the bottom.
If you have the Data Management
Add-On, you can then publish the workflow to Cloud and schedule it to run on a
regular basis. Otherwise, you’ll need to use another scheduling tool and the command line
interface.
I wanted to schedule mine to run soon after the Admin Insights data sources
themselves were updated. While we can change the frequency of these
refreshes—daily or weekly—we cannot specify the exact time that they run. So, I
checked the “Modified Date” of the Admin Insights data and found that, on my
site, most of them were modified around 9:00 AM each morning. Therefore, I
scheduled mine to run at 10:00 AM.
Wrap-Up
I’d love to provide you with the Prep
flow I created and make it easy for you to plug in your own site and
target database, but I found that it’s not easy to do that. Reconnecting this
flow to point to your Cloud site and your database pretty much requires
completely rebuilding the workflow. So, I won’t be providing the flow. However,
I will be providing you with the SQL statements needed to create the audit
tables as well as the MERGE statements for each audit table.
Note: While I wish I could provide
you with SQL that will work in every platform, unfortunately, this syntax will
likely need to be tweaked, depending on the platform you’re working with,
particularly the CREATE TABLE statements.
ts_events
Create SQL:
Merge SQL:
viz_load_times
Create SQL:
Merge SQL:
tableau_job_performance
Create SQL:
Merge SQL:
Thanks for reading. Feel free to
leave your thoughts in the comments below.
Ken Flerlage, February 3, 2025
Twitter | LinkedIn | GitHub | Tableau Public