Build Your Tableau Cloud Admin Insights History

If you’re a Tableau Cloud user, then you’ve probably leveraged some of the amazing data available in Admin Insights. Admin Insights is a Tableau Cloud project that is pre-populated with curated published data sources and pre-built workbooks showing useful insights about your site. You can use the pre-built workbooks or build your own using the data sources provided.

 

As of the writing of this blog, the project contains the following data sources:

 

Master Data

These data sources contain “master” data about key content and resources on your Cloud site. Note: When information is deleted from Cloud, it also gets removed from these tables.

 

1) TS Users – Tableau Cloud users, their roles, etc.

2) Groups – Tableau Cloud groups and membership.

3) Site Content – Projects, data sources, workbooks, views, flows, etc.

4) Permissions – Effective permissions for users and content on your site.

5) Subscriptions – Subscriptions and their creators and content.

6) Tokens – Active tokens, including personal access tokens, refresh tokens, and OAuth tokens.

 

Audit Data

Historical audit data that tracks point-in-time events.

 

7) TS Events – Historical events that occur on the site, including sign-ins, views, data source access, publishing, etc.

8) Viz Load Times – Historical tracking of workbook/view load times to help you better understand and address workbook performance.

9) Job Performance – Historical tracking of various background jobs, including extract refreshes (including Bridge, as of May 2023) and Prep flow executions.

 

Source: Admin Insights Data Sources

 

The audit data is particularly valuable for better understanding and optimizing your environment. For example, you can use TS Events to understand which workbooks and views are most heavily used or which users are most active. Or you can use it to determine which workbooks are stale or which users are not using their licenses. You can use Viz Load Times to look at the load performance of slow workbooks and optimize the end user experience. And you can use Job Performance to optimize the performance of slow-running extracts or flows.

 

The Problem

But there’s a slight problem with the Admin Insights audit data. By default, Tableau Cloud stores only 90 days of history; if you own the Advanced Management add-on, you get 365 days. In either case, that may not be enough to perform a thorough longitudinal analysis of the data. For example, let’s say you find that a workbook hasn’t been used in the 90 days of history you have. You may wish to archive it, but it would be useful to go back further to understand exactly what happened—when was it published? Was it ever used? If so, who used it? Does that person still work at the company? It may be that the workbook was once useful but some business process changed, which led to it no longer being applicable. In that case, a couple of tweaks might be all it takes to make it useful again. But, with only 90 days of data, you can’t properly answer these questions.

 

You might be asking why Tableau doesn’t allow us to store more of this information. The answer is actually quite simple—since they are managing the Tableau Cloud platform, they need to exert some level of control over resource usage. If they allowed you to store all your audit data, you could end up consuming multiple terabytes of storage on the platform, and reading that data requires computing power and RAM. This is the way it works with every cloud platform—for the platform to scale, there must be some limits.

 

Of course, if you run your own Tableau Server, then you can store as much history as you like. That’s because you are managing the platform and the resources, so it’s not really an issue for Tableau and it won’t impact other customers.

 

Another Problem

There is another problem with the Admin Insights data. All the data is contained in Published Data Sources. Now, I love Published Data Sources as a way of curating data for my users, but they do have some limitations. Most importantly, you cannot relate/join them with each other or with other data sources.

 

So, for example, let’s say you want to link the audit data with some additional information about your employees, such as their department, location, etc. You cannot, in a Tableau data model, bring those data sources together since one is a Published Data Source. Your only choice is to blend those data sources, which can lead to lots of headaches and additional limitations.

 

Typically, we’re the ones building published data sources and we do that by building a data model in Tableau that connects multiple tables together. So, when we run into this limitation, we can often work around it by creating a new published data source that uses the original tables plus whatever additional tables we need. But the Admin Insights data sources are different because we do not have access to those source tables—the published data sources are all we get.

 

A Solution!

So, wouldn’t it be nice if we could find a way to store all this data in a separate database? By doing that, we could build up the history over time and, because these would be tables in a database, we could relate/join them to other data sources as needed.

 

The good news is that we can!!!

 

I won’t take credit for this idea as I learned about it from a client, Hover, a leader in home renovation visualization and scoping used by both homeowners and pros. My primary contact, Jenny Harris, Director, GTM Analytics & Strategy, introduced me to Justin Grilli, Senior Data Engineer. Several years ago, Justin realized that 90 days of data wasn’t enough for their analysis needs, so he wrote a Python program that leverages the Tableau REST API to extract the data and write it to their data warehouse on a nightly basis. Because they implemented this years ago, they now have several years’ worth of audit data available for their analysis! Justin even built a public Python package for managing these data sources, among other things. So, if you’d like to build this in Python, definitely check it out!!

 

So, writing code is one way we can do this, but since we can access these published data sources via both Desktop and Prep, a second method is to use Tableau Prep. That’s what I’m going to do today. We’re going to build a Tableau Prep flow that pushes all the Admin Insights data to a database.

 

The Prep Flow

I’m going to be creating a single Tableau Prep workflow that copies the Admin Insights data to a PostgreSQL database. In Tableau Cloud, we have the option to refresh the Admin Insights data sources on a daily or weekly basis, so the intent of my workflow is that it would also be run according to that schedule (more on this later). If you have the Data Management Add-On, then you have access to Tableau Prep Conductor and can schedule this flow to run automatically. If not, then you can either schedule it using another scheduling tool and the command line interface or consider another method, such as the code option mentioned earlier.

 

Remember that Admin Insights contains two different types of tables—master and audit—and we’ll need to handle each of them differently. The master tables are simple as they do not track any sort of historical data over time. So, we’ll just need to reload these tables each time we run the flow. Here’s an example using TS Users:

 

 

I’ve added one Clean step that adds a timestamp field to the end of the table, populating it with NOW(). This is a best practice in data warehousing as it can be useful to know when data was actually written to the table. We’ll do this with all the tables in this workflow.

 

 

The output step writes the data to the Postgres database using the “Replace data” option. This will truncate the table (delete all data), then insert the new data. If the table does not exist, it will be automatically created, so there is no need for us to create the table in the database ahead of time.
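In SQL terms, each run of a master-table branch behaves roughly like the sketch below. This is only an illustration of what the “Replace data” option does (Prep handles it for you, so there is nothing for us to write here), and the schema, table, and column names are hypothetical:

-- Illustrative only: what Prep's "Replace data" output option effectively runs on each flow run.
TRUNCATE TABLE dw.tableau_ts_users;              -- remove the previous snapshot

INSERT INTO dw.tableau_ts_users ("User Id", "User Name", "Site Role", "Timestamp")
VALUES ('u-001', 'Jane Doe', 'Creator', NOW());  -- one insert per row of the flow's output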

 

The audit tables are a bit more complicated. Admin Insights automatically truncates anything older than 90 days (or 365 if you have the Advanced Management add-on), but we want to allow the database to build up that history over time. Thus, we cannot do a complete reload of these tables as that will also remove that older history. Instead, we want to keep the old data in the database and only update and/or insert the data that exists in Tableau. For example, using TS Events, the logic looks something like this:

 

1) Extract – Extract the TS Events data from Tableau Cloud.

 

2) Insert New Rows – If a row’s Event ID does not exist in the database table, insert it.

 

3) Update Existing Rows – If a row’s Event ID exists in the database table, update all fields in the table using the extracted TS Events data (I’m not sure if historical data actually changes, but it’s best to do this update just in case).

 

4) Leave Everything Else – If a row exists in the database table, but not the extracted TS Events data, then leave it as is—this is how we’ll build our history over time.

 

With other ETL tools, there would be several different methods to perform this operation. For instance, we could first delete anything in the Postgres database that also exists in the Admin Insights data, then insert from the Admin Insights data (see the sketch below). But that requires two separate output steps, one right after the other—the DELETE and the INSERT—and Tableau Prep cannot do that. OK, that’s not entirely true—if you have the Data Management add-on, you could create multiple Prep flows and set them up as “Linked Tasks” so that one runs right after the other. But that method requires some extra steps, so we’re just going to do it in a single flow.
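For reference, a rough sketch of that delete-then-insert alternative is below. It assumes the extracted TS Events data has been staged in the same temp_tableau_ts_events table we’ll create later in this post, so treat it as an illustration rather than something to run as-is:

-- Step 1 (first output/linked task): delete rows that will be replaced by the fresh extract.
DELETE FROM dw.tableau_ts_events e
WHERE EXISTS (
    SELECT 1
    FROM dw.temp_tableau_ts_events u
    WHERE u."Event Id" = e."Event Id"
);

-- Step 2 (second output/linked task): insert everything from the fresh extract.
INSERT INTO dw.tableau_ts_events
SELECT * FROM dw.temp_tableau_ts_events;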

 

Another option is to do something that is typically referred to as an “UPSERT”. This is an operation that will UPdate any records that already exist and inSERT any records that do not (based on the table’s primary key—in the case of TS Events, the Event ID field). Some ETL tools do this natively, but Prep does not. However, we can get tricky with some temporary tables and SQL to make this work, as detailed by Tableau DataDev Ambassador and 2018 Iron Viz Champion, Timothy Vermeiren on the Biztory blog: How To UPSERT or MERGE Data with Tableau Prep’s Write-back Functionality.
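As an aside, PostgreSQL (my target database) also has its own native upsert syntax, INSERT ... ON CONFLICT. I’m not using it in this flow, but a minimal sketch looks like this, assuming a primary key or unique index on “Event Id” and using illustrative column names and values:

-- PostgreSQL-native upsert (an alternative to MERGE). Column names/values are illustrative.
INSERT INTO dw.tableau_ts_events ("Event Id", "Event Name", "Event Date")
VALUES ('evt-123', 'Access View', '2025-01-15')
ON CONFLICT ("Event Id") DO UPDATE
SET "Event Name" = EXCLUDED."Event Name",
    "Event Date" = EXCLUDED."Event Date";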

 

We’ll start by adding the Timestamp field, just as we did for the master tables.

 

 

But then we’ll write the data to a temporary table. In the example below, I’ve just added the text “temp_” to the beginning of the final database table name to get “temp_tableau_ts_events”. This is just like the step for the master tables, using the “Replace data” option to automatically create the table if it does not already exist.

 

 

This is our one output operation, so we’ll use Timothy’s trick and add some SQL to the “After” Custom SQL on the output step:

 

 

This SQL uses a MERGE statement. MERGE basically performs the UPSERT operation we described earlier. It has been part of the ANSI SQL standard since SQL:2003, so it should be available in most modern databases (PostgreSQL, for example, supports it as of version 15).
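Condensed, the statement looks something like the sketch below. The actual statement updates and inserts every field in TS Events (which is why the walkthrough that follows references lines 5 through 33); the columns shown here beyond “Event Id” are just representative placeholders:

-- Condensed sketch of the "After" Custom SQL; only a few representative columns are shown.
MERGE INTO dw.tableau_ts_events e
USING dw.temp_tableau_ts_events u
    ON e."Event Id" = u."Event Id"
WHEN MATCHED THEN
    UPDATE SET
        "Event Name" = u."Event Name",
        "Event Date" = u."Event Date",
        "Timestamp"  = u."Timestamp"
WHEN NOT MATCHED THEN
    INSERT ("Event Id", "Event Name", "Event Date", "Timestamp")
    VALUES (u."Event Id", u."Event Name", u."Event Date", u."Timestamp");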

 


If you’re not a SQL expert, this might look a little daunting, so let’s walk through it line by line.

 

1. MERGE INTO dw.tableau_ts_events e

We’re upserting to the table, dw.tableau_ts_events. We’re aliasing this table as “e” so that we can easily refer to it later, without writing out the whole table name.

 

2. USING dw.temp_tableau_ts_events u

We’re upserting based on the table dw.temp_tableau_ts_events. This is our source table. We’re aliasing this table as “u”.

 

3. ON e."Event Id" = u."Event Id"

We’ll perform the checks based on the Event ID field. The database will use this field to check for the existence of a previous record in the target table, so that it will know whether to perform an Update or an Insert.

 

4. WHEN MATCHED THEN

What follows is the action we’ll take for each row that already exists in the table.

 

5-33. UPDATE SET…

For “matched” records, an update statement will be performed, updating every field in the table.

 

34. WHEN NOT MATCHED THEN

What follows is the action we’ll take for each row that is NOT already in the table.

 

35-36. INSERT (…

For “unmatched” records, an insert statement will be performed.

 

Note: The above SQL uses double quotes as “delimited identifiers” around field names, table names, etc. Double quotes are the ANSI standard for such identifiers and should be supported by most modern database platforms. Some platforms, however, use different delimiters by default. SQL Server, for example, uses square brackets, while MySQL uses backticks (`). Both also support standard double quotes, though, so this SQL should work without issue on most platforms.
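For illustration, here’s the same column reference written with each of those delimiters (the query itself is just a throwaway example):

SELECT "Event Id" FROM dw.tableau_ts_events;   -- ANSI standard (used throughout this post)
SELECT [Event Id] FROM dw.tableau_ts_events;   -- SQL Server default
SELECT `Event Id` FROM dw.tableau_ts_events;   -- MySQL default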

 

Important Note: Since our output steps write to the temp tables, we can’t rely on those steps to automatically create the final target tables (e.g. tableau_ts_events). If desired, we could create them within the Tableau Prep flow by adding CREATE TABLE statements to the “Before” Custom SQL:
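A minimal sketch of that “Before” SQL follows. The real statement would define every TS Events field with appropriate data types; the columns and types here are only illustrative placeholders:

-- Illustrative only: create the final target table on the first run.
CREATE TABLE IF NOT EXISTS dw.tableau_ts_events (
    "Event Id"    VARCHAR(255) PRIMARY KEY,
    "Event Name"  VARCHAR(255),
    "Event Date"  TIMESTAMP,
    "Timestamp"   TIMESTAMP
);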

 


Because this uses a CREATE TABLE IF NOT EXISTS statement, it will only create the table if it does not already exist (i.e. the first time the flow runs). I should, however, note that this syntax is not supported by all database platforms—it won’t work in SQL Server, for example—so I think the best option is to create these tables ahead of time and omit them from the “Before SQL”. To help with this, I’ll share the CREATE TABLE statements later in this blog. You may need to do some slight modifications to make them work for your platform.

 

OK, now that we know how to handle both the master and audit tables, we can build out the flow for all nine tables. In the following Prep flow, I’ve grouped the audit tables at the top and the master tables at the bottom.

 

 

If you have the Data Management Add-On, you can then publish the workflow to Cloud and schedule it to run on a regular basis. Otherwise, you’ll need to use another scheduling tool along with the command line interface. I wanted to schedule mine to run soon after the Admin Insights data sources themselves were updated. While we can change the frequency of these refreshes—daily or weekly—we cannot specify the exact time that they run. So, I checked the “Modified Date” of the Admin Insights data sources and found that, on my site, most of them were modified around 9:00 AM each morning. Therefore, I scheduled mine to run at 10:00 AM.

 

Wrap-Up

I’d love to provide you with the Prep flow I created and make it easy for you to plug in your own site and target database, but I found that it’s not easy to do that. Reconnecting this flow to point to your Cloud site and your database pretty much requires completely rebuilding the workflow. So, I won’t be providing the flow. However, I will provide the SQL statements needed to create the audit tables as well as the MERGE statements for each audit table.

 

Note: While I wish I could provide you with SQL that will work in every platform, unfortunately, this syntax will likely need to be tweaked, depending on the platform you’re working with, particularly the CREATE TABLE statements.

 

ts_events

 

Create SQL:

 

Merge SQL:

 

viz_load_times

 

Create SQL:

 

Merge SQL:

 

tableau_job_performance

 

Create SQL:

 

Merge SQL:

 

 

Thanks for reading. Feel free to leave your thoughts in the comments below.

 

Ken Flerlage, February 3, 2025

Twitter | LinkedIn | GitHub | Tableau Public

