Tableau Data Sources Part 1: Data Source Types
There is so much you can do with a Tableau data source. You can extract it or maintain a live connection. You can publish it to Server/Online so that it can be reused by yourself and others. You can save it with a packaged workbook so you can share that workbook with others. And so much more!
But I’ve found that all of these options can often lead to
confusion. And this confusion is not just for new users—even very experienced
Tableau users often run into issues when dealing with all the different forms
that data sources can take. Personally, this was one of the most confusing
things I’ve dealt with. Luckily, largely through my interactions with people on
the Tableau
Community Forums, I *think* that I’ve gained a pretty good understanding of
the ins and outs of Tableau data sources, so in this two-part series, I’m going
to share some of what I know about them.
The first blog will deal with file-based and server-based
connections and the differences between live and extracted data sources, while
the second will get into detail about published data sources and the variety of
different data-source related files that Tableau works with.
File-Based vs Server-Based
There are two primary types of data connections—file-based
and server-based.
File-Based
File-based connections are ones that involve some sort of
file. On the Connect pane of the main Tableau screen, these are listed
under “To a File”
While Tableau lists Excel, text, JSON, Access, PDF, Spatial
file, and Statistical file, you can use the More option to connect to a
variety of other types of files, such as other Tableau workbooks or Tableau
data source files (more on these in part 2).
Server-Based
Server-based connections do not connect to a file—instead,
they connect to a “server.” Generally speaking, these servers run some sort of
application software that allows you fetch data. The most straightforward
example are database management systems (DBMS) such as SQL Server, Oracle,
MySQL, PostgreSQL, and many others.
Note: While many would claim that spreadsheets and Access
are databases, they would not qualify as complete “database management systems”
like the ones listed above.
Tableau Server-Based Data Source Connectors
The vast majority of Tableau’s server-based connectors are database
management systems, but there are other types as well, including OData (a web-based open protocol
that allows for the creation and consumption APIs in a simple way), Web Data Connectors (a web-based JavaScript connector
custom written to allow people to connect Tableau to data), and Google
Drive/Sheets. Google Sheets’s similarity to Excel may cause some
confusion—why is one file-based and the other is server-based? The reason is
that Excel stores its spreadsheets in files that generally reside on your
computer or somewhere on your company’s file server. Google Sheets, on the
other hand, does not deal with files—all spreadsheets are stored on the server
and you access them via a web application.
Tableau Data Sources
A Tableau Data Source can consist of multiple connections
and those connections can be a mix of file and server-based. For example, the
following screenshot shows a single data source which contains two connections.
The first connection is a file-based connection to Excel,
while the seconds is a server-based connection to SQL Server.
We can add as many connections as we want and we can
relate/join/union them together however we wish. In the end, this combination
of connections and data model has resulted in a full-blown Tableau Data Source.
And, of course, a Tableau workbook can have multiple Data Sources.
This distinction between file-based and server-based data
sources is very important as it will impact much of how you work with a data
source, which we’ll deal with shortly.
Live vs Extract
Once you have your Tableau data source, you need to decide
whether to make it Live or an Extract.
Live Data Sources
By default, data sources will use a live connection, except
in some cases where a live connection is not possible (we’ll address this
shortly). Put simply, a live connection maintains a direct connection to the
source files or server at all times, so any updates to that source can be seen
by Tableau pretty much immediately (Tableau does do some caching, so it’s not
quite this straightforward, but for this blog, this definition is good enough).
When using a live connection, Tableau has to speak to the
source using its native language. For most relational database
management systems, that language is Structured Query Language (SQL). In
such cases, SQL is used to pull in all of your data fields, execute all of the
filter logic, translate calculated fields, and many other things.
Unfortunately, SQL is different for each DBMS and often doesn’t have the same
set of functions as are available in Tableau. For this reason, some functions
are not available to all live data sources. For example, Tableau has a function
called MAKEDATE which allows you to create a date from a year, month, and day.
However, most DBMSs do not have a similar SQL function. In this case, Tableau
cannot translate the MAKEDATE function into SQL, so MAKEDATE is not available
for use with that live connection.
Extract Data Sources
We can choose to change our live connection to an Extract.
When you choose to extract your data, Tableau creates a local file which
contains a copy of the data. Previous to version 10.5, this data was stored in
Tableau Data Engine file with extension TDE. Starting in 10.5, Tableau
converted to the Hyper engine and extract files use the HYPER extension (I’ll primarily
refer to Hyper files from here on out).
Note: Some data connections such as Web Data Connectors do
not even allow live connections. In such cases, you are forced to create an
extract.
A key drawback of using an extract is data freshness. The
data is frozen in time and you have to force it to be refreshed (or schedule
it, which we’ll deal with in part 2), whereas live connections ensure you
always have the most recent data. So, why would you ever want to create a copy
of your data? Why not just always us a live connection? The most common reason
is performance. Tableau can very quickly read data from an extract because 1) it
has been optimized to do so and 2) the extract generally includes only a subset
of the original data connection’s data. For example, you may connect to a
database that has 300 tables, but only use 2 of those tables in your extract. The
smaller data set means more efficient interactions with the data.
It is important to note that extracts are not always
faster than live connections. Some databases, particularly those that are built
for analytics, are incredibly powerful and can retrieve data at lightning-fast
speeds, making them faster than extracts. It is, therefore, helpful to
understand the database platform you’re working with and how its performance
compares to that of your computer or Tableau Server. This being said, in my
experience, extracts are almost always faster than live connections, so if in
doubt and concerned about speed, an extract is usually your best bet.
But performance is not the only advantage of extracts. As
noted earlier, some functions cannot be translated into the native language of
the connection (see the MAKEDATE example). When using an extract, all functions
are available to you automatically. And this is more than just calculated field
functions—there are other functionality available in extracts which may not be
available with a live connection.
Portability is one more advantage of extracts. The resulting
extracted data can be shared with others or even saved with a Tableau workbook.
Of course, this may not always be an advantage as it opens you up to potential
security concerns.
There are, of course, many other factors you must take into
account when choosing between an extract or a live connection. For a deeper
dive, check out the following blog by Jonathan Drummey: TDE or Live? When to Use Tableau Data Extracts (or not)
Live Connection to an Extract?
Now
let’s do an experiment. I’m going to connect to the saved Superstore data
source, then I’ll create an extract. You can see below that I’m saving the
extract file as Superstore.hyper in my Downloads folder.
Now I’m
going to create a new workbook and use the More option under the
file-based connectors to connect to the extract file we just created.
Finally,
we drag over our tables to create a data model.
We now
have a file-based connection to the extract file. The original data source was
file-based, but even if it had been server-based, we are now connected to a
file, so this connection is file-based.
But
there is something a bit strange here and, in my opinion, more than a little
confusing. The connection type is Live! What? But this file is an
extract—why is it showing as a live connection? Well, yes, our original data
source was an extract, but as soon as we connected to that file, our new
data source is a live connection to the file. In short, we have a live
connection to an extract file (if we really wanted to, we could create an
extract of the extract, though I’m not sure why you’d want to do that).
While
the use cases for a live connection to an extract are somewhat limited, they do
exist. For example, perhaps you have a Tableau Prep or Alteryx data flow that
updates the extract on a regular basis? In that case, your live connection to
the hyper file would automatically receive those updates as they occur. This is
sort of a niche use case, but you will see this same sort of thing quite a bit
when you work with published data sources in Tableau Server/Online, something
we’ll address in part 2.
A Quick
Note on Tableau Public
Tableau
Public “is a free platform to publicly share and explore data
visualizations online.” It is important to note that Tableau Public only
supports extracts. So, you will always need to extract your data before
publishing to Public. And, if you’re using the Public version of Tableau
Desktop, you won’t even see the option for a live connection—it will create an
extract for you automatically.
Up Next
Thanks
for reading part 1. In part 2, we’ll discuss published data sources and
we’ll break down all of the different data-source related file types that
Tableau uses. In the meantime, if you have any questions or comments, feel free
to leave them in the comments section below.
Ken Flerlage, March 21, 2022
Twitter | LinkedIn | GitHub | Tableau Public
Thanks alot as always.
ReplyDeleteThanks Ken! This is extremely useful, as it's not always easy to find definitions of terms in the Tableau help, which often leaves me confused and wondering what the heck all the terms mean! (For instance, that was the case with "data source" vs "connection", which your post clarified straight away.)
ReplyDeleteGreat! Glad to hear it was helpful!! Be sure to check out the other blogs in this series as well!!
DeleteBTW, just realised that although this post mentions part 2 a few times, none of those mentions are links. So when I came back to it, it took me a while to trach down part 2! ☹
ReplyDeleteThanks. I just updated it to include a link.
Delete