Tableau Data Sources Part 2: Published Data Sources & More
Welcome back to my two-part series which deals with the ins and outs of Tableau data sources. If you missed Tableau Data Sources Part 1: Data Source Types, then be sure to go back and read it first as I’ll be building upon some of the concepts shared there. We’re going to start part 2 by diving into published data sources.
Published Data Sources
So far, we’ve just been dealing with data sources in
relation to Tableau Desktop, but things get even more complicated when we
introduce Tableau Server or Tableau Online. We can publish data sources to
Tableau Server/Online in two ways. We can either publish the workbook with
the data source included (an Embedded Data Source) or we can publish
the data source as its own standalone object (a Published Data Source). I
typically recommend publishing as a standalone published data source because that
data source can then be used in other workbooks or by other
authors—essentially, it enables data source reuse, which helps to ensure
consistency and reduces pressure on the source connections. I’m going to focus
on published data sources for the rest of this section.
To create a published data source, you first create your
data source in Tableau Desktop (or Prep or other tool). As you might expect,
the data source can be either a Live connection or an Extract. You then connect
to your Tableau Server/Online and use the menu option, Server | Publish
Data Source (Note: Tableau Online will often require the additional
component, Tableau Bridge). This creates a data
source object on server which anyone with permissions can use.
Connecting to a Published Data Source
You can connect to a published data source in Tableau
Desktop by selecting the Tableau Server option from the connection
screen.
You can then search to find the data source you’re looking
for. In the screenshot below, I’ve connected to a published data source, which
is an extracted data source I created from the Superstore Excel file.
But, as I shared in the previous blog, the connection type
shows as “Live.” Why? Yes, our original data source was an extract, but we have
essentially created a new data source that connects to that extract, and that
new data source is a live connection. In short, we have a live connection to
an extracted data source. Every time you connect to a published data
source, it will show as a live connection, regardless of whether or not the
published data source is a live data source.
Warning:
You cannot relate or join a published data source to any other data. If you
need to combine multiple published data sources then you will need to leverage
blending or you will need to create a new data source that connects to the original
data sources and relates or joins them.
Refreshing an Extracted Published Data Source
When
you publish a live data source to Tableau Server/Online, that data source
maintains a live connection to the data, so refreshing it is not necessary.
However, extracts, by their nature, are just snapshots of data and typically
need to be refreshed on a regular basis. This is where scheduled refreshes come
into play. When a refresh process runs, it will connect to your original data
(this could be a file or a server, such as a database management system), it
will pull the data it needs, then update the data source on the server. I won’t
be going into much more detail about the process of refreshing data sources as
that is pretty well documented in the Tableau Help.
I do,
however, want to mention a few differences you’ll see depending on the type of
connections you use. The process is relatively straightforward when using a
server-based connection—Tableau will prompt you to provide details about how to
authenticate the data source, then that’s all you need. Tableau Server/Online should
generally be able to connect to that server data source without any issue. One
common problem I see is that firewalls or similar technology sometimes block communications
between the two servers. If you see network-related errors when you attempt to
refresh a data source using server-based connections, then I recommend working
with your IT team to determine if there are any firewalls getting in the way.
File-based
data sources are a bit trickier because they often reside on your computer or
on a mapped drive and Tableau Server/Online will generally have problems
connecting to those. If you’d like to learn more about this, see my blog, Auto-Refreshing File-Based Data Sources in Tableau Server.
What’s
Included in a Published Data Source
When you publish a data source, it will include all groups,
sets, hierarchies, bins, and calculated fields that you’ve created. When you
are connected to a published data source, these objects, plus all the data
fields, cannot be edited because they are part of the data source (note: there
are a few properties, such as geographic role, which can be changed and will
then be saved as part of the workbook).
You can create copies of these objects or create new groups,
sets, hierarchies, bins, and calculated fields, but those will not be
part of the published data source. Instead, those will become a sort of extension
to the data source which reside only in your workbook. This can be pretty confusing
so my recommendation is that you first edit the data source (see the next
section), then add the new objects, and republish. This will ensure that others
can leverage those new objects, while also avoiding confusion caused by having
some objects in the data source and some in the workbook.
Parameters are a bit odd because they are not specific to a data source—they can be used by any data source to which you are connected. When you publish a data source, the parameters will become part of the data source, but when you connect to that data source, the parameters can still be edited and can be used in other data sources.
Editing
a Published Data Source
Editing a published data source can be a bit tricky at first, so it’s important to understand the process. Kevin included this in his recent tips blog, Ten Tableau Tips & Techniques - Round 4 (tip # 7), so I’m not going to go into any further detail about it here. You can also read about the process in the Tableau Knowledgebase, Editing a Published Data Source On Tableau Server
Contents of a Published Data Source
Like workbooks, you can download a published data source
which will give you a packaged data source file (TDSX). A packaged data
source is really just a zip file—by changing the extension to ZIP, we can open
it up and see its contents. For example, here are the contents of an extracted
published data source:
First, we see a data source file (TDS), which is an XML containing
the metadata about the data source—essentially, a file that describes the data
source, but does not contain any actual data.
Next, we see a Data folder, which contains another folder,
Extracts (because this is an extracted data source). Inside the Extract
folder is an extract file (HYPER). This is the extract file we created in
Tableau Desktop initially (Note: If your published data source was created
before version 10.5 and the data source was never upgraded to Hyper, then your
file will have a TDE extension instead of HYPER).
So, how does it look when we download a data source which
uses a live connection? In this case, it will depend on whether you’re using a
file-based connection or server-based.
If using a file-based connection, you’ll see something like
this:
Once again, we see the data source (TDS) file and a Data
folder. The Data folder contain another folder with the same name as the folder
in which your original data source resides. In my case, I originally connected
to a file in my Downloads folder. Within that folder, you’ll see a copy of the
file source. I had connected to the Superstore Excel file so I see Sample -
Superstore.xlx. So, basically, the packaged data source contains a copy of
the original data file. Thus, if you were to connect to this packaged data
source in Tableau Desktop, you’d be able to read the data and work with it.
If you’re using a server-based data source, then there is no
way for Tableau to put a copy of the data into the data source (without
extracting it, of course). Thus, you’ll only see a data source (TDS) file and
no Data folder. If you use Tableau Desktop to connect to the packaged data
source, it will force you to re-enter the credentials for the server-based
connection before you can use the data.
Workbooks
OK, now
you’re asking why I have a workbooks section in a blog about data sources. The
reason is that, in some cases, workbooks actually contain the data itself. In
order to explain what I mean, we need to first look at some of the different
types of workbook files, like we did when we took apart the packaged data
source files in the previous section. When we download a workbook from Tableau
Server/Online, we get a packaged workbook (TWBX). Like packaged data source
files, packaged workbook files are just zip files. We can change the extension
to ZIP in order to open them. The below example shows the contents of a
packaged workbook using live connection to the Superstore Excel file:
What we
see in these files is very similar to what we see in packaged data source
files—a Data folder which organizes our data files in exactly the same manner as
the packaged data source (with all the same differences for live vs extract and
file vs server-based connections). We’ll also see a workbook file (TWB), which
is an XML file containing the metadata of the workbook (similar to the TDS file
found in the packaged data source). This workbook file has the same name as the
name of the packaged workbook (“Example” in my case). In addition, we may also
see some other folders with objects needed for the workbook, such as images. In
short, the packaged workbook is a fully self-contained file which has everything
we need—the workbook, the data, and any other objects we might need (images,
etc.).
An Experiment…
Okay,
time for an experiment…I’ll start by creating a new workbook in Tableau Desktop
then creating a live data source connecting to the Superstore Excel file
located in my C:\Downloads directory. Next, I’ll save this as a packaged
workbook (TWBX).
Remember
what’s happening here—the packaged workbook will contain the workbook file (TWB)
as well as a copy of the source Excel file.
Now
let’s close the workbook and open it back up. Next, go to the Data Source tab
and edit the connection to the Excel file.
Take
note of the directory in which the Excel file resides. In my case, it’s C:\Users\Ken\AppData\Local\Temp\TableauTemp\3296623394\Data\Downloads.
You may be wondering why it isn’t linked to the C:\Downloads directory I
originally connected to. The reason is that, once I save the file as a packaged
workbook (TWBX), it copies the Excel file into the packaged workbook.
Then, when I open it, Tableau retrieves the data from the packaged workbook and
places it into a temporary directory. Once I save as a packaged workbook,
that link to the original file is broken. This is incredibly important to
remember because, if you go back and edit the original Excel file (in
C:\Downloads) then try to refresh the data source in your workbook, it will
never update. This, of course, is because it is no longer linked to that file.
In this
case, if you simply save the workbook as a workbook (TWB), then you’ll only be
saving the XML metadata for the workbook. Reopening that file will maintain the
connection to the original source file. However, this isn’t always the right
approach. If you intend to share the workbook with others, they will need to
have that data available to them. This is why we always ask for a packaged
workbook (TWBX) on the Tableau Community Forums. If you provide us with a workbook file (TWB) only, then
we won’t be able to see your data and it’s next to impossible to address a problem
without having access to the data.
Recap
Wow
that was a lot to take in!! Data sources are, in concept, quite simple, but
there are so many options and so many ways to store them that they can become
very complex very quickly. I hope that this blog series has helped to demystify
some of this complexity. If you have any thoughts, comments, or additional
questions, please drop them in the comments. Thanks for reading!!
Ken Flerlage, April 11, 2022
Twitter | LinkedIn | GitHub | Tableau Public
Hi Ken. Thanks for the detailed description on published data sources. I have blended excel file. How to publish blended excel file to Tableau server/ tableau online?
ReplyDeleteI don't fully understand. Can you clarify the question?
DeleteThanks for this amazing brain dump. This is going in my technical help notebook!
ReplyDeleteThanks!
Delete@Priti you need publish data sources to server/online separately then blend on server/online.
ReplyDeleteThank you, ken. Great explanations!
ReplyDeleteHi Ken, thanks for the blog post! data sources in tableau seems to have their own special quirks sometimes :)
ReplyDeleteI have one question regarding published data sources.
say I have a published data source, which is just being connected to the database, which in turn only includes the raw data.
now, say i create a dashboard out of it, dozens of calculations, sets and etc. I would have to include the data source as also embedded data source within the workbook, so all the calculations would be materialized in order to promise the best performance (using extract for performance reasons), am I correct?
On the flip side, if the published data source is an Extract and i don't embed the data source within the workbook , it still wouldn't include all the calculations that i would create for the dashboard, correct? or am I missing something?
I'm just trying to find the right strategy when working with published data sources. I would love to hear your thoughts on the matter.
Thanks again for the post and for your time!
If you create the calculations before you publish the DS, then they will be included in the published DS. After connecting to a published DS, any new calcs you create will be part of the workbook, not the data sources.
DeleteHi Ken - when connected to a published data source I understand it does show that 'live' connection to the published extract. I am wondering if that live connection to the extracted data slows down the dashboard at all. Would it increase performance to have a localized extract instead?
ReplyDeleteIt would likely be a little slower to connect to the server than a local file, but I doubt it would be too noticeable.
DeleteHi Ken, we have created two published data sources - each are extracts with multiple joins within them. We want to now join these two published data sources (actually published extracts) in Prep. But doesn't seem t be working out. Is this even possible. Or do we have to make the join before we create the separate extracts.
ReplyDeleteYou can join two published data sources in Prep. Happy to help further if you'd like to email me. flerlagekr@gmail.com
DeleteI'm having a really difficult time working with a published virtual connection. I connect to that virtual connection with Tableau Desktop, pull in the tables I want, build the relationships between them, build my workbook and all seems fine. However, then I publish my workbook to Tableau Server and suddenly no data is returned because my relationships I built are all invalid now and I have to rebuild them and fix a pile of issues because of fields that are considered invalid or interpreted as not existing in the database. It's like on the Desktop side the naming convention of the fields are slightly different. Ie. a field in my relationship called "Pk" isn't being recognized on the Tableau Server side because it only seems to recognize is as being called [Pk]. Any idea if this is caused by how the virtual connection was created or perhaps how I have created the relationships, or how I published my workbook to the server?
ReplyDeleteHey, What if I don't want to include the calculated fields in my workbook in my published data source
ReplyDeleteYou can publish the published data source without calculated field. Then connect to that published data source and create the calculated fields within the workbook.
Delete