Welcome back to our series on SQL for Tableau Users. This week, Robert Crocker will be sharing some strategies for writing better SQL code...


In this post, I want to share a few simple strategies you can apply to your day-to-day work so that your SQL code is cleaner and more sustainable. Remember, someday your code will need to be revisited. Don’t dread that day or make it a horrible experience for the person who comes behind you. Make it a day you look forward to, not one you dread.

To make this motivating statement more concrete, I recently had to come behind work two sets of work—one which was well documented and one which was all over the place with no guide. The well-documented code sent me hunting the previous developer down on LinkedIn to thank them for their efforts. The poorly documented work left me to scrap everything and start over anew. On the one hand, I was able to pick up where the last person left off and on the other, we had to start from scratch.

Well written SQL isn’t the only place where documentation pays off. So, while you read this, be sure to think about how these principles apply elsewhere, including Tableau.

How bad can SQL be?

Below is the final query from our post on subqueries. It’ll work just fine, but it could be written much more clearly.

SELECT o.City, DATENAME(dw, [Order Date]) AS Day_of_Week, AVG(o.Profit) AS Average_Daily_Profit
FROM Orders AS o
JOIN ( SELECT TOP(3) City, SUM(Profit) AS Total_Profit
FROM dbo.Orders
GROUP BY City
ORDER BY 1 DESC) AS sub
ON o.City = sub.City
GROUP BY o.City, DATENAME(dw, [Order Date])
ORDER BY 1, 2, 3;

Notice that every field name in the SELECT and GROUP BY clauses share the same line, so you have to look closely for those little commas. There is also a subquery hidden in there somewhere. Lastly, the ORDER BY uses cryptic numbers to reference fields instead of the actual field names. Now let’s step through a few strategies that will help us clean up this code.

How can we clean this up?

As a data visualization professional, you value formatting. You understand how much more value can be realized by well-designed work versus work that’s been thoughtlessly thrown together. The suggestions which follow are all about formatting. SQL code is just text after all. To make text something that’s well designed, it just needs a little formatting.

So, here are three simple suggestions for making your SQL code more readable and sustainable:

1)  Comma to after the field name, new line, indent
2)  Indent subqueries to the opening parenthesis
3)  Comment your code

Suggestion 1: Comma to after the field name, new line, indent

When reviewing others work, it seems like people love saving lines. No, not lives…lines. I don’t understand why people find lines so precious. Sure, your screen may be wider than it is tall, but that doesn’t mean everyone is carrying around HD desktop monitors. And even if that is the case, it is much easier to understand what fields are in your SELECT when they are nicely stacked on top of one another.

So instead of our original query, we would write…

SELECT o.City,
  DATENAME(dw, [Order Date]) AS Day_of_Week,
  AVG(o.Profit) AS Average_Daily_Profit
FROM Orders AS o
JOIN ( SELECT TOP(3) City,
         SUM(Profit) AS Total_Profit
FROM dbo.Orders
GROUP BY City
ORDER BY 1 DESC) AS sub
ON o.City = sub.City
GROUP BY o.City,
  DATENAME(dw, [Order Date])
ORDER BY 1, 2, 3;

The indentation creates a nice visual block of text for our eyes to key in on. The small commas at the end of the line indicate that there are more column names listed below. Additionally, if you are only selecting a few columns like we are here, then you can quickly count up how many columns to expect in the result set.

Suggestion 2: Indent subqueries to the opening parenthesis

This rule might be a little controversial, but the main mission of this post is to help you write more legible queries. With subqueries being one of the most common culprits of complex SQL, I figure it is best to pull out all the stops.

After applying this suggestion to our query, we have…

SELECT o.City,
  DATENAME(dw, [Order Date]) AS Day_of_Week,
  AVG(o.Profit) AS Average_Daily_Profit
FROM Orders AS o
JOIN (
     SELECT TOP(3) City,
       SUM(Profit) AS Total_Profit
     FROM dbo.Orders
     GROUP BY City
     ORDER BY 1 DESC
     ) AS sub
ON o.City = sub.City
GROUP BY o.City,
  DATENAME(dw, [Order Date])
ORDER BY 1, 2, 3;

By writing your subquery in a way that lets the parenthesis shine, we can clearly see that there is another query packed in there.

Suggestion 3: Comment your code

Last, but definitely not least, are comments. In fact, this could be the most important suggestion we have made thus far. Comments are a chance for you to go off-script from the strict syntax of whatever language you are writing and explain the code your computer is compiling in a way that’s easier for us humans to read.

Comments come in two flavors—single line comments and multi-line comments.

Single line comments start off with two dashes.

-- I am a single line comment.

-- Personally, I like to write several single line
-- comments on top of one another. I know it is probably
-- not best practice, but it’s how I have set up my hot
-- keys in visual studio code.

Multi-line comments open with a forward slash followed by an asterisk and close with an asterisk followed by a forward slash. You can put anything you want between these opening and closing marks, but generally, these multi-line comments are verbose explanations of the code to come.

Here’s an example of an entertaining multi-line comment by Joris Bellenger.

/*
           _
           H||
           H||
 __________H||___________
[|.......................|
||.........## --.#.......|
||.........   #  # ......|            @@@@
||.........     *  ......|          @@@@@@@
||........     -^........|   ,      - @@@@
||.....##\        .......|   |     '_ @@@
||....#####     /###.....|   |     __\@ \@
||....########\ \((#.....|  _\\  (/ ) @\_/)____
||..####,   ))/ ##.......|   |(__/ /     /|% #/
||..#####      '####.....|    \___/ ----/_|-#/
||..#####\____/#####.....|       ,:   '(
||...######..######......|       |:     \
||.....""""  """"...b'ger|       |:      )
[|_______________________|       |:      |
       H||_______H||             |_____,_|
       H||________\|              |   / (
       H||       H||              |  /\  )
       H||       H||              (  \| /
      _H||_______H||__            |  /'=.
    H|________________|           '=>/  \
                                 /  \ /|/
                               ,___/|

*/

So, if we add some comments to our SQL, we’ll have something like this…

-- Create a table displaying the daily sales of our
-- top 3 most profitable cities.
SELECT o.City,
  DATENAME(dw, [Order Date]) AS Day_of_Week,
  AVG(o.Profit) AS Average_Daily_Profit
FROM Orders AS o
JOIN (
     -- Return a list of the top 3 most profitable cities.
     SELECT TOP(3) City,
       SUM(Profit) AS Total_Profit
     FROM dbo.Orders
     GROUP BY City
     ORDER BY 1 DESC
     ) AS sub
-- Use this join as a filter so we don’t run processes
-- on any cities that aren’t among the top 3.
ON o.City = sub.City
GROUP BY o.City,
  DATENAME(dw, [Order Date])
ORDER BY 1, 2, 3;

Now you can pretty much just read through the comments of this query and understand what’s going on. If there’s ever any doubt, you can pull out a piece of the query with the understanding of the original intent of what that block of code was meant to do.

Side Note: Comments are incredibly powerful in any scripting environment, even Tableau. For more on this topic, see Ken’s blog, Comment Your Calcs, where he dives into the importance of commenting your calculated fields.

One Last Suggestion

In just about any SQL book or article I come across, the pros are constantly referencing fields in their ORDER BY clauses with 1’s, 2’s and whatever field index their columns happen to fall (as we’ve done in our examples above). While I respect their place in the SQL world, I disagree with this practice. Yes, single characters save you keystrokes, but when it comes time to review your work, you’ll need to scan up and down your query to make sure your field names are in the right order. And, if you change the order of the columns in your SELECT, you’ll need to ensure you change the number in the ORDER BY. For small throw away queries, this could be fine, but for the big production-grade beasts we see more often at work, you could forget what field number you were trying to sort on by the time you scroll to the top of the query. So, in my opinion, it’s best to refer to these by their fully-qualified names.

After adjusting our SQL to account for this suggestion, we have…

-- Create a table displaying the daily sales of our
-- top 3 most profitable cities.
SELECT o.City,
  DATENAME(dw, [Order Date]) AS Day_of_Week,
  AVG(o.Profit) AS Average_Daily_Profit
FROM Orders AS o
JOIN (
     -- Return a list of the top 3 most profitable cities.
     SELECT TOP(3) City,
       SUM(Profit) AS Total_Profit
     FROM dbo.Orders
     GROUP BY City
     ORDER BY 1 DESC
     ) AS sub
-- Use this join as a filter so we don’t run processes
-- on any cities that aren’t among the top 3.
ON o.City = sub.City
GROUP BY o.City,
  DATENAME(dw, [Order Date])
ORDER BY o.City,
  Day_of_Week,
  Average_Daily_Profit;

Don’t make your work harder than it needs to be. Write out your column names in their fully qualified beauty.

Thanks for Reading

The rules are simple, but they make a big impact on the legibility of this code. Our final query is much more verbose than the original, but that’s perfectly okay because it’s also much easier to support.

Thanks for reading and have fun writing beautiful SQL!! If you have any questions, feel free to reach out to me. You can find me on Twitter @robcrok or on LinkedIn.

Robert Crocker, July 6, 2019


1 comment:

  1. Great advice, I change depending on the client's rules (1 line per column, or chain together and then a hard-return before the horizontal scroll kicks in ..... less lines, no need for a mouse to scroll)

    Also, the ORDER BY is great when you're in SQL but look at deleting it when you're in Tableau to improve performance (not wasting CPU time sorting differently to what you will do in Tableau anyway)

    ReplyDelete

Powered by Blogger.