Thursday, November 19, 2009

Understanding Join and Concatenate

The Qlikview script functions JOIN and CONCATENATE can sometimes be used to tackle the same problem, but there are important differences that should be understood.


Examine the sample tables below. Note that they share one common field name, "Key". Also note that Table1 has a Key value "3" that is not present in Table2.







JOIN will combine rows where the Key value matches. The keyword OUTER will also retain rows that do not match rows in the other table. Here's what the merged table will look like after an outer join.

OUTER JOIN (Table1) LOAD * RESIDENT Table2;


Values A1 and C1, which were in different tables, now occupy the same row in the result table. The row with Key 3 has missing values for C & D, because there was no matching Key in Table2.






Creating a chart that uses "Key" for dimension will produce results similar to the Table Box above.





The important point is that values with the same Key value have been merged together into a single row. If value A1 is selected, note that values C1 & D1 remain associated (white). The set A1,B1,C1,D1 is indivisible.



Now let's look at Concatenate. Concatenate appends the rows of one table to another. Concatenate never merges any rows. The number of rows in a concatenated table is always the sum of the rows from the two input tables. Here's what our sample data will look like after Concatenate.

CONCATENATE (Table1) LOAD * RESIDENT Table2;


Rows with like Key values are not merged together. The rows from Table2 are simply appended to Table1. Because the tables have different fields, rows will have null values for the fields from the "other" table.





If the data is used to build a chart that utilizes the common field "Key" as dimension, the chart looks just like the JOINed table.




Let's make the selection "A1" in Field A and see it's impact on our visible charts and tables.


When A1 is selected, the association to C1 & D1 is lost and C/D values become null in both the Chart and Tablebox. We cannot select both A1 and C1. This is a different result than the JOINed example.


Let's consider a more realistic example where we may choose between JOIN and CONCATENATE. Consider the two tables below. Note that only one BudgetAmount row is present for each Region-Year combination. In the Sales table, the SalesAmount is broken down by Department within Region.

If we load both tables we can produce a chart using expressions like =Sum(BudgetAmount).


The Budget and Sales values have been summed correctly.



We then notice that we have an undesirable synthetic key, created by the Budget and Sales tables sharing the Year and Region fields. One approach to eliminate the synthetic key would be JOIN or CONCATENATE. But which one in this case?







Let's try JOIN and see what the Chart looks like.

OUTER JOIN (Budget) LOAD * RESIDENT SALES


The summed Budget numbers are incorrect!


A look at raw data of the joined table will identify the problem. The JOIN repeated the BudgetAmount value on each Department row.


Let's try with CONCATENATE.


CONCATENATE (Budget) LOAD * RESIDENT Sales;


The numbers are now correct and we've accomplished the goal of eliminating the synthetic key.





A peek at the data in the Concatenated table will make it clear why the chart is now correct. There is only BudgetAmount value or each Year-Region.


JOIN and CONCATENATE are both very useful and frequently used functions in Qlikview. It's important to understand the differences between them.

-Rob

19 comments:

HÃ¥kan said...

Excellent! Very good description and explaination. :)

Anonymous said...

I use a similar sequence of steps when I explain join and concatenate in qlik courses for our clients. I also tell them that the synthetic key can be eilminated by creating a column in which I concatenate region and year and renaming the old column. I this way, I avoid the problems of synthetic keys and I don't need to make joins or concatenate: and the associations between tables remain.

Andrew
Excel reports from QlikView: www.nprinting.com

jm said...

Explains the difference between Join and Concatenate nicely - I still do not see the advantage of Joins - just leave the tables as they are and QlikView will do the joins for you anyway?

Rob Wunderlich said...

@jim,

JOIN is frequently required to eliminate synthetics keys and circular references.

If you have simple (single field) keys, I agree it's usually easier to keep the model as liked tables.

In very large applications, joined tables can sometimes perform better than linked tables.

-Rob

Anonymous said...

Is there any difference between concatentate and the 'union all' database function ? (other than needting to spoof the blank columns in order to make the two tables you're unioning appear to have identicle columns?

Is there some advantage to going one way or the other?

Dave

Rob Wunderlich said...

Dave,
In a QV concatenate, values for the unmatched columns are "missing". Missing is not the same as Null.

In the DB union you'll have to assign some value, even if that values is NULL.

Null and Missing are two different things to QV, although off the top of my head I can't give an example of where this would be hugely significant.

-Rob

Anonymous said...

"Missing is not the same as Null.
...
Null and Missing are two different things to QV"


Rob, on what basis do you say so?
What does QV use represent each of the null and missing value?

And how can we handle null values in our application?
during load and in expressions in charts?

-gun

Rob Wunderlich said...

@gun,
"What does QV use represent each of the null and missing value?"

The symbols to be displayed for Null and Missing Values are specifie on the Presentation tab. There is an entry for Null and separate entry for Missing Values. The default symbol for both is "-".

"And how can we handle null values in our application?
during load and in expressions in charts?"

There are options available to suppress or display null dimension values (on Dimension tab) and missing expression values can be suppressed or displayed on the Presentation tab.

-Rob

AlexeyR said...

Thanks! Its new software for me.

Anonymous said...

In the concatenated table, what would the expression such that if a filtered was applied on Department and Toys was selected, it would calculate a ratio for:
East Region in 2008 of 1,200/1,500?

West Region in 2008 of 4,000/10,000?

Niraj said...

Hi..
you have explained things really well.
Now i have clearly understood why and where to use join and concatenate.
Thanks.
live long

Rob Wunderlich said...

@Anonymous said...
"In the concatenated table, what would the expression such..."

To calculate the ratio against the total BudgetAmount for chart dimension(s).

=sum(SalesAmount) / sum({1} BudgetAmount)

Anonymous said...

Hi Rob,

Thanks for your reply @ "=sum(SalesAmount) / sum({1} BudgetAmount)"

If my concatenated table (from two fact tables) looks like the following:

Line,Premium,Claim,Policy#,PolicyLocation
1,20,-,A,X
2,-,10,A,-
3,30,-,A,Y
4,40,-,B,X
5,50,-,C,Y
6,-,20,C,-

If the user has the ability to filter on the "PolicyLocation" field and selected X, what would the formula be to return the Claim/Premium ratio of 10/60 (combination of lines 1,2 and 4 from the contcatenated table), if the Policy# field cannot be a chart dimension due to large number of Policy#'s?

The PolicyLocation field is not populated for Claim fact table as the Claim cannot be tied to a Location.

Appreciate if you could help with the above.

Anonymous said...

I looks that the expression =sum(SalesAmount) / sum({1} BudgetAmount) does not work ...
because sum({1} BudgetAmount) always return total amount of Budget. Instead of returning annual amount of Budget , so I change the expression the following way:
=sum(SalesAmount) / sum({1 } BudgetAmount)

Please correct me if I am wrong, I am quite new in QlikView.

vikram dabade said...

vikram said..
Mostly concatenation are used to Remove Synthetic Key.is Right sir.

Anonymous said...

awesome post! impressed

Anonymous said...

What will happen, if link table is used to link the Budget and Sales table. Since link table is just like join , would the result be wrong

Rob Wunderlich said...

A properly constructed link table would give the correct result. It would be the same as the Syn Key example, except the link table would stand in for the Syn Key.

Anonymous said...

Very Good post with a good example.Best of all found so far..