Qlikview tutorial2

An optimized QVD loads at a blazing speed. You can load 50 to 100 million rows in matter of minutes.  So, it always makes sense to create a common data layer with a QVD that contains common transformations that many Qlikview applications can consume.

For starters, optimized QVD is the one that contains no transformations other than column aliasing or few other exceptions Qlikview engine allows such as an exists function with a single field.

Having said that, there are some instances in which you have to think creatively to make sure that you can load millions of rows in a reasonable time frame.

In a recent project, one of my teammates needed to load an optimized QVD of a reference table with over 22 million rows. Unlike a type 2 slowly changing dimension — in which a new row  gets added when an attribute value changes — every row gets repopulated monthly with a date key.

He faced a real conundrum when Qlikview natural join created multiple rows in the fact table due to the fact that same key field existed for each month partition in the reference table.

The challenge was to find the latest date key from the reference table and then left join it with the fact to avoid duplicates.

Since I can’t share actual data model, in this Qlikview tutorial, I  have created a sample data set to show you how we managed not only to handle duplicate values but also to improve time to perform data loading. For the sake of this tutorial, I am not using QVD as a source file either.

Step 1

Let’s create a simple data set containing Employee and Sales tables.

Qlikview script 1

For each month, same employee record gets populated with the respective title.

QVD load on a monthly basis should pick the highest date(for that monty) and use that as a filter to join the reference table with the Sales fact table.

So, in April, the date filter should use 3/1/2013 to join Employee table with the Sales table.

Now, let’s load the reference table and find the latest date key.

 

Step 2

Qlikview script 2It is always a good idea to use a numeric value for the date field, so in this step you are seeing a preceding load on top of another.

As you can see, latest date key value from a resident load will be used as a filter to join the reference table with the fact to avoid duplicate rows.

Albeit this works, using a max function on over 22 million rows still takes several minutes. There were multiple large reference tables in this data model.

Soon, it was apparent to find a better way to find the latest key.

 

 

Step 3

We decided to use a simple order by instead of using the max function. And it did the trick. Assuming that you have an optimized QVD with about 22 million rows, it takes over 2 minutes to find latest  date key using max function. A simple order by took only 15 seconds.

qlikview script 3

 

 

 

 

 

 

 

 

 

 

Step 4

Now, let’s left join the reference table with the fact.

qlikview script 4

 

On the surface, difference between 2 mins vs 15 seconds seems meager.

Nonetheless, if you have 10 or 15 reference tables with similar requirement then the performance difference becomes obvious.

Readers:  Have you dealt with similar situation? If  so, I’d love to hear your thoughts as it might benefit someone else trying to figure out same conundrum.

 


If you enjoyed this article, get email updates (it’s free).

Email Address:

  • Matyas 12:52 pm on November 15, 2013 Permalink

    Hi, how would you use this peek function when you need the maximum date for each unique employee? In that case can you still use a variable, since you now have not only one max-date but multiple (one for each employee)?
    Thanks

  • Roberto 5:48 pm on March 4, 2014 Permalink

    hi Matyas, i think you could do it grouping by EmpID ,Max(Date), then you will get a table whit the latest dates for each employee but i don,t think using variables for each employee would be wise, use this table instead it will make easier making a reference to employee data.

  • Jo Tedesco 3:25 am on September 11, 2015 Permalink

    Is it possible to load 2 fields, one from each table usng a resident load?