Tuesday, September 14, 2010

Monitoring the Reload Schedule

Monitoring successful operation is an important aspect of any IT system.  What kind of monitoring is useful for Qlikview? I've found the following monitoring important in a Qlikview Server installation:

  • Notification of individual reload failures. This capability, sending an email on reload failure,  is included in all editions of Qlikview Server.

  • Status of Qlikview services. Most shops have a network monitor such as Servers Alive to monitor and send alerts about the status of Windows services.

  • A particularly effective style of monitoring is "goal" monitoring. That is, instead of monitoring the resources required to achieve a goal, monitor the goal itself. In Qlikview terms, this means confirming that a set of Qlikview documents has been updated as scheduled.

In this post I'll look at using a Qlikview document to monitor the filetimes of Qlikview Server documents. An "old" filetime is an indication that a reload or distribution has been missed. An email notification will be sent when we are off schedule. The mail body will look something like this:

4 documents overdue: Expenses_Joe.qvw, Expenses_Rob.qvw, Expenses_Sally.qvw, fieldIndex.qvw

The code used in this post can be downloaded at File Age Monitor. Download and extract the three files:
FileAgeMonitor.qvw  -- monitoring document.
FileAgeMonitor_Rules.txt -- Filename masks and maximum expected age.
FileAgeMonitor_Email.txt -- Address(es) to send alerts to about overdue documents.

To use FileAgeMonitor in your shop you'll need to make the following changes:
  1. On the "Configuration" script tab, specify the directories you want to scan for qvw files.
  2. Modify FileAgeMonitor_Rules.txt to specify rules meaningful to your installation. Instructions are in the file.
  3. Modify FileAgeMonitor_Email.txt for your email address. Instructions are in the file.

FileAgeMonitor_Rules.txt consists of two fields:
    - a filename mask (Key)
    - a maximum allowable age in hours (MaxAge).
Lines beginning with "#" are comments. Example:

Key, MaxAge
# Rules file for FileAgeMonitor.qvw. First match wins.
# "Filename Mask", "Max allowable age in hours"
# All of the Expenses* docs should be no more than 25 hours old
Expenses*.qvw, 25

#Films can be 90 days old
Films.qvw, 90*24

# Everything else - catch all default - 7 days
*, 7*24
 
The first field, "Key", is a filename mask that may use the standard Qlikview wildcard characters: "*" to match any number of characters and "?" to match a single character.
 
The second field, "MaxAge", is the threshold age at which a file is considered "overdue". MaxAge may contain any expression that evaluates to a numeric value. The value is hours.
 
The last entry in the Rules file, "*", will match all files.
 
The script builds a list of qvw files and matches each filename against the entries in the Rules file. The first match wins.  The age of each file is tested against its matching rule and the flag field "Is Overdue?" is set to Y or N. The flag field is defined as a dual:
  if(FileAge > MaxAge, dual('Y',1), dual('N',0) )
     as "Is Overdue?".

Y has the dual value 1, N the value 0. This allows the flag to be summed.
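
The matching logic is easy to try outside of Qlikview. What follows is only a rough Python sketch of the first-match-wins idea, not the script inside FileAgeMonitor.qvw. The function names (parse_rules, max_age_for, is_overdue) are hypothetical, Python's fnmatchcase stands in for Qlikview wildcard matching (it is case sensitive and also gives "[...]" a special meaning, which may differ from Qlikview), and MaxAge expressions are limited here to simple products such as 90*24.

```python
from fnmatch import fnmatchcase

# Sample rules, taken from the example Rules file shown earlier.
RULES_TEXT = """Key, MaxAge
# All of the Expenses* docs should be no more than 25 hours old
Expenses*.qvw, 25
#Films can be 90 days old
Films.qvw, 90*24
# Everything else - catch all default - 7 days
*, 7*24
"""

def parse_rules(text):
    """Return [(mask, max_age_hours)] in file order, skipping comments and blanks."""
    rules = []
    for line in text.splitlines()[1:]:  # drop the "Key, MaxAge" header line
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        mask, expr = (part.strip() for part in line.split(",", 1))
        # Assumption: MaxAge is a product of numbers, e.g. "90*24".
        hours = 1.0
        for factor in expr.split("*"):
            hours *= float(factor)
        rules.append((mask, hours))
    return rules

def max_age_for(filename, rules):
    """First match wins: return the MaxAge of the first mask that matches."""
    for mask, hours in rules:
        if fnmatchcase(filename, mask):
            return hours
    return None  # no rule matched (cannot happen when a "*" rule is present)

def is_overdue(file_age_hours, max_age_hours):
    """Return 1 for overdue and 0 otherwise, so the flags can be summed."""
    return 1 if file_age_hours > max_age_hours else 0
```

Because the rules are tested in file order, the catch-all "*" entry must stay last. If it came first, every file would get the 7 day limit.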
 

So now we have a chart that displays what's overdue, but how about automatic notifications? For that we'll use a Document Alert. Alerts are created from the Tools menu. Here's the alert definition used to send the email. Refer to the notes that follow the picture.
 
 
1. Before the condition is evaluated, apply the bookmark that selects "Is Overdue?"=Y.
2. The alert condition is specified as:
  =sum([Is Overdue?])  > 0
Recall that "Is Overdue?" was defined as a dual so it may be summed. If there are any overdue documents, the condition will be true and the alert will "fire".
3. Both the Mail subject and mail body contain the count of overdue documents. The body contains the document names as well. The body expression is:
=sum([Is Overdue?]) & ' documents overdue: ' & concat(FileName, ', ')
4. The email will be sent to the addresses that were loaded into field "AlertTo".  The script loaded this field from the file FileAgeMonitor_Email.txt.
5. Batch Mode limits this alert to server-based reloads only. If Interactive were checked, the alert could fire when we are reloading during development.
6. The Alert will be tested at the end of each reload.
7. The trigger level is set to "Message Changes". This means a new email will be sent only when the count of overdue documents changes. So we will not get an hourly email telling us that "10 docs are overdue", but will receive a new email if the next reload produces an overdue count of 8 or 12 -- something different than 10.
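
To make notes 2, 3 and 7 concrete, here is a small Python sketch of the decision the alert makes. It is an illustration under assumptions, not how Qlikview implements alerts: the function names are hypothetical, the file names are sorted to match the sample mail body at the top of this post, and "Message Changes" is modeled as comparing the new message text with the last one sent.

```python
def alert_message(overdue_files):
    """Mimic the body expression: count, label, then comma-separated names."""
    names = sorted(overdue_files)  # assumption: names are listed in sorted order
    return f"{len(names)} documents overdue: " + ", ".join(names)

def should_send(overdue_files, last_sent_message):
    """Send only if something is overdue and the message differs from the last one sent."""
    if not overdue_files:  # condition sum([Is Overdue?]) > 0 is false
        return False
    return alert_message(overdue_files) != last_sent_message

overdue = ["fieldIndex.qvw", "Expenses_Sally.qvw", "Expenses_Joe.qvw", "Expenses_Rob.qvw"]
first = alert_message(overdue)
print(first)
print(should_send(overdue, None))        # nothing sent yet
print(should_send(overdue, first))       # same message as last time
print(should_send(overdue[:3], first))   # count changed
```

In this model a change in the list of names also counts as a changed message, even if the count stays the same, because the names are part of the message text.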

FileAgeMonitor itself needs to be scheduled to reload periodically. I schedule it to run towards the end of interval reload cycles. For example, if there are hourly reloads at the top of the hour, I schedule FileAgeMonitor at 45 minutes after the hour.

If FileAgeMonitor relies on Server scheduling, and the entire scheduling process fails, how will FileAgeMonitor be able to tell us that reloads are not running? This is the "monitoring the monitor" problem that inevitably occurs with system monitoring. I address this issue by monitoring the Filetime of FileAgeMonitor.qvw using an external monitor like Servers Alive.
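
If your external monitor can run a script, the check itself is tiny. The Python below is a minimal sketch, assuming the monitor can call a script and act on its result. The function names and the threshold you pass in are my own placeholders, not part of Servers Alive or the download.

```python
import os
import time

def hours_since_modified(path, now=None):
    """Age of the file in hours, based on its last-modified time."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 3600.0

def monitor_is_stale(path, max_age_hours, now=None):
    """True when the monitoring document itself has not been reloaded recently."""
    return hours_since_modified(path, now) > max_age_hours
```

For an hourly FileAgeMonitor reload, a threshold somewhat above one hour leaves room for a slow reload. Note that os.path.getmtime raises OSError if the file is missing, which the caller should also treat as a failure.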

Qlikview reloading can get "off schedule" for any number of reasons: database errors, administrator errors, bugs in the scheduling software.  It's important for the Administrator to know of exceptions and their scope as quickly as possible.

-Rob