Friday, December 4, 2009

Simplify with Preceding Load

Most QV script developers are introduced to "preceding load" as a LOAD that precedes an SQL SELECT. But a LOAD may also precede another LOAD, which can be a very useful tool.

Let's review a typical preceding load.


Table1:
LOAD Customer, Sales, today(1) as LoadDate ;
SQL SELECT Customer, Country, Sales FROM SalesResults ;
  • The absence of a "FROM" or "RESIDENT" clause in the LOAD is what makes this a "preceding load".
  • The SQL SELECT will be executed first. The results of the SELECT will be used as input to the LOAD statement.
  • Table1 will have three fields: Customer, Sales, LoadDate.
  • The field "Country" will not be present in Table1 because "Country" is not listed in the LOAD statement.
  • The field "LoadDate" does not exist in the SQL SELECT and is added by the LOAD.
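The data flow described above can be modeled outside QlikView. The Python sketch below is only a conceptual model, not how QlikView works internally. `sql_select` and `preceding_load` are hypothetical stand-ins for the two statements, and the rows are made up. It shows the preceding LOAD consuming whatever the statement beneath it produces, keeping Customer and Sales, dropping Country, and adding LoadDate.

```python
from datetime import date


def sql_select():
    """Stand-in for: SQL SELECT Customer, Country, Sales FROM SalesResults (made-up rows)."""
    yield {"Customer": "Acme", "Country": "US", "Sales": 100}
    yield {"Customer": "Bolt", "Country": "DE", "Sales": 250}


def preceding_load(rows, load_date):
    """Stand-in for: LOAD Customer, Sales, today(1) as LoadDate ;
    There is no FROM, so the input is whatever the statement below produces."""
    for row in rows:
        # Only the listed fields survive: Country is dropped, LoadDate is added.
        yield {"Customer": row["Customer"], "Sales": row["Sales"], "LoadDate": load_date}


# sql_select() supplies the rows; preceding_load() transforms each one.
table1 = list(preceding_load(sql_select(), date.today()))
for row in table1:
    print(row)
```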



Let's look at an example where preceding load is useful. When loading data, you may need expressions to parse or cleanse it; for example, to extract a timestamp from a string in a text file.


Table1:
LOAD
timestamp(timestamp#(mid(@1:n,3,12), 'MMDDYYhhmmss')) as EventTime,
mid(@1:n, 17) as Event
FROM myfile.txt (fix, codepage is 1252);
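The positions in these expressions are 1-based: mid(@1:n,3,12) takes 12 characters starting at position 3, and mid(@1:n, 17) takes everything from position 17 on. As a rough illustration, this Python sketch mimics mid() and the 'MMDDYYhhmmss' parse on a made-up record. The record layout (timestamp in positions 3 to 14, event text from position 17) is an assumption inferred from the expressions, not taken from the real myfile.txt, and hh is treated as a 24-hour clock.

```python
from datetime import datetime


def mid(s, start, length=None):
    """Mimic QlikView's mid(): start is 1-based; without length, return the rest."""
    begin = start - 1
    return s[begin:] if length is None else s[begin:begin + length]


def parse_record(line):
    # mid(@1:n, 3, 12) read with the format 'MMDDYYhhmmss'
    event_time = datetime.strptime(mid(line, 3, 12), "%m%d%y%H%M%S")
    # mid(@1:n, 17): everything from position 17 to the end of the record
    event = mid(line, 17)
    return {"EventTime": event_time, "Event": event}


record = "XX120409143015  Disk full"  # hypothetical fixed-width record
row = parse_record(record)
print(row["EventTime"], "|", row["Event"])
```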


What if you want additional time dimensions from the data? You could add expressions like:


date(date#(mid(@1:n, 3, 6), 'MMDDYY')) as EventDate,
month(date(date#(mid(@1:n, 3, 6), 'MMDDYY'))) as EventMonth


The script would soon get messy with "paren-disease" and become harder to maintain. Preceding Load to the rescue.


Table2:
LOAD *,
date(floor(EventTime)) as EventDate,
month(EventTime) as EventMonth,
year(EventTime) as EventYear,
hour(EventTime) as EventHour
;
LOAD
timestamp(timestamp#(mid(@1:n,3,12), 'MMDDYYhhmmss')) as EventTime,
mid(@1:n, 17) as Event
FROM myfile.txt (fix, codepage is 1252);


  • The syntax is greatly simplified by reusing the "parsed once" EventTime.
  • Table2 will contain six fields: Event, EventTime, EventDate, EventMonth, EventYear, EventHour.
  • The "*" in the top load includes the fields emitted by the bottom load: EventTime and Event.

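Preceding loads may also be stacked more than two deep, as in this example. Each LOAD takes its input from the LOAD immediately below it, so the Rate expression in the top LOAD can use EventMonth and EventDate, which are created by the middle LOAD.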

Table3:
LOAD *,
if(match(EventMonth, 'Aug', 'Dec') OR weekday(EventDate) > 5, 'Holiday', 'Standard') as Rate
;
LOAD *,
date(floor(EventTime)) as EventDate,
month(EventTime) as EventMonth,
year(EventTime) as EventYear,
hour(EventTime) as EventHour
;
LOAD
timestamp(timestamp#(mid(@1:n,3,12), 'MMDDYYhhmmss')) as EventTime,
mid(@1:n, 17) as Event
FROM myfile.txt (fix, codepage is 1252);
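The stacked example can again be modeled as a chain of Python generators, one per LOAD. This is a conceptual sketch with made-up input lines, not QlikView's implementation. It assumes month() yields English abbreviations such as 'Dec' and that weekday() uses its default numbering of Monday = 0 through Sunday = 6, which matches Python's date.weekday(). The functions are applied innermost first: `bottom` parses the records, `middle` adds the date fields, and `top` uses them to compute Rate.

```python
from datetime import datetime

# Assumed English month abbreviations, as QlikView's month() shows by default.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def bottom(lines):
    """Bottom LOAD: parse EventTime and Event from each fixed-width record."""
    for line in lines:
        yield {"EventTime": datetime.strptime(line[2:14], "%m%d%y%H%M%S"),
               "Event": line[16:]}


def middle(rows):
    """Middle LOAD: LOAD *, plus EventDate, EventMonth, EventYear, EventHour."""
    for row in rows:
        t = row["EventTime"]
        yield {**row,  # LOAD * keeps every field from the load below
               "EventDate": t.date(),
               "EventMonth": MONTHS[t.month - 1],
               "EventYear": t.year,
               "EventHour": t.hour}


def top(rows):
    """Top LOAD: LOAD *, plus Rate computed from fields made by the middle LOAD."""
    for row in rows:
        # Assumed weekday() numbering: Monday = 0 ... Sunday = 6, as in Python
        holiday = (row["EventMonth"] in ("Aug", "Dec")
                   or row["EventDate"].weekday() > 5)
        yield {**row, "Rate": "Holiday" if holiday else "Standard"}


lines = [  # hypothetical fixed-width records
    "XX120409143015  Disk full",
    "XX060709080000  Login",
    "XX031009101500  Backup",
]
# Innermost call is the bottom LOAD; each stage consumes the one below it.
table3 = list(top(middle(bottom(lines))))
for row in table3:
    print(row["EventTime"], row["Event"], row["Rate"])
```

Note that, as written, `weekday(EventDate) > 5` is true only for Sunday under the assumed numbering.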

Preceding load is a useful tool to simplify the syntax of your script and make it easier to maintain.

-Rob