Monday, September 29, 2008

Loading Multiple Excel Sheets

Load from Excel is usually pretty straightforward, but sometimes you'll need to load multiple sheets and make some determinations at runtime. Details such as sheetnames may not be known at script creation time.

The QV statements "SQLTables" and "SQLColumns" may be used to discover information about the sheets and columns available in a workbook. Both of these statements require an ODBC connection. The ODBC connection may also be used to subsequently read the data, but I find using the LOAD biff more convenient.

First make a OLEDB connection to the workbook:
CONNECT TO [Provider=Microsoft.Jet.OLEDB.4.0;Data Source=workbook.xls;Extended Properties="Excel 8.0;"];

Specify the workbook name, relative to the current directory, in the "Data Source=" parameter. This example uses a "DSN-less" connection. It does not require you to predefine an ODBC datasource.


The SQLTables statement return a set of fields describing the tables in the currently connected ODBC datasource, in this case the workbook. A "Table" is an Excel Sheet.

tables:
SQLtables;

Now I've got a list of sheets in the QV "tables" table. The field name that contains the sheetname is "TABLE_NAME". I'll loop through the set of TABLE_NAME values and load each one using a standard biff LOAD.

FOR i = 0 to NoOfRows('tables')-1
LET sheetName = purgeChar(peek('TABLE_NAME', i, 'tables'), chr(39));
Sales:
LOAD *
FROM workbook.xls (biff, embedded labels, table is [$(sheetName)]);
NEXT


Sheetnames that contain blanks will be surrounded by single quotes. The purgeChar() function above removes any single quotes that may be present in the sheetname.

What if I only want to load those sheets names whose name begins with "Sales"? Wrap the LOAD statement in an IF statement to test the sheetname:


IF wildmatch('$(sheetName)', 'Sales*') THEN
LOAD .....
END IF


How about this case? I want to load any sheet that contains the three columns "Sales", "Year" and "Quarter":

columns:
SQLColumns; // Get list of columns
// Join list with columns of interest
RIGHT JOIN (columns) LOAD *;
LOAD * INLINE [

COLUMN_NAME
Quarter
Sales

Year
]
;

// Create a count of how many columns of interest each sheet has
selectSheets:
LOAD TABLE_NAME as SheetName, count(*) as count
RESIDENT columns
GROUP BY TABLE_NAME
;
// Keep only the SheetName that have all 3 columns
RIGHT JOIN
LOAD SheetName
RESIDENT selectSheets
WHERE count = 3


// Load the selected sheets
FOR i = 0 to NoOfRows('selectSheets')-1
LET sheetName = purgeChar(peek('SheetName', i, 'selectSheets'), chr(39));
LOAD....
NEXT


You may wonder if you could use the Excel Driver instead of the Jet provider like this:

CONNECT TO [Provider=MSDASQL;Driver={Microsoft Excel Driver (*.xls)};DBQ=workbook.xls];

The connection will complete and you can use this connection for SQL SELECTs. However, when SQLTables is called, the connection will enumerate tables/columns for all the *.xls files in the current directory.

This provider uses the parameter "DefaultDir=" (default is .) to control which directory is enumerated for SQLTables and SQLColumns calls. The DBQ parm plays no part. You may find this useful as an alternative to using a traditional "for each filelist..." loop to process multiple files.

Complete text of the examples presented here can be found in the QV Cookbook at:
http://robwunderlich.com/Download.html)


4 comments:

count Nazgul said...

Many thanks for this post! It was very helpfully for me! Thanks again!

Arun Kumar said...

For more information on For each loop enumerators (Foreach ADO.NET Schema Rowset Enumerator using Excel Sheets) and programming integration service (SSIS) using C# .Net and VB .Net, please visit the below link:
http://www.sqllion.com/2009/06/programming-foreach-loop-container-%e2%80%93-enumerating-excel-sheets/

thefourth said...

I was trying to load sheets from a XLSM file but it show error with:
tables:
SQLtables;

The error it shows in debug is:
ScriptErrorCount 1
ScriptErrorDetails "Error en la conexión OLEDB"
ScriptErrorList General Error
ScriptError General Error

Do you know how to avoid it?
Thanks

Anonymous said...

I see this is a couple of years old, but I wanted you to know that it's still helping people. Thanks for the work you put into this.