BI Quest [A blog on Informatica]

Share what you learn

Flat File Target & different File Names

Posted by Ankur Nigam on August 23, 2011

Often there is a requirement for deterministic output based on the input when you are writing to flat files. For example you are required to write a mapping that spawns sales output as CSV for every region in a separate file. The name of each file has to be different and each such file should have the sales data for that region only.

I had a scenario somewhat similar to this wherein I have to output categorized items i.e. I have a data model something like this:-

My Items table is joined to category on category id & I was required to output inventory for each category in separate file. The name of each file should be the name of the category.

This can be done through a control Transaction Control. This is a very useful tool but least used. Basically it helps you in controlling when the commit has to be done. For example you want to commit at a point when a certain column value has changed. In our case whenever the category changes the previous data belonging to old category has to be written which means since a new category data has arrived the old category data still held in buffer should now be written into the target.

In addition to Transaction Control, we also need a mechanism to control what filename should be generated. This can be done through a FileName port available in FlatFile Targets. This port is a write-only port and you can only set value to it. When used in session the port value becomes read-only and is used by Integration service to set the name of file.The Output filename attribute will always be ignored if you have used FileName port of target FlatFile. In a sense that a zero byte file of Output filename attribute will be created with no data in it.

So on one hand we control over when and what has to be written, on the other hand we control what file name has to be given to file for writing the data.

Following is the mapping which I have created for this scenario:-

Since I have already defined a relationship between the participating tables the query would be generated accordingly. Please also note that I have sorted the data on the basis of Category Name.The sorted port property in this case is 1.

The main work is done by expression control and transaction control. A more detailed insight of the expression control is given below:-

The basic task of this expression to check if a new category has come or not. I have declared a variable VAR_IS_NEWITEM which is 1 if the incoming category name matched the old category name (stored in VAR_OLD_ITEM). Since matching is done before VAR_OLD_ITEM is assigned new value, we can safely track changes as long as we are getting sorted inputs in expression (recall that sorted output is emitted from Source Qualifier expression)

Coming down further I have outputted the flag whether category is new or not in IS_NEWITEM along withe the desired filename which in my case is <CATEGORY_NAME>.TXT

Moving forward I have now directed this output into a Transaction Control.

Here I am checking if the Category is new then Commit into target. Transaction Control has following properties:-

TC_CONTINUE_TRANSACTION. No action to be taken for the current row. This is the default behavior.

TC_COMMIT_BEFORE. All the old transaction rows before the current row already written in the target have to be committed and a new transaction to begin from current row. In other words old transaction data boundary reached & therefore start new transaction boundary after committing the old transaction data in the target.

TC_COMMIT_AFTER. The current row has to be written and old transaction plus this new row has to be committed.  Thereafter start a new transaction boundary.

TC_ROLLBACK_BEFORE. It means to rollback all old transaction data from target before this row & then begin a new transaction boundary from current row.

TC_ROLLBACK_AFTER. The current row has to be written and old transaction plus this new row has to be roll backed.  Thereafter start a new transaction boundary. The current row is in the rolled back transaction.

Long story said short I need to commit only when the new category has arrived till then keep writing in the target. Since each time also passing the filename from expression into the Filename Port of the target, Integration service will always pick up the value in Filename port of target as the Output filename. In this case where I have used a Filename port the session level Output Filename attribute of target will be ignored in the sense that though the file will be created but with no data.

Following was the query which was issued to database

SELECT ITEM_CATEGORIES.CATEGORY_NAME, STOCK_ITEMS.ITEM_NAME, STOCK_ITEMS.UNIT_PRICE, STOCK_ITEMS.STOCK_QTY
FROM ITEM_CATEGORIES, STOCK_ITEMS
WHERE STOCK_ITEMS.CATEGORY_ID = ITEM_CATEGORIES.CATEGORY_ID ORDER BY ITEM_CATEGORIES.CATEGORY_NAME

which returned data :-

When the session ran the data was neatly organized in different files as per category which is what I required. Note that there is a 0 byte file also created with no data:-

2 Responses to “Flat File Target & different File Names”

  1. […] Comments (RSS) « Flat File Target & different File Names […]

  2. simon said

    Superb,,,its the best explanation,great boss

Leave a comment