Tuesday 3 February 2009

Best Practices in Informatica (Part1)

We have tried to come up with some of best practices in informatica

  • Always try to add expression transformation after source qualifier and before Target. If source or target definition changes it is easier to reconnect the ports
  • Always use Cobol File for normaliser in binary format otherwise there are lot of issues specially with comp-3 fields
  • Remove unused ports, though unused ports do not have any effect on performance bit it is always better to remove them for more visibility
  • If possible try to do calculation in Output Ports instead of variable ports as variable ports are need to assign/reassign each time and it can slow down the performance
  • Try to avoid complex filter expression instead of that try to evaluate filter expression in upstream expression transformation and pass it to filter transformation. If you use too many complex calculations in filter condition expression it can slow down performance.
  • In workflow Source/Target directory Property take advantage of Unix links. Instead of hard coding path in source/target directory specify path with Unix link i.e. suppose in devt environment you are specifying Source directory path as /devserver/team/source and in prod server you specify it as /prodserver/team/source .You can get link created in $PMRootDir in as src_file_dir pointing to /devserver/team/source in dev server and /prodserver/team/source in prod server and in your source/Target file directory you can put path as $PMRootDir/src_file_dir In this case there is no need to change Source/Target directory every time you move between production and dev and testing
  • In sequence generator do not connect current value port to downstream (unless required) transformation as when we connect current value port from sequence generator transformation Informatica Server processes one row in each block. We can optimize performance by connecting only the NEXTVAL port in a mapping .
  • Improve lookup performance by putting all conditions that use the equality operator ‘=’ first in the list of conditions under the condition tab.
  • Always remember rule not to cache look tables having more than 550000 rows (Assuming row size 1024) .If your row size is less than or more than 1024 then adjust number of rows accordingly .
  • Avoid calculating same value again and again. Instead of that store it in a variable use it several times.
  • Limit the number of rows as early in the mapping as possible (SQ, aggregators, joiners, filters) - the more you limit at the beginning of the data flow, the less rows will be processed by all downstream transformations.
  • For generating IDs use IDENTITY columns or DB triggers
  • There should be no unused ports in any transformation (except Source Qualifier) - Source Qualifier should match Source Definition 1-to-1. But for any other transformation there should be no unused ports. Why fetch and process the data for half of the data flow and use extra memory, if you don’t need it?
  • Avoid implicit conversion, as it is slower and might lead to errors - There are number of issues with implicit conversion (e.g. when you link SQ string port to decimal port in the following expression):
  • Remove all the 'Error (Transformation error)' messages from default values in Expression transformations - you will avoid getting unwanted error messages in logs. It will not be checked at session initialization
  • Consider adding Expression transformation right after each SQ and right before each Target - pass-through expressions do not cause additional computation. But I couldn’t count how many times I had to make “little change” before or after the whole data flow in an existing mapping. Having this dummy expression transformation helps a lot. It’s also very useful when relinking all ports after some changes (using Autolink by name).

If you have further queries then please mail to support@itnirvanas.com


  1. hi,
    Can you tell how you came to the conclusion of point 9? What units is 1024 in here? KB? Bytes? I assume it as bytes and your 550000 rows comes in as 500 KB. If I assume it as 1024 KB, then the 550000 rows contribute to 500 MB.

    What are trying to infer here?


  2. Thanks Sarma for visiting ITNirvanas and giving your valuable comments.

    1024 is bytes.I was trying to say if no of rows in lookuptable are more than 500000 with rowsize 1024 Bytes then it is normally recommended to not use cache.

    But having said that i should have also mentioned it also depends how many source rows you are reading.

    Thanks for pointing out.

    Let us say, you have 10 rows in the source and one of the columns has to be checked against a big table (1 million rows). Then PowerCenter builds the cache for the lookup table and then checks the 10 source rows against the cache. It takes more time to build the cache of 1 million rows than going to the database 10 times and lookup against the table directly. Use uncached lookup instead of building the static cache, as the number of source rows is quite less than that of the lookup.

    I did small test on this and following are results

    Source Lkp W/O-Cache(Secs) Cache(Secs)
    10 500000 13 60
    1000 500000 12 50
    10000 500000 28 50
    100000 500000 169 60

    Row size was 1024 Bytes.

    Please continue to give your valuable inputs.

  3. Hi,

    I've given this suggestion and few more in this article. Please have a look at it.


  4. Sorry, I think I forgot to paste the article link.



  5. Hello,
    Very nice posts. Really appreciate your efforts.

    I have doubt in determining the Look up cache size and index cache size for the lookup transformation. On which terms we will decide the size of those?

    For example, if we have a target of size 100GB and we are taking that target as look up table can we go with the default size of look up cache and index cache? If we need to change the cache, on what measure we need to change them?

    Thanks in advance.