Monday, 18 November 2013

11/18/2013 01:51:00 pm

IDQ Interview Questions Part 1


Question 1
What is the difference between the PowerCenter Integration Service and the Data Integration Service?

The PowerCenter Integration Service is an application service that runs sessions and workflows. The Data Integration Service is an application service that performs data integration tasks for the Analyst tool, the Developer tool, and external clients. The Analyst tool and the Developer tool send data integration task requests to the Data Integration Service to preview or run data profiles, SQL data services, and mappings. Commands from the command line or an external client send data integration task requests to the Data Integration Service to run SQL data services or web services.

Question 2
What is the difference between the PowerCenter Repository Service and the Model Repository Service?

The PowerCenter application services and PowerCenter application clients use the PowerCenter Repository Service. The PowerCenter repository has folder-based security.

The other application services and clients, such as the Data Integration Service, Analyst Service, Developer tool, and Analyst tool, use the Model Repository Service. The Model Repository Service has project-based security.
You can migrate some Model repository objects to the PowerCenter repository.

Question 3
What is the difference between a mapplet in PowerCenter and a mapplet in the Developer tool? 

  1. A mapplet in PowerCenter and in the Developer tool is a reusable object that contains a set of transformations. You can reuse the transformation logic in multiple mappings. 
  2. A PowerCenter mapplet can contain source definitions or Input transformations as the mapplet input. It must contain Output transformations as the mapplet output. 
  3. A Developer tool mapplet can contain data objects or Input transformations as the mapplet input. It can contain data objects or Output transformations as the mapplet output. 
 A mapplet in the Developer tool also includes the following features:

  1. You can validate a mapplet as a rule. 
  2. You use a rule in a profile. 
  3. A mapplet can contain other mapplets. 

Question 4
What is the difference between a mapplet and a rule? 

You can validate a mapplet as a rule. A rule is business logic that defines conditions applied to source data when you run a profile. You can validate a mapplet as a rule when the mapplet meets the following requirements:

  1. It contains Input and Output transformations. 
  2. The mapplet does not contain active transformations. 
  3. It does not specify cardinality between input groups. 

Question 5
What is the difference between a source and target in PowerCenter and a physical data object in the Developer tool?


In PowerCenter, you create a source definition to include as a mapping source. You create a target definition to include as a mapping target. In the Developer tool, you create a physical data object that you can use as a mapping source or target.


Question 6
Is it possible to use reference tables in the Case Converter transformation?
Yes. Reference tables can be used only when the case conversion type is Title Case or Sentence case.

Question 7
What are the most used transformations in IDQ?
The most used transformation is the Parser transformation.

Question 8
What is Address Doctor?
Address Doctor is the engine behind the Address Validator transformation. The Address Validator transformation compares input address data with address reference data to determine the accuracy of input addresses and fix errors in those addresses. The transformation can create columns with new address data, corrected address data, and status information for each address.

Example: pass in a postal code and retrieve all the available address fields.

Question 9
Can we export an object from IDQ to the PowerCenter tool? If yes, then how?
Yes, we can export an object from IDQ to the PowerCenter tool:

  • Connect to the repository service. 
  • Locate your project folder in the Developer tool. 
  • Expand the mapping tab. 
  • Choose your mapping (the one that needs to be exported). 
  • Expand the Informatica folder. 
  • Click Object Export File. 
  • Under your project folder, select the mappings/mapplets. 
  • Click Browse and select the location where you want to export the file. 


Question 10
What is a reference table?
Reference tables are used as a guide to compare input data against; they can also be used as lookups.

Question 11
In IDQ, is it possible to create user-defined reference tables? In what circumstances are they required?
Yes, user-defined reference tables can be created. IDQ enables users to maintain reference tables in which they can define a set of allowed values. For example, a list of country/state codes can be maintained in a reference table. When a column is profiled against the reference table, the profile shows the number and details of the records that do not match the country/state codes. Reference tables can easily be created from the list of unique values in a column profile, and the table can then be edited to add or remove values.

Question 12
What is a Parser transformation?
It is one of the most important transformations used in IDQ. Parsing is the core function of any data quality tool, and IDQ provides rich parsing functionality to handle complex patterns.

A Parser transformation can be created in two modes:
  • Token Parsing mode 
  • Pattern-Based Parsing mode

Question 13
Does IDQ have an emailing system like PowerCenter?
It has an email validation system. An email address has three components:
Mailbox - Sub-domain - Domain
An email address is declared valid when all three components are present and correctly formed.

Question 14
How can we publish IDQ SSR results on the intranet/web?
SSR results can be published on the intranet or web through an HTML file.

Question 15
What type of IDQ plans can be exported as mapplets to PowerCenter?
You can import an IDQ plan into a mapping through the Data Quality (DQ) transformation, which is available only in the 8.x versions. First, save your IDQ plan as XML on your machine; then call the plan in your mapping by creating a new DQ transformation, which makes all of the plan's logic available in the mapping. Note that whenever you change the plan, you need to refresh the DQ transformation. We have used this approach in our data cleansing projects.

Question 16
How do you check in Informatica Data Quality which fields in a range are unique?
Use the Match transformation, which supports two approaches:

Field matching: used to find similar or duplicate records.
Identity matching: identifies similar or duplicate identities (for example, person or organization names) in record data.

Question 17
Where are IDQ mappings saved?
IDQ mappings are saved in the Model repository.

Question 18
What transformation would you use to standardize data using a reference table?
Use the Standardizer transformation to standardize data using a reference table.

Question 19
If we make changes to a mapplet in IDQ that has been integrated into PowerCenter, how are those changes propagated to the integrated mapplet?
The changes are not propagated automatically; we need to manually re-export the IDQ mapplet to PowerCenter.

Question 20
What are the different rule types you can create in the Analyst tool to apply to a profile?

  • Expression rules. Use expression functions and columns to define rule logic. Create expression rules in the Analyst tool. 
  • Predefined rules. These include system-defined, user-defined, and reusable rules. System-defined rules are packaged with the content installer for the Developer tool and can appear as reusable rules in the Analyst tool. Rules that are created in the Developer tool as mapplets can appear in the Analyst tool as user-defined rules. An analyst can create an expression rule and promote it to a reusable rule that other analysts can use in multiple profiles. 
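
As a quick illustration, an expression rule is simply an expression built from columns and expression functions. A minimal sketch (the ZIP_CODE column name is a hypothetical example):

IIF(ISNULL(ZIP_CODE) OR LENGTH(ZIP_CODE) <> 5, 'Invalid', 'Valid')
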
Question 21
What are the token Parser pros and cons?
Pros: quick to configure; standardizes as it parses; multiple token types can go to the same output; supports reverse parsing; can append reference tables.
Cons: not context-sensitive; works better on unstructured data; output types need to be well defined; it tries the first reference table first; if a value has already been parsed, it overflows rather than being parsed into subsequent tables.

11/18/2013 01:30:00 pm

IDQ Functionality


Use IDQ to design and run processes to complete the following tasks:

Profile data : Profiling reveals the content and structure of data. Profiling is a key step in any data project, as it can identify strengths and weaknesses in data and help you define a project plan.

Create scorecards to review data quality : A scorecard is a graphical representation of the quality measurements in a profile.

Standardize data values : Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.

Parse data :  Parsing reads a field composed of multiple values and creates a field for each value according to the type of information it contains. Parsing can also add information to records. For example, you can define a parsing operation to add units of measurement to product data.

Validate postal addresses : Address validation evaluates and enhances the accuracy and deliverability of postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against address reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.

Find duplicate records : Duplicate analysis calculates the degrees of similarity between records by comparing data from one or more fields in each record. You select the fields to be analyzed, and you select the comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in record data.

Create reference data tables : Informatica provides reference data that can enhance several types of data quality process, including standardization and parsing. You can create reference tables using data from profile results.

Create and run data quality rules : Informatica provides rules that you can run or edit to meet your project objectives. You can create mapplets and validate them as rules in the Developer tool.

Collaborate with Informatica users : The Model repository stores reference data and rules, and this repository is available to users of the Developer tool and Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.

Export mappings to PowerCenter : You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.

Tuesday, 29 October 2013

10/29/2013 04:33:00 pm

IDQ Parser Transformation


In this article we are going to cover the Parser transformation. It is one of the most important transformations used in IDQ. Parsing is the core function of any data quality tool, and IDQ provides rich parsing functionality to handle complex patterns.

A Parser transformation can be created in two modes:


  • Token Parsing mode 
  • Pattern-Based Parsing mode



Token-Based Parsing: use it to parse strings that match token sets, regular expressions, or reference table entries.
We will use a simple example to create a token-based Parser transformation. Suppose we have an email ID coming in a single field in the format "Name@company.domain", and we want to parse it and store it in multiple fields: 
NAME
COMPANY_NAME
DOMAIN
Suppose we have input data coming as below 

Rahul@gmail.com
Sachin@yahoo.com
Stuart@yahoo.co.uk

Step 1: Create a token-based Parser transformation with the email ID as input. After creating the transformation, go to the Properties > Strategies tab and click New.



Step 2: Click on Token Based.

Step 3: Select Regular Expression (as we want to have multiple output ports).

Step 4: Select the email parser expression, or create your own regular expression to parse different types of data.
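
For reference, a regular expression along the following lines would split the address into three capture groups, one per output port (a sketch only; the packaged email parser expression may differ):

^([A-Za-z0-9._%+-]+)@([A-Za-z0-9-]+)\.([A-Za-z0-9.-]+)$

Group 1 -> NAME (everything before the @)
Group 2 -> COMPANY_NAME (the first domain label after the @)
Group 3 -> DOMAIN (the remaining labels, such as com or co.uk)

For Stuart@yahoo.co.uk this yields NAME = Stuart, COMPANY_NAME = yahoo, and DOMAIN = co.uk.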


Step 5: Create three output ports, click OK, and then click Finish.


Below is the output from the Parser transformation: name, company, and domain parsed into separate fields.


Pattern-Based Parsing: pattern-based parsers are useful when working with data that needs to be parsed apart or sorted, and the data has a moderately high number of patterns that are easily recognized.
A pattern-based Parser transformation needs input from a Labeler transformation, which provides two outputs: labeled data and tokenized data.
Suppose we have a field named PATTERN_DATA in the source which contains a name, an employee number, and a date, and we need to parse it into three separate fields.
Step 1: First create a Labeler transformation with the delimiter set to "," and the properties below, by creating a new strategy.


In the second tab, choose the execution order and assign labels.


The output of the Labeler transformation will be:

Step 2: Connect both the labeled output and the tokenized data to the pattern-based Parser transformation, and create three new output ports in the Ports tab as shown below.

Step 3: In the Patterns tab, define the patterns below (as per the labels defined in the Labeler).


You can preview the parsed data broken into three fields: NAME, EMPNO, and DOB.
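
As an illustration of the whole flow (the label names here are hypothetical; the actual tokens depend on your Labeler strategy):

Input:          John,12345,01-01-1980
Labeled output: WORD DIGIT DATE
Pattern tab:    WORD DIGIT DATE  ->  NAME = WORD, EMPNO = DIGIT, DOB = DATE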

Hope this post makes the Parser transformation clearer. In case of any questions, please send mail to support@ITNirvanas.com or leave your comment here.



10/29/2013 03:45:00 pm

Generating Dynamic Multiple Target files in Informatica

Recently we came across a scenario that required generating multiple dynamic target files in Informatica. We receive vendor data from a legacy database in the table below.


  1. Invoice_ID
  2. Invoice_No
  3. Invoice_Amount
  4. Vendor_Id


We need to separate all the details related to one vendor into a separate file so that we can pass each vendor's data to third-party vendors in its own file.


INVOICE_DETAILS

INVOICE_ID  INVOICE_NO  INVOICE_AMOUNT  VENDOR_ID
1           A01         100.00          10
2           A02         125.00          10
3           A03         150.00          10
4           B01         200.00          20
5           B02         500.00          20
6           D01         345.00          40
7           C01         230.00          30
8           D02         450.00          40
9           E01         40.00           50
In order to achieve this, we need to follow the steps below in the mapping.


Step 1: Import INVOICE_DETAILS, drag it into the mapping, connect all ports from the Source Qualifier to an Expression transformation, and use the SQL override below. The ORDER BY on VENDOR_ID ensures that each vendor's rows arrive together:


SELECT INVOICE_DETAILS.INVOICE_ID,
       INVOICE_DETAILS.INVOICE_NO,
       INVOICE_DETAILS.INVOICE_AMOUNT,
       INVOICE_DETAILS.VENDOR_ID
FROM INVOICE_DETAILS
ORDER BY VENDOR_ID



Step 2: Create an Expression transformation with the 4 input ports plus the following ports. The port order matters: CHECK_VENDOR_ID must be evaluated before PREV_VENDOR_ID is updated with the current row's value.

  1. FILE_NAME (output port) = 'Vendor_'||VENDOR_ID||'Details.csv' 
  2. CHECK_VENDOR_ID (variable port) = IIF(VENDOR_ID <> PREV_VENDOR_ID, 1, 0) 
  3. OUTPUT_VENDOR_ID (output port) = CHECK_VENDOR_ID 
  4. PREV_VENDOR_ID (variable port) = VENDOR_ID 
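
A row-by-row trace makes the logic clearer (a sketch, assuming the rows arrive sorted by VENDOR_ID as per the SQL override, and that PREV_VENDOR_ID starts at its default value):

-- VENDOR_ID = 10, PREV_VENDOR_ID = default -> CHECK_VENDOR_ID = 1 (file Vendor_10Details.csv starts)
-- VENDOR_ID = 10, PREV_VENDOR_ID = 10      -> CHECK_VENDOR_ID = 0 (same file continues)
-- VENDOR_ID = 20, PREV_VENDOR_ID = 10      -> CHECK_VENDOR_ID = 1 (commit, file Vendor_20Details.csv starts)
-- After each row, PREV_VENDOR_ID is updated to the current VENDOR_ID.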







Step 3: Create a Transaction Control transformation (TC_GENERATE_FILES) with the logic below, so that a commit (and hence a new file) is triggered whenever the vendor changes:

IIF(OUTPUT_VENDOR_ID = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)



Step 4: Create a flat file target with the same structure as the source table, and add a new field to the target using the option "Add FileName to this Table".

Connect the FILE_NAME port from the Transaction Control transformation to the FileName port.


Step 5: Run the workflow, and the files below will be created.
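
Based on the sample data and the FILE_NAME expression in Step 2, the expected files would be:

Vendor_10Details.csv  (invoices 1, 2, 3)
Vendor_20Details.csv  (invoices 4, 5)
Vendor_30Details.csv  (invoice 7)
Vendor_40Details.csv  (invoices 6, 8)
Vendor_50Details.csv  (invoice 9)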






Saturday, 26 October 2013

10/26/2013 11:10:00 am

Informatica Port Order

Port Order

Informatica calculates ports in the following order:


  1. Input ports. Informatica calculates all input ports first, as they are not dependent on any other ports. So, we can create input ports in any order.
  2. Variable ports. Variable ports can reference input ports and variable ports, but not output ports. As variable ports can reference input ports, Informatica calculates variable ports after input ports. Likewise, since variables can reference other variables, the display order for variable ports is the same as the order in which the PowerCenter Server evaluates each variable. Suppose we first calculate a commission variable and then need to calculate salary using that commission; we must make sure the port that holds the commission appears before the salary port, as in the sketch after this list.
  3. Output ports. As output ports can reference both input ports and variable ports, Informatica evaluates output ports last. The display order for output ports does not matter, as output ports cannot reference other output ports. Output ports should appear at the bottom of the list of ports.
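
A minimal sketch of such a port list, in evaluation order (the port names and the 10% commission rule are hypothetical):

IN_BASE_SALARY (input port)
IN_SALES       (input port)                       -- input ports are evaluated first
V_COMMISSION   (variable port) = IN_SALES * 0.10  -- must appear before V_SALARY
V_SALARY       (variable port) = IN_BASE_SALARY + V_COMMISSION
OUT_SALARY     (output port)   = V_SALARY         -- output ports are evaluated last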



    How to capture previous row values

    Suppose we have to check whether an employee's department is the same as the previous employee's department. To do that, we can do the following:

    1. Create two variable ports, VAL_OLD_DEPT and VAL_NEW_DEPT, with VAL_OLD_DEPT placed above VAL_NEW_DEPT in the port list.
    2. In port VAL_OLD_DEPT, assign VAL_NEW_DEPT. Because VAL_OLD_DEPT is evaluated before VAL_NEW_DEPT is updated, it still holds the previous row's department.
    3. In port VAL_NEW_DEPT, assign IIF(ISNULL(IN_DEPT), 'NODEPT', IN_DEPT).
    4. Create one output port to compare the two values, as in the sketch after this list.
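
    A minimal sketch of the port layout (IN_DEPT and the output port name are illustrative; variable ports are evaluated top-down):

    IN_DEPT       (input port)
    VAL_OLD_DEPT  (variable port) = VAL_NEW_DEPT   -- evaluated first, so it still holds the previous row's department
    VAL_NEW_DEPT  (variable port) = IIF(ISNULL(IN_DEPT), 'NODEPT', IN_DEPT)
    OUT_SAME_DEPT (output port)   = IIF(VAL_OLD_DEPT = VAL_NEW_DEPT, 'Y', 'N')

    On the first row, VAL_NEW_DEPT still holds its initial default value, so the comparison returns 'N' for that row.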