Wednesday, May 5, 2010

Lookup (Informatica)

Transformation type:
Passive
Connected/Unconnected
Lookup transformation is used to look up data in a flat file, relational table, view, synonym or source qualifier. 

Following tasks can be performed with a Lookup transformation:
  • Get a related value. Retrieve a value from the lookup table based on a value in the source. For example, the source has an employee ID. Retrieve the employee name from the lookup table.
  • Perform a calculation.Retrieve a value from a lookup table and use it in a calculation. For example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.   
  • Update slowly changing dimension tables. Determine whether rows exist in a target.
Types of Lookups:
 
Relational or flat file lookup:
Use ODBC for relational. Use flat file wizard for flat file lookup.

Pipeline lookup:
Perform a lookup on application sources such as a JMS, MSMQ, or SAP. Drag the source into the mapping and associate the Lookup transformation with the source qualifier.

Connected or unconnected lookup:
A connected Lookup transformation receives source data, performs a lookup, and returns data to the pipeline. An unconnected Lookup transformation is not connected to a source or target. A transformation in the pipeline calls the Lookup transformation with a :LKP expression. The unconnected Lookup transformation returns one column to the calling transformation.
Points to remember:
  1. Unconnected cache can be called by more than one ports.
  2. Unconnected can be called by one or more tranformations.
  3. More than one column can be returned from unconnected lookups by concatenating the fields in the sql override
  4. Lookup cache persistant can be saved in a file and shared by lookups.
  5. Lookup cache persistant is used when the cache is not going to change for a long time. This cache can be shared in multiple phases of the project.
  6. To improve performance when processing relational or flat file lookup sources, create a pipeline Lookup transformation instead of a relational or flat file Lookup transformation.
Cached or uncached lookup:
Cache the lookup source to improve performance. If you cache the lookup source, you can use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session.
With a dynamic cache, the Integration Service inserts or updates rows in the cache. When you cache the target table as the lookup source, you can look up values in the cache to determine if the values exist in the target. The Lookup transformation marks rows to insert or update the target.

Lookup Transformation Properties
Option
Lookup
Type
Description
Lookup SQL Override
Relational
Overrides the default SQL statement to query the lookup table.
Specifies the SQL statement you want the Integration Service to use for querying lookup values. Use only with the lookup cache enabled.
For more information, see Lookup Query.
Lookup Table Name
Pipeline
Relational
The name of the table or source qualifier from which the transformation looks up and caches values. When you create the Lookup transformation, choose a source, target, or source qualifier as the lookup source. You can also import a table, view, or synonym from another database when you create the Lookup transformation.
If you enter a lookup SQL override, you do not need to enter the Lookup Table Name.
Lookup Source Filter
Relational
Restricts the lookups the Integration Service performs based on the value of data in Lookup transformation input ports. For more information about filtering source data from lookups, see Filtering Lookup Source Rows.
Lookup Caching Enabled
Flat File
Pipeline
Relational
Indicates whether the Integration Service caches lookup values during the session.
When you enable lookup caching, the Integration Service queries the lookup source once, caches the values, and looks up values in the cache during the session. Caching the lookup values can improve session performance.
When you disable caching, each time a row passes into the transformation, the Integration Service issues a select statement to the lookup source for lookup values.
Note: The Integration Service always caches flat file and pipeline lookups.
Lookup Policy on Multiple Match
Flat File
Pipeline
Relational
Determines which rows that the Lookup transformation returns when it finds multiple rows that match the lookup condition. You can select the first or last row returned from the cache or lookup source, or report an error. Or, you can allow the Lookup transformation to use any value. When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition. It creates an index based on the key ports instead of all Lookup transformation ports.
If you do not enable the Output Old Value On Update option, the Lookup Policy On Multiple Match option is set to Report Error for dynamic lookups. For more information about lookup caches, see Lookup Caches.
Lookup Condition
Flat File
Pipeline
Relational
Displays the lookup condition you set in the Condition tab.
Connection Information
Relational
Specifies the database containing the lookup table. You can define the database in the mapping, session, or parameter file:
-
Mapping. Select the connection object. You can also specify the database connection type. Type Relational: before the connection name if it is a relational connection. Type Application: before the connection name if it is an application connection.
-
Session. Use the $Source or $Target connection variable. If you use one of these variables, the lookup table must reside in the source or target database. Specify the database connection in the session properties for each variable. For more information about using $Source and $Target, see the Workflow Administration Guide.
-
Parameter file. Use the session parameter $DBConnectionName or $AppConnectionName, and define it in the parameter file.For more information about session parameters and parameter files, see the Workflow Administration Guide.
By default, the Designer specifies $Source if you choose an existing source table and $Target if you choose an existing target table when you create the Lookup transformation. You can override these values in the session properties.
The Integration Service fails the session if it cannot determine the type of database connection.
Source Type
Flat File
Pipeline
Relational
Indicates that the Lookup transformation reads values from a relational table, flat file, or source qualifier.
Tracing Level
Flat File
Pipeline
Relational
Sets the amount of detail included in the session log.
Lookup Cache Directory Name
Flat File
Pipeline
Relational
Specifies the directory used to build the lookup cache files when you configure the Lookup transformation to cache the lookup source. Also used to save the persistent lookup cache files when you select the Lookup Persistent option.
By default, the Integration Service uses the $PMCacheDir directory configured for the Integration Service.
Lookup Cache Persistent
Flat File Pipeline Relational
Indicates whether the Integration Service uses a persistent lookup cache, which consists of at least two cache files. If a Lookup transformation is configured for a persistent lookup cache and persistent lookup cache files do not exist, the Integration Service creates the files during the session. Use only with the lookup cache enabled.
Lookup Data Cache Size Lookup Index Cache Size
Flat File
Pipeline
Relational
Default is Auto. Indicates the maximum size the Integration Service allocates to the data cache and the index in memory. You can configure a numeric value, or you can configure the Integration Service to determine the cache size at run-time. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
If the Integration Service cannot allocate the configured amount of memory when initializing the session, it fails the session. When the Integration Service cannot store all the data cache data in memory, it pages to disk.
Use with the lookup cache enabled.
Dynamic Lookup Cache
Flat File
Pipeline
Relational
Indicates to use a dynamic lookup cache. Inserts or updates rows in the lookup cache as it passes rows to the target table.
Use only with the lookup cache enabled.
Output Old Value On Update
Flat File
Pipeline
Relational
Use with dynamic caching enabled. When you enable this property, the Integration Service outputs old values out of the lookup/output ports. When the Integration Service updates a row in the cache, it outputs the value that existed in the lookup cache before it updated the row based on the input data. When the Integration Service inserts a new row in the cache, it outputs null values.
When you disable this property, the Integration Service outputs the same values out of the lookup/output and input/output ports.
This property is enabled by default.
Cache File Name Prefix
Flat File
Pipeline
Relational
Use with persistent lookup cache. Specifies the file name prefix to use with persistent lookup cache files. The Integration Service uses the file name prefix as the file name for the persistent cache files it saves to disk. Only enter the prefix. Do not enter .idx or .dat.
You can enter a parameter or variable for the file name prefix. Use any parameter or variable type that you can define in the parameter file. For information about using parameter files, see the Workflow Administration Guide.
If the named persistent cache files exist, the Integration Service builds the memory cache from the files. If the named persistent cache files do not exist, the Integration Service rebuilds the persistent cache files.
Recache From Lookup Source
Flat File
Pipeline
Relational
Use with the lookup cache enabled. When selected, the Integration Service rebuilds the lookup cache from the lookup source when it first calls the Lookup transformation instance.
If you use a persistent lookup cache, it rebuilds the persistent cache files before using the cache. If you do not use a persistent lookup cache, it rebuilds the lookup cache in memory before using the cache.
Insert Else Update
Flat File
Pipeline
Relational
Use with dynamic caching enabled. Applies to rows entering the Lookup transformation with the row type of insert. When enabled, the Integration Service inserts new rows in the cache and updates existing rows When disabled, the Integration Service does not update existing rows.
For more information about defining the row type, see Using Update Strategy Transformations with a Dynamic Cache.
Update Else Insert
Flat File
Pipeline
Relational
Use with dynamic caching enabled. Applies to rows entering the Lookup transformation with the row type of update.
When enabled, the Integration Service updates existing rows, and inserts a new row if it is new. When disabled, the Integration Service does not insert new rows. For more information about defining the row type, see Using Update Strategy Transformations with a Dynamic Cache.
Datetime Format
Flat File
Click the Open button to select a datetime format. Define the format and field width. Milliseconds, microseconds, or nanoseconds formats have a field width of 29.
If you do not select a datetime format for a port, you can enter any datetime format.Default is MM/DD/YYYY HH24:MI:SS. The Datetime format does not change the size of the port.
Thousand Separator
Flat File
If you do not define a thousand separator for a port, the Integration Service uses the properties defined here.
You can choose no separator, a comma, or a period. Default is no separator.
Decimal Separator
Flat File
If you do not define a decimal separator for a particular field in the lookup definition or on the Ports tab, the Integration Service uses the properties defined here.
You can choose a comma or a period decimal separator. Default is period.
Case-Sensitive String Comparison
Flat File
Pipeline
The Integration Service uses case-sensitive string comparisons when performing lookups on string columns.
For relational lookups, the case-sensitive comparison is based on the database support.
Null Ordering
Flat File
Pipeline
Determines how the Integration Service orders null values. You can choose to sort null values high or low. By default, the Integration Service sorts null values high. This overrides the Integration Service configuration to treat nulls in comparison operators as high, low, or null.
For relational lookups, null ordering is based on the database support.
Sorted Input
Flat File
Pipeline
Indicates whether or not the lookup file data is sorted. This increases lookup performance for file lookups. If you enable sorted input, and the condition columns are not grouped, the Integration Service fails the session. If the condition columns are grouped, but not sorted, the Integration Service processes the lookup as if you did not configure sorted input. For more information about sorted input, see Flat File Lookups.
Lookup Source is Static
Flat File
Pipeline
Relational
The lookup source does not change in a session.
Pre-build Lookup Cache
Allow the Integration Service to build the lookup cache before the Lookup transformation requires the data.
-
Auto.
-
Always allowed. The Integration Service builds the lookup cache before the Lookup transformation receives the first source row.
-
Always disallowed. The Integration Service builds the lookup cache when the Lookup transformation receives the source row.
Subsecond Precision
Relational
Specifies the subsecond precision for datetime ports.
For relational lookups, you can change the precision for databases that have an editable scale for datetime data. You can change subsecond precision for Oracle Timestamp, Informix Datetime, and Teradata Timestamp datatypes.
Enter a positive integer value from 0 to 9. Default is 6 (microseconds).


Differences Between Connected and Unconnected Lookups
Connected Lookup
Unconnected Lookup
Receives input values directly from the pipeline.
Receives input values from the result of a :LKP expression in another transformation.
Use a dynamic or static cache.
Use a static cache.
Cache includes the lookup source columns in the lookup condition and the lookup source columns that are output ports.
Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
Can return multiple columns from the same row or insert into the dynamic lookup cache.
Designate one return port (R). Returns one column from each row.
If there is no match for the lookup condition, the Integration Service returns the default value for all output ports. If you configure dynamic caching, the Integration Service inserts rows into the cache or leaves it unchanged.
If there is no match for the lookup condition, the Integration Service returns NULL.
If there is a match for the lookup condition, the Integration Service returns the result of the lookup condition for all lookup/output ports. If you configure dynamic caching, the Integration Service either updates the row the in the cache or leaves the row unchanged.
If there is a match for the lookup condition, the Integration Service returns the result of the lookup condition into the return port.
Pass multiple output values to another transformation. Link lookup/output ports to another transformation.
Pass one output value to another transformation. The lookup/output/return port passes the value to the transformation calling :LKP expression.
Supports user-defined default values.
Does not support user-defined default values.


To be continued...

1 comment:

  1. can any one explain me the significance of "Output Old Value on Update" option in dynamic lookup transformation.?

    ReplyDelete