Wednesday, May 12, 2010
Informatica Random
The Integration Service ensures that the target database code page is a superset of the source database code page.
Wednesday, May 5, 2010
Lookup (Informatica)
Transformation type:
Passive
Connected/Unconnected
The Lookup transformation is used to look up data in a flat file, relational table, view, synonym, or source qualifier.
The following tasks can be performed with a Lookup transformation:
- Get a related value. Retrieve a value from the lookup table based on a value in the source. For example, the source has an employee ID. Retrieve the employee name from the lookup table.
- Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
- Update slowly changing dimension tables. Determine whether rows exist in a target.
Relational or flat file lookup:
Use ODBC for a relational lookup; use the Flat File Wizard for a flat file lookup.
Pipeline lookup:
Perform a lookup on application sources such as JMS, MSMQ, or SAP. Drag the source into the mapping and associate the Lookup transformation with the source qualifier.
Connected or unconnected lookup:
A connected Lookup transformation receives source data, performs a lookup, and returns data to the pipeline. An unconnected Lookup transformation is not connected to a source or target. A transformation in the pipeline calls the Lookup transformation with a :LKP expression. The unconnected Lookup transformation returns one column to the calling transformation.
Points to remember:
- An unconnected lookup can be called from more than one port.
- An unconnected lookup can be called by one or more transformations.
- More than one column can be returned from an unconnected lookup by concatenating the fields in the SQL override (see the sketch after this list).
- A persistent lookup cache can be saved in a file and shared by multiple lookups.
- A persistent lookup cache is used when the cached data is not going to change for a long time. This cache can be shared across multiple phases of the project.
- To improve performance when processing relational or flat file lookup sources, create a pipeline Lookup transformation instead of a relational or flat file Lookup transformation.
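As an illustration of the SQL-override point above, here is a minimal sketch of an override that concatenates two columns into a single return port. The table and column names (EMPLOYEES, EMP_NAME, DEPT_NAME, EMP_ID) are hypothetical, and Oracle-style string concatenation is assumed.
-- Hypothetical lookup SQL override for an unconnected lookup:
-- EMP_NAME and DEPT_NAME are concatenated into one return port (EMP_INFO),
-- which the calling expression can split apart downstream.
SELECT E.EMP_NAME || '~' || E.DEPT_NAME AS EMP_INFO,
       E.EMP_ID AS EMP_ID
FROM   EMPLOYEES E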
Cache the lookup source to improve performance. If you cache the lookup source, you can use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session.
With a dynamic cache, the Integration Service inserts or updates rows in the cache. When you cache the target table as the lookup source, you can look up values in the cache to determine if the values exist in the target. The Lookup transformation marks rows to insert or update the target.
Option | Lookup Type | Description
Lookup SQL Override | Relational | Overrides the default SQL statement to query the lookup table. Specifies the SQL statement you want the Integration Service to use for querying lookup values. Use only with the lookup cache enabled. For more information, see Lookup Query.
Lookup Table Name | Pipeline Relational | The name of the table or source qualifier from which the transformation looks up and caches values. When you create the Lookup transformation, choose a source, target, or source qualifier as the lookup source. You can also import a table, view, or synonym from another database when you create the Lookup transformation. If you enter a lookup SQL override, you do not need to enter the Lookup Table Name.
Lookup Source Filter | Relational | Restricts the lookups the Integration Service performs based on the value of data in Lookup transformation input ports. For more information about filtering source data from lookups, see Filtering Lookup Source Rows.
Lookup Caching Enabled | Flat File Pipeline Relational | Indicates whether the Integration Service caches lookup values during the session. When you enable lookup caching, the Integration Service queries the lookup source once, caches the values, and looks up values in the cache during the session. Caching the lookup values can improve session performance. When you disable caching, each time a row passes into the transformation, the Integration Service issues a select statement to the lookup source for lookup values. Note: The Integration Service always caches flat file and pipeline lookups.
Lookup Policy on Multiple Match | Flat File Pipeline Relational | Determines which rows the Lookup transformation returns when it finds multiple rows that match the lookup condition. You can select the first or last row returned from the cache or lookup source, or report an error. Or, you can allow the Lookup transformation to use any value. When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition. It creates an index based on the key ports instead of all Lookup transformation ports. If you do not enable the Output Old Value On Update option, the Lookup Policy On Multiple Match option is set to Report Error for dynamic lookups. For more information about lookup caches, see Lookup Caches.
Lookup Condition | Flat File Pipeline Relational | Displays the lookup condition you set in the Condition tab.
Connection Information | Relational | Specifies the database containing the lookup table. You can define the database in the mapping, session, or parameter file. By default, the Designer specifies $Source if you choose an existing source table and $Target if you choose an existing target table when you create the Lookup transformation. You can override these values in the session properties. The Integration Service fails the session if it cannot determine the type of database connection.
Source Type | Flat File Pipeline Relational | Indicates that the Lookup transformation reads values from a relational table, flat file, or source qualifier.
Tracing Level | Flat File Pipeline Relational | Sets the amount of detail included in the session log.
Lookup Cache Directory Name | Flat File Pipeline Relational | Specifies the directory used to build the lookup cache files when you configure the Lookup transformation to cache the lookup source. Also used to save the persistent lookup cache files when you select the Lookup Persistent option. By default, the Integration Service uses the $PMCacheDir directory configured for the Integration Service.
Lookup Cache Persistent | Flat File Pipeline Relational | Indicates whether the Integration Service uses a persistent lookup cache, which consists of at least two cache files. If a Lookup transformation is configured for a persistent lookup cache and persistent lookup cache files do not exist, the Integration Service creates the files during the session. Use only with the lookup cache enabled.
Lookup Data Cache Size / Lookup Index Cache Size | Flat File Pipeline Relational | Default is Auto. Indicates the maximum size the Integration Service allocates to the data cache and the index cache in memory. You can configure a numeric value, or you can configure the Integration Service to determine the cache size at run time. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache. If the Integration Service cannot allocate the configured amount of memory when initializing the session, it fails the session. When the Integration Service cannot store all the data cache data in memory, it pages to disk. Use with the lookup cache enabled.
Dynamic Lookup Cache | Flat File Pipeline Relational | Indicates to use a dynamic lookup cache. Inserts or updates rows in the lookup cache as it passes rows to the target table. Use only with the lookup cache enabled.
Output Old Value On Update | Flat File Pipeline Relational | Use with dynamic caching enabled. When you enable this property, the Integration Service outputs old values out of the lookup/output ports. When the Integration Service updates a row in the cache, it outputs the value that existed in the lookup cache before it updated the row based on the input data. When the Integration Service inserts a new row in the cache, it outputs null values. When you disable this property, the Integration Service outputs the same values out of the lookup/output and input/output ports. This property is enabled by default.
Cache File Name Prefix | Flat File Pipeline Relational | Use with persistent lookup cache. Specifies the file name prefix to use with persistent lookup cache files. The Integration Service uses the file name prefix as the file name for the persistent cache files it saves to disk. Only enter the prefix. Do not enter .idx or .dat. You can enter a parameter or variable for the file name prefix. Use any parameter or variable type that you can define in the parameter file. For information about using parameter files, see the Workflow Administration Guide. If the named persistent cache files exist, the Integration Service builds the memory cache from the files. If the named persistent cache files do not exist, the Integration Service rebuilds the persistent cache files.
Recache From Lookup Source | Flat File Pipeline Relational | Use with the lookup cache enabled. When selected, the Integration Service rebuilds the lookup cache from the lookup source when it first calls the Lookup transformation instance. If you use a persistent lookup cache, it rebuilds the persistent cache files before using the cache. If you do not use a persistent lookup cache, it rebuilds the lookup cache in memory before using the cache.
Insert Else Update | Flat File Pipeline Relational | Use with dynamic caching enabled. Applies to rows entering the Lookup transformation with the row type of insert. When enabled, the Integration Service inserts new rows in the cache and updates existing rows. When disabled, the Integration Service does not update existing rows. For more information about defining the row type, see Using Update Strategy Transformations with a Dynamic Cache.
Update Else Insert | Flat File Pipeline Relational | Use with dynamic caching enabled. Applies to rows entering the Lookup transformation with the row type of update. When enabled, the Integration Service updates existing rows, and inserts a new row if it is new. When disabled, the Integration Service does not insert new rows. For more information about defining the row type, see Using Update Strategy Transformations with a Dynamic Cache.
Datetime Format | Flat File | Click the Open button to select a datetime format. Define the format and field width. Milliseconds, microseconds, or nanoseconds formats have a field width of 29. If you do not select a datetime format for a port, you can enter any datetime format. Default is MM/DD/YYYY HH24:MI:SS. The Datetime format does not change the size of the port.
Thousand Separator | Flat File | If you do not define a thousand separator for a port, the Integration Service uses the properties defined here. You can choose no separator, a comma, or a period. Default is no separator.
Decimal Separator | Flat File | If you do not define a decimal separator for a particular field in the lookup definition or on the Ports tab, the Integration Service uses the properties defined here. You can choose a comma or a period decimal separator. Default is period.
Case-Sensitive String Comparison | Flat File Pipeline | The Integration Service uses case-sensitive string comparisons when performing lookups on string columns. For relational lookups, the case-sensitive comparison is based on the database support.
Null Ordering | Flat File Pipeline | Determines how the Integration Service orders null values. You can choose to sort null values high or low. By default, the Integration Service sorts null values high. This overrides the Integration Service configuration to treat nulls in comparison operators as high, low, or null. For relational lookups, null ordering is based on the database support.
Sorted Input | Flat File Pipeline | Indicates whether or not the lookup file data is sorted. This increases lookup performance for file lookups. If you enable sorted input, and the condition columns are not grouped, the Integration Service fails the session. If the condition columns are grouped, but not sorted, the Integration Service processes the lookup as if you did not configure sorted input. For more information about sorted input, see Flat File Lookups.
Lookup Source is Static | Flat File Pipeline Relational | The lookup source does not change in a session.
Pre-build Lookup Cache | Flat File Pipeline Relational | Allow the Integration Service to build the lookup cache before the Lookup transformation requires the data.
Subsecond Precision | Relational | Specifies the subsecond precision for datetime ports. For relational lookups, you can change the precision for databases that have an editable scale for datetime data. You can change subsecond precision for Oracle Timestamp, Informix Datetime, and Teradata Timestamp datatypes. Enter a positive integer value from 0 to 9. Default is 6 (microseconds).
Connected Lookup | Unconnected Lookup
Receives input values directly from the pipeline. | Receives input values from the result of a :LKP expression in another transformation.
Use a dynamic or static cache. | Use a static cache.
Cache includes the lookup source columns in the lookup condition and the lookup source columns that are output ports. | Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
Can return multiple columns from the same row or insert into the dynamic lookup cache. | Designate one return port (R). Returns one column from each row.
If there is no match for the lookup condition, the Integration Service returns the default value for all output ports. If you configure dynamic caching, the Integration Service inserts rows into the cache or leaves it unchanged. | If there is no match for the lookup condition, the Integration Service returns NULL.
If there is a match for the lookup condition, the Integration Service returns the result of the lookup condition for all lookup/output ports. If you configure dynamic caching, the Integration Service either updates the row in the cache or leaves the row unchanged. | If there is a match for the lookup condition, the Integration Service returns the result of the lookup condition into the return port.
Pass multiple output values to another transformation. Link lookup/output ports to another transformation. | Pass one output value to another transformation. The lookup/output/return port passes the value to the transformation calling the :LKP expression.
Supports user-defined default values. | Does not support user-defined default values.
To be continued...
Tuesday, May 4, 2010
Performance Tuning (Informatica)
- Filter rows at the source qualifier (see the SQL sketch after this list)
- Sort data at the source qualifier, whenever needed, instead of using a Sorter transformation
- Use filter transformations as close to the source as possible
- Use sorted input for aggregator
- Use database partitioning
- Use workflow partitioning
- Avoid multiple datatype conversions
- Use variables for repetitive calculations
- Avoid joiners
- Use sorted input for joiner
- In joiners, use the source with less data as the master
- From the source qualifier, take out only the ports that are required
- In lookups, delete the ports that are not required from the transformation
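The first two points can often be pushed into a source qualifier SQL override. A minimal sketch, assuming a hypothetical ORDERS table:
-- Filter and sort at the database instead of adding Filter/Sorter
-- transformations downstream.
SELECT ORDER_ID,
       CUSTOMER_ID,
       ORDER_AMOUNT
FROM   ORDERS
WHERE  ORDER_STATUS = 'SHIPPED'     -- filter rows at the source
ORDER  BY CUSTOMER_ID, ORDER_ID     -- sort at the source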
Error at Data Flow Task [Pivot [28]]: Output column "TransMonth" (49) cannot be mapped to PivotKey input column.
TITLE: Package Validation Error
------------------------------
Package Validation Error
------------------------------
ADDITIONAL INFORMATION:
Error at Data Flow Task [Pivot [28]]: Output column "TransMonth" (49) cannot be mapped to PivotKey input column.
Error at Data Flow Task [SSIS.Pipeline]: "component "Pivot" (28)" failed validation and returned validation status "VS_ISBROKEN".
Error at Data Flow Task [SSIS.Pipeline]: One or more component failed validation.
Error at Data Flow Task: There were errors during task validation.
(Microsoft.DataTransformationServices.VsIntegration)
------------------------------
BUTTONS:
OK
------------------------------
Resolution:
1. The TransMonth column had been added to the list of output columns. Removing this column from the output columns fixed the issue.
Note: There can be many reasons for this error.
Row Sampling (SSIS)
The Row Sampling transformation is used to obtain a randomly selected subset of an input dataset. You can specify the exact size of the output sample, and specify a seed for the random number generator.
There are many applications for random sampling. For example, a company that wanted to randomly select 50 employees to receive prizes in a lottery could use the Row Sampling transformation on the employee database to generate the exact number of winners.
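For comparison only, a rough T-SQL sketch of that exact-count random pick, assuming a hypothetical Employees table (the Row Sampling transformation does this inside the data flow, without a query):
-- Pick exactly 50 random employees; NEWID() gives a random ordering.
-- Unlike Row Sampling, there is no repeatable seed here.
SELECT TOP (50) EmployeeID, EmployeeName
FROM   dbo.Employees
ORDER  BY NEWID()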
The Row Sampling transformation is also useful during package development for creating a small but representative dataset. You can test package execution and data transformation with richly representative data, but more quickly because a random sample is used instead of the full dataset. Because the sample dataset used by the test package is always the same size, using the sample subset also makes it easier to identify performance problems in the package.
The Row Sampling transformation creates a sample dataset by selecting a specified number of the transformation input rows. Because the selection of rows from the transformation input is random, the resultant sample is representative of the input. You can also specify the seed that is used by the random number generator, to affect how the transformation selects rows.
Using the same random seed on the same transformation input always creates the same sample output. If no seed is specified, the transformation uses the tick count of the operating system to create the random number. Therefore, you could use the same seed during testing, to verify the transformation results during the development and testing of the package, and then change to a random seed when the package is moved into production.
This transformation is similar to the Percentage Sampling transformation, which creates a sample dataset by selecting a percentage of the input rows.
http://msdn.microsoft.com/en-us/library/ms141200.aspx
Percentage Sampling (SSIS)
The Percentage Sampling transformation creates a sample data set by selecting a percentage of the transformation input rows. The sample data set is a random selection of rows from the transformation input, to make the resultant sample representative of the input.
The Percentage Sampling transformation is especially useful for data mining. By using this transformation, you can randomly divide a data set into two data sets: one for training the data mining model, and one for testing the model.
The Percentage Sampling transformation is also useful for creating sample data sets for package development. By applying the Percentage Sampling transformation to a data flow, you can uniformly reduce the size of the data set while preserving its data characteristics. The test package can then run more quickly because it uses a small, but representative, data set.
http://msdn.microsoft.com/en-us/library/ms139864.aspx
OLE DB Command (SSIS)
The OLE DB Command transformation runs an SQL statement for each row in a data flow. For example, you can run an SQL statement that inserts, updates, or deletes rows in a database table.
You can configure the OLE DB Command Transformation in the following ways:
- Provide the SQL statement that the transformation runs for each row.
- Specify the number of seconds before the SQL statement times out.
- Specify the default code page.
DELETE FROM DimProduct WHERE ProductKey = ?
The OLE DB Command transformation provides the parameter names and you cannot modify them. The parameter names are Param_0, Param_1, and so on.
http://msdn.microsoft.com/en-us/library/ms141138.aspx
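A hypothetical two-parameter variant, to show how the positional parameters line up with input columns (table and column names are illustrative):
-- Runs once per data flow row; the first ? maps to Param_0 and the
-- second ? maps to Param_1 in the column mappings.
UPDATE DimProduct
SET    ListPrice  = ?   -- Param_0: new price from the input row
WHERE  ProductKey = ?   -- Param_1: key from the input row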
Multicast (SSIS)
The Multicast transformation distributes its input to one or more outputs. This transformation is similar to the Conditional Split transformation. Both transformations direct an input to multiple outputs. The difference between the two is that the Multicast transformation directs every row to every output, and the Conditional Split directs a row to a single output.
There is nothing to configure in the Multicast Transformation Editor other than the names of the outputs.
http://msdn.microsoft.com/en-us/library/ms137701.aspx
Merge Join (SSIS)
The Merge Join transformation provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER join. For example, you can use a LEFT join to join a table that includes product information with a table that lists the country/region in which a product was manufactured. The result is a table that lists all products and their country/region of origin. For more information, see Using Joins.
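For orientation, a rough SQL equivalent of that LEFT join, with hypothetical table and column names (the Merge Join transformation additionally requires both inputs to be sorted on the join key):
SELECT p.ProductName,
       m.CountryRegion
FROM   Product p
LEFT JOIN ManufacturingLocation m
       ON p.ProductID = m.ProductID
ORDER  BY p.ProductID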
You can configure the Merge Join transformation in the following ways:
- Specify whether the join is a FULL, LEFT, or INNER join.
- Specify the columns the join uses.
- Specify whether the transformation handles null values as equal to other nulls.
http://msdn.microsoft.com/en-us/library/ms141775.aspx
Merge (SSIS)
The Merge transformation combines two sorted datasets into a single dataset. The rows from each dataset are inserted into the output based on values in their key columns.
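Conceptually, merging two sorted inputs on a key yields the same rows as a UNION ALL followed by an ORDER BY on that key; a hedged SQL sketch with hypothetical staging tables:
SELECT CustomerID, OrderDate, Amount FROM OrdersEast
UNION ALL
SELECT CustomerID, OrderDate, Amount FROM OrdersWest
ORDER  BY CustomerID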
By including the Merge transformation in a data flow, you can perform the following tasks:
- Merge data from two data sources, such as tables and files.
- Create complex datasets by nesting Merge transformations.
- Remerge rows after correcting errors in the data.
Use the Union All transformation instead of the Merge transformation in the following situations:
- The transformation inputs are not sorted.
- The combined output does not need to be sorted.
- The transformation has more than two inputs.
http://msdn.microsoft.com/en-us/library/ms141703.aspx
Lookup (SSIS)
The Lookup transformation performs lookups by joining data in input columns with columns in a reference dataset.
You can configure the Lookup transformation in the following ways:
- Select the connection manager that you want to use. If you want to connect to a database, select an OLE DB connection manager. If you want to connect to a cache file, select a Cache connection manager.
- Specify the table or view that contains the reference dataset.
- Generate a reference dataset by specifying an SQL statement.
- Specify joins between the input and the reference dataset.
- Add columns from the reference dataset to the Lookup transformation output.
- Configure the caching options.
The Lookup transformation supports the following database providers for the OLE DB connection manager:
- SQL Server
- Oracle
- DB2
- If there is no matching entry in the reference dataset, no join occurs. By default, the Lookup transformation treats rows without matching entries as errors. However, you can configure the Lookup transformation to redirect such rows to a no match output. For more information, see Lookup Transformation Editor (General Page) and Lookup Transformation Editor (Error Output Page).
- If there are multiple matches in the reference table, the Lookup transformation returns only the first match returned by the lookup query. If multiple matches are found, the Lookup transformation generates an error or warning only when the transformation has been configured to load all the reference dataset into the cache. In this case, the Lookup transformation generates a warning when the transformation detects multiple matches as the transformation fills the cache.
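When the reference dataset is generated from an SQL statement, the query can also limit the reference columns and pre-aggregate on the join key so that at most one match exists per key; a hypothetical sketch:
-- Hypothetical reference-dataset query for the Lookup transformation:
-- keep only the needed columns and return one row per ProductKey.
SELECT ProductKey,
       MAX(ListPrice) AS ListPrice
FROM   dbo.DimProduct
GROUP  BY ProductKey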
http://msdn.microsoft.com/en-us/library/ms141821.aspx
Import Column (SSIS)
The Import Column transformation reads data from files and adds the data to columns in a data flow. Using this transformation, a package can add text and images stored in separate files to a data flow. For example, a data flow that loads data into a table that stores product information can include the Import Column transformation to import customer reviews of each product from files and add the reviews to the data flow.
The Import Column Transform is a partner to the Export Column Transform. These transforms do the work of translating physical files from system file storage paths into database BLOB-type fields and vice versa.
You can configure the Import Column transformation in the following ways:
- Specify the columns to which the transformation adds data.
- Specify whether the transformation expects a byte-order mark (BOM).
http://msdn.microsoft.com/en-us/library/ms141262.aspx
Export Column (SSIS)
The Export Column transformation reads data in a data flow and inserts the data into a file.
For example, if the data flow contains product information, such as a picture of each product, you could use the Export Column transformation to save the images to files.
Unlike the other transformations, the Export Column Transform doesn't need a destination to create the file.
You can configure the Export Column transformation in the following ways:
- Specify the data columns and the columns that contain the path of files to which to write the data.
- Specify whether the data-insertion operation appends or truncates existing files.
- Specify whether a byte-order mark (BOM) is written to the file.
http://msdn.microsoft.com/en-us/library/ms139818.aspx
Derived Column (SSIS)
The Derived Column transformation creates new column values by applying expressions to transformation input columns. An expression can contain any combination of variables, functions, operators, and columns from the transformation input. The result can be added as a new column or inserted into an existing column as a replacement value. The Derived Column transformation can define multiple derived columns, and any variable or input columns can appear in multiple expressions.
You can use this transformation to perform the following tasks:
- Concatenate data from different columns into a derived column. For example, you can combine values from the FirstName and LastName columns into a single derived column named FullName, by using the expression FirstName + " " + LastName.
- Extract characters from string data by using functions such as SUBSTRING, and then store the result in a derived column. For example, you can extract a person's initial from the FirstName column, by using the expression SUBSTRING(FirstName,1,1).
- Apply mathematical functions to numeric data and store the result in a derived column. For example, you can change the length and precision of a numeric column, SalesTax, to a number with two decimal places, by using the expression ROUND(SalesTax, 2).
- Create expressions that compare input columns and variables. For example, you can compare the variable Version against the data in the column ProductVersion, and depending on the comparison result, use the value of either Version or ProductVersion, by using the expression ProductVersion == @Version? ProductVersion : @Version.
- Extract parts of a datetime value. For example, you can use the GETDATE and DATEPART functions to extract the current year, by using the expression DATEPART("year",GETDATE()).
http://msdn.microsoft.com/en-us/library/ms141069.aspx
Data Mining Query (SSIS)
The Data Mining Query transformation performs prediction queries against data mining models. This transformation contains a query builder for creating Data Mining Extensions (DMX) queries.
The Data Mining Query Transformation typically is used to fill in gaps in your data or predict a new column for your Data Flow. This transformation runs a data-mining query and adds the output to the Data Flow. It also can optionally add columns, such as the probability of a certain condition being true.
A few great scenarios for this transformation would be the following:
- You could take columns, such as number of children, household income, and marital status, to predict a new column that states whether the person owns a house or not.
- You could predict what customers would want to buy based on their shopping cart items.
- You could fill in the blanks in your data where customers didn't enter all the fields in a questionnaire.
Data Conversion (SSIS)
The Data Conversion transformation converts the data in an input column to a different data type and then copies it to a new output column.
You can configure the Data Conversion transformation in the following ways:
- Specify the columns that contain the data to convert and the types of data conversions to perform.
- Specify whether the transformation output columns use the quicker, but locale-insensitive, fast parsing routines that Microsoft SQL Server Integration Services provides or the standard locale-sensitive parsing routines.
http://msdn.microsoft.com/en-us/library/ms141706.aspx
Monday, May 3, 2010
Copy Column (SSIS)
The Copy Column transformation creates new columns by copying input columns and adding the new columns to the transformation output. Later in the data flow, different transformations can be applied to the column copies.
This transformation has one input and one output. It does not support an error output.
http://msdn.microsoft.com/en-us/library/ms137741.aspx
Conditional Split (SSIS)
The Conditional Split transformation can route data rows to different outputs depending on the content of the data.
It is similar to a CASE decision structure in a programming language.
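As a rough analogy, a Conditional Split with two conditions and a default output behaves like the following SQL CASE (hypothetical columns); as in the transformation, the first condition that evaluates to true wins:
SELECT OrderID,
       CASE
           WHEN OrderAmount >= 1000 THEN 'HighValue'
           WHEN OrderAmount >= 100  THEN 'MediumValue'
           ELSE 'DefaultOutput'
       END AS RoutedOutput
FROM   Orders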
You can configure the Conditional Split transformation in the following ways:
- Provide an expression that evaluates to a Boolean for each condition you want the transformation to test.
- Specify the order in which the conditions are evaluated. Order is significant, because a row is sent to the output corresponding to the first condition that evaluates to true.
- Specify the default output for the transformation. The transformation requires that a default output be specified.
http://msdn.microsoft.com/en-us/library/ms137886.aspx
Character Map (SSIS)
The Character Map transformation applies string functions, such as conversion from lowercase to uppercase, to character data. This transformation operates only on column data with a string data type.
You configure the Character Map transformation in the following ways:
- Specify the columns to convert.
- Specify the operations to apply to each column.
Cache (SSIS)
It allows you to load a cache file on disk in the Data Flow.
The Cache Transform can be used to populate a cache file in the Data Flow as a transform, and then immediately used, or it can be used as a destination and then used by another package or Data Flow in the same package.
This cache file allows you to perform lookups against large datasets.
It allows you to share the same lookup cache across many Data Flows.
Advantages:
As the data to be looked up is cached in a file, the SSIS engine does not need to connect to the database for each incoming row. So the lookups are faster.
Sunday, May 2, 2010
Joiner (Informatica)
Transformation type:
Active
Connected
The Joiner transformation joins sources with at least one matching column.
The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
- You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
- You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.
To improve performance for a sorted Joiner transformation, use the source with fewer duplicate key values as the master.
By default, when you add ports to a Joiner transformation, the ports from the first source pipeline display as detail sources. Adding the ports from the second source pipeline sets them as master sources.
Use one or more ports from the input sources of a Joiner transformation in the join condition.
Normal Join
With a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the condition.
Master Outer Join
A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
Detail Outer Join
A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
Full Outer Join
A full outer join keeps all rows of data from both the master and detail sources.
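In SQL terms, with the detail source on the left and the master source on the right (hypothetical DETAIL_SRC and MASTER_SRC tables joined on ITEM_NO), the four join types correspond roughly to the following:
-- Normal join: only rows that match in both sources.
SELECT d.ITEM_NO, d.QTY, m.ITEM_NAME
FROM   DETAIL_SRC d
INNER JOIN MASTER_SRC m ON d.ITEM_NO = m.ITEM_NO
-- Master outer join: DETAIL_SRC d LEFT JOIN MASTER_SRC m (keeps all detail rows)
-- Detail outer join: DETAIL_SRC d RIGHT JOIN MASTER_SRC m (keeps all master rows)
-- Full outer join:   DETAIL_SRC d FULL OUTER JOIN MASTER_SRC m (keeps all rows from both)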
Option | Description
Case-Sensitive String Comparison | If selected, the Integration Service uses case-sensitive string comparisons when performing joins on string columns.
Cache Directory | Specifies the directory used to cache master or detail rows and the index to these rows. By default, the cache files are created in a directory specified by the process variable $PMCacheDir. If you override the directory, make sure the directory exists and contains enough disk space for the cache files. The directory can be a mapped or mounted drive.
Join Type | Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
Null Ordering in Master | Not applicable for this transformation type.
Null Ordering in Detail | Not applicable for this transformation type.
Tracing Level | Amount of detail displayed in the session log for this transformation. The options are Terse, Normal, Verbose Data, and Verbose Initialization.
Joiner Data Cache Size | Data cache size for the transformation. Default cache size is 2,000,000 bytes. If the total configured cache size is 2 GB or more, you must run the session on a 64-bit Integration Service. You can configure a numeric value, or you can configure the Integration Service to determine the cache size at runtime. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Joiner Index Cache Size | Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured cache size is 2 GB or more, you must run the session on a 64-bit Integration Service. You can configure a numeric value, or you can configure the Integration Service to determine the cache size at runtime. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Sorted Input | Specifies that data is sorted. Choose Sorted Input to join sorted data. Using sorted input can improve performance. For more information about working with sorted input, see Using Sorted Input.
Transformation Scope | Specifies how the Integration Service applies the transformation logic to incoming data. You can choose Transaction, All Input, or Row. For more information, see Working with Transactions.
Adding Transformations to the Mapping (Sorted Input)
When you add transformations between the sort origin and the Joiner transformation, use the following guidelines to maintain sorted data:
Do not place any of the following transformations between the sort origin and the Joiner transformation:
- Custom
- Unsorted Aggregator
- Normalizer
- Rank
- Union transformation
- XML Parser transformation
- XML Generator transformation
- Mapplet, if it contains one of the above transformations
You can place a sorted Aggregator transformation between the sort origin and the Joiner transformation if you use the following guidelines:
- Configure the Aggregator transformation for sorted input using the guidelines in Using Sorted Input.
- Use the same ports for the group by columns in the Aggregator transformation as the ports at the sort origin.
- The group by ports must be in the same order as the ports at the sort origin.
When you join the result set of a Joiner transformation with another pipeline, verify that the data output from the first Joiner transformation is sorted.
Tip: You can place the Joiner transformation directly after the sort origin to maintain sorted data.
If you use a sorted Aggregator transformation between the sort origin and the Joiner transformation, treat the sorted Aggregator transformation as the sort origin when you define the join condition. Use the following guidelines when you define join conditions:
- The ports you use in the join condition must match the ports at the sort origin.
- When you configure multiple join conditions, the ports in the first join condition must match the first ports at the sort origin.
- When you configure multiple conditions, the order of the conditions must match the order of the ports at the sort origin, and you must not skip any ports.
- The number of sorted ports in the sort origin can be greater than or equal to the number of ports at the join condition.
Example of a Join Condition
For example, you configure Sorter transformations in the master and detail pipelines with the following sorted ports:
1. ITEM_NO
2. ITEM_NAME
3. PRICE
When you configure the join condition, use the following guidelines to maintain sort order:
- You must use ITEM_NO in the first join condition.
- If you add a second join condition, you must use ITEM_NAME.
- If you want to use PRICE in a join condition, you must also use ITEM_NAME in the second join condition.
If you skip ITEM_NAME and join on ITEM_NO and PRICE, you lose the sort order and the Integration Service fails the session.
Joining Data from a Single Source
You may want to join data from the same source if you want to perform a calculation on part of the data and join the transformed data with the original data. When you join the data using this method, you can maintain the original data and transform parts of that data within one mapping. You can join data from the same source in the following ways:
- Join two branches of the same pipeline.
- Join two instances of the same source.
Joining Two Branches of the Same Pipeline
When you join data from the same source, you can create two branches of the pipeline. When you branch a pipeline, you must add a transformation between the source qualifier and the Joiner transformation in at least one branch of the pipeline. You must join sorted data and configure the Joiner transformation for sorted input.
For example, you have a source with the following ports:
- Employee
- Department
- Total Sales
In the target, you want to view the employees who generated sales that were greater than the average sales for their departments. To do this, you create a mapping with the following transformations:
- Sorter transformation. Sorts the data.
- Sorted Aggregator transformation. Averages the sales data and groups by department. When you perform this aggregation, you lose the data for individual employees. To maintain employee data, you must pass a branch of the pipeline to the Aggregator transformation and pass a branch with the same data to the Joiner transformation to maintain the original data. When you join both branches of the pipeline, you join the aggregated data with the original data.
- Sorted Joiner transformation. Uses a sorted Joiner transformation to join the sorted aggregated data with the original data.
- Filter transformation. Compares the average sales data against the sales data for each employee and filters out employees with less than above-average sales.
Mapping that Joins Two Branches of a Pipeline
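A rough SQL equivalent of this mapping, assuming a hypothetical SALES source with Employee, Department, and Total_Sales columns:
-- Aggregate per department, join the averages back to the detail rows,
-- and keep employees whose sales exceed their department average.
SELECT s.Employee,
       s.Department,
       s.Total_Sales
FROM   SALES s
JOIN  (SELECT Department,
              AVG(Total_Sales) AS Avg_Sales
       FROM   SALES
       GROUP  BY Department) d
  ON   s.Department = d.Department
WHERE  s.Total_Sales > d.Avg_Sales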
Note: You can also join data from output groups of the same transformation, such as the Custom transformation or XML Source Qualifier transformation. Place a Sorter transformation between each output group and the Joiner transformation and configure the Joiner transformation to receive sorted input.
Joining two branches might impact performance if the Joiner transformation receives data from one branch much later than the other branch. The Joiner transformation caches all the data from the first branch, and writes the cache to disk if the cache fills. The Joiner transformation must then read the data from disk when it receives the data from the second branch. This can slow processing.
Joining Two Instances of the Same Source
You can also join same source data by creating a second instance of the source. After you create the second source instance, you can join the pipelines from the two source instances. If you want to join unsorted data, you must create two instances of the same source and join the pipelines.
Mapping that Joins Two Instances of the Same Source
Note: When you join data using this method, the Integration Service reads the source data for each source instance, so performance can be slower than joining two branches of a pipeline.
Guidelines
Use the following guidelines when deciding whether to join branches of a pipeline or join two instances of a source:
- Join two branches of a pipeline when you have a large source or if you can read the source data only once. For example, you can only read source data from a message queue once.
- Join two branches of a pipeline when you use sorted data. If the source data is unsorted and you use a Sorter transformation to sort the data, branch the pipeline after you sort the data.
- Join two instances of a source when you need to add a blocking transformation to the pipeline between the source and the Joiner transformation.
- Join two instances of a source if one pipeline may process slower than the other pipeline.
- Join two instances of a source if you need to join unsorted data.