My BI Stuff: Joiner (Informatica)

Transformation type:

Active

Connected

The Joiner transformation joins sources with at least one matching column.

The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.

You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.

To improve performance for an unsorted Joiner transformation, use the source with fewer rows as the master source.
To improve performance for a sorted Joiner transformation, use the source with fewer duplicate key values as the master.
By default, when you add ports to a Joiner transformation, the ports from the first source pipeline display as detail sources. Adding the ports from the second source pipeline sets them as master sources.
Use one or more ports from the input sources of a Joiner transformation in the join condition.

Normal Join

With a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the condition.

Master Outer Join

A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.

Detail Outer Join

A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.

Full Outer Join

A full outer join keeps all rows of data from both the master and detail sources.

Joiner Transformation Properties
Option	Description
Case-Sensitive String Comparison	If selected, the Integration Service uses case-sensitive string comparisons when performing joins on string columns.
Cache Directory	Specifies the directory used to cache master or detail rows and the index to these rows. By default, the cache files are created in a directory specified by the process variable $PMCacheDir. If you override the directory, make sure the directory exists and contains enough disk space for the cache files. The directory can be a mapped or mounted drive.
Join Type	Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
Null Ordering in Master	Not applicable for this transformation type.
Null Ordering in Detail	Not applicable for this transformation type.
Tracing Level	Amount of detail displayed in the session log for this transformation. The options are Terse, Normal, Verbose Data, and Verbose Initialization.
Joiner Data Cache Size	Data cache size for the transformation. Default cache size is 2,000,000 bytes. If the total configured cache size is 2 GB or more, you must run the session on a 64-bit Integration Service. You can configure a numeric value, or you can configure the Integration Service to determine the cache size at runtime. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Joiner Index Cache Size	Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured cache size is 2 GB or more, you must run the session on a 64-bit Integration Service. You can configure a numeric value, or you can configure the Integration Service to determine the cache size at runtime. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Sorted Input	Specifies that data is sorted. Choose Sorted Input to join sorted data. Using sorted input can improve performance. For more information about working with sorted input, see Using Sorted Input.
Transformation Scope	Specifies how the Integration Service applies the transformation logic to incoming data. You can choose Transaction, All Input, or Row. For more information, see Working with Transactions.

Adding Transformations to the Mapping (Sorted Input)

When you add transformations between the sort origin and the Joiner transformation, use the following guidelines to maintain sorted data:

Do not place any of the following transformations between the sort origin and the Joiner transformation:

Custom

-	Unsorted Aggregator

-	Normalizer

Rank

-	Union transformation

-	XML Parser transformation

-	XML Generator transformation

-	Mapplet, if it contains one of the above transformations

You can place a sorted Aggregator transformation between the sort origin and the Joiner transformation if you use the following guidelines:

-	Configure the Aggregator transformation for sorted input using the guidelines in Using Sorted Input.

-	Use the same ports for the group by columns in the Aggregator transformation as the ports at the sort origin.

-	The group by ports must be in the same order as the ports at the sort origin.

When you join the result set of a Joiner transformation with another pipeline, verify that the data output from the first Joiner transformation is sorted.

Tip: You can place the Joiner transformation directly after the sort origin to maintain sorted data.

If you use a sorted Aggregator transformation between the sort origin and the Joiner transformation, treat the sorted Aggregator transformation as the sort origin when you define the join condition. Use the following guidelines when you define join conditions:

The ports you use in the join condition must match the ports at the sort origin.

When you configure multiple join conditions, the ports in the first join condition must match the first ports at the sort origin.

When you configure multiple conditions, the order of the conditions must match the order of the ports at the sort origin, and you must not skip any ports.

The number of sorted ports in the sort origin can be greater than or equal to the number of ports at the join condition.

Example of a Join Condition

For example, you configure Sorter transformations in the master and detail pipelines with the following sorted ports:

ITEM_NO

ITEM_NAME

PRICE

When you configure the join condition, use the following guidelines to maintain sort order:

You must use ITEM_NO in the first join condition.

If you add a second join condition, you must use ITEM_NAME.

If you want to use PRICE in a join condition, you must also use ITEM_NAME in the second join condition.

If you skip ITEM_NAME and join on ITEM_NO and PRICE, you lose the sort order and the Integration Service fails the session.

Joining Data from a Single Source

You may want to join data from the same source if you want to perform a calculation on part of the data and join the transformed data with the original data. When you join the data using this method, you can maintain the original data and transform parts of that data within one mapping. You can join data from the same source in the following ways:

Join two branches of the same pipeline.

Join two instances of the same source.

Joining Two Branches of the Same Pipeline

When you join data from the same source, you can create two branches of the pipeline. When you branch a pipeline, you must add a transformation between the source qualifier and the Joiner transformation in at least one branch of the pipeline. You must join sorted data and configure the Joiner transformation for sorted input.

For example, you have a source with the following ports:

Employee

Department

Total Sales

In the target, you want to view the employees who generated sales that were greater than the average sales for their departments. To do this, you create a mapping with the following transformations:

Sorter transformation. Sorts the data.

Sorted Aggregator transformation. Averages the sales data and group by department. When you perform this aggregation, you lose the data for individual employees. To maintain employee data, you must pass a branch of the pipeline to the Aggregator transformation and pass a branch with the same data to the Joiner transformation to maintain the original data. When you join both branches of the pipeline, you join the aggregated data with the original data.

Sorted Joiner transformation. Uses a sorted Joiner transformation to join the sorted aggregated data with the original data.

Filter transformation. Compares the average sales data against sales data for each employee and filter out employees with less than above average sales.


Mapping that Joins Two Branches of a Pipeline

Note: You can also join data from output groups of the same transformation, such as the Custom transformation or XML Source Qualifier transformation. Place a Sorter transformation between each output group and the Joiner transformation and configure the Joiner transformation to receive sorted input.

Joining two branches might impact performance if the Joiner transformation receives data from one branch much later than the other branch. The Joiner transformation caches all the data from the first branch, and writes the cache to disk if the cache fills. The Joiner transformation must then read the data from disk when it receives the data from the second branch. This can slow processing.

Joining Two Instances of the Same Source

You can also join same source data by creating a second instance of the source. After you create the second source instance, you can join the pipelines from the two source instances. If you want to join unsorted data, you must create two instances of the same source and join the pipelines.


Mapping that Joins Two Instances of the Same Source

Note: When you join data using this method, the Integration Service reads the source data for each source instance, so performance can be slower than joining two branches of a pipeline.

Guidelines

Use the following guidelines when deciding whether to join branches of a pipeline or join two instances of a source:

Join two branches of a pipeline when you have a large source or if you can read the source data only once. For example, you can only read source data from a message queue once.

Join two branches of a pipeline when you use sorted data. If the source data is unsorted and you use a Sorter transformation to sort the data, branch the pipeline after you sort the data.

Join two instances of a source when you need to add a blocking transformation to the pipeline between the source and the Joiner transformation.

Join two instances of a source if one pipeline may process slower than the other pipeline.

Join two instances of a source if you need to join unsorted data.

My BI Stuff

Sunday, May 2, 2010

Joiner (Informatica)