Sunday, May 2, 2010

Aggregator (Informatica)

Transformation type:
Active
Connected

The Aggregator transformation lets you perform aggregate calculations, such as averages and sums. The Integration Service performs aggregate calculations as it reads and stores necessary data group and row data in an aggregate cache. The Aggregator transformation is unlike the Expression transformation, in that you use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only. 
When you use the transformation language to create aggregate expressions, you can use conditional clauses to filter rows, providing more flexibility than SQL language.
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.

A typical use of Aggregator Transformation would be to load the Daily Snapshot table Grouping by AccountID, by InstrumentID, By Source System, By Date.
Here we need to sum up the total Quantity of the instruments and then calculate the total cost.

When you group values, the Integration Service produces one row for each group. If you do not group values, the Integration Service returns one row for all input rows. The Integration Service typically returns the last row of each group (or the last row received) with the result of the aggregation. However, if you specify a particular row to be returned (for example, by using the FIRST function), the Integration Service then returns the specified row.

Sorted Input Conditions

Do not use sorted input if either of the following conditions are true:   
  • The aggregate expression uses nested aggregate functions.   
  • The session uses incremental aggregation.
If you use sorted input and do not sort data correctly, the session fails.




Configure the following options: 
Aggregator Setting
Description
Cache Directory
Local directory where the Integration Service creates the index and data cache files. By default, the Integration Service uses the directory entered in the Workflow Manager for the process variable $PMCacheDir. If you enter a new directory, make sure the directory exists and contains enough disk space for the aggregate caches.
If you have enabled incremental aggregation, the Integration Service creates a backup of the files each time you run the session. The cache directory must contain enough disk space for two sets of the files. For information about incremental aggregation, see “Using Incremental Aggregation” in the Workflow Administration Guide.
Tracing Level
Amount of detail displayed in the session log for this transformation.
Sorted Input
Indicates input data is presorted by groups. Select this option only if the mapping passes sorted data to the Aggregator transformation.
Aggregator Data Cache Size
Data cache size for the transformation. Default cache size is 2,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a 64-bit Integration Service. You can configure the Integration Service to determine the cache size at run time, or you can configure a numeric value. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Aggregator Index Cache Size
Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a 64-bit Integration Service. You can configure the Integration Service to determine the cache size at run time, or you can configure a numeric value. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Transformation Scope
Specifies how the Integration Service applies the transformation logic to incoming data:
-
Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
-
All Input. Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
For more information about transformation scope, see the Workflow Administration Guide.


No comments:

Post a Comment