Sunday, March 13, 2011

Lookup, Cache Transformation & Cache Connection

Cache Connection Manager
- This connection manager is to define the cache
- Cache will be stored in a memory, however we can create cache file to hold the data.
- Once we feed the data into that file (*.caw) , later in all data flow tasks, we can use connection manager and get the data from that cache file

Cache Transformation
- This is one of the Data flow transformation
- It uses memory cache / file cache as configured in Cache Connection Manager and pulls data from there
- This is a very helpful transformation in terms of performance improvement

Lookup Transformation
- Basically it is being used to lookup the data
- It is using equi-join.
- All the values from source table (joining column) SHOULD exist in Reference table. If not, it will throw an
   error
- NULL value also will be considered as non-matching row
- This is CASE-SENSITIVE.
- It can internally use Cache Connection / OLEDB Connection.
- There can be 3 cache types
         (A) Full Cache
                  - Cache connection manager can be used only with this type of cache
                  - In this case, when package starts, data will be pulled and kept into memory cache /  file cache
                     and later only cache will be used and not database
                  - We might not get the latest data from database
         (B) Partial Cache
                  - Cache connection manager can't be used with this
                  - "Advance" tab of Lookup transformation will be enabled only in case of Partial cache where one                                    
                     can configure cache size
                  - This is bit different than Full cache
                  - Initially cache will be empty. For each value, first it checks in Cache, if not found then goes to               
                    database, if found from database, then stores that value in cache so it can be used in later stage.
                  - Startup time will be less than Full cache but processing time will be longer
                  - Also lot of available memory should be there.
         (C) No Cache
                  - Every time, it will get it from database
- There are 3 outputs
         (A) Matching Row
         (B) No Matching Row
         (C) Error

EXAMPLE

Pre-requisite

Execute the following query in database


CREATE TABLE LookupDemo (EID int, EName varchar(10), Country VARCHAR(10))
CREATE TABLE LookupDemoReference (Country VARCHAR(10), Region VARCHAR(15))

CREATE TABLE LookupDemoOutput (EID int, EName varchar(10), Country VARCHAR(10), Region VARCHAR(15))

INSERT INTO LookupDemo (EID,EName,Country) VALUES
(1,'Nisarg','India'),
(2,'Megha','INDIA'),
(3,'Swara','China'),
(4,'Nidhi','USA'),
(5,'Lalu','Japan')

INSERT INTO LookupDemoReference (Country, Region) VALUES
('India','ASIA'),
('India','ASIA-Pacific'),
('China','ASIA-Pacific'),
('USA','North America'),
('Japan','ASIA-Pacific')




Steps

1. Right click in Connection Tray and click on "New Connection"
2. select "CACHE" and click on Add
3. Check "Use File Cache" checkbox to store the cache data in a file.
4. select the file destination and give some name
5. Click on "columns" tab and start adding columns.
6. Add 2 columns (Country , Region) with IndexPosition 1 and 0 respectively.
    Index Position 1 : this will be a joining key on which we will use joining conditions
    Index Position 0 : Other columns

7. Drag data flow task and rename as "Build Cache"
8. Go inside DFT and drag OLEDB source which should refer to LookupDemoReference table.
9. Drag Cache Transformation and connect it with source
10. In Connection Manager, select created cache connection manager
11. click on mapping and make sure, you have proper mapping between columns.
12. Go back to control flow and drag one more data flow task and rename it to "Lookup Data"
13. Connect 1st DFT to 2nd DFT. It should look like this.
14. Go to Lookup Data and add OLEDB source which should point to LookupDemo table
15. Drag Lookup Transformation and connect it with source
16. Select "Full Cache" in General tab of Lookup Transformation. Select "Cache Connection"

17. Select created connection manager and in Connection tab
18. Do column mappings as shown below.
19. Drag OLEDB destination and connect it with Lookup Transformation. make sure to choose "Lookup Match output"
20. OLEDB destination should point to LookupDemoOutput table
20. Column mapping should be like this.

21. Now execute the package.
22. Package will be failed at lookup transformation
23. Reason of failure.
      - Source table has one record with INDIA which is not available in Reference table, this signifies that  
         Lookup transformation is CASE-SENSITIVE
      - Lookup Transformation is different then Merge Join here. Merge join will ignore the rows if they are not  
         matching, but Lookup will throw an error.
24a. Open Lookup Transformation and change settings like this.

24b. Now drag Derived Column Transformation and connect it with Lookup Transformation. Make sure Lookup No Match Output gets displayed on the arrow.
25. In Derived Column, add a new column like this. Make sure to cast it into DT_STR

26. Remove the connection between OLEDB Destination and Lookup Transformation.
26. Now drag "Union All Transformation" and connect it with Lookup Transformation. Make sure Lookup Match Output gets displayed on the arrow.
27. Select "Invalid Column" in last row, last column

28. Connect Union All transformation to OLEDB destination
29. Whole DFT will look like this.
30. Execute the package.
31. We can see that 1 rows has been redirected as "No match output" and finally using Union All it is reaching to destination.
32. Result table is looking like this.
33. This signifies following things.
      - Lookup Transformation is Case-sensitive
      - If 2 reference records are existing then Lookup Trans. picks up only 1st record.
      - Lookup Trans. fails if joining key column is not in reference table in case we haven't configure No
         Match output.





No comments:

Post a Comment