Monday, 9 March 2015

Static vs Flow control in Oracle Data Integrator

FLOW CONTROL:

Data quality validation is performed before the data is loaded into the target tables.

The Check Knowledge Module (CKM) creates the E$ (error) table and the SNP_CHECK_TAB table to support the data quality checks.

It validates the data in the I$ (integration) table before it is inserted into the target table. Any rows that fail a check are deleted from the I$ table and inserted into the E$ table, and a summary error message together with the interface name is written to SNP_CHECK_TAB.
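A minimal sketch of what the generated flow-control steps typically look like, assuming a hypothetical NOT NULL check on CUSTOMER_NAME and hypothetical I$_CUSTOMER / E$_CUSTOMER tables (the real statements are generated by the CKM and vary by knowledge module and version):

-- Copy the rows that violate the check from the I$ table into the E$ table
INSERT INTO E$_CUSTOMER (err_type, err_mess, customer_id, customer_name)
SELECT 'F', 'CUSTOMER_NAME cannot be null', customer_id, customer_name
FROM   I$_CUSTOMER
WHERE  customer_name IS NULL;

-- Remove the same rows from the I$ table so they never reach the target
DELETE FROM I$_CUSTOMER
WHERE  customer_name IS NULL;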

STATIC CONTROL:

Data quality validation is performed after the data has been loaded into the target tables.

The CKM validates the data directly on the target table; any rows that fail a check are inserted into the E$ table and logged in SNP_CHECK_TAB. Remember that, unlike flow control, the erroneous rows are not deleted from the target table.
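A hedged sketch of the static-control equivalent, reusing the hypothetical check and tables from the flow-control example above; the check now runs against a hypothetical target table TRG_CUSTOMER, and nothing is deleted:

-- Static control: the check runs against the target table itself
INSERT INTO E$_CUSTOMER (err_type, err_mess, customer_id, customer_name)
SELECT 'S', 'CUSTOMER_NAME cannot be null', customer_id, customer_name
FROM   TRG_CUSTOMER
WHERE  customer_name IS NULL;
-- No DELETE here: the offending rows remain in TRG_CUSTOMER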

What are the different temporary tables created in ODI?

Oracle Data Integrator creates several temporary tables during data loading, transformation and integration.
Let us look at these temporary tables; a simplified load flow is sketched after the list below.

  1. E$ table: This is generated by the Check Knowledge Module (CKM) if flow control is enabled.
  2. I$ table: Created by the Integration Knowledge Module (IKM). After the LKM loads the data into the C$ table, the IKM loads those records into the I$ table.
  3. J$ table: Created during the implementation of Change Data Capture (CDC). This table is created by the Journalizing Knowledge Module (JKM).
  4. C$ table: The Loading Knowledge Module (LKM) creates the C$ temporary table in the staging area. This table holds the records loaded from the source server.
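A simplified sketch of how these tables are typically chained together during a load, assuming a hypothetical source table SRC_CUSTOMER and target table TRG_CUSTOMER (the actual statements are generated by the LKM/IKM you select):

-- LKM: stage the source rows in a C$ table in the staging area
INSERT INTO C$_CUSTOMER (customer_id, customer_name)
SELECT customer_id, customer_name
FROM   SRC_CUSTOMER;

-- IKM: apply the mappings/transformations and build the I$ table
INSERT INTO I$_CUSTOMER (customer_id, customer_name)
SELECT customer_id, UPPER(customer_name)
FROM   C$_CUSTOMER;

-- IKM: write the validated I$ rows into the target table
INSERT INTO TRG_CUSTOMER (customer_id, customer_name)
SELECT customer_id, customer_name
FROM   I$_CUSTOMER;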

Sunday, 8 March 2015

Kimball vs Inmon data warehouse models

This article is copied from the SearchBusinessIntelligence blog.

Bill Inmon’s enterprise data warehouse approach (the top-down design): A normalized data model is designed first. Then the dimensional data marts, which contain data required for specific business processes or specific departments, are created from the data warehouse.

Ralph Kimball’s dimensional design approach (the bottom-up design): The data marts facilitating reports and analysis are created first; these are then combined together to create a broad data warehouse.

Which one to pick?

Insurance: It is vital to get the overall picture with respect to individual clients, groups, history of claims, mortality rate tendencies, demography, profitability of each plan and agents, etc. All aspects are inter-related and therefore suited to Inmon’s approach.

Marketing: This is a specialized division, which does not call for an enterprise warehouse. Only data marts are required. Hence, Kimball’s approach is suitable.

CRM in banks: The focus is on parameters such as products sold, up-sell and cross-sell at a customer level. It is not necessary to get an overall picture of the business. For example, there is no need to link a customer’s details to the treasury department dealing with forex transactions and regulations. Since the scope is limited, you can go for Kimball’s method. However, if all the processes and divisions in the bank are to be linked, the obvious choice is Inmon’s design over Kimball’s.

Manufacturing: Multiple functions are involved here, irrespective of the budget. Thus, where there is a systemic dependency as in this case, an enterprise model is required. Hence Inmon’s method is ideal.

SQL joins explained

Joins are always a confusing topic for SQL people. They are explained in an understandable manner on the Coding Horror blog; I have copied the same content here so that I can re-read my own blog whenever I get confused about joins.
Assume we have the following two tables. Table A is on the left, and Table B is on the right. We'll populate them with four records each.
id name       id  name
-- ----       --  ----
1  Pirate     1   Rutabaga
2  Monkey     2   Pirate
3  Ninja      3   Darth Vader
4  Spaghetti  4   Ninja

Let's join these tables by the name field in a few different ways and see if 
we can get a conceptual match to those nifty Venn diagrams. 
SELECT * FROM TableA
INNER JOIN TableB
ON TableA.name = TableB.name

id  name       id   name
--  ----       --   ----
1   Pirate     2    Pirate
3   Ninja      4    Ninja


Inner join produces only the set of records that match in both Table A and Table B.


SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name

id    name       id    name
--    ----       --    ----
1     Pirate     2     Pirate
2     Monkey     null  null
3     Ninja      4     Ninja
4     Spaghetti  null  null
null  null       1     Rutabaga
null  null       3     Darth Vader


Full outer join produces the set of all records in Table A and Table B,
with matching records from both sides where available.
 If there is no match, the missing side will contain null.


SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name

id  name       id    name
--  ----       --    ----
1   Pirate     2     Pirate
2   Monkey     null  null
3   Ninja      4     Ninja
4   Spaghetti  null  null

Left outer join produces a complete set of records from Table A, with the
matching records (where available) in Table B. If there is no match,
the right side will contain null.



SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableB.id IS null

id  name       id     name
--  ----       --     ----
2   Monkey     null   null
4   Spaghetti  null   null

To produce the set of records only in Table A, but not in Table B,
we perform the same left outer join, then exclude the records
we don't want from the right side via a where clause.



SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableA.id IS null
OR TableB.id IS null

id    name       id    name
--    ----       --    ----
2     Monkey     null  null
4     Spaghetti  null  null
null  null       1     Rutabaga
null  null       3     Darth Vader

To produce the set of records unique to Table A and Table B, we perform the same 
full outer join, then exclude the records we don't want from both sides via a where clause.



There's also a cartesian product or cross join, which as far as I can tell, can't be expressed as a Venn diagram:
SELECT * FROM TableA
CROSS JOIN TableB
This joins "everything to everything", resulting in 4 x 4 = 16 rows, far more than we had in the original sets. If you do the math, you can see why this is a very dangerous join to run against large tables.

Dimensional Modelling, Facts and Measurements

Dim = Dimension table, Fact = Fact table

Dimensional modeling

Dimensional modeling always uses the concepts of facts, and dimensions. Facts are typically numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc.
More at Wikipedia

What is a fact table?

A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A date column, a store column, and a sales amount column.
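As a hedged illustration of that grain, a minimal fact table might look like the following sketch (the table and column names are hypothetical):

-- Hypothetical daily store-sales fact table: one row per store per day
CREATE TABLE fact_store_sales (
    date_key      DATE          NOT NULL,   -- which day the sales are for
    store_key     INTEGER       NOT NULL,   -- which store (foreign key to a store dimension)
    sales_amount  NUMERIC(12,2) NOT NULL    -- the additive measure
);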

What is a dimension table?

A dimension table maintains information that represents attributes of the business and contains relatively static data. Dimension tables are connected to the fact table through foreign keys and are usually smaller than fact tables.
  • A fact is a measurement.
  • Facts are based on dimensions.
  • Facts are usually additive, but this is not always the case.
  • Dimension tables contain business elements.
  • Dimension tables supply the descriptive context for the measurements in the fact tables.
  • Multiple fact tables may reference the dimension tables.
As shown in the diagram, the fact table contains the measurement fields along with foreign keys pointing to the dimension tables.
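A minimal sketch of how a dimension slices a fact through that foreign key, continuing the hypothetical store-sales example above (the dimension name and columns are assumptions):

-- Hypothetical store dimension holding descriptive, relatively static attributes
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,
    store_name VARCHAR(100),
    region     VARCHAR(50)
);

-- Total sales by region: the dimension supplies the grouping attribute,
-- the fact table supplies the additive measure
SELECT d.region, SUM(f.sales_amount) AS total_sales
FROM   fact_store_sales f
JOIN   dim_store d ON d.store_key = f.store_key
GROUP BY d.region;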

Saturday, 7 March 2015

Types of Work repositories


While creating a work repository under a master repository, we have to choose the type of repository. There are two possible types of work repository, as explained below.
  1. Development: This type of repository allows management of design-time objects such as data models and projects (including interfaces, procedures, etc.). A development repository also includes the run-time objects (scenarios and sessions). This type of repository is suitable for development environments. 
  2. Execution: This type of repository only includes run-time objects (scenarios, schedules and sessions). It allows launching and monitoring of data integration jobs in Operator Navigator. Such a repository cannot contain any design-time artifacts. Designer Navigator cannot be used with it. An execution repository is suitable for production environments.

Oracle Data Integrator repositories

There are two types of repositories:
  1.  Master repositories
  2. Work repositories
These repositories are databases stored in an RDBMS. Usually there is one master repository with several work repositories attached to it. 

Master repository: The master repository stores the information defined in the Topology and Security navigators. It stores the following information: 
  1. Security information, i.e. information about users, profiles and access privileges 
  2. Topology information, including technologies, data servers, physical and logical schemas, contexts, agents and languages
  3. ODI versions and objects 
Work repository: A work repository must always be attached to a master repository. 
The work repository stores the information defined in the Designer navigator. It stores the following information: 
  1. Data models: metadata, constraints, references, data lineage, etc 
  2. Projects: interfaces, packages, procedures, folders, knowledge modules, etc 
  3. Scenarios, scheduling information and logs 
Work repository contents are managed by the Designer and Operator navigators.

Friday, 6 March 2015

ETL vs ELT

Traditional Extract Transform Load (ETL) tools operate by first Extracting the data from various sources, Transforming the data in a proprietary, middle-tier ETL engine that is used as the staging area, and then loading the transformed data into the target data warehouse, integration server, or Hadoop cluster. Hence the term ETL represents both the names and the order of the operations performed.

Drawbacks of ETL approach

  • The data transformation step of the ETL process is by far the most compute-intensive, and is performed entirely by the proprietary ETL engine on a dedicated server. The ETL engine performs data transformations and data quality checks on a row-by-row basis, and hence, can easily become the bottleneck in the overall process. 
  • Data must be moved over the network twice – once between the sources and the ETL server, and again between the ETL server and the target data warehouse or Hadoop cluster. 
  • To ensure referential integrity by comparing data flow references against values from the target data warehouse, the referenced data must be downloaded from the target to the engine, thus further increasing network traffic, download time, and leading to additional performance issues. 

Extract Load Transform (ELT) approach by Oracle Data Integrator (ODI)


In response to the issues raised by ETL architectures, a new architecture has emerged, which in many ways incorporates the best aspects of manual coding and automated code-generation approaches. Known as E-LT, this new approach changes where and how data transformation takes place, and leverages existing developer skills, RDBMS and Big Data engines, and server hardware to the greatest extent possible. In essence, E-LT moves the data transformation step to the target RDBMS, changing the order of operations to: Extract the data from the source tables, Load the tables into the destination server, and then transform the data on the target RDBMS using native SQL operators. Note, with E-LT there is no need for a middle-tier engine or server.
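A hedged sketch of the idea: once the source rows have been loaded into a staging table on the target database, the transformation becomes a single set-based SQL statement executed by the target engine itself (the table names here are hypothetical):

-- E-LT style: transform inside the target database with one set-based statement,
-- instead of row-by-row processing on a separate ETL server
INSERT INTO dw_sales (sale_date, store_key, sales_amount)
SELECT TRUNC(s.sale_timestamp),   -- transformation runs on the target RDBMS
       s.store_key,
       SUM(s.amount)
FROM   stg_sales s                -- staging table already loaded on the target
GROUP BY TRUNC(s.sale_timestamp), s.store_key;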

Monday, 23 February 2015

Oracle Data Integrator Skills needed to become a good ETL Developer

The recruiter will look for the following skills while searching for an ETL or ODI developer. Particular positions may need different skills, but these are the general ODI skills needed.

  1. Know everything about ODI architecture and ODI components.
  2. Working knowledge of creating different interfaces, packages, constraints, procedures, SQL joins, functions, agents, etc.
  3. Extend the base ODI Knowledge Modules to create customized versions that will be re-usable across the company's data integration projects.
  4. Map business rules into Oracle Data Integrator interfaces, packages, and load plan source-to-target mappings.
  5. Implement business rules using ODI constraints and data quality functions.
  6. Know the differences between Oracle Data Integrator versions such as 10g, 11g and 12c.
  7. Working knowledge of Change Data Capture (CDC). 
  8. Knowledge of data warehouse concepts such as dimensional modelling and entity-relationship modelling.
Apart from knowing the tool, the developer must be aware of data warehouse concepts in depth. SQL skills are also a must.

Sunday, 22 February 2015

What are the Oracle Data Integrator GUI components?

Oracle Data Integrator is made up of the following graphical user interface (GUI) components: the Designer, Operator, Topology and Security navigators.
In the current releases of ODI, all of these GUI components are combined into Oracle Data Integrator Studio.

What is an agent in Oracle Data Integrator (ODI)?

We know that there are run-time and design-time components in Oracle Data Integrator.

The ODI agent is a run-time component.

What is an ODI agent?

The Oracle Data Integrator agent is a Java program that listens on a particular TCP/IP port. The agent can execute scenarios when it is invoked from one of the ODI GUI modules, or schedule these executions when it is started as a scheduler.


There are two types of ODI Agents available.

  1. Standalone Java agent: When a request is received for a scenario execution, it starts processing the scenario and updates the status in the work repository. Parent and child standalone agents can be configured in ODI Studio to perform load balancing if there is a need to balance the load between standalone agents.
  2. Java EE agent: This agent is deployed on WebLogic Server and listens on a particular port. When a request is received for a scenario execution, it starts processing the scenario and updates the status in the work repository. Load balancing is supported through WebLogic clusters for a Java EE agent.

Differences between Oracle Data Integrator 10g, 11g and 12c features


Differences between Oracle Data Integrator 12c and 11g: 

ODI 12c is the latest version of ODI.


  1. Loading multiple target tables as part of a single mapping. An interface is termed a mapping from ODI 12c onwards.
  2. Redesigned declarative flow-based user interface, reusable mappings, runtime performance enhancements, Oracle GoldenGate integration improvements, Big Data and XML improvements, Oracle Warehouse Builder integration.
The complete change log is here.
Oracle Data Integrator 11g and 10g:
  1. ODI 11g has all the ODI components (Designer, Operator, Topology and Security) integrated together as Oracle Data Integrator Studio.
  2. ODI 11g provides a Java API to manipulate both the design-time and run-time artifacts of the product. This API allows you, for example, to create or modify interfaces programmatically, create your topology, perform import or export operations, and launch or monitor sessions. This API can be used in any Java SE and Java EE application, or in the context of Java-based scripting languages like Groovy or Jython.
  3. External Password Storage, to have source/target data server (and context) passwords stored in an enterprise credential store. External Authentication, to have user/password information stored in an enterprise identity store (e.g. LDAP, Oracle Directory, Active Directory), with ODI authenticating against this store. These two features let you optionally store critical information in dedicated stores rather than within the ODI repository. The ODI Console may also use Oracle's single sign-on systems with ODI.

Friday, 20 February 2015

Oracle Data Integrator (ODI) Interview questions

1) What components make up Oracle Data Integrator?

"Oracle Data Integrator" comprises of the following

  1. Oracle Data Integrator + Topology Manager + Designer + Operator + Agent. 
  2. Oracle Data Quality for Data Integrator.
  3. Oracle Data Profiling.
2) What is a model in ODI?

A data model corresponds to a group of tabular data structures stored in a data server; it is based on a logical schema defined in the topology and contains only metadata.


3) What are User Parameters in Oracle Data Integrator?


Oracle Data Integrator saves user parameters such as default directories, window positions, etc. User parameters are saved in the userpref.xml file in /bin.


Tuesday, 27 January 2015

Oracle Data Integrator 11g Certified Implementation Essentials Syllabus

Introduction to Oracle Data Integrator (ODI)
  1. Describe the benefits of ODI (including the ODI architecture - Extract Load Transform) and its typical use cases 
  2. Describe the ODI components and their uses 
  3. Architecture Overview 
  4. Describe ODI Architecture 
  5. Install and Configure the physical and logical architecture 
  6. Explain the Fusion Middleware Components - Upgrade Assistant, Repository Creation Utility and Java Enterprise Edition (JEE) Components and High Availability