ODI Architecture Overview ~ Oracle Data Integrator

Oracle Data Integrator Architecture

The architecture of Oracle Data Integrator relies on different components that collaborate together, as described in Figure: Functional Architecture Overview.

Repositories

The central component of the architecture is the Oracle Data Integrator Repository. It stores configuration information about the IT infrastructure, metadata of all applications, projects, scenarios, and the execution logs. Many instances of the repository can coexist in the IT infrastructure. The architecture of the repository is designed to allow several separated environments that exchange metadata and scenarios (for example: Development, Test, Maintenance and Production environments). In the figure above, two repositories are represented: one for the development environment, and another one for the production environment. The repository also acts as a version control system where objects are archived and assigned a version number. The Oracle Data Integrator Repository can be installed on an OLTP relational database.

The Oracle Data Integrator Repository is composed of a master repository and several Work Repositories. Objects developed or configured through the user interfaces are stored in one of these repository types.

There is usually only one master repository that stores the following information:

Security information including users, profiles and rights for the ODI platform

Topology information including technologies, server definitions, schemas, contexts, languages etc.

Versioned and archived objects.

The Work Repository is the one that contains actual developed objects. Several work repositories may coexist in the same ODI installation (for example, to have separate environments or to match a particular versioning life cycle). A Work Repository stores information for:

Models, including schema definition, datastores structures and metadata, fields and columns definitions, data quality constraints, cross references, data lineage etc.

Projects, including business rules, packages, procedures, folders, Knowledge Modules, variables etc.

Scenario execution, including scenarios, scheduling information and logs.

When the Work Repository contains only the execution information (typically for production purposes), it is then called an Execution Repository.

User Interfaces

Administrators, Developers and Operators use the Oracle Data Integrator Studio to access the repositories. This Fusion Client Platform (FCP) based UI is used for administering the infrastructure (security and topology), reverse-engineering the metadata, developing projects, scheduling, operating and monitoring executions.

Business users (as well as developers, administrators and operators), can have read access to the repository, perform topology configuration and production operations through a web based UI called Oracle Data Integrator Console. This Web application can deployed in a Java EE application server such as Oracle WebLogic.

ODI Studio provides four Navigators for managing the different aspects and steps of an ODI integration project:

Topology Navigator
Designer Navigator
Operator Navigator
Security Navigator

Topology Navigator

Topology Navigator is used to manage the data describing the information system's physical and logical architecture. Through Topology Navigator you can manage the topology of your information system, the technologies and their datatypes, the data servers linked to these technologies and the schemas they contain, the contexts, the language and the agents, as well as the repositories. The site, machine, and data server descriptions will enable Oracle Data Integrator to execute the same interfaces in different environments.

Designer Navigator

Designer Navigator is used to design data integrity checks and to build transformations such as for example:

Automatic reverse-engineering of existing applications or databases

Graphical development and maintenance of transformation and integration interfaces

Visualization of data flows in the interfaces

Automatic documentation generation

Customization of the generated code

The main objects you handle through Designer Navigator are Models and Projects.

Operator Navigator

Operator Navigator is the production management and monitoring tool. It is designed for IT production operators. Through Operator Navigator, you can manage your interface executions in the sessions, as well as the scenarios in production.

Security Navigator

Security Navigator is the tool for managing the security information in Oracle Data Integrator. Through Security Navigator you can create users and profiles and assign user rights for methods (edit, delete, etc) on generic objects (data server, datatypes, etc), and fine-tune these rights on the object instances (Server 1, Server 2, etc).

Design-time Projects

A typical project is composed of several steps and milestones.

Some of these are:

Define the business needs

Identify and declare the sources and targets in the Topology

Design and Reverse-engineer source and target data structures in the form of data models

Implement data quality rules on these data models and perform static checks on these data models to validate the data quality rules

Develop integration interfaces using datastores from these data models as sources and target

Develop additional components for tasks that cannot be achieved using interfaces, such as Receiving and sending e-mails, handling files (copy, compress, rename and such), executing web services

Integrate interfaces and additional components for building Package workflows

Version your work and release it in the form of scenarios

Schedule and operate scenarios.

Oracle Data Integrator will help you cover most of these steps, from source data investigation to metadata lineage, and through loading and data quality audit. With its repository, Oracle Data Integrator will centralize the specification and development efforts and provide a unique architecture on which the project can rely to succeed.

Run-Time Agent

At design time, developers generate scenarios from the business rules that they have designed. The code of these scenarios is then retrieved from the repository by the Run-Time Agent. This agent then connects to the data servers and orchestrates the code execution on these servers. It retrieves the return codes and messages for the execution, as well as additional logging information – such as the number of processed, execution time etc. - in the Repository.

The Agent comes in two different flavors:

The Java EE Agent can be deployed as a web application and benefit from the features of an application server.

The Standalone Agent runs in a simple Java Machine and can be deployed where needed to perform the integration flows.

Both these agents are multi-threaded java programs that support load balancing and can be distributed across the information system. This agent holds its own execution schedule which can be defined in Oracle Data Integrator, and can also be called from an external scheduler. It can also be invoked from a Java API or a web service interface.

Introduction to Integration Interfaces

An integration interface is an Oracle Data Integrator object stored that enables the loading of one target datastore with data transformed from one or more source datastores, based on declarative rules implemented as mappings, joins, filters and constraints.

An integration interface also references the Knowledge Modules (code templates) that will be used to generate the integration process.

Datastores

A datastore is a data structure that can be used as a source or a target in an integration interface. It can be:

a table stored in a relational database

an ASCII or EBCDIC file (delimited, or fixed length)

a node from a XML file

a JMS topic or queue from a Message Oriented Middleware

a node from a enterprise directory

an API that returns data in the form of an array of records

Regardless of the underlying technology, all data sources appear in Oracle Data Integrator in the form of datastores that can be manipulated and integrated in the same way. The datastores are grouped into data models. These models contain all the declarative rules –metadata - attached to datastores such as constraints.

Declarative Rules

The declarative rules that make up an interface can be expressed in human language, as shown in the following example: Data is coming from two Microsoft SQL Server tables (ORDERS joined to ORDER_LINES) and is combined with data from the CORRECTIONS file. The target SALES Oracle table must match some constraints such as the uniqueness of the ID column and valid reference to the SALES_REP table.

Data must be transformed and aggregated according to some mappings expressed in human language as shown in Figure: Example of a business problem.

Data Flow

Business rules defined in the interface are automatically converted into a data flow that will carry out the joins filters, mappings, and constraints from source data to target tables.

By default, Oracle Data Integrator will use the Target RBDMS as a staging area for loading source data into temporary tables and applying all the required mappings, staging filters, joins and constraints. The staging area is a separate area in the RDBMS (a user/database) where Oracle Data Integrator creates its temporary objects and executes some of the rules (mapping, joins, final filters, aggregations etc.). When performing the operations this way, Oracle Data Integrator behaves like an E-LT as it first extracts and loads the temporary tables and then finishes the transformations in the target RDBMS.

In some particular cases, when source volumes are small (less than 500,000 records), this staging area can be located in memory in Oracle Data Integrator's in-memory relational database – In-Memory Engine. Oracle Data Integrator would then behave like a traditional ETL tool.

Figure: Oracle Data Integrator Knowledge Modules in action shows the data flow automatically generated by Oracle Data Integrator to load the final SALES table. The business rules will be transformed into code by the Knowledge Modules (KM). The code produced will generate several steps. Some of these steps will extract and load the data from the sources to the staging area (Loading Knowledge Modules - LKM). Others will transform and integrate the data from the staging area to the target table (Integration Knowledge Module - IKM). To ensure data quality, the Check Knowledge Module (CKM) will apply the user defined constraints to the staging data to isolate erroneous records in the Errors table.

Oracle Data Integrator Knowledge Modules contain the actual code that will be executed by the various servers of the infrastructure. Some of the code contained in the Knowledge Modules is generic. It makes calls to the Oracle Data Integrator Substitution API that will be bound at run-time to the business-rules and generates the final code that will be executed.

At design time, declarative rules are defined in the interfaces and Knowledge Modules are only selected and configured.
At run-time, code is generated and every Oracle Data Integrator API call in the Knowledge Modules (enclosed by <% and %>) is replaced with its corresponding object name or expression, with respect to the metadata provided in the Repository. The generated code is orchestrated by Oracle Data Integrator run-time component - the Agent – on the source and target systems to make them perform the processing, as defined in the E-LT approach.

Introduction to Declarative Design

To design an integration process with conventional ETL systems, a developer needs to design each step of the process: Consider, for example, a common case in which sales figures must be summed over time for different customer age groups. The sales data comes from a sales management database, and age groups are described in an age distribution file. In order to combine these sources then insert and update appropriate records in the customer statistics systems, you must design each step, which includes:

Load the customer sales data in the engine

Load the age distribution file in the engine

Perform a lookup between the customer sales data and the age distribution data

Aggregate the customer sales grouped by age distribution

Load the target sales statistics data into the engine

Determine what needs to be inserted or updated by comparing aggregated information with the data from the statistics system

Insert new records into the target

Update existing records into the target

This method requires specialized skills, depending on the steps that need to be designed. It also requires significant efforts in development, because even repetitive succession of tasks, such as managing inserts/updates in a target, need to be developed into each task. Finally, with this method, maintenance requires significant effort. Changing the integration process requires a clear understanding of what the process does as well as the knowledge of how it is done. With the conventional ETL method of design, the logical and technical aspects of the integration are intertwined.Declarative Design is a design method that focuses on “What” to do (the Declarative Rules) rather than “How” to do it (the Process). In our example, “What” the process does is:

Relate the customer age from the sales application to the age groups from the statistical file

Aggregate customer sales by age groups to load sales statistics

“How” this is done, that is the underlying technical aspects or technical strategies for performing this integration task – such as creating temporary data structures or calling loaders – is clearly separated from the declarative rules.

Declarative Design in Oracle Data Integrator uses the well known relational paradigm to declare in the form of an Interface the declarative rules for a data integration task, which includes designation of sources, targets, and transformations.

Declarative rules often apply to metadata to transform data and are usually described in natural language by business users. In a typical data integration project (such as a Data Warehouse project), these rules are defined during the specification phase in documents written by business analysts in conjunction with project managers. They can very often be implemented using SQL expressions, provided that the metadata they refer to is known and qualified in a metadata repository.

The four major types of Declarative Rules are mappings, joins, filters and constraints:

A mapping is a business rule implemented as an SQL expression. It is a transformation rule that maps source columns (or fields) onto one of the target columns. It can be executed by a relational database server at run-time. This server can be the source server (when possible), a middle tier server or the target server.

A join operation links records in several data sets, such as tables or files. Joins are used to link multiple sources. A join is implemented as an SQL expression linking the columns (fields) of two or more data sets. Joins can be defined regardless of the physical location of the source data sets involved. For example, a JMS queue can be joined to an Oracle table. Depending on the technology performing the join, it can be expressed as an inner join, right outer join, left outer join and full outer join.

A filter is an expression applied to source data sets columns. Only the records matching this filter are processed by the data flow.

A constraint is an object that defines the rules enforced on data sets' data. A constraint ensures the validity of the data in a given data set and the integrity of the data of a model. Constraints on the target are used to check the validity of the data before integration in the target.

Table: Examples of declarative rules gives examples of declarative rules.

Introduction to Data Integration with Oracle Data Integrator

Data Integration ensures that information is timely, accurate, and consistent across complex systems. This section provides an introduction to data integration and describes how Oracle Data Integrator provides support for Data Integration.

Data Integration

Integrating data and applications throughout the enterprise, and presenting them in a unified view is a complex proposition. Not only are there broad disparities in technologies, data structures, and application functionality, but there are also fundamental differences in integration architectures. Some integration needs are Data Oriented, especially those involving large data volumes. Other integration projects lend themselves to an Event Driven Architecture (EDA) or a Service Oriented Architecture (SOA), for asynchronous or synchronous integration.

Data Integration ensures that information is timely, accurate, and consistent across complex systems. Although it is still frequently referred as Extract-Load-Transform (ETL) - Data Integration was initially considered as the architecture used for loading Enterprise Data Warehouse systems - data integration now includes data movement, data synchronization, data quality, data management, and data services.

Oracle Data Integrator

Oracle Data Integrator provides a fully unified solution for building, deploying, and managing complex data warehouses or as part of data-centric architectures in a SOA or business intelligence environment. In addition, it combines all the elements of data integration—data movement, data synchronization, data quality, data management, and data services—to ensure that information is timely, accurate, and consistent across complex systems.

Oracle Data Integrator (ODI) features an active integration platform that includes all styles of data integration: data-based, event-based and service-based. ODI unifies silos of integration by transforming large volumes of data efficiently, processing events in real time through its advanced Changed Data Capture (CDC) capability, and providing data services to the Oracle SOA Suite.. It also provides robust data integrity control features, assuring the consistency and correctness of data. With powerful core differentiators - heterogeneous E-LT, Declarative Design and Knowledge Modules - Oracle Data Integrator meets the performance, flexibility, productivity, modularity and hot-pluggability requirements of an integration platform.

E-LT

Traditional ETL tools operate by first Extracting the data from various sources, Transforming the data in a proprietary, middle-tier ETL engine, and then Loading the transformed data into the target data warehouse or integration server. Hence the term "ETL" represents both the names and the order of the operations performed, as shown in Figure: Traditional ETL versus ODI E-LT.

The data transformation step of the ETL process is by far the most compute-intensive, and is performed entirely by the proprietary ETL engine on a dedicated server. The ETL engine performs data transformations (and sometimes data quality checks) on a row-by-row basis, and hence, can easily become the bottleneck in the overall process. In addition, the data must be moved over the network twice – once between the sources and the ETL server, and again between the ETL server and the target data warehouse. Moreover, if one wants to ensure referential integrity by comparing data flow references against values from the target data warehouse, the referenced data must be downloaded from the target to the engine, thus further increasing network traffic, download time, and leading to additional performance issues.

In response to the issues raised by ETL architectures, a new architecture has emerged, which in many ways incorporates the best aspects of manual coding and automated code-generation approaches. Known as "E-LT", this new approach changes where and how data transformation takes place, and leverages existing developer skills, RDBMS engines and server hardware to the greatest extent possible. In essence, E-LT moves the data transformation step to the target RDBMS, changing the order of operations to: Extract the data from the source tables, Load the tables into the destination server, and then Transform the data on the target RDBMS using native SQL operators. Note, with E-LT there is no need for a middle-tier engine or server as shown in Figure: Traditional ETL versus ODI E-LT.

SATURDAY, 30 APRIL 2011

Oracle Data Integrator Studio ODI 11G Features

Oracle Data Integrator provides a new IDE (Integrated development Environment) called the ODI Studio. It is developed based on JDeveloper.

It includes the several features:

New Navigator Organization
New Look and Feel
Redesigned Editors
Window Management
Document Management and Navigation
Improved User Assistance

New Navigator Organization

The older versions of oracle data integrator have separate module interfaces for components like Designer, Topology, Operator and Security Manager. All these modules now appear as Navigators within the Oracle Data Integrator Studio window of ODI 11G version.

The new Oracle Data Integrator studio is used as a replacement for all Oracle Data Integrator modules (Designer, Topology, Operator and Security Manager).

Navigator organization provides the following features:

· Navigators can be docked/undocked

· Navigator can be hidden

· Accordions group the tree views that appear in the Navigators (for example the Project and Models accordions in the Designer Navigator).

· Not frequently used Accordions can be minimized to allow more room for the other tree views.

· Accordions allow access to the tree view-specific actions from their toolbar menu (for example, import project from the Project Accordion in the Designer Navigator).

· Tree Views objects are provided with context menus and markers the same way as in Oracle Data Integrator 10g.

· Tree view objects can be dragged and dropped within a tree view or across tree views for defining the security policies. Double clicking an object opens by default the corresponding Object Editor.

· Context Menus have been reorganized into groups with separators and normalized across the interface.

· This feature provides a single user interface from which the user can perform all the tasks in a project lifecycle. It also provides a better productivity for the user.

· Using the View menu Navigators can be docked/undocked and displayed/hidden.

2. New Look and Feel

This new look and feel is customizable with the Preferences menu option. Icons are being redesigned in a new, trendy style to enhance the overall visual appeal of Oracle Data Integrator

3. Redesigned Editors

For better usability all object editors in Oracle Data Integrator have been redesigned.

Main changes are:

Tabs are organized as finger tabs on the left hand-side of the editor. Complex editors (as for example Interface or Package Editors) have also tabs appearing in the bottom of the editor.
Fields have been grouped under headers. These field groups implement an expand/collapse behavior.
Fields and labels have been organized in a standard way for all editors for a better readability of the editors.
Text Buttons in the editors are transformed into hyperlinks, and all buttons appearing in editors have been redesigned.

5. Knowledge Modules, Actions and Procedure editors have been redesigned in order to edit the Lines directly from the main editor instead of opening a separate editor.

4. Window Management

1. Full Docking Support:

· All windows, editors and navigators can now be docked and undocked intuitively.

· Fast maximize and restore: To quickly maximize a dockable window or the editor area, double-click on the title bar of the window you want to maximize. To restore the window to its previous dimensions, double-click again on the title bar.

· Title bars as tabs: The tab for a dockable window (when tabbed with another dockable window) is now also the title bar. This makes more effective use of the space on the screen. Reposition a window by dragging its tab. Some additional related enhancements include a new context menu from the gray background area behind the tab, change in terminology from "auto-hide" and "show" to "minimize" and "restore", ability to minimize a set of tabbed windows with a single click, and toggling the display of a minimized window by clicking on its button.

5. Document Management and Navigation

Object edition has been enhanced in the Oracle Data Integrator Studio with improved document management.

This includes:

1. Save and close multiple editors: You can easily save all your work with a single click using the File > Save All option and close all opened editors similarly. You can also close all the editors but the current one.

2. Forward and back buttons: Now you can easily return to a previously visited document with the convenient browser-style forward and back buttons on the main toolbar. These buttons maintain a history, so you can drop down the back or forward button to get a list of the documents and edit locations you have visited. Alt+Left and Alt+Right activate the back and forward buttons.

3. Quick document switching: Switching between editors and navigators is also possible. Now when you press Ctrl+Tab or Ctrl+F6, you can choose which document you want to switch from a list ordered by the most recently used. You can use the same technique to switch between open dockable windows by first placing focus in a dockable window, then pressing Ctrl+Tab or Ctrl+F6.

6.Improved User Assistance

Oracle Data Integrator introduces intuitive new features that improve usability:

· Help Center/Welcome Page: The Welcome page has been transformed into the Help Center, redesigned to provide the user with quick access to help topics and common tasks, as well as links to useful Oracle resources.

· New On-Line Help: The online help has been entirely re-written for supporting the new user interface.

· Help bookmarks: The Help window has a tab labeled Favorites. While browsing the help, you can click on the Add to Favorites button to add the document to this tab.

Thanks Guru’s

Ram Kumar Lanke

THURSDAY, 28 APRIL 2011