Tuesday, June 21, 2011

Data Warehouse vs. Operational Data Store (ODS) - Are they different ?


A data warehouse is an 'offline' integration of data, whereas an Operational Data Store is an 'online' integration of data. An ODS is used when transaction-level data (for both processing and querying) is dispersed across various systems and needs to be brought together on an online basis.
For example, suppose you want a single view of the customer for use by customer service, through which they can also update the data online. However, the customer data (OPD records, hospitalization records, diagnostic records, pharmaceutical purchase records, and so on) lies in different databases. An ODS could be a good choice here.
The concept of ODS described above is the ideal one. Another option is an Operational Data Store used for online queries where the information it provides is not real-time, but as of the last end of day.
For example, you have a single customer view, but it does not include the transactions the customer has made today. For this kind of need, sometimes even a data warehouse repository can serve the purpose.

Difference between Data cleansing and Data scrubbing ?


Data cleansing is the process of removing errors and resolving inconsistencies in source data before loading it into targets.
Data scrubbing is the process of filtering, merging, decoding, and translating source data to create validated data for the data warehouse.
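A minimal cleansing sketch (the record fields and code mappings are hypothetical, not from any particular tool): trim whitespace, normalize case, resolve inconsistent codes, and reject rows that fail a mandatory-field check before they reach the target.

```python
# Hypothetical code table resolving inconsistent gender values.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def cleanse(record):
    """Return a cleaned copy of the record, or None if it is unusable."""
    name = record.get("name", "").strip()
    if not name:
        return None                      # reject rows missing a mandatory field
    gender = GENDER_CODES.get(record.get("gender", "").strip().lower(), "U")
    return {"name": name.title(), "gender": gender}

rows = [{"name": "  john SMITH ", "gender": "Male"},
        {"name": "", "gender": "F"}]
clean = [r for r in (cleanse(row) for row in rows) if r is not None]
# clean == [{"name": "John Smith", "gender": "M"}]
```

In a real ETL flow these rules would be driven by the source system's known quality problems rather than hard-coded mappings.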

How many repositories can be created in Informatica ?


Standalone Repository: A repository that functions individually, unrelated to any other repository.
Global Repository: A centralized repository in a domain. It can contain objects shared across the repositories in the domain; the objects are shared through global shortcuts.
Local Repository: A repository within a domain that is not the global repository. A local repository can connect to the global repository using global shortcuts and can use objects in its shared folders.
Versioned Repository: Either a local or a global repository that allows version control. A versioned repository can store multiple copies, or versions, of an object. This feature lets you efficiently develop, test, and deploy metadata into the production environment.

What is the difference between Local & Global repository ?


Global Repository: A centralized repository in a domain. It can contain objects shared across the repositories in the domain; the objects are shared through global shortcuts.
Local Repository: A repository within a domain that is not the global repository. A local repository can connect to the global repository using global shortcuts and can use objects in its shared folders.

What is a shared folder ?

A shared folder is like any other folder, but it can be accessed by all users (access can be changed). It is mainly used to share objects between folders for reusability. For example, you can create a shared folder to hold all the common mapplets, sources, targets, and transformations, which can then be used across folders by creating shortcuts to them. Doing so increases the reusability of the code, and changes can be made in one place and will be reflected in all the shortcuts.

What exactly is a shortcut and how do you use it ?


A shortcut is a reference (link) to an object in a shared folder. Shortcuts are commonly used for sources and targets that are shared between different environments or projects. A shortcut is created by assigning 'Shared' status to a folder within the Repository Manager and then dragging objects from this folder into another open folder. This provides a single point of control and reference for the object, so multiple projects do not all have to import sources and targets into their local folders.
A reusable transformation, by contrast, is usually kept local to a folder. An example would be a reusable Sequence Generator for allocating warehouse customer IDs, which would be useful if you were loading customer details from multiple source systems and allocating a unique ID to each new source key. Many mappings could use the same sequence, and the sessions would all draw from the same continuous pool of generated sequence numbers.

What is the target load order ?

You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
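As a sketch of why the load order matters (the target names below are hypothetical): dimension tables are typically designated to load before the fact table that references them, so that foreign-key lookups succeed.

```python
# Designated load order, parents (dimensions) before children (facts).
load_order = ["CUSTOMER_DIM", "PRODUCT_DIM", "SALES_FACT"]

loaded = []

def load_target(name):
    """Stand-in for the actual load of one target table."""
    loaded.append(name)

for target in load_order:
    load_target(target)

# By the time SALES_FACT loads, both dimensions it references are populated.
```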

What is the difference between Oracle Sequence and Informatica Sequence and which is better?


An Oracle sequence can be used in a PL/SQL stored procedure, which in turn can be used with the Stored Procedure transformation of Informatica.
An Informatica sequence is generated through the Sequence Generator transformation. The choice depends on the user's needs, but an Oracle sequence provides greater control.
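A rough analogue of what a sequence generator does (the parameter names here are illustrative, not Informatica's actual property names): emit values from a start point by a fixed increment, optionally cycling when an end value is reached.

```python
def sequence_generator(start=1, increment=1, end=None, cycle=False):
    """Yield sequence values; stop or wrap around when `end` is exceeded."""
    current = start
    while True:
        yield current
        current += increment
        if end is not None and current > end:
            if not cycle:
                return       # sequence exhausted
            current = start  # wrap around, like a cycling sequence

seq = sequence_generator(start=100, increment=1)
ids = [next(seq) for _ in range(3)]   # [100, 101, 102]
```

The real difference between the two options is where the state lives: an Oracle sequence keeps its counter in the database (shared by all clients), while an Informatica sequence is managed by the transformation itself.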

What is a reusable transformation? What is a Mapplet? Explain difference between them?


Reusable transformation: a transformation created to perform a common task, such as calculating the average salary in a department, so it can be reused across mappings.
Mapplet: a reusable object that represents a set of transformations.

When should you index a particular column, and at what percentage ?


Generally, create an ordinary B-tree index when you know the column will mostly be referenced in SELECT statements. If your SELECT statement retrieves more than roughly 30% of the rows, it is better not to use an index.
An index pays off when you are retrieving a small number of rows.
If you are retrieving a large number of rows, it is faster to read the table directly (a full table scan) rather than going through the index.
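The ~30% rule of thumb above can be expressed as a back-of-the-envelope selectivity check (the threshold is illustrative; real optimizers weigh many more factors, such as clustering and row width):

```python
def prefer_index(rows_matched, rows_total, threshold=0.30):
    """True when the predicate is selective enough for an index scan to pay off."""
    return (rows_matched / rows_total) <= threshold

# Highly selective predicate: index scan is likely cheaper.
print(prefer_index(5_000, 1_000_000))    # True

# Predicate returns 60% of the table: full table scan is likely cheaper.
print(prefer_index(600_000, 1_000_000))  # False
```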

What is Click Stream in Data Warehousing ?

Clickstream analysis is web-based data warehousing: analyzing the trail of page requests (clicks) that visitors leave on a website. It is essentially web intelligence.

Code Page overview ?


A code page contains the encoding to specify characters in a set of one or more languages. An encoding is the assignment of a number to a character in the character set. You use code pages to identify data (characters) that might be in different languages. For example, if you are importing Japanese data into a mapping, you must select a Japanese code page for the source data.
To change the language to English and have the system use the Latin-1 code page on UNIX, execute the following command:
setenv LANG en_US.iso88591
If You Are Using PowerCenter
With PowerCenter, you receive all product functionality, including the ability to register multiple servers, share metadata across repositories, and partition data.
A PowerCenter license lets you create a single repository that you can configure as a global repository, the core component of a data warehouse. When this guide mentions a PowerCenter Server, it is referring to an Informatica Server with a PowerCenter license.

What is Code Page Compatibility ?


Compatibility between code pages is required for accurate data movement when the Informatica Server runs in Unicode data movement mode. If the code pages are identical, there will not be any data loss. One code page can be a subset or superset of another. For accurate data movement, the target code page must be a superset of the source code page.
Superset: a code page is a superset of another code page when it contains all the characters encoded in the other code page, plus additional characters not contained in the other code page.
Subset: a code page is a subset of another code page when all characters in the code page are also encoded in the other code page.
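The subset/superset rule can be demonstrated with standard encodings: every Latin-1 character survives a move into UTF-8 (a superset), but characters outside Latin-1 are lost when the target code page is too small.

```python
latin1_text = "café"            # representable in Latin-1
# Moving into a superset (UTF-8) is lossless:
assert latin1_text.encode("utf-8").decode("utf-8") == latin1_text

jp_text = "データ"               # Japanese characters, outside Latin-1
try:
    jp_text.encode("latin-1")   # target code page is NOT a superset
except UnicodeEncodeError:
    print("data loss: target code page is not a superset of the source")
```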

What are 2 modes of data movement in Informatica Server ?


The data movement mode depends on whether the Informatica Server should process single-byte or multibyte character data. This mode selection affects the enforcement of code page relationships and code page validation in the Informatica Client and Server.
a) Unicode – the Informatica Server allows 2 bytes for each character, using an additional byte for each non-ASCII character (such as Japanese characters).
b) ASCII – the Informatica Server holds all data in a single byte.
The data movement mode can be changed in the Informatica Server configuration parameters. The change takes effect once you restart the Informatica Server.

What are minimum and maximum values for index and data cache ?


Index cache: minimum 12 MB, maximum 24 MB.
Data cache: minimum 12 MB, maximum 24 MB.

What are cache and their types in Informatica ?

The Informatica server creates index and data caches for the Aggregator, Rank, Joiner, and Lookup transformations in a mapping. It stores key values in the index cache and output values in the data cache.
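A simplified sketch of how a lookup cache behaves (column and function names are hypothetical): the lookup condition's key values index into cached output rows, so repeated lookups avoid hitting the source again.

```python
lookup_cache = {}   # key value -> cached output row

def build_cache(lookup_rows, key_col):
    """Populate the cache once, before rows start flowing through the mapping."""
    for row in lookup_rows:
        # Keep the first row seen for a duplicate key (one possible policy).
        lookup_cache.setdefault(row[key_col], row)

def lookup(key):
    """Cache hit returns the stored row; a miss returns None."""
    return lookup_cache.get(key)

build_cache([{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Lin"}],
            key_col="cust_id")
print(lookup(2))   # {'cust_id': 2, 'name': 'Lin'}
print(lookup(9))   # None
```

In the real server the index cache (keys) and data cache (output values) are separate, sized structures; a single dict conflates them here for brevity.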

What is Load Manager Process ?


Load manager is the primary Informatica server process. It performs the following tasks:
      Manages sessions and batch scheduling.
      Locks the sessions and reads properties.
      Reads parameter files.
      Expands the server and session variables and parameters.
      Verifies permissions and privileges.
      Validates sources and targets code pages.
      Creates session log files.
      Creates Data Transformation Manager (DTM) process, which executes the session.

What are connected and unconnected transformations ?


Connected transformations are part of the mapping's data flow, whereas unconnected transformations are not; they are called from within another transformation. The connected/unconnected distinction applies to the Lookup and Stored Procedure transformations.

Difference between active and passive transformations ?


A transformation can be active or passive. An active transformation can change the number of records passed through it; a passive transformation never changes the record count.
Active transformations (which may change the record count): Aggregator, Filter, Joiner, Normalizer, Rank, Update Strategy, and Source Qualifier. If you use PowerConnect to access ERP sources, the ERP Source Qualifier is also an active transformation.
Passive transformations: Lookup, Expression, External Procedure, Sequence Generator, and Stored Procedure. You can connect only one active transformation to the same downstream transformation or target, but you can connect any number of passive transformations.
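Active vs. passive in miniature (the row data is invented for illustration): a filter can change the row count, while an expression transforms each row but always passes the same number through.

```python
rows = [{"sal": 1000}, {"sal": 5000}, {"sal": 9000}]

# Active: a filter condition may drop rows (3 in -> 2 out).
filtered = [r for r in rows if r["sal"] > 2000]

# Passive: an expression derives a new column but keeps every row (3 in -> 3 out).
expressed = [{"sal": r["sal"], "bonus": r["sal"] * 0.1} for r in rows]

print(len(filtered), len(expressed))   # 2 3
```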

What are reusable transformations ? Can we rollback this option ?

Reusable transformations can be used in multiple mappings. A reusable transformation contains only a single transformation. You can roll back this option.

What are the different connectivity Informatica uses to connect to sources, targets and the repository ?


PowerMart and PowerCenter use:
1. Network protocol
2. Native drivers
3. ODBC
The Server Manager and the Informatica Server use TCP/IP or IPX/SPX to communicate with each other.

What is a Variable port ? Why is it used ?

Variable port is used to store intermediate results. Variable ports can reference input ports and variable ports, but not output ports.
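A sketch of the idea (port names are illustrative): in an Expression-style transformation, a variable holds an intermediate result that persists from row to row, referencing input ports but never output ports.

```python
def expression_transform(rows):
    """Emit each row plus a running total kept in a 'variable port'."""
    v_total = 0                     # variable port: persists across rows
    for row in rows:                # row carries the input port AMOUNT
        v_total += row["AMOUNT"]    # variable references an input port
        # Output port reads the variable's current value:
        yield {"AMOUNT": row["AMOUNT"], "RUNNING_TOTAL": v_total}

out = list(expression_transform([{"AMOUNT": 10}, {"AMOUNT": 5}]))
# out[-1]["RUNNING_TOTAL"] == 15
```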

What are different ports in Informatica ?

Input, Output, Variable, Return, Rank, Lookup and Master.

Difference between PL/SQL and Informatica ? Why do you need Informatica ? Justify.


PL/SQL comes free with Oracle, while Informatica is expensive.
However, with Informatica you can easily load data in a more sophisticated way.
You can also monitor the load process, schedule sessions, and so on in Informatica.

Strength and weakness of Informatica ?


Rows are processed one at a time, so calculations and checks across rows are difficult, even though values can to some extent be stored in variables for further processing. This increases the number of steps required to accomplish a job: for example, first load the rows into a temporary table with some calculation, then make that temporary table the source for further row manipulation.
1. Very good for loading and extraction, but not so flexible for implementing procedural programming logic.
2. Every mapping needs a source and a target, so you sometimes end up using a dummy source and target (such as DUAL) just to build the logic.
3. No replace-character function in 5.1 (6.0 has it).
4. Ease of use, graphical representation of the ETL process, mapplets, and reusable expression transformations help standardize the ETL process across the organization.
5. Easy for knowledge transfer and maintenance.

Contrasting a Data Warehouse with an OLTP System?


[Figure: Contrasting OLTP and Data Warehousing Environments - illustrates some of the key differences between a data warehouse's model and an OLTP system's.]
One major difference between the two types of systems is that data warehouses are not usually in third normal form.
Data warehouses and OLTP systems have vastly different requirements.
Here are some examples of the notable differences between typical data warehouses and OLTP systems:
1. Workload. Data warehouses are designed to accommodate ad hoc queries. The workload of a data warehouse may not be completely understood in advance, and the data warehouse is optimized to perform well for a wide variety of possible query operations. OLTP systems support only predefined operations. The application may be specifically tuned or designed to support only these operations.
2. Data Modifications. The data in a data warehouse is updated on a regular basis by the ETL process (often every night or every week) using bulk data-modification techniques. The end users of a data warehouse do not directly update the data warehouse. In an OLTP system, end users routinely issue individual data-modification statements against the database. The OLTP database is always up-to-date and reflects the current state of each business transaction.
3. Schema Design. Data warehouses often use denormalized or partially denormalized
schemas (such as a star schema) to optimize query performance. OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and guarantee data consistency.
4. Typical Operations. A typical data warehouse query may scan thousands or millions of rows. For example, "Find the total sales for all customers last month." A typical OLTP operation may access only a handful of records. For example, "Retrieve the current order for a given customer."

5. Historical Data. Data warehouses usually store many months or years of historical data. This is to support historical analysis of business data. OLTP systems usually store only a few weeks' or months' worth of data. The OLTP system only stores as much historical data as is necessary to successfully meet the current transactional requirements.