informatica materials and latest interview questions and scenario: 6/12/11

Saturday, June 18, 2011

What is Data Cardinality?

Cardinality is the term used in database relations to denote the occurrences of data on either

side of the relation

There are 3 basic types of cardinality:

High data cardinality: Values of a data column are very uncommon.
e.g.: email ids and the user names
Normal data cardinality: Values of a data column are somewhat uncommon but never unique.
e.g.: A data column containing LAST_NAME (there may be several entries of the same last name)
Low data cardinality: Values of a data column are very usual.
e.g.: flag statuses: 0/1
Determining data cardinality is a substantial aspect used in data modeling. This is used to determine the relationships

Types of cardinalities:
The Link Cardinality - 0:0 relationships
The Sub-type Cardinality - 1:0 relationships
The Physical Segment Cardinality - 1:1 relationship
The Possession Cardinality - 0: M relation
The Child Cardinality - 1: M mandatory relationship
The Characteristic Cardinality - 0: M relationship
The Paradox Cardinality - 1: M relationship.

What is the difference between Informatica 7.0 and 8.0 ?

The architecture of Power Center 8 has changed a lot:
1. PC8 is service-oriented for modularity, scalability and flexibility.
2. The Repository Service and Integration Service (as replacement for Rep Server and Informatica Server) can be run on different computers in a network (so called nodes), even redundantly.
3. Management is centralized, that means services can be started and stopped on nodes via a central web interface.
4. Client Tools access the repository via that centralized machine, resources are distributed dynamically.
5. Running all services on one machine is still possible, of course.
6. It has a support for unstructured data which includes spreadsheets, email, Microsoft Word files, presentations and .PDF documents. It provides high availability, seamless fail over, eliminating single points of failure.
7. It has added performance improvements (To bump up systems performance, Informatica has added "push down optimization" which moves data transformation processing to the native relational database I/O engine whenever it is most appropriate.)
8. Informatica has now added more tightly integrated data profiling, cleansing, and matching capabilities.
9. Informatica has added a new web based administrative console.
10. Ability to write a Custom Transformation in C++ or Java.
11. Midstream SQL transformation has been added in 8.1.1, not in 8.1.
12. Dynamic configuration of caches and partitioning
13. Java transformation is introduced.
14. User defined functions
15. PowerCenter 8 release has "Append to Target file" feature.

In which circumstances that informatica server creates Reject files ?

When it encounters the DD_Reject in update strategy transformation.

Violates database constraint Field in the rows was truncated or overflown

What is the method of loading 5 flat files of having same structure to a single target and which transformations I can use ?

Two Methods.
1. Write all files in one directory then use file repository concept (don’t forget to type source file type as indirect in the session
2. Use union transformation to combine multiple input files into a single target.

What are mapping parameters and variables in which situation we can use it?

If we need to change certain attributes of a mapping after every time the session is run, it will be very difficult to edit the mapping and then change the attribute. So we use mapping parameters and variables and define the values in a parameter file. Then we could edit the parameter file to change the attribute values. This makes the process simple.

Mapping parameter values remain constant. If we need to change the parameter value then we need to edit the parameter file. But value of mapping variables can be changed by using variable function. If we need to increment the attribute value by 1 after every session run then we can use mapping variables. In a mapping parameter we need to manually edit the attribute value in the parameter file after every session run.

In a sequential batch can you run the session if previous session fails ?

Yes. By setting the option always runs the session.

What type of metadata is stored in repository?

Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions: Definitions of database objects or files that contain the target data.

Multi-dimensional metadata: Target definitions that are configured as cubes and dimensions.

Mappings: A set of source and target definitions along with transformations containing usiness logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.

Reusable transformations: Transformations that you can use in multiple mappings.

Mapplets: A set of transformations that you can use in multiple mappings.

Sessions and workflows: Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

What does #! /bin/sh mean in a shell script?

It actually tells the script to which interpreter to refer. As you know, bash shell has some specific functions that other shell does not have and vice-versa. Same way is for perl, python and other languages.

It's to tell your shell what shell to you in executing the following statements in your shell script.

Below line is the first line of the script ?

#!/usr/bin/sh

#!/bin/ksh

If file exists then send email with attachment ?

if [[ -f $your_file ]]; then uuencode $your_file $your_file|mailx -s "$your_file exists..." your_email_address

Searching for files ?

find command

find -name aaa.txt Finds all the files named abcd.txt in the current directory or

any subdirectory tree.

find / -name vimrc Find all the files named 'abcd' anywhere on the system.

find /usr/ -name "*abcd*"

Find all files whose names contain the string 'abcd' which

exist within the '/usr/' directory tree.

What is the CMD gives total no of users logged in at this time.

who | wc -l

echo "are total number of people logged in at this time."

How to find all processes that are running ?

ps -A

How to create zero byte file ?

Touch filename (touch is the command to create zero byte file)

How to create zero byte file ?

Touch filename (touch is the command to create zero byte file)

How to display alternative rows in a table ?

SQL> select *from emp where (rowid, 0) in (select rowid,mod(rownum,2) from emp);

How to find the Top sal ?

Select * from EMP where sal= (select max (sal) from EMP);

How to find 2 nd highest Sal ?

Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from EMP) where r=2;

How to find Top 5 salaries ?

Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over (order by sal desc) r from emp) where r<=5;
OR
Select * from (select * from EMP order by sal desc) where rownum<=5;

How to find the Dense rank ?

The DENSE_RANK function works acts like the RANK function except that it assigns consecutive ranks:

Select empno, ename, Sal, from (select empno, ename, sal, dense_rank () over (order by sal desc) r from emp);

A query to assign the Ranks

Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order by sal desc) r from EMP);

Write the query to transpose rows into columns.

select

emp_id,

max(decode(row_id,0,address))as address1,

max(decode(row_id,1,address)) as address2,

max(decode(row_id,2,address)) as address3

from (select emp_id,address,mod(rownum,3) row_id from temp order by emp_id )

group by emp_id

Other query:

select

emp_id,

max(decode(rank_id,1,address)) as add1,

max(decode(rank_id,2,address)) as add2,

max(decode(rank_id,3,address))as add3

from

(select emp_id,address,rank() over (partition by emp_id order by emp_id,address )rank_id from temp )

group by

emp_id

How to remove duplicates in the table ?

Delete from EMP where rowid not in (select max (rowid) from EMP group by empno);

How to get duplicate rows from the table ?

Select empno, count (*) from EMP group by empno having count (*)>1;

What are theTypes of Triggers ?

This section describes the different types of triggers:

Row Triggers

A row trigger is fired each time the table is affected by the triggering statement. For example, if an UPDATE statement updates multiple rows of a table, a row trigger is fired once for each row affected by the UPDATE statement. If a triggering statement affects no rows, a row trigger is not run.

BEFORE and AFTER Triggers

When defining a trigger, you can specify the trigger timing--whether the trigger action is to be run before or after the triggering statement. BEFORE and AFTER apply to both statement and row triggers.

BEFORE and AFTER triggers fired by DML statements can be defined only on tables, not on views.

Difference between Trigger and Procedure

Triggers	Stored Procedures
In trigger no need to execute manually. Triggers will be fired automatically. Triggers that run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table.	Where as in procedure we need to execute manually.

Differences between stored procedure and functions

Stored Procedure	Functions
Stored procedure may or may not return values.	Function should return at least one output parameter. Can return more than one parameter using OUT argument.
Stored procedure can be used to solve the business logic.	Function can be used to calculations
Stored procedure is a pre-compiled statement.	But function is not a pre-compiled statement.
Stored procedure accepts more than one argument.	Whereas function does not accept arguments.
Stored procedures are mainly used to process the tasks.	Functions are mainly used to compute values
Cannot be invoked from SQL statements. E.g. SELECT	Can be invoked form SQL statements e.g. SELECT
Can affect the state of database using commit.	Cannot affect the state of database.
Stored as a pseudo-code in database i.e. compiled form.	Parsed and compiled at runtime.

Explain about Triggers ?

Oracle lets you define procedures called triggers that run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table

Triggers are similar to stored procedures. A trigger stored in the database can include SQL and PL/SQL

what is ment by Packages ?

Packages provide a method of encapsulating related procedures, functions, and associated cursors and variables together as a unit in the database.

package that contains several procedures and functions that process related to same transactions.

A package is a group of related procedures and functions, together with the cursors and variables they use, Packages provide a method of encapsulating related procedures, functions, and associated cursors and variables together as a unit in the database.

What are the differences between stored procedures and triggers ?

Stored procedure normally used for performing tasks But the Trigger normally used for tracing and auditing logs.

Stored procedures should be called explicitly by the user in order to execute But the Trigger should be called implicitly based on the events defined in the table.

Stored Procedure can run independently But the Trigger should be part of any DML events on the table.

Stored procedure can be executed from the Trigger But the Trigger cannot be executed from the Stored procedures.

Stored Procedures can have parameters.But the Trigger cannot have any parameters.

Stored procedures are compiled collection of programs or SQL statements in the database.

Using stored procedure we can access and modify data present in many tables. Also a stored procedure is not associated with any particular database object.

But triggers are event-driven special procedures which are attached to a specific database object say a table.

Stored procedures are not automatically run and they have to be called explicitly by the user. But triggers get executed when the particular event associated with the event gets fired.

What is your tuning approach if SQL query taking long time? Or how do u tune SQL query ?

If query taking long time then First will run the query in Explain Plan, The explain plan process stores data in the PLAN_TABLE.

it will give us execution plan of the query like whether the query is using the relevant indexes on the joining columns or indexes to support the query are missing.

If joining columns doesn’t have index then it will do the full table scan if it is full table scan the cost will be more then will create the indexes on the joining columns and will run the query it should give better performance and also needs to analyze the tables if analyzation happened long back. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster using

ANALYZE TABLE employees COMPUTE STATISTICS;

If still have performance issue then will use HINTS, hint is nothing but a clue. We can use hints like

ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing systems.

(/*+ ALL_ROWS */)

FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.

(/*+ FIRST_ROWS */)

CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose (between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table. Then hashes other table and uses hash index to find corresponding records. Therefore not suitable for < or > join conditions.

/*+ use_hash */

Hints are most useful to optimize the query performance.

Explain Plan ?

Explain plan will tell us whether the query properly using indexes or not.whatis the cost of the table whether it is doing full table scan or not, based on these statistics we can tune the query.

The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema and is created using in SQL*Plus as follows:

SQL> CONN sys/password AS SYSDBA

Connected

SQL> @$ORACLE_HOME/rdbms/admin/utlxplan.sql

SQL> GRANT ALL ON sys.plan_table TO public;

SQL> CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;

Why hints Require ?

It is a perfect valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a query's execution plan. When this optimizer is really doing a good job, no hints should be required at all.

Sometimes, however, the characteristics of the data in the database are changing rapidly, so that the optimizer (or more accuratly, its statistics) are out of date. In this case, a hint could help.

You should first get the explain plan of your SQL and determine what changes can be done to make the code operate without using hints if possible. However, hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can take a wild optimizer and give you optimal performance Tables analyze and update Analyze Statement.

The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows:

ANALYZE TABLE employees COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;

EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');

Automatic Optimizer Statistics Collection

By default Oracle 10g automatically gathers optimizer statistics using a scheduled job called GATHER_STATS_JOB. By default this job runs within maintenance windows between 10 P.M. to 6 A.M. week nights and all day on weekends. The job calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC internal procedure which gathers statistics for tables with either empty or stale statistics, similar to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The main difference is that the internal job prioritizes the work such that tables most urgently requiring statistics updates are processed first.

Hint categories:

Hints can be categorized as follows:

ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing systems.

(/*+ ALL_ROWS */)

FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.

(/*+ FIRST_ROWS */)

CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose (between ALL_ROWS and FIRST_ROWS, based on statistics gathered.

Hints for Join Orders,
Hints for Join Operations,
Hints for Parallel Execution, (/*+ parallel(a,4) */) specify degree either 2 or 4 or 16
Additional Hints
HASH
Hashes one table (full scan) and creates a hash index for that table. Then hashes other table and uses hash index to find corresponding records. Therefore not suitable for < or > join conditions.

/*+ use_hash */

Use Hint to force using index

SELECT /*+INDEX (TABLE_NAME INDEX_NAME) */ COL1,COL2 FROM TABLE_NAME

Select ( /*+ hash */ ) empno from

ORDERED-à This hint forces tables to be joined in the order specified. If you know table X has fewer rows, then ordering it first may speed execution in a join.

PARALLEL (table, instances)àThis specifies the operation is to be done in parallel.

If index is not able to create then will go for /*+ parallel(table, 8)*/-----For select and update example---in where clase like st,not in ,>,< ,<> then we will use.

What is Ment By Indexes ?

1. Bitmap indexes are most appropriate for columns having low distinct values—such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems. In fact, as I'll demonstrate here, a bitmap index on a column with 100-percent unique values (a column candidate for primary key) is as efficient as a B-tree index.

2. When to Create an Index

3. You should create an index if:

4. A column contains a wide range of values

5. A column contains a large number of null values

6. One or more columns are frequently used together in a WHERE clause or a join condition

7. The table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows

By default if u create index that is nothing but b-tree index.

What is the difference between sub-query & co-related sub query ?

A sub query is executed once for the parent statement whereas the correlated sub query is executed once for each row of the parent query.

Sub Query:

Example:

Select deptno, ename, sal from emp a where sal in (select sal from Grade where sal_grade=’A’ or sal_grade=’B’)

Co-Related Sun query:

Example:

Find all employees who earn more than the average salary in their department.

SELECT last-named, salary, department_id FROM employees A

WHERE salary > (SELECT AVG (salary)

FROM employees B WHERE B.department_id =A.department_id

Group by B.department_id)

EXISTS:

The EXISTS operator tests for existence of rows in

the results set of the subquery.

Select dname from dept where exists (select 1 from EMP where dept.deptno= emp.deptno);

Sub-query	Co-related sub-query
A sub-query is executed once for the parent Query	Where as co-related sub-query is executed once for each row of the parent query.
Example: Select * from emp where deptno in (select deptno from dept);	Example: Select a.* from emp e where sal >= (select avg(sal) from emp a where a.deptno=e.deptno group by a.deptno);

Explain MERGE Statement in SQL ?

You can use merge command to perform insert and update in a single command.

Ex: Merge into student1 s1

Using (select * from student2) s2

On (s1.no=s2.no)

When matched then

Update set marks = s2.marks

When not matched then

Insert (s1.no, s1.name, s1.marks) Values (s2.no, s2.name, s2.marks);

Differences between where clause and having clause ?

Where clause	Having clause
Both where and having clause can be used to filter the data.
Where as in where clause it is not mandatory.	But having clause we need to use it with the group by.
Where clause applies to the individual rows.	Where as having clause is used to test some condition on the group rather than on individual rows.
Where clause is used to restrict rows.	But having clause is used to restrict groups.
Restrict normal query by where	Restrict group by function by having
In where clause every record is filtered based on where.	In having clause it is with aggregate records (group by functions).

How to Use where and having clauses in Query ?

SELECT column, group_function

FROM table

[WHERE condition]

[GROUP BY group_by_expression]

[HAVING group_condition]

[ORDER BY column];

The WHERE clause cannot be used to restrict groups. you use the HAVING clause to restrict groups.

Difference between Rowid and Rownum ?

ROWID

A globally unique identifier for a row in a database. It is created at the time the row is inserted into a table, and destroyed when it is removed from a table.'BBBBBBBB.RRRR.FFFF' where BBBBBBBB is the block number, RRRR is the slot(row) number, and FFFF is a file number.

ROWNUM

For each row returned by a query, the ROWNUM pseudo column returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.

You can use ROWNUM to limit the number of rows returned by a query, as in this example:

SELECT * FROM employees WHERE ROWNUM < 10;

Rowid	Row-num
Rowid is an oracle internal id that is allocated every time a new record is inserted in a table. This ID is unique and cannot be changed by the user.	Row-num is a row number returned by a select statement.
Rowid is permanent.	Row-num is temporary.
Rowid is a globally unique identifier for a row in a database. It is created at the time the row is inserted into the table, and destroyed when it is removed from a table.	The row-num pseudocoloumn returns a number indicating the order in which oracle selects the row from a table or set of joined rows.