Monday, October 23, 2017

Conditional OR join in SQL query returns duplicate rows

Scenario: Whenever a query joins two tables with an OR between two join conditions, duplicate records appear. This happens when a row satisfies both the condition before and the condition after the OR, each against a different row of the other table.

SELECT A.COLUMN1, A.COLUMN2, NAME, ADDRESS
FROM A, B
WHERE (A.COLUMN1 = B.COLUMN1) OR (A.COLUMN2 = B.COLUMN2)

Cause: Imagine a row in A for which both join conditions are satisfied, each by a different row in B. The database then returns that row of A twice, once for each matching row in B.


Here, as per the query, for the row 'Athul' in table A both join conditions find a match in B, and hence the result set returns 'Athul' twice even though it appears only once in A.

Solution: This can be avoided by using a LEFT join on each condition instead of a single OR join condition. Do not reach for DISTINCT to hide the duplicates, as it is a fairly costly alternative.
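The duplication described above can be reproduced in a few lines. This is a minimal sketch using Python's sqlite3 with made-up tables and values (any engine behaves the same way): one row in A matches two different rows in B, one per OR branch, so it comes back twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE A (column1 TEXT, column2 TEXT)")
cur.execute("CREATE TABLE B (column1 TEXT, column2 TEXT, name TEXT)")
cur.execute("INSERT INTO A VALUES ('X1', 'Y1')")
# Two rows in B: the first matches A on column1, the second on column2.
cur.executemany("INSERT INTO B VALUES (?, ?, ?)",
                [("X1", "Z9", "Athul"), ("Q7", "Y1", "Athul")])

# The OR join returns the single row of A once per satisfied predicate.
rows = cur.execute(
    "SELECT A.column1, A.column2, B.name FROM A, B "
    "WHERE A.column1 = B.column1 OR A.column2 = B.column2").fetchall()
print(rows)  # two rows for the one row in A
```

Running this prints two identical-looking result rows for 'Athul', which is exactly the duplicate the scenario describes.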

Friday, October 13, 2017

Oracle Dual Table alternative equivalent in Microsoft SQL Server

Here is another addition to the Oracle to Microsoft SQL Server series. This time we discuss the DUAL table feature available in Oracle. SQL Server doesn't support queries on DUAL, as no such table is present in the database.

SELECT 1 FROM DUAL returns an invalid object name error in SQL Server:

Msg 208, Level 16, State 1, Line 1
Invalid object name 'DUAL'.

Dual object invalid in MS SQL Server.

DUAL is a one-row table that Oracle provides for single-row calculations: displaying SYSDATE, testing functions with default values, etc. For example:

SELECT SYSDATE FROM DUAL - returns the system date from the database server. 
SELECT (9*50) FROM DUAL; - returns 450
The alternative to DUAL in SQL Server is simple: any function, calculation or string literal can be selected directly, with no FROM clause. A few examples:

SELECT GETDATE()  - returns the system date.
SELECT (9*50) - returns 450 
SELECT  'Hello World' 

Dual alternative in SQL Server.

An alternative is to create a DUAL table in SQL Server yourself, with a DUMMY column holding the value 'X'. However, you have to make sure all users have access to this table so it can be used just as in an Oracle database.
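The FROM-less SELECT style above can be tried directly. Here is a small sketch using Python's sqlite3, which, like SQL Server, accepts a SELECT with no FROM clause (sqlite's `datetime('now')` stands in for SQL Server's GETDATE()):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Calculation and string literal, no table (and no DUAL) required.
calc = cur.execute("SELECT 9 * 50").fetchone()[0]
greeting = cur.execute("SELECT 'Hello World'").fetchone()[0]
print(calc, greeting)  # 450 Hello World
# sqlite's closest equivalent of SQL Server's GETDATE():
now = cur.execute("SELECT datetime('now')").fetchone()[0]
print(now)
```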

Tuesday, October 10, 2017

Performance difference between != and <> for NOT EQUAL filters in MSSQL / Oracle

Question: Is there a considerable performance improvement when using <> over the != operator in Microsoft SQL Server or Oracle Database?

Answer: The answer is 'NO'. I too had heard somewhere that the <> operator performs better than !=, but there is no performance difference. The <> operator for NOT EQUAL can still be preferred because it is ANSI compliant, unlike !=. When you migrate the same code to a different database, <> is more likely to work without any compilation errors.

So use the <> operator instead of !=: it is ANSI compliant and will reduce effort if a database migration ever happens.
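A quick sketch of the equivalence, using Python's sqlite3 with a throwaway table: most engines accept both operators and return identical rows, so the choice is purely about portability.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# Same filter written both ways; the result sets match exactly.
a = cur.execute("SELECT n FROM t WHERE n <> 2 ORDER BY n").fetchall()
b = cur.execute("SELECT n FROM t WHERE n != 2 ORDER BY n").fetchall()
print(a == b)  # True: identical rows either way
```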

References: Stack Overflow




Tuesday, September 26, 2017

CREATE TABLE from SELECT query in Microsoft SQL Server


Scenario: Oracle has spoiled us. We recently came across a scenario where we wanted the result set of a SQL query saved as a table. In Oracle we have the following syntax to achieve this:


CREATE TABLE STUDENTS AS SELECT STUD_ID, STUD_NAME FROM STUDENT;

This syntax creates the table instantly, ready to be referenced elsewhere. The same feature is available in Microsoft SQL Server as well, just with a different syntax.

Resolution: The SQL server equivalent of the above query would be as below, 


SELECT A.* INTO STUDENTS FROM (SELECT STUD_ID, STUD_NAME FROM STUDENT) A

Syntax:

SELECT A.* INTO Insert_Table_Name FROM (
Your query here
) A

Note that you need not create the table upfront before running this script. The query creates the table and its structure based on the result set.
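Here is a runnable sketch of the idea with made-up STUDENT data. It uses Python's sqlite3, and note the hedge: sqlite happens to support the Oracle-style CREATE TABLE ... AS SELECT rather than SQL Server's SELECT ... INTO, but the effect demonstrated is the same — the new table and its structure come entirely from the result set, with no upfront CREATE TABLE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE STUDENT (STUD_ID INTEGER, STUD_NAME TEXT, GRADE TEXT)")
cur.execute("INSERT INTO STUDENT VALUES (1, 'Athul', 'A')")

# Create STUDENTS directly from the query result set.
cur.execute("CREATE TABLE STUDENTS AS SELECT STUD_ID, STUD_NAME FROM STUDENT")

# The new table picked up only the selected columns.
cols = [c[1] for c in cur.execute("PRAGMA table_info(STUDENTS)")]
print(cols)  # only STUD_ID and STUD_NAME, not GRADE
```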

We hope this came of help to you. For more such posts relating to Oracle and Microsoft SQL Server, check the label - Oracle to SQL Server.

Monday, September 25, 2017

Oracle NVL equivalent in MS SQL Server - COALESCE

Oracle to Microsoft SQL Server
As I am new to Microsoft SQL Server, I have decided to do a series on Oracle to SQL Server comparisons. So here is one new keyword which I learned the other day.

Scenario: I had to show the email address of an insurance dependent; if the dependent's email id is not updated, we should fetch the email id of the parent. In Oracle I would have gone for the NVL function or a CASE statement to achieve this. If it is just two or three columns to compare, a CASE statement would suffice. But what if we had more than 10 such columns, and we had to show the first one of them that has data?

Solution: In this scenario I could go for the COALESCE function, since Microsoft SQL Server doesn't provide NVL. COALESCE also performs better than NVL in Oracle and makes your code simpler.

eg: Imagine we have a CUSTOMER table with 4 email address columns, EMAIL_1 to EMAIL_4. We have to return the first available value among these columns, checking them in order. In Oracle we could have gone for the syntax below:


SELECT NVL(EMAIL_1, NVL(EMAIL_2, NVL(EMAIL_3, NVL(EMAIL_4, 'Not available')))) EmailId
FROM CUSTOMER

As you can see, the code is messy and takes time to understand. SQL Server provides the COALESCE function for exactly this scenario. The same query with COALESCE looks like this:

SELECT COALESCE(EMAIL_1, EMAIL_2, EMAIL_3, EMAIL_4) EmailId FROM CUSTOMER

Use of the function makes for simpler, cleaner code. SQL Server internally rewrites COALESCE as a CASE expression and evaluates the arguments in order until a non-NULL one is reached. The difference from ISNULL is that ISNULL takes exactly two arguments and evaluates its input only once, whereas a COALESCE argument (a subquery, say) may be evaluated more than once through the CASE expansion.
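The first-non-NULL behavior is easy to see in action. This is a small sketch using Python's sqlite3 with made-up CUSTOMER rows; a trailing 'Not available' argument covers the all-NULL case, mirroring the NVL fallback above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE CUSTOMER (EMAIL_1, EMAIL_2, EMAIL_3, EMAIL_4)")
cur.executemany("INSERT INTO CUSTOMER VALUES (?, ?, ?, ?)", [
    (None, "b@example.com", None, None),  # first non-NULL is EMAIL_2
    (None, None, None, None),             # all NULL -> fallback text
])

# COALESCE scans its arguments left to right and stops at the first non-NULL.
emails = cur.execute(
    "SELECT COALESCE(EMAIL_1, EMAIL_2, EMAIL_3, EMAIL_4, 'Not available') "
    "FROM CUSTOMER").fetchall()
print(emails)  # [('b@example.com',), ('Not available',)]
```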

Hope this came of help to you. Happy coding. 

Oracle Target Data Warehouse Schema gets Locked out Intermittently. [SOLVED]

Issue: The Oracle target data warehouse schema, holding the dimensions and facts that OBIEE or another analytics engine accesses, gets locked out intermittently.

Cause: Oracle account lockouts can happen for multiple reasons in a data warehouse environment:


  1. Multiple failed login attempts from a client trying to access the database. 
  2. Password expiry date has already passed. 
  3. One or more of the clients accessing the system still uses an old password to connect.
Solution: The best approach is to check all JDBC/ODBC/OCI connections where the schema username and password are saved. This could be anything from connection settings in the analytics engine to parameter files and database links. The chances are high that one of these connections has an incorrect or old username/password combination. We also suggest checking the listener log on the database server hosting the schema.

If you are using OBIEE, check the RPD connection, JDBC connection you have created to the target schema on the BI Publisher engine etc. 

Increasing the FAILED_LOGIN_ATTEMPTS parameter at the profile level is an extremely poor solution, as it increases the risk of a brute-force attack on your database or schema.



Friday, August 25, 2017

MINUS operation in Microsoft SQL Server using EXCEPT

Oracle Database vs Microsoft SQL Server

Coming from an Oracle Database background, I had a tough time getting used to some keywords in SQL Server. While most keywords are ANSI compliant, some are unique to each database. This post discusses one such exception, the MINUS set operator, which is available in Oracle but not in SQL Server.

Issue: The MINUS keyword does not work in Microsoft SQL Server. I had a scenario where I was rewriting an existing query to performance tune it. To make sure the new query returns the same records as the old one, I use MINUS in Oracle to subtract the new query from the existing one; if no rows are returned, I adopt the new query going forward.

I was trying to run a query similar to below,

SELECT COLUMN_A, COLUMN_B FROM TABLE_NAME
MINUS
SELECT COLUMN_A, COLUMN_B FROM TABLE_NAME

Ideally the query should return no rows, since it subtracts a query from itself. Instead, SQL Server executes the two SELECTs individually and shows two separate result sets, without returning an error.

Cause: This is because SQL Server doesn't recognize the keyword MINUS.

Solution: Use the SQL Server equivalent of the MINUS set operation, the EXCEPT keyword. Below is the same query restructured with EXCEPT.

SELECT COLUMN_A, COLUMN_B FROM TABLE_NAME
EXCEPT
SELECT COLUMN_A, COLUMN_B FROM TABLE_NAME

The EXCEPT keyword returns the distinct rows of the first (top/left) query that have no identical row in the second (bottom/right) query.

Things to take care of when using the EXCEPT keyword:
  • All the queries must return the same number of columns.
  • The columns must have compatible data types, in the same order across the queries.

Saturday, March 18, 2017

OBIEE Analysis Date Format as 0/0/0 [Solved]

Issue: A client recently raised a defect that the date shows up as 0/0/0 for some of the records in an OBIEE analysis.

Analysis: First we checked the particular records in the EDW target schema. The records have NULL dates in the physical tables.

Cause: The root cause of this defect is that the nullable flag is not ticked for this column in the physical layer of the RPD. Since the flag is not ticked, OBIEE expects a value and substitutes 0/0/0 12:00:00 AM for NULLs.

The nullable flag property in OBIEE RPD Physical column object.
This could also happen if you manually change the data type from DATETIME to DATE; OBIEE imports dates as DATETIME by default.

Resolution: Change the data type back to DATETIME and tick the nullable flag. Save and upload the RPD to the server, then clear the cache, reload files and metadata, and check again. If that still doesn't change anything, restart the presentation service.

Tuesday, March 7, 2017

OBIEE security LDAP realm provider JAAS Control Flag options

LDAP Configuration Control Flag Option
When you create a new LDAP or Active Directory (ADSI) authentication provider for a realm in the WebLogic Console, you will see an option called 'Control Flag' under the provider, with the following four choices:
  1. Optional
  2. Required
  3. Requisite
  4. Sufficient
In this post we discuss the four types of flags and how authentication is carried out under each of them.




Optional - The user is not required to pass this provider. Whether it succeeds or fails, authentication continues down the list of providers, and the overall result depends on the other providers.

Required - The default option configured in WebLogic when you have no providers other than the WebLogic provider. The user must pass authentication in this provider; even if it fails, the remaining providers are still called, but the overall authentication fails.

Requisite - The user must pass authentication with this provider. If it succeeds, the remaining providers are still tested; if it fails, control returns immediately and authentication fails without calling the rest.

Sufficient - The user is not required to pass this provider, but if it succeeds, authentication ends there with success and no further providers are called. If it fails, authentication continues down the list.

Note: When a custom LDAP provider exists alongside the OBIEE realm provider, SUFFICIENT should be opted for.


Wednesday, March 1, 2017

Importance of creating separate schema for Work in ODI

We came across an article a long while back on the ODIExperts blog about the significance of using a separate schema for work instead of the target schema. We later saw the advantage of doing so first-hand.

We use OBIEE as our front-end tool, and all of ODI's work tables for each target table would otherwise sit in the target schema. The more tables in the target schema, the longer OBIEE takes to process the import-metadata step. Imagine you have 10 tables in your DWH target schema; if you don't define a separate work schema, then for one table, W_STUDENT_D, you will also have the following work tables:

C$_W_STUDENT_D, 
I$_W_STUDENT_D,
E$_W_STUDENT_D

This not only makes the OBIEE import slow, it also takes up tablespace in the target schema, making database management hard. If maintained separately, you can purge all tables in the work schema at once. The worry of accidentally truncating any of your target tables is also eliminated this way.

Another important step they discuss is to avoid the 'Staging area different from target' option at the interface level; the same can be configured once at the physical schema in Topology:

ODI Work Schema.
 

Using ODI Procedures to load data from Source to Target

We came across a scenario where we used an ODI procedure to load data from source to target. We had to implement a very complex logic using subqueries in an ODI interface; given the short timeline, we went ahead with loading the data via a query in an ODI procedure. We discuss the approach below.

An ODI procedure has two parts: Command on Source and Command on Target.



  • Command on Source is where you write the SQL query that fetches data from the source.
  • Command on Target is where you write the insert script that loads data into the target.
  • This approach can load data across technologies (e.g. Oracle to Teradata, MSSQL to Oracle, etc.).
  • The source-side query can be a plain query, e.g.: SELECT ROLL_NO, STUD_NAME FROM STUDENT
  • The number of columns in the select must match the number of columns in the insert script, and their datatypes must match as well.
  • The insert script uses bind variables to pick up values from the source query; for the source query above it would be:
  INSERT INTO W_STUDENTS_D VALUES (:ROLL_NO, :STUD_NAME)
  • This is extremely inefficient compared to interfaces and should never be preferred over them. Interfaces are faster because ODI follows all its standards while generating queries; procedures suit only low-volume loads.
  • Processing happens row by row, which is why data loads take longer.
Typically the errors you get are SQL syntax errors, datatype mismatches, column size errors or column order mismatches.
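The Command on Source / Command on Target pattern can be sketched outside ODI. This is a rough illustration in Python with two sqlite3 databases standing in for the source and target technologies; the table and column names are the ones from the example above, not from a real ODI repository. It also shows why the approach is slow: each fetched row is bound and inserted individually.

```python
import sqlite3

# Two separate connections stand in for the two technologies.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE STUDENT (ROLL_NO INTEGER, STUD_NAME TEXT)")
source.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                   [(1, "Athul"), (2, "Anu")])
target.execute("CREATE TABLE W_STUDENTS_D (ROLL_NO INTEGER, STUD_NAME TEXT)")

# "Command on Source": the plain select.
for row in source.execute("SELECT ROLL_NO, STUD_NAME FROM STUDENT"):
    # "Command on Target": insert with bind variables, one row at a time.
    target.execute("INSERT INTO W_STUDENTS_D VALUES (?, ?)", row)
target.commit()

loaded = target.execute("SELECT COUNT(*) FROM W_STUDENTS_D").fetchone()[0]
print(loaded)  # 2
```

The row-by-row loop is the crux: an interface would generate a set-based load instead, which is why interfaces scale so much better.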

Thursday, February 23, 2017

All about Oracle Data Integrator(ODI) Variables

An ODI variable is an ODI object that stores a string, date or numeric value. Variables can be static or dynamic; dynamic ones are refreshed at run time using a query. Variables can be passed to other ODI objects such as interfaces or procedures, as filters or as column values.
  • Can store String/Date/Number values.
  • Values are updated during run-time.
  • Values can be passed to other ODI objects like Procedures and Interfaces.
  • Can either be Static/Dynamic in nature. 
  • Dynamic ODI variables can be refreshed with a SQL query executed against a source database.
  • Variables are commonly used for filter conditions, case conditions etc in interfaces, procedures and packages.
An ODI variable has one of two scopes:

Global: The variable can be called across projects in ODI repository.

Project: The scope of the ODI variable is limited to the particular project. 

There are three attributes to an ODI variable, 
  1. Name
  2. Datatype
  3. Action
Action determines how values are stored in an ODI variable. The action types are:

Non-persistent: the value is kept only for the session.
Last Value: stores only the most recent value.
Historize: stores the current value and keeps previous values in history. Very useful if your variable values change frequently over time, and helps in debugging.

An ODI variable can be called using the below syntax,

#PROJECT_NAME.Variable_Name

ODI variable usage scenarios:
  1. Imagine you pass a date value dynamically from the database to fetch records for an ETL load. Instead of a subquery filter, you can fetch the date into a variable and use it in all your procedures and interfaces.
  2. Another scenario is storing a JDBC URL. You might use the same JDBC URL in several places in an ODI instance; if it is hardcoded everywhere, it is hard to find and update each usage when it changes. Instead, save the JDBC URL as an ODI variable and use the variable name wherever needed in Topology.
Oracle has a pretty neat document on variables. 


Sunday, January 1, 2017

Dimensional Modelling - Loan Dimension in Banking Data warehouse - Part 1

The loan dimension is very crucial in a banking enterprise data warehouse. It is one of the biggest, if not the biggest, dimensions in a banking DW architecture, and one of the most widely conformed. It has many textual attributes and an almost equal number of measurable attributes, and it joins to many other entities of a banking EDW. A few of the key entities are the following:
  • Branch 
  • Primary Customer
  • Teller
  • Loan Product
  • Time/Dates ( Maturity, Commencement, Settlement, Disbursement, Dormancy )
As discussed before, many textual attributes are unique to a loan dimension; a few examples are listed below:

  • Account status
  • Product name
  • Settlement date, account open date, loan approved date, first disbursement date, last transaction date

Considering its size, a loan dimension can be split into multiple tables to improve performance and maintainability. A good way to start a dimensional model is to create a bus matrix, as suggested by Ralph Kimball.

The bus matrix for a Loan dimension in an ideal scenario would look as below,

Bus Matrix for Loan Dimension.

The loan dimension, being one of the largest entities, will take up a large number of columns. When it does, it is good practice to split the dimension as follows.

The first partition of the dimension table should hold the most commonly used attributes, like status, balance etc. The second partition should hold the measurable attributes, like approved amount and applied amount, along with the corresponding date keys; these measures can also act as attributes, and this second table can even serve as a fact in some cases. The third and subsequent partitions should hold the junk and rarely used columns related to the dimension. In the BI tool these tables can be presented as a single dimension, for example with logical table sources in OBIEE.

We advise this strategy because otherwise maintaining the table becomes a headache down the line. Imagine a table with 500 columns: a single insert statement would be huge, taking more time to execute and load into the target schema.

Naming conventions: you can go with the standard naming conventions followed by Oracle or define your own. An Oracle-style name would be:

W_LOAN_D

If you have multiple sources, you can include a source abbreviation in the name as well.