Data Warehouse and its need

Date January 12, 2009

Introduction

These days there are so many companies trying to build data warehouses and there are a lot of companies selling Business intelligence products. The new version of SQL Server 2005 has the Business Intelligence Development Studio and this is one of its main upgrade from SQL Server 2000. It enhanced the DTS environment and changed its name to Integration Services (SSIS), which is similar to an ETL tool. It integrated with .NET so it can easily build reports or online applications. Microsoft is definitely putting its energy in the data warehouse business.

Also SQL Server is gaining on the market share of other RDBMSs and most experts said it was because of the BI integration. However, does your company need a data warehouse? And how do you choose the right BI product for your company?

First it must be the business direction that produces the driving force to build the data warehouse and not the IT department. The company must identify the goals and objectives of ‘why install a data warehouse’.

For example, my old company was a bank; it decided to build a data warehouse so it could keep track of a loan from the application process until repayment. One objective was the business wanted to find out how long it took the loan from application status to approval status, and if the loan got denied, they wanted to find out what the reason was. However when the company decided to build the data warehouse, the IT department helped the organization to understand the scope of the project. The business has to identify what the enterprise needs. It has to decide how much they want to spend on the project, what kind of data they want to capture, etc. This helps to determine what kind of hardware, operating system, communication links, RDBMS and application program IT needs. The business also has to understand that it is a continuous project, both business and IT has to decide who is responsible to fix problem, who is responsible for new requests and who is responsible for tuning. It needs continuous support from the business.

The IT department has to know what they need in order to build a successful data warehouse. First they need to understand the business data. It needs an implementation plan, a realistic timeline, a team of people with knowledge to build the data warehouse – a data architect, a data modeler, business analyst, data quality administrator, meta data administrator, security administrator and good database developers. The scope of the project has to be reasonable; sometimes the business tends to ask too much at once, the IT manager has to understand one thing – never takes on too much too fast. It tends to fail.

Data Quality

Data is an important asset to the business; it helps the business to make daily decision and direction of the organization. Data quality is the most important element of the data warehouse. Data quality and integrity can directly affect many decisions. The data warehouse is not just capturing data. It can also minimize data redundancy, increase customer satisfaction, improve data quality, and provide good customer reports. So the business needs to set up a metrics to measure if the data warehouse is successful. The response time, the quality of the data, the satisfaction of the customers are some of the measurements the business needs to look at. If the data is incorrect and no one trusts the data warehouse, it becomes a whole bunch of tables sitting in a database.

In a lot of companies, the data can come from ERP systems, legacy systems, outside sources and even may be Excel sheets on someone’s PC. The data warehouse team has to take responsibility to validate the data, to work with the business to set up business rules and to work with data owner to set up security for the data. This is part of the ETL process.

For example, the ERP system had a customer name ‘John Doe’; the legacy system had a customer name ‘Johnny Doe’. Are they the same person? Did they have the same address? It they did, that meant they were the same person. This is part of the cleansing process. It is important because the customer count will be off if the data is not validated. It may mess up the reports for marketing, sales and the finance departments.

The database has to be well designed to eliminate redundant data. The data has to be standardized and consistent. It also needs to have business rules to process certain data. For example if the customer has an invalid zip code, should it be put in the data warehouse or should it be put in an error report until the data gets corrected. Meta data is one way to standardize the data; it also documents the source of the data, where the data stores in the database, the definition of the data and the business rule of the data. In this way, the IT department can easily identify if the data is in the wrong database or comes from the wrong source; it can also explain the definition of data to the customer.

Business Intelligence Products

So now the company decides to build the data warehouse, how do you find the right tool? First the company has to evaluate what RDBMS to use for the data warehouse. You have to know how much data does the company have, how often does the company need to load into the data warehouse and how fast it needs to load the data. There are SQL Server, Oracle, Teradata, DB2, and others. The hardware, the communication links and the support is also an important factor to consider choosing the right database. After serious consideration, my company decided to use SQL Server 2000 for our data warehouse because of the size of our data warehouse and the cost. We also look ahead the new features of SQL Server 2005 and its integration with .NET for future development.

Then we have to choose the ETL tool. There are a lot of ETL tools in the market, for example Ab Initio, Informatica Power Center, IBM WebSphere as well as good old DTS packages in SQL Server. One of the data warehouse experts had said that the important part of ETL is that the people must understand the data. The best tool is the tool that you can retrieve and load the data efficiently and correctly and is the most cost effective for the company. Do not just look at all the fancy and expensive tools, evaluate every aspect. Is the tool easy to set up? Is the tool easy to learn to use? If it takes a developer six months to learn to use the tool, is it worth the time and effort?

I used Ab Initio to move the same amount of data from Oracle 9i to SQL Server 2000 and at the same time I used DTS package to do the same thing. The response time was almost the same. But Ab Initio costs a lot of money and DTS package is free. I am not saying Ab Initio is not a good product, if the company has to retrieve zillions of data everyday, it is definitely a good product to consider. Just liked I go from place A to place B (approximately 10 miles) and I go there driving a Honda Civic. It is cheap and reliable. At the same time I can also go there driving a Hummer. Do I prefer a Honda Civic than a Hummer? Of course I would like to drive a Hummer if I am as rich as Bill Gates, but my bank account tells me a Honda Civic is good enough! But if I have to drive up to Mount Rushmore in South Dakota in December, I may consider another vehicle. In my case we use good old DTS packages and stored procedures to extract, cleanse and load our data warehouse and it is doing a very fine job.

A business intelligence tool is used to help the customers to access and analyze the data in the data warehouse. After you build the data warehouse, then you have to choose BI tool. It must be the one provide the most benefits to the business and match the business expectations. The tool has to provide security and privacy of the data. The users can access it and learn to use it fast. If it does, then it is a good business intelligence tool. As I said before SQL server 2005 has Business Intelligence Development Studio to do the job. There is also a lot of other companies selling BI products; for example Cognos, Hyperion, and MicroStrategy. The business and IT have to evaluate which product works the best for the customers. Right now we use Cognos for reporting, but with the rising cost of Cognos and with the new features in SQL Server 2005, who knows, anything can change in the future.
Well, does your company need BI and DW? You have to ask your CIO and CTO, and it is up to you to explain to them the importance of a data warehouse to your company, also you have explain all the pros and cons of choosing the right product.

Ref: http://www.sqlservercentral.com/articles/Design/doyouneedadatawarehouse/2460/

OLAP Tools

Date January 10, 2009

OLAP tools deliver powerful analysis

Companies today have come to rely on state-of-the-art OLAP tools for high-level, multidimensional analysis of corporate performance. OLAP tools allow companies to get to the bottom of complicated queries through quick consolidation of information in data-rich matrices. This kind of reporting functionality is important for all businesses, but the ability to link reporting to performance strategy is the key to enterprise success. Integrating OLAP tools with a comprehensive performance management program results in better business intelligence and more informed planning about processes, budgeting, and other critical pieces of business infrastructure. For the most current business solutions available on the market today, Cognos is the answer.

OLAP tools are geared towards slicing and dicing of the data. As such, they require a strong metadata layer, as well as front-end flexibility. Those are typically difficult features for any home-built systems to achieve. Therefore, my recommendation is that if OLAP analysis is part of your charter for building a data warehouse, it is best to purchase an existing OLAP tool rather than creating one from scratch.

OLAP Tool Functionalities

Right now, there is a convergence between the traditional ROLAP and MOLAP vendors. ROLAP vendor recognize that users want their reports fast, so they are implementing MOLAP functionalities in their tools; MOLAP vendors recognize that many times it is necessary to drill down to the most detail level information, levels where the traditional cubes do not get to for performance and size reasons.

So what are the criteria for evaluating OLAP vendors? Here they are:

  • Ability to leverage parallelism supplied by RDBMS and hardware: This would greatly increase the tool’s performance, and help loading the data into the cubes as quickly as possible.
  • Performance: In addition to leveraging parallelism, the tool itself should be quick both in terms of loading the data into the cube and reading the data from the cube.
  • Customization efforts: More and more, OLAP tools are used as an advanced reporting tool. This is because in many cases, especially for ROLAP implementations, OLAP tools often can be used as a reporting tool. In such cases, the ease of front-end customization becomes an important factor in the tool selection process.
  • Security Features: Because OLAP tools are geared towards a number of users, making sure people see only what they are supposed to see is important. By and large, all established OLAP tools have a security layer that can interact with the common corporate login protocols. There are, however, cases where large corporations have developed their own user authentication mechanism and have a “single sign-on” policy. For these cases, having a seamless integration between the tool and the in-house authentication can require some work. I would recommend that you have the tool vendor team come in and make sure that the two are compatible.
  • Metadata support: Because OLAP tools aggregates the data into the cube and sometimes serves as the front-end tool, it is essential that it works with the metadata strategy/tool you have selected.

Popular Tools

  • Business Objects
  • Cognos
  • Hyperion
  • Microsoft Analysis Services
  • MicroStrategy

Multidimensional OLAP (MOLAP)

Date January 10, 2009

A cube is aggregated from the relational data source (data warehouse). When user generates a report request, the MOLAP tool can generate the create quickly because all data is already pre-aggregated within the cube.

Subscribe to Saqib-Ansari

Date January 8, 2009

There are two main ways that you can keep up to date with the latest news at Saqib-Ansari. Each option will suit different people and you’re more than welcome to pick any (or all) of them.

RSS Feed by FeedBurner

What Is RSSFor those of you familiar with RSS and News Aggregators you can follow Saqib-Ansari via our RSS feed. Simply add it to your news aggregator and you’ll get notification every time something is published on Saqib-Ansari. Learn More about RSS and how to use it.


Read the rest of this entry »

Subscribe to Technology Tips

Subscribe to Saqib-Ansari via RSS

Or, subscribe via email: