In old companies data was recorded on books and registers while in new age companies now data is stored on hard drives and captured through ERP systems, proprietary systems, excel sheets, accounting soft-wares and databases. Today data comes from following business functions. These functions even work independently but are still connected to each other.
- HR and others.
Now if we want to build report of any of two functions then we probably can not as both systems are different and they do not ave common data. So if we want to find out revenue of two products sold only in India then its literally impossible. Measuring business performance by various characteristics its not possible with heterogeneous systems and no common data. In today’s competitive world companies need to analyze what data they have should identify trends to build competitive advantage so that company can sustain today and survive tomorrow. So what we need is
- Single data storage
- Data should be complete for our purposes of reporting and business intelligence
- Data should be consistent so that we do not miss any important facts
- What we also need is to assemble and manage various sources of data
- A different database other than operational systems so as to maintain speed and consistency
Data warehouse is nothing but what we have defined above. Its single, complete and consistent store of data obtained from various systems ans sources which are made available to end users in businesses.
Data warehousing is process of gathering and managing data from various sources for purpose of answering business questions.
Today it is one of fastest growing trends in corporates. Many banks, hospitals, retailers, manufacturers and others are using it to gain strategic advantage over competitors.
Data warehouse architecture:
Data warehouse consists of
- Operational systems which mainly are OLTP systems. These are generally systems used daily by employees like ERP, Accounting, Inventory management system and CRM. These systems are main and only source of data for data warehouse. So if these systems are not recording data correctly then there is no use of data warehouse. OLTP is online transaction processing which are day to day transaction systems.
- This data has to be transformed to consistent data model to fit in data warehouse.
- Staging areadoes collection of data from various OLTP systems and integrates them as per rules defined by administrator. No analysis is performed on this data. E.g. if date in one system is dd/mm/yyyy and in other is mm/dd/yyyy then both should be transformed to any one which will be consistent throughout.
- ETL(Extract, Transform and Load) : This takes place in three steps.
- Extract data from various sources correctly. (Check received data pattern for corrections)
- Transform the data to consistent model with required accuracy (E.g. Sorting, Coding, Joining)
- Load the data into data warehouse
- Data mart is subset of data warehouse and is generally for specific business purpose. Data mart access is given to end users for reporting. It is constructed by start schema or snowflake schema. It is collection of one or more fact tables and many dimensions tables. Data warehouse is collection of many data marts with few common dimension tables.
- Data cube is multidimensional data model based on facts and dimensions around facts. These are for very quick access to reports. Generally users ask for reports on cubes.