Skip to main content

Datawarehousing Concepts


Data warehousing :

Data warehousing is combining data from multiple and usually varied sources into one comprehensive and easily manipulated database. Common accessing systems of data warehousing include queries, analysis and reporting. Because data warehousing creates one database in the end, the number of sources can be anything you want it to be, provided that the system can handle the volume, of course. The final result, however, is homogeneous data, which can be more easily manipulated.

Data warehousing is comprised of two primary tools: databases and hardware. In a data warehouse, there are multiple databases and data tables used to store information. These tables are related to each other through the use of common information or keys. The size of a data warehouse is limited by the storage capacity of the hardware.

The hardware required for a data warehouse includes a server, hard drives, and processors. In most organizations, the data warehouse is accessible via the shared network or Intranet. A data architect usually is responsible for setting up the database structure and managing the process for the updating of data from the original sources.                                                                                                                       


Data warehouse:

A data warehouse is a repository of an organization's electronically stored data. It is designed to facilitate reporting and analysis.
  • The purpose of data warehouse is to store data consistently across the organization and to make the organizational information accessible.
  • It is adaptive and resilient source of information. When new data is added to the Data Warehouse, the existing data and technologies are not disrupted. The design of separate data marts that make up the data warehouse must be distributed and incremental. Anything else is a compromise.
  • The data warehouse not only controls the access to the data, but gives its owners great visibility into the uses and abuses of the data, even after it has left the data warehouse.
  • Data warehouse is the foundation for decision-making.;


Difference Between Data Warehousing And Business Intelligence:

Data warehousing and business intelligence are two terms that are a common source of confusion, both inside and outside of the information technology (IT) industry. 
Data warehousing refers to the technology used to actually create a repository of data. Business intelligence refers to the tools and applications used in the analysis and interpretation of data. 
 
Data warehousing and business intelligence have grown substantially and are forecast to experience continued growth into the future.     



Different Types of Data Warehouse Design: 

There are two main types of data warehouse design: top-down and bottom-up. The two designs have their own advantages and disadvantages.
Bottom-up is easier and cheaper to implement, but it is less complete, and data correlations are more sporadic.
In a top-down design, connections between data are obvious and well-established, but the data may be out of date, and the system is costly to implement.

Data marts are the central figure in data warehouse design. A data mart is a collection of data based around a single concept. Each data mart is a unique and complete subset of data. Each of these collections is completely correlated internally and often has connections to external data marts.

The way data marts are handled is the main difference between the two styles of data warehouse design. In the top-down design, data marts occur naturally as data is put into the system. In the bottom-up design, data marts are made directly and connected together to form the warehouse. While this may seem like a minor difference, it makes for a very different design.

The top-down method was the original data warehouse design. Using this method, all of the information the organization holds is put into the system. Each broad subject will have its own general area within the databases. As the data is used, connections will appear between correlative data points, and data marts will appear. In addition, any data in the system stays there forever—even if the data is superseded or trivialized by later information, it will stay in the system as a record of past events.

The bottom-up method of data warehouse design works from the opposite direction. A company puts in information as a standalone data mart. As time goes on, other data sets are added to the system, either as their own data mart or as part of one that already exists. When two data marts are considered connected enough, they merge together into a single unit.

The two data warehouse designs each have their own strong and weak points. The top-down method is a huge project for even smaller data sets. Since big projects are also more costly, it is the most expensive in terms of money and manpower. If the data warehouse is finished and maintained, it is a vast collection, containing everything that the company knows.

The bottom-up process is much faster and cheaper, but since the data is entered as needed, the database will never actually be complete. In addition, correlations between data marts are only as strong as their usage makes them. If a strong correlation exists, but no users see it, it goes unconnected.       


Data Warehouse architecture:

 







 

 

 

 


Source Systems/Data Sources
Typically in any organization the data is stored in various databases, usually divided up by the systems. There may be data for marketing, sales, payroll, engineering, etc. These systems might be legacy/mainframe systems,Flat files or relational database systems.


 Staging Area
The data coming from various source systems is first kept in a staging area. The staging area is used to clean, transform, combine, de-duplicate, household, archive, and to prepare source data for use in data warehouse. The data coming from source system is kept as it is in this area. This need not be based on relational terminology. Sometimes managers of the data are comfortable with normalized set of data. In these cases, normalized structure of the data staging storage is certainly acceptable. Also, staging area doesnt provide querying/presentation services. 


 Ware House
Once the data is in staging area, it is cleansed, transformed and then sent to Data warehouse. You may or may not have ODS before transferring data to Data Warehouse.
 


OLAP
The data in Data Warehouse has to be easily manipulated in order to answer the business questions from management and other users. This is accomplished by connecting the data to fast and easy-to-use tools known as Online Analytical Processing (OLAP) tools. OLAP tools can be thought of as super high-speed forklifts that have knowledge of the warehouse and the operators built into them in order to allow ordinary people off the street to jump in and quickly find products by asking English-like questions. Within the OLAP server, data is reorganized to meet the reporting and analysis requirements of the business, including:
    * Exception reporting
    * Ad-hoc analysis
    * Actual vs. budget reporting
    * Data mining (looking for trends or anomalies in the data)
In order to process business queries at high speed, answers to common questions are preprocessed in some OLAP servers, resulting in exceptional query responses at the cost of having an OLAP database that may be several times bigger than the data warehouse itself.  

                                                                                          

Comments

Post a Comment

Popular posts from this blog

BYNET

DEFINITION BYNET, acronym for "BanYan NETwork," is a folded banyan switching network built upon the capacity of the YNET. It acts as a distributed multi-fabric inter-connect to link PEs, AMPs and nodes on a Massively Parallel Processing (MPP) system. OVERVIEW Interconnect technology is important for parallel computing. The BYNET is Teradata's "system interconnect for high-speed, fault tolerant warehouse-optimized messaging between nodes." [11] As an indispensable part of the Teradata MPP system, it can be understood better with its predecessor "YNET" in the background. In 1982, the YNET interconnecting technology used on the DBC 1012 was patented for the parallelism. As a broadcast-based hardware solution, it linked all the IFPs, COPs, and AMPs together with circuit boards and cables in a dual bus architecture. Two costom-built busses operated concurrently within the interconnect framework: YNET A to connect the IFPs and COPs on one side, and YNET

Teradata Sql Assistant (TSA)

Definition: Teradata SQL Assistant (TSA), as part of Teradata Tools and Utilities (TTU), is an ODBC-based client utility used to access and manipulate data on ODBC-compliant database servers. It has two editions: 1) Teradata SQL Assistant for Microsoft Windows 2) Teradata SQL Assistant/Web Edition Teradata SQL Assistant is an information discovery tool designed for Windows XP and Windows 2000. Teradata SQL Assistant retrieves data from any ODBC-compliant database server. The data can then be manipulated and stored on the desktop PC. Teradata SQL Assistant/Web Edition is a web-based query tool that allows you to compose queries, submit them to the Teradata Database, and then view the results in a web browser. Overview : Teradata SQL Assistant for Microsoft Windows, originally called "Queryman" (before V. 6.0)or "QueryMan" (V. 6.0 and up), is also known as "SQLA" among programmers. It supports import / export tasks, but not the serious ones. Wit

SET VS MULTISET

Table Type Specifications of SET VS MULTISET There are two different table type philosophies so there are two different type tables. They are SET and MULTISET. It has been said, “A man with one watch knows the time, but a man with two watches is never sure”. When Teradata was originally designed it did not allow duplicate rows in a table. If any row in the same table had the same values in every column Teradata would throw one of the rows out. They believed a second row was a mistake. Why would someone need two watches and why would someone need two rows exactly the same? This is SET theory and a SET table kicks out duplicate rows. The ANSI standard believed in a different philosophy. If two rows are entered into a table that are exact duplicates then this is acceptable. If a person wants to wear two watches then they probably have a good reason. This is a MULTISET table and duplicate rows are allowed. If you do not specify SET or MULTISET, one is used as a default. Here is the issue: