It creates a schema design for the data warehouse that integrates the data sources, identify summarizability problems in the design, and populate data warehouse tables from sample rows in the data sources.
Mini case study contains two data sources with sample data along with a statement of business needs. Using the data sources and business needs, we will specify a dimensional model with dimensions, measures, and grain, create a schema design for the data warehouse that integrates the data sources, identify summarizability problems in the design, and populate data warehouse tables from sample rows in the data sources.
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
OLTP (Online Transaction Processing): It is used for handling transactions (SELECT,INSERT,UPDATE,DELETE). The primary objective is data processing and not data analysis. The main emphasis for OLTP systems is put on very fast query processing, maintaining data integrity in multi-access environments and an effectiveness measured by number of transactions per second. In OLTP database there is detailed and current data, and schema used to store transactional databases is the entity model (usually 3NF). Databases are modeled on the concept of OLTP
OLAP (Online Analytical Processing): The primary objective is data retrieving and data analysis. It is basically an online database query answering system. In OLAP database there is aggregated, historical data, stored in multi-dimensional schemas (usually star schema). Sometime query need to access large amount of data in Management records. Data Warehouse is modeled n the concept of OLAP
The following are some examples of differences between data warehouse and OLTP systems
1> Workload:
Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations.
OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.
2>Data modifications:
A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse.
In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.
3>Schema design:
Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance.
OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to guarantee data consistency.
4>Typical operations:
A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."
5>Historical data:
Data warehouses usually store many months or years of data. This is to support historical analysis.
OLTP systems usually store data from only a few weeks or months. The OLTP system stores only historical data as needed to successfully meet the requirements of the current transaction.
At the core of the OLAP, concept is an OLAP Cube. The OLAP cube is a data structure optimized for very quick data analysis.
The OLAP Cube consists of numeric facts called measures which are categorized by dimensions. OLAP Cube is also called the hypercube.
Usually, data operations and analysis are performed using the simple spreadsheet, where data values are arranged in row and column format. This is ideal for two-dimensional data. However, OLAP contains multidimensional data, with data usually obtained from a different and unrelated source. Using a spreadsheet is not an optimal option. The cube can store and analyze multidimensional data in a logical and orderly manner.
A Data warehouse would extract information from multiple data sources and formats like text files, excel sheet, multimedia files, etc.
The extracted data is cleaned and transformed. Data is loaded into an OLAP server (or OLAP cube) where information is pre-calculated in advance for further analysis.
Four types of analytical operations in OLAP are:
1>Roll-up
2>Drill-down
3>Slice and dice
4>Pivot (rotate)
Roll-up is also known as "consolidation" or "aggregation." The Roll-up operation can be performed in 2 ways:
1> Reducing dimensions
2> Climbing up concept hierarchy. Concept hierarchy is a system of grouping things based on their order or level.
Consider the following diagram:
-In this example, cities New jersey and Lost Angles and rolled up into country USA.
-The sales figure of New Jersey and Los Angeles are 440 and 1560 respectively. They become 2000 after roll-up.
-In this aggregation process, data is location hierarchy moves up from city to the country.
-In the roll-up process at least one or more dimensions need to be removed. In this example, Quater dimension is removed.
In drill-down data is fragmented into smaller parts. It is the opposite of the rollup process. It can be done via
1> Moving down the concept hierarchy
2> Increasing a dimension
Consider the following diagram:
-Quater Q1 is drilled down to months January, February, and March. Corresponding sales are also registers. -In this example, dimension months are added.
Here, one dimension is selected, and a new sub-cube is created.
Following diagram explain how slice operation performed:
-Dimension Time is Sliced with Q1 as the filter. -A new cube is created altogether.
This operation is similar to a slice. The difference in dice is you select 2 or more dimensions that result in the creation of a sub-cube.
Consider the following diagram:
In Pivot, you rotate the data axes to provide a substitute presentation of data.
In the following example, the pivot is based on item types.
The tables that describes the dimensions involved are called Dimension tables
Dividing a Data Warehouse project into dimensions provides structured information for analysis and reporting
End users fire queries on these dimension tables which contain descriptive information
A fact is a measure that can be summed, averaged or manipulated
A Fact table contain 2 kinds of data- a dimension key and a measure
Every dimension table is linked to a Fact table.
A schema gives the logical description of the entire database
It gives details about the constraints placed on the tables, key values present and how the key values are linked between the different tables
A database uses relational model, while a datawarehouse uses Star, Snowflake,and Constellation schema.







