Welcome to lesson one of module two on Multidimensional Data Representation and Manipulation. I'm gonna start with an important motivational question that I want you to think about throughout this lesson. Why are two different data models used for data warehouse representation? You will learn about one model in module two and the usage of the relational model for data warehouses in module three. Module two shifts away from the conceptual background of module one. In module two, you will learn about multidimensional data representation and manipulation, important for usage of a data warehouse by business analysts. Lesson one provides an overview of the multidimensional data model. You have three learning objectives in this lesson. You should be able to discuss the business analysis perspective on data warehouses. In your own words, you should be able to explain reasons for sparsity. Finally, you should be able to provide an example of each aggregation property for measures. Business analysts typically think about problems from a perspective of factors, and outcome variables. A factor is usually a qualitative variable. Such as location, impacting an outcome variable, such as an employee turnover. Business analysts will often use a diagram to represent relationships between factors and outcome variables. A diagram can show the direction of relationships that impact that is positive or negative, and direct or indirect influences. This variation of a fish bone diagram shows four factors, management, location, marketplace, and compensation, that directly influence employee turnover. The business analyst perspective provides insight for data warehouse representation. A data warehouse representation should support this type of reasoning about business problems. Early developers of data warehouse software developed a data model that directly supported this type of reasoning. A data cube supports this business analyst perspective. A data cube provides a multidimensional range of factors as dimensions in quantitative variables in the cells of a data cube. A dimension is a subject label for a row or column. For example, a dimension may be city size or type of health plan offered. A data cube is multi-dimensional. It is not limited to two or three dimensions. A measure is a quantitative variable of interest stored in the cells of a data cube. For example, a measure may be employee turnover, an important metric about employment cost. A cell may have multiple measures for flexibility. In two or three dimensions, data cubes can be easily visualized. This three-dimensional sales data cube contains location, product, and time dimensions. The location dimension on the rows contains USA states, such as California, Utah. The product dimension on the columns shows printer products, such as mono laser inkjet. The time dimension on the depth, or z-axis, shows dates. A cell contains sales in thousands of US dollars for a combination of state, printer product and date. For example, the sales of mono lasers in Colorado on January 1, 2013, is 65,000, is a cell units are thousands of US dollars. Visualization is not simple for data cubes beyond three dimensions. Other lessons in this module will show visualizations providing a software products for data cubes with more than two dimensions. The sales data cube provides insight about extensions of the data cube representation. An important extension is a need for hierarchical representation of some dimensions. For example, the location dimension can contain region, country, province, or state, city and postal code. The sales data cube only shows the USA state level, but clearly location has a hierarchical structure. In many types of business analysis, reasoning about the hierarchical structure of a dimension is important. In lesson two, you will see operators for hierarchical dimensions. Sparsity or empty cells is common in data cubes. The sales data cube does not show empty cells as only the outer face of the cube is shown. It is likely that some combinations of states, products, and dates, do not have any sales. That is a zero sales. Sparsity increases as the granular detail in the dimension increases, such as states to cities. And the number of dimensions increases, such as three to ten dimensions. For large data cubes a majority of the cells could be empty. Sparsity impacts visualization and storage requirements. Two important extensions for cells are multiple measures and derived measures. Typically an organization has a collection of measures that are important to track for an area. For example for retail sales, the number of transactions, number of units and number of gross sales are important. Derived measure such as the sales per transaction are also important. The aggregation property indicates allowable summary operations for measures. Additive measures can be summarized across all dimensions using addition. Common additive measures are sales, cost, and profit. Semi-additive measures can be summarized in some dimensions but not all dimensions, typically not in the time dimension. Periodic measurements such as account balances and inventory levels are semi-additive. Non-additive measures can not be summarized in any dimension. Historical facts involving individual entities such as a unit price are non-additive. Some non-additives measures can be converted to additive or semi-additive. For example, extended price, that is unit price times quantity is additive although unit price is not additive. Business analyst who do not understand allowable operations for measure may perform operations that have no meaning. Thus, understanding measure aggregation is important for data warehouse design and usage. Let's consider a data cube with several hierarchical dimensions. Course, student and time. And four measures, credit hours, grade, unit tuition and tuition revenue. I will comment on aggregation of properties for some measures and leave the others as an exercise for you. Credit hours is an additive measure because it can be summed across all dimensions course, student and time. For example, the number of credit hours can be summed for an individual course or program of study. Grade, often reported on a four point scale, is non additive. The sum of grades for a student, of course, has no meaning. Grades can be averaged, however, such as the grade point average calculation. Module two covers the multidimensional data model, a convenient model for business analysts using a data warehouse. In this lesson, you learned about the problem solving approach of business analysts involving the determination of factors that influence outcome variables. To represent the business analysts perspective, you learned about a multi-dimensional representation known as a data cube. You were shown an example data cube and properties and dimensions and measures. In later lessons of this module, you will see software tools to define and manipulate data cubes. In answer to the opening question, two data cubes are important for data warehouses. For the business analyst perspective, data cubes are natural. Early data warehouse software used the data cube representation to support business analysts. As data warehouse usage grew, however, the limitations of the data cube representation became apparent. In particular, sparsity and lack of integration with the relational DMSs became major problems. DMS vendors soon realized the market potential for data warehouses, and developed product features to support large data warehouses. Module three emphasizes design patterns for data warehouses, implemented as relational databases. Course three emphasizes relational DMS features, especially SQL extensions and summary data storage for data warehouses.