Welcome to lesson 1 of module 3 on data warehouse design practices and methodologies. I'm gonna start with important conceptual questions that I want you to think about throughout this lesson. What is the grain of a data warehouse? Why are data warehouses becoming more detailed with finer grains? Module 3 shifts away from the conceptual background in module 1 and the business analyst perspective in module 2. In module 3, you will learn about designing a data warehouse as a relational database. Module 3 covers design patterns, summarizability problems, and design methodologies. Lesson 1 provides basic concepts of relational database designs for data warehouses. You have three learning objectives in this lesson. You should be able to discuss the motivation for using a relational database for data warehouse. In your own words, you should be able to explain the importance of grain determination. Finally, you should be able to provide an example of each type of fact table. The multidimensional data model, described in the previous module, was originally implemented by special purpose storage engines for data cubes. These multidimensional storage engines lack scalability and integration with relational databases. Because of the commercial dominance of relational based technology, it was only a matter of time before relational DBMSes provided support for multidimensional data. In the last 15 years, major DBMS vendors have invested heavily in research and development to support multidimensional data. Because of the investment level and the market power of relational DBMS vendors, most large data warehouses now use relational DBMSes. A multidimensional representation of a data warehouse involves dimensions with attributes on the access of the data cube it measures in cells. A dimensional model for reasonable size data warehouse typically involves multiple data cubes, sometimes sharing dimensions and measures. A dimensional model is translated into a table design to utilize features of relational DBMSes. A table design for data warehouse typically consists of dimension tables, connected to fact tables, and one-to-many relationships. The attributies of a dimension map to columns in a dimension table. Fact tables contain measures as well as foreign keys to dimension tables. Although this basic design is simple, the number of fact and dimension tables can be large, sometimes hundreds of dimension tables and tens of fact tables. In the next lesson, you will see extension to this basic table design. Grain refers to the finest level of detail for a fact table. The grain is determined by the finest level of each dimension related to a fact table. Each dimension must be specified to determine the grain. A common error early in the design process is to omit dimensions related to a fact table. The size of a fact table can be estimated using the cardinality of each dimension discounted by the rate of sparsity. Sparsity increases as the number of dimensions increases and the cardinality of a dimension increases. The grain decision is a trade-off between flexibility and size. Finer grains provide more flexibility for analysis but require higher storage capacity. The trend is towards finer grains, such as maintaining sales values for individual customers, instead of customer postal codes. To provide more insight about grain specification, an example is useful. For a sales fact table, the grain can be coarse, such as a combination of customer postal code, product type, store, and week. Alternatively, the grain can be fine, such as a combination of individual customer, product, store, and date purchased. The numbers in parentheses indicate the number of values of dimensions. The major difference between the coarse and fine grain is the number of postal codes, 1,000, in which customers reside, versus the much larger number of individual customers 200,000. However, the fine grain dimension has a much higher sparsity, 75%, than the coarse grain dimension, 5%. The storage requirements are a function of the product of dimension sizes reduced by sparsity. The storage requirements of the finer grain are more than 7,000 times larger than the coarser grain after reducing for sparsity. However, the finer grain allows reporting on individual customers and products by day. In contrast, the coarser grain only permits reporting on customer postal codes and product types by week. Fact tables are classified based on the types of measure stored. A transaction table contains additive measures. Typical transaction tables store measures about sales, web activity, and purchases. A snapshot table provides a periodic view of an asset level. Typical snapshot tables store semi-additive measures about inventory levels, accounts receivable balances, and accounts payable balances. A factless table records event occurrences, such as attendance, room reservations, and hiring. Typically, factless tables contain foreign keys without any measures. This classification is somewhat fluid, as a fact table may be a combination of these types. This table shows examples of each type of fact table. The names in black are dimensions, and the names in red are measures. The first column depicts a sales fact table with quantity and extended price as measures. Quantity is additive, while extended price is non-additive. The second column depicts a periodic fact table with balance, transaction count, cumulative dividend, and current year's dividend as measures. Each measure is semi-additive across the counts. Transaction count is also a additive across the count types. The third column depicts a factless fact table, recording enrollment or attendance events. Factless tables are the least common. For a university, some measures would likely be stored, such as time spent online and course page visits. Module 3 covers relational database design practices and methodologies for data warehouses. The first lesson covered basic concepts for representing multidimensional data in a table design. You learned about the basic table pattern with dimension tables connected to fact tables, one-to-many relationships, grain of fact tables, and types of fact table. Other lessons in module 3 will extend beyond these basic concepts for schema patterns, summarizability patterns, and practice problems. In answer to the opening question, the grain is the finest level of detail of a fact table. The grain determines the size of a fact table and the flexibility for analysis. For example, a sales fact table, with sales recorded for individual customers, may require 10,000 times storage compared to a fact table for customer postal codes. Flexibility requires much higher storage capacity, but the cost to provide higher storage capacity keep declining, while business intelligence demands keep increasing. Thus, data warehouses have become finer grained to meet the ever-increasing demands for business intelligence.