Welcome to Lesson 1 of Module 5 on architectures, features, and details of data integration tools. I'm going to start with an important definitional question that I want you to think about throughout this lesson. Why are data integration tools often referred to as ETL tools?
Module 5 extends your background about data integration from Module 4. Module 5 covers architectures, features, and details of data integration tools to complement the conceptual background in Module 4. In Lesson 1, you will learn about two architectures in the marketplace for data integration tools. Other lessons in Module 5 cover features of data integration tools and details about two open-source tools, Talend Open Studio and Pentaho Data Integration.
You have three learning objectives in this lesson. You should be able to discuss the motivations for data integration tools. You should be able to explain the differences between ETL and ELT architectures. Here's a more reflective goal. You should think about the market summary dimensions of execution and vision.
To support the complexity of data integration processes to populate and refresh a data warehouse, software products for data integration have been developed. In earlier years of data warehouse development, data integration involved tedious coding for data cleaning tasks and data source connectivity. Many project failures resulted in part from unexpected difficulties during the labor-intensive process of developing data integration solutions. In addition, organizations experienced difficulty meeting performance requirements because of the resource-intensive nature of refresh processing. Software vendors realized the potential to improve both software development productivity and refresh performance. Data integration software has evolved from independent tools to integrated development environments, supporting a full range of data integration tasks, graphical and visual specification, and code generation to minimize custom coding.
Improved performance is a more recent feature of data integration tools. Many data integration tools now provide features such as scalable parallel processing to help organizations meet demanding performance requirements of refresh processing.
Data integration tools support two architectures. The extraction, transformation, and loading architecture, abbreviated ETL, performs transformation before loading, as shown in this diagram. Extraction, data cleaning, and other integration tasks are performed by a transformation engine before loading into the target data warehouse tables. The transformation engine is independent of the DBMS for data warehouse tables. The extraction, loading, and transformation architecture, abbreviated ELT, uses a relational DBMS to perform transformations after extraction and loading. The ELT architecture is designed to utilize high-performance features of enterprise DBMSs.
ETL architecture supporters emphasize DBMS independence of ETL engines, while ELT architecture supporters emphasize superior optimization technology in relational DBMS engines. ETL architectures can usually support more complex operations in a single transformation than ELT architectures, but ELT architectures may use less network bandwidth. Some data integration tools support both architectures, so the distinction between the architectures may blur somewhat in the future. In addition, a combination of ETL and ELT processing may provide better performance for enterprise data warehouses, so the demand for both architectures should grow without either architecture dominating.
The data integration marketplace is diverse and vibrant. The marketplace includes third-party vendors and DBMS vendors offering both proprietary and open-source products, along with a wide range of services. Third-party vendors emphasize support for a variety of DBMS products. DBMS vendors leverage relational database support for data warehouse implementation.
Open-source products are typically base products with subscription services for extended products and support. The vibrancy of the marketplace continues with extensive product development and consolidations. At the end of 2014, a Gartner report estimated the data integration market size at 2.5 billion, with an annual growth rate of 10% through 2019. DBMS vendors Oracle, IBM, and Microsoft have strong market penetration, along with third-party vendors Informatica, SAP, and Information Builders. Beyond these market leaders, the market contains 10 to 20 additional firms with reasonable market penetration. The marketplace also contains a strong open-source presence, with Pentaho, Talend, and CloverETL providing base open-source products with subscription services for extended products and support.
Gartner classifies vendors on two dimensions, execution and vision. High execution indicates large firm size and resources to support current and future products. High vision indicates a firm with broad, integrated product offerings. According to this classification, Informatica is the market leader, a position it has held for ten years. Informatica is followed by IBM, SAP, Oracle, and SAS. Gartner indicates that Microsoft is high in ability to execute but low on completeness of vision. Talend, with open-source products and support services, has strong vision but weak execution.
Module 5 covers architectures, features, and details of data integration tools. Data integration tools are essential for software productivity and performance. The first lesson covered the ETL and ELT architectures supported by data integration tools, along with a summary of the diverse data integration market. In answer to the opening question, data integration tools are often referred to as ETL tools because the ETL architecture was the original and only architecture for more than a decade. Thus, many professionals use the term ETL tools rather than the broader term data integration tools.
The ELT architecture is now gaining traction as DBMS vendors have developed integration products leveraging DBMS storage and optimization technology. Independent data integration tools have begun to support both architectures to provide more flexibility.
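To make the difference between the two architectures concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular data integration tool works. The sample rows, the table names `staging` and `sales`, and the function names `etl`, `elt`, and `totals` are all assumptions made up for this example, and an in-memory SQLite database stands in for the data warehouse DBMS. In the ETL function, plain Python plays the role of the independent transformation engine and cleans the rows before loading. In the ELT function, the raw rows are loaded first and the DBMS performs the same cleaning with SQL.

```python
import sqlite3

# Hypothetical source rows: (customer name, sale amount as text), with messy formatting.
SOURCE_ROWS = [("  alice ", "10.50"), ("BOB", "7.25"), ("alice", "4.50")]


def etl(rows):
    """ETL: clean rows in an engine outside the DBMS, then load the result."""
    # Transformation happens here, in Python, before anything reaches the DBMS.
    cleaned = [(name.strip().lower(), float(amount)) for name, amount in rows]
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    return db


def elt(rows):
    """ELT: load raw rows into a staging table, then transform with SQL in the DBMS."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE staging (customer TEXT, amount TEXT)")
    db.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    # Transformation happens here, inside the DBMS, after loading.
    db.execute(
        "INSERT INTO sales "
        "SELECT lower(trim(customer)), CAST(amount AS REAL) FROM staging"
    )
    return db


def totals(db):
    """Total sales per customer, used to compare the two approaches."""
    query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    return db.execute(query).fetchall()


if __name__ == "__main__":
    print(totals(etl(SOURCE_ROWS)))
    print(totals(elt(SOURCE_ROWS)))
```

Both functions produce the same `sales` table. The only difference is where the cleaning runs, which is the trade-off described in this lesson: the ETL version does not depend on the SQL features of the target DBMS, while the ELT version lets the DBMS engine do the work on data it already holds.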