[MUSIC] Welcome to module 3. For one season, in 2002, the Chicago Bears, a professional football team, played their home games here in Memorial Stadium because their home stadium in Chicago was undergoing renovations. The original owner of the Chicago Bears, George Halas, who also helped found the National Football League, was an alumnus of the University of Illinois. He used the colors of his beloved alma mater for his football team, which is why the Bears are also orange and blue. Nowadays, professional sports teams are big business. They generate a lot of data, and that data requires analytics at every step: from managing the stadium, to engaging with fans, to putting a winning product on the field.
One of the most common tools used by business analysts is Excel or, more generally, the spreadsheet. These tools allow an analyst to interact with data visually, to perform simple and moderately complex analyses, and to visualize both the data and the analytic results they compute. Spreadsheets provide many benefits and can simplify some analyses, but they are not ideal for modern-day analytics. In this module you'll learn why, and in particular why it's better to do this work in Python, a programming language. Specifically, we'll explain how the DataFrame from the pandas module is better than Excel. In part, this is because of reproducibility. If you write a long data analytics script that runs on a large data set for a day or a few days, you want your results to be reproducible. You want the results your coworkers generate with your analytics script to match your own. And you want an audit trail, so that when you go to your leadership, you can demonstrate why the result you computed makes sense and should drive actionable business intelligence. Imagine trying to trace an analysis through a litany of spreadsheets.
Data may have been cut and pasted between different worksheets without any of the details being documented. In this course, we will use a notebook, which combines the documentation, the analysis code, and the analysis results, including visualizations, into a single cohesive package that can be shared, reused, and used to show others exactly what was done.
Next, you will learn about the Unix operating system, which is used by most big data platforms, including standard cloud computing systems like Amazon Web Services or Google Cloud. Linux and macOS desktops also run Unix or Unix-like operating systems, as do most mobile phones. We won't go into the details of the Unix operating system, but we will cover basics such as how files are organized on a disk, which is akin to a filing cabinet with hierarchical folders. We'll explore file permissions, which control who can access what, and we'll learn to work with files and directories so that you can store data effectively.
After this, you will learn about reading data from and writing data to a file from within a Python program. We will start with a generic approach before moving on to more specialized techniques that hide the details, for example, when reading or writing a comma-separated values (CSV) file. CSV files are a popular text format for exchanging data and are one way to export spreadsheet data. This builds on the Unix file system discussion, since our course server runs on a Unix operating system, and thus all the files we read and write live on a Unix file system.
Finally, you'll be introduced to the pandas module. pandas introduces the Series data structure and the DataFrame data structure, and the DataFrame in particular is a major tool for any data analyst or data scientist. A DataFrame resembles a spreadsheet but exists inside a Python program.
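The Unix ideas of hierarchical folders and permission bits can also be explored from inside Python. The following is a minimal sketch using the standard `pathlib`, `os`, and `stat` modules. It assumes a Unix-like system such as Linux or macOS (on Windows, `os.chmod` only toggles the read-only flag), and the folder and file names (`module3`, `data`, `notes.txt`) are hypothetical, created inside a throwaway temporary directory.

```python
import os
import stat
import tempfile
from pathlib import Path


def build_tree(base: Path) -> Path:
    """Create nested folders under `base` and write one small text file."""
    # Hierarchical directories, like folders inside a filing cabinet drawer
    # (hypothetical names, for illustration only)
    data_dir = base / "module3" / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    notes = data_dir / "notes.txt"
    notes.write_text("hello\n", encoding="utf-8")

    # Owner may read and write; group and others may only read (rw-r--r--)
    os.chmod(notes, 0o644)
    return notes


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    notes = build_tree(base)
    st_mode = notes.stat().st_mode
    permissions = stat.filemode(st_mode)     # same notation as `ls -l`
    octal_mode = oct(stat.S_IMODE(st_mode))  # permission bits only
    listing = sorted(p.relative_to(base).as_posix() for p in base.rglob("*"))

print(permissions)  # -rw-r--r--
print(octal_mode)   # 0o644
print(listing)      # ['module3', 'module3/data', 'module3/data/notes.txt']
```

The octal form `0o644` and the `-rw-r--r--` string describe the same permissions; the second is what a Unix directory listing shows.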
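To contrast the generic and specialized approaches to file input and output, here is a small sketch that writes a CSV file with Python's built-in `csv` module and then reads it back two ways: by splitting raw lines on commas, and with `csv.reader`. The file name and rows are hypothetical; one field deliberately contains a comma to show the kind of detail the specialized reader hides.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rows; the second field of the data row contains a comma
ROWS = [
    ["team", "home_stadium"],
    ["Chicago Bears", "Memorial Stadium, Champaign"],
]


def write_csv(path: Path) -> None:
    # csv.writer automatically quotes any field that contains the delimiter
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(ROWS)


def read_generic(path: Path) -> list[list[str]]:
    # Generic approach: read raw text lines and split on commas by hand
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n").split(",") for line in f]


def read_with_csv_module(path: Path) -> list[list[str]]:
    # Specialized approach: csv.reader understands the quoting rules
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "stadiums.csv"  # hypothetical file name
    write_csv(csv_path)
    generic_row = read_generic(csv_path)[1]
    csv_row = read_with_csv_module(csv_path)[1]

print(generic_row)  # ['Chicago Bears', '"Memorial Stadium', ' Champaign"']
print(csv_row)      # ['Chicago Bears', 'Memorial Stadium, Champaign']
```

The hand-rolled split breaks the quoted field into three pieces, while `csv.reader` honors the quoting. That is one reason to switch to a specialized reader once the generic mechanics are understood.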
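As a preview of pandas, the sketch below loads some CSV text into a DataFrame, pulls out one column as a Series, and performs three spreadsheet-style operations: a computed column, a row filter, and a grouped total similar to a pivot table. It assumes pandas is installed; the ticket data is invented for illustration only and is not real Bears or Memorial Stadium data.

```python
import io

import pandas as pd

# Hypothetical ticket-sales data, invented purely for illustration,
# written as CSV text as if it had been exported from a spreadsheet
CSV_TEXT = """section,tickets,price
A,120,25.0
A,80,25.0
B,200,40.0
B,50,40.0
"""

# read_csv accepts any file-like object, so StringIO stands in for a file
sales = pd.read_csv(io.StringIO(CSV_TEXT))

# Each column of a DataFrame is a Series sharing the DataFrame's row index
tickets = sales["tickets"]

# Like filling a formula down a spreadsheet column, but applied to every row at once
sales["revenue"] = sales["tickets"] * sales["price"]

# Like a spreadsheet filter: keep only rows with at least 100 tickets
big_orders = sales[sales["tickets"] >= 100]

# Like a pivot table: total revenue per section (A is 5000.0, B is 10000.0)
revenue_by_section = sales.groupby("section")["revenue"].sum()

print(type(tickets).__name__)
print(big_orders)
print(revenue_by_section)
```

Unlike cells edited by hand, every step here is recorded in the script, so rerunning it on the same input reproduces the same result.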
By the end of this module, you will have mastered the Python concepts required to perform most data analytics tasks. Future modules will build on these concepts to tackle other data analytics tasks. Good luck! >> [SOUND]