Hi. Welcome to Foundations of Data Engineering. It's a very exciting course because it takes you from the beginning all the way to the end of the resources that you'll need to be a Data engineer or a Machine Learning engineer. In any of these fields, you're going to need these skills. Let's go ahead and talk through each of these courses and why they're so important to your goal of being a professional in the data industry.

To start with, in this first course we cover Python statements. Why is this so important? Well, in Python, you can execute these statements inside of a Jupyter Notebook. This Jupyter Notebook can become a document or a record where you're able to go through and build out the structure, and then share that structure with other people. We'll also talk about sequences. Many times when you're dealing with data, you need to loop through some form of results, and that's something that we'll cover in Course 1.

We'll also talk about Pandas itself. When is it important, when you're using Pandas, to potentially move the data to a CSV file or to bring it back into Python? What are the trade-offs between the Pandas version and the regular Python version? They're really like dialects. Like in the case of Spanish and Portuguese, there's some similarity, but they're different, and the same thing goes with Pandas. It's a tool that is very helpful if you're dealing with columnar data, but we'll talk about that in more detail.

Then finally, in Course 1 we get into the development environment. One way I like to think about a development environment is that if you're cooking, you have several tools that you're working with. You may have a small knife, you may have a big knife, you may have some other specialized tools. It's the same with a development environment. In our case, we're going to talk about Vim.
That'll be a small tool, like a small little knife to cut up vegetables, and then we'll also talk about larger tools like Visual Studio and Visual Studio Code, which are very common, because when you're dealing with a project structure, they allow you to more elegantly handle multiple files.

Let's talk about Course 2 here. In Course 2, we're going to get into how to actually use Bash, in particular to manage the file system and to deal with the Linux operating system. Almost all software engineering projects now are using the Linux operating system as a deployment target, so it's important to know how to manipulate it. In particular, one thing that you'll see a lot is ~/.bashrc. Now what does this actually do? Well, the .bashrc is where you would configure your environment so that you could potentially export some secrets, or you could activate your Python virtual environment in there. There are a lot of things that you can do to automate your Bash environment by editing your .bashrc.

We'll also get into Bash itself. How do you actually write core functionality like a function? How do you write a Bash function and use it? How do you use, for example, an array or a hash? These data structures can be very helpful when you're automating things or scripting things. We'll also get into the characteristics of a Bash script. What's a shebang line? How do you make a script executable? Then how can you actually use those scripts for automation?

Finally, we wrap up by talking about the file and data methodology. How do you search a file system using commands like find and locate? Why would you want to do this? And how can you use this to be successful when dealing with big data problems?

Now let's talk about the third course here. The idea with the third course is that it helps you really start to navigate towards building real-world solutions. In particular, you learn how to deal with data.
This means that if you want to write a file to the file system, how would you do this? Well, in Python, there are several ways to do it. How could you persist data as well? This is another important thing to remember: you can take your Python data and pickle it out to disk, and it's saved in a form that Python can load straight back in. We'll also get into SQL, which is very important for Data scientists, Data engineers, and Machine Learning engineers because it is a language that's designed for you to be able to query things. We'll talk about how to use it inside of Python, how to interface with it, and also how to build solutions with SQL. Another common thing that I see a lot in data science is that people need to learn how to scrape data from websites. Sometimes websites are not built for you to easily pull the data. We dive into that in Course 3, and we show you how to actually build out scraping utilities.

Now let's get into Course 4 here. The idea with Course 4 is that by this point you should have all this foundational knowledge, and we start to put it into practice. In particular, we talk about how Jupyter is so important for Machine Learning engineering and Machine Learning operations. Jupyter is something that's used in things like SageMaker. SageMaker is a machine learning technology from Amazon, and you can build out very complex distributed computing pipelines and prediction pipelines. There are similar things on other platforms: Microsoft Azure has Azure ML Studio, and Google Cloud Platform has Vertex AI. All of these types of technologies are centered around the ability to use Jupyter Notebooks. We also get into working with the web, command-line tools, and Python project structure in Course 4. These are very important when you're going to the next level and you want to put your project into production.
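To make the Course 3 point about persisting Python data a little more concrete, here is a minimal sketch of a pickle round trip using only the standard library. The `record` dictionary and the `record.pkl` file name are made up for illustration and are not taken from the course material.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical example data; the keys and values are illustrative only.
record = {"course": 3, "topics": ["files", "pickle", "SQL", "scraping"]}

# Write the object to disk in Python's own binary serialization format.
path = Path(tempfile.gettempdir()) / "record.pkl"
with path.open("wb") as f:
    pickle.dump(record, f)

# Load it back in; the result is an equal but separate copy of the original.
with path.open("rb") as f:
    restored = pickle.load(f)

print(restored == record)
```

One thing worth keeping in mind is that pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.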
In particular, let's talk about the Python project structure and why it's so important. In general, you want to have a scaffolding, a layout for your project. You'd set things up in a way that's easily testable using continuous integration, or something that I like to call Kaizen, or continuous improvement. What's nice about this is it makes sure that your project is always getting better; it's constantly improving. Once you've got that structure set up, we also talk about how to build command-line tools to prototype solutions. Command-line tools allow you to easily take input, manipulate things, and move them around. That's what's so important for a Data engineer: the ability to rapidly prototype things.

Finally, we talk about web development. This comes up a lot, even though it seems like for a Data scientist or engineer, web development is maybe not that important. It really can become a critical component of your job, especially for the development of something called a microservice. We're going to get into how to deal with those microservices, and how a microservice can allow you to deploy something quickly into the cloud.
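As a rough illustration of the command-line prototyping idea mentioned above, the following sketch uses Python's standard `argparse` module to take input and do a small manipulation on it. The tool itself (counting lines that contain a word) and the names `build_parser`, `count_matches`, and `main` are hypothetical and chosen only for this example; the course may well use a different library or layout.

```python
import argparse


def build_parser():
    # Describe the inputs this illustrative tool accepts.
    parser = argparse.ArgumentParser(
        description="Count how many lines contain a word (prototype sketch)."
    )
    parser.add_argument("word", help="word to search for, case-insensitive")
    parser.add_argument("lines", nargs="*", help="lines of text to search")
    return parser


def count_matches(word, lines):
    # Keep the logic in a plain function so it is easy to test on its own.
    return sum(1 for line in lines if word.lower() in line.lower())


def main(argv=None):
    # With argv=None, argparse reads the real command-line arguments.
    args = build_parser().parse_args(argv)
    total = count_matches(args.word, args.lines)
    print(total)
    return total


# Call with an explicit argument list so the sketch runs anywhere.
main(["data", "raw data", "Data engineer", "bash"])
```

Keeping the logic in `count_matches` and the argument handling in `main` is one way to keep a prototype testable under continuous integration, since the function can be checked without going through the command line at all.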