Welcome to the key concepts in Course 4. These are really important as we're starting to wrap up the whole specialization.

Let's start off with the Notebook. Why is the Notebook so important, and why do we want to talk about it in detail? A good way to think about a Notebook is that it's really the tip of the iceberg. It's where your code executes, but behind the scenes there's a lot happening. Take an environment like SageMaker. It's using a data lake, which gives you effectively unlimited disk I/O and compute, and it's moving that data over and over again inside a machine learning or data engineering project. The same goes for things like Spark or, again, SageMaker: they're all orchestrated by Notebooks, but behind the scenes there are big data operations. There could be compute nodes being spun up, machine learning training jobs running, or inference (prediction) being served. So a Notebook is important to understand both from the top and from what's happening underneath.

The next thing we're going to talk about is the command-line interface, and it really is the most important thing to be aware of for prototyping in data engineering. You take an input, apply a unit of work, which could be as simple as a Python function, produce an output, and then send that output through a pipe to some other unit of work. That next stage could be another command-line tool, which pipes its result again to yet another unit of work. The command-line interface is critical to building very quick, efficient, single-purpose tools; we'll look at a small sketch of this pattern in a moment.

The other thing that is really critical, and often never talked about, is the concept of DevOps. You may have heard the term before, so let's define it. DevOps is about continuous improvement, or Kaizen, and the idea behind continuous improvement is that you're constantly making things better by applying automation to your code. What this means is, say we're using GitHub Actions, which is a common build system: you build your code, you test your code, and then you deploy it into production, and it keeps getting better in an efficient, fully automated process. One litmus test for DevOps is this: if you can't push your code into production in a fully automated way, you're not doing DevOps, you're doing something else. DevOps is about full automation of high-quality code and continually making it better. Again, this is the concept of Kaizen.

Now, how would you do this? Some of the details come down to what I call a scaffold. If you go to a beach community, you'll see a lot of houses built on pylons. They're raised up so that if there's flood water, they're protected. It's the same with a Python project: you have to build that structure out. What I would say are the three most important things for a Python project are the Makefile, a requirements file, and a Dockerfile. The Makefile lets you define a series of recipes that abstract away really complex sequences of commands, so it saves you time over the life cycle of a project rather than costing you time. I would recommend all Python developers use a Makefile to make things simpler; then later, when you go to deploy, you just type make install, make test, or make lint.
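Here is a minimal sketch of the pipe-friendly, single-purpose command-line tool pattern described above. The file name and the "unit of work" (upper-casing each line) are placeholders, not part of the course material; the point is only that the tool reads from standard input and writes to standard output so it can sit in the middle of a shell pipeline.

```python
#!/usr/bin/env python
"""Sketch of a single-purpose, pipe-friendly command-line tool."""
import sys


def unit_of_work(line: str) -> str:
    # Placeholder transformation: swap in your own logic
    # (parsing, filtering, or transforming a record).
    return line.upper()


def main() -> None:
    # Read from stdin, apply the unit of work, write to stdout,
    # so the tool composes with other tools via pipes.
    for line in sys.stdin:
        sys.stdout.write(unit_of_work(line))


if __name__ == "__main__":
    main()
```

Assuming the file were saved as cli_tool.py, you could chain it with other tools, for example `cat names.txt | python cli_tool.py | sort`, and a Makefile recipe such as `make test` or `make lint` would typically just wrap commands like these so you don't retype them.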
Likewise, a requirements file is critical because it keeps track of not only the packages but the specific version of each package. It's very easy to have a problem where your build dynamically pulls in the latest version of a package and breaks your project. This has happened many times in real-world scenarios, but you can pin those version numbers by using a requirements file.

A Dockerfile is also pretty fascinating because it defines the runtime for your project so that you can containerize it and push that container to a container registry. In the case of Amazon, the container registry is called ECR; there are also public registries like Docker Hub. The idea is that you package your code and its runtime together by using this Dockerfile. I would say most Python projects, especially in the context of data engineering and machine learning engineering, should have a Makefile, a requirements file, and a Dockerfile.

Finally, we're going to get into microservices, which are critical for a data engineer to master, because they let you use the web as the input. The input is an HTTPS endpoint: a request comes in, you do some work, and that work could be as small as, say, five lines of Python. It doesn't matter how large the code is; what matters is the problem it solves. Then you send the output back, most of the time as a JSON data structure, so you can share it with other people and systems. We're going to get into microservices: we'll talk about Flask, we'll talk about FastAPI, and we'll also talk about data visualization services that you can build. These are really critical final steps to master inside of Course 4.
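As a preview of the microservice idea, here is a minimal sketch using FastAPI, one of the frameworks mentioned above. The endpoint name and the tiny word-count payload are illustrative assumptions, not the course's actual service; the point is that an HTTPS endpoint is the input and a JSON data structure is the output.

```python
"""Sketch of a tiny microservice: web in, JSON out."""
from fastapi import FastAPI

app = FastAPI()


@app.get("/wordcount")
def wordcount(text: str = "") -> dict:
    # A few lines of Python as the unit of work: count words in the query string.
    # FastAPI serializes the returned dict to JSON automatically.
    return {"text": text, "words": len(text.split())}
```

Assuming this were saved as main.py, you could serve it with `uvicorn main:app` and call `GET /wordcount?text=hello world` to get back a small JSON response that other people or services can consume.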