Machine learning is completely transforming the way people build products and how they derive insights from data. Let's get started with ML. In this chapter, we will look at what machine learning is, and we will do so by playing with it, because that's one of the best ways to learn what something is. Then, we will look at how to create effective machine learning models. And finally, we will create the machine learning datasets that we will use in the rest of this course.
Machine learning is a way to derive insights from data, and it works only when you have lots of data. What machine learning does is provide you with standard algorithms that you can use to obtain insights from that data. The insights you get tend to be predictive in nature. That's how I distinguish machine learning from things like business intelligence. Business intelligence is about historical data: trying to figure out what happened. Machine learning also uses older data, because that's what you train the model on, but the point is to apply the trained model to new, unseen data and make predictions with it. Of course, a single predictive insight by itself is not very useful. The point of machine learning is that you can generate these insights repeatedly and quickly, in an automated way, and use them to make decisions at a velocity that would not be possible with manual data analysis.
So, when you think about machine learning, think in terms of what you want to accomplish with it. Let's say you have a bunch of images and you would like to know what's in them. Given the first image, you want the output to be cat; given the second, dog; given the third, car; and given the fourth, apple. In order to do that, we need examples.
In machine learning terms, an example is a combination of an input, the thing for which we want an output, and a label, which is the true output, the thing we know the output needs to be. So, we will have the label cat for the first image, the label dog for the second image, and the label car for the third image, and each input together with its label forms an example. When we're training a machine learning model, we're training it with examples: combinations of inputs and labels.
Once you have those examples, you train the machine learning model, which is a mathematical function. Any such function has free parameters, tunable parameters called weights, and training means adjusting those weights so that the output of the model, given the first image, is hopefully cat, and given the second image, is hopefully dog. Now, if we had instead labeled this image as grass, then grass is what the model would learn. So, the model learns whatever labels it was trained on. It figures out a function that, given any of these inputs, produces the corresponding label. The idea is that we can then give this function a new image, one for which we don't know the label, and it will give us a prediction that is hopefully the right output for that image. Notice that the cat image here is a different cat image from the one shown to the model during training. So, the whole idea of machine learning is that you have a large dataset of labeled data, consisting of inputs and their corresponding labels, and you use this training dataset to adjust the model so that, given an input, its output matches the original label.
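To make the training idea concrete, here is a minimal sketch in Python. It swaps the lecture's images for plain numbers so it stays self-contained: the examples, the linear function `w * x + b`, the use of gradient descent on mean squared error, and the learning rate and step count are all illustrative assumptions, not something the course specifies. It shows examples as (input, label) pairs and training as repeatedly nudging the weights so the model's outputs move toward the labels.

```python
# A hedged sketch: the data, model form, learning rate, and step count
# are illustrative assumptions, not taken from the course.

# Each example pairs an input with its label (the known correct output).
# Here the labels happen to follow label = 2 * input + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def model(x, weights):
    """A mathematical function with tunable weights: w * x + b."""
    w, b = weights
    return w * x + b


def train(examples, learning_rate=0.05, steps=2000):
    """Adjust the weights so the model's outputs approach the labels.

    Uses plain gradient descent on mean squared error.
    """
    w, b = 0.0, 0.0  # start from arbitrary weights
    n = len(examples)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, label in examples:
            error = model(x, (w, b)) - label  # how far off this output is
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # Move each weight a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


weights = train(examples)
print(weights)  # approximately (2.0, 1.0)
```

With images, the inputs would be pixel values and the model would have many more weights, but the idea of adjusting weights until the outputs match the labels is the same.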
And then, you don't need the training data anymore. All you need is the model, and you can apply that model to an arbitrary image and hopefully get back the output you trained the model to produce for images like it. So, you predict with a model that has been trained. As a data engineer, you must focus on both the training stage and the inference (prediction) stage. Many data scientists will do the training and forget that the whole purpose of machine learning is to make predictions, and to do so in a timely way.
So far, I've used these words in pretty intuitive terms. So, go ahead and make sure that you understand exactly what these terms mean. What is a label? What is an input? What is an example? What is a model? What's training? What's prediction? If necessary, pause the video, try to answer these questions, and then I'll move on.
So, what's a label? A label is the correct output for some input. This is what you train the model with. So, what's an input? The input is something that you know and can provide at the time of prediction. For example, if you're working with images, the image itself is the input. So, what's an example? An example is an input and its corresponding label, taken together. So, what's a model? A model is a mathematical function that takes an input and creates an output that approximates the label for that input. So, what's training? Training is the process of adjusting the weights of a model so that, given an input, its output approximates the correct label. And what's prediction? Prediction is the process of taking an input and applying the trained model to it to get an output that is, hopefully, the correct output for that input.
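The separation between the training stage and the inference stage can be sketched in a few lines of Python. This is a hedged illustration, not a production serving setup: the weight values, the JSON file format, and the `load_model` helper are hypothetical, and the weights simply stand in for the output of a training run on a toy linear model. The point it shows is that only the weights are saved, so the inference stage needs no training data at all.

```python
import json
import os
import tempfile

# Hypothetical weights, assumed to be the output of a training run that fit
# label = w * input + b. The values are illustrative only.
trained_weights = {"w": 2.0, "b": 1.0}

# Training stage: persist only the model's weights, not the training examples.
model_dir = tempfile.mkdtemp()
model_path = os.path.join(model_dir, "model.json")
with open(model_path, "w") as f:
    json.dump(trained_weights, f)


# Inference stage: load the weights and apply the model to new inputs.
def load_model(path):
    """Hypothetical helper: read saved weights and return a predict function."""
    with open(path) as f:
        weights = json.load(f)

    def predict(x):
        return weights["w"] * x + weights["b"]

    return predict


predict = load_model(model_path)
predictions = [predict(x) for x in [10.0, -3.0]]
print(predictions)  # [21.0, -5.0]
```

In practice, the two stages often run in different processes or on different machines, and the saved model is what connects them.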