In this video, we're going to be talking about the machine learning process lifecycle, or MLPL, and what it looks like in general. We've built this process based on our collective experience with many different projects for many clients across many different industries and contexts. It describes the stages we must go through when building a machine learning model, from defining the problem to signing off on the model's performance. Having the process laid out clearly like this is very helpful, not only for our own tracking and management, but also for communicating to other stakeholders what the whole process of defining and building a QuAM looks like.

The process is divided into four phases: business understanding and problem discovery; data acquisition and understanding; machine learning modeling and evaluation; and finally, delivery and acceptance. These phases are iterative, but an important thing to know is that you can't skip ahead. When you run into problems in the modeling phase, you can revisit your data or revisit your problem definition, but any time you go backwards, you have to repeat the steps that follow. More on this in a bit. There are other lifecycle models that spend more time on deployment, or on how the model is incorporated into a full-fledged product, but that's not something we get into here. This process takes you from problem definition through to acceptance of the model, or QuAM specifically.

Let's go through each of the stages in a little more detail. The first phase is business understanding and problem discovery. This phase is about understanding the business context of the machine learning project, including identifying relevant stakeholders and coming up with a clear, specific question for the QuAM to answer. This is also when you should develop a clear picture of how the QuAM is going to be used and evaluated.
You also need to consider things like what data is available, and whether there are specific requirements, such as an explainable model. At the end of this phase, you should be able to document what question is being answered, explain what constraints there are on the solution, and have a clear outline of the evaluation process.

Next is data acquisition and understanding. This phase is for gathering the necessary data and making sure it's in a form that can be used for machine learning. You should also make sure that it's possible to answer the question you're looking at based on the data you have. And as we mentioned, data cleaning is always a significant portion of the time spent on a machine learning project. We'd love it if we could spend all our time testing and tweaking models, but in fact, it's data wrangling where most of the project time is spent. Future courses will go into data issues in a lot more detail.

After acquiring and understanding your data comes machine learning modeling and evaluation. This is when we select which machine learning algorithm is most appropriate, select features, and build our QuAM. That isn't the end of the phase, though. We also need to test the QuAM to make sure it's doing what we want it to do with high enough accuracy. More often than not, testing will show ways in which it's not actually doing what we want it to do. So this phase involves a lot of iteration: modify, refine, tune, and test again.

Then, when we have a QuAM we're happy with, we move on to the fourth phase, which is delivery and acceptance. You can think of it as ensuring the QuAM we've built satisfies the requirements of the stakeholders identified back in that initial phase. In some cases, when we're building a system for a client, that means handing it over with documentation of what we built and how we built it, and also training the people who are going to be using and maintaining the system.
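The modify-refine-tune-and-test loop from the modeling and evaluation phase can be sketched in a few lines of code. This is a minimal, purely illustrative example, not part of the course materials: the toy dataset, the threshold "models", and the accuracy function are all assumptions standing in for real algorithm and feature choices.

```python
def accuracy(model, data):
    """Fraction of (feature, label) examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

# Hypothetical toy dataset: (feature, label) pairs.
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
test = [(2, 0), (4, 0), (7, 1), (9, 1)]

# Candidate models: simple threshold classifiers, standing in for the
# different algorithms, features, and tunings you'd try in practice.
candidates = {t: (lambda x, t=t: int(x > t)) for t in range(1, 9)}

# Iterate: build each candidate, evaluate it, keep the best performer.
best_t, best_model = max(candidates.items(),
                         key=lambda kv: accuracy(kv[1], train))

# Then evaluate the chosen model on data it hasn't seen,
# which is what actually tells you whether the QuAM is good enough.
print("chosen threshold:", best_t)
print("held-out accuracy:", accuracy(best_model, test))
```

The point of the sketch is the shape of the loop, not the models: you repeatedly build, evaluate, and refine against your evaluation criteria, and only the held-out evaluation tells you whether the result meets the requirements set in phase one.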
Even when you're building a system for internal use, the delivery and acceptance phase matters. It's rare that the exact people who built the QuAM are the only ones who are going to be using it, and a smooth, well-documented transition takes special care.

An important thing to note about this machine learning process lifecycle is that "cycle" is part of the name. People new to machine learning often think that building models is a simple linear process, but it pretty much never is. The four phases have to happen in the order we just described, but you also often have to go back and start again from earlier in the process. For example, you might go through phase one and have a great question that's valuable for the business and appropriate for machine learning, then get into phase two and discover that the data you have access to can't actually answer the question you've identified. So you have to go back and figure out a different question that you can answer. Or you get all the way through building a QuAM and discover that the business priorities or the business environment have changed, and you need to start again with another question. Once you've revisited the problem definition, you have to go through data acquisition again for the new question. It may be faster the second time around, but you can't count on it. Lots of things can cause a lifecycle reset, and not all of them are possible to anticipate.

Like most complex projects, building machine learning models requires iteration. Having a clear process to follow gives both the development team and the stakeholders a clear picture of where you are in the process, what's done, and what remains to be done. This greatly improves communication and efficiency, and can also significantly decrease frustration and misunderstandings.