Hello, everyone. This lesson stresses the importance of starting an analytical project with clear requirements about what business problem needs to be solved. Sometimes this might involve a decision to purchase algorithms and ready-to-deploy models. Other times, an analytical team might decide to create its own algorithms. Regardless, it is important to clearly define the problem that needs to be solved and then identify which types of approaches will most efficiently lead to the desired outcomes. Assuming the project plan is to create algorithms within the organization, it is critical to write an analytical plan that identifies the steps necessary to achieve the objectives. These include considering the feasibility and ethics of the project and the degree to which data will need to be aggregated to produce the desired model inputs. All of this planning leads to more efficient data extraction and transformation processes and will save time and money down the road. At the end of the lesson, you will be able to explain to an analytical team why they need to carefully define requirements and create an analytical plan before they start to extract and transform data. Let's get started.

Allow me to remind you of the importance of clear requirements. It is fine to have a complex analytical plan, but only if that complexity is essential to meeting the requirements. First, work with business analysts and domain experts to fully evaluate and document what leaders want to achieve. As programmers and analysts are well aware, it is very inefficient to create datasets and algorithms only to learn later that they do not actually satisfy the project requirements. So be sure to ask many questions about the objectives. Once you have clarified the essential requirements, think carefully about the workflow processes and the associated data that will be used in the analytics.

Next, remember the common advice: keep it simple. Even Einstein, who grappled with the most complex problems one can imagine, often advised people that everything should be made as simple as possible, but not simpler. There are two main reasons why simplicity is important. First, simple models often perform as well as, and sometimes better than, more complex models. Second, complex models are harder to interpret and may carry a greater risk of being overfit. Thus, although it sounds counterintuitive, simple models are often wrong but still very useful, while complex models are also often wrong yet may be less actionable and more challenging to interpret. In sum, start simple. If you need to, you can always add complexity and alternative approaches later on.

In this section, I want to spend a few minutes reviewing the often challenging decision about whether a team should build its own algorithms, purchase algorithms, or follow some type of hybrid approach. Let me use a simple example involving health care quality metrics used to drive improvement. Consider some of the analytical roles necessary for organizations to create and implement quality metrics. First, health analysts are sometimes employed in research groups that develop quality measures. For example, university faculty, quality measure vendors, and government agencies often work together to create quality measures. They almost always need skilled analytics experts to help them perform research and evaluation on the trade-offs of adopting various inclusion and exclusion rules.
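To show what inclusion and exclusion rules look like in practice, here is a minimal sketch in Python. The measure, the field names, and the thresholds are all hypothetical, invented for illustration rather than taken from any real measure specification.

```python
import pandas as pd

# Hypothetical patient-level data; all columns and values are invented.
patients = pd.DataFrame({
    "patient_id":             [1, 2, 3, 4, 5],
    "age":                    [45, 67, 17, 80, 59],
    "has_diabetes":           [True, True, True, True, False],
    "in_hospice":             [False, False, False, True, False],
    "hba1c_tested_past_year": [True, False, True, True, False],
})

# Inclusion rule: adults aged 18-75 with diabetes.
included = patients[patients["has_diabetes"] & patients["age"].between(18, 75)]

# Exclusion rule: remove hospice patients from the denominator.
denominator = included[~included["in_hospice"]]

# Numerator: denominator patients with an HbA1c test in the past year.
numerator = denominator[denominator["hba1c_tested_past_year"]]

rate = len(numerator) / len(denominator)
print(f"Measure rate: {rate:.0%} ({len(numerator)} of {len(denominator)})")
```

Even in this toy example, changing a single threshold, such as the age range, changes who falls into the denominator. Evaluating exactly those kinds of trade-offs is the analysts' job.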
Moreover, the teams need to think about how their standardized measures can be implemented across health care systems with various data systems and code sets.

Second, organizations often prefer to use pre-existing quality metrics that have been developed by commercial or government organizations. One potential worry is that creating quality measures de novo, that is, from scratch, will be too costly. In addition, it might draw criticism from other organizations that these nonstandard measures are not comparable. Of course, as simple as it might be to deploy existing metrics, it is almost always much more complex to extract and transform the raw data so that they can be used as inputs to the standardized metrics. Thus, it is necessary to understand how to prepare and transform data to meet the input data specifications of the measures. As a result, analysts need to become data quality experts so that they can carefully evaluate the input datasets. In sum, since the algorithms are often provided, the real work is to determine the degree to which the input data are populated and reliable.

Finally, it is important to understand the different programming approaches available to create measures. For example, some measures are provided in proprietary statistical code, such as Statistical Analysis System, or SAS, software. Others come in Structured Query Language, or SQL, or within prepackaged applications.

Overall, there is much to consider when deciding whether to buy or build performance metrics. While each situation is unique, here are some situations in which I would lean toward buying algorithms. First, few or no analysts are employed who have experience creating algorithms. Second, the desired outcomes of interest are already captured by a standardized metric. Third, deadlines and time pressure require algorithm output relatively quickly. Here are a few situations in which in-house development might be more appropriate. First, building expertise in creating algorithms will allow the organization to be a leader for other organizations that might have fewer analytical staff. Second, the organization believes that, with its data and experienced analysts, it can improve on the quality metrics that already exist. Finally, creating metrics de novo, or anew, will improve provider buy-in, which helps with quality improvement and driving behavioral change.

In this section, let me briefly review research, or analytical, plans. Let's start with research plans, even though these are applicable to a broad range of analytical projects. Research plans are iterative and ongoing, but I am going to break the process down into six steps. It nearly always starts with a general interest in a topic, followed by a review of pre-existing literature that is published and available in the library or online. With more context about what is known about a specific topic or problem, analysts and domain experts can then form the specific research questions they want to answer. Once questions are defined and the literature has been reviewed, researchers create a study design; that is step three. Next, in step four, it is wise to consider and define the effect sizes one would like to detect. Step five is to calculate the sample size required to capture such effects. Finally, after all those steps, a statistical plan is specified.
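To make the effect size and sample size steps concrete, here is a minimal sketch using Python's statsmodels package. The effect size, significance level, and power below are illustrative assumptions, not recommendations.

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical study parameters, chosen only for illustration.
effect_size = 0.3  # standardized effect (Cohen's d) we want to detect
alpha = 0.05       # two-sided significance level
power = 0.80       # desired probability of detecting the effect

# Solve for the required sample size per group in a two-sample t-test.
n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=alpha,
                                          power=power)

print(f"Required sample size per group: {n_per_group:.0f}")
```

Notice that a smaller effect size would drive the required sample size up sharply, which is why defining the effect sizes of interest belongs in the plan, well before any data are collected.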
The important point is that numerous steps occur before the statistical plan is even considered, let alone before actual modeling of the data begins.

When creating a research plan, another framework that might be helpful is the FINER framework. FINER is an acronym that stands for Feasible, Interesting, Novel, Ethical, and Relevant. Evaluating your research plan against these criteria forces a researcher or analyst to think about exactly what they want to learn. Then they will be able to evaluate what belongs in their plan and what does not. It also helps identify possible roadblocks, such as a project not being feasible or interesting. The six steps I provided for creating a research plan and the FINER framework are approaches that might be better suited to clinical research than to business or operational analysis, but both concepts are likely helpful in nearly all analytical domains. I recommend that you try applying them while developing your analysis or research plan, before you attempt to build or modify various types of algorithms.

Now, let's talk briefly about an analytical plan that is commonly used in health care: S-E-M-M-A, usually pronounced "sima," which stands for Sample, Explore, Modify, Model, and Assess. SEMMA is like a research plan, but it is more general in scope than the scientific and academic research plans often used in university settings. SEMMA shares some similarities with the guidelines I just shared for developing research plans, but it is driven more by industrial analytics for projects such as quality improvement. SEMMA summarizes a list of data mining steps developed by the SAS Institute; I have directed you to their website in the resources. You may have covered this framework before, but I want to remind you how important each of these steps can be.

The process includes five key steps, in this order: the first step is S for sample, the second is E for explore, the third is M for modify, the fourth is another M for model, and the fifth is A for assess. Let me discuss each in a bit more detail. S, for sampling, is important for validating models: a test or hold-out sample allows an analyst to evaluate whether their models do well on a separate and, hopefully, independent dataset. E, for explore, involves descriptive statistics that help an analyst get to know their data. Next, to modify usually involves data transformations such as standardizing variables, dealing with missing data, and handling outliers. Fourth is the next M, for modeling. This is what data scientists often love to focus on. We will not be covering all of it here, but from experience I can tell you that modeling is often the least important step. As you have heard many times, garbage in, garbage out; so you need to assess the quality of the data. Finally, A, for assess, means evaluating how well the model performs and how reliable its results are. Clearly, modeling is not the only important step. Using the SEMMA acronym will help you keep modeling in its place, alongside the other parts needed to come up with your analysis plan. In a moment, I will tie all five steps together with a short sketch.

Finally, although not listed in the SEMMA acronym, deployment of the final model is critical. Models are useless unless people within organizations or other groups can use the information they generate. Model deployment can be as simple as rules or scoring criteria on paper, or it can take the form of more complex decision support systems.
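Before we wrap up, here is a minimal sketch that ties the five SEMMA steps together, using Python with pandas and scikit-learn. The dataset is synthetic, and the feature names, outcome, and model choice are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic data standing in for a real extract (all values invented).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.normal(60, 12, n),
    "num_visits": rng.poisson(4, n).astype(float),
})
df.loc[rng.random(n) < 0.05, "num_visits"] = np.nan   # some missing values
logit = 0.04 * (df["age"] - 60) + 0.2 * df["num_visits"].fillna(4) - 0.8
df["readmitted"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# S - Sample: hold out a test set for honest evaluation later.
train, test = train_test_split(df, test_size=0.3, random_state=0)

# E - Explore: descriptive statistics to get to know the data.
print(train.describe())

# M - Modify: impute missing values and standardize the variables.
features = ["age", "num_visits"]
median_visits = train["num_visits"].median()
train = train.fillna({"num_visits": median_visits})
test = test.fillna({"num_visits": median_visits})
scaler = StandardScaler().fit(train[features])
X_train = scaler.transform(train[features])
X_test = scaler.transform(test[features])

# M - Model: a simple, interpretable classifier.
model = LogisticRegression().fit(X_train, train["readmitted"])

# A - Assess: evaluate on the hold-out sample.
auc = roc_auc_score(test["readmitted"], model.predict_proba(X_test)[:, 1])
print(f"Hold-out AUC: {auc:.2f}")
```

Notice that the modeling step is a single line; most of the code, like most of the real work, goes to sampling, exploring, and modifying the data.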
That concludes this brief lesson on analytic planning. In the next lesson, I will review a topic that is often considered the core of data science and analytics: data mining and predictive modeling. See you soon.