Okay. In this module we'll be diving deeper into deep convolutional neural networks. These networks are powerful tools for image analysis, and they have been very successful in real-world applications, especially in medical image analysis. I want to highlight here one very successful application of convolutional neural networks: Diabetic Retinopathy Classification. Diabetic Retinopathy is a disease that affects the retina of diabetic individuals, wherein the blood vessels deteriorate over time, leading to blindness if left untreated. But it can be caught early if a trained ophthalmologist can find structure in the retina that distinguishes an unhealthy retina from a healthy one. So, here on the left, you see a healthy retina, and on the right you see an unhealthy retina in a patient with Diabetic Retinopathy. I hope that you can see the differences between these two retinas. On the right, in the unhealthy retina, you see little hemorrhagic spots on the left side of the retina, and you see neovascularization close to the optic nerve, that bright spot on the right side of the image. Together, those are telltale signs of an unhealthy retina that, if left untreated, could eventually lead to blindness. Ophthalmologists are trained to distinguish this retina from the other through several years of experience in medical school and during residency. They gain supervisory signals from their colleagues to help them distinguish unhealthy retinas from healthy retinas across patients. But because these convolutional neural networks are so good at image classification, they can learn the features that distinguish the unhealthy retina from the healthy retina, and thus have a machine perform the classification in a way that circumvents the need for a trained ophthalmologist.
So, there is a convolutional neural network that has been applied to this problem. It has worked very well and was published in a study in the Journal of the American Medical Association in 2016, and I just want to show you here the money figure from their paper, which shows the results of a trained neural network on this Diabetic Retinopathy Classification task, that is, trying to tell a healthy retina from an unhealthy retina. To explain the results, I need to unpack the axes here. So we have sensitivity on the y-axis on the left, and one minus specificity on the x-axis on the bottom. What do these quantities mean? These are common metrics within the medical community. Sensitivity is a quantity that tells us whether a doctor or a machine is doing well at finding the positive examples in a dataset. So, if we call an unhealthy retina a positive, the name of the game is to try to find as many of the positives as possible out of a large dataset of both positive (unhealthy retina) examples and negative (healthy retina) examples. If a doctor comes through and labels a large set of images healthy or unhealthy, we calculate the sensitivity by taking the number of true positives, the number of truly unhealthy retinas in that dataset that the doctor was able to find, and dividing that number by the total number of actual positives in the dataset. We want this quantity to be as high as possible. However, you should be able to see how you can potentially cheat this metric. If you just label all of the examples in the dataset as positive, you'll get a perfect sensitivity score, but this will clearly result in a lot of false positives, as the doctor is classifying all the healthy retinas as unhealthy. So, we need a complementary metric that accounts for these false positives, and this is specificity. It is the complementary metric to sensitivity, computed over the same labeled dataset, where you're calling each retina healthy or unhealthy.
The true positives go into the sensitivity metric, and the true negatives go into the specificity metric. So, you count the number of true negatives, the number of healthy retinas the doctor was able to correctly identify, and divide that by the total number of actual negatives in the dataset. So, in theory you want specificity to also be as high as possible, together with sensitivity. In this graph here on the left we're plotting one minus specificity on the x-axis. So, because we want specificity to be high, we want one minus specificity to be low, or close to zero. This means that the best value for a classifier of these images is up in the top left corner of this plot.
Okay, so now let's revisit this plot. The colored circles in this plot represent eight individual ophthalmologists who did this classification task on a large set of healthy and unhealthy retinas, and you can see this as a distribution of their ability to do this classification. So the purple doctor dot in the top left-hand corner is probably the best ophthalmologist in this group, but there seems to be some variation in ability in this trade-off between sensitivity and specificity, that is, in how well these doctors are doing the classification problem. So, those colored dots are the doctors. If you train a convolutional neural network, and we'll get to the details of how you do this later, to do the same task on a large set of training images with labels, you can get the machine to perform according to this black line here. Okay. Because we just said that it's better to be up in that top left-hand corner, you can see that anytime that black line is closer to the top left-hand corner than one of those colored dots, the machine is beating that doctor, that ophthalmologist, and in this case you should be able to see that the machine is beating 80% of the ophthalmologists on this particular task.
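To make sensitivity, specificity, and the black curve a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not the code from the study: the labels and scores are made-up toy values, and the function names `sensitivity_specificity` and `roc_points` are hypothetical helpers. It assumes 1 means an unhealthy retina (positive) and 0 means a healthy one, and that both classes are present so neither denominator is zero.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed convention: 1 = unhealthy retina (positive), 0 = healthy (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # unhealthy, called unhealthy
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # unhealthy, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, called healthy
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, called unhealthy
    sensitivity = tp / (tp + fn)  # found positives / all actual positives
    specificity = tn / (tn + fp)  # found negatives / all actual negatives
    return sensitivity, specificity


def roc_points(y_true, scores, thresholds):
    """Sweep a decision threshold over model scores.

    Returns one (1 - specificity, sensitivity) point per threshold,
    matching the x and y axes of the plot described above.
    """
    points = []
    for th in thresholds:
        y_pred = [1 if s >= th else 0 for s in scores]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        points.append((1 - spec, sens))
    return points


# Made-up toy data: 4 unhealthy and 6 healthy retinas, with hypothetical model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

# "Cheating" the metric: label everything positive.
all_positive = [1] * len(y_true)
print(sensitivity_specificity(y_true, all_positive))  # (1.0, 0.0)

# Three thresholds give three points along a curve like the black line.
for x, y in roc_points(y_true, scores, [0.25, 0.5, 0.75]):
    print(f"1 - specificity = {x:.2f}, sensitivity = {y:.2f}")
```

A doctor who gives hard healthy/unhealthy labels contributes a single point on the plot, whereas a model that outputs a score gives one point per threshold, and sweeping the threshold is what traces out a curve.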
So, that's a very powerful result, and we think that for this particular task machines may be aiding in the diagnosis of this disease. It also promises a very bright future for these types of convolutional neural networks in other types of medical image analysis tasks, including radiology and pathology.
Okay. So that was just an example of image classification, where you're classifying the retina as unhealthy or healthy. In addition to image classification, these convolutional neural networks can also be used to segment out particular features of an image that might be important for a downstream user. So, imagine you are a TSA screening agent at the airport, looking at X-ray images of passengers' baggage, and the goal is to find prohibited items in these bags. But you've been sitting in front of these monitors for a very long time, and you're particularly tired, right? It might help to have a machine automatically highlight regions of interest that correspond to those prohibited items in those bags, and it turns out you can train a convolutional neural network to do this task very well. So on the top here you're seeing the predicted positions and bounding boxes of guns in multiple different views of a particular bag. These red boxes are highlighting a toy pistol, in this case, in some of these bags. The little yellow pixels, if you can see them around the gun, are actually a sort of fine-grained tracing of that object in the bag. On the bottom you can see a similar analysis performed on pocket knives, another type of prohibited item that's not allowed through the TSA screening checkpoints. So, because the convolutional neural network is so good at extracting information from images, it can do this task, and it could potentially be employed by the TSA to help augment the job of these TSA agents at the screening checkpoints. So this is another powerful application of this tool.
So, in addition to those applications, this is maybe a segmentation task taken to the extreme.
So this is a network that can read in a video, the one in the bottom right-hand corner of this video. This is just a single camera taking a video of a scene, and the goal of the network is to isolate all the humans in the video, to segment them out kind of like the pistol in the previous example. But instead of just outlining the humans, it assigns a dense map of key points that correspond to different anatomical locations on the human to form a mesh, a 3D mesh, that's specific to each of the humans in the video. Okay, and so this is a very challenging task, especially in these single-camera views, and typically these meshes are developed using very expensive motion capture technology in very awkward scenarios. Okay? The reason for having such a mesh, and why people use the current technology, which is these motion capture suits, is that you'd like to develop these meshes so that you can apply textures onto these humans in the entertainment industry to develop new video games and new CGI animation for movies. These meshes may also help interactions with new autonomous vehicles, robots, and things like this in the future.