Next, let's talk about health care applications. The first one is this phenotype discovery paper, using unsupervised feature learning over noisy, sparse, irregular clinical data. Here are the authors. In practice, phenotype patterns are very difficult to obtain from large clinical data. The traditional approach uses supervised learning: the expert has to define which patterns to look for, by specifying the learning task or class label, and where to look for them, by specifying the input variables or features, and then you learn that mapping. In this paper, however, they use an autoencoder to learn those patterns in an unsupervised way.

Now for more details about the paper. The data they are dealing with is a lab measure called serum uric acid, and they want to use this lab measure over time to figure out whether a patient has gout or acute leukemia.

There are several challenges to deal with. The first is that there are a lot of missing values: serum uric acid is not measured every day. Given a patient trajectory, for example these two patients here, the solid dots are the actual measurements of the lab on a particular day and their corresponding values, but on many days there is no observation. For example, in this period there is no lab measure at all. So the first thing they have to figure out is whether they can impute the missing data accurately. They use an algorithm called Gaussian process regression to impute those values from the observations at different points in time, and the resulting curve, the output of the Gaussian process, tells you the most likely value at each time. One benefit of a Gaussian process is that it gives you not only an estimate of the value but also its variance. The envelope here shows that variance: in this region the variance is pretty large, indicating we are less certain there because we have had no observations for quite some time, while in this other period we have a lot of observations and the estimate is more confident.

So that's step one: impute the missing values so that we have a daily estimate of serum uric acid. The second step is to discover the patterns, the phenotype discovery process. Here's the pipeline. Step one, as we just discussed, uses a Gaussian process to impute the daily lab measure, so we essentially get a sequence of means and standard deviations at the daily level. Then we take a 30-day window and slide it across the duration of each patient's record; doing this over many patients gives us many, many 30-day windows, and each 30-day window gives us a 30-dimensional input vector. That input is passed through a sparse autoencoder, which we introduced earlier in this lecture, to learn the hidden layers. In this case they actually have two hidden layers, so you can think of it as a stacked autoencoder with two levels. They also modify the loss function slightly: instead of just the squared loss, they normalize it by the standard deviation. So that's the loss function.
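To make the imputation step concrete, here is a minimal sketch of Gaussian process regression over one patient's sparse lab measurements, using scikit-learn. The observation days, values, kernel choice, and hyperparameters are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Sparse, irregular observations: day index and serum uric acid value (hypothetical).
obs_days = np.array([[1.0], [4.0], [5.0], [20.0], [27.0]])
obs_vals = np.array([6.2, 6.8, 7.1, 5.4, 5.9])

# An RBF kernel captures smooth temporal trends; WhiteKernel models measurement noise.
kernel = 1.0 * RBF(length_scale=5.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(obs_days, obs_vals)

# Evaluate on a daily grid: the GP returns the posterior mean (the imputed value)
# and the standard deviation (the uncertainty envelope, which widens in stretches
# with no measurements).
daily_grid = np.arange(1.0, 31.0).reshape(-1, 1)
daily_mean, daily_std = gp.predict(daily_grid, return_std=True)
```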
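Continuing from the GP outputs above, here is a rough sketch of the windowing and sparse autoencoder step in PyTorch. The layer sizes, sigmoid activations, L1 sparsity penalty, and training settings are assumptions chosen for illustration, not the paper's exact architecture or hyperparameters; the structure, 30-day windows encoded by two hidden layers with a reconstruction loss normalized by the GP standard deviation, follows the pipeline described above.

```python
import numpy as np
import torch
import torch.nn as nn

def extract_windows(daily_mean, daily_std, width=30):
    """Slide a 30-day window over one patient's imputed daily trajectory."""
    means = [daily_mean[s:s + width] for s in range(len(daily_mean) - width + 1)]
    stds = [daily_std[s:s + width] for s in range(len(daily_std) - width + 1)]
    return (torch.tensor(np.stack(means), dtype=torch.float32),
            torch.tensor(np.stack(stds), dtype=torch.float32))

class SparseAutoencoder(nn.Module):
    def __init__(self, n_input=30, n_hidden1=100, n_hidden2=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_input, n_hidden1), nn.Sigmoid(),
            nn.Linear(n_hidden1, n_hidden2), nn.Sigmoid(),   # two hidden layers
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_hidden2, n_hidden1), nn.Sigmoid(),
            nn.Linear(n_hidden1, n_input),
        )

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

x, std = extract_windows(daily_mean, daily_std)   # from the GP sketch above
model = SparseAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
sparsity_weight = 1e-3                            # assumed penalty strength

for epoch in range(100):
    recon, hidden = model(x)
    # Squared reconstruction error normalized by the GP standard deviation, so
    # uncertain days (wide envelope) contribute less to the loss.
    recon_loss = (((recon - x) / std) ** 2).mean()
    sparsity_loss = hidden.abs().mean()           # L1 penalty encourages sparse codes
    loss = recon_loss + sparsity_weight * sparsity_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```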
Next, let's look at the results and see how well this works. They take 4,368 serum uric acid time series from Vanderbilt University Medical Center, drawn from two classes: about half from gout patients and half from leukemia patients. The ultimate goal is to differentiate the two. Once the data has gone through this process, they look at the hidden variables in the first layer of the sparse autoencoder. Here are some of those hidden variables, and they circle the patterns that have significant predictive value for the outcome. Overall, you can see a lot of interesting patterns ramping up and ramping down, though there are also many redundant patterns; still, these patterns seem to make some clinical sense. That's the first layer. The second layer is harder to visualize, so what they did is superimpose the first-layer patterns weighted by the second-layer weights, and you can still roughly see some interesting patterns going up and down. These are the phenotype patterns. The authors argue that the first-layer patterns in particular give an interesting indication of what can happen: the lab measure ramping up over the 30 days, ramping down, or fluctuating, which may carry different clinical meanings and correspond to the final prediction.

With either the first or the second layer of this sparse autoencoder as input, they perform classification on the two classes and achieve a very high area under the curve on both the training and the test sets. In fact, the first and second layers perform about the same, and the expert features also perform very well. If you only use a baseline feature, just taking the mean, it performs slightly worse. That shows the power of unsupervised feature extraction with autoencoders: the learned features can enable prediction or classification tasks downstream. That's the first paper.

The second paper is called Deep Patient: an unsupervised representation to predict the future of patients from the electronic health records. This is another good use of an autoencoder, in this case to encode electronic health records. The idea is quite simple. They take a large number of patient records, and for each patient they construct a vector of events based on medications, diagnoses, procedures, labs, and so on. You can imagine these as high-dimensional vectors where an entry is one if the event is present and zero otherwise. Given this multi-hot vector as input, they apply a stacked denoising autoencoder: the first half is the encoding process, followed by the decoding process, which tries to reconstruct the original input.

Here's what they managed to do with the hidden representation from this autoencoder. They take 700,000 patient records from Mount Sinai Hospital. Each patient has multiple data modalities: diagnoses, medications, procedure codes, labs, clinical notes, and demographic information, a very rich set of information that can be extracted from electronic health records. They then want to classify or predict the output labels, which are 78 diagnosis codes that they consider very important, and they use the learned autoencoder features to train a set of random forest classifiers for these 78 diagnosis codes.
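As a rough illustration of this pipeline, the sketch below builds multi-hot patient vectors from a tiny synthetic set of records, trains a single denoising autoencoder stage (a stacked version would train several such stages in sequence), and then fits a random forest on the learned representation for one hypothetical target label. The code vocabulary, corruption level, layer sizes, and labels are all assumptions, not the paper's configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

# Multi-hot patient vectors: an entry is 1 if the code appears in the record, else 0.
vocab = {"dx:gout": 0, "dx:diabetes": 1, "rx:metformin": 2,
         "rx:allopurinol": 3, "lab:uric_acid_high": 4, "proc:dialysis": 5}

def to_multi_hot(events):
    v = np.zeros(len(vocab), dtype=np.float32)
    for e in events:
        v[vocab[e]] = 1.0
    return v

# Tiny synthetic cohort standing in for the 700,000 Mount Sinai records.
records = [["dx:gout", "rx:allopurinol", "lab:uric_acid_high"],
           ["dx:diabetes", "rx:metformin"],
           ["dx:diabetes", "rx:metformin", "proc:dialysis"],
           ["dx:gout", "lab:uric_acid_high"]]
x = torch.tensor(np.stack([to_multi_hot(r) for r in records]))

# One stage of a denoising autoencoder: corrupt the input with masking noise,
# then reconstruct the clean multi-hot vector.
class DenoisingAE(nn.Module):
    def __init__(self, n_input, n_hidden=3, corruption=0.2):
        super().__init__()
        self.corruption = corruption
        self.enc = nn.Sequential(nn.Linear(n_input, n_hidden), nn.Sigmoid())
        self.dec = nn.Sequential(nn.Linear(n_hidden, n_input), nn.Sigmoid())

    def forward(self, x):
        noisy = x * (torch.rand_like(x) > self.corruption).float()
        h = self.enc(noisy)
        return self.dec(h), h

ae = DenoisingAE(n_input=len(vocab))
opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
for _ in range(200):
    recon, _ = ae(x)
    loss = nn.functional.binary_cross_entropy(recon, x)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Downstream task: a random forest on the learned representation, here for a single
# hypothetical binary label standing in for one of the 78 target diagnosis codes.
with torch.no_grad():
    features = ae.enc(x).numpy()
labels = np.array([1, 0, 0, 1])
clf = RandomForestClassifier(n_estimators=100).fit(features, labels)
```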
Here's the performance of those classifiers. Each row is a disease, and the columns are three different methods. DeepPatient, the method based on the stacked denoising autoencoder, performs the best, achieving in many cases an area under the curve of 0.8 or above. If you just use the raw features, the performance is worse, and even a more classical unsupervised dimensionality reduction method such as principal component analysis (PCA) still does not perform as well. That's the main message of this paper: you can use autoencoders to extract a patient representation in an unsupervised way, and that representation can support many different downstream prediction tasks.

Here's the conclusion of this lecture. We talked about the autoencoder, which is an unsupervised way to build a neural network. We discussed several variants of the autoencoder: the sparse autoencoder, the denoising autoencoder, and the stacked autoencoder. And we introduced two health care applications, which use autoencoders to learn good representations of patients.