How do we study disease, and how do we do this when we're investigating clusters? Just to remind us, a cluster is something that relates to time and space. So, we want to get information about the when, the timing, and the where, the space, of the disease, and then we want to look at the same things, the timing and the space, for the exposures that we're interested in. It's very helpful if we can get some information on the rates of the disease in a group of people, or a community, or a state where we don't think there is any exposure, the so-called background rates. Then we want to do as much as we can to understand how exposure happens, and I'll talk about that in more detail. We have a variety of scientific tools in our toolbox for studying disease as well as clusters, though some of these don't work so well for clusters. The first and most important is the discipline of epidemiology, which means the study of what happens among populations. With it, we can generate information on the people who have the disease, the cases, and the people who don't have the disease in the same community or in comparable communities. We're also very concerned about getting precise and quantitative information. How many people? When exactly did it happen? How long did they live there? That's a discipline we call biostatistics, which is just numbers in service of biology. We have also developed, relatively recently, very powerful tools of mapping for spatial as well as temporal analysis, and we can actually put these together to get a visual image of what the cluster looks like. Exposure information is sometimes the most difficult to obtain. I'll explain why that might be, but why don't you think about why exposure would be more difficult than the disease? Once we have some information, let's say we think it's this chemical that was released into the drinking water and was present when these people came down with the disease, then we want to ask: is this chemical known to cause cancer?
Is it known to cause birth defects? That will help us identify all the outcomes or diseases that we might need to study. So, as I said, epidemiology is the study of the incidence and distribution of health and disease over space and time in humans. We also do epidemiology in animals, and we call that epizoology. So, these are just representations of the various kinds of information you might have. For example, at the top you see a population in which there are people in red who have a disease, people in blue who do not, and some that we were not able to count or include in our study. Our graphs might represent the number of cases of disease over time, or they might show a distribution by whether people are men or women, old or young, or by some other distinction. Because we really do need to get an idea of who in the community has the disease and who does not, both of these are really important. How do we get this information? Well, this is where communities can be extraordinarily powerful at assisting us and assisting yourselves. A lot of this information actually comes from surveys: going around the community with a piece of paper, knocking on doors, and asking people a certain number of questions, first about the disease, and then later more detailed information that we'll come to. Now, often but not always, within a specific state or community there may actually be health records that give the incidence, that is, when diseases such as cancer have occurred within that particular community, county, or state. We also have certain registries in the United States, though I have to tell you, we have no national registries, which is a tremendous impediment to doing these kinds of studies. We have state registries and we have partial registries, for example for birth defects and cancer, but they are not national. Now, when you're at the door asking people questions, it's important to get some other information as well.
You want this from both the people with the disease and the people without it: their age, their sex, their behaviors. Do they smoke? Do they drink? Are they vegans? Are they meat eaters? And how long have they lived in the community? These are critically important pieces of information that are really best obtained by people who live within the community. In addition, there are other kinds of surveys we can do. We may want to go back in time. This is very important for cancer, because most cancers take a long time to develop between exposure and a diagnosed disease. So, I don't just want to know what you're being exposed to right now or how long you've lived where you are right now. I really want to know: when did you move into this community? Where did you live before? How many different places did you live? Where did you work? How many different places did you work? This kind of information really only comes from talking to people about themselves. When you get going on any kind of work, whether it's a formal epidemiologic study of the kind that I do or a community cluster investigation, set up a way to manage all your data, so you've got it all at your fingertips and you don't lose it. Remember, you don't really need to collect names from anybody in this kind of study. That's not the most important thing that we need to know. We've got all this information now, on residence time, on what kinds of diets people have, where they worked, whether they smoked, et cetera, and now we compare the people who have the disease with the people who don't, to see if something jumps out at us as being particularly common in people with the disease. If we have the information, we also compare the overall disease rates within a community or a county with information from state and national disease databases. But I want to stress, these are not as good as they should be.
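To make those two comparisons a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Respondent` record, its field names, the "exposed" definition, and every number in it are made up for the example and do not come from any real survey or registry. The first function compares how common an exposure is among people with and without the disease (an odds ratio), and the second compares the cases we counted with the number we would expect at a background rate.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    """One anonymous survey record (hypothetical fields; no names collected)."""
    has_disease: bool
    exposed: bool  # illustrative, e.g. "lived in the community 10+ years"


def odds_ratio(respondents):
    """Odds of exposure among cases divided by odds of exposure among non-cases."""
    a = sum(r.has_disease and r.exposed for r in respondents)          # exposed cases
    b = sum(r.has_disease and not r.exposed for r in respondents)      # unexposed cases
    c = sum(not r.has_disease and r.exposed for r in respondents)      # exposed non-cases
    d = sum(not r.has_disease and not r.exposed for r in respondents)  # unexposed non-cases
    if 0 in (a, b, c, d):
        # Add 0.5 to every cell so an empty cell does not cause division by zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def observed_to_expected(observed_cases, population, background_per_100k):
    """Ratio of the cases counted to the cases expected at the background rate."""
    expected = population * background_per_100k / 100_000
    return observed_cases / expected


# Made-up door-to-door survey: 10 people with the disease, 20 without.
survey = (
    [Respondent(True, True)] * 8
    + [Respondent(True, False)] * 2
    + [Respondent(False, True)] * 4
    + [Respondent(False, False)] * 16
)
print(odds_ratio(survey))                    # 16.0
print(observed_to_expected(25, 50_000, 20))  # 2.5
```

A ratio well above 1 in either calculation is only a signal that something may be worth a closer look. With small numbers, and with background rates that are not as good as they should be, it is not proof of a cause.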
We can particularly start to look at certain risk factors that we might suspect, for example, for cancer. Where did they work? What are the characteristics of their lifestyle or their diets? Then this kind of information can also help us make a map. This is a technique that's really very recent, and you can see how powerful it can be. We can draw maps in which we look at whatever factor we think might be important, first off, the distribution of disease. But this one is looking at the age of housing, which is a predictor of whether or not there's lead in a house, so it relates to lead exposure. Here, instead of going out and measuring lead levels in children throughout the state of New Jersey, all they did was map housing age. I'm showing it at three different levels. The most informative, but maybe overly so, is based on census tracts, on the left-hand side. You can get all this information online from the US Census for the entire state of New Jersey. Once again, you can see that there are clusters of these older houses. Now, in the middle, we're backing up a little bit and going by zip code. A zip code is a larger geographic area than a census tract, so we're getting more people here, and you can see that the clusters are therefore larger, because we're including more population and more houses. Or we might even want to just look at the county level. Here, you can see the clusters are very clear. They really resolve down to four distinct areas within the state. That might be good enough for our study. It really depends on the kind of question we want to ask. Why would you use these different filters? Well, the census tract is going to be most sensitive for community-level information, and you can also build in socioeconomic data from the US Census at the census tract level. So, you get information on age, you get information on income, you get information on ethnicity, which could be very useful.
Zip code is probably most important for communication, because you get a lot of information that's tied to zip code in terms of people's behaviors. County is probably most important to policymakers, because that's the level at which state and national policymakers really operate. So, all of these are informative, they're all helpful, but they may be of different value depending on what you're looking for. Now, in terms of time, we have some remarkable examples of how important this kind of information is, and this is one of the most important. This is the famous episode in London that took place in 1952. Now, remember, in 1952 we didn't really even think about the environment, but something happened that was so clear and so out of the ordinary that it drove people to understand that the environment had to be considered. So, over on the right-hand side, this is information that was collected from two sources. The bottom line, which is dashed and labeled smoke, comes from the British weather service, and they were taking measurements every day, several times a day, of the levels of, quote, smoke. We'd call that particulate matter in the air now. The hospitals within London were getting together because, as you can see, within a very short period of time, and these are days on the x-axis, the death rate as it presented itself in hospital skyrocketed from about 200 up to almost 1,000. Why were these two things put together? Well, people literally knew that the air pollution had increased in London at the same time. You couldn't see through the air. Traffic had to be halted because of the danger of accidents. This was what was called pea soup fog, or smog. So, this was one of the most dramatic examples of a very clear association in time between a specific exposure and a deadly outcome. That's a picture you really don't need to apply a whole lot of math to.
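When the picture is less dramatic than the smog chart, one simple way to ask whether an exposure and an outcome move together in time is a correlation between the two daily series. The sketch below is a hedged illustration in Python, not the analysis that was done in 1952: the function names are hypothetical, the smoke numbers are invented, and the death counts are generated in the code from the previous day's smoke so that the pattern is known in advance. The one-day lag is purely an assumption of the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length series (neither may be constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def lagged_correlation(exposure, outcome, lag_days):
    """Correlate exposure on day t with the outcome on day t + lag_days."""
    if lag_days > 0:
        exposure, outcome = exposure[:-lag_days], outcome[lag_days:]
    return pearson(exposure, outcome)


# Invented daily smoke readings (arbitrary units), NOT the London measurements.
smoke = [1, 1, 2, 6, 9, 8, 4, 2, 1, 1]
# Synthetic deaths: each day's count follows the previous day's smoke by construction.
deaths = [280] + [200 + 80 * s for s in smoke[:-1]]

same_day = lagged_correlation(smoke, deaths, 0)
next_day = lagged_correlation(smoke, deaths, 1)
print(f"same day: {same_day:.2f}, one day later: {next_day:.2f}")
```

Because the synthetic deaths were built from the previous day's smoke, the one-day-lag correlation comes out higher than the same-day one. With real data, a strong correlation in time is still only an association and a starting point for investigation.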
But in fact, biostatistics, or the application of mathematics to biology, allows us to see these kinds of linked events much more clearly when they're more subtle, when they're not as dramatic as the London smog. These methods are mostly quantitative, but we have also developed methods where we can take in people's narratives and extract information from what people tell us they experienced or know. Someone, for example, may say, "I used to see a truck coming from that factory, and they'd drive out in the woods and dump something, and I don't know what it was, but there was some kind of smell." Now, that's very important information. We want to be able to include it, and we do have ways that we can. When we're designing our studies, we also pay attention to biostatistics, so that we make the best study that we possibly can. The real purpose of biostatistics and epidemiology is the same: to understand the connections between disease and potential exposures within a population or among populations. Well, statistics doesn't always have such a great reputation, and this is the first great biostatistician in the world, William Farr. As the story goes, the King of England didn't like much of what Farr had to tell him, so he said, "Here comes Dr. Farr: lies, damned lies, and statistics." Which means that there's always been a war between people who want to hear what's going on and figure it out and others who do not. But the most important aspect of statistics is that it really allows us to challenge our own assumptions and to test them with actual information. Now, on the mapping, the space and time point of view, I showed you a very dramatic example of time, but even simple maps can help us understand things very clearly. For time, we need to know several things. When did the disease occur? This can be difficult. For people who have had the disease, when did they first know they had it?
Ideally, we'd like to have information on a medically confirmed diagnosis, but we don't want to discount the value of personal experience and personal knowledge. It's very useful to make a time chart when you get this information together and see how many cases occurred over time. It's not going to be as dramatic as the London smog, but if there is a time difference or a temporal cluster, it will show up when you make that kind of chart. I want to stress throughout my talks here that I have tremendous respect for community-based detective work, which many of you may be interested in gaining more knowledge and skills to carry out. I say that because I believe very strongly that communities know the most about what is not measured or written down. Now, along with the local knowledge that members of the community have, there are also local newspapers, and there may even be oral histories that have been carried out. Oftentimes, by the way, that's an assignment in schools, for kids to do an oral history of their family, and there can be very important information there. There may be historical records from your town or county within the public library, and then you can go bigger and go to national government sources, which even provide data at a community level. In fact, the US Environmental Protection Agency has a website where you can get data on air quality by census tract, you can get maps of toxic waste sites like the ones I showed you, and you can even get information on toxic waste sites for each town or city. For water quality, some of this information may be available from your local water supply company. So, all of this information, which goes from the more individual oral history all the way to government diagrams and databases, is important for the community-based detective.
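Going back to the time chart for a moment, here is a minimal sketch of how you might build one from the diagnosis years you collect. It is written in Python and assumes nothing more than a list of years; the function names and the example years are made up for illustration. Years with no cases are kept in the chart as empty rows, because the gaps are part of the picture.

```python
from collections import Counter


def cases_per_year(diagnosis_years):
    """Count cases for every year from the first to the last, including empty years."""
    if not diagnosis_years:
        return {}
    counts = Counter(diagnosis_years)
    first, last = min(counts), max(counts)
    return {year: counts.get(year, 0) for year in range(first, last + 1)}


def text_chart(counts):
    """Draw one row per year, with one '#' per case."""
    return "\n".join(f"{year} {'#' * n} ({n})" for year, n in counts.items())


# Invented diagnosis years from a hypothetical community survey.
years = [2001, 2003, 2003, 2004, 2004, 2004, 2004, 2005, 2007]
print(text_chart(cases_per_year(years)))
```

In this made-up example the row for 2004 stands out with four cases. Whether a bump like that is a real temporal cluster still has to be judged against the size of the community and the background rate.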
So, this again is showing you that picture of the toxic waste sites, and I'm giving you some of the websites where you can go yourself and find out where these are located. You can enter your own town, you can enter your zip code, you can enter your street address, and the maps will be drawn to give you information on potentially important exposure sources. At the local and regional level, you may go to your state, because each state has to monitor water and air quality, and again, that may be presented in a mapping format like this, or you may have to draw the map yourself. In addition, there are permits that the Environmental Protection Agency has to issue before anyone can release material that might contain chemicals from a variety of sources. Again, you may have to map that yourself, but you can get the information to do it. Now, at an even higher level, and this is where you have the opportunity to work with your state department of environment or department of health, or with a university department of health, we can even get information on individual risk, if we have this opportunity to collaborate with people such as us at Johns Hopkins. Here, we're looking at exposure in terms of how it presents itself in an individual. These are called biomarkers, the measurement of chemicals within a biologic compartment. As shown in the second line, that could be blood, where I could look at blood lead; it could be urine, where arsenic is present; hair, where mercury is present; fingernails, where a whole range of things show up; saliva; and other things that can be taken from people without tremendous stress on the person donating, say, fingernails or blood. There's a remarkable amount of information there, and it can relate both to the disease and to the exposure. But it's not always as simple to interpret this information as we might want.
For example, some chemicals, like solvents, don't stay very long in our bodies. You may have been exposed to solvents and had some effect on liver function, let's say, and the chemical is no longer in blood or urine or another compartment, but that doesn't mean you weren't exposed. The other point, of course, is that some exposures and the onset of disease are very far apart in time, as with cancer, where it may take up to 20 years after somebody starts smoking for lung cancer to appear. Other chemicals, such as lead, stay in the body for a long time, and that makes it easier for us to use a biomarker. But the biomarker is never going to tell us when the exposure started. It's just going to say this is what's in the body at this point in time. The exposure could have been going on for 30 years or could have happened yesterday, and we don't know what its peaks were, that is, its magnitude, because that's all washed out within the biomarker over time. So, a biomarker is very conclusive information that exposure has occurred, but we can't rule out exposure just because we can't measure it.
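One way to see why a biomarker behaves this way is a simple half-life calculation. The Python sketch below assumes the simplest possible model, where the body level of a chemical halves over every fixed period after a single exposure. That model, the function names, the half-lives, and the levels are all assumptions made up for illustration; real chemicals and real bodies are more complicated, and these are not measured values for solvents or lead.

```python
from math import log2


def remaining_fraction(days_since_exposure, half_life_days):
    """Fraction of a single dose still in the body, assuming simple halving."""
    return 0.5 ** (days_since_exposure / half_life_days)


def days_until_undetectable(initial_level, detection_limit, half_life_days):
    """Days after a single dose until the level falls to the lab's detection limit."""
    if initial_level <= detection_limit:
        return 0.0
    return half_life_days * log2(initial_level / detection_limit)


# Hypothetical short-lived chemical (half-life of 1 day): gone from view within days.
print(days_until_undetectable(80, 10, 1))     # 3.0
# Hypothetical long-lived chemical (half-life of 1,000 days): measurable for years.
print(days_until_undetectable(80, 10, 1000))  # 3000.0

# Two different histories that give the same reading today (half-life of 10 days):
big_dose_a_month_ago = 80 * remaining_fraction(30, 10)
small_dose_today = 10 * remaining_fraction(0, 10)
print(big_dose_a_month_ago, small_dose_today)  # 10.0 10.0
```

The last two lines are the point about timing and peaks: a large exposure a month ago and a small exposure today produce the same number, so the reading alone cannot say when the exposure started or how high it once was.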