[MUSIC] Welcome to my first lecture in this course. My name's Raymond St. Leger. I'm a professor at the University of Maryland, and I have a broad interest in the sciences, particularly biology. In this and subsequent lectures, I'm going to be telling you about genomic science, and how it's removing limitations to understanding and modifying life. It's brought us to the verge of a revolution with enormous implications. And that revolution, of course, is biotechnology. Its path is focused on deciphering and manipulating DNA, the information system that defines what we all have in common, and what makes each of us different. Now, part of being human is that we ask ourselves fundamental questions about origins. Where do we come from? What is the meaning of life? For most of our history, these questions were philosophical or religious. But then we developed the scientific enterprise, and we could ask questions such as: how does DNA instruct a fertilized egg to create a body with a self-conscious mind? What are the origins of the human species? Is there a genetic basis for disease? We've asked questions like these for centuries. We were asking them 30 years ago, but even then, we didn't have the knowledge and tools to address them. So the leading geneticists of the day decided that we needed a new way of doing biology. The solution they came up with was, of course, the Human Genome Project. It was sold as a wonderful adventure: reading all 3 billion base pairs of sequence in the human genome, and developing tools to read the millions of DNA differences that distinguish people from each other. It was sold as the cure for everything that ails ye, but the sales pitch was really directed at the United States Congress, because the scientists involved needed billions of dollars, which they got in 1990. Thirteen years later, they had a genome.
Well, somehow those 3 billion As, Gs, Cs and Ts contain the full instructions for making a human, one of us. But they're hardly a simple recipe book. The genome was there, but we had little idea how it was used, controlled, or organized, much less how it led to a living, breathing human. The essence of the present age, what it means to be a scientist now, comes down to two unifying concepts: big data and information, which translates, of course, into computers. That's going to be true in physics as much as in biology. We need to search within huge datasets to find the things that actually matter. The sequencing of genomes has revolutionized biology. But the key question now is how we turn that data into understanding without being submerged by it, so that we can actually use big data to understand ourselves better. Thousands of scientists working on a very broad front have been tunneling into the genome to find out what it does and how it makes you you, and not the same as someone else. And, of course, they're also looking for the genetic basis of disease, which, like just about everything in genetics, has turned out to be much more difficult than anyone had supposed. So far we've barely scratched the surface. We're looking at a science project for the next 100 years. So let's start with the basic statistics of our genomes as we understand them today. Before the genome was sequenced, most scientists expected we'd have at least 100,000 genes. It turns out the latest estimate is that we have about 20,687 protein-coding genes. So we actually have fewer genes than many plants. However, as Dr. O'Brien explained, in most cases our genes are broken up into several short stretches of sense, each containing the information to make a piece of a protein. These short stretches of sense are interrupted by long stretches of non-sense. We call the sense bits exons, and we call the non-sense bits introns.
After the gene has been transcribed into a working copy made of RNA, and before the RNA has been translated into protein, the introns are removed from the RNA in a process called splicing, so that all the sense bits with the information to make protein are joined together. It's been crucial for evolution that genes are made up of coding parts, the exons, with intervening sequences, the introns, that don't code for anything. That's because you can mix and match the coding sequences by stitching the exons together in lots of different combinations, massively increasing the number of proteins you can get from a single gene. So genes, then, are often likened to Choose Your Own Adventure books. You can read the exons in different combinations, you can start and finish at different points, and you can leave out chunks altogether. Most of our genes do this extensively. In fact, it's been calculated that the average gene produces six different transcripts in a cell-dependent fashion, which means that genes can produce different proteins in different cells. Now, if you define genes as stretches of DNA that encode protein, then less than 2% of our genome is genes. The other 98% is often called junk DNA. That sounds very demeaning to the vast majority of our DNA, doesn't it? It makes it sound like a great big waste of the energy and raw materials needed to produce it. So why do our cells make so much junk DNA? Well, first of all, bear in mind that junk isn't the same as garbage. Garbage is stuff you put outside each week for the garbage men to pick up. Junk is stuff you put in the attic; you might find some of it useful. Amongst that so-called junk DNA, about 5% of your genome is made up of those introns that allow us to mix and match our exons and multiply our proteins. And about 20% of your genome, also within that so-called junk DNA, consists of regulatory sequences that switch on the genes and get them to do their stuff. Now, Dr. O'Brien has already told you something about gene regulation.
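A toy model may help make splicing and exon mixing concrete. The following is a minimal Python sketch, not a model of the real splicing machinery: it assumes a hypothetical gene written as a list of labelled segments with invented sequences, "splices" out the introns, and then enumerates the transcripts you could get if any subset of the internal exons were skipped, while the first and last exons are always kept and exon order is preserved.

```python
from itertools import combinations

# Hypothetical gene: exons (sense) alternating with introns (non-sense).
# Sequences are invented purely for illustration.
GENE = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTCAG"),
    ("exon", "GATTAC"),
    ("intron", "GTGAGCCCAG"),
    ("exon", "CCGTTA"),
    ("intron", "GTAGGTAAAG"),
    ("exon", "GGCTGA"),
]


def splice(segments):
    """Remove the introns and join the exons together."""
    return "".join(seq for kind, seq in segments if kind == "exon")


def exon_skipping_variants(segments):
    """Enumerate transcripts made by skipping any subset of internal exons.

    Simplifying assumptions: the first and last exons are always kept,
    and the exons stay in their original order.
    """
    exons = [seq for kind, seq in segments if kind == "exon"]
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    variants = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):  # combinations preserve order
            variants.append(first + "".join(kept) + last)
    return variants


print(splice(GENE))                        # ATGGCCGATTACCCGTTAGGCTGA
print(len(exon_skipping_variants(GENE)))   # 4
```

With n internal exons, this simple skipping rule allows 2**n variants, which is one way to picture how a single gene can yield several proteins. Real cells use only some of the possible combinations, so the lecture's figure of about six transcripts per gene is an observed average, not something this sketch predicts.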
Our genomes encode a rich toolkit of regulatory molecules that turn the expression of genes on or off, as appropriate to the occasion. The way these switches work is pretty straightforward to explain, but incredibly intricate in detail. Next to each gene is DNA that doesn't code for protein. Instead, it functions as the gene's on/off switch. We call this switch the promoter. Promoters have docking sites for regulatory molecules, usually proteins. When the regulatory molecules bind to the promoter, the gene next to it is either expressed, meaning it's turned on and makes an RNA transcript, or it's prevented from being expressed. Now, these regulatory molecules aren't permanently bound to the promoter. They can come off and then back on again, and off and on again, so the genes in your cells are switched on and off, and on again, in a very subtle, choreographed dance that maintains the functions of life. All the cells in your body have the same genes, but different cells produce different regulators that bind to different genes. So a liver cell is only going to produce liver proteins, and a kidney cell is only going to produce the proteins necessary to make it a kidney cell. So, very crudely, the switch on the wall is the promoter, your finger is the regulator, and gene expression is the light turning on and off. As Dr. O'Brien has mentioned, as well as promoters, your genome also has about 400,000 enhancers that can influence the activity of genes, sometimes across great distances. Enhancers also have docking sites for regulatory molecules, again usually proteins. Then there are other types of proteins on the DNA that can physically move the enhancer DNA, looping it so that the enhancer and its regulators are brought into contact with a promoter. And just as with promoters, different cells produce different regulatory proteins for the enhancers.
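The light-switch analogy can be turned into a small toy model. The Python sketch below is only an illustration under simplifying assumptions: the gene names, regulator names, and cell types are all hypothetical, binding is treated as simply present or absent, a gene counts as "on" when one of its activators is present and none of its repressors is, and enhancers are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Promoter:
    """Docking sites next to a gene (regulator names are hypothetical)."""
    activators: set = field(default_factory=set)
    repressors: set = field(default_factory=set)


# Hypothetical genes and the regulators their promoters respond to.
PROMOTERS = {
    "liver_gene": Promoter(activators={"LiverFactor"}),
    "kidney_gene": Promoter(activators={"KidneyFactor"}),
    "housekeeping_gene": Promoter(activators={"GeneralFactor"}),
    "growth_gene": Promoter(activators={"GeneralFactor"}, repressors={"Silencer"}),
}

# Hypothetical cell types and the regulatory proteins each one makes.
CELL_REGULATORS = {
    "liver": {"LiverFactor", "GeneralFactor", "Silencer"},
    "kidney": {"KidneyFactor", "GeneralFactor"},
}


def is_expressed(promoter, regulators):
    """On if a matching activator is present and no matching repressor is."""
    activated = bool(promoter.activators & regulators)
    repressed = bool(promoter.repressors & regulators)
    return activated and not repressed


def expressed_genes(cell_type):
    """Genes switched on, given the regulators this cell type produces."""
    regulators = CELL_REGULATORS[cell_type]
    return sorted(g for g, p in PROMOTERS.items() if is_expressed(p, regulators))


print(expressed_genes("liver"))   # ['housekeeping_gene', 'liver_gene']
print(expressed_genes("kidney"))  # ['growth_gene', 'housekeeping_gene', 'kidney_gene']
```

Both cell types carry exactly the same four genes; only the set of regulators differs, and that alone changes which genes are switched on.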
Perhaps cell A creates a protein that binds to one of these enhancers, and that protein gets the DNA to fold so that the enhancer interacts with the promoter and causes high-level expression of a particular gene. Another cell, cell B, won't make that regulatory protein, and so it won't express the same gene. It's very, very complicated; that's what I'm trying to get across to you here. It's an extremely complicated process. Think of regulatory proteins attaching to and detaching from thousands of promoters and enhancers in the cell, with different regulators binding to different enhancers and promoters in different tissues, and you'll get some idea of the complicated networks of gene regulation you'll find in a typical cell. That's why so much of the DNA in a cell has to be concerned with this dance of gene regulation. So, to remind you again, less than 2% of our genome consists of protein-coding genes, but 20% of our genome is all about regulating those genes. On top of this, we have about 18,400 so-called RNA genes. These are genes that produce RNA that doesn't get translated into protein. About 13,000 of these specify mysterious molecules called long non-coding RNAs, which may help control the twists and turns of the chromosomes that give them a complex 3D shape and structure. Others encode little microRNAs that help regulate protein-coding genes through a process called RNA interference, or RNAi. What this involves is that certain RNA genes produce little microRNA transcripts that are complementary to messenger RNAs the cell has previously made. The microRNA binds to a particular messenger RNA, and in that binding process it causes the destruction of the messenger. It's a way that cells have evolved to reduce the amount of protein being made from certain genes. But it sounds like a bizarre way to regulate gene expression: you let the gene be expressed, and then you knock down the messenger RNA later on.
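The complementarity at the heart of RNAi can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not a target-prediction tool: it borrows the common simplification that a microRNA recognizes its target through a short "seed" region (nucleotides 2 to 8), treats any messenger RNA containing the matching complementary site as destroyed, and uses made-up sequences and a hypothetical `rnai_filter` helper.

```python
# Watson-Crick pairing for RNA bases (U takes the place of DNA's T).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the 5'->3' sequence that would base-pair with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))


def seed_site(mirna):
    """Target site matching the miRNA 'seed' (nucleotides 2-8), a simplified rule."""
    return reverse_complement(mirna[1:8])


def rnai_filter(mirna, mrnas):
    """Keep only the messenger RNAs with no seed site; the rest are 'destroyed'."""
    site = seed_site(mirna)
    return {name: seq for name, seq in mrnas.items() if site not in seq}


# Hypothetical sequences, invented for illustration only.
MIRNA = "AUGCAGUCAAGGCUUAGCAUCG"
MRNAS = {
    "target_mRNA": "AUGGCUCCAAGACUGCAUUAAAGC",
    "other_mRNA": "AUGUUUCCCGGGAAAUAGCCC",
}

print(seed_site(MIRNA))                   # GACUGCA
print(sorted(rnai_filter(MIRNA, MRNAS)))  # ['other_mRNA']
```

Real targeting depends on more than a seed match, and a bound messenger isn't always outright destroyed; the sketch only captures the idea that base-pairing picks out which messengers get knocked down.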
Well, evolution's a funny old process. It works with whatever you've got. Anyway, these unexpected discoveries of RNA genes and RNAi came out of a group of scientists looking at changes in the colors of flowers. They basically discovered a whole new biology in the genome, which is not about DNA but about the exquisite control of gene expression by RNAs. And it wouldn't have been discovered when it was if it weren't for those guys working on flower color. There are numerous examples like this, where work that might seem airy-fairy and esoteric, with no real-world application, provides fundamental insights into our lives. You end up working on a research project and accidentally discover something completely unexpected that no one had predicted. That's the beauty and the wonder of science. Now, most individual scientists believe that science is all about curiosity, but from a broader historical perspective, it's also been about solving problems in our world. That's the only way you can get the funding and resources to support what you're doing. Science will always strive to answer the questions that society wants answered. A lot of scientists see problems arising from the move away from fundamental basic research towards research that only gets funded if it can be directly seen to produce a product, such as a cure for a disease. Companies, in particular, are interested in short-term profits, and there's a lack of interest in speculative long-term research. Some scientists today are tempted to hype their work and exaggerate its potential, as that's the only way of getting investment: venture capital from business folk who get excited about short-term profits. That might be dangerous, particularly when we look at biotechnology in a few lectures' time, as it means we've lost some of the detachment of science. Okay, so anyway, back to what's in our genomes.
Approximately 50% of our genome is composed of repetitive DNA, which basically means chunks of DNA repeated over and over again. Now, that really does sound like it might be junk in the worst possible meaning of the word, right? Well, even some of this might be useful, because it's mostly retrotransposons. Retrotransposons are the descendants of viruses called retroviruses that inserted their genetic material into us and, over millions of years, acquired lots of mutations that streamlined them down into small genetic parasites, hitchhiking through the generations in our genomes. People are frequently surprised that genomes are so infested with parasites. There's no reason to be. You've got fleas on your head, you've got tapeworms in your gut, you've got malaria in your blood, you've got bacteria almost everywhere. So why wouldn't your DNA be as infested as any other part of your body? Even these parasites can sometimes be subverted, or perhaps they've just evolved toward where their best interests lie. They're trapped in your genome: if you go down, if you die, they go down with you. The youngest of these sequences remain parasites, but many of the older ones, the genomic veterans, may have some functional use for you. Some, for example, contain sequences to which proteins bind to influence the activity of nearby genes. According to a massive international project called ENCODE, the Encyclopedia of DNA Elements, 80% of our DNA does something functional. Now, the ENCODE scientists have a loose notion of functionality that has inspired a lot of controversy, so I've provided a link to a blog post by Ed Yong, who discusses ENCODE and some of the criticisms. But at the very least, ENCODE provides a very useful parts list for the genome. ENCODE's results have been published in 30 central papers along with a slew of secondary articles, all of it freely available to the public. They've also got an interactive ENCODE portal website.
Let's go back to the genes for a moment, and have a look at an example of a bog-standard, run-of-the-mill chromosome: Chromosome 6. It's called Chromosome 6 because it's your 6th biggest. It's got 2,190 gene-like structures, of which 1,557 are functional genes. Here, functional means that they produce RNA. 772 of those genes had been previously described. So until we had the genome sequence, we didn't know about half of our genes, and obviously we didn't know what they did, because we didn't even know we had them. So we suddenly found out about a lot of previously unsuspected ways humans do their biology. 633 of the 2,190 gene-like structures are pseudogenes. Pseudogenes are broken genes, genes so mutated they're no longer functional. You have an estimated 11,224 pseudogenes in your whole genome. The presence of pseudogenes means that the genome is not just a parts list; it's a history book. The pseudogenes tell us what used to be important to us, what we used to be able to do. We haven't always been human, have we? Not as we define it. Among the pseudogenes, we find the relics of genes for our furry pelts, for a long, extended gut able to digest lots of vegetation, for tails, and for big jaws like an adult ape's. One of these pseudogenes used to make vitamin C. Almost all mammals, except higher primates, can make their own vitamin C; we can't. What does that tell us about the ancestor who sustained this mutation? It must have been a fruit eater. Because fruit contains vitamin C, the mutation wasn't a disadvantage, so it wouldn't have been selected against in a fruit eater. It would have been selected against in anything else. If this mutation had happened in a cat, it would have been fatal. You'd have one dead pussy, because you'd get very little vitamin C if you eat meat. [SOUND]
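The Chromosome 6 figures quoted in the lecture hang together arithmetically. Here is a quick Python check using only the lecture's numbers, assuming, as the lecture implies, that the 772 previously described genes are counted among the 1,557 functional ones.

```python
gene_like = 2_190    # gene-like structures on Chromosome 6
functional = 1_557   # functional genes (they produce RNA)
pseudogenes = 633    # broken genes, no longer functional
described = 772      # functional genes described before the genome sequence

# Every gene-like structure is either a functional gene or a pseudogene.
assert functional + pseudogenes == gene_like

# Share of functional genes already known before sequencing ("about half").
print(f"{described / functional:.1%} of functional genes were already described")
# 49.6% of functional genes were already described

# Share of gene-like structures that are pseudogenes.
print(f"{pseudogenes / gene_like:.1%} of gene-like structures are pseudogenes")
# 28.9% of gene-like structures are pseudogenes
```

So roughly half of the functional genes on this one chromosome were new to science once the sequence was in hand, and a little under a third of its gene-like structures are pseudogenes.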