Redefining Genes

Will new revelations about RNA force us to rethink how our past affects future evolution?

by PHILIP BALL Posted January 14, 2008 09:15 AM


Illustration by Jacob Magraw

In the campus gardens of the Cold Spring Harbor Laboratory, the center for genetics research on Long Island, stands a 15-foot molecule. Made by the architect Charles Jencks, it is an aluminum sculpture of DNA whose structure James Watson, the laboratory's president, deduced with Francis Crick in 1953. Its twinned spiral strands have now come to represent nothing less than life itself: Within these helices, so we are told, lie all the instructions for making an organism, passed from one generation to the next by copying the DNA blueprint.

But over the past year or so, it has begun to look increasingly as though biologists may need to reconsider the role of their favorite molecule. For nearly 50 years, the central dogma of biology has been that genetic information is contained within DNA and is passed by rote transcription through RNA to make proteins. Tiny changes in the information content of the underlying DNA are what then drive evolution. But this information may not be the sole determinant of biological identity. Indeed, it's becoming clear that we do not even know what 'genetic information' means any more—certainly it's not a simple, linear sequence of biochemical 'characters' that define a gene. Even evolution might not be driven solely by the appearance of random mutations in DNA that are inherited by subsequent generations, essentially as Darwin supposed. The central dogma is being eroded, and it now appears as if DNA's cousin, the humble intermediary RNA, plays at least an equal role in genetics and the evolution of the species.

Who says so? Consider the work of Minoo Rassoulzadegan at the French National Institute for Health and Medical Research's laboratory in Nice. Last year she showed that mice could inherit white patches on their tails—normally the result of a mutation in a gene called Kit—even if they lacked the mutant gene for this trait. The white patches appeared because RNA molecules, which passed from parent to offspring after accumulating in sperm cells, overrode the demands of DNA.

Or take the work of David Haussler and his colleagues at the University of California, Santa Cruz. They have shown that a gene called HAR1F, which is probably responsible for some key differences between human and chimpanzee brains, doesn't even make a protein, only an RNA molecule. In other words, the human brain may have evolved through the guidance of RNA.

These and a host of other recent findings are rewriting the textbooks of molecular biology. They are beginning to show not only that RNA is more fundamental to genetics than once believed, but also that it can directly affect evolution and elucidate the differences between species. The result is a story that looks a lot messier, but potentially a lot more interesting, than anyone ever guessed.

The old genetic picture seemed so beautifully simple—indeed, probably too beautiful to be true. It began with the identification by the Austro-Hungarian monk Gregor Mendel of discrete, particle-like units that are responsible for the inheritance of traits from one generation to the next. In Mendel's scheme, you either picked up a trait from one parent or you didn't; there was no blending or averaging from both parents. These units became known as genes, and were found to reside on the chromosomes. In 1944 Oswald Avery and his coworkers found that genes are made of DNA. Nine years later Watson and Crick discovered that genes encode information as a sequence of the four different chemical building blocks of DNA, strung along the double strands like beads. From this the central dogma was born.

But we've slowly learned that genetics is not so simple. For one thing, decoding the human genome—the sum total genetic material in the chromosomes—showed that most (98 percent) of our DNA doesn't consist of protein-encoding genes at all. Some of this non-coding DNA comprises regulatory sequences, to which proteins or RNA bind to control gene transcription, ultimately determining which RNA and proteins are produced. Most is a complete mystery.

That itself didn't seem to challenge the central dogma or the notion that genetics is all about DNA. Rather, this cozy picture has been altered gradually by a series of recent discoveries, beginning with that of so-called microRNAs. Some RNA is not transcribed as a mere messenger for protein synthesis—it's the RNA molecule itself that is the end product, and that plays a key role in controlling events in the cell. In other words, some nominally non-coding DNA does encode important actors in cell biology—those made from RNA, not protein.

HAR1 is an example. This snippet of primate DNA was discovered by comparing the genome sequences of humans and chimpanzees to look for regions that have diverged significantly since we shared a common ancestor. Haussler and his colleagues showed last year that HAR1 appears in a gene (HAR1F) expressed in neurons during a crucial period of 'brain wiring' in the neocortex, which makes it look as though it might be one of the key genetic factors that distinguish human brains from those of other primates. Yet because HAR1 does not make any proteins, the implication is that it's the RNA transcript that somehow controls brain development. Geneticist Gerton Lunter of the University of Oxford thinks that to understand the molecular basis of evolution, "we should stop looking at proteins and start looking at non-coding DNA."

Ronald Plasterk of the University of Utrecht seems to agree. He and his team have scanned the RNA extracted from brain cells of both humans and chimps and have found around 450 new microRNAs, more than doubling the number previously known. Some of these microRNAs are found in other organisms too, but many are not, suggesting that they have arisen relatively recently in evolutionary history. If they have roles in gene regulation, then it may be that the differences between human and chimp brains aren't so much a matter of differences in genes but in the ways the genes are expressed. Plasterk and colleagues think that organisms might keep a pool of microRNAs on hand as an evolutionary 'playground,' enabling differences between species to be established without having to alter the genomes.

Whatever their function, humans can't do without microRNAs. These molecules can interfere with gene expression by binding to complementary RNA transcripts of genes, preventing the transcripts from being turned into proteins. This is called RNA interference, and it provides a way of turning genes off when they are not required. Such gene 'silencing' isn't necessarily confined to the cell that contains the interfering RNA: it can spread to other cells by transfer of RNA, and can even pass down through generations when it accumulates in sperm and eggs.

RNAs that control genes are one thing; but RNAs that rewrite inheritance, as in the work of Rassoulzadegan, are quite another. Yet in 2005 Robert Pruitt and his coworkers at Purdue University discovered another example of RNA editing the putative 'book of life.' They found that plants of the cress Arabidopsis could carry the non-mutant form of a gene called HOTHEAD even if both parent plants had only the mutant form (which causes some plant organs to fuse together). It was as though Arabidopsis had found a way to correct the mistakes of the previous generation. Pruitt suspects that the non-mutant gene may be maintained by mutant plants and passed to offspring in the form of RNA, which can then be 'reverse-transcribed' back into the genome. This kind of inheritance is quite different from that described by Mendel, and seemingly contradicts our straightforward notion of Darwinian evolution.

Pruitt argues that it might be useful for organisms to carry a cache of non-chromosomal genetic information 'remembered' from past generations in order to offset the problems associated with the accumulation of bad genes through inbreeding. The unexplained build-up of RNA in human sperm suggests that we might inherit genetic controls outside of the chromosomes, too. RNA may be guiding our future evolution, through our past.

Another deep insight into the importance of RNA came late last year, when the first results were announced of an international project called the Encyclopedia of DNA Elements (ENCODE), which set out to look in detail at just one percent of the entire human genome (a total of about 30 million DNA bases). It is widely thought that much of our genome is non-coding junk acquired over the course of evolution that no longer serves any useful purpose, like so many dead files forgotten and never consulted on a computer hard drive.

ENCODE showed otherwise. Nearly all of the human genome is transcribed into RNA; it is all, in this sense, active information. We just don't know what it all does. Some of the non-coding transcripts, such as gene-silencing microRNA, have well-defined functions, but many, simply called transcripts of unknown function (TUFs), do not fall cleanly into the categories of coding or non-coding. They may contain bits of protein sequences, and it is possible that they do indeed serve as templates for small proteins. It's not yet clear just how well these TUFs are conserved from one species to another, as one might expect them to be if they have an important role in cellular processes. The fact is that TUFs are baffling: a clue that there's something profoundly lacking in our current picture of genomics, and that somehow RNA is involved.

The ENCODE project also showed that many genes don't seem to be transcribed as expected, linearly from start to finish. About 1-in-20 of the products transcribed are fused from more than one gene, while some transcripts seem to pick up bits and pieces from widely separated parts of the genome. If DNA were a book, it would be unreadable: words would run into one another or be fragmented throughout the text. It is as if our classifications of the genome in terms of genes, protein-coding sequences, junk, and so on, are simply ignored by the transcription machinery; in other words, it's as if we've misinterpreted the language of inheritance to start with.

One of the consequences of this new view of genetics is that it is forcing some rethinking of what a 'gene' actually is. At the very least, it seems to be a fuzzy-edged entity, and not the sharply defined 'particle' that Mendel's work implied. But the implications go deeper than that, because they erode the primacy of DNA itself. The development and evolution of all organisms must be regarded as an intricate collaboration between the cell's two nucleic acids, DNA and RNA. If that's so, it is time to stop talking of the RNA World as something that happened billions of years ago on Earth, when RNA is believed to have been required to perform some of the functions of both DNA and proteins, serving as both information carrier and proto-enzyme. We are living in that world now.

Even this picture of a dual role for RNA, though, perhaps imposes a modern prejudice on the whole issue, which says that biomolecules are either data banks or machines. To judge from what we know now, both the implicit hierarchy of the central dogma and the prescriptive rhetoric of a DNA 'book of life' may be misplaced. The time has come for a new definition of the gene that includes a more fundamental role for RNA. Tidy ideas are useful in science, but we need to know when to abandon them, as when both Newtonian mechanics and the solar-system model of the atom were replaced by the subtler world of quantum physics. Molecular and evolutionary biology appears to be poised for a revolution of that order.