Coding versus Noncoding DNA

In prokaryotes, almost all the DNA codes for proteins.
However, in eukaryotes most of the DNA does not code for proteins!
- About 97 % of human DNA does not code for proteins, and has often been called “junk DNA,” although some of it has structural or regulatory roles.
- Therefore, only about 3 % of your DNA codes for proteins!

Tandemly Repetitive DNA (Satellite DNA)

A short sequence of DNA is repeated many times in a row.
For example: GTTACGTTACGTTACGTTAC
These are collectively called tandemly repetitive DNA because they repeat next to one another (i.e. in tandem).
They are also called satellite DNA, because when the DNA is cut up into small pieces, these segments have different densities than the rest of the DNA.
- When the DNA is centrifuged, it forms satellite bands next to the main DNA bands.
- The number of repeats at a given site varies between individuals, and this variation is used in DNA fingerprinting.
Tandemly repetitive DNA makes up about 10–15 % of mammalian DNA.
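
To make the idea of "repeating in tandem" concrete, here is a minimal Python sketch that scans a sequence for a short unit repeated back to back. The function name `find_tandem_repeats` and its parameters (`min_unit`, `max_unit`, `min_copies`) are assumptions made for this illustration, not a standard tool; real repeat finders also tolerate mismatches, which this sketch does not.

```python
def find_tandem_repeats(seq, min_unit=2, max_unit=10, min_copies=3):
    """Return (start, unit, copies) for runs where a short unit repeats back to back."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # ran off the end of the sequence
            copies = 1
            # Count how many times the unit is immediately followed by itself.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            covered = copies * unit_len
            # Keep the run covering the most bases; ties go to the shorter unit.
            if copies >= min_copies and (best is None or covered > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += best[2] * len(best[1])  # skip past the run just reported
        else:
            i += 1
    return hits


# The example sequence from these notes: GTTAC repeated four times.
print(find_tandem_repeats("GTTACGTTACGTTACGTTAC"))  # [(0, 'GTTAC', 4)]
```

Only exact, perfectly adjacent copies are reported, which matches the definition above.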

Shows repeated bands of satellite DNA on a chromosome next to a band of regular DNA.

Location

Tandemly repetitive DNA is found in centromeres and telomeres.
It seems that the tandemly repetitive DNA has a structural role for chromosomes:
- Centromeres are important in the separation of sister chromatids in cell division.
- Telomeres are located at the end of chromosomes and can shorten with each cell division.
  - They contain thousands of repeats of the nucleotide sequence: TTAGGG.

Zooms in on a picture of a telomere where the sequence TTAGGG is repeated.

Tandemly Repetitive DNA Can Cause Diseases

Fragile X Syndrome
- “CGG” is repeated hundreds or even thousands of times creating a “fragile” site on the X chromosome.
- It leads to intellectual disability (see page 274).
Huntington's Disease
- “CAG” repeat causes a protein to have long stretches of the amino acid glutamine.
- Leads to a neurological disorder that results in death.
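
The link between a repeated “CAG” and a long stretch of glutamine can be sketched in a few lines of Python, since CAG is a codon for glutamine (Q). The two alleles below are hypothetical toy sequences, the repeat counts are chosen only for illustration and are not clinical thresholds, and `CODON_TABLE` holds just the codons this example needs.

```python
import re

# Only the codons used in this toy example; a full table has 64 entries.
CODON_TABLE = {"ATG": "M", "CAG": "Q", "CAA": "Q", "CCG": "P", "TAA": "*"}


def translate(coding_seq):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(coding_seq) - 2, 3):
        amino_acid = CODON_TABLE[coding_seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def longest_cag_run(seq):
    """Return the largest number of back-to-back CAG copies in seq."""
    runs = re.findall(r"(?:CAG)+", seq)
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical alleles: start codon, a CAG tract, one proline codon, stop codon.
short_allele = "ATG" + "CAG" * 5 + "CCG" + "TAA"
long_allele = "ATG" + "CAG" * 12 + "CCG" + "TAA"

for name, allele in [("short", short_allele), ("long", long_allele)]:
    print(name, longest_cag_run(allele), translate(allele))
```

Each extra CAG in the gene adds one more Q to the protein, so a longer repeat tract gives a longer run of glutamine.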

Interspersed Repetitive DNA

Interspersed repetitive DNA accounts for 25–40 % of mammalian DNA.
The repeats of interspersed repetitive DNA are not found next to each other as in tandemly repetitive DNA.
- They are scattered randomly throughout the genome.
- The units are hundreds to thousands of base pairs long.
- Copies are similar but not identical to each other.
Famous example: Alu elements
- 300 base pairs long
- Can be transcribed, but the function, if any, is not known.
- Comprise about 5 % of the human genome by the textbook figure used here (more recent estimates are closer to 10 %)!

Tandemly Repetitive DNA versus Interspersed Repetitive DNA

Proportion of mammalian DNA
- Tandemly repetitive DNA: 10–15 %
- Interspersed repetitive DNA: 25–40 %
Length of each repeated unit
- Tandemly repetitive DNA: 1–10 base pairs
- Interspersed repetitive DNA: 100–10 000 base pairs
Relevant numerical characteristics
- Tandemly repetitive DNA: total length of repetitive DNA per site, in base pairs:
  - Regular satellite DNA: 100 000–10 million
  - Minisatellite DNA: 100–100 000
  - Microsatellite DNA: 10–100
- Interspersed repetitive DNA: number of repetitions per genome: 10–1 million
Notes
- Tandemly repetitive DNA: repeated units at a site are usually identical.
- Interspersed repetitive DNA: “copies” are very similar but not identical.

Some Repetitive DNA Sequences are Transcribed (But Don't Make Proteins): Multigene Families

A collection of identical or very similar genes.
The entire family of genes probably evolved from a single ancestral gene.
Famous example: rRNA (ribosomal RNA)
- Ribosomes, the large structures that make proteins, are made from proteins and RNA.
- Four different pieces of rRNA are used to make up a ribosome: 18S, 5.8S, 28S, and 5S.
- It turns out that three of these rRNAs (18S, 5.8S, and 28S) occur in the genome as a gene family that is transcribed together as a single unit.
- The entire multigene family is repeated nearly 300 times in clusters on five different chromosomes!
  - It makes sense to have many repeats of this multigene family because each cell needs many ribosomes for protein synthesis.

Shows the processing of the genes for rRNA.

Figure 14.2, Purves's Life: The Science of Biology, 7th Edition

Pseudogenes

Pseudogenes are DNA sequences that are similar to real genes, but lack the regulatory sequences necessary for gene expression (e.g. promoters).

Transposons and Retrotransposons

Many interspersed repetitive sequences are not stably integrated in the genome; they can move from place to place.
These are called transposable elements, or transposons.
A transposon moves as DNA using the enzyme transposase, whereas a retrotransposon is copied through an RNA intermediate using reverse transcriptase.
They can sometimes disrupt functional genes by inserting into them.
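
As a rough illustration of how a moving element can interrupt a gene, the Python sketch below treats the genome as a plain string. The sequences and the function names `cut_and_paste` and `copy_and_paste` are hypothetical, and the model is a deliberate simplification: it ignores the enzymes and the RNA intermediate entirely and only tracks where the element ends up.

```python
def cut_and_paste(genome, element, target_pos):
    """Excise `element` and reinsert it at `target_pos` (DNA transposon style).

    `target_pos` is an index into the genome after the element has been removed.
    """
    start = genome.find(element)
    if start == -1:
        raise ValueError("element not found in genome")
    without = genome[:start] + genome[start + len(element):]
    return without[:target_pos] + element + without[target_pos:]


def copy_and_paste(genome, element, target_pos):
    """Insert a new copy at `target_pos`, leaving the original (retrotransposon style)."""
    if element not in genome:
        raise ValueError("element not found in genome")
    return genome[:target_pos] + element + genome[target_pos:]


# Hypothetical sequences; lowercase marks the element only to make it easy to spot.
gene = "ATGGCCAAGTAA"
transposon = "ttccggaa"
genome = "CCCC" + transposon + "GGGG" + gene + "CCCC"

moved = cut_and_paste(genome, transposon, target_pos=14)
copied = copy_and_paste(genome, transposon, target_pos=22)

print(gene in genome)            # True: the gene is intact before any movement
print(gene in moved)             # False: the element now sits inside the gene
print(copied.count(transposon))  # 2: the original copy plus the new one
```

In both cases the gene's sequence is no longer contiguous, which is one way an insertion can knock out a protein-coding gene.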

Shows how a transposon can mess up a normal protein-coding gene.

Shows the normal operation of a transposon.

Figure 14.3, Purves's Life: The Science of Biology, 7th Edition; Figure 19.5, page 350, Campbell's Biology, 5th Edition

Evolution of a Multigene Family: Hemoglobin Proteins

Hemoglobin is a protein with quaternary structure, composed of four subunits (each with its own tertiary structure):
- Two α-globins
- Two β-globins
Hypothesis: one ancestral globin gene
- Duplication: the ancestral globin was duplicated, producing two copies in the genome.
- Mutation: each gene mutated, producing two slight variations: alpha and beta.
- Transposition: one gene moved to another chromosome via a transposon.
- Duplications and mutations:
  - The α and β genes undergo further duplications and mutations.
  - More viable variations are produced.
  - Pseudogenes are produced.
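
The duplication and mutation steps of this hypothesis can be mimicked with a small Python simulation. Everything here is a toy assumption: the ancestral sequence is random rather than a real globin gene, the mutation counts are arbitrary, and the helper names `mutate` and `differences` are made up for this sketch.

```python
import random


def mutate(seq, n_mutations, rng):
    """Return a copy of seq with substitutions at n_mutations distinct positions."""
    bases = "ACGT"
    seq = list(seq)
    for pos in rng.sample(range(len(seq)), n_mutations):
        # Always pick a base different from the current one.
        seq[pos] = rng.choice([b for b in bases if b != seq[pos]])
    return "".join(seq)


def differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)  # fixed seed so the run is repeatable
ancestral = "".join(rng.choice("ACGT") for _ in range(60))  # hypothetical ancestral gene

# Duplication: two copies of the ancestral gene now sit in the genome.
copy_1, copy_2 = ancestral, ancestral

# Mutation: each copy accumulates its own changes and becomes a distinct gene.
alpha = mutate(copy_1, 6, rng)
beta = mutate(copy_2, 6, rng)

# A later duplication and mutation within one family gives a closer relative.
alpha_2 = mutate(alpha, 2, rng)

print("alpha vs beta:   ", differences(alpha, beta))
print("alpha vs alpha_2:", differences(alpha, alpha_2))
```

Genes that split apart more recently (here `alpha` and `alpha_2`) differ at fewer positions than genes that split early, which is the pattern the hypothesis predicts within and between the α and β families.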

Demonstrates a hypothesis for the evolution of the two globin gene families.

Figure 19.3, page 349, Campbell's Biology, 8th Edition
