Molecular Cell Biology
ABOUT THE AUTHORS
HARVEY LODISH is Professor of Biology and Professor of Biological Engineering at the Massachusetts Institute of Technology and a Founding Member of the Whitehead Institute for Biomedical Research. Dr. Lodish is also a member of the National Academy of Sciences and the American Academy of Arts and Sciences and was President (2004) of the American Society for Cell Biology. He is well known for his work on cell-membrane physiology, particularly the biosynthesis of many cell-surface proteins, and on the cloning and functional analysis of several cell-surface receptor proteins, such as the erythropoietin and TGF-β receptors. His laboratory also studies long noncoding RNAs and microRNAs that regulate the development and function of hematopoietic cells and adipocytes. Dr. Lodish teaches undergraduate and graduate courses in cell biology and biotechnology. Photo credit: John Soares.
ARNOLD BERK holds the UCLA Presidential Chair in Molecular Cell Biology in the Department of Microbiology, Immunology, and Molecular Genetics and is a member of the Molecular Biology Institute at the University of California, Los Angeles. Dr. Berk is also a fellow of the American Academy of Arts and Sciences. He is one of the discoverers of RNA splicing and of mechanisms for gene control in viruses. His laboratory studies the molecular interactions that regulate transcription initiation in mammalian cells, focusing in particular on adenovirus regulatory proteins. He teaches an advanced undergraduate course in cell biology of the nucleus and a graduate course in biochemistry. Photo credit: Penny Jennings/UCLA Department of Chemistry & Biochemistry.
CHRIS A. KAISER is the Amgen Inc. Professor in the Department of Biology at the Massachusetts Institute of Technology. He is also a former Department Head and former Provost. His laboratory uses genetic and cell biological methods to understand how newly synthesized membrane and secretory proteins are folded and sorted in the compartments of the secretory pathway. Dr. Kaiser is recognized as a top undergraduate educator at MIT, where he has taught genetics to undergraduates for many years. Photo credit: Chris Kaiser.
MONTY KRIEGER is the Whitehead Professor in the Department of Biology at the Massachusetts Institute of Technology and a Senior Associate Member of the Broad Institute of MIT and Harvard. Dr. Krieger is also a member of the National Academy of Sciences. For his innovative teaching of undergraduate biology and human physiology as well as graduate cell biology courses, he has received numerous awards. His laboratory has made contributions to our understanding of membrane trafficking through the Golgi apparatus and has cloned and characterized receptor proteins important for pathogen recognition and the movement of cholesterol into and out of cells, including the HDL receptor. Photo credit: Monty Krieger.
ANTHONY BRETSCHER is Professor of Cell Biology at Cornell University and a member of the Weill Institute for Cell and Molecular Biology. His laboratory is well known for identifying and characterizing new components of the actin cytoskeleton and elucidating the biological functions of those components in relation to cell polarity and membrane traffic. For this work, his laboratory exploits biochemical, genetic, and cell biological approaches in two model systems, vertebrate epithelial cells and the budding yeast. Dr. Bretscher teaches cell biology to undergraduates at Cornell University. Photo credit: Anthony Bretscher.
HIDDE PLOEGH is Professor of Biology at the Massachusetts Institute of Technology and a member of the Whitehead Institute for Biomedical Research. One of the world’s leading researchers in immune-system behavior, Dr. Ploegh studies the various tactics that viruses employ to evade our immune responses and the ways our immune system distinguishes friend from foe. Dr. Ploegh teaches immunology to undergraduate students at Harvard University and MIT. Photo credit: Hidde Ploegh.
ANGELIKA AMON is Professor of Biology at the Massachusetts Institute of Technology, a member of the Koch Institute for Integrative Cancer Research, and Investigator at the Howard Hughes Medical Institute. She is also a member of the National Academy of Sciences. Her laboratory studies the molecular mechanisms that govern chromosome segregation during mitosis and meiosis and the consequences—aneuploidy—when these mechanisms fail during normal cell proliferation and cancer development. Dr. Amon teaches undergraduate and graduate courses in cell biology and genetics. Photo credit: Pamela DiFraia/ Koch Institute/MIT.
KELSEY C. MARTIN is Professor of Biological Chemistry and Psychiatry and interim Dean of the David Geffen School of Medicine at the University of California, Los Angeles. She is the former Chair of the Biological Chemistry Department. Her laboratory studies the ways in which experience changes connections between neurons in the brain to store long-term memories—a process known as synaptic plasticity. She has made important contributions to elucidating the molecular and cell biological mechanisms that underlie this process. Dr. Martin teaches basic principles of neuroscience to undergraduates, graduate students, dental students, and medical students. Photo credit: Phuong Pham.
Molecular Cell Biology EIGHTH EDITION
Harvey Lodish Arnold Berk Chris A. Kaiser Monty Krieger Anthony Bretscher Hidde Ploegh Angelika Amon Kelsey C. Martin
New York
Publisher: Katherine Ahr Parker
Acquisitions Editor: Beth Cole
Developmental Editors: Erica Champion, Heather Moffat
Editorial Assistants: Nandini Ahuja, Abigail Fagan
Executive Marketing Manager: Will Moore
Senior Project Editor: Elizabeth Geller
Design Manager: Blake Logan
Text Designer: Patrice Sheridan
Cover Design: Blake Logan
Illustration Coordinator: Janice Donnola
Art Development Editor: H. Adam Steinberg, Art for Science
Permissions Manager: Jennifer MacMillan
Photo Editor: Sheena Goldstein
Photo Researcher: Teri Stratford
Text Permissions: Felicia Ruocco, Hilary Newman
Media and Supplements Editors: Amy Thorne, Kathleen Wisneski
Senior Media Producer: Chris Efstratiou
Senior Production Supervisor: Paul Rohloff
Composition: codeMantra
Printing and Binding: RR Donnelley
Cover Image: Dr. Tomas Kirchhausen and Dr. Lei Lu
ABOUT THE COVER: Imaging of the intracellular organelles of a live human HeLa cell shows the dramatic morphological changes that accompany the process of cell division. The membrane of the endoplasmic reticulum (ER) is labeled green by a fluorescently tagged component of the translocon (GFP-Sec61β), and chromatin is labeled red by a fluorescently tagged histone (H2B-mRFP). Front: An interphase cell showing uncondensed chromatin filling the nucleus, with the ER as a reticulum of cisternae surrounding the nucleus and interconnected with lace-like tubules at the cell periphery. Back: Prior to cell division the chromatin condenses to reveal the worm-like structure of individual chromosomes, the nuclear envelope breaks down, and the ER condenses into an array of cisternae surrounding the condensed chromosomes. As cell division proceeds the replicated chromosomes will segregate equally into two daughter cells, nuclear envelopes will form in the daughter cells, and the ER will return to its characteristic reticular organization. Cover photo: Dr. Tomas Kirchhausen & Dr. Lei Lu.
Library of Congress Control Number: 2015957295 ISBN-13: 978-1-4641-8339-3 ISBN-10: 1-4641-8339-2 © 2016, 2013, 2008, 2004 by W. H. Freeman and Company All rights reserved. Printed in the United States of America First printing W. H. Freeman and Company One New York Plaza, Suite 4500, New York, NY 10004-1562 www.macmillanhighered.com
TO OUR STUDENTS AND TO OUR TEACHERS, from whom we continue to learn, AND TO OUR FAMILIES, for their support, encouragement, and love
PREFACE
In writing the eighth edition of Molecular Cell Biology, we have incorporated many of the spectacular advances made over the past four years in biomedical science, driven in part by new experimental technologies that have revolutionized many fields. Fast techniques for sequencing DNA, allied with efficient methods to generate and study mutations in model organisms and to map disease-causing mutations in humans, have provided a basic understanding of the functions of many cellular components, including hundreds of human genes that affect diseases such as diabetes and cancer. For example, advances in genomics and bioinformatics have uncovered thousands of novel long noncoding RNAs that regulate gene expression, and have generated insights into and potential therapies for many human diseases. Powerful genome editing technologies have led to an unprecedented understanding of gene regulation and function in many types of living organisms. Advances in mass spectrometry and cryoelectron microscopy have enabled dynamic cell processes to be visualized in spectacular detail, providing deep insight into both the structure and the function of biological molecules, post-translational modifications, multiprotein complexes, and organelles. Studies of specific nerve cells in live organisms have been advanced by optogenetic technologies. Advances in stem-cell technology have come from studies of the role of stem cells in plant development and of regeneration in planaria. Exploring the most current developments in the field is always a priority in writing a new edition, but it is also important to us to communicate the basics of cell biology clearly by stripping away as much extraneous detail as possible to focus attention on the fundamental concepts of cell biology. To this end, in addition to introducing new discoveries and technologies, we have streamlined and reorganized several chapters to clarify processes and concepts for students.
New Co-Author, Kelsey C. Martin
The new edition of MCB introduces a new member to our author team, leading neuroscience researcher and educator Kelsey C. Martin of the University of California, Los Angeles. Dr. Martin is Professor of Biological Chemistry and Psychiatry and interim Dean of the David Geffen School of Medicine at UCLA. Her laboratory uses Aplysia and mouse models to understand the cell and molecular biology of long-term memory formation. Her group has made important contributions to elucidating the molecular and cell biological mechanisms by which experience changes connections between neurons in the brain to store long-term memories, a process known as synaptic plasticity. Dr. Martin received her undergraduate degree in English and American Language and Literature at Harvard University. After serving as a Peace Corps volunteer in the Democratic Republic of the Congo, she earned an MD and PhD at Yale University. She teaches basic neurobiology to undergraduate, graduate, dental, and medical students.
Revised, Cutting-Edge Content
The eighth edition of Molecular Cell Biology includes new and improved chapters:
• "Molecules, Cells, and Model Organisms" (Chapter 1) is an improved and expanded introduction to cell biology. It retains the overviews of evolution, molecules, different forms of life, and model organisms used to study cell biology found in previous editions. In this edition, it also includes a survey of eukaryotic organelles, which was previously found in Chapter 9.
• "Culturing and Visualizing Cells" (Chapter 4) has been moved forward (previously Chapter 9) as the techniques used to study cells become ever more important. Light-sheet microscopy, super-resolution microscopy, and two-photon excitation microscopy have been added to bring this chapter up to date.
• All aspects of mitochondrial and chloroplast structure and function have been collected in "Cellular Energetics" (Chapter 12). This chapter now begins with the structure of the mitochondrion, including its endosymbiotic origin and organelle genome (previously in Chapter 6). The chapter now discusses the role of mitochondria-associated membranes (MAMs) and communication between mitochondria and the rest of the cell.
• Cell signaling has been reframed to improve student accessibility. "Signal Transduction and G Protein–Coupled Receptors" (Chapter 15) begins with an overview of the concepts of cell signaling and methods for studying it, followed by examples of G protein–coupled receptors performing multiple roles in different cells. "Signaling Pathways That Control Gene Expression" (Chapter 16) now focuses on gene expression, beginning with a new discussion of Smads. Further examples cover the major signaling pathways that students will encounter in cellular metabolism, protein degradation, and cellular differentiation. Of particular interest is a new section on Wnt and Notch signaling pathways controlling stem-cell differentiation in planaria. The chapter ends by describing how signaling pathways are integrated to form a cellular response in insulin and glucagon control of glucose metabolism.
• Our new co-author, Kelsey C. Martin, has extensively revised and updated "Cells of the Nervous System" (Chapter 22) to include several new developments in the field. Optogenetics, a technique that uses channelrhodopsins and light to perturb the membrane potential of a cell, can be used in live animals to link neural pathways with behavior. The formation and pruning of neural pathways in the central nervous system is under active investigation, and a new discussion of signals that govern these processes focuses on the cell-cell contacts involved. This discussion leads to an entirely new section on learning and memory, which explores the signals and molecular mechanisms underlying synaptic plasticity.
Increased Clarity, Improved Pedagogy
As experienced teachers of both undergraduate and graduate students, we are always striving to improve student understanding. Being able to visualize a molecule in action can have a profound effect on a student's grasp of the molecular processes within a cell. With this in mind, we have updated many of the molecular models for increased clarity and added models where they can deepen student understanding. From the precise fit required for tRNA charging, to the conservation of ribosome structure, to the dynamic strength of tropomyosin and troponin in muscle contraction, these figures communicate the complex details of molecular structure that cannot be conveyed in schematic diagrams alone. In conjunction with these new models, their schematic icons have been revised to more accurately represent them, allowing students a smooth transition between the molecular details of a structure and its function in the cell.
New Discoveries, New Methodologies
• Model organisms Chlamydomonas reinhardtii (for study of flagella, chloroplast formation, photosynthesis, and phototaxis) and Plasmodium falciparum (novel organelles and a complex life cycle) (Ch. 1)
• Intrinsically disordered proteins (Ch. 3)
• Chaperone-guided folding and updated chaperone structures (Ch. 3)
• Unfolded proteins and the amyloid state and disease (Ch. 3)
• Hydrogen/deuterium exchange mass spectrometry (HXMS) (Ch. 3)
• Phosphoproteomics (Ch. 3)
• Super-resolution microscopy (Ch. 4)
• Two-photon excitation microscopy (Ch. 4)
• Light-sheet microscopy (Ch. 4)
FIGURE 4-21 Two-photon excitation microscopy allows deep penetration for intravital imaging. (a) In conventional point-scanning confocal microscopy, absorption of a single photon (488 nm) raises an electron from the ground state to the excited state, and an emission photon (507 nm) is released. In two-photon excitation, two lower-energy photons (960 nm each) arrive almost simultaneously and together induce the electron to jump to the excited state. (b) Two-photon microscopy can be used to observe cells up to 1 mm deep within a living animal immobilized on the microscope stage. (c) Neurons in a lobster were imaged using two-photon excitation microscopy. [Part (c) unpublished data from Peter Kloppenburg and Warren R. Zipfel.]
FIGURE 5-19 (a) Translating nucleic acid sequence into amino acid sequence requires two steps. Step 1: An aminoacyl-tRNA synthetase couples a specific amino acid to its corresponding tRNA. Step 2: The anticodon base-pairs with a codon in the mRNA specifying that amino acid. (b) Molecular model of the human mitochondrial aminoacyl-tRNA synthetase for Phe in complex with tRNAPhe.
• Three-dimensional culture matrices and 3D printing (Ch. 4)
• Ribosome structural comparison across domains shows conserved core (Ch. 5)
• CRISPR–Cas9 system in bacteria and its application in genomic editing (Ch. 6)
• Chromosome conformation capture techniques reveal topological domains in chromosome territories within the nucleus (Ch. 8)
• Mapping of DNase I hypersensitive sites reveals cell developmental history (Ch. 9)
• Long noncoding RNAs involved in X inactivation in mammals (Ch. 9)
• ENCODE databases (Ch. 9)
• Improved discussion of mRNA degradation pathways and RNA surveillance in the cytoplasm (Ch. 10)
• Nuclear bodies: P bodies, Cajal bodies, histone locus bodies, speckles, paraspeckles, and PML nuclear bodies (Ch. 10)
• GLUT1 molecular model and transport cycle (Ch. 11)
• Expanded discussion of the pathway for import of PTS1-bearing proteins into the peroxisomal matrix (Ch. 13)
• Expanded discussion of Rab proteins and their role in vesicle fusion with target membranes (Ch. 14)
• Human G protein–coupled receptors of pharmaceutical importance (Ch. 15)
• The role of Smads in chromatin modification (Ch. 16)
FIGURE 6-43b Cas9 uses a guide RNA to identify and cleave a specific DNA sequence.
• Wnt concentration gradients in planarian development and regeneration (Ch. 16)
• Inflammatory hormones in adipose cell function and obesity (Ch. 16)
• Regulation of insulin and glucagon function in control of blood glucose (Ch. 16)
• Use of troponins as an indicator of the severity of a heart attack (Ch. 17)
• Neurofilaments and keratins involved in skin integrity, epidermolysis bullosa simplex (Ch. 18)
• New structures and understanding of function of dynein and dynactin (Ch. 18)
• Expanded discussion of lamins and their role in nuclear membrane structure and dynamics during mitosis (Ch. 18)
• Diseases associated with cohesin defects (Ch. 19)
• The Hippo pathway (Ch. 19)
• Spindle checkpoint assembly and nondisjunction and aneuploidy in mice; nondisjunction increases with maternal age (Ch. 19)
• Expanded discussion of the functions of the extracellular matrix and the role of cells in assembling it (Ch. 20)
• Mechanotransduction (Ch. 20)
• Structure of cadherins and their cis and trans interactions (Ch. 20)
• Cadherins as receptors for class C rhinoviruses and asthma (Ch. 20)
• Improved discussion of microfibrils in elastic tissue and in LTBP-mediated TGF-β signaling (Ch. 20)
• Tunneling nanotubes (Ch. 20)
• Functions of WAKs in plants as pectin receptors (Ch. 20)
• Pluripotency of mouse ES cells and the potential of differentiated cells derived from iPS and ES cells in treating various diseases (Ch. 21)
• Pluripotent ES cells in planaria (Ch. 21)
• Cells in intestinal crypts that dedifferentiate to replenish intestinal stem cells (Ch. 21)
• Cdc42 and feedback loops that control cell polarity (Ch. 21)
• Prokaryotic voltage-gated Na+ channel structure, allowing comparison with voltage-gated K+ channels (Ch. 22)
• Optogenetics techniques for linking neural circuits with behavior (Ch. 22)
• Mechanisms of synaptic plasticity that govern learning and memory (Ch. 22)
• Inflammasomes and non-TLR nucleic acid sensors (Ch. 23)
• Expanded discussion of somatic hypermutation (Ch. 23)
• Improved discussion of the MHC molecule classes; MHC-peptide complexes and their interactions with T cells (Ch. 23)
• Lineage commitment of T cells (Ch. 23)
• Tumor immunology (Ch. 23)
• The characteristics of cancer cells and how they differ from normal cells (Ch. 24)
• How carcinogens lead to mutations and how mutations accumulate to cause cancer (Ch. 24)
FIGURE 16-31 Gradients of Wnt and Notum guide regeneration of a head and tail by planaria. [Part (b) Jessica Witchley and Peter Reddien.]
FIGURE 22-8 Neurogenesis in the adult brain. Newly born neurons were labeled with GFP in the dentate gyrus of control mice and mice that were allowed to exercise on a running wheel. [Chunmei Zhao and Fred H. Gage.]
Medical Connections
Many advances in basic cellular and molecular biology have led to new treatments for cancer and other human diseases. Examples of such medical advances are woven throughout the chapters to give students an appreciation for the clinical applications of the basic science they are learning. Many of these applications hinge on a detailed understanding of multiprotein complexes in cells: complexes that catalyze cell movements; regulate DNA transcription, replication, and repair; coordinate metabolism; and connect cells to other cells and to proteins and carbohydrates in their extracellular environment.
• Stereoisomers of small molecules as drugs: sterically pure molecules have different effects from mixtures (Ch. 2)
• Cholesterol is hydrophobic and must be transported by lipoprotein carriers LDL and HDL (Ch. 2)
• Essential amino acids must be provided in livestock feed (Ch. 2)
• Saturated, unsaturated, and trans fats: their molecular structures and nutritional consequences (Ch. 2)
• Protein misfolding and amyloids in neurodegenerative diseases such as Alzheimer's and Parkinson's (Ch. 3)
• Small molecules that inhibit enzyme activity can be used as drugs (aspirin) or in chemical warfare (sarin gas) (Ch. 3)
• Small-molecule inhibitors of the proteasome are used to treat certain cancers (Ch. 3)
• Disruptions of GTPases, GAPs, GEFs, and GDIs by mutations and pathogens cause a wide variety of diseases (Ch. 3)
• 3-D printing technology may be used to grow replacement organs (Ch. 4)
• The high-resolution structures of ribosomes can help identify small-molecule inhibitors of bacterial, but not eukaryotic, ribosomes (Ch. 5)
• Mutations in mismatch repair proteins lead to hereditary nonpolyposis colorectal cancer (Ch. 5)
• Nucleotide excision-repair proteins were identified in patients with xeroderma pigmentosum (Ch. 5)
• Human viruses HTLV, HIV-1, and HPV initiate infection by binding to specific cell-surface molecules, and some integrate their genomes into the host cell's DNA (Ch. 5)
• The sickle-cell allele is an example of one that exhibits both dominant and recessive properties depending on the phenotype being examined (Ch. 6)
• DNA microarrays can be useful as medical diagnostic tools (Ch. 6)
• Recombinant DNA techniques are used to mass-produce therapeutically useful proteins such as insulin and G-CSF (Ch. 6)
• Most cases of genetic diseases are caused by inherited rather than de novo mutations (Ch. 6)
• A CFTR knockout mouse line is useful in studying cystic fibrosis (Ch. 6)
• ABO blood types are determined by the carbohydrates attached to glycoproteins on the surfaces of erythrocytes (Ch. 7)
• Atherosclerosis, marked by accumulation of cholesterol, other lipids, and other biological substances in an artery, is responsible for the majority of deaths due to cardiovascular disease in the United States (Ch. 7)
• Microsatellite repeats have a tendency to expand and can cause neuromuscular diseases such as Huntington disease and myotonic dystrophy (Ch. 8)
• L1 transposable elements can cause genetic diseases by inserting into new sites in the genome (Ch. 8)
• Exon shuffling can result in bacterial resistance to antibiotics, a growing challenge in hospitals (Ch. 8)
• The NF1 gene, which is mutated in patients with neurofibromatosis, is an example of how bioinformatics techniques can be used to identify the molecular basis of a genetic disease (Ch. 8)
• Telomerase is abnormally activated in most cancers (Ch. 8)
• TFIIH subunits were first identified based on mutations in those subunits that cause defects in DNA repair associated with a stalled RNA polymerase (Ch. 9)
• HIV encodes the Tat protein, which inhibits termination of transcription by RNA polymerase II (Ch. 9)
• Synthetic oligonucleotides are being used in treatment of Duchenne muscular dystrophy (DMD) (Ch. 10)
• Mutations in splicing enhancers can cause exon skipping, as in spinal muscular atrophy (Ch. 10)
• Expansion of microsatellite repeats in genes expressed in neurons can alter their relative abundance in different regions of the central nervous system, resulting in neurological disorders (Ch. 10)
• Thalassemia commonly results from mutations in globin-gene splice sites that decrease splicing efficiency but do not prevent association of the pre-mRNA with snRNPs (Ch. 10)
• Genes encoding components of the mTORC1 pathway are mutated in many cancers, and mTOR inhibitors combined with other therapies may suppress tumor growth (Ch. 10)
• Aquaporin 2 levels control the rate of water resorption from urine being formed by the kidney (Ch. 11)
• Certain cystic fibrosis patients are being treated with a small molecule that allows a mutant protein to traffic normally to the cell surface (Ch. 11)
• SGLT2 inhibitors are in development or have been approved for treatment of type II diabetes (Ch. 11)
• Antidepressants and other therapeutic drugs, as well as drugs of abuse, target Na+-powered symporters because of their role in the reuptake and recycling of neurotransmitters (Ch. 11)
• Drugs that inhibit the Na+/K+ ATPase in cardiac muscle cells are used in treating congestive heart failure (Ch. 11)
• Oral rehydration therapy is a simple, effective means of treating cholera and other diseases caused by intestinal pathogens (Ch. 11)
• Mutations in ClC-7, a chloride ion channel, result in defective bone resorption characteristic of the hereditary bone disease osteopetrosis (Ch. 11)
• The sensitivity of mitochondrial ribosomes to certain antibiotics, including aminoglycosides and chloramphenicol, can cause toxicity in patients (Ch. 12)
• Mutations and large deletions in mtDNA cause certain diseases, such as Leber's hereditary optic neuropathy and Kearns-Sayre syndrome (Ch. 12)
• Cyanide is toxic because it blocks ATP production in mitochondria (Ch. 12)
• Reduction in amounts of cardiolipin, as well as an abnormal cardiolipin structure, results in the heart and skeletal muscle defects and other abnormalities that characterize Barth's syndrome (Ch. 12)
• Reactive oxygen species are by-products of electron transport that can damage cells (Ch. 12)
• ATP/ADP antiporter activity was first studied over 2000 years ago through the examination of the effects of poisonous herbs (Ch. 12)
• There are two related subtypes of thermogenic fat cells (Ch. 12)
• A hereditary form of emphysema results from misfolding of proteins in the endoplasmic reticulum (Ch. 13)
• Autosomal recessive mutations that cause defective peroxisome assembly can lead to several developmental defects often associated with craniofacial abnormalities, such as those associated with Zellweger syndrome (Ch. 13)
• Certain cases of cystic fibrosis are caused by mutations in the CFTR protein that prevent movement of this chloride channel from the ER to the cell surface (Ch. 14)
• Study of lysosomal storage diseases has revealed key elements of the lysosomal sorting pathway (Ch. 14)
• The hereditary disease familial hypercholesterolemia results from a variety of mutations in the LDLR gene (Ch. 14)
• Therapeutic drugs using the TNFα-binding domain of TNFα receptor are used to treat arthritis and other inflammatory conditions (Ch. 15)
• Monoclonal antibodies that bind HER2 and thereby block signaling by EGF are useful in treating breast tumors that overexpress HER2 (Ch. 15)
• The agonist isoproterenol binds more strongly to epinephrine-responsive receptors on bronchial smooth muscle cells than does epinephrine, and is used to treat bronchial asthma, chronic bronchitis, and emphysema (Ch. 15)
• Some bacterial toxins (e.g., Bordetella pertussis, Vibrio cholerae, certain strains of E. coli) catalyze a modification of a G protein in intestinal cells, increasing intracellular cAMP, which leads to loss of electrolytes and fluids (Ch. 15)
• Nitroglycerin decomposes to NO, a natural signaling molecule that, when used to treat angina, increases blood flow to the heart (Ch. 15)
• PDE inhibitors elevate cGMP in vascular smooth muscle cells and have been developed to treat erectile dysfunction (Ch. 15)
• Many tumors contain inactivating mutations in either TGF-β receptors or Smad proteins and are resistant to growth inhibition by TGF-β (Ch. 16)
• Epo and G-CSF are used to boost red blood cells and neutrophils, respectively, in patients with kidney disease and during certain cancer therapies that affect blood cell formation in the bone marrow (Ch. 16)
• Many cases of SCID result from a deficiency in the IL-2 receptor gamma chain and can be treated by gene therapy (Ch. 16)
• Mutant Ras proteins that bind but cannot hydrolyze GTP, and are therefore locked in an active GTP-bound state, contribute to oncogenic transformation (Ch. 16)
• Potent and selective inhibitors of Raf are being clinically tested in patients with melanomas caused by mutant Raf proteins (Ch. 16)
• The deletion of the PTEN gene in multiple types of advanced cancers results in the loss of the PTEN protein, contributing to the uncontrolled growth of cells (Ch. 16)
• High levels of free β-catenin, caused by aberrant hyperactive Wnt signaling, are associated with the activation of growth-promoting genes in many cancers (Ch. 16)
• Inappropriate activation of Hh signaling associated with primary cilia is the cause of several types of tumors (Ch. 16)
• Increased activity of ADAMs can promote cancer development and heart disease (Ch. 16)
• The brains of patients with Alzheimer's disease accumulate amyloid plaques containing aggregates of the Aβ42 peptide (Ch. 16)
• Diabetes mellitus is characterized by impaired regulation of blood glucose, which can lead to major complications if left untreated (Ch. 16)
• Hereditary spherocytic anemias can be caused by mutations in spectrin, band 4.1, and ankyrin (Ch. 17)
• Duchenne muscular dystrophy results from defects in the protein dystrophin, causing progressive weakening of skeletal muscle (Ch. 17)
• Hypertrophic cardiomyopathies result from various mutations in proteins of the heart contractile machinery (Ch. 17)
• Blood tests that measure the level of cardiac-specific troponins are used to determine the severity of a heart attack (Ch. 17)
• Some drugs (e.g., colchicine) bind tubulin dimers and prevent them from polymerizing into microtubules, whereas others (e.g., taxol) bind microtubules and prevent depolymerization (Ch. 18)
• Defects in LIS1 disrupt early brain development, causing Miller-Dieker lissencephaly (Ch. 18)
• Some diseases, such as ADPKD and Bardet-Biedl syndrome, have been traced to defects in primary cilia and intraflagellar transport (Ch. 18)
• Keratin filaments are important to maintaining the structural integrity of epithelial tissues by mechanically reinforcing the connections between cells (Ch. 18)
• Mutations in the human gene for lamin A cause a wide variety of diseases termed laminopathies (Ch. 18)
• In cohesinopathies, mutations in cohesin subunits or cohesin-loading factors disrupt expression of genes critical for development, resulting in limb and craniofacial abnormalities and intellectual disabilities (Ch. 19)
• Aneuploidy leads to misregulation of genes and can contribute to cancer development (Ch. 19)
• Aneuploid eggs are largely caused by chromosome missegregation in meiosis I or nondisjunction, leading to miscarriage or Down syndrome (Ch. 19)
• The protein CDHR3 enables class C rhinoviruses (RV-C) to bind to airway epithelial cells, enter them, and replicate, causing respiratory diseases and exacerbating asthma (Ch. 20)
• The cadherin desmoglein is the predominant target of autoantibodies in the skin disease pemphigus vulgaris (Ch. 20)
• Some pathogens, such as hepatitis C virus and the enteric bacterium Vibrio cholerae, have evolved to exploit the molecules in tight junctions (Ch. 20)
• Mutations in connexin genes cause a variety of diseases (Ch. 20)
• Defects in the glomerular basement membrane can lead to renal failure (Ch. 20)
• In cells deprived of ascorbate, the pro-α collagen chains are not hydroxylated sufficiently to form the structural support of collagen necessary for healthy blood vessels, tendons, and skin, resulting in scurvy (Ch. 20)
• Mutations affecting type I collagen and its associated proteins cause a variety of diseases, including osteogenesis imperfecta (Ch. 20)
r A variety of diseases, often involving skeletal and cardiovascular abnormalities (e.g., Marfan syndrome), result from mutations in the genes encoding the structural proteins of elastic fibers or the proteins that contribute to their proper assembly (Ch. 20) r Connections between the extracellular matrix and cytoskeleton are defective in muscular dystrophy (Ch. 20) r Leukocyte-adhesion deficiency is caused by a genetic defect that results in the leukocytes’ inability to fight infection, thereby increasing susceptibility to repeated bacterial infections (Ch. 20) r The stem cells in transplanted bone marrow can generate all types of functional blood cells, which makes such transplants useful for patients with certain hereditary blood diseases as well as cancer patients who have received irradiation or chemotherapy (Ch. 21) r Channelopathies, including some forms of epilepsy, are caused by mutations in genes that encode ion channels (Ch. 22) r The topical anesthetic lidocaine works by binding to amino acid residues along the voltage-gated Na+ channel, locking it in the open but occluded state (Ch. 22) r The cause of multiple sclerosis is not known, but seems to involve either the body’s production of auto-antibodies that react with myelin basic protein or the secretion of proteases that destroy myelin proteins (Ch. 22) r Peripheral myelin is a target of autoimmune disease, mainly involving the formation of antibodies against Po (Ch. 22) r The key role of VAMP in neurotransmitter exocytosis can be seen in the mechanism of action of botulinum toxin (Ch. 22) r Neurotransmitter transporters are targets of a variety of drugs of abuse (e.g., cocaine) as well as therapeutic drugs commonly used in psychiatry (e.g., Prozac, Zoloft, Paxil) (Ch. 22) r Nicotinic acetylcholine receptors produced in brain neurons are important in learning and memory; loss of these receptors is observed in schizophrenia, epilepsy, drug addiction, and Alzheimer’s disease (Ch. 
22) r Studies suggest that the voltage-gated Na+ channel Nav1.7 is a key component in the perception of pain (Ch. 22) r People vary significantly in sense of smell (Ch. 22) r Synaptic translation of localized mRNAs is critical to the formation and the experience-dependent plasticity of neural circuits, and alterations in this process result in neurodevelopmental and cognitive disorders (Ch. 22) r The immunosuppressant drug cyclosporine inhibits calcineurin activity through the formation of a
PREFACE
t
xiii
cyclosporine-cyclophilin complex, thus enabling successful allogenic tissue transplantation (Ch. 23)
r Editing of plant mitochondrial RNA transcripts can convert cytosine residues to uracil residues (Ch. 12)
r Vaccines elicit protective immunity against a variety of pathogens (Ch. 23)
r Photosynthesis is an important process for synthesizing ATP (Ch. 12)
r Increased understanding of the molecular cell biology of tumors is revolutionizing the way cancers are diagnosed and treated (Ch. 24)
r Chloroplast DNAs are evolutionarily younger and show less structural diversity than mitochondrial DNAs (Ch. 12)
Plant Biology Connections
r Chloroplast transformation has led to engineered plants that are resistant to infections as well as plants that can be used to make protein drugs (Ch. 12)
Developments in agriculture, environmental science, and alternative energy production have demonstrated that the molecular cell biology of plants is increasingly relevant to our lives. Understanding photosynthesis and chloroplasts is just the beginning of plant biology. Throughout the text, we have highlighted plant-specific topics, including aspects of cell structure and function that are unique to plants, plant development, and plant biotechnology applications directed toward solving problems in agriculture and medicine. ■
r In giant green algae such as Nitella, the cytosol flows rapidly due to use of myosin V (Ch. 17)
r Vascular plants have rigid cell walls and use turgor pressure to stand upright and grow (Ch. 11)
r The root meristem resembles the shoot meristem in structure and function (Ch. 21)
r Transgenic plants have been produced that overexpress the vacuolar Na+/H+ antiporter, and can therefore grow successfully in soils containing high salt concentrations (Ch. 11)
xiv
t
PREFACE
r Formation of the spindle and cytokinesis have unique features in plants (Ch. 18) r Meristems are niches for stem cells in plants (Ch. 21) r A negative feedback loop maintains the size of the shoot apical stem-cell population (Ch. 21)
MEDIA AND SUPPLEMENTS
LaunchPad for Molecular Cell Biology is a robust teaching and learning tool with all instructor and student resources as well as a fully interactive e-Book.
Concept Check quizzes test student understanding of the most important concepts of each section.
Student Resources
Interactive Case Studies guide students through applied problems related to important concepts; topics include cancer, diabetes, and cystic fibrosis.
LearningCurve, a self-paced adaptive quizzing tool for students, tailors questions to their target difficulty level and encourages them to incorporate content from the text into their study routine.
A collection of Videos shows students real cell processes as they appear in the lab.
Analyze the Data questions ask students to apply critical thinking and data analysis skills to solving complex problems.
Classic Experiments introduce students to the details of a historical experiment important to the cell and molecular biology fields.
Case Study “To Kill a Cancer Cell” leads students through the experiments needed to identify a perturbed signaling pathway.
Over 60 Animations based on key figures from the text illustrate difficult or important structures and processes.
Instructor Resources
All Figures and Photos from the text are optimized for classroom presentation and provided in several formats, with and without labels.
A comprehensive Test Bank provides a variety of questions for creating quizzes and exams.
Lecture Slides built around high-quality versions of text figures provide a starting point for in-class presentations.
Clicker Questions in slide format help instructors promote active learning in the classroom.
A PDF Solutions Manual provides answers to the Review the Concepts questions at the end of each chapter. An answer key for Analyze the Data questions is also included.
Animation of Figure 16-3b depicts signal transduction in the TGF-β/Smad pathway.
ACKNOWLEDGMENTS
In updating, revising, and rewriting this book, we were given invaluable help by many colleagues. We thank the following people who generously gave of their time and expertise by making contributions to specific chapters in their areas of interest, providing us with detailed information about their courses, or by reading and commenting on one or more chapters:
David Agard, University of California, San Francisco, and Howard Hughes Medical Institute
Ann Aguanno, Marymount Manhattan College
Stephen Amato, Northeastern University
Shivanthi Anandan, Drexel University
Kenneth Balazovich, University of Michigan
Amit Banerjee, Wayne State University
Lisa Banner, California State University, Northridge
Benjamin Barad, University of California, San Francisco
Kenneth Belanger, Colgate University
Andrew Bendall, University of Guelph
Eric Betzig, Howard Hughes Medical Institute
Subhrajit Bhattacharya, Auburn University
Ashok Bidwai, West Virginia University
David Bilder, University of California, Berkeley
Elizabeth Blinstrup-Good, University of Illinois
Jenna Bloemer, Auburn University
Jonathan Bogan, Yale University School of Medicine
Indrani Bose, Western Carolina University
Laurie Boyer, Massachusetts Institute of Technology
James Bradley, Auburn University
Eric Brenner, New York University
Mirjana Brockett, Georgia Institute of Technology
Manal Buabeid, Auburn University
Heike Bucking, South Dakota State University
Tim Burnett, Emporia State University
Samantha Butler, University of California, Los Angeles
W. Malcolm Byrnes, Howard University College of Medicine
Monique Cadrin, University of Quebec Trois-Rivières
Martin Cann, Durham University
Steven A. Carr, Broad Institute of Massachusetts Institute of Technology and Harvard
Suzie Chen, Rutgers University
Cindy Cooper, Truman State University
David Daleke, Indiana University
Thomas J. Deerinck, University of California, San Diego
Linda DeVeaux, South Dakota School of Mines and Technology
David Donze, Louisiana State University
William Dowhan, University of Texas, Houston
Janet Duerr, Ohio University
Manoj Duraisingh, Harvard School of Public Health
Paul Durham, Missouri State University
David Eisenberg, University of California, Los Angeles
Sevinc Ercan, New York University
Marilyn Farquhar, University of California, San Diego
Jeffrey Fillingham, Ryerson University
Kathleen Fitzpatrick, Simon Fraser University
Friedrich Foerster, Max Planck Institute of Biochemistry
Margaret T. Fuller, Stanford University School of Medicine
Warren Gallin, University of Alberta
Liang Gao, Stony Brook University
Chris Garcia, Stanford University School of Medicine
Mary Gehring, Massachusetts Institute of Technology
Jayant Ghiara, University of California, San Diego
David Gilmour, Pennsylvania State University
Alfred Goldberg, Harvard Medical School
Sara Gremillion, Armstrong State University
Lawrence I. Grossman, Wayne State University
Barry M. Gumbiner, University of Washington and Seattle Children’s Research Institute
Yanlin Guo, University of Southern Mississippi
Gyorgy Hajnoczky, Thomas Jefferson University
Nicholas Harden, Simon Fraser University
Maureen Harrington, Indiana University
Michael Harrington, University of Alberta
Marcia Harrison-Pitaniello, Marshall University
Craig Hart, Louisiana State University
Andreas Herrlich, Harvard Medical School
Ricky Hirschhorn, Hood College
Barry Honda, Simon Fraser University
H. Robert Horvitz, Massachusetts Institute of Technology
Nai-Jia Huang, Whitehead Institute
Richard O. Hynes, Massachusetts Institute of Technology and Howard Hughes Medical Institute
Rudolf Jaenisch, Massachusetts Institute of Technology
Cheryl Jorcyk, Boise State University
Naohiro Kato, Louisiana State University
Amy E. Keating, Massachusetts Institute of Technology
Younghoon Kee, University of South Florida
Eirini Kefalogianni, Harvard Medical School
Thomas Keller, Florida State University
Greg Kelly, University of Western Ontario
Baljit Khakh, University of California, Los Angeles
Lou Kim, Florida International University
Thomas Kirchhausen, Harvard Medical School
Elaine Kirschke, University of California, San Francisco
Cindy Klevickis, James Madison University
Donna Koslowsky, Michigan State University
Diego Krapf, Colorado State University
Arnold Kriegstein, University of California, San Francisco
Michael LaGier, Grand View University
Brett Larson, Armstrong Atlantic State University
Mark Lazzaro, College of Charleston
Daniel Leahy, Johns Hopkins University School of Medicine
Wesley Legant, Howard Hughes Medical Institute
Fang Ju Lin, Coastal Carolina University
Susan Lindquist, Massachusetts Institute of Technology
Adam Linstedt, Carnegie Mellon University
Jennifer Lippincott-Schwartz, National Institutes of Health
James Lissemore, John Carroll University
Richard Londraville, University of Akron
Elizabeth Lord, University of California, Riverside
Charles Mallery, University of Miami
George M. Martin, University of Washington
Michael Martin, John Carroll University
C. William McCurdy, University of California, Davis, and Lawrence Berkeley National Laboratory
James McNew, Rice University
Ivona Mladenovic, Simon Fraser University
Vamsi K. Mootha, Harvard Medical School and Massachusetts General Hospital
Tsafrir Mor, Arizona State University
Roderick Morgan, Grand Valley State University
Sean Morrison, University of Texas Southwestern Medical School
Aris Moustakas, Ludwig Institute, Uppsala University, Sweden
Dana Newton, College of The Albemarle
Bennett Novitch, University of California, Los Angeles
Roel Nusse, Stanford University School of Medicine
Jennifer Panizzi, Auburn University
Samantha Parks, Georgia State University
Ardem Patapoutian, The Scripps Research Institute
Rekha Patel, University of South Carolina
Aaron Pierce, Nicholls State University
Joel Piperberg, Millersville University of Pennsylvania
Todd Primm, Sam Houston State University
April Pyle, University of California, Los Angeles
Nicholas Quintyne, State University of New York at Fredonia
Peter Reddien, Massachusetts Institute of Technology
Mark Reedy, Creighton University
Dan Reines, Emory University
Jatin Roper, Tufts University School of Medicine
Evan Rosen, Harvard Medical School
Richard Roy, McGill University
Edmund Rucker, University of Kentucky
Helen Saibil, University of London
Alapakkam Sampath, University of California, Los Angeles
Peter Santi, University of Minnesota
Burkhard Schulz, Purdue University
Thomas Schwartz, Massachusetts Institute of Technology
Stylianos Scordilis, Smith College
Kavita Shah, Purdue University
Lin Shao, Howard Hughes Medical Institute
Allan Showalter, Ohio University
Jeff Singer, Portland State University
Agnes Southgate, College of Charleston
Daniel Starr, University of California, Davis
Jacqueline Stephens, Louisiana State University
Emina Stojkovic, Northeastern Illinois University
Paul Teesdale-Spittle, Victoria University of Wellington, New Zealand
Kurt Toenjes, Montana State University Billings
Fredrik Vannberg, Georgia Institute of Technology
Pavithra Vivekanand, Susquehanna University
Claire Walczak, Indiana University
Barbara Waldman, University of South Carolina
Feng-Song Wang, Purdue University Calumet
Irving Wang, Whitehead Institute for Biomedical Research
Keith Weninger, North Carolina State University
Laurence Wong, Canadian University College
Ernest Wright, University of California, Los Angeles
Michael B. Yaffe, Massachusetts Institute of Technology
Ning Yan, Tsinghua University
Omer Yilmaz, Massachusetts Institute of Technology
Junying Yuan, Harvard Medical School
Ana Zimmerman, College of Charleston
We would also like to express our gratitude and appreciation to all those who contributed to the resources on LaunchPad. A full list of these contributors is posted on the Molecular Cell Biology, Eighth Edition, LaunchPad.
This edition would not have been possible without the careful and committed collaboration of our publishing partners at W. H. Freeman and Company. We thank Kate Ahr Parker, Beth Cole, Will Moore, Liz Geller, Norma Sims Roche, Blake Logan, Janice Donnola, Jennifer MacMillan, Sheena Goldstein, Teri Stratford, Nandini Ahuja, Abigail Fagan, Felicia Ruocco, Hilary Newman, Amy Thorne, Kathleen Wisneski, and Paul Rohloff for their labor and for their willingness to work overtime to produce a book that excels in every way. In particular, we would like to acknowledge the talent and commitment of our text editors, Erica Champion and Heather Moffat. They are remarkable editors. Thank you for all you’ve done in this edition. We are also indebted to H. Adam Steinberg for his pedagogical insight and his development of beautiful molecular models and illustrations.
We would like to acknowledge those whose direct contributions to previous editions continue to influence this edition, especially Ruth Steyn.
Thanks to our own staff: Sally Bittancourt, Diane Bush, Mary Anne Donovan, Carol Eng, James Evans, George Kokkinogenis, Julie Knight, Guicky Waller, Nicki Watson, and Rob Welsh.
Finally, special thanks to our families for inspiring us and for granting us the time it takes to work on such a book and to our mentors and advisers for encouraging us in our studies and teaching us much of what we know: (Harvey Lodish) my wife, Pamela; my children and grandchildren Heidi and Eric Steinert and Emma and Andrew Steinert; Martin Lodish, Kristin Schardt, and Sophia, Joshua, and Tobias Lodish; and Stephanie Lodish, Bruce Peabody, and Isaac and Violet Peabody; mentors Norton Zinder and Sydney Brenner; and also David Baltimore and Jim Darnell for collaborating on the first editions of this book; (Arnold Berk) my wife Sally, Jerry Berk, Shirley Berk, Angelina Smith, David Clayton, and Phil Sharp; (Chris A. Kaiser) my wife Kathy O’Neill, my mentors David Botstein and Randy Schekman; (Monty Krieger) my wife Nancy Krieger, parents I. Jay Krieger and Mildred Krieger, children Joshua and Ilana Krieger and Jonathan Krieger and Sofia Colucci, and grandchild Joaquin Krieger; my mentors Robert Stroud, Michael Brown, and Joseph Goldstein; (Anthony Bretscher) my wife Janice and daughters Heidi and Erika, and advisers A. Dale Kaiser and Klaus Weber; (Hidde Ploegh) my wife Anne Mahon; (Angelika Amon) my husband Johannes Weis, Theresa and Clara Weis, Gerry Fink and Frank Solomon; (Kelsey C. Martin) my husband Joel Braslow, children Seth, Ben, Sam, and Maya, father George M. Martin, and mentors Ari Helenius and Eric Kandel.
CONTENTS IN BRIEF
Part I Chemical and Molecular Foundations
1 Molecules, Cells, and Model Organisms 1
2 Chemical Foundations 31
3 Protein Structure and Function 67
4 Culturing and Visualizing Cells 129
Part II Biomembranes, Genes, and Gene Regulation
5 Fundamental Molecular Genetic Mechanisms 167
6 Molecular Genetic Techniques 223
7 Biomembrane Structure 271
8 Genes, Genomics, and Chromosomes 301
9 Transcriptional Control of Gene Expression 353
10 Post-transcriptional Gene Control 417
Part III Cellular Organization and Function
11 Transmembrane Transport of Ions and Small Molecules 473
12 Cellular Energetics 513
13 Moving Proteins into Membranes and Organelles 583
14 Vesicular Traffic, Secretion, and Endocytosis 631
15 Signal Transduction and G Protein–Coupled Receptors 673
16 Signaling Pathways That Control Gene Expression 719
17 Cell Organization and Movement I: Microfilaments 775
18 Cell Organization and Movement II: Microtubules and Intermediate Filaments 821
19 The Eukaryotic Cell Cycle 873
Part IV Cell Growth and Differentiation
20 Integrating Cells into Tissues 921
21 Stem Cells, Cell Asymmetry, and Cell Death 975
22 Cells of the Nervous System 1025
23 Immunology 1079
24 Cancer 1135
CONTENTS
Preface vii
Part I Chemical and Molecular Foundations
1 Molecules, Cells, and Model Organisms 1
1.1 The Molecules of Life 5
Proteins Give Cells Structure and Perform Most Cellular Tasks 7
Nucleic Acids Carry Coded Information for Making Proteins at the Right Time and Place 7
Phospholipids Are the Conserved Building Blocks of All Cellular Membranes 9
1.2 Prokaryotic Cell Structure and Function 10
Prokaryotes Comprise Two Kingdoms: Archaea and Eubacteria 10
Escherichia coli Is Widely Used in Biological Research 11
1.3 Eukaryotic Cell Structure and Function 12
The Cytoskeleton Has Many Important Functions 12
The Nucleus Contains the DNA Genome, RNA Synthetic Apparatus, and a Fibrous Matrix 12
Eukaryotic Cells Contain a Large Number of Internal Membrane Structures 14
Mitochondria Are the Principal Sites of ATP Production in Aerobic Cells 18
Chloroplasts Contain Internal Compartments in Which Photosynthesis Takes Place 18
All Eukaryotic Cells Use a Similar Cycle to Regulate Their Division 18
1.4 Unicellular Eukaryotic Model Organisms 19
Yeasts Are Used to Study Fundamental Aspects of Eukaryotic Cell Structure and Function 19
Mutations in Yeast Led to the Identification of Key Cell Cycle Proteins 21
The Parasite That Causes Malaria Has Novel Organelles That Allow It to Undergo a Remarkable Life Cycle 22
Studies in the Alga Chlamydomonas reinhardtii Led to the Development of a Powerful Technique to Study Brain Function 22
1.5 Metazoan Structure, Differentiation, and Model Organisms 24
Multicellularity Requires Cell-Cell and Cell-Matrix Adhesions 24
Epithelia Originated Early in Evolution 24
Tissues Are Organized into Organs 24
Genomics Has Revealed Important Aspects of Metazoan Evolution and Cell Function 24
Embryonic Development Uses a Conserved Set of Master Transcription Factors 25
Planaria Are Used to Study Stem Cells and Tissue Regeneration 27
Invertebrates, Fish, Mice, and Other Organisms Serve as Experimental Systems for Study of Human Development and Disease 28
Genetic Diseases Elucidate Important Aspects of Cell Function 28
The Following Chapters Present Much Experimental Data That Explains How We Know What We Know About Cell Structure and Function 29
2 Chemical Foundations 31
2.1 Covalent Bonds and Noncovalent Interactions 33
The Electronic Structure of an Atom Determines the Number and Geometry of the Covalent Bonds It Can Make 33
Electrons May Be Shared Equally or Unequally in Covalent Bonds 34
Covalent Bonds Are Much Stronger and More Stable Than Noncovalent Interactions 36
Ionic Interactions Are Attractions Between Oppositely Charged Ions 36
Hydrogen Bonds Are Noncovalent Interactions That Determine the Water Solubility of Uncharged Molecules 37
Van der Waals Interactions Are Weak Attractive Interactions Caused by Transient Dipoles 38
The Hydrophobic Effect Causes Nonpolar Molecules to Adhere to One Another 39
Molecular Complementarity Due to Noncovalent Interactions Leads to a Lock-and-Key Fit Between Biomolecules 40
2.2 Chemical Building Blocks of Cells 41
Amino Acids Differing Only in Their Side Chains Compose Proteins 42
Five Different Nucleotides Are Used to Build Nucleic Acids 45
Monosaccharides Covalently Assemble into Linear and Branched Polysaccharides 46
Phospholipids Associate Noncovalently to Form the Basic Bilayer Structure of Biomembranes 48
2.3 Chemical Reactions and Chemical Equilibrium 51
A Chemical Reaction Is in Equilibrium When the Rates of the Forward and Reverse Reactions Are Equal 52
The Equilibrium Constant Reflects the Extent of a Chemical Reaction 52
Chemical Reactions in Cells Are at Steady State 52
Dissociation Constants of Binding Reactions Reflect the Affinity of Interacting Molecules 53
Biological Fluids Have Characteristic pH Values 54
Hydrogen Ions Are Released by Acids and Taken Up by Bases 55
Buffers Maintain the pH of Intracellular and Extracellular Fluids 55
2.4 Biochemical Energetics 57
Several Forms of Energy Are Important in Biological Systems 57
Cells Can Transform One Type of Energy into Another 58
The Change in Free Energy Determines If a Chemical Reaction Will Occur Spontaneously 58
The ΔG°′ of a Reaction Can Be Calculated from Its Keq 60
The Rate of a Reaction Depends on the Activation Energy Necessary to Energize the Reactants into a Transition State 60
Life Depends on the Coupling of Unfavorable Chemical Reactions with Energetically Favorable Ones 61
Hydrolysis of ATP Releases Substantial Free Energy and Drives Many Cellular Processes 61
ATP Is Generated During Photosynthesis and Respiration 62
NAD+ and FAD Couple Many Biological Oxidation and Reduction Reactions 63
3 Protein Structure and Function 67
3.1 Hierarchical Structure of Proteins 69
The Primary Structure of a Protein Is Its Linear Arrangement of Amino Acids 69
Secondary Structures Are the Core Elements of Protein Architecture 70
Tertiary Structure Is the Overall Folding of a Polypeptide Chain 72
There Are Four Broad Structural Categories of Proteins 72
Different Ways of Depicting the Conformation of Proteins Convey Different Types of Information 74
Structural Motifs Are Regular Combinations of Secondary Structures 75
Domains Are Modules of Tertiary Structure 76
Multiple Polypeptides Assemble into Quaternary Structures and Supramolecular Complexes 78
Comparing Protein Sequences and Structures Provides Insight into Protein Function and Evolution 79
3.2 Protein Folding 81
Planar Peptide Bonds Limit the Shapes into Which Proteins Can Fold 81
The Amino Acid Sequence of a Protein Determines How It Will Fold 81
Folding of Proteins in Vivo Is Promoted by Chaperones 82
Protein Folding Is Promoted by Proline Isomerases 86
Abnormally Folded Proteins Can Form Amyloids That Are Implicated in Diseases 87
3.3 Protein Binding and Enzyme Catalysis 89
Specific Binding of Ligands Underlies the Functions of Most Proteins 89
Enzymes Are Highly Efficient and Specific Catalysts 90
An Enzyme’s Active Site Binds Substrates and Carries Out Catalysis 91
Serine Proteases Demonstrate How an Enzyme’s Active Site Works 92
Enzymes in a Common Pathway Are Often Physically Associated with One Another 96
3.4 Regulating Protein Function 97
Regulated Synthesis and Degradation of Proteins Is a Fundamental Property of Cells 97
The Proteasome Is a Molecular Machine Used to Degrade Proteins 97
Ubiquitin Marks Cytosolic Proteins for Degradation in Proteasomes 99
Noncovalent Binding Permits Allosteric, or Cooperative, Regulation of Proteins 100
Noncovalent Binding of Calcium and GTP Are Widely Used as Allosteric Switches to Control Protein Activity 101
Phosphorylation and Dephosphorylation Covalently Regulate Protein Activity 102
Ubiquitinylation and Deubiquitinylation Covalently Regulate Protein Activity 103
Proteolytic Cleavage Irreversibly Activates or Inactivates Some Proteins 104
Higher-Order Regulation Includes Control of Protein Location 105
3.5 Purifying, Detecting, and Characterizing Proteins 105
Centrifugation Can Separate Particles and Molecules That Differ in Mass or Density 106
Electrophoresis Separates Molecules on the Basis of Their Charge-to-Mass Ratio 107
Liquid Chromatography Resolves Proteins by Mass, Charge, or Affinity 109
Highly Specific Enzyme and Antibody Assays Can Detect Individual Proteins 111
Radioisotopes Are Indispensable Tools for Detecting Biological Molecules 114
Mass Spectrometry Can Determine the Mass and Sequence of Proteins 116
Protein Primary Structure Can Be Determined by Chemical Methods and from Gene Sequences 118
Protein Conformation Is Determined by Sophisticated Physical Methods 119
3.6 Proteomics 122
Proteomics Is the Study of All or a Large Subset of Proteins in a Biological System 122
Advanced Techniques in Mass Spectrometry Are Critical to Proteomic Analysis 123
4 Culturing and Visualizing Cells 129
4.1 Growing and Studying Cells in Culture 130
Culture of Animal Cells Requires Nutrient-Rich Media and Special Solid Surfaces 130
Primary Cell Cultures and Cell Strains Have a Finite Life Span 131
Transformed Cells Can Grow Indefinitely in Culture 132
Flow Cytometry Separates Different Cell Types 132
Growth of Cells in Two-Dimensional and Three-Dimensional Culture Mimics the In Vivo Environment 133
Hybridomas Produce Abundant Monoclonal Antibodies 135
A Wide Variety of Cell Biological Processes Can Be Studied with Cultured Cells 136
Drugs Are Commonly Used in Cell Biological Research 136
4.2 Light Microscopy: Exploring Cell Structure and Visualizing Proteins Within Cells 139
The Resolution of the Conventional Light Microscope Is About 0.2 μm 139
Phase-Contrast and Differential-Interference-Contrast Microscopy Visualize Unstained Live Cells 141
Imaging Subcellular Details Often Requires That Specimens Be Fixed, Sectioned, and Stained 142
Fluorescence Microscopy Can Localize and Quantify Specific Molecules in Live Cells 143
Intracellular Ion Concentrations Can Be Determined with Ion-Sensitive Fluorescent Dyes 143
Immunofluorescence Microscopy Can Detect Specific Proteins in Fixed Cells 144
Tagging with Fluorescent Proteins Allows the Visualization of Specific Proteins in Live Cells 146
Deconvolution and Confocal Microscopy Enhance Visualization of Three-Dimensional Fluorescent Objects 147
Two-Photon Excitation Microscopy Allows Imaging Deep into Tissue Samples 149
TIRF Microscopy Provides Exceptional Imaging in One Focal Plane 150
FRAP Reveals the Dynamics of Cellular Components 151
FRET Measures Distance Between Fluorochromes 152
Super-Resolution Microscopy Can Localize Proteins to Nanometer Accuracy 153
Light-Sheet Microscopy Can Rapidly Image Cells in Living Tissue 155
4.3 Electron Microscopy: High-Resolution Imaging 156
Single Molecules or Structures Can Be Imaged Using a Negative Stain or Metal Shadowing 157
Cells and Tissues Are Cut into Thin Sections for Viewing by Electron Microscopy 158
Immunoelectron Microscopy Localizes Proteins at the Ultrastructural Level 159
Cryoelectron Microscopy Allows Visualization of Specimens Without Fixation or Staining 160
Scanning Electron Microscopy of Metal-Coated Specimens Reveals Surface Features 161
4.4 Isolation of Cell Organelles 161
Disruption of Cells Releases Their Organelles and Other Contents 162
Centrifugation Can Separate Many Types of Organelles 162
Organelle-Specific Antibodies Are Useful in Preparing Highly Purified Organelles 162
Proteomics Reveals the Protein Composition of Organelles 164
Part II Biomembranes, Genes, and Gene Regulation
5 Fundamental Molecular Genetic Mechanisms 167
5.1 Structure of Nucleic Acids 169
A Nucleic Acid Strand Is a Linear Polymer with End-to-End Directionality 170
Native DNA Is a Double Helix of Complementary Antiparallel Strands 170
DNA Can Undergo Reversible Strand Separation 172
Torsional Stress in DNA Is Relieved by Enzymes 174
Different Types of RNA Exhibit Various Conformations Related to Their Functions 174
5.2 Transcription of Protein-Coding Genes and Formation of Functional mRNA 176
A Template DNA Strand Is Transcribed into a Complementary RNA Chain by RNA Polymerase 176
Organization of Genes Differs in Prokaryotic and Eukaryotic DNA 179
Eukaryotic Precursor mRNAs Are Processed to Form Functional mRNAs 180
Alternative RNA Splicing Increases the Number of Proteins Expressed from a Single Eukaryotic Gene 181
5.3 The Decoding of mRNA by tRNAs 183
Messenger RNA Carries Information from DNA in a Three-Letter Genetic Code 183
The Folded Structure of tRNA Promotes Its Decoding Functions 185
Nonstandard Base Pairing Often Occurs Between Codons and Anticodons 186
Amino Acids Become Activated When Covalently Linked to tRNAs 188
5.4 Stepwise Synthesis of Proteins on Ribosomes 188
Ribosomes Are Protein-Synthesizing Machines 188
Methionyl-tRNAiMet Recognizes the AUG Start Codon 190
Eukaryotic Translation Initiation Usually Occurs at the First AUG Downstream from the 5′ End of an mRNA 191
During Chain Elongation Each Incoming Aminoacyl-tRNA Moves Through Three Ribosomal Sites 193
Translation Is Terminated by Release Factors When a Stop Codon Is Reached 195
Polysomes and Rapid Ribosome Recycling Increase the Efficiency of Translation 195
GTPase-Superfamily Proteins Function in Several Quality-Control Steps of Translation 195
Nonsense Mutations Cause Premature Termination of Protein Synthesis 196
5.5 DNA Replication 197
DNA Polymerases Require a Primer to Initiate Replication 197
Duplex DNA Is Unwound, and Daughter Strands Are Formed at the DNA Replication Fork 199
Several Proteins Participate in DNA Replication 199
DNA Replication Occurs Bidirectionally from Each Origin 201
5.6 DNA Repair and Recombination 203
DNA Polymerases Introduce Copying Errors and Also Correct Them 203
Chemical and Radiation Damage to DNA Can Lead to Mutations 203
High-Fidelity DNA Excision-Repair Systems Recognize and Repair Damage 204
Base Excision Repairs T-G Mismatches and Damaged Bases 205
Mismatch Excision Repairs Other Mismatches and Small Insertions and Deletions 205
Nucleotide Excision Repairs Chemical Adducts That Distort Normal DNA Shape 206
Two Systems Use Recombination to Repair Double-Strand Breaks in DNA 207
Homologous Recombination Can Repair DNA Damage and Generate Genetic Diversity 209
5.7 Viruses: Parasites of the Cellular Genetic System 212
Most Viral Host Ranges Are Narrow 212
Viral Capsids Are Regular Arrays of One or a Few Types of Protein 213
Viruses Can Be Cloned and Counted in Plaque Assays 213
Lytic Viral Growth Cycles Lead to Death of Host Cells 213
Viral DNA Is Integrated into the Host-Cell Genome in Some Nonlytic Viral Growth Cycles 216
6
Molecular Genetic Techniques
223
6.1 Genetic Analysis of Mutations to Identify and Study Genes
224
Recessive and Dominant Mutant Alleles Generally Have Opposite Effects on Gene Function
224
Segregation of Mutations in Breeding Experiments Reveals Their Dominance or Recessivity
225
Conditional Mutations Can Be Used to Study Essential Genes in Yeast
227
Recessive Lethal Mutations in Diploids Can Be Identified by Inbreeding and Maintained in Heterozygotes
Plasmid Expression Vectors Can Be Designed for Use in Animal Cells
251
6.4 Locating and Identifying Human Disease Genes
254
Monogenic Diseases Show One of Three Patterns of Inheritance
254
DNA Polymorphisms Are Used as Markers for Linkage Mapping of Human Mutations
255
Linkage Studies Can Map Disease Genes with a Resolution of About 1 Centimorgan
256
Further Analysis Is Needed to Locate a Disease Gene in Cloned DNA
257
Many Inherited Diseases Result from Multiple Genetic Defects
257
6.5 Inactivating the Function of Specific Genes in Eukaryotes
228
259
Normal Yeast Genes Can Be Replaced with Mutant Alleles by Homologous Recombination
260
Genes Can Be Placed Under the Control of an Experimentally Regulated Promoter
260
Specific Genes Can Be Permanently Inactivated in the Germ Line of Mice
261
Somatic Cell Recombination Can Inactivate Genes in Specific Tissues
261
Dominant-Negative Alleles Can Inhibit the Function of Some Genes
262
234
RNA Interference Causes Gene Inactivation by Destroying the Corresponding mRNA
264
Isolated DNA Fragments Can Be Cloned into E. coli Plasmid Vectors
236
Engineered CRISPR–Cas9 Systems Allow Precise Genome Editing
266
Yeast Genomic Libraries Can Be Constructed with Shuttle Vectors and Screened by Functional Complementation
237
cDNA Libraries Represent the Sequences of Protein-Coding Genes
7
238
The Polymerase Chain Reaction Amplifies a Specific DNA Sequence from a Complex Mixture
7.1 The Lipid Bilayer: Composition
239
Cloned DNA Molecules Can Be Sequenced Rapidly by Methods Based on PCR
243
Complementation Tests Determine Whether Different Recessive Mutations Are in the Same Gene
Double Mutants Are Useful in Assessing the Order in Which Proteins Function
Genetic Suppression and Synthetic Lethality Can Reveal Interacting or Redundant Proteins
Genes Can Be Identified by Their Map Position on the Chromosome
6.2 DNA Cloning and Characterization
Restriction Enzymes and DNA Ligases Allow Insertion of DNA Fragments into Cloning Vectors
229 230 231 232
234
Hybridization Techniques Permit Detection of Specific DNA Fragments and mRNAs
DNA Microarrays Can Be Used to Evaluate the Expression of Many Genes at One Time
Cluster Analysis of Multiple Expression Experiments Identifies Co-regulated Genes
E. coli Expression Systems Can Produce Large Quantities of Proteins from Cloned Genes
and Structural Organization
271 273
Phospholipids Spontaneously Form Bilayers
273
Phospholipid Bilayers Form a Sealed Compartment Surrounding an Internal Aqueous Space
274
Biomembranes Contain Three Principal Classes of Lipids
276
246
Most Lipids and Many Proteins Are Laterally Mobile in Biomembranes
278
246
Lipid Composition Influences the Physical Properties of Membranes
279
247
Lipid Composition Is Different in the Exoplasmic and Cytosolic Leaflets
281
248
Cholesterol and Sphingolipids Cluster with Specific Proteins in Membrane Microdomains
282
Cells Store Excess Lipids in Lipid Droplets
283
6.3 Using Cloned DNA Fragments to Study Gene Expression
Biomembrane Structure
249
7.2 Membrane Proteins: Structure and Basic Functions
284
Proteins Interact with Membranes in Three Different Ways
284
Most Transmembrane Proteins Have Membrane-Spanning α Helices
285
Multiple β Strands in Porins Form Membrane-Spanning “Barrels”
288
Covalently Attached Lipids Anchor Some Proteins to Membranes
All Transmembrane Proteins and Glycolipids Are Asymmetrically Oriented in the Bilayer
Lipid-Binding Motifs Help Target Peripheral Proteins to the Membrane
Proteins Can Be Removed from Membranes by Detergents or High-Salt Solutions
288 289 290 290
7.3 Phospholipids, Sphingolipids, and Cholesterol: Synthesis and Intracellular Movement
293
Fatty Acids Are Assembled from Two-Carbon Building Blocks by Several Important Enzymes
293
Small Cytosolic Proteins Facilitate Movement of Fatty Acids
293
Fatty Acids Are Incorporated into Phospholipids Primarily on the ER Membrane
294
Flippases Move Phospholipids from One Membrane Leaflet to the Opposite Leaflet
295
Cholesterol Is Synthesized by Enzymes in the Cytosol and ER Membrane
Cholesterol and Phospholipids Are Transported Between Organelles by Several Mechanisms
8
295 296
Genes, Genomics, and Chromosomes 301
8.1 Eukaryotic Gene Structure
DNA Fingerprinting Depends on Differences in Length of Simple-Sequence DNAs
311
Unclassified Intergenic DNA Occupies a Significant Portion of the Genome
312
8.3 Transposable (Mobile) DNA Elements
312
Movement of Mobile Elements Involves a DNA or an RNA Intermediate
313
DNA Transposons Are Present in Prokaryotes and Eukaryotes
314
LTR Retrotransposons Behave Like Intracellular Retroviruses
316
Non-LTR Retrotransposons Transpose by a Distinct Mechanism
318
Other Retroposed RNAs Are Found in Genomic DNA
321
Mobile DNA Elements Have Significantly Influenced Evolution
321
8.4 Genomics: Genome-Wide Analysis of Gene Structure and Function
323
Stored Sequences Suggest Functions of Newly Identified Genes and Proteins
324
Comparison of Related Sequences from Different Species Can Give Clues to Evolutionary Relationships Among Proteins
325
Genes Can Be Identified Within Genomic DNA Sequences
326
The Number of Protein-Coding Genes in an Organism’s Genome Is Not Directly Related to Its Biological Complexity
326
303
8.5 Structural Organization of Eukaryotic Chromosomes
327
Chromatin Exists in Extended and Condensed Forms
328
Most Eukaryotic Genes Contain Introns and Produce mRNAs Encoding Single Proteins
303
Modifications of Histone Tails Control Chromatin Condensation and Function
330
Simple and Complex Transcription Units Are Found in Eukaryotic Genomes
303
Nonhistone Proteins Organize Long Chromatin Loops
335
Protein-Coding Genes May Be Solitary or Belong to a Gene Family
305
Additional Nonhistone Proteins Regulate Transcription and Replication
339
Heavily Used Gene Products Are Encoded by Multiple Copies of Genes
307
Nonprotein-Coding Genes Encode Functional RNAs
308
8.6 Morphology and Functional Elements of Eukaryotic Chromosomes
341
Chromosome Number, Size, and Shape at Metaphase Are Species-Specific
341
309
During Metaphase, Chromosomes Can Be Distinguished by Banding Patterns and Chromosome Painting
341
310
Chromosome Painting and DNA Sequencing Reveal the Evolution of Chromosomes
342
8.2 Chromosomal Organization of Genes and Noncoding DNA
Genomes of Many Organisms Contain Nonfunctional DNA
Most Simple-Sequence DNAs Are Concentrated in Specific Chromosomal Locations
309
Interphase Polytene Chromosomes Arise by DNA Amplification
343
9.4 Regulatory Sequences in Protein-Coding Genes and the Proteins Through Which They Function
378
Three Functional Elements Are Required for Replication and Stable Inheritance of Chromosomes
345
Promoter-Proximal Elements Help Regulate Eukaryotic Genes
378
Centromere Sequences Vary Greatly in Length and Complexity
345
Distant Enhancers Often Stimulate Transcription by RNA Polymerase II
379
Most Eukaryotic Genes Are Regulated by Multiple Transcription-Control Elements
379
DNase I Footprinting and EMSA Detect Protein-DNA Interactions
380
Activators Are Composed of Distinct Functional Domains
381
Repressors Are the Functional Converse of Activators
383
DNA-Binding Domains Can Be Classified into Numerous Structural Types
384
Structurally Diverse Activation and Repression Domains Regulate Transcription
386
Transcription Factor Interactions Increase Gene-Control Options
387
Multiprotein Complexes Form on Enhancers
388
Addition of Telomeric Sequences by Telomerase Prevents Shortening of Chromosomes
9
Transcriptional Control of Gene Expression
347
353
9.1 Control of Gene Expression in Bacteria
356
Transcription Initiation by Bacterial RNA Polymerase Requires Association with a Sigma Factor
357
Initiation of lac Operon Transcription Can Be Repressed or Activated
357
Small Molecules Regulate Expression of Many Bacterial Genes via DNA-Binding Repressors and Activators
358
Transcription Initiation from Some Promoters Requires Alternative Sigma Factors
359
Transcription by σ54-RNA Polymerase Is Controlled by Activators That Bind Far from the Promoter
359
Formation of Heterochromatin Silences Gene Expression at Telomeres, near Centromeres, and in Other Regions
Many Bacterial Responses Are Controlled by Two-Component Regulatory Systems
360
Repressors Can Direct Histone Deacetylation at Specific Genes
393
Activators Can Direct Histone Acetylation at Specific Genes
394
Chromatin-Remodeling Complexes Help Activate or Repress Transcription
395
Pioneer Transcription Factors Initiate the Process of Gene Activation During Cellular Differentiation
395
The Mediator Complex Forms a Molecular Bridge Between Activation Domains and Pol II
396
Expression of Many Bacterial Operons Is Controlled by Regulation of Transcriptional Elongation
361
9.2 Overview of Eukaryotic Gene Control
363
Regulatory Elements in Eukaryotic DNA Are Found Both Close to and Many Kilobases Away from Transcription Start Sites
364
Three Eukaryotic RNA Polymerases Catalyze Formation of Different RNAs
367
The Largest Subunit in RNA Polymerase II Has an Essential Carboxy-Terminal Repeat
370
9.3 RNA Polymerase II Promoters and General Transcription Factors
RNA Polymerase II Initiates Transcription at DNA Sequences Corresponding to the 5′ Cap of mRNAs
371 371
9.5 Molecular Mechanisms of Transcription Repression and Activation
390 390
9.6 Regulation of Transcription-Factor Activity
398
DNase I Hypersensitive Sites Reflect the Developmental History of Cellular Differentiation
398
Nuclear Receptors Are Regulated by Extracellular Signals
400
All Nuclear Receptors Share a Common Domain Structure
400
Nuclear-Receptor Response Elements Contain Inverted or Direct Repeats
400
The TATA Box, Initiators, and CpG Islands Function as Promoters in Eukaryotic DNA
371
General Transcription Factors Position RNA Polymerase II at Start Sites and Assist in Initiation
Hormone Binding to a Nuclear Receptor Regulates Its Activity as a Transcription Factor
402
373
Metazoans Regulate the RNA Polymerase II Transition from Initiation to Elongation
402
Termination of Transcription Is Also Regulated
402
Elongation Factors Regulate the Initial Stages of Transcription in the Promoter-Proximal Region
377
9.7 Epigenetic Regulation of Transcription
404
DNA Methylation Represses Transcription
404
Methylation of Specific Histone Lysines Is Linked to Epigenetic Mechanisms of Gene Repression
405
Epigenetic Control by Polycomb and Trithorax Complexes
406
Long Noncoding RNAs Direct Epigenetic Repression in Metazoans
409
9.8 Other Eukaryotic Transcription Systems
Transcription Initiation by Pol I and Pol III Is Analogous to That by Pol II
10
Post-transcriptional Gene Control
412 412
417
439
10.3 Transport of mRNA Across the Nuclear Envelope
440
Phosphorylation and Dephosphorylation of SR Proteins Imposes Directionality on mRNP Export Across the Nuclear Pore Complex
441
Balbiani Rings in Insect Larval Salivary Glands Allow Direct Visualization of mRNP Export Through NPCs
442
Pre-mRNAs in Spliceosomes Are Not Exported from the Nucleus
443
HIV Rev Protein Regulates the Transport of Unspliced Viral mRNAs
444
10.4 Cytoplasmic Mechanisms of Post-transcriptional Control
10.1 Processing of Eukaryotic Pre-mRNA
RNA Editing Alters the Sequences of Some Pre-mRNAs
419
445
Degradation of mRNAs in the Cytoplasm Occurs by Several Mechanisms
445
Adenines in mRNAs and lncRNAs May Be Post-transcriptionally Modified by N6 Methylation
447
Micro-RNAs Repress Translation and Induce Degradation of Specific mRNAs
447
424
Alternative Polyadenylation Increases miRNA Control Options
450
Spliceosomes, Assembled from snRNPs and a Pre-mRNA, Carry Out Splicing
426
RNA Interference Induces Degradation of Precisely Complementary mRNAs
450
Chain Elongation by RNA Polymerase II Is Coupled to the Presence of RNA-Processing Factors
428
Cytoplasmic Polyadenylation Promotes Translation of Some mRNAs
451
SR Proteins Contribute to Exon Definition in Long Pre-mRNAs
Protein Synthesis Can Be Globally Regulated
452
428
Self-Splicing Group II Introns Provide Clues to the Evolution of snRNAs
Sequence-Specific RNA-Binding Proteins Control Translation of Specific mRNAs
455
429
3′ Cleavage and Polyadenylation of Pre-mRNAs Are Tightly Coupled
Surveillance Mechanisms Prevent Translation of Improperly Processed mRNAs
456
430
Nuclear Exoribonucleases Degrade RNA That Is Processed Out of Pre-mRNAs
Localization of mRNAs Permits Production of Proteins at Specific Regions Within the Cytoplasm
457
432
RNA Processing Solves the Problem of Pervasive Transcription of the Genome in Metazoans
432
The 5′ Cap Is Added to Nascent RNAs Shortly After Transcription Initiation
420
A Diverse Set of Proteins with Conserved RNA-Binding Domains Associate with Pre-mRNAs
421
Splicing Occurs at Short, Conserved Sequences in Pre-mRNAs via Two Transesterification Reactions
423
During Splicing, snRNAs Base-Pair with Pre-mRNA
10.2 Regulation of Pre-mRNA Processing
435
10.5 Processing of rRNA and tRNA
461
Pre-rRNA Genes Function as Nucleolar Organizers
461
Small Nucleolar RNAs Assist in Processing Pre-rRNAs
462
Alternative Splicing Generates Transcripts with Different Combinations of Exons
435
Self-Splicing Group I Introns Were the First Examples of Catalytic RNA
466
A Cascade of Regulated RNA Splicing Controls Drosophila Sexual Differentiation
435
Pre-tRNAs Undergo Extensive Modification in the Nucleus
466
Splicing Repressors and Activators Control Splicing at Alternative Sites
437
Nuclear Bodies Are Functionally Specialized Nuclear Domains
468
Part III Cellular Organization and Function
11
Transmembrane Transport of Ions and Small Molecules
473
11.1 Overview of Transmembrane Transport
474
11.4 Nongated Ion Channels and the Resting Membrane Potential
495
Selective Movement of Ions Creates a Transmembrane Electric Gradient
495
The Resting Membrane Potential in Animal Cells Depends Largely on the Outward Flow of K+ Ions Through Open K+ Channels
497
Ion Channels Are Selective for Certain Ions by Virtue of a Molecular “Selectivity Filter”
497
Only Gases and Small Uncharged Molecules Cross Membranes by Simple Diffusion
474
Three Main Classes of Membrane Proteins Transport Molecules and Ions Across Cellular Membranes
Patch Clamps Permit Measurement of Ion Movements Through Single Channels
500
475
Novel Ion Channels Can Be Characterized by a Combination of Oocyte Expression and Patch Clamping
501
11.2 Facilitated Transport of Glucose and Water
477
Uniport Transport Is Faster and More Specific than Simple Diffusion
477
The Low Km of the GLUT1 Uniporter Enables It to Transport Glucose into Most Mammalian Cells
478
The Human Genome Encodes a Family of Sugar-Transporting GLUT Proteins
480
Transport Proteins Can Be Studied Using Artificial Membranes and Recombinant Cells
480 481
Aquaporins Increase the Water Permeability of Cellular Membranes
481
483
There Are Four Main Classes of ATP-Powered Pumps
484
ATP-Powered Ion Pumps Generate and Maintain Ionic Gradients Across Cellular Membranes
485
Muscle Relaxation Depends on Ca2+ ATPases That Pump Ca2+ from the Cytosol into the Sarcoplasmic Reticulum
486
The Mechanism of Action of the Ca2+ Pump Is Known in Detail
486
489
The Na+/K+ ATPase Maintains the Intracellular Na+ and K+ Concentrations in Animal Cells
489
V-Class H+ ATPases Maintain the Acidity of Lysosomes and Vacuoles
489
ABC Proteins Export a Wide Variety of Drugs and Toxins from the Cell
491
Na+-Linked Symporters Enable Animal Cells to Import Glucose and Amino Acids Against High Concentration Gradients
A Bacterial Na+/Amino Acid Symporter Reveals How Symport Works
502
503
493 494
504
A Na+-Linked Ca2+ Antiporter Regulates the Strength of Cardiac Muscle Contraction
504
Several Cotransporters Regulate Cytosolic pH
505
An Anion Antiporter Is Essential for Transport of CO2 by Erythrocytes
506
Numerous Transport Proteins Enable Plant Vacuoles to Accumulate Metabolites and Ions
507
11.6 Transcellular Transport
Calmodulin Regulates the Plasma-Membrane Pumps That Control Cytosolic Ca2+ Concentrations
The ABC Cystic Fibrosis Transmembrane Regulator Is a Chloride Channel, Not a Pump
502
11.3 ATP-Powered Pumps and the Intracellular Ionic Environment
Certain ABC Proteins “Flip” Phospholipids and Other Lipid-Soluble Substrates from One Membrane Leaflet to the Other
Osmotic Pressure Causes Water to Move Across Membranes
11.5 Cotransport by Symporters and Antiporters
Na+ Entry into Mammalian Cells Is Thermodynamically Favored
508
Multiple Transport Proteins Are Needed to Move Glucose and Amino Acids Across Epithelia
508
Simple Rehydration Therapy Depends on the Osmotic Gradient Created by Absorption of Glucose and Na+
509
Parietal Cells Acidify the Stomach Contents While Maintaining a Neutral Cytosolic pH
509
Bone Resorption Requires the Coordinated Function of a V-Class Proton Pump and a Specific Chloride Channel
510
12
Cellular Energetics
513
12.1 First Step of Harvesting Energy from Glucose: Glycolysis
515
During Glycolysis (Stage I), Cytosolic Enzymes Convert Glucose to Pyruvate
516
The Rate of Glycolysis Is Adjusted to Meet the Cell’s Need for ATP
516
Glucose Is Fermented When Oxygen Is Scarce
518
12.2 The Structure and Functions of Mitochondria
520
Mitochondria Are Multifunctional Organelles
520
Mitochondria Have Two Structurally and Functionally Distinct Membranes
520
Mitochondria Contain DNA Located in the Matrix
523
The Size, Structure, and Coding Capacity of mtDNA Vary Considerably Among Organisms
525
Products of Mitochondrial Genes Are Not Exported
Experiments Using Purified Electron-Transport Chain Complexes Established the Stoichiometry of Proton Pumping
549
The Proton-Motive Force in Mitochondria Is Due Largely to a Voltage Gradient Across the Inner Membrane
550
12.5 Harnessing the Proton-Motive Force to Synthesize ATP
551
526
The Mechanism of ATP Synthesis Is Shared Among Bacteria, Mitochondria, and Chloroplasts
552
Mitochondria Evolved from a Single Endosymbiotic Event Involving a Rickettsia-Like Bacterium
527
ATP Synthase Comprises F0 and F1 Multiprotein Complexes
553
Mitochondrial Genetic Codes Differ from the Standard Nuclear Code
527
Rotation of the F1 γ Subunit, Driven by Proton Movement Through F0, Powers ATP Synthesis
554
Mutations in Mitochondrial DNA Cause Several Genetic Diseases in Humans
528
Multiple Protons Must Pass Through ATP Synthase to Synthesize One ATP
555
Mitochondria Are Dynamic Organelles That Interact Directly with One Another
528
F0 c Ring Rotation Is Driven by Protons Flowing Through Transmembrane Channels
556
529
ATP-ADP Exchange Across the Inner Mitochondrial Membrane Is Powered by the Proton-Motive Force
556
The Rate of Mitochondrial Oxidation Normally Depends on ADP Levels
558
Mitochondria in Brown Fat Use the Proton-Motive Force to Generate Heat
558
Mitochondria Are Influenced by Direct Contacts with the Endoplasmic Reticulum
12.3 The Citric Acid Cycle and Fatty Acid Oxidation
In the First Part of Stage II, Pyruvate Is Converted to Acetyl CoA and High-Energy Electrons
In the Second Part of Stage II, the Citric Acid Cycle Oxidizes the Acetyl Group in Acetyl CoA to CO2 and Generates High-Energy Electrons
533 533
12.6 Photosynthesis and Light-Absorbing Pigments
533
Transporters in the Inner Mitochondrial Membrane Help Maintain Appropriate Cytosolic and Matrix Concentrations of NAD+ and NADH
535
Mitochondrial Oxidation of Fatty Acids Generates ATP
536
Peroxisomal Oxidation of Fatty Acids Generates No ATP
537
12.4 The Electron-Transport Chain and Generation of the Proton-Motive Force
539
Oxidation of NADH and FADH2 Releases a Significant Amount of Energy
539
Electron Transport in Mitochondria Is Coupled to Proton Pumping
539
Electrons Flow “Downhill” Through a Series of Electron Carriers
540
Four Large Multiprotein Complexes Couple Electron Transport to Proton Pumping Across the Inner Mitochondrial Membrane
542
The Reduction Potentials of Electron Carriers in the Electron-Transport Chain Favor Electron Flow from NADH to O2
546
The Multiprotein Complexes of the Electron-Transport Chain Assemble into Supercomplexes
546
Reactive Oxygen Species Are By-Products of Electron Transport
547
560
Thylakoid Membranes in Chloroplasts Are the Sites of Photosynthesis in Plants
560
Chloroplasts Contain Large DNAs Often Encoding More Than a Hundred Proteins
560
Three of the Four Stages in Photosynthesis Occur Only During Illumination
561
Photosystems Comprise a Reaction Center and Associated Light-Harvesting Complexes
563
Photoelectron Transport from Energized Reaction-Center Chlorophyll a Produces a Charge Separation
564
Internal Antennas and Light-Harvesting Complexes Increase the Efficiency of Photosynthesis
566
12.7 Molecular Analysis of Photosystems
567
The Single Photosystem of Purple Bacteria Generates a Proton-Motive Force but No O2
567
Chloroplasts Contain Two Functionally and Spatially Distinct Photosystems
567
Linear Electron Flow Through Both Plant Photosystems Generates a Proton-Motive Force, O2, and NADPH
568
An Oxygen-Evolving Complex Is Located on the Luminal Surface of the PSII Reaction Center
569
Multiple Mechanisms Protect Cells Against Damage from Reactive Oxygen Species During Photoelectron Transport
570
Cyclic Electron Flow Through PSI Generates a Proton-Motive Force but No NADPH or O2
570
Disulfide Bonds Are Formed and Rearranged by Proteins in the ER Lumen
603
Relative Activities of Photosystems I and II Are Regulated
571
Chaperones and Other ER Proteins Facilitate Folding and Assembly of Proteins
604
Improperly Folded Proteins in the ER Induce Expression of Protein-Folding Catalysts
606
Unassembled or Misfolded Proteins in the ER Are Often Transported to the Cytosol for Degradation
607
12.8 CO2 Metabolism During Photosynthesis
573
Rubisco Fixes CO2 in the Chloroplast Stroma
573
Synthesis of Sucrose Using Fixed CO2 Is Completed in the Cytosol
573
Light and Rubisco Activase Stimulate CO2 Fixation
574
Photorespiration Competes with Carbon Fixation and Is Reduced in C4 Plants
576
13
Moving Proteins into Membranes and Organelles
583
13.1 Targeting Proteins To and Across the ER Membrane
585
13.4 Targeting of Proteins to Mitochondria and Chloroplasts
608
Amphipathic N-Terminal Targeting Sequences Direct Proteins to the Mitochondrial Matrix
609
Mitochondrial Protein Import Requires Outer-Membrane Receptors and Translocons in Both Membranes
610
Studies with Chimeric Proteins Demonstrate Important Features of Mitochondrial Protein Import
612
Three Energy Inputs Are Needed to Import Proteins into Mitochondria
613
Multiple Signals and Pathways Target Proteins to Submitochondrial Compartments
613
Pulse-Chase Experiments with Purified ER Membranes Demonstrated That Secreted Proteins Cross the ER Membrane
586
Import of Chloroplast Stromal Proteins Is Similar to Import of Mitochondrial Matrix Proteins
617
A Hydrophobic N-Terminal Signal Sequence Targets Nascent Secretory Proteins to the ER
586
Proteins Are Targeted to Thylakoids by Mechanisms Related to Bacterial Protein Translocation
617
Cotranslational Translocation Is Initiated by Two GTP-Hydrolyzing Proteins
588
13.5 Targeting of Peroxisomal Proteins
619
Passage of Growing Polypeptides Through the Translocon Is Driven by Translation
589
A Cytosolic Receptor Targets Proteins with an SKL Sequence at the C-Terminus to the Peroxisomal Matrix
619
591
Peroxisomal Membrane and Matrix Proteins Are Incorporated by Different Pathways
621
ATP Hydrolysis Powers Post-translational Translocation of Some Secretory Proteins in Yeast
13.6 Transport Into and Out of the Nucleus
13.2 Insertion of Membrane Proteins into the ER
Several Topological Classes of Integral Membrane Proteins Are Synthesized on the ER
Internal Stop-Transfer Anchor and Signal-Anchor Sequences Determine Topology of Single-Pass Proteins
593 593
622
594
Nuclear Transport Receptors Escort Proteins Containing Nuclear-Localization Signals into the Nucleus
624
625 627
Multipass Proteins Have Multiple Internal Topogenic Sequences
597
A Phospholipid Anchor Tethers Some Cell-Surface Proteins to the Membrane
A Second Type of Nuclear Transport Receptor Escorts Proteins Containing Nuclear-Export Signals Out of the Nucleus
598
The Topology of a Membrane Protein Can Often Be Deduced from Its Sequence
Most mRNAs Are Exported from the Nucleus by a Ran-Independent Mechanism
599
14
13.3 Protein Modifications, Folding, and Quality Control in the ER
601 601
Oligosaccharide Side Chains May Promote Folding and Stability of Glycoproteins
602
Vesicular Traffic, Secretion, and Endocytosis
631
A Preformed N-Linked Oligosaccharide Is Added to Many Proteins in the Rough ER
622
Large and Small Molecules Enter and Leave the Nucleus via Nuclear Pore Complexes
14.1 Techniques for Studying the Secretory Pathway
Transport of a Protein Through the Secretory Pathway Can Be Assayed in Live Cells
634 634
Yeast Mutants Define Major Stages and Many Components in Vesicular Transport
635
Cell-Free Transport Assays Allow Dissection of Individual Steps in Vesicular Transport
637
14.2 Molecular Mechanisms of Vesicle Budding and Fusion
638
Assembly of a Protein Coat Drives Vesicle Formation and Selection of Cargo Molecules
638
A Conserved Set of GTPase Switch Proteins Controls the Assembly of Different Vesicle Coats
639
Targeting Sequences on Cargo Proteins Make Specific Molecular Contacts with Coat Proteins
641
Rab GTPases Control Docking of Vesicles on Target Membranes
641
Paired Sets of SNARE Proteins Mediate Fusion of Vesicles with Target Membranes
642
Dissociation of SNARE Complexes After Membrane Fusion Is Driven by ATP Hydrolysis
644
662
The Endocytic Pathway Delivers Iron to Cells Without Dissociation of the Transferrin–Transferrin Receptor Complex in Endosomes
663
14.6 Directing Membrane Proteins and Cytosolic Materials to the Lysosome
665
Multivesicular Endosomes Segregate Membrane Proteins Destined for the Lysosomal Membrane from Proteins Destined for Lysosomal Degradation
665
Retroviruses Bud from the Plasma Membrane by a Process Similar to Formation of Multivesicular Endosomes
666
The Autophagic Pathway Delivers Cytosolic Proteins or Entire Organelles to Lysosomes 667
15
Signal Transduction and G Protein–Coupled Receptors 673
14.3 Early Stages of the Secretory Pathway
The Acidic pH of Late Endosomes Causes Most Receptor-Ligand Complexes to Dissociate
645
15.1 Signal Transduction: From Extracellular Signal to Cellular Response
675
COPII Vesicles Mediate Transport from the ER to the Golgi
645
Signaling Molecules Can Act Locally or at a Distance
675
COPI Vesicles Mediate Retrograde Transport Within the Golgi and from the Golgi to the ER
647
Receptors Bind Only a Single Type of Hormone or a Group of Closely Related Hormones
676
Anterograde Transport Through the Golgi Occurs by Cisternal Maturation
648
Protein Kinases and Phosphatases Are Employed in Many Signaling Pathways
676
GTP-Binding Proteins Are Frequently Used in Signal Transduction Pathways as On/Off Switches
677
Intracellular “Second Messengers” Transmit Signals from Many Receptors
678
Signal Transduction Pathways Can Amplify the Effects of Extracellular Signals
679
14.4 Later Stages of the Secretory Pathway
Vesicles Coated with Clathrin and Adapter Proteins Mediate Transport from the trans-Golgi
650 651
Dynamin Is Required for Pinching Off of Clathrin-Coated Vesicles
652
Mannose 6-Phosphate Residues Target Soluble Proteins to Lysosomes
653
Study of Lysosomal Storage Diseases Revealed Key Components of the Lysosomal Sorting Pathway
655
Protein Aggregation in the trans-Golgi May Function in Sorting Proteins to Regulated Secretory Vesicles
655
Some Proteins Undergo Proteolytic Processing After Leaving the trans-Golgi
656
Several Pathways Sort Membrane Proteins to the Apical or Basolateral Region of Polarized Cells
657
14.5 Receptor-Mediated Endocytosis
659
Cells Take Up Lipids from the Blood in the Form of Large, Well-Defined Lipoprotein Complexes
659
Receptors for Macromolecular Ligands Contain Sorting Signals That Target Them for Endocytosis
660
15.2 Studying Cell-Surface Receptors and Signal Transduction Proteins
681
The Dissociation Constant Is a Measure of the Affinity of a Receptor for Its Ligand
681
Binding Assays Are Used to Detect Receptors and Determine Their Affinity and Specificity for Ligands
681
Near-Maximal Cellular Response to a Signaling Molecule Usually Does Not Require Activation of All Receptors
682
Sensitivity of a Cell to External Signals Is Determined by the Number of Cell-Surface Receptors and Their Affinity for Ligand
683
Hormone Analogs Are Widely Used as Drugs
683
Receptors Can Be Purified by Affinity Chromatography Techniques
683
Immunoprecipitation Assays and Affinity Techniques Can Be Used to Study the Activity of Signal Transduction Proteins
684
15.3 G Protein–Coupled Receptors: Structure and Mechanism
All G Protein–Coupled Receptors Share the Same Basic Structure
686 686
Activated Phospholipase C Generates Two Key Second Messengers Derived from the Membrane Lipid Phosphatidylinositol 4,5-Bisphosphate
709
The Ca2+-Calmodulin Complex Mediates Many Cellular Responses to External Signals
713
DAG Activates Protein Kinase C
714
Ligand-Activated G Protein–Coupled Receptors Catalyze Exchange of GTP for GDP on the α Subunit of a Heterotrimeric G Protein
689
Different G Proteins Are Activated by Different GPCRs and In Turn Regulate Different Effector Proteins
Integration of Ca2+ and cAMP Second Messengers Regulates Glycogenolysis
714
691
Signal-Induced Relaxation of Vascular Smooth Muscle Is Mediated by a Ca2+-Nitric Oxide-cGMP-Activated Protein Kinase G Pathway
714
15.4 G Protein–Coupled Receptors That Regulate Ion Channels
693
Acetylcholine Receptors in the Heart Muscle Activate a G Protein That Opens K+ Channels
693
Light Activates Rhodopsin in Rod Cells of the Eye
694
Activation of Rhodopsin by Light Leads to Closing of cGMP-Gated Cation Channels
695
Signal Amplification Makes the Rhodopsin Signal Transduction Pathway Exquisitely Sensitive
696
Rapid Termination of the Rhodopsin Signal Transduction Pathway Is Essential for the Temporal Resolution of Vision
697
Rod Cells Adapt to Varying Levels of Ambient Light by Intracellular Trafficking of Arrestin and Transducin
698
15.5 G Protein–Coupled Receptors That Activate or Inhibit Adenylyl Cyclase
699
Adenylyl Cyclase Is Stimulated and Inhibited by Different Receptor-Ligand Complexes
699
Structural Studies Established How Gαs∙GTP Binds to and Activates Adenylyl Cyclase
701
cAMP Activates Protein Kinase A by Releasing Inhibitory Subunits
701
Glycogen Metabolism Is Regulated by Hormone-Induced Activation of PKA
702
cAMP-Mediated Activation of PKA Produces Diverse Responses in Different Cell Types
703
Signal Amplification Occurs in the cAMP-PKA Pathway
16
Signaling Pathways That Control Gene Expression
719
16.1 Receptor Serine Kinases That Activate Smads
722
TGF-β Proteins Are Stored in an Inactive Form in the Extracellular Matrix
722
Three Separate TGF-β Receptor Proteins Participate in Binding TGF-β and Activating Signal Transduction
722
Activated TGF-β Receptors Phosphorylate Smad Transcription Factors
724
The Smad3/Smad4 Complex Activates Expression of Different Genes in Different Cell Types
724
Negative Feedback Loops Regulate TGF-β/Smad Signaling
725
16.2 Cytokine Receptors and the JAK/STAT Signaling Pathway
726
Cytokines Influence the Development of Many Cell Types
727
Binding of a Cytokine to Its Receptor Activates One or More Tightly Bound JAK Protein Tyrosine Kinases
728
Phosphotyrosine Residues Are Binding Surfaces for Multiple Proteins with Conserved Domains
730
704
SH2 Domains in Action: JAK Kinases Activate STAT Transcription Factors
731
CREB Links cAMP and PKA to Activation of Gene Transcription
704
Multiple Mechanisms Down-Regulate Signaling from Cytokine Receptors
731
Anchoring Proteins Localize Effects of cAMP to Specific Regions of the Cell
705
16.3 Receptor Tyrosine Kinases
Multiple Mechanisms Suppress Signaling from the GPCR/cAMP/PKA Pathway
706
15.6 G Protein–Coupled Receptors That Trigger Elevations in Cytosolic and Mitochondrial Calcium
Calcium Concentrations in the Mitochondrial Matrix, ER, and Cytosol Can Be Measured with Targeted Fluorescent Proteins
xxxii
CONTENTS
708 709
734
Binding of Ligand Promotes Dimerization of an RTK and Leads to Activation of Its Intrinsic Tyrosine Kinase
734
Homo- and Hetero-oligomers of Epidermal Growth Factor Receptors Bind Members of the Epidermal Growth Factor Family
735
Activation of the EGF Receptor Results in the Formation of an Asymmetric Active Kinase Dimer
736
Multiple Mechanisms Down-Regulate Signaling from RTKs
737
16.4 The Ras/MAP Kinase Pathway
Ras, a GTPase Switch Protein, Operates Downstream of Most RTKs and Cytokine Receptors
Genetic Studies in Drosophila Identified Key Signal-Transducing Proteins in the Ras/MAP Kinase Pathway
739
On Binding Delta, the Notch Receptor Is Cleaved, Releasing a Component Transcription Factor
761
739
Matrix Metalloproteases Catalyze Cleavage of Many Signaling Proteins from the Cell Surface
763
739
Inappropriate Cleavage of Amyloid Precursor Protein Can Lead to Alzheimer’s Disease
763
Regulated Intramembrane Proteolysis of SREBPs Releases a Transcription Factor That Acts to Maintain Phospholipid and Cholesterol Levels
763
Receptor Tyrosine Kinases Are Linked to Ras by Adapter Proteins
741
Binding of Sos to Inactive Ras Causes a Conformational Change That Triggers an Exchange of GTP for GDP
742
Signals Pass from Activated Ras to a Cascade of Protein Kinases Ending with MAP Kinase
742
Phosphorylation of MAP Kinase Results in a Conformational Change That Enhances Its Catalytic Activity and Promotes Its Dimerization
MAP Kinase Regulates the Activity of Many Transcription Factors Controlling Early Response Genes
744 745
G Protein–Coupled Receptors Transmit Signals to MAP Kinase in Yeast Mating Pathways
746
Scaffold Proteins Separate Multiple MAP Kinase Pathways in Eukaryotic Cells
746
16.5 Phosphoinositide Signaling Pathways
748
Phospholipase Cγ Is Activated by Some RTKs and Cytokine Receptors
749
Recruitment of PI-3 Kinase to Activated Receptors Leads to Synthesis of Three Phosphorylated Phosphatidylinositols
749
Accumulation of PI 3-Phosphates in the Plasma Membrane Leads to Activation of Several Kinases
750
Activated Protein Kinase B Induces Many Cellular Responses
750
The PI-3 Kinase Pathway Is Negatively Regulated by PTEN Phosphatase
751
16.6 Signaling Pathways Controlled
16.8 Integration of Cellular Responses to Multiple Signaling Pathways: Insulin Action
766
Insulin and Glucagon Work Together to Maintain a Stable Blood Glucose Level
766
A Rise in Blood Glucose Triggers Insulin Secretion from the β Islet Cells
767
In Fat and Muscle Cells, Insulin Triggers Fusion of Intracellular Vesicles Containing the GLUT4 Glucose Transporter to the Plasma Membrane
767
Insulin Inhibits Glucose Synthesis and Enhances Storage of Glucose as Glycogen
769
Multiple Signal Transduction Pathways Interact to Regulate Adipocyte Differentiation Through PPARγ, the Master Transcriptional Regulator
770
Inflammatory Hormones Cause Derangement of Adipose Cell Function in Obesity
770
17
Cell Organization and Movement I: Microfilaments
775
17.1 Microfilaments and Actin Structures
778
Actin Is Ancient, Abundant, and Highly Conserved
778
G-Actin Monomers Assemble into Long, Helical F-Actin Polymers
779
F-Actin Has Structural and Functional Polarity
780
by Ubiquitinylation and Protein Degradation: Wnt, Hedgehog, and NF-κB
751
Wnt Signaling Triggers Release of a Transcription Factor from a Cytosolic Protein Complex
752
Concentration Gradients of Wnt Protein Are Essential for Many Steps in Development
Actin Polymerization In Vitro Proceeds in Three Steps
781
753
Actin Filaments Grow Faster at (+) Ends Than at (−) Ends
782
Hedgehog Signaling Relieves Repression of Target Genes
754
Hedgehog Signaling in Vertebrates Requires Primary Cilia
757
Actin Filament Treadmilling Is Accelerated by Profilin and Cofilin
784
Degradation of an Inhibitor Protein Activates the NF-κB Transcription Factor
757
Thymosin-β4 Provides a Reservoir of Actin for Polymerization
785
Polyubiquitin Chains Serve as Scaffolds Linking Receptors to Downstream Proteins in the NF-κB Pathway
760
Capping Proteins Block Assembly and Disassembly at Actin Filament Ends
785
781
16.7 Signaling Pathways Controlled by Protein Cleavage: Notch/Delta, SREBP, and Alzheimer’s Disease
761
17.2 Dynamics of Actin Filaments
17.3 Mechanisms of Actin Filament Assembly
786
Formins Assemble Unbranched Filaments
786
The Arp2/3 Complex Nucleates Branched Filament Assembly
787
Intracellular Movements Can Be Powered by Actin Polymerization
789
Microfilaments Function in Endocytosis
790
Toxins That Perturb the Pool of Actin Monomers Are Useful for Studying Actin Dynamics
791
17.4 Organization of Actin-Based Cellular Structures
793
Cross-Linking Proteins Organize Actin Filaments into Bundles or Networks
793
Adapter Proteins Link Actin Filaments to Membranes
793
17.5 Myosins: Actin-Based Motor Proteins
796
Myosins Have Head, Neck, and Tail Domains with Distinct Functions
797
Myosins Make Up a Large Family of Mechanochemical Motor Proteins
798
18
Cell Organization and Movement II: Microtubules and Intermediate Filaments
821
18.1 Microtubule Structure and Organization
822
Microtubule Walls Are Polarized Structures Built from αβ-Tubulin Dimers
822
Microtubules Are Assembled from MTOCs to Generate Diverse Configurations
824
18.2 Microtubule Dynamics
827
Individual Microtubules Exhibit Dynamic Instability
827
Localized Assembly and “Search and Capture” Help Organize Microtubules
829
Drugs Affecting Tubulin Polymerization Are Useful Experimentally and in Treatment of Diseases
829
18.3 Regulation of Microtubule Structure and Dynamics
830
Conformational Changes in the Myosin Head Couple ATP Hydrolysis to Movement
800
Microtubules Are Stabilized by Side-Binding Proteins
830
Myosin Heads Take Discrete Steps Along Actin Filaments
802
+TIPs Regulate the Properties and Functions of the Microtubule (+) End
831
Other End-Binding Proteins Regulate Microtubule Disassembly
832
17.6 Myosin-Powered Movements
Myosin Thick Filaments and Actin Thin Filaments in Skeletal Muscle Slide Past Each Other During Contraction
Skeletal Muscle Is Structured by Stabilizing and Scaffolding Proteins
803 803
18.4 Kinesins and Dyneins: Microtubule-Based Motor Proteins
833
805
Organelles in Axons Are Transported Along Microtubules in Both Directions
833
Contraction of Skeletal Muscle Is Regulated by Ca2+ and Actin-Binding Proteins
805
Actin and Myosin II Form Contractile Bundles in Nonmuscle Cells
Kinesin-1 Powers Anterograde Transport of Vesicles Down Axons Toward the (+) Ends of Microtubules
835
807
Myosin-Dependent Mechanisms Regulate Contraction in Smooth Muscle and Nonmuscle Cells
The Kinesins Form a Large Protein Superfamily with Diverse Functions
835
808
Kinesin-1 Is a Highly Processive Motor
836
808
Dynein Motors Transport Organelles Toward the (−) Ends of Microtubules
838
Kinesins and Dyneins Cooperate in the Transport of Organelles Throughout the Cell
841
Tubulin Modifications Distinguish Different Classes of Microtubules and Their Accessibility to Motors
842
Myosin V–Bound Vesicles Are Carried Along Actin Filaments
17.7 Cell Migration: Mechanism, Signaling, and Chemotaxis
811
Cell Migration Coordinates Force Generation with Cell Adhesion and Membrane Recycling
811
The Small GTP-Binding Proteins Cdc42, Rac, and Rho Control Actin Organization
813
Cell Migration Involves the Coordinate Regulation of Cdc42, Rac, and Rho
815
Migrating Cells Are Steered by Chemotactic Molecules
816
18.5 Cilia and Flagella: Microtubule-Based Surface Structures
Eukaryotic Cilia and Flagella Contain Long Doublet Microtubules Bridged by Dynein Motors
844 844
Ciliary and Flagellar Beating Are Produced by Controlled Sliding of Outer Doublet Microtubules
844
Intraflagellar Transport Moves Material Up and Down Cilia and Flagella
845
Primary Cilia Are Sensory Organelles on Interphase Cells
847
Defects in Primary Cilia Underlie Many Diseases
848
18.6 Mitosis
849
19
868
The Eukaryotic Cell Cycle
873
19.1 Overview of the Cell Cycle and Its Control
875
The Cell Cycle Is an Ordered Series of Events Leading to Cell Replication
875
850
Cyclin-Dependent Kinases Control the Eukaryotic Cell Cycle
876
851
Several Key Principles Govern the Cell Cycle
876
Centrosomes Duplicate Early in the Cell Cycle in Preparation for Mitosis
849
Mitosis Can Be Divided into Six Stages
The Mitotic Spindle Contains Three Classes of Microtubules
Microtubule Dynamics Increase Dramatically in Mitosis
852
Mitotic Asters Are Pushed Apart by Kinesin-5 and Oriented by Dynein
Advancement of Neural Growth Cones Is Coordinated by Microfilaments and Microtubules
853
19.2 Model Organisms and Methods of Studying the Cell Cycle
Chromosomes Are Captured and Oriented During Prometaphase
853
Budding and Fission Yeasts Are Powerful Systems for Genetic Analysis of the Cell Cycle
877
Duplicated Chromosomes Are Aligned by Motors and Microtubule Dynamics
854
Frog Oocytes and Early Embryos Facilitate Biochemical Characterization of the Cell Cycle Machinery
878
The Chromosomal Passenger Complex Regulates Microtubule Attachment at Kinetochores
855
Fruit Flies Reveal the Interplay Between Development and the Cell Cycle
879
Anaphase A Moves Chromosomes to Poles by Microtubule Shortening
857
The Study of Tissue Culture Cells Uncovers Cell Cycle Regulation in Mammals
880
Researchers Use Multiple Tools to Study the Cell Cycle
881
Anaphase B Separates Poles by the Combined Action of Kinesins and Dynein
858
Additional Mechanisms Contribute to Spindle Formation
858
Cytokinesis Splits the Duplicated Cell in Two
859
Plant Cells Reorganize Their Microtubules and Build a New Cell Wall in Mitosis
18.7 Intermediate Filaments
860
861
Intermediate Filaments Are Assembled from Subunit Dimers
861
Intermediate Filaments Are Dynamic
861
Cytoplasmic Intermediate Filament Proteins Are Expressed in a Tissue-Specific Manner
862
Lamins Line the Inner Nuclear Envelope To Provide Organization and Rigidity to the Nucleus
865
Lamins Are Reversibly Disassembled by Phosphorylation During Mitosis
866
877
19.3 Regulation of CDK Activity
882
Cyclin-Dependent Kinases Are Small Protein Kinases That Require a Regulatory Cyclin Subunit for Their Activity
883
Cyclins Determine the Activity of CDKs
884
Cyclin Levels Are Primarily Regulated by Protein Degradation
885
CDKs Are Regulated by Activating and Inhibitory Phosphorylation
886
CDK Inhibitors Control Cyclin-CDK Activity
886
Genetically Engineered CDKs Led to the Discovery of CDK Functions
887
19.4 Commitment to the Cell Cycle and DNA Replication
887
Cells Are Irreversibly Committed to Division at a Cell Cycle Point Called START or the Restriction Point
888
867
Microfilaments and Microtubules Cooperate to Transport Melanosomes
The E2F Transcription Factor and Its Regulator Rb Control the G1–S Phase Transition in Metazoans
889
867
Extracellular Signals Govern Cell Cycle Entry
889
Cdc42 Coordinates Microtubules and Microfilaments During Cell Migration
867
Degradation of an S Phase CDK Inhibitor Triggers DNA Replication
890
18.8 Coordination and Cooperation Between Cytoskeletal Elements
Intermediate Filament–Associated Proteins Contribute to Cellular Organization
867
Replication at Each Origin Is Initiated Once and Only Once During the Cell Cycle
892
Duplicated DNA Strands Become Linked During Replication
893
19.5 Entry into Mitosis
895 896
Mitotic CDKs Promote Nuclear Envelope Breakdown
897
Chromosome Condensation Facilitates Chromosome Segregation
897 899
19.6 Completion of Mitosis: Chromosome Segregation and Exit from Mitosis
901
Separase-Mediated Cleavage of Cohesins Initiates Chromosome Segregation
901
APC/C Activates Separase Through Securin Ubiquitinylation
901
Mitotic CDK Inactivation Triggers Exit from Mitosis
902
Cytokinesis Creates Two Daughter Cells
903
19.7 Surveillance Mechanisms in Cell Cycle Regulation
Checkpoint Pathways Establish Dependencies and Prevent Errors in the Cell Cycle
The Growth Checkpoint Pathway Ensures That Cells Enter the Cell Cycle Only After Sufficient Macromolecule Biosynthesis
The DNA Damage Response System Halts Cell Cycle Progression When DNA Is Compromised
The Spindle Assembly Checkpoint Pathway Prevents Chromosome Segregation Until Chromosomes Are Accurately Attached to the Mitotic Spindle
The Spindle Position Checkpoint Pathway Ensures That the Nucleus Is Accurately Partitioned Between Two Daughter Cells
904 905
905 905
Integrating Cells into Tissues
908
909
911
Extracellular and Intracellular Cues Regulate Germ Cell Formation
912
Several Key Features Distinguish Meiosis from Mitosis
912
921
20.1 Cell-Cell and Cell–Extracellular
923
Cell-Adhesion Molecules Bind to One Another and to Intracellular Proteins
923
The Extracellular Matrix Participates in Adhesion, Signaling, and Other Functions
925
The Evolution of Multifaceted Adhesion Molecules Made Possible the Evolution of Diverse Animal Tissues
928
Cell-Adhesion Molecules Mediate Mechanotransduction
929
20.2 Cell-Cell and Cell–Extracellular Junctions and Their Adhesion Molecules
931
Epithelial Cells Have Distinct Apical, Lateral, and Basal Surfaces
931
Three Types of Junctions Mediate Many Cell-Cell and Cell-ECM Interactions
932
Cadherins Mediate Cell-Cell Adhesions in Adherens Junctions and Desmosomes
933
Integrins Mediate Cell-ECM Adhesions, Including Those in Epithelial-Cell Hemidesmosomes
938
Tight Junctions Seal Off Body Cavities and Restrict Diffusion of Membrane Components
939
Gap Junctions Composed of Connexins Allow Small Molecules to Pass Directly Between the Cytosols of Adjacent Cells
942
20.3 The Extracellular Matrix I: The Basal Lamina
19.8 Meiosis: A Special Type of Cell Division
20
Matrix Adhesion: An Overview
Precipitous Activation of Mitotic CDKs Initiates Mitosis
Mitotic CDKs Promote Mitotic Spindle Formation
Part IV Cell Growth and Differentiation
945
The Basal Lamina Provides a Foundation for Assembly of Cells into Tissues
945
Laminin, a Multi-adhesive Matrix Protein, Helps Cross-Link Components of the Basal Lamina
947
Sheet-Forming Type IV Collagen Is a Major Structural Component of the Basal Lamina
948
Perlecan, a Proteoglycan, Cross-Links Components of the Basal Lamina and Cell-Surface Receptors
950
20.4 The Extracellular Matrix II: Connective Tissue
951
Recombination and a Meiosis-Specific Cohesin Subunit Are Necessary for the Specialized Chromosome Segregation in Meiosis I
915
Fibrillar Collagens Are the Major Fibrous Proteins in the ECM of Connective Tissues
951
Co-orienting Sister Kinetochores Is Critical for Meiosis I Chromosome Segregation
917
Fibrillar Collagen Is Secreted and Assembled into Fibrils Outside the Cell
951
DNA Replication Is Inhibited Between the Two Meiotic Divisions
917
Type I and II Collagens Associate with Nonfibrillar Collagens to Form Diverse Structures
952
Proteoglycans and Their Constituent GAGs Play Diverse Roles in the ECM
953
Hyaluronan Resists Compression, Facilitates Cell Migration, and Gives Cartilage Its Gel-Like Properties
956
Fibronectins Connect Cells and ECM, Influencing Cell Shape, Differentiation, and Movement
956
Elastic Fibers Permit Many Tissues to Undergo Repeated Stretching and Recoiling
959
Metalloproteases Remodel and Degrade the Extracellular Matrix
960
20.5 Adhesive Interactions in Motile and Nonmotile Cells
Integrins Mediate Adhesion and Relay Signals Between Cells and Their Three-Dimensional Environment
Regulation of Integrin-Mediated Adhesion and Signaling Controls Cell Movement
Connections Between the ECM and Cytoskeleton Are Defective in Muscular Dystrophy
IgCAMs Mediate Cell-Cell Adhesion in Neural and Other Tissues
Leukocyte Movement into Tissues Is Orchestrated by a Precisely Timed Sequence of Adhesive Interactions
20.6 Plant Tissues
961 961 962 964 965 966
968
21.3 Stem Cells and Niches in Multicellular Organisms
987
Adult Planaria Contain Pluripotent Stem Cells
988
Multipotent Somatic Stem Cells Give Rise to Both Stem Cells and Differentiating Cells
988
Stem Cells for Different Tissues Occupy Sustaining Niches
988
Germ-Line Stem Cells Produce Sperm or Oocytes
990
Intestinal Stem Cells Continuously Generate All the Cells of the Intestinal Epithelium
991
Hematopoietic Stem Cells Form All Blood Cells
994
Rare Types of Cells Constitute the Niche for Hematopoietic Stem Cells
996
Meristems Are Niches for Stem Cells in Plants
996
A Negative Feedback Loop Maintains the Size of the Shoot Apical Stem-Cell Population
998
The Root Meristem Resembles the Shoot Meristem in Structure and Function
999
21.4 Mechanisms of Cell Polarity and Asymmetric Cell Division
1000
Cell Polarization Before Cell Division Follows a Common Hierarchy of Steps
1002
969
Polarized Membrane Traffic Allows Yeast to Grow Asymmetrically During Mating
1003
The Par Proteins Direct Cell Asymmetry in the Nematode Embryo
1003
The Par Proteins and Other Polarity Complexes Are Involved in Epithelial-Cell Polarity
1007
The Planar Cell Polarity Pathway Orients Cells Within an Epithelium
1008
The Par Proteins Are Involved in Asymmetric Division of Stem Cells
1008
Plasmodesmata Directly Connect the Cytosols of Adjacent Cells
970 971
Stem Cells, Cell Asymmetry, and Cell Death
975
21.1 Early Mammalian Development
977
Fertilization Unifies the Genome
977
Cleavage of the Mammalian Embryo Leads to the First Differentiation Events
979
21.2 Embryonic Stem Cells and Induced Pluripotent Stem Cells
986
969
Loosening of the Cell Wall Permits Plant Cell Growth
21
ES and iPS Cells Can Generate Functional Differentiated Human Cells
1000
968
Only a Few Adhesion Molecules Have Been Identified in Plants
983
The Intrinsic Polarity Program Depends on a Positive Feedback Loop Involving Cdc42
The Plant Cell Wall Is a Laminate of Cellulose Fibrils in a Matrix of Glycoproteins
Tunneling Nanotubes Resemble Plasmodesmata and Transfer Molecules and Organelles Between Animal Cells
Somatic Cells Can Generate iPS Cells
980
The Inner Cell Mass Is the Source of ES Cells
980
Multiple Factors Control the Pluripotency of ES Cells
981
Animal Cloning Shows That Differentiation Can Be Reversed
983
21.5 Cell Death and Its Regulation
1011
Most Programmed Cell Death Occurs Through Apoptosis
1012
Evolutionarily Conserved Proteins Participate in the Apoptotic Pathway
1013
Caspases Amplify the Initial Apoptotic Signal and Destroy Key Cellular Proteins
1015
Neurotrophins Promote Survival of Neurons
1015
Mitochondria Play a Central Role in Regulation of Apoptosis in Vertebrate Cells
1017
The Pro-apoptotic Proteins Bax and Bak Form Pores and Holes in the Outer Mitochondrial Membrane
1018
Release of Cytochrome c and SMAC/DIABLO Proteins from Mitochondria Leads to Formation of the Apoptosome and Caspase Activation
1018
Influx of Ca2+ Triggers Release of Neurotransmitters
1054
Trophic Factors Induce Inactivation of Bad, a Pro-apoptotic BH3-Only Protein
1018
A Calcium-Binding Protein Regulates Fusion of Synaptic Vesicles with the Plasma Membrane
1055
Vertebrate Apoptosis Is Regulated by BH3-Only Pro-apoptotic Proteins That Are Activated by Environmental Stresses
1020
Fly Mutants Lacking Dynamin Cannot Recycle Synaptic Vesicles
1056
Two Types of Cell Murder Are Triggered by Tumor Necrosis Factor, Fas Ligand, and Related Death Signals
Signaling at Synapses Is Terminated by Degradation or Reuptake of Neurotransmitters
1057
Opening of Acetylcholine-Gated Cation Channels Leads to Muscle Contraction
1057
All Five Subunits in the Nicotinic Acetylcholine Receptor Contribute to the Ion Channel
1058
Nerve Cells Integrate Many Inputs to Make an All-or-None Decision to Generate an Action Potential
1059
Gap Junctions Allow Direct Communication Between Neurons and Between Glia
1060
22
Three Pools of Synaptic Vesicles Loaded with Neurotransmitter Are Present in the Presynaptic Terminal
1054
Cells of the Nervous System
1021
1025
22.1 Neurons and Glia: Building Blocks of the Nervous System
1026
Information Flows Through Neurons from Dendrites to Axons
1027
Information Moves Along Axons as Pulses of Ion Flow Called Action Potentials
1027
Information Flows Between Neurons via Synapses
1028
22.4 Sensing the Environment:
The Nervous System Uses Signaling Circuits Composed of Multiple Neurons
1028
Mechanoreceptors Are Gated Cation Channels
1061
Glial Cells Form Myelin Sheaths and Support Neurons
1029
Pain Receptors Are Also Gated Cation Channels
1062
1031
Five Primary Tastes Are Sensed by Subsets of Cells in Each Taste Bud
1064
A Plethora of Receptors Detect Odors
1066
Each Olfactory Receptor Neuron Expresses a Single Type of Odorant Receptor
1068
Neural Stem Cells Form Nerve and Glial Cells in the Central Nervous System
22.2 Voltage-Gated Ion Channels and the Propagation of Action Potentials
The Magnitude of the Action Potential Is Close to ENa and Is Caused by Na+ Influx Through Open Na+ Channels
Sequential Opening and Closing of Voltage-Gated Na+ and K+ Channels Generate Action Potentials
1034 1034
Action Potentials Are Propagated Unidirectionally Without Diminution
1035 1037
Touch, Pain, Taste, and Smell
22.5 Forming and Storing Memories
1070
The Hippocampus Is Required for Memory Formation
1071
Multiple Molecular Mechanisms Contribute to Synaptic Plasticity
1072
Formation of Long-Term Memories Requires Gene Expression
1074
1039
All Voltage-Gated Ion Channels Have Similar Structures
1039
Voltage-Sensing S4 α Helices Move in Response to Membrane Depolarization
1039
23
Movement of the Channel-Inactivating Segment into the Open Pore Blocks Ion Flow
1042
23.1 Overview of Host Defenses
Myelination Increases the Velocity of Impulse Conduction
1043
Action Potentials “Jump” from Node to Node in Myelinated Axons
1043
Two Types of Glia Produce Myelin Sheaths
1044
Light-Activated Ion Channels and Optogenetics
1046
1048
Immunology
Leukocytes Circulate Throughout the Body and Take Up Residence in Tissues and Lymph Nodes
1082
Mechanical and Chemical Boundaries Form a First Layer of Defense Against Pathogens
1083
Innate Immunity Provides a Second Line of Defense
1084 1086 1088
1048
Neurotransmitters Are Transported into Synaptic Vesicles by H+-Linked Antiport Proteins
1052
Adaptive Immunity, the Third Line of Defense, Exhibits Specificity
1081 1081
Inflammation Is a Complex Response to Injury That Encompasses Both Innate and Adaptive Immunity
1079
Pathogens Enter the Body Through Different Routes and Replicate at Different Sites
Formation of Synapses Requires Assembly of Presynaptic and Postsynaptic Structures
1070
Memories Are Formed by Changing the Number or Strength of Synapses Between Neurons
Nerve Cells Can Conduct Many Action Potentials in the Absence of ATP
22.3 Communication at Synapses
1061
23.2 Immunoglobulins: Structure
Many of the Variable Residues of TCRs Are Encoded in the Junctions Between V, D, and J Gene Segments
1118
1089
Signaling via Antigen-Specific Receptors Triggers Proliferation and Differentiation of T and B Cells
1118
Multiple Immunoglobulin Isotypes Exist, Each with Different Functions
1090
T Cells Capable of Recognizing MHC Molecules Develop Through a Process of Positive and Negative Selection
1120
Each Naive B Cell Produces a Unique Immunoglobulin
1091
T Cells Commit to the CD4 or CD8 Lineage in the Thymus
1121
1093
T Cells Require Two Types of Signals for Full Activation
1122
1094
Cytotoxic T Cells Carry the CD8 Co-receptor and Are Specialized for Killing
1122
T Cells Produce an Array of Cytokines That Provide Signals to Other Immune-System Cells
1123
Helper T Cells Are Divided into Distinct Subsets Based on Their Cytokine Production and Expression of Surface Markers
1124
Leukocytes Move in Response to Chemotactic Cues Provided by Chemokines
1124
and Function
Immunoglobulins Have a Conserved Structure Consisting of Heavy and Light Chains
Immunoglobulin Domains Have a Characteristic Fold Composed of Two β Sheets Stabilized by a Disulfide Bond
An Immunoglobulin’s Constant Region Determines Its Functional Properties
1089
23.3 Generation of Antibody Diversity and B-Cell Development
1095
A Functional Light-Chain Gene Requires Assembly of V and J Gene Segments
1096
Rearrangement of the Heavy-Chain Locus Involves V, D, and J Gene Segments
1099
Somatic Hypermutation Allows the Generation and Selection of Antibodies with Improved Affinities
1099
B-Cell Development Requires Input from a Pre-B-Cell Receptor
During an Adaptive Response, B Cells Switch from Making Membrane-Bound Ig to Making Secreted Ig
B Cells Can Switch the Isotype of Immunoglobulin They Make
1100
1101 1102
23.4 The MHC and Antigen Presentation
1104
23.6 Collaboration of Immune-System Cells in the Adaptive Response
1125
Engagement of Toll-Like Receptors Leads to Activation of Antigen-Presenting Cells
1127
Production of High-Affinity Antibodies Requires Collaboration Between B and T Cells
1128
Vaccines Elicit Protective Immunity Against a Variety of Pathogens
1130
The Immune System Defends Against Cancer
1131
The MHC Determines the Ability of Two Unrelated Individuals of the Same Species to Accept or Reject Grafts
1104
24
The Killing Activity of Cytotoxic T Cells Is Antigen Specific and MHC Restricted
1105
24.1 How Tumor Cells Differ from
T Cells with Different Functional Properties Are Guided by Two Distinct Classes of MHC Molecules
1105
MHC Molecules Bind Peptide Antigens and Interact with the T-Cell Receptor
1107
1109
The Class I MHC Pathway Presents Cytosolic Antigens
1110 1112
23.5 T Cells, T-Cell Receptors, and T-Cell Development
The Structure of the T-Cell Receptor Resembles the F(ab) Portion of an Immunoglobulin
TCR Genes Are Rearranged in a Manner Similar to Immunoglobulin Genes
Cancer
1135
Normal Cells
Antigen Presentation Is the Process by Which Protein Fragments Are Complexed with MHC Products and Posted to the Cell Surface
The Class II MHC Pathway Presents Antigens Delivered to the Endocytic Pathway
1125
Toll-Like Receptors Perceive a Variety of Pathogen-Derived Macromolecular Patterns
1115 1115 1116
1136
The Genetic Makeup of Most Cancer Cells Is Dramatically Altered
1137
Cellular Housekeeping Functions Are Fundamentally Altered in Cancer Cells
1137
Uncontrolled Proliferation Is a Universal Trait of Cancer
1139
Cancer Cells Escape the Confines of Tissues
1140
Tumors Are Heterogeneous Organs That Are Sculpted by Their Environment
1140
Tumor Growth Requires Formation of New Blood Vessels
1141
Invasion and Metastasis Are Late Stages of Tumorigenesis
1141
24.2 The Origins and Development
of Cancer
1143
Carcinogens Induce Cancer by Damaging DNA
1143
Some Carcinogens Have Been Linked to Specific Cancers
1144
The Multi-hit Model Can Explain the Progress of Cancer
Successive Oncogenic Mutations Can Be Traced in Colon Cancers
Cancer Development Can Be Studied in Cultured Cells and in Animal Models
24.3 The Genetic Basis of Cancer
1145 1146 1146
1149
Gain-of-Function Mutations Convert Proto-oncogenes into Oncogenes
1149
Cancer-Causing Viruses Contain Oncogenes or Activate Cellular Proto-oncogenes
1152
Loss-of-Function Mutations in Tumor-Suppressor Genes Are Oncogenic
1152
Many Oncogenes Encode Constitutively Active Signal-Transducing Proteins
1160
Inappropriate Production of Nuclear Transcription Factors Can Induce Transformation
1160
Aberrations in Signaling Pathways That Control Development Are Associated with Many Cancers
1161
Genes That Regulate Apoptosis Can Function as Proto-oncogenes or Tumor-Suppressor Genes
1163
24.5 Deregulation of the Cell Cycle and Genome Maintenance Pathways in Cancer
1163
Mutations That Promote Unregulated Passage from G1 to S Phase Are Oncogenic
1164
Inherited Mutations in Tumor-Suppressor Genes Increase Cancer Risk
1153
Loss of p53 Abolishes the DNA Damage Checkpoint
1165
Epigenetic Changes Can Contribute to Tumorigenesis
1155
Loss of DNA-Repair Systems Can Lead to Cancer
1166
Micro-RNAs Can Promote and Inhibit Tumorigenesis
1155
Researchers Are Identifying Drivers of Tumorigenesis
1156
Molecular Cell Biology Is Changing How Cancer Is Diagnosed and Treated
1157
INDEX
24.4 Misregulation of Cell Growth and Death Pathways in Cancer
Oncogenic Receptors Can Promote Proliferation in the Absence of External Growth Factors
GLOSSARY
1159 1159
G-1 I-1
CHAPTER 1
Molecules, Cells, and Model Organisms
Two cells in mortal combat: a malaria parasite invading a human red blood cell. [Courtesy Dr. Stuart Ralph, University of Melbourne.]
Nothing in biology makes sense except in the light of evolution. —Theodosius Dobzhansky, 1973, essay in American Biology Teacher 35:125–129
Biology is a science fundamentally different from physics or chemistry, which deal with unchanging properties of matter that can be described by mathematical equations. Biological systems, of course, follow the rules of chemistry and physics, but biology is a historical science, as the forms and structures of the living world today are the results of billions of years of evolution. Through evolution, all organisms are related in a family tree extending from primitive single-celled organisms that lived in the distant past to the diverse plants, animals, and microorganisms of the present era (Figure 1-1, Table 1-1). The great insight of Charles Darwin (Figure 1-2) was the principle of natural selection: organisms vary randomly and compete within their environment for resources. Only those that survive and reproduce are able to pass down their genetic traits.
At first glance, the biological universe does appear amazingly diverse, from tiny ferns to tall fir trees, from single-celled bacteria and protozoans visible only under a microscope to multicellular animals of all kinds. Indeed, cells come in an astonishing variety of sizes and shapes (Figure 1-3). Some move rapidly and have fast-changing structures, as we can see in movies of amoebae and rotifers. Others are largely stationary and structurally stable. Oxygen kills some cells but is an absolute requirement for others. Most cells in multicellular organisms are intimately involved with other cells. Although some unicellular organisms live in isolation (Figure 1-3a), others form colonies or live in close association with other types of organisms (Figure 1-3b, d), such as the bacteria that help plants to extract nitrogen from the air or the bacteria that live in our intestines and help us digest food. Yet the bewildering array of outward biological forms overlies a powerful uniformity: thanks to our common ancestry, all biological systems are composed of cells containing the same types of chemical molecules and employing similar principles of organization at the cellular level. Although the basic kinds of biological molecules have been conserved during the billions of years of evolution, the patterns in which they are assembled to form functioning cells and organisms have undergone considerable change.
OUTLINE
1.1 The Molecules of Life
1.2 Prokaryotic Cell Structure and Function
1.3 Eukaryotic Cell Structure and Function
1.4 Unicellular Eukaryotic Model Organisms
1.5 Metazoan Structure, Differentiation, and Model Organisms
[Figure 1-1: tree of life showing three major lineages, BACTERIA, ARCHAEA, and EUKARYOTA; labels include Mitochondria, Chloroplasts, "Presumed last common ancestor of eukaryotes and archaebacteria," and "Presumed last common ancestor of all extant organisms."]
FIGURE 1-1 All living organisms descended from a common ancestral cell. All organisms, from simple bacteria to complex mammals, probably evolved from a common single-celled ancestor. This family tree depicts the evolutionary relationships among the three major lineages of organisms. The structure of the tree was initially ascertained from morphological criteria: creatures that look alike were put close together. More recently, the sequences of DNA and proteins […] from J. R. Brown, 2005, “Universal tree of life,” in Encyclopedia of Life Sciences, Wiley InterScience (online).]
We now know that genes, which chemically are composed of deoxyribonucleic acid (DNA), ultimately define biological structure and maintain the integration of cellular function. Many genes encode proteins, the primary molecules that make up cell structures and carry out cellular activities. Alterations in the structure and organization of genes, or mutations, provide the random variation that can alter biological structure and function. While the vast majority of random mutations have no observable effect on a gene’s or protein’s function, many are deleterious, and only a few confer an evolutionary advantage on the organism. In all organisms, mutations in DNA are constantly occurring, allowing over time the small alterations in cellular structures and functions that may prove to be advantageous. Entirely new cellular structures are rarely created; more often, existing cellular structures undergo changes that better adapt the organism to new circumstances. Slight changes in a protein can cause important changes in its function or abolish its function entirely. For instance, in a particular organism, one gene may randomly become duplicated, after which one copy of the gene and its encoded protein retain their original function while, over time, the second copy of the gene mutates such that its protein takes on a slightly different or even a totally new function. During the evolution of some organisms, the entire genome became duplicated, allowing the second copies of many genes to undergo mutations and acquire new functions. The cellular organization of organisms plays a fundamental role in this process because it allows these changes to come about by small alterations in previously evolved cells, giving them new abilities. The result is that closely related organisms have very similar genes and proteins as well as similar cellular and tissue organizations.
Multicellular organisms, including the human body, consist of such closely interrelated elements that no single element can be fully appreciated in isolation from the others. Organisms contain organs, organs are composed of tissues, tissues consist of cells, and cells are formed from molecules (Figure 1-4). The unity of living systems is coordinated by many levels of interrelationship: molecules carry messages from organ to organ and cell to cell, and tissues are delineated and integrated with other tissues by molecules secreted by cells. Generally, all the levels into which we fragment biological systems interconnect.
2
CHAPTER 1
t Molecules, Cells, and Model Organisms
found in organisms have provided more information-rich criteria for assigning relationships. The greater the similarities in these macromolecular sequences, the more closely related organisms are thought to be. The trees based on morphological comparisons and the fossil record generally agree well with those based on molecular data. [Data
TABLE 1-1 Timeline for Evolution of Life on Earth, as Determined from the Fossil Record
4600 million years ago: The planet Earth forms from material revolving around the young Sun.
∼3900–2500 million years ago: Cells resembling prokaryotes appear. These first organisms are chemoautotrophs: they use carbon dioxide as a carbon source and oxidize inorganic materials to extract energy.
3500 million years ago: Lifetime of the last universal ancestor; the split between Eubacteria and Archaea occurs.
3000 million years ago: Photosynthesizing cyanobacteria evolve; they use water as a reducing agent, thereby producing oxygen as a waste product.
1850 million years ago: Unicellular eukaryotes appear.
1200 million years ago: Simple multicellular organisms evolve, mostly consisting of cell colonies of limited complexity.
580–500 million years ago: Most modern phyla of animals begin to appear in the fossil record during the Cambrian explosion.
535 million years ago: Major diversification of living things in the oceans: chordates, arthropods (e.g., trilobites, crustaceans), echinoderms, mollusks, brachiopods, foraminifers, radiolarians, etc.
485 million years ago: First vertebrates with true bones (jawless fishes) evolve.
434 million years ago: First primitive plants arise on land.
225 million years ago: Earliest dinosaurs (prosauropods) and teleost fishes appear.
220 million years ago: Gymnosperm forests dominate the land; herbivores grow to huge sizes.
215 million years ago: First mammals evolve.
65.5 million years ago: The Cretaceous-Tertiary extinction event eradicates about half of all animal species, including all of the non-avian dinosaurs.
6.5 million years ago: First hominids evolve.
2 million years ago: First members of the genus Homo appear in the fossil record.
350 thousand years ago: Neanderthals appear.
200 thousand years ago: Anatomically modern humans appear in Africa.
30 thousand years ago: Extinction of Neanderthals.
FIGURE 1-2 Charles Darwin (1809–1882). Four years after his epic voyage on HMS Beagle, Darwin had already begun formulating in private notebooks his concept of natural selection, which would be published in his Origin of Species (1859). [Charles Darwin on the Galapagos Islands by Howat, Andrew (20th century)/Private Collection/© Look and Learn/ Bridgeman Images.]
[Figure: six micrographs of diverse cells, panels (a) to (f), with scale bars ranging from 1 μm to 100 μm.]
FIGURE 1-3 Cells come in an astounding assortment of shapes and sizes. Some of the morphological variety of cells is illustrated in these photographs. In addition to morphology, cells differ in their ability to move, internal organization (prokaryotic versus eukaryotic cells), and metabolic activities. (a) Eubacteria: Lactococcus lactis, which are used to produce cheese such as Roquefort, Brie, and Camembert. Note the dividing cells. (b) A mass of archaeans (Methanosarcina) that produce their energy by converting carbon dioxide and hydrogen gas to methane. Some species that live in the rumens of cattle give rise to more than 150 liters of methane gas each day. (c) Human blood cells, shown in false color. The red cells are oxygen-bearing erythrocytes, the white cells (leukocytes) are part of the immune system and fight infection, and the green cells are platelets that plug wounds and contain substances to initiate blood clotting. (d) A colonial single-celled green alga, Volvox aureus. The large spheres are made up of many individual cells, visible as blue or green dots. The yellow masses inside are daughter colonies, each made up of many cells. (e) A single Purkinje neuron of the cerebellum, which can form more than a hundred thousand connections with other cells through its branched network of dendrites. The cell was made visible by introduction of a green fluorescent protein; the cell body is the bulb at the upper right. (f) Plant cells are fixed firmly in place in vascular plants, supported by a rigid cellulose skeleton. Spaces between the cells are joined into tubes for transport of water and food. [Part (a) Gary Gaugler/Science Source. Part (b) Power and Syred/Science Source. Part (c) Science Source. Part (d) micro_photo/iStockphoto/Getty Images. Part (e) Courtesy of Dr. Helen M. Blau (Stanford University School of Medicine) and Dr. Clas B. Johansson (Karolinska Institutet). Part (f) Biophoto Associates/Science Source.]
[Figure: (a) the surface of a hand; (b) layers of skin: dead skin cells, epidermal cells, basal lamina, loose connective tissue; (c) desmosomes and hemidesmosomes joining cells to each other and to the basal lamina; (d) cell-cell adhesion proteins, intracellular attachment proteins, cytoskeletal proteins, a cell-surface receptor, and a multi-adhesive protein.]
FIGURE 1-4 Living systems such as the human body consist of closely interrelated elements. (a) The surface of the hand is covered by a living organ, skin, that is composed of several layers of tissue. (b) An outer covering of hard, dead skin cells protects the body from injury, infection, and dehydration. This layer is constantly renewed by living epidermal cells, which also give rise to hair and fur in animals. Deeper layers of muscle and connective tissue give skin its tone and firmness. (c) Tissues are formed through subcellular adhesion structures (desmosomes and hemidesmosomes) that join cells to one another and to an underlying layer of supporting fibers. (d) At the heart of cell-cell adhesion are its structural components: phospholipid molecules that make up the cell-surface membrane, and large protein molecules. Protein molecules that traverse the cell membrane often form strong bonds with internal and external fibers made of multiple proteins.
To learn about biological systems, however, we must examine one small portion of a living system at a time. The biology of cells is a logical starting point because an organism can be viewed as consisting of interacting cells, which are the closest thing to autonomous biological units that exist. The last common ancestor of all life on Earth was a single cell (see Figure 1-1), and at the cellular level all life is remarkably similar. All cells use the same molecular building blocks, similar methods for the storage, maintenance, and expression of genetic information, and similar processes of energy metabolism, molecular transport, signaling, development, and structure.
In this chapter, we introduce the common features of cells. We begin with a brief discussion of the principal small molecules and macromolecules found in biological systems. Next we discuss the fundamental aspects of cell structure and function that are conserved in present-day organisms, focusing first on prokaryotic organisms (single-celled organisms without a nucleus) and their uses in studying the basic molecules of life. Then we discuss the structure and function of eukaryotic cells (cells with a defined nucleus), focusing on their many organelles. This discussion is followed by a section describing the use of unicellular eukaryotic organisms in investigations of molecular cell biology, focusing on yeasts and the parasite that causes malaria. We now have the complete sequences of the genomes of several thousand metazoans (multicellular animals), and these sequences have provided considerable insight into the evolution of genes and organisms. The final section in this chapter shows how this information can be used to refine the evolutionary relationships among organisms as well as our understanding of human development. Indeed, biologists use evolution as a research tool. If a gene and its protein have been conserved in all metazoans but are not found in unicellular organisms, the protein probably has an important function in all metazoans. It can therefore be studied in whatever metazoan organism is most suitable for the investigation. Because the structure and function of many types of metazoan cells are also conserved, we now understand many cell types in considerable detail, including muscle and liver cells and the sheets of epithelial cells that line the intestine and form our skin. But other cells, especially the multiple types that form our nervous and immune systems, remain mysterious. Much important cell biological experimentation is still needed on these and the other cell systems and organs that form our bodies.
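The comparative reasoning just described can be framed as set operations. Keeping genes conserved in every metazoan but absent from all unicellular organisms amounts to a set intersection followed by a set difference. The Python sketch below assumes a made-up presence/absence table. The organism keys and the gene names (geneA to geneE) are hypothetical placeholders, not real genome annotations.

```python
# Hypothetical presence/absence data: organism -> set of gene families detected.
# All gene names are placeholders chosen for illustration only.
genomes = {
    "human":     {"geneA", "geneB", "geneC", "geneD"},
    "fruit_fly": {"geneA", "geneB", "geneC"},
    "nematode":  {"geneA", "geneB", "geneC", "geneE"},
    "yeast":     {"geneA", "geneE"},
    "e_coli":    {"geneA"},
}
metazoans = {"human", "fruit_fly", "nematode"}


def metazoan_specific(genomes, metazoans):
    """Return genes present in every metazoan but absent from all unicellular organisms."""
    # Intersection: genes shared by all metazoans in the table.
    in_all_metazoans = set.intersection(*(genomes[org] for org in metazoans))
    # Union: genes seen in any organism that is not a metazoan.
    in_any_unicellular = set().union(
        *(genes for org, genes in genomes.items() if org not in metazoans)
    )
    return in_all_metazoans - in_any_unicellular


print(sorted(metazoan_specific(genomes, metazoans)))  # ['geneB', 'geneC']
```

With real data, the hard part is deciding whether a gene is "present" in a genome, for example by sequence similarity. The sketch assumes that step has already been done.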
1.1 The Molecules of Life
While large polymers are the focus of molecular cell biology, small molecules are the stage on which all cellular processes are set. Water, inorganic ions, and a wide array of relatively small organic molecules (Figure 1-5) account for 75 to 80 percent of living matter by weight, and water accounts for about 75 percent of a cell’s volume. These small molecules, including water, serve as substrates for many of the reactions that take place inside the cell, including energy metabolism and cell signaling. Cells acquire these small molecules in different ways. Ions, water, and many small organic molecules are imported into the cell (see Chapter 11); other small molecules are synthesized within the cell, often by a series of chemical reactions (see Chapter 12). Even in the structures of many small molecules, such as sugars, vitamins, and amino acids, we see the footprint of evolution. For example, all amino acids except glycine have an asymmetric carbon atom, yet only the L-stereoisomer, never the D-stereoisomer, is incorporated into proteins. Similarly, only the D-stereoisomer of glucose, never the mirror-image L-stereoisomer, is found in cells (see Figure 1-5). At an early stage of biological evolution, our common cellular ancestor evolved the ability to catalyze reactions with one stereoisomer instead of the other. How these selections happened is unknown, but now these choices are locked in place.
[Figure: oleic acid, sodium, water, L-serine and D-serine, D-glucose and L-glucose.]
FIGURE 1-5 Some of the many small molecules found in cells. Only the L-forms of amino acids such as serine are incorporated into proteins, not their D-mirror images; only the D-form of glucose, not its L-mirror image, can be metabolized to carbon dioxide and water.
An important and universally conserved small molecule is adenosine triphosphate (ATP), which stores readily available chemical energy in two of its chemical bonds (Figure 1-6). When one of these energy-rich bonds in ATP is broken, forming ADP (adenosine diphosphate), the released energy can be harnessed to power energy-requiring processes such as muscle contraction or protein biosynthesis. To obtain energy for making ATP, most cells break down food molecules. For instance, when sugar is degraded to carbon dioxide and water, the energy stored in the sugar molecule’s chemical bonds is released, and much of it can be “captured” in the energy-rich bonds in ATP. Bacterial, plant, and animal cells can all make ATP by this process. In addition, plants and some other organisms, such as cyanobacteria, can harvest energy from sunlight to form ATP in photosynthesis. Other small molecules (e.g., certain hormones) act as signals that direct the activities of cells (see Chapters 15 and 16), and neurons (nerve cells) communicate with one another by releasing and sensing certain small signaling molecules (see Chapter 22). The powerful physiological effects of a frightening event, for example, come from the rapid flooding of the body with the small-molecule hormone adrenaline, which mobilizes the “fight or flight” response.
[Figure: structures of ATP and ADP, with the two high-energy bonds of ATP marked. ADP + Pi is converted to ATP using energy from light (photosynthesis) or from compounds with high potential energy (respiration). The energy released when ATP is split back to ADP + Pi is used for synthesis of cellular macromolecules (DNA, RNA, proteins, polysaccharides); synthesis of other cellular constituents (such as membrane phospholipids and certain required metabolites); cellular movements, including muscle contraction, crawling movements of entire cells, and movement of chromosomes during mitosis; transport of molecules against a concentration gradient; generation of an electric potential across a membrane (important for nerve function); and heat.]
FIGURE 1-6 Adenosine triphosphate (ATP) is the most common molecule used by cells to capture, store, and transfer energy. ATP is formed from adenosine diphosphate (ADP) and inorganic phosphate (Pi) by photosynthesis in plants and by the breakdown of sugars and fats in most cells. The energy released by the splitting (hydrolysis) of Pi from ATP drives many cellular processes.
Certain small molecules (monomers) can be joined to form polymers (also called macromolecules) through repetition of a single type of covalent chemical-linkage reaction. Cells produce three major types of macromolecules: polysaccharides, proteins, and nucleic acids. Sugars, for example, are the monomers used to form polysaccharides. Different polymers of D-glucose form cellulose, an important component of plant cell walls, and glycogen, a storage form of glucose found in liver and muscle. Cells must maintain the appropriate mix of small molecules needed as precursors for synthesis of macromolecules.
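For glucose polymers, the repeated linkage reaction is a condensation: it joins two sugars and releases one molecule of water. That standard fact is not spelled out in this passage. The Python sketch below shows the resulting bookkeeping for a chain of n units, which has n − 1 linkages. `glucose_polymer` is a hypothetical helper, and it tracks only atom counts. It therefore cannot tell cellulose from glycogen, which differ in how the glucose units are linked rather than in composition.

```python
from collections import Counter

# Atom counts for one glucose monomer and for the water released per linkage.
GLUCOSE = Counter({"C": 6, "H": 12, "O": 6})
WATER = Counter({"H": 2, "O": 1})


def glucose_polymer(n):
    """Molecular formula of a chain of n glucose units.

    The same condensation linkage is repeated n - 1 times,
    and each linkage removes one H2O from the total.
    """
    formula = Counter()
    for _ in range(n):
        formula += GLUCOSE   # add one monomer
    for _ in range(n - 1):
        formula -= WATER     # one water released per linkage
    return dict(formula)


print(glucose_polymer(2))   # {'C': 12, 'H': 22, 'O': 11}, the formula of maltose
print(glucose_polymer(10))  # {'C': 60, 'H': 102, 'O': 51}
```

In general, the chain works out to C(6n)H(10n+2)O(5n+1). Each added unit contributes C6H10O5, which is why long glucose polymers are often written as (C6H10O5)n.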
Proteins Give Cells Structure and Perform Most Cellular Tasks
Proteins, the workhorses of the cell, are the most abundant and functionally versatile of the cellular macromolecules. Cells string together 20 different amino acids in linear chains, each with a defined sequence, to form proteins (see Figure 2-14), which commonly range in length from 100 to 1000 amino acids. During or just after its polymerization, a linear chain of amino acids folds into a complex shape, conferring a distinctive three-dimensional structure and function on the protein (Figure 1-7). Humans obtain amino acids either by synthesizing them from other molecules or by breaking down proteins that we eat.
Proteins have a variety of functions in the cell. Many proteins are enzymes, which accelerate (catalyze) chemical reactions involving small molecules or macromolecules (see Chapter 3). Certain proteins catalyze steps in the synthesis of all proteins; others catalyze synthesis of macromolecules such as DNA and RNA. Cytoskeletal proteins serve as structural components of a cell, for example, by forming an internal skeleton. Other proteins associated with the cytoskeleton use energy stored in the chemical bonds of ATP to move subcellular structures such as chromosomes, and even whole cells (see Chapters 17 and 18). Still other proteins bind adjacent cells together or form parts of the extracellular matrix (see Figure 1-4). Proteins can be sensors that change shape as temperature, ion concentrations, or other properties of the cell change. Many proteins embedded in the cell-surface (plasma) membrane import and export a variety of small molecules and ions (see Chapter 11). Some proteins, such as insulin, are hormones. Others are receptors that bind a specific hormone (a protein or small molecule) and then generate a signal that regulates a specific aspect of cell function. Other important classes of proteins bind to specific segments of DNA, turning genes on or off (see Chapter 9). In fact, much of molecular cell biology consists of studying the function of specific proteins in specific cell types.
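Each position in a protein chain can hold any of the 20 amino acids. A chain of n residues can therefore be built in 20^n different ways. The short Python calculation below evaluates this for the 100- and 1000-residue lengths quoted above. Python integers have arbitrary precision, so the exact count is computed. The helper name `sequence_space` is illustrative, and the count ignores whether a sequence would actually fold or function.

```python
AMINO_ACIDS = 20  # number of different amino acids cells use, from the text


def sequence_space(length):
    """Number of distinct linear sequences of the given length built from 20 amino acids."""
    return AMINO_ACIDS ** length


for n in (100, 1000):
    count = sequence_space(n)
    # The number of decimal digits minus one gives the power of ten.
    print(f"{n} residues: about 10^{len(str(count)) - 1} possible sequences")
# 100 residues: about 10^130 possible sequences
# 1000 residues: about 10^1301 possible sequences
```

These numbers help explain why a protein's defined sequence carries so much information. Only a vanishingly small fraction of the possible sequences could ever have been made by living cells.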
[Figure: models of adenylate kinase, insulin, glutamine synthetase, hemoglobin, and immunoglobulin drawn to a common scale (10 nm = 100 Å), shown beside a DNA molecule, an RNA molecule, and a lipid bilayer.]
FIGURE 1-7 Models of some representative proteins drawn to a common scale and compared with a small portion of a lipid bilayer, a DNA molecule, and an RNA molecule. Each protein has a defined three-dimensional shape held together by numerous chemical bonds. The illustrated proteins include enzymes (glutamine synthetase and adenylate kinase), an antibody (immunoglobulin), a hormone (insulin), and the blood’s oxygen carrier (hemoglobin). [Glutamine synthetase data from H. S. Gill and D. Eisenberg, 2001, Biochemistry 40:1903–1912, PDB ID 1fpy. Insulin data from E. N. Baker et al., 1988, Phil. Trans. R. Soc. Lond. B Biol. Sci. 319:369–456, PDB ID 4ins. Hemoglobin data from G. Fermi et al., 1984, J. Mol. Biol. 175:159–174, PDB ID 2hhb. Immunoglobulin data from L. J. Harris et al., 1998, J. Mol. Biol. 275:861–872, PDB ID 1igy. Adenylate kinase data from G. Bunkoczi et al., PDB ID 2c9y.]
Nucleic Acids Carry Coded Information for Making Proteins at the Right Time and Place
[Figure: parental and daughter strands of a replicating DNA double helix, with nucleotides A, G, T, and C.]
FIGURE 1-8 DNA consists of two complementary strands wound around each other to form a double helix. The double helix is stabilized by weak hydrogen bonds between the A and T bases and between the C and G bases. During replication, the two strands are unwound and used as templates to produce complementary strands. The outcome is two identical copies of the original double helix, each containing one of the original strands and one new daughter (complementary) strand.
The macromolecule that garners the most public attention is deoxyribonucleic acid (DNA), whose functional properties make it the cell’s “master molecule.” The three-dimensional structure of DNA, first proposed by James D. Watson and Francis H. C. Crick in 1953, consists of two long helical strands that are coiled around a common axis to form a double helix (Figure 1-8). The double-helical structure of DNA, one of nature’s most magnificent constructions, is critical to the phenomenon of heredity, the transfer of genetically determined characteristics from one generation to the next. DNA strands are composed of monomers called nucleotides; these monomers are often referred to as bases because they contain cyclic organic bases (see Chapter 5). Four different nucleotides, abbreviated A, T, C, and G, are joined to form a DNA strand, with the base parts projecting inward from the backbone of the strand. Two strands bind together via the bases and twist to form a double helix. Each DNA double helix has a simple construction: wherever one strand has an A, the other strand has a T, and each C is matched with a G (see Figure 1-8). This complementary matching of the two strands is so specific that complementary strands separated under the right salt concentration and temperature conditions will spontaneously zip back together. This property is critical for DNA replication and inheritance, as we will learn in Chapter 5, and also underlies many of the techniques for studying DNA molecules that are detailed in Chapter 6.
The genetic information carried by DNA resides in its sequence, the linear order of nucleotides along a strand. Specific segments of DNA, termed genes, carry instructions for making specific proteins. Commonly, genes contain two parts: the coding region specifies the amino acid sequence of a protein; the regulatory region binds specific proteins and controls when and in which cells the gene’s protein is made. Most bacteria have a few thousand protein-coding genes; yeasts and other unicellular eukaryotes have about 5000. Humans and other metazoans have between 13,000 and 23,000, while many plants have more.
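The pairing rule (A with T, C with G) is enough to regenerate a missing strand from the one that remains. That is why separated strands can serve as templates for replication (Figure 1-8). The Python sketch below assumes strands are written as plain strings aligned base by base. The names `PAIR`, `complement`, and `replicate` are illustrative, and the antiparallel orientation of real strands is deliberately ignored.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the partner strand, base by base, in the same left-to-right order.

    Simplification: real strands are antiparallel; that orientation is not modeled.
    """
    return "".join(PAIR[base] for base in strand)


def replicate(duplex):
    """Semiconservative replication: each parental strand templates a new daughter strand.

    Returns two duplexes; each keeps one parental strand and gains one newly made strand.
    """
    top, bottom = duplex
    return [(top, complement(top)), (complement(bottom), bottom)]


parent = ("ATGCGT", complement("ATGCGT"))
print(parent)             # ('ATGCGT', 'TACGCA')
print(replicate(parent))  # [('ATGCGT', 'TACGCA'), ('ATGCGT', 'TACGCA')]
```

Both daughter duplexes match the parent, yet each contains only one of the original strands, as described in the Figure 1-8 caption.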
As we discuss later in this chapter, many of the genes in bacteria specify the sequences of proteins that catalyze reactions that occur universally, such as the metabolism of glucose and the synthesis of nucleic acids and proteins. These genes, and the proteins encoded by them, are conserved throughout all living organisms, and thus studies on the functions of these genes and proteins in bacterial cells have yielded profound insights into these basic life processes. Similarly, many genes in unicellular eukaryotes such as yeasts encode proteins that are conserved throughout all eukaryotes. We will see how yeasts have been used to study processes such as cell division, and how those studies have shed light on human diseases such as cancer.
How is the information stored in the sequence of DNA used? Cells use two processes in series to convert the coded information in DNA into proteins (Figure 1-9). In the first process, called transcription, the protein-coding region of a gene is copied into a single-stranded ribonucleic acid (RNA). The RNA’s sequence is the same as that of one of the two strands of the double-stranded DNA, except that uracil (U) replaces thymine (T). A large enzyme, RNA polymerase, catalyzes the linkage of nucleotides into an RNA chain using DNA as a template. In eukaryotic cells, the initial RNA product is processed into a smaller messenger RNA (mRNA) molecule, which moves out of the nucleus to the cytoplasm, the region of the cell outside of the nucleus. Here the ribosome, an enormously complex molecular machine composed of both RNA and proteins, carries out the second process, called translation. During translation, the ribosome assembles and links together amino acids in the precise order dictated by the mRNA sequence according to the nearly universal genetic code. We examine the cell components that carry out transcription and translation in detail in Chapter 5. In addition to its role in transferring information from nucleus to cytoplasm, RNA can serve as a framework for building a molecular machine. The ribosome, for example, is built of four RNA chains that bind to more than 50 proteins to make a remarkably precise and efficient mRNA reader and protein synthesizer. While most chemical reactions in cells are catalyzed by proteins, a few, such as the formation by ribosomes of the peptide bonds that connect amino acids in proteins, are catalyzed by RNA molecules. Well before the entire human genome was sequenced, it was apparent that only about 10 percent of human DNA consists of protein-coding genes, and for many years the remaining 90 percent was considered “junk DNA”!
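The two-step flow of transcription followed by translation can be mimicked with strings. The toy Python sketch below rests on several simplifying assumptions:

- The input is the DNA strand whose sequence the RNA matches (the coding strand).
- There is no RNA processing, as in bacteria.
- Translation starts at the first AUG.
- `CODON_TABLE` is a small excerpt of the standard genetic code, not the full 64-codon table.

```python
# Excerpt of the standard genetic code (mRNA codon -> amino acid).
# Only the codons needed for the example are listed; the real table has 64 entries.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe", "GGC": "Gly",
    "AAA": "Lys", "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_strand):
    """Step 1: the RNA copy has the coding strand's sequence, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Step 2: read codons three bases at a time from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons missing from this excerpt
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


mrna = transcribe("CCATGTTTGGCAAATGGTAAGC")
print(mrna)             # CCAUGUUUGGCAAAUGGUAAGC
print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```

In a eukaryotic cell, the processing step shown in Figure 1-9 would sit between these two functions.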
In recent years, we’ve learned that much of the so-called junk DNA is actually copied into thousands of RNA molecules that, though they do not encode proteins, serve important purposes in the cell (see Chapter 10). At present, however, we know the function of only a few of these abundant noncoding RNAs. Like enzymes, certain RNA molecules, termed ribozymes, catalyze chemical reactions, as exemplified by the RNA inside a ribosome. Many scientists support the RNA world hypothesis, which proposes that RNA molecules that could replicate themselves were the precursors of current life forms. Billions of years ago, the RNA world gradually evolved into the DNA, RNA, and protein world of today’s organisms.
All organisms must control when and where their genes are transcribed. Nearly all the cells in our bodies contain the full set of human genes, but in each cell type only some of these genes are active, or turned on, and used to make proteins. For instance, liver cells produce some proteins that are not produced by kidney cells, and vice versa. Moreover, many cells respond to external signals or changes in external conditions by turning specific genes on or off, thereby adapting their repertoire of proteins to meet current needs. Such control of gene activity depends on DNA-binding proteins called transcription factors, which bind to specific sequences of DNA and act as switches, either activating or repressing transcription of particular genes, as discussed in Chapter 9.
[Figure: four-step pathway from DNA in the nucleus to protein in the cytoplasm (1 activation, 2 transcription, 3 processing, 4 translation), showing a transcription factor, RNA polymerase, pre-mRNA, mRNA, and a ribosome.]
FIGURE 1-9 The information encoded in DNA is converted into the amino acid sequences of proteins by a multistep process. Step 1: Transcription factors and other proteins bind to the regulatory regions of the specific genes they control to activate those genes. Step 2: RNA polymerase begins transcription of an activated gene at a specific location, the start site. The polymerase moves along the DNA, linking nucleotides into a single-stranded pre-mRNA transcript using one of the DNA strands as a template. Step 3: The transcript is processed to remove noncoding sequences. Step 4: In a eukaryotic cell, the mature mRNA moves to the cytoplasm, where it is bound by ribosomes that read its sequence and assemble a protein by chemically linking amino acids into a linear chain.
Phospholipids Are the Conserved Building Blocks of All Cellular Membranes
In all organisms, cellular membranes are composed primarily of a bilayer (two layers) of phospholipid molecules. Each of these bipartite molecules has a “water-loving” (hydrophilic) “head” and a “water-hating” (hydrophobic) “tail.” The two phospholipid layers of a membrane are oriented with all the hydrophilic heads directed toward the inner or outer surfaces of the membrane and the hydrophobic tails buried within its interior (Figure 1-10). Smaller amounts of other lipids, such as cholesterol, are inserted into this phospholipid framework. Cellular membranes are extremely thin relative to the size of a cell. If you magnify a bacterium or yeast cell about 10,000 times to the size of a soccer ball, the plasma membrane is about as thick as a sheet of paper! Pure phospholipid bilayers are virtually impermeable to ions and to most hydrophilic small molecules, and water crosses them only slowly. Thus every cellular membrane also contains proteins that allow specific ions and small molecules to cross. Other membrane proteins serve to attach the cell to other cells or to polymers that surround it; still others give the cell its shape or allow its shape to change. We will learn more about membranes and how molecules cross them in Chapters 7 and 11.
New cells are always derived from parental cells by cell division. We’ve seen that the synthesis of new DNA molecules is templated by the two strands of the parental DNA such that each daughter DNA molecule has the same sequence as the parental one. In parallel, new membranes are made by incorporation of lipids and proteins into existing membranes in the parental cell, which are then divided between the daughter cells by fission. Thus membrane synthesis, like DNA synthesis, is templated by a parental structure.
[Figure: phospholipid bilayer with hydrophilic head groups facing the water on both sides, hydrophobic fatty acyl chains in the interior, and embedded cholesterol and transmembrane proteins.]
FIGURE 1-10 The watery interior of cells is surrounded by the plasma membrane, a two-layered shell of phospholipids. The phospholipid molecules are oriented with their hydrophobic fatty acyl chains (black squiggly lines) facing inward and their hydrophilic head groups (white spheres) facing outward. Thus both sides of the membrane are lined by head groups, mainly charged phosphates, adjacent to the watery spaces inside and outside the cell. All biological membranes have the same basic phospholipid bilayer structure. Cholesterol (red) and various proteins are embedded in the bilayer. The interior space is actually much larger relative to the volume of the plasma membrane than is depicted here.
1.2 Prokaryotic Cell Structure and Function
The biological universe consists of two types of cells: prokaryotic and eukaryotic. Prokaryotic cells such as bacteria consist of a single closed compartment that is surrounded by a plasma membrane, lack a defined nucleus, and have a relatively simple internal organization (Figure 1-11). Eukaryotic cells contain a defined membrane-bounded nucleus and extensive internal membranes that enclose the organelles (see Figure 1-12).
Prokaryotes Comprise Two Kingdoms: Archaea and Eubacteria In recent years, detailed analysis of DNA sequences from a variety of prokaryotic organisms has revealed two distinct kingdoms: the Eubacteria, often simply called “bacteria,” and the Archaea. Eubacteria are single-celled organisms; they include the cyanobacteria, or “blue-green algae,” which can be unicellular or filamentous chains of cells. Figure 1-11 illustrates the general structure of a typical eubacterial cell; archaeal cells have a similar structure. Bacterial cells are commonly 1–2 μm in size and consist of a single closed compartment containing the cytoplasm and bounded by the plasma membrane. The genome is composed of a single circular DNA molecule; many prokaryotes contain additional small circular DNA molecules called plasmids. Although bacterial cells do not have a defined nucleus, the DNA is extensively folded and condensed into the central region of the cell, called the nucleoid. In contrast, most ribosomes are found in the cytoplasm. Some bacteria also have an invagination of the cell membrane, called a mesosome, which is
associated with synthesis of DNA and secretion of proteins. Many proteins are precisely localized within the cytosol or in the plasma membrane, indicating the presence of an elaborate internal organization. Unlike those in eukaryotes (see Figure 1-9), bacterial mRNAs undergo limited if any processing. And because there is no membrane barrier between bacterial DNA and cytoplasm, ribosomes are able to bind to an mRNA as soon as part of it has been synthesized by RNA polymerase; thus in prokaryotes, transcription and translation occur contemporaneously. Bacterial cells possess a cell wall, which lies adjacent to the external side of the plasma membrane. The cell wall is composed of layers of peptidoglycan, a complex of short peptides and oligosaccharides; it helps protect the cell and maintain its shape. Some bacteria (e.g., E. coli) have a thin inner cell wall and an outer membrane separated from the inner cell wall by the periplasmic space. Such bacteria do not retain the Gram stain and thus are classified as Gram-negative. Other bacteria (e.g., Bacillus polymyxa) that have a thicker cell wall and no outer membrane retain the Gram stain and thus are classified as Gram-positive. In addition to DNA sequence distinctions that separate them from eubacteria, archaea have cell membranes that differ dramatically in composition from those of eubacteria and eukaryotes. Many archaeans grow in unusual, often extreme, environments that may resemble the ancient conditions that existed when life first appeared on Earth. For instance, halophiles (“salt lovers”) require high concentrations of salt to survive, and thermoacidophiles (“heat and acid lovers”) grow in hot (80 °C) sulfur springs, where a pH of less than 2 is common. Still other archaeans live in oxygen-free milieus and generate methane (CH4) by combining hydrogen (H2) with carbon dioxide.
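The envelope rule in the preceding paragraph can be written as a tiny decision function. This is a minimal sketch, assuming only the two envelope features the text names (an outer membrane, and a thick or thin cell wall); the `CellEnvelope` class and `gram_class` function are hypothetical names for illustration, and real staining outcomes depend on more than these two features.

```python
from dataclasses import dataclass


@dataclass
class CellEnvelope:
    """Hypothetical, simplified description of a bacterial cell envelope."""
    has_outer_membrane: bool
    thick_cell_wall: bool


def gram_class(envelope: CellEnvelope) -> str:
    """Classify by the simplified rule in the text.

    A thick cell wall with no outer membrane retains the Gram stain
    (Gram-positive); a thin cell wall plus an outer membrane does not
    (Gram-negative). Any other combination falls outside this rule.
    """
    if envelope.thick_cell_wall and not envelope.has_outer_membrane:
        return "Gram-positive"
    if envelope.has_outer_membrane and not envelope.thick_cell_wall:
        return "Gram-negative"
    return "not covered by this rule"


# The two examples named in the text.
e_coli = CellEnvelope(has_outer_membrane=True, thick_cell_wall=False)
b_polymyxa = CellEnvelope(has_outer_membrane=False, thick_cell_wall=True)
print(gram_class(e_coli))      # Gram-negative
print(gram_class(b_polymyxa))  # Gram-positive
```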
[Figure 1-11 panels: electron micrograph of E. coli (left; scale bar, 0.5 μm) and artist's drawing (right), labeled cytoplasm, nucleoid, inner (plasma) membrane, cell wall, periplasmic space, and outer membrane.]
FIGURE 1-11 Prokaryotic cells have a relatively simple structure. (Left) Electron micrograph of a thin section of Escherichia coli, a common intestinal bacterium. The nucleoid, consisting of the bacterial DNA, is not enclosed within a membrane. E. coli and other Gram-negative bacteria are surrounded by two membranes separated by the periplasmic space. The thin cell wall is adjacent to the inner membrane.
CHAPTER 1
Molecules, Cells, and Model Organisms
(Right) This artist’s drawing shows the nucleoid (blue) and a magnification of the layers that surround the cytoplasm. Most of the cell is composed of water, proteins, ions, and other molecules that are too small to be depicted at the scale of this drawing. [Electron micrograph courtesy of I. D. J. Burdett and R. G. E. Murray.]
Escherichia coli Is Widely Used in Biological Research
The bacterial lineage includes Escherichia coli, a favorite experimental organism, which in nature is common in soil and in animal intestines. E. coli and several other bacteria have a number of advantages as experimental organisms. They grow rapidly in a simple and inexpensive medium containing glucose and salts, in which they can synthesize all necessary amino acids, lipids, vitamins, and other essential small molecules. Like all bacteria, E. coli possesses elegant mechanisms for controlling gene activity that are now well understood (see Chapter 9). Over time, researchers have developed powerful systems for genetic analysis of this organism. These systems are facilitated by the small size of bacterial genomes, the ease of obtaining mutants, the availability of techniques for transferring genes into bacteria, an enormous wealth of knowledge about bacterial gene control and protein functions, and the relative simplicity of mapping the order of genes in the bacterial genome. In Chapter 6 we see how E. coli is used in recombinant DNA research. Bacteria such as E. coli that grow in environments as diverse as the soil and the human gut have about 4000 genes, encoding about the same number of proteins (Table 1-2).
TABLE 1-2 Genome Sizes of Organisms Used in Molecular Cell Biology Research That Have Been Completely Sequenced

Organism | Base Pairs (Millions) | Approximate Number of Encoded Proteins* | Chromosomes** | Reference

Eubacteria
Mycoplasma genitalium | 0.58 | 500 | 1 | a
Helicobacter pylori | 1.67 | 1,500 | 1 | a
Haemophilus influenzae | 1.83 | 1,600 | 1 | a
Escherichia coli | 4.64 | 4,100 | 1 | a
Bacillus subtilis | 4.22 | 4,200 | 1 | a

Archaea
Methanococcus jannaschii | 1.74 | 1,800 | 1 | a
Sulfolobus solfataricus | 2.99 | 3,000 | 1 | a

Single-Celled Eukaryotes
Saccharomyces cerevisiae | 12.16 | 6,700 | 16 | b
Chlamydomonas reinhardtii | 120.4 | 14,400 | 17 | b
Plasmodium falciparum | 23.26 | 5,400 | 14 | b

Multicellular Eukaryotes (Metazoans)
Drosophila melanogaster | 168.74 | 13,900 | 6 | b
Caenorhabditis elegans | 100.29 | 20,500 | 6 | b
Schmidtea mediterranea (planarian) | 480 | >20,000*** | 4 | c
Danio rerio (zebrafish) | 1412.46 | 26,500 | 25 | b
Gallus gallus (chicken) | 1072.54 | 15,500 | 33 | b
Mus musculus (mouse) | 3480.96 | 23,100 | 21 | b
Homo sapiens (human) | 3326.74 | 20,800 | 24 | b

Plants
Arabidopsis thaliana | 135.67 | 27,400 | 5 | b

*Numbers of encoded proteins are current estimates rounded to the nearest 100 based on genome DNA sequences. They will likely change slightly in eubacteria and archaea because of the inclusion of newly discovered genes that code for very small proteins, and modestly in eukaryotes because of newly discovered small genes and because of pseudogenes that are not expressed. **Only nuclear chromosomes are counted in eukaryotes, including distinct sex chromosomes in metazoans. ***Predicted value. Source: Table courtesy of Dr. Juan Alvarez-Dominguez. References: a, http://www.ncbi.nlm.nih.gov/genome/; b, http://ensemblgenomes.org/; c, http://www.genome.gov/12512286.
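Dividing the two numeric columns of Table 1-2 gives a rough measure of how densely protein-coding information is packed in each genome. The sketch below copies a few rows from the table; the `GENOMES` dictionary and `proteins_per_mb` function are illustrative names, and the ratio inherits the approximations noted in the table footnotes.

```python
# A few rows copied from Table 1-2:
# organism -> (genome size in millions of base pairs, approximate encoded proteins)
GENOMES = {
    "Mycoplasma genitalium": (0.58, 500),
    "Escherichia coli": (4.64, 4_100),
    "Saccharomyces cerevisiae": (12.16, 6_700),
    "Drosophila melanogaster": (168.74, 13_900),
    "Homo sapiens": (3326.74, 20_800),
}


def proteins_per_mb(organism: str) -> float:
    """Approximate number of encoded proteins per million base pairs."""
    size_mb, proteins = GENOMES[organism]
    return proteins / size_mb


for name in GENOMES:
    print(f"{name:26s} {proteins_per_mb(name):7.1f} proteins per Mb")
```

With these values, E. coli encodes roughly 880 proteins per million base pairs and the human genome roughly 6, so the bacterial genome is far more densely packed with protein-coding sequence.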
Parasitic bacteria such as the Mycoplasma species acquire amino acids and other nutrients from their host cells, and they lack the genes for enzymes that catalyze reactions in the synthesis of amino acids and certain lipids. Many bacterial genes encoding proteins essential for DNA, RNA, and protein synthesis and for membrane function are conserved in all organisms, and much of our knowledge of these important cellular processes was uncovered first by studies in E. coli and other bacteria. For example, certain E. coli membrane proteins that import amino acids across the plasma membrane are closely related in sequence, structure, and function to membrane proteins in certain mammalian brain cells that import small nerve-to-nerve signaling molecules called neurotransmitters (see Chapters 11 and 22). Because many of its genes and proteins, as well as their functions, are conserved in all organisms, E. coli has become a favorite model organism: an experimental system in which the study of specific genes or proteins, or aspects of cell or organismal function or regulation, can provide an understanding of similar molecules or processes in other species. Throughout this chapter, we will encounter other model organisms that have been chosen because, like E. coli, they are easy to grow and study. Of course, many bacteria cause serious diseases, and research on them is often focused on understanding their unique biology and on discovering antibiotics that selectively kill them but not their human or animal hosts.
1.3 Eukaryotic Cell Structure and Function
Eukaryotes comprise all members of the plant, animal, and fungal kingdoms as well as the protozoans (proto, “primitive”; zoan, “animal”), such as amoebae, which are exclusively unicellular. Eukaryotic cells are commonly about 10–100 μm across, generally much larger than bacteria. A typical human fibroblast, a connective tissue cell, is about 15 μm across, with a volume and dry weight some thousands of times those of an E. coli cell. An amoeba, a single-celled protozoan, can have a cell diameter of approximately 0.5 mm, more than 30 times that of a fibroblast. Eukaryotic cells, like prokaryotic cells, are surrounded by a plasma membrane. However, unlike prokaryotic cells, most eukaryotic cells (the human red blood cell is an exception) also contain extensive internal membranes that enclose specific subcellular compartments, the organelles, and separate them from the cytoplasm (Figure 1-12). The cytosol, the organelle-free part of the cytoplasm, contains water, dissolved ions, small molecules, and proteins. Plant cells and most fungal cells are surrounded by a cell wall that gives the cell a rigid shape and also allows for rapid cell expansion. All eukaryotic cells have many of the same organelles and other subcellular structures. Many organelles are surrounded by a single phospholipid membrane, but the
nucleus, mitochondrion, and chloroplast are enclosed by two membranes. Each organelle membrane and each space in the interior of an organelle has a unique set of proteins that enable it to carry out its specific functions, including enzymes that catalyze requisite chemical reactions. The membranes defining these subcellular compartments contain proteins that control their internal ionic composition so that it generally differs from that of the surrounding cytosol as well as that of the other organelles. Here we describe the organelles common to all eukaryotic cells as well as several that are found only in certain types of eukaryotes. We begin with the proteins that give eukaryotic cells their shapes and organize the organelles.
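The size comparisons at the start of this section can be checked with a few lines of arithmetic. This sketch assumes, purely for illustration, that both cells are spheres, using the 15 μm fibroblast and 0.5 mm amoeba diameters given above; because volume scales with the cube of diameter, a cell about 33 times wider has roughly 37,000 times the volume.

```python
import math

FIBROBLAST_DIAMETER_UM = 15.0  # from the text
AMOEBA_DIAMETER_UM = 500.0     # 0.5 mm, from the text


def sphere_volume(diameter_um: float) -> float:
    """Volume in cubic micrometers; a sphere is a crude stand-in for cell shape."""
    radius = diameter_um / 2
    return 4 / 3 * math.pi * radius ** 3


diameter_ratio = AMOEBA_DIAMETER_UM / FIBROBLAST_DIAMETER_UM
volume_ratio = sphere_volume(AMOEBA_DIAMETER_UM) / sphere_volume(FIBROBLAST_DIAMETER_UM)

print(f"diameter ratio: {diameter_ratio:.1f}")  # 33.3
print(f"volume ratio:   {volume_ratio:,.0f}")   # 37,037
```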
The Cytoskeleton Has Many Important Functions
The cytoplasm contains an array of fibrous proteins collectively called the cytoskeleton (see Chapters 17 and 18). Three classes of fibers compose the cytoskeleton: microtubules (20 nm in diameter), built of polymers of the protein tubulin; microfilaments (7 nm in diameter), built of the protein actin; and intermediate filaments (10 nm in diameter). All of these fibers are long chains of multiple copies of one or more small protein subunits (Figure 1-13). The cytoskeleton gives the cell strength and rigidity, thereby helping to maintain its shape; this is perhaps most obvious with neurons, in which microtubules and other fibers allow the formation of the long, slim protuberances (the axons and dendrites; see Figure 1-3e and Chapter 22) that emanate from the cell body and allow each neuron to carry out its specialized functions. Cytoskeletal fibers also control movement of structures within the cell; for example, some cytoskeletal fibers connect to organelles or provide tracks along which organelles and chromosomes move. Other fibers play key roles in cell motility. Perhaps most important, cell division and the segregation of chromosomes and organelles into the two daughter cells could not occur without the organizational framework provided by the cytoskeleton and its associated proteins. Cilia and flagella are similar extensions of the plasma membrane. They contain a bundle of microtubules that gives them shape and, together with motor proteins, allows them to beat rhythmically. They propel materials across epithelial surfaces (Figure 1-14), enable sperm to swim, and push eggs through the oviduct (see Chapter 18). As detailed in Chapter 16, most vertebrate cells contain at least one cilium that plays a key role in cell-cell signaling.
The Nucleus Contains the DNA Genome, RNA Synthetic Apparatus, and a Fibrous Matrix
The nucleus, the largest organelle in animal cells, is surrounded by two membranes, each one a phospholipid bilayer containing many different types of proteins
[Figure 1-12(a): schematic drawings of an animal cell and a plant cell, with structures numbered 1–17 as follows.]
1 Plasma membrane controls movement of molecules in and out of the cell and functions in cell-cell signaling and cell adhesion.
2 Mitochondria, which are surrounded by a double membrane, generate ATP by oxidation of glucose and fatty acids.
3 Lysosomes, which have an acidic lumen, degrade material internalized by the cell and worn-out cellular membranes and organelles.
4 Nuclear envelope, a double membrane, encloses the contents of the nucleus; the outer nuclear membrane is continuous with the rough ER.
5 Nucleolus is a nuclear subcompartment where most of the cell's rRNA is synthesized.
6 Nucleus is filled with chromatin composed of DNA and proteins; site of mRNA and tRNA synthesis.
7 Smooth endoplasmic reticulum (ER) contains enzymes that synthesize lipids and detoxify certain hydrophobic molecules.
8 Rough endoplasmic reticulum (ER) functions in the synthesis, processing, and sorting of secreted proteins, lysosomal proteins, and certain membrane proteins.
9 Golgi complex processes and sorts secreted proteins, lysosomal proteins, and membrane proteins synthesized on the rough ER.
10 Secretory vesicles store secreted proteins and fuse with the plasma membrane to release their contents.
11 Peroxisomes contain enzymes that break down fatty acids into smaller molecules used for biosynthesis and also detoxify certain molecules.
12 Cytoskeletal fibers form networks and bundles that support cellular membranes, help organize organelles, and participate in cell movement.
13 Microvilli increase surface area for absorption of nutrients from surrounding medium.
14 Cell wall, composed largely of cellulose, helps maintain the cell's shape and provides protection against mechanical stress.
15 Vacuole stores water, ions, and nutrients, degrades macromolecules, and functions in cell elongation during growth.
16 Chloroplasts, which carry out photosynthesis, are surrounded by a double membrane and contain a network of internal membrane-bounded sacs.
17 Plasmodesmata are tubelike cell junctions that span the cell wall and connect the cytoplasms of adjacent plant cells.
[Figure 1-12(b) labels: nucleus; Golgi complex; lysosome; mitochondrion; endoplasmic reticulum; scale bar, 1 μm.]
FIGURE 1-12 Subcellular organization of eukaryotic cells. (a) Schematic overview of a “typical” animal cell (top) and plant cell (bottom) and their major substructures. Not every cell type will contain all the organelles, granules, and fibrous structures shown here, and other substructures can be present in some cell types. Cells also differ considerably in shape and in the prominence of various organelles and substructures. (b) Electron micrograph of a plasma cell, a type of white blood cell that secretes antibodies, showing some of the larger organelles. [Part (b) courtesy of I. D. J. Burdett and R. G. E. Murray.]
[Figure 1-13 panels: microtubules; microfilaments; intermediate filaments.]
FIGURE 1-13 The three types of cytoskeletal filaments have characteristic distributions within mammalian cells. Three views of the same cell. A cultured fibroblast was permeabilized and then treated with three different antibody preparations. Each antibody binds specifically to the protein monomers forming one type of filament and is chemically linked to a differently colored fluorescent dye (green, blue, or red). Visualization of the stained cell in a fluorescence microscope reveals the locations of filaments bound to a particular dye-antibody preparation. In this case, microtubules are stained blue; microfilaments, red; and intermediate filaments, green. All three fiber systems contribute to the shape and movements of cells. [Courtesy of V. Small.]
(Figure 1-15). The inner nuclear membrane defines the nucleus itself. In most cells, the outer nuclear membrane is continuous with the endoplasmic reticulum, and the space between the inner and outer nuclear membranes is continuous with the lumen of the endoplasmic reticulum (see Figure 1-15a). The two nuclear membranes appear to fuse at nuclear pore complexes, ringlike structures composed of specific membrane proteins through which material moves between the nucleus and the cytosol. The structure of the nuclear pores and the regulated transport of material through them are detailed in Chapters 10 and 13. Intermediate-filament proteins called lamins form a two-dimensional network, called the nuclear lamina, along the inner surface of the inner membrane, giving it shape and rigidity. The breakdown of the lamina occurs early in cell division, as we detail in Chapter 19. In a growing or differentiating cell, the nucleus is metabolically active, as it is the site of DNA replication and the synthesis of ribosomal RNA, mRNA, and a large
variety of noncoding RNAs (see Chapters 5 and 9). Inside the nucleus one can often see a dense subcompartment, termed the nucleolus, where ribosomal RNA is synthesized and ribosomes are assembled (see Figure 1-15b and Chapter 10). The total DNA in an organism is referred to as its genome. In most prokaryotic cells, most or all of the genetic information resides in a single circular DNA molecule about a millimeter in length; this molecule lies, folded back on itself many times, in the central region of the micrometer-sized cell (see Figure 1-11). In contrast, DNA in the nuclei of eukaryotic cells is distributed among multiple long linear structures called chromosomes. The length and number of chromosomes are the same in all cells of a particular species but vary among different species (see Table 1-2). Each chromosome comprises a single DNA molecule associated with numerous histones and other proteins. In a nucleus that is not dividing, the chromosomes are dispersed and are not dense enough to be observed in the light microscope; only during cell division are individual chromosomes visible by light microscopy. When nondividing cells are visualized in an electron microscope, the non-nucleolar regions of the nucleus, called the nucleoplasm, can be seen to have dark- and light-staining areas. The dark areas, which are often closely associated with the nuclear membrane, contain condensed, concentrated DNA, called heterochromatin, that is generally not transcribed into RNA (see Figure 1-15b). Chromosomes, which stain intensely with basic dyes, are visible in light and electron microscopes only during cell division, when the DNA becomes tightly compacted (Figure 1-16).
Although the large genomic DNA molecule in prokaryotes is associated with proteins, the arrangement of DNA within a bacterial chromosome differs greatly from that within the linear chromosomes of eukaryotic cells; bacterial chromosomes are circular and are associated with different types of proteins than are eukaryotic chromosomes.
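The statement that a bacterial chromosome is about a millimeter long follows from its size in base pairs. The sketch below assumes a rise of 0.34 nm per base pair for B-form DNA (a standard value not given in this section) and an E. coli cell about 2 μm long (the upper end of the 1–2 μm range quoted earlier); genome sizes come from Table 1-2, and the constant and function names are illustrative.

```python
BP_RISE_NM = 0.34     # assumed rise per base pair for B-form DNA, in nm
CELL_LENGTH_UM = 2.0  # assumed E. coli cell length, in micrometers


def contour_length_um(base_pairs: float) -> float:
    """Length of a fully extended double helix, in micrometers."""
    return base_pairs * BP_RISE_NM / 1_000  # nm -> um


e_coli_um = contour_length_um(4.64e6)         # E. coli, Table 1-2
human_m = contour_length_um(3326.74e6) / 1e6  # human (Table 1-2 value), um -> m
fold_longer = e_coli_um / CELL_LENGTH_UM

print(f"E. coli chromosome: {e_coli_um / 1_000:.2f} mm")
print(f"Human genome (Table 1-2 size): {human_m:.2f} m")
print(f"E. coli DNA is about {fold_longer:.0f} times the cell length")
```

Under these assumptions the E. coli chromosome is about 1.6 mm long, roughly 800 times the length of the cell that contains it, which is why it must be folded back on itself so many times.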
FIGURE 1-14 Surface of the ciliated epithelium lining a mammalian trachea viewed in a scanning electron microscope. Beating cilia, which have a core of microtubules, propel mucus and foreign particles out of the respiratory tract, keeping the lungs and airways clear. [NIBSC/Science Source.]
Eukaryotic Cells Contain a Large Number of Internal Membrane Structures
We noted earlier that, unlike prokaryotic cells, most eukaryotic cells contain extensive internal membranes that enclose
[Figure 1-15 labels: plasma membrane; rough endoplasmic reticulum; lumen of endoplasmic reticulum; ribosome; outer and inner nuclear membranes; nuclear pore; nuclear pore complex; lamina; chromatin; condensed heterochromatin; nucleolus; nucleus; cytosol.]
FIGURE 1-15 Structure of the nucleus. (a) Schematic diagram of the structure of a typical cell nucleus and the connection of the outer nuclear membrane with the rough endoplasmic reticulum. The small black dots attached to the membrane of the rough endoplasmic reticulum are ribosomes that are synthesizing membrane and secreted proteins. (b) Electron micrograph of a pancreatic acinar cell from the bat Myotis lucifugus. The nucleolus is a subcompartment of the nucleus and is not surrounded by a membrane; most ribosomal RNA is produced in the nucleolus. Darkly staining areas in the nucleus outside the nucleolus are regions of heterochromatin. [Part (b) Don W. Fawcett/Science Source.]
[Figure 1-16 labels: S phase; chromosome; centromere; sister chromatid pair.]
FIGURE 1-16 Individual chromosomes can be seen in cells during cell division (mitosis). (a) During the S phase of the cell cycle (see Figure 1-21) chromosomes are duplicated, and the daughter “sister chromatids,” each with a complete copy of the chromosomal DNA, remain attached at the centromere. (b) During the actual cell division process (mitosis), the chromosomal DNA becomes highly compacted, and the pairs of sister chromatids can be seen in the electron microscope, as depicted here. (c) Light-microscope image of a chromosomal spread from a cultured human male lymphoid cell arrested in the metaphase stage of mitosis by treatment with the microtubule-depolymerizing drug colcemid. There is a single copy of each of the duplicated X and Y chromosomes and two copies of each of the others. [Part (b) Medical RF/The Medical File/Peter Arnold Inc. Part (c) courtesy of Tatyana Pyntikova.]
specific subcellular compartments, termed organelles. Here we review the organelles and their functions.
Endoplasmic Reticulum and Golgi Complex
Generally the largest membrane in a eukaryotic cell encloses the organelle termed the endoplasmic reticulum (ER): an extensive network of closed, flattened membrane-bounded sacs called cisternae (Figure 1-17; see also Figure 1-15a). The endoplasmic reticulum has a number of functions in the cell but is particularly important in the synthesis of lipids, secreted proteins, and many types of membrane proteins. The smooth endoplasmic reticulum is so named because it lacks ribosomes; it is the site of synthesis of fatty acids and phospholipids. In contrast, the cytosolic side of the rough endoplasmic reticulum is studded with ribosomes; these ribosomes synthesize certain membrane and organelle proteins and virtually all proteins that are to be secreted from the cell (see Chapter 13). As a growing polypeptide emerges from a ribosome, it passes through the rough ER membrane with the help of specific transport proteins that are embedded in the membrane. Newly made membrane proteins remain associated with the rough ER membrane, and proteins to be secreted accumulate in the lumen, the aqueous interior of the organelle. Several minutes after proteins are synthesized in the rough ER, most of them leave the organelle within small membrane-bounded transport vesicles. These vesicles, which bud from regions of the rough ER not coated with ribosomes, carry the proteins to another membrane-bounded organelle, the Golgi complex (see Figure 1-17). As detailed in Chapter 14, secreted and membrane proteins undergo a series of enzyme-catalyzed chemical modifications in the Golgi complex that are essential for these proteins to function normally. After proteins to be secreted and membrane proteins are modified in the Golgi complex, they are transported out of the complex by a second set of vesicles, which bud from one side of the Golgi complex.
Some vesicles carry membrane proteins destined for the plasma membrane or soluble proteins to be released from the cell into the extracellular space; others carry soluble or membrane proteins to lysosomes or other organelles. How intracellular transport vesicles “know” with which membranes to fuse and where to deliver their contents is also discussed in Chapter 14.
Endosomes
Although transport proteins in the plasma membrane mediate the movement of ions and small molecules into the cell across the lipid bilayer, proteins and some other soluble macromolecules in the extracellular milieu are internalized by endocytosis. In this process, a segment of the plasma membrane invaginates into a coated pit, whose cytosolic face is lined by a specific set of proteins that cause vesicles to form. The pit pinches off from the membrane to form a small membrane-bounded vesicle that contains the extracellular material. The vesicle is delivered to and fuses with an endosome, a sorting station of membrane-limited tubules and vesicles (Figure 1-18). From this compartment, some membrane proteins are recycled back to the plasma membrane; other membrane proteins are transported in vesicles that eventually fuse with lysosomes for degradation. The entire endocytic pathway is described in detail in Chapter 14.
Lysosomes
Lysosomes provide an excellent example of the ability of intracellular membranes to form closed compartments in which the composition of the lumen (the aqueous interior of the compartment) differs substantially from that of the surrounding cytosol. Found exclusively in animal cells, lysosomes are responsible for degrading many components that have become obsolete for the cell or organism. The process by which an aged organelle is degraded in a lysosome is called autophagy (“eating oneself”). Materials taken into a cell by endocytosis or phagocytosis may also be degraded in lysosomes (see Figure 1-18).
In phagocytosis, large, insoluble particles (e.g., bacteria) are enveloped by the plasma membrane and internalized.
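The delivery routes described in the last few paragraphs (and in Figure 1-18) can be summarized as ordered lists of compartments. This is only a mnemonic sketch, assuming the simplified routes named in the text; the dictionary keys and the `reaches_lysosome` helper are hypothetical names, and real membrane trafficking involves many more intermediate steps.

```python
# Simplified routes paraphrased from the text; each list is ordered in time.
ROUTES: dict[str, list[str]] = {
    "secreted protein": ["rough ER", "transport vesicle", "Golgi complex",
                         "secretory vesicle", "extracellular space"],
    "lysosomal protein": ["rough ER", "transport vesicle", "Golgi complex",
                          "vesicle", "lysosome"],
    "endocytosed protein": ["coated pit", "endocytic vesicle", "endosome",
                            "lysosome"],
    "recycled membrane protein": ["coated pit", "endocytic vesicle",
                                  "endosome", "plasma membrane"],
    "phagocytosed bacterium": ["plasma membrane", "phagosome", "lysosome"],
    "worn-out organelle": ["autophagosome", "lysosome"],
}


def reaches_lysosome(cargo: str) -> bool:
    """True if the simplified route for this cargo ends in a lysosome."""
    return ROUTES[cargo][-1] == "lysosome"


to_lysosome = sorted(cargo for cargo in ROUTES if reaches_lysosome(cargo))
print(to_lysosome)
```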
[Figure 1-17 labels: Golgi complex; rough endoplasmic reticulum; lumen of rough endoplasmic reticulum; lumen of Golgi vesicle; vesicles moving proteins from the rough endoplasmic reticulum to the Golgi complex; scale bar, 200 nm.]
FIGURE 1-17 The Golgi complex and rough endoplasmic reticulum. An electron micrograph of a section of a human liver cell shows the abundant ribosome-studded rough endoplasmic reticulum and the Golgi complex, as well as many ribosomes free in the cytosol. [Courtesy George E. Palade EM Slide Collection, University of California, San Diego.]
[Figure 1-18 labels: plasma membrane; (1) endocytosis; early endosome; late endosome; (2) phagocytosis; bacterium; phagosome; (3) autophagy; mitochondrion; ER; autophagosome; lysosomes.]
FIGURE 1-18 Endosomes and other cellular structures deliver materials to lysosomes. Schematic overview of three pathways by which materials are moved to lysosomes. Soluble macromolecules and molecules bound to proteins on the cell surface are taken into the cell by invagination of segments of the plasma membrane and delivered to lysosomes through the endocytic pathway (1). Whole cells and other large, insoluble particles move from the cell surface to lysosomes through the phagocytic pathway (2). Worn-out organelles and bulk cytoplasm are delivered to lysosomes through the autophagic pathway (3). Within the acidic lumen of a lysosome, hydrolytic enzymes degrade proteins, nucleic acids, lipids, and other large molecules.
Lysosomes contain a group of enzymes that degrade polymers into their monomeric subunits. For example, nucleases degrade RNA and DNA into their mononucleotide building blocks; proteases degrade a variety of proteins and peptides; phosphatases remove phosphate groups from mononucleotides, phospholipids, and other compounds; still other enzymes degrade complex polysaccharides and glycolipids into smaller units. All of these lysosomal enzymes, collectively termed acid hydrolases, work most efficiently at acidic pH values. The acidic pH helps to denature proteins, making them accessible to the action of the lysosomal hydrolases. These enzymes are less active at the neutral pH of cells and most extracellular fluids. Thus if a lysosome releases its enzymes into the cytosol, where the pH is between 7.0 and 7.3, they cause little degradation of cytosolic components. Cytosolic and nuclear proteins generally are not degraded in lysosomes, but rather in proteasomes, large multiprotein complexes in the cytosol (see Chapter 3).
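Because pH is the negative base-10 logarithm of the hydrogen ion concentration, each pH unit corresponds to a tenfold difference in [H+]. The sketch below compares the cytosolic range given above with a lysosomal pH of 4.8, which is an assumed typical value (the text says only that the lumen is acidic); the function name is illustrative.

```python
CYTOSOL_PH_RANGE = (7.0, 7.3)  # from the text
LYSOSOME_PH = 4.8              # assumed typical value; not stated in the text


def fold_more_h_ions(acidic_ph: float, reference_ph: float) -> float:
    """How many times higher [H+] is at acidic_ph than at reference_ph.

    Since pH = -log10([H+]), the ratio is 10 ** (reference_ph - acidic_ph).
    """
    return 10 ** (reference_ph - acidic_ph)


for cytosol_ph in CYTOSOL_PH_RANGE:
    ratio = fold_more_h_ions(LYSOSOME_PH, cytosol_ph)
    print(f"cytosol pH {cytosol_ph}: lysosome has ~{ratio:.0f}x more H+")
```

Under this assumption the lysosomal lumen holds roughly 160 to 320 times more hydrogen ions than the cytosol, which is why enzymes tuned to that environment work poorly if they leak out.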
Peroxisomes
All animal cells (except erythrocytes) and many plant and fungal cells contain peroxisomes, a class of roughly spherical organelles 0.2–1.0 μm in diameter. Peroxisomes contain several oxidases: enzymes that use molecular oxygen to oxidize organic substances and in the process form hydrogen peroxide (H2O2), a corrosive substance. Peroxisomes also contain copious amounts of the enzyme catalase, which degrades hydrogen peroxide to yield water and oxygen (see Chapter 12). Plant seeds contain glyoxysomes, small organelles that oxidize stored lipids as a source of carbon and energy for growth. They are similar to peroxisomes and contain many of the same types of enzymes as well as additional ones used to convert fatty acids into glucose precursors.
Plant Vacuoles
Most plant cells contain at least one membrane-limited vacuole that accumulates and stores water, ions, and small-molecule nutrients such as sugars and amino acids. A variety of membrane proteins in the vacuolar membrane allow the transport of these molecules from the cytosol and their retention in the vacuole lumen. The number and size of vacuoles depend on both the type of cell and its stage of development; a single vacuole may occupy as much as 80 percent of a mature plant cell (Figure 1-19). Like that of a lysosome, the lumen of a vacuole contains a battery of degradative enzymes and has an acidic pH, which is maintained by similar transport proteins in the vacuolar membrane. Thus plant vacuoles may also have a degradative function similar to that of lysosomes in animal cells. Similar storage vacuoles are found in green algae and in many microorganisms such as fungi.
[Figure 1-19 labels: vacuole; chloroplast; granum; stroma; thylakoid membrane; cell wall; scale bar, 2 μm.]
FIGURE 1-19 Electron micrograph of a thin section of a leaf cell. In this cell, a single large vacuole occupies much of the cell volume. Parts of five chloroplasts and the cell wall are also visible. Note the internal subcompartments in the chloroplasts. [Biophoto Associates/Science Source.]
Mitochondria Are the Principal Sites of ATP Production in Aerobic Cells
Most eukaryotic cells contain many mitochondria (Figure 1-20), which occupy up to 25 percent of the volume of the cytoplasm. These complex organelles, which are the main sites of ATP production during aerobic metabolism, are generally exceeded in size only by the nucleus, vacuoles, and chloroplasts. The two membranes that bound a mitochondrion differ in composition and function. The outer mitochondrial membrane contains proteins that allow many molecules to move from the cytosol to the intermembrane space between the inner and outer membrane. The inner mitochondrial membrane, which is much less permeable, is about 20 percent lipid and 80 percent protein, a proportion of protein that is higher than those in other cellular membranes. The surface area of the inner membrane is greatly increased by a large number of infoldings, or cristae, that protrude into the matrix, or central aqueous space. In non-photosynthetic cells, the principal fuels for ATP synthesis are fatty acids and glucose. The complete aerobic degradation of 1 molecule of glucose to carbon dioxide and water is coupled to the synthesis of as many as 30 molecules of ATP from ADP and inorganic phosphate (see Figure 1-6). In eukaryotic cells, the initial stages of glucose degradation take place in the cytosol, where 2 ATP molecules per glucose molecule are generated. The terminal stages of oxidation and
[Figure 1-20 labels: outer membrane; inner membrane; cristae; intermembrane space; matrix; matrix granules.]
FIGURE 1-20 Electron micrograph of a mitochondrion in a pancreas cell. The smooth outer membrane forms the outside boundary of the mitochondrion. The inner membrane is distinct from the outer membrane and is highly invaginated to form sheets and tubes called cristae; ATP is produced by proteins embedded in the membranes of the cristae. The aqueous space between the inner and outer membranes (the intermembrane space) and the space inside the inner membrane (the matrix) each contain specific proteins important for the metabolism of sugars, lipids, and other molecules. [Keith R. Porter/Science Source.]
ATP synthesis are carried out by enzymes in the mitochondrial matrix and inner membrane (see Chapter 12); as many as 28 ATP molecules per glucose molecule are generated in mitochondria. Similarly, virtually all the ATP formed in the oxidation of fatty acids to carbon dioxide is generated in mitochondria. Thus mitochondria can be regarded as the “power plants” of the cell. Mitochondria contain small DNA molecules that encode a small number of mitochondrial proteins; the majority of mitochondrial proteins are encoded by nuclear DNA. As discussed in Chapter 12, the popular endosymbiont hypothesis postulates that mitochondria originated by endocytosis of an ancient bacterium by the precursor of a eukaryotic cell; the bacterial plasma membrane evolved to become the inner mitochondrial membrane.
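The ATP accounting in this subsection adds up as follows. This is a bookkeeping sketch of the text's upper-limit figures only (2 ATP in the cytosol plus as many as 28 in mitochondria, for as many as 30 per glucose); actual yields vary with conditions, and the constant names are illustrative.

```python
# Upper-limit figures from the text, per molecule of glucose.
CYTOSOLIC_ATP = 2       # initial stages of glucose degradation (cytosol)
MITOCHONDRIAL_ATP = 28  # terminal stages of oxidation (mitochondria)
TOTAL_ATP = CYTOSOLIC_ATP + MITOCHONDRIAL_ATP

mito_share = MITOCHONDRIAL_ATP / TOTAL_ATP
aerobic_gain = TOTAL_ATP / CYTOSOLIC_ATP  # full oxidation vs. cytosolic stages alone

print(f"total ATP per glucose: {TOTAL_ATP}")                     # 30
print(f"share made in mitochondria: {mito_share:.0%}")           # 93%
print(f"gain over cytosolic stages alone: {aerobic_gain:.0f}x")  # 15x
```

The roughly 93 percent share is what justifies calling mitochondria the cell's "power plants."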
Chloroplasts Contain Internal Compartments in Which Photosynthesis Takes Place
Except for vacuoles, chloroplasts are the largest and the most characteristic organelles in the cells of plants and green algae (see Figure 1-19). The endosymbiont hypothesis (see Chapter 12) posits that these organelles originated by endocytosis of a primitive photosynthetic bacterium. Chloroplasts can be as long as 10 μm and are typically 0.5–2 μm thick, but they vary in size and shape in different cells, especially among the algae. In addition to the inner and outer membranes that bound a chloroplast, this organelle also contains an extensive internal system of interconnected membrane-limited vesicles called thylakoids, which are flattened to form disks. Thylakoids often form stacks called grana and are embedded in an aqueous matrix termed the stroma. The thylakoid membranes contain green pigments (chlorophylls) and other pigments that absorb light, as well as enzymes that generate ATP during photosynthesis. Some of the ATP is used to convert carbon dioxide into three-carbon intermediates by enzymes located in the stroma; the intermediates are then exported to the cytosol and converted into sugars. The molecular mechanisms by which ATP is formed in mitochondria and chloroplasts are very similar, as explained in Chapter 12. Besides being surrounded by two membranes, chloroplasts and mitochondria have other features in common: both often migrate from place to place within cells, and both contain their own DNA, which encodes some of the key organelle proteins (see Chapter 12). The proteins encoded by mitochondrial or chloroplast DNA are synthesized on ribosomes within the organelles. However, most of the proteins in each organelle are encoded in nuclear DNA and are synthesized in the cytosol; these proteins are then incorporated into the organelles by processes described in Chapter 13.
All Eukaryotic Cells Use a Similar Cycle to Regulate Their Division
Unicellular eukaryotes, animals, and plants all use essentially the same cell cycle, the series of events that prepares a cell to divide, and the same actual division process, called mitosis. The eukaryotic cell cycle is commonly divided into four phases (Figure 1-21). The chromosomes and the DNA they carry are duplicated during the S (synthesis) phase. The replicated chromosomes separate during the M (mitotic) phase, in which the cell divides, and each daughter cell gets a copy of each chromosome. The M and S phases are separated by two gap phases, the G1 phase and the G2 phase, during which mRNAs, proteins, lipids, and other cell constituents are made and the cell increases in size. Under optimal conditions, some bacteria, such as E. coli, can divide to form two daughter cells once every 30 minutes. Most eukaryotic cells take considerably longer to grow and divide, generally several hours. Moreover, the cell cycle in eukaryotes is normally highly regulated (see Chapter 19). This tight control prevents imbalanced, excessive growth of cells and tissues if essential nutrients or certain hormonal signals are lacking. Some highly specialized cells in adult animals, such as neurons and striated muscle cells, divide rarely, if at all. However, an organism usually replaces worn-out cells or makes more cells in response to a new need, as exemplified by the generation of new muscle cells from undifferentiated stem cells in response to exercise or damage. Another example is the formation of additional red blood cells when a person ascends to a higher altitude and needs more capacity to capture oxygen. The fundamental defect in cancer is loss of the ability to control the growth and division of cells. In Chapter 24 we examine the molecular and cellular events that lead to inappropriate, uncontrolled proliferation of cells.
[Figure 1-21 image: cycle diagram with the phases G1 (RNA and protein synthesis), S (DNA replication), G2 (RNA and protein synthesis), and M (cell division), plus a side branch to G0 for nondividing, resting cells.]
FIGURE 1-21 During growth, all eukaryotic cells continually progress through the four phases of the cell cycle. In proliferating cells, the four phases of the cell cycle proceed successively. In humans, the cycle takes from 10 to 20 hours depending on cell type and developmental state. Yeasts divide much faster. During interphase, which consists of the G1, S, and G2 phases, the cell roughly doubles its mass. Replication of DNA during the S phase leaves the cell with four copies of each type of chromosome. In the mitotic (M) phase, the chromosomes are evenly partitioned into two daughter cells, and in most cases the cytoplasm divides roughly in half. Under certain conditions, such as starvation or when a tissue has reached its final size, cells will stop cycling and remain in a waiting state called G0. Some types of cells in G0 can reenter the cell cycle if conditions change.
1.4 Unicellular Eukaryotic Model Organisms
Our current understanding of the molecular functioning of eukaryotic cells largely rests on studies of just a few types of organisms, termed model organisms (Figure 1-22). Because of the evolutionary conservation of genes, proteins, organelles, cell types, and so forth, discoveries about biological structures and functions obtained with one experimental organism often apply to others. Thus researchers generally conduct studies with the organism that is most suitable for rapidly and completely answering the question being posed, knowing that the results obtained in one organism are likely to be broadly applicable. Indeed, many organisms, particularly rats, frogs, sea urchins, chickens, and slime molds, have been and continue to be immensely valuable for cell biology research. As more and more organisms have their entire genomes sequenced, a wide variety of other species are increasingly being used for investigations, especially for studies of the evolution of genes, cells, and organisms and of how organisms become adapted to diverse ecological niches. As we have seen, bacteria are excellent models for studies of several cellular functions, but they lack the organelles found in eukaryotes. Unicellular eukaryotes such as yeasts are used to study many fundamental aspects of eukaryotic cell structure and function. Metazoan models such as the roundworm, fruit fly, and mouse are required to study more complex tissue and organ systems and development. As we will see in this section and the next, several eukaryotic model organisms are widely used to understand complex cell systems and mechanisms.
[Figure 1-22 panels, each listing the research areas for which the organism is used:
(a) Yeast (Saccharomyces cerevisiae): control of cell cycle and cell division; protein secretion and membrane biogenesis; function of the cytoskeleton; cell differentiation; aging; gene regulation and chromosome structure.
(b) Alga (Chlamydomonas reinhardtii): structure and function of flagella; chloroplasts and photosynthesis; organelle movement; phototaxis.
(c) Roundworm (Caenorhabditis elegans): development of the body plan; cell lineage; formation and function of the nervous system; control of programmed cell death; cell proliferation and cancer genes; aging; behavior; gene regulation and chromosome structure.
(d) Fruit fly (Drosophila melanogaster): development of the body plan; generation of differentiated cell lineages; formation of the nervous system, heart, and musculature; programmed cell death; genetic control of behavior; cancer genes and control of cell proliferation; control of cell polarization; effects of drugs, alcohol, and pesticides.
(e) Planarian (Schmidtea mediterranea), with the pharynx and photoreceptors labeled: stem cells; turnover of adult tissues; wound healing; regeneration.
(f) Zebrafish (Danio rerio): development of vertebrate body tissues; formation and function of brain and nervous system; birth defects; cancer.
(g) Mouse (Mus musculus), including cultured cells: development of body tissues; function of mammalian immune system; formation and function of brain and nervous system; models of cancers and other human diseases; gene regulation and inheritance; infectious disease; behavior.
(h) Plant (Arabidopsis thaliana): development and patterning of tissues; genetics of cell biology; agricultural applications; physiology; gene regulation; immunity; infectious disease.]
FIGURE 1-22 Each eukaryotic organism used in cell biology has advantages for certain types of studies. The yeast Saccharomyces cerevisiae (a) has the cellular organization of a eukaryote but is a relatively simple single-celled organism that is easy to grow and to manipulate genetically. The green alga Chlamydomonas reinhardtii (b) is widely used to study photosynthesis and the structure and function of flagella. In the roundworm Caenorhabditis elegans (c), which has a small number of cells arranged in a nearly identical way in every worm, the formation of each individual cell can be traced. The fruit fly Drosophila melanogaster (d), first used to discover the properties of chromosomes, has been especially valuable in identifying genes that control embryonic development. Many of these genes are evolutionarily conserved in humans. Planaria (e) are flatworms that can regenerate any part of the body that is cut off, including the head and the photoreceptors. The stem cells that give rise to their new cells and tissues are widely studied. The zebrafish Danio rerio (f) is used for rapid genetic screens to identify genes that control vertebrate development and organogenesis. Of the experimental animal systems, mice (Mus musculus) (g) are evolutionarily the closest to humans and have thus provided models for studying numerous human genetic and infectious diseases. The mustard-family weed Arabidopsis thaliana (h) has been used for genetic screens to identify genes involved in nearly every aspect of plant life. [Part (a) Scimat/Photo Researchers, Inc. Part (b) William Dentler, University of Kansas. Part (c) Science Source. Part (d) Darwin Dale/Science Source. Part (e) Peter Reddien, MIT Whitehead Institute. Part (f) blickwinkel/Hartl/Alamy. Part (g) J. M. Labat/Jacana/Photo Researchers, Inc. Part (h) Darwin Dale/Science Source.]
Yeasts Are Used to Study Fundamental Aspects of Eukaryotic Cell Structure and Function
One group of single-celled eukaryotes, the yeasts, has proven exceptionally useful in molecular and genetic analysis of eukaryotic cell formation and function. Yeasts and their multicellular cousins, the molds, which collectively constitute the fungi, have an important ecological role in breaking down plant and animal remains for reuse. They also make numerous antibiotics and are used in the manufacture of bread, beer, and wine. The common yeast used to make bread and beer, Saccharomyces cerevisiae, appears frequently in this book because it has proved to be an extremely useful experimental organism. Homologs of many of the approximately 6000 different proteins expressed in an S. cerevisiae cell (see Table 1-2) are found in most, if not all, eukaryotes and are important for cell division or for the functioning of individual eukaryotic organelles. Much of what we know of the proteins in the endoplasmic reticulum and Golgi complex that promote protein secretion was elucidated first in yeasts (see Chapter 14). Yeasts were also essential for the identification of many proteins that regulate the cell cycle and catalyze DNA replication and transcription. S. cerevisiae (Figure 1-23a; see also Figure 1-22a) and other yeasts offer many advantages to molecular and cellular biologists:
• Vast numbers of yeast cells can be grown easily and cheaply in culture from a single cell; the cells in such clones are genetically identical and have the same biochemical properties. Individual proteins or multiprotein complexes can be purified from large amounts of cells and then studied in detail.
• Yeast cells may be either haploid (containing one copy of each chromosome) or diploid (containing two copies of each chromosome), and both forms can divide by mitosis; this ability makes isolating and characterizing mutations in genes encoding essential yeast cell proteins relatively straightforward.
• Yeasts, like many organisms, have a sexual cycle that allows exchange of genes between cells. Under starvation conditions, diploid cells undergo meiosis (see Chapter 19) to form haploid daughter cells, which are of two types, a and α cells. If haploid a and α cells encounter each other, they can fuse, forming an a/α diploid cell that contains two copies of each chromosome, one from each parent cell (Figure 1-23b).
With the use of a single species such as S. cerevisiae as a model organism, results from studies carried out by tens of thousands of scientists worldwide, using multiple experimental techniques, can be combined to yield a deeper level of understanding of a single type of cell. As we will see many times in this book, conclusions based on studies of S. cerevisiae have often proved true for all eukaryotes and have formed the basis for exploring the evolution of more complex processes in multicellular animals and plants.
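The yeast sexual cycle described above (haploid a and α cells fuse into an a/α diploid; a starved diploid undergoes meiosis to give four haploid cells) can be captured in a toy model. This is a minimal sketch under stated assumptions: the `YeastCell` class, `mate`, and `sporulate` are hypothetical names, each cell is reduced to its mating type(s), and the 2:2 segregation of mating types among the four spores is the only genetic rule modeled. Starvation and all other environmental triggers are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class YeastCell:
    # ("a",) or ("alpha",) for a haploid; ("a", "alpha") for an a/alpha diploid.
    mating_types: tuple

    @property
    def ploidy(self) -> int:
        return len(self.mating_types)

def mate(x: YeastCell, y: YeastCell) -> YeastCell:
    """Fuse two haploids of opposite mating type into an a/alpha diploid."""
    if x.ploidy != 1 or y.ploidy != 1:
        raise ValueError("only haploid cells mate")
    if x.mating_types == y.mating_types:
        raise ValueError("haploids of the same mating type do not fuse")
    return YeastCell(tuple(sorted(x.mating_types + y.mating_types)))

def sporulate(diploid: YeastCell) -> list:
    """Toy meiosis: one a/alpha diploid gives four haploid spores, two a and two alpha."""
    if diploid.ploidy != 2:
        raise ValueError("only a/alpha diploids undergo meiosis")
    return [YeastCell((m,)) for m in diploid.mating_types for _ in range(2)]

diploid = mate(YeastCell(("a",)), YeastCell(("alpha",)))
spores = sporulate(diploid)
print(diploid.ploidy, len(spores))  # 2 4
```

Because each spore is haploid again, the spores can mate with cells of the opposite type and restart the cycle, just as in the life cycle shown in Figure 1-23b.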
Mutations in Yeast Led to the Identification of Key Cell Cycle Proteins
Biochemical studies can tell us much about an individual protein, but they cannot prove that it is required for cell division or any other cell process. The importance of a protein is demonstrated most firmly if a mutation that prevents its synthesis or makes it nonfunctional adversely affects the process under study. In a classical genetics approach, scientists isolate and characterize mutants that lack the ability to do something a normal organism can do. Often large genetic “screens” are done to look for many different mutant individuals (e.g., fruit flies, yeast cells) that are unable to complete a certain process, such as cell division or muscle formation. Mutations are usually produced by treatment with a mutagen, a chemical or physical agent that promotes mutations in a largely random fashion. But how can we isolate and maintain mutant organisms or cells that are defective in some process, such as cell division or protein secretion, that is essential for survival?
[Figure 1-23 image: (a) scanning electron micrograph of budding S. cerevisiae cells. (b) Life-cycle diagram: (1) mating between haploid cells of opposite mating type, a and α; (2) vegetative growth of diploid (a/α) cells by budding; (3) starvation causes ascus formation and meiosis, producing four haploid ascospores within the ascus; (4) the ascus ruptures and the spores germinate; (5) vegetative growth of haploid cells.]
FIGURE 1-23 The yeast Saccharomyces cerevisiae can be haploid or diploid and can reproduce sexually or asexually. (a) Scanning electron micrograph of the budding yeast Saccharomyces cerevisiae. These cells grow by an unusual type of mitosis termed mitotic budding. One daughter nucleus remains in the “mother” cell; the other daughter nucleus is transported into the bud, which grows in size and soon is released as a new cell. After each bud cell breaks free, a scar is left at the budding site, so the number of previous buds on the parent cell can be counted. The orange-colored cells are bacteria. (b) Haploid yeast cells can have different mating types, called a (blue) and α (orange). Both types contain a single copy of each yeast chromosome, half the usual number, and grow by mitotic budding. Two haploid cells that differ in mating type, one a and one α, can fuse together to form an a/α diploid cell that contains two copies of each chromosome; diploid cells can multiply by mitotic budding. Under starvation conditions, a diploid cell can undergo meiosis, a special type of cell division, to form four haploid ascospores. Rupture of an ascus releases four haploid spores, which can germinate into haploid a and α cells. These cells can also multiply asexually. [Part (a) SCIMAT/Science Source.]
One way is to isolate organisms with a temperature-sensitive mutation. These mutants are able to grow at the permissive temperature, but not at another, usually higher temperature, the nonpermissive temperature. Normal cells can grow at either temperature. In most cases, a temperature-sensitive mutant produces an altered protein that works at the permissive temperature but unfolds and is nonfunctional at the nonpermissive temperature. Screens for temperature-sensitive mutations are most readily done with haploid organisms such as yeasts because they have only one copy of each gene, and thus a mutation in it will immediately have a consequence. By analyzing the effects of numerous different temperature-sensitive mutations that altered the division of haploid yeast cells, geneticists discovered most of the genes necessary for cell division without knowing anything, initially, about which proteins they encode or how these proteins participate in the process. In general, the great power of genetics is to reveal the existence and relevance of all proteins required for a particular cell function without prior knowledge of their biochemical identity or molecular function. These “mutation-defined” genes can be isolated and replicated (cloned) with recombinant DNA techniques discussed in Chapter 6. With the isolated genes in hand, the encoded proteins can be produced in a test tube or in engineered bacteria or cultured cells. In this way, biochemists can investigate whether the genes necessary for cell division encode proteins that associate with other proteins or DNA or catalyze particular chemical reactions during cell division (see Chapter 19). Most of these yeast cell cycle genes are found in human cells as well, and the encoded proteins have similar amino acid sequences. Proteins from different organisms, but with similar amino acid sequences, are said to be homologous; such proteins may have the same or similar functions. Remarkably, it has been shown that a human cell cycle protein, when expressed in a mutant yeast defective in the homologous yeast protein, is able to “rescue the defect” of the mutant yeast (that is, to allow the cell to grow normally), thus demonstrating the protein’s ability to function in a very different type of eukaryotic cell. This experimental result, which garnered a Nobel Prize for Paul Nurse, was especially notable because the common ancestor of present-day yeasts, plants, and humans is thought to have lived over a billion years ago.
Clearly the eukaryotic cell cycle and many of the genes and proteins that catalyze and regulate it evolved early in biological evolution and have remained quite constant over a very long period of evolutionary time. Subsequent studies showed that human cancers frequently carry mutations in the human homologs of many yeast cell cycle proteins, mutations that allow uncontrolled cell growth (see Chapter 24), again attesting to the important conserved functions of these proteins in all eukaryotes.
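The logic of a temperature-sensitive screen, and of the rescue test that follows it, reduces to comparing growth at two temperatures. The sketch below is illustrative only: the `Strain` record, the strain names, and the growth observations are hypothetical, and a real screen scores colonies on plates rather than boolean flags.

```python
from dataclasses import dataclass

@dataclass
class Strain:
    name: str
    grows_permissive: bool     # growth at the permissive (usually lower) temperature
    grows_nonpermissive: bool  # growth at the nonpermissive (usually higher) temperature

def classify(strain: Strain) -> str:
    """Sort a strain by its growth pattern at the two temperatures."""
    if strain.grows_permissive and not strain.grows_nonpermissive:
        return "temperature-sensitive"
    if strain.grows_permissive and strain.grows_nonpermissive:
        return "normal"
    return "other"  # e.g., no growth at the permissive temperature: cannot be propagated

def ts_candidates(strains) -> list:
    """Names of strains that can be kept at the permissive temperature but fail when shifted."""
    return [s.name for s in strains if classify(s) == "temperature-sensitive"]

def is_rescued(mutant_with_extra_gene: Strain) -> bool:
    """Rescue test: a ts mutant carrying a homologous gene now grows at the nonpermissive temperature."""
    return mutant_with_extra_gene.grows_nonpermissive

# Hypothetical observations from a small screen.
strains = [
    Strain("wild type", True, True),
    Strain("mut1", True, False),
    Strain("mut2", True, True),
    Strain("mut3", False, False),
]
print(ts_candidates(strains))                          # ['mut1']
print(is_rescued(Strain("mut1 + homolog", True, True)))  # True
```

The key design point mirrors the text: a mutant is only useful if it can be propagated, which is why the permissive temperature must support growth.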
Studies in the Alga Chlamydomonas reinhardtii Led to the Development of a Powerful Technique to Study Brain Function
The green unicellular alga Chlamydomonas reinhardtii (Figure 1-22b), which swims using its two long flagella, is widely used in studies of the structure, function, and assembly of this organelle. In part because of the powerful genetic techniques now available, Chlamydomonas is also used in studies of chloroplast formation and photosynthesis. The Chlamydomonas genome (see Table 1-2) encodes many more proteins than do those of yeasts, including flagellar proteins and proteins needed to build a chloroplast, organelles not found in yeasts.
One important outcome of the use of this experimental organism came from studies of phototaxis, the behavior in which an organism moves toward or away from a source of light. Chlamydomonas needs to move toward light to undergo photosynthesis and thus generate the energy it needs to grow and divide, but light that is too intense repels it, as it causes damage to the chloroplast. Studies of Chlamydomonas phototaxis led to the discovery of two proteins in its plasma membrane that, when they absorb light, open a “channel” in the membrane that allows ions such as Ca²⁺ to flow from the extracellular medium into the cytosol, triggering phototactic responses. As detailed in Chapter 22, recombinant DNA techniques have been used to express one such protein in specific neurons in the mouse brain, allowing investigators to activate just one or a few cells in the brain using a point source of light. Thus studies on this humble alga have led to the development of an important experimental system, optogenetics, for the study of brain function.
The Parasite That Causes Malaria Has Novel Organelles That Allow It to Undergo a Remarkable Life Cycle
Whereas yeasts are used in the manufacture of bread, beer, wine, and cheese, some unicellular eukaryotes cause major human diseases and are widely studied in an attempt to develop drugs that will kill them but not injure their human host. Entamoeba histolytica causes dysentery; Trichomonas vaginalis, vaginitis; and Trypanosoma brucei, sleeping sickness. Each year the worst of these protozoans, Plasmodium falciparum and related species, cause more than 300 million new cases of malaria, a disease that kills 1.5 million to 3 million people annually. These protozoans inhabit mammals and mosquitoes alternately, changing their morphology and behavior in response to signals in each of these environments. The complex life cycle of Plasmodium dramatically illustrates how a single cell can adapt to multiple challenges (Figure 1-24a). Additionally, the merozoite form that infects human red blood cells contains several organelles, not found in most eukaryotes, that enable the parasite to invade a red blood cell, including the rhoptry, polar ring, and microneme, as well as a fuzzy surface coat on the outside of the cell (Figure 1-24b, c). Entry of the parasite into a red blood cell is initiated by binding of certain parasite cell-surface proteins to proteins on the red blood cell surface, followed by the formation of a tight junction between the two plasma membranes, the loss of the “fuzzy coat,” and secretion of proteins stored in the microneme and rhoptry. All the transformations in cell type that occur during the Plasmodium life cycle are governed by instructions encoded in the genetic material of this parasite (see Table 1-2).
[Figure 1-24 image: (a) life-cycle diagram, steps 1 to 8, showing sporozoites, liver, merozoites, human red blood cell, gametocytes, egg, sperm, zygote, oocyst, and sporulation in the human and mosquito hosts; (b) diagram of a merozoite labeling the polar ring, surface coat, microtubule, microneme, rhoptry, dense granules, mitochondrion, plastid, ribosome, nucleus, and plasma membrane; (c) micrograph labeling the microneme, rhoptry, and the tight junction between the plasma membranes of P. vivax and the red blood cell.]
FIGURE 1-24 Plasmodium species, the parasites that cause malaria, are single-celled protozoans with a remarkable life cycle. Many Plasmodium species are known, and they can infect a variety of animals, cycling between insect and vertebrate hosts. The four species that cause malaria in humans undergo several dramatic transformations within their human and mosquito hosts. (a) Diagram of the life cycle. Step 1: Sporozoites enter a human host when an infected Anopheles mosquito bites a person. Step 2: They migrate to the liver, where they develop into merozoites, which are released into the blood. Merozoites differ substantially from sporozoites, so this transformation is a metamorphosis (Greek, “to transform” or “many shapes”). Step 3: Circulating merozoites invade red blood cells (RBCs) and reproduce within them. Proteins produced by some Plasmodium species move to the surface of infected RBCs, causing the cells to adhere to the walls of blood vessels. This prevents infected RBCs from circulating to the spleen, where cells of the immune system would destroy the RBCs and the Plasmodium organisms they harbor. Step 4: After growing and reproducing in RBCs for a period of time characteristic of each Plasmodium species, the merozoites suddenly burst forth in synchrony from large numbers of infected cells. It is this event that brings on the fevers and shaking chills that are the well-known symptoms of malaria. Some of the released merozoites infect additional RBCs, creating a cycle of production and infection. Step 5: Eventually, some merozoites develop into male and female gametocytes, another metamorphosis. These cells cannot survive for long unless they are transferred in blood to an Anopheles mosquito. Step 6: In the mosquito’s stomach, the gametocytes are transformed into sperm or eggs (gametes), yet another metamorphosis marked by development of long hairlike flagella on the sperm. Step 7: Fusion of sperm and eggs generates zygotes, which implant into the cells of the stomach wall and grow into oocysts, essentially factories for producing sporozoites. Step 8: Rupture of an oocyst releases thousands of sporozoites, which migrate to the salivary glands, setting the stage for infection of another human host. (b) Organelles of the Plasmodium vivax merozoite. Some of these organelles are found only in Plasmodium and related eukaryotic parasitic microorganisms. (c) Section of a Plasmodium vivax merozoite invading a human red blood cell. See A. Cowman and B. Crabb, 2006, Cell 124:755–766. [Part (c) Masamichi Aikawa.]
The Plasmodium genome has about the same number of protein-coding genes as the yeast Saccharomyces cerevisiae, but about two-thirds of the Plasmodium genes appear to be unique to this and related parasites, attesting to the great evolutionary distance between these parasites, the Apicomplexa (see Figure 1-1), and most other eukaryotes, as well as to the presence of unusual organelles required for their complex life cycles.
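The eight steps of the Plasmodium life cycle in Figure 1-24a form a closed loop that alternates between two hosts. The sketch below encodes those steps as data, with events paraphrased from the figure caption; the `Step` record and the helper functions are hypothetical names used only to show where the parasite changes host.

```python
from typing import NamedTuple

class Step(NamedTuple):
    number: int
    host: str
    event: str

# Steps paraphrased from the Figure 1-24 caption; list index = step number - 1.
LIFE_CYCLE = [
    Step(1, "human", "sporozoites enter the blood through a mosquito bite"),
    Step(2, "human", "sporozoites develop into merozoites in the liver"),
    Step(3, "human", "merozoites invade red blood cells and reproduce"),
    Step(4, "human", "merozoites burst synchronously from red blood cells"),
    Step(5, "human", "some merozoites develop into gametocytes"),
    Step(6, "mosquito", "gametocytes become sperm or eggs in the stomach"),
    Step(7, "mosquito", "zygotes grow into oocysts in the stomach wall"),
    Step(8, "mosquito", "oocysts release sporozoites that reach the salivary glands"),
]

def next_step(current: int) -> Step:
    """Return the step after `current`; step 8 loops back to step 1 in a new human host."""
    return LIFE_CYCLE[current % len(LIFE_CYCLE)]

def host_switches() -> list:
    """Pairs (from_step, to_step) at which the parasite moves between host species."""
    switches = []
    for step in LIFE_CYCLE:
        nxt = next_step(step.number)
        if nxt.host != step.host:
            switches.append((step.number, nxt.number))
    return switches

print(host_switches())  # [(5, 6), (8, 1)]
```

The two host switches correspond to the two mosquito bites in the cycle: one that picks up gametocytes in a blood meal and one that injects sporozoites into the next human.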
1.5 Metazoan Structure, Differentiation, and Model Organisms
The evolution of multicellular organisms most likely began when cells remained associated in small colonies after division instead of separating into individual cells. A few prokaryotes and several unicellular eukaryotes, such as Volvox (see Figure 1-3d), as well as many fungi and slime molds, exhibit such rudimentary social behavior. The full flowering of multicellularity, however, occurred in eukaryotic organisms whose cells became differentiated and organized into groups, or tissues, in which the different cells performed specialized functions.
Multicellularity Requires Cell-Cell and Cell-Matrix Adhesions
The cells of higher plants are encased in a network of chambers formed by the interlocking cell walls surrounding the cells and are connected by cytoplasmic bridges called plasmodesmata (see Figure 1-12a). Animal cells are often “glued” together into a chain, a ball, or a sheet by cell-adhesion proteins on their surfaces, often called cell-adhesion molecules, or CAMs (see Figure 1-4d). Some CAMs bind cells to one another; other types bind cells to the extracellular matrix, forming a cohesive unit. In animals, the matrix cushions cells and allows nutrients to diffuse toward them and waste products to diffuse away. A specialized, especially tough matrix called the basal lamina, made up of polysaccharides and multiple proteins such as collagen, forms a supporting layer underlying cell sheets and prevents the cell aggregates from ripping apart (see Figure 1-4). Many CAMs and extracellular-matrix proteins found in humans also occur in invertebrates, indicating their importance during metazoan evolution. Similarly, many of the proteins and small molecules used by metazoans as signaling molecules are conserved in humans and many invertebrates, as are their receptors, the cellular proteins that bind to these signaling molecules and trigger an effect in the receiving cell. As one example, the signaling protein Wnt, discussed in Chapter 16, was discovered simultaneously as the gene mutated in the Drosophila Wingless mutation and as the site of integration of a cancer-causing virus in mice.
Epithelia Originated Early in Evolution
Metazoans, which are thought to have evolved in an oceanlike, saline environment, had to solve a fundamental problem: separating the inside of the organism from the outside. The external surfaces of all metazoan animals, as well as the surfaces of their internal organs, are covered by a sheet-like layer of tissue called an epithelium. Epithelia commonly serve as barriers and protective surfaces, as exemplified by the sheets of epidermal cells that form the skin (see Figure 1-4). Other epithelia are one cell layer thick and line internal organs such as the small intestine, where they are crucial for transport of the products of digestion (e.g., glucose and amino acids) into the blood (see Chapter 11). As discussed in Chapter 20, epithelia in different body locations have characteristic morphologies and functions. Cells that form epithelial tissues are said to be polarized because their plasma membranes are organized into at least two discrete regions. Typically, the distinct surfaces of a polarized epithelial cell are the apical surface (the “top” of the cell, facing the external world) and the basal and lateral (collectively, basolateral) surfaces that face the organism’s interior. As shown in Figure 1-4, the basal surface usually contacts an underlying extracellular matrix, the basal lamina. Specialized junction proteins in the basolateral plasma membrane link adjacent cells together and also bind the cells to the basal lamina.
Tissues Are Organized into Organs
Cells in metazoans do not work in isolation; specialized groups of differentiated cells often form tissues, which are themselves the major components of organs. For example, the lumen of a small blood vessel is lined with a sheet-like layer of endothelial cells, or endothelium, which prevents blood cells from leaking out (Figure 1-25). A layer of smooth muscle tissue encircles the endothelium and basal lamina and contracts to limit blood flow. During times of fright, constriction of smaller peripheral vessels forces more blood to the vital organs. The muscle layer of a blood vessel is wrapped in an outer layer of connective tissue, a network of fibers and cells that encases the vessel walls and protects them from stretching and rupture. This hierarchy of tissues is copied in other blood vessels, which differ mainly in the thickness of the layers. The wall of a major artery must withstand much stress and is therefore thicker than that of a minor vessel. The strategy of grouping and layering different tissues is used to build other complex organs as well. In each case, the function of the organ is determined by the specific functions of its component tissues, and each type of cell in a tissue produces the specific groups of proteins that enable the tissue to carry out its functions.
[Figure 1-25 image: micrograph of a small artery in cross section, with labels for the lumen, endothelium, smooth muscle, and connective tissue.]
FIGURE 1-25 All organs are organized arrangements of various tissues, as illustrated in this cross section of a small artery (arteriole). Blood flows through the vessel lumen, which is lined by a thin sheet of endothelial cells forming the endothelium and by the underlying basal lamina. This tissue adheres to the overlying layer of smooth muscle tissue; contraction of the muscle layer controls blood flow through the vessel. A fibrillar layer of connective tissue surrounds the vessel and connects it to other tissues. [SPL/Science Source.]
Genomics Has Revealed Important Aspects of Metazoan Evolution and Cell Function
Metazoans, be they invertebrates such as the fruit fly Drosophila melanogaster and the roundworm Caenorhabditis elegans, or vertebrates such as mice and humans, contain between 13,000 and 23,000 protein-coding genes, about three to four times as many as a yeast (see Table 1-2). Sequencing of entire genomes has shown that many of these genes are conserved among the metazoans, and genetic studies have shown that many of them are essential for the formation and function of specific tissues and organs. Thus many of the organisms listed in Table 1-2 are used to study the roles of these conserved proteins in cell development and function. While the human and mouse genomes encode about the same number of proteins as those of the roundworm Caenorhabditis elegans, frogs, and fish, mammalian cells contain about 30 times the DNA of a roundworm and two to three times the DNA of frogs and fish. Only about 10 percent of human DNA encodes proteins. We know now that much of the remaining 90 percent has important functions. Many DNA segments bind proteins that regulate expression of nearby genes, allowing each mammalian gene to make the precise amount of mRNA and protein needed in each of many types of cells. Other segments of DNA are used to synthesize thousands of RNA molecules whose function in regulating gene expression is only now being uncovered. As an example, hundreds of different microRNAs, 20 to 25 nucleotides long, are abundant in metazoan cells, where they bind to and repress the activity of target mRNAs. These small RNAs may indirectly regulate the activity of most or all genes, either by inhibiting the ability of mRNAs to be translated into proteins or by triggering the degradation of target mRNAs (see Chapter 10).
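The DNA comparisons in the paragraph above are ratios of genome sizes. The sketch below checks the “about 30 times” figure and converts the 10 percent protein-coding share into base pairs. The haploid genome sizes are assumed round values (about 3.1 billion bp for human, about 100 million bp for C. elegans), not numbers taken from Table 1-2, and the dictionary and function names are illustrative.

```python
# Assumed approximate haploid genome sizes in base pairs (round figures, not Table 1-2 values).
GENOME_BP = {
    "human": 3.1e9,
    "roundworm": 1.0e8,   # Caenorhabditis elegans
}
PROTEIN_CODING_FRACTION_HUMAN = 0.10  # "only about 10 percent of human DNA encodes proteins"

def dna_ratio(a: str, b: str) -> float:
    """How many times more DNA genome `a` contains than genome `b`."""
    return GENOME_BP[a] / GENOME_BP[b]

ratio = dna_ratio("human", "roundworm")
coding_bp = PROTEIN_CODING_FRACTION_HUMAN * GENOME_BP["human"]
print(f"human/roundworm DNA ratio: {ratio:.0f}")      # 31
print(f"protein-coding human DNA: {coding_bp:.1e} bp")  # 3.1e+08 bp
```

With these assumed sizes the ratio comes out near 31, consistent with the text’s “about 30 times,” while the protein-coding portion of the human genome is on the order of 300 million base pairs.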
Some of this non-protein-coding DNA probably regulates expression of genes that make us uniquely human. Indeed, fish and humans have about the same number of protein-coding genes—about 20,000—yet as noted above, the human genome is over twice the size of that in fish (see Table 1-2). The human brain can perform complex mental processes such as reading and writing a textbook. Somehow these 20,000 human genes are exquisitely regulated such that humans produce a brain with about 100,000,000,000 neurons, which communicate with one another at about 100,000,000,000,000 interaction sites termed synapses.
Genomics—the study of the entire DNA sequences of organisms—has shown us how close humans really are to our nearest relatives, the great apes (Figure 1-26). Human DNA is 99 percent identical in sequence to that of chimpanzees and bonobos; the 1 percent difference amounts to about 30,000,000 base pairs, but it somehow explains the obvious differences between our species, such as the evolution of human brains during the roughly 5,000,000 years since we last shared a common ancestor.
Genomics coupled with paleontological findings indicates that humans and mice descended from a common mammalian ancestor that probably lived about 75 million years ago. Nonetheless, both organisms contain about the same number of genes, and about 99 percent of mouse protein-coding genes have homologs in humans, and vice versa. Over 90 percent of mouse and human genomes can be partitioned into regions of synteny—that is, DNA segments that have the same order of unique DNA sequences and genes along a segment of a chromosome. This observation suggests that much of the gene order in the most recent common ancestor of humans and mice has been conserved in both species (Figure 1-27). Of course, mice are not people; relative to humans, mice have expanded families of genes related to immunity, reproduction, and olfaction, probably reflecting the differences between the human and mouse lifestyles.
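Simple arithmetic shows the scale of these comparisons. The sketch below assumes a haploid human genome of about 3.1 billion base pairs, a round figure that is not given in this section. On that assumption, 99 percent identity leaves roughly 30 million differing base pairs. Likewise, 10^14 synapses shared among 10^11 neurons average about 1,000 synapses per neuron.

```python
# Back-of-the-envelope checks for the genome and brain figures quoted above.

# ASSUMPTION: haploid human genome of ~3.1 billion bp (round figure, not from this text).
HUMAN_GENOME_BP = 3.1e9


def differing_base_pairs(identity: float, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Number of base pairs expected to differ at a given fractional sequence identity."""
    return (1.0 - identity) * genome_bp


def synapses_per_neuron(synapses: float, neurons: float) -> float:
    """Average number of synapses per neuron."""
    return synapses / neurons


if __name__ == "__main__":
    print(f"{differing_base_pairs(0.99):,.0f} bp differ at 99% identity")
    print(f"{synapses_per_neuron(1e14, 1e11):,.0f} synapses per neuron on average")
```

The calculation treats identity as a simple per-base fraction of the whole genome, which is only an approximation.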
It’s not only human evolution that interests us! Polar bears live in the Arctic and eat a high-fat diet, mostly composed of seals. Recent genome sequencing allowed researchers to conclude that the most recent common ancestor of polar bears and their brown bear relatives, which live in temperate climates, lived about 500,000 years—or only about 20,000 bear generations—ago. But during that rather short evolutionary period the polar bear genome acquired changes in many genes regulating cardiovascular function, fat metabolism, and heart development, allowing polar bears to consume a diet very rich in fats.
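Figure 1-26 pairs each genome-wide sequence identity with an estimated divergence time. The sketch below is a rough, hypothetical illustration and not the method used by Locke et al. It uses a strict molecular clock, which assumes that substitutions accumulate at one constant rate r along every lineage. After a split, the two lineages diverge independently, so the divergence d = 1 − identity grows as d = 2rT. The sketch calibrates r on the human–chimpanzee node (identity 0.990, taken as 5.25 Myr, the midpoint of 4.5–6 Myr). It then applies that rate to the other node identities listed in the figure.

```python
"""Strict molecular-clock sketch using the node identities listed in Figure 1-26.

ASSUMPTIONS (illustrative only, not the published method):
  * one constant substitution rate in every lineage;
  * repeated substitutions at the same site are ignored;
  * calibration on the human-chimpanzee node (identity 0.990, 5.25 Myr).
"""

# (sequence identity, divergence-time range in Myr as reported in Figure 1-26)
NODES = [
    (0.996, "~1"),
    (0.990, "4.5-6"),
    (0.984, "6-8"),
    (0.974, "12-16"),
    (0.971, "18-20"),
    (0.949, "25-33"),
]


def substitution_rate(identity: float, time_myr: float) -> float:
    """Per-site rate per lineage per Myr: divergence d = 1 - identity
    accumulates along BOTH lineages after the split, so d = 2 * rate * time."""
    return (1.0 - identity) / (2.0 * time_myr)


def divergence_time(identity: float, rate: float) -> float:
    """Invert d = 2 * rate * time to estimate the time since the split (Myr)."""
    return (1.0 - identity) / (2.0 * rate)


if __name__ == "__main__":
    rate = substitution_rate(0.990, (4.5 + 6.0) / 2)
    for identity, reported in NODES:
        estimate = divergence_time(identity, rate)
        print(f"identity {identity:.3f}: clock estimate {estimate:5.2f} Myr "
              f"(figure: {reported} Myr)")
```

With this single calibration, the 0.996 node comes out at about 2.1 Myr rather than ~1 Myr. The 0.971 node comes out at about 15 Myr rather than 18–20 Myr. A constant-rate clock is therefore only a first approximation to the published estimates.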
FIGURE 1-26 Evolutionary tree connecting monkeys, apes, and humans. The evolutionary tree of humans, great apes, a small ape, and an Old World monkey was estimated from the divergence among their genomic DNA sequences. Whole-genome DNA sequences were aligned, and the average nucleotide divergence in unique DNA sequences was estimated. Estimates of the times the different species diverged from each other, indicated at each node, were calculated in millions of years (Myr) based on DNA sequence identity; ∼1 Myr implies approximately 1 Myr or less. [Data from D. P. Locke et al., 2011, Nature 469:529–533.]
[Species shown: rhesus macaque (Macaca mulatta; Cercopithecidae, Old World monkeys); gibbon (Nomascus leucogenys; Hylobatidae, small apes); and the Hominidae (great apes): Sumatran orangutan (Pongo abelii), gorilla (Gorilla gorilla), human (Homo sapiens), bonobo (Pan paniscus), and chimpanzee (Pan troglodytes). Node labels (sequence identity, divergence time): 0.996, ∼1 Myr ago; 0.990, 4.5–6 Myr ago; 0.984, 6–8 Myr ago; 0.974, 12–16 Myr ago; 0.971, 18–20 Myr ago; 0.949, 25–33 Myr ago.]
FIGURE 1-27 Conservation of synteny between human and mouse. Shown is a 510,000-base-pair (bp) segment of mouse chromosome 12 that shares common ancestry with a 600,000-bp section of human chromosome 14. Pink lines connect the reciprocal unique DNA sequences in the two genomes. Mb, 1 million base pairs. [Data from Mouse Genome Sequencing Consortium, 2002, Nature 420:520.]
[Tracks: Human Chr 14 and Mouse Chr 12; coordinates 59.9–60.5 Mb.]
Embryonic Development Uses a Conserved Set of Master Transcription Factors
The astute reader will note a paradox in the previous discussion: if indeed most human protein-coding genes are shared with apes and mice, and many with flies and worms, how is it that these organisms look and function so differently?
The answer to this question resides in the way genes are regulated during the development of all metazoans from a single cell, the fertilized egg. As we learn in Chapters 8 and 9, each protein-coding gene is associated with regulatory DNA sequences that differ in different organisms. Many of these regulatory sequences bind proteins that direct the expression of the gene, and thus the amount of a protein it makes, in specific types of cells. Some of these proteins are termed master transcription factors; these proteins bind to regulatory DNA sequences, are conserved throughout evolution, and control the development of specific types of cells by activating or repressing groups of genes, often at different stages of development.
The early stages in the development of a human embryo are similar to those in the mouse. They are characterized by rapid cell divisions (Figure 1-28) followed by the differentiation of cells into tissues. In all organisms, the embryonic body plan—the spatial pattern of cell types (tissues) and body parts—emerges from two influences: a program of genes that specifies the pattern of the body, and local cell interactions that induce different parts of the program.
FIGURE 1-28 The first few cell divisions of a fertilized egg set the stage for all subsequent development. A developing mouse embryo is shown at the (a) two-cell, (b) four-cell, and (c) eight-cell stages. The embryo is surrounded by supporting membranes. The corresponding steps in human development occur during the first few days after fertilization. [Claude Edelmann/Science Source.]
With only a few exceptions, animals display bilateral symmetry; that is, their left and right sides mirror each other. This most basic of patterns is encoded in the genome. Developmental biologists have divided bilaterally symmetric animal phyla into two large groups depending on where the mouth and anus form in the early embryo. Protostomes develop a mouth close to a transient opening in the early embryo (the blastopore) and have a ventral nerve cord; protostomes include all worms, insects, and mollusks. Deuterostomes develop an anus close to this transient opening in the embryo and have a dorsal central nervous system; they include echinoderms (such as sea stars and sea urchins) and vertebrates. The bodies of both protostomes and deuterostomes are divided into discrete segments that form early in embryonic development. Protostomes and deuterostomes probably evolved from a common ancestor, termed Urbilateria, that lived approximately 600 million years ago (Figure 1-29a).
Many patterning genes encode master transcription factors that control expression of other genes and specify the general organization of an organism, beginning with the major body axes—anterior-posterior (head-to-tail), dorsal-ventral (back-to-belly), and left-right—and ending with body segments such as the head, chest, abdomen, and tail. The conservation of bilateral symmetry from the simplest worms to mammals is explained by the presence of conserved patterning genes in their genomes. Other patterning genes encode proteins that are important in cell adhesion or in cell signaling. This broad repertoire of patterning genes permits the integration and coordination of events in different parts of the developing embryo and gives each segment in the body its unique identity. Remarkably, many patterning genes encoding master transcription factors are highly conserved in both protostomes and deuterostomes (Figure 1-29b). This conservation of body plan reflects evolutionary pressure to preserve the commonalities in the molecular and cellular mechanisms controlling development in different organisms. For instance, fly eyes and human eyes are very different in their structure, function, and nerve connections. Nonetheless, the master transcription factors that initiate eye development—eyeless in the fly and Pax6 in the human—are highly related proteins that regulate the activities of other genes and are descended from the same ancestral gene. Mutations in the eyeless or Pax6 genes cause major defects in eye formation (Figure 1-30).
FIGURE 1-29 Similar master transcription factors, conserved during evolution, regulate early developmental processes in diverse animals. (a) Urbilateria is the presumed ancestor of all protostomes and deuterostomes that existed about 600 million years ago. The positions of its nerve cord (violet), surface ectoderm (mainly skin; white), and endoderm (mainly digestive tract and organs; light green) are shown. (b) Highly conserved master transcription factors called Hox proteins, which determine the identity of body segments during embryonic development, are found in both protostomes and deuterostomes. Hox genes are found in clusters on the chromosomes of most or all animals, and they encode related master transcription factors that control the activities of other genes. In many animals, different Hox genes direct the development of different segments along the head-to-tail axis, as indicated by corresponding colors. Each gene is activated (transcriptionally) in a specific region along the head-to-tail axis and controls the growth and development of tissues there. For example, in the mouse, a deuterostome, the Hox genes are responsible for the distinctive shapes of vertebrae. Mutations affecting Hox genes in the fruit fly, a protostome, cause body parts to form in the wrong locations, such as legs in lieu of antennae on the head. In both organisms, these genes provide a head-to-tail “address” and serve to direct the formation of structures in the appropriate places.
[Panel labels: (a) Urbilateria, ~600 million years ago, with protostome and deuterostome descendants; (b) Hox genes in a fly (protostome) and a mammal (deuterostome).]
FIGURE 1-30 Homologous genes regulate eye development in diverse animals. (a) Development of the large compound eyes in fruit flies requires a gene called eyeless (named for the mutant phenotype). (b) Flies with inactivated eyeless genes lack eyes. (c) Normal human eyes require the gene Pax6, the homolog of eyeless. (d) People lacking adequate Pax6 function have the genetic disease aniridia, a lack of irises in the eyes. Pax6 and eyeless, which encode highly related master transcription factors that regulate the activities of other genes, are homologs and presumably descended from the same ancestral gene. [Parts (a) and (b) Courtesy Andreas Hefti, Interdepartmental Electron Microscopy (IEM), Biocenter of the University of Basel. Part (c) © Simon Fraser/Science Source. Part (d) © Mediscan/Alamy.]
Planaria Are Used to Study Stem Cells and Tissue Regeneration
In single-celled organisms, both daughter cells usually (though not always) resemble the parent cell. Similarly, in multicellular organisms, when many types of cells divide, the daughter cells look a lot like the parent cell—liver cells, for instance, divide to generate liver cells with the same characteristics and functions as their parent, as do insulin-producing cells in the pancreas. In contrast, stem cells and certain other undifferentiated cells can generate multiple types of differentiated descendant cells; these cells often divide in such a way that the two daughter cells are different. Such asymmetric cell division is characteristic of stem cells and is critical to the generation of different cell types in the body (see Chapter 21). Often one daughter cell resembles its parent in that it remains undifferentiated and retains its ability to give rise to multiple types of differentiated cells. The other daughter cell divides many times, and each of its daughter cells differentiates into a specific type of cell.
The planarian Schmidtea mediterranea is best known for its capacity to regenerate complete individuals—with a normal head—from minuscule body parts formed by dissection (see Figure 1-22e). Planaria contain stem cells that replace cells lost to normal turnover. In portions of a dissected animal, they will, after several cell divisions, generate any cell type needed during regeneration. These stem cells have served as a potent experimental system to discover how heads and tails, each built of many types of cells, are formed (see Chapters 16 and 21). The hormones that instruct stem cells in different parts of the body to generate specific types of cells are similar to those used during development in mammals, including humans (see Chapter 16), and thus future studies on planarian regeneration may show scientists how to regenerate human body parts such as a hand or an eye.
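A toy cell-count model can make the contrast between symmetric and asymmetric division concrete. It is purely illustrative and is not taken from the text. It shows how asymmetric division keeps a stem cell while still producing many differentiated descendants. The model assumes synchronous divisions and no cell death. It also assumes that each committed daughter divides a fixed number of times before all its progeny differentiate; `committed_divisions` is a hypothetical parameter.

```python
from dataclasses import dataclass


@dataclass
class Population:
    stem: int            # undifferentiated, self-renewing cells
    differentiated: int  # cells that have adopted a specific fate


def asymmetric(rounds: int, committed_divisions: int) -> Population:
    """Each round, every stem cell divides asymmetrically: one daughter stays a
    stem cell; the other divides `committed_divisions` more times, and all of
    its descendants differentiate."""
    pop = Population(stem=1, differentiated=0)
    for _ in range(rounds):
        committed = pop.stem  # one committed daughter per dividing stem cell
        pop.differentiated += committed * 2 ** committed_divisions
        # pop.stem is unchanged: each stem cell is replaced by its stem daughter
    return pop


def symmetric_differentiating(committed_divisions: int) -> Population:
    """Contrast case: the stem cell divides once and BOTH daughters commit,
    so the self-renewing cell is lost after a single division."""
    return Population(stem=0, differentiated=2 * 2 ** committed_divisions)


if __name__ == "__main__":
    print(asymmetric(rounds=5, committed_divisions=3))
    print(symmetric_differentiating(committed_divisions=3))
```

Five rounds with three committed divisions leave one stem cell and 40 differentiated cells (5 × 2^3). The symmetric contrast case yields 16 differentiated cells and no remaining stem cell.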
Invertebrates, Fish, Mice, and Other Organisms Serve as Experimental Systems for Study of Human Development and Disease
Organisms with large-celled embryos that develop outside the mother’s body (e.g., frogs, sea urchins, fish, and chickens) are extremely useful for tracing the fates of cells as they form different tissues, as well as for making extracts for biochemical studies. For instance, a key protein in regulating cell division in all eukaryotes, including humans, was first identified in studies with sea stars and sea urchin embryos and subsequently purified from extracts prepared from these embryos (see Chapter 19).
Studies of cells in specialized tissues make use of animal and plant model organisms. Neurons and muscle cells, for instance, were traditionally studied in mammals or in creatures with especially large or accessible cells, such as the giant neural cells of the squid and sea hare or cells in the flight muscles of birds. More recently, muscle and nerve development have been extensively studied in fruit flies (Drosophila melanogaster), roundworms (Caenorhabditis elegans), and zebrafish (Danio rerio), in which mutations in genes required for muscle and nerve formation or function can be readily isolated (see Figure 1-22).
Mice have one enormous advantage over other experimental organisms: they are the most closely related to humans of any animal for which powerful genetic approaches have been available for many years. Mice and humans have shared living structures for millennia, have similar nervous systems, have similar immune systems, and are subject to infection by many of the same pathogens. As noted, both organisms contain about the same number of genes, and about 99 percent of mouse protein-coding genes have homologs in the human genome, and vice versa. Using recombinant DNA techniques developed in the past few years, researchers can inactivate any desired gene, and thus abolish production of its encoded protein. Such specific mutations can be introduced into the genomes of worms, flies, frogs, sea urchins, chickens, mice, a variety of plants, and other organisms, permitting the effects of these mutations to be assessed. Using the Cas9 experimental system described in Chapter 6, this approach is being used extensively to produce animal versions of human genetic diseases, in mice as well as in other animals.
As an example, people with autism spectrum disorder often have mutations in specific protein-coding genes. To understand the role of these mutations, these genes have been inactivated in mice; in many cases, the mice exhibit symptoms of the human disease, including repetitive actions such as excessive grooming, strongly suggesting that the human mutation indeed has a role in triggering the disorder. Within the past year, similar techniques have been used to produce monkeys in which the targeted gene has been inactivated. Such approaches can be useful in uncovering the role of specific genes in higher-order brain tasks such as learning and memory, or in studies of viruses that infect only humans and nonhuman primates. Once animal models of a human disease are available, further studies on the molecular defects causing the disease can be done and new treatments can be tested, thereby minimizing the testing of new drugs on humans.
Genetic Diseases Elucidate Important Aspects of Cell Function
Many genetic diseases are caused by mutations affecting a single protein; studies on people with these diseases have shed light on the normal function of those proteins. As an example, consider Duchenne muscular dystrophy (DMD), the most common among the hereditary muscle-wasting diseases, collectively called muscular dystrophies. DMD, an X chromosome–linked disorder that affects 1 in 3300 boys, results in cardiac or respiratory failure and death, usually in the late teens or early twenties. The first clue to understanding the molecular basis of this disease came from the discovery that people with DMD carry mutations in the gene encoding a protein named dystrophin. As detailed in Chapter 17, this very large protein was later found to be a cytosolic adapter protein that binds to actin filaments that are part of the cytoskeleton (see Figure 1-13) and to a complex of muscle plasma-membrane proteins termed the sarcoglycan complex (Figure 1-31). The resulting large multiprotein assemblage, the dystrophin glycoprotein complex (DGC), links the extracellular matrix protein laminin to the cytoskeleton within muscle cells. Mutations in dystrophin, other DGC components, or laminin can disrupt the DGC-mediated link between the exterior and interior of muscle cells and cause muscle weakness and eventual death. The first step in identifying the entire dystrophin glycoprotein complex involved cloning the dystrophin-encoding gene using DNA from normal individuals and from patients with Duchenne muscular dystrophy.
FIGURE 1-31 The dystrophin glycoprotein complex (DGC) in skeletal muscle cells. Dystrophin—the protein that is defective in Duchenne muscular dystrophy—links the actin cytoskeleton to the multiprotein sarcoglycan complex in the plasma membrane. Other proteins in the complex bind to components of the basal lamina, such as laminin, which in turn bind to the collagen fibers that give the basal lamina strength and rigidity. Thus dystrophin is an important member of a group of proteins that links the muscle cell and its internal actin cytoskeleton with the surrounding basal lamina. See D. E. Michele and K. P. Campbell, 2003, J. Biol. Chem. 278:15457.
[Diagram labels: basal lamina (agrin, laminin, perlecan, collagen and other fibrous proteins); extracellular space; carbohydrate chains attached to proteins; sarcoglycan complex; plasma membrane; cytosol; dystrophin (the protein defective in Duchenne muscular dystrophy); actin.]
The Following Chapters Present Much Experimental Data That Explains How We Know What We Know About Cell Structure and Function
In subsequent chapters of this book, we discuss cellular processes in much greater detail. We begin (in Chapter 2) with a discussion of the chemical nature of the building blocks of cells and the basic chemical processes required to understand the macromolecular processes discussed in subsequent chapters. We go on to discuss the structure and function of proteins (in Chapter 3). Chapter 4 discusses many of the techniques biologists use to culture and fractionate cells and to visualize specific proteins and structures within cells. Chapter 5 describes how DNA is replicated, how segments of DNA are copied into RNA, and how proteins are synthesized on ribosomes. Chapter 6 describes many of the techniques used to study genes, gene expression, and protein function, including the generation of animals with specific genetic mutations. Biomembrane structure is the topic of Chapter 7. Gene and chromosome structure and the regulation of gene expression are covered in Chapters 8, 9, and 10. The transport of ions and small molecules across membranes is covered in Chapter 11, and Chapter 12 discusses cellular energetics and the functions of mitochondria and chloroplasts. Membrane biogenesis, protein secretion, and protein trafficking—the directing of proteins to their correct subcellular destinations—are the topics of Chapters 13 and 14. Chapters 15 and 16 discuss the many types of signals and signal receptors used by cells to communicate and regulate their activities. The cytoskeleton and cell movements are discussed in Chapters 17 and 18. Chapter 19 discusses the cell cycle and how cell division is regulated. The interactions among cells, and between cells and the extracellular matrix, that enable formation of tissues and organs are detailed in Chapter 20. Later chapters of the book discuss important types of specialized cells—stem cells (Chapter 21), neurons (Chapter 22), and cells of the immune system (Chapter 23). Chapter 24 discusses cancer and the multiple ways in which cell growth and differentiation can be altered by mutations.
CHAPTER 2
Chemical Foundations
[Chapter-opening image: chemical structure and x-ray crystal structure of a “Star of David” catenane.]
“Star of David” catenane. Two triply entwined rings composed of carbon, hydrogen, and nitrogen are linked together with bridging iron atoms via a complex chemical synthetic pathway to cross each other six times and form a hexagram (six-pointed star). The chemical structure is indicated on the left, where the two independent rings are colored blue and orange. On the right is the three-dimensional structure determined by x-ray crystallography with the carbon atoms of one ring in blue and the other light gray; irons are pink and nitrogens purple. In the center is a noncovalently bound, negatively charged hexafluorophosphate ion (PF6−; cyan and green). See D. A. Leigh, R. G. Pritchard, and A. J. Stephens, 2014, Nature Chem. 6:978–982.
OUTLINE
2.1 Covalent Bonds and Noncovalent Interactions
2.2 Chemical Building Blocks of Cells
2.3 Chemical Reactions and Chemical Equilibrium
2.4 Biochemical Energetics
The life of a cell depends on thousands of chemical interactions and reactions exquisitely coordinated with one another in time and space, influenced by the cell’s genetic instructions and its environment. By understanding these interactions and reactions at a molecular level, we can begin to answer fundamental questions about cellular life: How does a cell extract nutrients and information from its environment? How does a cell convert the energy stored in nutrients into the work of movement or metabolism? How does a cell transform nutrients into the cellular components required for its survival? How does a cell link itself to other cells to form a tissue? How do cells communicate with one another so that a complex, efficiently functioning organism can develop and thrive? One of the goals of Molecular Cell Biology is to answer these and other questions about the structure and function of cells and organisms in terms of the properties of individual molecules and ions. For example, the properties of one such molecule, water, control the evolution, structure, and function of all cells. An understanding of biology is not possible without appreciating how the properties of water control the chemistry of life.
Life first arose in a watery environment. Constituting 70–80 percent of most cells by weight, water is the most abundant molecule in biological systems. It is within this aqueous milieu that small molecules and ions, which make up about 7 percent of the weight of living matter, combine into the larger macromolecules and macromolecular assemblies that make up a cell’s machinery and architecture and thus the remaining mass of organisms. These small molecules include amino acids (the building blocks of proteins), nucleotides (the building blocks of DNA and RNA), lipids (the building blocks of biomembranes), and sugars (the building blocks of complex carbohydrates). Many of the cell’s biomolecules (such as sugars) readily dissolve in water; these molecules are referred to as hydrophilic (“water liking”). Others (such as cholesterol) are oily, fatlike substances that shun water; these molecules are said to be hydrophobic (“water fearing”). Still other biomolecules (such as phospholipids) contain both hydrophilic and hydrophobic regions; these molecules are said to be amphipathic or amphiphilic (“both liking”). The smooth functioning of cells, tissues, and organisms depends on all these molecules, from the smallest to the largest. Indeed, the chemistry of the simple proton (H+) can be as important to the survival of a human cell as that of each gigantic DNA molecule (the mass of the DNA molecule in human chromosome 1 is 8.6 × 10^10 times that of a proton!). The chemical interactions of all these molecules, large and small, with water and with one another define the nature of life.
Luckily, although many types of biomolecules interact and react in numerous and complex pathways to form functional cells and organisms, a relatively small number of chemical principles are necessary to understand cellular processes at the molecular level (Figure 2-1). In this chapter, we review these key principles, some of which you already know well. We begin with the covalent bonds that connect atoms into molecules and the noncovalent interactions that stabilize groups of atoms within and between molecules. We then consider the basic chemical building blocks of macromolecules and macromolecular assemblies. After reviewing those aspects of chemical equilibrium that are most relevant to biological systems, we end the chapter with the basic principles of biochemical energetics, including the central role of ATP (adenosine triphosphate) in capturing and transferring energy in cellular metabolism.
FIGURE 2-1 Chemistry of life: four key concepts. (a) Molecular complementarity lies at the heart of all biomolecular interactions (see Section 2.1), as when two proteins with complementary shapes and chemical properties come together to form a tightly bound complex. (b) Small molecules serve as building blocks for larger structures (see Section 2.2). For example, to generate the information-carrying macromolecule DNA, four small nucleotide building blocks are covalently linked into long strings (polymers), which then wrap around each other to form the double helix. (c) Chemical reactions are reversible, and the distribution of the chemicals between starting reactants (left) and the products of the reactions (right) depends on the rate constants of the forward ($k_f$, upper arrow) and reverse ($k_r$, lower arrow) reactions. The ratio of these, $K_{eq}$, provides an informative measure of the relative amounts of products and reactants that will be present at equilibrium (see Section 2.3). (d) In many cases, the source of energy for chemical reactions in cells is the hydrolysis of the molecule ATP (see Section 2.4). This energy is released when a high-energy phosphoanhydride bond linking the β and γ phosphates in the ATP molecule (red) is broken by the addition of a water molecule, forming ADP and Pi.
[Panels: (a) Molecular complementarity: protein A and protein B held together by noncovalent interactions; (b) Chemical building blocks: polymerization into a macromolecule; (c) Chemical equilibrium: $K_{eq} = k_f / k_r$; (d) Chemical bond energy: the “high-energy” phosphoanhydride bonds between the α, β, and γ phosphates of adenosine triphosphate (ATP), whose hydrolysis yields ADP + Pi + energy.]
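Figure 2-1c states that the equilibrium balance between reactants and products is set by the ratio of the rate constants, $K_{eq} = k_f/k_r$. For the simplest case, a single first-order reaction A ⇌ B, the forward and reverse rates are equal at equilibrium: $k_f[A] = k_r[B]$, which rearranges to $[B]/[A] = k_f/k_r$. The sketch below checks this numerically with a step-by-step mass-action simulation. It assumes a closed system with no other reactions, and the species names and rate-constant values are illustrative only.

```python
def simulate_reversible(a0: float, b0: float, kf: float, kr: float,
                        dt: float = 1e-3, steps: int = 20_000) -> tuple[float, float]:
    """Integrate A <-> B with first-order mass-action kinetics using
    forward-Euler steps: d[B]/dt = kf*[A] - kr*[B] and d[A]/dt = -d[B]/dt."""
    a, b = a0, b0
    for _ in range(steps):
        net = kf * a - kr * b  # forward rate minus reverse rate
        a -= net * dt
        b += net * dt
    return a, b


if __name__ == "__main__":
    kf, kr = 2.0, 0.5  # illustrative rate constants (per unit time)
    a, b = simulate_reversible(1.0, 0.0, kf, kr)
    print(f"[B]/[A] at the end of the run: {b / a:.4f}")
    print(f"kf/kr:                         {kf / kr:.4f}")
```

With these values both printed ratios are 4.0000. Starting instead from pure B (`simulate_reversible(0.0, 1.0, kf, kr)`) converges to the same ratio. The equilibrium position depends only on $k_f/k_r$, not on where the reaction starts.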
2.1 Covalent Bonds and Noncovalent Interactions
Strong and weak attractive forces between atoms are the “glue” that holds individual molecules together and permits interactions between different molecules. When two atoms share a single pair of electrons, the result is a covalent bond—a type of strong force that holds atoms together in molecules. Sharing of multiple pairs of electrons results in multiple covalent bonds (e.g., “double” or “triple” bonds). The weak attractive forces of noncovalent interactions are equally important in determining the properties and functions of biomolecules such as proteins, nucleic acids, carbohydrates, and lipids. In this section, we first review covalent bonds and then discuss the four major types of noncovalent interactions: ionic bonds, hydrogen bonds, van der Waals interactions, and the hydrophobic effect.
The Electronic Structure of an Atom Determines the Number and Geometry of the Covalent Bonds It Can Make Hydrogen, oxygen, carbon, nitrogen, phosphorus, and sulfur are the most abundant elements in biological molecules. These atoms, which rarely exist as isolated entities, readily form covalent bonds, using electrons in the outermost electron orbitals surrounding their nuclei (Figure 2-2). As a rule, each type of atom forms a characteristic number of covalent bonds with other atoms. These bonds have a well-defined geometry determined by the atom’s size and by both the distribution of electrons around the nucleus and the number of electrons that it can share. In some cases, the number of stable covalent bonds an atom can make is fixed; carbon, for example, always forms four covalent bonds. In other cases, different numbers of stable covalent bonds are possible; for example, sulfur can form two, four, or six stable covalent bonds. All the biological building blocks are organized around the carbon atom, which forms four covalent bonds. In these
organic biomolecules, each carbon usually bonds to three or four other atoms. [Carbon can also bond to two other atoms, as in the linear molecule carbon dioxide, CO2, which has two carbon-oxygen double bonds (O=C=O); however, such bond arrangements of carbon are not found in biological building blocks.] As illustrated in Figure 2-3a for formaldehyde, carbon can bond to three atoms, all in a common plane. The carbon atom forms two single bonds with two atoms and one double bond (two shared electron pairs) with the third atom. In the absence of other constraints, atoms joined by a single bond generally can rotate freely about the bond axis, whereas those connected by a double bond cannot. The rigid planarity imposed by double bonds has enormous significance for the shapes and flexibility of biomolecules such as phospholipids, proteins, and nucleic acids. Carbon can also bond to four rather than three atoms. As illustrated by methane (CH4), when carbon is bonded to four other atoms, the angle between any two bonds is 109.5°, and the positions of bonded atoms define the four points of a tetrahedron (Figure 2-3b). This geometry defines the structures of many biomolecules. A carbon (or any other) atom bonded to four dissimilar atoms or groups in a nonplanar
(a) Formaldehyde O
H C
O
H
H
(b) Methane H
H 109.5s H
C
H
Covalent bond H H H
C
H
C
H
H H Methane H
FIGURE 22 Covalent bonds form by the sharing of electrons. Covalent bonds, the strong forces that hold atoms together in molecules, form when atoms share electrons from their outermost electron orbitals. Each atom forms a defined number and geometry of covalent bonds.
C H
H Chemical structure
Electrons
~120°
C
H
H
H
Ball-and-stick model
Space-filling model
FIGURE 23 Geometry of bonds when carbon is covalently linked to three or four other atoms. (a) A carbon atom can be bonded to three atoms, as in formaldehyde (CH2O). In this case, the carbon-bonding electrons participate in two single bonds and one double bond, which all lie in the same plane. Unlike atoms connected by a single bond, which usually can rotate freely about the bond axis, those connected by a double bond cannot. (b) When a carbon atom forms four single bonds, as in methane (CH4), the bonded atoms (all H in this case) are oriented in space in the form of a tetrahedron. The letter representations on the left clearly indicate the atomic composition of each molecule and its bonding pattern. The ball-and-stick models in the center illustrate the geometric arrangement of the atoms and bonds, but the diameters of the balls representing the atoms and their nonbonding electrons are unrealistically small compared with the bond lengths. The sizes of the electron clouds in the space-filling models on the right more accurately represent the structure in three dimensions. 2.1 Covalent Bonds and Noncovalent Interactions
[Figure 2-4 artwork: a mirror separating the D isomer and the L isomer of an amino acid; in each, the central Cα carries H, NH3+, COO−, and an R group.]
TABLE 2-1 Bonding Properties of Atoms Most Abundant in Biomolecules
FIGURE 2-4 Stereoisomers. Many molecules in cells contain at least one asymmetric carbon atom. The tetrahedral orientation of bonds formed by an asymmetric carbon atom can be arranged in three-dimensional space in two different ways, producing molecules that are mirror images, or stereoisomers, of each other. Shown here is the common structure of an amino acid, with its central asymmetric carbon and four attached groups, including the R group, discussed in Section 2.2. Amino acids can exist in two mirror-image forms, designated L and D. Although the chemical properties of such stereoisomers are identical, their biological activities are distinct. Only L amino acids are found in proteins.
configuration is said to be asymmetric. The tetrahedral orientation of bonds formed by an asymmetric carbon atom can be arranged in three-dimensional space in two different ways, producing molecules that are mirror images of each other, a property called chirality (“handedness,” from the Greek word cheir, meaning “hand”) (Figure 2-4). Such molecules are called optical isomers, or stereoisomers. Many molecules in cells contain at least one asymmetric carbon atom, often called a chiral carbon atom. The different stereoisomers of a molecule usually have completely different biological activities because the arrangement of atoms within their structures, and thus their ability to interact with other molecules, differs. Some drugs are mixtures of the stereoisomers of small molecules in which only one stereoisomer has the biological activity of interest. Using a single pure stereoisomer in place of the mixture may yield a more potent drug with fewer side effects. For example, one stereoisomer of the antidepressant drug citalopram (Celexa) is 170 times more potent than the other. Some stereoisomers have very different activities. Darvon is a pain reliever, whereas its stereoisomer, Novrad (Darvon spelled backward), is a cough suppressant. One stereoisomer of ketamine is an anesthetic, whereas the other causes hallucinations. ■ The typical numbers of covalent bonds formed by other atoms common in biomolecules are shown in Table 2-1. A hydrogen atom forms only one covalent bond. An atom of oxygen usually forms only two covalent bonds but has two additional pairs of electrons that can participate in
CHAPTER 2 ■ Chemical Foundations
Atom   Usual Number of Covalent Bonds
H      1
O      2
S      2, 4, or 6
N      3 or 4
P      5
C      4
(The original table also diagrams each atom's outer electrons and typical bond geometry.)
noncovalent interactions. Sulfur forms two covalent bonds in hydrogen sulfide (H2S) but can accommodate six covalent bonds, as in sulfuric acid (H2SO4) and its sulfate derivatives. Nitrogen and phosphorus each have five electrons to share. In ammonia (NH3), the nitrogen atom forms three covalent bonds; the pair of electrons around the atom not involved in a covalent bond can take part in noncovalent interactions. In the ammonium ion (NH4+), nitrogen forms four covalent bonds, which have a tetrahedral geometry. Phosphorus commonly forms five covalent bonds, as in phosphoric acid (H3PO4) and its phosphate derivatives, which form the backbone of nucleic acids. Phosphate groups covalently attached to proteins play a key role in regulating the activity of many proteins, and the central molecule in cellular energetics, ATP, contains three phosphate groups (see Section 2.4). A summary of common covalent linkages and functional groups, which confer distinctive chemical properties on the molecules of which they are a part, is provided in Table 2-2.
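The tetrahedral geometry described above for methane (Figure 2-3b) and the ammonium ion, and the two mirror-image arrangements it allows around an asymmetric carbon (Figure 2-4), can be checked with a little vector arithmetic. The following is a minimal Python sketch, not taken from the text: it assumes idealized bond directions pointing to alternating corners of a cube centered on the carbon, and it uses the sign of a triple product as a stand-in for handedness. It does not implement the formal D/L (or R/S) naming rules.

```python
import itertools
import math

# Idealized tetrahedral bond directions (assumption): alternating corners of a cube
# centered on the carbon atom.
TETRAHEDRAL = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def angle_between(u, v):
    """Angle in degrees between two bond vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(dot / (math.hypot(*u) * math.hypot(*v))))

def handedness(a, b, c):
    """Sign of the triple product a . (b x c): +1 or -1 for the two mirror-image arrangements."""
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    triple = sum(x * y for x, y in zip(a, cross))
    return 1 if triple > 0 else -1

def mirror(v):
    """Reflect a vector through the yz plane (x -> -x)."""
    return (-v[0], v[1], v[2])

# All six pairwise bond angles equal arccos(-1/3), about 109.47 degrees.
angles = [angle_between(u, v) for u, v in itertools.combinations(TETRAHEDRAL, 2)]
print([round(a, 2) for a in angles])
print(round(math.degrees(math.acos(-1 / 3)), 2))

# Put three distinct groups (for example COO-, NH3+, and R) on three of the bond directions.
groups = TETRAHEDRAL[:3]
original = handedness(*groups)
reflected = handedness(*[mirror(v) for v in groups])
print(original, reflected)  # 1 -1
```

Rotations never change the sign of the triple product, whereas a reflection always flips it, so the two arrangements cannot be superimposed by turning one of them in space; that is the geometric content of chirality.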
Electrons May Be Shared Equally or Unequally in Covalent Bonds
An atom's ability to attract the electrons in a covalent bond is called its electronegativity. In a bond between atoms with identical or similar electronegativities, the bonding electrons are essentially shared equally between the two atoms, as is the case for most carbon-carbon single bonds (C−C) and carbon-hydrogen single bonds (C−H). Such bonds are called nonpolar. In many molecules, however, the bonded atoms have different electronegativities, resulting in unequal sharing of electrons. The bond between them is said to be polar. One end of a polar bond has a partial negative charge (δ−), and the other end has a partial positive charge (δ+). In an O−H bond, for example, the greater electronegativity of the oxygen atom relative to the hydrogen atom results in the electrons spending more time around the oxygen atom than around the hydrogen. Thus the O−H bond possesses an
TABLE 2-2 Common Functional Groups and Linkages in Biomolecules
Functional Groups
Hydroxyl (alcohol)   −OH
Carbonyl (ketone)   C=O
Carboxyl (carboxylic acid)   −COOH
Acyl (triacylglycerol)   R−C(=O)−
Sulfhydryl (thiol)   −SH
Amino (amines)   −NH2 or −NH3+
Phosphate (phosphorylated molecule)   −O−PO3²⁻
Pyrophosphate (diphosphate)   two phosphate groups joined through a P−O−P bond
Linkages
Ester   −C(=O)−O−
Ether   −C−O−C−
Amide   −C(=O)−N−
(Formulas simplified from the structural drawings in the original table.)
electric dipole, a positive charge separated from an equal but opposite negative charge. The amount of δ− charge on the oxygen atom of an O−H dipole is approximately 25 percent that of an electron, and there is an equivalent and opposite δ+ charge on the H atom. A common quantitative measure of the extent of charge separation, or strength, of a dipole is called the dipole moment, μ, which for a chemical bond is the product of the partial charge on each atom and the distance between the two atoms. For a molecule with multiple dipoles, the amount of charge separation for the molecule as a whole depends in part on the dipole moments of all of its individual chemical bonds and in part on the geometry of the molecule (the relative orientations of the individual dipole moments). Consider the example of water (H2O), which has two O−H bonds and thus two individual bond dipole moments. If water were a linear molecule with the two bonds on exactly opposite sides of the O atom, the two dipoles on each end of the molecule would be identical in strength but would be oriented in opposite directions. The two dipole moments would cancel each other, and the dipole moment of the molecule as a whole would be zero. However, because water is a V-shaped molecule, with the individual dipoles of its two O−H bonds both pointing toward the oxygen, one end of the water molecule (the end with the oxygen atom) has a partial negative charge and the other end (the one with the two hydrogen atoms) has a partial positive charge. As a consequence, the molecule as a whole is a dipole with a well-defined dipole moment (Figure 2-5). This dipole moment and the electronic properties of the oxygen and hydrogen atoms allow water to form electrostatic, noncovalent interactions with other
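[Figure 2-5 artwork: a V-shaped water molecule with an H−O−H angle of 104.5°, partial charges δ− on O and δ+ on each H, and an arrow marking the net dipole moment.]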
FIGURE 2-5 The dipole nature of a water molecule. The symbol δ represents a partial charge (a weaker charge than the one on an electron or a proton). Because of the difference in the electronegativities of H and O, each of the polar H−O bonds in water is a dipole. The sizes and directions of the dipoles of each of the bonds determine the net distance and amount of charge separation, or dipole moment, of the molecule.
water molecules and with molecules of other types. These interactions play a critical role in almost every biochemical interaction in cells and organisms, as we will see shortly. Another important example of polarity is the O=P double bond in H3PO4. In the structure of H3PO4 shown on the left below, lines represent single and double bonds and nonbonding electrons are shown as pairs of dots (each dot represents one electron):
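[Structures of H3PO4 joined by a double-headed arrow (↔): left, (HO)3P=O, with a P=O double bond; right, (HO)3P⁺−O⁻, with a single P−O bond, a negative charge on that O, and a positive charge on P. Nonbonding electrons are drawn as pairs of dots in the original.]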
FIGURE 2-6 Relative energies of covalent bonds and noncovalent interactions. Bond energies are defined as the energy required to break a particular type of linkage. Shown here are the energies required to break a variety of linkages, arranged on a log scale. Covalent bonds, including single (C−C) and double (C=C) carbon-carbon bonds, are one to two powers of 10 stronger than noncovalent interactions. Noncovalent interactions have energies somewhat greater than the thermal energy of the environment at normal room temperature (25 °C). Many biological processes are coupled to the energy released during hydrolysis of a phosphoanhydride bond in ATP.
Because of the polarity of the O=P double bond, H3PO4 can also be represented by the structure on the right, in which one of the electrons from the P=O double bond has accumulated around the O atom, giving it a negative charge and leaving the P atom with a positive charge. These charges are important in noncovalent interactions. Neither of these two models precisely describes the electronic state of H3PO4. The actual structure can be considered to be an intermediate, or hybrid, between these two representations, as indicated by the double-headed arrow between them. Such intermediate structures are called resonance hybrids.
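The bond-polarity and dipole-moment reasoning earlier in this subsection (μ as partial charge times distance, and the cancellation a linear water molecule would produce) can be sketched numerically. This Python example is illustrative only. It assumes Pauling electronegativity values and a 0.4 cutoff for calling a bond polar (a common rule of thumb, not a sharp boundary), uses the text's estimate of a partial charge of about 25 percent of an electron on each end of an O−H bond, and assumes an O−H bond length of 0.096 nm, which is not given in this section.

```python
import math

# Pauling electronegativities (assumed reference values).
ELECTRONEGATIVITY = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44, "P": 2.19, "S": 2.58}
POLAR_CUTOFF = 0.4  # rule-of-thumb threshold (assumption), not a physical constant

E_CHARGE = 1.602176634e-19  # elementary charge, coulombs
DEBYE = 3.33564e-30         # one debye, in coulomb-meters

def is_polar(atom1, atom2):
    """Call a bond polar if the electronegativity difference exceeds the cutoff."""
    return abs(ELECTRONEGATIVITY[atom1] - ELECTRONEGATIVITY[atom2]) > POLAR_CUTOFF

def bond_dipole_debye(charge_fraction, bond_length_nm):
    """Bond dipole moment mu = q * d, with q given as a fraction of the electron charge."""
    return charge_fraction * E_CHARGE * bond_length_nm * 1e-9 / DEBYE

def net_dipole_two_bonds(mu_bond, angle_deg):
    """Magnitude of the vector sum of two equal bond dipoles separated by angle_deg.

    Components perpendicular to the bisecting axis cancel; components along it add.
    """
    return 2 * mu_bond * math.cos(math.radians(angle_deg) / 2)

for pair in [("C", "C"), ("C", "H"), ("O", "H"), ("N", "H"), ("P", "O")]:
    print(pair, "polar" if is_polar(*pair) else "nonpolar")

mu_oh = bond_dipole_debye(0.25, 0.096)  # text's ~25% charge, assumed 0.096-nm bond
print(f"O-H bond dipole: {mu_oh:.2f} D")
print(f"bent water (104.5 deg): {net_dipole_two_bonds(mu_oh, 104.5):.2f} D")
print(f"hypothetical linear water (180 deg): {net_dipole_two_bonds(mu_oh, 180.0):.2f} D")
```

Because this point-charge model ignores the oxygen's nonbonding electron pairs, its net value (about 1.4 D) falls short of water's measured dipole moment of roughly 1.85 D. What it does capture is the role of geometry: the same two bond dipoles give a sizable net moment at 104.5° and none at 180°.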
Covalent Bonds Are Much Stronger and More Stable Than Noncovalent Interactions
Covalent bonds are considered to be strong because the energies required to break them are much greater than the thermal energy available at room temperature (25 °C) or body temperature (37 °C). As a consequence, they are stable at these temperatures. For example, the thermal energy available at 25 °C is approximately 0.6 kilocalorie per mole (kcal/mol), whereas the energy required to break the C−C bond in ethane is about 140 times larger (Figure 2-6). Consequently, at room temperature (25 °C), fewer than 1 in 10¹² ethane molecules is broken into a pair of ·CH3 molecules, each containing an unpaired, nonbonding electron (called a radical). Covalent single bonds in biological molecules have energies similar to the energy of the C−C bond in ethane. Because more electrons are shared between atoms in double bonds, they require more energy to break than single bonds. For instance, it takes 84 kcal/mol to break a single C−O bond but 170 kcal/mol to break a C=O double bond. The most common double bonds in biological molecules are C=O, C=N, C=C, and P=O. In contrast, the energy required to break noncovalent interactions is only 1–5 kcal/mol, much less than the bond energies of covalent bonds (see Figure 2-6). Indeed, noncovalent interactions are weak enough that they are constantly being formed and broken at room temperature. Although these interactions are weak and have a transient existence
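[Figure 2-6 artwork: a log-scale energy axis (0.24, 2.4, 24, 240 kcal/mol; increasing bond strength) locating thermal energy, noncovalent interactions (van der Waals interactions, hydrogen bonds), hydrolysis of the ATP phosphoanhydride bond, and covalent bonds (C−C, C=C).]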
at physiological temperatures (25–37 °C), multiple noncovalent interactions can, as we will see, act together to produce highly stable and specific associations between different parts of a large molecule or between different macromolecules. Protein-protein and protein-nucleic acid interactions are good examples of such noncovalent interactions. Below, we review the four main types of noncovalent interactions and then consider their roles in the binding of biomolecules to one another and to other molecules.
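The comparison between bond energies and thermal energy can be made concrete with the Boltzmann factor exp(−E/RT). This is a minimal Python sketch: the Boltzmann factor is a standard rough estimate of the fraction of molecules with enough thermal energy to overcome an energy barrier E, not a method described in this section. The energies used (about 1 and 5 kcal/mol for noncovalent interactions, and about 84 kcal/mol, roughly 140 × 0.6 kcal/mol, for the C−C bond in ethane) come from the surrounding text.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

def thermal_energy(temp_c):
    """RT in kcal/mol at a temperature given in degrees Celsius."""
    return R_KCAL * (temp_c + 273.15)

def boltzmann_fraction(energy_kcal, temp_c=25.0):
    """Rough fraction of molecules with at least energy_kcal of thermal energy, exp(-E/RT)."""
    return math.exp(-energy_kcal / thermal_energy(temp_c))

rt = thermal_energy(25.0)
print(f"RT at 25 C: {rt:.2f} kcal/mol")

# Energies taken from the text: weak noncovalent interactions vs. the C-C bond of ethane.
examples = [("noncovalent, ~1 kcal/mol", 1.0),
            ("noncovalent, ~5 kcal/mol", 5.0),
            ("C-C bond, ~84 kcal/mol", 84.0)]
for label, energy in examples:
    print(f"{label:26s} E/RT = {energy / rt:6.1f}   exp(-E/RT) = {boltzmann_fraction(energy):.1e}")
```

At 25 °C, RT comes out to about 0.59 kcal/mol, consistent with the 0.6 kcal/mol quoted above. The factor for the C−C bond is vastly smaller than 1 in 10¹², whereas the factors for 1 to 5 kcal/mol interactions are large enough that such interactions form and break constantly.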
Ionic Interactions Are Attractions Between Oppositely Charged Ions
Ionic interactions result from the attraction between a positively charged ion (a cation) and a negatively charged ion (an anion). In sodium chloride (NaCl), for example, the bonding electron contributed by the sodium atom is completely transferred to the chlorine atom (Figure 2-7a). Unlike covalent bonds, ionic interactions do not have fixed or specific geometric orientations because the electrostatic field around an ion (its attraction for an opposite charge) is uniform in all directions. In solid NaCl, oppositely charged ions pack tightly together in an alternating pattern, forming the highly ordered crystalline array, or lattice, that is typical of salt crystals (Figure 2-7b). The energy required to break an ionic interaction depends on the distance between the ions and the electrical properties of the environment of the ions. When solid salts dissolve in water, the ions separate from one another and are stabilized by their interactions with water molecules. In aqueous solutions, simple ions of biological significance, such as Na+, K+, Ca2+, Mg2+, and Cl−, are hydrated, surrounded by a stable shell of water molecules held in place by ionic interactions between the ion at the center and the oppositely charged ends of the water molecules, which are dipoles (Figure 2-7c). Most ionic compounds dissolve readily in water because the energy of hydration (the energy released when ions tightly bind water molecules and spread out in an aqueous solution) is greater than the lattice energy that stabilizes the crystal structure. Parts or all of the aqueous hydration shell must be removed from ions in solution when they interact directly with proteins. For example,
[Figure 2-7 artwork: (a) donation of an electron from Na to Cl, giving Na+ and Cl−; (b) a crystal lattice of alternating Na+ and Cl− ions; (c) dissolving in H2O (and, in reverse, crystallizing), with each ion surrounded by oriented water molecules.]
FIGURE 2-7 Electrostatic interactions of the oppositely charged ions of salt (NaCl) in crystals and in aqueous solution. (a) In crystalline table salt, sodium atoms are present as positively charged ions (Na+) because each has lost one electron, whereas chlorine atoms are present as negatively charged chloride ions (Cl−) because each has gained one electron. (b) In solid form, ionic compounds form neatly ordered arrays, or crystals, of tightly packed ions in which the positively and negatively charged ions counterbalance each other. (c) When the crystals are dissolved in water, the ions separate, and their charges, no longer balanced by immediately adjacent ions of opposite charge, are stabilized by interactions with polar water. The water molecules and the ions are held together by electrostatic interactions between the charges on the ion and the partial charges on the water’s oxygen and hydrogen atoms. In aqueous solutions, all ions are surrounded by a hydration shell of water molecules.
water of hydration is lost when ions pass through protein pores in the cell membrane during nerve conduction. The relative strength of the interaction between two oppositely charged ions, A− and C+, depends on the concentration of other ions in a solution. The higher the concentration of other ions (e.g., Na+ and Cl−), the more opportunities A− and C+ have to interact ionically with those other ions, and thus the lower the energy required to break the interaction between A− and C+. As a result, increasing the concentrations of salts such as NaCl in a solution of biological molecules can weaken and even disrupt the ionic interactions holding the biomolecules together. This principle can be exploited to separate complex mixtures of interacting molecules such as proteins into their individual, pure components.
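The dependence of ionic interactions on distance, on the surrounding medium, and on salt concentration can be illustrated with Coulomb's law. The Python sketch below uses a simplified model that the text does not: point charges, a uniform dielectric constant (about 1 for vacuum and about 80 for water, both assumed values), and Debye–Hückel screening by dissolved salt, which is strictly valid only for dilute solutions and at distances larger than those in a direct ion pair. The 2.8 Å Na+–Cl− separation and the 4 Å test distance are illustrative choices.

```python
import math

COULOMB_KCAL_ANGSTROM = 332.06  # kcal*Angstrom/mol for two unit charges in vacuum

def coulomb_energy(q1, q2, r_angstrom, dielectric):
    """Electrostatic energy (kcal/mol) between charges q1 and q2 (in units of e).

    Negative values mean attraction.
    """
    return COULOMB_KCAL_ANGSTROM * q1 * q2 / (dielectric * r_angstrom)

def debye_length(ionic_strength_molar):
    """Approximate Debye screening length (Angstrom) in water at 25 C."""
    return 3.04 / math.sqrt(ionic_strength_molar)

def screened_energy(q1, q2, r_angstrom, dielectric, ionic_strength_molar):
    """Coulomb energy damped by exp(-r / Debye length) (Debye-Hueckel approximation)."""
    damping = math.exp(-r_angstrom / debye_length(ionic_strength_molar))
    return coulomb_energy(q1, q2, r_angstrom, dielectric) * damping

# Na+ and Cl- at 2.8 Angstrom: water weakens the attraction about 80-fold.
print(f"vacuum: {coulomb_energy(+1, -1, 2.8, 1.0):.1f} kcal/mol")
print(f"water:  {coulomb_energy(+1, -1, 2.8, 80.0):.2f} kcal/mol")

# An ion pair 4 Angstrom apart in water: more salt, weaker attraction.
for salt in (0.01, 0.15, 1.0):
    energy = screened_energy(+1, -1, 4.0, 80.0, salt)
    print(f"{salt:5.2f} M salt: {energy:.2f} kcal/mol")
```

The numbers are only qualitative, but they follow the text's two points: the environment of the ions strongly affects the energy of an ionic interaction, and raising the salt concentration weakens it.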
Hydrogen Bonds Are Noncovalent Interactions That Determine the Water Solubility of Uncharged Molecules
A hydrogen bond is the interaction of a partially positively charged hydrogen atom in a dipole, such as water, with unpaired electrons from another atom, either in the same or in a different molecule. Normally, a hydrogen atom forms a covalent bond with only one other atom. However, a hydrogen atom covalently bonded to an electronegative donor atom D may form an additional weak association, the hydrogen bond, with an acceptor atom A, which must have a nonbonding pair of electrons available for the interaction:
D−H···A (the dotted line represents the hydrogen bond)
The length of the covalent D−H bond is a bit longer than it would be if there were no hydrogen bond because the acceptor “pulls” the hydrogen away from the donor. An important feature of all hydrogen bonds is directionality. In the strongest hydrogen bonds, the donor atom, the hydrogen atom, and the acceptor atom all lie in a straight line. Nonlinear hydrogen bonds are weaker than linear ones; still, multiple nonlinear hydrogen bonds help to stabilize the three-dimensional structures of many proteins. Hydrogen bonds are both longer and weaker than covalent bonds between the same atoms. In water, for example, the distance between the nuclei of the hydrogen and oxygen atoms of adjacent, hydrogen-bonded water molecules is about 0.27 nm, about twice the length of the covalent O−H bonds within a single water molecule (Figure 2-8a). A hydrogen bond between water molecules (approximately 5 kcal/mol) is much weaker than a covalent O−H bond (roughly 110 kcal/mol), although it is stronger than many other hydrogen bonds in biological molecules (1–2 kcal/mol). Extensive intermolecular hydrogen bonding between water molecules accounts for many of water’s key properties, including its unusually high melting and boiling points and its ability to dissolve many other molecules. The solubility of uncharged substances in an aqueous environment depends largely on their ability to form hydrogen bonds with water. For instance, the hydroxyl group (−OH) in alcohols (−CH2OH) and the amino group (−NH2) in amines (−CH2NH2) can form several hydrogen bonds with water, which allows these molecules to dissolve in water at high concentrations (Figure 2-8b). In general, molecules with polar bonds that easily form hydrogen bonds with water, as well as charged molecules and ions
[Figure 2-8 artwork: (a) water-water hydrogen bonds; (b) alcohol-water (CH3−OH) and amine-water (CH3−NH2) hydrogen bonds; (c) peptide group–water and ester group–water hydrogen bonds.]
FIGURE 2-8 Hydrogen bonding of water with itself and with other compounds. Each pair of nonbonding outer electrons in an oxygen or a nitrogen atom can accept a hydrogen atom in a hydrogen bond. The hydroxyl and the amino groups can also form hydrogen bonds with water. (a) In liquid water, each water molecule forms transient hydrogen bonds with several others, creating a dynamic network of hydrogen-bonded molecules. (b) Water can also form hydrogen bonds with alcohols and amines, which accounts for the high solubility of these compounds. (c) The peptide group and the ester group, which are present in many biomolecules, commonly participate in hydrogen bonds with water or polar groups in other molecules.
that interact with the dipole in water, can readily dissolve in water; that is, they are hydrophilic. Many biological molecules contain, in addition to hydroxyl and amino groups, peptide and ester groups, which form hydrogen bonds with water via otherwise nonbonded electrons on their carbonyl oxygens (Figure 2-8c). X-ray crystallography combined with computational analysis permits an accurate depiction of the distribution of the outermost unbonded electrons of atoms that can participate in hydrogen bonds as well as the electrons in covalent bonds, as illustrated in Figure 2-9.
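The directionality of hydrogen bonds described above can be expressed as the D−H···A angle, which is 180° for a perfectly linear hydrogen bond. The Python sketch below is illustrative and not taken from the text: it computes that angle and the H···A distance from atomic coordinates, and the coordinates (in nm) are hypothetical.

```python
import math

def angle_deg(a, b, c):
    """Angle (degrees) at point b formed by points a-b-c."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    cos_theta = sum(x * y for x, y in zip(ba, bc)) / (math.dist(a, b) * math.dist(c, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))  # clamp rounding error

def hbond_geometry(donor, hydrogen, acceptor):
    """Return the H...A distance and the D-H...A angle for one candidate hydrogen bond."""
    return math.dist(hydrogen, acceptor), angle_deg(donor, hydrogen, acceptor)

# Hypothetical coordinates in nm: a D-H bond along the x axis.
donor = (0.0, 0.0, 0.0)
hydrogen = (0.096, 0.0, 0.0)
linear_acceptor = (0.276, 0.0, 0.0)
bent_acceptor = (0.25, 0.09, 0.0)

for name, acceptor in [("linear", linear_acceptor), ("bent", bent_acceptor)]:
    distance, angle = hbond_geometry(donor, hydrogen, acceptor)
    print(f"{name}: H...A = {distance:.3f} nm, D-H...A angle = {angle:.0f} deg")
```

Which distances and angles count as a hydrogen bond is a matter of convention in structural analysis; the sketch only reports the geometry, which the text ties to strength: linear bonds are strongest and bent ones weaker.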
Van der Waals Interactions Are Weak Attractive Interactions Caused by Transient Dipoles
When any two atoms approach each other closely, they create a weak, nonspecific attractive force called a van der Waals interaction. These nonspecific interactions result from the momentary random fluctuations in the distribution of the electrons of any atom, which give rise to a transient unequal distribution of electrons. If two noncovalently bonded atoms are close enough, electrons of one atom will perturb the electrons of the other. This perturbation generates a transient dipole in the second atom, and the two dipoles attract each other weakly (Figure 2-10). Similarly, a polar covalent bond in one molecule attracts an oppositely oriented dipole in another. Van der Waals interactions, involving either transient or permanent dipoles, occur in all types of molecules, both polar and nonpolar. In particular, van der Waals interactions are responsible for the cohesion between nonpolar molecules such as heptane, CH3−(CH2)5−CH3, that cannot form hydrogen bonds or ionic interactions with each other. The strength of van der Waals interactions decreases rapidly with increasing distance; thus these noncovalent interactions
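[Figure 2-9 artwork: electron-density contour map of a peptide group (N, H, C, and O atoms), with the nonbonded electrons on O labeled.]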
FIGURE 2-9 Distribution of bonding and outer nonbonding electrons in the peptide group. Shown here is a peptide bond linking two amino acids within a protein called crambin. No protein has been structurally characterized at higher resolution than crambin. The black lines represent the covalent bonds between atoms. The red (negative) and blue (positive) lines represent contours of charge determined using x-ray crystallography and computational methods. The greater the number of contour lines, the higher the charge. The high density of red contour lines between atoms represents the covalent bonds (shared electron pairs). The two sets of red contour lines emanating from the oxygen (O) and not falling on a covalent bond (black line) represent the two pairs of nonbonding electrons on the oxygen that are available to participate in hydrogen bonding. The high density of blue contour lines near the hydrogen (H) bonded to nitrogen (N) represents a partial positive charge, indicating that this H can act as a donor in hydrogen bonding. [From Proc. Natl. Acad. Sci. USA, 2000, 97(7):3171–3176, Fig. 3A. Accurate protein crystallography at ultra-high resolution: Valence electron distribution in crambin, by Christian Jelsch et al., Copyright (2000) National Academy of Sciences, USA.]
[Figure 2-10 artwork: two O2 molecules in van der Waals contact, with transient partial charges (δ) marked; covalent radius 0.062 nm; van der Waals radius 0.14 nm.]
FIGURE 2-10 Two oxygen molecules in van der Waals contact. In this model, red indicates negative charge and blue indicates positive charge. Transient dipoles in the electron clouds of all atoms give rise to weak attractive forces, called van der Waals interactions. Each type of atom has a characteristic van der Waals radius at which van der Waals interactions with other atoms are optimal. Because atoms repel one another if they are close enough together for their outer electrons to overlap without being shared in a covalent bond, the van der Waals radius is a measure of the size of the electron cloud surrounding an atom. The covalent radius indicated here is for the double bond of O=O; the single-bond covalent radius of oxygen is slightly longer.
can form only when atoms are quite close to one another. However, if atoms get too close together, the negative charges of their electrons create a repulsive force. When the van der Waals attraction between two atoms exactly balances the repulsion between their two electron clouds, the atoms are said to be in van der Waals contact. The strength of the van der Waals interaction is about 1 kcal/mol, so it is weaker than typical hydrogen bonds, and its energy is only slightly higher than the average thermal energy of molecules at 25 °C. Thus a stable van der Waals–mediated attraction within or between molecules requires multiple van der Waals interactions, a combination of van der Waals and other noncovalent interactions, or both.
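The balance between van der Waals attraction and electron-cloud repulsion is often modeled with a Lennard-Jones potential, V(r) = 4ε[(σ/r)^12 − (σ/r)^6], whose minimum marks the contact distance. The text does not use this model. In the Python sketch below, the well depth ε is set to the roughly 1 kcal/mol quoted above and the minimum is placed at 0.28 nm, twice the 0.14 nm van der Waals radius of oxygen in Figure 2-10; both are illustrative assumptions, not fitted parameters.

```python
EPSILON = 1.0                     # well depth, kcal/mol (assumed, from the ~1 kcal/mol in the text)
R_CONTACT = 0.28                  # nm (assumed): two oxygen van der Waals radii of 0.14 nm
SIGMA = R_CONTACT / 2 ** (1 / 6)  # distance at which V(r) = 0; places the minimum at R_CONTACT

def lennard_jones(r_nm):
    """Lennard-Jones energy (kcal/mol): r**-12 repulsion plus r**-6 attraction."""
    s6 = (SIGMA / r_nm) ** 6
    return 4 * EPSILON * (s6 * s6 - s6)

# Scan 0.20 to 0.60 nm in 0.001-nm steps to locate the minimum numerically.
distances = [0.20 + 0.001 * i for i in range(401)]
r_min = min(distances, key=lennard_jones)
print(f"minimum near {r_min:.3f} nm, depth {lennard_jones(r_min):.2f} kcal/mol")

for r in (0.22, 0.28, 0.40, 0.56):
    print(f"r = {r:.2f} nm: V = {lennard_jones(r):+.3f} kcal/mol")
```

Below the contact distance the energy rises steeply (repulsion), and at twice the contact distance only a few percent of the well depth remains, in line with the text's point that van der Waals interactions form only between atoms that are quite close.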
The Hydrophobic Effect Causes Nonpolar Molecules to Adhere to One Another
Because nonpolar molecules do not contain charged groups, do not possess a dipole moment, and do not become hydrated, they are insoluble, or almost insoluble, in water; that is, they are hydrophobic. The covalent bonds between two carbon atoms and between carbon and hydrogen atoms are the most common nonpolar bonds in biological systems. Hydrocarbons (molecules made up only of carbon and hydrogen) are virtually insoluble in water. Large triacylglycerols (also known as triglycerides), which make up animal fats and vegetable oils, are also essentially insoluble in water. As we will see later, the major part of these molecules consists of long hydrocarbon chains. After being shaken in water, triacylglycerols form a separate phase. A familiar example is the separation of oil from the water-based vinegar in an oil-and-vinegar salad dressing.
Nonpolar molecules or nonpolar parts of molecules tend to aggregate in water owing to a phenomenon called the hydrophobic effect. Because water molecules cannot form hydrogen bonds with nonpolar substances, they tend to form “cages” of relatively rigid hydrogen-bonded pentagons and hexagons around nonpolar molecules (Figure 2-11, left). This state is energetically unfavorable because it decreases the entropy, or randomness, of the population of water molecules. (The role of entropy in chemical systems is discussed in Section 2.4.) If nonpolar molecules in an aqueous environment aggregate with their hydrophobic surfaces facing each other, the net hydrophobic surface area exposed to water is reduced (Figure 2-11, right). As a consequence, less water is needed to form the cages surrounding the nonpolar molecules, entropy increases relative to the unaggregated state, and an energetically more favorable state is reached. In a sense, then, water squeezes the nonpolar molecules into aggregates. Rather than constituting an attractive force, as in hydrogen bonds, the hydrophobic effect results from an avoidance of an unstable state, that is, extensive water cages around individual nonpolar molecules. Nonpolar molecules can also associate, albeit weakly, through van der Waals interactions. The net result of the hydrophobic effect and van der Waals interactions is a very powerful tendency for hydrophobic molecules to interact with one another, not with water. Simply put, like dissolves like. Polar molecules dissolve in polar solvents such as water; nonpolar molecules dissolve in nonpolar solvents such as hexane. One well-known hydrophobic molecule is cholesterol (see the structure in Section 2.2). Cholesterol, triglycerides, and other poorly water-soluble molecules are called lipids. Unlike hydrophilic molecules such as glucose or
[Figure 2-11 artwork: left, a nonpolar substance surrounded by highly ordered water molecules (lower entropy); right, after hydrophobic aggregation, waters are released into bulk solution (higher entropy).]
FIGURE 2-11 Schematic depiction of the hydrophobic effect. Cages of water molecules that form around nonpolar molecules in solution are more ordered than water molecules in the surrounding bulk liquid. Aggregation of nonpolar molecules reduces the number of water molecules involved in forming highly ordered cages, resulting in a higher-entropy, more energetically favorable state (right) compared with the unaggregated state (left).
amino acids, lipids cannot readily dissolve in the blood, the aqueous circulatory system that transports molecules and cells throughout the body. Instead, lipids such as cholesterol must be packaged into special hydrophilic carriers, called lipoproteins, that can themselves dissolve in the blood and be transported throughout the body. Each lipoprotein can carry hundreds to thousands of lipid molecules packed into its center, or core. The hydrophobic core is surrounded by amphipathic molecules, whose hydrophilic parts interact with water and whose hydrophobic parts interact with one another and with the core. The packaging of lipids into lipoproteins (discussed in Chapter 14) permits their efficient transport in blood and is reminiscent of the containerization of cargo for efficient long-distance transport via cargo ships, trains, and trucks. High-density lipoprotein (HDL) and low-density lipoprotein (LDL) are two such lipoprotein carriers; they are associated with reduced and increased risk of heart disease, respectively, and are therefore often referred to as “good” and “bad” cholesterol. Actually, the cholesterol molecules and their derivatives carried by HDL and LDL are essentially identical and in themselves are neither “good” nor “bad.” However, HDL and LDL have different effects on cells: LDL contributes to, and HDL appears to protect against, clogging of the arteries (atherosclerosis) and the resulting heart disease and stroke. ■
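The reduction in exposed hydrophobic surface that drives aggregation (Figure 2-11), and that a lipoprotein core exploits by packing many lipid molecules together, can be estimated with simple geometry. The Python sketch below assumes idealized spherical droplets of equal volume, which real molecules are not; it compares the surface area of one merged droplet with the total surface area of the separate droplets.

```python
import math

def sphere_area_from_volume(volume):
    """Surface area of a sphere with the given volume."""
    radius = (3 * volume / (4 * math.pi)) ** (1 / 3)
    return 4 * math.pi * radius ** 2

def exposed_area_ratio(n_droplets, volume_each=1.0):
    """Area of one merged droplet divided by the summed area of n separate droplets."""
    separate = n_droplets * sphere_area_from_volume(volume_each)
    merged = sphere_area_from_volume(n_droplets * volume_each)
    return merged / separate

for n in (2, 8, 1000):
    print(f"{n:5d} droplets merged: {exposed_area_ratio(n):.3f} of the original surface remains exposed")
```

The ratio works out to n^(−1/3), so merging 1,000 equal droplets leaves only a tenth of the original surface in contact with water, and correspondingly fewer water molecules are locked into ordered cages.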
Molecular Complementarity Due to Noncovalent Interactions Leads to a Lock-and-Key Fit Between Biomolecules
Both inside and outside cells, ions and molecules constantly collide. The higher the concentration of any two types of molecules, the more likely they are to encounter each other. When two molecules encounter each other, they most likely simply bounce apart because the noncovalent interactions that would bind them together are weak and have a transient existence at physiological temperatures. However, molecules that exhibit molecular complementarity, a lock-and-key kind of fit between their shapes, charges, or other physical properties, can form multiple noncovalent interactions at close range. When two such structurally complementary molecules bump into each other, these multiple interactions cause them to stick together, or bind. Figure 2-12 illustrates how multiple, different weak interactions can cause two hypothetical proteins to bind together tightly. Numerous examples of such protein-to-protein molecular complementarity may be found throughout this book (see, for example, Figures 16-8, 16-9, and 16-11). Almost any other arrangement of the same chemical groups on the two surfaces would not allow the molecules to bind so tightly. Such molecular complementarity between regions within a protein molecule allows it to fold into a unique three-dimensional shape (see Chapter 3); it is also what holds the two chains of DNA together in a double helix (see Chapter 5). Similar interactions underlie the association
[Figure 2-12 artwork: Protein A binds the complementary Protein B through an ionic bond, a hydrogen bond, and hydrophobic and van der Waals interactions, forming a stable complex; Protein A and the noncomplementary Protein C form a less stable complex.]
FIGURE 2-12 Molecular complementarity permits tight protein binding via multiple noncovalent interactions. The complementary shapes, charges, polarity, and hydrophobicity of two protein surfaces permit multiple weak interactions, which in combination produce a strong interaction and tight binding. Because deviations from molecular complementarity substantially weaken binding, a particular surface region of any given biomolecule usually can bind tightly to only one or a very limited number of other molecules. The complementarity of the two protein molecules on the left permits them to bind much more tightly than the two noncomplementary proteins on the right.
of groups of molecules into multimolecular assemblies, or complexes, leading, for example, to the formation of muscle fibers, to the gluelike associations between cells in solid tissues, and to numerous other cellular structures. The antibodies that help neutralize pathogens (see Chapter 23) bind to them using the same principle of complementary molecular shapes. Depending on the number and strength of the noncovalent interactions between the two molecules and on their environment, their binding may be tight or loose and, as a consequence, either lasting or transient. The better the molecular “fit” between two molecules, the more noncovalent interactions can form between them, and the higher their affinity for each other, that is, the more tightly they bind together. An important quantitative measure of affinity is the binding dissociation constant Kd, described in Section 2.3. It is important to note that many large biological molecules are not hard, rigid structures but are somewhat malleable. Thus the binding of one molecule to another can induce a change in the shape of its binding partner. When such a change increases the molecular complementarity, the process is called induced fit. As we discuss in Chapter 3, nearly all the chemical reactions that occur in cells also depend on the binding properties of enzymes. These proteins not only speed up, or catalyze, reactions but do so with a high degree of specificity, which reflects their ability to bind tightly to only one or a few related molecules. The specificity of intermolecular interactions and reactions, which depends on molecular complementarity, is essential for many processes critical to life.
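How several weak interactions can add up to tight binding can be estimated with the standard relation Kd = exp(−E/RT) between a dissociation constant (1 M standard state) and a total binding free energy of −E. This Python sketch makes a simplification the text does not: it assumes each noncovalent interaction contributes a hypothetical 2 kcal/mol and that the contributions simply add, ignoring the entropic and desolvation costs of binding.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

def kd_from_binding_energy(energy_kcal, temp_c=25.0):
    """Dissociation constant (M) for a binding free energy of -energy_kcal: Kd = exp(-E/RT)."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(-energy_kcal / rt)

PER_INTERACTION = 2.0  # kcal/mol, hypothetical contribution of one weak interaction

for n in (1, 3, 5):
    total = n * PER_INTERACTION  # assumes the contributions are simply additive
    print(f"{n} interactions ({total:.0f} kcal/mol): Kd ~ {kd_from_binding_energy(total):.1e} M")
```

Because energies add while Kd depends on them exponentially, going from one to five such interactions in this idealized model lowers Kd from the tens-of-millimolar range to the tens-of-nanomolar range, roughly a millionfold tighter binding.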
KEY CONCEPTS OF SECTION 2.1
Covalent Bonds and Noncovalent Interactions
• The terms hydrophilic, hydrophobic, and amphipathic/amphiphilic refer to the tendency of molecules to be water-loving, unable to interact favorably with water, and having features of (or being tolerant of) both, respectively. Hydrophilic molecules typically dissolve readily in water, whereas hydrophobic molecules are poorly soluble or insoluble in water.
• Covalent bonds consist of pairs of electrons shared by two atoms. Covalent bonds arrange the atoms of a molecule into a specific geometry.
• Many molecules in cells contain at least one asymmetric carbon atom, which is bonded to four dissimilar atoms or groups. Such molecules can exist as stereoisomers (mirror images), designated D and L (see Figure 2-4), which have different biological activities. Nearly all amino acids in proteins are L isomers.
• Electrons may be shared equally or unequally in covalent bonds. Atoms that differ in electronegativity form polar covalent bonds, in which the bonding electrons are distributed unequally. One end of a polar bond has a partial positive charge and the other end has a partial negative charge (see Figure 2-5).
• Covalent bonds are stable in biological systems because the relatively high energies required to break them (50–200 kcal/mol) are much larger than the thermal kinetic energy available at room (25 °C) or body (37 °C) temperatures.
• Noncovalent interactions between atoms are considerably weaker than covalent bonds, with energies ranging from about 1–5 kcal/mol (see Figure 2-6).
• Four main types of noncovalent interactions occur in biological systems: ionic bonds, hydrogen bonds, van der Waals interactions, and interactions due to the hydrophobic effect.
• Ionic bonds result from the electrostatic attraction between the positive and negative charges of ions. In aqueous solutions, all cations and anions are surrounded by a shell of bound water molecules (see Figure 2-7c). Increasing the salt (e.g., NaCl) concentration of a solution can weaken the relative strength of, and even break, the ionic bonds between biomolecules.
• In a hydrogen bond, a hydrogen atom covalently bonded to an electronegative atom associates with an acceptor atom whose nonbonding electrons attract the hydrogen (see Figure 2-8).
• Weak and relatively nonspecific van der Waals interactions result from the attraction between transient dipoles associated with all molecules. They can form when two atoms approach each other closely (see Figure 2-10).
• In an aqueous environment, nonpolar molecules or nonpolar parts of larger molecules are driven together by the hydrophobic effect, thereby reducing the extent of their direct contact with water molecules (see Figure 2-11).
• Molecular complementarity is the lock-and-key fit between molecules whose shapes, charges, and other physical properties are complementary. Multiple noncovalent interactions can form between complementary molecules, causing them to bind tightly (see Figure 2-12), but not between molecules that are not complementary.
• The high degree of binding specificity that results from molecular complementarity is one of the features that underlies intermolecular interactions in biology and thus is essential for many processes critical to life.
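The stability argument in these key concepts can be checked with a quick calculation. This is a minimal sketch, assuming that the thermal energy available per mole is approximated by RT (the gas constant times the absolute temperature); the energy ranges are the ones quoted above, and the function name is illustrative.

```python
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def thermal_energy_kcal(celsius: float) -> float:
    """Approximate molar thermal energy scale, RT, at a Celsius temperature."""
    return R_KCAL * (celsius + 273.15)


rt_room = thermal_energy_kcal(25.0)
rt_body = thermal_energy_kcal(37.0)

# Bond-energy ranges quoted in the key concepts, kcal/mol
ranges = {"covalent": (50.0, 200.0), "noncovalent": (1.0, 5.0)}

for kind, (low, high) in ranges.items():
    print(f"{kind}: {low / rt_body:.1f} to {high / rt_body:.1f} times RT at 37 C")
```

RT is only about 0.6 kcal/mol at these temperatures, so even the weakest covalent bond is roughly 80 times the thermal energy scale, while a single noncovalent interaction is within a factor of about 10 of it and can be broken and re-formed readily.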
2.2 Chemical Building Blocks of Cells
A common theme in biology is the construction of large macromolecules and macromolecular structures out of smaller molecular subunits, which can be thought of as building blocks. Often these subunits are similar or identical. The three main types of biological macromolecules (proteins, nucleic acids, and polysaccharides) are all polymers composed of multiple covalently linked small molecules, or monomers (Figure 2-13). Proteins are linear polymers containing up to several thousand amino acids linked by peptide bonds. Nucleic acids are linear polymers containing hundreds to millions of nucleotides linked by phosphodiester bonds. Polysaccharides are linear or branched polymers of monosaccharides (sugars) such as glucose linked by glycosidic bonds. Although the actual mechanisms of covalent bond formation between monomers are complex, as we will see, the formation of a covalent bond between two monomers usually involves the net loss of a hydrogen (H) from one monomer and a hydroxyl (OH) from the other, that is, the net loss of one water molecule, and can therefore be thought of as a dehydration reaction. The breakdown, or cleavage, of a covalent bond in a polymer that releases a monomeric subunit involves the reverse reaction, the addition of water, called hydrolysis. The covalent bonds that link monomers together are stable under normal biological conditions (e.g., 37 °C, neutral pH), so these biopolymers are stable and can perform a wide variety of jobs in cells, such as storing information, catalyzing chemical reactions, and serving as structural elements that define cell shape and movement. Macromolecular structures can also be assembled using noncovalent interactions. The two-ply, or "bilayer," structure of cellular membranes is built up by the noncovalent assembly of many thousands of small molecules called phospholipids (see Figure 2-13).
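The bookkeeping of dehydration and hydrolysis can be sketched as a mass calculation: joining n monomers into an unbranched chain forms n − 1 bonds, and each bond releases one water molecule. The sketch below applies this to a hypothetical tripeptide; the approximate free amino acid masses and the helper names are illustrative assumptions, not values from the text.

```python
# Approximate average molar masses (g/mol) of free amino acids (illustrative values)
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER_MASS = 18.015  # g/mol


def polymer_mass(monomer_masses: list[float]) -> float:
    """Mass of a polymer built by dehydration: joining n monomers makes n - 1
    covalent bonds, and each bond releases one water molecule."""
    if not monomer_masses:
        raise ValueError("need at least one monomer")
    n_bonds = len(monomer_masses) - 1
    return sum(monomer_masses) - n_bonds * WATER_MASS


def mass_after_full_hydrolysis(polymer: float, n_monomers: int) -> float:
    """Reverse reaction: hydrolysis adds back one water per bond cleaved."""
    return polymer + (n_monomers - 1) * WATER_MASS


tripeptide = [FREE_AMINO_ACID_MASS[code] for code in "GAS"]  # Gly-Ala-Ser
mass = polymer_mass(tripeptide)
print(f"Gly-Ala-Ser: {mass:.2f} g/mol")
```

Hydrolyzing every peptide bond restores the summed masses of the free amino acids, mirroring the statement that hydrolysis is the reverse of the dehydration reaction.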
In this chapter, we focus on the chemical building blocks making up cells: amino acids, nucleotides, sugars, and phospholipids. The structure, function, and assembly of proteins, nucleic acids, polysaccharides, and biomembranes are discussed in subsequent chapters.

[Figure 2-13 panels, monomers → polymers: amino acid → polypeptide (peptide bonds); nucleotide → nucleic acid (phosphodiester bonds joining the 3′ and 5′ positions of adjacent sugars); monosaccharide → polysaccharide (glycosidic bonds, e.g., between carbons 1 and 4); phospholipid (polar, hydrophilic head group containing phosphate and glycerol; hydrophobic fatty acyl tails) → phospholipid bilayer.]
FIGURE 2-13 Overview of the cell's principal chemical building blocks. (Top) The three major types of biological macromolecules are each assembled by the polymerization of multiple small molecules (monomers) of a particular type: proteins from amino acids (see Chapter 3), nucleic acids from nucleotides (see Chapter 5), and polysaccharides from monosaccharides (sugars). Each monomer is covalently linked into the polymer by a reaction whose net result is loss of a water molecule (dehydration). (Bottom) In contrast, phospholipid monomers noncovalently assemble into a bilayer structure, which forms the basis of all cellular membranes (see Chapter 7).
CHAPTER 2 • Chemical Foundations

Amino Acids Differing Only in Their Side Chains Compose Proteins
The monomeric building blocks of proteins are 20 amino acids, which, when incorporated into a protein polymer, are sometimes called residues. All amino acids have a characteristic structure consisting of a central alpha carbon atom (Cα) bonded to four different chemical groups: an amino (−NH2) group, a carboxyl or carboxylic acid (−COOH) group (hence the name amino acid), a hydrogen (H) atom, and one variable group, called a side chain or R group. Because the α carbon in all amino acids except glycine is asymmetric, these molecules can exist in two mirror-image forms, called by convention the D (dextro) and the L (levo) isomers (see Figure 2-4). The two isomers cannot be interconverted (one made identical to the other) without breaking and then re-forming a chemical bond in one of them. With rare exceptions, only the L forms of amino acids are found in proteins. However, D amino acids are prevalent in bacterial cell walls and other microbial products.

[Figure 2-14 structures. Hydrophobic amino acids: alanine (Ala, A), valine (Val, V), isoleucine (Ile, I), leucine (Leu, L), methionine (Met, M), phenylalanine (Phe, F), tyrosine (Tyr, Y), tryptophan (Trp, W). Hydrophilic amino acids: basic, lysine (Lys, K), arginine (Arg, R), histidine (His, H); acidic, aspartate (Asp, D), glutamate (Glu, E); polar with uncharged R groups, serine (Ser, S), threonine (Thr, T), asparagine (Asn, N), glutamine (Gln, Q). Special amino acids: cysteine (Cys, C), glycine (Gly, G), proline (Pro, P).]
FIGURE 2-14 The 20 common amino acids used to build proteins. The side chain (R group; red) determines the characteristic properties of each amino acid and is the basis for grouping amino acids into three main categories: hydrophobic, hydrophilic, and special. Shown are the ionized forms that exist at the pH (∼7) of the cytosol. In parentheses are the three-letter and one-letter abbreviations for each amino acid.

To understand the three-dimensional structures and functions of proteins, discussed in detail in Chapter 3, you must be familiar with some of the distinctive properties of amino acids, which are determined in part by their side chains. You need not memorize the detailed structure of each type of side chain to understand how proteins work, because amino acids can be classified into several broad categories based on the size, shape, charge, hydrophobicity (a measure of water solubility), and chemical reactivity of their side chains (Figure 2-14). Amino acids with nonpolar side chains, called hydrophobic amino acids, are poorly soluble in water. The larger the nonpolar side chain, the more hydrophobic the amino acid. The side chains of alanine, valine, leucine, and isoleucine are linear or branched hydrocarbons that do not form a ring, and they are therefore called aliphatic amino acids. These amino acids are all nonpolar, as is methionine, which is similar to them except that it contains one sulfur atom. Phenylalanine, tyrosine, and tryptophan have large, hydrophobic, aromatic rings in their side chains. In later chapters, we will see in detail how hydrophobic side chains, under the influence of the hydrophobic effect, often pack into the interior of proteins or line the surfaces of proteins that are embedded within hydrophobic regions of biomembranes.

Amino acids with polar side chains are called hydrophilic amino acids; the most hydrophilic of these are the ones whose side chains are charged (ionized) at the pH typical of biological fluids (∼7), both inside and outside the cell (see Section 2.3). Arginine and lysine have positively charged side chains and are called basic amino acids. Aspartic acid and glutamic acid have negatively charged side chains because the carboxylic acid groups in their side chains are ionized (their charged forms are called aspartate and glutamate), and they are called acidic amino acids. A fifth amino acid, histidine, has a side chain containing a ring with two nitrogens, called imidazole, which can shift from being positively charged to uncharged in response to small changes in the acidity of its environment:

[Structure: histidine side chain (−CH2−imidazole). The protonated, positively charged ring (pH 5.8) loses an H+ to give the uncharged ring (pH 7.8).]
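The pH sensitivity of histidine can be made concrete with the Henderson–Hasselbalch relation, fraction protonated = 1 / (1 + 10^(pH − pKa)). This is a minimal sketch, assuming a side-chain pKa of about 6.0 for free histidine; that value is not given in this section, and the effective pKa of a histidine inside a folded protein can differ. The function name is illustrative.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Henderson-Hasselbalch: fraction of an ionizable group that is protonated
    (for the histidine imidazole ring, the positively charged form) at a given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


# The pH values 5.8 and 7.8 from the histidine scheme, plus pH 7 for comparison
for ph in (5.8, 7.0, 7.8):
    print(f"pH {ph}: {fraction_protonated(ph):.0%} protonated")
```

With the assumed pKa, a majority of imidazole rings are protonated at pH 5.8 but only about 2 percent at pH 7.8, which is why modest pH shifts near neutrality can switch histidine between its charged and uncharged states.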
The activities of many proteins are modulated by shifts in environmental acidity (pH) through protonation or deprotonation of histidine side chains. Asparagine and glutamine are uncharged but have polar side chains containing amide groups with extensive hydrogen-bonding capacities. Similarly, serine and threonine are uncharged but have polar hydroxyl groups, which also participate in hydrogen bonds with other polar molecules. Finally, cysteine, glycine, and proline play special roles in proteins because of the unique properties of their side chains. The side chain of cysteine contains a reactive sulfhydryl group (−SH). On release of a proton (H+), a sulfhydryl group is converted into a thiolate anion (S−). Thiolate anions can play important roles in catalysis, notably in certain enzymes that destroy proteins (proteases). In proteins, each of two adjacent sulfhydryl groups can be oxidized, each releasing a proton and an electron, to form a covalent disulfide bond (−S−S−):
[Structure: two cysteine residues in polypeptide chains, each with a −CH2−SH side chain, are oxidized so that their sulfur atoms join in a −CH2−S−S−CH2− disulfide bond.]
Disulfide bonds serve to "cross-link" regions within a single polypeptide chain (intramolecular cross-linking) or between two separate chains (intermolecular cross-linking). Disulfide bonds stabilize the folded structure of some proteins. The smallest amino acid, glycine, has a single hydrogen atom as its R group. Its small size allows it to fit into tight spaces. Unlike those of the other common amino acids, the side chain of proline (pronounced pro-leen) bends around to form a ring by covalently bonding to the nitrogen atom in the amino group attached to the Cα. As a result, proline is very rigid, and its amino group is not available for typical hydrogen bonding. The presence of proline in a protein creates a fixed kink in the polymer chain, limiting how it can fold in the vicinity of the proline residue.

Some amino acids are more abundant in proteins than others. Cysteine, tryptophan, and methionine are relatively rare: together, they constitute approximately 5 percent of the amino acids in a typical protein. Four amino acids (leucine, serine, lysine, and glutamic acid) are the most abundant, together constituting 32 percent of all the residues in a typical protein. However, the amino acid compositions of particular proteins may vary widely from these values.
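Composition figures like the 5 percent and 32 percent above are simple counts over a sequence written in one-letter codes. The sketch below computes them for a short sequence; the sequence is invented purely for illustration, and `percent_of` is a hypothetical helper name. A real protein would be much longer and could deviate widely from the typical values.

```python
from collections import Counter

RARE = set("CWM")            # cysteine, tryptophan, methionine
MOST_ABUNDANT = set("LSKE")  # leucine, serine, lysine, glutamic acid


def percent_of(sequence: str, residues: set[str]) -> float:
    """Percentage of positions in a one-letter protein sequence that belong to `residues`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return 100.0 * sum(counts[r] for r in residues) / len(seq)


example = "MKLSEELLKSAGWCEKLSSE"  # hypothetical 20-residue sequence
print(f"C + W + M:     {percent_of(example, RARE):.1f}%")
print(f"L + S + K + E: {percent_of(example, MOST_ABUNDANT):.1f}%")
```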
Humans and other mammals can synthesize 11 of the 20 amino acids. The other nine are called essential amino acids and must be included in the diet to permit normal protein production. These essential amino acids are phenylalanine, valine, threonine, tryptophan, isoleucine, methionine, leucine, lysine, and histidine. Adequate provision of these essential amino acids in feed is key to the livestock industry. Indeed, a genetically engineered variety of corn with a high lysine content is now in use as an "enhanced" feed to promote the growth of animals. ■

Although cells use the 20 amino acids shown in Figure 2-14 in the initial synthesis of proteins, analysis of cellular proteins reveals that they contain over 100 different amino acids. The difference arises because some amino acids are chemically modified, by the addition of a variety of chemical groups, after they are incorporated into proteins (Figure 2-15). One important modification is the addition of acetyl groups (CH3CO) to amino acids, a process known as acetylation. Another is the addition of a phosphate (PO4) to hydroxyl groups in serine, threonine, and tyrosine residues, a process known as phosphorylation. We will encounter numerous examples of proteins whose activity is regulated by reversible phosphorylation and dephosphorylation. Phosphorylation of nitrogen in the side chain of histidine is well known in bacteria, fungi, and plants but less studied (perhaps because of the relative instability of phosphorylated histidine) and apparently rare in mammals. Methylation of arginine and lysine side chains on proteins called histones is an important regulator of gene expression in eukaryotes (see Chapter 9). Like phosphorylation and dephosphorylation, controlled methylation and demethylation are important regulatory processes. The side chains of asparagine, serine, and threonine are sites for glycosylation, the attachment of linear and branched carbohydrate chains.
Many secreted proteins and membrane proteins contain glycosylated residues, and the reversible modification of hydroxyl groups on specific serines and threonines by a sugar called N-acetylglucosamine also regulates protein activities. Other amino acid modifications found in selected proteins include the hydroxylation of proline and lysine residues in collagen (see Chapter 19), the methylation of histidine residues in membrane receptors, and the γ-carboxylation of glutamate in blood-clotting factors such as prothrombin. Deamidation of asparagine and glutamine to form the corresponding acidic amino acids, aspartate and glutamate, is also common. Acetylation of the amino group of the N-terminal residue is the most common form of amino acid chemical modification, affecting an estimated 80 percent of all proteins:

[Structure: acetylated N-terminus, in which an acetyl group (CH3−CO−) is attached to the amino nitrogen of the N-terminal residue.]

This modification may play an important role in controlling the life span of proteins within cells, because many nonacetylated proteins are rapidly degraded.

[Figure 2-15 structures: acetyl lysine, phosphoserine, phosphothreonine, phosphotyrosine, 3-hydroxyproline, 3-methylhistidine, γ-carboxyglutamate, and O-GlcNAc-threonine.]
FIGURE 2-15 Common modifications of amino acid side chains in proteins. These modified residues and numerous others are formed by addition of various chemical groups (red) to the amino acid side chains during or after synthesis of a polypeptide chain.

Five Different Nucleotides Are Used to Build Nucleic Acids
Two types of chemically similar nucleic acids, DNA (deoxyribonucleic acid) and RNA (ribonucleic acid), are the cell's principal molecules that carry genetic information. The monomers from which DNA and RNA polymers are built, called nucleotides, all have a common structure: a phosphate group linked by a phosphoester bond to a pentose (five-carbon) sugar, which in turn is linked to a nitrogen- and carbon-containing ring structure commonly referred to as a base (Figure 2-16a). In RNA, the pentose is ribose; in DNA, it is deoxyribose, which has a hydrogen atom, rather than a hydroxyl group, at position 2′ (Figure 2-16b). (We describe the structures of sugars in more detail below.) The bases adenine, guanine, and cytosine (Figure 2-17) are found in both DNA and RNA; thymine is found only in DNA, and uracil is found only in RNA. Adenine and guanine are purines, which contain a pair of fused rings; cytosine, thymine, and uracil are pyrimidines, which contain a single ring (see Figure 2-17). The bases are often abbreviated A, G, C, T, and U, respectively; these same single-letter abbreviations are also commonly used to denote the entire nucleotides in nucleic acid polymers. In nucleotides, the 1′ carbon atom of the sugar (ribose or deoxyribose) is attached to the nitrogen at position 9 of a purine (N9) or at position 1 of a pyrimidine (N1). The acidic character of nucleotides is due to the phosphate group, which under normal intracellular conditions releases hydrogen ions (H+), leaving the phosphate negatively charged (see Figure 2-16a). Most nucleic acids in cells are associated with proteins, which form ionic interactions with the negatively charged phosphates.

[Figure 2-16 structures: (a) adenosine 5′-monophosphate (AMP), with the phosphate on the 5′ hydroxyl of ribose and adenine attached at the 1′ carbon; sugar carbons numbered 1′–5′. (b) Ribose and 2′-deoxyribose.]
FIGURE 2-16 Common structure of nucleotides. (a) Adenosine 5′-monophosphate (AMP), a nucleotide present in RNA. By convention, the carbon atoms of the pentose sugar in nucleotides are numbered with primes. In natural nucleotides, the 1′ carbon is joined by a β linkage to the base (in this case, adenine); both the base (blue) and the phosphate on the 5′ hydroxyl (red) extend above the plane of the sugar ring. (b) Ribose and deoxyribose, the pentoses in RNA and DNA, respectively.

Cells and extracellular fluids in organisms contain small concentrations of nucleosides, combinations of a base and a sugar without a phosphate. Nucleotides are nucleosides that have one, two, or three phosphate groups esterified at the 5′ hydroxyl. Esterification, the formation of an ester, involves the covalent linking of an acid, such as a carboxylic acid or a phosphoric acid, with an alcohol, accompanied by the release of a hydroxyl (−OH) group from the acid and an H from the hydroxyl group on the other molecule, which together form a water molecule. Here, a phosphoric acid is esterified with the 5′ hydroxyl group of the ribose. Nucleoside monophosphates have a single esterified phosphate (see Figure 2-16a); nucleoside diphosphates contain a pyrophosphate group:

[Structure: pyrophosphate group, two phosphates joined through a shared oxygen (−O−P−O−P−).]

and nucleoside triphosphates have a third phosphate. Table 2-3 lists the names of the nucleosides and nucleotides in nucleic acids and the various forms of nucleoside phosphates. The nucleoside triphosphates are used in the synthesis of nucleic acids, which we cover in Chapter 5. Among their other functions in the cell, GTP participates in intracellular signaling and acts as an energy reservoir, particularly in protein synthesis, and ATP, discussed later in this chapter, is the most widely used biological energy carrier.

[Figure 2-17 structures: purines adenine (A) and guanine (G), ring atoms numbered 1–9; pyrimidines cytosine (C), uracil (U), and thymine (T), ring atoms numbered 1–6.]
FIGURE 2-17 Chemical structures of the principal bases in nucleic acids. In nucleic acids and nucleotides, nitrogen 9 of purines and nitrogen 1 of pyrimidines (red) are bonded to the 1′ carbon of ribose or deoxyribose. U is found only in RNA, and T is found only in DNA. Both RNA and DNA contain A, G, and C.
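The naming scheme summarized in Table 2-3 is regular enough to express as a lookup. This sketch transcribes the nucleoside names from the table and builds abbreviations such as ATP or dTMP; the function names are hypothetical, and the sketch deliberately covers only the bases that the table assigns to each nucleic acid (U for RNA, T for DNA).

```python
# Nucleoside names transcribed from Table 2-3
RNA_NUCLEOSIDES = {"A": "adenosine", "G": "guanosine", "C": "cytidine", "U": "uridine"}
DNA_NUCLEOSIDES = {"A": "deoxyadenosine", "G": "deoxyguanosine",
                   "C": "deoxycytidine", "T": "deoxythymidine"}
PHOSPHATE_SUFFIX = {1: "MP", 2: "DP", 3: "TP"}  # mono-, di-, triphosphate


def nucleoside_name(base: str, in_dna: bool) -> str:
    """Name of the nucleoside (base + sugar, no phosphate) built on `base`."""
    table = DNA_NUCLEOSIDES if in_dna else RNA_NUCLEOSIDES
    if base not in table:
        kind = "DNA" if in_dna else "RNA"
        raise ValueError(f"base {base!r} is not a {kind} base")
    return table[base]


def nucleotide_abbreviation(base: str, in_dna: bool, n_phosphates: int) -> str:
    """Abbreviation such as ATP or dTMP; deoxy forms get a leading 'd'."""
    nucleoside_name(base, in_dna)  # raises if the base does not occur in this nucleic acid
    if n_phosphates not in PHOSPHATE_SUFFIX:
        raise ValueError("nucleotides carry one, two, or three phosphates")
    prefix = "d" if in_dna else ""
    return f"{prefix}{base}{PHOSPHATE_SUFFIX[n_phosphates]}"


print(nucleoside_name("G", in_dna=False))     # guanosine
print(nucleotide_abbreviation("T", True, 1))   # dTMP
print(nucleotide_abbreviation("A", False, 3))  # ATP
```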
TABLE 2-3 Terminology of Nucleosides and Nucleotides

                            Purines                             Pyrimidines
Bases                       Adenine (A)       Guanine (G)       Cytosine (C)      Uracil (U) / Thymine (T)
Nucleosides in RNA          Adenosine         Guanosine         Cytidine          Uridine
Nucleosides in DNA          Deoxyadenosine    Deoxyguanosine    Deoxycytidine     Deoxythymidine
Nucleotides in RNA          Adenylate         Guanylate         Cytidylate        Uridylate
Nucleotides in DNA          Deoxyadenylate    Deoxyguanylate    Deoxycytidylate   Deoxythymidylate
Nucleoside monophosphates   AMP               GMP               CMP               UMP
Nucleoside diphosphates     ADP               GDP               CDP               UDP
Nucleoside triphosphates    ATP               GTP               CTP               UTP
Deoxynucleoside mono-, di-, dAMP, etc.        dGMP, etc.        dCMP, etc.        dTMP, etc.
  and triphosphates

Monosaccharides Covalently Assemble into Linear and Branched Polysaccharides
The building blocks of the polysaccharides are the simple sugars, or monosaccharides. Monosaccharides are carbohydrates, which are literally covalently bonded combinations of carbon and water in a one-to-one ratio, (CH2O)n, where n equals 3, 4, 5, 6, or 7. Hexoses (n = 6) and pentoses (n = 5) are the most common monosaccharides. All monosaccharides contain hydroxyl (−OH) groups and either an aldehyde or a keto group:

[Structures: aldehyde group, a terminal carbon double-bonded to O and bonded to H (−CHO); keto group, an internal carbon double-bonded to O (C=O).]
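The general formula (CH2O)n fixes both the molecular formula and the molar mass of a monosaccharide once n is known. This is a small sketch under that assumption; the class names for n = 3 to 7 are the standard ones, the atomic weights are standard values, and the helper name is illustrative.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # standard atomic weights, g/mol
SUGAR_CLASS = {3: "triose", 4: "tetrose", 5: "pentose", 6: "hexose", 7: "heptose"}


def monosaccharide_formula(n: int) -> tuple[str, float, str]:
    """Formula, molar mass, and class name for a sugar of general formula (CH2O)n."""
    if n not in SUGAR_CLASS:
        raise ValueError("monosaccharides have n = 3 to 7")
    formula = f"C{n}H{2 * n}O{n}"
    mass = n * (ATOMIC_MASS["C"] + 2 * ATOMIC_MASS["H"] + ATOMIC_MASS["O"])
    return formula, mass, SUGAR_CLASS[n]


formula, mass, kind = monosaccharide_formula(6)
print(f"{kind}: {formula}, {mass:.2f} g/mol")
```

For n = 6 this reproduces the hexose formula C6H12O6 (about 180 g/mol). Modified sugars fall outside the simple rule; 2-deoxyribose, for example, is C5H10O4.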
Many biologically important sugars are hexoses, including glucose, mannose, and galactose (Figure 2-18). Mannose is identical to glucose except that the orientation of the groups bonded to carbon 2 is reversed. Similarly, galactose, another hexose, differs from glucose only in the orientation of the groups attached to carbon 4. Interconversion of glucose and mannose or galactose requires the breaking and making of covalent bonds; such reactions are carried out by enzymes called epimerases. D-Glucose (C6H12O6) is the principal external source of energy for most cells in complex multicellular organisms. It can exist in three different forms: a linear structure and two different hemiacetal ring structures (Figure 2-18a). If the aldehyde group on carbon 1 combines with the hydroxyl group on carbon 5, the resulting hemiacetal, D-glucopyranose, contains a six-member ring. In the α anomer of D-glucopyranose, the hydroxyl group attached to carbon 1 points "downward" from the ring, as shown in Figure 2-18a; in the β anomer, this hydroxyl points "upward." In aqueous solution, the α and β anomers interconvert spontaneously; at equilibrium there is about one-third α anomer and two-thirds β, with very little of the open-chain form. Because enzymes can distinguish between the α and β anomers of D-glucose, these forms have distinct biological roles. Condensation of the hydroxyl group on carbon 4 of the linear glucose with its aldehyde group results in the formation of D-glucofuranose, a hemiacetal containing a five-member ring. Although all three forms of D-glucose exist in biological systems, the pyranose (six-member ring) form is by far the most abundant.

The pyranose ring in Figure 2-18a is depicted as planar. In fact, because of the tetrahedral geometry around carbon atoms, the most stable conformation of a pyranose ring has a nonplanar, chairlike shape. In this conformation, each bond from a ring carbon to a nonring atom (e.g., H or O) is either nearly perpendicular to the ring, referred to as axial (a), or nearly in the plane of the ring, referred to as equatorial (e):
[Structure: chair conformation of D-glucopyranose, with the bonds at each ring carbon labeled axial (a) or equatorial (e).]

[Figure 2-18 structures: (a) linear D-glucose (carbons numbered 1–6) and its two ring forms, D-glucofuranose (five-member ring; rare) and D-glucopyranose (six-member ring; common). (b) Linear D-mannose and D-galactose.]
FIGURE 2-18 Chemical structures of hexoses. All hexoses have the same chemical formula (C6H12O6) and contain an aldehyde or a keto group. (a) The ring forms of D-glucose are generated from the linear molecule by reaction of the aldehyde at carbon 1 with the hydroxyl on carbon 5 or carbon 4. The three forms are readily interconvertible, although the pyranose form (right) predominates in biological systems. (b) In D-mannose and D-galactose, the configuration of the H (green) and OH (blue) bound to one carbon atom differs from that in glucose. These sugars, like glucose, exist primarily as pyranoses (six-member rings).
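The one-third α, two-thirds β equilibrium of D-glucose described above corresponds to a small free-energy difference, which follows from ΔG° = −RT ln K. This worked sketch assumes 25 °C, ignores the small open-chain fraction, and takes K = [β]/[α]; the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def delta_g_from_fractions(frac_alpha: float, frac_beta: float,
                           temp_k: float = 298.15) -> float:
    """Standard free-energy change (kcal/mol) for alpha -> beta, from equilibrium
    fractions, using dG = -RT ln K with K = [beta]/[alpha]."""
    k_eq = frac_beta / frac_alpha
    return -R_KCAL * temp_k * math.log(k_eq)


dg = delta_g_from_fractions(1 / 3, 2 / 3)
print(f"alpha -> beta: dG ~ {dg:.2f} kcal/mol")
```

The result, roughly −0.4 kcal/mol, is smaller in magnitude than RT (about 0.6 kcal/mol at 25 °C), consistent with both anomers being well populated and interconverting freely.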
Disaccharides, formed from two monosaccharides, are the simplest polysaccharides. The disaccharide lactose, composed of galactose and glucose, is the major sugar in milk; the disaccharide sucrose, composed of glucose and fructose, is a principal product of plant photosynthesis and is refined into common table sugar (Figure 2-19). Larger polysaccharides, containing dozens to hundreds of monosaccharide units, can function as reservoirs for glucose, as structural components, or as adhesives that help hold cells together in tissues. The most common storage carbohydrate in animal cells is glycogen, a very long, highly branched polymer of glucose. As much as 10 percent of the liver by weight can be glycogen. The primary storage carbohydrate in plant cells, starch, is also a glucose polymer. It occurs in an unbranched form (amylose) and a lightly branched form (amylopectin). Both glycogen and starch are composed of the α anomer of glucose. In contrast, cellulose, the major constituent of plant cell walls, which confers stiffness to many plant structures (see Chapter 19), is an unbranched polymer of the β anomer of glucose. Human digestive enzymes can hydrolyze the α glycosidic bonds in starch but not the β glycosidic bonds in cellulose. Many species of plants, bacteria, and molds produce cellulose-degrading enzymes. Cows and termites can break down cellulose because they harbor cellulose-degrading bacteria in their gut. Bacterial cell walls consist of peptidoglycan, a polysaccharide chain cross-linked by peptide cross-bridges, which confers rigidity and cell shape. Human tears and gastrointestinal fluids contain lysozyme, an enzyme capable of hydrolyzing peptidoglycan in the bacterial cell wall.
[Figure 2-19 structures: galactose + glucose → lactose + H2O; glucose + fructose → sucrose + H2O.]
FIGURE 2-19 Formation of the disaccharides lactose and sucrose. In any glycosidic linkage, the anomeric carbon of one sugar molecule (in either the α or β conformation) is linked to a hydroxyl oxygen on another sugar molecule. The linkages are named accordingly; thus lactose contains a β(1 → 4) bond, and sucrose contains an α(1 → 2) bond.

The enzymes that make the glycosidic bonds linking monosaccharides into polysaccharides are specific for the α or β anomer of one sugar and a particular hydroxyl group on the other. In principle, any two sugar molecules can be linked in a variety of ways because each monosaccharide has multiple hydroxyl groups that can participate in the formation of glycosidic bonds. Furthermore, any one monosaccharide has the potential to be linked to more than two other monosaccharides, thus generating a branch point and nonlinear polymers. Glycosidic bonds are usually formed between the growing polysaccharide chain and a covalently modified form of a monosaccharide. Such modifications include the addition of a phosphate (e.g., glucose-6-phosphate) or a nucleotide (e.g., UDP-galactose):

[Structures: glucose-6-phosphate, with a phosphate on carbon 6 of glucose; UDP-galactose, in which galactose is linked to uridine through a pyrophosphate.]

The epimerase enzymes that interconvert different monosaccharides often do so using the nucleotide sugars rather than the unmodified, or "free," sugars. Many complex polysaccharides contain modified sugars that are covalently linked to various small groups, particularly amino, sulfate, and acetyl groups. Such modifications are abundant in glycosaminoglycans, major polysaccharide components of the extracellular matrix that we describe in Chapter 19.

Phospholipids Associate Noncovalently to Form the Basic Bilayer Structure of Biomembranes
Biomembranes are large, flexible sheets with a two-ply, or bilayer, structure. They serve as the boundaries of cells and their intracellular organelles and form the outer surfaces of some viruses. Membranes literally define what is a cell (the outer membrane and the contents within the membrane) and what is not (the extracellular space outside the membrane). Unlike proteins, nucleic acids, and polysaccharides, membranes are assembled by the noncovalent association of their component building blocks. The primary building blocks of all biomembranes are phospholipids, whose physical properties are responsible for the formation of the sheet-like bilayer structure of membranes. In addition to phospholipids, biomembranes can contain a variety of other molecules, including cholesterol, glycolipids, and proteins. The structure and functions of biomembranes will be described in detail in Chapter 7. Here we will focus on the phospholipids in biomembranes.

To understand the structure of a phospholipid molecule, we have to understand each of its component parts and how it is assembled. As we will see shortly, a phospholipid molecule consists of two long-chain, nonpolar fatty acid groups linked (usually by an ester bond) to small, highly polar groups: a short organic molecule such as glycerol (trihydroxy propane), a phosphate, and, typically, another small polar organic molecule such as choline (Figure 2-20). Fatty acids consist of a hydrocarbon chain attached to a carboxyl group (−COOH). Like glucose, fatty acids are an important energy source for many cells (see Chapter 12). They differ in length, although the predominant fatty acids in cells have an even number of carbon atoms, usually 14, 16, 18, or 20. The major fatty acids in phospholipids are listed in Table 2-4. Fatty acids are often designated by the abbreviation Cx:y, where x is the number of carbons in the chain and y is the number of double bonds. Fatty acids containing 12 or more carbon atoms are nearly insoluble in aqueous solutions because of their long hydrophobic hydrocarbon chains.
Fatty acids in which all the carbon-carbon bonds are single bonds—that is, the fatty acids have no carbon-carbon double bonds—are said to be saturated; those with at least one carbon-carbon double bond are called unsaturated.
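The Cx:y shorthand and the saturated/unsaturated distinction are simple enough to express in a few lines of code. The following Python sketch is illustrative only. `parse_cxy` and `classify` are hypothetical helpers written for this example, and the labels follow the definitions in the text: no double bonds means saturated, and more than one means polyunsaturated. "Monounsaturated" is the standard term for exactly one double bond.

```python
import re

def parse_cxy(abbrev: str) -> tuple[int, int]:
    """Parse a fatty acid abbreviation such as 'C18:2' into (carbons, double bonds)."""
    match = re.fullmatch(r"C(\d+):(\d+)", abbrev.strip())
    if match is None:
        raise ValueError(f"not a Cx:y abbreviation: {abbrev!r}")
    return int(match.group(1)), int(match.group(2))

def classify(abbrev: str) -> str:
    """Classify a fatty acid by its number of carbon-carbon double bonds."""
    _, double_bonds = parse_cxy(abbrev)
    if double_bonds == 0:
        return "saturated"
    if double_bonds == 1:
        return "monounsaturated"
    return "polyunsaturated"

# Examples taken from the fatty acids discussed in this section
for name, abbrev in [("palmitic", "C16:0"), ("oleic", "C18:1"), ("linoleic", "C18:2")]:
    carbons, double_bonds = parse_cxy(abbrev)
    print(f"{name}: {carbons} C, {double_bonds} C=C, {classify(abbrev)}")
```

The regular expression accepts only the exact Cx:y form, so a malformed abbreviation raises an error instead of being silently misread.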
[Figure 2-20 structure: phosphatidylcholine. Two fatty acid chains form the hydrophobic tail; glycerol, phosphate, and choline form the hydrophilic head.]
FIGURE 2-20 Phosphatidylcholine, a typical phosphoglyceride. All phosphoglycerides are amphipathic phospholipids, having a hydrophobic tail (yellow) and a hydrophilic head (blue) in which glycerol is linked via a phosphate group to an alcohol. Either or both of the fatty acyl side chains in a phosphoglyceride may be saturated or unsaturated. In phosphatidic acid (red), the simplest phospholipid, the phosphate is not linked to an alcohol.
Unsaturated fatty acids with more than one carbon-carbon double bond are referred to as polyunsaturated. Two “essential” polyunsaturated fatty acids, linoleic acid (C18:2) and linolenic acid (C18:3), cannot be synthesized by mammals and must be supplied in the diet. Mammals can synthesize other common fatty acids. In phospholipids, fatty acids are covalently attached to another molecule by esterification. In the combined molecule formed by this reaction, the part derived from the fatty acid is called an acyl group, or fatty acyl group. This structure is illustrated by the most common forms of phospholipids: phosphoglycerides, which contain two acyl groups attached to two of the three hydroxyl groups of glycerol (see Figure 2-20). In phosphoglycerides, one hydroxyl group of the glycerol is esterified to phosphate while the other two are normally esterified to fatty acids. The simplest phospholipid, phosphatidic acid, contains only these components. Phospholipids such as phosphatidic acid are not only membrane building blocks but also important signaling molecules. Lysophosphatidic acid, in which the acyl chain at the 2 position (attached to the hydroxyl group on the central carbon of the glycerol) has been removed, is relatively water soluble and can be a potent inducer of cell division (called a mitogen).
In most phospholipids found in membranes, the phosphate group is also esterified to a hydroxyl group on another hydrophilic compound. In phosphatidylcholine, for example, choline is attached to the phosphate (see Figure 2-20). The negatively charged phosphate, as well as the charged or polar groups esterified to it, can interact strongly with water. The phosphate and its associated esterified group constitute the “head” group of a phospholipid, which is hydrophilic, whereas the fatty acyl chains, the “tails,” are hydrophobic. Other common phosphoglycerides and associated head groups are shown in Table 2-5. Molecules such as phospholipids that have both hydrophobic and hydrophilic regions are called amphipathic. In Chapter 7, we will see how the amphipathic properties of phospholipids allow their assembly into sheet-like bilayers in which the fatty acyl tails point into the center of the sheet and the head groups point outward toward the aqueous environment (see Figure 2-13). Fatty acyl groups can also be covalently linked to form other fatty molecules, including triacylglycerols, or triglycerides, which contain three acyl groups esterified to glycerol:
[Structure: triacylglycerol, in which each of the three carbons of glycerol (CH2, CH, CH2) carries an ester-linked fatty acyl group, H3C-(CH2)n-C(=O)-O-]
TABLE 2-4 Fatty Acids That Predominate in Phospholipids
Common Name of Acid (ionized form in parentheses) | Abbreviation | Chemical Formula
Saturated Fatty Acids
Myristic (myristate) | C14:0 | CH3(CH2)12COOH
Palmitic (palmitate) | C16:0 | CH3(CH2)14COOH
Stearic (stearate) | C18:0 | CH3(CH2)16COOH
Unsaturated Fatty Acids
Oleic (oleate) | C18:1 | CH3(CH2)7CH=CH(CH2)7COOH
Linoleic (linoleate) | C18:2 | CH3(CH2)4CH=CHCH2CH=CH(CH2)7COOH
Arachidonic (arachidonate) | C20:4 | CH3(CH2)4(CH=CHCH2)3CH=CH(CH2)3COOH
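Table 2-4 can be cross-checked with a little arithmetic. An unbranched saturated fatty acid with x carbons has the formula CxH2xO2, and each carbon-carbon double bond removes two hydrogen atoms, so Cx:y corresponds to CxH(2x−2y)O2. The Python sketch below is a minimal illustration of this check. `count_atoms` and `formula_from_abbrev` are hypothetical helpers written for this example. The first counts atoms in the condensed formulas transcribed from the table; the second builds the expected formula from the Cx:y abbreviation.

```python
import re
from collections import Counter

# Rows transcribed from Table 2-4: (common name, Cx:y abbreviation, condensed formula)
TABLE_2_4 = [
    ("myristic", "C14:0", "CH3(CH2)12COOH"),
    ("palmitic", "C16:0", "CH3(CH2)14COOH"),
    ("stearic", "C18:0", "CH3(CH2)16COOH"),
    ("oleic", "C18:1", "CH3(CH2)7CH=CH(CH2)7COOH"),
    ("linoleic", "C18:2", "CH3(CH2)4CH=CHCH2CH=CH(CH2)7COOH"),
    ("arachidonic", "C20:4", "CH3(CH2)4(CH=CHCH2)3CH=CH(CH2)3COOH"),
]

def count_atoms(formula: str) -> Counter:
    """Count atoms in a condensed formula such as 'CH3(CH2)14COOH'.
    Bond symbols ('=') are ignored; parenthesized groups may carry a multiplier."""
    stack = [Counter()]
    for token in re.findall(r"[A-Z][a-z]?\d*|\(|\)\d*", formula.replace("=", "")):
        if token == "(":
            stack.append(Counter())                 # start a new group
        elif token.startswith(")"):
            group = stack.pop()                     # close the group
            multiplier = int(token[1:] or 1)
            for element, n in group.items():
                stack[-1][element] += n * multiplier
        else:
            element = re.match(r"[A-Z][a-z]?", token).group()
            stack[-1][element] += int(token[len(element):] or 1)
    return stack[0]

def formula_from_abbrev(abbrev: str) -> Counter:
    """Expected formula of an unbranched fatty acid Cx:y, i.e., CxH(2x-2y)O2."""
    x, y = (int(part) for part in abbrev[1:].split(":"))
    return Counter({"C": x, "H": 2 * x - 2 * y, "O": 2})

for name, abbrev, condensed in TABLE_2_4:
    atoms = count_atoms(condensed)
    assert atoms == formula_from_abbrev(abbrev), name
    print(f"{name:12s} {abbrev}  C{atoms['C']}H{atoms['H']}O{atoms['O']}")
```

The atom counter handles only the simple condensed notation used in the table (element symbols, counts, and parenthesized repeats). It is not a general chemical formula parser.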
2.2 Chemical Building Blocks of Cells
Fatty acyl groups also can be covalently attached to the very hydrophobic molecule cholesterol, an alcohol, to form cholesteryl esters:
[Structures: cholesterol, which bears a hydroxyl (HO) group on its ring system, and a cholesteryl ester, in which a fatty acyl group is esterified to that hydroxyl group]
Triglycerides and cholesteryl esters are extremely water-insoluble molecules in which fatty acids and cholesterol are either stored or transported. Triglycerides are the storage form of fatty acids in the fat cells of adipose tissue and are the principal components of dietary fats. Cholesteryl esters and triglycerides are transported between tissues through the bloodstream in specialized carriers called lipoproteins (see Chapter 14).
TABLE 2-5 Common Phosphoglycerides and Head Groups
Common Phosphoglyceride | Head Group (attached to the phosphate)
Phosphatidylcholine | Choline, -O-CH2-CH2-N+(CH3)3
Phosphatidylethanolamine | Ethanolamine, -O-CH2-CH2-NH3+
Phosphatidylserine | Serine, -O-CH2-CH(NH3+)-COO−
Phosphatidylinositol | Inositol, a six-membered carbon ring (carbons 1 to 6) bearing hydroxyl groups, linked through carbon 1
We saw above that the fatty acids, which are key components of both phospholipids and triglycerides, can be either saturated or unsaturated. An important consequence of the carbon-carbon double bond (C=C) in an unsaturated fatty acid is that two stereoisomeric configurations, cis and trans, are possible around each of these bonds:
CHAPTER 2
t Chemical Foundations
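[Structures: cis and trans configurations about a carbon-carbon double bond. In the cis isomer, the two hydrogen atoms lie on the same side of the double bond; in the trans isomer, they lie on opposite sides.]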
A cis double bond introduces a rigid kink into what would otherwise be a flexible, straight acyl chain like that of a saturated fatty acid (Figure 2-21). In general, the unsaturated fatty acids in biological systems contain only cis double bonds. Saturated fatty acids without the kink can pack together tightly and so have higher melting points than unsaturated fatty acids. The main fatty molecules in butter are triglycerides with saturated fatty acyl chains, which is why butter is usually solid at room temperature. Unsaturated fatty acids or fatty acyl chains with the cis double bond kink cannot pack as closely together as saturated fatty acyl chains. Thus vegetable oils, composed of triglycerides with unsaturated fatty acyl groups, usually are liquid at room temperature. Vegetable and similar oils may be partially hydrogenated to convert some of their unsaturated fatty acyl chains to saturated fatty acyl chains. As a consequence, the hydrogenated vegetable oil can be molded into solid sticks of margarine. A by-product of the hydrogenation reaction is the conversion of some of the fatty acyl chains into trans fatty acids, popularly called “trans fats.” These trans fats, found in partially hydrogenated margarine and other food products, are not natural. Saturated and trans fatty acids have similar physical properties; for example, they tend to be solids at room temperature. Their consumption, relative to the consumption of unsaturated fats, is associated with increased plasma cholesterol levels and is discouraged by some nutritionists.
[Figure 2-21 structures: palmitate (ionized form of palmitic acid) and oleate (ionized form of oleic acid)]
FIGURE 2-21 The effect of a double bond on the shape of fatty acids. Shown are chemical structures of the ionized form of palmitic acid, a saturated fatty acid with 16 C atoms, and oleic acid, an unsaturated one with 18 C atoms. In saturated fatty acids, the hydrocarbon chain is often linear; the cis double bond in oleate creates a rigid kink in the hydrocarbon chain.
KEY CONCEPTS OF SECTION 2.2 Chemical Building Blocks of Cells
• Macromolecules are polymers of monomer subunits linked together by covalent bonds via dehydration reactions. Three major types of macromolecules are found in cells: proteins, composed of amino acids linked by peptide bonds; nucleic acids, composed of nucleotides linked by phosphodiester bonds; and polysaccharides, composed of monosaccharides (sugars) linked by glycosidic bonds (see Figure 2-13). Phospholipids, the fourth major chemical building block, assemble noncovalently into biomembranes.
• Differences in the size, shape, charge, hydrophobicity, and reactivity of the side chains of the 20 common amino acids determine the chemical and structural properties of proteins (see Figure 2-14). The three general categories into which the side chains fall are hydrophobic, hydrophilic (basic, acidic, polar), and special (see Figure 2-14). It is helpful to remember which amino acids fall into each of these categories.
• The bases in the nucleotides composing DNA and RNA are carbon- and nitrogen-containing rings attached to a pentose sugar. They form two groups: the purines, with two rings—adenine (A) and guanine (G)—and the pyrimidines, with one ring—cytosine (C), thymine (T), and uracil (U) (see Figure 2-17). A, G, T, and C are found in DNA, and A, G, U, and C are found in RNA.
• Glucose and other hexoses can exist in three forms: an open-chain linear structure, a six-member (pyranose) ring, and a five-member (furanose) ring (see Figure 2-18). In biological systems, the pyranose form of D-glucose predominates.
• Glycosidic bonds are formed between either the α or the β anomer of one sugar and a hydroxyl group on another sugar, leading to formation of disaccharides and other polysaccharides (see Figure 2-19).
• Phospholipids are amphipathic molecules with a hydrophobic tail (often two fatty acyl chains) connected by a small organic molecule (often glycerol) to a hydrophilic head (see Figure 2-20).
• The long hydrocarbon chain of a fatty acid may be saturated (containing no carbon-carbon double bonds) or unsaturated (containing one or more double bonds). Fatty substances such as butter that have primarily saturated fatty acyl chains tend to be solid at room temperature, whereas unsaturated fats with cis double bonds have kinked chains that cannot pack closely together and so tend to be liquids at room temperature.
2.3 Chemical Reactions and Chemical Equilibrium
We now shift our discussion to chemical reactions in which bonds, primarily covalent bonds in reactant chemicals, are broken and new bonds are formed to generate reaction products. At any one time, several hundred different kinds of chemical reactions are occurring simultaneously in every cell, and many chemicals can, in principle, undergo multiple chemical reactions. Both the extent to which reactions can proceed and the rate at which they take place determine the chemical composition of cells. In this section, we discuss the concepts of equilibrium and steady state as well as dissociation constants and pH. These concepts will arise again and again throughout this text, so it is important for you to be familiar with them. In Section 2.4, we discuss how energy influences the extents and rates of chemical reactions.
A Chemical Reaction Is in Equilibrium When the Rates of the Forward and Reverse Reactions Are Equal
When reactants first mix together—before any products have been formed—the rate of the forward reaction to form products is determined in part by the reactants’ initial concentrations, which determine the likelihood of reactants bumping into one another and reacting (Figure 2-22). As the reaction products accumulate, the concentration of each reactant decreases, and so does the forward reaction rate. Meanwhile, some of the product molecules begin to participate in the reverse reaction, which re-forms the reactants. The ability of a reaction to go “backward” is called microscopic reversibility. The reverse reaction is slow at first but speeds up as the concentration of product increases. Eventually, the rates of the forward and reverse reactions become equal, so that the concentrations of reactants and products stop changing. The system is then said to be in chemical equilibrium (plural, equilibria). The ratio of the concentrations of the products to the concentrations of the reactants when they reach equilibrium, called the equilibrium constant (Keq), is a fixed value. Thus Keq provides a measure of the extent to which a reaction occurs by the time it reaches equilibrium. The rate of a chemical reaction can be increased by a catalyst, but a catalyst does not change the equilibrium constant (see Section 2.4). A catalyst accelerates the making and breaking of covalent bonds but itself is not permanently changed during a reaction.
The Equilibrium Constant Reflects the Extent of a Chemical Reaction
For any chemical reaction, Keq depends on the chemical nature of the reactants and products, the temperature, and the pressure (particularly in reactions involving gases). Under standard physical conditions (25 °C and 1 atm pressure for biological systems), Keq is always the same for a given reaction, whether or not a catalyst is present. For the general reaction with three reactants and three products,
$$aA + bB + cC \rightleftharpoons zZ + yY + xX \tag{2-1}$$
where capital letters represent particular molecules or atoms and lowercase letters represent the number of each in the reaction, the formula for the equilibrium constant is given by
$$K_{eq} = \frac{[X]^x[Y]^y[Z]^z}{[A]^a[B]^b[C]^c} \tag{2-2}$$
where brackets denote the concentrations of the molecules. In Equation 2-2, the concentrations of reactants and products are those present at equilibrium. For a reaction that occurs in a single elementary step, the rate of the forward reaction (left to right in Equation 2-1) is
$$\text{Rate}_{\text{forward}} = k_f[A]^a[B]^b[C]^c$$
where kf is the rate constant for the forward reaction. Similarly, the rate of the reverse reaction (right to left in Equation 2-1) is
$$\text{Rate}_{\text{reverse}} = k_r[X]^x[Y]^y[Z]^z$$
where kr is the rate constant for the reverse reaction. These reaction rate equations apply whether or not the reaction has reached equilibrium. It is important to remember that the forward and reverse rates of a reaction can change because of changes in reactant or product concentrations, yet at the same time the forward and reverse rate constants do not change; hence the name “constant.” Confusing rates and rate constants is a common error. At equilibrium the forward and reverse rates are equal, so $\text{Rate}_{\text{forward}}/\text{Rate}_{\text{reverse}} = 1$. Setting the two rate expressions equal and rearranging, we can express the equilibrium constant as the ratio of the rate constants:
$$K_{eq} = \frac{k_f}{k_r} \tag{2-3}$$
The concept of Keq is particularly helpful when we want to think about the energy that is released or absorbed when a chemical reaction occurs. We will discuss this concept in considerable detail in Section 2.4.
[Figure 2-22 graph: reaction rate versus time. When reactants are first mixed, the initial concentration of products is 0. The rate of the forward reaction decreases as the concentration of reactants decreases, and the rate of the reverse reaction increases as the concentration of products increases. At chemical equilibrium the forward and reverse rates are equal, and the concentrations of reactants and products no longer change.]
FIGURE 2-22 Time dependence of the rates of a chemical reaction. The forward and reverse rates of a reaction depend in part on the initial concentrations of reactants and products. The net forward reaction rate slows as the concentration of reactants decreases, whereas the net reverse reaction rate increases as the concentration of products increases. At equilibrium, the rates of the forward and reverse reactions are equal, and the concentrations of reactants and products remain constant.
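The behavior in Figure 2-22 and the result in Equation 2-3 can be reproduced numerically. The Python sketch below assumes the simplest case: a single-step reaction A ⇌ B with one reactant and one product, each with coefficient 1, so that Keq = [B]/[A]. The rate constants are arbitrary, hypothetical values. `simulate` is a hypothetical helper that uses simple forward-Euler time steps, chosen for transparency rather than numerical accuracy. Starting from pure A, the forward rate falls and the reverse rate rises until they are equal, and at that point [B]/[A] equals kf/kr.

```python
def simulate(kf, kr, a0, b0=0.0, dt=1e-3, t_end=20.0):
    """Integrate A <=> B with forward Euler steps.
    Rate_forward = kf[A] and Rate_reverse = kr[B] at every step.
    Returns the final [A], [B], forward rate, and reverse rate."""
    a, b = a0, b0
    for _ in range(int(t_end / dt)):
        rate_forward = kf * a
        rate_reverse = kr * b
        net = rate_forward - rate_reverse   # net conversion of A to B per unit time
        a -= net * dt
        b += net * dt
    return a, b, kf * a, kr * b

kf, kr = 2.0, 0.5  # hypothetical rate constants (per second)
a_eq, b_eq, fwd, rev = simulate(kf, kr, a0=1.0)
print(f"[B]/[A] at the end = {b_eq / a_eq:.3f}; kf/kr = {kf / kr:.3f}")
print(f"forward rate = {fwd:.4f}, reverse rate = {rev:.4f}")
```

Changing the starting concentrations `a0` and `b0` alters the path to equilibrium but not the final ratio, which is set by the rate constants alone.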
Chemical Reactions in Cells Are at Steady State
Under appropriate conditions and given sufficient time, a single biochemical reaction carried out in a test tube eventually reaches equilibrium, at which the concentrations of reactants and products do not change with time because the rates of the forward and reverse reactions are equal. Within cells, however, many reactions are linked in pathways in which a product of one reaction is not simply reconverted via a reverse reaction to the reactants. For example, the product of one reaction might serve as a reactant in another, or it might be pumped out of the cell. In this more complex situation, the original reaction can never reach equilibrium because some of the products do not have a chance to be converted back to reactants. Nevertheless, in such nonequilibrium conditions, the rate of formation of a substance can be equal to the rate of its consumption, and as a consequence, the concentration of the substance remains constant over time. In such circumstances, the system of linked reactions for producing and consuming that substance is said to be in a steady state (Figure 2-23). One consequence of such linked reactions is that they prevent the accumulation of excess intermediates, protecting cells from the harmful effects of intermediates that are toxic at high concentrations. When the concentration of a product of an ongoing reaction is not changing over time, it might be a consequence of a state of equilibrium, or it might be a consequence of a steady state. In biological systems, when metabolite concentrations, such as blood glucose levels, are not changing with time—a condition called homeostasis—it is a consequence of a steady state rather than equilibrium.
[Figure 2-23 diagram: (a) test tube equilibrium concentrations of A and B; (b) intracellular steady-state concentrations of A, B, and C]
FIGURE 2-23 Comparison of reactions at equilibrium and at steady state. (a) In the test tube, a biochemical reaction (A → B) eventually reaches equilibrium, at which the rates of the forward and reverse reactions are equal (as indicated by the reaction arrows of equal length). (b) In metabolic pathways within cells, the product B is commonly consumed—in this example, by conversion to C. A pathway of linked reactions is at steady state when the rate of formation of the intermediates (e.g., B) equals their rate of consumption. As indicated by the unequal length of the arrows, the individual reversible reactions constituting a metabolic pathway do not reach equilibrium. Moreover, the concentrations of the intermediates at steady state can differ from what they would be at equilibrium.
Dissociation Constants of Binding Reactions Reflect the Affinity of Interacting Molecules
The concept of equilibrium also applies to the binding of one molecule to another without covalent changes to either molecule. Many important cellular processes depend on such binding “reactions,” which involve the making and breaking of various noncovalent interactions rather than covalent bonds, as discussed above. A common example is the binding of a ligand (e.g., the hormone insulin or adrenaline) to its receptor on the surface of a cell, which triggers an intracellular signaling pathway (see Chapter 15). Another example is the binding of a protein to a specific sequence of bases in a molecule of DNA, which frequently causes the expression of a nearby gene to increase or decrease (see Chapter 9). If the equilibrium constant for a binding reaction is known, the stability of the resulting complex can be predicted. To illustrate the general approach for determining the concentration of noncovalently associated complexes, let’s calculate the extent to which a protein (P) is bound to DNA (D), forming a protein-DNA complex (PD):
$$P + D \rightleftharpoons PD$$
Most commonly, binding reactions are described in terms of the dissociation constant (Kd), which is the reciprocal of the equilibrium constant. For this binding reaction, the dissociation constant is calculated from the concentrations of the three components when they are at equilibrium by
$$K_d = \frac{[P][D]}{[PD]} \tag{2-4}$$
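Equation 2-4 can be rearranged to give the fraction of D that has P bound: [PD]/([D] + [PD]) = [P]/([P] + Kd). The Python sketch below applies this to the bacterial-cell numbers used in this section: a 1.5 × 10−15 L cell containing 10 molecules of P, with a Kd of 10−10 M. It assumes, as an approximation, that the free concentration of P is essentially its total concentration, because P is in excess over its single binding site. The helper names are hypothetical, chosen for this illustration.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molar_concentration(molecules, volume_liters):
    """Convert a number of molecules in a given volume to molarity (mol/L)."""
    return molecules / (AVOGADRO * volume_liters)

def fraction_bound(p_free, kd):
    """Fraction of D occupied by P, from Kd = [P][D]/[PD] (Equation 2-4):
    [PD] / ([D] + [PD]) = [P] / ([P] + Kd)."""
    return p_free / (p_free + kd)

cell_volume = 1.5e-15  # L, bacterial cell volume used in the text
kd = 1e-10             # M, typical Kd for specific protein-DNA binding
p_total = molar_concentration(10, cell_volume)
# Approximation: with 10 P molecules and one binding site, free [P] ~ total [P]
theta = fraction_bound(p_total, kd)
print(f"[P] = {p_total:.3g} M, fraction bound = {theta:.3f}")
```

The result is about 1.1 × 10−8 M for [P] and a bound fraction of about 0.99, consistent with the 99 percent occupancy stated for this example. `fraction_bound(kd, kd)` returns 0.5, which restates that half of D is bound when [P] equals Kd.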
It is worth noting that in such a binding reaction, when half of the DNA is bound to the protein ([PD] = [D]), the concentration of P is equal to Kd. The lower the Kd, the lower the concentration of P needed to bind half of D. In other words, the lower the Kd, the tighter the binding (the higher the affinity) of P for D. Typically, a protein’s binding to a specific DNA sequence exhibits a Kd of $10^{-10}$ M, where M symbolizes molarity, or moles per liter (mol/L). To relate the magnitude of this dissociation constant to the intracellular ratio of bound to unbound DNA, let’s consider the simple example of a bacterial cell having a volume of $1.5 \times 10^{-15}$ L and containing 1 molecule of DNA and 10 molecules of the DNA-binding protein P. In this case, given a Kd of $10^{-10}$ M and the total concentration of P in the cell (∼$111 \times 10^{-10}$ M, about a hundredfold higher than the Kd), 99 percent of the time this specific DNA sequence will have a molecule of protein bound to it and 1 percent of the time it will not, even though the cell contains only 10 molecules of the protein! Clearly P and D have a high affinity for each other and bind tightly, as reflected by the low value of the dissociation constant for their binding reaction. For protein-protein and protein-DNA binding, Kd values of ∼$10^{-9}$ M (nanomolar) are considered to be tight, ∼$10^{-6}$ M (micromolar) modestly tight, and ∼$10^{-3}$ M (millimolar) relatively weak.
A large biological macromolecule, such as a protein, can have multiple binding surfaces for binding several molecules simultaneously (Figure 2-24). In some cases, these binding reactions are independent of one another, each with its own distinct Kd value. In other cases, binding of a molecule at one site on a macromolecule can change the three-dimensional shape, or conformation, of a distant site, thus altering the binding interactions of that distant site with some other molecule. The modifications of amino acid side chains—mentioned above—often contribute to the molecular shapes required for such binding interactions. These covalent and noncovalent binding reactions are important mechanisms by which one molecule can alter, and thus regulate, the structure and binding activity of another. We examine this regulatory mechanism in more detail in Chapter 3.
[Figure 2-24 diagram: a multiligand binding macromolecule (e.g., protein) with binding site A (KdA) for ligand A (e.g., small protein), binding site B (KdB) for ligand B (e.g., small molecule), and binding site C (KdC) for ligand C (e.g., polysaccharide)]
FIGURE 2-24 Macromolecules can have distinct binding sites for multiple ligands. A large macromolecule (e.g., a protein, blue) with three distinct binding sites (A–C) is shown; each of the three binding sites exhibits molecular complementarity to a different binding partner (ligands A–C), with distinct dissociation constants (KdA–C).
Biological Fluids Have Characteristic pH Values
The solvent inside cells and in all extracellular fluids is water. An important characteristic of any aqueous solution is the concentration of positively charged hydrogen ions (H+) and negatively charged hydroxyl ions (OH−). Because these ions are the dissociation products of H2O, they are constituents of all living systems, and they are liberated by many reactions that take place between molecules within cells. These ions can also be transported into or out of cells, as when highly acidic gastric juice is secreted by cells lining the walls of the stomach. When a water molecule dissociates, one of its polar H–O bonds breaks. The resulting hydrogen ion, referred to as a proton, has a short lifetime as a free ion and quickly combines with a water molecule to form a hydronium ion (H3O+). For convenience, we refer to the concentration of hydrogen ions in a solution, [H+], even though this quantity really represents the concentration of hydronium ions, [H3O+]. Dissociation of H2O generates one OH− ion along with each H+. The dissociation of water is a reversible reaction:
$$\mathrm{H_2O} \rightleftharpoons \mathrm{H^+} + \mathrm{OH^-}$$
At 25 °C, $[\mathrm{H^+}][\mathrm{OH^-}] = 10^{-14}\ \mathrm{M^2}$, so that in pure water, $[\mathrm{H^+}] = [\mathrm{OH^-}] = 10^{-7}$ M.
The concentration of hydrogen ions in a solution is expressed conventionally as its pH, defined as the negative log of the hydrogen ion concentration. The pH of pure water at 25 °C is 7:
$$\mathrm{pH} = -\log[\mathrm{H^+}] = \log\frac{1}{[\mathrm{H^+}]} = \log\frac{1}{10^{-7}} = 7$$
It is important to keep in mind that a one-unit difference in pH represents a tenfold difference in the concentration of protons. On the pH scale, 7.0 is considered neutral: pH values below 7.0 indicate acidic solutions (higher [H+]), and values above 7.0 indicate basic, or alkaline, solutions (Figure 2-25). For instance, gastric juice, which is rich in hydrochloric acid (HCl), has a pH of about 1. Its [H+] is roughly 1-million-fold greater than that of cytoplasm, which has a pH of about 7.2–7.4. Although the cytosol of cells normally has a pH of about 7.2, the interior of certain organelles in eukaryotic cells (see Chapter 1) can have a much lower pH. The internal (luminal) fluid in lysosomes, for example, has a pH of about 4.5.
[Figure 2-25 chart: pH scale from 0 (increasingly acidic, greater H+ concentration) to 14 (increasingly basic, lower H+ concentration), with neutral, [H+] = [OH−], at pH 7. Marked solutions include sodium hydroxide (1 N), household bleach, ammonia (1 N), seawater, the interior of the cell, fertilized and unfertilized eggs, urine, the interior of the lysosome, grapefruit juice, gastric juice, and hydrochloric acid (1 N).]
FIGURE 2-25 Some pH values for common solutions. The pH of an aqueous solution is the negative log of the hydrogen ion concentration. The pH values for most intracellular and extracellular biological fluids are near 7 and are carefully regulated to permit the proper functioning of cells, organelles, and cellular secretions. The pH values for solutions of ammonia and hydrochloric acid are for one normal (1 N) solutions.
The many degradative enzymes within lysosomes function optimally in an acidic environment, whereas their action is inhibited in the near-neutral pH environment of the cytoplasm. As this example illustrates, maintenance of a particular pH is essential for the proper functioning of some cellular structures. On the other hand, dramatic shifts in cellular pH may play an important role in controlling cellular activity. For example, the pH of the cytoplasm of an unfertilized egg of the sea urchin, an aquatic animal, is 6.6. Within 1 minute of fertilization, however, the pH rises to 7.2; that is, the [H+] decreases to about one-fourth its original value, a change that is necessary for subsequent growth and division of the egg.
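Because pH is logarithmic, the comparisons in this section reduce to powers of ten. The minimal Python sketch below uses hypothetical helper names and only the standard `math` module, treating pH as exactly −log[H+]. It reproduces three of those comparisons:

- Pure water is at pH 7.
- The [H+] of gastric juice at pH 1 is roughly a million times (10^6.2, about 1.6 × 10^6) that of cytoplasm at pH 7.2.
- [H+] falls to about one-fourth (10^−0.6 ≈ 0.25) when sea urchin egg cytoplasm goes from pH 6.6 to 7.2.

```python
import math

def ph_from_h(h_molar):
    """pH = -log10[H+], with [H+] in mol/L."""
    return -math.log10(h_molar)

def h_from_ph(ph):
    """Inverse of the pH definition: [H+] = 10**(-pH) mol/L."""
    return 10 ** (-ph)

def fold_change_h(ph_before, ph_after):
    """Factor by which [H+] changes when the pH goes from ph_before to ph_after."""
    return h_from_ph(ph_after) / h_from_ph(ph_before)

print(ph_from_h(1e-7))            # pure water at 25 °C
print(fold_change_h(7.2, 1.0))    # gastric juice relative to cytoplasm at pH 7.2
print(fold_change_h(6.6, 7.2))    # sea urchin egg cytoplasm after fertilization
```

Each one-unit step in pH multiplies or divides [H+] by exactly 10, which is why a shift of only 0.6 pH units corresponds to a fourfold change in proton concentration.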
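[Figure 2-26 graph: percentage of carbonic acid (H2CO3) or bicarbonate (HCO3−) molecules, from 0 to 100, plotted against pH for the reaction H2CO3 ⇌ HCO3− + H+; pKa = 6.4]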
Hydrogen Ions Are Released by Acids and Taken Up by Bases
In general, an acid is any molecule, ion, or chemical group that tends to release a hydrogen ion (H+), such as the carboxyl group (−COOH), which tends to dissociate to form the negatively charged carboxylate ion (−COO−); or hydrochloric acid (HCl). Conversely, a base is any molecule, ion, or chemical group that readily combines with an H+, such as the hydroxyl ion (OH−); ammonia (NH3), which forms an ammonium ion (NH4+); or the amino group (−NH2). When an acid is added to an aqueous solution, the [H+] increases, and the pH goes down. Conversely, when a base is added to a solution, the [H+] decreases, and the pH goes up. Because $[\mathrm{H^+}][\mathrm{OH^-}] = 10^{-14}\ \mathrm{M^2}$, any increase in [H+] is coupled with a commensurate decrease in [OH−], and vice versa. Many biological molecules contain both acidic and basic groups. For example, in neutral solutions (pH = 7.0), many amino acids exist predominantly in the doubly ionized form, in which the carboxyl group has lost a proton and the amino group has accepted one:
$$\mathrm{{}^{+}H_3N{-}CH(R){-}COO^{-}}$$
where R represents the uncharged side chain. Such a molecule, containing an equal number of positive and negative charges, is called a zwitterion. Zwitterions, having no net charge, are neutral. At extreme pH values, only one of these two ionizable groups of an amino acid is charged: the −NH3+ at low pH and the −COO− at high pH. The dissociation reaction for an acid (or acid group in a larger molecule) HA can be written as
$$\mathrm{HA} \rightleftharpoons \mathrm{H^+} + \mathrm{A^-}$$
The equilibrium constant for this reaction, denoted Ka (the subscript a stands for “acid”), is defined as $K_a = [\mathrm{H^+}][\mathrm{A^-}]/[\mathrm{HA}]$. Taking the logarithm of both sides and rearranging the result yields a very useful relation between the equilibrium constant and pH:
$$\mathrm{pH} = \mathrm{p}K_a + \log\frac{[\mathrm{A^-}]}{[\mathrm{HA}]} \tag{2-5}$$
where pKa equals −log Ka.
FIGURE 2-26 The relationship between pH, pKa, and the dissociation of an acid. As the pH of a solution of carbonic acid rises from 0 to 8.5, the percentage of the compound in the undissociated, or un-ionized, form (H2CO3) decreases from 100 percent and that of the ionized form increases from 0 percent. When the pH (6.4) is equal to the acid’s pKa, half of the carbonic acid has ionized. When the pH rises to above 8, virtually all of the acid has ionized to the bicarbonate form (HCO3−).
From this expression, commonly known as the Henderson-Hasselbalch equation, it can be seen that the pKa of any acid is equal to the pH at which half the molecules are dissociated and half are neutral (undissociated). This is because when [A−] = [HA], then log ([A−]/[HA]) = 0, and thus pKa = pH. The Henderson-Hasselbalch equation allows us to calculate the degree of dissociation of an acid—that is, the ratio of dissociated and undissociated forms—if both the pH of the solution and the pKa of the acid are known. Experimentally, by measuring the [A−] and [HA] as a function of the solution’s pH, one can calculate the pKa of the acid and thus the equilibrium constant Ka for the dissociation reaction (Figure 2-26). Knowing the pKa of a molecule not only provides an important description of its properties, but also allows us to exploit these properties to manipulate the acidity of an aqueous solution and to understand how biological systems control this critical characteristic of their aqueous fluids.
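Equation 2-5 can be rearranged in two directions. Solving for the ionized fraction gives [A−]/([HA] + [A−]) = 1/(1 + 10^(pKa − pH)); solving for pKa gives its value from measured [A−] and [HA] at a known pH. The Python sketch below does both, using the carbonic acid pKa of 6.4 from Figure 2-26. The function names are hypothetical helpers for this illustration. The calculation assumes ideal behavior, using concentrations in place of activities.

```python
import math

def fraction_dissociated(ph, pka):
    """Fraction of an acid in the A- form, from pH = pKa + log([A-]/[HA]) (Equation 2-5).
    Rearranged: [A-]/[HA] = 10**(pH - pKa), so [A-]/([HA] + [A-]) = 1/(1 + 10**(pKa - pH))."""
    return 1.0 / (1.0 + 10 ** (pka - ph))

def pka_from_measurement(ph, a_minus, ha):
    """Estimate pKa from one measurement of [A-] and [HA] at a known pH."""
    return ph - math.log10(a_minus / ha)

pka_carbonic = 6.4  # carbonic acid, Figure 2-26
for ph in (4.4, 5.4, 6.4, 7.4, 8.4):
    print(f"pH {ph}: {100 * fraction_dissociated(ph, pka_carbonic):.1f}% HCO3-")
```

At pH = pKa the fraction is exactly one-half. One unit above the pKa it is 10/11 (about 91 percent), and one unit below it is 1/11 (about 9 percent). These are the same 91 and 9 percent figures that describe the useful buffering range of an acid.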
Buffers Maintain the pH of Intracellular and Extracellular Fluids
A living, actively metabolizing cell must maintain a constant pH in the cytoplasm of about 7.2–7.4, and it must do so even as its metabolism is producing many acids. Cells have a reservoir of weak bases and weak acids, called buffers, which ensure that the cell’s cytoplasmic pH remains relatively constant despite small fluctuations in the amounts of H+ or OH− being generated by metabolism or by the uptake or secretion of molecules and ions by the cell. Buffers do this by “soaking up” excess H+ or OH− when these ions are added to the cell or are produced by metabolism. As we shall see below, buffers are most effective at preventing changes in pH when the pH of the solution is similar to the pKa of the buffer.
[Figure 2-27 graph: titration curve for CH3COOH ⇌ CH3COO− + H+, plotting pH (0 to 6) against added OH−, expressed as the fraction of dissociated CH3COOH (0 to 1.0); pKa = 4.75]
FIGURE 2-27 The titration curve of the buffer acetic acid (CH3COOH). The pKa for the dissociation of acetic acid to hydrogen and acetate ions is 4.75. At this pH, half the acid molecules are dissociated. Because pH is measured on a logarithmic scale, the solution changes from 91 percent CH3COOH at pH 3.75 to 9 percent CH3COOH at pH 5.75. The acid has maximum buffering capacity in this pH range.
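The buffering effect summarized in Figure 2-27 can be estimated with Equation 2-5. The Python sketch below compares adding the same amount of strong base (0.01 M OH−) to two solutions. One is a hypothetical acetate buffer containing 0.1 M CH3COOH and 0.1 M CH3COO−, so that pH = pKa = 4.75. The other is pure water. These concentrations are illustrative assumptions, and the helper names are hypothetical. The calculation also ignores volume changes, activity effects, and the small self-dissociation of the acid.

```python
import math

def buffered_ph_after_base(pka, acid_molar, base_form_molar, added_oh_molar):
    """pH after adding strong base to an HA/A- buffer, via Equation 2-5.
    Each added OH- converts one HA into one A-."""
    ha = acid_molar - added_oh_molar
    a_minus = base_form_molar + added_oh_molar
    if ha <= 0:
        raise ValueError("added base exceeds the buffer's HA; Equation 2-5 no longer applies")
    return pka + math.log10(a_minus / ha)

def unbuffered_ph_after_base(added_oh_molar):
    """pH of pure water after adding strong base, using [H+][OH-] = 1e-14 M^2 at 25 °C.
    Ignores the 1e-7 M OH- already present, valid when the added OH- is much larger."""
    return 14 + math.log10(added_oh_molar)

pka_acetic = 4.75
before = buffered_ph_after_base(pka_acetic, 0.1, 0.1, 0.0)
after = buffered_ph_after_base(pka_acetic, 0.1, 0.1, 0.01)
print(f"acetate buffer: pH {before:.2f} -> {after:.2f}")
print(f"water: pH 7.00 -> {unbuffered_ph_after_base(0.01):.2f}")
```

Under these assumptions, the buffered solution rises by less than 0.1 pH unit, whereas the same amount of base takes pure water from pH 7 to pH 12.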
If additional acid (or base) is added to a buffered solution whose pH is equal to the pKa of the buffer ([HA] = [A−]), the pH of the solution changes, but it changes less than it would if the buffer had not been present. This is because protons released by the added acid are taken up by the ionized form of the buffer (A−); likewise, hydroxyl ions generated by the addition of a base are neutralized by protons released by the undissociated buffer (HA). The capacity of a buffer or any other substance to release hydrogen ions or take them up depends partly on the extent to which the substance has already taken up or released protons, which in turn depends on the pH of the solution relative to the pKa of the substance. The ability of a buffer to minimize changes in pH, its buffering capacity, depends on the concentration of the buffer and the relationship between its pKa value and the pH, which is expressed by the Henderson-Hasselbalch equation. The titration curve for acetic acid shown in Figure 2-27 illustrates the effect of pH on the fraction of molecules in the un-ionized (HA) and ionized forms (A−). When the pH is equal to the pKa, half of the acetic acid is dissociated (dashed lines). At one pH unit below the pKa of an acid, 91 percent of the molecules are in the HA form; at one pH unit above the pKa, 91 percent are in the A− form. At pH values more than one unit above or below the pKa (unshaded regions in Figure 2-27), the buffering capacity of weak acids and bases declines rapidly. In other words, the addition of the same number of moles of base—for example, hydroxyl ions added as sodium hydroxide (NaOH)—to a solution containing a 56
CHAPTER 2
t Chemical Foundations
mixture of HA and A− that is at a pH near the pKa will cause less of a pH change than it would if the HA and A− were not present or if the pH were far from the pKa value.
All biological systems contain one or more buffers. Phosphate ions, the ionized forms of phosphoric acid, are present in considerable quantities in cells and are important in maintaining, or buffering, the pH of the cytoplasm. Phosphoric acid (H3PO4) has three protons that are capable of dissociating, but they do not dissociate simultaneously. Loss of each proton can be described by a discrete dissociation reaction and pKa, as shown in Figure 2-28. When hydroxyl ions are added to a solution of phosphoric acid, the pH change is much less steep at pH values near the three pKa values (shaded regions) than when the pH of the solution is not similar to any of the pKas. The titration curve for phosphoric acid shows that the pKa for the dissociation of the second proton is 7.2. Thus, according to the Henderson-Hasselbalch equation, at pH 7.2 about 50 percent of cellular phosphate is H2PO4⁻ and about 50 percent is HPO4²⁻. For this reason, phosphate is an excellent buffer at pH values around 7.2, the approximate pH of the cytoplasm of cells, and at pH 7.4, the pH of human blood.
The amino (lysine), guanidinium (arginine), and carboxylate (aspartate, glutamate) portions of amino acid side chains of proteins, as well as the amino and carboxylate groups at the N- and C-termini of proteins, can also bind and release protons. Thus proteins, which are present in high concentrations inside cells and in many extracellular fluids, can themselves serve as buffers.
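The 91 percent and 9 percent figures for acetic acid, and the 50:50 split of phosphate at pH 7.2, all follow from the Henderson-Hasselbalch equation, pH = pKa + log([A−]/[HA]). The sketch below is a minimal illustration, assuming ideal solutions (concentrations stand in for activities); the function name `fraction_base` and the constant names are illustrative choices, not from the text.

```python
def fraction_base(ph: float, pka: float) -> float:
    """Fraction of a weak acid present in the ionized (A-) form.

    Henderson-Hasselbalch: pH = pKa + log10([A-]/[HA]), so
    [A-]/[HA] = 10**(pH - pKa) and [A-]/([HA] + [A-]) = 1 / (1 + 10**(pKa - pH)).
    Concentrations are used in place of activities (ideal-solution assumption).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


ACETIC_ACID_PKA = 4.75      # Figure 2-27
PHOSPHATE_SECOND_PKA = 7.2  # H2PO4- <=> HPO4(2-) + H+, Figure 2-28

# Acetic acid one unit below, at, and one unit above its pKa
for ph in (3.75, 4.75, 5.75):
    percent_ha = 100 * (1.0 - fraction_base(ph, ACETIC_ACID_PKA))
    print(f"acetic acid at pH {ph}: {percent_ha:.0f}% CH3COOH")

# Second dissociation of phosphate at cytoplasmic and blood pH
for ph in (7.2, 7.4):
    percent_hpo4 = 100 * fraction_base(ph, PHOSPHATE_SECOND_PKA)
    print(f"phosphate at pH {ph}: {percent_hpo4:.0f}% HPO4(2-)")
```

Run as written, this prints 91, 50, and 9 percent CH3COOH, then about 50 and 61 percent HPO4²⁻, showing how a 0.2-unit pH shift moves the phosphate ratio only modestly near its pKa.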
[Figure 2-28 graph: pH (y-axis, 0 to 14) versus added OH− (x-axis). Labeled buffering regions: pKa = 2.1 (H3PO4 ⇌ H2PO4⁻ + H+), pKa = 7.2 (H2PO4⁻ ⇌ HPO4²⁻ + H+), and pKa = 12.7 (HPO4²⁻ ⇌ PO4³⁻ + H+).]
FIGURE 2-28 The titration curve of phosphoric acid (H3PO4), a common buffer in biological systems. This biologically ubiquitous molecule has three hydrogen atoms that dissociate at different pH values; thus phosphoric acid has three pKa values, as noted on the graph. The shaded areas denote the pH ranges, within one pH unit of the three pKa values, where the buffering capacity of phosphoric acid is high. In these regions, the addition of an acid (or base) will cause relatively small changes in the pH.
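Because phosphoric acid has three pKa values, the share of each protonation state at a given pH comes from the standard speciation formula for a triprotic acid, which is not spelled out in the text and is assumed here. This sketch uses the three pKa values from Figure 2-28 and assumes ideal behavior; `phosphate_fractions` is a hypothetical helper name.

```python
def phosphate_fractions(ph: float, pkas=(2.1, 7.2, 12.7)) -> dict:
    """Fraction of total phosphate in each protonation state at a given pH.

    Standard triprotic-acid speciation: with h = [H+] and Ka_i = 10**-pKa_i,
    the four species are weighted h**3, h**2*Ka1, h*Ka1*Ka2 and Ka1*Ka2*Ka3.
    Ideal behavior is assumed (no activity corrections).
    """
    h = 10.0 ** -ph
    ka1, ka2, ka3 = (10.0 ** -pka for pka in pkas)
    weights = {
        "H3PO4": h ** 3,
        "H2PO4-": h ** 2 * ka1,
        "HPO4(2-)": h * ka1 * ka2,
        "PO4(3-)": ka1 * ka2 * ka3,
    }
    total = sum(weights.values())
    return {species: w / total for species, w in weights.items()}


# One pH near each pKa, plus the pH of blood
for ph in (2.1, 7.2, 7.4, 12.7):
    shares = phosphate_fractions(ph)
    summary = ", ".join(f"{name} {100 * frac:.1f}%" for name, frac in shares.items())
    print(f"pH {ph}: {summary}")
```

At pH 7.2 nearly all phosphate is split between H2PO4⁻ and HPO4²⁻, with H3PO4 and PO4³⁻ both far below 0.01 percent, which is why only the second dissociation matters for buffering the cytoplasm.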
KEY CONCEPTS OF SECTION 2.3
Chemical Reactions and Chemical Equilibrium
• A chemical reaction is at equilibrium when the rate of the forward reaction is equal to the rate of the reverse reaction, and thus there is no net change in the concentration of the reactants or products.
• The equilibrium constant Keq of a reaction reflects the ratio of products to reactants at equilibrium and thus is a measure of the extent of the reaction and the relative stabilities of the reactants and products.
• The Keq depends on the temperature, pressure, and chemical properties of the reactants and products but is independent of the reaction rate and of the initial concentrations of reactants and products.
• For any reaction, the equilibrium constant Keq equals the ratio of the forward rate constant to the reverse rate constant (kf/kr). The rates of conversion of reactants to products and vice versa depend on the rate constants and the concentrations of the reactants or products.
• Within cells, the linked reactions in metabolic pathways generally are not at equilibrium, but rather at steady state, at which the rate of formation of the intermediates equals their rate of consumption (see Figure 2-23) and thus the concentrations of the intermediates are not changing.
• The dissociation constant Kd for the noncovalent binding of two molecules is a measure of the stability of the complex formed between the molecules (e.g., ligand-receptor or protein-DNA complexes). Kd values of ~10⁻⁹ M (nanomolar) are considered to be tight, ~10⁻⁶ M (micromolar) modestly tight, and ~10⁻³ M (millimolar) relatively weak.
• The pH is the negative logarithm of the concentration of hydrogen ions (−log [H+]). The pH of the cytoplasm is normally about 7.2–7.4, whereas the interior of lysosomes has a pH of about 4.5.
• Acids release protons (H+), and bases bind them.
• Buffers are mixtures of a weak acid (HA) and its corresponding base form (A−), which minimize the change in pH of a solution when an acid or base is added. Biological systems use various buffers to maintain their pH within a very narrow range.
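Because pH is a logarithm, small pH differences correspond to large differences in proton concentration. This short sketch converts the cytoplasmic and lysosomal pH values above into [H+] and their ratio; the helper names are illustrative only, and activities are again approximated by concentrations.

```python
import math


def hydrogen_ion_molar(ph: float) -> float:
    """[H+] in mol/L from pH, using pH = -log10[H+]."""
    return 10.0 ** (-ph)


def ph_from_molar(h_molar: float) -> float:
    """Inverse conversion: pH from [H+] in mol/L."""
    return -math.log10(h_molar)


lysosome = hydrogen_ion_molar(4.5)
for cytoplasm_ph in (7.2, 7.4):
    cytoplasm = hydrogen_ion_molar(cytoplasm_ph)
    ratio = lysosome / cytoplasm  # equals 10**(cytoplasm_ph - 4.5)
    print(f"cytoplasm pH {cytoplasm_ph}: [H+] = {cytoplasm:.2e} M; "
          f"lysosome/cytoplasm ratio = {ratio:.0f}")
```

With a cytoplasmic pH of 7.2 the lysosome holds roughly 500 times more free protons than the cytoplasm; at pH 7.4 the ratio rises to about 800.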
2.4 Biochemical Energetics
The transformation of energy, its storage, and its use are central to the economy of the cell. Energy may be defined as the ability to do work, a concept that is as applicable to cells as to automobile engines and electric power plants. The energy stored within chemical bonds can be harnessed to support chemical work and the physical movements of cells.
In this section, we review how energy influences the extents of chemical reactions (chemical thermodynamics) and the rates of chemical reactions (chemical kinetics).
Several Forms of Energy Are Important in Biological Systems
There are two principal forms of energy: kinetic and potential. Kinetic energy is the energy of movement (the motion of molecules, for example). Potential energy is stored energy (the energy stored in covalent bonds, for example). Potential energy plays a particularly important role in the energy economy of cells.
Thermal energy, or heat, is a form of kinetic energy: the energy of the motion of molecules. For heat to do work, it must flow from a region of higher temperature, where the average speed of molecular motion is greater, to one of lower temperature. Although differences in temperature can exist between the internal and external environments of cells, these thermal gradients do not usually serve as the source of energy for cellular activities. The thermal energy in warm-blooded animals, which have evolved a mechanism for thermoregulation, is used chiefly to maintain constant organismal temperatures. This is an important homeostatic function because the rates of many cellular activities are temperature dependent. For example, cooling mammalian cells from their normal body temperature of 37 °C to 4 °C can virtually “freeze” or stop many cellular processes (e.g., intracellular membrane movements).
Radiant energy, the kinetic energy of photons, or waves of light, is critical to biology. Radiant energy can be converted to thermal energy, for instance, when light is absorbed by molecules and the energy is converted to molecular motion. Radiant energy absorbed by molecules can also change the electronic structure of the molecules by moving electrons into higher-energy orbitals, whence it can later be recovered to perform work. For example, during photosynthesis, light energy absorbed by pigment molecules such as chlorophyll is subsequently converted into the energy of chemical bonds (see Chapter 12).
Mechanical energy, a major form of kinetic energy in biology, usually results from the conversion of stored chemical energy. For example, changes in the lengths of cytoskeletal filaments generate forces that push or pull on membranes and organelles (see Chapters 17 and 18). Electric energy, the energy of moving electrons or other charged particles, is yet another major form of kinetic energy, one with particular importance to membrane function, as in electrically active neurons (see Chapter 22).
Several forms of potential energy are biologically significant. Central to biology is chemical potential energy, the energy stored in the bonds connecting atoms in molecules. Indeed, most of the biochemical reactions described in this book involve the making or breaking of at least one covalent chemical bond. In general, energy must be expended to build typical biomolecules from simpler precursors, and energy is released when those molecules are broken down to more stable products such as carbon dioxide and water. For example, the high potential energy in the covalent bonds of glucose can be released by controlled enzymatic combustion in cells (see Chapter 12). This energy is harnessed by the cell to do many kinds of work.
A second biologically important form of potential energy is the energy in a concentration gradient. When the concentration of a substance on one side of a barrier, such as a membrane, is different from that on the other side, a concentration gradient exists. All cells form concentration gradients between their interior and the external fluids by selectively exchanging nutrients, waste products, and ions with their surroundings. Furthermore, the fluids within organelles in cells (e.g., mitochondria, lysosomes) frequently contain different concentrations of ions and other molecules than the cytoplasm; the concentration of protons within a lysosome, as we saw in the last section, is about 500 times that in the cytoplasm. Concentration gradients of protons across membranes are an important driver of energy production in mitochondria.
A third form of potential energy in cells is an electric potential, the energy of charge separation. For instance, there is an electric field (a gradient of electric potential) of about 200,000 volts per centimeter across the plasma membranes of virtually all cells. We discuss how concentration gradients and electric potential gradients are generated and maintained in Chapter 11 and how they are converted to chemical potential energy in Chapter 12.
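The figure of about 200,000 volts per centimeter is large only because the membrane is so thin. As a rough check, an average field is the potential difference divided by the distance it spans. The inputs below are assumptions chosen for illustration, not values given in this section: a membrane potential on the order of 70 mV across a bilayer roughly 3.5 nm thick. `membrane_field_v_per_cm` is a hypothetical helper name.

```python
def membrane_field_v_per_cm(potential_v: float, thickness_nm: float) -> float:
    """Average electric field (V/cm) across a membrane: potential / thickness."""
    thickness_cm = thickness_nm * 1e-7  # 1 nm = 1e-7 cm
    return potential_v / thickness_cm


# Assumed, illustrative inputs (not from the text): ~70 mV across a ~3.5 nm membrane.
field = membrane_field_v_per_cm(0.070, 3.5)
print(f"{field:,.0f} V/cm")
```

A potential of well under a tenth of a volt thus yields a field of roughly 200,000 V/cm; different assumed potentials or thicknesses shift the result proportionally.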
Cells Can Transform One Type of Energy into Another
According to the first law of thermodynamics, energy is neither created nor destroyed, but can be converted from one form to another. (In nuclear reactions, mass is converted to energy, but this is irrelevant in biological systems.) Energy conversions are very important in biology. In photosynthesis, for example, the radiant energy of light is transformed into the chemical potential energy of the covalent bonds between the atoms in a sucrose or starch molecule. In muscles and nerves, chemical potential energy stored in covalent bonds is transformed, respectively, into the kinetic energy of muscle contraction and the electric energy of neural transmission. In all cells, potential energy released by breaking certain chemical bonds is used to generate potential energy in the form of concentration and electric potential gradients. Similarly, energy stored in chemical concentration gradients or electric potential gradients is used to synthesize chemical bonds or to transport molecules from one side of a membrane to another to generate a concentration gradient. The latter process occurs during the transport of nutrients such as glucose into certain cells and the transport of many waste products out of cells.
Because all forms of energy are interconvertible, they can be expressed in the same units of measurement. Although the standard unit of energy is the joule, biochemists have traditionally used an alternative unit, the calorie (1 joule = 0.239 calorie). A calorie is the amount of energy required to raise the temperature of one gram of water by 1 °C. Throughout this book, we use the kilocalorie to measure energy changes (1 kcal = 1000 cal). When you read or hear about the “Calories” in food (note the capital C), the reference is almost always to kilocalories as defined here.
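Since much of the literature reports free energies in kilojoules, a small conversion helper is handy. This sketch assumes the thermochemical calorie, 1 cal = 4.184 J, which matches the text's 1 joule = 0.239 calorie to three figures; the function names are illustrative.

```python
JOULES_PER_CALORIE = 4.184  # thermochemical calorie; 1 J = 1/4.184 cal, about 0.239 cal


def kcal_to_kj(kcal: float) -> float:
    """Convert kilocalories to kilojoules."""
    return kcal * JOULES_PER_CALORIE


def kj_to_kcal(kj: float) -> float:
    """Convert kilojoules to kilocalories."""
    return kj / JOULES_PER_CALORIE


print(f"1 J = {1 / JOULES_PER_CALORIE:.3f} cal")
print(f"ATP hydrolysis, -7.3 kcal/mol = {kcal_to_kj(-7.3):.1f} kJ/mol")
print(f"one food Calorie (1 kcal) = {kcal_to_kj(1.0):.3f} kJ")
```

For example, the −7.3 kcal/mol quoted later in this section for ATP hydrolysis corresponds to about −30.5 kJ/mol.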
The Change in Free Energy Determines If a Chemical Reaction Will Occur Spontaneously
Chemical reactions can be divided into two types, depending on whether energy is absorbed or released in the process. In an exergonic (“energy-releasing”) reaction, the products contain less energy than the reactants. Exergonic reactions take place spontaneously. The liberated energy is usually released as heat (the energy of molecular motion) and generally results in a rise in temperature, as in the oxidation (burning) of wood. In an endergonic (“energy-absorbing”) reaction, the products contain more energy than the reactants, and energy is absorbed during the reaction. If there is no external source of energy to drive an endergonic reaction, it cannot take place. Instant cold packs, often used to treat injuries, rapidly cool below room temperature because the process they contain absorbs heat (it is endothermic, a term defined below); crushing the pack mixes the reactants, initiating the process. As discussed below, such heat-absorbing processes can still occur spontaneously when they are accompanied by a large enough increase in entropy.
A fundamentally important concept in understanding if a reaction is exergonic or endergonic, and therefore if it occurs spontaneously or not, is free energy (G), or Gibbs free energy, named after J. W. Gibbs. Gibbs, who received the first PhD in engineering in America in 1863, showed that “all systems change in such a way that free energy [G] is minimized.” In other words, a chemical reaction occurs spontaneously when the free energy of the products is lower than the free energy of the reactants. In the case of a chemical reaction, reactants ⇌ products, the free-energy change, ΔG, is given by

$$\Delta G = G_{\text{products}} - G_{\text{reactants}}$$

The relation of ΔG to the direction of any chemical reaction can be summarized in three statements:
• If ΔG is negative, the forward reaction will tend to occur spontaneously, and energy usually will be released as the reaction takes place (exergonic reaction) (Figure 2-29). A reaction with a negative ΔG is referred to as thermodynamically favorable.
• If ΔG is positive, the forward reaction will not occur spontaneously; energy will have to be added to the system in order to force the reactants to become products (endergonic reaction).
• If ΔG is zero, both forward and reverse reactions will occur at equal rates, and there will be no spontaneous net conversion of reactants to products, or vice versa; the system is at equilibrium.

[Figure 2-29 graphs: free energy, G (y-axis), versus progress of reaction (x-axis). (a) Exergonic: products lie below reactants (ΔG < 0). (b) Endergonic: products lie above reactants (ΔG > 0).]
FIGURE 2-29 Changes in the free energy (ΔG) of exergonic and endergonic reactions. (a) In exergonic reactions, the free energy of the products is less than that of the reactants. Consequently, these reactions occur spontaneously, and energy is released as the reactions proceed. (b) In endergonic reactions, the free energy of the products is greater than that of the reactants, and these reactions do not occur spontaneously. An external source of energy must be supplied if the reactants are to be converted into products.

By convention, the standard free-energy change of a reaction (ΔG°′) is the value of the change in free energy at 298 K (25 °C), 1 atm pressure, pH 7.0 (as in pure water), and initial concentrations of 1 M for all reactants and products except protons, which are kept at 10⁻⁷ M (pH 7.0). Most biological reactions differ from these standard conditions, particularly in the concentrations of reactants, which are normally less than 1 M.
The free energy of a chemical system can be defined as G = H − TS, where H is the bond energy, or enthalpy, of the system; T is its temperature in kelvins (K); and S is the entropy, a measure of its randomness or disorder. According to the second law of thermodynamics, the natural tendency of any isolated system is to become more disordered, that is, for entropy to increase. A reaction can occur spontaneously only if the combined effects of changes in enthalpy and entropy make ΔG negative. That is, if temperature remains constant, a reaction proceeds spontaneously only if the free-energy change, ΔG, in the following equation is negative:

$$\Delta G = \Delta H - T\Delta S \qquad (2\text{-}6)$$
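Equation 2-6 can be evaluated directly to see how the signs of ΔH and ΔS decide spontaneity, the cases discussed in this section. The ΔH and ΔS values below are hypothetical, chosen only to illustrate each case at 298 K; the function names are illustrative.

```python
def gibbs_delta_g(delta_h: float, delta_s: float, temp_k: float) -> float:
    """Equation 2-6: ΔG = ΔH - TΔS.

    delta_h in cal/mol, delta_s in cal/(K·mol), temp_k in kelvins; returns cal/mol.
    """
    return delta_h - temp_k * delta_s


def is_spontaneous(delta_h: float, delta_s: float, temp_k: float) -> bool:
    """A reaction proceeds spontaneously only if ΔG < 0."""
    return gibbs_delta_g(delta_h, delta_s, temp_k) < 0


# Hypothetical values chosen only to illustrate the sign cases.
cases = {
    "exothermic, entropy increases": (-1000.0, 5.0),
    "endothermic, large entropy increase": (2000.0, 10.0),
    "exothermic, entropy decreases": (-1000.0, -5.0),
    "endothermic, entropy decreases": (2000.0, -10.0),
}
for label, (dh, ds) in cases.items():
    dg = gibbs_delta_g(dh, ds, 298.0)
    print(f"{label}: dG = {dg:+.0f} cal/mol, spontaneous = {is_spontaneous(dh, ds, 298.0)}")
```

In the second case the endothermic reaction becomes favorable only because TΔS (2980 cal/mol) outweighs ΔH; below T = ΔH/ΔS = 200 K the same reaction would not proceed spontaneously. The third case mirrors polymerization: an entropy loss can make even an exothermic reaction unfavorable.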
In an exothermic (“heat-releasing”) chemical reaction, ΔH is negative. In an endothermic (“heat-absorbing”) reaction, ΔH is positive. The combined effects of the changes in the enthalpy and entropy determine if the ΔG for a reaction is positive or negative, and thus if the reaction occurs spontaneously. An exothermic reaction (ΔH < 0) in which entropy increases (ΔS > 0) occurs spontaneously (ΔG < 0). An endothermic reaction (ΔH > 0) will occur spontaneously if ΔS increases enough so that the TΔS term can overcome the positive ΔH.
Many biological reactions lead to an increase in order and thus a decrease in entropy (ΔS < 0). An obvious example is the reaction that links amino acids to form a protein. A solution of protein molecules has a lower entropy than does a solution of the same amino acids unlinked because the free movement of any amino acid is more restricted (greater order) when it is bound into a long chain than when it is not. Thus, when cells synthesize polymers such as proteins from their constituent monomers, the polymerizing reaction will be spontaneous only if the cells can efficiently transfer energy both to generate the bonds that hold the monomers together and to overcome the loss in entropy that accompanies polymerization. Often cells accomplish this feat by “coupling” such synthetic, entropy-lowering reactions with independent reactions that have a very highly negative ΔG, such as the hydrolysis of nucleoside triphosphates (see below). In this way, cells can convert sources of energy in their environment into the highly organized structures and metabolic pathways that are essential for life.
The actual change in free energy during a reaction is influenced by temperature, pressure, and the initial concentrations of reactants and products, so it usually differs from the standard free-energy change ΔG°′. Most biological reactions, like others that take place in aqueous solutions, are also affected by the pH of the solution. We can estimate free-energy changes for temperatures and initial concentrations that differ from the standard conditions by using the equation

$$\Delta G = \Delta G^{\circ\prime} + RT \ln Q = \Delta G^{\circ\prime} + RT \ln \frac{[\text{products}]}{[\text{reactants}]} \qquad (2\text{-}7)$$

where R is the gas constant of 1.987 cal/(K·mol), T is the temperature (in kelvins), and Q is the initial ratio of products to reactants. For a reaction A + B ⇌ C, in which two molecules combine to form a third, Q in Equation 2-7 equals [C]/([A][B]). In this case, an increase in the initial concentration of either [A] or [B] will result in a larger negative value for ΔG and thus drive the reaction toward spontaneous formation of C.
Regardless of the ΔG°′ of a particular biochemical reaction, it will proceed spontaneously within cells only if ΔG is negative given the intracellular concentrations of reactants and products. For example, the conversion of glyceraldehyde 3-phosphate (G3P) to dihydroxyacetone phosphate (DHAP), two intermediates in the breakdown of glucose,
G3P ⇌ DHAP
has a ΔG°′ of −1840 cal/mol. If the initial concentrations of G3P and DHAP are equal, then ΔG = ΔG°′ because RT ln 1 = 0; in this situation, the reversible reaction G3P ⇌ DHAP will proceed spontaneously in the direction of DHAP formation until equilibrium is reached. However, if the initial [DHAP] is 0.1 M and the initial [G3P] is 0.001 M, with other conditions standard, then Q in Equation 2-7 equals 0.1/0.001 = 100, giving a ΔG of +887 cal/mol. Under these conditions, the reaction will proceed in the direction of formation of G3P.
The ΔG of a reaction is independent of the reaction rate. Indeed, under normal physiological conditions, few, if any, of the biochemical reactions needed to sustain life would occur without some mechanism for increasing reaction rates. As we describe below and in more detail in Chapter 3, the rates of reactions in biological systems are usually determined by the activity of enzymes, the protein catalysts that accelerate the formation of products from reactants without altering the value of ΔG.

The ΔG°′ of a Reaction Can Be Calculated from Its Keq
A chemical mixture at equilibrium is in a stable state of minimal free energy. For a system at equilibrium (ΔG = 0, Q = Keq) under standard conditions, we can write

$$\Delta G^{\circ\prime} = -2.3RT \log K_{\text{eq}} = -1362 \log K_{\text{eq}} \qquad (2\text{-}8)$$

(note the change to base 10 logarithms; at 298 K, 2.3RT ≈ 1362 cal/mol). Thus, if we determine the concentrations of reactants and products at equilibrium (i.e., the Keq), we can calculate the value of ΔG°′. For example, the Keq for the interconversion of glyceraldehyde 3-phosphate to dihydroxyacetone phosphate (G3P ⇌ DHAP) is 22.2 under standard conditions. Substituting this value into Equation 2-8, we can easily calculate the ΔG°′ for this reaction as about −1840 cal/mol. By rearranging Equation 2-8 and taking the antilogarithm, we obtain

$$K_{\text{eq}} = 10^{-\Delta G^{\circ\prime}/(2.3RT)} \qquad (2\text{-}9)$$

From this expression, it is clear that if ΔG°′ is negative, the exponent will be positive, and hence Keq will be greater than 1. Therefore, at equilibrium there will be more products than reactants; in other words, the formation of products from reactants is favored. Conversely, if ΔG°′ is positive, the exponent will be negative, and Keq will be less than 1. The relationship between Keq and ΔG°′ further emphasizes the influence of the relative free energies of reactants and products on the extent to which a reaction will occur spontaneously.

The Rate of a Reaction Depends on the Activation Energy Necessary to Energize the Reactants into a Transition State
As a chemical reaction proceeds, reactants approach each other; some bonds begin to form while others begin to break. One way to think of the state of the molecules during this transition is that there are strains in the electronic configurations of the atoms and their bonds. The collection of atoms moves from the relatively stable state of the reactants to this transient, intermediate, and higher-energy state during the course of the reaction (Figure 2-30). The state during a chemical reaction at which the system is at its highest energy level is called the transition state, and the collection of reactants in that state is called the transition-state intermediate. The energy needed to excite the reactants to this higher-energy state is called the activation energy of the reaction. The activation energy is usually represented by ΔG‡, which is analogous to the representation of the change in Gibbs free energy (ΔG) already discussed. From the transition state, the collection of atoms can either release energy as the reaction products are formed or release energy as the atoms go “backward” and re-form the original reactants. The velocity (V) at which products are generated from reactants during the reaction under a given set of conditions (temperature, pressure, reactant concentrations) will depend on the concentration of material in the transition state, which in turn will depend on the activation energy, and on the characteristic rate constant (v) at which the material in the transition state is converted to products. The higher the activation energy, the lower the fraction of reactants that reach the transition state, and the slower the overall rate of the reaction. The relationship between the concentration of reactants, v, and V is

$$V = v\,[\text{reactants}] \times 10^{-\Delta G^{\ddagger}/(2.3RT)}$$

From this equation, we can see that lowering the activation energy (that is, decreasing ΔG‡, the free energy of the transition state relative to the reactants) leads to an acceleration of the overall reaction rate V. A reduction in ΔG‡ of 1.36 kcal/mol leads to a tenfold increase in the rate of the reaction, whereas a 2.72 kcal/mol reduction increases the rate a hundredfold. Thus relatively small changes in ΔG‡ can lead to large changes in the overall rate of the reaction. Catalysts such as enzymes (discussed further in Chapter 3) accelerate reaction rates by lowering the relative energy of the transition state and thus the activation energy required to reach it (see Figure 2-30). The relative energies of reactants and products determine if a reaction is thermodynamically favorable (negative ΔG), whereas the activation energy determines how rapidly products form, that is, the reaction kinetics. Thermodynamically favorable reactions will not occur at appreciable rates if the activation energies are too high.

[Figure 2-30 graph: free energy, G (y-axis), versus progress of reaction (x-axis), showing reactants, products, an uncatalyzed transition state (ΔG‡ uncat.), and a catalyzed transition state (ΔG‡ cat.).]
FIGURE 2-30 Activation energy of uncatalyzed and catalyzed chemical reactions. This hypothetical reaction pathway (blue) depicts the changes in free energy, G, as a reaction proceeds. A reaction will take place spontaneously if the free energy (G) of the products is less than that of the reactants (ΔG < 0). However, all chemical reactions proceed through one (shown here) or more high-energy transition states, and the rate of a reaction falls steeply (exponentially) as the activation energy (ΔG‡), the difference in free energy between the reactants and the transition state, increases. In a catalyzed reaction (red), the free energies of the reactants and products are unchanged, but the free energy of the transition state is lowered, thus increasing the velocity of the reaction.
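Equations 2-7, 2-8, and 2-9 and the rate expression for V can be checked numerically with the G3P ⇌ DHAP numbers used in this section. This is a sketch under the text's own constants (R = 1.987 cal/(K·mol), T = 298 K, and the factor 2.3 for ln 10); the function names are illustrative, not from the text.

```python
import math

R = 1.987   # gas constant, cal/(K*mol), value used in the text
T = 298.0   # standard temperature, K
TWO_POINT_THREE_RT = 2.3 * R * T   # about 1362 cal/mol, the constant in Equation 2-8


def delta_g_at_q(delta_g0: float, q: float) -> float:
    """Equation 2-7: dG = dG0' + RT ln Q, in cal/mol."""
    return delta_g0 + R * T * math.log(q)


def delta_g0_from_keq(keq: float) -> float:
    """Equation 2-8: dG0' = -2.3RT log10(Keq), in cal/mol."""
    return -TWO_POINT_THREE_RT * math.log10(keq)


def keq_from_delta_g0(delta_g0: float) -> float:
    """Equation 2-9: Keq = 10**(-dG0' / 2.3RT)."""
    return 10.0 ** (-delta_g0 / TWO_POINT_THREE_RT)


def rate_increase(activation_drop: float) -> float:
    """Factor by which V rises when the activation energy falls by activation_drop (cal/mol).

    Follows from V = v[reactants] * 10**(-dG_act / 2.3RT): only the exponent changes.
    """
    return 10.0 ** (activation_drop / TWO_POINT_THREE_RT)


# G3P <=> DHAP, using the numbers quoted in the text
print(f"dG0' from Keq = 22.2: {delta_g0_from_keq(22.2):.0f} cal/mol")
print(f"dG with [DHAP]/[G3P] = 0.1/0.001: {delta_g_at_q(-1840.0, 0.1 / 0.001):+.0f} cal/mol")
print(f"Keq back from dG0' = -1840: {keq_from_delta_g0(-1840.0):.1f}")
print(f"rate gain for 1.36 / 2.72 kcal/mol: {rate_increase(1360):.1f}x / {rate_increase(2720):.1f}x")
```

The results agree with the text to within rounding: about −1834 rather than −1840 cal/mol for ΔG°′, +887 cal/mol for ΔG at Q = 100, a Keq of about 22.4, and rate gains of roughly 10 and 99 (essentially the tenfold and hundredfold quoted).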
Life Depends on the Coupling of Unfavorable Chemical Reactions with Energetically Favorable Ones
Many processes in cells are energetically unfavorable (ΔG > 0) and will not proceed spontaneously. Examples include the synthesis of DNA from nucleotides and the transport of a substance across the plasma membrane from a lower to a higher concentration. Cells can carry out an energy-requiring, or endergonic, reaction (ΔG1 > 0) by coupling it to an energy-releasing, or exergonic, reaction (ΔG2 < 0) if the sum of the two reactions has an overall net negative ΔG. Suppose, for example, that the reaction A ⇌ B + X has a ΔG°′ of +5 kcal/mol and that the reaction X ⇌ Y + Z has a ΔG°′ of −10 kcal/mol:
(1) A ⇌ B + X          ΔG°′ = +5 kcal/mol
(2) X ⇌ Y + Z          ΔG°′ = −10 kcal/mol
Sum: A ⇌ B + Y + Z     ΔG°′ = −5 kcal/mol
In the absence of the second reaction, there would be much more A than B at equilibrium. However, because the conversion of X to Y + Z is such a favorable reaction, it will pull the first process toward the formation of B and the consumption of A. Energetically unfavorable reactions in cells are often coupled to the energy-releasing hydrolysis of ATP.
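The arithmetic of coupling can be made concrete with Equation 2-9: free-energy changes of coupled reactions add, so their equilibrium constants multiply. This sketch treats the example values as standard free-energy changes at 298 K (an assumption about conditions); `keq` is a hypothetical helper name.

```python
R = 1.987  # cal/(K*mol)
T = 298.0  # K


def keq(delta_g0: float) -> float:
    """Equation 2-9: Keq = 10**(-dG0' / 2.3RT), dG0' in cal/mol."""
    return 10.0 ** (-delta_g0 / (2.3 * R * T))


dg_1 = 5000.0     # (1) A <=> B + X
dg_2 = -10000.0   # (2) X <=> Y + Z
dg_sum = dg_1 + dg_2  # free-energy changes of coupled reactions add: -5000 cal/mol

k1, k2, k_sum = keq(dg_1), keq(dg_2), keq(dg_sum)
print(f"K1 = {k1:.2e}, K2 = {k2:.2e}, K(sum) = {k_sum:.2e}")
print(f"K1 * K2 = {k1 * k2:.2e}")  # equilibrium constants multiply when dG values add
```

On its own, reaction 1 has a Keq of about 2 × 10⁻⁴, so A dominates at equilibrium; the coupled reaction has a Keq of several thousand, which is the quantitative sense in which reaction 2 “pulls” reaction 1 forward.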
Hydrolysis of ATP Releases Substantial Free Energy and Drives Many Cellular Processes
In almost all organisms, the nucleoside triphosphate adenosine triphosphate, or ATP (Figure 2-31), is the most important molecule for capturing, transiently storing, and subsequently transferring energy to perform work (e.g., biosynthesis, mechanical motion). Commonly referred to as a cell’s energy “currency,” ATP stores usable potential energy that cells can “spend” in order to power their activities. The storied history of ATP begins with its discovery in 1929, apparently simultaneously, by Kurt Lohmann, who was working with the great biochemist Otto Meyerhof in Germany and who published first, and by Cyrus Fiske and Yellapragada SubbaRow in the United States. Muscle contractions were shown to depend on ATP in the 1930s. The proposal that ATP is the main intermediary for the transfer of energy in cells is credited to Fritz Lipmann around 1941. Many Nobel Prizes have been awarded for the study of ATP and its role in cellular energy metabolism, and its importance in understanding molecular cell biology cannot be overstated.
The useful energy in an ATP molecule is contained in phosphoanhydride bonds, which are covalent bonds formed from the condensation of two molecules of phosphate by the loss of water:
[Structure diagram: two phosphate groups, P-OH and HO-P, condense to form a phosphoanhydride bond, P-O-P, releasing H2O.]

[Figure 2-31 structures: adenosine triphosphate (ATP) + H2O → adenosine diphosphate (ADP) + inorganic phosphate (Pi), with the phosphoanhydride bonds highlighted.]
FIGURE 2-31 Hydrolysis of adenosine triphosphate (ATP). The two phosphoanhydride bonds (red) in ATP (top), which link the three phosphate groups, each have a ΔG°′ of about −7.3 kcal/mol for hydrolysis. Hydrolysis of the terminal phosphoanhydride bond by the addition of water results in the release of phosphate and generation of ADP. Hydrolysis of the phosphoanhydride bonds of ATP, especially the terminal one, is the source of energy that drives many energy-requiring reactions in biological systems.
As shown in Figure 2-31, an ATP molecule has two key phosphoanhydride bonds. Forming these bonds (represented here by the symbol ∼) in ATP requires an input of energy. When these bonds are hydrolyzed, or broken by the addition of water, that energy is released. Hydrolysis of a phosphoanhydride bond in each of the following reactions has a highly negative ΔG°′ of about −7.3 kcal/mol:
Ap∼p∼p + H2O → Ap∼p + Pi + H+      (ATP → ADP)
Ap∼p∼p + H2O → Ap + PPi + H+       (ATP → AMP)
Ap∼p + H2O → Ap + Pi + H+          (ADP → AMP)
Pi stands for inorganic phosphate (PO4³⁻) and PPi for inorganic pyrophosphate, two phosphate groups linked by a phosphoanhydride bond. As the top two reactions show, the removal of a phosphate group from ATP leaves adenosine diphosphate (ADP), and the removal of a pyrophosphate group from ATP leaves adenosine monophosphate (AMP).
A phosphoanhydride bond or other “high-energy bond” (commonly denoted by ∼) is not intrinsically different from other covalent bonds. High-energy bonds simply release substantial amounts of energy when hydrolyzed. For instance, the ΔG°′ for hydrolysis of a phosphoanhydride bond in ATP (−7.3 kcal/mol) is more than three times the ΔG°′ for hydrolysis of the phosphoester bond (red) in glycerol 3-phosphate (−2.2 kcal/mol):
[Structure: glycerol 3-phosphate, with the phosphoester bond linking the phosphate group to glycerol shown in red.]
A principal reason for this difference is that ATP and its hydrolysis products, ADP and Pi, are charged at neutral pH. During synthesis of ATP, a large amount of energy must be used to force the negative charges in ADP and Pi together. Conversely, this energy is released when ATP is hydrolyzed to ADP and Pi. In comparison, formation of the phosphoester bond between an uncharged hydroxyl in glycerol and Pi requires less energy, and less energy is released when this bond is hydrolyzed.
Cells have evolved protein-mediated mechanisms for transferring the free energy released by hydrolysis of phosphoanhydride bonds to other molecules, thereby driving reactions that would otherwise be energetically unfavorable. For example, if the ΔG for the reaction B + C → D is positive but smaller in magnitude than the ΔG for hydrolysis of ATP, the reaction can be driven to the right by coupling it to hydrolysis of the terminal phosphoanhydride bond in ATP. In one common mechanism of such energy coupling, some of the energy stored in this phosphoanhydride bond is transferred to one of the reactants (here, B) by the breaking of the bond in ATP and the formation of a covalent bond between the released phosphate group and that reactant. The phosphorylated intermediate generated in this way can then react with reactant C to form product D + Pi in a reaction that has an overall negative ΔG:
B + Ap∼p∼p → B∼p + Ap∼p
B∼p + C → D + Pi
The overall reaction
B + C + ATP ⇌ D + ADP + Pi
is energetically favorable (ΔG < 0).
Similarly, hydrolysis of GTP to GDP can provide energy to perform work, including the synthesis of ATP (see Chapter 12), but most often GTP hydrolysis is used to control cellular systems (e.g., protein synthesis, hormonal signaling) rather than as a source of energy. An alternative mechanism of energy coupling is to use the energy released by ATP hydrolysis to change the conformation of a molecule to an “energy-rich” stressed state. In turn, the energy stored as conformational stress can be released as the molecule “relaxes” back into its unstressed conformation. If this relaxation process can be coupled to another reaction, the released energy can be harnessed to drive cellular processes.
As with many biosynthetic reactions, transport of molecules into or out of the cell often has a positive ΔG and thus requires an input of energy to proceed. Such simple transport reactions do not directly involve the making or breaking of covalent bonds; thus their ΔG°′ is 0. In the case of a substance moving into a cell, Equation 2-7 becomes

$$\Delta G = RT \ln \frac{[C_{\text{in}}]}{[C_{\text{out}}]} \qquad (2\text{-}10)$$

where [Cin] is the initial concentration of the substance inside the cell and [Cout] is its concentration outside the cell. We can see from Equation 2-10 that ΔG is positive for transport of a substance into a cell against its concentration gradient (when [Cin] > [Cout]); the energy to drive such “uphill” transport is often supplied by the hydrolysis of ATP. Conversely, when a substance moves down its concentration gradient ([Cout] > [Cin]), ΔG is negative. Such “downhill” transport releases energy that can be coupled to an energy-requiring reaction, such as the movement of another substance uphill across a membrane or the synthesis of ATP itself (see Chapters 11 and 12).
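Equation 2-10 makes the cost of uphill transport easy to estimate. The sketch below assumes an uncharged solute (so the membrane's electric potential can be ignored) and a temperature of 310 K (37 °C); the concentrations are hypothetical, and only their ratio matters.

```python
import math

R = 1.987  # cal/(K*mol)


def transport_delta_g(c_in: float, c_out: float, temp_k: float = 310.0) -> float:
    """Equation 2-10: dG = RT ln([C_in]/[C_out]) for moving an uncharged solute into a cell.

    Concentrations may be in any unit as long as both use the same one; result in cal/mol.
    The 310 K default (37 C) is an assumption; membrane-potential effects on ions are ignored.
    """
    return R * temp_k * math.log(c_in / c_out)


uphill = transport_delta_g(c_in=10.0, c_out=1.0)     # import against a tenfold gradient
downhill = transport_delta_g(c_in=1.0, c_out=10.0)   # import down a tenfold gradient
print(f"uphill: {uphill:+.0f} cal/mol, downhill: {downhill:+.0f} cal/mol")
```

Each tenfold concentration difference is worth about 1.4 kcal/mol at 37 °C, so the −7.3 kcal/mol ΔG°′ of a single ATP hydrolysis is, on paper, enough to pay for a substantial gradient.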
ATP Is Generated During Photosynthesis and Respiration
ATP is continuously being hydrolyzed to provide energy for many cellular activities. Some estimates suggest that humans daily hydrolyze a mass of ATP equal to their entire body weight. Clearly, to continue functioning, cells must constantly replenish their ATP supply, and doing so requires that they obtain energy from their environment. For nearly all cells, the ultimate source of energy used to make ATP is sunlight. Some organisms can use sunlight directly. Through the process of photosynthesis, plants, algae, and certain photosynthetic bacteria trap the energy of sunlight and use it to synthesize ATP from ADP and Pi. Much of the ATP produced in photosynthesis is hydrolyzed to provide energy for the conversion of carbon dioxide to six-carbon sugars, a process called carbon fixation:

$$6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} + \text{energy} \longrightarrow \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}$$
The sugars made during photosynthesis are a source of food, and thus energy, for the photosynthetic organisms making them and for the non-photosynthetic organisms, such as animals, that consume the plants either directly or indirectly by eating other animals that have eaten the plants. In this way, sunlight is the direct or indirect source of energy for most organisms (see Chapter 12). In plants, animals, and nearly all other organisms, the free energy in sugars and other molecules derived from food is released in the processes of glycolysis and cellular respiration. During cellular respiration, energy-rich molecules in food (e.g., glucose) are oxidized to carbon dioxide and water. The complete oxidation of glucose,

$$\mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2} \longrightarrow 6\,\mathrm{CO_2} + 6\,\mathrm{H_2O}$$

has a ΔG°′ of −686 kcal/mol and is the reverse of photosynthetic carbon fixation. Cells employ an elaborate set of protein-mediated reactions to couple the oxidation of 1 molecule of glucose to the synthesis of as many as 30 molecules of ATP from 30 molecules of ADP. This oxygen-dependent (aerobic) degradation (catabolism) of glucose is the major pathway for generating ATP in all animal cells, all nonphotosynthetic plant cells, and many bacterial cells. Catabolism of fatty acids can also be an important source of ATP. We discuss the mechanisms of photosynthesis and cellular respiration in Chapter 12. Although light energy captured in photosynthesis is the primary source of chemical energy for cells, it is not the only source. Certain microorganisms that live in or around deep-sea hydrothermal vents, where adequate sunlight is unavailable, derive the energy for converting ADP and Pi into ATP from the oxidation of reduced inorganic compounds. These reduced compounds originate deep in the earth and are released at the vents.
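The two numbers given above, −686 kcal/mol for glucose oxidation and up to 30 ATP per glucose, allow a rough estimate of how much of that free energy is conserved in ATP. This is a sketch under stated assumptions: it uses a commonly cited standard value of 7.3 kcal/mol for making one ATP from ADP + Pi, which is not given in this passage, and it compares standard (not cellular) free-energy changes. The function name is hypothetical.

```python
GLUCOSE_OXIDATION_DG0 = -686.0  # kcal/mol, complete oxidation of glucose (from the text)
ATP_FORMATION_DG0 = 7.3         # kcal/mol per ATP made from ADP + Pi (assumed standard value)
ATP_PER_GLUCOSE = 30            # "as many as 30" ATP per glucose (from the text)


def fraction_conserved(n_atp: int,
                       dg_fuel: float = GLUCOSE_OXIDATION_DG0,
                       dg_per_atp: float = ATP_FORMATION_DG0) -> float:
    """Share of the fuel's standard free-energy drop that ends up stored in ATP."""
    return n_atp * dg_per_atp / abs(dg_fuel)


eff = fraction_conserved(ATP_PER_GLUCOSE)
print(f"about {eff:.0%} of the ΔG°′ of glucose oxidation is conserved as ATP")
```

With these inputs the estimate is roughly one third; because cellular concentrations are not 1 M, the true figure in a living cell differs from this standard-state value.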
NAD+ and FAD Couple Many Biological Oxidation and Reduction Reactions
In many chemical reactions, electrons are transferred from one atom or molecule to another; this transfer may or may not accompany the formation of new chemical bonds or the release of energy that can be coupled to other reactions. The loss of electrons from an atom or a molecule is called oxidation, and the gain of electrons by an atom or a molecule is called reduction. An example of oxidation is the removal of electrons from the sulfhydryl group–containing side chains of two cysteine amino acids to form a disulfide bond, described above in Section 2.2. Electrons are neither created nor destroyed in a chemical reaction, so if one atom or molecule is oxidized, another must be reduced. For example, oxygen draws electrons from Fe2+ (ferrous) ions to form Fe3+ (ferric) ions, a reaction that occurs as part of the process by which carbohydrates are degraded in mitochondria. Each oxygen atom receives two electrons, one from each of two Fe2+ ions:

$$2\,\mathrm{Fe^{2+}} + \tfrac{1}{2}\,\mathrm{O_2} \longrightarrow 2\,\mathrm{Fe^{3+}} + \mathrm{O^{2-}}$$

Thus Fe2+ is oxidized and O2 is reduced. Such reactions, in which one molecule is reduced and another is oxidized, are often referred to as redox reactions. Oxygen is an electron acceptor in many redox reactions in cells under aerobic conditions.

Many biologically important oxidation and reduction reactions involve the removal or addition of hydrogen atoms (protons plus electrons) rather than the transfer of isolated electrons on their own. The oxidation of succinate to fumarate, which occurs in mitochondria, is an example (Figure 2-32). Protons are soluble in aqueous solutions (as H3O+), but electrons are not, so electrons must be transferred directly from one atom or molecule to another without a water-dissolved intermediate. In this type of oxidation reaction, electrons are often transferred to small electron-carrying molecules, sometimes referred to as coenzymes. The most common of these electron carriers are NAD+ (nicotinamide adenine dinucleotide), which is reduced to NADH, and FAD (flavin adenine dinucleotide), which is reduced to FADH2 (Figure 2-33). The reduced forms of these coenzymes can transfer protons and electrons to other molecules, thereby reducing them.
To describe redox reactions, such as the reaction of ferrous ion (Fe2+) and oxygen (O2), it is easiest to divide them into two half-reactions:

Oxidation of Fe2+:
$$2\,\mathrm{Fe^{2+}} \longrightarrow 2\,\mathrm{Fe^{3+}} + 2\,e^-$$
Reduction of O2:
$$2\,e^- + \tfrac{1}{2}\,\mathrm{O_2} \longrightarrow \mathrm{O^{2-}}$$

[Figure 2-32: structures of succinate and fumarate; succinate is converted to fumarate with removal of 2 e− and 2 H+]
FIGURE 2-32 Conversion of succinate to fumarate. In this oxidation reaction, which occurs in mitochondria as part of the citric acid cycle, succinate loses two electrons and two protons. These protons and electrons are transferred to FAD, reducing it to FADH2.
[Figure 2-33: (a) oxidized NAD+ and reduced NADH, with the nicotinamide, ribose, and adenosine portions labeled; NAD+ + H+ + 2 e− → NADH. (b) oxidized FAD and reduced FADH2, with the flavin, ribitol, and adenosine portions labeled; FAD + 2 H+ + 2 e− → FADH2.]
FIGURE 2-33 The electron-carrying coenzymes NAD+ and FAD. (a) NAD+ (nicotinamide adenine dinucleotide) is reduced to NADH by the addition of two electrons and one proton simultaneously. In many biological redox reactions, a pair of hydrogen atoms (two protons and two electrons) is removed from a molecule. In some cases, one of the protons and both electrons are transferred to NAD+; the other proton is released into solution. (b) FAD (flavin adenine dinucleotide) is reduced to FADH2 by the addition of two electrons and two protons, as occurs when succinate is converted to fumarate (see Figure 2-32). In this two-step reaction, addition of one electron together with one proton first generates a short-lived semiquinone intermediate (not shown), which then accepts a second electron and proton.
CHAPTER 2
Chemical Foundations

In this case, the reduced oxygen (O2−) readily reacts with two protons to form one water molecule (H2O). The readiness with which an atom or a molecule gains an electron is its reduction potential (E). The tendency to lose electrons, the oxidation potential, has the same magnitude as the reduction potential for the reverse reaction but the opposite sign. Reduction potentials are measured in volts (V) from an arbitrary zero point set at the reduction potential of the following half-reaction under standard conditions (25 °C, 1 atm, and reactants at 1 M):

$$\mathrm{H^+} + e^- \;\underset{\text{oxidation}}{\overset{\text{reduction}}{\rightleftharpoons}}\; \tfrac{1}{2}\,\mathrm{H_2}$$

The value of E for a molecule or an atom under standard conditions is its standard reduction potential, E′0. A molecule or an ion with a positive E′0 has a higher affinity for electrons than the H+ ion does under standard conditions. Conversely, a molecule or ion with a negative E′0 has a lower affinity for electrons than the H+ ion does under standard conditions. Like the values of ΔG°′, standard reduction potentials may differ somewhat from those found under the conditions in a cell because the concentrations of reactants in a cell are not 1 M.

In a redox reaction, electrons move spontaneously toward atoms or molecules having more positive reduction potentials. In other words, a molecule having a more negative reduction potential can transfer electrons spontaneously to, or reduce, a molecule with a more positive reduction potential. In this type of reaction, the change in electric potential ΔE is the sum of the reduction and oxidation potentials for the two half-reactions. The ΔE for a redox reaction is related to the change in free energy ΔG by the following expression:

$$\Delta G\ (\text{cal/mol}) = -n\,(23{,}064)\,\Delta E\ (\text{volts}) \tag{2-11}$$

where n is the number of electrons transferred. Note that a redox reaction with a positive ΔE value will have a negative ΔG and thus will tend to proceed spontaneously from left to right.
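Equation 2-11 can be applied directly once ΔE is assembled from two half-reactions. The sketch below uses the transfer of two electrons from NADH to O2 as an example; the standard reduction potentials for NAD+/NADH (−0.320 V) and ½O2/H2O (+0.816 V) are commonly tabulated values assumed here, not numbers given in this section, and the function names are hypothetical.

```python
FARADAY_CAL = 23_064  # constant in Equation 2-11, cal/(mol·V)


def delta_e(e_acceptor: float, e_donor: float) -> float:
    """ΔE = reduction potential of the acceptor + oxidation potential of the donor.

    A half-reaction's oxidation potential is its reduction potential with the sign flipped.
    """
    return e_acceptor + (-e_donor)


def delta_g_from_delta_e(n: int, de: float) -> float:
    """Equation 2-11: ΔG (cal/mol) = -n * 23,064 * ΔE (volts)."""
    return -n * FARADAY_CAL * de


# Assumed standard reduction potentials (volts); not given in this section.
E0_NAD = -0.320   # NAD+ + H+ + 2 e- -> NADH
E0_O2 = 0.816     # 1/2 O2 + 2 H+ + 2 e- -> H2O

de = delta_e(e_acceptor=E0_O2, e_donor=E0_NAD)
dg = delta_g_from_delta_e(n=2, de=de)
print(f"ΔE = {de:+.3f} V, ΔG = {dg / 1000:+.1f} kcal/mol")
```

Because electrons flow from the more negative potential (NADH) to the more positive one (O2), ΔE is positive and ΔG comes out strongly negative, matching the sign rule stated above.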
KEY CONCEPTS OF SECTION 2.4
Biochemical Energetics
• The change in free energy, ΔG, is the most useful measure for predicting the potential of chemical reactions to occur spontaneously in biological systems. Chemical reactions tend to proceed spontaneously in the direction for which ΔG is negative. The magnitude of ΔG is independent of the reaction rate. A reaction with a negative ΔG is referred to as thermodynamically favorable.
• The standard free-energy change, ΔG°′, equals −2.3 RT log Keq. Thus the value of ΔG°′ can be calculated from the experimentally determined concentrations of reactants and products at equilibrium.
• The rate of a reaction depends on the activation energy needed to energize reactants to a transition state. Catalysts such as enzymes speed up reactions by lowering the activation energy of the transition state.
• A chemical reaction having a positive ΔG can proceed if it is coupled with a reaction having a negative ΔG of larger magnitude.
• Many otherwise energetically unfavorable cellular processes are driven by the hydrolysis of phosphoanhydride bonds in ATP (see Figure 2-31).
• Directly or indirectly, light energy captured by photosynthesis in plants, algae, and photosynthetic bacteria is the ultimate source of chemical energy for nearly all cells on Earth.
• An oxidation reaction (loss of electrons) is always coupled with a reduction reaction (gain of electrons).
• Biological oxidation and reduction reactions are often coupled by electron-carrying coenzymes such as NAD+ and FAD (see Figure 2-33).
• Oxidation-reduction reactions with a positive ΔE have a negative ΔG and thus tend to proceed spontaneously.

Key Terms
acid 55, adenosine triphosphate (ATP) 32, α carbon atom (Cα) 42, amino acid 42, amphipathic 32, base 55, buffer 55, catalyst 52, chemical potential energy 57, covalent bond 33, dehydration reaction 41, dipole 35, dissociation constant (Kd) 53, disulfide bond 44, endergonic 58, endothermic 59, energy coupling 62, enthalpy (H) 59, entropy (S) 59, equilibrium constant (Keq) 52, exergonic 58, exothermic 59, fatty acids 48, ΔG (free-energy change) 58, hydrogen bond 37, hydrophilic 32, hydrophobic 32, hydrophobic effect 39, ionic interactions 36, molecular complementarity 40, monomer 41, monosaccharide 46, noncovalent interactions 33, nucleoside 46, nucleotide 45, oxidation 63, pH 54, phosphoanhydride bond 61, phosphoglyceride 49, phospholipid 48, polar 34, polymer 41, redox reaction 63, reduction 63, saturated 48, steady state 53, stereoisomer 34, transition state 60, unsaturated 48, van der Waals interaction 38
Review the Concepts
1. The gecko is a reptile with an amazing ability to climb smooth surfaces, including glass. Geckos appear to stick to smooth surfaces via van der Waals interactions between setae on their feet and the smooth surface. How is this method of stickiness advantageous over covalent interactions? Given that van der Waals forces are among the weakest molecular interactions, how can the gecko’s feet stick so effectively?
2. The K+ channel is an example of a transmembrane protein (a protein that spans the phospholipid bilayer of the plasma membrane). What types of amino acids are likely to be found (a) lining the channel through which K+ passes, (b) in contact with the hydrophobic core of the phospholipid bilayer containing fatty acyl groups, (c) in the cytosolic domain of the protein, and (d) in the extracellular domain of the protein?
3. V-M-Y-F-E-N: This is the single-letter amino acid abbreviation for a peptide. What is the net charge of this peptide at pH 7.0? An enzyme called a protein tyrosine kinase can attach phosphates to the hydroxyl groups of tyrosine (Y). What is the net charge of the peptide at pH 7.0 after it has been phosphorylated by a tyrosine kinase? What is the likely source of phosphate used by the kinase for this reaction?
4. Disulfide bonds help to stabilize the three-dimensional structure of proteins. What amino acids are involved in the formation of disulfide bonds? Does the formation of a disulfide bond increase or decrease entropy (ΔS)?
5. In the 1960s, the drug thalidomide was prescribed to pregnant women to treat morning sickness. However, thalidomide caused severe limb defects in the children of some women who took the drug, and its use for morning sickness was discontinued. It is now known that thalidomide was administered as a mixture of two stereoisomeric compounds, one of which relieved morning sickness and the other of which was responsible for the birth defects. What are stereoisomers? Why might two such closely related compounds have such different physiological effects?
6. Name the compound shown below.
[Structure: a purine base (ring atoms numbered 1–9, with a carbonyl oxygen at C6 and an amino group at C2) attached through N9 to C1′ of a ribose (carbons numbered 1′–5′, with hydroxyl groups on C2′ and C3′) that carries a chain of three phosphate groups on its 5′ carbon]
Is this nucleotide a component of DNA, RNA, or both? Name one other function of this compound.
7. The chemical basis of blood-group specificity resides in the carbohydrates displayed on the surfaces of red blood cells. Carbohydrates have the potential for great structural diversity. Indeed, the structural complexity of the oligosaccharides that can be formed from four sugars is greater than that of the oligopeptides that can be formed from four amino acids. What properties of carbohydrates make this great structural diversity possible?
8. Calculate the pH of 1 L of pure water at equilibrium. How will the pH change after 0.008 moles of the strong base sodium hydroxide (NaOH) are dissolved in the water? Now, calculate the pH of a 50 mM aqueous solution of the weak acid 3-(N-morpholino)propane-1-sulfonic acid (MOPS) in which 61 percent of the solute is in its weak acid form and 39 percent is in the form of MOPS’s corresponding base (the pKa for MOPS is 7.20). What is the final pH after 0.008 moles of NaOH are added to 1 L of this MOPS buffer?
9. Ammonia (NH3) is a weak base that under acidic conditions becomes protonated to the ammonium ion in the following reaction: NH3 + H+ → NH4+. NH3 freely permeates biological membranes, including those of lysosomes. The lysosome is a subcellular organelle with a pH of about 4.5–5.0; the pH of cytoplasm is about 7.0. What is the effect on the pH of the fluid content of lysosomes when cells are exposed to ammonia? Note: Ammonium (NH4+) does not diffuse freely across membranes.
10. Consider the binding reaction L + R → LR, where L is a ligand and R is its receptor. When 1 × 10⁻³ M of L is added to a solution containing 5 × 10⁻² M of R, 90 percent of the L binds to form LR. What is the Keq of this reaction? How will the Keq be affected by the addition of a protein that facilitates (catalyzes) this binding reaction? What is the dissociation equilibrium constant Kd?
11. What is the ionization state of phosphoric acid in the cytoplasm? Why is phosphoric acid such a physiologically important compound?
12. The ΔG°′ for the reaction X + Y → XY is −1000 cal/mol. What is the ΔG at 25 °C (298 K) starting with 0.01 M each of X, Y, and XY? Suggest two ways one could make this reaction energetically favorable.
13. According to health experts, saturated fatty acids, which come from animal fats, are a major factor contributing to coronary heart disease. What distinguishes a saturated fatty acid from an unsaturated fatty acid, and to what does the term saturated refer? Recently, trans unsaturated fatty acids, or trans fats, which raise total cholesterol levels in the body, have also been implicated in heart disease. How does the cis stereoisomer differ from the trans configuration, and what effect does the cis configuration have on the structure of the fatty acid chain?
14. Chemical modifications of amino acids contribute to the diversity and function of proteins. For instance, γ-carboxylation of specific amino acids is required to make some proteins biologically active. What particular amino acid undergoes this modification, and what is its biological relevance? Warfarin, a derivative of coumarin, which is present in many plants, inhibits γ-carboxylation of this amino acid and was used in the past as a rat poison. At present, it is also used clinically in humans. What patients might be prescribed warfarin and why?
CHAPTER 3
Protein Structure and Function

Molecular ribbon model of a protein “needle” used by pathogenic bacteria to inject proteins into human cells to initiate infection. Many disease-causing bacteria, including Salmonella typhimurium (food poisoning) and Yersinia pestis (bubonic plague), use a syringe-like protein complex called a type III secretion system to inject proteins into their mammalian target cells. The structure of the needle portion of the syringe used by Salmonella typhimurium, determined using a combination of nuclear magnetic resonance (NMR), electron microscopy, and computational methods, is a long tube with many α helices (illustrated as coiled ribbons) forming the walls of the needle. [Data from A. Loquet et al., 2012, Nature 486:276, PDB ID 2lpz.]
OUTLINE
3.1 Hierarchical Structure of Proteins
3.2 Protein Folding
3.3 Protein Binding and Enzyme Catalysis
3.4 Regulating Protein Function
3.5 Purifying, Detecting, and Characterizing Proteins
3.6 Proteomics

Proteins, which are polymers of amino acids, come in many sizes and shapes. Their three-dimensional diversity principally reflects variations in their lengths and amino acid sequences. In general, the linear, unbranched polymer of amino acids composing any protein will fold into only one or a few closely related three-dimensional shapes, called conformations. The conformation of a protein, together with the distinctive chemical properties of its amino acid side chains, determines its function. In some cases, the conformation, and thus the function, of a protein can change when that protein noncovalently or covalently associates with other molecules. Because of their many different shapes and chemical properties, proteins can perform a dazzling array of distinct functions inside and outside cells that either are essential for life or provide a selective evolutionary advantage to the cell or organism that contains them. It is, therefore, not surprising that characterizing the structures and activities of proteins is a fundamental prerequisite for understanding how cells work. Much of this textbook is devoted to examining how proteins act together to allow cells to live and function properly.

Although their structures are diverse, most proteins can be grouped into one of a few broad functional classes. Structural proteins, for example, determine the shapes of cells and their extracellular environments and serve as guide wires or rails to direct the intracellular movement of molecules and organelles. They are usually formed by the assembly of multiple protein subunits into very large, long structures. Scaffold proteins bring other proteins together into ordered arrays to perform specific functions more efficiently than those proteins would if they were not assembled together. Enzymes are proteins that catalyze chemical reactions. Membrane transport proteins permit the flow of ions and molecules across cellular membranes. Regulatory proteins act as signals, sensors, and switches to control the activities of cells by altering the functions of other proteins and genes. Regulatory proteins include signaling proteins, such as the hormones and cell-surface receptors that transmit extracellular signals to the cell interior. Motor proteins are responsible for moving other proteins, organelles, cells, and even whole organisms. Any one protein can be a member of more than one protein class, as is the case with some cell-surface signaling receptors that are both enzymes and regulatory proteins because they transmit signals from outside to inside cells by catalyzing chemical reactions. To accomplish their diverse missions efficiently, some proteins assemble into large complexes, often called molecular machines.

How do proteins perform so many diverse functions? They do so by exploiting a few simple activities. Most fundamentally, proteins bind: to one another, to other macromolecules such as DNA, and to small molecules and ions. In many cases, such binding induces a conformational change (a change in the three-dimensional structure) in the protein and thus influences its activity. Binding is based on molecular complementarity between a protein and its binding partner, as described in Chapter 2. A second key activity is enzymatic catalysis. Appropriate folding of a protein will place some amino acid side chains and some carboxyl and amino groups of its backbone into positions that permit the catalysis of covalent bond rearrangements. A third activity is folding into a channel or pore within a membrane through which molecules and ions can flow. Although these are especially crucial protein activities, they are not the only ones.
For example, fish that live in frigid waters (the Antarctic borchs and Arctic cods) have antifreeze proteins in their circulatory systems to prevent water crystallization.

A complete understanding of how proteins permit cells to live and thrive requires the identification and characterization of all the proteins used by a cell. In a sense, molecular cell biologists want to compile a complete protein “parts list” and construct a “user’s manual” that describes how these proteins work. Compiling a comprehensive inventory of proteins has become feasible in recent years with the sequencing of the entire genomes (complete sets of genes) of more and more organisms. From a computer analysis of a genome’s sequence, researchers can deduce the amino acid sequences and approximate number of the proteins it encodes (see Chapter 6). The term proteome was coined to refer to the entire protein complement of an organism. The human genome contains some 20,000–23,000 genes that encode proteins. However, variations in mRNA production, such as alternative splicing (see Chapter 10), and more than a hundred types of protein modifications may generate hundreds of thousands of distinct human proteins. By comparing the sequences and structures of proteins of unknown function with those of proteins of known function, scientists can often deduce much about what the unknown proteins do. In the past, characterization of protein function by genetic, biochemical, or physiological methods often preceded the identification of particular proteins. In the modern genomic and proteomic era, a protein is usually identified before its function is determined.

In this chapter, we begin our study of how the structure of a protein gives rise to its function, a theme that recurs throughout this book (Figure 3-1). The first section examines how linear chains of amino acid building blocks are arranged in a three-dimensional structural hierarchy. The next section discusses how proteins fold into these structures. We then turn to protein function, focusing on enzymes, those proteins that catalyze chemical reactions. Various mechanisms that cells use to control the activities and life spans of proteins are covered next. The chapter concludes with a discussion of commonly used techniques for identifying, isolating, and characterizing proteins, and of the burgeoning field of proteomics.

[Figure 3-1: (a) MOLECULAR STRUCTURE: primary (sequence), secondary (local folding), tertiary (overall conformation), quaternary (multimeric structure), supramolecular (large-scale assembly). (b) FUNCTION: structure, catalysis, movement, transport, signaling, regulation (“off”/“on”).]
FIGURE 3-1 Overview of protein structure and function. (a) Proteins have a hierarchical structure. A polypeptide’s linear sequence of amino acids linked by peptide bonds (primary structure) folds into local helices or sheets (secondary structure) that pack into a complex three-dimensional shape (tertiary structure). Some individual polypeptides associate into multichain complexes (quaternary structure), which in some cases can be very large, consisting of tens to hundreds of subunits (supramolecular complexes). (b) Proteins perform numerous functions, including organizing the genome, organelles, cytoplasm, protein complexes, and membranes in three-dimensional space (structure); controlling protein activity (regulation); monitoring the environment and transmitting information (signaling); moving small molecules and ions across membranes (transport); catalyzing chemical reactions (via enzymes); and generating force for movement (via motor proteins). These functions and others arise from specific binding interactions and conformational changes in the structure of a properly folded protein.
3.1 Hierarchical Structure of Proteins
In many proteins, the polymer chain folds into a distinct three-dimensional shape that is stabilized primarily by noncovalent interactions between regions in the linear sequence of amino acids. A key concept in understanding how proteins work is that function is often derived from three-dimensional structure, and three-dimensional structure is determined by both a protein’s amino acid sequence and intramolecular noncovalent interactions. The principles relating biological structure and function were initially formulated by the biologists Johann von Goethe (1749–1832), Ernst Haeckel (1834–1919), and D’Arcy Thompson (1860–1948), whose work has been widely influential in biology and beyond. Indeed, their ideas greatly influenced the school of “organic” architecture pioneered in the early twentieth century that is epitomized by the dicta “form follows function” (Louis Sullivan) and “form is function” (Frank Lloyd Wright). Here we consider the architecture of proteins at four levels of organization: primary, secondary, tertiary, and quaternary (Figure 3-2).

[Figure 3-2: (a) primary structure, e.g., Ala-Glu-Val-Thr-Asp-Pro-Gly; (b) secondary structure: α helix and β sheet; (c) tertiary structure, with a domain indicated; (d) quaternary structure.]
FIGURE 3-2 Four levels of protein hierarchy. (a) The linear sequence of amino acids linked together by peptide bonds is the primary structure. (b) Folding of the polypeptide chain into local α helices or β sheets represents secondary structure. (c) Secondary structural elements, together with various loops and turns in a single polypeptide chain, pack into a larger, independently stable tertiary structure, which may include distinct domains. (d) Some proteins consist of more than one polypeptide associated together in a quaternary structure.

The Primary Structure of a Protein Is Its Linear Arrangement of Amino Acids
As discussed in Chapter 2, proteins are polymers constructed out of 20 different types of amino acids. Individual amino acids are linked together in linear, unbranched chains by covalent amide bonds, called peptide bonds. Peptide bond formation between the amino group of one amino acid and the carboxyl group of another results in the net release of a water molecule and thus is a form of dehydration reaction (Figure 3-3a). The repeated amide N, α carbon (Cα), carbonyl C, and oxygen atoms of each amino acid residue form the backbone of a protein molecule from which the various side-chain groups project (Figure 3-3b, c). As a consequence of the peptide linkage, the backbone exhibits directionality, usually referred to as an N-to-C orientation, because all the amino groups are located on the same side of the Cα atoms. Thus one end of a protein has a free (unlinked) amino group (the N-terminus), and the other end has a free carboxyl group (the C-terminus). The sequence of a protein chain is conventionally written with its N-terminal amino acid on the left and its C-terminal amino acid on the right, and the amino acids are numbered sequentially starting from the N-terminus.

[Figure 3-3: (a) two amino acids (side chains R1 and R2) joined by a peptide bond with release of H2O; (b) a polypeptide with side chains R1 to R4, an amino end (N-terminus, +H3N) and a carboxyl end (C-terminus, COO−); (c) ball-and-stick model of three amino acids (aa1 to aa3) linked by peptide bonds.]
FIGURE 3-3 Structure of a polypeptide. (a) Individual amino acids are linked together by peptide bonds, which form via reactions that result in a loss of water (dehydration). R1, R2, etc., represent the side chains (“R groups”) of amino acids. (b) Linear polymers of peptide-bond-linked amino acids are called polypeptides, which have a free amino end (N-terminus) and a free carboxyl end (C-terminus). (c) A ball-and-stick model shows peptide bonds (yellow) linking the amino nitrogen atom (blue) of one amino acid (aa) with the carbonyl carbon atom (gray) of an adjacent one in the chain. The R groups (green) extend from the α carbon atoms (black) of the amino acids. These side chains largely determine the distinct properties of individual proteins.

The primary structure of a protein is simply the linear covalent arrangement, or sequence, of the amino acid residues that compose it. The first protein whose primary structure was determined was insulin, in the early 1950s. Today the number of known sequences exceeds 10 million and is growing daily.

Many terms are used to denote the chains formed by the polymerization of amino acids. A short chain of amino acids linked by peptide bonds and having a defined sequence is called an oligopeptide, or simply a peptide; longer chains are referred to as polypeptides. Peptides generally contain fewer than 20–30 amino acid residues, whereas polypeptides are often 200–500 residues long. The longest protein described to date is the muscle protein titin, some forms of which can be more than 34,000 residues long. We generally reserve the term protein for a polypeptide (or complex of polypeptides) that has a well-defined three-dimensional structure.

The size of a protein or a polypeptide is expressed either as its mass in daltons (a dalton is 1 atomic mass unit) or as its molecular weight (MW), which is a dimensionless number equal to the mass in daltons. For example, a 10,000-MW protein has a mass of 10,000 daltons (Da), or 10 kilodaltons (kDa). Later in this chapter, we will consider different methods for measuring the sizes and other physical characteristics of proteins.
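Because each peptide bond releases one water molecule, the mass of an unmodified linear chain equals the sum of its residue masses plus one water for the free amino and carboxyl termini. The following is a minimal sketch of that bookkeeping, assuming standard average residue masses (these values are not given in the text) and ignoring covalent modifications; the function names are hypothetical. The example sequence is the one drawn in Figure 3-2a, written N- to C-terminus in one-letter codes.

```python
# Assumed average residue masses (Da): free amino acid mass minus one H2O.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153  # Da


def peptide_bonds(seq: str) -> int:
    """A linear chain of n residues contains n - 1 peptide bonds."""
    return max(len(seq) - 1, 0)


def polypeptide_mass(seq: str) -> float:
    """Mass (Da) of an unmodified linear polypeptide written N- to C-terminus."""
    seq = seq.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {sorted(unknown)}")
    # Each peptide bond removed one H2O; one H2O's worth remains at the free termini.
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


seq = "AEVTDPG"  # Ala-Glu-Val-Thr-Asp-Pro-Gly
print(f"{seq}: {peptide_bonds(seq)} peptide bonds, {polypeptide_mass(seq):.2f} Da")
```

A phosphorylated or glycosylated residue would need its added mass included separately, which this sketch does not attempt.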
The precise molecular weight of a protein that has not been covalently modified is readily determined by summing up the weights of all of its constituent amino acids as determined from its amino acid sequence. The proteins encoded by the yeast genome, for example, have an average molecular weight of 52,728 and contain, on average, 466 amino acid residues. The average molecular weight of amino acids in proteins is 113, taking into account their average relative abundances. This value can be used to estimate the number of residues in a protein of unknown sequence if you know its molecular weight or, conversely, to estimate from the number of residues in a protein its likely molecular weight. Covalent modification of one or more amino acids in a protein—for example, by phosphorylation or glycosylation (see Chapters 2 and 13)—alters the mass of those residues and thus the mass of the protein in which they reside.
How many proteins are there in a typical eukaryotic (nucleated) cell? Let’s do a simple calculation for one such cell, a hepatocyte (a major type of cell in the mammalian liver). This type of cell, roughly a cube 15 μm (0.0015 cm) on a side, has a volume of 3.4 × 10⁻⁹ cm³ (or milliliters, ml). Assuming a cell density of 1.03 g/ml, the cell would weigh 3.5 × 10⁻⁹ g. Since protein accounts for approximately 20 percent of a cell’s weight, the total weight of cellular protein is 7 × 10⁻¹⁰ g. Assuming that an average protein has a molecular weight of 52,728 g/mol, we can calculate the total number of protein molecules per hepatocyte as about 7.9 × 10⁹ from the total protein weight and Avogadro’s number, the number of molecules per mole of any chemical compound (6.02 × 10²³): (7 × 10⁻¹⁰ g ÷ 52,728 g/mol) × 6.02 × 10²³ molecules/mol. To carry this calculation one step further, consider that a hepatocyte contains about 10,000 different proteins; thus each cell, on average, would contain close to a million molecules (7.9 × 10⁹ ÷ 10,000 ≈ 8 × 10⁵) of each type of protein. In fact, the abundances of different proteins vary widely, from the quite rare insulin-binding receptor protein (20,000 molecules per cell) to the structural protein actin (5 × 10⁸ molecules per cell). Every cell closely regulates the abundance of each protein such that each is present in the appropriate quantity for its cellular functions at any given time. We will learn more about the mechanisms used by cells to regulate protein levels later in this chapter and in Chapters 9 and 10.
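The mass and abundance arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch: the approximate average residue masses in `RESIDUE_MASS` are standard reference values that are not given in this chapter (treat them as an assumption), and all function names are hypothetical helpers for illustration. The cell-geometry defaults are the hepatocyte figures used in the text.

```python
"""Back-of-the-envelope protein mass and abundance estimates (illustrative sketch)."""

# Approximate average residue masses in daltons (free amino acid minus one water).
# Assumed reference values for illustration; they are not taken from the text.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02      # one water remains at the free N- and C-termini
AVG_RESIDUE_MW = 113    # average residue MW in proteins, from the text
AVOGADRO = 6.02e23      # molecules per mole


def sequence_mw(seq: str) -> float:
    """MW of an unmodified polypeptide: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER_MASS


def estimate_residues(mw: float) -> float:
    """Rough residue count for a protein of known MW but unknown sequence."""
    return mw / AVG_RESIDUE_MW


def estimate_mw(n_residues: int) -> float:
    """Rough MW for a protein with a known number of residues."""
    return n_residues * AVG_RESIDUE_MW


def proteins_per_cell(side_cm=15e-4, density_g_per_ml=1.03,
                      protein_fraction=0.20, avg_protein_mw=52_728):
    """Number of protein molecules in a cube-shaped cell (hepatocyte defaults)."""
    volume_ml = side_cm ** 3                  # 1 cm^3 equals 1 ml
    cell_mass_g = volume_ml * density_g_per_ml
    protein_mass_g = cell_mass_g * protein_fraction
    moles = protein_mass_g / avg_protein_mw   # grams / (grams per mole)
    return moles * AVOGADRO


if __name__ == "__main__":
    print(f"Glycine: {sequence_mw('G'):.2f} Da")
    print(f"Residues in a 52,728-MW protein: ~{estimate_residues(52_728):.0f}")
    total = proteins_per_cell()
    print(f"Protein molecules per hepatocyte: {total:.2e}")
    print(f"Per protein type (10,000 types): {total / 10_000:.2e}")
```

Run as a script, it reports about 7.9 × 10⁹ protein molecules per cell and about 8 × 10⁵ per protein type, matching the text. The residue estimate for a 52,728-MW protein comes out near 467, close to the 466-residue yeast average, because 113 is itself a rounded average.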
Secondary Structures Are the Core Elements of Protein Architecture
The second level in the hierarchy of protein structure is secondary structure. Secondary structures are stable spatial arrangements of segments of a polypeptide chain held together by hydrogen bonds between backbone amide and carbonyl groups and often involving repeating structural patterns. The propensity of a segment of a polypeptide chain to form any given secondary structure depends on its amino acid sequence (see Section 3.2 below). A single polypeptide may contain multiple types of secondary structure in various portions of the chain, depending on its sequence. The principal secondary structures are the alpha (α) helix, the beta (β) sheet, and the short U-shaped beta (β) turn. Parts of the polypeptide that don’t form these structures but nevertheless have a well-defined, stable shape are said to have an irregular structure. The term random coil applies to highly flexible parts of a polypeptide chain that have no fixed three-dimensional structure. In an average protein, about 60 percent of the polypeptide chain exists as α helices and β sheets; the remainder of the molecule is in irregular structures, coils, and turns. Thus α helices and β sheets are the major internal supportive elements in most proteins. Here we explore the shapes of secondary structures and the forces that favor their formation. In later sections, we examine how arrays of secondary structure fold together into larger, more complex arrangements called tertiary structure. The α Helix In a polypeptide segment folded into an α helix, the backbone forms a spiral structure in which the carbonyl oxygen atom of each peptide bond is hydrogen-bonded to the amide hydrogen atom of the amino acid four residues farther along the chain in the direction of the C-terminus (Figure 3-4). Within an α helix, all the backbone amide and carbonyl groups are hydrogen-bonded to one another except at the very beginning and end of the helix.
This periodic arrangement of bonds confers an amino-to-carboxy-terminal directionality on the helix because all the hydrogen bond acceptors (i.e., the carbonyl groups) have the same orientation (pointing in the downward direction in Figure 3-4), resulting in a structure in which there is a complete turn of the spiral every 3.6 residues. An α helix 36 amino acids long has 10 turns of the helix and is 5.4 nm long (0.54 nm per turn). The stable arrangement of hydrogen-bonded amino acids in the α helix holds the backbone in a straight, rodlike cylinder from which the side chains point outward. The relative hydrophobic or hydrophilic quality of a particular helix within a protein is determined entirely by the characteristics of the side chains. In water-soluble proteins, hydrophilic helices with polar side chains extending outward tend to be found on the outside surfaces, where they can interact with the aqueous environment, whereas hydrophobic helices with nonpolar, hydrophobic side chains tend to be buried within the core of the folded protein. Proteins embedded in the hydrophobic core of cellular membranes (see Chapter 7) often use one or more hydrophobic helices that are 20–25 residues long to cross the membrane. The amino acid proline is usually not found in α helices because the covalent bond between its amino nitrogen and a carbon in its side chain leaves it without the amide hydrogen needed to take part in the normal backbone hydrogen bonding that stabilizes the helix. While the classic α helix is the most intrinsically stable and most common helical form in proteins, there are variations, such as more tightly or loosely twisted helices. For example, in a specialized helix called a coiled coil (described several sections farther on), the helix is more tightly wound (3.5 residues and 0.51 nm per turn).
[Figure 3-4 artwork: α helix labeled from amino terminus to carboxyl terminus, 3.6 residues/turn]
FIGURE 3-4 The α helix, a common secondary structure in proteins. The polypeptide backbone (seen as a ribbon) is folded into a spiral that is held in place by hydrogen bonds between backbone oxygen and hydrogen atoms. Only hydrogens involved in bonding are shown. The outer surface of the helix is covered by the side-chain R groups (green).
The β Sheet Another type of secondary structure, the β sheet, consists of laterally packed β strands. Each β strand is a short (5–8-residue), nearly fully extended polypeptide segment. In contrast to the α helix, in which hydrogen bonds occur between the backbone amide and carbonyl groups of nearly adjacent residues, hydrogen bonds in the β sheet occur between backbone atoms in separate, but adjacent, β strands and are oriented perpendicularly to the chains of backbone atoms (Figure 3-5a). These distinct β strands (indicated as green and blue arrows in the figure) may be either within a single polypeptide chain, with short or long loops between the β strand segments, or on different polypeptide chains in a protein composed of multiple polypeptides. Figure 3-5b shows how two or more β strands align into adjacent rows, forming a nearly two-dimensional β pleated sheet (or simply pleated sheet), in which hydrogen bonds within the plane of the sheet hold the β strands together as the side chains stick out above and below the plane. Like α helices, β strands have a directionality defined by the orientation of the peptide bonds. Therefore, in a pleated sheet, adjacent β strands can be oriented in alternating opposite (antiparallel) directions (see Figure 3-5a) or in the same (parallel) direction (Figure 3-5c).
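The helix dimensions quoted above follow from two proportions: turns = residues ÷ residues per turn, and length = turns × length per turn. The Python sketch below (the function name is a hypothetical helper) reproduces the 36-residue example and applies the same arithmetic to the tighter coiled-coil helix and to the 20–25-residue membrane-spanning helices mentioned in the text. It treats every helix as ideal and uniform, which real helices only approximate.

```python
def helix_dimensions(n_residues, residues_per_turn=3.6, nm_per_turn=0.54):
    """Return (turns, length_nm) for an ideal helix of n_residues.

    Defaults are the classic alpha-helix values from the text; pass
    residues_per_turn=3.5, nm_per_turn=0.51 for a coiled-coil helix.
    """
    turns = n_residues / residues_per_turn
    return turns, turns * nm_per_turn


# Classic alpha helix: rise per residue is 0.54 / 3.6 = 0.15 nm.
turns, length = helix_dimensions(36)
print(f"36-residue alpha helix: {turns:.1f} turns, {length:.2f} nm")

# Coiled-coil helix, which is more tightly wound.
turns, length = helix_dimensions(35, residues_per_turn=3.5, nm_per_turn=0.51)
print(f"35-residue coiled-coil helix: {turns:.1f} turns, {length:.2f} nm")

# Membrane-spanning helices of 20-25 residues.
for n in (20, 25):
    _, length = helix_dimensions(n)
    print(f"{n}-residue transmembrane helix: {length:.2f} nm")
```

With the default parameters each residue adds 0.15 nm along the helix axis, so a 20–25-residue membrane-spanning helix is roughly 3.0–3.75 nm long.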
In some proteins, β sheets form part of the hydrophobic core of the protein (described below) or the side of an open space that binds other molecules; in some proteins embedded in membranes, the β sheets curve around and form a hydrophilic central pore through which ions and small molecules may flow (see Chapter 7). The β Turn Composed of four residues, β turns are located on the surface of a protein, forming sharp bends that reverse the direction of the polypeptide backbone, often toward the protein’s interior. These short, U-shaped secondary structures are often stabilized by a hydrogen bond between their end residues (Figure 3-6). Glycine and proline are commonly found in β turns. The lack of a large side chain in glycine and the presence of a built-in bend in proline allow the polypeptide backbone to fold into a tight U shape. β Turns help long polypeptides fold into highly compact structures. A reversal in the direction of the polypeptide backbone may also be mediated by segments of the polypeptide that are longer than four residues and that form bends or loops. In contrast to tight β turns, which exhibit just a few well-defined conformations, longer loops can have many different conformations.
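The Figure 3-6 caption gives a geometric rule of thumb for β turns: the Cα atoms of the first and fourth residues are usually less than 0.7 nm apart. The Python sketch below scans a chain of Cα coordinates for positions that satisfy that distance criterion. It is an illustration only: the function name and the coordinates are hypothetical, and a real turn assignment would also check for the hydrogen bond between the end residues and would exclude helical segments, where Cα(i) and Cα(i+3) also sit close together.

```python
import math


def candidate_beta_turns(ca_coords, cutoff_nm=0.7):
    """Return 0-based indices i where Ca(i) and Ca(i+3) are closer than cutoff_nm.

    ca_coords: list of (x, y, z) Ca positions in nanometers, in N-to-C order.
    Each hit marks residue i as the first residue of a possible four-residue turn.
    """
    return [
        i
        for i in range(len(ca_coords) - 3)
        if math.dist(ca_coords[i], ca_coords[i + 3]) < cutoff_nm
    ]


# Hypothetical Ca trace (nm): a straight stretch that then bends back on itself.
trace = [
    (-1.14, 0.00, 0.0), (-0.76, 0.00, 0.0), (-0.38, 0.00, 0.0),
    (0.00, 0.00, 0.0), (0.38, 0.00, 0.0), (0.55, 0.33, 0.0),
    (0.20, 0.50, 0.0), (-0.15, 0.50, 0.0),
]
print(candidate_beta_turns(trace))  # [3]
```

Only the bend starting at index 3 passes the cutoff; along the straight stretch the i to i+3 distance is about 1.1 nm.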
[Figure 3-5 artwork: (a) top view and (b) side view of a β sheet with R groups and Cα atoms labeled; (c) antiparallel and parallel strand arrangements with N and C ends marked]
FIGURE 3-5 The β sheet, another common secondary structure in proteins. (a) Top view of a three-stranded β sheet. Each strand is highlighted by a ribbon-like arrow with alternating blue and green segments that points in the N-to-C direction, with the loops of connecting residues indicated by thick black lines. In this antiparallel β sheet, each strand (arrow) points in the direction opposite to that of the adjacent strand. The stabilizing hydrogen bonds between the β strands are indicated by green dashed lines. (b) Side view of an antiparallel β sheet. The projection of the R groups (green) above and below the plane of the sheet is obvious in this view. The fixed bond angles in the polypeptide backbone produce a pleated contour represented in panel (a) by the alternating colored segments. (c) Top view of two β sheets, whose individual strands (N-to-C orientations represented by arrows) are either antiparallel, in which the strands alternately point in opposite directions (left), or parallel, in which all strands point in the same direction (right).
[Figure 3-6 artwork: four-residue β turn with R groups and the Cα atoms of the end residues labeled]
FIGURE 3-6 Structure of a β turn. Composed of four residues, β turns reverse the direction of a polypeptide chain (resulting in a 180° U-turn). The Cα carbons of the first and fourth residues are usually less than 0.7 nm apart, and those residues are often linked by a hydrogen bond. β turns facilitate the folding of long polypeptides into compact structures.
Tertiary Structure Is the Overall Folding of a Polypeptide Chain
Tertiary structure refers to the overall conformation of a polypeptide chain—that is, the three-dimensional arrangement of all its amino acid residues. In contrast to secondary structures, which are stabilized only by hydrogen bonds, tertiary structure is stabilized primarily by hydrophobic interactions between nonpolar side chains, together with hydrogen bonds involving polar side chains and backbone amide and carbonyl groups. These stabilizing forces hold together elements of secondary structure—α helices, β strands, turns, and coils. Because the stabilizing interactions are often weak, however, the tertiary structure of a protein is not rigidly fixed but undergoes continual minute fluctuations, and some segments within the tertiary structure of a protein can be so mobile that they are considered to be disordered—that is, lacking well-defined, stable, three-dimensional structure. This variation in structure has important consequences for the function and regulation of proteins. The chemical properties of amino acid side chains help define tertiary structure. In some proteins—for example, those that are secreted from cells or are cell-surface proteins that face the extracellular environment—disulfide bonds between the side chains of cysteine residues can covalently link regions of the proteins, thus restricting the proteins’ flexibility and increasing the stability of their tertiary structures. Amino acids with charged hydrophilic polar side chains tend to be on the outer surfaces of proteins; by interacting with water, they help to make the proteins soluble in aqueous solutions and can form noncovalent interactions with other water-soluble molecules, including other proteins. In contrast, amino acids with hydrophobic nonpolar side chains are usually sequestered away from the water-facing surfaces of a protein, in many cases forming a water-insoluble central core. This observation led to what’s known as the “oil drop model” of protein conformation because the core of a protein is relatively hydrophobic, or “oily” (Figure 3-7). Uncharged hydrophilic polar side chains are found both on the surface and in the inner core of proteins.
[Figure 3-7 artwork: color scale from most hydrophilic to most hydrophobic; unfolded protein (N to C) folding into a folded protein with surface and core labeled; several surface residues removed to reveal the protein’s core]
FIGURE 3-7 The oil drop model of protein folding. The hydrophobic and hydrophilic residues of a polypeptide chain can be distributed throughout its linear sequence as illustrated in the unfolded protein (top). The color scale runs from the most hydrophilic residues (blue) to the most hydrophobic (yellow). When the protein folds (bottom left), hydrophilic (charged and uncharged polar) side chains will often be exposed on the protein’s surface, where they can form stabilizing interactions with surrounding water and ions. In contrast, the hydrophobic residues tend to cluster together in the inner core, somewhat like drops of oil in an aqueous liquid, driven away from the aqueous surroundings by the hydrophobic effect (see Chapter 2). These core residues are more easily seen when several surface residues are removed (bottom right). [Data from M. C. Vaney et al., 1996, Acta Crystallogr., Sect. D. 52:505, PDB ID 193l.]
There Are Four Broad Structural Categories of Proteins
Proteins usually fall into one of four broad structural categories based on their tertiary structure: globular proteins, fibrous proteins, integral membrane proteins, and intrinsically disordered proteins. These four broad categories of proteins are not mutually exclusive—some proteins are made up of combinations of segments that fall into two or more of these categories. Globular proteins are generally water-soluble, compactly folded structures, often but not exclusively spheroidal, that comprise a mixture of secondary structures [see the structures of Ras (Figure 3-9 below) and myoglobin (Figure 3-14 below)]. Fibrous proteins are large, elongated, often stiff molecules. Some fibrous proteins are composed of a long polypeptide chain comprising many tandem copies of a short amino acid sequence that forms a single repeating secondary structure (see the structure of collagen, the most abundant protein in mammals, in Figure 20-25). Other fibrous proteins are composed of repeating globular protein subunits, such as the helical array of G-actin protein monomers that forms F-actin microfilaments (see Chapter 17). Fibrous proteins, which often aggregate into large multiprotein fibers that do not readily dissolve in water, usually play a structural role or participate in cellular movements. Integral membrane proteins are embedded within the phospholipid bilayer of the membranes that enclose cells and organelles and are discussed in detail in Chapter 7.
Intrinsically disordered proteins are fundamentally distinct from the well-ordered proteins in the other three categories. Many proteins we consider in this book adopt only one or a few very closely related conformations when they are in their normal functional state, called the native state. Intrinsically disordered proteins, however, do not have well-ordered structures in their native, functional states; instead, their polypeptide chains are very flexible—indeed, disordered—with no fixed conformation. Sometimes only a segment of a polypeptide chain, rather than the entire chain, will be intrinsically disordered. The exceptional conformational flexibilities of intrinsically disordered proteins or protein segments appear to be key to their functional activities, such as the ability to interact with multiple partner proteins or to fold into a well-defined conformation only after binding to such partners (Figure 3-8a). Intrinsically disordered proteins typically, but not exclusively, serve as signaling molecules, regulators of the activities of other molecules, or as scaffolds for multiple proteins, small molecules, and ions (e.g., binding ions via multiple charged residues). Regions of intrinsic disorder can provide flexible links, or tethers, between well-ordered regions of a protein; serve as sites of some types of post-translational protein modification [e.g., covalent addition of phosphate groups (phosphorylation) or sugars (glycosylation)]; serve as targets of protease digestion that regulates protein activity; inhibit the activity of the protein in which they are embedded (autoinhibition sites); or serve as signals for intracellular sorting of proteins (see Chapter 13). The activities of many proteins containing intrinsically disordered segments are described in subsequent chapters. 
For example, phosphorylation of the disordered C-terminal domain (CTD) of RNA polymerase II (see Figure 8-12), which is composed of multiple repeats of a seven-amino-acid sequence containing proline, threonine, and serine, regulates key steps in the synthesis of mRNA (see Chapters 9 and 10). The N-termini of histone proteins that control DNA organization in chromatin (see Chapter 8) are sites of important post-translational modifications, and the disordered, proline-rich FH1 region in the protein formin controls the assembly of actin filaments (see Chapter 17). Intrinsically disordered proteins can be identified experimentally using various biochemical techniques, such as tests of sensitivity to protease digestion (disordered regions usually exhibit greater protease sensitivity), and a wide variety of biophysical techniques, including spectroscopy. The intrinsic disorder of these proteins apparently arises as a consequence of their having a sequence that, relative to well-ordered proteins, is richer in polar amino acids, proline, and net charge, and poorer in hydrophobic residues (Figure 3-8b). Algorithms primarily based on calculations of amino acid composition—particularly net charge and hydrophobicity—are used to predict which proteins or segments of proteins are intrinsically disordered. By some estimates, about 30 percent or more of eukaryotic proteins are predicted to have at least one segment of 50 or more consecutive residues that is disordered.
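The composition-based prediction described above can be illustrated with a minimal charge-hydropathy sketch in Python, in the spirit of the analysis plotted in Figure 3-8b. Several details are assumptions not stated in this text: hydrophobicity uses the Kyte-Doolittle scale rescaled to 0–1, net charge counts only Lys/Arg (+1) and Asp/Glu (−1) at pH 7, and the ordered/disordered boundary ⟨R⟩ = 2.785⟨H⟩ − 1.151 is the line commonly attributed to the Uversky study cited for that figure. The published analysis also smooths hydrophobicity over a sliding window, which this sketch omits, so treat its calls as rough. Function names and test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not given in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq: str) -> float:
    """Mean per-residue hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """|Net charge| per residue at pH 7: K/R count +1, D/E count -1, His ignored."""
    positive = sum(aa in "KR" for aa in seq)
    negative = sum(aa in "DE" for aa in seq)
    return abs(positive - negative) / len(seq)


def predicted_disordered(seq: str) -> bool:
    """True if the sequence lies on the high-charge, low-hydrophobicity side."""
    seq = seq.upper()
    h = mean_hydrophobicity(seq)
    r = mean_net_charge(seq)
    return r > 2.785 * h - 1.151


# Hypothetical test sequences, not real proteins.
print(predicted_disordered("KEKPSKKPEKKSEKKPQK"))   # True: charged, Pro-rich
print(predicted_disordered("LIVAFLVAILGLVAFCLIV"))  # False: hydrophobic
```

A composition-only score like this cannot see where disorder lies within a chain; as the Figure 3-8b exceptions show, a protein with a long disordered segment can still average out to look ordered.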
[Figure 3-8 artwork: (a) intrinsically disordered PUMA binding well-structured MCL1 by conformation selection (via transiently ordered PUMA) or induced fit, yielding ordered PUMA bound to MCL1; (b) plot of mean net charge (y axis, 0.0–0.6) against mean hydrophobicity (x axis, 0.1–0.6) for well-structured proteins, intrinsically disordered proteins, and exceptions]
EXPERIMENTAL FIGURE 3-8 Intrinsically disordered proteins: mechanisms of binding to well-ordered proteins and identification based on hydrophobicity and net charge. (a) The binding of an intrinsically disordered protein (PUMA, blue) to a well-ordered protein (MCL1, gray) results in the formation of a well-defined structure in the previously disordered protein. PUMA and MCL1 are intracellular proteins that can influence the regulated process of cell death called apoptosis (see Chapter 21). Two mechanisms have been proposed for generating a bound complex in which both proteins are structured: conformational selection (top pathway) and induced fit (bottom pathway). In conformational selection, the disordered protein (PUMA) occasionally and transiently adopts in solution the structure it would have in the bound state. The well-ordered binding partner (MCL1) can then bind to (select) PUMA in that transient, ordered conformation, forming a relatively stable bound complex. In induced fit, the disordered protein begins to bind to the well-ordered partner while still disordered and then, while bound, is induced to form the ordered conformation present in the relatively stable, heterodimeric complex. Recent experiments suggest that the induced fit mechanism best describes the binding of PUMA and MCL1. (b) The sequences of 275 well-ordered, monomeric globular proteins (gray squares) and 91 intrinsically disordered proteins (black and yellow circles) were used to calculate the mean hydrophobicity per residue in each protein using a scale of 0 (least hydrophobic) to 1 (most hydrophobic, x axis), and the mean net charge per residue at pH 7.0 (y axis). With only three exceptions (black circles), the proteins define two distinct distributions: low hydrophobicity, high net charge (intrinsically disordered, yellow circles) and high hydrophobicity, low net charge (well-ordered, gray squares). The three disordered proteins (black circles) that overlap with the well-ordered population each contain substantial segments predicted to be disordered (low hydrophobicity, high net charge) that apparently overwhelm the rest of the proteins’ sequences that might otherwise result in a well-ordered conformation. [Part (a) from Rogers, J. et al., “Folding and Binding of an Intrinsically Disordered Protein: Fast, but Not ‘Diffusion-Limited,’” J. Am. Chem. Soc., 2013, 135 (4), pp. 1415–1422. http://pubs.acs.org/doi/pdf/10.1021/ja309527h. Part (b) data from V. N. Uversky, J. R. Gillespie, and A. L. Fink, 2000, Proteins 41:415–427.]
Different Ways of Depicting the Conformation of Proteins Convey Different Types of Information
The simplest way to represent three-dimensional protein structure is to trace the course of the backbone atoms, sometimes only the Cα atoms, with a solid line (called a Cα backbone trace, Figure 3-9a); the most complex representation, called a ball-and-stick model, shows every atom (Figure 3-9b). The Cα backbone trace shows the overall folding of the polypeptide chain without consideration of the amino acid side chains; the ball-and-stick model (with balls representing atoms and sticks representing bonds) details the interactions between side-chain atoms, including those that stabilize the protein’s conformation and interact with other molecules, as well as the atoms of the backbone. Even though both views are useful, the elements of secondary structure are not always easily discerned in them. Another type of representation, called a ribbon diagram, uses common shorthand symbols for depicting secondary structure—for example, coiled ribbons or solid cylinders for α helices, flat ribbons or arrows for β strands, and flexible thin strands for β turns, coils, and loops (Figure 3-9c). In a variation of the basic ribbon diagram, ball-and-stick or space-filling models of all or only a subset of side chains can be attached to the backbone ribbon. In this way, side chains that are of interest can be visualized in the context of the secondary structure that is especially clearly represented by the ribbons. However, none of these three ways of representing protein structure conveys much information about the atoms that are on the protein’s surface and in contact with the watery environment. The surface is of interest because it is where other molecules usually bind to a protein. Thus a useful alternative way to represent proteins is to show only the water-accessible surface and use colors to highlight regions having a common chemical character, such as hydrophobicity or hydrophilicity, and charge characteristics, such as positive (basic) or negative (acidic) side chains (Figure 3-9d). Such models reveal the topography of the protein surface and the distribution of charge, both important features of binding sites, as well as clefts in the surface where other molecules may bind. This view represents a protein as it is “seen” by another molecule.
[Figure 3-9 artwork: (a) Cα backbone trace, (b) ball-and-stick model, (c) ribbon diagram, (d) water-accessible surface, (e) hybrid model]
FIGURE 3-9 Five ways to visualize the protein Ras with its bound GDP. (a) The Cα backbone trace demonstrates how the polypeptide is tightly packed into a small volume. (b) A ball-and-stick representation reveals the locations of all atoms. (c) Turns and loops connect pairs of helices and strands. (d) A water-accessible surface reveals the numerous lumps, bumps, and crevices on the protein surface. Regions of positive charge are shaded purple; regions of negative charge are shaded red. (e) Hybrid model in which ribbon and transparent surface models are combined. [Data from E. F. Pai et al., 1990, EMBO J. 9:2351–2359, PDB ID 5p21.]
Structural Motifs Are Regular Combinations of Secondary Structures
A particular combination of two or more secondary structures that form a distinct three-dimensional structure is called a structural motif when it appears in multiple proteins. A structural motif is often, but not always, associated with a specific function. Any particular structural motif will frequently perform a common function in different proteins, such as binding to a particular ion or small molecule—for example, calcium or ATP. Some structural motifs are stable on their own when isolated from the rest of a protein and are thus called structural domains, as we shall see shortly. However, other structural motifs do not form thermodynamically stable structures in the absence of other portions of the protein and are thus not considered to be independent structural domains. One common structural motif is the α helix–based coiled coil, or heptad repeat. Many proteins, including fibrous proteins and DNA-regulating proteins called transcription factors (see Chapter 9), assemble into dimers or trimers by using a coiled-coil motif, in which α helices from two, three, or even four separate polypeptide chains coil about one another—resulting in a coil of coils; hence the name (Figure 3-10a). The individual helices bind tightly to one another because each helix has a strip of aliphatic (hydrophobic, but not aromatic) side chains (leucine, valine, etc.) running along one side of the helix that interacts with a similar strip in the adjacent helix, thus sequestering the hydrophobic groups away from water and stabilizing the assembly of multiple independent helices. These hydrophobic strips are generated along only one side of the helix because the primary structure of each helix is composed of repeating seven-amino-acid units, called heptads, in which the side chains of the first and fourth residues are aliphatic and the other side chains are often hydrophilic (see Figure 3-10a). Because a coiled-coil helix makes a full turn every 3.5 residues, each heptad spans exactly two turns, so positions 1 and 4 of successive heptads line up along the same face of the helix. Because hydrophilic side chains extend from one side of the helix and hydrophobic side chains extend from the opposite side, the overall helical structure is amphipathic. Because leucine frequently appears in the fourth positions and the hydrophobic side chains merge together like the teeth of a zipper, these structural motifs are also called leucine zippers. Many other structural motifs contain α helices. A common calcium-binding motif called the EF hand contains two short helices connected by a loop (Figure 3-10b). This structural motif, one of several helix-turn-helix and helix-loop-helix structural motifs, is found in more than a hundred proteins and is used for sensing calcium levels. The binding of a Ca²⁺ ion to oxygen atoms in conserved residues in the loop depends on the concentration of Ca²⁺ in the cell and sometimes induces a conformational change in the protein, altering its activity. Thus calcium concentrations can directly control proteins’ structures and functions. Somewhat different helix-turn-helix and basic helix-loop-helix (bHLH) structural motifs are used for protein binding to DNA and, consequently, for the regulation of gene activity (see Chapter 9). Yet another structural motif commonly found in proteins that bind RNA or DNA is the zinc finger, which contains three secondary structures—an α helix and two β strands with an antiparallel orientation—that form a fingerlike bundle held together by a zinc ion (Figure 3-10c). The relationship between the primary structure of a polypeptide chain and the structural motifs into which it folds is not always straightforward. The amino acid sequences responsible for any given structural motif in different proteins may be very similar to one another. In other words, a common sequence motif can result in a common structural motif. This is the case for the heptad repeats that form coiled coils. However, it is also possible for seemingly unrelated amino acid sequences to fold into a common structural motif, so it is not always possible to predict which amino acid sequences will fold into a given structural motif. Conversely, it is possible that a commonly occurring sequence motif will not fold into a well-defined structural motif. Sometimes short sequence motifs that have an unusual abundance of a particular amino acid, such as proline or aspartate or glutamate, are called “domains”; however, these and other short contiguous segments are more appropriately called “sequence motifs” than “domains,” as the latter term has a distinct meaning that we will define shortly. We will encounter numerous additional motifs in our discussions of proteins in this and other chapters. The presence of the same structural motif in different proteins with similar functions clearly indicates that these useful combinations of secondary structures have been conserved in evolution.
[Figure 3-10 artwork: (a) coiled-coil motif with heptad positions labeled Val (1), Asn (1), and Leu (4); (b) EF-hand/helix-loop-helix motif with a Ca²⁺ ion, Asp, Asn, Thr, and Glu residues, and H₂O labeled; (c) zinc-finger motif with a Zn²⁺ ion and Cys and His residues labeled]
FIGURE 3-10 Motifs of protein secondary structure. (a) This parallel two-stranded coiled-coil motif (left) is characterized by two α helices wound around each other. Helix packing is stabilized by interactions between hydrophobic side chains (red and blue) present at regular intervals along each strand and found along the seam of the intertwined helices. Each α helix exhibits a characteristic heptad repeat sequence with a hydrophobic residue often, but not always, at positions 1 and 4, as indicated. The coiled-coil nature of this structural motif is more apparent in long coiled coils containing many such motifs (right). (b) An EF hand, a type of helix-loop-helix motif, consists of two helices connected by a short loop in a specific conformation. This structural motif is common to many proteins, including many calcium-binding and DNA-binding regulatory proteins. In calcium-binding proteins such as calmodulin, oxygen atoms from five residues in the acidic glutamate- and aspartate-rich loop and one water molecule form ionic bonds with a Ca²⁺ ion. (c) The zinc-finger motif is present in many DNA-binding proteins that help regulate transcription. A Zn²⁺ ion is held between a pair of β strands (blue) and a single α helix (red) by a pair of cysteine residues and a pair of histidine residues. The two invariant cysteine residues are usually at positions 3 and 6, and the two invariant histidine residues are at positions 20 and 24 in this 25-residue motif. [Part (a) data from L. Gonzalez, Jr., D. N. Woolfson, and T. Alber, 1996, Nat. Struct. Biol. 3:1011–1018, PDB IDs 1zik and 2tma. Part (b) data from R. Chattopadhyaya et al., 1992, J. Mol. Biol. 228:1177–1192, PDB ID 1cll. Part (c) data from S. A. Wolfe, R. A. Grant, and C. O. Pabo, 2003, Biochemistry 42:13401–13409, PDB ID 1llm.]
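The heptad logic described for coiled coils can be checked mechanically: label each residue with its position 1–7 in the repeat and ask what fraction of the residues at positions 1 and 4 are aliphatic. The Python sketch below does that and picks the register (starting position) that fits best. It is a toy illustration under stated assumptions: the aliphatic set (Ala, Val, Leu, Ile), the function names, and the test sequence are chosen for illustration, and dedicated coiled-coil predictors typically use statistical scoring rather than this simple count. As the text notes, similar structural motifs can also arise from dissimilar sequences, so a high score is suggestive rather than conclusive.

```python
ALIPHATIC = set("AVLI")  # assumed set of aliphatic residues for positions 1 and 4


def heptad_position(i: int, register: int = 0) -> int:
    """Heptad position (1-7) of 0-based residue i; register shifts where the repeat starts."""
    return (i + register) % 7 + 1


def core_fraction(seq: str, register: int = 0) -> float:
    """Fraction of residues at heptad positions 1 and 4 that are aliphatic."""
    core = [aa for i, aa in enumerate(seq.upper())
            if heptad_position(i, register) in (1, 4)]
    if not core:
        return 0.0
    return sum(aa in ALIPHATIC for aa in core) / len(core)


def best_register(seq: str) -> int:
    """Register (0-6) that places the most aliphatic residues at positions 1 and 4."""
    return max(range(7), key=lambda r: core_fraction(seq, r))


# Hypothetical leucine-zipper-like sequence: Val at position 1, Leu at position 4.
zipper = "VEKLKEE" * 4
reg = best_register(zipper)
print(reg, core_fraction(zipper, reg))                            # 0 1.0
print("".join(str(heptad_position(i, reg)) for i in range(7)))   # 1234567
```

Shifting the register by even one position mixes charged residues into the scored positions, which is why the correct frame stands out so clearly for an idealized repeat like this one.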
Domains Are Modules of Tertiary Structure Distinct regions of protein structure are often referred to as domains. There are three main classes of protein domains: functional, structural, and topological. A functional domain is a region of a protein that exhibits a particular activity characteristic of that protein, usually even when isolated from the rest of the protein. For instance, a particular region of a protein may be responsible for its catalytic activity (e.g., a kinase domain that covalently adds a phosphate group to another molecule) or its binding ability (e.g., a DNA-binding domain or a membrane-binding domain). Functional domains are often identified experimentally by whittling down a protein to its smallest active fragment with the aid of proteases,
[Figure 3-11 labels: (a) one HA subunit, with chains HA1 and HA2, N and C termini, the membrane-distal globular domain bearing bound sialic acid, the membrane-proximal fibrous domain, and the viral membrane (external and internal faces); (b) the HA trimer.]
FIGURE 3-11 Tertiary and quaternary levels of structure. The protein pictured here, hemagglutinin (HA), is found on the surface of the influenza virus. This long multimeric molecule has three identical subunits, each composed of two polypeptide chains, HA1 and HA2. (a) The tertiary structure of each HA subunit comprises the folding of its helices and strands into a compact structure that is 13.5 nm long and divided into two domains. The membrane-distal domain (silver) is folded into a globular conformation. The membrane-proximal domain (gold) has a fibrous, stemlike conformation owing to the alignment of two long α helices (cylinders) of HA2 with β strands in HA1. Short turns and longer loops, many of them at the surface of the molecule, connect the helices and strands in each chain. (b) The quaternary structure of HA is stabilized by lateral interactions between the long helices (cylinders) in the fibrous domains of the three subunits (gold, blue, and green), forming a triple-stranded coiled-coil stalk. Each of the distal globular domains in HA binds sialic acid (red) on the surface of target cells. Like many membrane proteins, HA contains several covalently linked carbohydrate chains (not shown). [Data from S. J. Gamblin et al., 2004, Science 303:1838–1842, PDB ID 1ruz.]
enzymes that cleave one or more peptide bonds in a target polypeptide. Alternatively, the DNA encoding a protein can be modified so that when the modified DNA is used to generate a protein, only a particular region, or domain, of the full-length protein is made. Thus it is possible to determine if specific parts of a protein are responsible for particular activities exhibited by the protein. Indeed, functional domains are often also associated with corresponding structural domains. A structural domain is a region about 40 or more amino acids in length, arranged in a single, stable, and distinct structure often comprising one or more secondary structures. Many structural domains can fold into their characteristic structures independently of the rest of the protein in which they are embedded. As a consequence, distinct structural domains can be linked together—sometimes by short or long spacers—to form a large multidomain protein. Each of the polypeptide chains in the trimeric flu virus hemagglutinin, for example, contains a globular domain and a fibrous domain (Figure 3-11a). Structural domains can be incorporated as modules into different proteins. The modular approach to protein architecture is particularly easy to recognize in large proteins, which tend to be mosaics of different domains that confer distinct activities and thus can perform different functions simultaneously. As many as 75 percent of the proteins in eukaryotes have multiple structural domains. Structural domains frequently are also functional domains in that they can have an activity independent of the rest of the protein. The epidermal growth factor (EGF) domain is a structural domain that is present in several proteins (Figure 3-12). EGF
is a small, soluble peptide hormone that binds to cells in the embryo and in skin and connective tissue in adults, causing them to divide. It is generated by proteolytic cleavage (breaking of a peptide bond) between repeated EGF domains in the EGF precursor protein, which is anchored in the plasma membrane by a membrane-spanning domain. EGF domains with sequences similar to, but not identical to, that of the EGF peptide hormone are present in other proteins and can be liberated by proteolysis. These proteins include tissue plasminogen activator (TPA), a protease that is used to dissolve blood
[Figure 3-12 labels: EGF precursor, Neu, TPA, EGF.]
FIGURE 3-12 Modular nature of protein domains. Epidermal growth factor (EGF) is generated by proteolytic cleavage of a precursor protein containing multiple EGF domains (green) and a membrane-spanning domain (blue). An EGF domain is also present in the Neu protein and in tissue plasminogen activator (TPA). These proteins also contain other widely distributed domains, indicated by shape and color. See I. D. Campbell and P. Bork, 1993, Curr. Opin. Struct. Biol. 3:385.
3.1 Hierarchical Structure of Proteins
clots in heart attack victims; Neu protein, which takes part in embryonic differentiation; and Notch protein, a receptor protein in the plasma membrane that functions in developmentally important signaling (see Chapter 16). Besides the EGF domain, these proteins have other domains in common with other proteins. For example, TPA possesses a trypsin domain, a functional domain found in some proteases. It is estimated that there are about a thousand different types of structural domains in all proteins. Some of these are not very common, whereas others are found in many different proteins. Indeed, by some estimates, only nine major types of structural domains account for as much as a third of all the structural domains in all proteins. Structural domains can be recognized in proteins whose structures have been determined by x-ray crystallography or nuclear magnetic resonance (NMR) analysis or in images captured by electron microscopy.
Regions of proteins that are defined by their distinctive spatial relationships to the rest of the protein are topological domains. For example, some proteins associated with cell-surface membranes have a part extending inward into the cytoplasm (cytoplasmic domain), a part embedded within the phospholipid bilayer (membrane-spanning domain), and a part extending outward into the extracellular space (extracellular domain). Each of these parts can comprise one or more structural and functional domains. In Chapter 8, we will consider the mechanism by which the gene segments that correspond to domains became shuffled in the course of evolution, resulting in their appearance in many proteins. Once a functional, structural, or topological domain has been identified and characterized in one protein, it is possible to use that information to search for similar domains in other proteins and to suggest potentially similar functions for those domains in those proteins.
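The modular picture of Figure 3-12 can be captured by describing each protein as a list of domain types and then asking which proteins share a domain. This Python sketch is a toy model only: the dictionary `ARCHITECTURES` and its helper functions are hypothetical, only domains named in the text are included, the domain order is illustrative, and the count of three EGF repeats in the precursor is a placeholder because the text says only that there are multiple repeats.

```python
# Toy domain architectures using only the domains named in the text.
# Domain order is illustrative, and the three EGF repeats in the precursor
# are a hypothetical placeholder for "multiple" repeats.
ARCHITECTURES: dict[str, list[str]] = {
    "EGF precursor": ["EGF", "EGF", "EGF", "membrane-spanning"],
    "Neu": ["EGF"],
    "TPA": ["EGF", "trypsin"],
}


def proteins_with_domain(domain: str) -> list[str]:
    """Names of proteins whose architecture contains the given domain type."""
    return sorted(name for name, domains in ARCHITECTURES.items() if domain in domains)


def shared_domains(a: str, b: str) -> set[str]:
    """Domain types present in both proteins (copy number ignored)."""
    return set(ARCHITECTURES[a]) & set(ARCHITECTURES[b])


def copies_of(protein: str, domain: str) -> int:
    """How many times a domain type occurs in one protein's architecture."""
    return ARCHITECTURES[protein].count(domain)


print(proteins_with_domain("EGF"))  # ['EGF precursor', 'Neu', 'TPA']
```

Real domain searches compare sequences or structures rather than hand-entered labels; the point here is only that a modular description makes "which proteins carry this domain?" a simple lookup.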
Multiple Polypeptides Assemble into Quaternary Structures and Supramolecular Complexes
Multimeric proteins consist of two or more polypeptide chains, which in this context are referred to as subunits. A fourth level of structural organization, quaternary structure, describes the number (stoichiometry) and relative positions of the subunits in multimeric proteins (Figure 3-2). Flu virus hemagglutinin, for example, is a trimer of three identical subunits (a homotrimer) held together by noncovalent bonds (Figure 3-11b). Other multimeric proteins are composed of various numbers of identical (homomeric) or different (heteromeric) subunits. Hemoglobin, the oxygen-carrying molecule in blood, is an example of a heteromeric multimeric protein, as it has two copies each of two different polypeptide chains (as discussed below). In many cases, the individual monomer subunits of a multimeric protein cannot function normally unless they are assembled into the multimeric protein. In other cases, assembly into a multimeric protein permits proteins that act sequentially in a pathway to increase their efficiency of operation owing to their juxtaposition in space, a phenomenon referred to as metabolic coupling. Classic examples of metabolic coupling are the fatty acid synthases, the enzymes in fungi that synthesize fatty acids, and the polyketide synthases,
CHAPTER 3
Protein Structure and Function
the large multiprotein complexes in bacteria that synthesize a diverse set of pharmacologically relevant molecules called polyketides, including the antibiotic erythromycin. The highest level in the hierarchy of protein structure is the association of proteins into supramolecular complexes. Typically, such structures are very large, in some cases exceeding 1 megadalton (MDa) in mass, approaching 30–300 nm in size, and containing tens to hundreds of polypeptide chains and sometimes other biopolymers such as nucleic acids. The capsid that encases the nucleic acids of the viral genome is an example of a supramolecular complex with a structural function. The bundles of cytoskeletal filaments that support and give shape to the plasma membrane are another example. Other supramolecular complexes act as molecular machines, carrying out the most complex cellular processes by integrating multiple proteins, each with distinct functions, into one large assembly. For example, a transcriptional machine is responsible for synthesizing messenger RNA (mRNA) using a DNA template. This transcriptional
[Figure 3-13 labels: general transcription factors, RNA polymerase, mediator complex, DNA, promoter; together these form the transcription preinitiation complex.]
FIGURE 3-13 A molecular machine: the transcription initiation complex. The core RNA polymerase, general transcription factors, a mediator complex containing about 20 subunits, and other protein complexes not depicted here assemble at a promoter in DNA. The polymerase carries out transcription of DNA; the associated proteins are required for initial binding of the polymerase to a specific promoter. The multiple components function together as a molecular machine.
machine, the operational details of which are discussed in Chapters 5 and 9, consists of RNA polymerase, itself a multimeric protein, and at least 50 additional components, including general transcription factors, promoter-binding proteins, helicase, and other protein complexes (Figure 3-13). Ribosomes, also discussed in Chapter 5, are complex multiprotein and multi-nucleic acid machines that synthesize proteins. One of the most complex multiprotein assemblies is the nuclear pore, a structure that allows communication and passage of macromolecules between the nucleus and the cytoplasm (see Chapter 14). It is composed of multiple copies of about 30 distinct proteins and forms an assembly with an estimated mass of 50 MDa. The fatty acid synthases and polyketide synthases referred to above are also molecular machines.
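The vocabulary used in this section for quaternary structure (homotrimer, heteromeric, two copies each of two different chains) amounts to a count of subunit types. The following Python sketch, built around a hypothetical function `describe_quaternary`, derives such a description from a list of subunit names. It treats each HA subunit as a single unit even though each is itself built from the HA1 and HA2 chains, and it says nothing about the relative positions of the subunits, which quaternary structure also specifies.

```python
from collections import Counter

OLIGOMER_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer"}


def describe_quaternary(subunits: list[str]) -> str:
    """Summarize subunit stoichiometry from a list of subunit names."""
    if not subunits:
        raise ValueError("need at least one subunit")
    counts = Counter(subunits)
    total = len(subunits)
    if total == 1:
        return "monomer"
    size = OLIGOMER_NAMES.get(total, f"{total}-mer")
    prefix = "homo" if len(counts) == 1 else "hetero"
    # Stoichiometry string: each subunit name followed by its copy number,
    # with the number omitted when there is only one copy.
    stoichiometry = "".join(
        name + (str(n) if n > 1 else "") for name, n in sorted(counts.items())
    )
    return f"{prefix}{size} ({stoichiometry})"


print(describe_quaternary(["α", "α", "β", "β"]))  # heterotetramer (α2β2)
print(describe_quaternary(["HA"] * 3))            # homotrimer (HA3)
```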
Comparing Protein Sequences and Structures Provides Insight into Protein Function and Evolution
Analyses of many diverse proteins have conclusively established a relation between the amino acid sequence, three-dimensional structure, and function of proteins. One of the earliest examples involved a comparison of two oxygen-carrying proteins: myoglobin in muscle and hemoglobin in red blood cells. Myoglobin—a monomer (consisting of one polypeptide chain/protein molecule)—and hemoglobin—a
tetramer (consisting of two α and two β polypeptides, or subunits, per protein)—both contain a heme group noncovalently attached to each polypeptide chain (Figure 3-14a). The heme group binds oxygen. A mutation in the gene encoding the β chain of hemoglobin that results in the substitution of a valine for a glutamic acid disturbs this protein’s folding and function and causes sickle-cell disease (also called sickle-cell anemia). When the sequences of the 141-residue myoglobin and the 153-residue β subunit of hemoglobin are properly aligned, 40 equivalent positions carry identical residues and another 21 carry side chains that are chemically very similar. This high degree of identity and similarity (43 percent of the myoglobin residues) is consistent with their similar oxygen-binding functions. X-ray crystallographic analysis showed that the three-dimensional structures of myoglobin and of the α and β subunits of hemoglobin, as well as that of the evolutionarily distant oxygen-carrying leghemoglobin from plants, are remarkably similar (see Figure 3-14a). A good rule of thumb is that the greater the similarity of the sequences of two polypeptide chains, the more likely they are to have similar three-dimensional structures and similar functions. While this comparative approach is very powerful, caution must always be exercised when attributing to one protein, or a part of a protein, a function or structure similar to that of another protein based only on amino acid sequence
[Figure 3-14 panels: (a) structures of the hemoglobin tetramer (α and β subunits), the β subunit of hemoglobin, myoglobin, and leghemoglobin; (b) globin family tree descending from an ancestral oxygen-binding protein to vertebrate hemoglobin (α, β) and myoglobin, dicot and monocot hemoglobins, leghemoglobin, and annelid, insect, nematode, protozoan, algal, fungal, and bacterial hemoglobins.]
FIGURE 3-14 Evolution of the globin protein family. (a) Hemoglobin is a tetramer of two α and two β subunits. The structural similarity of these subunits to leghemoglobin and myoglobin, both of which are monomers, is evident. A heme molecule (red) noncovalently associated with each globin polypeptide is directly responsible for oxygen binding in these proteins. (b) A primitive monomeric oxygen-binding globin is thought to be the ancestor of modern-day blood hemoglobins, muscle myoglobins, and plant leghemoglobins. Sequence comparisons have revealed that the evolution of the globin proteins parallels the evolution of animals and plants. Major changes occurred with the divergence of plant globins from animal globins and of myoglobin from hemoglobin. Later, gene duplication gave rise to the α and β subunits of hemoglobin. See R. C. Hardison, 1996, P. Natl. Acad. Sci. USA 93:5675. [Part (a) data from G. Fermi et al., 1984, J. Mol. Biol. 175:159–174, PDB ID 2hbb (hemoglobin), H. C. Watson, 1969, Prog. Stereochem. 4:299, PDB ID 1mbn (myoglobin), and M. S. Hargrove et al., 1997, J. Mol. Biol. 266:1032–1042, PDB ID 1bin (leghemoglobin).]
similarities. There are examples in which proteins with similar overall structures display different functions, as well as cases in which functionally unrelated proteins with dissimilar amino acid sequences nevertheless have very similar folded tertiary structures, as will be explained below. Nevertheless, in many cases, such comparisons of sequences provide important insights into protein structure and function. Use of sequence comparisons to deduce protein structure and function has expanded substantially in recent years as the genomes and messenger RNAs of more and more organisms have been sequenced, permitting a vast array of protein sequences to be deduced. Indeed, the molecular revolution in biology during the last decades of the twentieth century created a new scheme of biological classification based on similarities and differences in the amino acid sequences of proteins. Proteins that have a common ancestor are referred to as homologs. The main evidence for homology among proteins, and hence for their common ancestry, is similarity in their sequences, which is often reflected in similar structures. We can describe homologous proteins as belonging to a “family” and can trace their lineage—how closely or distantly they are related to one another in an evolutionary sense—from comparisons of their sequences. Generally, more closely related proteins exhibit greater sequence similarity than more distantly related proteins because, over evolutionary time, mutations accumulate in the genes encoding these proteins. The folded three-dimensional structures of homologous proteins may be similar even if some parts of their primary structure show little evidence of sequence homology. 
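The identity and similarity figures quoted above for myoglobin and the hemoglobin β subunit come from counting positions in an alignment. The Python sketch below assumes the two sequences are already aligned (gaps written as `-`) and uses a hypothetical, deliberately coarse grouping of chemically similar side chains; real alignment tools score substitutions with matrices such as BLOSUM62 instead. The last line reproduces the arithmetic behind the 43 percent figure, using the 141 residues the text gives for myoglobin: (40 + 21)/141 ≈ 0.433.

```python
# Hypothetical groups of chemically similar side chains (an assumption made
# for illustration; real aligners use substitution matrices instead).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def compare_aligned(a: str, b: str) -> tuple[int, int]:
    """Return (identical, similar) position counts for two pre-aligned sequences.

    '-' marks a gap and gapped positions are skipped; 'similar' excludes identities.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        if x == y:
            identical += 1
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    return identical, similar


def percent(count: int, reference_length: int) -> float:
    """Express a count as a percentage of a reference sequence length."""
    return 100.0 * count / reference_length


# Toy alignment (not real globin sequences).
ident, sim = compare_aligned("VLSEG", "VISDG")
print(ident, sim)                     # 3 2
# Numbers quoted in the text: 40 identities plus 21 similar residues.
print(round(percent(40 + 21, 141)))  # 43
```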
Initially, proteins with relatively high sequence similarities (>50 percent exact amino acid matches, or “identities”) and related functions or structures were defined as an evolutionarily related family, while a superfamily encompassed two or more families in which the interfamily sequences matched less well (∼30–40 percent identities) than within one family. It is generally thought that proteins with about 30 percent sequence identity are likely to have similar three-dimensional structures; however, such high sequence identity is not required for proteins to share similar structures. Revised definitions of family and superfamily have been proposed, in which a family comprises proteins with a clear evolutionary relationship (>30 percent identity or additional structural and functional information showing common descent but 1015 dpm/mmol) are available. Kinases within cells (or used in vitro) can transfer a 32P-labeled phosphate from 32P-labeled ATP to label phosphoproteins. Likewise, commercial preparations of 3H-labeled nucleic acid precursors have much higher specific activities than those of the corresponding 14C-labeled preparations. In most experiments, the former are preferable because they allow RNA or DNA to be adequately labeled after a shorter incubation or with a smaller cell sample. Various phosphate-containing compounds in which the phosphorus atom is the radioisotope phosphorus-32 are readily available. Because of their high specific activity, 32P-labeled nucleotides are routinely used to label nucleic acids in cell-free systems. Labeled compounds in which a radioisotope replaces atoms normally present in the molecule have virtually the same chemical properties as the corresponding unlabeled compounds. Enzymes, for instance, generally cannot distinguish between substrates labeled in this way and their unlabeled substrates.
The presence of such radioactive atoms is indicated with the isotope in brackets (no hyphen) as a prefix (e.g., [3H]leucine). In contrast, labeling of almost any biomolecule (e.g., protein or nucleic acid) with the radioisotope iodine-125 (125I) requires the covalent addition of 125I to a molecule that normally does not have iodine as part of its structure. Because this labeling procedure modifies the chemical structure, the biological activity of the labeled molecule may differ somewhat from that of the unlabeled form. The presence of such radioactive atoms is indicated with the isotope as a prefix followed by a hyphen (no bracket) (e.g., 125I-trypsin). Standard methods for labeling proteins with 125I result in covalent attachment of the 125I primarily to the aromatic rings of tyrosine side chains (mono- and diiodotyrosine). Nonradioactive isotopes are finding increasing use in cell biology, especially in nuclear magnetic resonance studies and in mass spectrometry applications, as will be explained below.
Labeling Experiments and Detection of Radiolabeled Molecules
Whether labeled compounds are detected by autoradiography—exposure of the sample to a two-dimensional detector (photographic emulsion or electronic detector)—or their radioactivity is measured in an appropriate “counter,” the amount of a radiolabeled compound in a sample can be determined with great precision. In one use of autoradiography, a tissue, cell, or cell constituent is labeled with a radioactive molecule, unassociated radioactive material is washed away, and the structure of the sample is stabilized either by chemically cross-linking the macromolecules in the sample (“fixation”) or by freezing it. The sample is then overlaid with a photographic emulsion that is sensitive to radiation. Development of the emulsion yields small silver grains whose distribution corresponds to that of the radioactive material and is usually detected by microscopy. Autoradiographic studies of whole
cells were crucial in determining the intracellular sites where various macromolecules are synthesized and the subsequent movements of those macromolecules within cells. Various techniques employing fluorescence microscopy, which we describe in Chapter 4, have largely supplanted autoradiography for studies of this type. However, autoradiography is sometimes used in various assays for detecting specific isolated DNA or RNA sequences at specific tissue locations (see Chapter 6) in a technique referred to as in situ hybridization. Quantitative measurements of the amount of radioactivity in a labeled material are performed with several different instruments. A Geiger counter measures ions produced in a gas by the β particles or γ rays emitted from a radioisotope. These instruments are mostly handheld devices used to monitor radioactivity in the laboratory to protect investigators from excess exposure. In a scintillation counter, a radiolabeled sample is mixed with a liquid containing a fluorescent compound that emits a flash of light when it absorbs the energy of the β particles or γ rays released in the decay of the radioisotope; a phototube in the instrument detects and counts these light flashes. Phosphorimagers detect radioactivity using a two-dimensional array detector, storing digital data on the number of disintegrations per minute per small pixel of surface area. These instruments, which can be thought of as a kind of reusable electronic film, are commonly used to quantify radioactive molecules separated by gel electrophoresis and are replacing photographic emulsions for this purpose. Combinations of labeling and biochemical techniques and of visual and quantitative detection methods are often employed in labeling experiments. 
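Phosphorimager data, stored as disintegrations per minute per pixel, can be turned into a band measurement by summing the pixels in the band and subtracting an equal-area background region; the specific activity of the labeled precursor then converts dpm into an amount of compound. The Python sketch below illustrates both steps under stated assumptions: the function names and the toy image are hypothetical, the image values are taken to be dpm as the text describes, and the conversion assumes the label was not diluted by unlabeled cellular pools, which in practice it often is.

```python
# (top, left, height, width) of a rectangular region, in pixels.
Box = tuple[int, int, int, int]


def region_total(image: list[list[float]], box: Box) -> float:
    """Sum the per-pixel values inside a rectangular region."""
    top, left, height, width = box
    return sum(sum(row[left:left + width]) for row in image[top:top + height])


def band_dpm(image: list[list[float]], band: Box, background: Box) -> float:
    """Band signal minus an equal-area background region from the same image."""
    if band[2:] != background[2:]:
        raise ValueError("band and background regions must have the same size")
    return region_total(image, band) - region_total(image, background)


def dpm_to_pmol(dpm: float, specific_activity_dpm_per_mmol: float) -> float:
    """Amount of labeled compound (pmol) that accounts for the given dpm.

    Assumes the label in the band has the specific activity of the supplied
    precursor, i.e., no dilution by unlabeled pools (1 mmol = 1e9 pmol).
    """
    if specific_activity_dpm_per_mmol <= 0:
        raise ValueError("specific activity must be positive")
    return dpm / specific_activity_dpm_per_mmol * 1e9


# Toy 4 x 5 image (dpm per pixel): uniform background of 1 with a 2 x 2 band.
image = [
    [1, 1, 1, 1, 1],
    [1, 10, 10, 1, 1],
    [1, 10, 10, 1, 1],
    [1, 1, 1, 1, 1],
]
print(band_dpm(image, band=(1, 1, 2, 2), background=(2, 3, 2, 2)))  # 36
print(dpm_to_pmol(5.0e4, 2.0e12))  # about 25 pmol
```

Using a background region of the same size keeps the subtraction pixel-for-pixel fair; a higher specific activity yields more dpm for the same amount of compound, which is why high-specific-activity precursors allow smaller samples.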
For instance, to identify the major proteins synthesized by a particular cell type, a sample of the cells is incubated with a radiolabeled amino acid (e.g., [35S]methionine) for a few minutes, during which time the labeled amino acid enters the cells and mixes with the cellular pool of unlabeled amino acids, and some of it is biosynthetically incorporated into newly synthesized proteins. Subsequently, unincorporated radiolabeled amino acid is washed away from the cells. The cells are harvested, and the mixture of cellular proteins is extracted from the cells (for example, by a detergent solution) and then separated by any of the methods commonly used to resolve complex protein mixtures into individual components. Gel electrophoresis in combination with autoradiography or phosphorimager analysis is often the method of choice. The radioactive bands in the gel correspond to newly synthesized proteins, which have incorporated the radiolabeled amino acid. To detect a specific protein of interest, rather than the entire ensemble of biosynthetically radiolabeled proteins, a specific protein can be isolated by immunoprecipitation. The precipitate is then solubilized, for example, in an SDS-containing buffer, and the sample is analyzed by SDS-PAGE followed by autoradiography to detect the protein that is radioactively labeled. In this type of experiment, a fluorescent compound that is activated by the radiation (“scintillator”) may be infused into the gel on completion of the electrophoretic separation so that the light emitted can be used to detect the presence of
[Figure 3-42 panels: autoradiograms for (a) normal and (b) mutant protein after a 0.5-h pulse and chase times of up to 24 h; bands labeled m (mature) and p (precursor).]
EXPERIMENTAL FIGURE 3-42 Pulse-chase experiments can track the pathway of protein modification within cells. (a) To follow the fate of a specific newly synthesized protein in cells, cells were incubated with [35S]methionine for 0.5 hours (the pulse) to label all newly synthesized proteins, and any radioactive amino acid not incorporated into the cells was then washed away. The cells were further incubated (the chase) for varying times up to 24 hours, and samples from each time of chase were subjected to immunoprecipitation to isolate one specific protein (here the low-density lipoprotein receptor). SDS-PAGE of the immunoprecipitates followed by autoradiography permitted visualization of the target protein, which is initially synthesized as a small precursor (p) and then rapidly modified to a larger mature form (m) by addition of carbohydrates. About half of the labeled protein was converted from p to m during the pulse; the rest was converted after 0.5 hours of chase. The protein remained stable for 6–8 hours before it began to be degraded (as indicated by reduced band intensity). (b) The same experiment was performed in cells in which a mutant form of the protein is made. The mutant p form cannot be properly converted to the m form, and it is more quickly degraded than the normal protein. [© Kozarsky et al., The Journal of Cell Biology. 102: 1567–1575. doi:10.1083/jcb.102.5.1567.]
the labeled protein, using either film or a two-dimensional electronic detector. An example is shown in the experiment described below (Figure 3-42). This method is particularly useful for weak β emitters such as 3H.
Pulse-chase experiments are particularly useful for tracing changes in the intracellular location of proteins or the modification of a protein or metabolite over time. In this experimental protocol, a cell sample is exposed to a radiolabeled compound that can be incorporated into or otherwise attached to a cellular molecule of interest—the “pulse”—for a brief period. The pulse ends when the unincorporated radiolabeled molecules are washed away and the cells are exposed to a vast excess of the identical, but unlabeled, compound to dilute the radioactivity of any remaining, but unincorporated, radiolabeled compound. This procedure prevents any incorporation of significant amounts of radiolabel after the “pulse” period and initiates the “chase” period (see Figure 3-42). Samples taken periodically during the chase period are assayed to determine the location or chemical form of the radiolabel as a function of time. Pulse-chase experiments in which the radiolabeled protein is detected by autoradiography after immunoprecipitation and SDS-PAGE are often used to follow the rate of synthesis, modification, and degradation of proteins. In these experiments, radiolabeled amino acid precursors are added during the pulse, and the amounts and characteristics of the radiolabeled target protein are detected during the chase. One can thus observe postsynthetic modifications of the protein, such as the covalent addition of sugars (see Chapters 13 and 14) or proteolytic cleavage, that change its electrophoretic mobility, as well as the rate of degradation of the protein, which is detected as the loss of signal with increasing time of chase. A classic use of the pulse-chase technique with autoradiography was in studies that elucidated the pathway traversed by secreted proteins from their site of synthesis in the endoplasmic reticulum to the cell surface (see Chapter 14).
3.5 Purifying, Detecting, and Characterizing Proteins
Mass Spectrometry Can Determine the Mass and Sequence of Proteins
Mass spectrometry (MS) is a powerful technique for characterizing proteins, especially for determining the mass of a protein or fragments of a protein. With such information in hand, it is also possible to determine part or all of the protein’s sequence. This method permits the accurate direct determination of the ratio of the mass (m) of a charged molecule (molecular ion) to its charge (z), or m/z. Additional techniques are then used to deduce the absolute mass of the molecular ion. All mass spectrometers have four key features. The first is an ion source, from which charge, usually in the form of protons, is transferred to the peptide or protein molecules under study (ionization). Their conversion to ions occurs in the presence of a high electric field, which then directs the charged molecular ions into the second key component, the mass analyzer. The mass analyzer, which is always in a high-vacuum chamber, physically separates the ions on the basis of their differing mass-to-charge (m/z) ratios. The separated ions are subsequently directed to strike a detector, the third key component, which provides a measure of the relative abundances of each of the ions in the sample. The fourth essential component is a computerized data system that is used to calibrate the instrument; to acquire, store, and process the resulting data; and often to direct the instrument to automatically collect additional specific types of data from the sample, based on the initial observations. This type of automated feedback is used for the tandem MS (MS/MS) peptide-sequencing methods described below. The two most frequently used methods of generating ions of proteins and protein fragments are (1) matrix-assisted laser desorption/ionization (MALDI) and (2) electrospray (ES).
In MALDI (Figure 3-43), the peptide or protein sample is mixed with a low-molecular-weight, UV-absorbing organic acid (the matrix) and then dried on a metal target. Energy from a laser ionizes and vaporizes the sample, producing singly charged molecular ions from the constituent molecules. In ES (Figure 3-44a), a sample of peptides or
[Figure 3-43 labels: laser, metal target; step 1, ionization; step 2, acceleration; step 3, detection; the lightest ions arrive at the detector first; signal intensity plotted against time.]
EXPERIMENTAL FIGURE 3-43 Molecular mass can be determined by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. In a MALDI-TOF mass spectrometer, pulses of light from a laser ionize a protein or peptide mixture that is adsorbed on a metal target (step 1). An electric field in the mass analyzer accelerates the ions in the sample toward the detector (steps 2 and 3). The time it takes an ion to reach the detector is proportional to the square root of the mass-to-charge (m/z) ratio. Among ions having the same charge, the smaller ions move faster (shorter time to the detector). The molecular weight of each ion from the sample is calculated using the time of flight of a standard.
proteins in solution is converted into a fine mist of tiny droplets by spraying through a narrow capillary at atmospheric pressure. The droplets are formed in the presence of a high electric field, which renders them highly charged. The solvent evaporates from the droplets in their short flight (mm) to the entrance of the mass spectrometer’s mass analyzer, forming multiply charged ions from the peptides and proteins. The gaseous ions are transferred into the mass analyzer region of the MS, where they are then accelerated by electric fields and separated by the mass analyzer on the basis of their m/z. The two most frequently used types of mass analyzers are time-of-flight (TOF) instruments and ion traps. TOF instruments exploit the fact that the time it takes an ion to pass through the length of the mass analyzer before reaching the detector is proportional to the square root of m/z (smaller ions move faster than larger ones with the same charge; see Figure 3-43). In ion-trap analyzers, tunable electric fields are used to capture, or “trap,” ions with a specific m/z and to sequentially pass the trapped ions out of the mass analyzer onto the detector (see Figure 3-44a). By varying the electric fields, researchers can examine ions with a wide range of m/z values one by one, producing a mass spectrum, which is a graph of m/z (x axis) versus relative abundance, determined by the intensity of the signal measured by the detector (y axis) (Figure 3-44b, top panel). In tandem, or MS/MS, instruments, any given parent ion in the original mass spectrum (see Figure 3-44b, top panel) can be chosen (mass-selected) for further analysis. The chosen ions are transferred into a second chamber in which
[Figure 3-44 panels: (a) electrospray ionization: liquid passes through an electrospray needle held at 3–5 kV, forming droplets containing solvated ions that pass from atmosphere into the vacuum of the mass spectrometer (mass analyzer, detector); (b) top, mass spectrum plotted as relative abundance of ions vs. m/z (400–1800), with labeled peaks including m/z 426.25, 525.36, 568.65, 836.47, 852.49, and 932.43; bottom, MS/MS spectrum of m/z 836.47 (m/z 300–1600), annotated with the peptide sequence FIIVGYVDDTQFVR.]
EXPERIMENTAL FIGURE 3-44 Molecular mass of proteins and peptides can be determined by electrospray ionization ion-trap mass spectrometry. (a) Electrospray (ES) ionization converts proteins and peptides in a solution into highly charged gaseous ions by passing the solution through a needle (forming the droplets) that has a high voltage across it (charging the droplets). Evaporation of the solvent produces gaseous ions that enter a mass spectrometer. The ions are analyzed by an ion-trap mass analyzer that then directs ions to the detector. (b) Top panel: Mass spectrum of a mixture of three major and several minor peptides from the mouse H-2 class I histocompatibility antigen Q10 α chain is presented as the relative abundance of the ions striking the detector (y axis) as a function of the mass-to-charge (m/z) ratio (x axis). Bottom panel: In an MS/MS instrument such as the ion trap shown in part (a), a specific peptide ion can be selected for fragmentation into smaller ions that are then analyzed and detected. The MS/MS spectrum (also called the product-ion spectrum) provides detailed structural information about the parent ion, including sequence information for peptides. Here the ion with an m/z of 836.47 was selected and fragmented, and the mass spectrum of the product ions was measured. Note that there is no longer an ion with an m/z of 836.47 present because it was fragmented. From the varying sizes of the product ions, the understanding that peptide bonds are often broken in such experiments, the known m/z values for individual amino acid fragments, and database information, the sequence of the peptide, FIIVGYVDDTQFVR, can be deduced. [Part (b), unpublished data from S. Carr.]
3.5 Purifying, Detecting, and Characterizing Proteins
they are broken into smaller fragment ions by collision with an inert gas, and then the m/z and relative abundances of the resulting fragment ions are measured in a second MS analyzer (Figure 3-44b, bottom panel; see also Figure 3-47 later in this chapter). These multiple mass analysis and fragmentation steps all take place within the same machine in about 0.1 seconds per selected parent ion. The fragmentation and subsequent mass analysis permit the sequences of short peptides (10 kb in the genome sequence. Biotin is represented by red flags. See E. Lieberman-Aiden, 2009, Science 326:289. (c) Heat map of chromosome conformation capture data for a region of chromosome 6 in mouse embryonic stem cells. The sequence from 49 to 54 Mb from the left end of chromosome 6 is represented on both axes. Each pixel shows data from a 10-kb sequence. The number of times a sequence from one 10-kb region indicated on the x axis was ligated to a sequence from a second 10-kb region on the y axis is indicated by the intensity of red color, as shown in the key at the lower left. A value of 100 (dark red) indicates that a sequence anywhere within the 10-kb region on the x axis was found ligated to a sequence from anywhere in the 10-kb region on the y axis 100 times. Since the probability that two ends generated by sonication will be ligated together is higher for ends that are close together than for ends that are far apart, the intensity of the red color in any pixel indicates the relative proximity of the sequences in the two 10-kb intervals in the nuclei at the time of crosslinking. Inset shows a model of chromatin folding that is consistent with these results. [Part (c) data from J. R. Dixon, 2012, Nature 485:376.]
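The heat map described in this caption is, at heart, a table of counts. The sketch below illustrates how such counts could be tallied, assuming ligation junctions have already been mapped to pairs of genomic coordinates; the junction list, the 60-kb region, and the two "domains" are hypothetical toy data, and `contact_matrix` and `mean_contacts` are illustrative names, not part of any published pipeline. It bins junctions into 10-kb pixels and shows a topological domain as a block of higher counts than the block linking it to its neighbor.

```python
# Toy sketch: bin chromosome-conformation-capture ligation junctions into a
# symmetric contact matrix, like the heat map of Figure 8-34c.

BIN_SIZE = 10_000  # 10-kb bins, as in the heat map described above


def contact_matrix(pairs, start, end, bin_size=BIN_SIZE):
    """Count ligation junctions (pos1, pos2) that fall within [start, end)."""
    n_bins = (end - start) // bin_size
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        if not (start <= pos1 < end and start <= pos2 < end):
            continue  # junction lies partly outside the region of interest
        i = (pos1 - start) // bin_size
        j = (pos2 - start) // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # one junction links both bins: keep it symmetric
    return matrix


def mean_contacts(matrix, rows, cols):
    """Average count over the block of pixels rows x cols."""
    values = [matrix[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


# Hypothetical junctions in a 60-kb toy region: bins 0-2 form "domain A",
# bins 3-5 form "domain B".
junctions = [
    (5_000, 25_000), (12_000, 28_000), (1_000, 15_000),  # within A
    (35_000, 55_000), (41_000, 52_000),                  # within B
    (25_000, 35_000),                                    # A to B
]
m = contact_matrix(junctions, 0, 60_000)
within_a = mean_contacts(m, range(0, 3), range(0, 3))
a_to_b = mean_contacts(m, range(0, 3), range(3, 6))
print(f"within A: {within_a:.2f}  A-to-B: {a_to_b:.2f}")  # within A: 0.67  A-to-B: 0.11
```

Real data sets also show a steep fall in contacts with genomic distance, which is why domains are recognized by comparing blocks of pixels rather than single pixels.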
a median size of 880 kb. For example, sequences in the interval of chromosome 6 between 50.9 Mb and 51.3 Mb (see Figure 8-34c, topological domain A) are much more likely to be ligated to each other than to sequences in the interval from 51.3 Mb to 52.2 Mb (topological domain B), or to sequences from any of the other topological domains that are apparent. In situ hybridization studies showed that sequences within a topological domain lie much closer to each other in the fixed cell nucleus than to sequences the same distance away in base pairs, but in a neighboring topological domain. These results have been interpreted to indicate that the chromatin fiber is folded into topological domains, as represented in the inset of Figure 8-34c. The topological domains are separated by shorter regions of chromatin, called boundary elements, that do not interact with distant regions of chromatin. Since the topological domains are on the order of 200 kb–1.5 Mb in length, they are long enough to contain several average-sized genes. The topological domains identified by these chromatin conformation capture assays may correspond to
the loops of chromatin observed in the lampbrush chromosomes described above, which are not constrained by the nuclear envelope of a vastly smaller nucleus and have an opportunity to unfold (see Figure 8-31). Current research is exploring what protein-DNA interactions might be responsible for establishing boundary elements between topological domains. As we will see in Chapter 9, related chromosome conformation capture techniques have provided strong evidence that proteins bound to enhancers interact with proteins bound to promoters many kilobases away.
Metaphase Chromosome Structure
Condensation of chromosomes during prophase (see Figure 18-37) may involve the formation of many more loops of chromatin, so that the length of each loop is greatly reduced compared with chromatin loops in interphase cells. As a result, chromosomes condense into structures of much greater width than interphase chromosomes and decrease in length severalfold, generating the condensed chromosomes observed during metaphase (Figure 8-35). The geometry of chromatin in metaphase chromosomes is not well understood. Experiments with frog egg extracts have shown that a protein complex called condensin, composed of SMC subunits (see Figure 8-32 and Chapter 19), contributes to chromosome condensation using energy from ATP hydrolysis. Microscopic analysis of mammalian chromosomes as they condense during prophase indicates that in the initial period of prophase, the 30-nm chromatin fiber folds into a 100–130-nm chromonema fiber associated with the nuclear envelope (Figure 8-36). Chromonema fibers then fold into structures with a diameter of 200–250 nm, called middle prophase chromatids (Figure 8-36a, 3), which then fold into the 500–750-nm-diameter chromatids observed during metaphase when the nuclear envelope retracts into the endoplasmic reticulum (Figure 8-36a, 4; see also Chapter 19).
Ultimately, the full lengths of the two associated daughter chromosomes generated by DNA replication during the previous S phase of the cell cycle (see Figure 1-21) condense into bar-shaped structures (chromatids) that in most eukaryotes are linked at the central constriction called the centromere (see Figure 8-35). An electron micrograph of a section through a metaphase chromosome stained with anti-SMC antibodies linked to small gold spheres (Figure 8-36b) shows that condensin, proposed to be at the bases of chromatin loops (see Figure 8-32c), occupies approximately one-third of the chromatid diameter (Figure 8-36c, right), where it contributes to the shaping of each chromatid.
Additional Nonhistone Proteins Regulate Transcription and Replication
As we have seen, the total mass of the histones associated with DNA in chromatin is about equal to that of the DNA. Interphase chromatin and metaphase chromosomes also contain small amounts of a complex set of other proteins. For instance, thousands of different transcription factors are associated with interphase chromatin. The structure and
FIGURE 8-35 Typical metaphase chromosome. As seen in this scanning electron micrograph, the chromosome has replicated and comprises two chromatids, each containing one of two identical DNA molecules. The centromere, where the chromatids are attached at a constriction, is required for their separation late in mitosis. Special telomere sequences at the ends function in preventing chromosome shortening. [Andrew Syred/Science Source.]
function of these critical nonhistone proteins, which regulate transcription, are examined in Chapter 9. Other low-abundance nonhistone proteins associated with chromatin regulate DNA replication during the eukaryotic cell cycle (see Chapter 19). A few other nonhistone DNA-binding proteins are present in much larger amounts than the transcription or replication factors. Some of these proteins exhibit high mobility during electrophoretic separation and thus have been designated HMG (high-mobility group) proteins. When genes encoding the most abundant HMG proteins are deleted from yeast cells, normal transcription is disturbed in most genes examined. Some HMG proteins have been found to assist in the cooperative binding of several transcription factors to specific DNA sequences that are close to each other, stabilizing multiprotein complexes that regulate transcription of a neighboring gene, as discussed in Chapter 9.
8.5 Structural Organization of Eukaryotic Chromosomes
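[Figure 8-36 panels: (a) successive stages of condensation (1–4), with fibers of 30 nm, 100–130 nm (chromonema fiber, associated with the nuclear envelope), 200–250 nm (middle prophase chromatid), and 500–750 nm (metaphase chromatid); (b) immunogold electron micrograph, scale bar 0.5 μm; (c) folding model.]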
FIGURE 8-36 Model for mitotic chromosome condensation. (a) Stages of chromosome condensation during mitosis. Changes in large-scale chromatin folding (blue) versus distribution of Smc2, a subunit of condensin (red), from early prophase (1) to middle prophase (2) to late prophase (3) to metaphase (4). (b) Transmission electron micrograph of immunogold staining of Smc2 in a section through a metaphase chromosome reveals axial staining of Smc2 of about 0.15–0.2 μm in width. (c) “Hierarchical folding, axial glue” model of metaphase chromosome structure. (Left) 30-nm fiber folds into 100–130-nm chromonema fiber, which folds into 200–250-nm middle prophase chromatid, which folds into 500–750-nm metaphase chromatid. Only one chromatid is shown. (Right) Axial condensin distribution (red) occupies approximately one-third of the chromatid diameter, acting as a cross-linking “glue” to stabilize the structure of the metaphase chromosome. [Part (b) © 2004 Kireeva et al., The Journal of Cell Biology. 166:775-785. doi: 10.1083/jcb.200406049.]
CHAPTER 8
Genes, Genomics, and Chromosomes
KEY CONCEPTS OF SECTION 8.5
Structural Organization of Eukaryotic Chromosomes
• In eukaryotic cells, DNA is associated with about an equal mass of histone proteins in a highly condensed nucleoprotein complex called chromatin. The building block of chromatin is the nucleosome, consisting of a histone octamer around which is wrapped about 147 bp of DNA (see Figure 8-24).
• The chromatin in transcriptionally inactive regions of DNA within cells is thought to exist in a condensed, 30-nm fiber form and higher-order structures built from it (see Figures 8-25 and 8-36).
• The chromatin in transcriptionally active regions of DNA within cells is thought to exist in an open, extended form.
• The flexible, intrinsically disordered N-terminal tails of histones, particularly H4 lysine 16, are required for beads-on-a-string chromatin (the 10-nm chromatin fiber) to fold into a 30-nm fiber.
• Histone tails can be modified by acetylation, methylation, phosphorylation, and ubiquitinylation (see Figure 8-26). These modifications influence chromatin structure by regulating the binding of histone tails to other, less abundant chromatin-associated proteins.
• The reversible acetylation and deacetylation of lysine residues in the N-terminal tails of the core histones regulate chromatin condensation. Proteins involved in transcription, replication, and repair, and enzymes such as DNase I, can more easily access chromatin with hyperacetylated histone tails (euchromatin) than chromatin with hypoacetylated histone tails (heterochromatin).
• When metaphase chromosomes decondense during interphase, areas of heterochromatin remain much more condensed than regions of euchromatin.
• Heterochromatin protein 1 (HP1) uses a chromodomain to bind to histone H3 trimethylated at lysine 9. The chromoshadow domain of HP1 associates with itself and with the histone methyltransferase that methylates H3 lysine 9. These interactions cause condensation of the 30-nm chromatin fiber and spreading of the heterochromatic structure along the chromosome until a boundary element is encountered (see Figure 8-29).
• One X chromosome in nearly every cell of mammalian females consists of highly condensed heterochromatin, resulting in repression of expression of nearly all genes on that inactive chromosome. This inactivation results in dosage compensation so that genes on the X chromosome are expressed at the same level in both males and females.
• Each eukaryotic chromosome contains a single DNA molecule packaged into nucleosomes and folded into a 30-nm chromatin fiber, which is associated with structural maintenance of chromosomes (SMC) proteins thought to organize it into the megabase loops observed by hybridization to fluorescently labeled DNA probes and in lampbrush chromosomes observed in oocytes (see Figures 8-30, 8-31, and 8-32c). Additional folding of the chromosomes further compacts the structure into the highly condensed form of metaphase chromosomes (see Figure 8-36).
• In interphase cells, chromosomes are localized to largely non-overlapping “territories” in the nucleus (see Figure 8-33).
• Chromosome conformation capture methods indicate that chromatin is organized into topological domains separated by boundary elements (see Figure 8-34c). These topological domains may correspond to the loops in lampbrush chromosomes observed in the giant nuclei of oocytes (see Figure 8-31) and inferred by studies of fluorescently labeled DNA probes hybridized to interphase nuclei (see Figure 8-30).
• During mitosis, chromosomes condense greatly, decreasing their lengths severalfold and increasing their diameter to generate metaphase chromosomes visible by light microscopy. The geometry of the 30-nm chromatin fiber in metaphase chromosomes is not well understood, but intermediates of increasing diameter and decreasing length have been observed during prophase.
8.6 Morphology and Functional Elements of Eukaryotic Chromosomes
Having examined the detailed structural organization of chromosomes in the previous section, we now view them from a more global perspective. Early microscopic observations on the number and size of chromosomes and their staining patterns led to the discovery of many important general characteristics of chromosome structure. Researchers subsequently identified specific regions of chromosomes that are critical to their replication and segregation to daughter cells during cell division. In this section, we discuss these functional elements of chromosomes and consider how chromosomes evolved through rare rearrangements of ancestral chromosomes.
Chromosome Number, Size, and Shape at Metaphase Are Species-Specific
In interphase cells, as noted previously, chromosome territories can be visualized with chromosome-specific fluorescently labeled hybridization probes (see Figure 8-33), but the detailed structure of individual chromosomes cannot be observed, even with the aid of electron microscopy. During mitosis and meiosis, however, the chromosomes condense and become visible in the light microscope. Therefore, almost all cytogenetic work (i.e., studies of chromosome morphology) has been done with condensed metaphase chromosomes obtained from dividing cells: either somatic cells in mitosis or germ cells dividing by meiosis. The condensation of metaphase chromosomes probably results from several orders of folding of 30-nm chromatin fibers (see Figure 8-36). At the time of mitosis, cells have already progressed through the S phase of the cell cycle and have replicated their DNA. Consequently, the chromosomes that become visible during metaphase are duplicated structures. Each metaphase chromosome consists of two sister chromatids, which are linked at a constricted region, the centromere (see Figure 8-35). The number, sizes, and shapes of the metaphase chromosomes constitute the karyotype, which is distinctive for each species. In most organisms, all somatic cells have the same karyotype. However, species that appear quite similar can have very different karyotypes, indicating that similar genetic potential can be organized on chromosomes in very different ways. For example, two species of small deer, the Indian muntjac and Reeves muntjac, contain about the same total amount of genomic DNA. In one species, however, this DNA is organized into 22 pairs of homologous autosomes and two physically separate sex chromosomes.
In contrast, the other species contains the smallest number of chromosomes of any mammal, only three pairs of autosomes; one sex chromosome is physically separate, but the other is joined to the end of one autosome.
During Metaphase, Chromosomes Can Be Distinguished by Banding Patterns and Chromosome Painting
Certain dyes selectively stain some regions of metaphase chromosomes more intensely than other regions, producing characteristic banding patterns that are specific for individual chromosomes. The regularity of chromosome bands provides useful visible landmarks along the length of each chromosome and can help to distinguish chromosomes of similar size and shape, as we will see later in this section.
EXPERIMENTAL FIGURE 8-37 Human chromosomes are readily identified by chromosome painting. (a) Image of human chromosomes from a male cell in mitosis made by fluorescence in situ hybridization (FISH) using chromosome paint probes. (b) Alignment of these painted chromosomes by computer graphics to reveal the normal human male karyotype. [Courtesy of Dr. Michael R. Speicher.]
Today the method of chromosome painting greatly simplifies the identification and differentiation of individual chromosomes within a karyotype, many of which have similar sizes and shapes. This technique, a variation of fluorescence in situ hybridization (FISH), makes use of probes specific for sites scattered along the length of each chromosome. The probes are labeled with several different fluorescent dyes with distinct excitation and emission wavelengths. Probes specific for each chromosome are labeled with a predetermined fraction of each of the dyes. After the probes are hybridized to chromosomes and the excess removed, the sample is observed with a fluorescence microscope in which a detector determines the fraction of each dye present at each fluorescing position in the microscopic field. This information is conveyed to a computer, and a special program assigns a false-color image to each type of chromosome (Figure 8-37a). Computer graphics software allows the two homologs of each chromosome to be placed next to each other and numbered according to their decreasing size. Such an image clearly displays the cell’s karyotype (Figure 8-37b). Chromosome painting is a powerful method for detecting an abnormal number of chromosomes, such as trisomy 21 in patients with Down syndrome, or chromosomal translocations that occur in rare individuals and in cancer cells (Figure 8-38). The use of probes with different ratios of fluorescent dyes that hybridize to distinct positions along each normal human chromosome allows finer structural analysis of the chromosomes that can more readily reveal deletions or duplications of chromosomal regions. The chapter-opening figure illustrates the use of such multicolor FISH in analysis of the karyotype of a normal human female.
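The classification step described above, in which a program matches the measured dye fractions at each position to a chromosome type, can be sketched as a nearest-profile assignment. This is a minimal illustration under stated assumptions: three dyes, six chromosome types, and made-up dye fractions and false colors (a real scheme must distinguish all 24 human chromosome types: 22 autosomes, X, and Y). The names `REFERENCE_PROFILES`, `classify`, and `paint` are hypothetical, not the software used to produce Figure 8-37.

```python
import math

# Hypothetical labeling scheme: fraction of each of three dyes in the
# probe set for each chromosome type (each profile sums to 1).
REFERENCE_PROFILES = {
    "chr1": (1.0, 0.0, 0.0),
    "chr2": (0.0, 1.0, 0.0),
    "chr3": (0.0, 0.0, 1.0),
    "chr4": (0.5, 0.5, 0.0),
    "chr5": (0.5, 0.0, 0.5),
    "chr6": (0.0, 0.5, 0.5),
}

# Hypothetical false colors (RGB) assigned to each chromosome type.
FALSE_COLORS = {
    "chr1": (255, 0, 0),
    "chr2": (0, 255, 0),
    "chr3": (0, 0, 255),
    "chr4": (255, 255, 0),
    "chr5": (255, 0, 255),
    "chr6": (0, 255, 255),
}


def dye_fractions(intensities):
    """Convert raw per-dye intensities at one position into fractions."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("no fluorescence at this position")
    return tuple(value / total for value in intensities)


def classify(intensities, profiles=REFERENCE_PROFILES):
    """Assign the chromosome type whose expected dye fractions are closest."""
    fractions = dye_fractions(intensities)
    return min(profiles, key=lambda name: math.dist(fractions, profiles[name]))


def paint(pixels):
    """Map each fluorescing pixel to the false color of its chromosome type."""
    return [FALSE_COLORS[classify(p)] for p in pixels]


print(classify((90, 80, 5)))   # chr4 (mostly dyes 1 and 2, in similar amounts)
print(classify((10, 0, 300)))  # chr3 (almost pure dye 3)
```

Working with fractions rather than raw intensities makes the assignment insensitive to how brightly a given region hybridized, which matches the text's emphasis on the fraction of each dye at each position.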
[Figure 8-38 panels: (a) banded and (b) painted normal chromosomes 9 and 22 and the derivative chromosomes der(9) and der(22), the Philadelphia chromosome.]
EXPERIMENTAL FIGURE 8-38 Chromosomal translocations can be analyzed using chromosome painting. Characteristic chromosomal translocations are associated with certain genetic disorders and specific types of cancers. For example, in nearly all patients with chronic myelogenous leukemia, the leukemic cells contain the Philadelphia chromosome, a shortened chromosome 22 [der(22)], and an abnormally long chromosome 9 [der(9)] (“der” stands for derivative). These forms result from a translocation between normal chromosomes 9 and 22. This translocation can be detected (a) by classical banding analysis or (b) by chromosome painting. [Part (b) courtesy of J. Rowley and R. Espinosa.]
Chromosome Painting and DNA Sequencing Reveal the Evolution of Chromosomes
Analysis of chromosomes from different species has provided considerable insight into how chromosomes evolved. For example, hybridization of chromosome paint probes for chromosome 16 of the tree shrew (Tupaia belangeri) to tree shrew metaphase chromosomes revealed the two copies of chromosome 16, as expected (Figure 8-39a). However, when the same chromosome paint probes were hybridized to human metaphase chromosomes, most of the probes hybridized to the long arm of chromosome 10 (Figure 8-39b). Further, when multiple probes for the long arm of human chromosome 10 with different fluorescent dye labels were hybridized to tree shrew metaphase chromosomes, these probes bound to sequences along tree shrew chromosome 16 in the same order in which they bind to human chromosome 10. These results indicate that during the evolution of humans and tree shrews from a common ancestor that lived as recently as 85 million years ago, a long, continuous DNA sequence on one of the ancestral chromosomes became chromosome 16 in tree shrews, but evolved into the long arm of chromosome 10 in humans. The phenomenon of genes occurring in the same order on a chromosome in two different species is referred to as conserved synteny (derived from Greek for “on the same ribbon”). The presence of two or more genes in a common chromosomal region in two or more species indicates a conserved syntenic segment. The relationships between the chromosomes of many primates have been determined by cross-species application of chromosome paint probes, as shown for human and tree shrew in Figure 8-39a, b. Using these relationships, as well as higher-resolution analyses of regions of synteny by DNA sequencing and other methods, it has been possible to propose the karyotype of the common ancestor of all primates based on the minimum number of chromosomal rearrangements necessary to generate the regions of synteny in chromosomes of contemporary primates.
Human chromosomes are thought to have been derived from a common primate ancestor with 23 autosomes plus the X and Y sex chromosomes by several different mechanisms (Figure 8-39c). Some human chromosomes were derived without large-scale rearrangements of chromosome structure. Others are thought to have evolved by breakage of an ancestral chromosome into two chromosomes or, conversely, by fusion of two ancestral chromosomes. Still other human chromosomes appear to have been generated by exchanges of parts of the arms of distinct chromosomes, that is, by reciprocal translocation involving two ancestral chromosomes. Analysis of regions of conserved synteny between the chromosomes of many mammals indicates that chromosomal rearrangements by breakage, fusion, and translocations occurred rarely in mammalian evolution, about once every 5 million years. When such chromosomal rearrangements did occur, they very likely contributed to the evolution of new species that could not interbreed with the species from which they evolved. Chromosomal rearrangements similar to those inferred for the primate lineage have been inferred for other groups of related organisms, including the invertebrate, plant, and fungus lineages. The excellent agreement between predictions of evolutionary relationships based on analysis of syntenic regions of chromosomes from organisms with related morphology (i.e., among mammals, among insects with similar body organization, among similar plants, etc.) and evolutionary relationships based on the fossil record and on the extent of divergence of DNA sequences for homologous genes is a strong argument for the validity of evolution as the process that generated the diversity of contemporary organisms.
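Conserved syntenic segments, as defined above, can also be found computationally by walking along the gene order of one species and asking whether successive genes stay neighbors in the other. The sketch below is a simplified illustration, assuming one-to-one gene correspondences between the two species; the gene names (g1 to g7) and chromosome labels are hypothetical, and `syntenic_blocks` is an illustrative function, not a published tool. Real analyses must also tolerate small local rearrangements, gene gains and losses, and paralogs.

```python
def syntenic_blocks(chrom_a, genome_b, min_genes=2):
    """Find runs of genes that stay adjacent, in the same or inverted order,
    between one chromosome of species A and the chromosomes of species B.

    chrom_a:  gene names in order along a chromosome of species A.
    genome_b: {chromosome name: gene names in order} for species B.
    Returns a list of (species-B chromosome, genes in the block).
    """
    # Position of every species-B gene: (chromosome, index along it).
    location = {
        gene: (chrom, index)
        for chrom, genes in genome_b.items()
        for index, gene in enumerate(genes)
    }
    runs = []
    current, step = [], 0  # step: +1 same orientation, -1 inverted, 0 undecided
    for gene in chrom_a:
        if gene not in location:
            continue  # no counterpart in species B; skip it
        if current:
            prev_chrom, prev_index = location[current[-1]]
            chrom, index = location[gene]
            delta = index - prev_index
            if chrom == prev_chrom and abs(delta) == 1 and step in (0, delta):
                current.append(gene)  # still adjacent in B, same direction
                step = delta
                continue
            runs.append(current)  # adjacency broken: close the current run
        current, step = [gene], 0
    if current:
        runs.append(current)
    # Two or more genes sharing a region define a conserved syntenic segment.
    return [(location[run[0]][0], run) for run in runs if len(run) >= min_genes]


# Hypothetical gene orders, loosely modeled on the tree shrew/human example.
tree_shrew_16 = ["g1", "g2", "g3", "g4", "g5", "g6"]
human = {"10q": ["g1", "g2", "g3", "g4"], "other": ["g6", "g5", "g7"]}
print(syntenic_blocks(tree_shrew_16, human))
# [('10q', ['g1', 'g2', 'g3', 'g4']), ('other', ['g5', 'g6'])]
```

The second block is detected even though its order is inverted in the other species, because an inversion within a segment keeps the genes together on one chromosome.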
[Figure 8-39 panels: (a) tree shrew and (b) human metaphase chromosomes hybridized with tree shrew chromosome 16 paint probes; (c) chromosomes of the proposed common primate ancestor (1–23 and X, top) and of Homo sapiens (bottom).]
FIGURE 8-39 Evolution of primate chromosomes. (a) Chromosome paint probes (yellow) for chromosome 16 of the tree shrew (T. belangeri, distantly related to humans) hybridized to tree shrew metaphase chromosomes (red). (b) The same tree shrew chromosome 16 paint probes hybridized to human metaphase chromosomes. (c) Proposed evolution of human chromosomes (bottom) from the chromosomes of the common ancestor of all primates (top). The proposed common primate ancestor chromosomes are numbered according to their sizes, with each chromosome represented by a different color. The human chromosomes are also numbered according to their relative sizes and labeled with colors taken from the colors of the proposed common primate ancestor chromosomes from which they were derived. Small numbers to the left of the colored regions of the human chromosomes indicate the number of the ancestral chromosome from which the region was derived. Various human chromosomes were derived from the proposed chromosomes of the common primate ancestor without significant rearrangements (e.g., human chromosome 1); by fusion (e.g., human chromosome 2 by fusion of ancestral chromosomes 9 and 11); by breakage (e.g., human chromosomes 14 and 15 by breakage of ancestral chromosome 5); or by chromosomal translocations (e.g., a reciprocal translocation between ancestral chromosomes 14 and 21 generated human chromosomes 12 and 22). [Parts (a) and (b) republished with permission of Springer, from Muller, S., et al., “Defining the ancestral karyotype of all primates by multidirectional chromosome painting between tree shrews, lemurs and humans,” Chromosoma, 1999, 108(6):393-400; permission conveyed through Copyright Clearance Center. Part (c) data from L. Froenicke, 2005, Cytogenet. Genome Res. 108:122.]
Interphase Polytene Chromosomes Arise by DNA Amplification
The larval salivary glands of Drosophila species and other dipteran insects contain enlarged interphase chromosomes that are visible in the light microscope. When fixed and stained with a dye that stains DNA, these polytene chromosomes are characterized by a large number of reproducible, well-demarcated bands, which have been assigned standardized numbers (Figure 8-40a). The densely staining bands represent regions where the chromatin is more condensed, and the light interband areas are regions where the chromatin is less condensed. Although the molecular mechanisms that control the formation of bands in polytene chromosomes are not yet understood, the highly reproducible banding pattern seen in Drosophila salivary gland chromosomes provides an extremely powerful method for locating specific DNA sequences along the chromosomes of this species. Not only are chromosomal translocations and inversions readily detectable in polytene chromosomes, but specific chromosomal proteins can be localized on interphase polytene chromosomes by immunostaining with specific antibodies raised against them (see Figure 9-15). Insect polytene chromosomes offer one of the only experimental systems in all of nature in which such immunolocalization studies on decondensed interphase chromosomes are possible.
A generalized amplification of DNA gives rise to the polytene chromosomes found in the salivary glands of Drosophila. This process, termed polytenization, occurs when the DNA repeatedly replicates everywhere except at the telomeres and centromere, but the daughter chromosomes do not separate. The result is an enlarged chromosome composed of many parallel copies of itself, 1024 (2¹⁰) of them resulting from ten such replications in Drosophila melanogaster salivary glands (Figure 8-40b). The amplification of chromosomal DNA greatly increases gene copy number, presumably to supply sufficient mRNA for protein synthesis in the massive salivary gland cells. The bands in Drosophila polytene chromosomes each represent some 50,000–100,000 bp, and the banding pattern reveals that the condensation of DNA varies greatly along these relatively short regions of an interphase chromosome.
[Figure 8-40 panels: (a) polytene chromosome arms X, 2L, 2R, 3L, 3R, and 4 joined at the chromocenter; (b) amplification diagram marking the centromere and the two telomeres.]
EXPERIMENTAL FIGURE 8-40 Banding on Drosophila polytene salivary gland chromosomes. (a) In this light micrograph of Drosophila melanogaster larval salivary gland chromosomes, four chromosomes can be observed (X, 2, 3, and 4), with a total of approximately 5000 distinguishable bands. The banding pattern results from reproducible patterns of DNA and protein packing within each site along the chromosome. Dark bands are regions of more highly compacted chromatin. The centromeres of all four chromosomes often appear fused at the chromocenter. The tips of chromosomes 2 and 3 are labeled (L = left arm; R = right arm), as is the tip of the X chromosome. (b) The pattern of amplification of chromosome 4 during five replications. Double-stranded DNA is represented by a single line. Telomere and centromere DNA are not amplified. In salivary gland polytene chromosomes, each parental chromosome undergoes about 10 replications (2¹⁰ = 1024 strands). See C. D. Laird et al., 1973, Cold Spring Harbor Symp. Quant. Biol. 38:311. [Part (a) courtesy of Joseph Gall, Carnegie Institution for Science.]
Three Functional Elements Are Required for Replication and Stable Inheritance of Chromosomes
Although eukaryotic chromosomes differ in length and number among species, cytogenetic studies have shown that they all behave similarly at the time of cell division. Moreover, any eukaryotic chromosome must contain three functional elements in order to replicate and segregate correctly: (1) replication origins at which DNA polymerases and other proteins initiate synthesis of DNA (see Figures 5-31 and 5-33); (2) the centromere, the constricted region required for proper segregation of daughter chromosomes; and (3) the two ends, or telomeres. The yeast transformation studies depicted in Figure 8-41 demonstrated the functions of these three chromosomal elements and established their importance for chromosome function. As discussed in Chapter 5, replication of DNA begins from sites that are scattered throughout eukaryotic chromosomes. The yeast genome contains many 100-bp sequences, called autonomously replicating sequences (ARSs), that act as replication origins. The observation that insertion of an ARS into a circular plasmid allows the plasmid to replicate in yeast cells provided the first functional identification of replication origins in eukaryotic DNA (Figure 8-41a). Even though circular ARS-containing plasmids can replicate in yeast cells, only about 5–20 percent of progeny cells contain the plasmid because mitotic segregation of the plasmids is faulty. However, plasmids that also carry a CEN sequence, derived from the centromeres of yeast chromosomes, segregate equally, or nearly so, to both mother and daughter cells during mitosis (Figure 8-41b). If circular plasmids containing an ARS and a CEN sequence are cut once with a restriction enzyme, the resulting linear plasmids do not transform leu− yeast cells into LEU+ cells that form colonies on medium lacking leucine unless they contain special telomeric (TEL) sequences ligated to their ends (Figure 8-41c).
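The logic of these transformation experiments can be restated as a small set of rules. The sketch below encodes only the qualitative outcomes given in the text and in Figure 8-41 (growth on medium lacking leucine, and good or poor mitotic segregation); the `Plasmid` class and `predict` function are hypothetical illustrations, and the case of a linear, telomere-capped plasmid lacking CEN was not part of the experiments described here, so its handling is an explicit assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plasmid:
    """A LEU-carrying plasmid and the chromosomal elements it contains."""
    ars: bool = False            # autonomously replicating sequence
    cen: bool = False            # centromere sequence
    linear: bool = False         # cut once with a restriction enzyme
    tel_both_ends: bool = False  # telomeric (TEL) sequences ligated to both ends


def predict(p: Plasmid) -> Tuple[bool, Optional[str]]:
    """Return (LEU+ colonies on medium lacking leucine, mitotic segregation)."""
    if not p.ars:
        return False, None       # no replication origin: plasmid is not replicated
    if p.linear and not p.tel_both_ends:
        return False, None       # linear plasmid lacking TEL is unstable
    if p.cen:
        return True, "good"      # >90% of cells keep the plasmid
    # ARS without CEN: replicated but poorly segregated (5-20% of cells).
    # ASSUMPTION: a linear ARS + TEL plasmid without CEN is treated the same way;
    # that case is not among the experiments summarized in Figure 8-41.
    return True, "poor"


experiments = {
    "(a) LEU only": Plasmid(),
    "(a) ARS": Plasmid(ars=True),
    "(b) ARS + CEN": Plasmid(ars=True, cen=True),
    "(c) linear ARS + CEN": Plasmid(ars=True, cen=True, linear=True),
    "(c) linear ARS + CEN + TEL": Plasmid(ars=True, cen=True, linear=True,
                                          tel_both_ends=True),
}
for label, plasmid in experiments.items():
    print(label, predict(plasmid))
```

Ordering the checks from replication (ARS) to end protection (TEL) to segregation (CEN) mirrors the order in which the three elements were identified in panels (a) to (c).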
The first successful experiments involving transfection of yeast cells with linear plasmids were achieved by using the ends of a DNA molecule that was known to replicate as a linear molecule in the ciliated protozoan Tetrahymena. During part of the life cycle of Tetrahymena, much of the nuclear DNA is repeatedly copied in short pieces to form a so-called macronucleus. One of these repeated fragments was identified as a dimer of ribosomal DNA, the ends of which contained a repeated sequence (G4T2)n. When a section of this repeated TEL sequence was ligated to the ends of linear yeast plasmids containing ARS and CEN, replication and good segregation of the linear plasmids occurred. This first cloning and characterization of telomeres garnered the Nobel Prize in Physiology or Medicine in 2009.
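These experiments hinged on linear plasmids carrying a (G4T2)n tract at each end. The sketch below shows one way to check a construct for such tracts, under stated assumptions: the G-rich strand of each telomere runs 5′→3′ toward the chromosome end, so on the top strand the left-end repeat appears as its reverse complement; the repeat unit is written in the text's G4T2 phase; and the three-copy threshold and all sequences are hypothetical toy values, not the actual Tetrahymena or yeast sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
TEL_UNIT = "GGGGTT"  # one unit of the (G4T2)n repeat named in the text


def reverse_complement(seq: str) -> str:
    """Return the other strand of a DNA duplex, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_copies(seq: str, unit: str = TEL_UNIT) -> int:
    """Count tandem copies of `unit` ending exactly at the 3' end of `seq`."""
    count, end = 0, len(seq)
    while end >= len(unit) and seq[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count


def has_tel_both_ends(top_strand: str, min_copies: int = 3) -> bool:
    """Check a linear duplex, given as its top strand (5'->3'), for G-rich
    repeats running toward both ends.

    At the right end the G-rich strand is the top strand; at the left end it
    is the bottom strand, whose 3' end is the 3' end of the reverse complement.
    min_copies is a hypothetical threshold chosen for illustration.
    """
    return (terminal_copies(top_strand) >= min_copies
            and terminal_copies(reverse_complement(top_strand)) >= min_copies)


insert = "ACGTACGTAC"  # stands in for the ARS-CEN-LEU plasmid DNA
capped = reverse_complement(TEL_UNIT * 4) + insert + TEL_UNIT * 4
one_end = insert + TEL_UNIT * 4
print(has_tel_both_ends(capped))   # True
print(has_tel_both_ends(one_end))  # False
```

Checking both strands matters because a tract added to only one end would leave the other end unprotected, which corresponds to the unstable linear plasmids in Figure 8-41c.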
[Figure 8-41, summarized. Each plasmid carries the LEU gene from normal yeast and is transfected into leu− cells. For each plasmid the figure gives growth of progeny without leucine, mitotic segregation, and the conclusion:
(a) LEU alone: no growth. Conclusion: ARS required for plasmid replication. ARS + LEU: growth; poor segregation (5–20% of cells have plasmid). Conclusion: in presence of ARS, plasmid replication occurs, but mitotic segregation is faulty.
(b) ARS + LEU + genomic fragment (CEN): growth; good segregation (>90% of cells have plasmid). Conclusion: genomic fragment CEN required for good segregation.
(c) ARS-CEN-LEU plasmid linearized by a restriction enzyme: no growth. Conclusion: linear plasmid lacking TEL is unstable. Linear TEL-ARS-LEU-CEN-TEL plasmid: growth; good segregation. Conclusion: linear plasmids containing ARS and CEN behave like normal chromosomes if genomic fragment TEL is added to both ends.]
EXPERIMENTAL FIGURE 8-41 Yeast transformation experiments were used to identify the functional chromosomal elements necessary for normal chromosome replication and segregation. In these experiments, plasmids containing the LEU gene from normal yeast cells are constructed and introduced into leu− cells by transfection. If the plasmid is maintained in the leu− cells, the LEU gene on the plasmid transforms them into LEU+ cells, which can form colonies on medium lacking leucine. (a) Sequences that allow autonomous replication (ARS) of a plasmid were identified because their insertion into a plasmid vector containing a cloned LEU gene resulted in a high frequency of transformation to LEU+. However, even plasmids with ARS segregate poorly during mitosis and therefore are not passed to every daughter cell. (b) When randomly broken pieces of yeast DNA are inserted into plasmids containing ARS and LEU, some of the subsequently transfected cells produce large colonies. This indicates that efficient mitotic segregation of their plasmids allows continued growth of the daughter cells. The DNA recovered from plasmids in these large colonies contains yeast centromere (CEN) sequences. (c) When leu− yeast cells are transfected with linearized plasmids containing LEU, ARS, and CEN, no colonies grow. Addition of telomere (TEL) sequences to the ends of the linear DNA allows the linearized plasmids to replicate as new chromosomes that behave very much like a normal chromosome in both mitosis and meiosis. See A. W. Murray and J. W. Szostak, 1983, Nature 305:89, and L. Clarke and J. Carbon, 1985, Ann. Rev. Genet. 19:29.
Centromere Sequences Vary Greatly in Length and Complexity
Once the yeast centromere regions that confer mitotic segregation were cloned, their sequences could be determined and compared. The results revealed three regions (I, II, and III) that are conserved among the centromeres on different yeast chromosomes (Figure 8-42a). Short, fairly well-conserved nucleotide sequences are present in regions I and III. Region II does not have a specific sequence, but it is AT-rich and of fairly constant length. This constant length probably places regions I and III on the same side of a specialized centromere-associated histone octamer. This specialized octamer contains the usual histones H2A, H2B, and H4, but a variant form of histone H3. Centromeres from all eukaryotes similarly contain nucleosomes with a specialized, centromere-specific form of histone H3, called CENP-A in humans. In the simple kinetochore of S. cerevisiae, a protein complex called CBF3 associates with this specialized nucleosome. The CBF3 complex, in turn, associates with several copies of an elongated multiprotein complex called Ndc80 (Figure 8-42b). The Ndc80 complexes initially make lateral interactions with a spindle microtubule and subsequently interact with a Dam1 complex, which forms a ring around the end of the microtubule (Figure 8-42c). This interaction results in an end-on attachment of the centromere to the spindle microtubule.
S. cerevisiae has by far the simplest centromere known in nature. In the fission yeast S. pombe, centromeres are 40–100 kb in length and are composed of repeated copies of sequences similar to those in S. cerevisiae centromeres. Multiple copies of proteins homologous to those that interact with S. cerevisiae centromeres bind to these complex S. pombe centromeres. These proteins in turn attach the much longer S. pombe chromosomes to several microtubules of the mitotic spindle apparatus. In plants and animals, centromeres are megabases in length and are composed of multiple repeats of simple-sequence DNA. Human centromeres contain 2–4-Mb arrays of a 171-bp simple-sequence DNA called alphoid DNA, which is bound by nucleosomes containing the CENP-A histone H3 variant. They also contain other repeated simple-sequence DNAs. In higher eukaryotes, a complex protein structure called the kinetochore assembles at centromeres and associates with multiple mitotic spindle fibers during mitosis (see Figure 18-40). Homologs of many of the centromere-associated proteins found in the yeasts occur in humans and other higher eukaryotes. Some yeast kinetochore components, such as the Dam1 complex, lack clear homologs in higher eukaryotes based on amino acid sequence comparisons. For these, alternative complexes with similar properties have been proposed to function at kinetochores. The functions of the centromere and of the kinetochore proteins that bind to it during the segregation of sister chromatids in mitosis and meiosis are described in Chapters 18 and 19.
[Figure 8-42 panels, summarized:
(a) S. cerevisiae CEN sequence, with conserved region I (GTCACGTG), AT-rich region II (78–86 bp), and conserved region III (TGTTTCTGNTTTCCGAAA).
(b) The Ndc80 complex, with domains that associate with a microtubule and domains that associate with the CBF3 complex.
(c) Centromeric chromatin (CENP-A nucleosome) bound by the CBF3 complex and Ndc80 complexes, shown first in lateral attachment to a spindle microtubule. After lateral-to-end-on conversion via the Dam1 complex at the microtubule plus end, it is shown in end-on attachment; the spindle pole is indicated.]
FIGURE 8-42 Kinetochore-microtubule interaction in S. cerevisiae. (a) Sequence of the simple centromeres of S. cerevisiae. See L. Clarke and J. Carbon, 1985, Ann. Rev. Genet. 19:29. (b) Ndc80 complexes associate with both the microtubule and the CBF3 complex. (c) Diagram of the centromere-associated CBF3 complex and its associated Ndc80 complexes, which associate with a ring of Dam1 proteins at the end of a spindle microtubule. The Ndc80 complexes initially make lateral interactions with the side of a spindle microtubule (top) and then associate with the Dam1 ring, making an end-on attachment (bottom) to the microtubule. See T. U. Tanaka, 2010, EMBO J. 29:4070.
Addition of Telomeric Sequences by Telomerase Prevents Shortening of Chromosomes
Sequencing of telomeres from multiple organisms, including humans, has shown that most consist of short, repeated oligonucleotide sequences. These repeats have a high G content in the strand whose 3′ end lies at the end of the chromosome. The telomere repeat sequence in humans and other vertebrates is TTAGGG. These simple sequences are repeated at the very termini of chromosomes, for a total of a few hundred base pairs in yeasts and protozoans and a few thousand base pairs in vertebrates. The 3′ end of the G-rich strand extends 12–16 nucleotides beyond the 5′ end of the complementary C-rich strand. This single-stranded region is bound by specific proteins that protect the ends of linear chromosomes from attack by exonucleases.
The need for a specialized region at the ends of eukaryotic chromosomes becomes apparent when we consider two facts: all known DNA polymerases elongate DNA chains at the 3′ end, and all require an RNA or DNA primer. As the replication fork approaches the end of a linear chromosome, synthesis of the leading strand continues to the end of the DNA template strand, completing one daughter DNA double helix. However, because the lagging-strand template is copied discontinuously, it cannot be replicated in its entirety (Figure 8-43). When the final RNA primer is removed, there is no upstream strand from which DNA polymerase can extend to fill the resulting gap. Without some special mechanism, the daughter DNA strand resulting from lagging-strand synthesis would be shortened at each cell division.
The problem of telomere shortening is solved by an enzyme that adds telomeric repeat sequences to the ends of each chromosome. The enzyme is a protein–RNA complex called telomere terminal transferase, or telomerase. As we will see, the telomerase-associated RNA serves as the template for addition of deoxyribonucleotides to the ends of telomeres. Consequently, the source of the enzyme, and not the source of the telomeric DNA primer, determines the sequence added.
That the enzyme's RNA, rather than the DNA primer, dictates the added sequence was shown by transforming Tetrahymena with a mutated form of the gene encoding the telomerase-associated RNA. The resulting telomerase added a DNA sequence complementary to the mutated RNA sequence to the ends of telomeric primers. Thus telomerase is a specialized form of reverse transcriptase that carries its own internal RNA template to direct DNA synthesis. This work on the structure and function of telomeres was also recognized by the 2009 Nobel Prize in Physiology or Medicine. Figure 8-44 depicts how telomerase, by reverse transcription of its associated RNA, elongates the single-stranded 3′ end of the G-rich strand mentioned above.
Cells from knockout mice that cannot produce the telomerase-associated RNA exhibit no telomerase activity, and their telomeres shorten successively with each cell generation. Such mice can breed and reproduce normally for three generations before the long telomere repeats become substantially eroded. Then the absence of telomere DNA results in adverse effects, including fusion of chromosome termini and chromosome loss. By the fourth generation, the reproductive potential of these knockout mice declines, and they cannot produce offspring after the sixth generation.
The human genes encoding the telomerase protein and the telomerase-associated RNA are active in germ cells and stem cells. They are turned off in most cells of adult tissues, which either replicate only a limited number of times or never replicate again (the latter are called postmitotic cells). However, these genes are activated in most human cancer cells, where telomerase is required for the multiple cell divisions necessary to form a tumor. This observation has stimulated a search for inhibitors of human telomerase as potential therapeutic agents for treating cancer.
While telomerase prevents telomere shortening in most eukaryotes, some organisms use alternative strategies. Drosophila species maintain telomere lengths by the regulated insertion of non-LTR retrotransposons into telomeres. This is one of the few instances in which a mobile element has a specific function in its host organism.
[Figure 8-43 diagram, summarized. It shows the right end of a linear chromosome. The G-rich parent strand ends …GGGGTTGGGG-3′ and the C-rich strand ends …CCCCAACCCCAACCC-5′. Leading-strand DNA synthesis by polymerase runs to the chromosome end. Lagging-strand DNA synthesis from RNA primers is followed by primer removal, gap fill-in, and ligation, but the terminal gap is not filled, leaving a shortened end.]
FIGURE 8-43 Standard DNA replication leads to loss of DNA at the 5′ end of each strand of a linear DNA molecule. Replication of the right end of a linear DNA is shown; the same process occurs at the left end (as can be seen by inverting the figure). As the replication fork approaches the end of the parental DNA molecule, the leading strand can be synthesized all the way to the end of the template strand without the loss of deoxyribonucleotides. However, because synthesis of the lagging strand requires RNA primers, the right end of the lagging daughter strand initially consists of ribonucleotides. Once this final primer is removed, the resulting gap cannot be filled, because no upstream DNA 3′ end is available for a replicative DNA polymerase to extend. Alternative mechanisms must be used to prevent successive shortening of the lagging strand with each round of replication.
[Figure 8-44 diagram, summarized. The telomerase RNA template (…AACCCCAAC…) is base-paired to the single-stranded 3′ end of the G-rich strand (…GGGGTTGGGGTTG-3′). Step 1, elongation; step 2, translocation; step 3, elongation.]
FIGURE 8-44 Mechanism of action of telomerase. The single-stranded 3′ terminus of a telomere is extended by telomerase, counteracting the inability of the DNA replication mechanism to synthesize the extreme terminus of linear DNA. Telomerase elongates this single-stranded end by a reiterative reverse-transcription mechanism. The action of the telomerase from the protozoan Tetrahymena, which adds a T2G4 repeat unit, is depicted here; other telomerases add slightly different sequences. The telomerase contains an RNA template (red) that base-pairs to the 3′ end of the lagging-strand template. The telomerase catalytic site then adds deoxyribonucleotides TTG (blue), using the RNA molecule as a template (step 1). The strands of the resulting DNA-RNA duplex are then thought to slip (translocate) relative to each other. As a result, the TTG sequence at the 3′ end of the replicating DNA base-pairs to the complementary RNA sequence in the telomerase RNA (step 2). The 3′ end of the replicating DNA is then again extended by telomerase (step 3). Telomerases can add multiple repeats by repetition of steps 2 and 3. DNA polymerase α-primase can prime synthesis of new Okazaki fragments on this extended template strand. The net result prevents shortening of the lagging strand at each cycle of DNA replication. See C. W. Greider and E. H. Blackburn, 1989, Nature 337:331.
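The reiterative cycle of telomerase action in Figure 8-44 (elongation, translocation, elongation) can be sketched as string manipulation. This is a minimal toy model under several simplifying assumptions:

- The RNA template is written 3′→5′, so each template base lines up with the DNA base it directs when both are read left to right.
- The first three template positions form the alignment region that pairs with the existing DNA 3′ end.
- The template string AACCCCAAC is read from the Figure 8-44 diagram, and its orientation is an assumption.
- The function name `telomerase_extend` is hypothetical.

```python
# RNA template base -> DNA base that telomerase incorporates opposite it
RNA_TO_DNA = {"A": "T", "U": "A", "C": "G", "G": "C"}


def telomerase_extend(g_strand: str, template_3to5: str, align_len: int, cycles: int) -> str:
    """Toy model of reiterative reverse transcription by telomerase.

    g_strand: G-rich DNA strand written 5'->3', so its 3' end is the last character.
    template_3to5: telomerase RNA template written 3'->5'; position i pairs with the
        i-th DNA base when both strings are read left to right (antiparallel pairing).
    align_len: number of template positions that base-pair with the existing 3' end.
    cycles: number of elongation + translocation rounds.
    """
    # DNA sequence that the alignment region of the template can pair with
    alignment = "".join(RNA_TO_DNA[b] for b in template_3to5[:align_len])
    # DNA copied from the rest of the template during one elongation step
    addition = "".join(RNA_TO_DNA[b] for b in template_3to5[align_len:])
    dna = g_strand
    for _ in range(cycles):
        # Base pairing: the DNA 3' end must match the template's alignment region
        if not dna.endswith(alignment):
            raise ValueError("3' end cannot base-pair with the template")
        # Elongation: reverse-transcribe the remaining template positions
        dna += addition
        # Translocation: for a template whose ends repeat the same sequence (as here),
        # the new 3' end again matches the alignment region, so the template can
        # re-pair there on the next iteration
    return dna


if __name__ == "__main__":
    start = "GGGGTTGGGGTTG"  # G-rich 3' end, as drawn in Figure 8-44
    extended = telomerase_extend(start, "AACCCCAAC", align_len=3, cycles=2)
    print(extended)  # GGGGTTGGGGTTGGGGTTGGGGTTG
```

Each cycle adds six nucleotides, one repeat unit, because the six template positions outside the alignment region are copied. The sketch ignores how the real enzyme regulates processivity and re-alignment.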
KEY CONCEPTS OF SECTION 8.6
Morphology and Functional Elements of Eukaryotic Chromosomes
• During metaphase, eukaryotic chromosomes become sufficiently condensed that they can be visualized individually in the light microscope.
• The chromosomal karyotype is characteristic of each species. Closely related species can have dramatically different karyotypes, indicating that similar genetic information can be organized on chromosomes in different ways.
• Banding analysis and chromosome painting are used to identify the different human metaphase chromosomes and to detect translocations and deletions (see Figures 8-37 and 8-38).
• Analysis of chromosomal rearrangements and regions of conserved synteny between related species allows scientists to make predictions about the evolution of chromosomes (see Figure 8-39c). The evolutionary relationships between organisms indicated by these studies are consistent with proposed evolutionary relationships based on the fossil record and DNA sequence analysis.
• The highly reproducible banding patterns of polytene chromosomes make it possible to visualize chromosomal deletions and rearrangements as changes in the normal pattern of bands.
• Three types of DNA sequences are required for a long linear DNA molecule to function as a chromosome: a replication origin, called ARS in yeast; a centromere (CEN) sequence; and two telomere (TEL) sequences at the ends of the DNA (see Figure 8-41).
• Telomerase, a protein–RNA complex, has a special reverse transcriptase activity that completes replication of telomeres during DNA synthesis (see Figure 8-44). In the absence of telomerase, the daughter DNA strand resulting from lagging-strand synthesis would be shortened at each cell division in most eukaryotes (see Figure 8-43).
Key Terms
centromere
chromatid
chromatin
DNA transposon
euchromatin
exon shuffling
fluorescence in situ hybridization (FISH)
gene family
genomics
heterochromatin
histones
karyotype
LINEs
long terminal repeats (LTRs)
nucleosome
open reading frame (ORF)
polytene chromosome
protein family
pseudogene
retrotransposon
simple-sequence (satellite) DNA
SINEs
SMC proteins
telomere
transcription unit
transposable (mobile) DNA element
Review the Concepts
1. Genes can be transcribed into mRNA, in the case of protein-coding genes, or into RNA, in the case of genes such as those that encode ribosomal or transfer RNAs. Define a gene. For the following characteristics, state whether they apply to (a) continuous, (b) simple, or (c) complex transcription units.
i. Found in eukaryotes
ii. Contain introns
iii. Capable of making only a single protein from a given gene
2. Sequencing of the human genome has revealed much about the organization of genes. Describe the differences between solitary genes, gene families, pseudogenes, and tandemly repeated genes.
3. Much of the human genome consists of repetitious DNA. Describe the difference between microsatellite and minisatellite DNA. How is this repetitious DNA useful for identifying individuals by the technique of DNA fingerprinting?
4. Mobile DNA elements that can move or transpose to a new site directly as DNA are called DNA transposons. Describe the mechanism by which a bacterial DNA transposon, called an insertion sequence, can transpose.
5. Retrotransposons are a class of mobile elements that transpose via an RNA intermediate. Contrast the mechanism of transposition between retrotransposons that contain long terminal repeats (LTRs) and those that lack LTRs.
6. Discuss the role that transposons may have played in the evolution of modern organisms. What is exon shuffling? What role do transposons play in the process of exon shuffling?
7. What are paralogous and orthologous genes? What are some of the explanations for the finding that humans are a much more complex organism than the roundworm C. elegans, yet have only about 5 percent more protein-coding genes (21,000 versus 20,000)?
8. The DNA in a cell associates with proteins to form chromatin. What is a nucleosome? What role do histones play in nucleosomes? How are nucleosomes arranged in condensed 30-nm fibers?
9. How do chromatin modifications regulate transcription? What modifications are observed in regions of the genome that are being actively transcribed? In regions that are not actively transcribed?
10. What is FISH? Briefly describe how it works. How is FISH used to characterize chromosomal translocations associated with certain genetic disorders and specific types of cancers?
11. What is chromosome painting, and how is this technique useful? How can chromosome paint probes be used to analyze the evolution of mammalian chromosomes?
12. Certain organisms contain cells that possess polytene chromosomes. What are polytene chromosomes, where are they found, and what function do they serve?
13. Replication and segregation of eukaryotic chromosomes require three functional elements: replication origins, a centromere, and telomeres. How would a chromosome be affected if it lacked (a) replication origins or (b) a centromere?
14. Describe the problem that occurs during DNA replication at the ends of chromosomes. How are telomeres related to this problem?
References
Eukaryotic Gene Structure
Black, D. L. 2003. Mechanisms of alternative pre-messenger RNA splicing. Ann. Rev. Biochem. 72:291–336.
Davuluri, R. V., et al. 2008. The functional consequences of alternative promoter use in mammalian genomes. Trends Genet. 24:167–177.
Wang, E. T., et al. 2008. Alternative isoform regulation in human tissue transcriptomes. Nature 456:470–476.
Chromosomal Organization of Genes and Noncoding DNA
Celniker, S. E., and G. M. Rubin. 2003. The Drosophila melanogaster genome. Ann. Rev. Genomics Hum. Genet. 4:89–117.
Crook, Z. R., and D. Housman. 2011. Huntington’s disease: can mice lead the way to treatment? Neuron 69:423–435.
Feuillet, C., et al. 2011. Crop genome sequencing: lessons and rationales. Trends Plant Sci. 16:77–88.
Giardina, E., A. Spinella, and G. Novelli. 2011. Past, present and future of forensic DNA typing. Nanomedicine (Lond.) 6:257–270.
Hannan, A. J. 2010. TRPing up the genome: tandem repeat polymorphisms as dynamic sources of genetic variability in health and disease. Discov. Med. 10:314–321.
International Human Genome Sequencing Consortium. 2004. Finishing the euchromatic sequence of the human genome. Nature 431:931–945.
Jobling, M. A., and P. Gill. 2004. Encoded evidence: DNA in forensic analysis. Nature Rev. Genet. 5:739–751.
Lander, E. S., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.
Todd, P. K., and H. L. Paulson. 2010. RNA-mediated neurodegeneration in repeat expansion disorders. Ann. Neurol. 67:291–300.
Venter, J. C., et al. 2001. The sequence of the human genome. Science 291:1304–1351.
Transposable (Mobile) DNA Elements
Curcio, M. J., and K. M. Derbyshire. 2003. The outs and ins of transposition: from mu to kangaroo. Nature Rev. Mol. Cell Biol. 4:865–877.
Goodier, J. L., and H. H. Kazazian, Jr. 2008. Retrotransposons revisited: the restraint and rehabilitation of parasites. Cell 135:23–35.
Jones, R. N. 2005. McClintock’s controlling elements: the full story. Cytogenet. Genome Res. 109:90–103.
Lisch, D. 2009. Epigenetic regulation of transposable elements in plants. Ann. Rev. Plant Biol. 60:43–66.
Genomics: Genome-Wide Analysis of Gene Structure and Function
BLAST Information can be found at: http://blast.ncbi.nlm.nih.gov/Blast.cgi.
1000 Genomes Project Consortium. 2010. A map of human genome variation from population-scale sequencing. Nature 467:1061–1073.
Alkan, C., B. P. Coe, and E. E. Eichler. 2011. Genome structural variation discovery and genotyping. Nature Rev. Genet. 12:363–376.
Chimpanzee Sequencing and Analysis Consortium. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.
du Plessis, L., N. Skunca, and C. Dessimoz. 2011. The what, where, how and why of gene ontology—a primer for bioinformaticians. Brief Bioinform. 12:723–735.
Ideker, T., J. Dutkowski, and L. Hood. 2011. Boosting signal-to-noise in complex biology: prior knowledge is power. Cell 144:860–863.
Lander, E. S. 2011. Initial impact of the sequencing of the human genome. Nature 470:187–197.
Mills, R. E., et al. 2011. Mapping copy number variation by population-scale genome sequencing. Nature 470:59–65.
Picardi, E., and G. Pesole. 2010. Computational methods for ab initio and comparative gene finding. Meth. Mol. Biol. 609:269–284.
Ramskold, D., et al. 2009. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Comput. Biol. 5:e1000598.
Raney, B. J., et al. 2011. ENCODE whole-genome data in the UCSC genome browser (2011 update). Nucl. Acids Res. 39:D871–D875.
Sleator, R. D. 2010. An overview of the current status of eukaryote gene prediction strategies. Gene 461:1–4.
Sonah, H., et al. 2011. Genomic resources in horticultural crops: status, utility and challenges. Biotechnol. Adv. 29:199–209.
Stratton, M. R. 2011. Exploring the genomes of cancer cells: progress and promise. Science 331:1553–1558.
Venter, J. C. 2011. Genome-sequencing anniversary. The human genome at 10: successes and challenges. Science 331:546–547.
Structural Organization of Eukaryotic Chromosomes
Bannister, A. J., and T. Kouzarides. 2011. Regulation of chromatin by histone modifications. Cell Res. 21:381–395.
Bernstein, B. E., A. Meissner, and E. S. Lander. 2007. The mammalian epigenome. Cell 128:669–681.
Horn, P. J., and C. L. Peterson. 2006. Heterochromatin assembly: a new twist on an old model. Chromosome Res. 14:83–94.
Kurdistani, S. K. 2011. Histone modifications in cancer biology and prognosis. Prog. Drug Res. 67:91–106.
Luger, K. 2006. Dynamic nucleosomes. Chromosome Res. 14:5–16.
Luger, K., and T. J. Richmond. 1998. The histone tails of the nucleosome. Curr. Opin. Genet. Devel. 8:140–146.
Nasmyth, K., and C. H. Haering. 2005. The structure and function of SMC and kleisin complexes. Ann. Rev. Biochem. 74:595–648.
Schalch, T., et al. 2005. X-ray structure of a tetranucleosome and its implications for the chromatin fibre. Nature 436:138–141.
Woodcock, C. L., and R. P. Ghosh. 2010. Chromatin higher-order structure and dynamics. Cold Spring Harbor Perspect. Biol. 2:a000596.
Morphology and Functional Elements of Eukaryotic Chromosomes
Armanios, M., and C. W. Greider. 2005. Telomerase and cancer stem cells. Cold Spring Harbor Symp. Quant. Biol. 70:205–208.
Belmont, A. S. 2006. Mitotic chromosome structure and condensation. Curr. Opin. Cell Biol. 18:632–638.
Blackburn, E. H. 2005. Telomeres and telomerase: their mechanisms of action and the effects of altering their functions. FEBS Lett. 579:859–862.
Cvetic, C., and J. C. Walter. 2005. Eukaryotic origins of DNA replication: could you please be more specific? Semin. Cell Dev. Biol. 16:343–353.
Froenicke, L. 2005. Origins of primate chromosomes as delineated by Zoo-FISH and alignments of human and mouse draft genome sequences. Cytogenet. Genome Res. 108:122–138.
MacAlpine, D. M., and S. P. Bell. 2005. A genomic view of eukaryotic DNA replication. Chromosome Res. 13:309–326.
Ohta, S., et al. 2011. Building mitotic chromosomes. Curr. Opin. Cell Biol. 23:114–121.
Tanaka, T. U. 2010. Kinetochore-microtubule interactions: steps towards bi-orientation. EMBO J. 29:4070–4082.
CHAPTER 9
Transcriptional Control of Gene Expression
Drosophila polytene chromosomes stained with antibodies against a chromatin-remodeling ATPase called Kismet (blue), RNA polymerase II with low CTD phosphorylation (red), and RNA polymerase II with high CTD phosphorylation (green). [Reproduced with permission of The Company of Biologists, from Srinivasan, S., et al., “The Drosophila trithorax group protein Kismet facilitates an early step in transcriptional elongation by RNA Polymerase II,” Development, 2005, 132(7):1623-1635; permission conveyed through Copyright Clearance Center, Inc.]
In previous chapters, we have seen that the properties and functions of each cell type are determined by the proteins it contains. In this chapter and the next, we consider how the kinds and amounts of the various proteins produced by a particular cell type in a multicellular organism are regulated. This regulation of gene expression is the fundamental process that controls the development of multicellular organisms such as ourselves from a single fertilized egg cell into the thousands of cell types of which we are made. When gene expression goes awry, cellular properties are altered, a process that all too often leads to the development of cancer. As discussed further in Chapter 24, genes encoding proteins that restrain cell growth are abnormally repressed in cancer cells, whereas genes encoding proteins that promote cell growth and replication are inappropriately activated in cancer cells. Abnormalities in gene expression also result in developmental defects such as cleft palate, tetralogy of Fallot (a serious developmental defect of the heart that can be treated surgically), and many others. Regulation of gene expression
also plays a vital role in bacteria and other single-celled microorganisms, where it allows cells to adjust their enzymatic machinery and structural components in response to their changing nutritional and physical environment. Consequently, to understand how microorganisms respond to their environment, how multicellular organisms normally develop, and how pathological abnormalities of gene expression occur, it is essential to understand the molecular interactions that control protein production. The basic steps in gene expression are reviewed in Chapter 5. Gene expression is the entire process whereby the information encoded in a particular gene is decoded into a particular protein. Synthesis of mRNA requires that an RNA polymerase initiate transcription (initiation), polymerize ribonucleoside triphosphates complementary to the DNA template strand (elongation), and then terminate transcription (termination) (see Figure 5-11). In bacteria, ribosomes and translation initiation factors have immediate access to newly formed RNA transcripts, which function as mRNA without further modification.
OUTLINE
9.1 Control of Gene Expression in Bacteria
9.2 Overview of Eukaryotic Gene Control
9.3 RNA Polymerase II Promoters and General Transcription Factors
9.4 Regulatory Sequences in Protein-Coding Genes and the Proteins Through Which They Function
9.5 Molecular Mechanisms of Transcription Repression and Activation
9.6 Regulation of Transcription-Factor Activity
9.7 Epigenetic Regulation of Transcription
9.8 Other Eukaryotic Transcription Systems
In eukaryotes, however, the initial RNA transcript is subjected to processing that yields a functional mRNA (see Figure 5-15). The mRNA then is transported from its site of synthesis in the nucleus to the cytoplasm, where it is translated into protein with the aid of ribosomes, tRNAs, and translation factors (see Figures 5-23, 5-24, and 5-26). Regulation may occur at several of the steps in gene expression outlined above: transcription initiation, elongation, RNA processing, and mRNA export from the nucleus. It may also occur through control of mRNA degradation, mRNA translation into protein, and protein degradation. This regulation results in differential protein expression in different cell types or developmental stages or in response to external conditions. Examples of regulation at each step in gene expression have been found. However, control of transcription initiation and control of elongation, the first two steps, are the most important mechanisms for determining whether most genes are expressed. They also largely determine how much of the encoded mRNAs, and consequently proteins, are produced (Figure 9-1). The molecular mechanisms that regulate transcription initiation and elongation are critical to numerous biological phenomena. These include the development of a multicellular organism, as mentioned above, the immune responses that protect us from pathogenic microorganisms, and neurological processes such as learning and memory. When these regulatory mechanisms function improperly, pathological processes may occur. For example, dominant mutations of the HOXD13 gene result in polydactyly, the embryological development of extra digits of the feet, hands, or both (Figure 9-2a). HOXD13 encodes a transcription factor that normally regulates the transcription of multiple genes involved in development of the extremities.
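Figure 9-1 divides control of protein concentration among four rates. A standard two-stage kinetic model shows how all four rates enter a steady-state protein concentration. It is offered here only as an illustrative assumption, not as the statistical method of the study cited in that figure. The symbols are introduced here: m and p are the mRNA and protein concentrations, k_m is the transcription rate, δ_m is the mRNA degradation rate constant, k_p is the translation rate per mRNA, and δ_p is the protein degradation rate constant.

```latex
\begin{aligned}
\frac{dm}{dt} &= k_m - \delta_m\, m, &\qquad \frac{dp}{dt} &= k_p\, m - \delta_p\, p,\\[4pt]
m^{*} &= \frac{k_m}{\delta_m}, & p^{*} &= \frac{k_p\, m^{*}}{\delta_p} = \frac{k_m\, k_p}{\delta_m\, \delta_p},\\[4pt]
\log p^{*} &= \log k_m + \log k_p - \log \delta_m - \log \delta_p .
\end{aligned}
```

In this model, a given fold change in any one of the four rates changes p* by the same factor. Percentages like those in Figure 9-1 would therefore mainly reflect how widely each rate varies from gene to gene, and how the rates covary. They would not reflect an intrinsic difference in each rate's power to set protein levels.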
[Figure 9-1 pie chart, summarized. Rates of: transcription, 73%; mRNA degradation, 11%; mRNA translation, 8%; protein degradation, 8%.]
FIGURE 9-1 Contributions of the major processes that regulate protein concentrations. The concentration of a protein is controlled by regulation of four rates: the frequency with which the mRNA encoding the protein is synthesized (gene transcription), the rate at which that mRNA is degraded, the rate at which that mRNA is translated into protein, and the rate at which that protein is degraded. The relative contributions of these four rates to the concentrations of thousands of proteins in cultured mouse fibroblasts were determined by several methods:
• mass spectrometry to measure protein concentrations (see Chapter 3)
• mRNA sequencing (RNA-seq) to measure mRNA levels (see Chapter 6)
• protection of mRNA from ribonuclease digestion by associated ribosomes (ribosome footprinting) to estimate translation rates
• stable isotope labeling to determine degradation rates
• statistical analysis of the data to correct for inherent biases and errors in these methods
[Data from J. J. Li and M. D. Biggin, 2014, Science 347:1066.]
Other mutations affecting the function or expression of transcription factors cause an extra pair of wings to develop in Drosophila (Figure 9-2b), alter the structures of flowers in plants (Figure 9-2c), and are responsible for multiple other developmental abnormalities.
Transcription is a complex process involving many layers of regulation. In this chapter, we focus on the molecular events that determine when transcription of a gene occurs. First, we consider the mechanisms of gene expression in bacteria, in which DNA is not bound by histones and packaged into nucleosomes. Repressor and activator proteins recognize and bind to specific DNA sequences to control the transcription of a nearby gene. In many cases, specific tertiary structures in nascent mRNAs, called riboswitches, bind metabolites to regulate transcription elongation. The remainder of the chapter focuses on eukaryotic regulation of transcription and how the basic tenets of bacterial regulation are applied in more complex ways in higher organisms. In addition, eukaryotic regulation mechanisms make use of two further features. One is the association of DNA with histone octamers, forming chromatin structures with varying degrees of condensation. The other is post-translational modification of histone tails, such as acetylation and methylation (see Figure 8-26). Figure 9-3 provides an overview of transcriptional regulation in metazoans (multicellular animals) and of the processes outlined in this chapter. We discuss how the RNA polymerases responsible for the transcription of different classes of eukaryotic genes bind to promoter sequences to initiate the synthesis of an RNA molecule. We also discuss how specific DNA sequences function as transcription-control regions by serving as the binding sites for the transcription factors that regulate transcription. Next we consider how eukaryotic activators and repressors influence transcription through interactions with large multiprotein complexes. Some of these multiprotein complexes modify chromatin condensation, altering the accessibility of chromosomal DNA to transcription factors and RNA polymerases.
Other complexes directly influence the frequency at which RNA polymerases bind to promoters and initiate transcription. Very recent research has revealed that, for many genes in multicellular animals, the RNA polymerase pauses after transcribing a short RNA, and that one transcriptional regulation mechanism involves release of the paused polymerase, allowing it to transcribe the rest of the gene. We discuss how transcription of specific genes can be specified by particular combinations of the roughly 1400 transcription factors encoded in the human genome, giving rise to cell-type-specific gene expression. We consider the various ways in which the activities of transcription factors themselves are controlled to ensure that genes are expressed only in the correct cell types and at the appropriate time during their differentiation. We also discuss recent studies revealing that RNA-protein complexes in the nucleus can regulate transcription. New methods for sequencing DNA, coupled with reverse transcription of RNA into DNA in vitro, have revealed that much of the genome of eukaryotes is transcribed into low-abundance RNAs that do not encode proteins. Several nuclear long noncoding RNAs (lncRNAs) have recently been discovered to regulate the transcription of other protein-coding genes. This finding raises the possibility that transcriptional control by such noncoding RNAs may be much more general than is currently understood. Recent advances in mapping the association of transcription factors with
specific regions of chromatin across the entire genome in a variety of cell types have provided the first glimpses of how transcription factors regulate embryonic development from the pluripotent stem cells of the early embryo to the fully differentiated cells that make up most of our tissues. RNA processing and various post-transcriptional mechanisms
for controlling eukaryotic gene expression are covered in Chapter 10. Subsequent chapters, particularly Chapters 15, 16, and 21, provide examples of how transcription is regulated by interactions between cells and how the resulting gene control contributes to the development and function of specific types of cells in multicellular organisms.
[Figure 9-2 images: (a) normal vs. dominant HOXD13 mutation; (b) normal (haltere) vs. Ubx mutation; (c) normal vs. homozygous recessive mutations in ap2-1, pi-1, and ag-1 genes]
FIGURE 9-2 Phenotypes of mutations in genes encoding transcription factors. (a) A dominant mutation in the human HOXD13 gene results in the development of extra digits, a condition known as polydactyly. (b) Homozygous recessive mutations that prevent expression of the Ubx gene in the third thoracic segment of Drosophila result in transformation of that segment, which normally has a balancing organ called a haltere, into a second copy of the thoracic segment that develops wings. (c) Mutations in Arabidopsis thaliana that inactivate both copies of three floral organ–identity genes transform the normal parts of the flower into leaflike structures. In each case, these mutations affect master regulatory transcription factors that regulate multiple genes, including many genes encoding other transcription factors. [Part (a), left, Lightvision, LLC/Moment Open/Getty Images; right, Goodman, F. R. and Scrambler, P. J., Human HOX gene mutations. Clinical Genetics, 2001, 59:1, pages 1–11. Part (b) from “The bithorax complex: the first fifty years,” by Edward B. Lewis, reproduced with permission from The International Journal of Developmental Biology, 1998, Vol 42(403-15), Figures 4a and 4b. Part (c) republished with permission of Elsevier, from Weigel, D. and Meyerowitz, M., “The ABCs of floral homeotic genes,” Cell, 1994, 78(2):203-209; permission conveyed through Copyright Clearance Center, Inc.]
FIGURE 9-3 Overview of eukaryotic transcriptional control. Inactive genes are assembled into regions of condensed chromatin that inhibit RNA polymerases and their associated general transcription factors from interacting with promoters. A pioneer transcription factor is able to bind to a specific regulatory sequence within the condensed chromatin and interact with chromatin-remodeling enzymes and histone acetylases that decondense the chromatin, making it accessible to RNA polymerase II and the general transcription factors. Additional activator proteins then bind to specific transcription-control elements in both promoter-proximal sites and distant enhancers, where they interact with one another and with the multisubunit Mediator complex to assemble RNA polymerase II (Pol II) and general transcription factors on promoters. Alternatively, repressor proteins bind to other transcription-control elements to inhibit transcription initiation by Pol II and interact with multiprotein co-repressor complexes to condense chromatin. During transcriptional activation, Pol II initiates transcription, but pauses after transcribing fewer than 100 nucleotides due to the action of the elongation inhibitor NELF associated with DSIF. Activators promote the association of the Pol II-NELF-DSIF complex with elongation factor P-TEFb, which releases NELF and allows productive elongation through the gene. DSIF is the DRB sensitivity-inducing factor, NELF is the negative elongation factor, and P-TEFb is a protein kinase made up of CDK9 and cyclin T. See S. Malik and R. G. Roeder, 2010, Nat. Rev. Genet. 11:761.
[Figure 9-3 diagram: closed chromatin, gene "off" (repressors); pioneer transcription factors and chromatin co-activators; histone marks Ac and Me; open chromatin; activators, Mediator, general transcription factors (IIA, IIB, IID, IIE, IIF, IIH) and Pol II; pausing scaffold with NELF and DSIF; P-TEFb; nascent transcript with 7MeG cap; gene "on"]
9.1 Control of Gene Expression in Bacteria
Because the structure and function of a cell are determined by the proteins it contains, the control of gene expression is a fundamental aspect of molecular cell biology. Most commonly, the “decision” to transcribe the gene encoding a particular protein is the major mechanism for controlling production of the encoded protein in a cell. By controlling transcription, a cell can regulate which proteins it produces and how rapidly they are synthesized. When transcription of a gene is repressed, the corresponding mRNA and encoded protein or proteins are synthesized at low rates. Conversely, when transcription of a gene is activated, both the mRNA and encoded protein or proteins are produced at much higher rates. In most bacteria and other single-celled organisms, gene expression is highly regulated in order to adjust the cell’s enzymatic machinery and structural components to changes in the nutritional and physical environment. Thus at any given time, a bacterial cell normally synthesizes only those proteins that are required for its survival under the current conditions. Here we describe the basic features of transcriptional control in bacteria, using the lac operon and the glutamine synthetase gene in E. coli and the xpt-pbuX operon in Bacillus subtilis as our primary examples. Many of the same features are involved in eukaryotic transcriptional control, which will be the subject of the remainder of this chapter.
Transcription Initiation by Bacterial RNA Polymerase Requires Association with a Sigma Factor
In E. coli, about half the genes are clustered into operons, each of which encodes enzymes involved in a particular metabolic pathway or proteins that interact to form one multisubunit protein complex. For instance, the trp operon discussed in Chapter 5 encodes five polypeptides needed in the biosynthesis of tryptophan (see Figure 5-13). Similarly, the lac operon encodes three proteins required for the metabolism of lactose, a sugar present in milk. Because a bacterial operon is transcribed from one start site into a single mRNA, all the genes within an operon are coordinately regulated; that is, they are all activated or repressed at the same time to the same extent. The transcription of operons, as well as that of isolated genes, is controlled by interplay between RNA polymerase and specific repressor and activator proteins. In order to initiate transcription, E. coli RNA polymerase must associate with one of a small number of σ (sigma) factors. The most common one in eubacterial cells is σ70. This σ-factor binds to both RNA polymerase and promoter DNA sequences, bringing the RNA polymerase enzyme to the promoter. It recognizes and binds to both a six-base-pair sequence centered at about 10 bp and a seven-base-pair sequence centered at about 35 bp upstream from the +1 transcription start. Consequently, the −10 sequence and the −35 sequence together constitute a promoter for E. coli RNA polymerase associated with σ70 (see Figure 5-10b). Although the promoter sequences contacted by σ70 are located at −35 and −10, E. coli RNA polymerase binds to the promoter-region DNA from roughly −50 to +20 through interactions with DNA that do not depend on the sequence. 
The σ-factor also assists the RNA polymerase in separating the DNA strands at the transcription start site and in inserting the template strand into the active site of the polymerase so that transcription starts at +1 (see Figure 5-11, step 2). The optimal σ70-RNA polymerase promoter sequence, determined as the “consensus sequence” of multiple strong promoters, is

−35 region: 5′-ttgACAt; spacer: 15–17 bp; −10 region: tatAAt-3′

This consensus sequence shows the most commonly occurring base at each of the positions in the −35 and −10 regions. The size of the font indicates the importance of the base at that position, as determined by the influence of mutations of these bases on the frequency of transcription
initiation (i.e., the number of times per minute that RNA polymerases initiate transcription). The sequence shows the strand of DNA that has the same 5′→3′ orientation as the transcribed RNA (i.e., the nontemplate strand). However, the σ70-RNA polymerase initially binds to double-stranded DNA. After the polymerase transcribes a few tens of base pairs, σ70 is released. Thus σ70 acts as an initiation factor that is required for transcription initiation, but not for RNA strand elongation once initiation has taken place.
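The consensus comparison above can be made concrete with a short Python sketch. It assumes the simplest possible scoring rule, one point per base that matches TTGACA at −35 or TATAAT at −10, with a 15–17 bp spacer between them. It ignores the position-specific importance indicated by the font sizes, and the function names are illustrative only, not taken from any library.

```python
from typing import Optional, Tuple

MINUS35 = "TTGACA"        # -35 consensus element
MINUS10 = "TATAAT"        # -10 consensus element
SPACERS = range(15, 18)   # 15, 16, or 17 bp between the two elements

def matches(site: str, consensus: str) -> int:
    """Number of positions at which `site` agrees with `consensus`."""
    return sum(a == b for a, b in zip(site, consensus))

def best_sigma70_site(seq: str) -> Optional[Tuple[int, int, int]]:
    """Scan a nontemplate-strand sequence (5'->3') and return
    (score, start of -35 element, spacer length) for the best placement,
    or None if the sequence is too short. The maximum score is 12."""
    seq = seq.upper()
    best: Optional[Tuple[int, int, int]] = None
    for i in range(len(seq) - len(MINUS35) + 1):
        box35 = seq[i:i + len(MINUS35)]
        for spacer in SPACERS:
            j = i + len(MINUS35) + spacer
            box10 = seq[j:j + len(MINUS10)]
            if len(box10) < len(MINUS10):
                continue  # -10 element would run off the end of the sequence
            score = matches(box35, MINUS35) + matches(box10, MINUS10)
            if best is None or score > best[0]:
                best = (score, i, spacer)
    return best

print(best_sigma70_site("TTGACA" + "A" * 17 + "TATAAT"))
```

For a perfect consensus promoter with a 17-bp spacer, the call above returns `(12, 0, 17)`. A weak promoter, in this crude scheme, is simply one whose best placement scores well below 12.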
Initiation of lac Operon Transcription Can Be Repressed or Activated
When E. coli is in an environment that lacks lactose, synthesis of lac mRNA is repressed so that cellular energy is not wasted synthesizing enzymes the cell does not require. In an environment containing both lactose and glucose, E. coli cells preferentially metabolize glucose, the central molecule of carbohydrate metabolism. The cells metabolize lactose at a high rate only when lactose is present and glucose is largely depleted from the medium. They achieve this metabolic adjustment by repressing transcription of the lac operon until lactose is present and allowing synthesis of only low levels of lac mRNA until the cytosolic concentration of glucose falls to low levels. Transcription of the lac operon under different conditions is controlled by lac repressor protein and catabolite activator protein (CAP) (also called CRP, for cAMP receptor protein), each of which binds to a specific DNA sequence in the lac transcription-control region; these two sequences are called the operator and the CAP site, respectively (Figure 9-4, top). For transcription of the lac operon to begin, the σ70 subunit of the RNA polymerase must bind to the lac promoter at the −35 and −10 promoter sequences. When no lactose is present, the lac repressor binds to the lac operator, which overlaps the transcription start site. Therefore, the lac repressor bound to the operator site blocks σ70 binding and hence transcription initiation by RNA polymerase (Figure 9-4a). When lactose is present, it binds to specific binding sites in each subunit of the tetrameric lac repressor, causing a conformational change in the protein that makes it dissociate from the lac operator. As a result, the polymerase can bind to the promoter and initiate transcription of the lac operon. 
However, when glucose is also present, the frequency of transcription initiation is very low, resulting in the synthesis of only low levels of lac mRNA and thus of the proteins encoded by the lac operon (Figure 9-4b). The frequency of transcription initiation is low because the −35 and −10 sequences in the lac promoter differ from the ideal σ70-binding sequences shown previously. Once glucose is depleted from the medium and the intracellular glucose concentration falls, E. coli cells respond by synthesizing cyclic AMP (cAMP). As the concentration of cAMP increases, it binds to a site in each subunit of the dimeric CAP protein, causing a conformational change that allows the protein to bind to the CAP site in the lac transcription-control region. The bound CAP-cAMP complex interacts with the polymerase bound to the promoter, greatly increasing the frequency of transcription initiation. This activation leads to synthesis of high levels of lac mRNA and subsequently of the enzymes encoded by the lac operon (Figure 9-4c).
[Figure 9-4 diagram: (top) E. coli lac transcription-control regions: CAP site, promoter (bound by σ70-Pol), operator, lacZ, +1 transcription start site; (a) lac repressor on the operator, no mRNA transcription; (b) lactose and glucose present (low cAMP), low transcription; (c) lactose present, glucose absent (high cAMP), CAP-cAMP bound, high transcription; (d) lac repressor tetramer bound to O1 and O3, or to O1 and O2]
FIGURE 9-4 Regulation of transcription from the lac operon of E. coli. (Top) The transcription-control region, composed of roughly a hundred base pairs, includes three protein-binding regions: the CAP site, which binds catabolite activator protein; the lac promoter, which binds the σ70-RNA polymerase complex; and the lac operator, which binds lac repressor. The lacZ gene encoding the enzyme β-galactosidase, the first of the three genes in the operon, is shown to the right. (a) In the absence of lactose, very little lac mRNA is produced because the lac repressor binds to the operator, inhibiting transcription initiation by σ70-RNA polymerase. (b) In the presence of glucose and lactose, lac repressor binds lactose and dissociates from the operator, allowing σ70-RNA polymerase to initiate transcription at a low rate. (c) Maximal transcription of the lac operon occurs in the presence of lactose and the absence of glucose. In this situation, cAMP increases in response to the low glucose concentration and forms a CAP-cAMP complex, which binds to the CAP site, where it interacts with RNA polymerase to increase the rate of transcription initiation. (d) The tetrameric lac repressor binds to the primary lac operator (O1) and one of two secondary operators (O2 or O3) simultaneously. The two structures are in equilibrium. See B. Muller-Hill, 1998, Curr. Opin. Microbiol. 1:145. [Part (d) data from M. Lewis et al., 1996, Science 271:1247-1254, PDB IDs 1lbh and 1lbg; and R. Daber et al., 2007, J. Mol. Biol. 370:609-619, PDB ID 2pe5.]
In fact, the lac operon is more complex than depicted in the simplified model in Figure 9-4a–c. The tetrameric lac repressor actually binds to two DNA sequences simultaneously, one at the primary operator (lacO1), which overlaps the region of DNA bound by RNA polymerase at the promoter, and the other at one of two secondary operators centered at +412 (lacO2), within the lacZ protein-coding region, and −82 (lacO3) (Figure 9-4d). The lac repressor tetramer is a dimer of dimers. Each dimer binds to one operator (Figure 9-4d). Simultaneous binding of the tetrameric lac repressor to the primary lac operator and one of the two secondary operators is possible because DNA is quite flexible, as we saw in the wrapping of DNA around the surface of a histone octamer in the nucleosomes of eukaryotes (see Figure 8-24). The secondary operators function to increase the local concentration of lac repressor in the micro-vicinity of the primary operator where repressor binding blocks RNA polymerase binding. Since the equilibrium of binding reactions depends on the concentrations of the binding partners, the resulting increased local concentration of lac repressor in the vicinity of O1 increases repressor binding to O1. There are approximately 10 lac repressor tetramers per E. coli cell. Because of binding to O2 and O3, there is nearly always a lac repressor tetramer much closer to O1 than would otherwise be the case if the 10 repressor tetramers were diffusing randomly through the cell. If both O2 and O3 are mutated so that the lac repressor no longer binds to them with high affinity, repression at the lac promoter is reduced by a factor of 70. Mutation of only O2 or only O3 reduces repression twofold, indicating that either one of these secondary operators can provide most of the increase in repression. 
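The local-concentration argument can be checked with simple arithmetic. The Python sketch below converts 10 repressor tetramers into a molar concentration, assuming an E. coli cell volume of about 1 fL (an approximation, not a value given in this text), and then applies the single-site binding relation: the fraction of time O1 is free equals Kd/(C + Kd). The Kd and the "local" concentration near O1 are hypothetical placeholders chosen only to show the direction of the effect; they are not measured values for the lac repressor.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molar_concentration(molecules: float, volume_liters: float) -> float:
    """Convert a molecule count in a given volume to mol/L."""
    return molecules / (AVOGADRO * volume_liters)

def fraction_free(concentration: float, kd: float) -> float:
    """Fraction of time a single operator is unoccupied: Kd / (C + Kd).
    Transcription can initiate only while O1 is free."""
    return kd / (concentration + kd)

CELL_VOLUME_L = 1e-15   # assumption: ~1 femtoliter per E. coli cell
KD_O1 = 1e-8            # hypothetical dissociation constant (M), illustration only
LOCAL_NEAR_O1 = 1e-7    # hypothetical local concentration (M) when tethered at O2/O3

bulk = molar_concentration(10, CELL_VOLUME_L)  # 10 tetramers per cell
free_bulk = fraction_free(bulk, KD_O1)
free_looped = fraction_free(LOCAL_NEAR_O1, KD_O1)

print(f"bulk repressor concentration: {bulk * 1e9:.1f} nM")
print(f"O1 free, diffusion only:    {free_bulk:.2f}")
print(f"O1 free, with O2/O3 tether: {free_looped:.2f}")
```

Under the 1 fL assumption, 10 tetramers correspond to roughly 17 nM. Whatever Kd is chosen, raising the effective concentration near O1 lowers the fraction of time O1 is free, which is the qualitative point made in the text.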
Although the promoters for different E. coli genes exhibit considerable homology, their exact sequences differ. The promoter sequence determines the intrinsic frequency at which RNA polymerase–σ complexes initiate transcription of a gene in the absence of a repressor or activator protein. Promoters that support a high frequency of transcription initiation have −10 and −35 sequences similar to the ideal promoter shown previously and are called strong promoters. Those that support a low frequency of transcription initiation differ from this ideal sequence and are called weak promoters. The lac operon, for instance, has a weak promoter whose sequence differs from the consensus strong promoter at several positions. Its low intrinsic frequency of initiation is further reduced by the lac repressor and substantially increased by the cAMP-CAP complex.
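The three conditions in Figure 9-4a–c amount to a small truth table. The Python sketch below encodes the simplified model from this section. The labels "repressed", "low", and "high" are qualitative, and the function deliberately ignores the secondary operators and the fact that repression is never absolute.

```python
def lac_transcription(lactose: bool, glucose: bool) -> str:
    """Qualitative lac operon output under the simplified model of Figure 9-4a-c."""
    # Lactose binds the repressor and releases it from the operator.
    repressor_on_operator = not lactose
    # cAMP rises when glucose is depleted; CAP-cAMP then binds the CAP site.
    cap_camp_bound = not glucose

    if repressor_on_operator:
        return "repressed"  # repressor blocks sigma70-RNA polymerase binding
    if cap_camp_bound:
        return "high"       # CAP-cAMP boosts initiation at the weak lac promoter
    return "low"            # weak promoter alone: low initiation frequency

for lactose in (False, True):
    for glucose in (True, False):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {lac_transcription(lactose, glucose)}")
```

The ordering of the checks mirrors the biology described above: repressor occupancy is decisive, and CAP-cAMP can only raise the rate once the operator is free.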
Small Molecules Regulate Expression of Many Bacterial Genes via DNA-Binding Repressors and Activators
Transcription of most E. coli genes is regulated by processes similar to those described for the lac operon, although the detailed interactions differ at each promoter. The general mechanism involves a specific repressor that binds to the operator region of a gene or operon, thereby blocking transcription initiation. A small-molecule ligand binds to the repressor and controls its DNA-binding activity, and consequently the frequency of transcription initiation and the rate of synthesis of the mRNA and encoded proteins, as appropriate for the needs of the cell. As for the lac operon, many eubacterial transcription-control regions contain one or more secondary operators that contribute to the level of repression. Specific activator proteins, such as CAP in the lac operon, also control transcription of a subset of bacterial genes that have binding sites for the activator. Like CAP, other activators bind to DNA together with RNA polymerase, stimulating transcription from a specific promoter. The DNA-binding activity of an activator can be modulated in response to cellular needs by the binding of specific small-molecule ligands (e.g., cAMP) or by post-translational modifications, such as phosphorylation, that alter the conformation of the activator.
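Ligand control of a repressor comes in two opposite forms, depending on whether the ligand releases the repressor from DNA or is needed for it to bind. The minimal Python sketch below assumes lactose as the inducer of the lac repressor (as in this chapter's simplified description) and tryptophan as the corepressor of the trp repressor (consistent with repression of the trp operon at high tryptophan, discussed later in this section). The enum and function names are illustrative.

```python
from enum import Enum

class LigandRole(Enum):
    INDUCER = "inducer"          # ligand releases the repressor from the operator (lac: lactose)
    COREPRESSOR = "corepressor"  # ligand is required for operator binding (trp: tryptophan)

def repressor_bound(role: LigandRole, ligand_present: bool) -> bool:
    """Is the repressor on its operator, given whether its ligand is present?"""
    if role is LigandRole.INDUCER:
        return not ligand_present
    return ligand_present

def initiation_allowed(role: LigandRole, ligand_present: bool) -> bool:
    """Initiation is blocked whenever the repressor occupies the operator."""
    return not repressor_bound(role, ligand_present)

print(initiation_allowed(LigandRole.INDUCER, True))      # True: lac with lactose
print(initiation_allowed(LigandRole.COREPRESSOR, True))  # False: trp with high tryptophan
```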
Transcription Initiation from Some Promoters Requires Alternative Sigma Factors
Most E. coli promoters interact with σ70-RNA polymerase, the major initiating form of the bacterial enzyme. The transcription of certain groups of genes, however, is initiated by E. coli RNA polymerases containing one of several
alternative sigma factors that recognize different consensus promoter sequences than σ70 does (Table 9-1). These alternative σ-factors are required for the transcription of sets of genes with related functions, such as those involved in the response to heat shock or nutrient deprivation, motility, or sporulation in gram-positive eubacteria. In E. coli, there are 6 alternative σ-factors in addition to the major “housekeeping” σ-factor, σ70. The genome of the gram-positive, sporulating bacterium Streptomyces coelicolor encodes 63 σ-factors, the current record, based on sequence analysis of hundreds of eubacterial genomes. Most are structurally and functionally related to σ70. Transcription initiation by RNA polymerases containing σ70-like factors is regulated by repressors and activators that bind to DNA near the region where the polymerase binds. But one class, represented in E. coli by σ54, is unrelated to σ70 and functions differently.
Transcription by σ54-RNA Polymerase Is Controlled by Activators That Bind Far from the Promoter
The sequence of σ54 is distinctly different from that of all the σ70-like factors. Transcription of genes by RNA polymerases containing σ54 is regulated solely by activators whose binding sites in DNA, referred to as enhancers, are generally located 80–160 bp upstream from the transcription start site.
TABLE 9-1 Sigma Factors of E. coli
Sigma factor | Promoters recognized | Promoter consensus: −35 region | −10 region
σ70 (σD) | Housekeeping genes, most genes in exponentially replicating cells | TTGACA | TATAAT
σS (σ38) | Stationary-phase genes and general stress response | TTGACA | TATAAT
σ32 (σH) | Induced by unfolded proteins in the cytoplasm; genes encoding chaperones that refold unfolded proteins and protease systems leading to the degradation of unfolded proteins in the cytoplasm | TCTCNCCCTTGAA | CCCCATNTA
σE (σ24) | Activated by unfolded proteins in the periplasmic space and cell membrane; genes encoding proteins that restore integrity to the cellular envelope | GAACTT | TCTGA
σF (σ28) | Genes involved in flagellum assembly | CTAAA | CCGATAT
FecI (σ18) | Genes required for iron uptake | TTGGAAA | GTAATG
Sigma factor | Promoters recognized | Promoter consensus: −24 region | −12 region
σ54 (σN) | Genes for nitrogen metabolism and other functions | CTGGNA | TTGCA
Data from T. M. Gruber and C. A. Gross, 2003, Annu. Rev. Microbiol. 57:441, and B. K. Cho et al., 2014, BMC Biol. 12:4.
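Table 9-1 can be treated as a lookup table. The Python sketch below, offered only as an illustration, turns each consensus pair into a regular expression (N matches any base) and reports which σ-factors have both elements present in order. It assumes exact consensus matches and does not enforce a spacer length, so it is far cruder than real promoter prediction. The dictionary keys are informal names.

```python
import re
from typing import Dict, List, Tuple

# Upstream and downstream consensus elements transcribed from Table 9-1.
# For sigma54 these are the -24 and -12 elements; for the others, -35 and -10.
SIGMA_CONSENSUS: Dict[str, Tuple[str, str]] = {
    "sigma70": ("TTGACA", "TATAAT"),
    "sigmaS": ("TTGACA", "TATAAT"),
    "sigma32": ("TCTCNCCCTTGAA", "CCCCATNTA"),
    "sigmaE": ("GAACTT", "TCTGA"),
    "sigmaF": ("CTAAA", "CCGATAT"),
    "FecI": ("TTGGAAA", "GTAATG"),
    "sigma54": ("CTGGNA", "TTGCA"),
}

def to_regex(consensus: str) -> str:
    """Translate a consensus string into a regex; N stands for any base."""
    return consensus.replace("N", "[ACGT]")

def candidate_sigmas(promoter: str) -> List[str]:
    """Sigma factors whose two consensus elements both occur, in order."""
    promoter = promoter.upper()
    hits = []
    for name, (upstream, downstream) in SIGMA_CONSENSUS.items():
        pattern = to_regex(upstream) + "[ACGT]*" + to_regex(downstream)
        if re.search(pattern, promoter):
            hits.append(name)
    return hits

print(candidate_sigmas("TTGACA" + "A" * 17 + "TATAAT"))
```

Because σ70 and σS share the same consensus in Table 9-1, this naive lookup reports both for a σ70-like promoter and cannot tell them apart.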
Even when enhancers are moved more than a kilobase away from a start site, σ54-activators can activate transcription. The best-characterized σ54-activator, the NtrC protein (nitrogen regulatory protein C), stimulates transcription of the glnA gene. The glnA gene encodes the enzyme glutamine synthetase, which synthesizes the amino acid glutamine, the central molecule of nitrogen metabolism, from glutamic acid and ammonia. The σ54-RNA polymerase binds to the glnA promoter but does not melt the DNA strands and initiate transcription until it is activated by NtrC, a dimeric protein. NtrC, in turn, is regulated by a protein kinase called NtrB. In response to low levels of glutamine, NtrB phosphorylates dimeric NtrC, which then binds to an enhancer upstream of the glnA promoter. Enhancer-bound phosphorylated NtrC then stimulates the σ54-polymerase bound at the promoter to separate the DNA strands and initiate transcription. Electron microscopy studies have shown that phosphorylated NtrC bound at enhancers and σ54-polymerase bound at the promoter interact directly, forming a loop in the DNA between the binding sites (Figure 9-5). As discussed
later in this chapter, this activation mechanism resembles the predominant mechanism of transcriptional activation in eukaryotes. NtrC has ATPase activity, and ATP hydrolysis is required for activation of bound σ54-RNA polymerase by phosphorylated NtrC. Mutants with an NtrC that is defective in ATP hydrolysis are invariably defective in stimulating the σ54-RNA polymerase to melt the DNA strands at the transcription start site. It is postulated that ATP hydrolysis supplies the energy required for melting the DNA strands. In contrast, the σ70-polymerase does not require ATP hydrolysis to separate the strands at a start site.
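The requirements for glnA activation described here form a chain of conditions. The hypothetical Python sketch below (all names invented for illustration) models each step as a boolean: low glutamine leads to NtrC phosphorylation by NtrB, phosphorylated NtrC binds the enhancer, and strand separation by the waiting σ54-RNA polymerase additionally requires NtrC's ATPase activity. It is a qualitative summary, not a kinetic model.

```python
from dataclasses import dataclass

@dataclass
class GlnAConditions:
    glutamine_low: bool
    ntrc_atpase_functional: bool = True  # False models the ATPase-defective NtrC mutants

def ntrc_phosphorylated(c: GlnAConditions) -> bool:
    # NtrB phosphorylates NtrC only in response to low glutamine.
    return c.glutamine_low

def ntrc_on_enhancer(c: GlnAConditions) -> bool:
    # Only phosphorylated NtrC binds the enhancer upstream of glnA.
    return ntrc_phosphorylated(c)

def glna_transcribed(c: GlnAConditions) -> bool:
    # sigma54-RNA polymerase is already on the promoter but melts the DNA only
    # when enhancer-bound phospho-NtrC contacts it (via a DNA loop) and hydrolyzes ATP.
    return ntrc_on_enhancer(c) and c.ntrc_atpase_functional

print(glna_transcribed(GlnAConditions(glutamine_low=True)))
print(glna_transcribed(GlnAConditions(glutamine_low=True, ntrc_atpase_functional=False)))
```

Setting `ntrc_atpase_functional=False` reproduces the mutant phenotype in this toy model: NtrC still reaches the enhancer, but transcription does not start.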
[Figure 9-5 panels: (a) pair of phosphorylated NtrC dimers bound to the enhancer (–140 and –108) and σ54-RNA polymerase bound to the glnA promoter; (b) NtrC dimers and σ54-RNA polymerase bound to each other, with the intervening DNA looped out]
EXPERIMENTAL FIGURE 9-5 DNA looping permits interaction of bound NtrC and σ54-RNA polymerase. (a) Drawing (left) and electron micrograph (right) of DNA restriction fragment with phosphorylated NtrC dimers bound to the enhancer region near one end and σ54-RNA polymerase bound to the glnA promoter near the other end. (b) Drawing (left) and electron micrograph (right) of the same fragment preparation, showing NtrC dimers and σ54-RNA polymerase bound to each other, with the intervening DNA forming a loop between them. See W. Su et al., 1990, Proc. Natl. Acad. Sci. USA 87:5504. [Micrographs courtesy Harrison Echols and Carol Gross.]
Many Bacterial Responses Are Controlled by Two-Component Regulatory Systems
As we have just seen, control of the E. coli glnA gene depends on two proteins, NtrC and NtrB. Such two-component regulatory systems control many responses of bacteria to changes in their environment. At high concentrations of glutamine, glutamine binds to a sensor domain of NtrB, causing a conformational change in the protein that inhibits its histidine kinase activity (Figure 9-6a). At the same time, the regulatory domain of NtrC blocks its DNA-binding domain from binding the glnA enhancers. At low concentrations of glutamine, glutamine dissociates from the sensor domain in the NtrB protein, leading to activation of a histidine kinase transmitter domain in NtrB that transfers the γ-phosphate of ATP to a histidine residue (H) in the transmitter domain. This phosphohistidine then transfers the phosphate to an aspartic acid residue (D) in the NtrC protein. This causes a conformational change in NtrC that unmasks the NtrC DNA-binding domain so that it can bind to the glnA enhancers. Many other bacterial responses are regulated by two proteins with homology to NtrB and NtrC (Figure 9-6b).
[Figure 9-6 diagram: (a) two-component system regulating the response to low Gln: at high [Gln], Gln is bound to the NtrB sensor domain; at low [Gln], the NtrB His kinase transmitter domain phosphorylates H (ATP to ADP), the phosphate moves to D in the NtrC regulatory domain, and the NtrC DNA-binding domain binds the glnA enhancer; (b) general two-component signaling system: stimulus, sensor domain and His kinase domain (H) of the histidine kinase sensor; receiver domain (D) and effector domain of the response regulator; response]
FIGURE 9-6 Two-component regulatory systems. (a) At low cytoplasmic concentrations of glutamine, glutamine dissociates from NtrB, resulting in a conformational change that activates a protein kinase transmitter domain that transfers an ATP γ-phosphate to a conserved histidine (H) in the transmitter domain. This phosphate is then transferred to an aspartic acid (D) in the regulatory domain of the response regulator NtrC. This converts NtrC into its activated form, which binds the enhancer sites upstream of the glnA promoter (see Figure 9-5). (b) General organization of two-component histidyl-aspartyl phospho-relay regulatory systems in bacteria and plants. See A. H. West and A. M. Stock, 2001, Trends Biochem. Sci. 26:369.
In each of these regulatory systems, one protein, called a histidine kinase sensor, contains a latent histidine kinase transmitter domain that is regulated in response to environmental changes detected by a sensor domain. When activated, the transmitter domain transfers the γ-phosphate of ATP to a histidine residue in the transmitter domain. The second protein, called a response regulator, contains a receiver domain homologous to the region of NtrC containing the aspartic acid residue that is phosphorylated by activated NtrB. The response regulator contains a second functional domain that is regulated by phosphorylation of the receiver domain. In many cases, this domain of the response regulator is a sequence-specific DNA-binding domain that binds to related DNA sequences and functions either as a repressor, like the lac repressor, or as an activator, like CAP or NtrC, regulating the transcription of specific genes. However, the effector domain can have other functions as well, such as controlling the direction in which the bacterium swims in response to a concentration gradient of nutrients. Although all transmitter domains are homologous (as are receiver domains), the transmitter domain of a specific sensor protein will phosphorylate only the receiver domains of specific response regulators, allowing specific responses to different environmental changes. Similar two-component histidyl-aspartyl phospho-relay regulatory systems are also found in plants.
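The His-to-Asp phospho-relay and its pairing specificity can be sketched as two small Python classes. This is an illustrative model under stated assumptions: each sensor carries an explicit set of cognate regulator names, whereas in real cells specificity comes from the interacting protein domains themselves. "OtherRR" below is a hypothetical non-cognate response regulator.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass
class ResponseRegulator:
    name: str
    asp_phosphorylated: bool = False  # conserved Asp in the receiver domain

@dataclass
class HistidineKinaseSensor:
    name: str
    cognate: FrozenSet[str]           # regulators this transmitter domain can phosphorylate
    his_phosphorylated: bool = False  # conserved His in the transmitter domain

    def sense(self, stimulus: bool) -> None:
        """When the sensor domain detects its stimulus, the transmitter domain
        transfers the gamma-phosphate of ATP to a His in the transmitter domain."""
        if stimulus:
            self.his_phosphorylated = True

    def transfer(self, regulator: ResponseRegulator) -> bool:
        """His -> Asp phospho-relay, allowed only to a cognate response regulator."""
        if self.his_phosphorylated and regulator.name in self.cognate:
            self.his_phosphorylated = False
            regulator.asp_phosphorylated = True
            return True
        return False

ntrb = HistidineKinaseSensor("NtrB", frozenset({"NtrC"}))
ntrc = ResponseRegulator("NtrC")
other = ResponseRegulator("OtherRR")  # hypothetical non-cognate regulator

ntrb.sense(stimulus=True)    # e.g., low glutamine for NtrB
print(ntrb.transfer(other))  # False: not a cognate partner
print(ntrb.transfer(ntrc))   # True: NtrC's Asp is now phosphorylated
```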
Expression of Many Bacterial Operons Is Controlled by Regulation of Transcriptional Elongation In addition to regulation of transcription initiation by activators and repressors, expression of many bacterial operons is controlled by regulation of transcriptional elongation in the promoter-proximal region. This mechanism of control was first discovered in studies of trp operon transcription in E. coli (see Figure 5-13). Transcription of the trp operon is repressed by the trp repressor when the concentration of tryptophan in the cytoplasm is high. But the low level of transcription initiation that still occurs is further controlled by a process called attenuation when the concentration of charged tRNATrp is sufficient to support a high rate of protein synthesis. The first 140 nt of the trp operon does not encode proteins required for tryptophan biosynthesis, but rather consists of a short peptide “leader sequence,” as diagrammed in Figure 9-7a. Region 1 of this leader sequence contains two successive Trp codons. Region 3 can base-pair with either region 2 or region 4. A ribosome follows closely behind the RNA polymerase, initiating translation of the leader peptide shortly after the 5′ end of the trp leader sequence emerges from the RNA polymerase. When the concentration of tRNATrp is sufficient to support a high rate of protein synthesis, the ribosome translates quickly through region 1 into region 2, blocking the ability of region 2 to base-pair with region 3 as it emerges from the surface of the transcribing RNA polymerase (Figure 9-7b, left). Instead, region 3 base-pairs with region 4 as soon as it emerges from the surface of the polymerase, forming a stem-loop (see Figure5-9a) followed by several uracils, which is a signal for 9.1 Control of Gene Expression in Bacteria
361
(a) trp leader RNA Translation start codon 1 | 5’|
50 |
1
2
100 ||
3
4
140 | UUUUU| 3’
(b) Translation of trp leader Low tryptophan Ribosome is stalled at trp codons in region 1
High tryptophan Ribosome covers region 2
Leader peptide
2 3
Leader peptide
5’
RNA polymerase terminates transcription
3-4 stem-loop forms
bacterial RNA polymerase to pause transcription and terminate. As a consequence, the remainder of the long trp operon is not transcribed, and the cell does not waste the energy required for tryptophan synthesis, or for the translation of the encoded proteins, when the concentration of tryptophan is high. However, when the concentration of charged tRNATrp is not sufficient to support a high rate of protein synthesis, the ribosome stalls at the two successive Trp codons in region 1 (Figure 9-7b, right). As a consequence, region 2 base-pairs with region 3 as soon as region 3 emerges from the transcribing RNA polymerase. This prevents region 3 from base-pairing with region 4, so the 3–4 hairpin does not form and does not cause RNA polymerase pausing or transcription termination. As a result, the proteins required for tryptophan synthesis are translated by ribosomes that initiate translation at the start codons for each of these proteins in the long polycistronic trp mRNA.

Attenuation of transcription elongation also occurs at some operons and single genes encoding enzymes involved in the biosynthesis of other amino acids and metabolites, through the function of riboswitches. Riboswitches are RNA sequences most commonly found in the 5′ untranslated region of bacterial mRNAs. They fold into complex tertiary structures called aptamers that bind small-molecule metabolites when those metabolites are present at sufficiently high concentrations. In some cases, this binding results in the formation of stem-loop structures that lead to early termination of transcription, as in the Bacillus subtilis xpt-pbuX operon, which encodes enzymes involved in purine synthesis (Figure 9-8). When the concentration of a metabolite is lower, it is not bound by the aptamer, and alternative RNA structures form that do not induce transcription termination, allowing transcription of the genes encoding enzymes involved in the metabolite's synthesis. As we will see below, although the mechanism in eukaryotes is different, regulation of promoter-proximal transcriptional pausing and termination has recently been found to be a frequent mode of gene regulation in multicellular organisms as well.

CHAPTER 9
Transcriptional Control of Gene Expression

[Figure 9-7 diagram: trp leader RNA regions 1–4, showing either the 3–4 stem-loop followed by UUUUU at the 3′ end, or the 2–3 stem-loop, in which case RNA polymerase continues transcription.]

FIGURE 9-7 Transcriptional control by regulation of RNA polymerase elongation and termination in the E. coli trp operon. (a) Diagram of the 140-nucleotide trp leader RNA. The numbered regions are critical to attenuation. (b) Translation of the trp leader sequence begins near the 5′ end soon after it is transcribed, while transcription of the rest of the polycistronic trp mRNA molecule continues. At high concentrations of charged tRNATrp, formation of the 3–4 stem-loop followed by a series of uracils causes termination of transcription. At low concentrations of charged tRNATrp, region 3 is sequestered in the 2–3 stem-loop and cannot base-pair with region 4. In the absence of the stem-loop structure required for termination, transcription of the trp operon continues. See C. Yanofsky, 1981, Nature 289:751.
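Both attenuation mechanisms described above (Figures 9-7 and 9-8) reduce to a binary choice between two mutually exclusive RNA pairings. The sketch below is a deliberately simplified, hypothetical model: the names `Outcome`, `trp_leader_outcome`, and `purine_riboswitch_outcome` and their boolean inputs are illustrative assumptions, not part of the source. It ignores kinetics, pausing, and intermediate concentrations.

```python
from enum import Enum


class Outcome(Enum):
    TERMINATE = "terminate"  # terminator stem-loop followed by a U run forms
    CONTINUE = "continue"    # alternative (antiterminator) pairing forms


def trp_leader_outcome(charged_trna_trp_high: bool) -> tuple[str, Outcome]:
    """Simplified attenuation rule for the E. coli trp leader RNA.

    High charged tRNATrp: the ribosome translates through the two Trp codons
    in region 1 without stalling, so region 3 pairs with region 4 and forms
    the terminator.
    Low charged tRNATrp: the ribosome stalls in region 1, region 2 pairs with
    region 3 as it emerges, and the 3-4 terminator cannot form.
    """
    if charged_trna_trp_high:
        return ("3-4", Outcome.TERMINATE)
    return ("2-3", Outcome.CONTINUE)


def purine_riboswitch_outcome(purine_high: bool) -> Outcome:
    """Simplified rule for the B. subtilis xpt-pbuX purine riboswitch.

    A purine-bound aptamer permits the terminator stem-loop to form; without
    bound ligand, an alternative structure prevents termination.
    """
    return Outcome.TERMINATE if purine_high else Outcome.CONTINUE


if __name__ == "__main__":
    for high in (True, False):
        hairpin, outcome = trp_leader_outcome(high)
        print(f"charged tRNATrp high={high}: {hairpin} stem-loop -> {outcome.value}")
```

In both cases the regulatory signal (ribosome stalling or ligand binding) acts only by selecting which pairing forms; the terminator itself is the same kind of structure, a stem-loop followed by a run of uracils.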
[Figure 9-8 panels: (a) at low purine concentration, the aptamer folds so that transcription continues (gene “on”); at high purine concentration, purine binds the aptamer and transcription terminates at a stem-loop followed by UUUUU (gene “off”); (b) crystal structure of the purine riboswitch bound to a purine.]

FIGURE 9-8 Riboswitch control of transcription termination in B. subtilis. (a) During transcription of the Bacillus subtilis xpt-pbuX operon, which encodes enzymes involved in purine synthesis, the 5′ untranslated region of the mRNA can fold into alternative structures depending on the concentration of purines in the cytoplasm, forming the “purine riboswitch.” At high concentrations of purines, the riboswitch folds into an aptamer that binds a purine ligand (cyan circle), allowing formation of a stem-loop transcription termination signal similar to the termination signal that forms in the E. coli trp operon mRNA at high tryptophan concentrations (see Figure 9-7), i.e., a stem-loop followed by a run of Us. At low purine concentrations, an alternative RNA structure forms that prevents formation of the transcription termination signal, permitting transcription of the operon. Note the alternative base pairing of the red and blue regions of the RNA. (b) Structure of the purine riboswitch bound to a purine (cyan) as determined by X-ray crystallography. See A. D. Garst, A. L. Edwards, and R. T. Batey, 2011, Cold Spring Harb. Perspect. Biol. 3:a003533. [Part (b) data from R. T. Batey, S. D. Gilbert, and R. K. Montange, 2004, Nature 432:411, PDB ID 4fe5.]

KEY CONCEPTS OF SECTION 9.1
Control of Gene Expression in Bacteria
• Gene expression in both prokaryotes and eukaryotes is regulated primarily by mechanisms that control gene transcription.
• The first step in the initiation of transcription in E. coli is the binding of a σ factor complexed with an RNA polymerase to a promoter.
• The nucleotide sequence of a promoter determines its strength, that is, how frequently different RNA polymerase molecules can bind and initiate transcription per minute.
• Repressors are proteins that bind to operator sequences that overlap or lie adjacent to promoters. Binding of a repressor to an operator inhibits transcription initiation or elongation.
• The DNA-binding activity of most bacterial repressors is modulated by small-molecule ligands. This allows bacterial cells to regulate transcription of specific genes in response to changes in the concentration of various nutrients in the environment and metabolites in the cytoplasm.
• The lac operon and some other bacterial genes are also regulated by activator proteins that bind next to a promoter and increase the frequency of transcription initiation by interacting directly with RNA polymerase bound to that promoter.
• The major sigma factor in E. coli is σ70, but several other, less abundant sigma factors are also found, each recognizing different consensus promoter sequences or interacting with different activators.
• Transcription initiation by all E. coli RNA polymerases, except those containing σ54, can be regulated by repressors and activators that bind near the transcription start site (see Figure 9-4).
• Genes transcribed by σ54-RNA polymerase are regulated by activators that bind to enhancers located about 100 base pairs upstream from the start site. When the activator and σ54-RNA polymerase interact, the DNA between their binding sites forms a loop (see Figure 9-5).
• In two-component regulatory systems, one protein acts as a sensor, monitoring the level of nutrients or other components in the environment. Under appropriate conditions, the γ-phosphate of an ATP is transferred first to a histidine in the sensor protein and then to an aspartic acid in a second protein, the response regulator. The phosphorylated response regulator then performs a specific function in response to the stimulus, such as binding to DNA regulatory sequences, thereby stimulating or repressing transcription of specific genes (see Figure 9-6).
• Transcription in bacteria can also be regulated by control of transcriptional elongation in the promoter-proximal region. This control can be exerted by ribosomes translating the nascent leader RNA, as in the case of the E. coli trp operon (see Figure 9-7), or by riboswitches, RNA sequences that bind small molecules, as for the B. subtilis xpt-pbuX operon (see Figure 9-8). In either case, the outcome determines whether a stem-loop followed by a string of uracils forms, causing the bacterial RNA polymerase to pause and terminate transcription.
9.2 Overview of Eukaryotic Gene Control
In bacteria, gene control serves mainly to allow a single cell to adjust to changes in its environment so that its growth and division can be optimized. In multicellular organisms, environmental changes also induce changes in gene expression. An example is the response to low oxygen concentrations (hypoxia), in which a specific set of genes that helps the cell survive under hypoxic conditions is rapidly induced. These genes include those encoding secreted angiogenic proteins that stimulate the growth and penetration of new capillaries into the surrounding tissue. However, the most characteristic and biologically far-reaching purpose of gene control in multicellular organisms is execution of the genetic program that underlies embryological development. Generation of the many different cell types that collectively form a multicellular organism depends on the right genes being activated in the right cells at the right time during the developmental period. In most cases, once a developmental step has been taken by a cell, it is not reversed. Thus these decisions are fundamentally different from the reversible activation and repression of bacterial genes in response to environmental conditions. In executing their genetic programs, many differentiated cells (e.g., skin cells, red blood cells, and antibody-producing cells) march down a pathway to final cell death, leaving no progeny behind. The fixed patterns of gene control leading to differentiation serve the needs of the whole organism and not the survival of an individual cell.

Despite the differences in the purposes of gene control in bacteria and eukaryotes, two key features of transcriptional control first discovered in bacteria and described in the previous section also apply to eukaryotic cells. First, protein-binding regulatory DNA sequences, or transcription-control regions, are associated with genes. Second, specific proteins that bind to a gene’s transcription-control regions determine where transcription will start and either activate or repress transcription.
One fundamental difference between transcriptional control in bacteria and in eukaryotes is a consequence of the association of eukaryotic chromosomal DNA with histone octamers, forming nucleosomes that associate into chromatin fibers that further associate into chromatin of varying degrees of condensation (see Figures 8-24, 8-25, 8-27, and 8-28). Eukaryotic cells exploit chromatin structure to regulate transcription, a mechanism of transcriptional control that is not available to bacteria. In multicellular eukaryotes, many inactive genes are assembled into condensed chromatin, which inhibits binding of the RNA polymerases and general transcription factors required for transcription initiation (see Figure 9-3). Activator proteins, which bind to transcription-control regions near the transcription start site of a gene as well as kilobases away, promote chromatin decondensation, binding of RNA polymerase to the promoter, and transcriptional elongation. Repressor proteins, which bind to alternative control elements, cause condensation of chromatin and inhibition of polymerase binding or elongation. In this section, we discuss the general principles of eukaryotic gene control and point out some similarities and differences between bacterial and eukaryotic systems. Subsequent sections of this chapter will address specific aspects of eukaryotic transcription in greater detail.
Regulatory Elements in Eukaryotic DNA Are Found Both Close to and Many Kilobases Away from Transcription Start Sites
Direct measurements of the transcription rates of multiple genes in different cell types have shown that regulation of transcription, either at the initiation step or during elongation in the promoter-proximal region, is the most widespread form of gene control in eukaryotes, as it is in bacteria. In eukaryotes, as in bacteria, a DNA sequence that specifies where RNA polymerase binds and initiates transcription of a gene is called a promoter. Transcription from a particular promoter is controlled by DNA-binding proteins that are functionally equivalent to bacterial repressors and activators. However, eukaryotic transcriptional regulatory proteins can often function either to activate or to repress transcription, depending on their associations with other proteins. Consequently, they are more generally called transcription factors.

The DNA control elements in eukaryotic genomes to which transcription factors bind are often located much farther from the promoter they regulate than is the case in bacterial genomes. In some cases, transcription factors bind at regulatory sites tens of thousands of base pairs either upstream (opposite to the direction of transcription) or downstream (in the same direction as transcription) from the promoter. As a result of this arrangement, transcription of a single gene may be regulated by the binding of multiple different transcription factors to alternative control elements, which direct expression of the same gene in different types of cells and at different times during development.

For example, several separate transcription-control regions regulate expression of the mammalian gene encoding the transcription factor Pax6. As mentioned in Chapter 1, Pax6 protein is required for development of the eye.
Pax6 is also required for the development of certain regions of the brain and spinal cord, and of the cells in the pancreas that secrete hormones such as insulin. As also mentioned in Chapter 1, heterozygous humans with only one functional PAX6 gene are born with aniridia, a lack of irises in the eyes (see Figure 1-30d). In mammals, the Pax6 gene is expressed from at least three alternative promoters that function in different cell types and at different times during embryogenesis (Figure 9-9a).

Researchers often analyze transcription-control regions by preparing recombinant DNA molecules that combine a fragment of DNA to be tested with the coding region for a reporter gene whose expression is easily assayed. Typical reporter genes include the gene that encodes luciferase, an enzyme that generates light that can be assayed with great sensitivity and over many orders of magnitude of intensity using a luminometer. Other frequently used reporter genes encode green fluorescent protein (GFP), which can be visualized by fluorescence microscopy (see Figures 4-9d and 4-16), and E. coli β-galactosidase, which generates an intensely blue insoluble precipitate when incubated with the colorless soluble lactose analog X-gal. When transgenic mice (see Figure 6-40) containing a β-galactosidase reporter gene fused to 8 kb of DNA upstream from Pax6 exon 0 were produced, β-galactosidase was observed in the developing lens, cornea, and pancreas of the embryo halfway through gestation (Figure 9-9b). Analysis of transgenic mice with smaller fragments of DNA from this region allowed the mapping of separate transcription-control regions, one regulating transcription in the pancreas and another regulating transcription in both the lens and cornea. Transgenic mice with other reporter gene constructs revealed additional transcription-control regions (see Figure 9-9a). These regions control transcription in the developing retina and in different regions of the developing brain (encephalon). Some of these transcription-control regions are in introns, between exons 4 and 5 and between exons 7 and 8. For example, a reporter gene under control of the region labeled Retina in Figure 9-9a, between exons 4 and 5, was expressed specifically in the retina (Figure 9-9c).

[Figure 9-9 panels: (a) map of the mouse Pax6 gene (exons 0–13, about 30 kb) with control regions for pancreas, lens and cornea, telencephalon, retina, and di- and rhombencephalon, and alternative transcripts a, b, and c; (b) X-gal-stained embryo with lens pit (LP) and pancreas (P) labeled; (c) X-gal-stained embryo; (d) human PAX6 locus (0–500 kb) between RCN1 and ELP4.]

FIGURE 9-9 Transcription-control regions of the mouse Pax6 gene and the orthologous human PAX6 gene. (a) Three alternative Pax6 promoters are used at distinct times during embryogenesis in different tissues of the developing mouse embryo. Transcription-control regions regulating expression of Pax6 in different tissues are indicated by colored rectangles. These control regions are some 200–500 bp in length. (b) Expression of a β-galactosidase reporter transgene fused to the 8 kb of mouse DNA upstream from exon 0. A transgenic mouse embryo 10.5 days after fertilization was stained with X-gal to reveal β-galactosidase. Lens pit (LP) is the tissue that will develop into the lens of the eye. Expression was also observed in tissue that will develop into the pancreas (P). (c) Expression of a β-galactosidase reporter gene linked to the sequence marked Retina between exons 4 and 5 in part (a), in a mouse embryo 13.5 days after fertilization. Arrow points to nasal and temporal regions of the developing retina. (d) Human PAX6 control regions identified in the 600-kb region of human DNA between the upstream gene RCN1 and the promoter of the downstream ELP4 gene. RCN1 and ELP4 are transcribed in the opposite direction from PAX6, as represented by the leftward-pointing arrows associated with their first exons. RCN1 and ELP4 exons are shown as black rectangles below the line representing this region of human DNA. PAX6 exons are diagrammed as red rectangles above the line. The three PAX6 promoters first characterized in the mouse are shown by rightward arrowheads, and the control regions shown in (a) are represented by gray rectangles. Regions flanking the gene where the sequence is partially conserved in most vertebrates (as in Figure 9-10a) are shown as ovals. Colored ovals represent sequences that cause expression of the transgene in specific neuroanatomical locations in the zebrafish central nervous system. Ovals with the same color stimulated expression in the same region. Gray ovals represent conserved sequences that did not stimulate reporter-gene expression in the developing zebrafish embryo, or were not tested. Such conserved regions may function only in combination, or they may have been conserved for some reason other than regulation of transcription, such as proper folding of the chromosome into topological domains (see Figure 8-34). [Part (a) data from B. Kammandel et al., 1999, Devel. Biol. 205:79. Part (b) republished with permission of Elsevier, B. Kammandel et al., “Distinct cis-essential modules direct the time-space pattern of the Pax6 gene activity,” Developmental Biology, 1999, 205(1): 79–97; permission conveyed through Copyright Clearance Center, Inc. Part (c) courtesy of Peter Gruss and Birgitta Kammandel. Part (d) data from S. Batia et al., 2014, Devel. Biol. 387:214.]

Control regions for many genes are found hundreds of kilobases away from the coding exons of the gene. One method for identifying such distant control regions is to compare the sequences of distantly related organisms. Transcription-control regions for a conserved gene are also often conserved and can be recognized against the background of nonfunctional sequences that diverge during evolution.
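The comparative approach just described, recognizing conserved regions against a background of diverged sequence, can be illustrated with a sliding-window identity scan over a pairwise alignment. This is a minimal sketch under stated assumptions: the function names are hypothetical, the two inputs are assumed to be rows of an existing alignment of equal length, and the defaults (100-bp windows, at least 70% identity) are illustrative cutoffs chosen here, not parameters taken from the source or from any particular tool.

```python
def conserved_windows(ref: str, other: str, window: int = 100,
                      min_identity: float = 0.70) -> list[tuple[int, int, float]]:
    """Slide a fixed-width window along a pairwise alignment.

    `ref` and `other` are the two rows of an alignment (equal length, '-' for
    gaps). Returns (start, end, identity) for every window whose fraction of
    identical, non-gap columns is at least `min_identity`.
    """
    if len(ref) != len(other):
        raise ValueError("aligned rows must have equal length")
    ref, other = ref.upper(), other.upper()
    hits = []
    for start in range(len(ref) - window + 1):
        end = start + window
        matches = sum(
            1 for a, b in zip(ref[start:end], other[start:end]) if a == b and a != "-"
        )
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, end, identity))
    return hits


def merge_windows(hits: list[tuple[int, int, float]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching windows into candidate conserved regions."""
    regions: list[list[int]] = []
    for start, end, _ in hits:  # hits are already sorted by start position
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [(s, e) for s, e in regions]


if __name__ == "__main__":
    # Toy alignment: only the middle 100 columns are identical.
    ref = "A" * 300
    other = "C" * 100 + "A" * 100 + "C" * 100
    print(merge_windows(conserved_windows(ref, other)))  # [(70, 230)]
```

A region flagged this way is only a candidate; as the SALL1 and PAX6 examples show, whether it actually controls transcription has to be tested, for instance with a reporter gene in transgenic embryos.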
For example, there is a human DNA sequence, highly conserved among humans, mice, chickens, frogs, and fish, about 500 kb downstream of the SALL1 gene (Figure 9-10a). SALL1 encodes a transcription factor required for normal development of the limbs. When transgenic mice were produced containing this conserved DNA sequence linked to a β-galactosidase reporter gene (Figure 9-10b), the transgenic embryos expressed a very high level of β-galactosidase in the developing limb buds (Figure 9-10c). Human patients with deletions in this region of the genome develop limb abnormalities. These results indicate that this conserved region directs transcription of the SALL1 gene in the developing limb. Presumably, other transcription-control regions control expression of this gene in other types of cells, where it functions in the normal development of the ears, the lower intestine, and the kidneys.

Because the sequences and functions of transcription-control regions are often conserved through evolution, the transcription factors that bind to these transcription-control regions to regulate gene expression in specific cell types are presumably conserved during evolution as well. This has made it possible to assay control regions in human DNA by reporter-gene expression in transgenic zebrafish, a procedure that is far simpler, faster, and less expensive than preparing transgenic mice (Figure 9-9d). After discussing the proteins that function with RNA polymerase to carry out transcription in eukaryotic cells and eukaryotic promoters, we will return to a discussion of how such distant transcription-control regions, called enhancers, are thought to function.

[Figure 9-10 panels: (a) comparative analysis: sequence similarity to human for mouse, chicken, frog, and fish across human chromosome 16 (about 50215–50219 kb); (b) mouse egg microinjection; (c) E11.5 reporter staining, with forelimb bud and hindlimb bud labeled.]

FIGURE 9-10 The human SALL1 enhancer activates expression of a reporter gene in limb buds of the developing mouse embryo. (a) Graphic representation of the conservation of DNA sequence in a region of the human genome (in the interval of chromosome 16 from 50214 kb to 50220.5 kb) about 500 kb downstream from the SALL1 gene, which encodes a zinc-finger transcription repressor. A region of roughly 500 bp of non-protein-coding sequence is conserved from zebrafish to human. Nine hundred base pairs of human DNA including this conserved region were inserted into a plasmid next to the coding region for E. coli β-galactosidase. (b) The plasmid was microinjected into a pronucleus of a fertilized mouse egg, and the egg was implanted in the uterus of a pseudopregnant mouse to generate a transgenic mouse embryo with the reporter-gene-containing plasmid incorporated into its genome (see Figure 5-43). (c) After 11.5 days of development, when limb buds develop, the fixed and permeabilized embryo was incubated in X-gal, which β-galactosidase converts into an insoluble, intensely blue compound. The results showed that the conserved region contains an enhancer that stimulates strong transcription of the β-galactosidase reporter gene specifically in limb buds. [Part (a) data from A. Visel et al., 2007, “VISTA Enhancer Browser: a database of tissue-specific human enhancers,” Nucleic Acids Res. 35:D88–92. Part (b) ©Deco/Alamy. Part (c) republished with permission of Nature, from L. A. Pennacchio et al., “In vivo enhancer analysis of human conserved noncoding sequences,” Nature, 444, 499–506, 2006; permission conveyed through Copyright Clearance Center, Inc.]
Three Eukaryotic RNA Polymerases Catalyze Formation of Different RNAs
The nuclei of all eukaryotic cells examined so far (e.g., vertebrate, Drosophila, yeast, and plant cells) contain three different RNA polymerases, designated I, II, and III. These enzymes are eluted at different salt concentrations during ion-exchange chromatography, reflecting the differences in their net charges. The three nuclear RNA polymerases also differ in their sensitivity to α-amanitin, a poisonous cyclic octapeptide produced by some mushrooms (Figure 9-11). RNA polymerase I is insensitive to α-amanitin, but RNA polymerase II is very sensitive: the drug binds near the active site of the enzyme and inhibits translocation of the enzyme along the DNA template. RNA polymerase III has intermediate sensitivity.

Each eukaryotic RNA polymerase catalyzes transcription of genes encoding different classes of RNA (Table 9-2). RNA polymerase I (Pol I), located in the nucleolus, transcribes genes encoding precursor rRNA (pre-rRNA), which is processed into 28S, 5.8S, and 18S rRNAs. RNA polymerase III (Pol III) transcribes genes encoding tRNAs, 5S rRNA, and an array of small stable RNAs, including one involved in RNA splicing (U6) and the RNA component of the signal recognition particle (SRP) involved in directing nascent proteins to the endoplasmic reticulum (see Chapter 13). RNA polymerase II (Pol II) transcribes all protein-coding genes; that is, it functions in production of mRNAs. RNA polymerase II also produces four of the five small nuclear RNAs (snRNAs) that take part in RNA splicing and the micro-RNAs (miRNAs) involved in translation control, as well as the closely related endogenous small interfering RNAs (siRNAs) (see Chapter 10).

[Experimental Figure 9-11 plot: total protein and RNA synthesis versus fraction number (10–50) along the [NaCl] gradient, with activity peaks labeled Pol I, Pol II, and Pol III, and RNA synthesis in the presence of 1 μg/ml α-amanitin.]

EXPERIMENTAL FIGURE 9-11 Liquid chromatography separates and identifies the three eukaryotic RNA polymerases, each with its own sensitivity to α-amanitin. A protein extract from the nuclei of cultured eukaryotic cells was passed through a DEAE Sephadex column, and adsorbed protein was eluted (black curve) with a solution of constantly increasing NaCl concentration. An aliquot of each fraction of eluate collected from the column was assayed for RNA polymerase activity without (red curve) and with (green shading) 1 μg/ml α-amanitin. This concentration of α-amanitin inhibits polymerase II activity but has no effect on polymerases I and III. Polymerase III is inhibited by 10 μg/ml α-amanitin, whereas polymerase I is unaffected even at this higher concentration. See R. G. Roeder, 1974, J. Biol. Chem. 249:241.
TABLE 9-2 Classes of RNA Transcribed by the Three Eukaryotic Nuclear RNA Polymerases and Their Functions

| Polymerase | RNA transcribed | RNA function |
|---|---|---|
| RNA polymerase I | Pre-rRNA (28S, 18S, 5.8S rRNAs) | Ribosome components, protein synthesis |
| RNA polymerase II | mRNA | Encodes protein |
| | snRNAs | RNA splicing |
| | siRNAs | Chromatin-mediated repression, translation control |
| | miRNAs | Translation control |
| RNA polymerase III | tRNAs | Protein synthesis |
| | 5S rRNA | Ribosome component, protein synthesis |
| | snRNA U6 | RNA splicing |
| | 7S RNA | Signal recognition particle for insertion of polypeptides into the endoplasmic reticulum |
| | Other small stable RNAs | Various functions, unknown for many |
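Combining Table 9-2 with the α-amanitin results of Experimental Figure 9-11 lets one predict which classes of RNA synthesis a given drug concentration should block; this is the logic behind using α-amanitin sensitivity to assign a transcript to a polymerase. The sketch below is hypothetical: the dictionary keys and function names are illustrative, and it assumes a simple step threshold at the two concentrations tested in the figure (Pol II inhibited at 1 μg/ml, Pol III at 10 μg/ml, Pol I unaffected at 10 μg/ml), whereas real inhibition is graded.

```python
import math

# RNA class -> nuclear RNA polymerase that transcribes it (after Table 9-2).
POLYMERASE_FOR_RNA = {
    "pre-rRNA": "Pol I",
    "mRNA": "Pol II",
    "spliceosomal snRNA (U1, U2, U4, U5)": "Pol II",
    "miRNA": "Pol II",
    "siRNA": "Pol II",
    "tRNA": "Pol III",
    "5S rRNA": "Pol III",
    "U6 snRNA": "Pol III",
    "7S RNA": "Pol III",
}

# Simplifying assumption: step thresholds (ug/ml) at the lowest alpha-amanitin
# concentration reported in Experimental Figure 9-11 to inhibit each enzyme.
# Pol I was not inhibited even at 10 ug/ml, so it is modeled as resistant.
INHIBITORY_DOSE_UG_PER_ML = {"Pol I": math.inf, "Pol II": 1.0, "Pol III": 10.0}


def is_inhibited(rna_class: str, dose_ug_per_ml: float) -> bool:
    """Predict whether synthesis of `rna_class` is blocked at this dose."""
    polymerase = POLYMERASE_FOR_RNA[rna_class]
    return dose_ug_per_ml >= INHIBITORY_DOSE_UG_PER_ML[polymerase]


def inhibited_classes(dose_ug_per_ml: float) -> list[str]:
    """List (alphabetically) the RNA classes the model predicts are blocked."""
    return sorted(r for r in POLYMERASE_FOR_RNA if is_inhibited(r, dose_ug_per_ml))


if __name__ == "__main__":
    for dose in (1.0, 10.0):
        print(f"{dose} ug/ml:", inhibited_classes(dose))
```

Note that U6 snRNA falls in a different class from the other spliceosomal snRNAs under this model, reflecting its transcription by Pol III rather than Pol II.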
[Figure 9-12 panels: (a) bacterial RNA polymerase (subunits β′, β, αI, αII, ω); (b) yeast RNA polymerase II (RPB1, RPB2, and numbered smaller subunits); (c) yeast RNA polymerase II with RPB4 and RPB7. Labels: DNA, clamp, wall, RNA exit.]
FIGURE 9-12 Comparison of three-dimensional structures of bacterial and eukaryotic RNA polymerases. (a, b) These space-filling models are based on x-ray crystallographic analysis. (a) RNA polymerase from the bacterium T. aquaticus. The five subunits of the bacterial enzyme are distinguished by color. Only the N-terminal domains of the α subunits are included in this model. (b) Core RNA polymerase II from S. cerevisiae. Ten of the 12 subunits constituting yeast RNA polymerase II are shown in this model. Subunits that are similar in conformation to those in the bacterial enzyme are shown in the same colors. The C-terminal domain of the large subunit RPB1 was not observed in the crystal structure, but it is known to extend from the position marked with a red arrow. (RPB is the abbreviation for “RNA polymerase B,” which is an alternative way of referring to RNA polymerase II.) DNA entering the polymerases as they transcribe to the right is diagrammed. (c) Space-filling model of yeast RNA polymerase II including subunits 4 and 7. These subunits extend from the core portion of the enzyme shown in (b) near the region of the C-terminal domain of the large subunit. [Part (a) data courtesy of Seth Darst; see N. Korzheva et al., 2000, Science 289:619–625. Part (b) data from P. Cramer et al., 2001, Science 292:1863, PDB ID 1i50. Part (c) data from K. J. Armache et al., 2003, P. Natl. Acad. Sci. USA 100:6964, and D. A. Bushnell and R. D. Kornberg, 2003, P. Natl. Acad. Sci. USA 100:6969.]

Each of the three eukaryotic RNA polymerases is more complex than E. coli RNA polymerase, but all four of these multisubunit RNA polymerases have a similar overall design (Figure 9-12a, b). All three eukaryotic RNA polymerases contain two large subunits and 10–14 smaller subunits, some of which are common to two or all three of the polymerases. The best-characterized eukaryotic RNA polymerases are from the yeast Saccharomyces cerevisiae. Each of the yeast genes encoding the polymerase subunits has been subjected to gene-knockout mutations, and the resulting phenotypes have been characterized. In addition, the three-dimensional structure of yeast RNA polymerase II has been determined (Figure 9-12b, c). The three nuclear RNA polymerases from all eukaryotes so far examined are very similar to those of yeast. Plants contain two additional nuclear RNA polymerases (RNA polymerases IV and V), which are closely related to plant RNA polymerase II but have a unique large subunit and some additional unique subunits. These two polymerases function in transcriptional repression directed by nuclear siRNAs in plants.

The two large subunits of all three eukaryotic RNA polymerases (and RNA polymerases IV and V of plants) are related to one another and are similar to the E. coli β′ and β subunits, respectively (see Figure 9-12a, b). Each of the eukaryotic RNA polymerases also contains an ω-like and two nonidentical α-like subunits (Figure 9-13). The extensive similarity in the structures of these core subunits in RNA polymerases from various sources indicates that RNA polymerase arose early in evolution and was largely conserved. This seems logical for an enzyme catalyzing a process as fundamental as the copying of RNA from DNA. In addition to the core subunits that are related to the E. coli RNA polymerase subunits, all three yeast RNA polymerases contain four additional small subunits, common to them but not to the bacterial RNA polymerase. Finally, each eukaryotic nuclear RNA polymerase has several enzyme-specific subunits that are not present in the other two (see Figure 9-13). Three of these additional subunits of Pol I and Pol III are homologous to the three additional Pol II-specific subunits. The other two Pol I-specific subunits are homologous to the Pol II general transcription factor TFIIF, discussed later, and the four additional subunits of Pol III are homologous to the Pol II general transcription factors TFIIF and TFIIE. These subunits are likely stably associated with Pol III in the cell and do not dissociate from it during purification.
[Figure 9-13 schematic: E. coli core RNA polymerase (α2ββ′ω) with β′, β, αI, αII, and ω subunits, compared with eukaryotic RNA polymerases I, II, and III, showing β′- and β-like subunits, α-like subunits, the ω-like subunit, common subunits, additional enzyme-specific subunits, and the Pol II CTD.]

FIGURE 9-13 Schematic representation of the subunit structure of the E. coli core RNA polymerase and yeast nuclear RNA polymerases. All three yeast polymerases have five core subunits homologous to the β, β′, two α, and ω subunits of E. coli RNA polymerase. The largest subunit (RPB1) of RNA polymerase II also contains an essential C-terminal domain (CTD). RNA polymerases I and III contain the same two nonidentical α-like subunits, whereas RNA polymerase II contains two other nonidentical α-like subunits. All three polymerases share the same ω-like subunit and four other common subunits. In addition, each yeast polymerase contains three to seven unique smaller subunits.

[Figure 9-14 panels: (a) free RNA polymerase II; (b) transcribing RNA polymerase II. Labels: clamp domain, wall, bridge, Mg2+, Rpb2 lobe, Rpb5, Rpb9, DNA, RNA transcript, direction of transcription.]

FIGURE 9-14 The clamp domain of RPB1. The structures of the free (a) and transcribing (b) RNA polymerase II differ mainly in the position of a clamp domain in the RPB1 subunit (orange), which swings over the cleft between the jaws of the polymerase during formation of the transcribing complex, trapping the template DNA strand and transcript. Binding of the clamp domain to the 8–9-bp RNA-DNA hybrid may help couple clamp closure to the presence of RNA, stabilizing the closed, elongating complex. RNA is shown in red, and the template strand in light purple. For clarity, downstream nontemplate DNA is not shown. The clamp closes over the incoming downstream DNA. Portions of RPB2 that form one side of the cleft have been removed so that the nucleic acids can be better visualized. The Mg2+ ion that participates in catalysis of phosphodiester bond formation is shown in green. Wall is the domain of RPB2 that forces the template DNA entering the jaws of the polymerase to bend before it exits the polymerase. The bridge α helix, shown in green, extends across the cleft in the polymerase (see Figure 9-12b) and is postulated to bend and straighten as the polymerase translocates one base down the template strand. The nontemplate strand is thought to form a flexible single-stranded region above the cleft (not shown), extending from three bases downstream of the template base paired to the 3′ base of the growing RNA to where the template strand exits the polymerase, where it hybridizes with the template strand to generate the transcription bubble. [Part (a) data from P. Cramer, D. A. Bushnell, and R. D. Kornberg, 2001, Science 292:1863, PDB ID 1i50. Part (b) data from A. L. Gnatt et al., 2001, Science 292:1876, PDB ID 1i6h.]

The clamp domain of subunit RPB1 is so named because it has been observed in two different positions: in crystals of free Pol II (Figure 9-14a) and in a complex that mimics the elongating form of the enzyme (Figure 9-14b). This domain rotates on a hinge; it is probably open when downstream DNA is inserted into this region of the polymerase and then swings shut when the enzyme enters its elongation mode. It is postulated that when the 8–9-bp RNA-DNA hybrid region near the active site (where RNA is base-paired to the template strand; see Figure 9-14b) is bound between RPB1 and RPB2, the clamp is locked in its closed position, anchoring the polymerase to the downstream double-stranded DNA. Furthermore, a transcription elongation factor called DSIF, discussed later, associates with the elongating polymerase, holding the clamp in its closed conformation. As a consequence, the polymerase is extraordinarily processive, which is to say that it continues to polymerize ribonucleotides until transcription is terminated. After termination and release of RNA from the exit channel, the clamp can swing open, releasing the enzyme from the template DNA. This mechanism can explain how human RNA polymerase II can transcribe the longest human gene, encoding dystrophin (DMD; see Figure 1-31), which is some 2 million base pairs in length, without dissociating and terminating transcription. Since transcription elongation proceeds at 1–2 kb per minute, transcription of the DMD gene requires approximately one day!

Gene-knockout experiments in yeast indicate that most of the subunits of the three nuclear RNA polymerases are essential for cell viability. Disruption of the genes encoding the few polymerase subunits that are not essential for viability (e.g., subunits 4 and 7 of RNA polymerase II) nevertheless results in very poorly growing cells. Thus all the subunits are necessary for eukaryotic RNA polymerases to function normally.

Archaea, like eubacteria, have a single type of RNA polymerase involved in gene transcription, but archaeal RNA polymerases, like eukaryotic nuclear RNA polymerases, have on the order of a dozen subunits. Archaea also have general transcription factors, discussed later, that are related to those of eukaryotes, consistent with the closer evolutionary relationship between archaea and eukaryotes than between eubacteria and eukaryotes (see Figure 1-1).
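The estimate above that transcribing DMD takes about a day follows from dividing the gene length by the elongation rate. Using only the figures given in the text (about 2 × 10^6 bp, elongating at 1–2 kb per minute) and assuming a constant rate with no pausing:

$$
t_{\text{slow}} = \frac{2\times10^{6}\ \text{bp}}{1\times10^{3}\ \text{bp/min}} = 2000\ \text{min} \approx 33\ \text{h},
\qquad
t_{\text{fast}} = \frac{2\times10^{6}\ \text{bp}}{2\times10^{3}\ \text{bp/min}} = 1000\ \text{min} \approx 17\ \text{h}
$$

A single polymerase therefore needs roughly 0.7 to 1.4 days to traverse the gene, which is why an enzyme that rarely dissociates from its template is required.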
The Largest Subunit in RNA Polymerase II Has an Essential Carboxy-Terminal Repeat
The carboxyl end of RPB1, the largest subunit of RNA polymerase II, contains a stretch of seven amino acids that is nearly precisely repeated multiple times. Neither RNA polymerase I nor III contains these repeating units. This heptapeptide repeat, with a consensus sequence of Tyr-Ser-Pro-Thr-Ser-Pro-Ser, is known as the carboxy-terminal domain (CTD) (see Figure 9-12b, red arrow). Yeast RNA polymerase II contains 26 or more repeats, vertebrate enzymes have 52 repeats, and an intermediate number of repeats occur in RNA polymerase II from nearly all other eukaryotes. The CTD is critical for viability, and at least 10 copies of the repeat must be present for yeast to survive. In vitro experiments with model promoters first showed that RNA polymerase II molecules that initiate transcription have a nonphosphorylated CTD. Once the polymerase initiates transcription and begins to move away from the promoter, many of the serine and some tyrosine residues in the CTD are phosphorylated. Analysis of polytene chromosomes from Drosophila salivary glands prepared just before molting of the larva, a time of active transcription, indicates that the CTD is also phosphorylated during in vivo transcription. The large chromosomal “puffs” induced at this time in development are regions where the genome is very actively transcribed. Staining with antibodies specific for the phosphorylated or nonphosphorylated CTD demonstrated that RNA polymerase II associated with the highly transcribed puffed regions contains a phosphorylated CTD (Figure 9-15).
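To make "nearly precisely repeated" concrete, the sketch below counts heptads that match the consensus while tolerating a small number of substitutions. The one-mismatch tolerance, the in-frame reading from position 0, and the synthetic input sequence are assumptions for illustration; this is not how CTD repeats are formally annotated, and the input is not a real CTD sequence.

```python
CTD_CONSENSUS = "YSPTSPS"  # Tyr-Ser-Pro-Thr-Ser-Pro-Ser in one-letter code
YEAST_MIN_REPEATS = 10     # minimum copies for yeast viability, per the text


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_ctd_repeats(seq: str, max_mismatches: int = 1) -> int:
    """Count back-to-back 7-residue blocks, read in frame from position 0,
    that lie within max_mismatches of the consensus.

    Assumes seq begins exactly at a heptad boundary.
    """
    count = 0
    for i in range(0, len(seq) - 6, 7):
        if mismatches(seq[i:i + 7], CTD_CONSENSUS) <= max_mismatches:
            count += 1
    return count


# Hypothetical sequence: 25 perfect heptads plus one variant (Ser7 -> Lys).
synthetic_ctd = CTD_CONSENSUS * 25 + "YSPTSPK"
n = count_ctd_repeats(synthetic_ctd)
print(n, n >= YEAST_MIN_REPEATS)  # 26 True
```

Setting `max_mismatches=0` counts only perfect copies, which for this input gives 25.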
CHAPTER 9
Transcriptional Control of Gene Expression
EXPERIMENTAL FIGURE 9-15 Antibody staining demonstrates that the carboxy-terminal domain of RNA polymerase II is phosphorylated during in vivo transcription. Salivary-gland polytene chromosomes were prepared from Drosophila larvae just before they molted. The preparation was treated with a rabbit antibody specific for phosphorylated CTD and with a goat antibody specific for nonphosphorylated CTD. The preparation was then stained with fluorescein-labeled anti-goat antibody (green) and rhodamine-labeled anti-rabbit antibody (red). Thus polymerase molecules with a nonphosphorylated CTD stained green, and those with a phosphorylated CTD stained red. The molting hormone ecdysone induces very high rates of transcription in the puffed regions labeled 74EF and 75B; note that only phosphorylated CTD is present in these regions. Smaller puffed regions transcribed at high rates are also visible. Nonpuffed sites that stained red (up arrow) or green (horizontal arrow) are also indicated, as is a site staining both red and green, producing a yellow color (down arrow). [From J. R. Weeks et al., “Locus-specific variation in phosphorylation state of RNA polymerase II in vivo: correlations with gene activity and transcript processing,” Genes & Development, 1993, 7(12A):2329–44; courtesy of J. R. Weeks and A. L. Greenleaf; republished with permission from Cold Spring Harbor Press.]
KEY CONCEPTS OF SECTION 9.2
Overview of Eukaryotic Gene Control
• The primary purpose of gene control in multicellular organisms is the execution of precise developmental programs so that the proper genes are expressed in the proper cells at the proper times during embryologic development and cellular differentiation.
• Transcriptional control is the primary means of regulating gene expression in eukaryotes, as it is in bacteria.
• In eukaryotic genomes, DNA transcription-control elements may be located many kilobases away from the promoter they regulate. Different control elements can control transcription of the same gene in different cell types.
• Eukaryotes contain three types of nuclear RNA polymerases. All three contain two large and three smaller core subunits with homology to the β′, β, α, and ω subunits of E. coli RNA polymerase, as well as several additional small subunits (see Figure 9-13).
• RNA polymerase I synthesizes only pre-rRNA. RNA polymerase II synthesizes mRNAs, some of the small nuclear RNAs that participate in mRNA splicing, and micro- and small interfering RNAs (miRNAs and siRNAs) that regulate the translation and stability of mRNAs. RNA polymerase III synthesizes tRNAs, 5S rRNA, and several other small stable RNAs (see Table 9-2).
• The carboxy-terminal domain (CTD) in the largest subunit of RNA polymerase II becomes phosphorylated during transcription initiation and remains phosphorylated as the enzyme transcribes the DNA template.
9.3 RNA Polymerase II Promoters and General Transcription Factors
The mechanisms that regulate transcription initiation and elongation by RNA polymerase II have been studied extensively because this is the polymerase that transcribes mRNAs. Transcription initiation and elongation by RNA polymerase II are the first biochemical steps required for the expression of protein-coding genes, and they are the steps in gene expression most frequently regulated to determine when, and in which cells, specific proteins are synthesized. As noted in the previous section, the expression of eukaryotic protein-coding genes is regulated by multiple protein-binding DNA sequences, generically referred to as transcription-control regions. These include promoters, which determine where transcription of the DNA template begins; other types of control elements located near transcription start sites; and sequences located far from the genes they regulate, called enhancers, which control the type of cell in which the gene is transcribed and how frequently it is transcribed. In this section, we take a closer look at the properties of various transcription-control elements found in eukaryotic protein-coding genes and at some techniques used to identify them.
RNA Polymerase II Initiates Transcription at DNA Sequences Corresponding to the 5′ Cap of mRNAs
In vitro transcription experiments using purified RNA polymerase II, a protein extract prepared from the nuclei of cultured cells, and DNA templates containing sequences encoding the 5′ ends of mRNAs for a number of abundantly expressed genes revealed that the transcripts produced always contained a cap structure at their 5′ ends identical to that present at the 5′ end of the spliced mRNA normally expressed from the gene in vivo (see Figure 5-14). In these experiments, the 5′ cap was added to the 5′ end of the nascent RNA by enzymes in the nuclear extract, which can add a cap only to an RNA that has a 5′ tri- or diphosphate. Because a 5′ end generated by cleavage of a longer RNA would have a 5′ monophosphate, it would not be capped. Consequently, researchers concluded that the capped nucleotides generated in the in vitro transcription reactions must have been the nucleotides with which transcription was initiated. Sequence analysis revealed that, for any given gene, the sequence at the 5′ end of the RNA transcripts produced in vitro is the same as that at the 5′ end of the mRNAs isolated from cells, confirming that the capped nucleotide of eukaryotic mRNAs coincides with the transcription start site. Today the transcription start site for a newly characterized mRNA is generally determined simply by identifying the DNA sequence encoding the 5′-capped nucleotide of the encoded mRNA.
The TATA Box, Initiators, and CpG Islands Function as Promoters in Eukaryotic DNA
Several different types of DNA sequences can function as promoters for RNA polymerase II, telling the polymerase where to initiate transcription of an RNA complementary to the template strand of a double-stranded DNA molecule. These sequences include TATA boxes, initiators, and CpG islands.
TATA Boxes
The first genes to be sequenced and studied through in vitro transcription systems were viral genes and cellular protein-coding genes that are very actively transcribed, either at particular times of the cell cycle or in specific differentiated cell types. In all these highly transcribed genes, a conserved sequence called the TATA box was found about 26–31 bp upstream of the transcription start site (Figure 9-16). Mutagenesis studies have shown that a single-base change in this nucleotide sequence drastically decreases
[Figure 9-16 core promoter elements, with positions relative to the +1 start site:
BRE, TFIIB recognition element (~ −37 to −32): G/C-G/C-G/A-C-G-C-C
TATA box (~ −31 to −26): T-A-T-A-A/T-A-A/T-A/G
Inr, initiator (−2 to +4): Drosophila T-C-A+1-G/T-T-T/C; mammals Y-Y-A+1-N-T/A-Y-Y
DPE, downstream promoter element (+28 to +32): A/G-G-A/T-C/T-G/A/C]
FIGURE 9-16 Core promoter elements of non-CpG island promoters in metazoans. The sequence of each element is shown with the 5′ end at the left and the 3′ end at the right. The most frequently observed bases in TATA box promoters are shown in larger font. A+1 is the base at which transcription starts, Y is a pyrimidine (C or T), N is any of the four bases. [Data from S. T. Smale and J. T. Kadonaga, 2003, Annu. Rev. Biochem. 72:449.]
in vitro transcription of the gene adjacent to it. If the base pairs between the TATA box and the normal transcription start site are deleted, transcription of the altered, shortened template begins at a new site about 25 bp downstream from the TATA box. Consequently, the TATA box acts similarly to an E. coli promoter to position RNA polymerase II for transcription initiation (see Figure 5-12).
Initiator Sequences
Instead of a TATA box, some eukaryotic genes contain an alternative promoter element called an initiator. Most naturally occurring initiator elements have a cytosine (C) at the −1 position and an adenine (A) residue at the transcription start site (+1). Directed mutagenesis of mammalian genes with an initiator-containing promoter revealed that the nucleotide sequence immediately surrounding the start site determines the strength of such promoters. In contrast to the conserved TATA box sequence, however, only an extremely degenerate initiator consensus sequence has been defined:
(5′) Y-Y-A+1-N-T/A-Y-Y-Y (3′)
where A+1 is the base at which transcription starts, Y is a pyrimidine (C or T), N is any of the four bases, and T/A is T or A at position +3. As we will see, other promoter elements, designated BRE and DPE (see Figure 9-16), can be bound by general transcription factors and influence promoter strength.
CpG Islands
Transcription of genes with promoters containing a TATA box or initiator element begins at a well-defined initiation site. However, most protein-coding genes in mammals (~70 percent) are transcribed at a lower rate than genes with TATA box–containing or initiator-containing promoters, and their transcription begins at any of several alternative start sites within regions of about 100–1000 bp that have an unusually high frequency of CG sequences. Many such genes encode proteins that are not required in large amounts (e.g., genes encoding enzymes involved in basic metabolic processes required in all cells, often called “housekeeping genes”).
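The sequence features just described can be searched for computationally. The sketch below is illustrative only. First, it computes the observed/expected CpG ratio, (number of CG × length) / (number of C × number of G), and flags windows that pass one commonly used set of thresholds (200-bp windows, GC fraction above 0.5, ratio above 0.6, often attributed to Gardiner-Garden and Frommer). Second, it finds exact matches to the degenerate initiator consensus as written in the text and checks positions +28 to +32 for the DPE consensus A/G-G-A/T-C/T-G/A/C (see Figure 9-16). The thresholds, the exact-match regular expressions, and all function names are assumptions; real promoter annotation scores elements more leniently.

```python
import re


# --- CpG content ---------------------------------------------------------
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (n_CG * length) / (n_C * n_G)."""
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # convention used here when no C or no G is present
    n_cg = s.count("CG")  # "CG" cannot overlap itself, so this counts every copy
    return n_cg * len(s) / (n_c * n_g)


def cpg_island_windows(seq: str, window: int = 200,
                       min_gc: float = 0.5, min_oe: float = 0.6) -> list[int]:
    """Start positions of windows passing the GC-fraction and CpG O/E thresholds."""
    s = seq.upper()
    hits = []
    for start in range(len(s) - window + 1):
        w = s[start:start + window]
        gc = (w.count("G") + w.count("C")) / window
        if gc > min_gc and cpg_observed_expected(w) > min_oe:
            hits.append(start)
    return hits


# --- Initiator and DPE ---------------------------------------------------
# Initiator consensus from the text: Y-Y-A(+1)-N-T/A-Y-Y-Y. The lookahead
# makes finditer report overlapping candidates.
INR = re.compile(r"(?=([CT][CT]A[ACGT][AT][CT][CT][CT]))")
# DPE consensus A/G-G-A/T-C/T-G/A/C, expected at positions +28 to +32.
DPE = re.compile(r"[AG]G[AT][CT][AGC]")


def scan_inr_dpe(seq: str) -> list[tuple[int, bool]]:
    """Return (0-based index of A+1, DPE present at +28..+32) per initiator match.

    Promoter numbering has no position 0, so position +p is at index a1 + p - 1.
    """
    s = seq.upper()
    results = []
    for m in INR.finditer(s):
        a1 = m.start() + 2               # the A at +1 is the third base of the motif
        dpe_window = s[a1 + 27:a1 + 32]  # positions +28..+32
        results.append((a1, DPE.fullmatch(dpe_window) is not None))
    return results


print(cpg_observed_expected("ACGTACGT"))  # 4.0
print(cpg_island_windows("CG" * 100))     # [0]
# Synthetic promoter (not a real gene): initiator at the start, DPE at +28..+32.
promoter = "TCAGTCTC" + "G" * 21 + "AGACG" + "G" * 5
print(scan_inr_dpe(promoter))             # [(2, True)]
```

On this measure, the genome-wide figure quoted in this section (CG at only 21 percent of the expected frequency in human DNA) corresponds to an observed/expected ratio of about 0.21.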
These promoter regions are called CpG islands (where “p” represents the phosphate between the C and G nucleotides) because CG dinucleotides occur relatively rarely elsewhere in the genome sequences of mammals. In mammals, most Cs followed by a G that are not associated with CpG island promoters are methylated at position 5 of the pyrimidine ring (5-methyl C, represented CMe; see Figure 2-17). CG sequences are thought to be underrepresented in mammalian genomes because spontaneous deamination of 5-methyl C generates thymine. Over the time scale of mammalian evolution, this is thought to have led to the conversion of most CGs to TGs by DNA-repair mechanisms. As a consequence, the frequency of CG in the human genome is only 21 percent of that expected if Cs were randomly followed by any base. However, the
Cs in active CpG island promoters are unmethylated. Consequently, when they deaminate spontaneously, they are converted to U, a base that is recognized by DNA-repair enzymes and converted back to C. As a result, the frequency of CG sequences within CpG island promoters is close to that expected if C were followed randomly by any of the four nucleotides. CG-rich sequences are bound by histone octamers more weakly than CG-poor sequences because more energy is required to bend them into the small-diameter loops required to wrap around the histone octamer forming a nucleosome (see Figure 8-24). As a consequence, CpG islands coincide with nucleosome-free regions of DNA. Much remains to be learned about the molecular mechanisms that control transcription from CpG island promoters, but a current hypothesis is that the general transcription factors discussed later in this section can bind to them because CpG islands exclude nucleosomes.
Divergent Transcription from CpG Island Promoters
Another remarkable feature of CpG islands is that transcription from these elements is initiated in both directions, even though only transcription in the sense direction yields an mRNA. By mechanisms that remain to be fully elucidated, most RNA polymerase II molecules transcribing in the “wrong” (antisense) direction pause or terminate transcription about 1–3 kb from the transcription start site. This phenomenon was discovered by taking advantage of the stability conferred on the elongation complex by the RNA polymerase II clamp domain when an RNA-DNA hybrid is bound near the active site (see Figure 9-14b, c). Nuclei were isolated from cultured human fibroblasts and incubated in a buffered solution containing salt and mild detergent, which removes RNA polymerases except for those in the process of elongation because of their stable association with template DNA.
Ribonucleoside triphosphates were then added, with UTP replaced by bromo-UTP, containing uracil with a Br atom at position 5 on the pyrimidine ring (see Figure 2-17). The nuclei were then incubated at 30 °C long enough for about 100 nucleotides to be polymerized by the RNA polymerase II (Pol II) molecules that were in the process of elongation at the time the nuclei were isolated. RNA was then isolated, and RNA containing bromo-U was immunoprecipitated with an antibody specific for BrU-labeled RNA. Thirty-three nucleotides at the 5′ ends of these RNAs were then sequenced by massively parallel DNA sequencing (see Chapter 6) of reverse transcripts, and the sequences were mapped on the human genome. Figure 9-17 shows a plot of the number of sequence reads per kilobase of total BrU-labeled RNA relative to the major transcription start sites (TSS) of all currently known human protein-coding genes. The results show that approximately equal numbers of RNA polymerase molecules transcribed most promoters (mostly CpG island promoters) in the sense direction, toward the gene (blue, plotted upward to indicate transcription in the sense direction), and in the antisense direction, away from the gene (purple, plotted downward to represent transcription of the complementary DNA strand in the opposite, antisense direction). A peak of sense transcripts was observed at about +50 bp relative to the major TSS, indicating that Pol II pauses in the +50 to +200 region before elongating further. A peak at −250 to −500 bp relative to the major TSS, representing Pol II transcribing in the opposite direction, was also observed, revealing paused RNA polymerase II molecules at the other ends of the nucleosome-free regions in CpG island promoters. Note that the number of sequence reads, and therefore the number of elongating polymerases, is lower for polymerases transcribing in the antisense direction more than 1 kb from the transcription start site than for polymerases transcribing more than 1 kb from the start site in the sense direction. The molecular mechanisms that may account for this difference are presented in Figure 10-15, in which transcription termination is discussed. Note that a low number of sequence reads was also observed resulting from transcription upstream of the major transcription start sites (blue sequence reads to the left of 0 and purple sequence reads to the right of 0), indicating that there is a low level of transcription from seemingly random sites throughout the genome. These recent discoveries of divergent transcription from CpG island promoters and low-level transcription of most of the genomes of eukaryotes have been a great surprise to most researchers.
[Figure 9-17 plot: sequence reads per kilobase (y axis, −60 to 100) versus distance relative to TSS (x axis, −3 to +3 kb), with the sense peak labeled +50 bp and the antisense peak labeled −250 bp.]
EXPERIMENTAL FIGURE 9-17 Analysis of elongating RNA polymerase II molecules in human fibroblasts. Nuclei from cultured fibroblasts were isolated and incubated in a buffer with a non-ionic detergent that prevents RNA polymerase II from initiating transcription. Treated nuclei were then incubated with ATP, CTP, GTP, and Br-UTP for 5 minutes at 30 °C, a time sufficient to incorporate about 100 nucleotides. RNA was then isolated and broken into fragments of about 100 nucleotides each by controlled incubation at high pH. Specific RNA oligonucleotides were ligated to the 5′ and 3′ ends of the RNA fragments, which were then subjected to reverse transcription. The resulting DNA was amplified by the polymerase chain reaction and subjected to massively parallel DNA sequencing. The sequences determined were aligned to the transcription start sites (TSS) of all known human genes, and the number of sequence reads per kilobase of total sequenced DNA was plotted for 10-bp intervals of sense transcripts (blue) and antisense transcripts (purple). See text for discussion. [Data from L. J. Core, J. J. Waterfall, and J. T. Lis, 2008, Science 322:1845.]
Chromatin Immunoprecipitation
The technique of chromatin immunoprecipitation outlined in Figure 9-18a, using an antibody to RNA polymerase II, provided additional data supporting the occurrence of divergent transcription from most CpG island promoters in mammals. The data from this analysis are reported as the number of times a specific sequence from this region of the genome was identified per million total sequences analyzed (Figure 9-18b). At divergently transcribed genes, such as the Hsd17b12 gene encoding an enzyme involved in intermediary metabolism, two peaks of immunoprecipitated DNA were detected, corresponding to Pol II transcribing in the sense and antisense directions and then pausing. However, Pol II was detected more than 1 kb from the start site only in the sense direction. The number of counts per million more than 1 kb downstream of the start site was very low because the gene is transcribed at low frequency. However, the number of counts per million at the transcription start site regions for both sense and antisense transcription was much higher, reflecting the fact that Pol II molecules had initiated transcription in both directions at this promoter but paused before transcribing farther than 500 bp from the start sites in each direction. In contrast, the Rpl6 gene, encoding a large ribosomal subunit protein that was abundantly transcribed in the proliferating mouse embryonic stem cells used in the study, was transcribed almost exclusively in the sense direction. The peak in counts per million less than 250 bp from the transcription start site again results from a long pause in transcription in the promoter-proximal region before the polymerase is released to transcribe into the gene. The number of sequence counts per million more than 1 kb downstream from the transcription start site was much higher than for sense-direction transcription of the Hsd17b12 gene, reflecting the high rate of transcription of the Rpl6 gene.
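Both sequencing-based profiles described in this section reduce to the same bookkeeping: place each read relative to a reference point, drop it into a fixed-width bin, and, where needed, scale by sequencing depth. The sketch below shows strand-aware binning relative to start sites, in the spirit of Figure 9-17 (10-bp bins; a read counts as sense when its strand matches the gene's), and counts-per-million scaling in the spirit of Figure 9-18 (50-bp bins). The record types, the ±3 kb window, and the convention that `pos` is a read's mapped 5′ end are assumptions, and this is not the pipeline used in the cited studies.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    chrom: str
    pos: int     # mapped 5' end of the read (assumed convention)
    strand: str  # "+" or "-": direction in which the RNA was made


class TSS(NamedTuple):
    chrom: str
    pos: int     # annotated +1 position
    strand: str  # "+" or "-": direction of sense transcription


def tss_profile(reads: Iterable[Read], sites: Iterable[TSS],
                window: int = 3000, bin_size: int = 10) -> tuple[Counter, Counter]:
    """Count reads in bins of signed distance from each TSS, measured in the
    gene's orientation (positive = downstream), split into sense/antisense."""
    by_chrom: dict[str, list[TSS]] = {}
    for site in sites:
        by_chrom.setdefault(site.chrom, []).append(site)
    sense, antisense = Counter(), Counter()
    for read in reads:
        for site in by_chrom.get(read.chrom, []):
            d = read.pos - site.pos if site.strand == "+" else site.pos - read.pos
            if -window <= d < window:
                target = sense if read.strand == site.strand else antisense
                # Floor division puts negative distances in the correct bin.
                target[(d // bin_size) * bin_size] += 1
    return sense, antisense


def counts_per_million(positions: Iterable[int], total_mapped: int,
                       bin_size: int = 50) -> dict[int, float]:
    """Bin positions and scale each bin by 1e6 / total_mapped, where
    total_mapped is the library-wide number of mapped reads."""
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    counts = Counter((p // bin_size) * bin_size for p in positions)
    scale = 1_000_000 / total_mapped
    return {b: n * scale for b, n in sorted(counts.items())}


# Hypothetical toy data, not taken from the studies cited above.
sense, antisense = tss_profile(
    [Read("chr1", 1050, "+"), Read("chr1", 1055, "+"), Read("chr1", 750, "-")],
    [TSS("chr1", 1000, "+")],
)
print(dict(sense), dict(antisense))  # {50: 2} {-250: 1}
print(counts_per_million([10, 20, 60, 70, 80], total_mapped=2_000_000))  # {0: 1.0, 50: 1.5}
```

A read lying near two annotated start sites is counted once for each here; a real analysis would need an explicit rule for overlapping or divergently oriented genes.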
General Transcription Factors Position RNA Polymerase II at Start Sites and Assist in Initiation
Initiation of transcription by RNA polymerase II requires several initiation factors. These initiation factors position Pol II molecules at transcription start sites and help to separate the DNA strands so that the template strand can enter the active site of the enzyme. They are called general transcription factors because they are required at most, if not all, promoters of genes transcribed by RNA polymerase II. These proteins are designated TFIIA, TFIIB, and so on, and most are multimeric proteins. The largest is TFIID, which consists
[Figure 9-18 (a) schematic: 1 treat living cells or tissues with a membrane-permeating cross-linker such as formaldehyde; 2 sonicate to shear cellular chromatin into short fragments and add antibody to Pol II; 3 immunoprecipitate to isolate Pol II cross-linked to DNA (a paused polymerase with nascent RNA and a DNA-RNA hybrid region is drawn); 4 reverse cross-linking, isolate DNA, and subject it to massively parallel DNA sequencing. (b) RNA Pol II counts/million plotted against genomic position (kb) for Hsd17b12 (bidirectional initiation) and Rpl6 (unidirectional initiation).]
EXPERIMENTAL FIGURE 9-18 The chromatin immunoprecipitation technique localizes where a protein of interest associates with the genome. (a) Step 1: Live cultured cells or tissues are incubated in 1 percent formaldehyde to covalently cross-link proteins to DNA and proteins to proteins. Step 2: The preparation is then subjected to sonication to solubilize chromatin and shear it into fragments of 200–500 bp of DNA. Step 3: An antibody to a protein of interest, here RNA polymerase II, is added, and DNA covalently linked to the protein of interest is immunoprecipitated. Step 4: The covalent cross-linking is then reversed and the DNA is isolated. The isolated DNA can be analyzed by PCR with primers for a sequence of interest. Alternatively, total recovered DNA can be amplified, labeled by incorporation of a fluorescently labeled nucleotide, and hybridized to a microarray (see Figure 6-27) or subjected to massively parallel DNA sequencing. See A. Hecht and M. Grunstein, 1999, Method. Enzymol. 304:399. (b) Results from DNA sequencing of chromatin from mouse embryonic stem cells immunoprecipitated with antibody to RNA polymerase II are shown for a gene that is divergently transcribed (left) and a gene that is transcribed only in the sense direction (right). Data are plotted as the number of times a DNA sequence in a 50-bp interval was observed per million base pairs sequenced. The region encoding the 5′ end of the gene is shown below, with exons shown as rectangles and introns as lines. [Part (b) data from P. B. Rahl et al., 2010, Cell 141:432.]
of a single 38-kDa TATA box–binding protein (TBP) and 13 TBP-associated factors (TAFs). General transcription factors with similar activities and homologous sequences are found in all eukaryotes. The complex of Pol II and its general transcription factors bound to a promoter and ready to initiate transcription is called a preinitiation complex (PIC). Figure 9-19 summarizes the current model for the stepwise assembly of the Pol II transcription preinitiation complex on a promoter containing a TATA box. The TBP subunit of TFIID is the first protein to bind to a TATA box promoter. All eukaryotic TBPs analyzed to date have very similar C-terminal domains of 180 residues. This domain of TBP folds into a saddle-shaped structure; the two halves of the molecule exhibit an overall dyad symmetry but are not identical. TBP interacts with the minor groove in
DNA, bending the helix considerably (see Figure 5-5). The DNA-binding surface of TBP is conserved in all eukaryotes, explaining the high conservation of the TATA box promoter element (see Figure 9-16). Once TFIID has bound to the TATA box, TFIIA and TFIIB can bind. TFIIA is a heterodimer larger than TBP, and TFIIB is a monomeric protein, slightly smaller than TBP. TFIIA associates with TBP and DNA on the upstream side of the TBP–TATA box complex. The C-terminal domain of TFIIB makes contact with both TBP and DNA on either side of the TATA box. During transcription initiation, its N-terminal domain is inserted into the RNA exit channel of RNA polymerase II (see Figure 9-12c). The TFIIB N-terminal domain assists Pol II in melting the DNA strands at the transcription start site and interacts with the template strand near the Pol II active site. Following TFIIB binding, a preformed complex of TFIIF (a heterodimer of two different subunits in mammals) and Pol II binds, positioning the polymerase over the start site. Two more general transcription factors must bind before the DNA duplex can be separated to expose the template strand. First to bind is TFIIE, a heterodimer of two different subunits. TFIIE creates a docking site for TFIIH, another multimeric factor containing 10 different subunits. Binding of TFIIH completes assembly of the transcription preinitiation complex (see Figure 9-19). Figure 9-20 shows a cryoelectron microscopic image of a yeast (S. cerevisiae) preinitiation complex assembled in vitro from purified RNA polymerase II and general transcription factors with TBP in place of the complete TFIID complex, a total of thirty-three polypeptides with a mass
[Figure 9-19 schematic: unbound promoter (TATA box and +1 site) → TFIID (TBP with TAFs) binds the TATA box → TFIIA and TFIIB bind, forming the upstream promoter complex → Pol II (with its CTD) and TFIIF bind, forming the core PIC → TFIIE and TFIIH (including the TFIIH kinase) bind, forming the closed PIC → ATP hydrolysis yields the open PIC with a transcription bubble → with NTPs, the initially transcribing complex synthesizes nascent RNA → initiation factors are replaced by elongation factors in the elongation complex, whose CTD is phosphorylated and whose transcript carries a 5′ cap.]
FIGURE 9-19 Model for the sequential assembly of an RNA polymerase II preinitiation complex. The indicated general transcription factors and purified RNA polymerase II (Pol II) bind sequentially to TATA box DNA to form a preinitiation complex (PIC). ATP hydrolysis then provides the energy for the unwinding of DNA at the transcription start site by a TFIIH helicase subunit that pushes downstream DNA into the polymerase. The DNA is held in position in the PIC by binding of the TATA box by the TBP subunit of TFIID, and the resulting strain on the structure of the duplex DNA helps the N-terminal region of TFIIB and Pol II melt the DNA at the transcription start site, forming the transcription bubble. As Pol II initiates transcription in the resulting open complex, the polymerase transcribes away from the promoter, its CTD becomes phosphorylated by the TFIIH kinase domain, and the general transcription factors dissociate from the promoter. See S. Sainsbury, C. Bernecky, and P. Cramer, 2015, Nat. Rev. Mol. Cell Biol. 16:129.
of 1.5 megadaltons (MDa), about the size of a ribosomal subunit. Such elaborate preinitiation complexes assemble at the promoters of every protein-coding gene expressed by a eukaryotic cell. The helicase activity of one of the core TFIIH subunits (Ssl2 in yeast; see Figure 9-20d) uses energy from ATP hydrolysis to help unwind the DNA duplex at the start site, allowing Pol II to form an open complex in which the DNA duplex surrounding the start site is melted and the template strand is bound at the polymerase active site. As the polymerase transcribes away from the promoter region, the N-terminal domain of TFIIB is released from the RNA exit channel as the 5′ end of the nascent RNA enters it. Three TFIIH subunits form a kinase module (TFIIH kinase in Figure 9-19) that phosphorylates the Pol II CTD multiple times on serine 5, the fifth residue of the Tyr-Ser-Pro-Thr-Ser-Pro-Ser repeat that constitutes the CTD. As we will discuss further in Chapter 10, a multiply phosphorylated CTD is a docking site for the enzymes that form the cap structure (see Figure 5-14) on the 5′ end of an RNA transcribed by RNA polymerase II. In the minimal in vitro transcription assay with TBP substituted for the full TFIID complex and purified RNA polymerase II, TBP remains bound to the TATA box as the polymerase transcribes away from the promoter region, but the other general transcription factors dissociate.
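The binding order described above for a TATA box promoter can be written as a set of prerequisites and checked mechanically. A minimal sketch, assuming only the in vitro pathway described in the text: TFIID (TBP) first; TFIIA and TFIIB once TFIID is bound; the Pol II–TFIIF complex after TFIIB; then TFIIE, TFIIH, and ATP-dependent melting. The step labels are chosen for this example. Because the text ties Pol II–TFIIF binding only to TFIIB, TFIIA is constrained here only to follow TFIID.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Step -> steps that must already be complete, following the text's
# in vitro model for a TATA box promoter. Labels are illustrative.
PREREQS: dict[str, set[str]] = {
    "TFIID": set(),              # TBP subunit binds the TATA box first
    "TFIIA": {"TFIID"},
    "TFIIB": {"TFIID"},
    "Pol II+TFIIF": {"TFIIB"},   # preformed complex binds after TFIIB
    "TFIIE": {"Pol II+TFIIF"},
    "TFIIH": {"TFIIE"},          # completes the closed PIC
    "open complex": {"TFIIH"},   # ATP-dependent melting by the TFIIH helicase
}


def is_valid_assembly(order: list[str]) -> bool:
    """True if each step appears exactly once and only after its prerequisites."""
    done: set[str] = set()
    for step in order:
        if step not in PREREQS or step in done or not PREREQS[step] <= done:
            return False
        done.add(step)
    return done == set(PREREQS)


one_valid_order = list(TopologicalSorter(PREREQS).static_order())
print(is_valid_assembly(one_valid_order))  # True
print(is_valid_assembly(["TFIID", "TFIIA", "TFIIB", "TFIIE",
                         "Pol II+TFIIF", "TFIIH", "open complex"]))  # False
```

The second call fails because TFIIE is placed before the Pol II–TFIIF complex, which the model does not allow.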
Remarkably, the first subunits of TFIIH to be cloned from humans were identified because mutations in them cause defects in the repair of damaged DNA, such as a base with a covalently linked mutagen or a UV-induced thymine-thymine dimer (see Figure 5-37). In normal individuals, when a transcribing RNA polymerase becomes stalled at a region of damaged template DNA, the core TFIIH complex, lacking the three subunits of the kinase domain (see Figure 9-19) but including the helicase subunit mentioned above, recognizes the stalled polymerase and then associates with other proteins that function with TFIIH in repairing the damaged DNA region. In patients with mutant forms of these TFIIH subunits, such repair of damaged DNA in
[Figure 9-20 panels: (a) side, (b) front, and (c) back views of the preinitiation complex (related by 90° and 180° rotations), with Pol II, its clamp, TBP, TFIIA, TFIIB, TFIIE, TFIIF, TFIIH, the TFIIH Ssl2 helicase subunit, TFIIS, and upstream and downstream DNA labeled; (d) model of template-strand entry, with TBP, TFIIA, TFIIB, Pol II, and the ATP-dependent Ssl2 helicase labeled.]
FIGURE 9-20 Model of the yeast preinitiation complex based on cryoelectron microscopy and fitting of known protein x-ray crystal structures. (a–c) Three views of the nearly complete PIC. The relative positions of Pol II and most of the GTFs are observed, but only about 50% of the mass of TFIIH is depicted because a large part of TFIIH is highly flexible, so its structure could not be accurately determined by cryo-EM. Also, high-resolution structures have not been determined for many of the TFIIH subunits, so these could not be fitted to the TFIIH mass detected by cryo-EM. However, the interaction between DNA at the downstream side of the Pol II cleft and the TFIIH Ssl2 helicase subunit required to melt promoter DNA is clearly visualized in (b) and (c). In (c), the interaction between TFIIH and TFIIE is not visualized because of the low resolution of the complex in this region. TFIIS is a Pol II elongation factor added to stabilize the PIC. (d) Model of entry of the template strand into the floor of the cleft where RNA polymerization is catalyzed. The Ssl2 helicase pushes downstream DNA into the cleft while the upstream DNA is held by TBP, TFIIB, and TFIIA, creating torsional stress that contributes to melting of the transcription bubble.
transcriptionally active genes is impaired. As a result, affected individuals have extreme skin sensitivity to sunlight (a common cause of DNA damage through the generation of thymine-thymine dimers) and exhibit a high incidence of cancer. Thus these subunits of TFIIH serve two functions in the cell, one in the process of transcription initiation and a second in the repair of DNA. Depending on the severity of the defect in TFIIH function, these individuals may suffer from diseases such as xeroderma pigmentosum (see Chapter 24) and Cockayne syndrome (see Chapter 5). ■
The TAF subunits of TFIID function in initiating transcription from promoters that lack a TATA box. For instance, some TAF subunits contact the initiator element in promoters in which it occurs; their function probably explains how such sequences can replace a TATA box (see Figure 9-16). Additional TFIID TAF subunits can bind to a consensus sequence, A/G-G-A/T-C/T-G/A/C, that is centered about 30 bp downstream from the transcription start site in many genes that lack a TATA box promoter. Because of its position, this regulatory sequence is called the downstream promoter element (DPE) (see Figure 9-16). The DPE facilitates transcription of TATA-less genes that contain it by increasing TFIID binding. In addition, an α helix of TFIIB binds to the major groove of DNA upstream of the TATA box, and the strongest promoters contain the optimal sequence for this interaction, called the TFIIB recognition element (BRE) (see Figure 9-16).
Chromatin immunoprecipitation assays (see Figure 9-18) using antibodies to TBP show that it binds in the region between the divergent transcription start sites in CpG island promoters. Consequently, the same general transcription factors are probably required for initiation from the weaker CpG island promoters as for initiation from promoters containing a TATA box. The absence of the promoter elements summarized in Figure 9-16 may account for the divergent transcription from multiple transcription start sites observed from CpG island promoters, since cues from the DNA sequence are not present to correctly orient the preinitiation complex. TFIID and the other general transcription factors may choose among alternative, nearly equivalent weak binding sites in CpG island promoters, which may explain the low frequency of transcription initiation as well as the alternative transcription start sites in divergent directions generally observed from this class of promoters.
CHAPTER 9 Transcriptional Control of Gene Expression
[Data from K. Murakami, et al. 2015. Proc. Natl. Acad. Sci. USA, 112:13543, PDB ID 5fmf.]
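Scanning a sequence for the DPE consensus described above (A/G-G-A/T-C/T-G/A/C, centered about 30 bp downstream of the start site) is a simple pattern-matching task. The Python sketch below is illustrative only. It assumes the input is the coding-strand sequence beginning at the transcription start site (+1). The function name `find_dpe` and the ±3 bp window around +30 are hypothetical choices, not values from the text. A real promoter scan would also consider TATA box and initiator elements and would tolerate mismatches.

```python
import re

# DPE consensus from the text, one character class per position: A/G-G-A/T-C/T-G/A/C.
# The lookahead lets overlapping candidate 5-mers all be reported.
DPE_PATTERN = re.compile(r"(?=([AG]G[AT][CT][ACG]))")


def find_dpe(downstream: str, center: int = 30, tolerance: int = 3) -> list[tuple[int, str]]:
    """Return (start position, motif) for DPE-like 5-mers centered near +center.

    `downstream` is the coding-strand sequence that begins at the transcription
    start site, so downstream[0] is position +1 (promoter numbering has no 0).
    `tolerance` is a hypothetical window, not a value taken from the text.
    """
    seq = downstream.upper()
    hits = []
    for match in DPE_PATTERN.finditer(seq):
        start_pos = match.start() + 1   # 0-based index -> +1-based promoter position
        center_pos = start_pos + 2      # middle base of the 5-mer
        if abs(center_pos - center) <= tolerance:
            hits.append((start_pos, match.group(1)))
    return hits


# Hypothetical test sequence: a DPE-like AGACG placed at +28 to +32.
example = "T" * 27 + "AGACG" + "T" * 10
print(find_dpe(example))  # [(28, 'AGACG')]
```

The consensus is short and degenerate, so matches occur often by chance anywhere in a sequence. Restricting hits to the expected position near +30 is what gives the scan some specificity.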
Elongation Factors Regulate the Initial Stages of Transcription in the Promoter-Proximal Region
In metazoans, at most promoters, Pol II pauses after transcribing fewer than 100 nucleotides because of the binding of a five-subunit protein called NELF (negative elongation factor). NELF binds to Pol II along with a two-subunit elongation factor called DSIF (DRB sensitivity-inducing factor, so named because an ATP analog called DRB inhibits further transcription elongation in its presence). The inhibition of elongation caused by NELF binding to Pol II is relieved when three targets are phosphorylated: DSIF, NELF, and serine 2 of the Pol II CTD heptad repeat (Tyr-Ser-Pro-Thr-Ser-Pro-Ser). The kinase responsible is cyclin T–CDK9 (also called P-TEFb), a two-subunit protein kinase that associates with the Pol II–NELF–DSIF complex. The same elongation factors regulate transcription from CpG island promoters. Because these factors regulate elongation in the promoter-proximal region, they provide a mechanism for controlling gene transcription in addition to the regulation of transcription initiation. This overall strategy for regulating transcription at both the initiation and elongation steps in the promoter-proximal region is similar to the regulation of the trp operon in E. coli (see Figure 9-7), although the molecular mechanisms involved are distinct.
Transcription of HIV (human immunodeficiency virus), the cause of AIDS, is dependent on the activation of cyclin T–CDK9 by a small viral protein called Tat. Cells experimentally infected with tat− mutants produce short viral transcripts about 50 nucleotides long. In contrast, cells infected with wild-type HIV synthesize long viral transcripts that encompass the entire integrated proviral genome (see Figure 5-48 and Figure 8-13). Thus Tat functions as an antitermination factor, permitting RNA polymerase II to read through a transcriptional block. (Tat is initially made by rare transcripts that fail to terminate when the HIV promoter is transcribed at a high rate in “activated” T lymphocytes; see Chapter 23.) Tat is a sequence-specific RNA-binding protein.
It binds to the RNA copy of a sequence called TAR, which forms a stem-loop structure near the 5′ end of the HIV transcript (Figure 9-21). TAR also binds cyclin T, holding the cyclin T–CDK9 complex close to the polymerase, where it efficiently phosphorylates its substrates, resulting in transcription elongation. Chromatin immunoprecipitation assays done after treating cells with specific inhibitors of CDK9 indicate that the transcription of some 30 percent of mammalian genes is regulated by controlling the activity of cyclin T–CDK9 (P-TEFb). In most of these cases, this control is probably exerted by sequence-specific DNA-binding transcription factors rather than by an RNA-binding protein, as in the case of HIV Tat. ■
FIGURE 9-21 Model of antitermination complex composed of HIV Tat protein and several cellular proteins. The TAR element in the HIV transcript contains sequences recognized by Tat and the cellular protein cyclin T. Cyclin T activates the protein kinase CDK9 and helps position it near its substrates: the CTD of RNA polymerase II, NELF, and DSIF. CTD phosphorylation at serine 2 of the Pol II CTD heptad repeat is required for transcription elongation. Cellular proteins DSIF and the NELF complex are also involved in regulating Pol II elongation, as discussed in the text. See T. Wada et al., 1998, Genes Dev. 12:343; Y. Yamaguchi et al., 1999, Cell 97:451; and T. Yamada et al., 2006, Mol. Cell 21:227.
KEY CONCEPTS OF SECTION 9.3
RNA Polymerase II Promoters and General Transcription Factors
• RNA polymerase II initiates transcription of genes at the nucleotide in the DNA template that corresponds to the 5′ nucleotide that is capped in the encoded mRNA.
• Three principal types of promoter sequences have been identified in eukaryotic DNA. The TATA box is prevalent in highly transcribed genes. Initiator promoters are found in some genes. CpG islands are the promoters for about 70 percent of protein-coding genes in vertebrates and are characteristic of genes transcribed at a low rate.
• Transcription of protein-coding genes by Pol II is initiated by sequential binding of the following, in the order listed:
  1. TFIID, which contains the TBP subunit that binds to TATA box DNA
  2. TFIIA and TFIIB
  3. a complex of Pol II and TFIIF
  4. TFIIE
  5. finally, TFIIH (see Figure 9-19)
• The helicase activity of a TFIIH subunit helps to separate the DNA strands at the transcription start site in most promoters, a process that requires hydrolysis of ATP. As Pol II begins transcribing away from the start site, its CTD is phosphorylated on serine 5 by the TFIIH kinase domain.
• In metazoans, NELF and DSIF associate with Pol II after initiation, inhibiting elongation fewer than 100 bp from the transcription start site. Inhibition of elongation is relieved when cyclin T–CDK9 (also called P-TEFb) associates with the elongation complex. CDK9 then phosphorylates subunits of NELF and DSIF, as well as serine 2 of the Pol II CTD.
9.3 RNA Polymerase II Promoters and General Transcription Factors
9.4 Regulatory Sequences in Protein-Coding Genes and the Proteins Through Which They Function
As noted in the previous section, expression of eukaryotic protein-coding genes is regulated by multiple protein-binding DNA sequences, generically referred to as transcription-control regions. These regions include promoters and other types of control elements located near transcription start sites, as well as sequences located far from the genes they regulate. In this section, we take a closer look at the properties of various control elements found in eukaryotic protein-coding genes and the proteins that bind to them.
Promoter-Proximal Elements Help Regulate Eukaryotic Genes
Recombinant DNA techniques have been used to systematically mutate the nucleotide sequences of various eukaryotic genes in order to identify transcription-control regions. Linker scanning mutagenesis, for example, can pinpoint the sequences within a regulatory region that control transcription. In this approach, a series of constructs carrying contiguous, overlapping mutations are each assayed for their effect on expression of a reporter gene or production of a specific mRNA (Figure 9-22a). This type of analysis
identified promoter-proximal elements of the thymidine kinase (tk) gene from herpes simplex type I virus (HSV-I). The results demonstrated that the DNA region upstream of the HSV-I tk gene contains three separate transcription-control sequences: a TATA box in the interval from −32 to −16 and two other control elements farther upstream (Figure 9-22b). Experiments using mutants containing single-base-pair changes in promoter-proximal control elements revealed that these elements are generally about 6–10 bp long. Recent results indicate that, in human genes, they are found upstream and downstream of the transcription start site at equal frequency. Strictly speaking, the term promoter refers to the DNA sequence that determines where a polymerase initiates transcription, but the term is often used to refer to both a promoter and its associated promoter-proximal control elements.
To test the spacing constraints on the control elements identified by linker scanning in the HSV-I tk promoter region, researchers prepared and assayed constructs containing small deletions and insertions between the elements. Changes of 20 bp or fewer in the spacing between the promoter and promoter-proximal control elements had little effect. However, insertions of 30–50 bp between an HSV-I tk promoter-proximal element and the TATA box were equivalent to deleting the element. Similar analyses of other eukaryotic promoters have also indicated that considerable flexibility in the spacing between promoter-proximal elements is generally tolerated, but that separations of several tens of base pairs may decrease transcription.
EXPERIMENTAL FIGURE 9-22 Linker scanning mutations identify transcription-control elements. (a) In linker scanning mutagenesis, a region of eukaryotic DNA (tan) that supports high-level expression of a reporter gene (light purple) is cloned in a plasmid vector as diagrammed at the top. Overlapping linker scanning (LS) mutations (crosshatched areas) are introduced from one end of the region being analyzed to the other. These mutations are created by scrambling the nucleotide sequence in a short stretch of the DNA. After the mutant plasmids are transfected separately into cultured cells, the activity of the reporter-gene product is assayed. The example shown here analyzes the sequence from −120 to +1 of the herpes simplex virus thymidine kinase gene. LS mutations 1, 4, 6, 7, and 9 have little or no effect on expression of the reporter gene, indicating that the regions altered in these mutants contain no control elements. Reporter-gene expression is significantly reduced in mutants 2, 3, 5, and 8, indicating that control elements (brown) lie in the intervals shown at the bottom. (b) Analysis of these LS mutations identified a TATA box and two promoter-proximal elements (PE-1 and PE-2). See S. L. McKnight and R. Kingsbury, 1982, Science 217:316.
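The logic of reading a linker scanning experiment (Figure 9-22) can be sketched in a few lines. Mutants whose reporter expression drops well below that of the unmutated construct mark intervals that contain a control element, and overlapping hits are merged. This Python sketch is hypothetical: the `LSMutant` class, the 0.5 cutoff, and all intervals and expression values are illustrative assumptions, not the McKnight and Kingsbury data.

```python
from dataclasses import dataclass


@dataclass
class LSMutant:
    name: str
    start: int          # first scrambled position relative to the start site, e.g. -120
    end: int            # last scrambled position (inclusive)
    expression: float   # reporter activity relative to the unmutated construct (1.0)


def map_control_elements(mutants: list[LSMutant], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Merge the scrambled intervals of mutants whose expression drops below `threshold`.

    Overlapping or abutting intervals are merged. Each merged interval is a
    candidate control element (a conservative, union-based estimate).
    """
    hits = sorted((m for m in mutants if m.expression < threshold), key=lambda m: m.start)
    elements: list[list[int]] = []
    for m in hits:
        if elements and m.start <= elements[-1][1] + 1:
            elements[-1][1] = max(elements[-1][1], m.end)   # extend the current element
        else:
            elements.append([m.start, m.end])               # start a new element
    return [(start, end) for start, end in elements]


# Hypothetical intervals and expression values, for illustration only.
scan = [
    LSMutant("LS1", -120, -106, 0.95),
    LSMutant("LS2", -110, -96, 0.20),
    LSMutant("LS3", -100, -86, 0.30),
    LSMutant("LS4", -90, -76, 0.90),
    LSMutant("LS5", -80, -66, 0.25),
]
print(map_control_elements(scan))  # [(-110, -86), (-80, -66)]
```

Merging by union is deliberately conservative. A refinement would trim each element using positions covered by no-effect mutants; here, for example, LS4 overlaps the first element at −90 to −86.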
Distant Enhancers Often Stimulate Transcription by RNA Polymerase II
As noted earlier, transcription from many eukaryotic promoters can be stimulated by control elements located thousands of base pairs away from the transcription start site. Such long-distance transcription-control elements, referred to as enhancers, are common in eukaryotic genomes but fairly rare in bacterial genomes. Procedures such as linker scanning mutagenesis have indicated that enhancers, usually on the order of 200 bp long, are composed of several functional sequence elements of about 6–10 bp each, like promoter-proximal elements. As discussed later, each of these regulatory elements is a binding site for a sequence-specific DNA-binding transcription factor. Analyses of many different metazoan enhancers have shown that they are equally likely to occur upstream or downstream of a promoter. Downstream enhancers may lie within an intron, or even downstream of the final exon of a gene, as in the case of the SALL1 gene (see Figure 9-10a). Many enhancers are cell-type-specific. For example, an enhancer controlling Pax6 expression in the retina was characterized in the intron between exons 4 and 5 (see Figure 9-9a). In contrast, an enhancer controlling Pax6 expression in the hormone-secreting cells of the pancreas is located in a roughly 200-bp region upstream of exon 0 (so named because it was discovered after the exon called “exon 1”).
Most Eukaryotic Genes Are Regulated by Multiple Transcription-Control Elements
Initially, enhancers and promoter-proximal elements were thought to be distinct types of transcription-control elements. However, as more enhancers and promoter-proximal elements were analyzed, the distinctions between them became less clear. For example, both types of elements can generally stimulate transcription even when inverted, and both types are often cell-type-specific. The current consensus is that a spectrum of control elements regulates transcription by RNA polymerase II. At one extreme are enhancers, which can stimulate transcription from a promoter tens of thousands of base pairs away. At the other extreme are promoter-proximal elements, such as the upstream elements controlling the HSV-I tk gene, which lose their influence when moved 30–50 bp farther from the promoter. Researchers have identified a large number of transcription-control elements that can stimulate transcription from distances between these two extremes.
Figure 9-23a summarizes the locations of transcription-control sequences for a hypothetical mammalian gene with a promoter containing a TATA box. The transcription start site encodes the first (5′) nucleotide of the first exon of an mRNA, the nucleotide that is capped. In addition to the TATA box at about −31 to −26, promoter-proximal elements, which are relatively short (~6–10 bp), are located within the first 200 bp upstream or downstream of the start site. Enhancers, in contrast, are usually about 50–200 bp long and are composed of multiple elements of about 6–10 bp. Enhancers may be located up to 50 kb or more upstream or downstream from the start site or within an intron. Like the Pax6 gene, many mammalian genes are controlled by multiple enhancer regions that function in different types of cells.
Figure 9-23b summarizes the promoter region of a mammalian gene with a CpG island promoter. About 70 percent of mammalian genes are expressed from CpG island promoters, usually at much lower levels than genes with TATA box promoters. Multiple alternative transcription start sites are used, and each start site generates mRNAs with a different 5′ end for the first exon. Transcription occurs in both directions. However, Pol II molecules transcribing in the sense direction are elongated to 1 kb or more much more efficiently than those transcribing in the antisense direction.
In the important model organism Saccharomyces cerevisiae (budding yeast), genes are closely spaced (see Figure 8-4b), and few genes contain introns. In this organism, enhancers, which are referred to as upstream activating sequences (UASs), usually lie within 200 bp upstream of the promoters of the genes they regulate. Most yeast genes contain only one UAS. In addition, S. cerevisiae genes contain a TATA box about 90 bp upstream from the transcription start site (Figure 9-23c).
FIGURE 9-23 General organization of control elements that regulate gene expression in multicellular eukaryotes and yeast. (a) Mammalian genes with a TATA box promoter are regulated by promoter-proximal elements and enhancers. The promoter elements shown in Figure 9-16 position RNA polymerase II to initiate transcription at the start site and influence the rate of transcription. Enhancers may be either upstream or downstream and as far away as hundreds of kilobases from the transcription start site. In some cases, enhancers lie within introns. Promoter-proximal elements are found upstream and downstream of transcription start sites at equal frequency in mammalian genes. (b) For mammalian genes with a CpG island promoter, transcription initiates at several sites in both the sense and antisense directions from the ends of the CpG-rich region. Transcripts in the sense direction are elongated and are processed into mRNAs by RNA splicing. These genes express mRNAs with alternative 5′ exons determined by the transcription start site. Genes with CpG island promoters contain promoter-proximal control elements; it is currently not clear whether they are also regulated by distant enhancers. (c) Most S. cerevisiae genes contain only one regulatory region, called an upstream activating sequence (UAS), and a TATA box, which is about 90 bp upstream from the transcription start site.
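A promoter region is usually judged to be a CpG island from two measures: its GC content and the ratio of observed to expected CpG dinucleotides. The second measure matters because CpG is depleted across most of the vertebrate genome. The Python sketch below assumes the commonly used Gardiner-Garden and Frommer thresholds: a length of at least 200 bp, GC content above 50 percent, and an observed/expected CpG ratio above 0.6. These criteria are not given in this chapter, and genome-wide scans often use stricter, window-based variants. The function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    observed = s.count("CG")          # "CG" cannot overlap itself, so count() is exact
    expected = c * g / n              # CpGs expected if C and G were placed at random
    gc_fraction = (c + g) / n
    obs_exp = observed / expected if expected else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply length, GC-content, and CpG observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CCGG" * 50))    # True: 200 bp, all G+C, obs/exp = 1.0
print(looks_like_cpg_island("CCAGGA" * 34))  # False: GC-rich but contains no CpG
```

The second example shows why GC content alone is not enough. A GC-rich sequence that lacks CpG dinucleotides fails the observed/expected test.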
DNase I Footprinting and EMSA Detect Protein-DNA Interactions
The various transcription-control elements found in eukaryotic DNA are binding sites for regulatory proteins called transcription factors. The simplest eukaryotic cells encode hundreds of transcription factors, and the human genome encodes at least 1400. The transcription of each gene in the genome is independently regulated by combinations of specific transcription factors that bind to its transcription-control regions. The number of possible combinations of this many transcription factors is astronomical, sufficient to generate unique controls for every gene encoded in the genome. In yeast, Drosophila, and other genetically tractable eukaryotes, numerous genes encoding transcription activators and repressors have been identified by classical genetic analyses like those described in Chapter 6. However, mammals and other vertebrates are less amenable to such genetic analysis, and most of their transcription factors have been detected initially and subsequently purified by biochemical techniques. In this approach, a DNA regulatory element identified by the kinds of mutational analyses described above is used to identify cognate proteins, that is, proteins that bind specifically to it. Two common techniques for detecting such cognate proteins are DNase I footprinting and the electrophoretic mobility shift assay.
DNase I footprinting takes advantage of the fact that when a protein is bound to a region of DNA, it protects that DNA sequence from digestion by nucleases. As illustrated in Figure 9-24a, a DNA fragment is labeled with a radioactive atom at one end of one strand. Samples of the labeled fragment are digested under carefully controlled conditions in the presence and absence of a DNA-binding protein, then denatured and electrophoresed, and the resulting gel is subjected to autoradiography. The region protected by the bound protein appears as a gap, or “footprint,” in the array of bands resulting from digestion in the absence of the protein. When footprinting is performed with a DNA fragment containing a known transcription-control element, the appearance of a footprint indicates that the protein sample being assayed contains a transcription factor that binds that control element. Footprinting also identifies the specific DNA sequence to which the transcription factor binds.
For example, DNase I footprinting of the strong adenovirus major late promoter shows a protected region over the TATA box when TBP is added to the labeled DNA before DNase I digestion (Figure 9-24b). DNase I does not cleave all phosphodiester bonds in a duplex DNA at an equal rate. Consequently, in the absence of added protein (lanes 1, 6, and 9), a particular pattern of bands is observed that depends on the DNA sequence and results from cleavage at some phosphodiester bonds and not others. However, when increasing amounts of TBP are incubated with the end-labeled DNA before digestion with DNase I, TBP binds to the TATA box. When sufficient TBP is added to bind all the labeled DNA molecules, it protects the region between about −35 and −20 from digestion (lanes 2–5). In contrast, increasing amounts of TFIID (lanes 7 and 8) protect not only the TATA box region but also regions near −7, +1 to +5, +10 to +15, and +20 from digestion, producing a footprint different from that of TBP. Results such as these indicate that other subunits of TFIID (the TBP-associated factors, or TAFs) also bind to the DNA in the region downstream from the TATA box.
The electrophoretic mobility shift assay (EMSA), also called the gel-shift or band-shift assay, is more useful than the footprinting assay for quantitative analysis of DNA-binding proteins. In general, the electrophoretic mobility of a DNA fragment is reduced when it is complexed with protein, causing a shift in the location of the fragment band. EMSA can be used to detect a transcription factor in protein fractions incubated with a radiolabeled DNA fragment (the probe) containing a known control element (Figure 9-25). The more transcription factor is added to the binding reaction, the more labeled probe is shifted to the position of the DNA-protein complex.
In the biochemical isolation of a transcription factor, an extract of cell nuclei is commonly subjected sequentially to several types of liquid chromatography (see Chapter 3). Fractions eluted from the columns are assayed by DNase I footprinting or EMSA using DNA fragments containing an identified regulatory element (see Figure 9-22). Fractions containing a protein that binds to the regulatory element in these assays contain a putative transcription factor. A powerful technique that is commonly used for the final step in purifying transcription factors is sequence-specific DNA affinity chromatography. In this type of affinity chromatography, long DNA strands containing multiple copies of the transcription-factor-binding site are coupled to a column matrix. Once a transcription factor has been isolated and purified, its partial amino acid sequence can be determined and used to clone the gene or cDNA encoding it, as outlined in Chapter 6. The isolated gene can then be used to test the ability of the encoded protein to activate or repress transcription in an in vivo transfection assay (Figure 9-26).
EXPERIMENTAL FIGURE 9-24 DNase I footprinting reveals the region of a DNA sequence where a transcription factor binds. (a) A DNA fragment known to contain a transcription-control element is labeled at one end with ³²P (red dot). Portions of the labeled DNA sample are then digested with DNase I in the presence and in the absence of protein samples containing a sequence-specific DNA-binding protein. DNase I hydrolyzes the phosphodiester bonds of DNA between the 3′ oxygen on the deoxyribose of one nucleotide and the 5′ phosphate of the next nucleotide. A low concentration of DNase I is used so that, on average, each DNA molecule is cleaved just once (vertical arrows). If the protein sample does not contain a protein that binds to a specific sequence in the labeled DNA, the DNA fragment is cleaved at multiple positions between the labeled and unlabeled ends of the original fragment, as in sample A (left). If the protein sample does contain such a protein, as in sample B (right), the protein binds to its cognate sequence in the DNA, thereby protecting a portion of the fragment from digestion. Following DNase treatment, the DNA is separated from protein, denatured to separate the strands, and electrophoresed. Autoradiography of the resulting gel detects only labeled strands and reveals fragments extending from the labeled end to the site of cleavage by DNase I. Cleavage fragments containing the transcription-control element show up on the gel for sample A but are missing in sample B. The bound cognate protein has blocked cleavage within that sequence and thus prevented production of the corresponding fragments. The missing bands on the gel constitute the footprint. (b) Footprints produced by increasing amounts of TBP (indicated by the triangle) and of TFIID on the strong adenovirus major late promoter. [Part (b) from Zhou, Q. et al., “Holo-TFIID supports transcriptional stimulation by diverse activators and from a TATA-less promoter,” Genes & Development, 11/1992; 6(10):1964–74; republished with permission from Cold Spring Harbor Laboratory Press.]
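Once gel bands have been quantified, for example by densitometry or phosphorimaging, both assays reduce to simple calculations. The hypothetical Python sketch below does two things. It finds footprints as runs of positions where the +protein lane is depleted relative to the no-protein ladder. It also estimates an apparent dissociation constant (Kd) from an EMSA titration. The 0.3 depletion cutoff and the per-lane normalization are illustrative choices, not part of the protocols described in the text. So is the assumption of simple 1:1 binding with protein in large excess over probe. All intensities in the usage lines are made up.

```python
from statistics import median


def find_footprints(positions, control, bound, threshold=0.3):
    """Return (first, last) runs of positions whose bands are depleted by bound protein.

    `control` and `bound` are band intensities at each cleavage position in the
    no-protein and +protein lanes. Each lane is scaled to its total signal to
    correct for unequal loading (assumes both totals are nonzero).
    """
    c_total, b_total = sum(control), sum(bound)
    runs, start, prev = [], None, None
    for pos, c, b in zip(positions, control, bound):
        if c == 0:
            continue                        # no band in the naked-DNA ladder: uninformative
        protected = (b / b_total) / (c / c_total) < threshold
        if protected and start is None:
            start = pos                     # footprint begins
        elif not protected and start is not None:
            runs.append((start, prev))      # footprint ended at the previous band
            start = None
        prev = pos
    if start is not None:
        runs.append((start, prev))          # footprint runs to the last band
    return runs


def estimate_kd(protein_conc, bound_signal, free_signal):
    """Median single-point Kd from an EMSA titration.

    Assumes 1:1 binding with protein in large excess over probe, so the fraction
    of probe shifted is theta = [P] / (Kd + [P]) and Kd = [P] * (1 - theta) / theta.
    """
    estimates = []
    for p, b, f in zip(protein_conc, bound_signal, free_signal):
        theta = b / (b + f)
        if 0 < theta < 1:                   # skip empty or saturated lanes
            estimates.append(p * (1 - theta) / theta)
    return median(estimates)


# Hypothetical band intensities at 5-bp steps from -50 to -15.
positions = list(range(-50, -10, 5))
naked = [10] * 8
with_protein = [10, 10, 1, 1, 1, 10, 10, 10]
print(find_footprints(positions, naked, with_protein))  # [(-40, -30)]

# Hypothetical EMSA titration (protein in nM; bound and free band intensities).
print(round(estimate_kd([5, 10, 20, 40], [1, 1, 2, 4], [2, 1, 1, 1]), 6))  # 10.0
```

The single-point Kd formula holds only under the stated excess-protein assumption. When probe and Kd concentrations are comparable, a full binding-isotherm fit is needed instead.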
Activators Are Composed of Distinct Functional Domains
Studies with a yeast transcription activator called Gal4 provided early insight into the domain structure of transcription factors. The gene encoding Gal4, which promotes expression of enzymes needed to metabolize galactose, was identified by complementation analysis of gal4 mutants that cannot form colonies on an agar medium in which galactose is the only source of carbon and energy (see Chapter 6).
EXPERIMENTAL FIGURE 9-25 The electrophoretic mobility shift assay can be used to detect transcription factors during purification. In this example, protein fractions separated by column chromatography were assayed for their ability to bind to a radiolabeled DNA-fragment probe containing a known regulatory element. An aliquot of the protein sample loaded onto the column (ON) and successive column fractions (numbers) were each incubated with the labeled probe. The samples were then electrophoresed under conditions that do not disrupt protein-DNA interactions. The free probe not bound to protein migrated to the bottom of the gel. A protein present in the preparation applied to the column and in fractions 7 and 8 bound to the probe, forming a DNA-protein complex that migrated more slowly than the free probe. These fractions are therefore likely to contain the regulatory protein being sought. [From Yoshinaga, S. et al., “Purification and characterization of transcription factor IIIC2,” J. Biol. Chem., 1989, 264:10726 ©1989 American Society for Biochemistry and Molecular Biology.]
Directedmutagenesis studies like those described previously identified UASs for the genes activated by Gal4. Each of these UASs was found to contain one or more copies of a 17-bp sequence called UASGAL. DNase I footprinting assays with recombinant Gal4 protein produced in E. coli from the yeast GAL4 gene showed that Gal4 binds to UASGAL sequences. When a copy of UASGAL was cloned upstream of a TATA box followed by a β-galactosidase reporter gene, and that construct was introduced into yeast cells, expression of β-galactosidase was activated in galactose media in wildtype cells, but not in gal4 mutants. These results showed that UASGAL is a transcription-control element activated by the Gal4 transcription factor in galactose media. A remarkable set of experiments with gal4 deletion mutants demonstrated that the Gal4 transcription factor is composed of separable functional domains: an N-terminal DNA-binding domain, which binds to specific DNA sequences, and a C-terminal activation domain, which interacts with other proteins to stimulate transcription from a nearby promoter (Figure 9-27). When the N-terminal DNA-binding domain of Gal4 was fused directly to various portions of its own C-terminal region, deleting internal sequences, the resulting truncated proteins retained the ability to stimulate expression of a reporter gene in an in vivo assay like that depicted in Figure 9-26. Thus the internal portion of the protein is not required for the functioning of Gal4 as a transcription factor. Similar experiments with another
382
CHAPTER 9
t Transcriptional Control of Gene Expression
yeast activator, Gcn4, which regulates genes required for the synthesis of many amino acids, indicated that it contains a roughly 50-amino-acid DNA-binding domain at its C-terminus and a roughly 20-amino-acid activation domain near the middle of its sequence. Further evidence for the existence of distinct activation domains in Gal4 and Gcn4 came from experiments in which their activation domains were fused to a DNA-binding domain from an entirely unrelated E. coli DNA-binding protein. When these fusion proteins were assayed in vivo, they activated transcription of a reporter gene containing the cognate site for the E. coli protein. Thus functional transcription factors can be constructed from entirely novel combinations of prokaryotic and eukaryotic elements. Studies such as these have now been carried out with many eukaryotic transcription factors. The structural model of eukaryotic activators that has emerged from these studies is a modular one in which one or more activation domains are connected to a sequence-specific DNA-binding domain by intrinsically disordered, flexible protein domains (Figure 9-28). In some cases, amino acids included in the DNA-binding domain also contribute to transcriptional activation. As discussed in a later section, activation domains
Gene encoding protein X
Reporter gene
Plasmid 1
X-binding site
Plasmid 2
Protein X
Reporter-gene transcripts
1 2 Nucleus
EXPERIMENTAL FIGURE 926 An in vivo transfection assay measures transcription activity to evaluate proteins believed to be transcription factors. The assay system requires two plasmids. One plasmid contains the gene encoding the putative transcription factor (protein X). The second plasmid contains a reporter gene (e.g., luciferase) and one or more binding sites for protein X. Both plasmids are simultaneously introduced into cells that lack the gene encoding protein X. The production of reporter-gene RNA transcripts is measured; alternatively, the activity of the encoded protein can be assayed. If reporter-gene transcription is greater in the presence of the X-encoding plasmid than in its absence, then the protein is an activator; if transcription is less, then it is a repressor. By use of plasmids encoding a mutated or rearranged transcription factor, important domains of the protein can be identified.
(a) Reporter-gene construct lacZ gene UASGAL
TATA box
β-galactosidase Binding activity to UASGAL
(b) Wild-type and mutant GAL4 proteins 1
Wild-type
74
738 823
C
+
+++
881
_
_
+
+++
+
+++
+
++
+
+
+
–
+
–
881
+
+++
881
+
+++
881
+
++
N DNA-binding domain
Activation domain
50
848 823
N- and C-terminal deletion mutants
792 755 692 74
74
Internal deletion mutants
684
74 74
738 768
EXPERIMENTAL FIGURE 927 Deletion mutants of the GAL4 gene in yeast with a UASGAL reporter-gene construct demonstrate the separate functional domains in a transcription activator. (a)Diagram of DNA construct containing a lacZ reporter gene (encoding β-galactosidase) and TATA box ligated to UASGAL, a regulatory element that contains several Gal4-binding sites. The reporter-gene construct and DNA encoding wild-type or mutant (deleted) Gal4 were simultaneously introduced into mutant (gal4) yeast cells, and the activity of β-galactosidase expressed from lacZ was assayed. Activity should be high if the introduced GAL4 DNA encodes a functional protein. (b) Schematic diagrams of wild-type Gal4 and various mutant forms. Small numbers refer to positions in the wild-type sequence. Deletion of 50 amino acids
from the N-terminal end destroyed the ability of Gal4 to bind to UASGAL and to stimulate expression of β-galactosidase from the reporter gene. Proteins with extensive deletions from the C-terminal end still bound to UASGAL. These results localize the DNA-binding domain to the N-terminal end of Gal4. The ability to activate β-galactosidase expression was not entirely eliminated unless somewhere between 126 and 189 or more amino acids were deleted from the C-terminal end. Thus the activation domain lies in the C-terminal region of Gal4. Proteins with internal deletions (bottom) were also able to stimulate expression of β-galactosidase, indicating that the central region of Gal4 is not crucial for its function in this assay. See J. Ma and M. Ptashne, 1987, Cell 48:847; I. A. Hope and K.Struhl, 1986, Cell 46:885; and R. Brent and M. Ptashne, 1985, Cell 43:729.
are thought to function by binding other proteins involved in transcription. The presence of flexible, intrinsically disordered protein domains (see Figure 3-8) connecting the DNA-binding domain to the activation domains may explain why alterations in the spacing between control elements are so well tolerated in eukaryotic control regions. Thus even when the positions of transcription factors bound to DNA are shifted relative to each other, their activation domains may still be able to interact because they are attached to their DNA-binding domains through flexible protein regions.
Repressors Are the Functional Converse of Activators
Eukaryotic transcription is regulated by repressors as well as activators. For example, geneticists have identified mutations in yeast that result in continuously high expression of certain genes. This type of unregulated, abnormally high expression, called constitutive expression, results from the inactivation of a repressor that normally inhibits the transcription of these genes. Similarly, mutants of Drosophila melanogaster and Caenorhabditis elegans have been isolated that are defective in embryonic development because they express genes in embryonic cells where those genes are normally repressed. The mutations in these mutants inactivate repressors, leading to abnormal development. Repressor-binding sites in DNA have been identified by systematic linker scanning mutation analyses similar to the one depicted in Figure 9-22. In this type of analysis, whereas mutation of an activator-binding site leads to decreased expression of the linked reporter gene, mutation of a repressor-binding site leads to increased expression of a reporter gene.
9.4 Regulatory Sequences in Protein-Coding Genes and the Proteins Through Which They Function
[Figure 9-28 schematic: N- to C-terminal domain maps of Gal4, Gcn4, GR, and SP1, showing DNA-binding domains, activation domains, and intrinsically disordered protein domains]
FIGURE 9-28 Schematic diagrams illustrating the modular structure of eukaryotic transcription activators. Transcription factors may contain more than one activation domain but rarely contain more than one DNA-binding domain. Gal4 and Gcn4 are yeast transcription activators. The glucocorticoid receptor (GR) promotes transcription of target genes when certain hormones are bound to the C-terminal activation domain. SP1 binds to GC-rich promoter elements in a large number of mammalian genes.
The repressor proteins that bind such sites can be purified and assayed using the same biochemical techniques described earlier for activator proteins. Eukaryotic transcription repressors are the functional converse of activators. They can inhibit transcription of a gene they do not normally regulate when their cognate binding sites are placed within tens of base pairs to many kilobases of the gene’s transcription start site. Like activators, most eukaryotic repressors are modular proteins that have two functional domains: a DNA-binding domain and a repression domain. Like activation domains, repression domains continue to function when fused to another type of DNA-binding domain. If binding sites for this second DNA-binding domain are inserted within a few hundred base pairs of a promoter, expression of the fusion protein inhibits transcription from the promoter. Also like activation domains, repression domains function by interacting with other proteins, as discussed later in this chapter.
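The interpretation rule for linker scanning described above (an activator site shows lower reporter expression when mutated, a repressor site shows higher expression) can be written as a small classifier. This is a sketch only: the twofold cutoff, the region labels, and the reporter readings are hypothetical, and `classify_scan` is an illustrative name; real analyses rely on replicate measurements and statistics rather than a single fixed threshold.

```python
from __future__ import annotations


def classify_scan(wild_type: float, mutants: dict[str, float], fold: float = 2.0) -> dict[str, str]:
    """Label each scanned region by how its mutation changes reporter expression.

    A drop of at least `fold` suggests an activator-binding site (the activator can
    no longer bind, so expression falls); a rise of at least `fold` suggests a
    repressor-binding site (repression is lost, so expression rises).
    """
    if wild_type <= 0:
        raise ValueError("wild-type expression must be positive")
    calls: dict[str, str] = {}
    for region, level in mutants.items():
        ratio = level / wild_type
        if ratio <= 1 / fold:
            calls[region] = "candidate activator site"
        elif ratio >= fold:
            calls[region] = "candidate repressor site"
        else:
            calls[region] = "no clear effect"
    return calls


# Hypothetical reporter readings (arbitrary units) for three linker-scanning mutants.
readings = {"-120 to -111": 15.0, "-90 to -81": 95.0, "-60 to -51": 420.0}
for region, call in classify_scan(100.0, readings).items():
    print(region, call)
```

A symmetric cutoff is used here only because the text treats increases and decreases as mirror-image signatures; in practice the two directions could warrant different thresholds.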
CHAPTER 9 Transcriptional Control of Gene Expression

DNA-Binding Domains Can Be Classified into Numerous Structural Types
The DNA-binding domains of eukaryotic transcription factors contain a variety of structural motifs that bind specific DNA sequences. The ability of DNA-binding proteins to bind to specific DNA sequences commonly results from noncovalent interactions between atoms in an α helix in the DNA-binding domain and atoms on the edges of the bases within the major groove in the DNA. Ionic interactions between the positively charged residues arginine and lysine and the negatively charged phosphates in the sugar-phosphate backbone, and in some cases interactions with atoms in the DNA minor groove, also contribute to binding. The principles of specific protein-DNA interactions were first discovered during the study of bacterial repressors. Many bacterial repressors are dimeric proteins in which an α helix from each monomer inserts into the major groove in the DNA helix and makes multiple, specific interactions with the atoms there (Figure 9-29). This α helix is referred to as the recognition helix or sequence-reading helix because most of the amino acid side chains that contact bases in the DNA extend from this helix. The recognition helix, which protrudes from the surface of a bacterial repressor, is usually supported in the protein structure in part by hydrophobic interactions with a second α helix just N-terminal to it. This entire structural element, which is present in many bacterial repressors, is called a helix-turn-helix motif. Many additional structural motifs that can present an α helix to the major groove of DNA are found in eukaryotic transcription factors, which are often classified according to the type of DNA-binding domain they contain. Because most of these motifs have characteristic consensus amino acid sequences, potential transcription factors can be recognized among the cDNA sequences from various tissues that have been characterized in humans and other species.

FIGURE 9-29 Interaction of bacteriophage 434 repressor with DNA. Ribbon diagram of 434 repressor bound to its specific operator DNA. The recognition helices are shown in green. The α helices N-terminal to the recognition helix and the turn in the polypeptide backbone between the helices in the helix-turn-helix structural motif are shown in yellow and red, respectively. The protein interacts intimately with one side of the DNA molecule over a length of 1.5 turns. [Data from A. K. Aggarwal et al., 1988, Science 242:899, PDB ID 2ori.]

Here we introduce several common classes of DNA-binding proteins whose three-dimensional structures have been determined. In all these examples, and in many other transcription factors, at least one α helix is inserted into the major groove of DNA. However, some transcription factors contain alternative structural motifs (e.g., β strands and loops; see NFAT in Figure 9-33 below as an example) that interact with DNA.

Homeodomain Proteins
Many eukaryotic transcription factors that function during development contain a conserved 60-residue DNA-binding motif, called a homeodomain, that is similar to the helix-turn-helix motif of bacterial repressors. These transcription factors were first identified in Drosophila mutants in which one body part was transformed into another during development (see Figure 9-2b). The conserved homeodomain sequence has also been found in vertebrate transcription factors, including those that have similar master-control functions in human development.

Zinc-Finger Proteins
A number of different eukaryotic proteins have regions that fold around a central Zn2+ ion, producing a compact domain from a relatively short length of polypeptide chain. Termed a zinc finger, this structural motif was first recognized in DNA-binding domains, but is now known to occur
in other proteins that do not bind to DNA. Here we describe two of the several classes of zinc-finger motifs that have been identified in eukaryotic transcription factors. The C2H2 zinc finger is the most common DNA-binding motif encoded in the human genome and the genomes of other mammals. It is also common in multicellular plants, but it is not the dominant type of DNA-binding domain in plants as it is in animals. This motif has a 23–26-residue consensus sequence containing two conserved cysteine (C) and two conserved histidine (H) residues, whose side chains bind one Zn2+ ion (see Figure 3-10c). The name “zinc finger” was coined because a two-dimensional diagram of the structure resembles a finger. When the three-dimensional structure was solved, it became clear that the binding of the Zn2+ ion by the two cysteine and two histidine residues folds the relatively short polypeptide sequence into a compact domain, which can insert its α helix into the major groove of DNA. Many transcription factors contain multiple C2H2 zinc fingers, which interact with successive groups of base pairs within the major groove as the protein wraps around the DNA double helix (Figure 9-30a). A second type of zinc-finger structure, designated the C4 zinc finger (because it has four conserved cysteines in contact with the Zn2+), is found in some 50 human transcription factors. The first members of this class were identified as specific intracellular high-affinity binding proteins, or “receptors,” for steroid hormones, which led to the name steroid receptor superfamily. Because similar intracellular receptors for nonsteroid hormones were subsequently found, these transcription factors are now commonly called nuclear receptors. The characteristic feature of C4 zinc fingers is the presence of two groups of four critical cysteines, one toward each end of the 55–56-residue domain. Although the C4 zinc finger was initially named by analogy with the C2H2 zinc finger, the three-dimensional structures of proteins containing these DNA-binding motifs were later found to be quite distinct. A particularly important difference between the two is that C2H2 zinc-finger proteins generally contain three or more repeating finger units and bind as monomers, whereas C4 zinc-finger proteins usually contain only two finger units and generally bind to DNA as homodimers or heterodimers. Homodimers of C4 zinc-finger DNA-binding domains have twofold rotational symmetry (Figure 9-30b). Consequently, homodimeric nuclear receptors bind to consensus DNA sequences that are inverted repeats.

FIGURE 9-30 Eukaryotic DNA-binding domains that use an α helix to interact with the major groove of specific DNA sequences. (a) The GLI DNA-binding domain is monomeric and contains five C2H2 zinc fingers. The α helices are shown as cylinders, the Zn2+ ions as spheres. Finger 1 does not interact with DNA, whereas the other four fingers do. (b) The glucocorticoid receptor is a homodimeric C4 zinc-finger protein, one monomer in green, one in yellow. The α helices are shown as cylinders, the β strands as white arrows, the Zn2+ ions as spheres. Two α helices (darker shade), one in each monomer, interact with the DNA. Like all C4 zinc-finger homodimers, this transcription factor has twofold rotational symmetry. (c) In leucine-zipper proteins, basic residues in the extended α-helical regions of the monomers interact with the DNA backbone at adjacent sites in the major groove. The coiled-coil dimerization domain is stabilized by hydrophobic interactions between the monomers. (d) In bHLH proteins, the DNA-binding helices at the right (N-termini of the monomers) are separated by nonhelical loops from a leucine zipper–like region containing a coiled-coil dimerization domain. [Part (a), see N. P. Pavletich and C. O. Pabo, 1993, Science 261:1701, PDB ID 2gli. Part (b), see B. F. Luisi et al., 1991, Nature 352:497, PDB ID 1glu. Part (c), data from T. E. Ellenberger et al., 1992, Cell 71:1223, PDB ID 1ysa. Part (d), data from P. Brownlie et al., 1997, Structure 5:509, PDB ID 1hlo.]

Leucine-Zipper Proteins
Another structural motif present in the DNA-binding domains of a large class of transcription factors contains the hydrophobic amino acid leucine at every seventh position in the sequence. These proteins bind to DNA as dimers, and mutagenesis of the leucines showed that they were required for dimerization. Consequently, the name leucine zipper was coined to denote this structural motif of a coiled coil of two α helices. The DNA-binding domain of the yeast Gcn4 transcription factor mentioned earlier is a leucine-zipper domain.
X-ray crystallographic analysis of complexes between DNA and the Gcn4 DNA-binding domain has shown that the dimeric protein contains two extended α helices that “grip” the DNA molecule, much like a pair of scissors, at two adjacent sites in the major groove separated by about half a turn of the double helix (Figure 9-30c). The portions of the α helices contacting the DNA include positively charged (basic) residues that interact with phosphates in the DNA backbone and additional residues that interact with specific bases in the major groove. Gcn4 forms dimers via hydrophobic interactions between the C-terminal regions of the α helices, forming a coiled-coil structure. This structure is common in proteins containing amphipathic α helices in which hydrophobic amino acid residues are regularly spaced alternately three or four positions apart in the sequence, forming a stripe down one side of the α helix. These hydrophobic stripes make up the interacting surfaces between the α-helical monomers in a coiled-coil dimer (see Figure 3-10a). Although the first leucine-zipper transcription factors to be analyzed contained leucine residues at every seventh position in the dimerization region, additional DNA-binding proteins containing other hydrophobic amino acids in these positions were subsequently identified. Like leucine-zipper proteins, they form dimers containing a C-terminal coiled-coil dimerization region and an N-terminal DNA-binding domain. The term basic zipper (bZIP) is now frequently used to refer to all proteins with these common structural features. Many basic-zipper transcription factors are heterodimers of two different polypeptide chains, each containing one basic-zipper domain.

Basic Helix-Loop-Helix (bHLH) Proteins
The DNA-binding domain of another class of dimeric transcription factors contains a structural motif that is very similar to the basic-zipper motif except that a nonhelical loop of the polypeptide chain separates two α-helical regions in each monomer (Figure 9-30d). Termed a basic helix-loop-helix (bHLH), this motif was predicted from the amino acid sequences of these proteins, which contain an N-terminal α helix with basic residues that interact with DNA, a middle loop region, and a C-terminal region, with hydrophobic amino acids spaced at intervals characteristic of an amphipathic α helix, that dimerizes into a coiled coil. As with basic-zipper proteins, different bHLH proteins can form heterodimers.
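Because these DNA-binding motifs have characteristic sequence signatures, candidate transcription factors can be flagged by scanning protein sequences, as this section notes. The sketch below checks two signatures described here: a simplified C2H2 zinc-finger spacing (Cys, 2–4 residues, Cys, 12 residues, His, 3–5 residues, His) and leucine recurring at every seventh position. Both patterns are simplifying assumptions rather than curated definitions; real annotation relies on profile databases such as PROSITE or Pfam, and, as the text points out, zipper positions can hold hydrophobic residues other than leucine. The function names and the test sequences are synthetic and illustrative only.

```python
from __future__ import annotations

import re

# Simplified C2H2 spacing (an approximation, not a curated database pattern):
# Cys, 2-4 residues, Cys, 12 residues, His, 3-5 residues, His.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2(protein: str) -> list[tuple[int, int]]:
    """Return 0-based (start, end) spans of non-overlapping C2H2-like matches."""
    return [(m.start(), m.end()) for m in C2H2_PATTERN.finditer(protein.upper())]


def leucine_heptads(protein: str, min_repeats: int = 4) -> list[int]:
    """Return 0-based starts where Leu recurs every seventh residue min_repeats times."""
    seq = protein.upper()
    span = 7 * (min_repeats - 1)
    return [
        i
        for i in range(len(seq) - span)
        if all(seq[i + 7 * k] == "L" for k in range(min_repeats))
    ]


# Synthetic test sequences (not real proteins).
finger = "MAYKCPECGKSFSQSSNLQKHQRTHTGEKP"
zipper = "M" + "KVEALEK" * 4
print(find_c2h2(finger), leucine_heptads(zipper))
```

A regular expression is enough here because the motif is defined by fixed residue identities at variable spacings; scoring partial matches or hydrophobic substitutions would call for a position-specific profile instead.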
Structurally Diverse Activation and Repression Domains Regulate Transcription
Experiments with fusion proteins composed of the Gal4 DNA-binding domain and random segments of E. coli proteins demonstrated that a diverse group of amino acid sequences (~1 percent of all E. coli sequences) can function as activation domains, even though they evolved to perform other functions. Many transcription factors contain activation domains marked by an unusually high percentage of particular amino acids. Gal4, Gcn4, and most other yeast transcription factors, for instance, have activation domains that are rich in acidic amino acids (aspartic and glutamic acids). These so-called acidic activation domains are generally capable of stimulating transcription in nearly all types of eukaryotic cells: fungal, animal, and plant cells. Activation domains from some Drosophila and mammalian transcription factors are glutamine-rich, and some are proline-rich; still others are rich in the closely related amino acids serine and threonine, both of which have hydroxyl groups. However, some strong activation domains are not particularly rich in any specific amino acid. Biophysical studies indicate that acidic activation domains have an unstructured, random-coil, intrinsically disordered conformation. These domains stimulate transcription when they are bound to a protein co-activator. The interaction with a co-activator causes the activation domain to assume a more structured α-helical conformation in the activation domain–co-activator complex. A well-studied example of a transcription factor with an acidic activation domain is the mammalian CREB protein, which is phosphorylated in response to increased levels of cAMP. This regulated phosphorylation is required for CREB to bind to its co-activator CBP (CREB binding protein), resulting in the transcription of genes whose control regions contain a CREB-binding site (see Figure 15-30). When the phosphorylated random-coil activation domain of CREB interacts with CBP, it undergoes a conformational change to form two α helices linked by a short loop, which wrap around the interacting domain of CBP (Figure 9-31a). Some activation domains are larger and more highly structured than acidic activation domains. For example, the ligand-binding domains of nuclear receptors function as activation domains when they bind their specific hormone ligand (Figure 9-31b, c). Binding of ligand induces a large
conformational change in the nuclear receptor that allows the ligand-binding domain with bound hormone to interact with a short α helix in a co-activator; the resulting complex can then activate transcription of genes whose control regions bind the nuclear receptor. Thus the acidic activation domain in CREB and the ligand-binding activation domains in nuclear receptors represent two structural extremes. The CREB acidic activation domain is an intrinsically disordered random coil that folds into two α helices when it binds to the surface of a globular domain in a co-activator. In contrast, the nuclear-receptor ligand-binding activation domain is a structured globular domain that interacts with a short α helix in a co-activator, which probably is a random coil before it is bound. In both cases, however, specific protein-protein interactions between a co-activator and the activation domain permit the transcription factor to stimulate gene expression. Currently, less is known about the structure of repression domains. The globular ligand-binding domains of some nuclear receptors function as repression domains in the absence of their specific hormone ligand. Like activation domains, repression domains may be relatively short, comprising 15 or fewer amino acids. Biochemical and genetic studies indicate that repression domains also mediate protein-protein interactions and bind to co-repressor proteins, forming a complex that inhibits transcription initiation by mechanisms that are discussed later in the chapter.
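The compositional classes of activation domains named in this section (acidic, glutamine-rich, proline-rich, serine/threonine-rich) amount to a residue-counting rule, sketched below. The 25 percent cutoff is an arbitrary assumption, the test segments are synthetic, and `composition` and `dominant_class` are illustrative names. Composition alone does not predict activity: as the text notes, ~1 percent of random E. coli segments can activate, and some strong activation domains are not enriched in any particular residue.

```python
from __future__ import annotations

# Residue classes named in the text; the 25% cutoff below is an arbitrary assumption.
CLASSES = {
    "acidic": "DE",
    "glutamine-rich": "Q",
    "proline-rich": "P",
    "serine/threonine-rich": "ST",
}


def composition(segment: str) -> dict[str, float]:
    """Fraction of residues in `segment` that fall into each class."""
    seq = segment.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {
        name: sum(aa in residues for aa in seq) / len(seq)
        for name, residues in CLASSES.items()
    }


def dominant_class(segment: str, threshold: float = 0.25) -> str:
    """Name the most enriched class, or report none if no class reaches `threshold`."""
    name, fraction = max(composition(segment).items(), key=lambda item: item[1])
    return name if fraction >= threshold else "no dominant class"


for segment in ("EDLDDEFLDE", "QQAQPQQLQQ", "AGLVIAGLVA"):
    print(segment, dominant_class(segment))
```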
FIGURE 9-31 Activation domains may be random coils until they interact with co-activator proteins or folded protein domains. (a) The acidic activation domain of CREB (cyclic AMP response element-binding protein) is activated by phosphorylation at serine 133. It is a random coil until it interacts with a domain of its co-activator, CBP (shown as a space-filling surface model with negatively charged regions in red and positively charged regions in blue). When the CREB activation domain binds to CBP, it folds into two amphipathic α helices. Side chains in the activation domain that interact with the surface of the CBP domain are labeled. (b) The ligand-binding activation domain of the estrogen receptor is a folded-protein domain. When estrogen is bound to the domain, the green α helix interacts with the ligand, generating a hydrophobic groove in the ligand-binding domain (dark brown helices), which binds an amphipathic α helix in a co-activator subunit (blue). (c) The conformation of the estrogen receptor in the absence of hormone is stabilized by binding of the estrogen antagonist tamoxifen. In this conformation, the green helix of the receptor folds into a conformation that interacts with the co-activator-binding groove of the active receptor, sterically blocking binding of co-activators. [Part (a) data from I. Radhakrishnan et al., 1997, Cell 91:741, PDB ID 1kdx. Parts (b) and (c) data from A. K. Shiau et al., 1998, Cell 95:927, PDB ID 3erd and 3ert.]

Transcription Factor Interactions Increase Gene-Control Options
Two types of DNA-binding proteins discussed previously, bZIP and bHLH proteins, often exist in alternative heterodimeric combinations of monomers. Other classes of transcription factors not discussed here also form heterodimeric proteins. In some heterodimeric transcription factors, each monomer recognizes the same sequence. In these cases, the formation of alternative heterodimers does not increase the number of different sites on which the monomers can act, but rather allows the activation domains associated with each monomer to be brought together in alternative combinations that bind to the same site (Figure 9-32a). As we will see later, and in subsequent chapters, the activities of individual transcription factors can be regulated by multiple mechanisms. Consequently, a single bZIP- or bHLH-binding DNA regulatory element in the transcription-control region of a gene may elicit different transcriptional responses depending on which bZIP or bHLH monomers are expressed in the cell and how their activities are regulated. In some heterodimeric transcription factors, however, each monomer has a different DNA-binding specificity. The resulting combinatorial possibilities increase the number of potential DNA sequences that a family of transcription factors can bind. Three different transcription-factor monomers could theoretically combine to form six different homo- and heterodimeric transcription factors, as illustrated in Figure 9-32b. Four different monomers could form a total of ten
[Figure 9-32 panels (a)–(c): monomers Factor A, Factor B, and Factor C, each with an activation domain and a DNA-binding domain, bound as dimers at sites 1–6; panel (c) adds an inhibitory factor]
FIGURE 9-32 Combinatorial possibilities due to formation of heterodimeric transcription factors. (a) In some heterodimeric transcription factors, the DNA-binding domain of each monomer recognizes the same DNA sequence. In the hypothetical example shown, transcription-factor monomers A, B, and C can all interact with one another, creating six different alternative combinations of activation domains that can all bind at the same site. Each composite binding site is divided into two half-sites, and each heterodimeric factor contains the activation domains of its two constituent monomers. (b) When transcription-factor monomers recognize different DNA sequences, six alternative combinations of the transcription-factor monomers A, B, and C, each with a unique pair of activation domains, can bind to six different DNA sequences (sites 1–6). (c) Expression of an inhibitory factor (red) that interacts only with the dimerization domain of A inhibits binding; hence transcriptional activation at sites 1, 4, and 5 is inhibited, but activation at sites 2, 3, and 6 is unaffected.
dimeric factors; five monomers, fifteen dimeric factors; and so forth. In addition, inhibitory factors are known that bind to some bZIP and bHLH monomers, thereby blocking their binding to DNA. When these inhibitory factors are expressed, they repress transcriptional activation by the factors with which they interact (Figure 9-32c). Thus the rules governing the interactions of members of a heterodimeric transcription factor family are complex. This combinatorial complexity expands both the number of DNA sites from which these factors can activate transcription and the ways in which they can be regulated. Similar combinatorial transcription regulation is achieved through the interaction of structurally unrelated
transcription factors bound to closely spaced binding sites in DNA. An example is the interaction of two transcription factors, NFAT and AP1, that bind to neighboring sites in a composite promoter-proximal element regulating the gene encoding interleukin-2 (IL-2). Expression of the IL-2 gene is critical to the immune response, but abnormal expression of IL-2 can lead to autoimmune diseases such as rheumatoid arthritis (see Chapter 23). Neither NFAT nor AP1 binds to its site in the IL-2 control region in the absence of the other. The affinities of these factors for these particular DNA sequences are too low for the individual factors to form a stable complex with DNA. However, when both NFAT and AP1 are present, protein-protein interactions between them stabilize the ternary complex composed of NFAT, AP1, and DNA (Figure 9-33a). Such cooperative DNA binding by various transcription factors results in considerable combinatorial complexity of transcriptional control. As a result, the 1400 or so transcription factors encoded in the human genome can bind to DNA through a much larger number of cooperative interactions, resulting in unique transcriptional control for each of the roughly 21,000 human genes. In the case of IL-2, transcription occurs only when two conditions are met: NFAT must be activated, which triggers its transport from the cytoplasm to the nucleus, and the two subunits of AP1 must be synthesized. These two events are controlled by distinct signal transduction pathways (see Chapters 15 and 16), allowing stringent control of IL-2 expression. Cooperative binding by NFAT and AP1 occurs only when their weak binding sites are positioned quite close to each other in DNA. The sites must be located at a precise distance from each other for effective binding. The requirements for cooperative binding are not so stringent in the case of some other transcription factors and transcription-control regions.
For example, the EGR-1 control region contains a composite binding site to which the SRF and SAP1 transcription factors bind cooperatively (Figure 9-33b). Because SAP1 has a long, flexible domain that interacts with SRF, the two proteins can bind cooperatively when their individual sites in DNA are separated by any distance up to about 30 bp or are inverted relative to each other.
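The dimer counts quoted for heterodimeric factor families in this section follow from simple counting: n monomers that can all pair with one another give n homodimers plus n(n − 1)/2 heterodimers, or n(n + 1)/2 dimers in total (6, 10, and 15 for three, four, and five monomers). The sketch below assumes, as in the hypothetical example of Figure 9-32, that every monomer can pair with every other; it also mimics the inhibitory factor of Figure 9-32c by discarding every dimer that contains the sequestered monomer A. The function names are illustrative.

```python
from __future__ import annotations

from itertools import combinations_with_replacement


def dimers(monomers: list[str]) -> list[tuple[str, str]]:
    """All unordered homo- and heterodimers the monomers could form."""
    return list(combinations_with_replacement(monomers, 2))


def dimer_count(n: int) -> int:
    """n homodimers plus n*(n-1)/2 heterodimers = n*(n+1)/2."""
    return n * (n + 1) // 2


def active_dimers(monomers: list[str], sequestered: str) -> list[tuple[str, str]]:
    """Dimers left able to bind DNA when an inhibitor blocks one monomer's dimerization."""
    return [d for d in dimers(monomers) if sequestered not in d]


factors = ["A", "B", "C"]
print(dimers(factors))
print([dimer_count(n) for n in (3, 4, 5)])
print(active_dimers(factors, "A"))
```

Blocking A removes the three dimers that contain it (AA, AB, AC) and leaves BB, BC, and CC, matching the three inhibited and three unaffected sites described for Figure 9-32c.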
Multiprotein Complexes Form on Enhancers
As noted previously, enhancers generally range in length from about 50 to 200 bp and include binding sites for several transcription factors. Analysis of the roughly 50-bp enhancer that regulates expression of β-interferon, an important protein in defense against viral infections in vertebrates, provides a good example of how the DNA-binding domains of several transcription factors occupy the multiple binding sites that constitute an enhancer (Figure 9-34). The term enhanceosome has been coined to describe such large DNA-protein complexes that assemble from transcription factors as they bind to the multiple binding sites in an enhancer. Because of the presence of flexible regions connecting the DNA-binding domains and activation or repression domains in transcription factors (see Figure 9-28), and because of the ability of interacting proteins bound to distant sites to produce loops in the DNA between their binding sites (see Figure 9-5), considerable leeway in the spacing between regulatory elements in transcription-control regions is permissible. This tolerance for variable spacing between binding sites for specific transcription factors, and between promoter binding sites for the general transcription factors and for Pol II, probably contributed to rapid evolution of gene control in eukaryotes. Transposition of DNA sequences and recombination between repeated sequences over evolutionary time probably created new combinations of control elements that were subjected to natural selection and retained if they proved beneficial. The latitude in spacing between regulatory elements probably allowed many more functional combinations to be subjected to this evolutionary experimentation than would be the case if constraints on the spacing between regulatory elements were strict, as for most genes in bacteria.

[Figure 9-33 panels: (a) NFAT and AP1 bound cooperatively to a weak NFAT binding site and a weak AP1 binding site; (b) SRF monomers A and B with SAP1 (ETS domain and B-box)]

FIGURE 9-33 Cooperative binding of two unrelated transcription factors to neighboring sites in a composite control element. (a) By themselves, both monomeric NFAT and heterodimeric AP1 transcription factors have low affinity for their respective binding sites in the IL-2 promoter-proximal region. Protein-protein interactions between NFAT and AP1 add to the overall stability of the NFAT-AP1-DNA complex, so that the two proteins bind to the composite site cooperatively. (b) Cooperative DNA binding by dimeric SRF and monomeric SAP1 can occur when their binding sites are separated by 5–30 bp, and even when the SAP1 binding site is inverted, because the domain of SAP1 that interacts with SRF is connected to the DNA-binding domain of SAP1 by a flexible linker region of the SAP1 polypeptide chain (dotted line). [Part (a) data from L. Chen et al., 1998, Nature 392:42, PDB ID 1a02; part (b) data from M. Hassler and T. J. Richmond, 2001, EMBO J. 20:3018, PDB ID 1hbx.]

[Figure 9-34 image: Jun/ATF-2, IRF-3A, IRF-3C, IRF-7B, IRF-7D, and p50/RelA bound along the enhancer DNA, shown as
5’ TAAATGACATAGGAAAACTGAAAGGGAGAAGTGAAAGTGGGAAATTCCTCTG 3’
3’ TTTACTGTATCCTTTTGACTTTCCCTCTTCACTTTCACCCTTTAAGGAGACA 5’]

FIGURE 9-34 Model of the enhanceosome that forms on the β-interferon enhancer. Two heterodimeric factors, Jun/ATF-2 and p50/RelA (NF-κB), and two copies each of the monomeric transcription factors IRF-3 and IRF-7, bind to the six overlapping binding sites in this enhancer. See D. Panne, T. Maniatis, and S. Harrison, 2007, Cell 129:1111.
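The cooperative binding of NFAT and AP1 described earlier in this section can be illustrated with a standard two-site equilibrium model, which is not part of the text. Each factor's weak site is described by x or y (the factor's concentration divided by its dissociation constant), and a protein-protein contact multiplies the weight of the doubly bound state by a cooperativity factor omega. The parameter values below are hypothetical, not measured NFAT or AP1 affinities, and `occupancy_both` is an illustrative name.

```python
def occupancy_both(x: float, y: float, omega: float) -> float:
    """Probability that both sites are filled in a two-site equilibrium model.

    x, y: each factor's concentration divided by its dissociation constant for its own site
    omega: cooperativity factor from protein-protein contact (1 = no cooperativity)
    Statistical weights: empty 1, first only x, second only y, both omega*x*y.
    """
    both = omega * x * y
    return both / (1 + x + y + both)


# Hypothetical weak sites: each factor alone fills its site about 9% of the time (0.1/1.1).
for omega in (1, 100, 1000):
    print(omega, round(occupancy_both(0.1, 0.1, omega), 3))
```

With these assumed values, the composite site is doubly occupied less than 1 percent of the time without cooperativity but close to 90 percent of the time when omega is 1000, which is the sense in which a favorable protein-protein contact can turn two individually weak sites into a stably occupied composite element.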
KEY CONCEPTS OF SECTION 9.4
Regulatory Sequences in Protein-Coding Genes and the Proteins Through Which They Function
• Expression of eukaryotic protein-coding genes is generally regulated through multiple protein-binding transcription-control regions that are located close to or distant from the transcription start site (see Figure 9-23).
• Promoters direct binding of RNA polymerase II to DNA, determine the site of transcription initiation, and influence the frequency of transcription initiation.
• Promoter-proximal elements occur within about 200 bp of a start site. Several such elements, containing 6–10 bp, may help regulate a particular gene.
• Enhancers, which contain multiple short control elements, may be located from 200 bp to tens of kilobases upstream or downstream from a promoter, within an intron, or downstream from the final exon of a gene.
• Promoter-proximal elements and enhancers are often cell-type-specific, functioning only in specific differentiated cell types.
• Transcription factors, which activate or repress transcription, bind to promoter-proximal regulatory elements and enhancers in eukaryotic DNA.
• Transcription activators and repressors are generally modular proteins containing a single DNA-binding domain and one or a few activation domains (for activators) or repression domains (for repressors). The different domains are frequently linked by flexible, intrinsically disordered polypeptide regions (see Figure 9-28).
• Among the most common structural motifs found in the DNA-binding domains of eukaryotic transcription factors are the homeodomain, C2H2 zinc finger, basic zipper (leucine zipper), and basic helix-loop-helix (bHLH). All these and many other DNA-binding motifs contain one or more α helices that interact with the major groove in their cognate site in DNA.
• Activation and repression domains in transcription factors exhibit a variety of amino acid sequences and three-dimensional structures. In general, these functional domains interact with co-activators or co-repressors, which are critical to the ability of transcription factors to modulate gene expression.
• The transcription-control regions of most genes contain binding sites for multiple transcription factors. Transcription of such genes varies depending on the particular repertoire of transcription factors that are expressed and activated in a particular cell at a particular time.
• Combinatorial complexity in transcriptional control results from alternative combinations of monomers that form heterodimeric transcription factors (see Figure 9-32) and from cooperative binding of transcription factors to composite control sites (see Figure 9-33).
• Binding of multiple transcription factors to multiple sites in an enhancer forms a DNA-protein complex called an enhanceosome (see Figure 9-34).

9.5 Molecular Mechanisms of Transcription Repression and Activation
The repressors and activators that bind to specific sites in DNA and regulate expression of the associated protein-coding genes do so by three general mechanisms. First, these regulatory proteins act in concert with other proteins to modulate chromatin structure, inhibiting or stimulating the ability of general transcription factors to bind to promoters. Recall from Chapter 8 that the DNA in eukaryotic cells is not free, but is associated with a roughly equal mass of protein in the form of chromatin. The basic structural unit of chromatin is the nucleosome, which is composed of about 147 bp of DNA wrapped tightly around a disk-shaped core of histone proteins. Residues within the N-terminal region of each histone, and the C-terminal regions of histones H2A and H2B, called histone tails, extend from the surface of the nucleosome and can be reversibly modified (see Figure 8-26b). Such modifications influence the relative condensation of chromatin and thus its accessibility to proteins required for transcription initiation. Second, activators and repressors interact with a large multiprotein complex called the mediator of transcription complex, or simply Mediator. This complex, in turn, binds to Pol II and directly regulates assembly of the preinitiation complex. In addition, some activation domains interact with TFIID-TAF subunits or other components of the preinitiation complex, and these interactions contribute to preinitiation complex assembly. Finally, activation domains may also interact with the elongation factor P-TEFb (cyclin T–CDK9) and other as yet unknown factors to stimulate elongation by Pol II away from the promoter region. In this section, we review the current understanding of how repressors and activators control chromatin structure and preinitiation complex assembly. In the next section of the chapter, we discuss how the concentrations and activities of activators and repressors themselves are controlled, so that gene expression is precisely attuned to the needs of the cell and organism.

Formation of Heterochromatin Silences Gene Expression at Telomeres, near Centromeres, and in Other Regions
CHAPTER 9
t Transcriptional Control of Gene Expression
For many years it has been clear that inactive genes in eukaryotic cells are often associated with heterochromatin, regions of chromatin that are more highly condensed and stain more darkly with DNA dyes than euchromatin, in which most transcribed genes are located (see Figure 8-28a). Regions of chromosomes near the centromeres and telomeres, as well as additional specific regions that vary in different cell types, are organized into heterochromatin. The DNA in heterochromatin is less accessible to externally added proteins than is DNA in euchromatin and consequently is often referred to as “closed” chromatin. For instance, in an experiment described in Chapter 8, the DNA of inactive genes was found to be far more resistant to digestion by DNase I than the DNA of transcribed genes (see Figure 8-27). Study of DNA regions in S. cerevisiae that behave like the heterochromatin of higher eukaryotes provided early insight into the chromatin-mediated repression of transcription. This yeast can grow either as haploid or diploid cells. Haploid cells exhibit one of two possible mating types, called a and α. Cells of different mating type can “mate,” or fuse,
to generate a diploid cell (see Figure 1-23). When a haploid cell divides by budding, the larger “mother” cell switches its mating type. Genetic and molecular analyses have revealed that three genetic loci on yeast chromosome III control the mating type of yeast cells (Figure 9-35). The central mating-type locus, termed MAT (the only one of the three that is actively transcribed), encodes transcription factors (a1, or α1 and α2) that regulate genes that determine the mating type. In any one cell, either an a or α DNA sequence is located at the MAT locus. The two additional loci, termed HML and HMR, near the left and right telomere, respectively, contain “silent” (nontranscribed) copies of the a or α genes. These sequences are transferred alternately from HMLα or HMRa into the MAT locus by a type of nonreciprocal recombination between homologous sequences during cell division. When the MAT locus contains the DNA sequence from HMLα, the cells behave as α cells. When the MAT locus contains the DNA sequence from HMRa, the cells behave like a cells. Our interest here is in how transcription of the silent mating-type genes at HML and HMR is repressed. If these genes are expressed, as they are in yeast mutants with defects in the repressing mechanism, both a and α proteins are expressed, causing the cells to behave like diploid cells, which cannot mate. The promoters and UASs controlling transcription of the a and α genes lie near the center of the DNA sequence that is transferred and are identical whether the sequences are at the MAT locus or at one of the silent loci. This arrangement indicates that the function of the transcription factors that interact with these sequences must somehow be blocked at HML and HMR, but not at the MAT locus. This repression of the silent loci depends on silencer sequences located next to the region of transferred DNA at HML and HMR (see Figure 9-35). If the silencer is deleted, the adjacent locus is transcribed. 
Remarkably, any gene placed near the yeast mating-type silencer sequence by recombinant DNA techniques is repressed, or “silenced,” even a tRNA gene transcribed by RNA polymerase III, which uses a different
set of general transcription factors than RNA polymerase II uses, as discussed later. Several lines of evidence indicate that repression of the HML and HMR loci results from a condensed chromatin structure that sterically blocks transcription factors from interacting with the DNA. In one telling experiment, the gene encoding an E. coli enzyme that methylates adenine residues in the sequence GATC was introduced into yeast cells under the control of a yeast promoter so that the enzyme was expressed. Researchers found that GATC sequences within the MAT locus and most other regions of the genome in these cells were methylated, but not those within the HML and HMR loci. These results indicate that the DNA of the silent loci is inaccessible to the E. coli methylase, and presumably to proteins in general, including transcription factors and RNA polymerase. Similar experiments conducted with various yeast histone mutants indicated that specific interactions involving the histone tails of H3 and H4 are required for formation of a fully repressed chromatin structure. Other studies have shown that the telomeres of every yeast chromosome also behave like silencer sequences. For instance, when a gene is placed within a few kilobases of any yeast telomere, its expression is repressed. In addition, this repression is relieved by the same mutations in the H3 and H4 histone tails that interfere with repression at the silent mating-type loci. Genetic studies led to identification of several proteins, RAP1 and three SIR proteins, that are required for repression of the silent mating-type loci and the telomeres in yeast. RAP1 was found to bind within the DNA silencer sequences associated with HML and HMR and to a sequence that is repeated multiple times at each yeast-chromosome telomere. Further biochemical studies showed that the SIR2 protein is a histone deacetylase; it removes acetyl groups on lysines of the histone tails. 
Furthermore, the RAP1 and SIR2, 3, and 4 proteins bind to one another, and SIR3 and SIR4 bind to the N-terminal tails of histones H3 and H4, which are maintained in a largely nonacetylated state by the deacetylase
[Figure 9-35 diagram: yeast chromosome III, showing the centromere, the two telomeres, the silent HMLα and HMRa loci with their adjacent silencers, and the MAT locus carrying either a sequences (a1) or α sequences (α1, α2).]
FIGURE 9-35 Arrangement of mating-type loci on chromosome III in the yeast S. cerevisiae. Silent (unexpressed) mating-type genes (either a or α) are located at the HML locus. The opposite mating-type gene is present at the silent HMR locus. When the α or a sequences are present at the MAT locus, they can be transcribed into mRNAs whose encoded proteins specify the mating-type phenotype of the cell. The silencer sequences near HML and HMR bind proteins that are critical for repression of these silent loci. Haploid cells can switch mating types in a process that transfers the DNA sequence from HML or HMR to the transcriptionally active MAT locus.
[Figure 9-36 panels: (a) Nuclei and telomeres; (b) Telomeres; (c) SIR3 protein; (d) schematic of Rap1, Sir2, Sir3, and Sir4 on telomeric DNA and hypoacetylated histone N-terminal tails, in which nucleosomes condense and multiple telomeres associate.]
EXPERIMENTAL FIGURE 9-36 Antibody and DNA probes colocalize SIR3 protein with telomeric heterochromatin in yeast nuclei. (a) Confocal micrograph 0.3 μm thick through three diploid yeast cells, each containing 68 telomeres. Telomeres were labeled by hybridization to a fluorescent telomere-specific probe (yellow). DNA was stained red to reveal the nuclei. The 68 telomeres coalesce into a much smaller number of regions near the nuclear periphery. (b, c) Confocal micrographs of yeast cells labeled with a telomere-specific hybridization probe (b) and a fluorescent-labeled antibody specific for SIR3 (c). Note that SIR3 is localized in the repressed telomeric heterochromatin. Similar experiments with RAP1, SIR2, and SIR4 have shown that these proteins also colocalize with the repressed telomeric heterochromatin. (d) Schematic model of the silencing mechanism at yeast telomeres. (Top left) Multiple copies of RAP1 bind to a simple repeated sequence at each telomere region that lacks nucleosomes. SIR3 and SIR4 bind to RAP1, and SIR2 binds to SIR4. SIR2 is a histone deacetylase that deacetylates the tails on the histones neighboring the repeated RAP1-binding site. (Middle) The hypoacetylated histone tails are also binding sites for SIR3 and SIR4, which in turn bind additional SIR2, deacetylating neighboring histones. Repetition of this process results in spreading of the region of hypoacetylated histones with associated SIR2, SIR3, and SIR4. (Bottom) Interactions between complexes of SIR2, SIR3, and SIR4 cause the chromatin to condense and several telomeres to associate, as shown in a–c. The higher-order chromatin structure generated sterically blocks other proteins from interacting with the underlying DNA. See M. Grunstein, 1997, Curr. Opin. Cell Biol. 9:383. [Parts (a)–(c) ©1996 Gotta et al., The Journal of Cell Biology, 134: 1349–1363. doi:10.1083/jcb.134.6.134.]
activity of SIR2. A series of experiments using fluorescence confocal microscopy on yeast cells either stained with fluorescent-labeled antibody to any one of the SIR proteins or RAP1 or hybridized to a labeled telomere-specific DNA probe revealed that these proteins form large, condensed telomeric nucleoprotein structures resembling the heterochromatin found in higher eukaryotes (Figure 9-36a, b, c). Figure 9-36d depicts a model for the chromatin-mediated silencing at yeast telomeres based on these and other studies. Formation of heterochromatin at telomeres is nucleated by multiple RAP1 protein molecules bound to repeated sequences in a nucleosome-free region at the extreme end of a telomere. A network of protein-protein interactions involving telomere-bound RAP1, three SIR proteins (2, 3, and 4), and hypoacetylated histones H3 and H4 creates a higher-order nucleoprotein complex that includes several telomeres and in which the DNA is largely inaccessible to external proteins. One additional protein, SIR1, is also required for silencing of the mating-type loci. It binds to the silencer regions associated with HML and HMR together with RAP1 and other proteins to initiate assembly of a similar multiprotein silencing complex that encompasses HML and HMR. An important feature of this model is the dependence of repression on hypoacetylation of the histone tails. This dependence was demonstrated in experiments with yeast mutants expressing histones in which lysines in histone N-termini were replaced with arginines, glutamines, or glycines. Arginine is positively charged, like lysine, but cannot be acetylated. Glutamine, on the other hand, is neutral and mimics the neutral charge of acetylated lysine, and glycine, with no side chain, also mimics the absence of a positively charged lysine. 
Repression at telomeres and at the silent mating-type loci was defective in the mutants with glutamine and glycine substitutions for lysine in the H3 or H4 histone tails, but not in the mutants with arginine substitutions. Further, acetylation of H3 and H4 lysines interferes with binding by SIR3 and SIR4 and consequently prevents repression at the silent loci and telomeres. Finally, chromatin immunoprecipitation experiments (see Figure 9-18a) using antibodies specific for acetylated lysines at particular positions in the histone N-terminal tails (see Figure 8-26a) confirmed that histones in repressed regions near telomeres and at the silent mating-type loci are hypoacetylated, but become hyperacetylated in sir mutants when genes in these regions are derepressed.
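The stepwise spreading in the telomere model of Figure 9-36d, together with the outcome of the lysine-substitution experiments, can be caricatured as a one-dimensional toy model. The Python sketch below is a deliberate simplification under stated assumptions, not a published model: the function name `spread_silencing` and the residue codes are hypothetical, each nucleosome's tail is reduced to a single residue ("Kac" for acetylated lysine, "K", "R", "Q", or "G"), SIR2 at the edge of the silenced region is assumed to deacetylate the next tail, and SIR3/SIR4 are assumed to bind only a positively charged tail (unacetylated lysine or arginine).

```python
# Toy model only (hypothetical): not a quantitative description of yeast silencing.
POSITIVE = {"K", "R"}  # unacetylated lysine and arginine carry a positive charge


def spread_silencing(tails):
    """Return how many nucleosomes, counted outward from the RAP1
    nucleation site, end up bound by SIR proteins.

    tails: one residue code per nucleosome, ordered from the telomere inward:
      "Kac" acetylated lysine, "K" unacetylated lysine,
      "R" arginine, "Q" glutamine, "G" glycine.
    """
    covered = 0
    for residue in tails:
        # SIR2 at the edge of the silenced region deacetylates the next tail
        if residue == "Kac":
            residue = "K"
        # SIR3/SIR4 are assumed to bind only a positively charged tail;
        # if they cannot bind, spreading stops at this nucleosome
        if residue not in POSITIVE:
            break
        covered += 1
    return covered
```

With wild-type tails (`["Kac"] * 10`) or arginine substitutions (`["R"] * 10`), all ten nucleosomes are covered, whereas glutamine or glycine substitutions stop spreading at the first nucleosome, paralleling the repression defects described above. The sketch deliberately ignores histone acetylases, competing activators, and higher-order folding.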
Repressors Can Direct Histone Deacetylation at Specific Genes
The importance of histone deacetylation in chromatin-mediated gene repression was further supported by studies of eukaryotic repressors that regulate genes at internal chromosomal positions. These proteins are now known to act in part by causing deacetylation of histone tails in nucleosomes that encompass the TATA box and promoter-proximal region of the genes they repress. In vitro studies have shown that when promoter DNA is part of a nucleosome with nonacetylated histones, the general transcription factors cannot bind to the TATA box and promoter-proximal
region. In nonacetylated histones, the N-terminal lysines are positively charged and may interact with DNA phosphates. The nonacetylated histone tails also interact with neighboring histone octamers and other chromatin-associated proteins, favoring the folding of chromatin into condensed higher-order structures whose precise conformation is not well understood. The net effect is that general transcription factors cannot assemble into a preinitiation complex on a promoter associated with hypoacetylated histones. In contrast, binding of general transcription factors is repressed much less by histones with hyperacetylated tails, in which the positively charged lysines are neutralized and electrostatic interactions are eliminated. The connection between histone deacetylation and repression of transcription at specific yeast promoters became clearer when the cDNA encoding a human histone deacetylase was found to have high homology to the yeast RPD3 gene, known to be required for the normal repression of a number of yeast genes. Further work showed that the yeast Rpd3 protein has histone deacetylase activity. The ability of Rpd3 to deacetylate histones at a number of promoters depends on two other proteins: Ume6, a repressor that binds to a specific upstream regulatory sequence (URS1), and Sin3, which is part of a large multiprotein complex called Rpd3L that also contains Rpd3 (Figure 9-37a). Sin3 also binds to the repression domain of Ume6, thus positioning the Rpd3 histone deacetylase in the complex so that it can interact with nearby promoter-associated nucleosomes and remove acetyl groups from histone-tail lysines. Additional experiments, using the chromatin immunoprecipitation technique outlined in Figure 9-18a and antibodies to specific histone acetylated lysines, demonstrated that in wild-type yeast, one or two nucleosomes in the immediate vicinity of Ume6-binding sites are hypoacetylated. These sites include the promoters of genes repressed by Ume6. 
In sin3 and rpd3 deletion mutants, not only were these promoters derepressed, but the nucleosomes near the Ume6-binding sites were hyperacetylated. All these findings provide considerable support for the model of repressor-directed deacetylation shown in Figure 9-37a. In yeast, the Sin3-Rpd3 complex (Rpd3L) functions as a co-repressor, a protein or complex of proteins that binds to a repression domain and interacts with chromatin, Pol II, or the general transcription factors to repress transcription. Co-repressor complexes containing histone deacetylases have also been found associated with many repressors from mammalian cells. Some of these complexes contain the mammalian homolog of Sin3 (mSin3), which interacts with the repression domain of repressors, as in yeast. Other histone deacetylase complexes identified in mammalian cells contain additional or different repression domain-binding proteins. These various repressor and co-repressor combinations mediate histone deacetylation at specific promoters by a mechanism similar to the yeast mechanism (see Figure 9-37a). In addition to repressing transcription through the formation of “closed” chromatin structures, some repression domains have also been found to inhibit the assembly of preinitiation complexes in in vitro experiments with purified general
[Figure 9-37 diagram: (a) Repressor-directed histone deacetylation: Ume6 (DBD bound to URS1; RD bound to Sin3) recruits the Rpd3L complex, whose Rpd3 subunit removes acetyl groups from histone N-terminal tails. (b) Activator-directed histone hyperacetylation: Gcn4 (DBD bound to a UAS; AD) recruits the SAGA complex, whose Gcn5 subunit hyperacetylates histone N-terminal tails.]
FIGURE 9-37 Proposed mechanism of histone deacetylation and hyperacetylation in yeast transcriptional control. (a) Repressor-directed deacetylation of histone N-terminal tails. The DNA-binding domain (DBD) of the repressor Ume6 interacts with a specific upstream control element of the genes it regulates, called URS1. The Ume6 repression domain (RD) binds Sin3, a subunit of a multiprotein complex that includes Rpd3, a histone deacetylase. Deacetylation of histone N-terminal tails on nucleosomes in the region of the Ume6-binding site inhibits binding of general transcription factors at the TATA box, thereby repressing gene expression. (b) Activator-directed hyperacetylation of histone N-terminal tails. The DNA-binding domain of the activator Gcn4 interacts with specific upstream activating sequences (UAS) of the genes it regulates. The Gcn4 activation domain (AD) then interacts with a multiprotein histone acetylase complex that includes the Gcn5 catalytic subunit. Subsequent hyperacetylation of histone N-terminal tails on nucleosomes in the vicinity of the Gcn4-binding site facilitates access by the general transcription factors required for initiation. Repression and activation of many genes in higher eukaryotes occur by similar mechanisms.
transcription factors in the absence of histones. This activity probably contributes to the repression of transcription by these repression domains in vivo as well.
Activators Can Direct Histone Acetylation at Specific Genes
Just as repressors function through co-repressors that bind to their repression domains, the activation domains of DNA-binding activators function by binding multisubunit co-activator complexes, protein complexes that interact with or modify chromatin, Pol II, or general transcription factors to activate transcription. One of the first co-activator complexes to be characterized was the yeast SAGA complex, which functions with the Gcn4 activator protein described in Section 9.4. Early genetic studies indicated that full activity of the Gcn4 activator required a protein called Gcn5. The clue to Gcn5’s function came from biochemical studies of a histone acetylase purified from the protozoan Tetrahymena, the first histone acetylase to be purified. Sequence analysis revealed homology between the Tetrahymena protein and yeast Gcn5, which was soon shown to have histone acetylase activity as well. Further genetic and biochemical studies revealed that Gcn5 is one subunit of a multiprotein
co-activator complex, named the SAGA complex after genes encoding some of the subunits. Another subunit of this histone acetylase complex binds to activation domains in multiple yeast activator proteins, including Gcn4. The model shown in Figure 9-37b is consistent with the observation that nucleosomes near the promoter region of a gene regulated by the Gcn4 activator are specifically hyperacetylated compared with most histones in the cell. This activator-directed hyperacetylation of nucleosomes near a promoter region opens the chromatin structure so as to facilitate the binding of other proteins required for transcription initiation. The resulting chromatin structure is less condensed than most chromatin, as indicated by its greater sensitivity to digestion with nucleases in isolated nuclei. In addition to leading to the decondensation of chromatin, the acetylation of specific histone lysines generates binding sites for proteins containing bromodomains. A bromodomain is a sequence of about 110 amino acids that folds into a domain that binds acetylated lysine. One or more bromodomains are found in several chromosome-associated proteins that contribute to transcriptional activation. For example, a subunit of the general transcription factor TFIID contains two bromodomains, which bind to acetylated nucleosomes with high affinity. Recall that TFIID binding to a
promoter initiates assembly of an RNA polymerase II preinitiation complex (see Figure 9-19). Nucleosomes at promoter regions of virtually all active genes have acetylated lysines in their H3 and H4 histone tails. A similar activation mechanism operates in higher eukaryotes. Mammalian cells contain multisubunit histone acetylase co-activator complexes that are homologous to the yeast SAGA complex. They also express two related 300-kDa, multidomain proteins called CBP and p300, which function similarly. As noted earlier, one domain of CBP binds the phosphorylated acidic activation domain in the CREB transcription factor. Other domains of CBP interact with different activation domains in other activators. Yet another domain of CBP has histone acetylase activity, and another CBP domain associates with additional multisubunit histone acetylase complexes. CREB and many other mammalian activators function in part by directing CBP and the associated histone acetylase complex to specific nucleosomes, where they acetylate histone tails, facilitating the interaction of general transcription factors with promoter DNA.
Chromatin-Remodeling Complexes Help Activate or Repress Transcription
In addition to histone acetylase complexes, multiprotein chromatin-remodeling complexes are required for activation at many promoters. The first of these complexes characterized was the yeast SWI/SNF chromatin-remodeling complex. One of the SWI/SNF subunits has homology to DNA helicases, enzymes that use energy from ATP hydrolysis to disrupt interactions between base-paired nucleic acids or between nucleic acids and proteins. In vitro, the SWI/SNF complex is thought to pump or push DNA into the nucleosome so that DNA bound to the surface of the histone octamer transiently dissociates from the surface and translocates, causing the nucleosomes to “slide” along the DNA. The net result of such chromatin remodeling is to facilitate the binding of transcription factors to specific DNA sequences in chromatin. Many activation domains bind to such chromatin-remodeling complexes, and this binding stimulates in vitro transcription from chromatin templates in which the DNA is associated with histone octamers. Thus the SWI/SNF complex represents another type of co-activator complex. The experiment shown in Figure 9-38 demonstrates dramatically how an activation domain can cause decondensation of a region of chromatin. This decondensation results from association of the activation domain with chromatin-remodeling and histone acetylase complexes. Chromatin-remodeling complexes are required for many processes involving DNA in eukaryotic cells, including transcriptional control, DNA replication, recombination, and DNA repair. Several types of chromatin-remodeling complexes are found in eukaryotic cells, all with homologous DNA helicase domains. SWI/SNF complexes and related chromatin-remodeling complexes in multicellular organisms contain subunits with bromodomains that bind to acetylated
[Figure 9-38 panels: (a) Condensed chromatin with bound LacI; (b) Decondensed chromatin with bound LacI-VP16 AD, associated histone acetylase and chromatin-remodeling complexes, and acetylated (Ac) and methylated (Me) histones. Scale bar, 2 μm.]
FIGURE 9-38 Expression of fusion proteins demonstrates chromatin decondensation in response to an activation domain. A cultured hamster cell line was engineered to contain multiple copies of a tandem array of E. coli lac operator sequences integrated into a chromosome in a region of heterochromatin. (a) When an expression vector for the lac repressor (LacI) was transfected into these cells, lac repressor bound to the lac operator sites could be visualized in a region of condensed chromatin using an antibody against the lac repressor (red). DNA was visualized by staining with DAPI (blue), revealing the nucleus. A diagram of condensed chromatin is shown below. (b) When LacI fused to an activation domain was transfected into these cells, staining as in (a) revealed that the activation domain causes this region of chromatin to decondense into a thinner chromatin fiber that fills a much larger volume of the nucleus. A diagram of a region of decondensed chromatin with bound LacI fusions to the VP16 activation domain (AD) and associated chromatin remodeling and histone acetylase complexes is shown below. [Photos ©1999 Dr. Andrew S. Belmont et al., The Journal of Cell Biology, 145:1341–1354. doi: 10.1083/jcb.145.7.1341.]
histone tails. Consequently, SWI/SNF complexes remain associated with activated, acetylated regions of chromatin, presumably maintaining them in a decondensed conformation. Chromatin-remodeling complexes can also participate in transcriptional repression. These complexes bind to the repression domains of repressors and contribute to repression, presumably by folding chromatin into condensed structures. Much remains to be learned about how this important class of proteins alters chromatin structure to influence gene expression and other processes.
Pioneer Transcription Factors Initiate the Process of Gene Activation During Cellular Differentiation
As cells differentiate during embryogenesis and during differentiation from stem cells in adult organisms (see Chapter 21), many of the genes induced during the
process are initially in repressed regions of heterochromatin in undifferentiated progenitor cells. Activation of these genes requires that the chromatin environment of their transcription-control regions become decondensed so that transcription factors can bind to enhancers and promoter-proximal control elements and so that the general transcription factors and Pol II can bind to promoters. In many cases, this decondensation is initiated by special pioneer transcription factors that can bind to their cognate binding sites in DNA even when those sites are within repressed heterochromatic regions of chromatin. These factors have a DNA-binding domain that binds to one side of the DNA helix in a manner similar to the bacteriophage 434 repressor (see Figure 9-29). This domain allows these factors to bind to their specific binding sites while the DNA is wrapped around a histone octamer with the opposite side of the DNA against the surfaces of histones.
[Figure 9-39(a) Mediator subunits by module:
Head module. Yeast: Med6, Med8, Med11, Med17, Med18, Med20, Med22. Human: MED6, MED8, MED11, MED17, MED18, MED20, MED22, MED27, MED28, MED29, MED30.
Middle module. Yeast: Med1, Med4, Med7, Med9, Med10, Med19, Med21, Med31. Human: MED1, MED4, MED7, MED9, MED10, MED19, MED21, MED31, MED26.
Tail module. Yeast: Med2, Med3, Med5, Med14, Med15, Med16. Human: MED14, MED15, MED16, MED23, MED24, MED25.
CDK8-kinase module (CKM). Yeast: Med12, Med13, Cdk8, CycC. Human: MED12/12L, MED13/13L, CDK8/CDK19, CycC.
(b) Cryoelectron microscopic structures: Mediator (head, middle, and tail modules) and the holoenzyme (Mediator bound to RNA Pol II).]
FIGURE 9-39 Structure of yeast and human Mediator complexes. (a) Subunits of the S. cerevisiae and human Mediator complexes. The subunits constituting the head, middle, and tail modules of Mediator are indicated, as well as the subunits of the CDK8-kinase module (CKM) that associates with some Mediator complexes, blocking Pol II binding. (b) Cryoelectron microscopic structure of the yeast Mediator without the CKM. (Left) The head, middle, and tail modules composed of the subunits listed above are color-coded. (Right) The structure of a complex of Mediator with Pol II, called the holoenzyme, suggests that the Mediator modules rotate relative to one another as shown to create a surface that binds Pol II. [Part (b) republished with permission of Elsevier, from Tsai, K.L., “Subunit architecture and functional modular rearrangements of the transcriptional mediator complex,” Cell, 2014, 157(6): 1430–1444; permission conveyed through Copyright Clearance Center, Inc.]
One example of pioneer transcription factors initiating the process of transcriptional activation involves the liver-specific gene Alb1, encoding serum albumin, a major constituent of blood serum that is secreted into the blood by hepatocytes. In the developing mouse, the FoxA and GATA-4 or GATA-6 transcription factors are the first transcription factors to bind to an Alb1 enhancer in undifferentiated gut endodermal cells destined to develop into the liver. FoxA has a “winged helix” DNA-binding domain that binds to one side of the DNA helix containing the FoxA-binding site. GATA factors are also able to bind to their specific sites in DNA when those sites are included in nucleosomal DNA wrapped around a histone octamer. The FoxA and GATA-4/6 activation domains may then interact with chromatin remodeling complexes and histone acetylase complexes to decondense the chromatin of the 120-bp Alb1 enhancer, allowing the observed subsequent binding of four additional transcription factors in the nascent liver bud that develops later.
The Mediator Complex Forms a Molecular Bridge Between Activation Domains and Pol II
Once the interaction of activation domains with histone acetylase complexes and chromatin-remodeling complexes converts the chromatin of a promoter region to an “open” structure that allows the binding of general transcription factors, activation domains interact with another multisubunit co-activator complex, the Mediator complex (Figure 9-39). Activation domain–Mediator interactions stimulate assembly of the preinitiation complex on the promoter. Recent cryoelectron microscopy studies show that the head and middle domains of the Mediator complex interact directly with Pol II. Several Mediator subunits bind to activation domains in various activator proteins. Thus Mediator can form a molecular bridge between an activator bound to its cognate site in DNA and Pol II bound to a promoter. Experiments with temperature-sensitive yeast mutants indicate that some Mediator subunits are required for transcription of virtually all yeast genes. These subunits help maintain the overall structure of the Mediator complex or bind to Pol II; they are therefore required for activation by all activators. In contrast, other Mediator subunits are required for normal activation or repression of specific subsets of genes. DNA microarray analysis (see Figure 6-26) of yeast gene expression in mutants with defects in these nonessential Mediator subunits has indicated that each one influences transcription of 3–10 percent of all genes, in the sense that its deletion either increases or decreases mRNA expression twofold or more. In many cases, these Mediator subunits have been discovered to interact with specific activation domains; thus when one Mediator subunit is defective, transcription of genes regulated by activators that bind to that subunit is severely depressed, but
transcription of other genes is unaffected. Recent cryoelectron microscopy studies suggest that when activation domains interact with Mediator, the head, middle, and tail domains depicted in Figure 9-39 rotate relative to one another, creating a binding surface for RNA polymeraseII. The surface of the polymerase that interacts with general transcription factors in the preinitiation complex (see Figure 9-20) remains exposed in the proposed model of the polymerase-Mediator complex, referred to as the holoenzyme. The various experimental results indicating that individual Mediator subunits bind to specific activation domains suggest that multiple activators may influence transcription from a single promoter by interacting with a Mediator complex simultaneously or in rapid succession (Figure 9-40). Activators bound at enhancers or promoter-proximal elements can interact with Mediator associated with a promoter because chromatin, like DNA, is flexible and can form a loop, bringing the regulatory regions and the promoter close together, as observed for the E. coli NtrC activator and σ54-RNA polymerase (see Figure 9-5). The multiprotein complexes that form on eukaryotic promoters may comprise more than 100 polypeptides with a total mass of 3–5 megadaltons (MDa)—as large as a ribosome. In vivo, assembly of a preinitiation complex on a promoter and initiation of transcription is a highly cooperative process generally requiring that several transcription factors bound to transcription-control elements interact with co-activators that in turn interact with Pol II and general transcription factors. A cell must produce the specific set of activators required for transcription of a particular gene in order to express that gene.
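The twofold criterion used in the microarray analysis of Mediator mutants can be made concrete with a short Python sketch. The function name `fraction_changed`, the dictionary input format, and the expression values below are hypothetical illustrations, not data from the yeast studies; real microarray analyses also involve normalization and statistical testing that this sketch omits.

```python
import math


def fraction_changed(wild_type, mutant, min_fold=2.0):
    """Return the fraction of genes whose mRNA level differs between mutant
    and wild type by at least `min_fold` in either direction.

    wild_type, mutant: dicts mapping gene name -> expression level (same keys).
    """
    threshold = math.log2(min_fold)  # a twofold change is |log2 ratio| >= 1
    changed = 0
    for gene, wt_level in wild_type.items():
        log_ratio = math.log2(mutant[gene] / wt_level)
        if abs(log_ratio) >= threshold:  # counts both increases and decreases
            changed += 1
    return changed / len(wild_type)


# Hypothetical expression values for four genes (arbitrary units).
wt = {"GENE_A": 100.0, "GENE_B": 100.0, "GENE_C": 100.0, "GENE_D": 100.0}
mut = {"GENE_A": 250.0, "GENE_B": 40.0, "GENE_C": 120.0, "GENE_D": 90.0}
print(fraction_changed(wt, mut))  # 0.5: GENE_A up 2.5x, GENE_B down 2.5x
```

Working on the log2 scale makes a twofold increase and a twofold decrease symmetric (+1 and -1), which is why a single absolute-value test catches both.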
FIGURE 9-40 Model of several DNA-bound activators interacting with a single Mediator complex. [Figure labels: activators bound to enhancers, long loop of chromatin, promoter-proximal activators, general transcription factors (GTFs), TAFs, Pol II.] The ability of different Mediator subunits to interact with specific activation domains may contribute to the integration of signals from several activators at a single promoter. See the text for discussion.
KEY CONCEPTS OF SECTION 9.5
Molecular Mechanisms of Transcription Repression and Activation
• Eukaryotic transcription activators and repressors exert their effects largely by binding to multisubunit co-activators or co-repressors that influence the assembly of preinitiation complexes, either by modulating chromatin structure or by interacting with Pol II and general transcription factors.
• The DNA in condensed regions of chromatin (heterochromatin) is relatively inaccessible to transcription factors and other proteins, so gene expression in these regions is repressed.
• The interactions of several proteins with one another and with the hypoacetylated N-terminal tails of histones H3 and H4 are responsible for the chromatin-mediated repression of transcription that occurs at the telomeres and the silent mating-type loci in S. cerevisiae (see Figure 9-36).
• Some repression domains function by interacting with co-repressors that are histone deacetylase complexes. The subsequent deacetylation of histone N-terminal tails in nucleosomes near the repressor-binding site inhibits interaction between the promoter DNA and general transcription factors, thereby repressing transcription initiation (see Figure 9-37a).
• Activation domains function by binding multiprotein co-activator complexes such as histone acetylase complexes. The subsequent hyperacetylation of histone N-terminal tails in nucleosomes near the activator-binding site facilitates interactions between the promoter DNA and general transcription factors, thereby stimulating transcription initiation (see Figure 9-37b).
• SWI/SNF chromatin-remodeling factors constitute another type of co-activator. These multisubunit complexes can transiently dissociate DNA from histone cores in an ATP-dependent reaction and may also decondense regions of chromatin, thereby promoting the binding of DNA-binding proteins needed for transcription initiation.
• The Mediator complex, another type of co-activator, is a roughly 30-subunit complex that forms a molecular bridge between activation domains and RNA polymerase II by binding directly to both. By binding to several different activators either simultaneously or in rapid succession, Mediator probably helps integrate the effects of multiple activators on a single promoter (see Figure 9-40).
• Activators bound to a distant enhancer can interact with transcription factors bound to a promoter because chromatin is flexible and the intervening chromatin can form a large loop.
• The highly cooperative assembly of preinitiation complexes in vivo generally requires several activators. A cell must produce the specific set of activators required for transcription of a particular gene in order to express that gene.
9.6 Regulation of Transcription-Factor Activity
We have seen in the preceding discussion how combinations of transcription factors that bind to specific DNA regulatory sequences control the transcription of eukaryotic genes. Whether a specific gene in a multicellular organism is expressed in a particular cell at a particular time is largely a consequence of the nuclear concentrations and activities of the transcription factors that interact with the transcription-control regions of that gene. (Exceptions are due to the “transcriptional memory” that results from the epigenetic mechanisms discussed in the next section.) Which transcription factors are expressed in a particular cell type, and in what amounts, is determined during the development and differentiation of that cell type by multiple regulatory interactions between transcription factors and the control regions of genes that themselves encode transcription factors. Recent advances in mapping transcription-factor-binding sites, through identification of DNase I hypersensitive sites on a genomic scale, have given us the first high-resolution view of how transcription-factor binding changes during the development and differentiation of multiple human cell types.
DNase I Hypersensitive Sites Reflect the Developmental History of Cellular Differentiation
In Chapter 8, we learned that an expressed gene is far more sensitive to digestion by DNase I (a bovine pancreatic enzyme) than the same gene in a cell type in which it is not expressed (see Figure 8-34). In addition to this general increase in DNase I sensitivity over long regions, researchers later found that specific short regions of the genome, on the order of a hundred base pairs long, are extremely sensitive to DNase I and are the first regions cut when isolated nuclei are treated with low levels of the enzyme. These sites are known as DNase I hypersensitive sites (DHSs). High-throughput sequencing methods have allowed mapping of DHSs across the genome in multiple differentiated and embryonic cell types. Briefly, after digestion of isolated nuclei with low levels of DNase I, DNA is isolated from the treated chromatin. Oligonucleotide linkers of a known sequence are ligated to the DNA ends generated by DNase I digestion. The DNA is then sheared into small fragments by sonication, amplified by PCR, and sequenced. Human DNA sequences adjacent to the known linker sequence are thus identified as DHSs. Figure 9-41a plots the number of times a DHS was sequenced (a measure of the DNase I sensitivity of the site) in samples from the human cell types indicated at the left. A roughly 600-kb region of chromosome 12, located 96.2–96.8 Mb from the left end of the chromosome, is shown. The height of each vertical bar represents the sensitivity to DNase I digestion of the DNA sequence at that position in nuclei isolated from each cell type.
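The counting step behind plots like Figure 9-41a can be sketched in Python. This sketch assumes that each sequenced read has already been mapped, so the genomic coordinate adjacent to the linker (the DNase I cut site) is known. The 50-bp bin width follows the Figure 9-41 caption; the function names, the read positions, and the simple read-count cutoff for calling a bin hypersensitive are hypothetical, and real analyses typically use more elaborate statistical methods.

```python
from collections import Counter


def count_cuts_per_bin(cut_positions, bin_size=50):
    """Tally DNase I cut sites per fixed-width genomic interval.

    cut_positions: chromosome coordinates of DNA ends found next to the
    ligated linker sequence in sequencing reads (one entry per read).
    Returns {bin_start_coordinate: number_of_reads}, sorted by coordinate.
    """
    counts = Counter((pos // bin_size) * bin_size for pos in cut_positions)
    return dict(sorted(counts.items()))


def hypersensitive_bins(bin_counts, min_reads):
    """Return bin starts whose read count reaches a (hypothetical) cutoff."""
    return [start for start, n in bin_counts.items() if n >= min_reads]


# Hypothetical read positions near 96.2 Mb on chromosome 12.
reads = [96_200_010, 96_200_020, 96_200_049, 96_200_051, 96_200_400]
bins = count_cuts_per_bin(reads)
print(bins)                          # {96200000: 3, 96200050: 1, 96200400: 1}
print(hypersensitive_bins(bins, 2))  # [96200000]
```

The bar heights in such a plot are simply these per-bin counts: the more often DNase I cut within an interval, the more linker-adjacent reads map there.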
Mapping of binding sites for specific transcription factors by chromatin immunoprecipitation (see Figure 9-18) has shown that most transcription-factor-binding sites coincide with DHSs. This may be because the DNA-binding domain of the bound transcription factor exposes DNA flanking the binding site to DNase I digestion, or because the transcription factor's activation domain interacts with chromatin-remodeling complexes that destabilize the interaction of DNA with histone octamers in neighboring nucleosomes, making the DNA more sensitive to DNase I. Because DHSs coincide with bound transcription factors, the DHS pattern in a region of chromatin represents the positions of bound transcription factors, although it does not directly identify which factors are bound. In Figure 9-41a, the tissue from which each set of DHS data was obtained is shown on the left, and the embryonic tissues from which these tissue types developed are color-coded as indicated in Figure 9-41b. More closely related cell types, such as fibroblasts from different regions of the body, or endothelial cells that line the inner surfaces of blood vessels in different organs, clearly have more similar DHS patterns than more distantly related cell types. Computational methods make it possible to compare the DHS maps of these cell types across the entire genome and to generate a dendrogram showing how closely the DHS map of each cell type resembles those of the others (see Figure 9-41b). This dendrogram is similar to the dendrograms used to show the relatedness, and hence the evolution, of gene sequences (see Figure 8-21b). Importantly, the DHS pattern of embryonic stem cells is at the root of the DHS dendrogram for all cell types (see Figure 9-41b).
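The genome-wide comparison of DHS maps can be sketched in Python. The text does not specify the similarity measure or clustering algorithm used to build Figure 9-41b; this sketch assumes the Jaccard index (DHSs shared by two cell types divided by all DHSs found in either) and simple average-linkage clustering, and the cell-type names and DHS sets are hypothetical. It illustrates only the principle that cell types sharing more DHSs are joined earlier in the tree.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of DHSs shared by two cell types (intersection over union)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_linkage_tree(dhs_maps):
    """Build a dendrogram by average-linkage clustering on Jaccard distance.

    dhs_maps: {cell_type: set of DHS identifiers}.
    Returns nested tuples; cell types merged earlier are nested more deeply.
    """
    # Each cluster is (subtree, list of member cell types).
    clusters = [(name, [name]) for name in dhs_maps]

    def distance(c1, c2):
        # Mean Jaccard distance over all cross-cluster pairs of cell types.
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(1 - jaccard(dhs_maps[x], dhs_maps[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters (the first pair found wins ties).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical DHS sets (integers stand in for genomic DHS positions).
maps = {
    "fibroblast_lung": {1, 2, 3, 4, 10},
    "fibroblast_skin": {1, 2, 3, 4, 11},
    "endothelial_lung": {5, 6, 7, 8, 10},
    "endothelial_kidney": {5, 6, 7, 8, 12},
}
print(average_linkage_tree(maps))
# (('fibroblast_lung', 'fibroblast_skin'), ('endothelial_lung', 'endothelial_kidney'))
```

In this toy example the two fibroblast maps share four of six DHSs and are joined first, as are the two endothelial maps, mirroring the grouping of related cell types described in the text. Genome-scale data would call for an optimized library implementation of hierarchical clustering rather than this quadratic loop.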
Embryonic stem cells, which come from the inner cell mass of the early mammalian embryo and are discussed in Chapter 21 (see Figure 21-5), are the progenitors of all cells in the adult organism. They appear to have the most complex transcriptional control of all cells in that they have the largest number of DHSs: about 257,000 in one study, compared with 90,000–150,000 in differentiated cells. This difference probably reflects the broad developmental potential of embryonic stem cells. Approximately 30 percent of the DHSs observed in adult differentiated cells are also observed in embryonic stem cells, but a different 30 percent is retained in each adult cell type. An additional 50,000–100,000 new DHSs not found in embryonic stem cells arise during development, and a different set of DHSs arises in each cell type. These DHS patterns reveal the complexity of the combinations of transcription factors that regulate each gene. Approximately a million distinct DHSs were characterized in the cell types shown in Figure 9-41, suggesting that on average, combinations of four or five enhancers regulate the transcription of each of the roughly 21,000 genes in the human genome. This analysis excluded the central nervous system, probably the most complex organ system of all, so the total number of human enhancers may be much larger. But in the tissues analyzed, the maps of DHSs reveal where binding of early embryonic transcription factors is lost and where new cell-type-specific combinations of transcription factors bind as a cell differentiates from the embryonic stem cell. Even this estimate fails to capture the full complexity of transcriptional control, since many transcription-factor-binding sites detected as one DHS are bound by different related transcription factors expressed in different cell types. Often, different related transcription factors bind to the same transcription-control region in different cell types to set the level of transcription appropriate for each cell type.
FIGURE 9-41 Maps of DNase I hypersensitive sites in embryonic and adult cells reflect their developmental history. [Panels: (a) DHS maps for each cell type, including embryonic stem cells; (b) dendrogram.] (a) DHSs from each of the human cell types shown at the left are mapped in the interval on chromosome 12 between 96.2 and 96.8 Mb from the left end. The height of each vertical bar in the figure represents the number of times a sequence in a 50-bp interval at that position was sequenced after following the protocol described in the text to ligate a linker of known sequence to DNA ends resulting from low-level DNase I digestion of chromatin. The plots are color-coded according to the embryonic tissue from which the cell types developed, as shown in (b). (b) Dendrogram showing the relationships among the DHS maps for each cell type across the entire genome. The embryonic tissue from which each of these cell types develops is shown at the right. Embryonic stem cells form the root of the dendrogram. The DHS maps for all other cell types are derived from those for the embryonic stem cell by loss of some DHSs and the acquisition of other DHSs. The dendrogram, based on how closely DHS maps from two cell types are related, parallels the developmental relationships among the cell types. [Republished with permission of Elsevier, Stergachis, A.B., et al., “Developmental Fate and Cellular Maturity Encoded in Human Regulatory DNA Landscapes,” Cell, 2013, 154: 888-903; permission conveyed through Copyright Clearance Center, Inc.]
Nuclear Receptors Are Regulated by Extracellular Signals
In addition to controlling the expression of transcription factors, cells also regulate the activities of many of the transcription factors expressed in a particular cell type. For example, many transcription factors are regulated by intercellular signals. Specific protein ligands, secreted by other cells or expressed on the surfaces of neighboring cells, bind to the extracellular domains of transmembrane receptor proteins on the cell surface and thereby activate the receptors' intracellular domains, transducing the signal received on the outside of the cell into a signal on the inside. The intracellular signal then regulates the activities of enzymes that modify transcription factors by phosphorylation, acetylation, and other types of post-translational modification, which activate or inhibit the transcription factors in the nucleus. In Chapter 16, we describe the major types of cell-surface receptors for protein ligands and the intracellular signaling pathways that regulate transcription-factor activity. Here we discuss another major group of extracellular signals that regulate the activities of transcription factors: small, lipid-soluble hormones, including many different steroid hormones, retinoids, and thyroid hormones. These lipid-soluble hormones can diffuse through the plasma and nuclear membranes and interact directly with the transcription factors they control (Figure 9-42). As noted earlier, the transcription factors regulated by lipid-soluble hormones include the nuclear-receptor superfamily. These transcription factors function as transcription activators only when bound to their ligands.
All Nuclear Receptors Share a Common Domain Structure
Sequencing of cDNAs derived from mRNAs encoding various nuclear receptors has revealed remarkable conservation in their amino acid sequences. It has also revealed that each of these receptors has three functional regions (Figure 9-43). The first is a unique N-terminal region of variable length (100–500 amino acids); portions of this variable region function as activation domains in most nuclear receptors. The second is a DNA-binding domain that maps near the center of the primary sequence and contains a repeat of the C4 zinc-finger motif (see Figure 9-30b). The third region, the hormone-binding domain, located near the C-terminal end, contains a hormone-dependent activation domain (see Figure 9-31b, c). In some nuclear receptors, the hormone-binding domain functions as a repression domain in the absence of ligand.
Nuclear-Receptor Response Elements Contain Inverted or Direct Repeats
The DNA sites to which nuclear receptors bind are called response elements. The characteristic nucleotide sequences of several response elements have been determined. The consensus sequences of the response elements for two steroid hormone receptors, the glucocorticoid receptor response element (GRE) and the estrogen receptor response element (ERE), are 6-bp inverted repeats separated by any three base pairs (Figure 9-44a, b). This finding suggested that the cognate steroid hormone receptors bind to DNA as symmetric dimers (i.e., dimers with twofold rotational symmetry), as was later confirmed by x-ray crystallographic analysis of the homodimeric glucocorticoid receptor's C4 zinc-finger DNA-binding domain (see Figure 9-30b). Some nuclear-receptor response elements, such as those for the receptors that bind nonsteroids such as vitamin D3, thyroid hormone, and retinoic acid, are direct repeats of the same sequence recognized by the estrogen receptor, separated by three, four, or five base pairs (Figure 9-44c–e).
FIGURE 9-42 Examples of hormones that bind to nuclear receptors. [Structures shown: cortisol, retinoic acid, and thyroxine.] These and related lipid-soluble hormones diffuse through the plasma and nuclear membranes and bind to receptors located in the cytosol or nucleus. The ligand-receptor complex functions as a transcription activator.
FIGURE 9-43 General design of transcription factors in the nuclear-receptor superfamily. [Figure shows the general primary structure, from N- to C-terminus: variable region (100–500 aa), DNA-binding domain (68 aa), and ligand-binding domain (225–285 aa), aligned for the estrogen receptor (ER), progesterone receptor (PR), glucocorticoid receptor (GR), thyroxine receptor (TR), and retinoic acid receptor (RAR). Amino acid identity among receptors: 42–94% in the DNA-binding domain and 15–57% in the ligand-binding domain.] The centrally located DNA-binding domain exhibits considerable sequence homology among different receptors and contains two copies of the C4 zinc-finger motif (see Figure 9-30b). The C-terminal hormone-binding domain exhibits somewhat less homology. The N-terminal regions of various receptors vary in length, have unique sequences, and may contain one or more activation domains. See R. M. Evans, 1988, Science 240:889.
[Figure 9-44, panels (a)–(e): consensus response elements; (N) denotes any nucleotide, and the subscript gives the number of spacer base pairs.]
(a) GRE
5′ AGAACA(N)3TGTTCT 3′
3′ TCTTGT(N)3ACAAGA 5′
(b) ERE
5′ AGGTCA(N)3TGACCT 3′
3′ TCCAGT(N)3ACTGGA 5′
(c) VDRE
5′ AGGTCA(N)3AGGTCA 3′
3′ TCCAGT(N)3TCCAGT 5′
(d) TRE
5′ AGGTCA(N)4AGGTCA 3′
3′ TCCAGT(N)4TCCAGT 5′
(e) RARE
5′ AGGTCA(N)5AGGTCA 3′
3′ TCCAGT(N)5TCCAGT 5′
FIGURE 9-44 Consensus sequences of DNA response elements that bind five nuclear receptors. (a, b) The glucocorticoid and estrogen receptors are twofold symmetric dimers that bind, respectively, to the glucocorticoid receptor response element (GRE) and the estrogen receptor response element (ERE). Each of these response elements contains inverted repeats separated by three base pairs. (c–e) The heterodimeric nuclear receptors each contain one RXR subunit associated with another nuclear-receptor subunit that defines the hormone response. RXR-VDR mediates responses to vitamin D3 by binding to a direct repeat separated by three base pairs (a VDRE). RXR-TR mediates responses to thyroid hormone by binding to the same DNA bases in a direct repeat separated by four base pairs (a TRE). Similarly, RXR-RAR mediates a response to retinoic acid by binding to the same direct repeat separated by five base pairs, comprising a RARE. The repeat sequences bound by the reading helices of these receptors are indicated by red arrows. (f) Crystal structures of the glucocorticoid receptor bound to DNA containing a GRE (top) and of the RXR-TR heterodimer bound to DNA containing a TRE (bottom). Red arrows indicate the orientation from N to C of the helices below them. Note that in the twofold symmetric glucocorticoid receptor, the reading helices are inverted relative to each other so that they “read” an AGAACA on the top strand of the left half-site and on the bottom strand of the right half-site, separated by 3 base pairs. Consequently, the binding site for the glucocorticoid receptor and other twofold symmetric homodimers such as the estrogen receptor is an inverted repeat (see a and b). In contrast, the reading helices in the RXR-TR heterodimer are in the same orientation. Consequently, they read an AGGTCA sequence in the same orientation in the two half-sites separated by four base pairs, a direct-repeat binding site.
The interface between the RXR subunit and the vitamin D3 receptor (VDR) subunit bound to a VDRE brings the two reading helices closer together so that they bind to the same half-sites separated by three rather than four base pairs. Similarly, the interface between the RXR and RAR subunits bound to a RARE positions the two reading helices in the heterodimer farther apart than in the RXR-TR, so that they bind the same AGGTCA sequences separated by five base pairs. See K. Umesono et al., 1991, Cell 65:1255, and A. M. Naar et al., 1991, Cell 65:1267. [Part (f) top data from B. F. Luisi et al., 1991, Nature 352:497–505, PDB ID 1glu. Part (f) bottom data from F. Rastinejad et al., 1995, Nature 375:203, PDB ID 2nll.]
The specificity of these response elements is determined by the spacing between the repeats. The nuclear receptors that bind to these direct-repeat response elements do so as heterodimers, all of which contain a common subunit called RXR. The vitamin D3 response element (VDRE), for example, is bound by the RXR-VDR heterodimer, and the retinoic acid response element (RARE) is bound by RXR-RAR. The monomers composing these heterodimers interact with each other in such a way that the two DNA-binding domains lie in the same rather than inverted orientation, allowing the RXR heterodimers to bind to direct repeats of the binding site for each monomer (Figure 9-44f). In contrast, the monomers in homodimeric nuclear receptors (e.g., GR and ER) have an inverted orientation.
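The orientation and spacing rules summarized in Figure 9-44 can be expressed as a small Python classifier. This is a deliberately simplified sketch: it assumes each element exactly matches the consensus half-sites shown in the figure and that the input begins and ends with a half-site, whereas natural response elements often deviate from the consensus. The function and table names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Direct repeats of AGGTCA: spacer length selects the RXR heterodimer (Fig. 9-44c-e).
DIRECT_REPEAT_SPACING = {3: "RXR-VDR (VDRE)", 4: "RXR-TR (TRE)", 5: "RXR-RAR (RARE)"}
# Inverted repeats with a 3-bp spacer, keyed by the left half-site (Fig. 9-44a, b).
INVERTED_REPEAT_HALF_SITES = {"AGAACA": "GR homodimer (GRE)", "AGGTCA": "ER homodimer (ERE)"}


def classify_response_element(seq):
    """Classify an exact consensus element written 5'->3' on the top strand.

    Expects a 6-bp half-site, a spacer of any bases, and a second 6-bp half-site.
    Returns the receptor name, or None if no consensus pattern matches.
    """
    seq = seq.upper()
    spacer = len(seq) - 12
    if spacer < 0:
        return None
    left, right = seq[:6], seq[-6:]
    # Direct repeat: both half-sites read AGGTCA on the same strand.
    if left == right == "AGGTCA" and spacer in DIRECT_REPEAT_SPACING:
        return DIRECT_REPEAT_SPACING[spacer]
    # Inverted repeat: the right half-site is the left one read on the other strand.
    if spacer == 3 and right == reverse_complement(left):
        return INVERTED_REPEAT_HALF_SITES.get(left)
    return None


for element in ["AGAACAGCTTGTTCT",     # GRE consensus, spacer GCT
                "AGGTCAAAATGACCT",     # ERE consensus, spacer AAA
                "AGGTCAAGGAGGTCA",     # direct repeat, 3-bp spacer
                "AGGTCACAGGAGGTCA",    # direct repeat, 4-bp spacer
                "AGGTCACAAAGAGGTCA"]:  # direct repeat, 5-bp spacer
    print(element, classify_response_element(element))
```

The inverted-repeat test works because a twofold symmetric homodimer reads the same half-site on opposite strands, so the right half-site on the top strand is the reverse complement of the left one (AGAACA and TGTTCT, for example).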
Hormone Binding to a Nuclear Receptor Regulates Its Activity as a Transcription Factor
The mechanism by which hormone binding controls the activity of nuclear receptors differs between heterodimeric and homodimeric receptors. Heterodimeric nuclear receptors (e.g., RXR-VDR, RXR-TR, and RXR-RAR) are located exclusively in the nucleus. In the absence of their hormone ligand, they repress transcription when bound to their cognate sites in DNA. They do so by associating with histone deacetylase complexes, which deacetylate histones in nearby nucleosomes, as described earlier for other repressors (see Figure 9-37a). When heterodimeric nuclear receptors bind their ligand, they undergo a conformational change and, as a consequence, bind histone acetylase complexes, thereby reversing their own repressing effects. The ligand-bound conformation of the receptor also binds Mediator, stimulating preinitiation complex assembly.
In contrast to heterodimeric nuclear receptors, homodimeric receptors are found in the cytoplasm in the absence of their ligands. Hormone binding to these receptors leads to their translocation to the nucleus. The hormone-dependent translocation of the homodimeric glucocorticoid receptor (GR) was demonstrated in the transfection experiments shown in Figure 9-45a–c; the GR hormone-binding domain alone mediates this transport. Subsequent studies showed that in the absence of hormone, GR cannot be transported into the nucleus because its ligand-binding domain is partially unfolded by the major cellular chaperone Hsp70. As long as the receptor is confined to the cytoplasm, it cannot interact with target genes and hence cannot activate transcription. Hormone binding promotes a “handoff” of GR from Hsp70 to Hsp90, which, with coupled hydrolysis of ATP, refolds the GR ligand-binding domain, increasing its affinity for hormone and releasing GR from Hsp70 so that it can enter the nucleus.
Once in the nucleus in the conformation induced by ligand binding, it can bind to response elements associated with target genes (Figure 9-45d). Once the receptor with bound hormone binds to a response element, it activates transcription by interacting with chromatin-remodeling and histone acetylase complexes and Mediator.
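The contrast between heterodimeric and homodimeric nuclear receptors amounts to a small decision table, sketched here in Python. The class and function names are hypothetical, and the model deliberately ignores details such as the chaperone handoff and receptor-specific differences; it only restates the four cases described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceptorState:
    location: str   # where the receptor is found
    effect: str     # effect on target genes carrying its response element
    partners: tuple  # complexes the receptor associates with in this state


def nuclear_receptor_state(kind, ligand_bound):
    """Restate the text's model; kind is 'heterodimer' (e.g., RXR-TR) or 'homodimer' (e.g., GR)."""
    if kind == "heterodimer":
        if ligand_bound:
            return ReceptorState("nucleus", "activate",
                                 ("histone acetylase complexes", "Mediator"))
        return ReceptorState("nucleus", "repress", ("histone deacetylase complexes",))
    if kind == "homodimer":
        if ligand_bound:
            return ReceptorState("nucleus", "activate",
                                 ("chromatin-remodeling complexes",
                                  "histone acetylase complexes", "Mediator"))
        # Held in the cytoplasm by chaperones, so it cannot reach target genes.
        return ReceptorState("cytoplasm", "none", ("chaperones",))
    raise ValueError(f"unknown receptor kind: {kind!r}")


for kind in ("heterodimer", "homodimer"):
    for ligand in (False, True):
        state = nuclear_receptor_state(kind, ligand)
        print(kind, "ligand" if ligand else "no ligand", state.location, state.effect)
```

Laid out this way, the key difference is in the no-ligand rows: an unliganded heterodimer is already on its target genes and actively represses them, whereas an unliganded homodimer simply has no access to them.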
Metazoans Regulate the RNA Polymerase II Transition from Initiation to Elongation
A recent unexpected discovery that resulted from application of the chromatin immunoprecipitation technique (see Figure 9-18) is that a large fraction of genes in metazoans have a paused elongating RNA polymerase II within about 100 bp of the transcription start site. Thus expression of the encoded protein is controlled not only at transcription initiation, but also at the level of transcription elongation early in the transcription unit. The first genes found to be regulated by control of transcription elongation were heat-shock genes (e.g., hsp70), which encode molecular chaperones that help refold denatured proteins, as well as other proteins that help the cell deal with the effects of heat shock. When heat shock occurs, the heat-shock transcription factor (HSTF) is activated. Binding of activated HSTF to specific sites in the promoter-proximal region of heat-shock genes stimulates the paused polymerase to continue chain elongation and promotes rapid reinitiation by additional Pol II molecules, leading to many transcription initiations per minute. This mechanism of transcriptional control permits a rapid response: Pol II is always paused on these genes in a state of suspended transcription, so when an emergency arises, no time is required to remodel and acetylate chromatin at the promoter and assemble a transcription preinitiation complex. Another transcription factor shown to regulate transcription by controlling elongation by Pol II paused near the transcription start site is MYC, which functions in the regulation of cell growth and division. MYC is often expressed at high levels in cancer cells and is a key transcription factor in the reprogramming of somatic cells into pluripotent stem cells capable of differentiating into any cell type.
The ability to induce differentiated cells to convert to pluripotent stem cells has elicited enormous research interest because of its potential for the development of therapeutic treatments for traumatic injuries to the nervous system and degenerative diseases (see Chapter 21).
Termination of Transcription Is Also Regulated
Once Pol II has transcribed about 200 nucleotides from the transcription start site, elongation through most genes is highly processive. Chromatin immunoprecipitation with antibody to Pol II, however, indicates that the amount of Pol II at various positions in a transcription unit in a population of cells varies greatly (see Figure 9-18b, right). This finding indicates that the enzyme elongates through some regions much more rapidly than others, because polymerases accumulate, and so are captured more often, in regions where they move slowly. In most cases, Pol II does not terminate transcription until after a sequence is transcribed that directs cleavage and polyadenylation of the RNA at the sequence that forms the 3′ end of the encoded mRNA. Pol II can then terminate transcription at any of multiple sites located 0.5–2 kb beyond this poly(A) addition site. Experiments with mutant genes show that termination is coupled to the process that cleaves and polyadenylates the 3′ end of a transcript, which is discussed in the next chapter.
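The inference from uneven Pol II ChIP signal to uneven elongation speed can be made explicit with a short Python sketch. It assumes steady state and that every polymerase passing through one region also passes through the next, so that the number of polymerases in a region equals the flux multiplied by the time each spends there; under that assumption, ChIP signal is inversely proportional to the local elongation rate. The function name and signal values are hypothetical, and the assumption breaks down near the poly(A) site, where polymerases terminate, and at promoter-proximal pause sites, where some polymerases may not continue.

```python
def relative_elongation_rates(pol2_signal):
    """Estimate relative elongation rates from Pol II ChIP signal per region.

    Assumes steady state with the same flux of polymerases through every
    region (no termination in between), so occupancy is inversely
    proportional to speed. Returns each region's rate relative to the
    slowest (highest-occupancy) region.
    """
    highest = max(pol2_signal.values())
    return {region: highest / signal for region, signal in pol2_signal.items()}


# Hypothetical ChIP signal (arbitrary units) for three regions of one gene.
signal = {"region_1": 40.0, "region_2": 4.0, "region_3": 10.0}
print(relative_elongation_rates(signal))
# {'region_1': 1.0, 'region_2': 10.0, 'region_3': 4.0}
```

In this example, the region with one-tenth the Pol II signal of region_1 is inferred to be traversed ten times faster.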
EXPERIMENTAL FIGURE 9-45 Fusion proteins demonstrate that the hormone-binding domain of the glucocorticoid receptor mediates translocation to the nucleus in the presence of hormone. [Panels (a)–(c): immunofluorescence images of cells without (− Dex) and with (+ Dex) hormone; proteins expressed: β-galactosidase alone, β-galactosidase fused to the glucocorticoid receptor, and β-galactosidase fused to the GR ligand-binding domain. Panel (d): model showing hormone, chaperones, the receptor's activation domain (AD), DNA-binding domain (DBD), and ligand-binding domain (LBD), the cytosol, the nucleus, and a response element.] Cultured animal cells were transfected with expression vectors encoding the proteins diagrammed at the bottom. Immunofluorescence with a labeled antibody specific for β-galactosidase was used to detect the expressed proteins in transfected cells. (a) In cells that expressed β-galactosidase alone, the enzyme was localized to the cytoplasm in the presence and absence of the glucocorticoid hormone dexamethasone (Dex). (b) In cells that expressed a fusion protein consisting of β-galactosidase and the entire glucocorticoid receptor (GR), the fusion protein was present in the cytoplasm in the absence of hormone but was transported to the nucleus in the presence of hormone. (c) Cells that expressed a fusion protein composed of β-galactosidase and only the GR ligand-binding domain (light purple) also exhibited hormone-dependent transport of the fusion protein to the nucleus. (d) Model of hormone-dependent gene activation by a homodimeric nuclear receptor. In the absence of hormone, the receptor is kept in the cytoplasm by interaction between its ligand-binding domain (LBD) and chaperone proteins. When hormone is present, it diffuses through the plasma membrane and binds to the ligand-binding domain, causing a conformational change that releases the receptor from the chaperone proteins. The receptor with bound ligand is then translocated into the nucleus, where its DNA-binding domain (DBD) binds to response elements, allowing the ligand-binding domain and an additional activation domain (AD) at the N-terminus to stimulate transcription of target genes. [Parts (a)–(c) from Picard, D. and Yamamoto, K. R., “Two signals mediate hormone-dependent nuclear localization of the glucocorticoid receptor,” EMBO J., 1987, 6(11):3333–3340; courtesy of the authors.]
KEY CONCEPTS OF SECTION 9.6
Regulation of Transcription-Factor Activity
• The activities of many transcription factors are indirectly regulated by binding of extracellular proteins and peptides to cell-surface receptors. These receptors activate intracellular signal-transduction pathways that regulate specific transcription factors through a variety of mechanisms discussed in Chapter 16.
• Nuclear receptors constitute a superfamily of dimeric C4 zinc-finger transcription factors that bind lipid-soluble hormones and interact with specific response elements in DNA (see Figures 9-42 and 9-44).
• Hormone binding to nuclear receptors induces conformational changes that modify the interactions of these receptors with other proteins (see Figure 9-31b, c).
• Heterodimeric nuclear receptors (e.g., those for retinoids, vitamin D, and thyroid hormone) are found only in the nucleus. In the absence of hormone, they repress transcription of target genes with the corresponding response element. When bound to their ligands, they activate transcription.
• Steroid hormone receptors are homodimeric nuclear receptors. In the absence of hormone, they are trapped in the cytoplasm by molecular chaperones. When bound to their ligands, they can translocate to the nucleus and activate transcription of target genes (see Figure 9-45).
• DNase I hypersensitive sites (DHSs) indicate the positions of transcription-factor binding in chromatin, although they do not indicate which transcription factor is bound. Nonetheless, mapping of DHSs in differentiating cells gives an overview of how transcription-factor-binding sites change as a cell differentiates into a specific cell type.
• In metazoans, RNA polymerase II often pauses within approximately 50–100 base pairs of the transcription start site. Resumption of elongation by Pol II paused in this promoter-proximal region is required for gene transcription and is a regulated step that contributes to the control of gene expression.
• In most cases, Pol II does not terminate transcription until after a sequence is transcribed that directs cleavage and polyadenylation of the RNA.
9.7 Epigenetic Regulation of Transcription
The term epigenetics refers to the study of inherited changes in the phenotype of a cell that do not result from changes in DNA sequence. For example, during the differentiation of bone marrow stem cells into the different types of blood cells, a hematopoietic stem cell divides into two daughter cells, one of which continues to have the properties of a hematopoietic stem cell, including the potential to differentiate into all the different types of blood cells. The other daughter cell, however, becomes either a lymphoid progenitor cell or a myeloid progenitor cell (see Figure 21-17). Lymphoid progenitor cells generate daughter cells that differentiate into lymphocytes, which perform many of the functions involved in immune responses to pathogens (see Chapter 23). Myeloid progenitor cells divide into daughter cells that are committed to differentiating into red blood cells, different kinds of phagocytic white blood cells, or the cells that generate the platelets involved in blood clotting. Lymphoid and myeloid progenitor cells both have the same DNA sequence as the zygote (generated by fertilization of an egg cell by a sperm cell) from which they developed, but they have restricted developmental potential because of epigenetic differences between them.
CHAPTER 9 Transcriptional Control of Gene Expression
Such epigenetic changes initially result from the expression of specific master transcription factors that regulate cellular differentiation. These factors control the expression of other genes encoding transcription factors and proteins involved in cell-cell communication, forming complex networks of gene control that are currently the subject of intense investigation. Changes in gene expression initiated by transcription factors are often reinforced and maintained over multiple cell divisions by post-translational modifications of histones and by methylation of DNA at position 5 of the cytosine pyrimidine ring (see Figure 2-17); these modifications are maintained and propagated to daughter cells when cells divide. Consequently, the term epigenetic marks is used to refer to such post-translational modifications of histones and to 5-methyl C modification of DNA.
DNA Methylation Represses Transcription
As mentioned earlier, most promoters in mammals fall into the CpG island class. In active CpG island promoters, the Cs in the CG dinucleotides are unmethylated. Unmethylated CpG island promoters have reduced affinity for histone octamers, but the nucleosomes immediately neighboring these promoters carry histone H3 lysine 4 di- or trimethylation and are associated with Pol II molecules that are paused during transcription of both the sense and antisense template DNA strands, as discussed earlier (see Figures 9-18 and 9-19). Recent research indicates that methylation of histone H3 lysine 4 occurs in mouse cells because a protein named Cfp1 (CXXC finger protein 1) binds unmethylated CpG-rich DNA through a zinc-finger domain (CXXC) and associates with Setd1, a histone methyl transferase specific for histone H3 lysine 4. Chromatin-remodeling complexes and the general transcription factor TFIID, which initiates Pol II preinitiation complex assembly (see Figure 9-19), associate with nucleosomes bearing the H3 lysine 4 trimethyl mark, promoting Pol II transcription initiation. In differentiated cells, however, a small percentage of specific CpG island promoters, which differ from one cell type to another, have CpGs marked by 5-methyl C. This modification of CpG island DNA triggers chromatin condensation. Members of a family of proteins that bind DNA rich in 5-methyl C–modified CpGs (called methyl CpG-binding proteins, or MBDs) bind to the marked promoters and associate with histone deacetylases and repressive chromatin-remodeling complexes that condense chromatin, resulting in transcriptional repression. The 5-methyl C is added to the CpGs by DNA methyl transferases named DNMT3a and DNMT3b. These enzymes are referred to as de novo DNA methyl transferases because they methylate previously unmethylated Cs. Much remains to be learned about how DNMT3a and b are directed to specific CpG islands.
But once they have methylated a DNA sequence, methylation at that C is propagated through DNA replication by the ubiquitous maintenance methyl transferase DNMT1, which methylates the C on the new strand opposite a methylated CpG on the parental strand:
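Parental DNA: 5′-CMeG-3′ / 3′-GCMe-5′
↓ DNA replication
Daughter DNAs: 5′-CMeG-3′ / 3′-GC-5′ and 5′-CG-3′ / 3′-GCMe-5′ (in each, the unmethylated C is on the newly synthesized strand)
↓ DNMT1
Daughter DNAs: 5′-CMeG-3′ / 3′-GCMe-5′ and 5′-CMeG-3′ / 3′-GCMe-5′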
In each daughter duplex, the newly synthesized strand initially carries an unmethylated C opposite the methylated C of the parental strand, and DNMT1 methylates it. As a consequence, once a CpG island promoter is methylated by DNMT3a or b, it remains methylated in all subsequent daughter cells through the action of DNMT1. The promoter therefore remains repressed in these cells through interactions with MBDs, even after the stimulus for the initial C-methylation by DNMT3a or b has ceased; in other words, repression of C-methylated promoters is inherited through cell division. This mechanism of epigenetic repression is being intensely investigated because tumor-suppressor genes, which encode proteins that function to suppress the development of cancer, are often inactivated in cancer cells by abnormal CpG methylation of their promoter regions, as discussed further in Chapter 24.
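The inheritance logic of maintenance methylation can be checked with a toy simulation. It is a minimal sketch that assumes one CpG site per DNA molecule, a perfectly efficient DNMT1 that acts only on hemimethylated sites, and no de novo methylation after the initial DNMT3a/b event; `CpGSite`, `replicate`, `dnmt1`, and `divide` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpGSite:
    """A single CpG site in duplex DNA; True means the C on that strand is 5-methyl C."""
    top: bool
    bottom: bool


def replicate(site: CpGSite) -> list[CpGSite]:
    """Semiconservative replication: each daughter keeps one parental strand,
    and the newly synthesized strand starts out unmethylated."""
    return [CpGSite(site.top, False), CpGSite(False, site.bottom)]


def dnmt1(site: CpGSite) -> CpGSite:
    """Maintenance methylation: a hemimethylated site becomes fully methylated.
    An unmethylated site is left alone (no de novo activity in this model)."""
    if site.top or site.bottom:
        return CpGSite(True, True)
    return site


def divide(sites: list[CpGSite], maintenance: bool = True) -> list[CpGSite]:
    """Replicate every site once, optionally followed by DNMT1 action."""
    daughters = [d for s in sites for d in replicate(s)]
    return [dnmt1(d) for d in daughters] if maintenance else daughters


def methylated_fraction(sites: list[CpGSite]) -> float:
    """Fraction of CpG Cs (counting both strands) that carry 5-methyl C."""
    strands = [s.top for s in sites] + [s.bottom for s in sites]
    return sum(strands) / len(strands)


for maintenance in (True, False):
    sites = [CpGSite(True, True)]  # one promoter CpG methylated by DNMT3a/b
    for _ in range(3):
        sites = divide(sites, maintenance)
    print(f"DNMT1 {'on' if maintenance else 'off'}: {methylated_fraction(sites)}")
```

With DNMT1 active, a site methylated once stays fully methylated in every descendant; without it, the two parental methylated strands are diluted among ever more unmethylated new strands, so the methylated fraction halves with each division.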
Methylation of Specific Histone Lysines Is Linked to Epigenetic Mechanisms of Gene Repression
Figure 8-26b summarized the different types of post-translational modifications found on histones, including acetylation of lysines and methylation of lysines on the nitrogen atom of the terminal ε-amino group of the lysine side chain (see Figure 2-14). Lysines can be modified by the addition of one, two, or three methyl groups to this terminal nitrogen atom, generating mono-, di-, and trimethylated lysine, all of which carry a single positive charge. The acetylation state of a specific histone lysine on a particular nucleosome results from a dynamic equilibrium between acetylation by histone acetylases and deacetylation by histone deacetylases. Acetylation of histones in a localized region of chromatin predominates when local DNA-bound activators transiently bind histone acetylase complexes; deacetylation predominates when repressors transiently bind histone deacetylase complexes. Pulse-chase radiolabeling experiments have shown that acetyl groups on histone lysines turn over rapidly through the sequential actions of histone acetylases and histone deacetylases. In contrast, methyl groups on histones are much more stable. Histone lysine methyl groups can be removed by histone lysine demethylases, but their turnover is much slower than that of histone lysine acetyl groups, which makes methylation better suited than acetylation for propagating epigenetic information. Several other post-translational modifications of histones have been characterized (see Figure 8-26b). All of these modifications have the potential to positively or negatively regulate the binding of proteins that interact with the chromatin fiber to regulate transcription as well as other processes, such as the folding of chromosomes into the highly condensed structures that form during mitosis (see Figures 8-35 and 8-36).
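To make the turnover comparison concrete, the fraction of pulse-labeled groups remaining after a chase can be modeled as first-order decay, fraction = 0.5^(t / t½). This is a minimal sketch: the half-lives below are hypothetical placeholders chosen only to contrast fast and slow turnover, not values reported in the text or measured for any histone mark.

```python
def fraction_remaining(chase_hours: float, half_life_hours: float) -> float:
    """First-order turnover: fraction of pulse-labeled groups still present
    after a chase of the given length."""
    return 0.5 ** (chase_hours / half_life_hours)


# Placeholder half-lives chosen only to illustrate "fast" versus "slow" turnover;
# they are NOT measured values.
HYPOTHETICAL_HALF_LIVES_H = {"acetyl": 1.0, "methyl": 100.0}
CHASE_H = 24.0  # hypothetical chase length, on the order of one cell cycle

for group, t_half in HYPOTHETICAL_HALF_LIVES_H.items():
    print(f"{group}: {fraction_remaining(CHASE_H, t_half):.3g} of label left")
```

Whatever the exact numbers, a mark whose half-life is short compared with the cell cycle is essentially erased between divisions, whereas a mark whose half-life is long compared with the cell cycle is still largely present when the chromatin is next replicated.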
A picture of chromatin has emerged in which histone tails extending as random coils from the chromatin fiber are post-translationally modified to generate one of many possible combinations of modifications. These combinations regulate transcription and other processes by controlling the binding of a large number of different protein complexes. This control of the interactions of proteins with specific regions of chromatin, resulting from the combined influences of various post-translational modifications of histones, has been called a histone code. Some of these modifications, such as histone lysine acetylation, are rapidly reversible, whereas others, such as histone lysine methylation, can be templated through chromatin replication, generating epigenetic inheritance in addition to inheritance of DNA sequence. Table 9-3 summarizes the influence that post-translational modifications of specific histone amino acid residues usually have on transcription.
Histone H3 Lysine 9 Methylation in Heterochromatin
In most eukaryotes, some co-repressor complexes contain histone methyl transferase subunits that methylate histone H3 at lysine 9, generating di- and trimethyl lysines. These methylated lysines are binding sites for isoforms of HP1 protein that function in the condensation of heterochromatin, as discussed in Chapter 8 (see Figure 8-29). For example, the KAP1 co-repressor complex functions with a class of more than 200 zinc-finger transcription factors encoded in the human genome. This co-repressor complex includes an H3 lysine 9 methyl transferase that methylates nucleosomes over the promoter regions of repressed genes, leading to HP1 binding and repression of transcription. An integrated transgene in cultured mouse fibroblasts that was repressed through the action of the KAP1 co-repressor was associated with heterochromatin in most cells, whereas the active form of the same transgene was associated with euchromatin (Figure 9-46).
Chromatin immunoprecipitation assays (see Figure 9-18) showed that the repressed gene was associated with histone H3 methylated at lysine 9 and with HP1, whereas the active gene was not. Importantly, H3 lysine 9 methylation is maintained following chromosome replication by the mechanism diagrammed in Figure 9-47. When a methylated region of DNA is replicated in S phase, the histone octamers associated with the parent DNA are randomly distributed to the daughter DNA molecules. New histone octamers that are not methylated on lysine 9 also associate randomly with the new daughter chromosomes, but because the parental octamers are distributed to both daughter chromosomes, approximately half of the nucleosomes in the region on each daughter chromosome are methylated on lysine 9. Association of histone H3 lysine 9 methyl transferases (directly or indirectly) with the parental methylated nucleosomes leads to methylation of the newly assembled histone octamers. Repetition of this process with each cell division maintains H3 lysine 9 methylation of this region of the chromosome.
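The replication-and-restoration cycle just described can be sketched as a toy simulation. It assumes a one-dimensional, bounded domain of nucleosomes in which each parental octamer goes to one daughter at random, and a methyl transferase that methylates the immediate neighbors of any methylated nucleosome until nothing changes. The function names are hypothetical, and the model ignores demethylases, partial methylation states, and how domain boundaries are actually established.

```python
import random


def replicate_domain(parent: list[bool], rng: random.Random) -> tuple[list[bool], list[bool]]:
    """Distribute parental octamers at random between the two daughter chromatids.

    Each position holds one parental octamer, which goes to one daughter; the
    other daughter receives a newly assembled, unmethylated octamer (False).
    """
    daughter_a, daughter_b = [], []
    for methylated in parent:
        if rng.random() < 0.5:
            daughter_a.append(methylated)
            daughter_b.append(False)
        else:
            daughter_a.append(False)
            daughter_b.append(methylated)
    return daughter_a, daughter_b


def read_write(domain: list[bool]) -> list[bool]:
    """An H3K9 methyl transferase recruited by a methylated nucleosome methylates
    its immediate neighbors; repeat until nothing changes. The domain is assumed
    to be bounded, so methylation cannot spread beyond its ends."""
    domain = list(domain)
    changed = True
    while changed:
        changed = False
        for i in range(len(domain)):
            if domain[i]:
                for j in (i - 1, i + 1):
                    if 0 <= j < len(domain) and not domain[j]:
                        domain[j] = True
                        changed = True
    return domain


rng = random.Random(0)
lineage = [True] * 12  # a fully H3K9-methylated domain of 12 nucleosomes
for generation in range(1, 6):
    daughter, _ = replicate_domain(lineage, rng)  # follow one daughter cell
    print(generation, f"after S phase: {sum(daughter)}/12 methylated", end="; ")
    lineage = read_write(daughter)
    print(f"after HMT action: {sum(lineage)}/12")
```

In this sketch, restoration needs only one parental methylated nucleosome to reach a daughter, so the mark is lost from a lineage only in the rare case that a daughter inherits none; a real domain also faces active demethylation, which the model omits.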
TABLE 9-3 Histone Post-Translational Modifications Associated with Active and Repressed Genes
Modification | Sites of Modification | Effect on Transcription
Acetylated lysine | H3 (K9, K14, K18, K27, K56) | Activation
Acetylated lysine | H4 (K5, K8, K13, K16) | Activation
Acetylated lysine | H2A (K5, K9, K13) | Activation
Acetylated lysine | H2B (K5, K12, K15, K20) | Activation
Hypoacetylated lysine | | Repression
Phosphorylated serine/threonine | H3 (T3, S10, S28) | Activation
Phosphorylated serine/threonine | H2A (S1, T120) | Activation
Phosphorylated serine/threonine | H2B (S14) | Activation
Methylated arginine | H3 (R17, R23) | Activation
Methylated arginine | H4 (R3) | Activation
Methylated lysine | H3 (K4) Me3 in promoter region; H3 (K4) Me1 in enhancers | Activation
Methylated lysine | H3 (K36, K79) in transcribed region | Elongation
Methylated lysine | H3 (K9, K27) | Repression
Methylated lysine | H4 (K20) | Repression
Ubiquitinylated lysine | H2B (K120 in mammals, K123 in S. cerevisiae) | Activation
Ubiquitinylated lysine | H2A (K119 in mammals) | Repression
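One way to read Table 9-3 is as a lookup from a specific modification at a specific residue to its usual effect on transcription. The sketch below encodes a subset of the table's rows as a Python dictionary; the names are hypothetical, and tallying effects this way is only a bookkeeping device, not a claim about how cells integrate combinations of marks in a histone code. It does make one point of the table explicit: the same residue (for example, H3 lysine 9 or lysine 27) maps to activation when acetylated but to repression when methylated.

```python
from collections import Counter

# (histone, residue, modification) -> usual effect on transcription, following
# a subset of the rows in Table 9-3. "me" means methylation of unspecified degree.
TABLE_9_3_SUBSET = {
    ("H3", "K9", "ac"): "activation",
    ("H3", "K27", "ac"): "activation",
    ("H4", "K16", "ac"): "activation",
    ("H3", "S10", "ph"): "activation",
    ("H3", "K4", "me3"): "activation",
    ("H3", "K36", "me"): "elongation",
    ("H3", "K79", "me"): "elongation",
    ("H3", "K9", "me"): "repression",
    ("H3", "K27", "me"): "repression",
    ("H4", "K20", "me"): "repression",
    ("H2B", "K120", "ub"): "activation",  # K123 in S. cerevisiae
    ("H2A", "K119", "ub"): "repression",
}


def tally_effects(marks):
    """Count the usual Table 9-3 effect of each mark; marks not in the table are ignored."""
    effects = (TABLE_9_3_SUBSET.get(mark) for mark in marks)
    return Counter(e for e in effects if e is not None)


# Marks described in the text for a Polycomb-repressed region.
polycomb_region = [("H3", "K27", "me"), ("H2A", "K119", "ub")]
print(tally_effects(polycomb_region))
```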
[Figure 9-46 image: fluorescence micrographs of nuclei carrying the active (left) and repressed (right) transgene, with heterochromatin indicated.]
FIGURE 9-46 Association of a repressed transgene with heterochromatin. Mouse fibroblasts were stably transformed with a transgene that contained binding sites for an engineered repressor. The repressor was a fusion of a DNA-binding domain, a repression domain that interacts with the KAP1 co-repressor complex, and the ligand-binding domain of a nuclear receptor that allows the nuclear import of the fusion protein to be controlled experimentally (see Figure 9-45). DNA was stained blue with the dye DAPI. Brighter-staining regions are regions of heterochromatin, where the DNA concentration is higher than in euchromatin. The transgene was detected by hybridization of a fluorescently labeled complementary probe (green). When the recombinant repressor was retained in the cytoplasm, the transgene was transcribed (left) and was associated with euchromatin in most cells. When hormone was added so that the recombinant repressor entered the nucleus, the transgene was repressed (right) and associated with heterochromatin. Chromatin immunoprecipitation assays (see Figure 9-18) showed that the repressed gene was associated with histone H3 methylated at lysine 9 and HP1, whereas the active gene was not. [From Ayyanathan, K. et al., “Regulated recruitment of HP1 to a euchromatic gene induces mitotically heritable, epigenetic gene silencing: a mammalian cell culture model of gene variegation,” Genes and Development, 2003, 17:1855–1869. Courtesy of Frank Rauscher; republished with permission from Cold Spring Harbor Laboratory Press.]
[Figure 9-47 image: nucleosomes carrying H3 lysine 9 trimethylation (Me3) before replication; after replication, with H3K9 HMT bound to parental nucleosomes; and after methylation of the newly assembled nucleosomes.]
FIGURE 9-47 Maintenance of histone H3 lysine 9 methylation during chromosome replication. When chromosomal DNA is replicated, the parent histones randomly associate with the two daughter chromosomes, while unmethylated histones synthesized during S phase are assembled into other nucleosomes in those same daughter chromosomes. Association of histone H3 lysine 9 methyl transferases (H3K9 HMT) with parent nucleosomes bearing the histone H3 lysine 9 di- or trimethylation mark leads to methylation of the newly added, unmodified nucleosomes. Consequently, histone H3 lysine 9 methylation marks are maintained during repeated cell divisions unless they are specifically removed by a histone demethylase.
Epigenetic Control by Polycomb and Trithorax Complexes
Another type of epigenetic mark that is essential for repression of genes in specific cell types in multicellular animals and plants involves a set of proteins known collectively as Polycomb proteins and a counteracting set of proteins known as Trithorax proteins. These names were derived from the phenotypes of mutations in the genes encoding these proteins in Drosophila, in which they were first discovered. The Polycomb repression mechanism is essential for maintaining the repression of genes in specific types of cells, and in all the subsequent cells that develop from them, throughout the life of an organism. Important genes regulated by Polycomb proteins include the Hox genes, which encode master regulatory transcription factors. Different combinations of Hox transcription factors help to direct the development of specific tissues and organs in a developing embryo. Early in embryogenesis, expression of Hox genes is controlled by typical activator and repressor proteins. However, the expression of these activators and repressors stops at an early point in embryogenesis. Correct expression of the Hox genes in the descendants of the early embryonic cells is then maintained throughout the remainder of embryogenesis and into adult life by the Polycomb proteins, which maintain the repression of specific Hox genes. Trithorax proteins perform the opposite function, maintaining expression of the Hox genes that were expressed in a specific cell early in embryogenesis in all the subsequent descendants of that cell. Polycomb and Trithorax proteins control thousands of genes, including genes that regulate cell growth and division (i.e., the cell cycle, as discussed in Chapter 19). Polycomb and Trithorax genes are often mutated in cancer cells, contributing importantly to the abnormal properties of these cells (see Chapter 24). Remarkably, virtually all cells in the developing embryo and adult express a similar set of Polycomb and Trithorax proteins, and all cells contain the same set of Hox genes. Yet only the Hox genes in cells where they were initially repressed in early embryogenesis remain repressed, even though the same Hox genes in other cells remain active in the presence of the same Polycomb proteins. Consequently, as in the case of the yeast silent mating-type loci, the expression of Hox genes is regulated by a process that involves more than specific DNA sequences interacting with proteins that diffuse through the nucleoplasm.
A current model for repression by Polycomb proteins is depicted in Figure 9-48. Most Polycomb proteins are subunits of one of two classes of multiprotein Polycomb repressive complexes: PRC1 and PRC2. The PRC2 complexes are thought to act initially by associating with the repression domains of specific repressors bound to their cognate DNA sequences early in embryogenesis, or with ribonucleoprotein complexes containing long noncoding RNAs, as discussed in a later section. The PRC2 complexes contain histone deacetylases that inhibit transcription, as discussed above. They also contain a subunit [E(z) in Drosophila, EZH2 in mammals] with a SET domain, which is the catalytic domain of several histone methyl transferases. This SET domain in PRC2 complexes methylates histone H3 on lysine 27, generating di- and trimethyl lysines. A PRC1 complex then binds the methylated nucleosomes through dimeric Pc subunits (CBXs in mammals), each containing a methyl lysine–binding domain (called a chromodomain) specific for methylated H3 lysine 27. Binding of the dimeric Pc to neighboring nucleosomes is proposed to condense the chromatin into a structure that inhibits transcription. This proposal is supported by electron microscopy studies showing that PRC1 complexes cause nucleosomes to associate in vitro (Figure 9-48d, e). PRC1 complexes also repress transcription through additional mechanisms. The PRC1 complex contains a ubiquitin ligase that monoubiquitinylates histone H2A at lysine 119 in the H2A C-terminal tail (see Figure 8-26).
This modification of H2A inhibits elongation by inhibiting a histone chaperone that removes histone octamers from DNA as Pol II transcribes through a nucleosome and then replaces them after the polymerase passes. PRC1 also associates with a histone demethylase that specifically removes methyl groups from lysine 4 of histone H3, an activating mark discussed above. PRC2 complexes associate with nucleosomes bearing the histone H3 lysine 27 trimethylation mark, maintaining methylation of H3 lysine 27 in nucleosomes in the region. As a result, the chromatin remains associated with PRC1 and PRC2 complexes even after expression of the initial repressor proteins shown in Figure 9-48a, b has ceased. This association maintains H3 lysine 27 methylation by a mechanism analogous to that diagrammed in Figure 9-47. This mechanism is a key feature of Polycomb repression, which is maintained through successive cell divisions for the life of an organism (~100 years for some vertebrates, 2000 years for a sugar cone pine!). Trithorax proteins counteract the repressive mechanism of Polycomb proteins, as shown in studies of expression of the Hox transcription factor Abd-B in the Drosophila embryo (Figure 9-49). Abd-B is normally expressed only in posterior segments of the developing embryo. When the Polycomb system is defective, Abd-B is expressed in all cells of the embryo. When the Trithorax system is defective and cannot counteract repression by the Polycomb system, Abd-B is repressed in most cells, except those in the very posterior of the embryo. Trithorax complexes include a histone methyl transferase that trimethylates histone H3 lysine 4, a histone methylation that is associated with the promoters of actively transcribed genes. This histone modification creates a binding site for histone acetylase and chromatin-remodeling complexes that promote transcription, as well as for TFIID, the general transcription factor that initiates preinitiation-complex assembly (see Figure 9-19). Nucleosomes with H3 lysine 4 methylation are also binding sites for specific histone demethylases that remove histone H3 lysine 9 and lysine 27 methylation, preventing the binding of HP1 and the Polycomb repressive complexes. Nucleosomes marked with H3 lysine 4 methylation are also thought to be distributed to both daughter DNA molecules during DNA replication, resulting in maintenance of this epigenetic mark by a strategy similar to that diagrammed in Figure 9-47.
[Figure 9-48 image: (a–c) repressor, PRC2 complex with its E(z) subunit, and PRC1 complex with dimeric Pc subunits bound to nucleosomes methylated (Me) on H3 K27; (d) Nucleosomes on DNA; (e) Nucleosomes + PRC1 complex on DNA; scale bar, 50 nm.]
FIGURE 9-48 Model for repression by Polycomb complexes. (a) During early embryogenesis, repressors associate with the PRC2 complex. (b) This association results in methylation (Me) of neighboring nucleosomes on histone H3 lysine 27 (K27) by the SET domain–containing subunit E(z). (c) The PRC1 complex binds nucleosomes methylated at H3 lysine 27 through a dimeric, chromodomain-containing subunit Pc. The PRC1 complex condenses the chromatin into a repressed chromatin structure. PRC2 complexes associate with PRC1 complexes to maintain H3 lysine 27 methylation of neighboring histones. As a consequence, PRC1 and PRC2 association with the region is maintained when expression of the repressor proteins in (a) ceases. (d, e) Electron micrographs of a 1-kb fragment of DNA bound by four nucleosomes in the absence (d) and presence (e) of one PRC1 complex per five nucleosomes. See A. H. Lund and M. van Lohuizen, 2004, Curr. Opin. Cell Biol. 16:239; and N. J. Francis, R. E. Kingston, and C. L. Woodcock, 2004, Science 306:1574. [Parts (d) and (e) republished with permission of AAAS, from Francis, N. J. et al., “Chromatin compaction by a polycomb group protein complex,” Science, 2004, 306(5701):1574–7; permission conveyed through Copyright Clearance Center, Inc.]
[Figure 9-49 image: Abd-B immunostaining of wild-type (wt), Scm− (PcG), and trx− (trxG) embryos, with anterior and posterior ends indicated.]
FIGURE 9-49 Opposing influence of Polycomb and Trithorax complexes on expression of the Hox transcription factor Abd-B in Drosophila embryos. At the stage of Drosophila embryogenesis shown, Abd-B is normally expressed only in posterior segments of the developing embryo, as shown at the top (wt) by immunostaining with a specific anti–Abd-B antibody. In embryos with homozygous mutations of Scm, a Polycomb gene (PcG) encoding a protein associated with the PRC1 complex, Abd-B expression is derepressed in all embryo segments. In contrast, in homozygous mutants of trx, a Trithorax gene (trxG), Abd-B repression is increased so that the protein is expressed at high concentrations only in the most posterior segment. [From Klymenko, T., and Muller, J., “The histone methyltransferases Trithorax and Ash1 prevent transcriptional silencing by Polycomb group proteins,” EMBO Reports ©2004 John Wiley and Sons. Reproduced with permission of Wiley-VCH.]
Long Noncoding RNAs Direct Epigenetic Repression in Metazoans
Repressive complexes have been discovered that are composed of multiple repressing proteins bound to RNAs many kilobases in length. Because these RNAs do not contain long open reading frames, they are called long noncoding RNAs, or lncRNAs. In some cases, these lncRNA-protein complexes repress genes on the same chromosome from which the RNA is transcribed, as in X-chromosome inactivation in female mammals. In other cases, these repressive RNA-protein complexes act in trans, repressing genes on chromosomes other than the one from which the lncRNA is transcribed.
X-Chromosome Inactivation in Mammals
The phenomenon of X-chromosome inactivation in female mammals (see Chapter 8) is one of the most intensely studied examples of epigenetic repression mediated by a lncRNA. X inactivation is controlled by a roughly 100-kb domain on the X chromosome called the X-inactivation center. Remarkably, this region encodes several lncRNAs required for the random inactivation of one entire X chromosome early in the development of female mammals. The functions of these lncRNAs are only partially understood. The most intensively studied are transcribed from the complementary DNA strands near the middle of the X-inactivation center: the 40-kb TSIX lncRNA and the XIST RNA, which is spliced and polyadenylated into an RNA of about 17 kb that is not exported to the cytoplasm (Figure 9-50a). In differentiated female cells, the inactive X chromosome is associated with XIST RNA-protein complexes along its entire length (Figure 9-50b). Targeted deletion of the Xist gene (see Figure 6-39) in cultured embryonic stem cells showed that it is required for X inactivation. Unlike most protein-coding genes on the inactive X chromosome, the Xist gene is actively transcribed. The XIST RNA-protein complexes do not diffuse to interact with the active X or other chromosomes, but remain associated with the inactive X chromosome. Since the full length of the inactive X becomes coated by XIST RNA-protein complexes (see Figure 9-50b), these complexes must spread along the chromosome from the X-inactivation center, where XIST is transcribed.
In contrast to XIST, TSIX is transcribed from the active X chromosome, not from the inactive X chromosome. In the early female mouse embryo, made up of embryonic stem cells capable of differentiating into all cell types (see Chapter 21), genes on both X chromosomes are transcribed, and the 40-kb TSIX lncRNA (see Figure 9-50a) is transcribed from both copies of the X chromosome. Experiments employing engineered deletions in the X-inactivation center showed that TSIX transcription prevents significant transcription of the XIST RNA from the complementary DNA strand. Later in development, as cells begin to differentiate, TSIX transcription is repressed on one of the X chromosomes. In different cells, this repression occurs randomly either on the X chromosome derived from the sperm (Xp) or on the X chromosome derived from the egg (Xm). This inhibition of TSIX transcription determines which X chromosome will be inactivated as the cells differentiate further, because inhibition of TSIX transcription allows transcription of the XIST lncRNA on that chromosome. The transcribed XIST RNA contains sequences that, by unknown mechanisms, cause it to spread along the X chromosome. Recent studies indicate that XIST lncRNA-protein complexes first associate with regions of the X chromosome that lie near the X-inactivation center in the three-dimensional, folded structure of the future inactive X (Figure 9-50c), as shown by chromosome conformation capture assays (see Figure 8-34). These initial sites of XIST association are in gene-rich regions of the X chromosome and are postulated to serve as “entry sites” where additional copies of the XIST lncRNA-protein complexes first bind and then spread to neighboring regions.
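The choice step described above can be sketched as a toy model in which each differentiating cell silences TSIX on one randomly chosen X, and XIST is then transcribed only from the X that lost TSIX. The class and function names are hypothetical; the model encodes only the cis antagonism between TSIX and XIST stated in the text, not XIST spreading, PRC2 recruitment, or the other lncRNAs of the X-inactivation center.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class XChromosome:
    origin: str            # "Xp" (from the sperm) or "Xm" (from the egg)
    tsix_on: bool = True   # both X chromosomes transcribe TSIX in early embryonic cells
    xist_on: bool = False


def differentiate(cell: list[XChromosome], rng: random.Random) -> XChromosome:
    """Repress TSIX on one randomly chosen X. TSIX transcription blocks XIST in cis,
    so XIST turns on only on the chromosome that lost TSIX. Returns the future Xi."""
    chosen = rng.choice(cell)
    chosen.tsix_on = False
    for x in cell:
        x.xist_on = not x.tsix_on
    return chosen


rng = random.Random(42)
counts = Counter()
for _ in range(1000):
    cell = [XChromosome("Xp"), XChromosome("Xm")]
    inactive = differentiate(cell, rng)
    counts[inactive.origin] += 1
print(counts)  # expected to be roughly balanced: a random mosaic of Xp and Xm inactivation
```

Because the choice is made independently in each cell, a female embryo in this model becomes a mosaic in which roughly half the cells inactivate Xp and half inactivate Xm, while every individual cell keeps exactly one X with TSIX on and one with XIST on.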
The mechanism by which XIST complexes spread along the chromosome is not currently understood. The inactive X chromosome also becomes associated with PRC2 complexes, which catalyze the trimethylation of histone H3 lysine 27. This methylation results in association of the PRC1 complex and transcriptional repression, as discussed above. These mechanisms of transcriptional repression must be redundant, however, because repression still occurs in the absence of the Polycomb proteins essential for the assembly of PRC1 and PRC2. Meanwhile, TSIX continues to be transcribed from the other, active X chromosome, repressing XIST transcription from that chromosome and consequently preventing XIST-mediated repression of the active X. XIST and PRC1 and 2 complexes are then observed to associate with gene-poor regions of the inactive X chromosome as well as with gene-rich regions. Recent analysis by protein mass spectrometry (see Chapter 3) of proteins associated with XIST lncRNA during the initiation phase of X inactivation in cultured mouse embryonic stem cells revealed that SMRT, a protein first characterized as a co-repressor that interacts with the thyroid hormone nuclear receptor in the absence of hormone, is part of the protein complex that interacts with XIST RNA. SMRT, in turn, interacts with a histone deacetylase (HDAC3). Subsequent knockdown experiments with siRNAs directed against SMRT and HDAC3 showed that they are required for X inactivation, as are other identified RNA- and chromatin-binding proteins that link SMRT to XIST RNA and are required for the association of XIST RNA and PRC2 with the inactive X chromosome (Figure 9-50d). A short time later in development, the DNA of the inactive X also becomes methylated at most of its CpG island promoters. Specialized histone octamers in which histone H2A is replaced by a paralog of H2A called macroH2A also become associated with the inactive X. DNA methylation and macroH2A contribute to the stable repression of the inactive X through the multiple cell divisions that occur later during embryogenesis and throughout adult life.
[Figure 9-50 image: (a) map of the human X-inactivation center from 73,780 to 73,860 kb showing TSIX on the active X (Xa) and XIST on the inactive X (Xi); (b) fluorescence micrographs; (c) early, mid, and terminal X inactivation on ChrX, showing Xist entry sites and spatially proximal sites, then Xist, PRC1 and 2, and H3K27me at gene-dense regions and finally at gene-dense and gene-poor regions; (d) Xist-associated proteins SMRT, SHARP, HDAC3, RBAP48, HNRNPU/SAF-A, and the PRC2 subunits SUZ12, EED, and EZH2.]
FIGURE 9-50 The Xist long noncoding RNA encoded in the X-inactivation center coats the inactive X chromosome in cells of mammalian females, repressing transcription of most genes on the inactive X. (a) The region of the human X-inactivation center encoding the noncoding RNAs Xist (transcribed from the inactive X) and Tsix (transcribed from the active X). Numbers are distances in kilobases from the left end of the X chromosome. (b) A cultured fibroblast from a human female was analyzed by in situ hybridization with a probe complementary to Xist RNA labeled with a red fluorescent dye (left), a chromosome paint set of probes for the X chromosome labeled with a green fluorescent dye (center), and an overlay of the two fluorescent micrographs. The condensed inactive X chromosome is associated with Xist RNA. (c) Model for the spreading of the Xist lncRNA-protein complex on the inactive X chromosome during early differentiation of female embryonic stem cells. See E. Heard and A.-V. Gendrel, 2014, Annu. Rev. Cell Dev. Biol. 30:561. (d) Proteins associated with Xist lncRNA. Question marks indicate that it is not yet known how PRC2 complexes associate with HDAC3 and the RNA-binding protein SHARP. See C. A. McHugh et al., 2015, Nature 521:232. [Part (b) ©1996 C. M. Clemson et al., The Journal of Cell Biology, 132:259–275. doi: 10.1083/jcb.132.3.259.]
Trans Repression by Long Noncoding RNAs
Another example of transcriptional repression by a long noncoding RNA was discovered recently by researchers studying the function of noncoding RNAs transcribed from the HOXC locus, a region encoding a cluster of Hox genes, in cultured human fibroblasts. Unexpectedly, depletion by siRNA (see Figure 6-42) of a 2.2-kb noncoding RNA expressed from the HOXC locus led to derepression in these cells of the HOXD locus, a roughly 40-kb region on another chromosome encoding several Hox proteins and multiple other noncoding RNAs. Assays similar to chromatin immunoprecipitation showed that this noncoding RNA, named HOTAIR (for Hox Antisense Intergenic RNA), associates with the HOXD loci and with PRC2 complexes. This association results in histone H3 lysine 27 di- and trimethylation, PRC1 association, histone H3 lysine 4 demethylation, histone H2A monoubiquitinylation, and transcriptional repression. This process is similar to the recruitment of Polycomb complexes by Xist RNA, except that Xist RNA functions in cis, remaining associated with the chromosome from which it is transcribed, whereas HOTAIR leads to Polycomb repression in trans on both copies of another chromosome. Once again, redundant mechanisms for repression of these HOXD loci must exist, because extensive, though less complete, repression at the HOXD locus continues in the appropriate cells of mouse embryos with homozygous HOTAIR knockout mutations.
Cis Activation by Long Noncoding RNAs
Examples of lncRNAs involved in gene activation have been characterized recently. For example, HOTTIP lncRNA, which is transcribed from the 5′ end of the HOXA locus, is proposed to coordinate the activation of HOXA genes by binding to a histone H3 lysine 4 methyl transferase. In addition, nascent transcripts of lncRNA genes have been reported to activate transcription from promoters several kilobases away by interacting with the Mediator complex and delivering it to the promoter through looping of the intervening chromatin. In humans, but not in mice, a lncRNA called XACT has been found to associate with multiple sites along the full length of the active X chromosome and is postulated to help maintain gene activity on that chromosome. XACT is also remarkable for being one of the longest characterized RNAs: 252 kb! It is mostly unspliced. In Drosophila, equal expression of X-linked genes in males and females (dosage compensation) does not result from inactivating one X chromosome in females. Rather, a generalized twofold increase in transcription of genes on the single X chromosome in males is controlled by two lncRNAs, roX1 and roX2, which are transcribed from the X chromosome in males only. The roX1 and roX2 RNAs associate with several proteins encoded by MSL (male-specific-lethal) genes and spread specifically over the X chromosome, much as Xist lncRNA-protein complexes spread over the inactive X in mammals. Recently, sequencing of total cellular RNA in multiple types of human cells identified roughly 15,000 human lncRNAs. Many of these lncRNAs have sequences that are evolutionarily conserved in most mammals, whereas about 5000 are found only in primates. This conservation of sequence strongly suggests that these lncRNAs, like XIST, HOTAIR, and HOTTIP, have important functions. Multiple lncRNAs are expressed only in specific cell types at specific times during development. For example, multiple lncRNAs are expressed primarily in differentiating red blood cells. Knockdown (see Figure 6-42 and Chapter 10) of several of these lncRNAs inhibits normal red blood cell development, but precisely how these lncRNAs perform their essential functions is not yet clear.
The study of these conserved long noncoding RNAs and how they influence gene expression is another area of intense current investigation.

ENCODE (Encyclopedia of DNA Elements) is a consortium of international research groups organized and funded by the US National Human Genome Research Institute. Its goal is to build a comprehensive, publicly available database of the following:
human DNA control elements and the transcription factors that bind to them in different cell types;
histone post-translational modifications mapped by ChIP-seq and related methods;
DNase I hypersensitive sites;
regulatory lncRNAs and their sites of association in the genome;
newly discovered regulatory elements "that control cells and circumstances in which a gene is active."
Data sets from human cells and cells of model organisms that are too large to be published are also made publicly available at a site called GEO (Gene Expression Omnibus), maintained by the US National Center for Biotechnology Information (NCBI). Most journals that publish research based on genomic methods such as RNA-seq and ChIP-seq require that authors upload their original data to GEO. Worldwide public access to these data sets is greatly accelerating the pace of discovery in the area of gene regulation.
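GEO records can also be searched programmatically through NCBI's Entrez E-utilities. The following is a minimal sketch that only builds an ESearch query URL for the GEO DataSets database (Entrez `db=gds`). It does not send the request. Only the endpoint and the `db`, `term`, and `retmax` parameters come from NCBI's E-utilities interface. The function name `geo_search_url` and the example search term are hypothetical and used for illustration.

```python
from urllib.parse import urlencode

# NCBI Entrez E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch query against GEO DataSets (db=gds).

    retmax caps how many matching record IDs the server returns.
    """
    params = {"db": "gds", "term": term, "retmax": retmax}
    # urlencode quotes the values (spaces -> '+', brackets -> %5B/%5D).
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query: human GEO records mentioning HOTAIR.
    print(geo_search_url("HOTAIR AND Homo sapiens[Organism]", retmax=5))
```

Fetching such a URL (for example with `urllib.request.urlopen`) returns an XML list of matching record IDs. Large-scale use should follow NCBI's published usage guidelines.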
KEY CONCEPTS OF SECTION 9.7
Epigenetic Regulation of Transcription
• Epigenetic control of transcription refers to repression or activation that is maintained after cells replicate as the result of DNA methylation or post-translational modification of histones, especially histone methylation.
• Methylation of CpG sequences in CpG island promoters in mammals generates binding sites for a family of methyl-CpG-binding domain proteins (MBDs) that associate with histone deacetylases, inducing hypoacetylation of the promoter regions and transcriptional repression.
• Histone H3 lysine 9 di- and trimethylation creates binding sites for the heterochromatin-associated protein HP1, which results in the condensation of chromatin and transcriptional repression. These post-translational modifications are perpetuated following chromosome replication because the methylated parental histones are distributed randomly between the two daughter DNA molecules. There they associate with histone H3 lysine 9 methyl transferases that methylate histone H3 lysine 9 on newly synthesized histone octamers assembled on the daughter DNA.
• Polycomb complexes maintain repression of genes initially repressed by sequence-specific repressors expressed early during embryogenesis. One class of Polycomb repressive complexes, PRC2 complexes, associates with these repressors in early embryonic cells, resulting in methylation of histone H3 lysine 27. This methylation creates binding sites for subunits in the PRC2 complex as well as for PRC1 complexes, which condense chromatin, inhibit the assembly of preinitiation complexes, and inhibit elongation. Since parental histone octamers with H3 methylated at lysine 27 are distributed to both daughter DNA molecules following DNA replication, PRC2 complexes that associate with these nucleosomes maintain histone H3 lysine 27 methylation through cell division.
• Trithorax complexes oppose repression by Polycomb complexes by methylating H3 at lysine 4 and maintaining this activating mark through chromosome replication.
• X-chromosome inactivation in female mammals requires a long noncoding RNA (lncRNA) called Xist that is transcribed from the X-inactivation center of one X chromosome and then spreads by a poorly understood mechanism along the length of the same chromosome. Xist interacts with a co-repressor that binds a histone deacetylase and PRC2 complexes at an early stage of embryogenesis, initiating X inactivation. X inactivation is maintained throughout the remainder of embryogenesis and adult life by continued association with Polycomb complexes and DNA methylation of CpG island promoters on the inactive X.
• Some lncRNAs have been discovered that lead to repression of genes in trans, as opposed to the cis inactivation imposed by Xist. Repression is initiated by their interaction with PRC2 complexes.
• Some lncRNAs are associated with gene activation. Much remains to be learned about how lncRNAs are targeted to specific chromosomal regions. However, about 15,000 nuclear lncRNAs have been discovered that are expressed in specific types of human cells during specific stages of their differentiation. This suggests that lncRNAs are central to widely used mechanisms of transcription regulation.
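The maintenance argument in the key concepts above can be illustrated with a toy simulation. Parental methylated histones are split between the two daughter DNAs, and enzymes that bind those marks then methylate nearby new octamers. This is a deliberately simplified sketch under stated assumptions, not a model from the text:
- each nucleosome is a boolean (marked or unmarked);
- each parental octamer goes to one daughter at random, and the other daughter receives a new, unmarked octamer;
- a hypothetical `read_write` step lets each marked nucleosome mark its immediate unmarked neighbors once per pass.

The function names are illustrative only. Real repressed domains are limited by boundary elements, which this sketch ignores.

```python
import random


def replicate(parent, rng):
    """Split a nucleosome array (True = methylated) between two daughter DNAs.

    Each parental octamer goes to one daughter chosen at random; the other
    daughter gets a newly assembled, unmarked octamer at that position.
    """
    left = [False] * len(parent)
    right = [False] * len(parent)
    for i, marked in enumerate(parent):
        if rng.random() < 0.5:
            left[i] = marked
        else:
            right[i] = marked
    return left, right


def read_write(chromatin, passes=3):
    """Toy reader-writer: a methyltransferase recruited by a marked nucleosome
    marks its unmarked immediate neighbours, once per pass."""
    state = list(chromatin)
    for _ in range(passes):
        nxt = list(state)
        for i in range(len(state)):
            left_marked = i > 0 and state[i - 1]
            right_marked = i + 1 < len(state) and state[i + 1]
            if not state[i] and (left_marked or right_marked):
                nxt[i] = True
        state = nxt
    return state


if __name__ == "__main__":
    rng = random.Random(0)
    parent = [True] * 20  # a fully methylated (repressed) domain
    daughter, _ = replicate(parent, rng)
    print("after replication:", sum(daughter), "of", len(daughter), "marked")
    restored = read_write(daughter)
    print("after read-write: ", sum(restored), "of", len(restored), "marked")
```

Replication alone dilutes the mark to roughly half the nucleosomes. The reader-writer step can only restore it where some parental mark survived: an unmarked array stays unmarked, which is why the inherited histones matter.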
9.8 Other Eukaryotic Transcription Systems
We conclude this chapter with a brief discussion of transcription initiation by the other two eukaryotic nuclear RNA polymerases, Pol I and Pol III. The distinct polymerases that transcribe mitochondrial and chloroplast DNA will be discussed in Chapter 12, on cellular energetics. Although these systems, and particularly their regulation, are less thoroughly understood than transcription by RNA polymerase II, they are equally fundamental to the life of eukaryotic cells.

Transcription Initiation by Pol I and Pol III Is Analogous to That by Pol II
The formation of transcription initiation complexes involving Pol I and Pol III is similar in some respects to assembly of Pol II initiation complexes (see Figure 9-19). However, each of the three eukaryotic nuclear RNA polymerases requires its own polymerase-specific general transcription factors and recognizes different DNA control elements. Moreover, neither Pol I nor Pol III requires ATP hydrolysis by a DNA helicase to help melt the DNA template strands to initiate transcription, whereas Pol II does. Transcription initiation by Pol I, which synthesizes pre-rRNA, and by Pol III, which synthesizes
tRNAs, 5S rRNA, and other small stable RNAs (see Table 9-2), is tightly coupled to the rate of cell growth and proliferation.

Initiation by Pol I
The regulatory elements directing Pol I initiation are similarly located relative to the transcription start site in yeast and in mammals. A core element spanning the transcription start site from −40 to +5 is essential for Pol I transcription. An additional upstream control element extending from roughly −155 to −60 increases Pol I transcription in vitro tenfold. In humans, assembly of the Pol I preinitiation complex (Figure 9-51) is initiated by the cooperative binding of two factors to the Pol I promoter region: UBF (upstream binding factor) and SL1 (selectivity factor). SL1 is a multisubunit factor containing TBP and four Pol I–specific TBP-associated factors (TAFIs). The TAFI subunits interact directly with Pol I–specific subunits, directing this specific nuclear RNA polymerase to the transcription start site. TIF-1A, the mammalian homolog of S. cerevisiae RRN3, is another required factor. Also required are:
the abundant nuclear protein kinase CK2 (casein kinase 2);
nuclear actin;
nuclear myosin;
the protein deacetylase SIRT7;
topoisomerase I, which prevents DNA supercoils (see Figure 5-8) from forming during rapid Pol I transcription of the 14-kb transcription unit.

Transcription of the 14-kb precursor of 18S, 5.8S, and 28S rRNAs (see Chapter 10) is highly regulated to coordinate ribosome synthesis with cell growth and division. This coordination is achieved in three ways:
regulation of the activities of the Pol I initiation factors by post-translational modifications, including phosphorylation and acetylation at specific sites;
control of the rate of Pol I elongation;
epigenetic control of how many of the roughly 300 human rRNA genes are transcriptionally active, by mechanisms that assemble inactive copies into heterochromatin.
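The Pol I promoter coordinates given above follow the usual promoter-numbering convention. Position +1 is the transcription start site, −1 is the base immediately upstream, and there is no position 0. The small sketch below encodes the two elements as stated in the text. It computes their lengths under that convention and reports which element, if any, covers a given position. The names `POL_I_ELEMENTS`, `span_bp`, and `elements_at` are hypothetical helpers. The upstream boundaries are approximate ("roughly") in the text.

```python
# Inclusive boundaries of the mammalian Pol I promoter elements, as given in the text.
# Coordinates are relative to the transcription start site (+1); there is no position 0.
POL_I_ELEMENTS = {
    "upstream control element": (-155, -60),  # approximate boundaries
    "core element": (-40, +5),
}


def span_bp(start: int, end: int) -> int:
    """Length in bp of an inclusive interval, skipping the nonexistent position 0."""
    if start == 0 or end == 0:
        raise ValueError("promoter coordinates have no position 0")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the interval crosses the start site, so drop the missing 0
    return length


def elements_at(position: int) -> list:
    """Names of the Pol I promoter elements that cover a given position."""
    if position == 0:
        raise ValueError("promoter coordinates have no position 0")
    return [name for name, (s, e) in POL_I_ELEMENTS.items() if s <= position <= e]


if __name__ == "__main__":
    for name, (s, e) in POL_I_ELEMENTS.items():
        print(f"{name}: {s} to {e:+d}, {span_bp(s, e)} bp")
    print(elements_at(-50))  # → [] (spacer between the two elements)
```

Under this convention the core element is 45 bp long (40 upstream bases plus 5 transcribed ones), not 46. Positions −59 to −41 lie between the two elements.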
Switching between the active and heterochromatic silent states of rRNA genes is accomplished by a multisubunit chromatin-remodeling complex called NoRC ("No" for nucleolus, the site of rRNA transcription within nuclei). NoRC positions a nucleosome over the Pol I transcription start site, blocking preinitiation complex assembly. It also interacts with three kinds of enzymes:
a DNA methyl transferase that methylates a critical CpG in the upstream control element, inhibiting binding by UBF;
histone methyl transferases that di- and trimethylate histone H3 lysine 9, creating binding sites for heterochromatic HP1;
histone deacetylases.
Moreover, a roughly 250-nt noncoding RNA called pRNA (promoter-associated RNA) is transcribed by Pol I from about 2 kb upstream of the rRNA transcription unit (red arrow in Figure 9-51). The pRNA is bound by a subunit of NoRC and is required for transcriptional silencing. The pRNA is believed to target NoRC to Pol I promoter regions by forming an RNA:DNA triplex with the T0 terminator sequence. This creates a binding site for the DNA methyl transferase DNMT3b, which methylates the critical CpG in the upstream control element.

FIGURE 9-51 Transcription of the rRNA precursor RNA by RNA polymerase I. (Top) Electron micrograph of RNA-protein complexes transcribed from one copy of the repeated rRNA genes. (Middle) A single Pol I transcription unit. Enhancers that stimulate Pol I transcription from a single transcription start site are represented by blue boxes. Pol I transcription termination sites (T0, T1–T10) bound by the Pol I–specific termination factor TTF-1 are shown as red rectangles. pRNA indicates transcription of the noncoding pRNA required for transcriptional silencing. Sequences shown as yellow rectangles are retained during processing of 18S, 5.8S, and 28S rRNAs. The other regions transcribed from the black arrow to the red termination sites are removed and degraded. (Bottom) The core promoter element and upstream control element are shown with the location of Pol I and its general transcription factors UBF, SL1, and TIF-1A, as well as other proteins required for Pol I elongation and control. See I. Grummt, 2010, FEBS J. 277:4626. [Electron micrograph courtesy Ann L. Beyer.]

Initiation by Pol III
Unlike those of protein-coding genes and pre-rRNA genes, the promoter regions of tRNA and 5S-rRNA genes lie entirely within the transcribed sequence (Figure 9-52a, b). Two such internal promoter elements, termed the A box and the B box, are present in all tRNA genes. These highly conserved sequences not only function as promoters, but also encode two invariant portions of eukaryotic tRNAs that are required for protein synthesis. In 5S-rRNA genes, a single internal control region, the C box, acts as a promoter. Three general transcription factors are required for Pol III to initiate transcription of tRNA and 5S-rRNA genes in vitro. Two multimeric factors, TFIIIC and TFIIIB, participate in initiation at both tRNA and 5S-rRNA promoters; a third factor, TFIIIA, is required for initiation at 5S-rRNA promoters. As with assembly of Pol I and Pol II initiation complexes, the Pol III general transcription factors bind to promoter DNA in a defined sequence.
The N-terminal half of one TFIIIB subunit, called BRF (for TFIIB-related factor), is similar in sequence to TFIIB (a Pol II factor). This similarity suggests that BRF and TFIIB perform a similar function in initiation, namely, to assist in separating the template DNA strands at the transcription start site. Once TFIIIB has bound to either a tRNA or a 5S-rRNA gene, Pol III can bind and initiate transcription in the presence of ribonucleoside triphosphates. The BRF subunit of TFIIIB interacts specifically with one of the polymerase subunits unique to Pol III, accounting for initiation by this specific nuclear RNA polymerase. Another of the three subunits composing TFIIIB is TBP, which we can now see is a component of a general transcription factor for all three eukaryotic nuclear RNA polymerases. The finding that TBP participates in transcription initiation by Pol I and Pol III was surprising, since the promoters recognized by these enzymes often do not contain TATA boxes. Nonetheless, in the case of Pol III transcription, the TBP subunit of TFIIIB interacts with DNA about 30 bp upstream of the transcription start site in much the same way as it interacts with TATA boxes.
Pol III also transcribes genes for small stable RNAs with upstream promoters containing a TATA box. One example is the gene for U6 snRNA, which is involved in pre-mRNA splicing, as discussed in Chapter 10. In mammals, this gene contains an upstream promoter element called the PSE in addition to the TATA box (Figure 9-52c). The PSE is bound by a multisubunit complex called SNAPC, while the TATA box is bound by the TBP subunit of a specialized form of TFIIIB containing an alternative BRF subunit. MAF1 is a specific inhibitor of Pol III transcription that functions by interacting with the BRF subunit of TFIIIB and with Pol III. Its activity is regulated at the level of its import from the cytoplasm into the nucleus. This import is controlled by phosphorylation at specific sites in response to signal-transduction protein kinase cascades that respond to cell stress and nutrient deprivation (see Chapters 16 and 24). In mammals, Pol III transcription is also repressed by the critical tumor suppressor p53 and by the retinoblastoma (Rb) protein family. In humans, two genes encode the RNA polymerase III subunit RPC32. One of these is expressed specifically in replicating cells, and its forced expression can contribute to oncogenic transformation of cultured human fibroblasts.

FIGURE 9-52 Transcription-control elements in genes transcribed by RNA polymerase III. Both tRNA (a) and 5S-rRNA (b) genes contain internal promoter elements (yellow) located downstream from the start site and named A, B, and C boxes, as indicated. Assembly of transcription initiation complexes on these genes begins with the binding of the Pol III–specific general transcription factors TFIIIA, TFIIIB, and TFIIIC to these control elements. Green arrows indicate strong, sequence-specific protein-DNA interactions. Blue arrows indicate interactions between general transcription factors. Purple arrows indicate interactions between general transcription factors and Pol III. (c) Transcription of the U6 snRNA gene in mammals is controlled by an upstream promoter with two elements. One is a TATA box, bound by the TBP subunit of a specialized form of TFIIIB with an alternative BRF subunit. The other is an upstream regulatory element called the PSE, bound by a multisubunit factor called SNAPC. See L. Schramm and N. Hernandez, 2002, Genes Dev. 16:2593.
KEY CONCEPTS OF SECTION 9.8
Other Eukaryotic Transcription Systems
• The process of transcription initiation by Pol I and Pol III is similar to that by Pol II, with three differences. It requires different general transcription factors and is directed by different promoter elements. It also does not require hydrolysis of the β–γ phosphoanhydride bond of ATP to separate the DNA strands at the start site, whereas Pol II transcription does.
• Pol I transcribes only a single RNA, the 45S precursor of 18S, 5.8S, and 28S rRNA, from multiple copies of the pre-rRNA gene.
• Pol III transcribes tRNAs from promoters within the genes that encode the tRNA regions common to all tRNAs. This internal promoter is bound by transcription factor TFIIIC, which in turn binds TFIIIB. TFIIIB is a multisubunit factor that includes the TATA box–binding protein, TBP, which associates with the tRNA gene about 30 bp upstream of the transcription start site.
• Pol III transcribes 5S rRNA directed by a promoter within the 5S-rRNA coding region that is bound by transcription factor TFIIIA. TFIIIA then associates with TFIIIC and TFIIIB, which interact with Pol III in a manner similar to their interactions in tRNA transcription.
• Additional small stable RNAs, several with as yet unknown functions, are transcribed by Pol III as directed by TBP-containing transcription factors that bind immediately upstream of the genes (see Figure 9-52).
• Pol III transcription is regulated by a specific inhibitor, MAF1, whose transport from the cytoplasm into the nucleus is controlled in response to nutrient availability.
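The three Pol III promoter types summarized above differ mainly in which factor binds first and where. The sketch below records that order as a small lookup table, assuming only what this section states. For 5S-rRNA genes, the TFIIIC-before-TFIIIB order is an assumption made by analogy to tRNA genes. For the U6 gene, the text does not say whether SNAPC or the TFIIIB-like factor binds first, so they are grouped in one step. The table and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Each step lists the factor(s) that join the initiation complex at that stage.
Step = Tuple[str, ...]

POL_III_ASSEMBLY: Dict[str, List[Step]] = {
    # Internal A and B boxes -> TFIIIC, which brings in TFIIIB, which recruits Pol III.
    "tRNA": [("TFIIIC",), ("TFIIIB",), ("Pol III",)],
    # Internal C box -> TFIIIA, then TFIIIC and TFIIIB (order assumed by analogy to tRNA genes).
    "5S rRNA": [("TFIIIA",), ("TFIIIC",), ("TFIIIB",), ("Pol III",)],
    # Upstream PSE and TATA box -> SNAPC and a TFIIIB-like factor (relative order unspecified).
    "U6 snRNA": [("SNAPC", "TFIIIB-like"), ("Pol III",)],
}

# True when the promoter elements lie inside the transcribed sequence.
INTERNAL_PROMOTER = {"tRNA": True, "5S rRNA": True, "U6 snRNA": False}


def assembly_order(gene_class: str) -> List[str]:
    """Flatten the steps into one list of factor names, preserving step order."""
    return [factor for step in POL_III_ASSEMBLY[gene_class] for factor in step]


def factors_before_pol3(gene_class: str) -> List[str]:
    """General transcription factors that are in place before Pol III binds."""
    order = assembly_order(gene_class)
    return order[: order.index("Pol III")]


if __name__ == "__main__":
    for gene in POL_III_ASSEMBLY:
        where = "internal" if INTERNAL_PROMOTER[gene] else "upstream"
        print(f"{gene} ({where} promoter): " + " -> ".join(assembly_order(gene)))
```

Keeping the steps as tuples lets co-binding factors share a step without implying an order the text does not give.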
Key Terms
activation domain 382
activators 354
antitermination factor 377
bromodomain 394
carboxy-terminal domain (CTD) 370
chromatin-mediated repression 390
chromodomain 407
co-activator 386
co-repressor 393
DNase I footprinting 380
enhanceosome 388
enhancers 359
general transcription factors 373
heat-shock genes 402
histone deacetylation 393
leucine zipper 386
MAT locus (in yeast) 391
Mediator 390
nuclear receptors 386
promoter 364
promoter-proximal elements 378
repression domain 384
repressors 354
RNA polymerase II 367
silencer sequences 391
specific transcription factors 380
TATA box 371
TATA box–binding protein (TBP) 374
upstream activating sequence (UAS) 380
zinc finger 385
Review the Concepts
1. Describe the molecular events that occur at the lac operon when E. coli cells are shifted from a glucose-containing medium to a lactose-containing medium.
2. The concentration of free glutamine affects transcription of the gene encoding the enzyme glutamine synthetase in E. coli. Describe the mechanism of this effect.
3. Recall that the trp repressor binds to a site in the operator region of the tryptophan biosynthetic genes when tryptophan is abundant, thereby preventing transcription. What would happen to the expression of the tryptophan biosynthetic enzyme genes in the following scenarios? Fill in the blanks with one of the following phrases: never be expressed / always (constitutively) be expressed.
a. The cell produces a mutant trp repressor that cannot bind to the operator. The enzyme genes will ________________.
b. The cell produces a mutant trp repressor that binds to its operator site even if no tryptophan is present. The enzyme genes will ________________.
c. The cell produces a mutant sigma factor that cannot bind the promoter region. The enzyme genes will ________________.
d. Elongation of the leader sequence is always stalled after transcription of region 1. The enzyme genes will ________________.
4. Compare and contrast bacterial and eukaryotic gene expression mechanisms.
5. What types of genes are transcribed by RNA polymerases I, II, and III? Design an experiment to determine whether a specific gene is transcribed by RNA polymerase II.
6. The CTD of the largest subunit of RNA polymerase II can be phosphorylated at multiple serine residues. What are the conditions that lead to the phosphorylated versus nonphosphorylated RNA polymerase II CTD?
7. What do TATA boxes, initiators, and CpG islands have in common? Which was the first of these to be identified? Why?
8. Describe the methods used to identify the location of transcription-control elements in promoter-proximal regions of genes.
9. What is the difference between a promoter-proximal element and a distal enhancer? What are the similarities?
10. Describe the methods used to identify the location of DNA-binding proteins in the regulatory regions of genes.
11. Describe the structural features of transcription activator and repressor proteins.
12. Give two examples of how gene expression may be repressed without altering the coding sequence.
13. Using CREB and nuclear receptors as examples, compare and contrast the structural changes that take place when these transcription factors bind to their co-activators.
14. What general transcription factors associate with an RNA polymerase II promoter in addition to the polymerase? In what order do they bind in vitro? What structural change occurs in the DNA when an "open" transcription initiation complex is formed?
15. Expression of recombinant proteins in yeast is an important tool for biotechnology companies that produce new drugs for human use. In an attempt to get a new gene X expressed in yeast, a researcher has integrated gene X into the yeast genome near a telomere. Will this strategy result in good expression of gene X? Why or why not? Would the outcome of this experiment differ if the experiment had been performed in a yeast line containing mutations in the H3 or H4 histone tails?
16. You have isolated a new protein called STICKY. You can predict from comparisons with other known proteins that STICKY contains a bHLH domain and a Sin3-interacting domain. Predict the function of STICKY and explain the importance of these domains in STICKY function.
17. Prokaryotes and lower eukaryotes such as yeast have transcription-control elements called upstream activating sequences. What are the comparable sequences found in higher eukaryotic species?
18. You want to identify the region of gene X that serves as an enhancer for its expression. Design an experiment to investigate this issue.
19. Some organisms have mechanisms in place that override transcription termination. One such mechanism, using the Tat protein, is employed by HIV, a retrovirus. Explain why Tat is therefore a good target for HIV vaccination.
20. Upon identifying the DNA regulatory sequence that controls transcription of a given gene, you note that it is enriched in CG sequences. Is the corresponding gene likely to be highly transcribed?
21. Name four major classes of DNA-binding proteins that are responsible for controlling transcription, and describe their structural features.
CHAPTER 10
Post-transcriptional Gene Control

Portion of a "lampbrush chromosome" from an oocyte of the newt Notophthalmus viridescens. The hnRNP protein associated with nascent RNA transcripts fluoresces red after staining with a monoclonal antibody. [Courtesy of M. Roth and J. Gall.]
In the previous chapter, we saw that most genes are regulated at the first step in gene expression, transcription. This regulation operates on the assembly of the transcription preinitiation complex on a promoter DNA sequence and on transcription elongation in the promoter-proximal region. Once transcription has been initiated, synthesis of the encoded RNA requires that RNA polymerase transcribe the entire gene and not terminate transcription prematurely. Moreover, the initial primary transcripts produced from eukaryotic genes must undergo various processing reactions to yield the corresponding functional RNAs. For mRNAs:
the 5′ cap structure necessary for translation must be added (see Figure 5-14);
introns must be spliced out of pre-mRNAs;
the 3′ end must be polyadenylated (see Figure 5-15).
Once formed in the nucleus, mature, functional RNAs are exported to the cytoplasm as components of ribonucleoproteins. Both the processing of RNAs and their export from the nucleus offer opportunities for further regulation of gene expression after the initiation of transcription.
OUTLINE
10.1 Processing of Eukaryotic Pre-mRNA
10.2 Regulation of Pre-mRNA Processing
10.3 Transport of mRNA Across the Nuclear Envelope
10.4 Cytoplasmic Mechanisms of Post-transcriptional Control
10.5 Processing of rRNA and tRNA

FIGURE 10-1 Overview of RNA processing and post-transcriptional gene control. Nearly all cytoplasmic RNAs are processed from primary transcripts in the nucleus before they are exported to the cytoplasm. For protein-coding genes transcribed by RNA polymerase II, gene control can be exerted through the choice of alternative exons during pre-mRNA splicing (step 1) and the choice of alternative poly(A) sites (step 2). Improperly processed mRNAs are blocked from export to the cytoplasm and degraded (step 3) by a large complex called the exosome that contains multiple ribonucleases. Once the mRNA has been exported to the cytoplasm, translation initiation factors bind to the 5′ cap cooperatively with poly(A)-binding protein I bound to the poly(A) tail and initiate translation (step 4; see Figure 4-28). The mRNA is degraded in the cytoplasm by deadenylation and decapping, followed by degradation by cytoplasmic exosomes (step 5). These processes occur rapidly in dense regions of the cytoplasm called P bodies that function in translational repression. The degradation rate of each mRNA is controlled, thereby regulating the mRNA concentration and, consequently, the amount of protein translated. Some mRNAs are synthesized without long poly(A) tails. Their translation is regulated by control of the synthesis of a long poly(A) tail by a cytoplasmic poly(A) polymerase (step 6). Translation is also regulated by other mechanisms, including miRNAs (step 7). When expressed, these ~22-nucleotide RNAs inhibit translation of mRNAs to which they hybridize, usually in the 3′ untranslated region. tRNAs and rRNAs are also synthesized as precursor RNAs that must be processed before they are functional (step 8). Regions of precursors cleaved from the mature RNAs are degraded by nuclear exosomes (step 9). See Houseley et al., 2006, Nat. Rev. Mol. Cell Biol. 7:529.

Recently, the vast amount of sequence data on human mRNAs expressed in different tissues and at various times during embryogenesis and cellular differentiation has revealed that some 95 percent of human genes give rise to alternatively spliced mRNAs. These alternatively spliced mRNAs encode related proteins whose sequence differences are limited to specific functional domains. In many cases, alternative RNA splicing is regulated to meet the need for a specific protein isoform in a specific cell type. Given the complexity of pre-mRNA splicing, it is not surprising that mistakes are occasionally made, giving rise to improperly spliced mRNAs. However, eukaryotic cells have evolved RNA surveillance mechanisms that prevent the export of incorrectly processed RNAs to the cytoplasm or lead to their degradation if they are exported.

Additional control of gene expression can occur in the cytoplasm. In the case of protein-coding genes, for instance, the amount of protein produced depends on the stability of the corresponding mRNAs in the cytoplasm and the
rate of their translation. For example, during an immune response, lymphocytes communicate by secreting polypeptide hormones called cytokines that signal neighboring lymphocytes through cytokine receptors that span their plasma membranes (see Chapter 23). It is important for lymphocytes to synthesize and secrete cytokines in short bursts. This is possible because cytokine mRNAs are extremely unstable; consequently, the concentration of these mRNAs in the cytoplasm falls rapidly once their synthesis is stopped. In contrast, mRNAs encoding proteins required in large amounts that function over long periods, such as ribosomal proteins, are extremely stable, so that many polypeptides are translated from each mRNA. Just as pre-mRNA processing, nuclear export, and translation are regulated, so is the cellular localization of many, if not most, mRNAs, so that newly synthesized protein is concentrated where it is needed. Particularly striking examples of this type of regulation occur in the nervous systems of multicellular animals. Some neurons in the human brain generate more than a thousand separate synapses with other neurons. During the process of learning, synapses that fire more frequently than others increase in size many times, while other synapses made by the same neuron do not. This can occur because mRNAs encoding proteins critical for synapse enlargement are stored at all synapses, but translation of these localized, stored mRNAs is regulated at each synapse independently by the frequency at which the synapse signals. In this way, synthesis of synapse-associated proteins can be regulated independently at each of the many synapses made by the same neuron (see Chapter 22). Another type of gene regulation involves micro-RNAs (miRNAs), which regulate the translation and stability of specific target mRNAs in multicellular animals and plants.
Analyses of these short miRNAs in various human tissues indicate that about 1900 miRNAs are expressed in the multiple types of human cells. Although some have recently been discovered to function through inhibition of target-gene expression in the appropriate tissue and at the appropriate time in development, the functions of the vast majority of human miRNAs are unknown and are the subject of a growing new area of research. If most miRNAs do indeed have significant functions, miRNA genes constitute an important subset of the 25,000 or so human genes. A closely related process, called RNA interference (RNAi), leads to the degradation of viral RNAs in infected cells and the degradation of transposon-encoded RNAs in many eukaryotes. This discovery is of tremendous significance to biological researchers because it is possible to design short interfering RNAs (siRNAs) to inhibit the translation of specific mRNAs experimentally by a process called RNA knockdown. This method makes it possible to inhibit the function of any desired gene, even in organisms that are not amenable to classical genetic methods for isolating mutants. We refer to all the mechanisms that regulate gene expression following transcription as post-transcriptional gene control (Figure 10-1). Because the stability and translation rate of an mRNA contribute to the amount of protein expressed from a gene, these post-transcriptional processes are important components of gene control. Indeed, the protein output of a gene is regulated at every step in the life of an mRNA, from the initiation of its synthesis to its degradation. Thus genetic regulatory processes act on RNA as well as on DNA. In this chapter, we consider the events in the processing of mRNA that follow transcription initiation and promoter-proximal elongation as well as the various mechanisms that are known to regulate these events. In the last section, we briefly discuss the processing of primary transcripts produced from genes encoding rRNAs and tRNAs.
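Table 10-1 notes that miRNAs pair with their targets especially over bases 2 to 7 (the "seed" sequence). The scan below is a minimal sketch of seed matching under a strong simplifying assumption: a site is any exact match to the reverse complement of the seed, and everything else that real target-prediction methods weigh (pairing outside the seed, site context, conservation) is ignored. The miRNA is the let-7a sequence used as an example; the 3′ UTR is invented, and the function names are hypothetical.

```python
# Seed-site scan: a simplified model of miRNA target recognition.
# Assumption: a target site is any exact match to the reverse complement
# of miRNA bases 2-7; real targeting rules are more complex.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5'->3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based positions in `utr` that pair perfectly with the miRNA seed."""
    seed = mirna[1:7]                 # bases 2-7, counting from the 5' end
    site = reverse_complement(seed)   # sequence the mRNA must contain
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # also report overlapping sites
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"      # example miRNA, 5'->3'
utr = "AAGCUACCUCAAAUUUCUACCUCG"       # hypothetical 3' UTR, 5'->3'
print(reverse_complement(let7a[1:7]))  # the seed-match site
print(seed_sites(let7a, utr))
```

Because the seed is only six bases, such a site occurs by chance roughly once every 4^6 (4096) nucleotides, which is one reason real predictions add further criteria.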
10.1 Processing of Eukaryotic Pre-mRNA
In this section, we take a closer look at how eukaryotic cells convert the initial primary transcript synthesized by RNA polymerase II into a functional mRNA. Three major events occur during the process: 5′ capping, 3′ cleavage and polyadenylation, and RNA splicing (Figure 10-2). Adding these specific modifications to the 5′ and 3′ ends of the pre-mRNA protects it from enzymes that quickly digest uncapped RNAs generated by RNA processing, such as spliced-out introns and RNA transcribed downstream from a polyadenylation site. Thus the 5′ cap and 3′ poly(A) tail distinguish pre-mRNA molecules from the many other kinds of RNAs in the nucleus (Table 10-1). Pre-mRNA molecules are bound by nuclear proteins that function in mRNA export to the cytoplasm. Prior to nuclear export, introns must be removed to generate the correct coding region of the mRNA. In higher eukaryotes, including humans, alternative splicing is intricately regulated in order to substitute different functional domains into proteins, producing a considerable expansion of the proteome of these organisms. The pre-mRNA processing events of capping, polyadenylation, and splicing occur in the nucleus as the nascent mRNA precursor is being transcribed. Thus pre-mRNA processing is co-transcriptional. As the RNA emerges from the surface of RNA polymerase II, its 5′ end is immediately modified by the addition of the 5′ cap structure found on all mRNAs (see Figure 5-14). As the nascent pre-mRNA continues to emerge from the surface of the polymerase, it is immediately bound by members of a complex group of RNA-binding proteins that assist in RNA splicing and export of the fully processed mRNA through nuclear pore complexes into the cytoplasm. Some of these proteins remain associated with the mRNA in the cytoplasm, but most either remain in the nucleus or shuttle back into the nucleus shortly after the mRNA is exported to the cytoplasm.
Cytoplasmic RNA-binding proteins are exchanged for the nuclear ones. Consequently, mRNAs never occur as free RNA molecules in the cell, but are always associated with proteins as ribonucleoprotein (RNP) complexes, first as nascent pre-mRNPs that are capped and spliced as they are transcribed. Then, following cleavage and polyadenylation, they are referred to as nuclear mRNPs. Following the exchange of proteins that accompanies export to the cytoplasm, they are called cytoplasmic mRNPs. Although we frequently refer to pre-mRNAs and mRNAs, it is important to remember that they are always associated with proteins as RNP complexes.
[Figure 10-2 diagram: DNA (exons, introns, poly(A) site, termination sites) → (1) transcription and 5′ capping → primary RNA transcript → (2) endonuclease cleavage at the poly(A) site → (3) polyadenylation by poly(A) polymerase (PAP), using ATP, adding ~250 A residues → (4) RNA splicing → mRNA.]
FIGURE 10-2 Overview of mRNA processing in eukaryotes. Shortly after RNA polymerase II initiates transcription at the first nucleotide of the first exon of a gene, the 5′ end of the nascent RNA is capped with 7-methylguanylate (step 1). Transcription by RNA polymerase II terminates at any one of multiple termination sites downstream from the poly(A) site, which is located at the 3′ end of the final exon. After the primary transcript is cleaved at the poly(A) site (step 2), a string of adenosine (A) residues is added (step 3). The poly(A) tail contains ~250 A residues in mammals, ~150 in insects, and ~100 in yeasts. For short primary transcripts with few introns, splicing (step 4) usually follows cleavage and polyadenylation, as shown. For large genes with multiple introns, introns are often spliced out of the nascent RNA during its transcription, before transcription of the gene is complete. Note that the 5′ cap and the sequence adjacent to the poly(A) tail are retained in mature mRNAs. The diagram shown represents processing of human β-globin RNA.

TABLE 10-1 RNAs Discussed in Chapter 10
mRNA: Fully processed messenger RNA with 5′ cap, introns removed by RNA splicing, and a poly(A) tail.
pre-mRNA: An mRNA precursor containing introns and not cleaved at the poly(A) site.
hnRNA: Heterogeneous nuclear RNAs. These RNAs include pre-mRNAs and RNA-processing intermediates containing one or more introns.
snRNA: Five small nuclear RNAs that function in the removal of introns from pre-mRNAs by RNA splicing, plus two small nuclear RNAs that substitute for the first two at rare introns.
pre-tRNA: A tRNA precursor containing additional transcribed bases at the 5′ and 3′ ends compared with the mature tRNA. Some pre-tRNAs also contain an intron in the anticodon loop.
pre-rRNA: The precursor to mature 18S, 5.8S, and 28S ribosomal RNAs. The mature rRNAs are processed from this long precursor RNA molecule by cleavage, removal of bases from the ends of the cleaved products, and modification of specific bases.
snoRNA: Small nucleolar RNAs. These RNAs base-pair with complementary regions of the pre-rRNA molecule, directing cleavage of the RNA chain and modification of bases during maturation of the rRNAs.
siRNA: Short interfering RNAs, ~22 bases long, that are each perfectly complementary to a sequence in an mRNA. Together with associated proteins, siRNAs cause cleavage of the "target" RNA, leading to its rapid degradation.
miRNA: Micro-RNAs, ~22 bases long, that base-pair extensively, but not completely, with mRNAs, especially over bases 2 to 7 at the 5′ end of the miRNA (the "seed" sequence). This pairing inhibits translation of the "target" mRNA and targets it for degradation.

The 5′ Cap Is Added to Nascent RNAs Shortly After Transcription Initiation
As a nascent eukaryotic RNA transcript emerges from the RNA exit channel of RNA polymerase II (see Figure 9-12) and reaches a length of about 25 nucleotides, a protective cap composed of 7-methylguanosine and methylated riboses is added to the 5′ end of the mRNA (see Figure 5-14). This 5′ cap marks RNA molecules as mRNA precursors and protects them from RNA-digesting enzymes (5′-exoribonucleases) in the nucleus and cytoplasm. This initial step in RNA processing is catalyzed by a dimeric capping enzyme, which associates with the phosphorylated carboxy-terminal domain (CTD) of RNA polymerase II. Recall that the TFIIH general transcription factor phosphorylates the CTD multiple times on serine 5 of the CTD heptapeptide repeat during transcription initiation (see Figure 19-20). Binding of the capping enzyme to the serine 5–phosphorylated CTD stimulates the activity of the enzyme so that it is focused on RNAs containing a 5′ triphosphate that emerge from RNA polymerase II, and not on RNAs transcribed by RNA polymerases I or III, which do not have a CTD. This is important because pre-mRNA synthesis accounts for only about 20 percent of the total RNA synthesized in replicating cells. About 80 percent is pre-ribosomal RNA, which is transcribed by RNA polymerase I, and 5S rRNA, tRNAs, and other small stable RNAs, which are transcribed by RNA polymerase III. These two mechanisms, (1) binding of the capping enzyme to RNA polymerase II specifically through its unique CTD phosphorylated on serine 5 of the heptapeptide repeat during transcription initiation by TFIIH, and (2) activation of the capping enzyme by the serine 5–phosphorylated CTD, result in specific capping of RNAs transcribed by RNA polymerase II. One subunit of the capping enzyme removes the γ phosphate from the 5′ end of the nascent RNA (Figure 10-3). Another domain of this subunit transfers the GMP moiety from GTP to the 5′ diphosphate of the nascent transcript, creating the unusual guanosine 5′-5′ triphosphate structure. In the final steps, separate enzymes transfer methyl groups from S-adenosylmethionine to the N7 position of the guanine and to the 2′ oxygens of riboses of the first one or two nucleotides at the 5′ end of the nascent RNA.

Considerable evidence indicates that capping of the nascent transcript is coupled to elongation by RNA polymerase II so that all of its transcripts are capped during the earliest phase of elongation. As discussed in Chapter 9, in metazoans, during the initial phase of transcription, the polymerase elongates the nascent transcript very slowly due to the association of NELF (negative elongation factor) with RNA polymerase II in the promoter-proximal region (see Figure 9-21). Once the 5′ end of the nascent RNA is capped, phosphorylation of the RNA polymerase CTD at serine 2 in the heptapeptide repeat and of NELF and DSIF (DRB-sensitivity-inducing factor) by the cyclin T–CDK9 protein kinase (also known as P-TEFb) causes the release of NELF. (DRB is an analog of ATP that inhibits CDK9, preventing transcription elongation from the promoter-proximal region.) This allows RNA polymerase II to enter into a faster mode of elongation that rapidly transcribes away from the promoter. The net effect of this mechanism is that the polymerase waits for the nascent RNA to be capped before elongating at a rapid rate.
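The order of the capping reactions, and in particular the origin of the three phosphates in the 5′-5′ triphosphate bridge (two from the RNA, one from GTP), can be traced with a small model. This is a sketch, not a chemical simulation: each enzymatic step is a hypothetical Python function that only relabels the 5′ end, and for simplicity only the first ribose is 2′-O-methylated, although the text notes that the first one or two may be.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FiveEnd:
    """5' end of a nascent RNA. Phosphates are listed outermost first,
    each labelled with the molecule it came from."""
    phosphates: Tuple[str, ...]
    cap: Optional[str] = None   # None, "G", or "m7G"
    first_nt: str = "N"         # "N", or "Nm" after 2'-O-methylation

    def notation(self) -> str:
        return (self.cap or "") + "p" * len(self.phosphates) + self.first_nt


def remove_gamma(end: FiveEnd) -> FiveEnd:
    """Phosphohydrolase: release the RNA's outermost (gamma) phosphate as Pi."""
    assert end.phosphates[0] == "RNA gamma"
    return replace(end, phosphates=end.phosphates[1:])


def add_gmp(end: FiveEnd) -> FiveEnd:
    """Guanylyl transferase: GMP (with GTP's alpha phosphate) is joined; PPi leaves."""
    return replace(end, phosphates=("GTP alpha",) + end.phosphates, cap="G")


def methylate_n7(end: FiveEnd) -> FiveEnd:
    """Guanine-7-methyl transferase (methyl donor: S-Ado-Met)."""
    return replace(end, cap="m7G")


def methylate_2_o(end: FiveEnd) -> FiveEnd:
    """2'-O-methyl transferase on the first ribose (methyl donor: S-Ado-Met)."""
    return replace(end, first_nt="Nm")


end = FiveEnd(phosphates=("RNA gamma", "RNA beta", "RNA alpha"))
print(f"{'start':>14}: {end.notation()}")
for step in (remove_gamma, add_gmp, methylate_n7, methylate_2_o):
    end = step(end)
    print(f"{step.__name__:>14}: {end.notation():<9} {end.phosphates}")
```

The final bridge reads GTP alpha, RNA beta, RNA alpha, which matches the statement that the third phosphate of the 5′,5′ linkage comes from the α phosphate of the GTP.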
[Figure 10-3 diagram: 5′ end of the pre-mRNA, pppN (γ, β, α phosphates) → phosphohydrolase removes the γ phosphate → ppN → guanylyl transferase adds GMP from GTP, releasing the β and γ phosphates of GTP → GpppN → guanine-7-methyl transferase, CH3 from S-Ado-Met → m7GpppN → 2′-O-methyl transferase, CH3 from S-Ado-Met → m7GpppNm.]
FIGURE 10-3 Synthesis of the 5′ cap on eukaryotic mRNAs. The 5′ end of a nascent RNA contains a 5′ triphosphate from the initiating rNTP. The γ phosphate is removed in the first step of capping, while the remaining α and β phosphates (orange) remain associated with the cap. The third phosphate of the 5′,5′ triphosphate bond is derived from the α phosphate of the GTP that donates the guanine. The methyl donor for methylation of the cap guanine and the first one or two riboses of the mRNA is S-adenosylmethionine (S-Ado-Met). See S. Venkatesan and B. Moss, 1982, Proc. Natl. Acad. Sci. USA 79:340.

A Diverse Set of Proteins with Conserved RNA-Binding Domains Associate with Pre-mRNAs
As noted earlier, neither nascent RNA transcripts of protein-coding genes nor the intermediates of mRNA processing, collectively referred to as pre-mRNA, exist as free RNA molecules in the nuclei of eukaryotic cells. From the time nascent transcripts first emerge from RNA polymerase II until mature mRNAs are transported into the cytoplasm, the RNA molecules are associated with an abundant set of nuclear proteins. These proteins are the major protein components of heterogeneous ribonucleoprotein particles (hnRNPs), which contain heterogeneous nuclear RNA (hnRNA), a collective term referring to pre-mRNA and other nuclear RNAs of various sizes. These hnRNP proteins contribute to further steps in RNA processing, including splicing, polyadenylation, and export through nuclear pore complexes to the cytoplasm. Researchers identified hnRNP proteins by first exposing cultured cells to high-dose UV irradiation, which causes covalent cross-links to form between RNA bases and closely associated proteins. Chromatography of nuclear extracts from treated cells on an oligo-dT cellulose column, which binds RNAs with a poly(A) tail, was used to recover the proteins that had become cross-linked to nuclear polyadenylated RNA. Subsequent treatment of cell extracts from nonirradiated cells with monoclonal antibodies specific for the major proteins identified by this cross-linking technique revealed a complex set of abundant hnRNP proteins ranging in size from 30 to 120 kDa. Like transcription factors, most hnRNP proteins have a modular structure. They contain one or more RNA-binding domains and at least one other domain that interacts with other proteins. Several different RNA-binding motifs have been identified by creating hnRNP proteins with missing amino acid sequences and testing their ability to bind RNA.

Functions of hnRNP Proteins
The association of pre-mRNAs with hnRNP proteins prevents the pre-mRNAs from forming short secondary structures by base pairing of complementary regions, thereby making the pre-mRNAs accessible for interaction with other RNA molecules or proteins. Pre-mRNAs associated with hnRNP proteins present a more uniform substrate for subsequent processing steps than would free, unbound pre-mRNAs, each of which would form a unique secondary structure due to its specific sequence. Binding studies with purified hnRNP proteins indicate that different hnRNP proteins associate with different regions of a newly made pre-mRNA molecule. For example, the hnRNP proteins A1, C, and D bind preferentially to the pyrimidine-rich sequences at the 3′ ends of introns (see Figure 10-7 below). Some hnRNP proteins interact with the RNA sequences that specify RNA splicing or cleavage/polyadenylation and contribute to the structure recognized by RNA-processing factors. Finally, cell-fusion experiments have shown that some hnRNP proteins remain localized in the nucleus, whereas others cycle in and out of the cytoplasm, suggesting that they function in the export of mRNA from the nucleus to the cytoplasm (Figure 10-4).

FIGURE 10-4 Human hnRNP A1 protein can cycle in and out of the nucleus, but human hnRNP C protein cannot. Cultured HeLa cells and Xenopus cells were fused by treatment with polyethylene glycol, producing heterokaryons containing nuclei from each cell type. These hybrid cells were treated with cycloheximide immediately after fusion to prevent protein synthesis. After 2 hours, the cells were fixed and stained with fluorescent-labeled antibodies specific for human hnRNP C and A1 proteins. These antibodies do not bind to the homologous Xenopus proteins. (a) A fixed preparation viewed by phase-contrast microscopy includes unfused HeLa cells (arrowhead) and Xenopus cells (dotted arrow), as well as fused heterokaryons (solid arrow). In the heterokaryon in this micrograph, the round HeLa-cell nucleus is to the right of the oval-shaped Xenopus nucleus. (b, c) When the same preparation was viewed by fluorescence microscopy, the stained hnRNP C protein appeared green and the stained hnRNP A1 protein appeared red. Note that the unfused Xenopus cell on the left is unstained, confirming that the antibodies are specific for the human proteins. In the heterokaryon, hnRNP C protein appears only in the HeLa-cell nucleus (b), whereas the A1 protein appears in both the HeLa-cell nucleus and the Xenopus nucleus (c). Since protein synthesis was blocked after cell fusion, some of the human hnRNP A1 protein must have left the HeLa-cell nucleus, moved through the cytoplasm, and entered the Xenopus nucleus in the heterokaryon. [Reprinted by permission of Nature Publishing Group, from: Piñol-Roma S., and Dreyfuss, G., "Shuttling of pre-mRNA binding proteins between nucleus and cytoplasm," Nature, 1992, 355(6362):730–2; permission conveyed through the Copyright Clearance Center, Inc.]

Conserved RNA-Binding Motifs
The RNA recognition motif (RRM), also called the RNP motif and the RNA-binding domain (RBD), is the most common RNA-binding domain in hnRNP proteins. This 80-residue domain, which occurs in many other RNA-binding proteins as well, contains two highly conserved sequences (RNP1 and RNP2) that are found across organisms ranging from yeast to humans, indicating that, like many DNA-binding domains, it evolved early in eukaryotic evolution. Structural analyses have shown that the RRM domain consists of a four-stranded β sheet flanked on one side by two α helices. To interact with the negatively charged RNA phosphates, the β sheet forms a positively charged surface. The conserved RNP1 and RNP2 sequences lie side by side on the two central β strands, and their side chains make multiple contacts with a single-stranded region of RNA that lies across the surface of the β sheet (Figure 10-5). The 45-residue KH motif is found in the hnRNP K protein and several other RNA-binding proteins. The three-dimensional structure of representative KH domains is similar to that of the RRM domain but smaller, consisting of a three-stranded β sheet supported from one side by two α helices. Nonetheless, the KH domain interacts with RNA much differently than does the RRM domain. RNA binds to the KH domain by interacting with a hydrophobic surface formed by the α helices and one β strand. The RGG box, another RNA-binding motif found in hnRNP proteins, contains five Arg-Gly-Gly (RGG) repeats with several interspersed aromatic amino acids. A recent structural analysis indicates that in one example of RNA binding, an RGG-containing peptide binds in the major groove of a G-rich RNA duplex region (see Figure 5-4b). KH domains and RGG repeats are often interspersed in two or more sets in a single RNA-binding protein.
[Figure 10-5 panels: (a) RNA recognition motif (RRM), with β strands β1–β4 and the RNP1 and RNP2 regions; (b) Sex-lethal (Sxl) RRM domains RRM1 and RRM2 bound to pre-mRNA; (c) polypyrimidine tract–binding protein (PTB) domains RRM3 and RRM4 bound to a p(Y) tract.]
FIGURE 10-5 Structure of the RRM domain and its interaction with RNA. (a) Ribbon diagram of the RRM domain found in hnRNP proteins, showing the two α helices (green) and four β strands (red) that characterize this motif. The conserved RNP1 and RNP2 regions are located in the two central β strands. (b, c) Ribbon diagram and surface representation of the two RRM domains in Drosophila Sex-lethal (Sxl) protein (b) and the polypyrimidine tract-binding protein (PTB) (c). In both (b) and (c), positively charged regions are shown in shades of blue; negatively charged regions, in shades of red; RNA is yellow. The two RRMs in Sxl are oriented like the two parts of an open pair of castanets, with the β sheets of the RRMs facing toward each other. The pre-mRNA is bound to the surfaces of the positively charged β sheets, making most of its contacts with the RNP1 and RNP2 regions of each RRM. PTB has a strikingly different orientation of RRM domains, illustrating that RRMs are oriented in different relative positions in different hnRNPs. The p(Y)-tract is a polypyrimidine tract. In PTB, the two RRMs associate through their α helices so that the positively charged β sheets face away from each other, upward for RRM3 and downward for RRM4. The structure of CUCUCU single-stranded RNA bound to each of the two RRMs was determined, explaining how PTB can bind to two tracts of six pyrimidines in a single RNA if they are separated by a loop of 15 or more nucleotides. This ability of PTB to form a small loop in a pre-mRNA probably contributes to its ability to function as a splicing repressor at exons where the upstream 3′ splice site or the downstream 5′ splice site is flanked by two polypyrimidine tracts. See K. Nagai et al., 1995, Trends Biochem. Sci. 20:235. [Part (b) data from N. Handa et al., 1999, Nature 398:579, PDB ID 1b7f. Part (c) data from F. C. Oberstrass et al., 2005, Science 309:2054, PDB ID 2adb, 2adc.]

Splicing Occurs at Short, Conserved Sequences in Pre-mRNAs via Two Transesterification Reactions
During the formation of a mature, functional mRNA, the introns are removed and the exons are spliced together. For short transcription units, RNA splicing often follows cleavage and polyadenylation of the 3′ end of the primary transcript, as depicted in Figure 10-2 for the processing of human β-globin mRNA. For long transcription units containing multiple exons, however, splicing of exons in the nascent RNA begins before transcription of the gene is complete. Early pioneering research on the nuclear processing of mRNAs revealed that mRNAs are initially transcribed as molecules that are much longer than the mature mRNAs in the cytoplasm. It was also shown that RNA sequences near the 5′ cap added shortly after transcription initiation are retained in the mature mRNA, and that RNA sequences near the polyadenylated ends of mRNA-processing intermediates are retained in the mature mRNAs in the cytoplasm. The solution to this apparent conundrum came from the discovery of introns by electron microscopy of RNA-DNA hybrids of adenovirus DNA and the mRNA encoding hexon, a major virion capsid protein (Figure 10-6). Other studies revealed nuclear viral RNAs that were colinear with the viral DNA (primary transcripts), and others with one or two of the introns removed (processing intermediates). Together, these results led to the realization that introns are removed from primary transcripts as exons are spliced together.

The locations of splice sites (that is, exon-intron junctions) in a pre-mRNA can be determined by comparing the sequence of genomic DNA with that of cDNA prepared from the corresponding mRNA (see Figure 6-17). Sequences that are present in the genomic DNA but absent from the cDNA represent introns and indicate the positions of splice sites. Such analyses of a large number of different mRNAs revealed moderately conserved, short consensus sequences at the splice sites flanking introns in eukaryotic pre-mRNAs, including a polypyrimidine tract just upstream of the 3′ splice site (Figure 10-7). Studies of mutant genes with deletions introduced into introns have shown that much of the central portion of an intron can be removed without affecting splicing; generally only 30–40 nucleotides at each end of an intron are necessary for splicing to occur at normal rates.

Analysis of the intermediates formed during the splicing of pre-mRNAs in vitro led to the discovery that splicing of exons proceeds via two sequential transesterification reactions (Figure 10-8). Introns are removed as a lariat structure in which the 5′ guanine of the intron is joined in an unusual 2′,5′-phosphodiester bond to an adenosine near the 3′ end of the intron. This A residue is called the branch-point A because it forms an RNA branch in the lariat structure. In each transesterification reaction, one phosphoester bond is exchanged for another. Since the number of phosphoester bonds in the molecule is not changed in either reaction, no energy is consumed. The net result of these two reactions is that two exons are ligated and the intervening intron is released as a branched lariat structure.

[Figure 10-6 diagram: (a) the adenovirus hexon gene within the EcoRI A fragment, showing exons and introns A, B, and C (scale bar, 1 kb); (b) electron micrograph and tracing of the DNA–mRNA hybrid, with intron loops A, B, and C.]
EXPERIMENTAL FIGURE 10-6 Electron microscopy of mRNA–template DNA hybrids shows that introns are spliced out during pre-mRNA processing. (a) Diagram of the EcoRI A fragment of adenovirus DNA, which extends from the left end of the genome to just before the end of the final exon of the hexon gene. The hexon gene consists of three short exons and one long (~3.5 kb) exon separated by three introns of ~1, 2.5, and 9 kb. (b) Electron micrograph (left) and schematic drawing (right) of a hybrid between an EcoRI A DNA fragment and a hexon mRNA. The loops marked A, B, and C correspond to the introns indicated in (a). Since these intron sequences in the viral genomic DNA are not present in the mature hexon mRNA, they loop out between the exon sequences that hybridize to their complementary sequences in the mRNA. [Micrograph courtesy of Phillip A. Sharp.]

[Figure 10-7 diagram, with the frequency of occurrence (%) of each consensus base in parentheses and | marking the splice sites: 5′ exon …A/C(70) A(60) G(80) | G(100) U(100) A/G(95) A(70) G(80) U(45) … intron … branch point C(80) U(90) A/G(80) A(100) C/U(80) … polypyrimidine tract (10–12 b) … N C(80) A(100) G(100) | G(60) 3′ exon. The branch point lies 20–50 b upstream of the 3′ splice site.]
FIGURE 10-7 Consensus sequences around splice sites in vertebrate pre-mRNAs. The only nearly invariant bases are the 5′ GU and the 3′ AG of the intron (blue), although the flanking bases indicated are found at frequencies higher than expected based on a random distribution. A polypyrimidine tract (hatched area) near the 3′ end of the intron is found in most introns. The branch-point adenosine, also invariant, is usually 20–50 bases from the 3′ splice site. The central region of the intron, which may range from 40 bases to 50 kilobases in length, is generally unnecessary for splicing to occur. See R. A. Padgett et al., 1986, Annu. Rev. Biochem. 55:1119, and E. B. Keller and W. A. Noon, 1984, Proc. Natl. Acad. Sci. USA 81:7417.
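Locating introns by comparing genomic DNA with cDNA amounts to finding the genomic stretches whose removal turns the gene into the cDNA. The sketch below assumes an error-free cDNA and requires every intron to begin with GT and end with AG (the DNA form of the nearly invariant 5′ GU and 3′ AG shown in Figure 10-7). The sequences are invented, and `find_introns` and its `min_intron` parameter are hypothetical names; the toy minimum of 4 nucleotides is far shorter than real introns. The search backtracks because sequence alone does not always fix where an exon ends.

```python
from functools import lru_cache
from typing import Optional, Tuple

Intron = Tuple[int, int]   # (start, end): 0-based, end exclusive


def find_introns(genomic: str, cdna: str,
                 min_intron: int = 4) -> Optional[Tuple[Intron, ...]]:
    """Return intron coordinates whose removal from `genomic` yields `cdna`,
    requiring each intron to begin with GT and end with AG; None if impossible."""

    @lru_cache(maxsize=None)
    def solve(g: int, c: int) -> Optional[Tuple[Intron, ...]]:
        if g == len(genomic) and c == len(cdna):
            return ()                           # both sequences fully used
        # Option 1: extend the current exon by one matching base.
        if g < len(genomic) and c < len(cdna) and genomic[g] == cdna[c]:
            rest = solve(g + 1, c + 1)
            if rest is not None:
                return rest
        # Option 2: splice out an internal intron starting at genomic[g].
        if 0 < c < len(cdna) and genomic.startswith("GT", g):
            for end in range(g + min_intron, len(genomic) + 1):
                if genomic[end - 2:end] == "AG":
                    rest = solve(end, c)
                    if rest is not None:
                        return ((g, end),) + rest
        return None                             # dead end: backtrack

    return solve(0, 0)


exons = ["ATGCCCAAG", "CTGGAA", "TTCTAA"]               # invented exons
introns = ["GTAAGTATTTCTTCAG", "GTGAGTCTAACTTTTTAG"]    # invented introns
genomic = exons[0] + introns[0] + exons[1] + introns[1] + exons[2]
cdna = "".join(exons)

for start, end in find_introns(genomic, cdna):
    print(f"intron at {start}-{end}: {genomic[start:end]}")
```

Note that the first intron contains an internal AG (GTAAG…); the search tries that shorter candidate first, finds that the remaining genomic sequence no longer matches the cDNA, and moves on to the true 3′ splice site.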
During Splicing, snRNAs Base-Pair with Pre-mRNA
Splicing requires the presence of small nuclear RNAs (snRNAs), which base-pair with the pre-mRNA, and some 170 associated proteins. Five U-rich snRNAs, designated U1, U2, U4, U5, and U6, participate in pre-mRNA splicing. Ranging in length from 107 to 210 nucleotides, these snRNAs are associated with 6–10 proteins each in the many small nuclear ribonucleoprotein particles (snRNPs) in the nuclei of eukaryotic cells. Definitive evidence for the role of U1 snRNA in splicing came from experiments indicating that base pairing between the 5′ splice site of a pre-mRNA and the 5′ region of U1 snRNA is required for RNA splicing (Figure 10-9a). In vitro experiments showed that a synthetic oligonucleotide that hybridizes with the 5′-end region of U1 snRNA blocks RNA splicing. In vivo experiments showed that base pairing–disrupting mutations in the pre-mRNA 5′ splice site also block RNA splicing; in this case, however, splicing can be restored by expression of a U1 snRNA with a compensating mutation that restores base pairing to the mutant pre-mRNA 5′ splice site (Figure 10-9b). Involvement of U2 snRNA in splicing was initially suspected when it was found to have an internal sequence that is largely complementary to the consensus sequence flanking the branch point in pre-mRNAs (see Figure 10-7). Compensating mutation experiments, similar to those conducted with U1 snRNA and 5′ splice sites, demonstrated that base pairing between U2 snRNA and the branch-point sequence in pre-mRNA is also critical to splicing. Figure 10-9a illustrates the general structures of the U1 and U2 snRNAs and how they base-pair with pre-mRNA during splicing. Significantly, the branch-point A itself, which is not base-paired to U2 snRNA, "bulges out" (Figure 10-10a), which allows its 2′ hydroxyl to participate in the first transesterification reaction of RNA splicing (see Figure 10-8).
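The compensatory-mutation logic can be made concrete with the sequences shown in Figure 10-9: the pre-mRNA 5′ splice site is written 5′→3′, and the 5′ region of U1 snRNA is written 3′→5′ beneath it, so aligned letters face each other. This sketch makes two simplifying assumptions: only Watson-Crick pairs count (G·U wobble pairs are ignored), and pairing must be complete across the 9-nucleotide region. `pairs_completely` is a hypothetical helper, not a published method.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately excluded.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairs_completely(site_5to3: str, u1_3to5: str) -> bool:
    """True if each base of the pre-mRNA 5' splice site (written 5'->3')
    pairs with the U1 snRNA base aligned beneath it (U1 written 3'->5')."""
    return len(site_5to3) == len(u1_3to5) and all(
        (a, b) in WATSON_CRICK for a, b in zip(site_5to3, u1_3to5)
    )


# Sequences from Figure 10-9: exon 1 ends CAG, and the intron begins GUAAGU.
wt_site, mutant_site = "CAGGUAAGU", "CAGGUAAAU"
wt_u1, mutant_u1 = "GUCCAUUCA", "GUCCAUUUA"   # U1 5' region, read 3'->5'

for site_name, site in (("wild-type site", wt_site), ("mutant site", mutant_site)):
    for u1_name, u1 in (("wild-type U1", wt_u1), ("mutant U1", mutant_u1)):
        print(f"{site_name} + {u1_name}: pairs = {pairs_completely(site, u1)}")
```

The G→A change in the splice site creates an A·C mismatch with wild-type U1, and the C→U change in the mutant U1 turns it back into an A·U pair, which is the pattern the compensation experiment relies on.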
FIGURE 10-8 Two transesterification reactions result in the splicing of exons in pre-mRNA. In the first reaction, the ester bond between the 5′ phosphorus of the intron and the 3′ oxygen (dark red) of exon 1 is exchanged for an ester bond with the 2′ oxygen (blue) of the branch-point A residue. In the second reaction, the ester bond between the 5′ phosphorus of exon 2 and the 3′ oxygen (orange) of the intron is exchanged for an ester bond with the 3′ oxygen of exon 1, releasing the intron as a lariat structure and joining the two exons. Arrows show where activated hydroxyl oxygens react with phosphorus atoms.
[Figure 10-8 diagram: pre-mRNA (exon 1, intron containing the branch-point A, exon 2) → first transesterification → second transesterification → spliced exons + excised lariat intron.]
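The statement that splicing consumes no energy, because each transesterification swaps one phosphoester bond for another, can be checked with simple bookkeeping. The sketch below is a hypothetical model of bond exchange, not a chemical simulation: it tracks only the two phosphorus atoms at the splice sites and the oxygens that bond to them, using the atom roles described in the Figure 10-8 caption.

```python
from typing import FrozenSet, Tuple

Bond = Tuple[str, str]   # (oxygen, phosphorus)


def transesterify(bonds: FrozenSet[Bond], free_oh: FrozenSet[str],
                  attacker: str, phosphorus: str) -> Tuple[FrozenSet[Bond], FrozenSet[str]]:
    """Move the ester bond on `phosphorus` from its current oxygen to the
    attacking hydroxyl oxygen; the displaced oxygen becomes a free hydroxyl."""
    assert attacker in free_oh, "only a free hydroxyl can attack"
    (leaving,) = [o for o, p in bonds if p == phosphorus]
    new_bonds = (bonds - {(leaving, phosphorus)}) | {(attacker, phosphorus)}
    new_free = (free_oh - {attacker}) | {leaving}
    return frozenset(new_bonds), frozenset(new_free)


# Pre-mRNA: exon 1 3'-O bonded to the intron 5'-P; intron 3'-O bonded to the exon 2 5'-P.
bonds = frozenset({("exon1 3'-O", "intron 5'-P"), ("intron 3'-O", "exon2 5'-P")})
free_oh = frozenset({"branch A 2'-O"})

steps = [("branch A 2'-O", "intron 5'-P"),   # first transesterification
         ("exon1 3'-O", "exon2 5'-P")]       # second transesterification
for attacker, phosphorus in steps:
    before = len(bonds)
    bonds, free_oh = transesterify(bonds, free_oh, attacker, phosphorus)
    print(f"{attacker} attacks {phosphorus}: bonds {before} -> {len(bonds)}")

print(sorted(bonds))    # 2',5' lariat link plus the new exon 1-exon 2 junction
print(sorted(free_oh))  # the intron's 3'-OH is left free
```

The bond count stays at two after each step, mirroring the text's reasoning, and the final state contains the lariat's 2′,5′ link and the ligated exons.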
FIGURE 109 below Base pairing between pre-mRNA, U1 snRNA, and U2 snRNA early in the splicing process. (a) In this diagram, secondary structures in the snRNAs that are not altered during splicing are depicted schematically. The yeast branch-point sequence is shown here. Note that U2 snRNA base-pairs with a sequence that includes the branch-point A, although this residue is not base-paired. For unknown reasons, antisera from patients with the autoimmune disease systemic lupus erythematosus (SLE) contain antibodies to snRNP proteins, which have been useful in characterizing components of the splicing reaction; the purple rectangles represent sequences that bind snRNP proteins recognized by these anti-Sm antibodies. (b) Only the 5′ ends of U1 snRNAs and 5′ splice sites in pre-mRNAs are shown. (Left) A mutation (A) in a pre-mRNA splice site that interferes with base pairing to the 5′ end of U1 snRNA blocks splicing. (Right) Expression of a U1 snRNA with a compensating mutation (U) that restores base pairing also restores splicing of the mutant pre-mRNA. See M. J. Moore et al., 1993, in R. Gesteland and J. Atkins, eds., The RNA World, Cold Spring Harbor Press, pp. 303–357; see also Y. Zhuang and A. M. Weiner, 1986, Cell 46:827.
[Figure 10-9 diagram: (a) U1 snRNA (3′-GUCCAUUCAUA-cap 5′) base-paired with the 5′ splice site CAGGUAAGU at the end of exon 1, and U2 snRNA base-paired with the yeast branch-point sequence (UACUAAC, branch-point A unpaired), followed by the polypyrimidine tract (Py) and exon 2; (b) left: wild-type U1 snRNA with a mutant 5′ splice site (CAGGUAAAU), mutation in pre-mRNA 5′ splice site blocks splicing; right: mutant U1 snRNA (3′-GUCCAUUUAUA-cap 5′) with the same mutant site, compensatory mutation in U1 restores splicing]
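The logic of the compensatory-mutation experiment can be expressed as a simple pairing check. The sketch below is illustrative only: it compares the nine-nucleotide 5′ splice-site segment with the complementary region near the 5′ end of U1 snRNA (both read off Figure 10-9), treats the two strands as antiparallel, and counts only Watson-Crick pairs. Ignoring G·U wobble pairs is a simplifying assumption, and the function name `count_mismatches` is hypothetical.

```python
# Watson-Crick partners in RNA; G-U wobble pairs are deliberately not counted
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_mismatches(site_5to3: str, snrna_5to3: str) -> int:
    """Count unpaired positions when a pre-mRNA segment and an snRNA segment form an antiparallel duplex.

    Both sequences are written 5'->3'. Because the strands are antiparallel,
    the snRNA is reversed so that it is read 3'->5' alongside the pre-mRNA.
    """
    if len(site_5to3) != len(snrna_5to3):
        raise ValueError("segments must have the same length")
    snrna_3to5 = snrna_5to3[::-1]
    return sum((a, b) not in WATSON_CRICK for a, b in zip(site_5to3, snrna_3to5))


# Segments read off Figure 10-9 (exon 1 ends in CAG; the intron begins with GU)
WT_SITE, MUTANT_SITE = "CAGGUAAGU", "CAGGUAAAU"
WT_U1, MUTANT_U1 = "ACUUACCUG", "AUUUACCUG"  # U1 snRNA segments, written 5'->3'

for site_name, site in (("wild-type site", WT_SITE), ("mutant site", MUTANT_SITE)):
    for u1_name, u1 in (("wild-type U1", WT_U1), ("mutant U1", MUTANT_U1)):
        print(f"{site_name} + {u1_name}: {count_mismatches(site, u1)} mismatch(es)")
```

The matched combinations (wild-type site with wild-type U1, mutant site with mutant U1) give 0 mismatches, whereas each mismatched combination gives 1, mirroring the loss and restoration of splicing in Figure 10-9b.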
10.1 Processing of Eukaryotic Pre-mRNA
[Figure 10-10 panels: (a) self-complementary sequence with bulging A; (b) X-ray crystallography structure, with the bulged A5 residues at top and bottom; (c) spliceosome structure]
Similar studies with other snRNAs demonstrated that base pairing between the snRNAs themselves also occurs during splicing. Moreover, rearrangements in these RNA-RNA interactions are critical in the splicing pathway. As mentioned above, a synthetic oligonucleotide that base-pairs with the 5′ end of U1 snRNA was found to inhibit RNA splicing in vitro, supporting the importance of U1 snRNA base pairing to a 5′ splice site for the first step in pre-mRNA splicing. Currently, a similar strategy is being used in clinical trials for the treatment of Duchenne muscular dystrophy (DMD). This disorder is the most common human genetic disease arising from new mutations in the genome. It is caused by mutations in the DMD gene, especially chain-terminating mutations in which a base-pair change in an exon generates a stop codon. Alternatively, short deletions or insertions that shift the reading frame of the message cause incorrect amino acids to be incorporated, generally followed by a stop codon in the altered reading frame. These mutations eliminate the C-terminus of the encoded protein, dystrophin, which is essential to its function (see Figure 17-20, bottom). The DMD gene is the longest human gene (~2 million base pairs; half the length of the entire E. coli genome!), which makes it a large target for random mutations. Because the DMD gene is on the X chromosome, males have no second wild-type copy to complement the mutation. Synthetic oligonucleotides have been developed that are modified to permeate cell membranes but retain normal Watson-Crick base-pairing properties. By hybridizing with the terminus of a mutant exon, they can cause the abnormal exon to be “skipped” during pre-mRNA splicing, and they can be designed so that the normal exon upstream of the mutation splices to an in-frame downstream exon. This results in expression of a protein with an internal deletion, but one that may retain enough function to alleviate what are otherwise devastating symptoms. ■
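Whether exon skipping can restore a usable reading frame comes down to simple arithmetic: the number of nucleotides removed must be a multiple of 3. The sketch below checks this for a hypothetical gene. The exon lengths, the helper names, and the rule of extending the skipped block only downstream of the mutant exon are all assumptions made for illustration; this is not a description of how clinical oligonucleotides are actually designed.

```python
from __future__ import annotations


def skip_keeps_frame(exon_lengths: list[int], first: int, last: int) -> bool:
    """True if skipping exons first..last (0-based, inclusive) leaves downstream codons in frame.

    Removing n nucleotides from a coding sequence preserves the reading frame
    of everything downstream only when n is a multiple of 3.
    """
    return sum(exon_lengths[first:last + 1]) % 3 == 0


def smallest_in_frame_skip(exon_lengths: list[int], mutant: int) -> tuple[int, int] | None:
    """Extend the skipped block downstream from the mutant exon until the frame is restored.

    Returns (first, last) for the shortest such block, or None if restoring the
    frame would require skipping the final exon. This is a simplified search;
    it says nothing about whether the shortened protein remains functional.
    """
    for last in range(mutant, len(exon_lengths) - 1):  # always keep the final exon
        if skip_keeps_frame(exon_lengths, mutant, last):
            return (mutant, last)
    return None


# Hypothetical exon lengths in nucleotides (not real DMD exon sizes)
exons = [120, 148, 62, 150, 200]
print(smallest_in_frame_skip(exons, 1))  # exon 1 alone (148 nt) shifts the frame; exons 1-2 (210 nt) do not
print(smallest_in_frame_skip(exons, 3))  # exon 3 (150 nt) can be skipped on its own
```

With these made-up lengths, skipping exon 1 alone would leave a 1-nt frameshift, so the search also skips exon 2 and returns (1, 2); exon 3 is already a multiple of 3 and returns (3, 3).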
Spliceosomes, Assembled from snRNPs and a Pre-mRNA, Carry Out Splicing
The five splicing snRNPs and other proteins involved in splicing assemble on a pre-mRNA, forming a large ribonucleoprotein complex called a spliceosome (Figure 10-11). The spliceosome has a mass similar to that of a ribosome. Assembly of a spliceosome begins with the base pairing of the U1 snRNA to the 5′ splice site, together with the cooperative binding of protein SF1 (splicing factor 1) to the branch-point A and of the heterodimeric protein U2AF (U2-associated factor) to the polypyrimidine tract and the 3′ AG of the intron via its large and small subunits, respectively. The U2 snRNP then base-pairs with the branch-point region (see Figure 10-9a) as SF1 is released. Extensive base pairing between the snRNAs in the U4 and U6 snRNPs forms a complex that associates with U5 snRNP. This U4/U6/U5 “tri-snRNP” then associates with the previously formed U1/U2/pre-mRNA complex to generate a spliceosome. After formation of the spliceosome, extensive rearrangements in the pairing of snRNAs and the pre-mRNA lead to the release of the U1 snRNP. Figure 10-10c shows the structure of this intermediate in the splicing process. A further rearrangement of spliceosomal components occurs with the loss of the U4 snRNP. Its release generates a complex that catalyzes the first transesterification reaction, which forms the 2′,5′-phosphodiester bond between the 2′ hydroxyl on the branch-point A and the phosphate at the 5′ end of the intron (see Figure 10-8). Following another rearrangement of the snRNPs, the second transesterification reaction ligates the two exons in a standard 3′,5′-phosphodiester bond, releasing the intron as a lariat structure associated with the snRNPs. This final intron-snRNP complex rapidly dissociates, and the individual snRNPs released can participate in a new cycle of splicing. The excised lariat intron is then converted to a linear RNA by a debranching enzyme and rapidly degraded by nuclear exoribonucleases, as discussed later.
CHAPTER 10 • Post-transcriptional Gene Control
FIGURE 10-10 Structures of a bulged A in an RNA-RNA helix and of an intermediate in the splicing process. (a) Diagram of the RNA duplex used for determining the structure of a bulged A. Bulged As at position 5 (red) are excluded from the duplex RNA-RNA hybrid formed by complementary bases (blue and green). (b) X-ray crystallography of the structure showed that the bulged A residues extend from the side of an A-form RNA-RNA helix. The phosphate backbone of one strand is shown in green and that of the other strand in blue. The structure on the right is turned 90 degrees for a view down the axis of the helix. (c) 40 Å resolution structure of a spliceosomal splicing intermediate containing U2, U4, U5, and U6 snRNPs, determined by cryoelectron microscopy and image reconstruction. The U4/U6/U5 tri-snRNP complex has a structure similar to the triangular body of this complex below the neck, suggesting that these snRNPs are at the bottom of the structure shown here and that the head is composed largely of U2 snRNP. See H. Stark and R. Lührmann, 2006, Annu. Rev. Biophys. Biomol. Struct. 35:435. [Parts (a) and (b) data from J. A. Berglund et al., 2001, RNA 7:682, PDB ID 1i9x. Part (c) from E. Wolf et al., “Exon, intron and splice site locations in the spliceosomal B complex,” EMBO J., 2009, 28(15):2283–2292; doi:10.1038/emboj.2009.171.]
FIGURE 10-11 Model of spliceosome-mediated splicing of pre-mRNA. Step 1: After U1 base-pairs with the consensus 5′ splice site, SF1 (splicing factor 1) binds the branch-point A; U2AF (U2 snRNP associated factor) associates with the polypyrimidine tract and 3′ splice site; and the U2 snRNP associates with the branch-point A via base-pairing interactions shown in Figure 10-9, displacing SF1. Step 2: A trimeric snRNP complex of U4, U5, and U6 joins the initial complex to form the spliceosome. Step 3: Rearrangements of base-pairing interactions between snRNAs convert the spliceosome into a catalytically active conformation and destabilize the U1 and U4 snRNPs, which are released. Step 4: The catalytic core, thought to be formed by U6 and U2, then catalyzes the first transesterification reaction, forming the intermediate containing a 2′,5′-phosphodiester bond, as shown in Figure 10-8. Step 5: Following further rearrangements between the snRNPs, the second transesterification reaction joins the two exons by a standard 3′,5′-phosphodiester bond and releases the intron as a lariat structure as well as the remaining snRNPs. Step 6: The excised lariat intron is converted into a linear RNA by a debranching enzyme. See T. Villa et al., 2002, Cell 109:149.
[Figure 10-11 diagram: binding of U1, SF1, U2AF, and U2 to the pre-mRNA; joining of the U4/U6/U5 tri-snRNP to form the spliceosome; release of U1 and U4; first and second transesterifications; release of U2, U5, and U6 with the lariat intron and spliced exons; debranching to a linear intron RNA]
As mentioned above, a spliceosome is roughly the size of a ribosome. It is composed of about 170 proteins, including about 100 “splicing factors” in addition to the proteins associated with the five snRNPs, which makes RNA splicing comparable in complexity to transcription initiation and protein synthesis. Some of the splicing factors are associated with snRNPs, but others are not. For instance, the 65-kDa subunit of U2AF binds to the polypyrimidine tract near the 3′ end of an intron and to the U2 snRNP. The 35-kDa subunit of U2AF binds to the AG dinucleotide at the 3′ end of the intron and also interacts with the larger U2AF subunit bound nearby. These two U2AF subunits act together with SF1 to help specify the 3′ splice site by promoting interaction of the U2 snRNP with the branch point (see Figure 10-11, step 1). Some splicing factors also exhibit sequence homologies to known RNA helicases; these factors are probably necessary for the base-pairing rearrangements that occur among snRNAs during the spliceosomal splicing cycle.
Several splicing factors associate with the CTD of RNA polymerase II when it is phosphorylated at serine 2 of the heptapeptide repeat by the cyclin T–CDK9 transcription elongation factor (see Figure 9-21). This association concentrates these splicing factors near the RNA exit site of RNA polymerase II so that they can rapidly assemble a spliceosome at a splice site as it emerges from the polymerase.
Following RNA splicing, a specific set of hnRNP proteins remains bound to the spliced RNA approximately 20 nucleotides 5′ to each exon-exon junction, forming an exon-junction complex. One of the hnRNP proteins associated with the exon-junction complex is the RNA export factor (REF), which functions in the export of fully processed mRNPs from the nucleus to the cytoplasm, as discussed in Section 10.3. Other proteins associated with the exon-junction complex function in a cytoplasmic quality-control mechanism, known as nonsense-mediated decay, that leads to the degradation of improperly spliced mRNAs (see Section 10.4).
A small fraction of pre-mRNAs (~1 percent in humans) contain introns whose splice sites do not conform to the standard consensus sequence. This class of introns begins with AU and ends with AC rather than following the usual “GU-AG rule” (see Figure 10-7). Splicing of this special class of introns occurs via a splicing cycle analogous to that shown in Figure 10-11, except that four novel, low-abundance snRNPs, together with the standard U5 snRNP, are involved.
Nearly all functional mRNAs in vertebrate, insect, and plant cells are derived from a single molecule of the corresponding pre-mRNA by removal of internal introns and splicing of exons. However, in two types of protozoans (trypanosomes and euglenoids), mRNAs are constructed by splicing together separate RNA molecules. This process, referred to as trans-splicing, is also used in the synthesis of 10–15 percent of the mRNAs in the nematode (roundworm) Caenorhabditis elegans, an important model organism for studying embryonic development. Trans-splicing is carried out by snRNPs via a process similar to the splicing of exons in a single pre-mRNA.
Chain Elongation by RNA Polymerase II Is Coupled to the Presence of RNA-Processing Factors
How is RNA processing efficiently coupled with the transcription of a pre-mRNA? The key lies in the long carboxy-terminal domain (CTD) of RNA polymerase II, which, as discussed in Chapter 9, is composed of multiple repeats of a seven-residue (heptapeptide) sequence. When fully extended, the CTD of human RNA polymerase II is about 130 nm long (Figure 10-12). This remarkable length apparently allows multiple proteins to associate simultaneously with a single RNA polymerase II molecule. For instance, the enzymes that add the 5′ cap to nascent transcripts associate with the serine 5–phosphorylated CTD, as mentioned above, and splicing and polyadenylation factors associate with the serine 2–phosphorylated CTD. As a consequence, these processing factors are present at high local concentrations when splice sites and polyadenylation signals are transcribed by the polymerase, enhancing the rate and specificity of RNA processing. In a reciprocal fashion, the association of hnRNP proteins with the nascent RNA enhances the interaction of RNA polymerase II with elongation factors such as DSIF and cyclin T–CDK9 (see Figure 9-21), increasing the rate of transcription. Thus the rate of transcription is coordinated with the rate at which the nascent RNA associates with hnRNPs and RNA-processing factors. This mechanism may ensure that a pre-mRNA is not synthesized unless the machinery for processing it is properly positioned.
FIGURE 10-12 Schematic diagram of human RNA polymerase II with the CTD extended. The length of the human RNA polymerase II carboxy-terminal domain (CTD) and the linker region that connects it to the polymerase is shown relative to the globular domain of the polymerase. In its extended form, the CTD can associate with multiple RNA-processing factors simultaneously. See P. Cramer, D. A. Bushnell, and R. D. Kornberg, 2001, Science 292:1863.
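The ~130-nm length can be checked with rough arithmetic. The inputs here are supplied for illustration and are not stated in the text: the human CTD is taken to contain 52 heptapeptide repeats, a fully extended polypeptide is assumed to span roughly 0.35 nm per residue, and the linker region is ignored.

$$
L_{\text{CTD}} \approx (52 \times 7)\ \text{residues} \times 0.35\ \frac{\text{nm}}{\text{residue}} = 364 \times 0.35\ \text{nm} \approx 127\ \text{nm}
$$

This back-of-the-envelope estimate is in line with the value quoted above.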
SR Proteins Contribute to Exon Definition in Long Pre-mRNAs
The average length of an exon in the human genome is about 150 bases, whereas the average length of an intron is about 3500 bases, and the longest introns exceed 500 kb! Because the sequences of 5′ and 3′ splice sites and branch points are so degenerate, multiple copies of those sequences are likely to occur randomly in long introns. Consequently, additional sequence information is required to define the exons that should be spliced together in higher organisms with long introns. The information for defining the splice sites that demarcate exons is encoded within the sequences of the exons themselves. A family of RNA-binding proteins, the SR proteins, interacts with sequences within exons called exonic splicing enhancers. SR proteins are a subset of the hnRNP proteins discussed earlier that contain one or more RRM RNA-binding domains. They also contain several protein-protein interaction domains rich in arginine (R) and serine (S) residues, called RS domains. When bound to exonic splicing enhancers, SR proteins mediate the cooperative binding of U1 snRNP to a true 5′ splice site and of U2 snRNP to a branch point through a network of protein-protein interactions that span an exon (Figure 10-13). The complex of SR proteins, snRNPs, and other splicing factors (e.g., U2AF and SF1) that assembles across an exon, called a cross-exon recognition complex, permits precise specification of exons in long pre-mRNAs.
Mutations that interfere with the binding of an SR protein to an exonic splicing enhancer prevent formation of the cross-exon recognition complex, even if they do not change the encoded amino acid sequence. As a result, the affected exon is “skipped” during splicing and is not included in the final processed mRNA. The truncated mRNA produced in this case is either degraded or translated into a mutant, abnormally functioning protein. This type of mutation occurs in some human genetic diseases.
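To see why splice-site-like sequences turn up by chance in long introns, consider a back-of-the-envelope estimate. The sketch assumes a random sequence in which every position is independent and each base has a frequency of 0.25, and it uses the 6-nt string GUAAGU as a hypothetical stand-in for a 5′ splice site. Because real splice sites match their consensus only partially, far more sequences would qualify in practice, so this exact-match count is an underestimate. The function names are illustrative.

```python
def expected_chance_matches(sequence_length: int, motif: str) -> float:
    """Expected number of exact occurrences of motif in a random RNA sequence.

    Assumes independent positions and equal base frequencies (0.25 each), so a
    k-nucleotide motif matches at any given position with probability 0.25**k.
    """
    positions = max(sequence_length - len(motif) + 1, 0)
    return positions * 0.25 ** len(motif)


def count_matches(sequence: str, motif: str) -> int:
    """Count exact, possibly overlapping, occurrences of motif in sequence."""
    return sum(sequence.startswith(motif, i) for i in range(len(sequence) - len(motif) + 1))


# A 6-nt stand-in for a 5' splice site, in a 500-kb intron (the longest introns cited above)
print(round(expected_chance_matches(500_000, "GUAAGU")))  # about 122 chance copies
print(count_matches("GUAAGUAAGU", "GUAAGU"))  # overlapping hits are both counted
```

Even under this strict criterion, a 6-nt motif is expected about once every 4^6 = 4096 nucleotides, or roughly 122 times in a 500-kb intron. This is why splice-site sequences alone cannot pick out the correct exons.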
For example, spinal muscular atrophy is one of the most common genetic causes of childhood mortality. This disease results from mutations in a region of the genome containing two closely related genes, SMN1 and SMN2, that arose by gene duplication. The two genes encode identical proteins, but SMN2 yields much less functional protein because a silent mutation in one exon interferes with the binding of an SR protein. This mutation leads to exon skipping in most SMN2 mRNAs. The mouse has only a single, homologous SMN gene, which is essential for cell viability. Spinal muscular atrophy in humans results from homozygous mutations that inactivate SMN1. The small amount of protein translated from the small fraction of SMN2 mRNAs that are correctly spliced is sufficient to maintain cell viability during embryogenesis and fetal development, but it is not sufficient to maintain the viability of spinal cord motor neurons in childhood, resulting in their death and the associated disease. ■
FIGURE 10-13 Exon recognition through cooperative binding of SR proteins and splicing factors to pre-mRNA. The correct 5′ GU and 3′ AG splice sites are recognized by splicing factors on the basis of their proximity to exons. The exons contain exonic splicing enhancers (ESEs) that are binding sites for SR proteins. When bound to ESEs, the SR proteins interact with one another and promote the cooperative binding of the U1 snRNP to the 5′ splice site of the downstream intron, SF1 and then the U2 snRNP to the branch point of the upstream intron, the 65- and 35-kDa subunits of U2AF to the polypyrimidine tract and AG 3′ splice site of the upstream intron, and other splicing factors (not shown). The resulting RNA-protein cross-exon recognition complex spans an exon and activates the correct splice sites for RNA splicing. Note that the U1 and U2 snRNPs in this unit do not become part of the same spliceosome. The U2 snRNP on the right forms a spliceosome with the U1 snRNP bound to the 5′ end of the same intron. The U1 snRNP shown on the right forms a spliceosome with the U2 snRNP bound to the branch point of the downstream intron (not shown), and the U2 snRNP on the left forms a spliceosome with a U1 snRNP bound to the 5′ splice site of the upstream intron (not shown). Double-headed arrows indicate protein-protein interactions. See T. Maniatis, 2002, Nature 418:236; see also S. M. Berget, 1995, J. Biol. Chem. 270:2411.
[Figure 10-13 diagram: cross-exon recognition complexes in which SR proteins bound to ESEs link U1 at the 5′ splice site with U2, U2AF65, and U2AF35 at the branch point, polypyrimidine tract (YYYY), and AG 3′ splice site]
Approximately 15 percent of the single-base mutations that cause human genetic diseases interfere with proper exon definition. Some of these mutations occur in 5′ or 3′ splice sites, often resulting in the use of nearby alternative “cryptic” splice sites that are present in the normal gene sequence; in the absence of the normal splice site, the cross-exon recognition complex recognizes these alternative sites. Other mutations that cause abnormal splicing create a new consensus splice-site sequence that is recognized in place of the normal splice site. Finally, some mutations interfere with the binding of specific SR proteins to pre-mRNAs. These mutations inhibit splicing at normal splice sites, as in the case of the SMN2 gene, and thus lead to exon skipping. Strategies involving membrane-permeant synthetic oligonucleotide derivatives, similar to those discussed above for causing skipping of mutant exons in DMD, are being developed for the treatment of these genetic diseases. Such molecules can hybridize to a mutant sequence that creates an abnormal splice site, sterically blocking access of U1 or U2 snRNAs to that site. In the case of spinal muscular atrophy, researchers are experimenting with modified oligonucleotides that base-pair to a region in the SMN2 pre-mRNA close to the missing exonic splicing enhancer. A non-hybridizing region of such an oligonucleotide that remains single-stranded and can bind an abundant SR protein may help to assemble a cross-exon recognition complex, increasing correct splicing of exons in pre-mRNAs expressed from the SMN2 gene.
Self-Splicing Group II Introns Provide Clues to the Evolution of snRNAs
Under certain unphysiological in vitro conditions, pure preparations of some RNA transcripts slowly splice out introns in the absence of any protein. This observation led to the recognition that some introns are self-splicing. Two types of self-splicing introns have been discovered: group I introns, present in nuclear rRNA genes of protozoans, and group II introns, present in protein-coding genes and some rRNA and tRNA genes in mitochondria and chloroplasts of plants and fungi. Discovery of the catalytic activity of self-splicing introns revolutionized our thinking about the functions of RNA. As discussed in Chapter 5, RNA is now known to catalyze peptide-bond formation during protein synthesis in ribosomes. Here we discuss the probable role of group II introns, which in eukaryotes are now found only in mitochondrial and chloroplast DNA, in the evolution of snRNAs; the functioning of group I introns is considered in the later section on rRNA processing.
Even though their precise sequences are not highly conserved, all group II introns fold into a conserved, complex secondary structure containing numerous stem-loops (Figure 10-14a). Self-splicing by a group II intron occurs via two transesterification reactions involving intermediates and products analogous to those found in nuclear pre-mRNA splicing. The mechanistic similarities between group II intron self-splicing and spliceosomal splicing led to the hypothesis that snRNAs function analogously to the stem-loops in the secondary structure of group II introns. According to this hypothesis, snRNAs interact with 5′ and 3′ splice sites of pre-mRNAs and with one another to produce a three-dimensional RNA structure that is functionally analogous to that of group II self-splicing introns (Figure 10-14b). An extension of this hypothesis is that introns in ancient pre-mRNAs evolved from group II self-splicing introns through the progressive loss of internal RNA structures, which concurrently evolved into trans-acting snRNAs that perform the same functions.
Support for this type of evolutionary model comes from experiments with group II intron mutants in which domain V and part of domain I are deleted. RNA transcripts containing such mutant introns are defective in self-splicing, but when RNA molecules equivalent to the deleted regions are added to the in vitro reaction, self-splicing occurs. This finding demonstrates that these domains in group II introns can be trans-acting, like snRNAs. The similarity in the mechanisms of group II intron self-splicing and of spliceosomal splicing of pre-mRNAs also suggests that the splicing reaction is catalyzed by the snRNA, not the protein, components of spliceosomes. Although group II introns can self-splice in vitro at elevated temperatures and Mg²⁺ concentrations, under in vivo conditions proteins called maturases, which bind to group II intron RNA, are required for rapid splicing. Maturases are thought to stabilize the precise three-dimensional interactions of the intron RNA required to catalyze the two splicing transesterification reactions. By analogy, snRNP proteins in spliceosomes are thought to stabilize the precise geometry of snRNAs and intron nucleotides required to catalyze pre-mRNA splicing.
The evolution of snRNAs may have been an important step in the rapid evolution of higher eukaryotes. As sequences involved in self-splicing were lost from introns and their functions supplanted by trans-acting snRNAs, the remaining intron sequences would have become free to diverge. This in turn probably facilitated the evolution of new genes through exon shuffling, since there would be few constraints on the sequences of new introns generated in the process (see Figures 8-18 and 8-19). It also permitted the increase in protein diversity that results from alternative RNA splicing and an additional level of gene control resulting from regulated RNA splicing.
FIGURE 10-14 Comparison of group II self-splicing introns and the spliceosome. These schematic diagrams compare the secondary structures of (a) group II self-splicing introns and (b) U snRNAs present in the spliceosome. The first transesterification reaction is indicated by light green arrows; the second reaction, by blue arrows. The branch-point A is boldfaced. The similarity in these structures suggests that the spliceosomal snRNAs evolved from group II introns, and that the trans-acting snRNAs are functionally analogous to the corresponding domains in group II introns. The colored bars flanking the introns in (a) and (b) represent exons. See P. A. Sharp, 1991, Science 254:663.
[Figure 10-14 panels: (a) group II intron with domains I–VI; (b) U1, U2, U4, U5, and U6 snRNAs in the spliceosome, paired with a pre-mRNA intron]
3′ Cleavage and Polyadenylation of Pre-mRNAs Are Tightly Coupled
In eukaryotic cells, all mRNAs except histone mRNAs* have a 3′ poly(A) tail. Early studies of pulse-labeled adenovirus and SV40 RNA demonstrated that the viral primary transcripts extend beyond the site at which the poly(A) tail begins. These results suggested that A residues are added to a 3′ hydroxyl generated by endonucleolytic cleavage of a longer transcript, but the predicted downstream RNA fragments were never detected in vivo, presumably because of their rapid degradation. However, both predicted cleavage products were observed in in vitro processing reactions performed with nuclear extracts of cultured human cells. Cleavage/polyadenylation and degradation of the RNA downstream of the cleavage site occur much more slowly in these in vitro reactions, simplifying detection of the downstream cleavage product.
Early sequencing of cDNA clones from animal cells showed that nearly all mRNAs contain the sequence AAUAAA 15–30 nucleotides upstream from the poly(A) tail (Figure 10-15). Polyadenylation of RNA transcripts is virtually eliminated when the corresponding sequence in the template DNA is mutated to any other sequence except one encoding a closely related sequence (AUUAAA). The unprocessed RNA transcripts produced from such mutant templates do not accumulate in nuclei but are rapidly degraded. Further mutagenesis studies revealed that a second signal downstream from the cleavage site is required for efficient cleavage and polyadenylation of most pre-mRNAs in animal cells. This downstream signal is not a specific sequence, but rather a GU-rich or simply a U-rich region within about 20 nucleotides of the cleavage site. Identification and purification of the proteins required for cleavage and polyadenylation of pre-mRNA have led to the model shown in Figure 10-15.
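The sequence rules above can be turned into a simple scan that flags poly(A) signals and the stretch where cleavage would be expected. This is a sketch, not how cells or published tools define cleavage sites: it assumes the 15–30-nt spacing is counted from the 3′ end of the hexamer, clips the window at the end of the sequence, and uses a hypothetical 50 percent U threshold for the downstream element. All function names are illustrative.

```python
from __future__ import annotations

import re

# Canonical polyadenylation signal and the one closely related variant named in the text
POLY_A_SIGNALS = ("AAUAAA", "AUUAAA")


def candidate_cleavage_windows(rna: str, near: int = 15, far: int = 30) -> list[tuple[int, int, int]]:
    """Return (signal_start, window_start, window_end) tuples, 0-based and inclusive.

    The window is where cleavage would be expected if the 15-30 nt spacing is
    counted from the 3' end of the hexamer (an assumption).
    """
    hits = []
    for signal in POLY_A_SIGNALS:
        # A zero-width lookahead reports overlapping occurrences separately
        for match in re.finditer(f"(?={signal})", rna):
            signal_end = match.start() + len(signal)
            window_start = signal_end + near
            window_end = min(signal_end + far, len(rna) - 1)
            if window_start <= window_end:  # skip signals too close to the 3' end
                hits.append((match.start(), window_start, window_end))
    return sorted(hits)


def u_rich_downstream(rna: str, cleavage_site: int, span: int = 20, min_u_fraction: float = 0.5) -> bool:
    """Hypothetical test for a U-rich element within `span` nt downstream of the cleavage site."""
    region = rna[cleavage_site + 1 : cleavage_site + 1 + span]
    return bool(region) and region.count("U") / len(region) >= min_u_fraction


pre_mrna = "GGG" + "AAUAAA" + "C" * 20 + "U" * 20
print(candidate_cleavage_windows(pre_mrna))
print(u_rich_downstream(pre_mrna, 24))
```

For this made-up sequence, the signal starts at position 3, the expected cleavage window spans positions 24 to 39, and the 20 nucleotides after position 24 are mostly U. In cells, the cleavage position is set by the assembled CPSF/CStF/PAP complex rather than by a fixed window.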
A 360-kDa cleavage and polyadenylation specificity factor (CPSF), composed of five different polypeptides, first forms an unstable complex with the upstream AAUAAA polyadenylation signal. Then at least three additional proteins bind to the CPSF-RNA complex: a 200-kDa heterotrimer called cleavage stimulatory factor (CStF), which interacts with the G/U-rich sequence; a 150-kDa heterotetramer called cleavage factor I (CFI); and a second heterodimeric cleavage factor (CFII). A 150-kDa protein called symplekin is thought to form a scaffold on which these cleavage/polyadenylation factors assemble. Finally, poly(A) polymerase (PAP) must bind to the complex before cleavage can occur. This requirement for PAP binding links cleavage and polyadenylation, so that the free 3′ end generated is rapidly polyadenylated and no essential information is lost to exonuclease degradation of an unprotected 3′ end. Assembly of this large multiprotein cleavage/polyadenylation complex around the AU-rich polyadenylation signal in a pre-mRNA is analogous in many ways to formation of the transcription preinitiation complex at the AT-rich TATA box of a template DNA molecule (see Figure 9-19). In both cases, multiprotein complexes assemble cooperatively through a network of specific protein–nucleic acid and protein-protein interactions.
Following cleavage at the poly(A) site, polyadenylation proceeds in two phases: the first 12 or so A residues are added slowly, followed by rapid addition of up to 200–250 more A residues. The rapid phase requires the binding of multiple copies of a poly(A)-binding protein containing the RRM motif. This protein is designated PABPN1 to distinguish it from the cytoplasmic poly(A)-binding protein in humans, PABPC1. PABPN1 binds cooperatively to the short A tail initially added by PAP and to CPSF bound to the AAUAAA polyadenylation signal. This binding stimulates PAP to extend the short poly(A) tail rapidly and processively, that is, without releasing the growing poly(A) tail from the complex of PABPN1 and CPSF. Once the poly(A) tail reaches a length of about 250 adenines, this processivity is lost, and PAP dissociates from the poly(A)-PABPN1 complex, terminating A addition (see Figure 10-15). Binding of PABPN1 to the poly(A) tail is essential for mRNA export into the cytoplasm.
*The major histone mRNAs are transcribed from repeated genes in prodigious amounts in replicating cells during the S phase. They undergo a special form of 3′-end processing that involves cleavage but not polyadenylation. Specialized RNA-binding proteins that help to regulate histone mRNA translation bind to the 3′ end generated by this specialized system.
FIGURE 10-15 Model for cleavage and polyadenylation of pre-mRNAs in mammalian cells. Cleavage and polyadenylation specificity factor (CPSF) binds to the upstream AAUAAA polyadenylation signal. CStF interacts with a downstream GU- or U-rich sequence and with bound CPSF, forming a loop in the RNA; binding of CFI and CFII helps stabilize the complex. Binding of poly(A) polymerase (PAP) then stimulates cleavage at a poly(A) cleavage site, which usually is 15–30 nucleotides 3′ of the upstream polyadenylation signal. The cleavage factors are released, as is the downstream RNA cleavage product, which is rapidly degraded. Bound PAP then adds about 12 A residues at a slow rate to the 3′-hydroxyl group generated by the cleavage reaction. Binding of nuclear poly(A)-binding protein (PABPN1) to the initial short poly(A) tail accelerates the rate of addition by PAP. After 200–250 A residues have been added, PABPN1 signals PAP to stop polymerization.
[Figure 10-15 diagram: assembly of CPSF, CStF, CFI, CFII, and PAP between the AAUAAA poly(A) signal and the downstream G/U element; cleavage; slow polyadenylation (~12 A); rapid, processive polyadenylation with PABPN1 bound; dissociation of PAP when the tail reaches ~250 As]
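The two-phase kinetics described above can be captured in a toy model in which the rate of A addition switches once the tail reaches about 12 residues and polymerization stops at about 250. Only those two lengths come from the text; the rates are arbitrary placeholders, and the function names are hypothetical. Real tail synthesis is stochastic, which this deterministic sketch ignores.

```python
import math

SLOW_PHASE_LENGTH = 12  # ~12 A residues are added slowly before PABPN1 binds (from the text)
MAX_TAIL_LENGTH = 250   # PAP dissociates when the tail reaches ~250 A residues (from the text)


def tail_length(t: float, slow_rate: float, fast_rate: float) -> int:
    """Poly(A) tail length at time t under a two-phase, deterministic model.

    slow_rate and fast_rate (A residues per unit time) are hypothetical
    parameters; the text gives only the qualitative slow/fast distinction.
    """
    slow_phase_time = SLOW_PHASE_LENGTH / slow_rate
    if t < slow_phase_time:
        return math.floor(t * slow_rate)
    fast_added = math.floor((t - slow_phase_time) * fast_rate)
    return min(SLOW_PHASE_LENGTH + fast_added, MAX_TAIL_LENGTH)


def time_to_full_tail(slow_rate: float, fast_rate: float) -> float:
    """Time needed to reach MAX_TAIL_LENGTH under the same model."""
    return SLOW_PHASE_LENGTH / slow_rate + (MAX_TAIL_LENGTH - SLOW_PHASE_LENGTH) / fast_rate


# Placeholder rates: the processive phase is assumed to be 10 times faster than the slow phase
for t in (5, 12, 20, 100):
    print(t, tail_length(t, slow_rate=1.0, fast_rate=10.0))
print(time_to_full_tail(1.0, 10.0))
```

With these placeholder rates, the first 12 residues take as long as the next 120, which is the qualitative point: once PABPN1 is bound, nearly the whole tail is added in the fast, processive phase.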
As with splicing factors, several subunits of the proteins involved in cleavage and polyadenylation associate with the serine 2–phosphorylated CTD of RNA polymerase II, which concentrates them in the region where polyadenylation signals in the RNA emerge from the elongating polymerase. In wild-type genes, RNA polymerase II terminates transcription at any one of multiple possible sites within about 2 kb of the polyadenylation signal. Experiments with SV40 and adenovirus (both DNA viruses) showed that when the polyadenylation signal is mutated, RNA polymerase II does not terminate transcription but continues until it encounters the next poly(A) site in the viral genome. Similar results were soon shown for a recombinant human β-globin gene inserted into an adenovirus. These experiments showed that transcription termination by RNA polymerase II is coupled to cleavage and polyadenylation of the transcript. This coupling is hypothesized to result from loss of 5′-end protection on the RNA that is still being transcribed: after cleavage, the downstream RNA attached to the polymerase has no 5′ cap, so it is susceptible to the XRN2 5′→3′ exoribonuclease. When this exoribonuclease reaches the still-transcribing polymerase, it is thought to trigger termination, either by pulling the 3′ end of the nascent RNA out of the polymerase active site or by inducing a conformational change in the polymerase. Once the nascent RNA is removed from the elongating polymerase, the contacts between the RNA polymerase II clamp and the RNA-DNA hybrid within the polymerase (see Figure 9-15) are lost, allowing the clamp to open and releasing the polymerase from the DNA template. More recent chromatin immunoprecipitation studies (ChIP-seq; see Figure 9-18) with antibody to RNA polymerase II indicate that the polymerase may be released from the template DNA at multiple possible sites within about 2 kb downstream from the poly(A) site.
Nuclear Exoribonucleases Degrade RNA That Is Processed Out of Pre-mRNAs Because the human genome contains long introns, only about 5 percent of the nucleotides that are polymerized by RNA polymerase II during transcription are retained in mature, processed mRNAs. Although this process appears inefficient, it probably evolved in multicellular organisms because the process of exon shuffling facilitated the evolution of new genes in organisms with long introns (see Chapter 8). The introns that are spliced out and the RNA downstream from the cleavage/polyadenylation site are degraded by nuclear exoribonucleases. As mentioned earlier, the 2′,5′-phosphodiester bond in excised introns is hydrolyzed by a debranching enzyme (see Figure 10-11, step 6 ), yielding a linear molecule with unprotected ends. Such linear RNA molecules can be attacked by exoribonucleases, which hydrolyze one base at a time from the 5′ or 3′ end (as opposed to endoribonucleases, which digest internal phosphodiester bonds). The predominant mechanism of RNA decay is digestion by a large (~400-kDa) protein complex called the exosome, which contains an internal 3′→5′ exoribonuclease (Figure 10-16). (Exosomes also function in the cytoplasm, as discussed later.) The exosome is in many ways analogous to the proteasome (see Figure 3-31) that digests polyubiquitinylated proteins in both the nucleus and the cytoplasm. The predominant active site of the exosome lies on the inside of the complex, where it can digest only single-stranded RNAs that are threaded into the pore at the top of the complex (Figure 10-16b). This pore is too small to allow the entry of double-stranded or other structured regions of RNAs. Other proteins that associate with the complex include an RNA helicase, which disrupts base pairing and RNA-protein interactions that would otherwise prevent the entry of RNA into the pore. 432
CHAPTER 10
Post-transcriptional Gene Control
In addition to introns, the exosome degrades pre-mRNAs that have not been properly spliced or polyadenylated, although it is not yet clear how the exosome recognizes improperly processed pre-mRNAs. In yeast cells with a temperature-sensitive mutant PAP (see Figure 10-15), pre-mRNAs are retained at their sites of transcription in the nucleus at the nonpermissive temperature. These abnormally processed pre-mRNAs are released in cells carrying a second mutation in Rrp6, an exosome subunit found in nuclear but not in cytoplasmic exosomes (see Figure 10-16). In addition, exosomes are concentrated at sites of transcription in Drosophila polytene chromosomes, where they associate with RNA polymerase II elongation factors. These results suggest that the exosome participates in a still poorly understood quality-control mechanism in the nucleus that recognizes aberrantly processed pre-mRNAs, prevents their export to the cytoplasm, and ultimately leads to their degradation. To avoid being degraded by nuclear exonucleases, nascent transcripts, pre-mRNA-processing intermediates, and mature mRNAs in the nucleus must have their ends protected. As discussed above, the 5′ end of a nascent transcript is protected by addition of the 5′ cap structure as soon as the 5′ end emerges from the polymerase. The cap is bound by a heterodimeric nuclear cap-binding complex (CBC), which shields it from 5′ exonucleases and also functions in export of the mRNA to the cytoplasm. The 3′ end of a nascent transcript lies within the RNA polymerase and is thus inaccessible to exonucleases (see Figure 5-12). As discussed previously, the free 3′ end generated by cleavage of a pre-mRNA downstream from the polyadenylation signal is rapidly polyadenylated by the PAP associated with the other 3′ processing factors, and the resulting poly(A) tail is bound by PABPN1 (see Figure 10-15).
This tight coupling of cleavage and polyadenylation, followed by PABPN1 binding, protects the 3′ end from exonuclease attack.
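The rules above (the exosome threads only single-stranded RNA into its pore, a helicase such as Mtr4 in Figure 10-16 can unwind structure, and a PABPN1-bound poly(A) tail protects the 3′ end) can be captured in a toy Python model. This is a hypothetical illustration, not a description of exosome biochemistry: the sequence, the set of paired positions, and the flags are invented inputs. Protection of the capped 5′ end by CBC is not modeled, because the exosome attacks from the 3′ end.

```python
from dataclasses import dataclass, field


@dataclass
class NuclearRNA:
    seq: str                                        # written 5'->3'
    paired: set[int] = field(default_factory=set)   # indices in base-paired (structured) regions
    tail_bound_by_pabpn1: bool = False              # poly(A) tail coated by PABPN1


def exosome_digest(rna: NuclearRNA, helicase: bool = False) -> str:
    """Toy model of processive 3'->5' digestion by the exosome.

    Nucleotides are removed one at a time from the 3' end. Only unpaired
    nucleotides can be threaded into the pore, so digestion halts at a
    structured region unless a helicase (e.g., Mtr4) unwinds it.
    A PABPN1-bound poly(A) tail protects the 3' end completely.
    Returns the RNA that survives.
    """
    if rna.tail_bound_by_pabpn1:
        return rna.seq
    end = len(rna.seq)              # rna.seq[:end] is what remains
    while end > 0:
        if (end - 1) in rna.paired and not helicase:
            break                   # duplex cannot enter the pore
        end -= 1                    # hydrolyze the 3'-terminal nucleotide
    return rna.seq[:end]


# Hypothetical RNA: a GGG/CCC stem (indices 0-2 and 6-8) followed by a single-stranded 3' tail.
rna = NuclearRNA("GGGAAACCCUUUU", paired={0, 1, 2, 6, 7, 8})
print(exosome_digest(rna))                  # GGGAAACCC
print(exosome_digest(rna, helicase=True))   # prints an empty line: fully digested
```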
RNA Processing Solves the Problem of Pervasive Transcription of the Genome in Metazoans
As discussed in Chapter 9, analysis of the location of transcribing RNA polymerase II in metazoan cells revealed the surprising result that, from most promoters, the polymerase transcribes at nearly equal frequency in the downstream direction, into coding regions, and in the upstream direction, away from coding regions (see Figure 9-18). This finding was confirmed by deep sequencing of small RNAs isolated from metazoan cells, which revealed low levels of short, capped RNAs transcribed from both the sense and antisense strands at CpG island promoters, which account for some 70 percent of mammalian promoters. Indeed, deep sequencing of all cellular RNAs showed that both strands of nearly the entire genome are transcribed, although much of the resulting RNA is present at extremely low concentrations, less than one molecule per cell. This finding raised the question of how the cell deals with such "pervasive transcription." Sequence analysis of these low-abundance, short, capped RNAs indicates that they are probably prevented from reaching high concentrations by RNA processing and by nuclear surveillance for abnormally processed RNAs.

[Figure 10-16 artwork: (a) front, back, top, and bottom views of the exosome core (Rrp41, Rrp45, Rrp42, Mtr3, Rrp43, Rrp46, Rrp4, Rrp40, Csl4); (b) the 10-subunit exosome with Rrp44, the Rrp6 C-terminus, and bound RNA; (c) diagram of the nuclear exosome (Exo-10, Rrp6, Rrp47, Mtr4, Mpp6).]

FIGURE 10-16 Structure of the exosome. (a) Catalytically inactive exosome core. A nine-subunit, 286-kDa human exosome core was assembled in vitro from subunits Rrp41, Rrp45, Rrp42, Mtr3, Rrp43, Rrp46, Rrp4, Rrp40, and Csl4 expressed at a high level in E. coli (see Figure 6-29). Its structure was determined to a resolution of 3.35 Å by x-ray crystallography. (b) The 10-subunit, catalytically active cytoplasmic exosome. The orientation is similar to that of the upper right image in part (a), but rotated slightly counterclockwise. Processive 3′→5′ exonuclease activity is provided by the tenth subunit, Rrp44 (pink), associated with the bottom of the core. The C-terminus of an eleventh subunit, Rrp6, in the nuclear exosome is shown in maroon. RNA with a double-stranded region at the top and a 3′ single-stranded region that enters the core pore is shown in black. (c) Diagram of the 14-subunit nuclear exosome. Exo-10 represents the 10-subunit complex shown in (b). A heterodimer of Rrp6 and Rrp47 associates with Csl4 at the top of the exosome core through the C-terminal domain of Rrp6, as shown in (b). The N-terminus of an RNA helicase, Mtr4 (blue), associates with the heterodimerization domain of Rrp6 and Rrp47. Another subunit associated with the top, Mpp6, also associates with the Mtr4 RNA helicase in the human nuclear exosome, but its structure and the details of the Mpp6-Mtr4 interaction remain to be determined. The path of single-stranded RNA through the exosome is diagrammed in red. The exonuclease active site in the processive exonuclease Rrp44 is indicated by a pink circle. An endonuclease active site in Rrp44 is represented by the pink oval. A non-processive 3′→5′ exonuclease active site in Rrp6 is represented by a maroon oval. See B. Schuch et al., 2014, EMBO J. 33:2829. [Part (a) data from Q. Liu, J. C. Greimann, and C. D. Lima, 2006, Cell 127:1223. Part (b) data from D. L. Makino, M. Baumgärtner, and E. Conti, 2013, Nature 495:70. PDB ID 4ifd.]

Sequencing of RNAs from several cell types has revealed that antisense RNAs, transcribed from the AT-rich DNA of most metazoans (~60 percent AT in mammals), contain AAUAAA polyadenylation signal sequences more frequently than do transcripts transcribed in the sense direction into coding regions. Because of the high AT content of mammalian DNA, an AAUAAA sequence in an antisense transcript is frequently followed by a U-rich sequence that can function as the downstream element of a bona fide pre-mRNA cleavage/polyadenylation signal (see Figure 10-15). Such cleavage/polyadenylation signals occur much less frequently in transcripts extending into coding regions. Where they do occur in pre-mRNAs, in either exons or introns, they usually lie downstream of consensus base-pairing sites for U1 snRNA, which has been found to suppress cleavage/polyadenylation at nearby AAUAAA sequences. This function of U1 snRNA may help explain why the U1 snRNP is much more abundant than the other spliceosomal snRNPs.
This suppression by U1 snRNA does not affect the cleavage/polyadenylation signals used to form the 3′ ends of mRNAs, because U1 snRNA associates with the 5′ end of the terminal intron, far from the poly(A) site. In addition, as discussed above, transcription by RNA polymerase II usually terminates within ~2 kb following cleavage and polyadenylation of a pre-mRNA. Consequently, the enrichment of poly(A) sites, and the relative lack of binding sites for U1 snRNA, in antisense transcripts may lead cleavage/polyadenylation factors (see Figure 10-15) to cleave most of these transcripts within ~2 kb of the transcription start site, followed by termination of transcription (Figure 10-17). Cleaved antisense transcripts are probably degraded by the same nuclear exonucleases that degrade introns spliced out of pre-mRNAs and sequences downstream of pre-mRNA cleavage/polyadenylation sites, as well as sequences processed out of rRNA and tRNA precursors, discussed in a later section (see Figure 10-1). As a result, even though a large number of polymerases transcribe in the “wrong” direction, most of the transcripts generated in this way are rapidly degraded.
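Why AT-rich DNA makes the AAUAAA hexamer common can be checked with simple arithmetic. The Python sketch below assumes a zero-order random-sequence model (independent positions, A = U = AT/2, G = C = GC/2); real genomes are not random, and this model says nothing about the sense/antisense asymmetry, which the text attributes to sequencing data. At 60 percent AT the chance of AAUAAA at a given position is 0.3^6 = 0.000729, roughly three times the 0.25^6 expected at 50 percent AT.

```python
def motif_probability(motif: str, at_fraction: float) -> float:
    """Probability that a given position in a random RNA starts with `motif`.

    Assumes independent positions, A = U = at_fraction / 2,
    and G = C = (1 - at_fraction) / 2 (a zero-order model).
    """
    p = 1.0
    for base in motif:
        p *= at_fraction / 2 if base in "AU" else (1 - at_fraction) / 2
    return p


for at in (0.5, 0.6):
    p = motif_probability("AAUAAA", at)
    print(f"AT = {at:.0%}: p = {p:.2e}, about one AAUAAA every {1 / p:,.0f} nt")
# AT = 50%: p = 2.44e-04, about one AAUAAA every 4,096 nt
# AT = 60%: p = 7.29e-04, about one AAUAAA every 1,372 nt
```

Under these assumptions a hexamer is expected roughly every 1.4 kb of AT-rich sequence, on the same scale as the ~2 kb over which antisense transcription typically ends.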
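[Figure 10-17 artwork: Pol II transcribing a capped (m7G) upstream antisense RNA and a capped coding-gene pre-mRNA in opposite directions from one promoter, with polyadenylation signals (PAS), 5′ splice sites (5′SS), and U1 snRNP binding marked.]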
FIGURE 10-17 RNA transcribed in the "wrong" direction from most promoters in metazoans has a high frequency of polyadenylation signals and a low frequency of binding sites for U1 snRNA. This pattern may account for the termination of transcription in the "wrong" direction after about 2 kb for most of these transcripts. PAS represents polyadenylation signals encoded in the DNA that are transcribed into RNA. Cleavage of transcripts transcribed in the upstream direction (scissors) is proposed to generate free RNA ends that are digested by the nuclear exosome and a nuclear 5′→3′ exonuclease, XRN2. In contrast, pre-mRNAs synthesized by RNA polymerase II transcribing into coding regions have evolved to have few polyadenylation signals. Where they do occur, these signals are usually preceded by a binding site for U1 snRNP, which inhibits cleavage at a nearby PAS (stop sign). However, the PAS used to generate the 3′ end of an mRNA does not have a closely associated U1 snRNP binding site. See A. E. Almada et al., 2013, Nature 499:360.
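The model in Figure 10-17 can be phrased as a scanning rule: a PAS is suppressed if a U1 snRNP binding site lies a short distance upstream of it, and the first unsuppressed PAS is where cleavage, and then termination, is predicted. The Python sketch below is a hypothetical illustration of that rule only. The simplified U1 site pattern (GURAGU, R = A or G), the 1,000-nt window, and the function name are assumptions, and the downstream U-rich element and real U1 recognition rules are ignored.

```python
import re
from typing import Optional

PAS = "AAUAAA"
# Assumed, simplified 5' splice-site consensus recognized by U1 snRNA: GURAGU (R = A or G).
# The lookahead lets overlapping sites be found.
U1_SITE = re.compile(r"(?=GU[AG]AGU)")


def first_active_pas(rna: str, window: int = 1000) -> Optional[int]:
    """Index of the first AAUAAA that is NOT preceded within `window` nt by a U1 site.

    A PAS with a U1 binding site a short distance upstream is treated as suppressed;
    the first unsuppressed PAS is the predicted cleavage site.
    Returns None if every PAS is suppressed or none is present.
    """
    u1_starts = [m.start() for m in U1_SITE.finditer(rna)]
    pos = rna.find(PAS)
    while pos != -1:
        suppressed = any(0 <= pos - u1 <= window for u1 in u1_starts)
        if not suppressed:
            return pos
        pos = rna.find(PAS, pos + 1)
    return None


antisense_like = "CCAGCC" + "AAUAAA" + "UUUUUU"   # PAS with no upstream U1 site
sense_like = "GUAAGU" + "C" * 20 + "AAUAAA"       # PAS 26 nt downstream of a U1 site
print(first_active_pas(antisense_like))          # 6
print(first_active_pas(sense_like))              # None
```

The window size is the key free parameter in this sketch: shrinking it (for example, `window=10`) lets the PAS in `sense_like` escape suppression.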
KEY CONCEPTS OF SECTION 10.1
Processing of Eukaryotic Pre-mRNA
• In the nucleus of eukaryotic cells, pre-mRNAs are associated with hnRNP proteins and processed by 5′ capping, 3′ cleavage and polyadenylation, and splicing before being transported to the cytoplasm (see Figure 10-2).
• Shortly after transcription initiation, capping enzymes associate with the carboxy-terminal domain (CTD) of RNA polymerase II, which TFIIH phosphorylates at serine 5 of multiple heptapeptide repeats during transcription initiation. These enzymes then rapidly add the 5′ cap to the nascent transcript when it reaches a length of about 25 nucleotides. Other RNA-processing factors involved in RNA splicing and in 3′ cleavage and polyadenylation associate with the CTD when it is phosphorylated at serine 2 of the heptapeptide repeat, a modification that increases the rate of transcription elongation. Consequently, transcription does not proceed at a high rate until RNA-processing factors become associated with the CTD, where they are poised to interact with the nascent pre-mRNA as it emerges from the surface of the polymerase.
• Five different snRNPs interact via base pairing with one another and with pre-mRNA to form the spliceosome (see Figure 10-11). This very large ribonucleoprotein complex catalyzes two transesterification reactions that join two exons and remove the intron as a lariat structure, which is subsequently degraded (see Figure 10-8).
• SR proteins that bind to exonic splicing enhancer sequences in exons are critical in defining exons in the large pre-mRNAs of higher organisms. A network of interactions between SR proteins, snRNPs, and splicing factors forms a cross-exon recognition complex that specifies correct splice sites (see Figure 10-13).
• The snRNAs in the spliceosome are thought to have an overall tertiary structure similar to that of group II self-splicing introns.
• For long transcription units in higher organisms, splicing of exons usually begins while the pre-mRNA is still being formed. Cleavage and polyadenylation to form the 3′ end of the mRNA occur after the poly(A) cleavage site is transcribed.
• In most protein-coding genes, a conserved AAUAAA polyadenylation signal lies slightly upstream from a poly(A) site where cleavage and polyadenylation occur. A GU- or U-rich sequence downstream from the poly(A) site contributes to the efficiency of cleavage and polyadenylation.
• A multiprotein complex that includes poly(A) polymerase (PAP) carries out the cleavage and polyadenylation of a pre-mRNA. A nuclear poly(A)-binding protein, PABPN1, stimulates addition of A residues by PAP and stops their addition once the poly(A) tail reaches about 250 residues (see Figure 10-15).
• Excised introns and RNA downstream from the cleavage/polyadenylation site are degraded primarily by exosomes, multiprotein complexes that contain an internal 3′→5′ exonuclease. Exosomes also degrade improperly processed pre-mRNAs.
10.2 Regulation of Pre-mRNA Processing
Now that we’ve seen how pre-mRNAs are processed into mature, functional mRNAs, let’s consider how regulation of this process can contribute to gene control. Recall from Chapter 8 that higher eukaryotes have both simple and complex transcription units encoded in their DNA. The primary transcripts produced from simple transcription units contain one poly(A) site and exhibit only one pattern of RNA splicing, even if multiple introns are present; thus simple transcription units encode a single mRNA. In contrast, the primary transcripts produced from complex transcription units (which constitute about 95 percent of all human transcription units) can be processed in alternative ways to yield different mRNAs that encode distinct proteins (see Figure 8-3).
Alternative Splicing Generates Transcripts with Different Combinations of Exons
The discovery that a large fraction of transcription units in higher organisms encode alternatively spliced mRNAs, and that differently spliced mRNAs are expressed in different cell types, revealed that regulation of RNA splicing is an important gene-control mechanism in higher eukaryotes. Although many examples of cleavage at alternative poly(A) sites in pre-mRNAs are known, alternative splicing of different exons is the more common mechanism for expressing different proteins from one complex transcription unit. In Chapter 5, for example, we mentioned that fibroblasts produce one type of the extracellular protein fibronectin, whereas hepatocytes produce another type. Both fibronectin isoforms are encoded by the same transcription unit, but the transcript is spliced differently in the two cell types to yield two different mRNAs (see Figure 5-16). In other cases, alternative processing of the same transcript may occur simultaneously in the same cell type in response to different developmental or environmental signals. We first discuss one of the best-understood examples of regulated RNA processing, then briefly consider the consequences of RNA splicing in the development of the nervous system.
A Cascade of Regulated RNA Splicing Controls Drosophila Sexual Differentiation
One of the earliest examples of regulated alternative splicing of pre-mRNA came from studies of sexual differentiation in Drosophila. The genes required for normal Drosophila sexual differentiation were first characterized by isolating Drosophila mutants defective in the process. When the proteins encoded by the wild-type genes were characterized biochemically, two of them were found to regulate a cascade of alternative RNA splicing in Drosophila embryos. More recent research has provided insight into how these proteins regulate RNA processing and ultimately lead to the creation of two different sex-specific transcriptional repressors that suppress the development of characteristics of the opposite sex. The Sex-lethal (Sxl) protein, encoded by the sex-lethal gene, is the first protein to act in the cascade (Figure 10-18). The Sxl protein is present only in female embryos. Early in development, the sex-lethal gene is transcribed from a promoter that functions only in female embryos. Later in development, this female-specific promoter is shut off, and another promoter for sex-lethal becomes active in both male and female embryos. In male embryos, however, in the absence of early Sxl protein, exon 2 of the sex-lethal pre-mRNA is spliced to exon 3 to produce an mRNA that contains an early stop codon. The net result is that male embryos produce no functional Sxl protein either early or later in development.
In contrast, the Sxl protein expressed in early female embryos regulates splicing of the sex-lethal pre-mRNA so that a functional sex-lethal mRNA is produced (Figure 10-18a). Sxl accomplishes this by binding to a sequence in the pre-mRNA near the 3′ end of the intron between exon 2 and exon 3, thereby blocking the proper association of U2AF and U2 snRNP with the adjacent 3′ splice site used in males (see Figure 10-11). As a consequence, the U1 snRNP bound to the 5′ end of the intron between exons 2 and 3 assembles into a spliceosome with U2 snRNP bound to the branch point at the 3′ end of the intron between exons 3 and 4, leading to the splicing of exon 2 to exon 4 and the skipping of exon 3. The binding site for Sxl in the sex-lethal pre-mRNA is called an intronic splicing silencer because of its location in an intron and its function in blocking, or “silencing,” the use of a splice site. The resulting female-specific sex-lethal mRNA is translated into functional Sxl protein, which reinforces its own expression in female embryos by continuing to cause skipping of exon 3. The absence of Sxl protein in male embryos allows the inclusion of exon 3 and, consequently, of the stop codon near the 5′ end of exon 3 that prevents translation of functional Sxl protein (see Figure 10-18a).

[Figure 10-18 artwork: (a) sxl, (b) tra, and (c) dsx pre-mRNAs and their female- and male-specific mRNAs, with Sxl, Tra, Rbp1 + Tra2, and Dsx proteins indicated.]

FIGURE 10-18 A cascade of regulated splicing controls sex determination in Drosophila embryos. For clarity, only the exons (boxes) and introns (black lines) where regulated splicing occurs are shown. Splicing is indicated by red dashed lines above (female) and blue dashed lines below (male) the pre-mRNAs. Vertical red lines in exons indicate in-frame stop codons, which prevent synthesis of functional protein. Only female embryos produce functional Sxl protein, which represses splicing between exons 2 and 3 in sxl pre-mRNA (a) and between exons 1 and 2 in tra pre-mRNA (b). (c) In contrast, the cooperative binding of Tra protein and two SR proteins, Rbp1 and Tra2, activates splicing between exons 3 and 4 and cleavage/polyadenylation (An) at the 3′ end of exon 4 in dsx pre-mRNA in female embryos. In male embryos, which lack functional Tra, the SR proteins do not bind to exon 4, and consequently exon 3 is spliced to exon 5. The distinct Dsx proteins produced in female and male embryos as the result of this cascade of regulated splicing repress transcription of genes required for sexual differentiation of the opposite sex. See M. J. Moore et al., 1993, in R. Gesteland and J. Atkins, eds., The RNA World, Cold Spring Harbor Press, pp. 303–357.

Sxl protein also regulates alternative splicing of the pre-mRNA transcribed from the transformer gene (Figure 10-18b).
In male embryos, in which no Sxl is expressed, exon 1 is spliced to exon 2, which contains a stop codon that prevents synthesis of a functional Transformer (Tra) protein. In female embryos, however, binding of Sxl protein to an intronic splicing silencer at the 3′ end of the intron between exons 1 and 2 blocks binding of U2AF at this site. The interaction of Sxl with transformer pre-mRNA is mediated by two adjacent RRM domains in the protein (see Figure 10-5). When Sxl is bound, U2AF binds to a lower-affinity site farther 3′ in the pre-mRNA; as a result, exon 1 is spliced to this alternative 3′ splice site, causing skipping of exon 2 with its stop codon. The resulting female-specific transformer mRNA, which contains additional constitutively spliced exons, is translated into functional Tra protein.
Finally, Tra protein regulates the alternative processing of pre-mRNA transcribed from the doublesex (dsx) gene (Figure 10-18c). In female embryos, a complex of Tra and two constitutively expressed SR proteins, Rbp1 and Tra2, directs the splicing of exon 3 to exon 4 and also promotes cleavage/polyadenylation at the alternative poly(A) site at the 3′ end of exon 4, leading to a short, female-specific version of the Dsx protein. In male embryos, which produce no Tra protein, exon 4 is skipped, so that exon 3 is spliced to exon 5. Exon 5 is constitutively spliced to exon 6, which is polyadenylated at its 3′ end, leading to a longer, male-specific version of the Dsx protein. The RNA sequence to which Tra binds in exon 4 is called an exonic splicing enhancer because it enhances splicing at a nearby splice site.
As a result of the cascade of regulated RNA processing depicted in Figure 10-18, different Dsx proteins are expressed in male and female embryos. The two proteins are transcription factors that share the N-terminal sequence encoded in exons 1–3, including a common DNA-binding domain, but have different C-terminal sequences, encoded by exon 4 in females and exon 5 plus additional downstream exons in males. The unique C-terminal end of the female protein functions as a strong activation domain, while the C-terminal end of the male protein is a strong repression domain. Consequently, the female Dsx protein activates genes with binding sites for the transcription factor, including genes that induce development of female characteristics, while the male Dsx protein represses the same target genes. Figure 10-19 illustrates how the Tra/Tra2/Rbp1 complex is thought to interact with doublesex pre-mRNA. Rbp1 and Tra2 are SR proteins, but they do not interact with exon 4 in the absence of the Tra protein. The interaction of the Tra protein with Rbp1 and Tra2 results in the cooperative binding of all three proteins to six exonic splicing enhancers in exon 4. The bound Tra2 and Rbp1 proteins then promote the binding of U2AF and the U2 snRNP to the 3′ end of the intron between exons 3 and 4, just as other SR proteins do for constitutively spliced exons (see Figure 10-13).
The Tra/Tra2/Rbp1 complexes also enhance binding of the cleavage/polyadenylation complex to the 3′ end of exon 4 because the U2 snRNP plus associated proteins bound to a 3′ splice site enhance binding of cleavage/polyadenylation factors (see Figure 10-15) to an appropriately spaced polyadenylation signal through cooperative binding interactions.
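The logic of the Sxl → Tra → Dsx cascade can be summarized as a small Boolean model. The Python sketch below is a hypothetical simplification: it tracks only which regulated exons end up in each mRNA, with exon numbers following Figure 10-18. The tra "exon 3" stands in for the downstream, constitutively spliced exons, and splice-site competition, protein levels, and timing are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CascadeResult:
    sxl_mrna_exons: tuple[int, ...]   # regulated sxl exons kept in the mRNA
    tra_functional: bool
    dsx_mrna_exons: tuple[int, ...]
    dsx_isoform: str


def splice(exons: tuple[int, ...], skipped: set[int]) -> tuple[int, ...]:
    """Join exons in order, leaving out any exon whose inclusion is blocked."""
    return tuple(e for e in exons if e not in skipped)


def sex_determination_cascade(female: bool) -> CascadeResult:
    # The early sxl promoter works only in female embryos, so only they start with Sxl.
    early_sxl = female
    # Sxl binds an intronic splicing silencer, blocking the 3' splice site of exon 3.
    sxl_exons = splice((2, 3, 4), {3} if early_sxl else set())
    # Exon 3 carries an in-frame stop codon: Sxl is functional only when exon 3 is skipped,
    # and functional Sxl keeps enforcing that choice (autoregulation).
    sxl_functional = 3 not in sxl_exons
    # Sxl also blocks U2AF at the tra exon-2 3' splice site, so exon 2 and its stop codon
    # are skipped. "3" stands in for the downstream, constitutive tra exons (simplification).
    tra_exons = splice((1, 2, 3), {2} if sxl_functional else set())
    tra_functional = 2 not in tra_exons
    if tra_functional:
        # Tra/Tra2/Rbp1 on exon 4 enhancers: exon 3 -> exon 4, poly(A) at the end of exon 4.
        return CascadeResult(sxl_exons, True, (1, 2, 3, 4),
                             "female Dsx (C-terminal activation domain)")
    # No Tra: exon 4 is skipped, exon 3 -> exon 5 -> exon 6, poly(A) at the end of exon 6.
    return CascadeResult(sxl_exons, False, (1, 2, 3, 5, 6),
                         "male Dsx (C-terminal repression domain)")


for female in (True, False):
    print(sex_determination_cascade(female))
```

The single input, presence of early Sxl, decides every downstream splicing choice, which is the essential feature of a regulatory cascade.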
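[Figure 10-19 artwork: Tra, Rbp1, and Tra2 bound to exon 4 of dsx pre-mRNA, with alternative polyadenylation (An) sites after exon 4 and exon 6.]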
FIGURE 10-19 Model of splicing activation by Tra protein and the SR proteins Rbp1 and Tra2. In female Drosophila embryos, splicing of exons 3 and 4 in dsx pre-mRNA is activated by the binding of Tra/Tra2/Rbp1 complexes to six exonic splicing enhancers in exon 4. Because Rbp1 and Tra2 cannot bind to the pre-mRNA in the absence of Tra, exon 4 is skipped in male embryos. See the text for discussion. An = polyadenylation. See T. Maniatis and B. Tasic, 2002, Nature 418:236.
Splicing Repressors and Activators Control Splicing at Alternative Sites
As is evident from Figure 10-18, the Drosophila Sxl protein and Tra protein have opposite effects: Sxl prevents splicing, causing exons to be skipped, whereas Tra promotes splicing. The action of similar proteins may explain the cell-type-specific expression of fibronectin isoforms in humans (see Figure 5-16). For instance, an Sxl-like splicing repressor expressed in hepatocytes might bind to splice sites for the EIIIA and EIIIB exons in the fibronectin pre-mRNA, causing them to be skipped during RNA splicing. Alternatively, a Tra-like splicing activator expressed in fibroblasts might activate the splice sites associated with those exons, leading to their inclusion in the mature mRNA. Experimental examination of some systems has revealed that the inclusion of an exon in some cell types and its skipping in other cell types result from the combined influence of several splicing repressors (usually hnRNP proteins) and enhancers (usually SR proteins). RNA binding sites for repressors can also occur in exons, where they are called exonic splicing silencers, and binding sites for splicing activators can also occur in introns, where they are called intronic splicing enhancers.
Alternative splicing of exons is especially common in the nervous system, where it generates multiple isoforms of many proteins required for neuronal development and function in both vertebrates and invertebrates. The primary transcripts of the genes encoding these proteins often show complex splicing patterns that can generate several different mRNAs, which are expressed in different anatomic locations within the central nervous system. Here we consider two remarkable examples that illustrate the critical role of this process in neural function.
Expression of K+-Channel Proteins in Vertebrate Hair Cells
In the inner ear of vertebrates, individual hair cells, which are ciliated neurons, respond most strongly to a specific frequency of sound. Cells tuned to low frequencies (~50 Hz) are found at one end of the tubular cochlea that makes up the inner ear; cells responding to high frequencies (~5000 Hz) are found at the other end (Figure 10-20a). Cells in between the two ends respond to a gradient of frequencies between these extremes. One component in the tuning of hair cells in reptiles and birds is the opening of K+ ion channels in response to increased intracellular Ca2+ concentrations. The Ca2+ concentration at which the channel opens determines the frequency with which the membrane potential oscillates and hence the frequency to which the cell is tuned. The gene encoding this Ca2+-activated K+ channel is expressed as multiple, alternatively spliced mRNAs, which encode proteins that open at different Ca2+ concentrations. Hair cells with different response frequencies express different isoforms of the channel protein depending on their position along the length of the cochlea. The sequence variation in the protein is very complex: there are at least eight regions in the mRNA where one of several alternative exons is utilized, permitting the expression of 576 possible isoforms (Figure 10-20b).
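The isoform counts quoted in this section come from multiplying the number of alternatives at each variable region. The per-region counts for the K+ channel are not given here, so the Python sketch below instead uses the Drosophila Dscam gene discussed later in this section, whose four clusters of 12, 48, 33, and 2 alternative exons appear in Figure 10-21. The function name is illustrative, and the product is an upper bound that assumes every combination is allowed.

```python
from math import prod


def isoform_count(alternatives_per_region: list[int]) -> int:
    """Distinct mRNAs possible if each variable region independently includes
    exactly one of its alternative exons (an upper bound: assumes every
    combination is allowed)."""
    return prod(alternatives_per_region)


dscam_clusters = [12, 48, 33, 2]
print(sum(dscam_clusters))            # 95 alternative exons in total
print(isoform_count(dscam_clusters))  # 38016 possible isoforms
```

The contrast is the point: the number of alternative exons grows additively (95), while the number of possible mRNAs grows multiplicatively (38,016).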
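[Figure 10-20 artwork: (a) chicken cochlea with an apical hair cell (50 Hz), a basal hair cell (5000 Hz), and the auditory nerve; (b) Ca2+-activated K+ channel with transmembrane segments S0–S6, cytosolic regions S7–S10, and alternatively spliced regions numbered 1–8.]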
FIGURE 10-20 Role of alternative splicing in the perception of sounds of different frequencies. (a) The chicken cochlea, a 5-mm-long tube, contains an epithelium of auditory hair cells that are tuned to a gradient of vibrational frequencies from 50 Hz at the apical end (left) to 5000 Hz at the basal end (right). (b) The Ca2+-activated K+ channel contains seven transmembrane α helices (S0–S6), which associate to form the channel. The cytosolic domain, which includes four hydrophobic regions (S7–S10), regulates opening of the channel in response to Ca2+. Isoforms of the channel, encoded by alternatively spliced mRNAs produced from the same primary transcript, open at different Ca2+ concentrations and thus respond to different frequencies. Red numbers refer to regions where alternative splicing produces different amino acid sequences in the various isoforms. See K. P. Rosenblatt et al., 1997, Neuron 19:1061.
PCR analysis of mRNAs from individual hair cells has shown that each hair cell expresses a mixture of different K+-channel mRNAs, with different isoforms predominating in different cells according to their position along the cochlea. This remarkable arrangement suggests that splicing of the K+-channel pre-mRNA is regulated in response to extracellular signals that inform the cell of its position along the cochlea. Other studies have demonstrated that splicing at one of the alternative splice sites in the Ca2+-activated K+-channel pre-mRNA in the rat is suppressed when a specific protein kinase is activated by neuron depolarization in response to synaptic activity from interacting neurons. This observation raises the possibility that a splicing repressor specific for this splice site may be activated when it is phosphorylated by this protein kinase, whose activity in turn is regulated by synaptic activity. Since hnRNP and SR proteins are extensively modified by phosphorylation and other post-translational modifications, it seems likely that complex regulation of alternative RNA splicing through post-translational modifications of splicing factors plays a significant role in modulating neuron function.
Many genes similar to the one that encodes the cochlear K+ channel have been observed in vertebrate neurons; in these cases, alternatively spliced mRNAs co-expressed from a specific gene in one type of neuron are expressed at different relative concentrations in different regions of the central nervous system.
Expansions in the number of microsatellite repeats within the transcribed regions of genes expressed in neurons can alter the relative concentrations of alternatively spliced mRNAs transcribed from multiple genes. In Chapter 8, we discussed how backward slippage during DNA replication can lead to expansion of a microsatellite repeat (see Figure 8-5). At least 14 different types of neurological diseases result from expansion of microsatellite regions within transcription units expressed in neurons. The long stretches of repeated simple sequences in the nuclear pre-mRNAs of these neurons cause abnormalities in the relative concentrations of alternatively spliced mRNAs. For example, the most common of these diseases, myotonic dystrophy, results from expanded CUG repeats in one transcript in some patients, or from expanded CCUG repeats in a different transcript in other patients. When these repeats expand to 10 or more times their normal number, the functions of two hnRNP proteins that bind to these repeated sequences become abnormal. The abnormalities probably arise because the hnRNPs are bound by the abnormally high concentrations of the repeats in the nuclei of neurons in these patients and cannot associate with other pre-mRNAs. This sequestration of the hnRNPs alters the rate of splicing at different alternative splice sites in multiple pre-mRNAs that are normally regulated by these hnRNP proteins. Because proper regulation of alternative splicing is so important for normal neuronal function, multiple human neurological disorders are associated with abnormalities in the function of nuclear RNA-binding proteins and with the expansion of microsatellite repeats that generate binding sites for splicing factors (Table 10-2). ■
TABLE 10-2 Neurological Disorders with Links to Abnormalities in Alternative RNA Splicing
Disease
Link to Alternative Splicing
Ataxia telangiectasia
Point mutations within the ATM gene cause aberrant splicing of ATM transcripts
Fascioscapulohumoral dystrophy (FSHD)
Loss of FRG1, a nuclear RNA-binding protein, leads to altered splicing of many pre-mRNAs
Fragile-X-associated tremor/ataxia syndrome (FXTAS)
Premutation CGG repeat expansions in the FMR1 gene result in the sequestration of RNA-binding splicing factors
Frontotemporal dementia with Parkinsonism linked to chromosome 17 (FTDP-17)
Point mutations within the MAPT gene result in altered levels of MAPT transcripts containing the alternatively spliced exon 10
Duchenne muscular dystrophy; Becker’s muscular dystrophy
Altered splicing of dystrophin transcripts due to deletions and mutations in the dystrophin gene
MYOTONIC DYSTROPHY (DM) DM1
CUG expansion in the 3′ UTR of DMPK results in the misregulation of the MBNL splicing factor and consequent missplicing of MBNL target pre-mRNAs
DM2
CCUG expansion in ZNF9 intron leading to misregulation of the CUG-BP1 splicing factor and missplicing of CUG-BP1 target pre-mRNAs
Neurofibromatosis type 1 (NF1)
Numerous mutations in the NF1 gene, including mutations that result in aberrant splicing
PARANEOPLASTIC NEUROLOGIC DISORDERS (PND) Paraneoplastic opsoclonus-myoclonus-ataxia (POMA)
Autoimmune antibodies recognize the Nova family of neuronspecific RNA-binding splicing factors; Nova knockout mice phenocopy POMA
Hu syndrome (PEM/SN; paraneoplastic encephalomyelitis/sensory neuronopathy)
Autoimmune antibodies recognize the Hu family of RNA-binding factors related to the Drosophila splicing factor ELAV
Prader-Willi syndrome
Loss of a splicing regulatory snoRNA that is complementary to a splicing silencer element implicated in regulating the alternative splicing of serotonin receptor 5-HT2cR transcripts (Continued)
CHAPTER 10 Post-transcriptional Gene Control
TABLE 10-2 (continued)

| Disease | Link to Alternative Splicing |
|---|---|
| Psychiatric disorders | Accumulation of aberrantly spliced transcripts in schizophrenic patients |
| Retinitis pigmentosa | Mutation of genes encoding U snRNP-associated proteins |
| Rett syndrome | Mutation of the gene encoding MeCP2, which interacts with the YB-1 RNA-binding protein; mouse model of Rett syndrome shows aberrant pre-mRNA splicing |
| Spinal muscular atrophy | Deletion/mutation of the SMN1 gene and the loss of a splicing regulatory element in SMN2 result in insufficient levels of SMN, which is involved in snRNP biogenesis |
| Spinocerebellar ataxias (SCA2, SCA8, SCA10, and SCA12) | Possible RNA gain of function due to triplet repeat expansions; direct and indirect interactions with RNA-binding splicing factors |

Source: Republished by permission of Elsevier, from Licatalosi, D. and Darnell, R., “Splicing regulation in neurologic disease,” Neuron, 2006, 52:1, 93–101. Permission conveyed through the Copyright Clearance Center, Inc.
Expression of Dscam Isoforms in Drosophila Retinal Neurons The most extreme example of regulated alternative RNA processing yet uncovered occurs in expression of the Dscam gene in Drosophila. Mutations in this gene interfere with the normal synaptic connections made between retinal axons and dendrites during fly development. Analysis of the Dscam gene showed that it contains four clusters of alternative exons, and exactly one exon from each cluster is included in the final mature mRNA. These clusters contain a total of 95 alternative exons (Figure 10-21), which can generate 38,016 possible alternatively spliced isoforms! Drosophila mutants carrying a version of the gene that can be spliced in only about 22,000 different ways have specific defects in connectivity between neurons. These results indicate that expression of most of the possible Dscam isoforms through regulated RNA splicing helps to specify the tens of millions of different specific
[Figure 10-21 diagram: genomic DNA, mRNA, and protein; number of alternatives per cluster: 12 (Ig2), 48 (Ig3), 33 (Ig7), 2 (TM); the homophilic binding region is marked on the protein]
FIGURE 10-21 The Drosophila Dscam gene is processed into a vast number of alternative isoforms. Dscam encodes a cell-surface protein on neurons. The protein (bottom) is composed of ten different immunoglobulin (Ig) domains (ovals), six different fibronectin type III domains (rectangles), one transmembrane domain (yellow), and a C-terminal cytoplasmic domain (dark gray). The fully processed mRNA is shown as rectangles representing each exon, with the length of the rectangle corresponding to the length of the exons, and a green circle representing the 5′ cap. Each mRNA contains one of the 12 Ig2 exons shown in light blue (top), one of the 48 Ig3 exons shown in green, one of the 33 Ig7 exons shown in dark blue, and one of the 2 transmembrane exons shown in yellow. The exons shown in pink are spliced into each of the messages. Thus alternative splicing can generate 12 × 48 × 33 × 2 = 38,016 possible isoforms. See M. R. Sawaya et al., 2008, Cell 134:1007.
synaptic connections between neurons in the Drosophila brain. In other words, the correct wiring of neurons in the brain requires regulated RNA splicing.
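The isoform count in Figure 10-21 follows from the multiplication rule: because each mature Dscam mRNA includes exactly one exon from each of the four alternative clusters, the number of possible mRNAs is the product of the cluster sizes, and the number of alternative exons is their sum. The Python sketch below only checks that arithmetic; the dictionary and helper names are illustrative, not taken from any published tool.

```python
from math import prod

# Alternative exon clusters of Drosophila Dscam, as listed in Figure 10-21.
# Each mature mRNA includes exactly one exon from each cluster.
DSCAM_CLUSTERS = {"Ig2": 12, "Ig3": 48, "Ig7": 33, "TM": 2}

def isoform_count(clusters: dict[str, int]) -> int:
    """Distinct mRNAs when one exon is chosen independently from every cluster."""
    return prod(clusters.values())

def alternative_exon_count(clusters: dict[str, int]) -> int:
    """Total number of alternative exons across all clusters."""
    return sum(clusters.values())

print(isoform_count(DSCAM_CLUSTERS))           # 38016
print(alternative_exon_count(DSCAM_CLUSTERS))  # 95
```

The product assumes every combination is possible; it says nothing about how many isoforms a given neuron actually expresses.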
RNA Editing Alters the Sequences of Some Pre-mRNAs
In the mid-1980s, sequencing of numerous cDNA clones and corresponding genomic DNAs from multiple organisms led to the unexpected discovery of another type of pre-mRNA processing. In this type of processing, called RNA editing, the sequence of a pre-mRNA is altered; as a result, the sequence of a mature mRNA differs from that of the exons encoding it in genomic DNA. RNA editing is widespread in the mitochondria of protozoans and plants as well as in chloroplasts. In the mitochondria of certain pathogenic trypanosomes, more than half the sequence of some mRNAs is altered from the sequence of the corresponding primary transcripts. Additions and deletions of specific numbers of Us follow templates provided by base-paired short “guide” RNAs. These guide RNAs are encoded by thousands of small circular DNA molecules interlocked with many fewer large DNA molecules. The reason for this baroque mechanism for encoding mitochondrial proteins in such protozoans is not clear. However, the complex processing enzymes that carry out this editing are essential to the parasite and do not exist in the cells of its human or other vertebrate hosts, so they represent potential drug targets. In higher eukaryotes, RNA editing is much rarer, and thus far, only single-base changes have been observed. Such minor editing, however, turns out to have significant functional consequences in some cases. An important example of RNA editing in mammals involves the APOB gene, which encodes two alternative forms of a serum protein that is central to the uptake and transport of cholesterol. Consequently, it is important in the pathogenic processes that lead to atherosclerosis, the arterial disease that is the major cause of death in the developed world. The APOB gene encodes both the serum protein apolipoprotein B-100 (apoB-100), which is expressed in hepatocytes, the major cell type in the liver, and apoB-48, which is expressed in intestinal epithelial cells.
The 240-kDa apoB-48 corresponds to the N-terminal region of the 500-kDa
[Figure 10-22 diagram: APOB gene with the CAA codon in exon 26; APOB mRNA in liver (CAA) and in intestine (UAA); ApoB proteins, with apoB-100 drawn from NH2 to COOH]
FIGURE 10-22 RNA editing of APOB pre-mRNA. The APOB mRNA produced in the liver has the same sequence as the exons in the primary transcript. This mRNA is translated into apoB-100, which has two functional domains: an N-terminal domain (green) that associates with lipids and a C-terminal domain (orange) that binds to LDL receptors on
apoB-100. Both apoB proteins are components of the large lipoprotein complexes we described in Chapter 7, which transport lipids in the serum. However, only low-density lipoprotein (LDL) complexes, which contain apoB-100 on their surface, deliver cholesterol to body tissues by binding to the LDL receptor that is present on all cells (see Figures 14-27 and 14-29). The cell-type-specific expression of the two forms of apoB results from editing of APOB pre-mRNA that changes the nucleotide at position 6666 from a C to a U. This alteration, which occurs only in intestinal cells, converts a CAA codon for glutamine to a UAA stop codon, leading to synthesis of the shorter apoB-48 (Figure 10-22). Studies with the partially purified enzyme that performs the post-transcriptional deamination of C6666 to U (see Figure 2-17) show that it can recognize and edit an RNA as short as 26 nucleotides containing the sequence surrounding C6666 in the APOB primary transcript.
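The effect of a single C-to-U edit on the encoded protein can be illustrated with a toy translation scan. This minimal Python sketch uses a made-up 18-nucleotide mRNA (not the APOB sequence) and hypothetical helper names: it deaminates the C of a CAA glutamine codon and counts the sense codons read before the first in-frame stop. It ignores the sequence context that the real editing enzyme recognizes.

```python
# Standard-code stop codons, written for mRNA (U instead of T).
STOP_CODONS = {"UAA", "UAG", "UGA"}

def edit_c_to_u(mrna: str, position: int) -> str:
    """Return a copy of mrna with the C at 1-based `position` deaminated to U."""
    idx = position - 1
    if mrna[idx] != "C":
        raise ValueError(f"position {position} is {mrna[idx]!r}, not C")
    return mrna[:idx] + "U" + mrna[idx + 1:]

def codons_before_stop(mrna: str, start: int = 0) -> int:
    """Count sense codons read from 0-based `start` until the first in-frame stop."""
    n = 0
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return n
        n += 1
    return n

# Hypothetical toy mRNA: AUG GCU CAA GGU UGG UAA (CAA = glutamine at nt 7-9).
toy = "AUGGCUCAAGGUUGGUAA"
print(codons_before_stop(toy))                  # 5
print(codons_before_stop(edit_c_to_u(toy, 7)))  # 2 (CAA became UAA)
```

As with apoB-48, the edited message still encodes the N-terminal part of the protein; only the C-terminal portion is lost.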
KEY CONCEPTS OF SECTION 10.2
Regulation of Pre-mRNA Processing
• Because of alternative splicing of primary transcripts, the use of alternative promoters, and cleavage at different poly(A) sites, different mRNAs may be expressed from the same gene in different cell types or at different developmental stages (see Figure 10-18).
• Alternative splicing can be regulated by RNA-binding proteins that bind to specific sequences near regulated splice sites. Splicing repressors may sterically block the binding of splicing factors to specific sites in pre-mRNAs or inhibit their function. Splicing activators enhance splicing by interacting with splicing factors, thus promoting their association with a regulated splice site. The RNA sequences bound by splicing repressors are called intronic or exonic splicing silencers, depending on their location in an intron or exon.
[Figure 10-22, continued: apoB-48 drawn from NH2 to COOH; residue numbers 1, 2152, and 4536 are marked on the proteins]
cell membranes. In the APOB mRNA produced in the intestine, however, the CAA codon in exon 26 is edited to a UAA stop codon. As a result, intestinal cells produce apoB-48, which corresponds to the N-terminal domain of apoB-100. See P. Hodges and J. Scott, 1992, Trends Biochem. Sci. 17:77.
RNA sequences bound by splicing activators are called intronic or exonic splicing enhancers.
• In RNA editing, the nucleotide sequence of a pre-mRNA is altered in the nucleus. In vertebrates, this process is relatively rare, and only single-base changes, such as the C-to-U change in APOB pre-mRNA, have been observed, but those changes can have important consequences by altering the amino acid encoded by an edited codon or converting it into a stop codon (see Figure 10-22).
10.3 Transport of mRNA Across the Nuclear Envelope
Fully processed mRNAs in the nucleus remain bound by hnRNP proteins in complexes referred to as nuclear mRNPs. Before an mRNA can be translated into its encoded protein, it must be exported from the nucleus into the cytoplasm. The nuclear envelope is a double membrane that separates the nucleus from the cytoplasm (see Figure 1-12). Like the plasma membrane surrounding a cell, each nuclear membrane consists of a water-impermeable phospholipid bilayer and multiple associated proteins. mRNPs and other macromolecules, including tRNAs and ribosomal subunits, traverse the nuclear envelope through nuclear pore complexes (NPCs). This section focuses on the export of mRNPs through NPCs and the mechanisms that allow some level of regulation of this step. Transport of mRNPs, proteins, and other cargoes through NPCs is discussed in greater detail in Chapter 13. Embedded in the nuclear envelope, NPCs are cylindrical in shape with a diameter of about 30 nm. Proteins and RNPs larger than 40–60 kDa must be selectively transported across the nuclear envelope with the assistance of transporter proteins that bind them and also interact reversibly with components in the central channel of the NPC. mRNPs are transported through the NPC by the mRNP exporter,
a heterodimer consisting of a large subunit, called nuclear export factor 1 (NXF1), and a small subunit, nuclear export transporter 1 (NXT1). NXF1 binds nuclear mRNPs through associations with both RNA and proteins in the mRNP complex. One of the most important of these proteins is REF (RNA export factor), a component of the exon-junction complexes discussed earlier, which is bound approximately 20 nucleotides 5′ to each exon-exon junction (Figure 10-23). The mRNP exporter also associates with SR proteins bound to exonic splicing enhancers. Thus SR proteins associated with exons function to direct both the splicing of pre-mRNAs and the export of fully processed mRNAs through NPCs to the cytoplasm. mRNPs are probably bound along their length by multiple mRNP exporters, which interact reversibly with unstructured protein domains that fill the NPC central channel (see Chapter 13). Protein filaments extend from the core NPC scaffold into the nucleoplasm, forming an NPC nuclear basket (see Figure 10-23). Other protein filaments extend from the cytoplasmic face of the NPC into the cytoplasm. Both sets of filaments assist in mRNP export. Gle2, an adapter protein that reversibly binds both NXF1 and a protein in the nuclear basket, brings nuclear mRNPs to the NPC in preparation for export. A protein in the cytoplasmic filaments of the NPC binds an RNA helicase (Dbp5) that functions in
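[Figure 10-23 diagram: in the nucleus, the mRNP carries CBC on the 5′ cap, REF, the NXF1/NXT1 exporter, and PABPN1 on the poly(A) tail (A~12 segments); after passage through the NPC, the cytoplasmic mRNP carries eIF4E and PABPC1]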
FIGURE 10-23 Remodeling of mRNPs during nuclear export. Some mRNP proteins (rectangles) dissociate from nuclear mRNP complexes before their export through an NPC. Others (ovals) are exported through the NPC with the mRNP, but dissociate from it in the cytoplasm and are shuttled back into the nucleus through an NPC. In the cytoplasm, translation initiation factor eIF4E replaces CBC bound to the 5′ cap, and PABPC1 replaces PABPN1.
the dissociation of NXF1/NXT1 and other hnRNP proteins from the mRNP as it reaches the cytoplasm. In a process called mRNP remodeling, the proteins associated with an mRNA in the nuclear mRNP are exchanged for a different set of proteins as the mRNP is transported through the NPC (see Figure 10-23). Some nuclear mRNP proteins dissociate early in transport, remaining in the nucleus to bind to nascent pre-mRNAs. Other nuclear mRNP proteins remain with the mRNP as it traverses the NPC and do not dissociate from the mRNP until the complex reaches the cytoplasm. Proteins in this category include the NXF1/NXT1 mRNP exporter, the nuclear cap-binding complex (CBC) bound to the 5′ cap, and PABPN1 bound to the poly(A) tail. These proteins dissociate from the mRNP on the cytoplasmic side of the NPC through the action of the Dbp5 RNA helicase that associates with the cytoplasmic NPC filaments, as discussed above. These proteins are then imported back into the nucleus, as described for other nuclear proteins in Chapter 13, where they can function in the export of another mRNP. In the cytoplasm, the cap-binding translation initiation factor eIF4E replaces the CBC bound to the 5′ cap of nuclear mRNPs (see Figure 5-23). In vertebrates, the nuclear poly(A)-binding protein PABPN1 is replaced with the cytoplasmic poly(A)-binding protein PABPC1 (so named to distinguish it from the nuclear PABPN1). Only a single PABP is found in budding yeast, in both the nucleus and the cytoplasm.
Phosphorylation and Dephosphorylation of SR Proteins Impose Directionality on mRNP Export Across the Nuclear Pore Complex
Studies of S. cerevisiae indicate that the direction of mRNP export from the nucleus into the cytoplasm is controlled by the phosphorylation and dephosphorylation of mRNP adapter proteins, such as REF, that assist in the binding of the NXF1/NXT1 mRNP exporter to mRNPs. In one case, a yeast SR protein (Npl3) functions as an adapter protein that promotes the binding of the yeast mRNP exporter (Figure 10-24). In its phosphorylated form, the SR protein initially binds to nascent pre-mRNA. When 3′ cleavage and polyadenylation are completed, the adapter protein is dephosphorylated by a specific nuclear protein phosphatase that is essential for mRNP export. Only the dephosphorylated adapter protein can bind the mRNP exporter, thereby coupling mRNP export to correct polyadenylation. This mechanism is one form of mRNA “quality control.” If the nascent mRNP is not correctly processed, it is not recognized by the phosphatase that dephosphorylates Npl3, and consequently, it is not bound by the mRNP exporter and is not exported from the nucleus. Instead, it is degraded by exosomes, the multiprotein complexes that degrade unprotected RNAs in the nucleus and cytoplasm (see Figures 10-1 and 10-16). Following export to the cytoplasm, the Npl3 SR protein is phosphorylated by a specific cytoplasmic protein kinase.
[Figure 10-24 diagram: RNA pol II, phosphorylated (P) and dephosphorylated Npl3, the Glc7 phosphatase, the NXF1/NXT1 exporter, the NPC, importin, and the Sky1 kinase, arranged across nucleoplasm and cytoplasm; steps 1 to 7 are described in the caption]
FIGURE 10-24 Reversible phosphorylation and direction of mRNP nuclear export. Step 1: The yeast SR protein Npl3 binds nascent pre-mRNAs in its phosphorylated form. Step 2: When polyadenylation has occurred successfully, the Glc7 nuclear phosphatase dephosphorylates Npl3, promoting the binding of the mRNP exporter, NXF1/NXT1. Step 3: The mRNP exporter allows diffusion of the mRNP complex through the central channel of the nuclear pore complex (NPC). Step 4: The cytoplasmic protein kinase Sky1 phosphorylates Npl3 in the cytoplasm, causing (step 5) dissociation of the phosphorylated Npl3 from the mRNP exporter, probably through the action of an RNA helicase associated with NPC cytoplasmic filaments. Step 6: The mRNP exporter and phosphorylated Npl3 are transported back into the nucleus through NPCs. Step 7: Transported mRNA is available for translation in the cytoplasm. See E. Izaurralde, 2004, Nat. Struct. Mol. Biol. 11:210–212; see also W. Gilbert and C. Guthrie, 2004, Mol. Cell 13:201–212.
This phosphorylation causes Npl3 to dissociate from the mRNP, along with the mRNP exporter. In this way, dephosphorylation of mRNP adapter proteins in the nucleus once RNA processing is complete, followed by their phosphorylation and resulting dissociation in the cytoplasm, produces a higher concentration of mRNP exporter–mRNP complexes in the nucleus, where they form, and a lower concentration of these complexes in the cytoplasm, where they dissociate. As a result, the direction of mRNP export may be driven by simple diffusion of the mRNP exporter–mRNP complex down its concentration gradient across the NPC, from high in the nucleus to low in the cytoplasm.
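The concentration-gradient argument can be made concrete with a toy two-compartment model. In this Python sketch (an illustration under simplifying assumptions, not a published model), exporter–mRNP complexes form only in the nucleus, cross the NPC by diffusion with no built-in directional bias, and are dissociated in the cytoplasm. All rate constants are hypothetical and in arbitrary units, and the equations are integrated with a simple Euler step.

```python
def simulate_export(k_form=1.0, p=0.5, k_diss=2.0, dt=0.01, steps=5000):
    """Euler integration of a toy two-compartment model of mRNP export.

    nuc, cyt: exporter-mRNP complexes in the nucleus and in the cytoplasm.
    k_form:   formation rate in the nucleus (Npl3 dephosphorylated after correct
              3' processing); k_form = 0 models an mRNP that fails quality control.
    p:        NPC permeability; the flux term is symmetric, so it has no direction.
    k_diss:   first-order dissociation in the cytoplasm (Sky1 phosphorylation).
    All parameter values are hypothetical.
    """
    nuc = cyt = released = 0.0
    for _ in range(steps):
        flux = p * (nuc - cyt)         # net diffusion, nucleus -> cytoplasm
        d_nuc = k_form - flux          # complexes form only in the nucleus
        d_cyt = flux - k_diss * cyt    # complexes fall apart only in the cytoplasm
        released += k_diss * cyt * dt  # mRNPs freed for translation
        nuc += d_nuc * dt
        cyt += d_cyt * dt
    return nuc, cyt, released

nuc, cyt, released = simulate_export()
print(round(nuc, 3), round(cyt, 3))  # 2.5 0.5
```

At steady state the cytoplasmic level settles at k_form/k_diss and the nuclear level exceeds it by k_form/p, so the net flux runs from nucleus to cytoplasm even though the NPC term itself is symmetric. Setting k_form to zero, as for an mRNP that is never dephosphorylated, leaves nothing to export.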
Balbiani Rings in Insect Larval Salivary Glands Allow Direct Visualization of mRNP Export Through NPCs
The larval salivary glands of the insect Chironomus tentans provide a good model system for electron microscopic studies of the formation of hnRNPs and their export through NPCs. In these larvae, genes in large chromosomal puffs called Balbiani rings are abundantly transcribed into nascent pre-mRNAs that associate with hnRNP proteins and are processed into coiled mRNPs with a final mRNA length of about 75 kb (Figure 10-25a, b). These giant mRNAs encode large
glue proteins that adhere the developing larva to a leaf. After processing of the pre-mRNA in Balbiani ring hnRNPs, the resulting mRNPs move through NPCs to the cytoplasm. Electron micrographs of sections of these cells show mRNPs that appear to uncoil during their passage through NPCs and then bind to ribosomes as they enter the cytoplasm. This uncoiling is probably a consequence of the remodeling of mRNPs as the result of phosphorylation of mRNP proteins by cytoplasmic kinases and the action of the RNA helicase associated with NPC cytoplasmic filaments, as discussed in the previous section. The observation that mRNPs become associated with ribosomes during transport indicates that the 5′ end leads the way through the NPC. Detailed electron microscopic studies of the transport of Balbiani ring mRNPs through nuclear pore complexes led to the model depicted in Figure 10-25c.
Pre-mRNAs in Spliceosomes Are Not Exported from the Nucleus
It is critical that only fully processed mature mRNAs be exported from the nucleus because translation of incompletely processed pre-mRNAs containing introns would produce defective proteins that might interfere with the functioning of the cell. To prevent this, pre-mRNAs associated with
FIGURE 10-25 Formation of heterogeneous ribonucleoprotein particles (hnRNPs) and export of mRNPs from the nucleus. (a) Model of a single chromatin transcription loop and assembly of Balbiani ring (BR) mRNP in Chironomus tentans. Nascent RNA transcripts produced from the template DNA rapidly associate with proteins, forming hnRNPs. The gradual increase in the size of the hnRNPs reflects the increasing length of RNA transcripts at greater distances from the transcription start site. The model was reconstructed from electron micrographs of serial thin sections of salivary gland cells. (b) Schematic diagram of the biogenesis of hnRNPs. Following processing of the pre-mRNA, the resulting ribonucleoprotein particle is referred to as an mRNP. (c) Model for the transport of BR mRNPs through the nuclear pore complex (NPC) based on electron microscopic studies. Note that the curved mRNPs appear to uncoil as they pass through NPCs. As the mRNA enters the cytoplasm, it rapidly associates with ribosomes, indicating that the 5′ end passes through the NPC first. Parts (b) and (c), see B. Daneholt, 1997, Cell 88:585. See also B. Daneholt, 2001, Proc. Natl. Acad. Sci. USA 98:7012.
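[Figure 10-25 panels (a) to (c): template DNA with growing hnRNPs; conversion of an hnRNP into an mRNP; an mRNP passing through an NPC in the nuclear envelope from nucleoplasm to cytoplasm as mRNA]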
snRNPs in spliceosomes are usually prevented from being transported to the cytoplasm. In one type of experiment demonstrating this restriction, a gene encoding a pre-mRNA with a single intron that is normally spliced out was mutated to introduce deviations from the consensus splice-site sequences. Mutation of either the 5′ or the 3′ invariant splice-site bases at the ends of the intron resulted in pre-mRNAs that were bound by snRNPs to form spliceosomes; however, RNA splicing was blocked, and the pre-mRNA was retained in the nucleus. In contrast, mutation of both the 5′ and 3′ splice sites in the same pre-mRNA resulted in export of the unspliced pre-mRNA, although less efficiently than for the spliced mRNA, probably because of the absence of an exon-junction complex. When both splice sites were mutated, the pre-mRNAs were not efficiently bound by snRNPs, and consequently, their export was not blocked. Studies in yeast have shown that a protein associated with the NPC nuclear basket is required to retain pre-mRNAs associated with snRNPs in the nucleus. If either this protein or the nuclear basket protein to which it binds is deleted, unspliced pre-mRNAs are exported. Consequently, these proteins prevent hnRNPs associated with snRNPs from traversing the NPC.
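The outcomes of the splice-site mutation experiments, together with the yeast nuclear-basket result, can be summarized as a small decision function. This Python sketch only tabulates the cases described above; the enum names and the `basket_retention` flag are hypothetical labels, not a mechanistic model of spliceosome assembly.

```python
from enum import Enum

class Fate(Enum):
    SPLICED_EXPORTED = "spliced mRNA exported (with exon-junction complex)"
    RETAINED = "unspliced pre-mRNA retained in the nucleus"
    UNSPLICED_EXPORTED = "unspliced pre-mRNA exported"

def pre_mrna_fate(five_prime_ok: bool, three_prime_ok: bool,
                  basket_retention: bool = True) -> Fate:
    """Summarize the splice-site mutation outcomes described in the text."""
    if five_prime_ok and three_prime_ok:
        return Fate.SPLICED_EXPORTED
    if five_prime_ok or three_prime_ok:
        # One intact site: snRNPs still assemble a spliceosome, but splicing is blocked.
        # Nuclear-basket retention proteins (shown in yeast) hold such pre-mRNAs back;
        # basket_retention=False models deleting them.
        return Fate.RETAINED if basket_retention else Fate.UNSPLICED_EXPORTED
    # Both sites mutated: snRNPs bind poorly, so nothing blocks export
    # (export is still less efficient than for a spliced mRNA).
    return Fate.UNSPLICED_EXPORTED

print(pre_mrna_fate(True, False).value)   # unspliced pre-mRNA retained in the nucleus
print(pre_mrna_fate(False, False).value)  # unspliced pre-mRNA exported
```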
[Part (a) republished with permission from Elsevier, from Ericsson, C. et al., “The ultrastructure of upstream and downstream regions of an active Balbiani ring gene,” Cell, 1989, 56(4): 631–9; courtesy of B. Daneholt. Permission conveyed through the Copyright Clearance Center, Inc.]
Many cases of thalassemia, an inherited disease that results in abnormally low levels of globin proteins, are due to mutations in globin-gene splice sites that decrease the efficiency of splicing but do not prevent association of the pre-mRNA with snRNPs. The resulting unspliced globin pre-mRNAs are retained in the nuclei of erythroid progenitors (see Figure 16-7) and are rapidly degraded. ■
HIV Rev Protein Regulates the Transport of Unspliced Viral mRNAs
As discussed earlier, transport of mRNPs containing mature, functional mRNAs through NPCs from the nucleus to the cytoplasm entails a complex mechanism that is crucial to gene expression (see Figures 10-23, 10-24, and 10-25). Regulation of this transport theoretically could provide another means of gene control, although it appears to be relatively rare. Indeed, the only known examples of regulated mRNA export occur during the cellular response to conditions (e.g., heat shock) that cause protein denaturation or during viral infection, when virus-induced alterations in nuclear export of mRNPs maximize viral replication. Here we describe the regulation of mRNP export mediated by a protein encoded by human immunodeficiency virus (HIV). HIV, which is a retrovirus, integrates a DNA copy of its RNA genome into the host-cell DNA (see Figure 5-48). The integrated viral DNA, or provirus, contains a single transcription unit, which is transcribed into a single primary transcript by cellular RNA polymerase II. The HIV transcript can be spliced in alternative ways to yield three classes of
mRNAs: a 9-kb unspliced mRNA; 4-kb mRNAs formed by removal of one intron; and 2-kb mRNAs formed by removal of two or more introns (Figure 10-26). After their synthesis in the host-cell nucleus, all three classes of HIV mRNAs are transported to the cytoplasm and translated into viral proteins; some of the 9-kb unspliced RNA is used as the viral genome in progeny virions that bud from the cell surface. Since the 9-kb and 4-kb HIV mRNAs contain splice sites, they can be viewed as incompletely spliced mRNAs. As discussed earlier, association of such incompletely spliced mRNAs with snRNPs in spliceosomes normally blocks their export from the nucleus. Thus HIV, as well as other retroviruses, must have some mechanism for overcoming this block, permitting export of the longer viral mRNAs. Some retroviruses have evolved an RNA sequence within their genome called the constitutive transport element (CTE), which binds to the NXF1/NXT1 mRNP exporter with high affinity. This strong interaction with the mRNP exporter allows export of unspliced retroviral RNA into the cytoplasm. HIV solved the problem differently. Studies with HIV mutants showed that transport of unspliced 9-kb and singly spliced 4-kb viral mRNAs from the nucleus to the cytoplasm requires the virus-encoded Rev protein. Subsequent biochemical experiments demonstrated that Rev binds to a specific Rev-response element (RRE) that is present in HIV RNA. In cells infected with HIV mutants lacking the RRE, unspliced and singly spliced viral mRNAs remain in the nucleus, demonstrating that the RRE is required for Rev-mediated stimulation of nuclear export. Early in an infection, before any Rev protein is synthesized, only multiply spliced 2-kb mRNAs that do not retain any splice
[Figure 10-26 diagram: the HIV provirus, containing the RRE, undergoes transcription and splicing to give nuclear 9-kb unspliced, 4-kb singly spliced, and 2-kb multiply spliced mRNAs; with Rev (+Rev) all three classes are transported to the cytoplasm for translation, whereas without Rev (−Rev) only the 2-kb mRNAs are; Rev protein is encoded by a 2-kb mRNA]
FIGURE 10-26 Transport of HIV mRNAs from the nucleus to the cytoplasm. The HIV genome, which contains several coding regions, is transcribed into a single 9-kb primary transcript. Several 4-kb mRNAs result from the splicing out of any one of several introns (dashed lines), and several 2-kb mRNAs result from the splicing out of two or more alternative
introns. After transport to the cytoplasm, these various RNA species are translated into different viral proteins. Rev protein, encoded by a 2-kb mRNA, interacts with the Rev-response element (RRE) in the unspliced and singly spliced mRNAs, stimulating their transport to the cytoplasm. See B. R. Cullen and M. H. Malim, 1991, Trends Biochem. Sci. 16:346.
sites can be exported. One of these alternatively spliced 2-kb mRNAs encodes Rev, which contains a leucine-rich nuclear-export signal that interacts with the transporter exportin 1 (see Chapter 13) rather than with the NXF1/NXT1 mRNP exporter. Translation of Rev in the cytoplasm, followed by its import into the nucleus, results in export of the larger unspliced and singly spliced HIV mRNAs through the NPC.
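The early-to-late switch in HIV mRNA export can be sketched as a simple positive-feedback loop: at first only fully spliced 2-kb mRNAs leave the nucleus, one of them encodes Rev, and once Rev is in the nucleus the RRE-containing 9-kb and 4-kb mRNAs are exported as well. In this Python sketch the discrete time steps, the Rev "units," and the threshold are hypothetical simplifications; the three mRNA classes follow Figure 10-26.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HivMrna:
    size_kb: int
    retains_splice_sites: bool  # incompletely spliced RNAs keep splice sites
    has_rre: bool               # Rev-response element

# The three classes of HIV mRNA described in the text.
HIV_MRNAS = {
    "unspliced": HivMrna(9, retains_splice_sites=True, has_rre=True),
    "singly_spliced": HivMrna(4, retains_splice_sites=True, has_rre=True),
    "multiply_spliced": HivMrna(2, retains_splice_sites=False, has_rre=False),
}

def is_exported(mrna: HivMrna, rev_in_nucleus: bool) -> bool:
    if not mrna.retains_splice_sites:
        return True  # fully processed: ordinary NXF1/NXT1-dependent export
    # Splice-site-containing RNAs are retained unless nuclear Rev binds their RRE
    # and routes them out through exportin 1.
    return rev_in_nucleus and mrna.has_rre

def infection_timeline(steps: int = 4, rev_threshold: int = 2) -> list[list[str]]:
    """Which mRNA classes are exported at each hypothetical time step."""
    rev = 0  # arbitrary units of Rev protein that have reached the nucleus
    history = []
    for _ in range(steps):
        exported = sorted(name for name, m in HIV_MRNAS.items()
                          if is_exported(m, rev >= rev_threshold))
        history.append(exported)
        if "multiply_spliced" in exported:
            rev += 1  # one of the exported 2-kb mRNAs encodes Rev
    return history

for t, classes in enumerate(infection_timeline()):
    print(t, classes)
```

With the defaults, the first two steps export only the multiply spliced class; from the third step on, all three classes are exported.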
KEY CONCEPTS OF SECTION 10.3
Transport of mRNA Across the Nuclear Envelope
• Most mRNPs are exported from the nucleus by a heterodimeric mRNP exporter that interacts with unstructured protein domains that fill the central channel of the nuclear pore complex (NPC). The direction of transport (nucleus to cytoplasm) results from dissociation of the mRNP exporter–mRNP complex in the cytoplasm due to the phosphorylation of mRNP adapter proteins by cytoplasmic kinases and the action of an RNA helicase associated with cytoplasmic filaments of the nuclear pore complexes. As a result, mRNP exporter–mRNP complexes diffuse down a concentration gradient across the NPC from the nucleus to the cytoplasm.
• The mRNP exporter binds to most mRNAs cooperatively with SR proteins bound to exonic splicing enhancers and with REF associated with exon-junction complexes as well as with additional mRNP proteins.
• Pre-mRNAs bound by a spliceosome normally are not exported from the nucleus, ensuring that only fully processed, functional mRNAs reach the cytoplasm for translation.
10.4 Cytoplasmic Mechanisms of Post-transcriptional Control
Before proceeding, let’s quickly review the steps in gene expression at which control is exerted. We saw in Chapter 9 that regulation of transcription initiation and transcription elongation in the promoter-proximal region are the initial mechanisms for controlling the expression of genes in the DNA → RNA → protein pathway. In the preceding sections of this chapter, we learned that the expression of protein isoforms is controlled by the regulation of alternative RNA splicing and of cleavage and polyadenylation at alternative poly(A) sites. Although nuclear export of fully and correctly processed mRNPs to the cytoplasm is rarely regulated, the export of improperly processed or aberrantly remodeled pre-mRNPs is prevented, and such abnormal transcripts are degraded by exosomes. However, retroviruses, including HIV, have evolved mechanisms that permit pre-mRNAs that retain splice sites to be exported and translated. In this section, we consider other mechanisms of post-transcriptional control that contribute to regulating the expression of many genes. Most of these mechanisms operate in the cytoplasm, controlling the stability or localization of mRNA or its translation into protein. The concentration of an mRNA in the cytoplasm is determined by its rate of synthesis and its rate of degradation. The most stable mRNAs, which encode proteins required in large amounts (such as the ribosomal proteins), can accumulate to very high copy numbers per cell. In contrast, highly unstable mRNAs, which encode proteins expressed in short bursts (such as cytokines, secreted proteins that regulate the immune response), rarely achieve such high concentrations even when transcribed, processed, and exported from the nucleus at high rates. We begin by discussing the major pathways that degrade mRNAs. Next we discuss two related mechanisms of gene control that provide powerful new techniques for manipulating the expression of specific genes for experimental and therapeutic purposes. These mechanisms are controlled by short (~22-nucleotide) single-stranded RNAs called micro-RNAs (miRNAs) and short interfering RNAs (siRNAs). Both base-pair with specific target mRNAs, causing their rapid degradation (siRNAs) or inhibiting their translation and inducing a slower form of degradation (miRNAs). Many miRNAs can target more than one mRNA. Consequently, these mechanisms contribute significantly to the regulation of gene expression. Short interfering RNAs, involved in a process called RNA interference, are an important cellular defense against viral infection and excessive transposition by retrotransposons. We also discuss mechanisms that control the overall rate of protein synthesis, as well as highly specific mechanisms that regulate the translation and stability of particular mRNAs. Finally, we discuss mechanisms that control the localization of mRNAs in the cytoplasm of asymmetric cells so that the encoded protein is translated at sites in the cell where it is needed.
Degradation of mRNAs in the Cytoplasm Occurs by Several Mechanisms
As mentioned above, the concentration of an mRNA is a function of both its rate of synthesis and its rate of degradation. For this reason, if two genes are transcribed at the same rate, the more stable of the two mRNAs will reach the higher steady-state concentration. The stability of an mRNA also determines how rapidly synthesis of the encoded protein can be shut down: for a stable mRNA, synthesis of the encoded protein persists long after transcription of the gene is repressed. Most bacterial mRNAs are unstable, decaying exponentially with a typical half-life of a few minutes. For this reason, a bacterial cell can rapidly adjust the synthesis of proteins to accommodate changes in the cellular environment. Most cells in multicellular organisms, on the other hand, exist in a fairly constant environment and carry out a specific set of functions over days to months or even the lifetime of the organism (neurons, for example). Accordingly, most mRNAs of higher eukaryotes have half-lives of many hours.
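The link between synthesis, degradation, and steady-state level can be written as dm/dt = s - k*m, where s is the synthesis rate and k = ln 2 / t_half is a first-order decay constant, so the steady state is m_ss = s/k. This Python sketch assumes constant transcription and simple exponential decay, and it uses hypothetical numbers: two genes transcribed at the same rate, one mRNA with a 30-minute half-life (cytokine-like) and one with a 10-hour half-life.

```python
import math

def decay_constant(half_life: float) -> float:
    """First-order decay constant from a half-life: k = ln 2 / t_half."""
    return math.log(2) / half_life

def steady_state(synthesis_rate: float, half_life: float) -> float:
    """Steady state of dm/dt = s - k*m, i.e. m_ss = s / k."""
    return synthesis_rate / decay_constant(half_life)

def fraction_remaining(t: float, half_life: float) -> float:
    """Fraction of mRNA left t time units after transcription stops: exp(-k*t)."""
    return math.exp(-decay_constant(half_life) * t)

# Hypothetical numbers: both genes transcribed at 10 mRNAs per minute.
short_lived = steady_state(10, half_life=30)   # 30-min half-life
long_lived = steady_state(10, half_life=600)   # 10-h half-life
print(round(long_lived / short_lived))         # 20
print(round(fraction_remaining(60, 30), 2))    # 0.25
print(round(fraction_remaining(60, 600), 2))   # 0.93
```

The more stable mRNA reaches a 20-fold higher steady-state level, but one hour after transcription stops about 93% of it is still present, versus 25% of the short-lived mRNA; this is the trade-off between abundance and rapid shutoff described above.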
However, some proteins in eukaryotic cells are required only for short periods and must be expressed in bursts. For example, as discussed above, certain signaling molecules called cytokines, which are involved in regulating the immune response of mammals, are synthesized and secreted in short bursts (see Chapter 23). Similarly, many of the transcription factors that regulate the onset of the S phase of the cell cycle, such as Fos and Jun, are synthesized only for brief periods (see Chapter 19). Such proteins are expressed in short bursts for two reasons. Transcription of their genes can be rapidly turned on and off, and their mRNAs have unusually short half-lives, on the order of 30 minutes or less.
Cytoplasmic mRNAs are degraded by one of the three pathways shown in Figure 10-27. Most mRNAs follow the deadenylation-dependent pathway (Figure 10-27a), in which a deadenylating nuclease complex gradually shortens the poly(A) tail over time. When the tail has been shortened sufficiently, PABPC1 molecules can no longer bind to it and stabilize the interaction of the 5′ cap with translation initiation factors (see Figure 5-23, which summarizes the steps of translation initiation). The exposed cap is then removed by a decapping enzyme (DCP1/DCP2), leaving the unprotected mRNA susceptible to degradation by XRN1, a 5′→3′ exoribonuclease. Removal of the poly(A) tail also makes mRNAs susceptible to degradation by cytoplasmic exosomes containing 3′→5′ exonucleases. The 5′→3′ exonuclease pathway predominates in yeast, and the 3′→5′ exosome pathway predominates in mammalian cells. The decapping enzymes and the 5′→3′ exonuclease are concentrated in P bodies (processing bodies, described below), regions of the cytoplasm with unusually high concentrations of RNPs.
Some mRNAs are degraded primarily by a deadenylation-independent decapping pathway (Figure 10-27b). Certain sequences at the 5′ end of an mRNA make the cap sensitive to the decapping enzyme. For these mRNAs, the rate of decapping controls the rate of degradation, because once the 5′ cap is removed, the RNA is rapidly hydrolyzed by the 5′→3′ exoribonuclease XRN1. Other mRNAs are degraded by an endonucleolytic pathway that involves neither decapping nor significant deadenylation (Figure 10-27c). One example of this type of pathway is the RNA interference pathway discussed below, in which each siRNA-RISC complex can degrade thousands of targeted RNA molecules. The fragments generated by internal cleavage are then degraded by exonucleases.
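The order of events in the deadenylation-dependent pathway can be pictured as a simple state model. The Python sketch below is a minimal illustration under several assumptions. It shortens the tail by a fixed, hypothetical number of A residues per time step. It lets decapping or exosome attack begin once the tail is 20 or fewer A residues long, the threshold given in the Figure 10-27 caption. The names `MRNA`, `step`, and `simulate`, and the starting tail length of 200 A residues, are illustrative only and do not come from the text. The deadenylation-independent and endonuclease-mediated pathways are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass

# Threshold from the Figure 10-27 caption: PABPC1 binding is destabilized
# once the tail is 20 or fewer A residues long.
PABPC1_THRESHOLD = 20


@dataclass
class MRNA:
    poly_a_length: int      # number of A residues in the poly(A) tail
    capped: bool = True     # 5' m7G cap still present
    degraded: bool = False


def step(mrna: MRNA, deadenylation_per_step: int, route: str) -> str:
    """Advance one time step of the deadenylation-dependent pathway.

    route="xrn1": decapping by DCP1/DCP2, then 5'->3' decay by XRN1.
    route="exosome": 3'->5' decay by the cytoplasmic exosome.
    """
    if mrna.degraded:
        return "degraded"
    if mrna.poly_a_length > PABPC1_THRESHOLD:
        # Deadenylase complex shortens the tail while PABPC1 is still bound.
        mrna.poly_a_length = max(0, mrna.poly_a_length - deadenylation_per_step)
        return "deadenylating"
    if route == "xrn1" and mrna.capped:
        mrna.capped = False  # exposed cap removed by DCP1/DCP2
        return "decapped"
    # Uncapped mRNA (XRN1 route) or deadenylated 3' end (exosome route) is degraded.
    mrna.degraded = True
    return "degraded"


def simulate(initial_tail: int, deadenylation_per_step: int,
             route: str = "xrn1") -> list[str]:
    """Return the sequence of events until the mRNA is degraded."""
    if deadenylation_per_step <= 0:
        raise ValueError("deadenylation_per_step must be positive")
    if route not in ("xrn1", "exosome"):
        raise ValueError("route must be 'xrn1' or 'exosome'")
    mrna = MRNA(poly_a_length=initial_tail)
    events = []
    while not mrna.degraded:
        events.append(step(mrna, deadenylation_per_step, route))
    return events


if __name__ == "__main__":
    print(simulate(200, 30, route="xrn1"))
    print(simulate(200, 30, route="exosome"))
```

The `route` parameter selects between the two fates of a deadenylated mRNA described above. In this example, halving the deadenylation rate from 60 to 30 A residues per step doubles the number of steps before decay begins (3 versus 6). This is one way to picture why slowing deadenylation prolongs an mRNA's lifetime.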
[Figure 10-27 panels: (a) Deadenylation-dependent mRNA decay; (b) Deadenylation-independent mRNA decay; (c) Endonuclease-mediated mRNA decay]
FIGURE 10-27 Pathways for degradation of eukaryotic mRNAs. (a) In the most common pathway of mRNA degradation, the deadenylation-dependent pathway, a deadenylase complex progressively shortens the poly(A) tail until it reaches a length of 20 or fewer A residues. At that point the interaction between PABPC1 and the remaining poly(A) is destabilized, weakening the interactions between the 5′ cap and translation initiation factors (see Figure 5-23). The deadenylated mRNA then may either (1) be decapped by the DCP1/DCP2 decapping complex and degraded by XRN1, a 5′→3′ exonuclease, or (2) be degraded by 3′→5′ exonucleases in cytoplasmic exosomes. (b) Other mRNAs are decapped before they are deadenylated and are then degraded by the XRN1 5′→3′ exonuclease. In the example shown from yeast, the RNA-binding protein Rps28B binds a sequence in the 3′ UTR of its own mRNA and then interacts with Edc3 (enhancer of decapping 3). Edc3 then recruits the DCP1/DCP2 decapping enzyme to the mRNA, autoregulating expression of Rps28B. (c) Some mRNAs are cleaved internally by an endonuclease, and the fragments are degraded by a cytoplasmic exosome and the XRN1 exonuclease. See N. L. Garneau, J. Wilusz, and C. J. Wilusz, 2007, Nat. Rev. Mol. Cell Biol. 8:113.
The rate of mRNA deadenylation varies inversely with the frequency of translation initiation for an mRNA: the higher the frequency of initiation, the slower the rate of deadenylation. This relationship is probably due to the reciprocal interactions between translation initiation factors bound at the 5′ cap and PABPC1 bound to the poly(A) tail. For an mRNA that is translated at a high rate, initiation factors are bound to the cap much of the time, stabilizing the binding of PABPC1 and thereby protecting the poly(A) tail from deadenylating nuclease complexes.
Many short-lived mRNAs in mammalian cells, those encoding proteins such as cytokines and transcription factors whose concentrations must change rapidly, contain multiple, sometimes overlapping copies of the sequence AUUUA in their 3′ untranslated region. These sequences are known as AU-rich elements. Specific RNA-binding proteins