Genes and Proteins

I.  Overview of Protein Synthesis
  A.  Process:  DNA => RNA => Polypeptide => Protein
  B.  Why a 2-step process?
    1.  Evolutionary limitation
      a.  RNA probably preceded DNA
      b.  Some RNAs can self replicate
    2.  DNA is more stable (less damage)
    3.  Productivity:  EXAMPLE:  one DNA gene transcribed into 10 copies of mRNA, each 
          translated 10 times, yields 100 protein molecules from a single copy of 
          the gene.

II.  Historical Development of Gene-Protein Relationship
  A.  1902 - Archibald Garrod
               Alkaptonuria:  homogentisic acid (HA) is normally metabolized to 
               maleylacetoacetic acid.  In affected individuals this step fails, so HA 
               builds up in tissues, urine, and cartilage and turns them black.  Garrod 
               determined that the disorder was inherited as a recessive trait.  
               Linked a chemical activity to heredity.
  B.  1941 - George Beadle and Edward Tatum
                Studied heritability of metabolic mutations in the bread mold, Neurospora 
                crassa.  On minimal medium, the wild type can manufacture all 20 AA, 
                purines, pyrimidines, and vitamins.  Induced mutations with X-radiation, 
                then plated the fungal colonies on minimal and complete medium.  Any 
                colony that grew on complete but not on minimal medium had mutated.  
                Those colonies could then be plated on minimal medium plus one nutrient 
                to isolate the function of the mutated enzyme.  B6, B1, plus hundreds 
                more mutants isolated.  Then, bred with wild-type to establish the 
                nuclear origin of each mutation.  Beadle and Tatum suggested that 
                independent mutants could be found for every enzymatically controlled 
                reaction:  the "one mutation - one enzyme" hypothesis.
  C.  1944 - The final link was placed when Avery, MacLeod, and McCarty 
                demonstrated that DNA was the genetic material.  This must be the site 
                of nuclear mutations.  As of 1945: "One DNA gene - one enzyme".  Next 
                step: analysis of sickle-cell anemia.
  D.  1949 - James Neel and E.A. Beet demonstrated that Sickle-cell was inherited 
                as a 2-allele Mendelian trait.
                   AA - normal
                   AS - largely unaffected carriers - blood does sickle
                   SS - Sickling - under low oxygen tension (venous blood) 
                        erythrocytes elongate (sickle).  RBCs aggregate and block 
                        capillaries, depriving tissues of oxygen.  Crisis may be fatal as 
                        kidneys, brain, muscles, and lungs are affected.
  E.  1949 - Linus Pauling, Harvey Itano, S.J. Singer, and I.C. Wells found that 
                  hemoglobins isolated from normal and sickle-cell blood migrate in an 
                  electric field (electrophoresis) at different rates.  Concluded that a 
                  chemical difference existed between the two hemoglobin forms (HbA 
                  and HbS).
  F.  1954, 1957 - Vernon Ingram found that chemical difference occurred in globin 
                   portion (proteinaceous), not the non-protein heme groups.  He found 
                   that Human Hemoglobin contains 4 polypeptides in globin protein; 2 
                   alpha chains (141 AA long) and 2 beta chains (146 AA long).  The HbS 
                   had a single AA change in the 6th position of the Beta chain.  Valine 
                   instead of Glutamic Acid
  G.  Implications:
    1.  Heritable (DNA) mutation affecting a non-enzymatic protein.  DNA codes for 
         proteins, not just enzymes.
    2.  Effect is at the level of the polypeptide - "One gene - one polypeptide"
         The link was made:  DNA determines protein formation.  The question now 
         was HOW?  How does the sequence of nucleotides in DNA determine the 
         sequence of AA in a protein?
         Two questions:
             (1)  Information Transfer - how the information in DNA is carried into 
                  mRNA -  Breaking the "Genetic Code".  The question reduces to "how 
                  does the ribonucleotide sequence of mRNA (U,A,G,C) code for amino 
                  acids?"
             (2)  The mechanism - Transcription/Translation

III.  Summary of the Code
  A.  Characteristics
    1.  Linear - the sequence of nucleotide bases is linear 
    2.  Triplet Code - each sequence of 3 nucleotide bases specifies one amino acid 
                               [Codon]
    3.  Unambiguous - each triplet specifies only one amino acid
    4.  Degenerate - A particular amino acid is specified by more than one triplet.
    5.  Start and Stops - Initiation and Termination Codons
    6.  Ordered - degenerate codons for a given amino acid are grouped together 
                         most often varying by only the third base
    7.  No internal punctuation
    8.  Non-overlapping
    9.  Universal - used by almost all Pro- and Eu- karyotes
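
Several of these characteristics can be seen in a minimal Python sketch.  The codon 
table is the standard genetic code; the function name translate and the sample 
message are illustrative assumptions.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna):
    """Read an mRNA 5'->3' from the first AUG, three bases at a time."""
    start = mrna.find("AUG")                     # initiation codon
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):   # non-overlapping, no internal punctuation
        aa = CODON_TABLE[mrna[pos:pos + 3]]      # unambiguous: one triplet, one amino acid
        if aa == "*":                            # termination codon
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical message containing AUG UUU AAA CCC UAA
print(translate("GGAUGUUUAAACCCUAAGG"))

# Degenerate and ordered: group the codons by the amino acid they specify
degeneracy = {}
for codon, aa in CODON_TABLE.items():
    degeneracy.setdefault(aa, []).append(codon)
print(degeneracy["L"])   # leucine codons, most differing only in the third base
```
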

IV.  Deciphering the Code
  A.  George Gamow (1954):  Diamond Code, overlapping DNA code
    1.  no intermediate - direct DNA template
    2.  overlapping
  B.  Arguments against overlapping code
    1.  Sidney Brenner:  Presumed triplet code (fewest number that could code for 
        20 amino acids).  Reasoned that an overlapping code restricts which amino 
        acids can be neighbors.  For any 3-base sequence, only 16 tripeptides are 
        possible with that amino acid in the middle (4 choices for the preceding 
        base X 4 choices for the following base):  4^2 = 16
    2.  Mutation - with an overlapping code, more than 1 amino acid would change; 
        yet human hemoglobin revealed single amino acid changes.
    3.  Crick 1957 - coding would require H-bonding, which is unlikely between 
        nucleotides and amino acids.  Requires an adapter molecule (tRNA) - would 
        covalently bind to the amino acid, yet be capable of hydrogen bonding to a 
        nucleotide sequence.  An overlapping code would require that these adapters 
        overlap - seemed structurally unlikely
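
The mutation argument above can be checked with a small Python sketch.  The sequence, 
the function names, and the choice of a fully overlapping code (shift of one base per 
triplet) are illustrative assumptions, not part of the original experiments.

```python
def triplets(seq, overlapping):
    """Split a sequence into triplets, shifting 1 base (overlapping) or 3 (non-overlapping)."""
    step = 1 if overlapping else 3
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]

def changed_triplets(seq, pos, new_base, overlapping):
    """Count the triplets altered by substituting one base at index pos."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    before = triplets(seq, overlapping)
    after = triplets(mutant, overlapping)
    return sum(1 for x, y in zip(before, after) if x != y)

seq = "AUGGAGUUUCCC"  # hypothetical message
print(changed_triplets(seq, 4, "U", overlapping=True))   # triplets hit in an overlapping code
print(changed_triplets(seq, 4, "U", overlapping=False))  # triplets hit in a non-overlapping code

# Brenner's restriction: with full overlap, the neighbors of a fixed triplet XYZ
# must read ?XY and YZ?, so only 4 x 4 neighbor combinations exist.
neighbor_combinations = len("UCAG") * len("UCAG")
print(neighbor_combinations)
```

Under these assumptions one substitution alters three adjacent triplets in the 
overlapping reading but only one in the non-overlapping reading, which is the 
pattern the hemoglobin data supported.
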
  C.  Francois Jacob and Jacques Monod (1961) - postulated the existence of 
        messenger RNA (mRNA)
  D.  Marshall Nirenberg and J. Heinrich Matthaei (1961)
    1.  Required two things:
      a.  A cell-free protein-synthesizing system:  ATP, CTP, UTP, GTP; Mg++, K+ 
          (stabilizers)
      b.  The enzyme polynucleotide phosphorylase allowed the production of 
          synthetic mRNAs (random RNA fragments made without a template).  These 
          mRNAs served as templates for polypeptide synthesis in the cell-free 
          system.
    2.  Put RNA fragments in with ribosomes, tRNA, amino acids, and enzymes =>
        create polypeptides
      a.  Homopolymers:  Assuming triplet,  UUU => Phenylalanine
                             Repeated for   AAA => lysine
                             GGG => didn't work right - molecule folded
                             CCC => proline
      b.  Heteropolymers:  the ratio of 2 bases (e.g. 1A:5C) predicts the ratio of triplets 
            3A    = AAA           = (1/6)^3     =  0.4%  =  Lysine
            2A:1C = AAC, ACA, CAA = 2.3% X 3    =  6.9%  =  Asparagine (2), Glutamine (2)
            1A:2C = ACC, CAC, CCA = 11.6% X 3   = 34.8%  =  Threonine (12), Histidine (14)
            3C    = CCC           = (5/6)^3     = 57.9%  =  Proline
        Defining the combo's using other techniques led to a complete description of 
        the code.
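
The predicted triplet frequencies for a random copolymer follow from multiplying 
the base probabilities.  The sketch below recomputes the 1A:5C case with exact 
fractions; the function name triplet_frequencies is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

def triplet_frequencies(ratio):
    """Expected frequency of each triplet in a random polymer with the given base ratio."""
    total = sum(ratio.values())
    p = {base: Fraction(n, total) for base, n in ratio.items()}
    return {
        "".join(t): p[t[0]] * p[t[1]] * p[t[2]]
        for t in product(sorted(p), repeat=3)
    }

freqs = triplet_frequencies({"A": 1, "C": 5})
for codon, f in sorted(freqs.items()):
    print(codon, f"{float(f) * 100:.1f}%")

# Composition classes, as in the table above
two_a_one_c = freqs["AAC"] + freqs["ACA"] + freqs["CAA"]
one_a_two_c = freqs["ACC"] + freqs["CAC"] + freqs["CCA"]
print(float(two_a_one_c) * 100, float(one_a_two_c) * 100)
```

Small differences from the figures in the notes (for example 34.7% versus 34.8%) 
come from rounding each triplet to one decimal place before multiplying by 3.
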
  E.  The Triplet Binding Technique
    1.  1964  Nirenberg and Philip Leder:  Triplet Binding Assay (Specific Sequence 
        Assignments for the Triplets)
    2.  The Method:
      a.  Ribosomes bind to nitrocellulose, but free mRNA and tRNA do not.
      b.  Charge ribosomes with a single-codon (triplet) mRNA and bind them to the 
          nitrocellulose.
      c.  Charge tRNAs with radio-labeled amino acids.
      d.  Combine the labeled, charged tRNA with the ribosomes on the nitrocellulose.  
          If the radioactivity remains on the filter, the codon and anticodon have 
          matched.
      e.  Analyze the sequence of the codon and anticodon.
  F.  The Use of Repeating Copolymers
    1.  Gobind Khorana - chemically synthesized di-, tri-, and tetra- nucleotides.
    2.  Dinucleotide repeat: 2 repeating triplets
        Trinucleotide repeat: 3 repeating triplets
        Tetranucleotide repeat: 4 repeating triplets
    3.  Technique confirmed others' results and shed light on new assignments, 
        including stop codons.
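
The triplet counts for repeating copolymers can be reproduced with a short Python 
sketch.  The repeat units UG, UUG, and UAUC are arbitrary examples chosen for 
illustration, and repeating_triplets is a hypothetical helper name.

```python
def repeating_triplets(unit, repeats=12):
    """For each reading frame, list the distinct triplets in order of first appearance."""
    message = unit * repeats
    frames = []
    for frame in range(3):
        seen = []
        for i in range(frame, len(message) - 2, 3):
            codon = message[i:i + 3]
            if codon not in seen:
                seen.append(codon)
        frames.append(seen)
    return frames

for unit in ("UG", "UUG", "UAUC"):
    print(unit, repeating_triplets(unit))
```

A dinucleotide repeat gives two alternating triplets in every frame, a 
trinucleotide repeat gives one triplet per frame (three in total), and a 
tetranucleotide repeat gives four triplets that recur in a fixed order.
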

V.  The Coding Dictionary
  A.  Degeneracy and Wobble (Crick, 1966)
    1.  Third position is usually the variable position
    2.  Wobble allows the anticodon of a single tRNA species to pair with more than 
        one triplet in mRNA
      a.  Can be done without changing the amino acid sequence
      b.  U may pair with A or G.  G may pair with C or U.  Inosine may pair with C,U, 
          or A
      c.  With this in mind, only 30 different tRNA species are needed to 
          accommodate the 61 triplets specifying amino acids.
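
The wobble rules above can be written as a lookup table.  In this sketch the 
entries for U, G, and inosine (I) come from the list above; the entries for C and 
A are the ordinary Watson-Crick pairs, added here as an assumption to complete the 
table.  The anticodon IGC is used only as an example.

```python
# Base at the 5' (wobble) position of the anticodon -> codon 3' bases it can pair with.
WOBBLE = {"U": "AG", "G": "CU", "I": "CUA", "C": "G", "A": "U"}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codons_read(anticodon):
    """Codons (written 5'->3') recognized by an anticodon written 5'->3'."""
    wobble, middle, last = anticodon
    # Pairing is antiparallel: the anticodon's 3' base pairs with the codon's first base.
    first_two = COMPLEMENT[last] + COMPLEMENT[middle]
    return [first_two + third for third in WOBBLE[wobble]]

print(codons_read("IGC"))  # three codons sharing the first two bases
print(codons_read("GAA"))  # two codons sharing the first two bases
```
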
  B.  Initiation and Termination
    1.  Bacteria: fMet (N-formylmethionine) with AUG (sometimes GUG)
      a.  The formyl may be removed or the whole residue removed after synthesis of 
          the protein.
      b.  Within the protein, only met added with AUG codon
    2.  Eukaryotes: AUG start codon adds only met.
    3.  Termination: UAA, UAG, UGA
      a.  Nonsense mutation - premature termination caused by an internal amino 
          acid codon being changed to a stop codon.
          amber (UAG)
          ochre (UAA)
          opal (UGA)
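      b.  Missense mutation - when a codon is changed to one specifying a different 
          amino acid.  (A stop codon changed to an amino acid codon is a readthrough, 
          or nonstop, mutation.)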
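
A minimal Python sketch of how a single-codon change can be classified, using the 
usual textbook definitions (nonsense: amino acid codon to stop codon; missense: 
codon for one amino acid to codon for a different one).  The function name 
classify_substitution and the example codons are illustrative assumptions.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_NAMES = {"UAG": "amber", "UAA": "ochre", "UGA": "opal"}

def classify_substitution(before, after):
    """Classify a codon change by its effect on the encoded amino acid."""
    aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
    if aa_before == aa_after:
        return "silent"
    if aa_after == "*":
        return "nonsense (" + STOP_NAMES[after] + ")"
    if aa_before == "*":
        return "readthrough"
    return "missense"

print(classify_substitution("UAC", "UAG"))  # tyrosine codon to a stop codon
print(classify_substitution("GAG", "GUG"))  # glutamic acid codon to a valine codon
print(classify_substitution("GAA", "GAG"))  # both codons specify glutamic acid
```
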

VI.  Confirmation of Code Studies: Phage MS2

VII.  Universality of the Code
  A.  Between 1960-1978 the universality of the code was established.
      Virtually the same in viruses, bacteria, and eukaryotes.
  B.  Some exceptions were the yeast and human mitochondria
    1.  Most variations in wobble position
    2.  Led to a reduction in the number of tRNA species needed (22)

VIII.  Protein Synthesis:  Players
  A.  DNA template
    1.  Sense and Antisense strands:  Initial template binding is carried out by the 
        sigma subunit of the RNA Polymerase enzyme at the promoter sequences of DNA.
      a.  RNA Polymerase enzyme explores the DNA "looking" for the start points
      b.  RNA Polymerase is a large enzyme - covers 60 nucleotide pairs
    2.  Promoters: consensus sequences (written 5' to 3' on the nontemplate strand)
      (a) -10 (David) Pribnow box 5'TATAATG3' important in binding RNA polymerase
      (b) -35 Recognition 5'TTGACA3' distance is important
    3.  Information codes for: mRNA, tRNA, rRNA
  B.  RNA Polymerase (1959 - Samuel Weiss)
    1.  makes all three RNA's
    2.  Structure
      (a)  Holoenzyme =  2 alpha, beta, beta', and sigma polypeptides 
           (core = 2 alpha, beta, beta', without sigma)
      (b)  Functional subunits
           beta polypeptide = provides the catalytic basis and active site
           sigma subunit = plays a regulatory function involving the initiation of 
                           RNA transcription
    3.  Prokaryotes have a single form.  Eukaryotes have 3 RNA polymerases
      (a)  RNA pol I - rRNA, found in the Nucleolus
      (b)  RNA pol II - mRNA, found in the Nucleoplasm
      (c)  RNA pol III - 5S rRNA and tRNAs, found in the Nucleoplasm

IX.  Process
  A.  Binding
    1.  RNA Polymerase binds to the promoter regions (-35 and -10)
      (a)  RNA Polymerase is a large enzyme
      (b)  relationship between promoters and RNA polymerase may aid in orienting 
           molecule correctly.
      (c)  requires the sigma unit (different sigma units control the amount of RNA 
           transcription)
    2.  Hydrogen bonds broken (easily at the High A=T regions)
  B.  Initiation
    1.  Begins reading the DNA 3' => 5' on the template strand
    2.  Approximately 6-7 bases after the -10 (Pribnow Box), first two complements 
        linked (ribonucleotide triphosphate).
    3.  Form a DNA-RNA hybrid helix (temporary)
  C.  Elongation
    1.  Creates an RNA strand (5' => 3') by adding ribonucleotides to the free 3'-OH 
        group at a rate of 50 bases/sec (37°C).
    2.  After about 10 bases, the sigma unit drops off, not required for elongation
  D.  Termination
    1.  Palindromic sequences
    2.  How might they work?  rho dependency
      a.  Rho independent:  a long palindrome codes for a large hairpin, which, 
          when formed, dissociates polymerase from template.
      b.  Rho dependent:  shorter palindromes create smaller hairpins - the rho 
          protein factor is required to disengage polymerase.
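
The direction of reading and the hairpin idea can be sketched in Python.  The 
template sequence, the stem and loop lengths, and the function names are all 
hypothetical; real terminators are more varied than this check suggests.

```python
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def transcribe(template_3to5):
    """Template strand written 3'->5'; returns the RNA written 5'->3'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)

def reverse_complement(rna):
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))

def can_form_hairpin(rna, stem, loop):
    """True if some run of `stem` bases is followed, `loop` bases later, by its reverse complement."""
    for i in range(len(rna) - 2 * stem - loop + 1):
        left = rna[i:i + stem]
        right = rna[i + stem + loop:i + 2 * stem + loop]
        if right == reverse_complement(left):
            return True
    return False

template = "TAC" + "GGGCGC" + "AAAA" + "GCGCCC" + "AAAA"  # hypothetical, written 3'->5'
rna = transcribe(template)
print(rna)
print(can_form_hairpin(rna, stem=6, loop=4))
```

The palindromic (inverted repeat) region of the DNA is what lets the transcript 
fold back on itself; the sketch only tests for that self-complementarity.
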

X.  Translation:  Players
  A.  mRNA:  gene transcript
  B.  tRNA
    (a)  cloverleaf shape, about 75 to 90 nucleotides - synthesized as a larger 
          transcript
    (b)  many post-transcriptionally modified bases:
              inosinic acid
              hypoxanthine
              ribothymidylic acid
              pseudouridine
    (c)  TΨC (pseudouridine) arm, D (dihydrouridine) arm, amino acid binding arm 
         (3'-CCA), and anticodon arm; extra arm varies in length (short and long 
         groups)
  C.  Ribosomes [Svedberg unit: 1S = 10^-13 s]
    (a)  synthesized in the nucleolus of the eukaryotic nucleus.  In prokaryotes, 
         simply synthesized from DNA in cytoplasm.
    (b)             Prokaryote                         Eukaryote
         70S = 50S and 30S                        80S = 60S and 40S
         50S = 23S and 5S with 31 proteins        60S = 28S, 5.8S, 5S, and 50 proteins
         30S = 16S with 21 proteins               40S = 18S and 33 proteins
    (c)  rRNA acts as a scaffold to hold the proteins.  The 3' end of the 16S rRNA is 
         important in recognition of the mRNA.
    (d)  The proteins act as "factors" - initiation, elongation, and termination
    (e)  E. coli: 7 copies of a single sequence codes for all three rRNA components.  
         Initial transcript is a 30S RNA that is cleaved into 23S, 16S, and 5S.
    (f)  Eukaryotes: multiple tandem repeats separated by spacer DNA.  Large 
         transcripts that are cleaved to make the smaller subunits.  Example: the 
         human 45S transcript genes are found on the ends of chromosomes 13, 14, 15, 
         21, and 22.  5S is not part of this larger transcript (humans: 
         chromosome 1).

XI.  Preparation for Translation: Charging of tRNA
  A.  An amino acid reacts with ATP by the action of an aminoacyl-tRNA synthetase to 
      form Amino Acid-AMP + (PPi).  The aminoacyl-tRNA synthetase then binds this 
      activated amino acid to the proper tRNA with the elimination of AMP.
  B.  Binding to the tRNA is on the 3' end CCA.  The terminal adenosine connects 
      through its 3'-OH to the carboxyl group of the amino acid.
  C.  30 different tRNAs but only 20 different aminoacyl-tRNA synthetases.  Fewer 
      tRNAs than codons are needed because of wobble at the 3rd position of the 
      codon.
  D.  "Second Genetic Code" - the specific binding of tRNA to the appropriate amino 
      acid

XII.  Process (Prokaryotes)
  A.  Initiation
    1.  Small ribosomal subunit (30S) binds S1 protein to activate the small subunit.  
        This covers an area of 35-40 nucleotides on the mRNA.
    2.  The activated small subunit then binds to a short consensus sequence on the 
        mRNA that is complementary to part of a hexamer lying close to the 3' 
        end of the 16S rRNA.  5'...AGGAGG...3' called Shine-Dalgarno sequence.  
        Lies about 7 bases upstream from the AUG start codon.
    3.  Activated small subunit reads down the mRNA until the start codon AUG is 
        found.  Places the AUG codon in the P site (Peptidyl site).
    4.  A special tRNA carrying fMet binds to this structure, forming the initiation 
        complex.
  B.  Elongation
    1.  Large subunit "sits" on initiation complex such that the "P" site is occupied by 
        the fMet-tRNA.
    2.  The second charged tRNA, dictated by the 2nd codon, enters the "A" site 
        (Aminoacyl site).
    3.  Translocation reaction
      a.  peptidyl transferase, an enzyme activity in the large subunit, forms a 
          peptide bond between the amino acids.
      b.  At the same time, the bond linking the amino acid to the tRNA in the "P" 
          site is broken, and the uncharged tRNA is released.
      c.  The ribosome moves 3 bases along the mRNA toward its 3' end.
      d.  Requires several elongation factors (EFs)
  C.  Termination
    1.  UGA, UAG, UAA => no tRNA's
    2.  The polypeptide is cleaved from the last tRNA by GTP-dependent release 
        factors.
  D.  Polyribosomes: repetition of protein synthesis on one mRNA
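
The initiation, elongation, and termination steps can be traced in a simplified 
Python sketch.  The message, the function name translate_prokaryotic, and the 
max_gap search window are illustrative assumptions; initiation and elongation 
factors, GUG starts, and the ribosomal sites themselves are not modeled.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_prokaryotic(mrna, sd="AGGAGG", max_gap=12):
    """Find the Shine-Dalgarno sequence, then the first AUG shortly after it, then translate."""
    sd_pos = mrna.find(sd)
    if sd_pos == -1:
        return None                       # small subunit has nothing to bind
    window_start = sd_pos + len(sd)
    start = mrna.find("AUG", window_start, window_start + max_gap + 3)
    if start == -1:
        return None                       # no start codon near the binding site
    peptide = ["fMet"]                    # initiator tRNA carries N-formylmethionine
    for pos in range(start + 3, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":                     # UAA, UAG, UGA: no tRNA, chain is released
            return peptide
        peptide.append(aa)
    return None                           # no in-frame stop codon in this message

# Hypothetical message: leader, Shine-Dalgarno, 7-base spacer, AUG, three codons, stop
message = "CC" + "AGGAGG" + "UUUACCU" + "AUG" + "AAA" + "GUG" + "UUU" + "UAA" + "CC"
print(translate_prokaryotic(message))
```
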

XIII.  Protein Synthesis in Eukaryotes
  A.  Transcription and Translation in Eukaryotes: Differences from Prokaryotes
    1.  Transcription in eukaryotes occurs within the nucleus under the direction of 3 
        separate forms of RNA polymerases;  for the mRNA to be translated, it must 
        move out into the cytoplasm.
    2.  The initiation and regulation of transcription are under the control of extensive 
        nucleotide sequences found in DNA upstream from the point of initial 
        transcription (promoters and enhancers).
    3.  Translation occurs on ribosomes that are larger and whose rRNA and proteins 
        are more complex than those present in prokaryotes.
    4.  Protein factors similar to those in prokaryotes guide initiation, elongation, and 
        termination of translation in eukaryotes.  However, there appear to be more 
        factors required during each of these steps.
    5.  Initiation of eukaryotic translation does not require the amino acid 
        formylmethionine.  However, the AUG triplet is essential to the formation of 
        the translational complex, and a unique transfer RNA (tRNAiMet) is used for 
        its initiation.
    6.  Extensive modifications occur to eukaryotic RNA transcripts that eventually 
        serve as mRNAs.  The initial transcripts are much larger than those that are 
        eventually translated.  Thus, they are called pre-mRNAs and are thought to 
        constitute a group of molecules found only in the nucleus - a group referred to 
        generally as heterogeneous nuclear RNA (hnRNA).  Only about 25% of hnRNA 
        molecules are converted to mRNA.  Those that are converted have substantial 
        amounts of their ribonucleotide sequence excised, while the remaining 
        segments are spliced back together prior to translation.  This phenomenon 
        has given rise to the concept of so-called split genes in eukaryotes.
    7.  Prior to the processing of an mRNA transcript, a cap and tail are added to the 
        molecule.  These modifications are essential to efficient processing and 
        subsequently to translation.
    8.  Eukaryotic mRNAs are much longer-lived than their bacterial counterparts.  
        Most exist for hours, rather than minutes, prior to their degradation in the cell.
  B.  TRANSCRIPTION:  Players
    1.  RNA Polymerase - no subunits known
      a.  RNA Polymerase I - rRNA
      b.  RNA Polymerase II - mRNA
      c.  RNA Polymerase III - tRNA
    2.  DNA Double Helix
      a.  higher levels of structure -bending, coiling, etc.  (Helicase, Gyrase)
      b.  Promoter Sequences - Polymerase Specific
        (1)  RNA pol I - only codes for rRNA genes (multiple copies, but all identical).  
             Some consensus sequences -38 ATCTTT.  But typically, most species 
             have unique sequences.  Easy for RNA Pol I to evolve with changes in 
             promoter.  Similarity increases as ancestry becomes more recent.
        (2)  RNA Pol II - codes for all different mRNA's.  Much more restrictive, can't 
             evolve with one sequence or lose capacity to transcribe others.  As a 
             result, much greater conservation over evolutionary time.  Consensus 
             sequence among eukaryotes.
              -25 TAT[]A[] (Goldberg-Hogness or TATA Box; analogous to -10 in prokaryotes)
              -75 GGCCAATCT (CAAT Box)
             -110 GC Box
             Promoter not required for transcription, but is required for orderly, 
             repeatable transcription, always starts in same place.
        (3)  RNA Pol III - tRNA short sequences with internal promoters.
      c.  Enhancer sequences (discovered 1980):  Possibly short sequences involved 
          in binding transcription factors
        (1)  regions important to productivity levels, mutation causes 100X decrease in 
             productivity
        (2)  Position not fixed relative to the gene they enhance.  May be upstream, 
             downstream, or in the gene.  Can be up to 3000 bp away.
        (3)  Not gene specific.  If the enhancer or the gene is moved, an unrelated gene 
             will be enhanced.
        (4)  Sequences may induce formation of Z-DNA, altering degree of supercoiling, 
             or protein product may aid RNA Pol binding.
    3.  Transcription Factors - additional proteins required for transcription.  Aid in 
        RNA pol II binding to promoter
      a.  TATA factor (TFIID):  Binds to promoter
      b.  also factors for GC region and CCAAT Box
      c.  supplant role of sigma factor in prokaryotes.  Greater degree of specific 
          control
  C.  Process
    1.  Mechanics (similar to prokaryotes)
      a.  Binding of RNA Pol to promoter; efficiency modified by enhancers and 
          transcription factors.
      b.  Transcribes by linking complementary ribonucleotides
      c.  Termination - similar (palindromic regions)
    2.  Modifications
      a.  1970, James Darnell recognized that initial mRNA products are considerably 
          longer than the cytoplasmic mRNA that is translated.
        (1)  Heterogeneous nuclear RNA (hnRNA).  Found only in nucleus, not in the 
             cytoplasm
      b.  By 1977, Susan Berget and Philip Sharp found that there were internal 
          sequences in hnRNA that were not present in the mRNA.  Regions spliced out 
          (introns) may be 20,000 bp long, and the remaining sequences (exons) are 
          spliced together.  Splicing is accomplished by the spliceosome, made up of 
          an RNA scaffold (snRNA) with proteins, forming snRNPs (small nuclear 
          ribonucleoproteins, "snurps").  [40S yeast, 60S mammals]  This implies that 
          intervening sequences exist between coding regions in the DNA.
          DNA => Intervening Sequences - Introns (interruptions)
              => Coding Sequences - Exons (expressed)
      c.  Recognition of the Exon Regions
        (1)  Sequence analysis of the RNA product in relation to the DNA sequence 
             showed portions of the DNA sequence were missing in the mRNA of the 
             cytoplasm
        (2)  Formation of heteroduplexes:  Bind DNA to RNA; it will bind where 
             complementary, and the DNA regions with no RNA partner will have to 
             "loop-out".
The "loop-out" regions were the introns (3) Splicing Mechanisms vary depending on the RNA being produced tRNA: Endonuclease and Ligase rRNA: Self splicing, Ribozyme mRNA: Lariat Formation Mitochondrial RNA: RNA maturase d. Mechanics of splicing (1) Each RNA is spliced in a different way (2) Splicing is catalyzed by a series of endonucleases and ligases coalesced into a large enzymatic unit called the spliceosome. (3) Tom Maniatis (1991) The concentration of endonucleases and associated splicing factors influences the position of the cut. If SXL absent - cut is made at site 1, and UAG preemptively stops translation, no functional protein produced. If SXL splicing protein is present, it binds to site 1 and cut is made at 2-. Functional protein produced. 3. Review a. Each RNA Polymerase recognizes its own promoter b. To produce mRNA, RNA pol II recognizes the TATA Box (-25) and/or CAAT Box (-75), and transcribes to terminator (introns and exons) c. Introns spliced out - may be tissue specific regulation based on concentration of splicing proteins in spliceosome. d. 5'-Cap and 3'-polyA tail added. e. mRNA transported to the cytoplasm C. Translation : Players: 1. mRNA, with start and stop codons 2. Ribosome: 80S => large subunit 60S, small subunit 40S Large subunit = 28S, 5.8S, and 5S rRNA + 50 proteins Small subunit = 18S + 33 proteins 3. tRNA - same specificity D. Process 1. Mechanics: same as prokaryotes EXCEPT: mRNA binds to small subunit after the tRNAimet. The mRNA then binds and the whole unit moves down the mRNA to the start codon where the large subunit binds. 2. Modifications: Post translational modifications a. Adding sugar, lipid, or protein groups (not exclusive to eukaryotes) b. Splicing (Patricia Kane 1990) : Yeast TFP1 gene - codes for 1 translational product which is cleaved into 2 functional proteins. E. Exceptions: 1. Non-universality: some triplet codons have different results in different organisms a. 
UGA - usually a terminator, but in human mitochondria and yeast mitochondria it codes for tryptophan b. UAG - usually a terminator, but in paramecium it codes for glycine. c. five other exceptions are known. Really not many, given the extensive work done on a variety of organisms. 2. Overlapping Genes a. fX174 virus has circular chromosome consisting of 5386 nucleotides that could encode a maximum of 1795 amino acids, enough for 5 to 6 proteins. But, it produces 11, consisting of over 2300 amino acids. Prokaryote Transcription I. Operons A. For greatest efficiency, the genes for proteins that function together are controlled together. Coordinate control through grouping functionally related genes together so they can be regulated together. B. The lac Operon 1. Lactose metabolism in E. coli is carried out by two enzymes, with possible involvement by a third. a. galactose permease = lacY transports lactose into cells b. beta galactosidase = lacZ cuts the beta-galactosidic bond between galactose and glucose c. galactose transacetylase = lacA function in lactose metabolism is unclear 2. The genes for all three enzymes are clustered together and transcribed together from one promoter, yielding a polycistronic message. a. cistron is a synonym for gene 3. Therefore, these three genes, linked in function, are also linked in expression. 4. They are turned off and on together C. Negative Control of the lac Operon: Occurs as follows: 1. Negative control implies that the operon is turned on unless something intervenes to stop it 2. The operon is turned off as long as repressor binds to the operator, because the repressor interferes with RNA polymerase's binding to the adjacent promoter. a. lac repressor: product of the lacI gene is a tetramer of four identical protein chains that bind to the operator b. When the repressor is bound to the operator, the operon is repressed c. Operator and promoter are contiguous d. 
If RNA Polymerase cannot transcribe the lacZ, T, and A genes, the operon is off, or repressed. i. Structural Genes: lacZ, lacY, and lacA ii. Regulatory Genes: Promoter and Operator 3. When the supply of glucose is exhausted and lactose is available, the few molecules of lac operon enzymes produce a few molecules of allolactose from the lactose. a. Lac operon is repressed as long as glucose is present b. Repressor: allosteric protein c. Changes its conformation, therefore its function when it binds to and inducer molecule d. Allolactose: when beta-galactosidase cleaves lactose to galactose plus glucose, it rearranges a small fraction of the lactose to allolactose. i. Allolactose is just galactose linked to glucose in a different way than in lactose. ii. In lactose, the linkage is through a beta-1,4 bond; in allolactose, the linkage is beta-1,6. 4. Allolactose acts as an inducer by binding to the repressor and causing a conformational shift that encourages dissociation from the operator. 5. With the repressor removed, RNA polymerase is free to bind to the lac promoter and transcribe the three structural genes. D. Operon Hypothesis 1. Francois Jacob and Jacques Monod (1940-50) Pasteur Institute 2. By 1950 realized that the three enzymes were induced together a. Constitutive mutants: that needed no induction b. They produced the three gene products all the time 3. Arthur Pardee, Jacob and Monod created merodiploids (partial diploids) carrying both the wild-type (indelible) and constitutive alleles of the lac operon. a. Indelible allele proved to beta dominant, demonstrated that wild-type cells produced some substance that turned the lac genes off b. lac repressor: controlled both wild-type and merodiploids 4. Existence of a repressor required that there be some specific receptor to which the repressor would bind. Jacob and Monod called this the Operator. a. 
cis-dominant: it is dominant only with respect to genes on the same piece of DNA (in cis; Latin: cis = here) mutations are called Oc, for operator constitutive. b. mutations in the repressor gene: dominant both in cis (on same chromosome) and in trans ( on different chromosome) because the mutant repressor will remain bound to both operators even in the presence of the inducer. c. Suzanne Bourgeois: later found many other mutants. These are named Is to distinguish them from constitutive repressor mutants (I-) which make a repressor that cannot recognize the operator. E. Positive Control: 1. It would be wasteful for E. coli to turn on the lac operon in the presence of glucose. E. coli cells keep the lac operon turned off as long as glucose is present. Geneticists suspected that this selection in factor of glucose metabolism and against use of other energy sources was due to the influence of some breakdown product, or catabolite. Hence the name Catabolite Repression 2. Positive control (catabolite repression) of the lac operon works as follows: a complex composed of cAMP plus a protein known as catabolite activator protein (CAP) binds to the upstream part of the promoter and facilitates binding of RNA polymerase to the down stream part. a. One substance that responds to glucose concentration is a nucleotide called cyclic AMP (cAMP). The lower the level of glucose, the higher the concentration of cAMP. b. Positive controller of the lac operon is a complex composed to two parts: cAMP and a binding protein called CAP, for Catabolite activator protein by its discoverers, Geoffrey Zubay and his colleagues. The protein binds cAMP, and the resulting complex binds to the lac promoter and turns it on by helping RNA polymerase bind. Notice that the lac promoter is divided into two parts, the CAP binding site on the left and the RNA polymerase binding site on the right. c. 
CAP and cAMP also stimulate transcription of other inducible operons, including the well-studied ara and gal operons. 3. This greatly enhances transcription of the operon. 4. The physiological significance of this positive control mechanism is that it can only operate when cAMP concentration is elevated; this occurs when glucose concentration is low and there is a corresponding need to use an alternate energy source. II. Temporal Control of Transcription (Viral early and late genes. Bacillus subtilis sporulation genes) A. Modification of the Host RNA Polymerase 1. Bacillus subtilis: and phage SPO1 2. SPO1: large phage with many genes a. Transcription of phage SPO1 genes in infected B. subtilis cells proceeds according to a temporal program in which early genes are transcribed first, then middle genes, and finally late genes. i. first five minutes: early genes are expressed ii. middle genes turn on about 5 to 10 minutes post-infection iii. from about the 10 minute point until the end of infection, the late genes switch on b. Switching is directed by a set of phage-encoded sigma factors that associate with the host core RNA polymerase and change its specificity from early to middle to late. i. phage does not carry its own RNA polymerase ii. B. subtilis holoenzyme: 2alpha, beta, beta' iii. sigma factor MW= 43,000 iv. one of the genes transcribed in the early phage of SPO1 is called gene 28 gp28 (gene product) displaces host sigma (sigma43) gp28 changes the RNA pol specificity: transcribes the phage middle genes gp28 is a novel sigma factor that accomplishes two things; (1) it diverts the host's polymerase from transcribing host genes (2) it switches from early to middle transcription. c. Host sigma is specific for the page early genes; the phage gp28 witches the specificity to the middle gene; and the phage gp33 and gp34 switch to late specificity. i. gp33 and gp34 products of two phage middle genes replace gp28 ii. 
Note that the polypeptides of the host core polymerase remain constant throughout this process; it is the progressive substitution of sigma factors that changes the specificity of the enzyme and thereby directs the transcriptional program. B. RNA Polymerase encoded in Phage T7 1. T7: smaller genome than SPO1 a. distinguish three phases of transcription: class I, class II, and class III 2. Phage T7, instead of coding for a new sigma factor to change the host polymerase's specificity from early to late, encodes a whole new RNA polymerase with absolute specificity for the later phage genes. 3. This polymerase, composed of a single polypeptide, is a product of one of the earliest phage genes, gene 1. 4. The temporal program in the infection by this phage is simple. The host polymerase reads the earliest (class I) genes, one of whose products is the phage polymerase, which then reads the later (class II and class III) genes. C. Control of Transcription During Sporulation 1. Growth state: vegetative. Onset of hard times forms protective endospores through sporulation 2. When the bacterium B. subtilis sporulates, a whole new set of sporulation- specific genes turns on, and many, but not all vegetative genes turn off. 3. Switch takes place largely at the transcription level. 4. Accomplished by several new sigma factors that displace the vegetative sigma factor from the core RNA polymerase and direct transcription f sporulation genes instead of vegetative genes. i. sigma29, sigma30, and sigma32 (sigmaE, sigmaH, and sigmaC) for sporulation ii. sigma43 (sigmaA) for vegetative 5. Each sigma factor has its own preferred promoter sequence. 6. E. coli also has multiple sigma factor with their own specificities. Heat-shock genes sigma32 (or sigmaH) D. Infection of E. coli By Phage l 1. Phage l can replicate in either of two ways: lytic or lysogenic. 2. 
In the lytic mode, almost all of the phage genes are transcribed and translated, and the phage DNA is replicated, leading to production of progeny phages and lysis of the host cell.
    3.  In the lysogenic mode, the lambda DNA is incorporated into the host genome; after that occurs, only one phage gene is expressed.
    4.  The product of this gene, the lambda repressor, binds to the two early phage operators and prevents transcription of all the rest of the phage genes.
    5.  However, the incorporated phage DNA (the prophage) still replicates, since it has become part of the host DNA.
    6.  Lytic Replication of Phage Lambda
      a.  The lambda DNA takes two forms: linear, as the DNA exists in the phage particles, and circular, the shape the DNA assumes shortly after infection begins.
        i.  The linear DNA has 12-base overhangs, or sticky ends.
        ii.  These cohesive ends go by the name cos.
        iii.  Cyclization brings together all the late genes, which had been separated at the two ends of the linear genome.
      b.  The immediate early/delayed early/late transcriptional switching in the lytic cycle of phage lambda is controlled by antiterminators.
        i.  The host RNA polymerase holoenzyme transcribes the immediate early genes, cro and N, first.
        ii.  One of the two immediate early genes is cro, which codes for an antirepressor that allows the lytic cycle to continue.
        iii.  The other, N, codes for an antiterminator, pN, that overrides the terminators after the N and cro genes.
        iv.  cro and N lie immediately downstream from the rightward and leftward promoters, PR and PL, respectively.
        v.  At this stage no repressor is bound to the operators that govern these promoters (OR and OL), so transcription proceeds unimpeded.
        vi.  When the polymerase reaches the ends of the immediate early genes, it encounters rho-dependent terminators and stops short of the delayed early genes.
        vii.  The same promoters (PR and PL) are used for both immediate early and delayed early transcription. The switch does not involve a new sigma factor or RNA polymerase; it involves an extension of transcripts controlled by the same promoters.
      c.  
Transcription then continues on into the delayed early genes, some of which are needed to establish lysogeny.
        i.  On the rightward side, downstream from cII, genes O and P code for proteins necessary for phage DNA replication.
        ii.  One of the delayed early genes, Q, codes for another antiterminator (pQ) that permits transcription of the late genes from the late promoter, PR', to continue without premature termination.
      d.  Late genes are all transcribed in the rightward direction.
        i.  The late promoter, PR', lies just downstream from Q.
        ii.  Transcription from this promoter terminates after only 194 bases unless pQ intervenes to prevent termination.
        iii.  The late genes code for the proteins that make up the phage head and tail, and for proteins that lyse the host cell so the progeny phages can escape.
    7.  Establishing Lysogeny
      a.  Delayed early genes are required for both the lytic and lysogenic cycles.
        i.  Some delayed early gene products are needed for integration of the phage DNA into the host genome.
        ii.  Products of the cII and cIII genes allow transcription of the cI gene and therefore production of the lambda repressor, the central component in lysogeny.
      b.  Phage lambda establishes lysogeny by causing production of enough repressor to bind to the early operators and prevent further early RNA synthesis.
      c.  The promoter used for establishment of lysogeny is PRE, which lies to the right of both PR and cro.
        i.  RE in PRE stands for repressor establishment.
        ii.  It directs transcription leftward through cro and then through the cI gene.
      d.  Another promoter, PRM, comes into play later to maintain lysogeny.
        i.  RM in PRM stands for repressor maintenance; this promoter is used during lysogeny to ensure a continuing supply of repressor to maintain the lysogenic state.
        ii.  It has the peculiar property of requiring its own product (repressor) for activity.
        iii.  
This promoter cannot be used to establish lysogeny because there is no repressor to activate it at the beginning of infection.
    8.  Lysogen Induction
      a.  E. coli cells respond to environmental insults, such as mutagens or radiation, by inducing a set of genes whose collective activity is called the SOS response.
        i.  When a lysogen suffers DNA damage, it induces the SOS response.
        ii.  The recA product (RecA) participates in recombination repair of DNA damage, but environmental insults also induce a new activity in RecA: it stimulates a latent protease, or protein-cleaving, activity in the lambda repressor.
      b.  The initial event in this response is the appearance of co-protease activity in the RecA protein.
      c.  This causes the lambda repressors to cut themselves in half, removing them from the lambda operators and inducing the lytic cycle.
      d.  In this way, lambda phages can escape the potentially lethal damage that is occurring in their host.

III.  Specific DNA-Protein Interactions
  A.  Proteins that bind to specific regions of DNA
    1.  Lambda repressor and cro bind to their respective operators.
    2.  CAP binds to the CAP binding site in the lac promoter.
  B.  They can locate and bind to one particular short DNA sequence among a vast excess of unrelated DNA.
  C.  All the proteins just listed have a similar structural motif: two alpha helices connected by a short protein turn (helix-turn-helix).
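
The point that these proteins find one short sequence among a vast excess of unrelated DNA can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the 17-base site, the random "genome", and the function names are all made up for this example, and exact string matching stands in for what is really affinity-based binding that tolerates some mismatches. The sketch shows that a site of this length is expected to occur by chance far less than once in a genome-sized stretch of random DNA, which is why a short sequence can still be a unique address.

```python
import random

# Hypothetical 17-base binding site; NOT a real operator sequence.
SITE = "ACGTTGCATGCAACGTA"


def find_sites(genome: str, site: str) -> list[int]:
    """Return the 0-based start positions of every exact match of site."""
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping matches are also found.
        start = genome.find(site, start + 1)
    return positions


def expected_chance_matches(genome_length: int, site_length: int) -> float:
    """Expected exact matches in random DNA with equal base frequencies."""
    windows = genome_length - site_length + 1
    return windows / 4 ** site_length


def build_genome(length: int, site: str, position: int, seed: int = 0) -> str:
    """Make random DNA of the given length with one copy of site planted."""
    rng = random.Random(seed)
    bases = [rng.choice("ACGT") for _ in range(length)]
    bases[position:position + len(site)] = site  # overwrite with the site
    return "".join(bases)


if __name__ == "__main__":
    genome = build_genome(100_000, SITE, 40_000)
    print("matches found at:", find_sites(genome, SITE))
    print("expected chance matches:",
          expected_chance_matches(len(genome), len(SITE)))
```

A shorter site (try 6 bases) gives many chance matches in the same stretch of DNA, which is one way to see why specific recognition needs a site of reasonable length.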