The transcription of genes could be defined as the intricate molecular manoeuvres occurring in the nuclei of cells, which allow the translation of genetic information held in the DNA into the proteins required for life. Gene transcription is the dominant control point in the production of any protein, and is initiated and regulated through the combined activities of a highly specialised set of nuclear proteins. This review examines the role of these protein “transcription factors” in the production of messenger RNA, the information intermediary produced in the nucleus, and transferred to the cytoplasm to serve as a template for protein synthesis. In combination with RNA polymerase, an extraordinary and complex enzyme required to synthesise new RNA molecules, a multitude of transcription factors combine their activities to orchestrate and control this elegant process.
- RNA polymerase
- transcription factors
- messenger RNA
Genes encode proteins: this is the first tenet of modern biology. However, DNA resides in the nucleus of a cell, and proteins are formed in the cytoplasm, presenting early researchers with their most frustrating conundrum. Clearly, information was not transferred directly from DNA to protein. Mendel's analysis of inheritance, and the discovery of chromosomes containing thousands of genes, had long since given birth to the field of genetics; however, it was not until 1957, four years after Watson and Crick1 elucidated the structure of DNA, that the story of how proteins are created truly began to unfold. Progress hinged on the discovery of a new player in this most complex of games: RNA.2 Unlike DNA, RNA was synthesised on demand, was “turned over” quickly and, most importantly, RNA was synthesised in the nucleus but then translocated out of the nucleus and into the cytoplasm, the site of protein synthesis. RNA was heralded as the best candidate so far for an information transfer intermediary between DNA and protein (fig 1).
This early promise was fulfilled, and a series of elegant experiments in the late fifties and early sixties went on to establish that proteins were indeed created through the translation of specialist messenger RNA molecules (mRNA) transcribed from individual genes in the nucleus.3 This process of transcription, the creation of RNA messages sent from the nucleus to the cytoplasm, is the dominant control point in the production of any given protein. The elegant complexities of transcriptional control are described below, beginning with a more thorough description of the key player: RNA.
RNA, like DNA, is a macromolecule made up of a long chain of individual nucleotides. However, it differs from DNA in several crucial aspects of its structure. First, unlike the classic Watson and Crick double helix, RNA is usually single stranded. Second, as its name implies, RNA contains a ribose sugar group rather than the deoxyribose groups found in DNA. Third, and for reasons that remain largely unclear, RNA contains the pyrimidine base uracil (U), instead of thymine (T), hence the thymine to adenine (T-A) and cytosine to guanine (C-G) pairings seen between the base pairs in DNA are replaced in RNA by U-A and C-G couplings. RNA is synthesised from DNA based on the complementarity of bases, a complex procedure carried out by the enzyme RNA polymerase (fig 2).4
The Watson and Crick double helix unwinds, allowing single stranded RNA to be synthesised using a single stranded DNA sequence as a template. By the usual convention, the template DNA strand is termed the “antisense strand”, and its partner DNA strand the “sense” (or coding) strand. Hence, the newly synthesised RNA has, with the exception of the inclusion of uracil to replace thymine, the same sequence as the sense DNA strand. It is a phenomenally elegant and efficient process, with the newly formed RNA molecules being transferred to the cytoplasm and converted or “translated” into the appropriate protein, and with the DNA double helix “closing” behind the polymerase as it moves on, ready to be transcribed again when required.5 Let us now look in more detail at how these transcription events are controlled.
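As a rough illustration of the template directed synthesis described above, the following Python sketch builds an RNA sequence from a template strand using the base pairing rules. The function names and the nine base example sequence are hypothetical, chosen only for illustration; real transcription is of course carried out by RNA polymerase rather than by a lookup table.

```python
# Base pairing rules for RNA synthesis: each DNA template base
# specifies the complementary RNA base (uracil pairs with adenine).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}
DNA_TO_DNA = {"A": "T", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') built on a template strand read 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def partner_strand(template_3_to_5: str) -> str:
    """Return the non-template DNA strand (5'->3') that pairs with the template."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5.upper())


# Hypothetical template sequence, for illustration only.
template = "TACGGATTC"
rna = transcribe(template)
partner = partner_strand(template)

# The RNA matches the partner (non-template) strand, with U in place of T.
print(rna)
print(partner)
print(rna == partner.replace("T", "U"))
```

The final comparison makes the point in the text explicit: apart from uracil replacing thymine, the new RNA reads the same as the DNA strand that was not used as the template.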
In higher, more complex organisms, several transcribing enzymes or “RNA polymerases” are used, and RNA molecules with a broad range of functions are synthesised from their DNA templates on demand. In the ensuing sections, however, we will focus on RNA polymerase II (RNA pol II), the enzyme responsible for mRNA synthesis,6 the first step in the production of a protein. RNA pol II is responsible for the synthesis of all mRNA molecules that encode proteins, and is the best characterised of all the polymerase molecules. Transcription of a gene, that is, the production of an mRNA “copy” that can be used as a template for protein synthesis, is a complex process, requiring the combined and precise activities of a large number of different proteins. We begin with transcription “initiation”, the events that set the whole process in motion.
For the process of transcription to begin, RNA pol II must interact with the “promoter” of the gene of interest. The promoter of a gene is simply defined as the shortest DNA sequence at which RNA pol II can initiate transcription, and these short sequences are highly conserved between thousands of genes.5 Figure 3 depicts a typical gene promoter.
The promoter consists of a “TATA box”, a short stretch of DNA located about 30 nucleotides before the site where transcription begins (in normal nomenclature, the transcription start site, indicated here by the arrow, is designated +1, and all other sites on the DNA are given as positive (downstream) or negative (upstream) values relative to that position; that is, the TATA box is designated as position −30, upstream of the transcriptional start site). The TATA box is so called because it always contains the crucial T-A-T-A nucleotide sequence that the transcription initiating proteins recognise and bind to.7 In addition to the crucial TATA box, the promoter must also contain an “initiator element” (Inr), a non-conserved short sequence of about 17 nucleotides that serves as the site of RNA pol II binding, and overlaps with the transcriptional start site.8,9 These two elements together constitute a “minimal promoter”; that is, the simplest minimal requirement for transcription to begin (fig 4). Of course, events are rarely that simple, but we will begin by using this minimal promoter as our model to look at transcriptional initiation.
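The numbering convention described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the toy sequence is invented so that a TATA motif starts 30 nucleotides upstream of the start site, the helper names are hypothetical, and the code follows the common convention that there is no position 0 (the base immediately upstream of +1 is −1).

```python
def relative_position(index: int, tss_index: int) -> int:
    """Map a 0-based sequence index to promoter coordinates.

    The transcription start site is +1; the base just upstream is -1.
    By convention there is no position 0.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def find_tata(sequence: str, tss_index: int, motif: str = "TATA") -> list:
    """Return promoter coordinates of each motif that starts upstream of the start site."""
    hits = []
    start = sequence.find(motif)
    while start != -1 and start < tss_index:
        hits.append(relative_position(start, tss_index))
        start = sequence.find(motif, start + 1)
    return hits


# Invented toy promoter: 10 filler bases, a TATA motif, 24 more filler
# bases, then the start site (index 40) followed by a short transcribed region.
toy_promoter = "G" * 10 + "TATAAA" + "G" * 24 + "A" + "C" * 9
tss_index = 40

print(find_tata(toy_promoter, tss_index))        # [-30]
print(relative_position(tss_index, tss_index))   # 1
```

A plain string search like this only illustrates the coordinate system; it says nothing about whether a given TATA motif is actually functional.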
So, what are the essential proteins for transcriptional initiation? Well, in addition to RNA pol II, initiation requires a range of “basal” protein transcription factors (TFs), housekeeping proteins that form a large complex required to begin the process (the “preinitiation complex”)10; and luxury “transcriptional activation factors” (TAFs), inducible proteins that can speed up or slow down events in response to cellular signals. We will deal with TAFs later, but for transcription to occur at all the preinitiation complex must form, its job being to unwind the DNA helix, separate the strands to use as a template, and enable RNA pol II to take up its position so that mRNA synthesis can begin. The precise and ordered step by step assembly of the preinitiation complex is described below.
Events begin when the cell receives a signal, ordering the gene of interest to be transcribed. At this point, both the TATA box and the Inr are unoccupied, and the DNA upstream and downstream is tightly wound in a double helix.10
The first step in assembly of the preinitiation complex is the recruitment of TBP, the TATA box binding protein.11 Activated by the inducible luxury TAF proteins (here simply designated TAF and transcription factor IIA), TBP binds to the TATA box by contacting points in both the major and minor grooves of the wound DNA helix (fig 5). This distorts the helix, pulling crucial sequences upstream and downstream into close proximity, creating the ideal landing site for TFIIB,12 the “conductor” protein, which orchestrates the full assembly of the complex and begins the transcriptional process (fig 6).
The second step of the process is the binding of TFIIB to the appropriate site. This small specialist protein binds to the DNA next to the TATA box, but is carefully kept in position through interaction with upstream and downstream sequences, and through direct protein–protein interactions with TBP and one or more of the TAFs.12 The positioning of TFIIB is crucial, because this is the linchpin protein that recruits the RNA pol II molecule into the complex. RNA pol II exists in a resting state in the nucleus in a complex with TFIIF,13,14 which is also recruited by TFIIB to a specific position within the growing complex: step 3 (fig 7).
The third step of the process is the TFIIB driven recruitment of the enormous RNA pol II molecule, which comes already associated with TFIIF (fig 6).14 Both these proteins are positioned and held in place through protein–protein interactions with the versatile TFIIB, while RNA pol II also interacts with the DNA, binding and completely covering the Inr, and sequences located immediately upstream and downstream. The complex is now nearing completion (fig 8).
Step 4 sees the addition of TFIIE,15 a structural protein that contacts TFIIF, but is held in place through protein–protein interactions with the RNA pol II molecule (fig 8). The role of TFIIE is to attract and anchor the final component of the complex, TFIIH (fig 9).
The addition of TFIIH sees the completion of the preinitiation complex, and now transcription can begin in earnest (fig 9). TFIIH,16 the last crucial component of the complex, is a “helicase”; that is, a protein that functions by unwinding DNA helices. The TFIIH helicase “melts” a short 10 nucleotide sequence just at the transcriptional start site, and activates the RNA pol II molecule to begin the transcription process. This process requires an input of energy from the cell, and this comes in the form of an ATP molecule produced by the cell's normal metabolism. The ATP molecule donates its energy giving phosphates to the RNA pol II molecule, and now the process of transcription has begun.
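The strict ordering of the assembly steps described above can be captured in a toy model. The Python sketch below is only a bookkeeping device, assuming the five step order given in the text; the class and method names are hypothetical, and the model ignores the accessory factors (TAF, TFIIA) that help TBP to bind.

```python
# Order of recruitment as described in the text (steps 1 to 5).
ASSEMBLY_ORDER = ["TBP", "TFIIB", "RNA pol II + TFIIF", "TFIIE", "TFIIH"]


class PreinitiationComplex:
    """Toy model that only accepts factors in the described order."""

    def __init__(self) -> None:
        self.bound = []

    def is_complete(self) -> bool:
        return len(self.bound) == len(ASSEMBLY_ORDER)

    def recruit(self, factor: str) -> None:
        if self.is_complete():
            raise ValueError("complex is already complete")
        expected = ASSEMBLY_ORDER[len(self.bound)]
        if factor != expected:
            raise ValueError(f"{factor} cannot bind yet; {expected} is required next")
        self.bound.append(factor)


complex_ = PreinitiationComplex()
for factor in ASSEMBLY_ORDER:
    complex_.recruit(factor)
print(complex_.is_complete())  # True

# Recruiting out of order fails: TFIIH needs TFIIE to anchor it.
try:
    PreinitiationComplex().recruit("TFIIH")
except ValueError as error:
    print(error)
```

The point of the sketch is simply that each factor creates the binding site for the next, so the complex cannot be completed by skipping a step.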
Elongation and termination
The initiation of transcription is an extraordinarily complex series of events. Because of the inherent complexities of this process, initiation is the major control point in the transcription of all protein coding genes. Immediately after transcription is initiated, the huge protein complex that has been formed begins to split apart—many of the basal transcription factors serve only to get events under way, and leave soon after the process has begun.17 RNA pol II, and its ever associated TFIIF, using the phosphate energy provided by ATP, process along the now unwound DNA molecule, synthesising a new complementary mRNA molecule. Figure 10 depicts this process.
The new mRNA chain (marked RNA) is synthesised by the RNA pol II, which moves along the DNA as it joins one free nucleotide to the next in a sequence based on the template DNA strand it is moving along. TFIIB, TFIIE, and TFIIH, their jobs now done, are released from the complex and degraded.17 The TBP and its associated proteins are recycled; that is, they remain in place, initiating the same events over and over again, until the cell has synthesised all of the mRNA copies of that gene that it requires. The elongation of the nascent mRNA chain continues until the RNA pol II reaches a termination signal on the DNA template (a stretch of nucleotides, the sequence of which causes the polymerase to pause). At this point,17 the polymerase reaches an impasse, and cannot continue along the DNA. Its progress halted, the polymerase loses the energy giving phosphates that help it move, causing it to lose its grip on the DNA and fall off. A simple end to a complex process. At this point the RNA pol II–TFIIF complex is recycled and used again if required, and the DNA, now without proteins bound, returns to its helical resting state.
The events outlined above represent the minimum that occurs in the creation of an mRNA molecule from its DNA template. In higher organisms, proteins are turned over phenomenally quickly, and must be produced rapidly in response to the appropriate stimuli. Take, for example, the human insulin gene.18 Insulin is produced from pancreatic β cells in response to raised blood glucose concentrations. To replenish intracellular stores of insulin, these cells need to synthesise more insulin rapidly, and that requires new mRNA molecules. In turn, these mRNA molecules need to be replaced; therefore, just as glucose stimulates the release of insulin from the cells, so it also stimulates transcription of the insulin gene. However, this is an oversimplified description of events, because insulin is also produced in response to a multitude of hormonal signals, nutrients, metabolites, and so on. Transcriptional initiation is the most tightly regulated point in the production of a protein. In other words, in complex organisms with an advanced physiology, which must continually adapt to their environment, the degree of control of transcription is very much greater than the simplified version described above, and many gene promoters in humans are extraordinarily complex, containing many DNA elements upstream of the TATA box and Inr that can enhance or repress transcription. Even in its simplest form, the promoter shown in fig 3 is likely to contain one or more “upstream promoter elements”, which can modify the rate of transcription. Figure 11 shows an example of such a promoter.
The two promoter elements depicted here are among the most common found in the promoters of genes in higher organisms19–21: the GC box and the CCAAT box. The GC box is so called because it contains the sequence GGGCGG, the binding site for a transcription factor protein called SP1. Transcription factors are proteins that interact with the promoters of genes to modify the rate of transcription. SP1 is a “general” transcription factor; that is, it is found in all cell types of the body, and is involved in the transcription of a multitude of genes.22 Many promoters contain multiple GC boxes, all of which bind SP1. Unlike the GC box, the CCAAT box binds a number of different proteins—some are present in all cells, but some are cell specific (that is, they exist only in one particular cell type within the body, activating a specific gene or genes). The activation of transcription occurs through the combined activities of a number of transcription factors, some general, and some cell type specific, and as the promoters become more complex, so the gene can be activated more rapidly, and in response to a wider range of stimuli. Many genes add a further degree of control through the presence of an “enhancer” region, which works in combination with the promoter to control transcription.
In general, gene promoters occur directly upstream from the transcriptional start site, containing the TATA box, Inr, and some basic transcriptional regulatory elements (like the GC and CCAAT boxes). Functionally, promoters are essential for transcription initiation to occur. However, in complex organisms, a simple promoter is rarely sufficient to direct transcription in a timely and responsive enough manner to allow a cell or an organism to adapt to a constantly changing environment. Enhancer regions work in combination with the promoter, and are so called because they enhance the capacity of a cell to transcribe a particular gene with much greater efficiency, and a greater sensitivity to changes in the environment.3 In higher organisms there are no fixed rules for enhancer regions, and a typical example is shown in fig 12.
In this simple case, the enhancer region occurs directly upstream of the promoter. Enhancers occur in varying sizes, and can contain from one sequence element (that is, the binding site for one protein transcription factor) to dozens of elements stretching for thousands of nucleotides. The position of the enhancer relative to the promoter is not fixed. As shown in fig 13, enhancers can occur almost anywhere in relation to the gene they affect. In strongly expressed genes, such as the insulin gene, the promoter and enhancer are combined, creating an extraordinarily powerful and complex control region directly upstream from the transcriptional start site. However, many genes are modified by enhancers that occur thousands of nucleotides upstream or downstream of the gene. In a small number of cases, enhancers can occur within the coding region of the gene of interest, and still work efficiently to modify transcription rates. Clearly, enhancers are multitalented stretches of DNA, and this complexity is underlined by the fact that in addition to being able to work from a multitude of positions relative to the gene, enhancers also work in an orientation independent manner; that is, if you remove an enhancer sequence and turn it round to face in the opposite direction, it continues to work just the same! So how exactly do these sequences function to affect transcription? Well, the answer lies in the orchestration of protein transcription factor interactions with the preinitiation complex.
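To make the idea of orientation independence concrete, the Python sketch below searches a DNA sequence for a short element in either orientation; “turning the element round” corresponds to taking its reverse complement. The helper names and the test sequence are hypothetical, and the GC box sequence GGGCGG is borrowed purely as a convenient example string.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read from the opposite strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_element(sequence: str, element: str) -> list:
    """Return (index, orientation) for each match of the element in either orientation."""
    sequence = sequence.upper()
    forward = element.upper()
    flipped = reverse_complement(forward)
    hits = []
    for i in range(len(sequence) - len(forward) + 1):
        window = sequence[i:i + len(forward)]
        if window == forward:
            hits.append((i, "forward"))
        elif window == flipped:
            hits.append((i, "flipped"))
    return hits


# Invented sequence containing the element once in each orientation.
region = "AAGGGCGGTTTTCCGCCCAA"
print(reverse_complement("GGGCGG"))      # CCGCCC
print(find_element(region, "GGGCGG"))    # [(2, 'forward'), (12, 'flipped')]
```

Because a DNA binding protein can recognise its site on either strand, a search that ignores the flipped orientation would miss half of the potential sites.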
TAFs, transcription factors, and some three dimensional thinking
The formation of the preinitiation complex, as indicated, can be instigated and modified through the activity of TAFs, activatory transcription factors that bring about the whole transcription process. However, once the complex is formed, its activity can be enhanced or repressed by the activity of a whole range of additional transcription factors. The very simplest case, the activation of transcription through binding of a positively acting transcription factor to its site within the enhancer, is shown in fig 14.
In this case, the positively acting transcription factor exerts its effect by bending the DNA so as to contact and increase the activity of RNA pol II, which is binding to the promoter (pol II here represents the entire preinitiation complex). In this way, enhancers can operate from any position, because the transcription factors can bend and warp the DNA until everything comes together in the correct position to activate transcription. Although it is easy to think of DNA as a linear molecule, with binding sites upstream and downstream, in reality transcription factors can bend and loop DNA in three dimensions to produce the effect they require. If you consider that a complex gene might have several enhancers, all of which contain multiple sites for transcription factor binding, and all of which are brought into alignment with each other and with pol II by the proteins that bind to them, then the complexities of transcriptional activation become clear! It is also worth remembering that not all transcription factors activate transcription; because it is important to a cell that genes are not over transcribed or produced inappropriately, just as many transcription factors serve to regulate this process in a negative manner.3
The simplest form of negative regulation occurs when negatively acting transcription factors bind to sites near or overlapping the TATA box and Inr, structurally creating a barrier that prevents pol II from binding.23 Figure 15 depicts this scenario.
This is the simplest form of negative regulation. Many negatively acting transcription factors work in the same way as the positively acting factor depicted in fig 14, by binding to enhancer elements and folding and looping the DNA, in this case to block or inhibit pol II activation. In addition, negatively acting factors can compete with positively acting factors for the same binding site, therefore repressing transcription indirectly by preventing the binding of positively acting factors. Frequently, however, negative regulatory factors do not interact with DNA at all, but work through direct interaction with positively acting transcription factors.23 Figure 16 shows an example of this.
Many transcription factors exist as dimers24 (homodimers, where two identical proteins join together to form an active factor; or heterodimers, where two different proteins must join to form an active molecule). In the example above, a positively acting homodimer forms, and then binds to the enhancer element, bending DNA to contact RNA pol II, as in fig 14 previously. However, the negatively acting factor in this case can inhibit this activation without binding to DNA, as it binds to one half of the dimer, preventing formation of the active protein. By forming heterodimers that are unable to bind to DNA, negatively acting factors can sequester the positively acting proteins, rendering them helpless and ineffectual. There are many variations on this method of inhibition, and dimerisation represents a crucial step in the regulation of transcription factor activity, and a very popular control point in the regulation of transcription initiation. Even when an active dimer is formed and binds to DNA its job is not done—negatively acting factors can still intervene to prevent transcriptional activation, as shown in fig 17.24
In this case, the activating transcription factor Gal 4 is thwarted by the binding of Gal 80, which quite literally masks the active protein, so that it can no longer interact with RNA pol II.25 This elegant manipulation of transcription factor activity forms the central control mechanism in the regulation of galactose metabolism, switching galactose utilisation genes (containing Gal 4 binding sites) on and off in response to changes in the body's galactose concentrations (Gal 4 is activated when galactose concentrations are high, with Gal 80 inhibiting its activity when galactose is in short supply). And so we begin to see that the regulation of transcription is extraordinarily complex, so as to provide the necessary versatility of gene expression required to adapt to an ever changing environment.
As the field of transcriptional research progresses and expands, we learn more and more about the intricacies of transcriptional regulation, and the exquisitely delicate interactions that control the expression of our genes. This review describes only the basic building blocks of these control mechanisms, scratching the surface of the myriad of complexities that lie at the heart of gene regulatory events. Nonetheless, all transcription events must begin with the formation of the preinitiation complex, and most are modified, either positively or negatively, through the activities of a specific group of protein transcription factors interacting with DNA elements within the gene promoters and enhancers. These events are rarely simple: to take one well characterised example, fig 18 is a simplified diagram of the promoter/enhancer region controlling transcription of the human insulin gene.
Regulation of the insulin gene is very tightly controlled by a 350 base pair sequence lying directly upstream of the transcriptional start site,26 and regulation occurs in response to a multitude of agents, including glucose, other nutrients, hormones, and metabolites. In the above schematic, each box in the +1 to −350 region of the promoter/enhancer represents the binding site for at least one transcription factor (indicated by the arrows). Many sites bind more than one protein, meaning that at least 15–20 (but most likely many more) transcription factors exert an effect on this gene through binding to this relatively short promoter sequence. In turn, these transcription factors become modified in response to glucose, hormones, metabolites, and so on, which affects their binding activities, dimerisation, transcriptional activation, and ultimately their interactions with other proteins and with pol II.26 Further upstream, additional regulatory sites bind a second, supplementary set of transcription regulatory proteins.27 There are doubtless many other sites, as yet uncharacterised, which also play a role in the regulation of insulin gene transcription. And so we see how complex the regulation of a single gene can be.
So why is this necessary? Well, in the case of the insulin gene this complexity is absolutely crucial. If too much insulin is produced blood glucose concentrations fall too low, resulting in coma; if too little insulin is produced the excessive amounts of sugar in the blood damage liver, kidneys, eyes, heart, and many other major organs, resulting in long term health problems and premature death.28 The precise regulation of the insulin gene in response to changes in blood glucose is brought about through the delicate and precise modification of transcription factor activity, and this represents the most crucial control point in the regulation of insulin production. Mutations in the transcription factors controlling these events have enormous consequences for insulin production, resulting in diabetes29 and all its damaging health complications.
Of course, the insulin gene is but one example of the complex transcriptional regulatory mechanisms used in higher organisms (there are many more complex and convoluted examples). Nonetheless, it is an elegant illustration of the hierarchical organisation behind mRNA production. The regulation of transcription is the key control point in the production of a protein, and is brought about only as the culmination of an often complex series of carefully orchestrated interactions between protein transcription factors, RNA polymerase, and the DNA template at the heart of it all. Subtle modifications in transcription factor activity amplify to produce dramatic changes in gene expression, and it is this intricate manipulation of individual activities that allows higher organisms to monitor and adapt to even the slightest change in their environment. With increasing complexity comes increasing flexibility, and an increased capacity to adapt: perhaps this is transcription's true take home message.