The polymerase chain reaction (PCR) is used universally for accurate exponential amplification of DNA. We describe a high error rate at mononucleotide and dinucleotide repeat sequence motifs. Subcloning of PCR products allowed sequence analysis of individual DNA molecules from the product pool and revealed that: (1) monothymidine repeats longer than 11 bp are amplified with decreasing accuracy, (2) repeats generally contract during PCR because of the loss of repeat units, (3) Taq and proofreading polymerase Pfu generate similar errors at mononucleotide and dinucleotide repeats, and (4) unlike the parent PCR product pool, individual clones containing a single repeat length produce no “shadow bands”. These data demonstrate that routine PCR amplification alters mononucleotide and dinucleotide repeat lengths. Such sequences are common components of genetic markers, disease genes, and intronic splicing motifs, and the amplification errors described here can be mistaken for polymorphisms or mutations.
The polymerase chain reaction (PCR) is one of the most widely used techniques in molecular biology and has made possible a great variety of diagnostic and research applications. Examples are the detection of gene mutations, the analysis of polymorphic markers and microsatellite loci, analysis of gene expression, DNA cloning, and site directed mutagenesis. Because PCR involves the exponential amplification of target sequences, a high degree of polymerase fidelity is essential to avoid introducing a large number of replication errors during the reaction. Most commercially available Taq polymerases introduce errors at a rate of approximately 10⁻⁵ to 10⁻⁶ point mutations/bp/duplication, with higher fidelity polymerases such as Pfu and Deep Vent generating up to eight times fewer errors.1 In contrast, we found that mononucleotide and dinucleotide repeats were not faithfully reproduced during PCR.
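As a rough illustration of what the quoted per-base error rates imply, the sketch below estimates the fraction of product molecules that carry at least one point mutation. It rests on a simplified model that is not taken from this study: errors are assumed to occur independently, every product molecule is treated as having passed through all duplications (which makes the result an upper bound), and the number of errors per molecule is assumed to be Poisson distributed. The function name, the 200 bp target length, and the 30 duplications are illustrative assumptions.

```python
import math

def fraction_with_error(error_rate, target_bp, duplications):
    """Upper-bound estimate of the fraction of PCR products with >= 1 point mutation.

    Assumes independent errors and a Poisson number of errors per molecule,
    with mean = error_rate * target_bp * duplications.
    """
    mean_errors = error_rate * target_bp * duplications
    return 1.0 - math.exp(-mean_errors)

# Hypothetical example: a 200 bp target over 30 duplications,
# evaluated at both ends of the Taq error range quoted in the text.
for rate in (1e-5, 1e-6):
    frac = fraction_with_error(rate, target_bp=200, duplications=30)
    print(f"error rate {rate:g}: about {frac:.1%} of molecules carry an error")
```

Under these assumptions, point mutations would affect roughly 6% of molecules or fewer for a short target, which gives a baseline against which the much higher error frequencies at repeat motifs can be compared.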
Methods and results
A stretch of 26 A nucleotides in intron 5 of the hMSH2 gene, termed Bat-26,2 was amplified from the genomic DNA of several healthy individuals under routine PCR conditions (200 ng genomic DNA and 1.25 U AmpliTaq (Perkin Elmer, Branchburg, New Jersey, USA) in 50 μl containing 1.5 mM MgCl2, 300 ng each primer,2 and 200 μM dNTPs). Direct sequencing of the PCR product produced an illegible sequence after the mononucleotide repeat, indicating a possible difference between two alleles. Both such allele differences and any polymerase slippage incurred during PCR can be visualised using gene scanning technology. However, in our study we subcloned the PCR product and sequenced individual clones, which allowed a qualitative determination of the composition of the PCR product pool from the sequences of individual DNA molecules. This revealed that the length of the Bat-26 poly-A stretch varied from 19 to 28 bp, a range that is incompatible with simple polymorphism or mutation (table 1). The sequencing of more than 30 individual clones revealed that the repeat was predominantly shortened, with only 35% of the clones containing the predicted sequence of (A)26 (table 1). These data suggested that a systematic polymerase error, specific to the mononucleotide repeat, had taken place during PCR.
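The clone-by-clone analysis described above amounts to measuring the repeat length in each sequenced insert and tallying the results. A minimal sketch of that bookkeeping follows. The function names are hypothetical, the clone sequences are made up (their flanking bases are arbitrary and are not the real Bat-26 locus), and taking the longest uninterrupted run of the base as the repeat is a simplifying assumption.

```python
import re
from collections import Counter

def longest_run(sequence, base="A"):
    """Return the length of the longest uninterrupted run of `base` in `sequence`."""
    runs = re.findall(f"{re.escape(base)}+", sequence.upper())
    return max((len(run) for run in runs), default=0)

def tally_repeat_lengths(clone_sequences, base="A"):
    """Count how many clones carry each repeat length."""
    return Counter(longest_run(seq, base) for seq in clone_sequences)

def fraction_expected(tally, expected_length):
    """Fraction of clones whose repeat has the expected length."""
    total = sum(tally.values())
    return tally[expected_length] / total if total else 0.0

# Made-up clone inserts; the flanks are placeholders, not genomic sequence.
clones = [
    "GGTC" + "A" * 26 + "GGTT",
    "GGTC" + "A" * 26 + "GGTT",
    "GGTC" + "A" * 25 + "GGTT",
    "GGTC" + "A" * 24 + "GGTT",
]
tally = tally_repeat_lengths(clones)
for length in sorted(tally):
    print(f"(A){length}: {tally[length]} clone(s)")
print(f"fraction with (A)26: {fraction_expected(tally, 26):.0%}")
```

With real sequencing reads, the repeat would more safely be located by anchoring on the known flanking sequence rather than by taking the longest run anywhere in the insert.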
To determine in more detail the performance of PCR amplification at similar sequences, repeats of 21, 13, 11, and nine monothymidines were amplified and subcloned. The polypyrimidine tract of the hMLH1 intron 11 splice acceptor site contains a (T)21 monothymidine repeat, termed Bat-21 (AcNb U40971),3 with an adjacent (TA)11 dinucleotide repeat. The polypyrimidine tract preceding exon 2 in the gene hMSH2 contains a (T)13 stretch, termed Bat-13 (AcNb U41207),3 and intron 4 of the human RAC1 gene contains both (T)9 and (T)11 runs (AcNb AJ132695).4 As shown in table 1, sequencing of individual cloned PCR products revealed incorrect amplification of the (T)21 and (T)13 repeats, whereas (T)9 was replicated faithfully. The limit for correct amplification was reached with (T)11, where 90% of the cloned products contained the predicted number of thymidines. The predominant observation was of repeat contraction; the (T)21 repeat appeared to expand, but this tendency was accounted for upon inspection of individual Bat-21 clones by A-T transversions in the adjacent TA repeat. Two different amplification errors at this combined repeat therefore led to an overall expansion of the poly-T stretch.
To determine the performance of the high fidelity Pfu polymerase at such sequences, the strategy was repeated for the above markers, with 200 ng genomic DNA and 2.5 U Pfu DNA polymerase (Stratagene, Amsterdam, the Netherlands) in 100 μl containing 2 mM MgCl2, 600 ng of each primer, and 250 μM dNTPs for 30 cycles (95°C for 45 seconds; annealing at 60°C for one minute; 72°C for two minutes). This showed that the limit for correct amplification by Pfu was raised from (T)11 to (T)13, but that longer repeats were still incorrectly amplified (table 1).
Dinucleotide repeats are another group of frequently amplified genetic markers. We amplified the dinucleotide repeat D15S128 (AcNb Z17197)5 in an individual previously determined to be homozygous for (CA)18, the allele with the highest frequency. PCR was carried out with both Taq and Pfu polymerases, followed by subcloning and sequencing of individual inserts. Both enzymes were found to amplify this repeat with a high error rate (table 1), as seen for the longer mononucleotide repeats.
Trinucleotide (CAG)n repeats occur in the coding regions of disease causing genes, such as the androgen receptor6 or Huntington's and Machado-Joseph disease genes. Expansion of the triplets is associated with severe neurodegeneration,7, 8 and genetic analysis of patients relies on PCR amplification of the repeats from genomic DNA. To test for possible errors introduced by in vitro amplification, the (CAG)n and (GGC)n trinucleotide repeats in the first coding exon of the androgen receptor gene (AcNb NM_000044)6 were amplified with both Taq and Pfu. Over 80% of clones contained the correct sequences (data not shown), a level of error compatible with direct sequencing. These data are also consistent with previous observations that longer nucleotide repeat units are associated with less polymerase error.2, 9
An accepted feature of nucleotide repeat marker analysis, in both mutation detection and the analysis of genetic polymorphisms, is the occurrence of staining patterns called “shadow bands”.10–12 When PCR products of such markers are separated by non-denaturing polyacrylamide gel electrophoresis they appear as broad bands or separate into a series of individual shadows. To determine the contribution of PCR polymerase errors to this phenomenon, a Bat-26 PCR product amplified with Taq was run on a polyacrylamide gel alongside several individual clones of the same product with a known number of A nucleotides. The latter were excised from the cloning vector by restriction digestion. As shown in fig 1, the PCR product appeared as a broad, smeared area of adjacent shadow bands, whereas the cloned fragments of defined length migrated as discrete single bands, forming a size ladder according to the length of the poly-A stretch. This demonstrates that the shadow bands commonly observed in PCR based genetic analysis of microsatellite loci can be attributed at least in part to artefactual PCR amplification errors. In addition, others have shown that improper annealing of PCR products during non-denaturing gel electrophoresis can further compound the occurrence of multiple band patterns.13
Discussion
In our study, we describe a high error rate in PCR amplification of mononucleotide and dinucleotide repeats from genomic DNA. Our findings relate mainly to monothymidine/monoadenosine repeats, and suggest that the longer the repeat, the greater the errors made during amplification. Although our additional data include only TA and CA dinucleotide repeats, we believe it is possible to extrapolate the general rule that PCR amplification of mononucleotide or dinucleotide repeats results in error. Use of the high fidelity proofreading polymerase Pfu in place of Taq reduced the occurrence of such errors only in the case of short monothymidine repeats. We think that contraction of the repeat itself is the most common type of error occurring during PCR amplification. This type of error is commonly referred to as polymerase “slippage” and is probably caused by slipped strand mispairing.14, 15 Alternatively, the loss or gain of nucleotide repeat units without affecting the surrounding sequence10, 12 can be explained by mega-priming. In this case, fragments that were either incompletely synthesised or broken within their repetitive element during PCR can anneal with mismatches and then be extended, resulting in repeat length variation that is independent of polymerase type or PCR conditions. This conclusion was drawn from work using synthetic oligonucleotides in the absence of genomic DNA.16, 17 Previous work has suggested that such polymerase error can be reduced by optimisation of PCR reaction conditions and buffer composition.1, 13, 18–20 In our present study, a fixed set of routine PCR conditions was used to determine the effect that repeat size and length alone would have on polymerase error rate. Our data lead to the conclusion that polymerase performance itself imposes considerable constraints upon PCR amplification fidelity independent of PCR conditions, primer composition, or PCR product length.
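To make the link between repeat length and slippage concrete, the following toy model tracks how the distribution of repeat lengths in a product pool could shift over successive cycles. It is only a sketch under stated assumptions and is not a model fitted to the data in table 1: the per-unit slippage probabilities are hypothetical, slippage probability is assumed to scale linearly with repeat length, each cycle is assumed to copy every template exactly once, and mega-priming is not modelled.

```python
from collections import defaultdict

def simulate_slippage(start_length, cycles, contract_per_unit, expand_per_unit=0.0):
    """Propagate the distribution of repeat lengths through PCR cycles.

    Toy model (all parameters hypothetical): in each cycle every template is
    copied once; the copy loses one repeat unit with probability
    contract_per_unit * length, gains one with probability
    expand_per_unit * length, and is otherwise faithful.
    Returns a dict mapping repeat length -> fraction of the final pool.
    """
    pool = {start_length: 1.0}
    for _ in range(cycles):
        new_pool = defaultdict(float)
        for length, frac in pool.items():
            p_contract = min(1.0, contract_per_unit * length)
            p_expand = min(1.0 - p_contract, expand_per_unit * length)
            # Templates persist and make up half of the doubled pool.
            new_pool[length] += 0.5 * frac
            # Faithful copies keep the template length.
            new_pool[length] += 0.5 * frac * (1.0 - p_contract - p_expand)
            # Slipped copies are one repeat unit shorter or longer.
            if p_contract > 0.0:
                new_pool[length - 1] += 0.5 * frac * p_contract
            if p_expand > 0.0:
                new_pool[length + 1] += 0.5 * frac * p_expand
        pool = dict(new_pool)
    return pool

# Hypothetical slippage rate of 0.001 per repeat unit per copy, 30 cycles.
for start in (9, 13, 26):
    pool = simulate_slippage(start, cycles=30, contract_per_unit=0.001)
    print(f"start length {start}: fraction still correct = {pool[start]:.2f}")
```

In this model the fraction of molecules retaining the starting length falls as the repeat gets longer, which mirrors the qualitative trend reported here, although the numerical values depend entirely on the assumed rates.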
Although the molecular mechanism of the described errors during amplification of repetitive DNA sequence motifs remains a matter of discussion, such errors are frequently encountered during cloning or genetic analysis of introns, polypyrimidine tracts, or microsatellite markers. In addition, they are of great interest during analysis and diagnosis of various diseases. For example, mononucleotide and dinucleotide microsatellite sequences are hotspots for mammalian polymerase error during in vivo DNA replication,21 and are highly unstable in tumours resulting from mismatch repair deficiency.22, 23 On this basis, they are widely used as markers for certain cancer syndromes,24 including hereditary non-polyposis colorectal cancer.25 Furthermore, dinucleotide repeats are used as polymorphic markers to determine heterozygosity. With the increased use of automated DNA sequencing, the described in vitro amplification errors at repeat motifs can easily be mistaken, in research and diagnostics, for polymorphism or mutation.
We thank C Caldas for comments on the manuscript, L Vieira for donating D15S128 primers, S Pedro for running the ABI sequencing apparatus, and S Beck and S Vieira for assistance with high performance liquid chromatography analysis. This study was supported by PRAXIS XXI grant 2/2.1/SAU/1397/95 from the Fundação para a Ciência e a Tecnologia, and PRAXIS XXI postdoctoral fellowship BPD/4140/96 to LAC.