Although ancient DNA (aDNA) miscoding lesions have been studied since the earliest days of the field, their nature remains a source of debate. A variety of conflicting hypotheses exist about which miscoding lesions constitute true aDNA damage as opposed to PCR polymerase amplification error. Furthermore, considerable disagreement and speculation exist about which specific damage events underlie observed miscoding lesions. The root of the problem is that it has previously been difficult to assemble sufficient data to test these hypotheses, and near-impossible to accurately determine the strand of origin of observed damage events. With the advent of emulsion-based clonal amplification (emPCR) and sequencing-by-synthesis technology, this has changed. In this paper we demonstrate how data produced on the Roche GS20 genome sequencer can be used to determine the strands of origin of miscoding lesions, and subsequently be interpreted to characterize the aDNA damage behind the observed phenotypes. Through comparative analyses of 390,965 bp of modern chloroplast and 131,474 bp of ancient woolly mammoth GS20 sequence data, we conclusively demonstrate that, at least in this permafrost-preserved specimen, Type 2 (cytosine→thymine/guanine→adenine) miscoding lesions represent the overwhelming majority of damage-derived miscoding lesions. Additionally, we show that an as-yet-unidentified guanine→adenine analogue modification, not the conventionally argued cytosine→uracil deamination, underpins a significant proportion of Type 2 damage. How widely these findings apply to aDNA will become apparent as future studies analyse data recovered from a wider range of substrates.
Keywords: Animals; DNA Damage; DNA, Chloroplast; DNA-Directed DNA Polymerase; Data Interpretation, Statistical; Elephants; Fossils; Genomics; Polymerase Chain Reaction; Templates, Genetic