Abstract
Trinucleotide repeat (TNR) diseases are neurological disorders caused by expanded genomic TNRs that become unstable in a length-dependent manner. The CAG•CTG sequence is found in approximately one-third of pathogenic TNR loci, including the HTT gene that causes Huntington’s disease. Friedreich’s ataxia, the most prevalent hereditary ataxia, results from GAA repeat expansion at the FXN gene. Here we used cytosine and adenine base editing to reduce the repetitiveness of TNRs in patient cells and in mice. Base editors introduced G•C>A•T and A•T>G•C interruptions at CAG and GAA repeats, mimicking stable, nonpathogenic alleles that naturally occur in people. AAV9 delivery of optimized base editors in Htt.Q111 Huntington’s disease and YG8s Friedreich’s ataxia mice resulted in efficient editing in transduced tissues, and significantly reduced repeat expansion in the central nervous system. These findings demonstrate that introducing interruptions in pathogenic TNRs can mitigate a key neurological feature of TNR diseases in vivo.
Main
Trinucleotide repeat (TNR) sequences are common genomic elements that can become unstable in a length-dependent manner. Pathogenic expansion of TNRs is associated with over 40 severe, predominantly neurological disorders1. TNRs may be localized to gene promoters, coding sequences, untranslated regions and introns, and the repeat motif can vary between TNR disorders2. The most common pathogenic repeat motif is CAG•CTG, which occurs in at least 15 known pathogenic TNR loci2. CAG repeats within coding exons frequently encode polyglutamine tracts3. These polyglutamine (poly-Q) diseases include Huntington’s disease (HD), spinocerebellar ataxias (SCAs), dentatorubral-pallidoluysian atrophy, and spinal and bulbar muscular atrophy. The most common hereditary ataxia in humans, Friedreich’s ataxia (FRDA), is caused by the intronic expansion of GAA repeats. Currently there are no approved treatments that halt TNR disease progression4,5,6.
The age of TNR disease onset, disease severity and rate of disease progression are primarily determined by the length of the corresponding repeat tract at birth, with longer repeats being associated with a less favorable prognosis. Repeat lengths beyond a locus-specific threshold are unstable in some somatic cells and can expand, contract and become increasingly unstable as repeat length increases. The instability of these genomic loci results from the formation of higher-order DNA and R-loop structures during transcription and DNA replication that interfere with normal cell function7,8,9,10. These abnormal structures are subjected to error-prone DNA repair that can result in the expansion or contraction of the repeat, with a general bias toward expansions in longer repeat tracts11,12,13,14. Single-cell analysis of brain tissue from patients with HD indicates that affected neurons undergo decades of CAG repeat expansion without evident phenotype before crossing a threshold that triggers rapid neurodegeneration15. Therapeutic intervention that halts somatic repeat expansion before this threshold is reached may therefore prevent or delay disease onset or progression.
In cell and animal models, naturally occurring single-nucleotide variants within repeat tracts inhibit the formation of higher-order nucleotide structures and reduce repeat instability16,17,18,19,20,21,22. At various TNR loci, repeat instability is better predicted by the length of pure uninterrupted repeats than by the length of the repeat tract13,23. TNR interruptions such as synonymous CAA triplets within CAG repeat tracts, or GAG or GGA triplets within GAA repeat tracts, are common. TNR interruptions in patients are associated with reduced somatic instability24,25,26, reduced transgenerational transmission27,28,29, delayed onset and progression of the disease, and overall milder phenotypes compared with individuals with uninterrupted repeats13,26,30,31,32,33,34,35,36,37,38,39,40,41. Genome-wide association data from >9,000 patients with HD suggest that a single CAA interruption in a CAG poly-Q region delays HD onset by an average of 12 yr (refs. 13,22), while pedigree analyses of 16 patients with HD suggest that CAG repeat interruption delays disease onset by 13–29 yr (refs. 36,42). These findings collectively raise the possibility that introducing interruptions in pathogenic TNR tracts might improve their genomic stability and ameliorate disease pathology.
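The difference between total tract length and the length of the longest pure run can be made concrete with a minimal Python sketch. It assumes the repeat tract has already been extracted from a read and starts in frame; the function name `repeat_stats` is hypothetical and not part of any published tool.

```python
def repeat_stats(tract: str, unit: str = "CAG") -> tuple[int, int]:
    """Return (total triplets, longest run of consecutive `unit` triplets).

    Assumes `tract` begins in frame; a trailing partial triplet is ignored.
    """
    n_full = len(tract) // 3
    triplets = [tract[3 * i:3 * i + 3] for i in range(n_full)]
    longest = current = 0
    for triplet in triplets:
        # Extend the current pure run, or reset it at an interruption.
        current = current + 1 if triplet == unit else 0
        longest = max(longest, current)
    return n_full, longest


pure = "CAG" * 40
interrupted = "CAG" * 20 + "CAA" + "CAG" * 19
print(repeat_stats(pure))         # (40, 40)
print(repeat_stats(interrupted))  # (40, 20)
```

Both tracts have the same total length, but a single CAA halves the longest uninterrupted run, which is the quantity reported to track instability more closely.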
Base editing is a precision genome editing technology that directly introduces targeted changes to the DNA in living cells43,44,45,46,47,48. Cytosine base editors (CBEs), which mediate C•G>T•A substitutions, and adenine base editors (ABEs), which mediate A•T>G•C substitutions, can in principle install single-nucleotide changes that interrupt TNR alleles in a manner resembling interruptions found in the general population, or in mildly affected or unaffected individuals with long repeats23,49 (Table 1 and Supplementary Table 1). In this study, we use base editing to introduce interruptions in repeats associated with HD and FRDA and assess their effect on expansion of these repeats in patient cells and in mouse models of these diseases.
Results
Synonymous cytosine base editing of CAG repeats in vitro
In patients with poly-Q disorders including HD and SCAs, naturally occurring synonymous CAA interruptions in pathogenic CAG repeats are associated with delayed onset or absence of disease13,22,36,42,50. These interruptions are proposed to stabilize repeats and suppress somatic repeat expansion17,18,22. In HD knock-in mice, long stretches of alternating CAG and CAA codons do not undergo somatic expansion, unlike pure CAG repeats of similar length22. We hypothesized that introducing CAA interruptions throughout CAG repeat tracts by cytosine base editing might reduce the expansion of long pathogenic TNR alleles (Fig. 1a).
a, An overview of the base editing approach to reduce triplet-repeat expansions. b, Schematic of the CAG-CBE base editing strategy. c, An illustration of cytosine base editing at CAG repeats. The smaller cartoon illustrates the multiple binding opportunities for the Cas9-sgCTG complex at CAG repeats. The magnified snippet shows a single binding event. d, Optimization of cytosine base editing strategies in HEK293T cells. Data are mean ± s.d. of biological triplicates. e, Optimization of the ‘GS’ linker of EA-evoA-Cas9-NG in HEK293T cells. Data are mean ± s.d. of biological triplicates. f, CAG repeat base editing at HTT alleles in human fibroblasts. Numbers below the bars indicate the number of CAG repeats (CAG size) in HTT alleles. Data are mean ± s.d. of biological replicates (n = 2 for HD cell lines with 20/48 and 17/71 CAGs, n = 3 for HD cell lines with 15/16 and 18/180 CAGs). g, Distribution of HTT CAG allele sizes in CBE-treated (CBE) and untreated HD fibroblasts with 18/180 CAG repeats in Rep1, 5 d (P1) and 30 d (P5) after electroporation, as measured by fragment analysis. h, CAG repeat base editing in HD fibroblasts with 18/180 CAG repeats measured across 30 d and five cell passages. P1–P5 refer to cell passages 1–5. Rep1 and Rep2 refer to two independent biological replicates. Illustrations in a, c and e were created using BioRender.com. CBE, CBE-treated; UGI, uracil DNA glycosylase inhibitor domain; UT, untreated cells.
To induce CAA interruptions in CAG repeat tracts, we designed a single-guide RNA (sgRNA) targeting CTG repeats on the opposite strand (sgCTG) and compared base editing throughout the repeat region using eight cytosine deaminases in the BE4max architecture with the Cas9-NG variant, which recognizes an NG protospacer adjacent motif (PAM)42,43 (Fig. 1b–d and Extended Data Fig. 1a). We co-transfected HEK293T cells, which contain an average of 17 CAG repeats at HTT, with plasmids encoding a CBE and sgCTG. We measured edited repeat tracts at the HTT locus by Illumina high-throughput sequencing (HTS) and analyzed them with in-house software, powTNRka (Supplementary Note 2). Although CAG repeats are common genomic elements and sgCTG is therefore expected to target many sites in the human genome, we observed no evident cellular toxicity. We determined the fraction of HTT alleles with at least one CAA interruption within the sequenced HTT CAG repeat tract of 17 repeats, and observed 44–62% average editing among our top six strategies (48 ± 4.2% for CDA-BE4 (ref. 51), 44 ± 5.3% for BE4 (refs. 52,53), 46 ± 5.0% for EA-BE4 (ref. 54), 51 ± 7.5% for EA-evoA (refs. 51,54), 53 ± 13% for AID-BE4 (ref. 55) and 62 ± 3.0% for AID-BE5 (ref. 54)) (Fig. 1d and Extended Data Fig. 1a).
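The editing metric above, the fraction of alleles carrying at least one CAA interruption, can be sketched in a few lines of Python. This is not powTNRka; the function names are hypothetical, and the sketch assumes each read has already been trimmed to the in-frame CAG tract. At the endogenous HTT locus the pure CAG tract is followed by a natural CAA-CAG, so tract boundaries must be defined before counting. Opposite-strand byproducts (TAG or TAA) would need separate tallying.

```python
def has_caa_interruption(tract: str) -> bool:
    """True if any in-frame triplet of the trimmed CAG tract reads CAA."""
    return any(tract[i:i + 3] == "CAA" for i in range(0, len(tract) - 2, 3))


def fraction_edited(tracts: list[str]) -> float:
    """Fraction of sequenced alleles carrying at least one CAA interruption."""
    if not tracts:
        raise ValueError("no reads supplied")
    return sum(has_caa_interruption(t) for t in tracts) / len(tracts)


# Toy reads (invented): 17-triplet tracts, two of them interrupted.
reads = [
    "CAG" * 17,                     # unedited
    "CAG" * 8 + "CAA" + "CAG" * 8,  # one interruption
    "CAGCAA" * 8 + "CAG",           # many interruptions
    "CAG" * 17,                     # unedited
]
print(fraction_edited(reads))  # 0.5
```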
Rarely, CBEs induce G•C>A•T changes upstream of the sgRNA binding site54 on the opposing DNA strand (Supplementary Text). This effect is more likely when multiple CBE binding events occur in proximity at the same target site, as with our sgCTG-targeting approach54. At glutamine-coding CAG repeats these edits can result in nonsense mutation (CAG to TAG or TAA). rAPOBEC1 family deaminases that harbor the ‘EA’ purity and efficiency modifications achieved the highest top-strand product purity54 (55:1 for EA-evoA and 63:1 for EA-BE4; Fig. 1d and Extended Data Fig. 1a). Modifying the flexible Gly-Ser (GS) linker between the EA-evoA deaminase and Cas protein improved editing outcomes: rigid linkers incorporating a nuclear localization signal (NLS)53,56 enhanced editing efficiency by up to 1.3-fold and product purity by 1.6-fold (Fig. 1e, Supplementary Text and Extended Data Fig. 1a,b). The EA-evoA-32NLS base editor yielded the highest editing efficiency (64 ± 4.8%) as well as the highest top-strand purity (81:1). We selected this editing strategy (hereafter designated CAG-CBE) for further study.
CBE interruption of HTT CAG repeats reduces expansion in HD cells
To assess genome editing by CAG-CBE in pathogenic CAG repeats, we quantified interruptions in three HD patient-derived fibroblast lines that each carry one wild-type HTT allele and one pathogenic allele with 48–180 CAG repeats (Methods). We delivered the base editor and synthetic sgCTG by messenger RNA electroporation47,57. At 5 d after electroporation we observed that 66–82% of treated cells contained interrupted repeats in the pathogenic CAG repeat tract (Fig. 1f). Within each sample, editing was ~1.1–1.3-fold higher at the long pathogenic HTT allele than at the shorter wild-type allele (Fig. 1f), suggesting that the increase in binding opportunities for the CBE at long pathogenic alleles results in higher targeting efficiency and a greater number of interruptions per allele (Extended Data Fig. 1c–e and Supplementary Text).
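Why longer alleles offer more binding opportunities can be illustrated by counting candidate Cas9-NG sites inside a pure repeat. This Python sketch assumes the protospacer reads along the CTG strand with a 20-nt spacer followed by an NG PAM, and counts only sites that lie entirely within the repeat; the function name is hypothetical. Overlapping sites cannot all be occupied at once, so the count is an upper bound on simultaneous binding.

```python
def ng_pam_sites(n_repeats: int, unit: str = "CTG",
                 spacer_len: int = 20) -> list[str]:
    """List protospacers inside a pure repeat that are followed by an NG PAM.

    Only sites whose protospacer and 2-nt PAM lie entirely within the repeat
    are counted, and only the strand written as `unit` repeats is scanned.
    """
    seq = unit * n_repeats
    spacers = []
    for i in range(len(seq) - spacer_len - 2 + 1):
        pam = seq[i + spacer_len:i + spacer_len + 2]
        if pam[1] == "G":  # Cas9-NG recognizes an NG PAM
            spacers.append(seq[i:i + spacer_len])
    return spacers


for n in (17, 48, 180):
    sites = ng_pam_sites(n)
    # Every site carries the same 20-nt sequence, so one sgRNA fits them all.
    print(n, len(sites), len(set(sites)))
# 17 10 1
# 48 41 1
# 180 173 1
```

Under these assumptions only one reading frame yields an NG PAM, so a single spacer matches every site, and the number of sites grows linearly with repeat length.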
Next, we asked whether CAG-CBE-induced repeat interruptions affect the stability of edited HTT alleles. We cultured GM09197 primary patient fibroblasts for up to 30 d until the cell lines reached senescence, and observed progressive expansion of the pathogenic HTT allele that initially contained 180 CAG repeats on average. We assessed CAG repeat instability at five consecutive passages by measuring the CAG repeat length in edited versus unedited cells (Fig. 1g and Extended Data Fig. 1f). Base editing of the HTT alleles was completed by 5 d after electroporation and was durable throughout the experiment. We observed a modest increase in the fraction of edited cells at later timepoints, likely due to stochastic clonal expansion within the bulk cell culture, or possibly a growth advantage of the edited fibroblasts relative to unedited cells (Fig. 1h). By passage 5 (30 d after treatment), we observed a shift in the CAG repeat distribution in untreated and mock-edited cells toward an increased repeat size, with the most frequent CAG allele acquiring ~6 CAG repeats relative to passage 1 (day 5 posttreatment; Fig. 1g and Extended Data Fig. 1f). In contrast, CAG-CBE-treated HD fibroblasts did not exhibit repeat expansion by passage 5, and the most frequent CAG allele was reduced by ~5 CAG repeats compared with passage 1. These findings demonstrate that inducing interruptions in pathogenic-length CAG repeats by base editing can prevent somatic repeat expansion and promote contraction of the pathogenic repeats.
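The shift in the most frequent CAG allele between passages can be computed from fragment-analysis profiles as follows. This is a minimal Python sketch with invented toy values; it treats each profile as a mapping from repeat size to peak height, and ties between equal peaks resolve to the first size encountered.

```python
def modal_allele(distribution: dict[int, float]) -> int:
    """Repeat size with the highest signal in a fragment-analysis profile."""
    return max(distribution, key=distribution.get)


def modal_shift(early: dict[int, float], late: dict[int, float]) -> int:
    """Change in modal repeat size between two timepoints (positive = expansion)."""
    return modal_allele(late) - modal_allele(early)


# Toy profiles (repeat size -> peak height); values are invented.
passage_1 = {100: 0.2, 101: 0.5, 102: 0.3}
passage_5 = {102: 0.1, 104: 0.6, 105: 0.3}
print(modal_shift(passage_1, passage_5))  # 3
```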
Genome-wide off-target editing analysis of CAG repeat base editing
Expansion of CAG repeats at various loci is associated with TNR diseases, including HD and several SCAs2,58,59,60,61. Our CAG-CBE strategy targets pure CAG•CTG repeats, enabling repeat interruption across pathogenic expansion loci regardless of gene identity. We observed that CAG-CBE introduces interruptions in 39–65% of alleles at multiple TNR loci (AR, ATXN1, ATXN2, ATXN7, ATN1 and TBP; Fig. 2a), indicating its potential for interrupting and reducing CAG•CTG repeat expansions in a range of TNR disorders.
a, Base editing at TNR disease-associated genes in HEK293T cells. Data are mean ± s.d. of biological triplicates, except for AR (n = 5) and ATXN2 (n = 4). b, CIRCLE-seq off-target hits in the human genome classified by the number of mismatches with sgCTG. c–e, Alternative target and off-target editing at CIRCLE-seq sites in HEK293T cells, quantified by WGS. c, Violin plot representing mean base editing frequencies at CIRCLE-seq sites (>0.5% editing by WGS), classified by mismatch number. Median and quartiles are shown. d, Alternative target editing at protein-coding sites in HEK293T cells, grouped by encoded amino acids. Median and quartiles are shown; each dot represents mean editing at a specific locus. e, Base editing at CIRCLE-seq sites (>0.5% editing by WGS) and grouped by mismatch position relative to sgCTG spacer. Mean editing (%) represents base editing frequency across genomic sites meeting the specified mismatch criteria. Mismatch category A includes the five nucleotides closest to the PAM (positions 1–5), category B represents positions 6–10 and category C spans the last ten, PAM-distal nucleotides (positions 11–20) of the protospacer. A0–A5, B0–B5 and C0–C10 indicate the number of mismatches (0–5) between the sgCTG and a target site in categories A, B and C. Each square shows the number of loci with >0.5% editing in each mismatch subgroup. f, Editing at protein-coding sites with 0–3 mismatches between the sgCTG and a target site in HEK293T cells, measured by HTS. Each dot represents mean editing at a unique locus; disease-associated genes are colored diamonds. Data are mean ± s.d. of all loci in each category. g, Comparison of editing quantified by WGS and HTS at selected sites. Each dot represents the log2 fold-change between editing frequencies quantified by WGS and HTS at a single locus, with the horizontal line indicating median (n = 22, P = 0.0045, one-sample t-test and Wilcoxon test). 
h, Impact of CAG repeat editing on amino acid sequence at protein-coding genes. The scatterplot shows the percentage of synonymous (y axis) and nonsynonymous (x axis) editing at edited alleles for each protein-coding gene. Each dot represents mean editing calculated for all loci mapping to a unique gene. Data in c–h represent biological triplicates. Poly-A, polyalanine; Poly-L, polyleucine; Poly-S, polyserine.
Base editors can induce off-target edits in the genome through Cas-dependent and Cas-independent mechanisms62,63,64,65,66,67. Many coding and noncoding sites in the human genome contain ≥8 CTG repeats that match the sgCTG spacer, including regions not known to be associated with disease68. To assess Cas-dependent off-target activity of CAG-CBE, we performed circularization for in vitro reporting of cleavage effects by sequencing (CIRCLE-seq) analysis on human genomic DNA (gDNA) from HEK293T cells using purified ribonucleoprotein complexes composed of Cas9-NG nuclease and sgCTG69. CIRCLE-seq nominated 5,706 potential targets, including 614 (10.8%) loci with ≥8 CTG repeats that perfectly match the sgCTG spacer (‘alternative targets’). These alternative targets include 16 TNR disease-associated loci, 129 additional protein-coding loci and 469 noncoding loci (Fig. 2b). The remaining >89% of CIRCLE-seq-nominated loci harbor mismatches to the sgCTG that are known to inhibit sgRNA binding and base editing in cells70,71,72 (Fig. 2b), including 833 protein-coding loci. We classified these hits based on: (1) the identity of the targeted region, annotated with the Hypergeometric Optimization of Motif EnRichment (HOMER) tool73; and (2) the number of mismatches with the pure CAG•CTG repeat targeted by sgCTG70,71,72 (Fig. 2b, Extended Data Fig. 1g,h and Supplementary Tables 2 and 3).
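The positional mismatch categories used in Fig. 2e (A = positions 1–5 nearest the PAM, B = 6–10, C = 11–20) can be assigned with a short Python sketch. It assumes the spacer and the genomic protospacer are given 5′→3′ on the same strand, with the PAM immediately 3′ of position 20 of the string; the spacer shown is a hypothetical CTG-repeat spacer, not necessarily the one used in this study.

```python
def mismatch_profile(spacer: str, target: str) -> str:
    """Summarize spacer/target mismatches as e.g. 'A1B0C2'.

    Positions are counted from the PAM: position 1 is the spacer base next to
    the PAM (its 3' end). A = positions 1-5, B = 6-10, C = 11-20.
    """
    if len(spacer) != 20 or len(target) != 20:
        raise ValueError("expected 20-nt spacer and protospacer")
    counts = {"A": 0, "B": 0, "C": 0}
    for index, (s, t) in enumerate(zip(spacer.upper(), target.upper())):
        if s == t:
            continue
        position = 20 - index  # index 19 (3' end) -> position 1
        if position <= 5:
            counts["A"] += 1
        elif position <= 10:
            counts["B"] += 1
        else:
            counts["C"] += 1
    return "".join(f"{k}{v}" for k, v in counts.items())


spacer = "GCTGCTGCTGCTGCTGCTGC"         # hypothetical CTG-repeat spacer
target = "A" + spacer[1:19] + "T"        # mismatches at positions 20 and 1
print(mismatch_profile(spacer, target))  # A1B0C1
```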
To experimentally validate genome-wide off-target activity, we performed whole-genome sequencing (WGS) in HEK293T cells at a 160× average read depth, and quantified the fraction of CIRCLE-seq-nominated loci with at least one CAA interruption observed at >0.5% over background levels. We detected cytosine base editing at 48% of CIRCLE-seq-nominated loci (2,753), with 1,240 sites showing ≥5% editing (Supplementary Table 4). Editing was greatest at alternative targets (35 ± 18% across 579 loci; Fig. 2c), including 143 protein-coding loci. Most of these encode poly-Q or polyleucine (114 total) and are converted to synonymous codons by cytosine base editing (CAG-to-CAA and CTG-to-TTG, respectively; Fig. 2d). As anticipated, mismatches progressively reduced observed editing70,71,72; a single mismatch decreased editing by ~1.8-fold, while three or more mismatches greatly reduced or abolished off-target editing70,71,72 (Fig. 2c–e and Supplementary Text).
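Whether a C•G>T•A edit is synonymous, missense or nonsense depends only on the affected codon. The Python sketch below builds the standard genetic code and classifies single codon changes; it assumes the reading frame of the edited site is already known, and `classify_edit` is a hypothetical helper, not part of the study's pipeline.

```python
from itertools import product

# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def classify_edit(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, missense or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


for ref, alt in [("CAG", "CAA"), ("CTG", "TTG"), ("CAG", "TAG"), ("GCC", "GTC")]:
    print(ref, "->", alt, classify_edit(ref, alt))
# CAG -> CAA synonymous
# CTG -> TTG synonymous
# CAG -> TAG nonsense
# GCC -> GTC missense
```

The first two rows correspond to the poly-Q and polyleucine conversions described above; the third is the opposite-strand byproduct that can arise at glutamine codons.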
To validate our WGS results54,64, we quantified CBE-induced C•G>T•A interruptions at seven poly-Q-coding alternative targets and 16 CIRCLE-seq-nominated protein-coding off-targets using targeted-amplicon HTS. Concordant with WGS results, a single mismatch reduced editing by ~2.7-fold, while two or more mismatches nearly eliminated CBE activity (Fig. 2f and Supplementary Text). Although WGS and HTS results were well correlated, WGS slightly overestimated off-target editing at some loci (one-sample t-test and Wilcoxon test P = 0.0045; Fig. 2g).
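The WGS-versus-HTS comparison in Fig. 2g rests on per-locus log2 fold-changes tested against zero. A minimal Python sketch of that calculation follows, using invented toy values; it computes only the one-sample t statistic. For P values, `scipy.stats.ttest_1samp` and `scipy.stats.wilcoxon` provide the corresponding tests, although the study's exact settings are not stated here.

```python
import math
import statistics


def log2_fold_changes(wgs: list[float], hts: list[float]) -> list[float]:
    """Per-locus log2(WGS / HTS) editing ratios; inputs must be positive."""
    if len(wgs) != len(hts):
        raise ValueError("WGS and HTS lists must be paired by locus")
    return [math.log2(w / h) for w, h in zip(wgs, hts)]


def one_sample_t(values: list[float], mu: float = 0.0) -> float:
    """One-sample t statistic testing whether the mean differs from mu."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - mu) / se


# Toy paired editing frequencies (%) at three loci; values are invented.
lfc = log2_fold_changes([2.0, 4.0, 1.0], [1.0, 4.0, 0.5])
print(lfc)                          # [1.0, 0.0, 1.0]
print(round(one_sample_t(lfc), 6))  # 2.0
```

A positive mean log2 fold-change indicates that WGS reports higher editing than HTS at the same loci.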
The WGS pipeline revealed that substantial CBE activity (≥5%) occurred at intergenic (364 loci) or intronic (339 loci) regions, while ~28% of edited sites (350 loci across 243 genes) mapped to protein-coding exons. Of these, 145 genes acquired synonymous codon substitutions, while 91 (~37%) acquired missense mutations in ~26% of alleles on average (Fig. 2h), including at least 47 neuronally expressed genes (Supplementary Tables 5–7). Among these, four (MED12, NOLC1, PRPF40A and RPLP0) are considered essential according to the Cancer DepMap database74,75 (Supplementary Table 8). To assess the impact of nonsynonymous editing, we used AlphaMissense to predict how CAG-CBE-induced amino acid substitutions may affect protein folding and function76,77. This analysis revealed that the induced missense mutations mostly result in benign amino acid changes (87%, 53 genes), with one gene (TSHZ3, ~10% edited alleles) acquiring mutations likely to affect protein function. Finally, we observed nonsense mutations in a single gene (PRRC2C, ~7% edited alleles) that is nonessential and not expressed in neurons74 (Supplementary Tables 5–9).
Collectively, these findings confirm that CAG-CBE introduces C•G>T•A substitutions that are mostly noncoding or synonymous or reproduce natural allelic variation78,79. The small fraction of protein-coding off-target loci that undergo editing (6.1% of CIRCLE-seq-nominated sites) merits careful study to understand the potential biological consequences of editing at these residues. This work also underscores the limitations of computational predictive models (Supplementary Text and Extended Data Fig. 2a–c), which may overestimate or underestimate off-target events, further highlighting the importance of experimental nomination and empirical validation of genome-wide off-target effects.
Cytosine base editing of HTT alleles reduces CAG length in neurons
The Htt.Q111 mouse model of HD harbors a humanized HTT allele with a long pathogenic repeat tract containing ~109–122 CAGs and exhibits age-dependent somatic instability in central nervous system (CNS) tissues including striatum and cortex80,81,82. In contrast, HD knock-in mice that carry (CAGCAACAGCAACAA)21, a long poly-Q tract composed of a mixture of CAG and CAA codons, do not exhibit somatic instability22, suggesting that interrupting the pathogenic-length CAG tracts may alleviate repeat expansions in vivo.
To assess whether CBE-induced CAA interruption reduces the average length of HTT CAG repeats in vivo, we designed a dual adeno-associated virus (AAV) strategy to package CAG-CBE (EA-evoA-32NLS-NG and sgCTG, v5 AAV-CBE; Fig. 3a) for delivery to Htt.Q111 mice83. We selected AAV serotype 9 (AAV9) because it has a well-established tropism for neurons in the CNS and has been shown to target neurons almost exclusively in the cortex84,85,86. We injected Htt.Q111 neonates on postnatal day 0 via intracerebroventricular (ICV) injection with a total of 3.8 × 10¹⁰ viral genomes (vg) per mouse of the dual AAV9-CBE vectors, along with 3.8 × 10⁹ vg of AAV9-Cbh-eGFP-KASH (Klarsicht/ANC-1/Syne-1 homology domain, hereafter, AAV9-GFP)83 to serve as a viral transduction control83,87 (Fig. 3b). We quantified GFP-positive nuclei from the cortex and striatum of injected mice and observed a mean transduction efficiency of 50 ± 10% and 31 ± 12%, respectively (Fig. 3c), consistent with earlier reports using AAV9 (refs. 83,88,89,90).
a, Dual-AAV vectors encoding split-intein EA-evoA-32NLS-NG and sgCTG cassettes, v5 AAV9-CBE. b, Neonatal ICV injections in Htt.Q111 mice with AAV9-CBE, and AAV9-GFP as a transduction control. c, Transduction efficiency in the cortex and striatum of Htt.Q111 mice treated with AAV9-CBE + AAV9-GFP. Data are mean ± s.d. of independent animals (12-week, n = 6; 24-week, n = 7). d, Base editing in the CNS of Htt.Q111 mice treated with AAV9-CBE, or controls. Editing was quantified at 4, 12 and 24 weeks postinjection. Data are mean ± s.d. of independent animals (4-week, n = 3; 12-week, n = 6; 24-week, n = 7). e,f, Base editing in bulk and GFP+ flow-sorted nuclei isolated from the cortex (e) or striatum (f) of Htt.Q111 mice treated with AAV9-CBE + AAV9-GFP at 12 and 24 weeks postinjection. Data are mean ± s.d. of independent animals (12-week, n = 6; 24-week, n = 7). g–i, Distribution of CAG allele sizes in the tail (g), cortex (h) and striatum (i) isolated from 24-week-old Htt.Q111 mice treated with AAV9-CBE, or controls. The dashed vertical line marks the modal HTT CAG allele determined from the tail. Data show mean CAG repeat size distributions from at least four independent animals (tail: untreated n = 4, CBE n = 11; striatum: untreated n = 7, CBE n = 7; cortex: untreated n = 8, CBE n = 7). j, ICAG calculated for the tail, cortex and striatum isolated from 12- and 24-week-old Htt.Q111 mice treated with AAV9-CBE, or controls. Data are shown as box plots, with each data point representing an independent animal (12 weeks, control group: tail n = 8, striatum n = 8; cortex n = 4; 12 weeks, CBE group: cortex n = 6, striatum n = 6, tail n = 5; 24 weeks, control group: tail n = 8, cortex n = 8, striatum n = 7; 24 weeks, CBE group: all tissues n = 7). The horizontal line marks the median, and whiskers denote the minimum and maximum values. *P = 0.0265, **P = 0.0064, ***P = 0.0009, Welch’s one-tailed t-test. Illustrations in a and b were created using BioRender.com.
At 4 weeks postinjection, we observed CAA interruptions in 24 ± 1.6% and 5.6 ± 1.2% of alleles from cortical and striatal cells, respectively, with no detected nonsense mutations resulting from opposite-strand editing above background levels (Fig. 3d), and with a median of three interruptions and an average of 3.9 per edited allele in the cortex (Extended Data Fig. 3a). As expected given the long-lasting nature of standard AAV expression and the continued availability of long stretches of uninterrupted repeat sequence, base editing levels in these tissues steadily increased up to 12 weeks postinjection, reaching 33 ± 8.3% in the cortex and 22 ± 3.0% in the striatum with a median of six interruptions per edited read (average of 7.6; Fig. 3d and Extended Data Fig. 3a), after which editing largely stabilized and we observed only a modest increase in editing at 24 weeks (34 ± 6.0% edited alleles in the cortex and 26 ± 6.0% in the striatum, with median of eight interruptions and average of 8.8 interruptions per edited read). In transduced cells—enriched by sorting for GFP-positive nuclei83,87—we observed that 76 ± 12% and 72 ± 13% of HTT alleles from the cortex and striatum, respectively, harbored one or more CAA interruptions by 24 weeks postinjection (Fig. 3e,f). These observed editing frequencies are all underestimates of the true interruption frequency across the entirety of the repeat tract, as they measure editing across only <65% of the repeat region (70 of ≥109 repeats; Supplementary Text and Extended Data Fig. 3b). Collectively, these data confirm that neonatal ICV injection of AAV9-CBE enables efficient synonymous CAA interruption of long pathogenic HTT CAG repeats in HD-relevant tissues.
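Why reads covering only part of the tract underestimate editing can be illustrated with a deliberately simple probability model. This Python sketch assumes every triplet is interrupted independently with the same small probability, which real base editing does not strictly satisfy; the per-triplet value and the function name are hypothetical. The window sizes of 70 and 109 triplets are taken from the text.

```python
def detectable_fraction(p_per_triplet: float, window: int) -> float:
    """P(at least one interruption falls in a window of `window` triplets).

    Toy model: every triplet is interrupted independently with probability
    p_per_triplet. Real editing is not independent, so this is illustrative only.
    """
    if not 0.0 <= p_per_triplet <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return 1.0 - (1.0 - p_per_triplet) ** window


p = 0.01  # hypothetical per-triplet interruption probability
observed = detectable_fraction(p, 70)  # sequenced window
full = detectable_fraction(p, 109)     # whole tract (lower bound on length)
print(f"{observed:.3f} vs {full:.3f}")
```

Under any nonzero per-triplet probability, the 70-triplet window reports a smaller edited fraction than the full tract would, consistent with the statement that the measured frequencies are underestimates.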
To examine whether CAA interruptions impact somatic CAG repeat expansion in vivo, we performed qualitative and quantitative analyses of CAG repeat length profiles from Htt.Q111 mice treated with AAV9-CBE compared with controls80,91 (Fig. 3g–j and Supplementary Text). At 12 weeks postinjection, we found that AAV9-CBE treatment significantly reduced the average size of CAG repeats (CAG instability index, ICAG) in the cortex (ICAG = −1.6 ± 0.5 repeats, Welch’s one-tailed t-test P = 0.0064) and striatum (ICAG = −2.8 ± 0.5 repeats, Welch’s one-tailed t-test P = 0.0009; Fig. 3j, Extended Data Fig. 3c–f and Supplementary Text). This effect endured over time, reaching ICAG = −2.2 ± 0.9 repeats in the cortex (Welch’s one-tailed t-test P = 0.0265) and ICAG = −5.4 ± 0.9 repeats in the striatum (Welch’s one-tailed t-test P = 0.0003) at 24 weeks postinjection (Fig. 3g–j).
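A widely used peak-height-weighted instability index from fragment-analysis data can be sketched as follows. Whether ICAG in this study uses exactly this formulation and threshold is an assumption: the sketch keeps peaks at least 20% of the tallest peak, normalizes their heights to sum to 1, and weights each peak by its distance from a reference modal allele (for example, the tail modal allele marked in Fig. 3g–i). The toy profile is invented.

```python
def instability_index(peaks: dict[int, float], main_allele: int,
                      threshold: float = 0.2) -> float:
    """Peak-height-weighted mean change in repeat size relative to main_allele.

    Peaks below `threshold` x the tallest peak are discarded as noise; the
    remaining heights are normalized to sum to 1.
    """
    top = max(peaks.values())
    kept = {size: h for size, h in peaks.items() if h >= threshold * top}
    total = sum(kept.values())
    return sum((h / total) * (size - main_allele) for size, h in kept.items())


# Toy profile (repeat size -> peak height); values are invented.
profile = {110: 20, 111: 100, 112: 60, 113: 20, 114: 5}
print(round(instability_index(profile, main_allele=111), 6))  # 0.4
```

A positive index indicates a net shift toward expansion relative to the reference allele, and a negative index a net shift toward contraction.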
Collectively, the data reveal that neonatal ICV injection of AAV9-CBE enables substantial transduction of HD-relevant tissues in humanized HD mice, significantly reducing the average size of pathogenic HTT CAG repeats by inducing synonymous CAG-to-CAA interruptions with proportionally few byproducts (Supplementary Text, Fig. 3d and Extended Data Fig. 3g–j). This targeted base editing approach in postnatal animals not only prevents repeat expansion but also contracts repeats, offering particular benefits for CAG repeat lengths that exceed the pathogenic threshold, are inherently toxic and cannot become nonpathogenic solely by reducing the rate of somatic expansion91,92.
Our findings demonstrate the protective role of CAA interruptions in vivo, and show that CAG-CBE activity primarily results in silent or noncoding sequence changes at off-target loci (Supplementary Text and Extended Data Fig. 3k–n).
Adenine base editing of GAA repeats at FXN alleles in vitro
Similar to the protective effect of CAG repeat interruptions, GGA and GAG triplet interruptions in the GAA repeat region of FXN intron 1 are associated with the absence of FRDA disease phenotypes, later disease onset and milder symptoms compared with patients with similar-sized uninterrupted GAA alleles21,23,49. Triplet sequence variation within GAA repeat tracts can also prevent the formation of higher-order DNA structures that underlie repeat expansion and transcriptional repression of FXN in vitro16,21.
We hypothesized that inducing A•T>G•C interruptions at GAA repeats using ABEs could mimic the natural genetic variation of FXN alleles that is observed in the general population and in individuals with pathogenic-length GAA repeats who are disease-free or have later onset compared with patients with FRDA with similar repeat length21,23,49 (Table 1 and Supplementary Text). We speculated that A•T>G•C interruptions at GAA repeats could reduce the length of the pathogenic repeat expansions that cause FRDA (Figs. 1a and 4a). To induce A•T>G•C interruptions throughout GAA repeat tracts, we designed three sgRNAs to target pure GAA triplets using ‘AGAA’, ‘AAGA’ or ‘GAAG’ as a PAM. We paired these sgRNAs with ten different ABEs that used the laboratory-evolved deoxyadenosine deaminases from either ABE7.10 or ABE8e (refs. 56,93) fused to PAM-compatible Cas proteins, including Cas9-NG, Cas9-NRCH, Cas9-SpG, Cas9-SpRY and Cas9-NRRH54,94,95,96,97 (Fig. 4b). We assessed ABE editing at FXN alleles of HEK293T cells, which on average harbor nine GAA repeats, by HTS and powTNRka analysis to quantify the fraction of FXN alleles with at least one A•T>G•C interruption. We observed the highest fraction of edited FXN alleles with strategies using the AGAA PAM (42 ± 2.9% by ABE8e-Cas9-NRCH and 46 ± 2.0% by ABE8e-Cas9-NG) and GAAG PAM (45 ± 2.1% by ABE8e-Cas9-SpRY; Fig. 4b).
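The three PAMs listed above follow directly from the three reading frames of a pure GAA repeat. This Python sketch assumes the 20-nt spacer and 4-nt PAM are read along the GAA strand and ignores the differing PAM preferences of the Cas variants; the function name is hypothetical.

```python
def repeat_pams(unit: str, spacer_len: int = 20,
                pam_len: int = 4) -> list[tuple[int, str, str]]:
    """For each reading frame of a pure repeat, return (offset, spacer, PAM).

    The spacer and PAM are read from the same strand, written as `unit` repeats.
    """
    n_units = (spacer_len + pam_len) // len(unit) + 2
    seq = unit * n_units
    result = []
    for offset in range(len(unit)):
        spacer = seq[offset:offset + spacer_len]
        pam = seq[offset + spacer_len:offset + spacer_len + pam_len]
        result.append((offset, spacer, pam))
    return result


for offset, spacer, pam in repeat_pams("GAA"):
    print(offset, spacer, pam)
# 0 GAAGAAGAAGAAGAAGAAGA AGAA
# 1 AAGAAGAAGAAGAAGAAGAA GAAG
# 2 AGAAGAAGAAGAAGAAGAAG AAGA
```

Because the repeat is periodic, each frame yields one spacer sequence and one PAM, so the three sgRNAs can be read as covering the three frames.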
a, Illustration of adenine base editing at GAA repeats (top) and schematic of the editing strategy (bottom). A smaller cartoon illustrates multiple binding opportunities for the Cas9-sgGAA complex at GAA repeats, with a magnified view showing a single binding event. b, Optimization of adenine base editing in HEK293T cells. Sequences below the bar plot indicate the NNNN PAM sequences compatible with sgGAA spacer. Data are mean ± s.d. of biological triplicates. c, Comparison of AGAA PAM-targeting ABE8e strategies in FXN-mESCs, evaluated across 30 (FXN-30GAA-mES) and 50 (FXN-60GAA-mES) GAA repeats. Data are shown as mean ± s.d. of biological triplicates. d,e, CIRCLE-seq off-target hits in the human genome classified by the identity of the targeted region annotated with HOMER (d) and the number of mismatches with the sgGAA (e). f,g, Alternative target and off-target editing at CIRCLE-seq sites in HEK293T cells, confirmed by WGS (>0.5% editing) and classified based on the number (f) and the location (g) of mismatches relative to the sgGAA spacer. Horizontal lines in f mark median and quartiles calculated for all loci in a specific group. Editing for each locus is a mean of triplicates. Mean editing (%) in g represents base editing frequency across genomic sites meeting the specified mismatch criteria. Mismatch category A includes the five nucleotides proximal to PAM (positions 1–5), category B represents positions 6–10 and category C spans the last ten, PAM-distal nucleotides (positions 11–20) of the protospacer. A0–A5, B0–B5 and C0–C10 indicate the number of mismatches (0–5) between the sgGAA and a target site in categories A, B and C. Each square shows the number of loci with >0.5% editing in each mismatch subgroup. h, Editing frequencies at CIRCLE-seq sites with 0–4 mismatches between the sgGAA and a target site in HEK293T cells, measured by HTS. Each dot shows mean editing at a unique locus; diamonds indicate protein-coding sites. Data are mean ± s.d. 
of all loci in each category. Data in f–h represent biological triplicates. Illustration in a was created using BioRender.com. TSS, transcription start site; UTR, untranslated region.
GAA repeats are among the most abundant triplet-repeat sequences in the human genome. They are found in G/A islands, mostly in Alu elements, with nearly 600 genomic integration sites harboring at least eight GAA repeats that support sgRNA binding and base editing by our strategy, severalfold more than the ~100 sites in the human genome that harbor at least eight CAG repeats68,98. The most efficient base editors use a Cas9 nickase to bias the permanent incorporation of the edited base pair at the target locus by DNA repair52,56. While Cas9 nickase and canonical base editors are typically associated with minimal indels54,64, we reasoned that a high number of nicking events from the abundance of GAA repeats in the genome may adversely affect genomic stability. Since nicking is not essential to base editing and current ABEs are highly efficient, we assessed whether ABEs that use nuclease-dead Cas9 (D10A + H840A, dCas9) enable efficient A•T>G•C interruption of GAA repeats. Concordant with their nickase-Cas9 (D10A) ABE8e counterparts, we observed the highest fractions of edited FXN alleles from non-nicking ABEs using the AGAA PAM by dCas9-NRCH (21 ± 1.9%) and dCas9-NG (23 ± 1.8%), and the GAAG PAM by dCas9-SpRY (19 ± 1.4%; Fig. 4b), with overall fractions of edited FXN alleles being approximately half those of the nicking ABE variants.
Next, we evaluated our ABE-dCas9 editing strategies in mouse embryonic stem cell (mESC) lines harboring the human FXN intron 1 locus with long GAA repeats99,100 (Supplementary Text and Extended Data Fig. 4a–c). ABE8e fused to dCas9-NRCH substantially outperformed the other ABEs, by up to 4.2-fold (Fig. 4c), and editing efficiency generally increased with the length of the repeat tract (23 ± 0.2% in FXN-30GAA-mES compared with 32 ± 3.7% in FXN-60GAA-mES; Fig. 4c). Overall, these data demonstrate that the ABE8e-dNRCH base editor, combined with the GAA repeat-targeting sgRNA (sgGAA; a strategy hereafter designated GAA-ABE), enables efficient interruption of GAA repeats at both the endogenous FXN locus and longer GAA repeat tracts.
Genome-wide off-target editing analysis of GAA repeat base editing
The GAA-ABE strategy enables site-specific interruptions at pathogenic GAA expansion loci that underlie neurodegenerative diseases, including FRDA and late-onset cerebellar ataxias2,101,102,103. However, there are many coding and noncoding regions in the genome with ≥8 GAA repeats that could support editing by GAA-ABE, including non-disease-associated regions68.
To investigate Cas-dependent genome-wide activity of GAA-ABE, we performed CIRCLE-seq on human (HEK293T) and mouse (NIH3T3) gDNA69 using Cas9-NRCH ribonucleoprotein complexes with sgGAA69,104 (Supplementary Text). We identified 41,992 putative GAA-ABE off-target sites in the human genome, including 2,703 perfect-match loci (6.4% of CIRCLE-seq-nominated loci), of which only six were in protein-coding regions (Fig. 4d,e and Supplementary Tables 15 and 16). Most CIRCLE-seq-nominated sites (94%) contained mismatches that reduce binding and editing efficiency, and the overwhelming majority (>97%) were in noncoding genomic regions.
To characterize genome-wide GAA-ABE editing, we performed 160× WGS in HEK293T cells. We detected A•T>G•C interruptions at ~50% (20,560 sites) of CIRCLE-seq-nominated loci, with 5,085 sites (~12%) showing ≥5% editing (Supplementary Table 19). Notably, 3,857 edited sites were within Alu elements, which often contain GAA, GGA, GAG and GGG triplets98.
Genome-wide GAA-ABE editing decreased with an increasing number of mismatches with the sgGAA70,71,72 (Fig. 4f–h and Supplementary Text). GAA-ABE activity was highest at perfect-match loci (2,497 edited loci, including five protein-coding sites), with an average editing of 21 ± 9.4% across these sites. We also detected low-level editing (3.6% on average) at 475 protein-coding off-target loci (Supplementary Tables 16 and 19). Single-amplicon sequencing confirmed that editing efficiency decreased as the number of mismatches increased (Fig. 4h, Extended Data Fig. 4f and Supplementary Text).
Substantial GAA-ABE activity (≥5% editing) occurred mainly in intergenic (3,201 hits, 62%) and intronic (1,755 hits, 34%) regions, with only 83 protein-coding sites (~1.6%) affected. Among 57 genes with nonsynonymous substitutions, 29 are neuronally expressed105 and three (DKC1, NOC3L and UPF2) are considered essential, including one expressed in neurons (DKC1)74,75,105 (Supplementary Tables 19–23). AlphaMissense76 predicted that most missense mutations (36 genes) were benign, including those in all three essential genes74,75. However, four genes (B4GALT6, BRD9, EIF2S2 and RIMS1) acquired amino acid changes predicted to impact protein folding or function (Supplementary Table 24). No nonsense mutations were detected at any locus (Supplementary Tables 20 and 21). Collectively, we observed substantial editing at ~12% of CIRCLE-seq-nominated loci.
Together, these results demonstrate that GAA-ABE primarily targets GAA repeats at noncoding regions of the human genome (>98%)23,49. Over 87% of candidate off-target sites harbor ≥2 mismatches with the sgGAA, which greatly reduces binding and editing activity70,71,72. As with CAG-CBE, our results suggest that computational predictions may misrepresent off-target editing (Supplementary Text and Extended Data Figs. 4g and 5a,b), emphasizing the need for empirical validation.
Adenine base editing of GAA repeats in FRDA patient cells
Small repeat interruptions, such as the A•T>G•C base changes introduced by our GAA-ABE strategy, are common in FXN alleles in the general population and are associated with greater stability of GAA repeat alleles21,106, higher FXN expression, and milder disease or later disease onset in patients with FRDA compared with those carrying uninterrupted repeats of similar length16,21,23,49,106,107. To assess GAA-ABE activity on pathogenic expanded GAA repeats, we quantified repeat interruptions in two primary patient fibroblast lines, each carrying two long pathogenic FXN alleles with ~330/380 (GM03816) or ~541/420 (GM04078) GAA repeats. We electroporated these cells with ABE8e-dNRCH mRNA and synthetic sgGAA47,57 and, after 5 d, observed 20 ± 7.0% average repeat interruption in control fibroblasts (8 or 9 GAA repeats) and 33 ± 12% and 32 ± 9.5% interruption in GM03816 and GM04078 alleles, respectively (Fig. 5a,b, Supplementary Text and Extended Data Fig. 5c). Detected interruption frequencies remained stable over time (Extended Data Fig. 5d), and editing efficiency correlated positively with repeat length, concordant with GAA-ABE editing in HEK293T and FXN-mESC reporter lines (Fig. 4b,c) and consistent with the greater number of base editor binding opportunities at longer GAA repeats.
a, Base editing of FXN GAA repeats in control and FRDA patient-derived fibroblasts. Numbers below the bar plot indicate the size of GAA repeats in each cell line. Data are mean ± s.d. of biological triplicates. b, Observed and estimated FXN GAA repeat editing in control and FRDA patient-derived fibroblasts, normalized to untreated controls. Data are mean ± s.d. of biological triplicates. NS, not significant, Welch’s two-tailed t-test. c, FXN mRNA expression in human fibroblasts treated with ABEdCH or in controls, 12 d after electroporation, normalized to TBP levels. Data are mean ± s.d. of biological triplicates. *P = 0.017, Welch’s one-tailed t-test. d, Dual-AAV vectors encoding split-intein ABE8e-dNRCH and sgGAA cassettes, v6 AAV9-ABE. e, Neonatal ICV injections in YG8s mice with AAV9-ABEdCH. f, FXN GAA repeat editing in the cortex of YG8s.300 and YG8s.800 mice treated with AAV9-ABEdCH at 24 weeks postinjection, as observed in HTS or estimated, normalized to uninjected controls. Data are mean ± s.d. of independent animals (YG8s.300 n = 10, YG8s.800 n = 7). g,h, IGAA (g) and mean distribution of GAA allele sizes (ΔGAA size) (h) in the cortex isolated from 24-week-old YG8s.300 mice treated with AAV9-ABEdCH, or controls. i,j, IGAA (i) and mean distribution of GAA allele sizes (ΔGAA size) (j) in the cortex isolated from 24-week-old YG8s.800 mice treated with AAV9-ABEdCH, or controls. Data in g and i are shown as box plots, with each data point representing an independent animal (YG8s.300: untreated n = 6, ABE n = 10; YG8s.800: untreated n = 6, ABE n = 7). The horizontal line marks the median, and whiskers represent the minimum and maximum values. ****P < 0.0001, Welch’s one-tailed t-test. h,j, Mean GAA repeat size distributions from at least four independent animals (YG8s.300: untreated n = 6, ABE n = 10; YG8s.800: untreated n = 4, ABE n = 8). The dashed line marks the modal FXN GAA allele determined from the tail.
Illustrations in d and e were created using BioRender.com.
To evaluate the impact of ABEdCH-mediated GAA interruptions on FXN expression from alleles with pathogenic-length GAA repeats, we quantified FXN transcript levels in FRDA patient fibroblasts relative to untreated cells and wild-type controls. ABE treatment increased FXN mRNA expression ~1.5-fold in FRDA fibroblasts, from ~49% to ~74% of wild-type levels, as measured by droplet digital PCR (ddPCR; Fig. 5c). These results suggest that base editing of pathogenic GAA repeats may relieve FXN transcriptional repression and increase transcript levels, consistent with previous reports correlating FXN repeat purity with transcriptional repression16,21,106,107.
Taken together, these data demonstrate that GAA-ABE enables efficient editing of pathogenic GAA repeat tracts in FXN alleles of patients with FRDA, significantly increasing FXN transcript levels and suggesting that base editing of GAA repeats can partially rescue molecular hallmarks of FRDA in human cells16,21. The most prevalent (>80%) interruptions induced by GAA-ABE at FXN were GAG and GGA (Extended Data Fig. 5e), which are the most common FXN triplet interruptions in the general population. Thus, GAA-ABE produces FXN genotypes that resemble natural genetic variants associated with improved phenotypic outcomes in individuals with expanded GAA repeats23,49 and alleviates molecular markers of disease in FRDA cell lines16,21.
Adenine base editing of FXN alleles reduces GAA length in neurons
FRDA patient-derived fibroblasts used in this study do not exhibit measurable GAA repeat instability108,109. In contrast, the YG8s.300 and YG8s.800 mouse models of FRDA harbor a human FXN YAC transgene with ~300 and ~800 GAA repeats, respectively. These repeats undergo progressive instability starting at ~18 weeks of age, biased toward expansion in somatic tissues, including the cortex, striatum and liver, but not in the tail110,111,112.
To assess how A•T>G•C interruption of long pathogenic GAA repeats affects repeat expansion at FXN alleles in vivo83,84,85,86,87, we delivered the dual AAV9-ABEdCH vectors to neonatal YG8s mice by ICV injection at a total dose of 3.8 × 10¹⁰ vg per mouse83,88,89,90 (Fig. 5d,e and Supplementary Text). At 24 weeks postinjection, estimated editing of cortical alleles reached 28 ± 6.8% in YG8s.300 mice (of which 5.6 ± 0.8% was directly observed) and 55 ± 14% in YG8s.800 mice (of which 5.5 ± 2.9% was directly observed) (Fig. 5f, Supplementary Text and Extended Data Fig. 6c). Editing predominantly produced GAA-to-GGA changes (66 ± 2.9% of interruptions in YG8s.300 mice and 56 ± 4.0% in YG8s.800 mice; Extended Data Fig. 6d), concordant with GAA-ABE editing of pathogenic-length GAA repeats in vitro (Extended Data Fig. 5e). We estimated that edited FXN alleles acquire on average 8.3 A•T>G•C interruptions per 300 GAA repeats (1.8 observed A•T>G•C interruptions; Extended Data Fig. 6e). These estimates broadly agree with long-read nanopore sequencing of edited alleles, which revealed up to 9.3 ± 1.2 A•T>G•C interruptions per ~300 GAA repeats in FXN alleles of treated YG8s mice (Extended Data Fig. 6e–g and Supplementary Text).
Next, we assessed whether A•T>G•C interruptions affect somatic GAA expansion in vivo by quantifying the GAA instability index (IGAA) in AAV9-ABEdCH-treated and control YG8s mice80,113 (Methods). At 24 weeks postinjection, AAV9-ABEdCH treatment significantly reduced the average size of GAA repeats in the cortex compared with control animals (IGAA; Fig. 5g–j and Extended Data Fig. 6h,i). In YG8s.300 mice, the average GAA repeat size decreased by IGAA = −4.9 ± 0.6 repeats (Fig. 5g,h), and in YG8s.800 mice we observed an even greater reduction, IGAA = −7.2 ± 1.2 repeats (Fig. 5i,j; P < 0.0001 for both, Welch’s one-tailed t-test). Notably, cortical GAA size variation in treated YG8s.800 mice did not differ significantly from that in tail tissue, which does not undergo repeat expansion (Welch’s one-tailed t-test; Fig. 5i), suggesting that AAV9-ABEdCH effectively halts spontaneous cortical expansions (Fig. 5i and Extended Data Fig. 6j).
AAV9-ABEdCH reduced somatic repeat expansions (expansion index, IGAA(e)) in YG8s.300 mice (IGAA(e) = −2.9 ± 0.6 repeats; P = 0.0002) and in YG8s.800 mice (IGAA(e) = −5.2 ± 0.9 repeats; P = 0.0003; Welch’s one-tailed t-test; Extended Data Fig. 6j), and also promoted repeat contractions (contraction index, IGAA(c)). At 24 weeks postinjection, we observed a contraction-driven reduction in GAA repeat length in cortical FXN alleles of YG8s.300 mice (IGAA(c) = −2.0 ± 0.5 repeats; P = 0.0022) and, to a greater extent, of YG8s.800 mice (IGAA(c) = −5.0 ± 2.1 repeats; P = 0.0203; Welch’s one-tailed t-test) compared with controls (Extended Data Fig. 6j), suggesting that longer FXN alleles are more prone to contractions in YG8s mice.
Collectively, these data establish that neonatal ICV injection of AAV9-ABEdCH enables substantial transduction of FRDA-relevant tissues that undergo somatic repeat expansion. The direct installation of A•T>G•C interruptions using this delivery method at pathogenic-length FXN alleles significantly reduces GAA repeat size by limiting repeat expansions and inducing repeat contractions, highlighting that base editing can both stabilize and contract pathogenic FXN GAA repeats in postnatal animals.
Discussion
TNR diseases affect ~1 in 3,000 individuals worldwide1. Recently approved Skyclarys (omaveloxolone) is the first treatment for FRDA that delays disease progression in some patients114,115, and symptom management plans provide relief to patients living with poly-Q disorders including HD. To date, however, there are no therapeutic interventions that reverse or stop the motor and neurological decline of any TNR disorder.
The Htt.Q111 and YG8s mouse models used in our study do not exhibit the motor and behavioral phenotypes observed in patients with HD and FRDA. Mouse models of FRDA that harbor human pathogenic FXN alleles on a mouse Fxn knockout background (YG8-800 (ref. 116)) or HD mouse models with longer CAG repeats (Q140 (ref. 117)) may better recapitulate both the molecular and physiological manifestations of human disease. Future studies using such models may help determine whether reduced expansion of pathogenic-length TNR alleles following base editing of expanded repeats improves motor and behavioral function, and thereby further verify the protective role of repeat interruptions in TNR disorders.
Here, we packaged our repeat-targeting base editing strategies into AAV9 vectors and delivered them to murine neonates via ICV injection. Base editing of pathogenic repeats reduced repeat expansions in the CNS of the Htt.Q111 and YG8s mouse models of HD and FRDA. However, GAA repeat expansion and frataxin deficiency in patients with FRDA also affect non-neuronal cells, including glial cells118,119, which may exacerbate neural degeneration in FRDA. Moreover, patients with FRDA often develop cardiomyopathy associated with heart failure and death120. Although AAV9 vectors have well-established neuronal tropism and have been shown to primarily target neurons in the CNS84,85,86, alternative AAV serotypes and routes of administration may enable greater editing and treatment of additional FRDA-relevant tissues in older animals83,121. Such approaches would facilitate investigation of later-stage rescue of repeat instability in vivo, including at alleles that already exhibit measurable somatic instability.
AAV delivery can result in long-term base editor expression, which promotes the continued accumulation of repeat interruptions in vivo. Prolonged base editor expression may also increase off-target editing; however, in previous studies we have not observed increased accumulation of genomic off-target edits from constitutive base editor expression in vivo over time89. Unintended genomic changes may affect cell function and confound outcomes in targeted cells, and off-target editing risk is an important consideration in developing a base editing therapeutic; minimizing off-target activity is therefore desirable both for investigating TNR diseases with base editors and for any potential future treatment. Deaminase variants such as those in ABE8e-V106W (refs. 65,93), ABE8.17-m (ref. 122) or V106W variants of TadCBEs or CBE6s (refs. 108,109) have been shown to reduce Cas-independent editing by base editors, and alternative delivery methods, including localized administration and delivery modalities that allow only transient or cell-type-specific base editor expression, may further reduce the physiological burden of off-target editing in vivo123,124.
In this study, we have begun the characterization of unintended targets of these repeat base editing strategies, and found that: (1) the level of off-target editing is inversely correlated with the number of mismatches between the repeat-targeting sgRNA and the sequence of the off-target locus, as expected70,71,72; (2) the vast majority of undesired editing occurs in noncoding or intergenic regions of the human genome; and (3) repeat-targeting base editing often leads to the induction of benign single-nucleotide variations that are observed in the general population and synonymous substitutions at protein-coding loci that preserve endogenous protein sequence. The alternative target and off-target sites of repeat-targeting in the human genome observed in this study warrant further comprehensive cell-type-specific longitudinal analyses to evaluate the regulatory risks of accumulated mutations in targeted tissues, to better assess the safety profile of our approaches and whether interrupting pathogenic repeats that underlie TNR diseases may be a viable therapeutic approach in the future. Nonetheless, the approaches and findings developed here should prove useful to elucidate the causality and biological consequences of uninterrupted and interrupted repeat tracts in cultured cells and animal models of TNR diseases.
Methods
This research complies with relevant ethical regulations. The study protocol was approved by the Broad Institute’s Institutional Biosafety Committee, the Broad Institute’s Institutional Animal Care and Use Committee (IACUC) and the relevant IACUC compliance committees at Massachusetts General Hospital.
Cell culture
Culture of mESCs and HEK293T cells was performed according to previously published protocols100. Undifferentiated 129P2/OlaHsd mESC (male) lines125 were maintained feeder-free on 0.2% gelatin-coated plates in mESC medium composed of Knockout DMEM (Life Technologies) supplemented with 15% defined FBS (HyClone), 0.1 mM nonessential amino acids (Life Technologies), GlutaMAX (Life Technologies), 0.55 mM 2-mercaptoethanol (Sigma-Aldrich) and 1× ESGRO LIF (Millipore), with the addition of 2i: 5 nM GSK-3 inhibitor XV (Sigma-Aldrich) and 500 nM U0126 (Sigma-Aldrich). FXN-mESCs were generated for this project (see below). HEK293T cells were purchased from ATCC (CRL-3216) and maintained in DMEM (Thermo Fisher) supplemented with 10% FBS (Thermo Fisher). Human fibroblast lines were purchased from the Coriell Institute and maintained in DMEM (Thermo Fisher) supplemented with 20% FBS (Thermo Fisher). Human fibroblast lines used in this study included GM07492 (healthy control); GM04855, GM04281 and GM09197 (HD lines); and GM03816 and GM04078 (FRDA lines). All cells were regularly tested for mycoplasma.
For genome editing experiments, cells were seeded 1 d before the experiment to be ~70–80% confluent on the day of transfection and transfected with sgRNA and base editor plasmids at a 1:1 molar ratio using Lipofectamine 3000 (Thermo Fisher) in accordance with the manufacturer’s protocols. For stable integration of plasmids, cells were co-transfected with Tol2 transposase at an equimolar ratio. For antibiotic selection, FXN-mESCs were treated with 50 μg ml−1 hygromycin B (Thermo Fisher) and/or 6.67 μg ml−1 blasticidin S (Thermo Fisher), as indicated, starting 24 h after transfection. Selected cells were allowed to recover and expand before collection. All sgRNA sequences designed for this study are listed in Supplementary Table 27.
Cloning
Base editor plasmids were constructed by replacing the deaminase and Cas-protein domains of the p2T-CMV-ABE7.10-BlastR (Addgene, cat. no. 152989) or p2T-CMV-BE4max-BlastR plasmid using USER cloning (New England Biolabs)54. Individual sgRNAs were cloned into the SpCas9-hairpin U6-sgRNA expression plasmid (Addgene, cat. no. 71485) by BbsI digestion and Gibson assembly (New England Biolabs). Protospacer sequences and gene-specific primers used for amplification followed by HTS are listed in Supplementary Table 27. Constructs were transformed into Mach1 chemically competent Escherichia coli (Thermo Fisher) and grown on Luria–Bertani (LB) agar plates, and liquid cultures were grown overnight in LB broth at 37 °C with 100 μg ml−1 ampicillin. Individual colonies were validated by TempliPhi rolling circle amplification (Cytiva) followed by Sanger sequencing. Verified plasmids were prepared by mini-, midi- or maxiprep (Qiagen).
AAV vectors for FXN GAA editing were cloned by Gibson assembly using NEB Stable Competent E. coli (High Efficiency; New England Biolabs) to insert the sgRNA sequence and N-terminal base editor half of ABE8e-dNRCH into v6 Cbh-AAV-ABE-NpuN+U6-sgRNA (Addgene, cat. no. 137177), and the C-terminal base editor half and a second U6-sgRNA cassette into v6 Cbh-AAV-ABE-NpuC (Addgene, cat. no. 137178)83. AAV vectors for HTT CAG editing were similarly cloned by Gibson assembly using NEB Stable Competent E. coli (High Efficiency). The sgRNA sequence and the N-terminal base editor half of EA-evoA-32NLS-NG were cloned into v5 Cbh-AAV-CBE-NpuN+U6-sgRNA (Addgene, cat. no. 137175), and the C-terminal base editor half and a second U6-sgRNA cassette into v5 Cbh-AAV-CBE-NpuC (Addgene, cat. no. 137176)83.
Electroporation
Patient-derived fibroblast lines were obtained from Coriell and cultured in high-glucose DMEM (Thermo Fisher) supplemented with 20% (v/v) FBS. For CAG editing, we used the HD lines GM04855 (20/48 CAG), GM04281 (17/71 CAG) and GM09197 (18/180 CAG) and the healthy control line GM07492 (15/16 CAG); for GAA editing, we used the healthy control line GM07492 (8/9 GAA) and the FRDA lines GM03816 (330/380 GAA) and GM04078 (541/420 GAA). Fibroblasts were grown to 80% confluency on a 15-cm plate, washed with PBS (Thermo Fisher), trypsinized using TrypLE Express enzyme (Thermo Fisher) and suspended in 10 ml of medium. Cells were transferred to Falcon tubes, centrifuged for 8 min at 150g and washed twice in 1 ml of PBS. Each electroporation reaction was assembled with 200,000 cells, 1,000 ng of editor mRNA and 50 pmol of sgRNA (Synthego) and performed with a P2 cell line kit using the DS-150 program (Lonza) on a Lonza 4D-Nucleofector. Immediately after electroporation, 80 μl of medium was added to each well, and cells were incubated at room temperature for 10 min before transfer into 1 ml of medium in a 24-well plate. Electroporated cells were grown for 5 d, or as indicated in the text, with a medium change every other day. At the end of the experiment, cells were collected, and DNA and RNA extracted with the AllPrep DNA/RNA kit (Qiagen) were used for downstream applications (sequencing, gel electrophoresis and/or droplet digital PCR (ddPCR)).
Generation of transgenic mESCs
Transgenic mESCs were generated by Tol2-mediated integration of a custom transgene. For generation of FXN-mESCs, the transgene was produced by PCR amplification of gBlocks (IDT) encoding FXN intron 1 with 30 GAA repeats and 250 base pairs (bp) of flanking sequence on each side of the repeats. Constructs with longer GAA repeats were cloned by PCR amplification of the FXN locus from a YG8s.300 mouse, yielding an amplicon harboring ~60 GAA repeats. The resulting transgenes were cloned into the BamHI site of the p2T-CAGGS-MCS-p2A-GFP-PuroR plasmid (Addgene, cat. no. 107186) by Gibson assembly (New England Biolabs). Constructs were transformed into Stbl3 chemically competent E. coli (Thermo Fisher) and grown on LB agar plates, and liquid cultures were grown overnight in LB at 37 °C with 100 μg ml−1 ampicillin. Individual colonies were validated by TempliPhi rolling circle amplification (Cytiva) followed by Sanger sequencing. Verified plasmids were prepared by mini-, midi- or maxiprep (Qiagen). Six-well plates seeded with >10⁵ mESCs were transfected using Lipofectamine 3000, according to the manufacturer’s protocols, with a total of 3.75 μg of FXN-30GAA or FXN-60GAA transgene together with 3.75 μg of Tol2 plasmid to allow stable genomic integration. Transfected cells were selected with 0.25 μg ml−1 puromycin (Thermo Fisher, cat. no. A1113803) for 4 d starting the day after transfection, and then split 1:1. After reaching confluency, cells were sorted for the GFP+ population, plated without antibiotic and expanded for at least 7 d. The GFP+-enriched population was sorted again for GFP+ cells, expanded, selected with puromycin for at least 7 d and serially diluted onto two 96-well plates. The resulting individual clones were genotyped, and the presence of the FXN transgene was confirmed by PCR (Supplementary Table 27).
The number of transgene integrations was quantified by ddPCR (Supplementary Table 27), and ranged from 4 to 8 integrations for selected FXN-30GAA lines and from 6 to 20 integrations for selected FXN-60GAA lines. Clones with correct and intact transgenes were further expanded for base editing experiments. Selected cell lines harbored 4, 4 and 2 integrations of the FXN-30GAA transgenes and 3, 9 and 10 integrations of the FXN-60GAA transgenes, and were used as independent biological replicates in the base editing experiments.
HTS of gDNA
Library preparation was performed according to previously published protocols54. Primers used in this study are listed in Supplementary Table 27. Briefly, we isolated gDNA with the QIAamp DNA mini kit (Qiagen) and used 50–200 ng of gDNA to assess editing at individual loci. Sequencing libraries were amplified in two steps: the first PCR amplified the locus of interest, and the second added full-length Illumina sequencing adapters using the NEBNext Index Primer Sets 1 and 2 (New England Biolabs) or internally ordered primers with equivalent sequences. PCRs of HTT amplicons were performed with Herculase II Fusion DNA Polymerase (Agilent) according to the manufacturer’s protocols, with a 2–4-min extension time and a total of 24–28 amplification cycles. PCRs of FXN amplicons in HEK293T cells were performed using NEBNext Q5 Master Mix (New England Biolabs) with a 2-min extension time and 24 PCR cycles. The FXN locus in patient fibroblasts and mouse tissues was amplified by two-step nested PCR with (1) UltraRun LongRange Master Mix (9–10 cycles, 5-min extension time; Qiagen) and (2) NEBNext Q5 Master Mix (8–9 cycles, 5-min extension time; New England Biolabs). All PCRs were supplemented with betaine (Sigma-Aldrich) at a final concentration of 0.5 M.
Samples were pooled on the basis of TapeStation (Agilent) measurements and quantified using a KAPA Library Quantification Kit (Roche). The pooled samples were sequenced on Illumina MiSeq sequencers using Illumina MiSeq Control Software (v.3.1). Alignment of FASTQ files and quantification of editing frequency at individual loci were performed using the custom software powTNRka (v.1.0.0; from the Polish ‘powtórka’, ‘a repeat’), described in Supplementary Note 2, for repeat amplicons, or CRISPResso2 in batch mode126 for typical amplicons. The editing frequency for each site was calculated as the ratio of the number of modified reads (that is, reads containing at least one nucleotide conversion or indel) to the total number of aligned reads.
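The per-site editing frequency defined above (modified reads divided by all aligned reads) can be sketched as follows. This is a minimal illustration, not powTNRka or CRISPResso2: it assumes each aligned read has already been summarized by an upstream aligner into counts of conversions and indels, and the `AlignedRead` structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical per-read summary produced by an upstream aligner."""
    conversions: int  # nucleotide conversions relative to the reference amplicon
    indels: int  # insertions or deletions relative to the reference amplicon


def is_modified(read: AlignedRead) -> bool:
    # A read counts as modified if it carries at least one conversion or indel.
    return read.conversions > 0 or read.indels > 0


def editing_frequency(reads: Iterable[AlignedRead]) -> float:
    """Modified reads divided by all aligned reads at one site."""
    total = 0
    modified = 0
    for read in reads:
        total += 1
        if is_modified(read):
            modified += 1
    if total == 0:
        raise ValueError("no aligned reads at this site")
    return modified / total


reads = [AlignedRead(0, 0), AlignedRead(2, 0), AlignedRead(0, 1), AlignedRead(0, 0)]
print(editing_frequency(reads))  # prints 0.5
```

A read with several conversions still counts once, so this metric reports the fraction of edited molecules rather than the number of edits.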
Quantification of editing
Editing levels were quantified using powTNRka, a custom alignment and analysis software described in Supplementary Note 2 (refs. 126,127). powTNRka accepts International Union of Pure and Applied Chemistry (IUPAC) nucleotide codes when computing alignments. In this work, edited alleles were specified as YAR (C or T; A; A or G) for HTT, to capture all possible cytosine base editing events, including those resulting from deamination of the opposite strand, and as GRR (G; A or G; A or G) for FXN, to capture all combinations of adenine base editing events.
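The effect of the YAR and GRR specifications can be illustrated by classifying the in-frame triplets of a repeat tract as pure, interrupted (matching the IUPAC pattern but not the pure motif) or other. This is a minimal sketch and not the powTNRka implementation: it assumes the tract is already extracted and starts in frame with the repeat unit, ignores a trailing partial triplet and does no alignment; `matches_iupac` and `classify_triplets` are hypothetical names.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_iupac(triplet: str, pattern: str) -> bool:
    """True if every base of `triplet` is allowed by the IUPAC `pattern`."""
    return len(triplet) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(triplet.upper(), pattern.upper())
    )


def classify_triplets(tract: str, pure: str, pattern: str) -> dict:
    """Count pure, interrupted (pattern-matching) and other in-frame triplets.

    Assumes `tract` starts in frame with the repeat unit; a trailing partial
    triplet is ignored.
    """
    pure = pure.upper()
    counts = {"pure": 0, "interrupted": 0, "other": 0}
    usable = len(tract) - len(tract) % 3
    for start in range(0, usable, 3):
        triplet = tract[start:start + 3].upper()
        if triplet == pure:
            counts["pure"] += 1
        elif matches_iupac(triplet, pattern):
            counts["interrupted"] += 1
        else:
            counts["other"] += 1
    return counts


# GGA and GAG match GRR; TAG and CAA match YAR.
print(classify_triplets("GAAGGAGAAGAGGAAGAA", "GAA", "GRR"))
print(classify_triplets("CAGCAGTAGCAACAG", "CAG", "YAR"))
```

Both example calls report two interrupted triplets; the HTT example shows why YAR also captures CAA, the product of deamination on the opposite (CTG) strand.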
Estimation of editing
Sequencing reads were filtered and aligned using powTNRka. For each sample and condition, all pure GAA triplets and all triplets with A•T>G•C interruptions were counted. These counts were used to determine the fraction \(f\) of all sequenced triplets that contain interruptions. The fraction \(f\) was then normalized by (divided by) the average number of A•T>G•C interruptions \(n_{\mathrm{i}}\) observed in interrupted FXN alleles in the respective sample (Fig. 5b,f and Extended Data Fig. 6b). The resulting factor \(f/n_{\mathrm{i}}\) describes the probability of an A•T>G•C edit occurring at a single GAA triplet across all sequenced reads. This factor was then used to estimate the probability that an FXN GAA allele of a specified repeat size \(N\) contains at least one interruption (that is, the estimated editing) according to \(1-(1-f/n_{\mathrm{i}})^{N}\).
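The estimated-editing formula \(1-(1-f/n_{\mathrm{i}})^{N}\) can be computed as below. This is a minimal sketch with hypothetical function names; the expression implicitly treats each triplet as edited independently with probability \(f/n_{\mathrm{i}}\), and the triplet counts and \(n_{\mathrm{i}}\) in the usage line are illustrative values, not data from this study.

```python
def interrupted_fraction(pure_triplets: int, interrupted_triplets: int) -> float:
    """f: share of all sequenced triplets that carry an interruption."""
    total = pure_triplets + interrupted_triplets
    if total == 0:
        raise ValueError("no triplets counted")
    return interrupted_triplets / total


def estimated_editing(f: float, n_i: float, repeat_size: int) -> float:
    """Probability that an allele of `repeat_size` triplets has >=1 interruption.

    Implements 1 - (1 - f/n_i)**N, treating each triplet as edited
    independently with per-triplet probability f/n_i.
    """
    if n_i <= 0:
        raise ValueError("n_i must be positive")
    p = f / n_i
    if not 0.0 <= p <= 1.0:
        raise ValueError("f/n_i must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** repeat_size


# Illustrative values: 2 interrupted out of 1,000 triplets, 2.0 interruptions
# per interrupted allele, extrapolated to a 300-repeat allele.
f = interrupted_fraction(998, 2)
print(round(estimated_editing(f, 2.0, 300), 3))  # prints 0.259
```

Because the per-triplet probability is raised to the power \(N\), the same \(f/n_{\mathrm{i}}\) yields a higher estimated editing for longer alleles, which is why the estimate depends on the specified repeat size.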
To estimate the average number of A•T>G•C interruptions in interrupted FXN alleles of a specific GAA repeat size, we calculated, across all interrupted FXN alleles, the average fraction \(f_{\mathrm{i}}\) of triplets in each sequenced repeat tract that carried interruptions. This fraction was then multiplied by a specified GAA repeat size \(N\) to estimate the number of interruptions in interrupted FXN alleles of that size: \(f_{\mathrm{i}} \times N\).
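The \(f_{\mathrm{i}} \times N\) estimate can be sketched as follows. This is a minimal illustration assuming each sequenced allele is summarized as a pair (interrupted triplets, total triplets); uninterrupted alleles are excluded because the average is taken over interrupted alleles only. The function names and the example alleles are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def mean_interrupted_fraction(alleles: Sequence[Tuple[int, int]]) -> float:
    """f_i: mean share of interrupted triplets across interrupted alleles.

    Each allele is (interrupted_triplets, total_triplets); alleles without
    any interruption are excluded.
    """
    fractions = [k / n for k, n in alleles if n > 0 and k > 0]
    if not fractions:
        raise ValueError("no interrupted alleles")
    return fmean(fractions)


def estimated_interruptions(f_i: float, repeat_size: int) -> float:
    """Expected interruptions in an interrupted allele of `repeat_size` triplets."""
    return f_i * repeat_size


# Hypothetical reads: two interrupted alleles and one uninterrupted allele.
f_i = mean_interrupted_fraction([(2, 60), (1, 50), (0, 55)])
print(round(estimated_interruptions(f_i, 300), 1))  # prints 8.0
```

Averaging per-allele fractions, rather than pooling triplet counts, gives each interrupted read equal weight regardless of its sequenced tract length.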
Quantification of FXN expression
RNA from FRDA patient-derived fibroblasts was isolated with the AllPrep DNA/RNA kit or the RNeasy mini kit (Qiagen). Then, 100–500 ng of RNA was reverse transcribed using SuperScript IV (Thermo Fisher) according to the manufacturer’s protocols. FXN transcript levels were quantified by droplet digital PCR (ddPCR; Bio-Rad) using 1× ddPCR Supermix for Probes (no dUTP) (Bio-Rad), complementary DNA equivalent to 10–20 ng of input RNA and two primer/probe sets: human frataxin (Bio-Rad, assay ID: dHsaCNS648692366) and human TBP (Bio-Rad, assay ID: dHsaCPE5058363). Droplet generation, PCR amplification (95 °C for 10 min; 49 cycles of 94 °C for 30 s and 60 °C for 1 min; 98 °C for 10 min; ramp rate 2 °C s−1 for all steps) and droplet reading were performed on a QX ONE Droplet Digital PCR system (Bio-Rad). ddPCR data were analyzed using QX ONE Software (Bio-Rad).
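The normalization behind the FXN expression values (FXN copies relative to TBP, then expressed as a fraction of wild-type) reduces to two ratios. This is a minimal sketch, assuming simple ratio normalization as suggested by the Fig. 5c legend; the function names and copy numbers are hypothetical and were chosen only so that the output mirrors the ~49% and ~74% of wild-type levels and ~1.5-fold change reported for Fig. 5c.

```python
def normalized_ratio(target_copies: float, reference_copies: float) -> float:
    """Target (FXN) copies divided by reference-gene (TBP) copies."""
    if reference_copies <= 0:
        raise ValueError("reference copies must be positive")
    return target_copies / reference_copies


def relative_expression(sample_ratio: float, wild_type_ratio: float) -> float:
    """Normalized expression expressed as a fraction of wild-type."""
    return sample_ratio / wild_type_ratio


# Hypothetical copies per microlitre, chosen only to illustrate the arithmetic.
wild_type = normalized_ratio(1000, 500)
untreated = relative_expression(normalized_ratio(490, 500), wild_type)
treated = relative_expression(normalized_ratio(740, 500), wild_type)
print(round(untreated, 2), round(treated, 2), round(treated / untreated, 2))
```

With these inputs the sketch prints 0.49, 0.74 and a fold change of 1.51; normalizing to TBP first removes differences in cDNA input between samples before the comparison with wild-type cells.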
Purification of NRCH and NG Cas nuclease proteins
NRCH and NG Cas nuclease proteins were cloned into the expression plasmid pD881-SR (Atum, cat. no. FPB-27E-269). The resulting plasmid was transformed into BL21 Star (DE3) competent cells (Thermo Fisher, cat. no. C601003). Colonies were picked for overnight growth in Terrific Broth + 25 μg ml−1 kanamycin at 37 °C. The next day, 2 l of prewarmed Terrific Broth was inoculated with overnight culture at a starting OD600 of 0.05. Cells were shaken at 37 °C for about 2.5 h until the OD600 was ~1.5. Cultures were cold-shocked in an ice-water slurry for 1 h, after which L-rhamnose was added to a final concentration of 0.8%. Cultures were then incubated at 18 °C with shaking for 24 h to induce protein expression. Following induction, cells were pelleted, flash-frozen in liquid nitrogen and stored at −80 °C. The next day, cells were resuspended in 30 ml of cold lysis buffer (1 M NaCl, 100 mM Tris-HCl pH 7.0, 5 mM TCEP, 20% glycerol) with five tablets of cOmplete, EDTA-free protease inhibitor cocktail (Millipore Sigma, cat. no. 4693132001). Cells were lysed by three passes through a homogenizer (Avestin EmulsiFlex-C3) at ~124.1 MPa. Cell debris was pelleted by centrifugation at 20,000g for 20 min at 4 °C. The supernatant was collected and spiked with 40 mM imidazole, followed by a 1-h incubation at 4 °C with 1 ml of Ni-NTA resin slurry (G Bioscience, cat. no. 786-940, prewashed once with lysis buffer). Protein-bound resin was washed twice with 12 ml of lysis buffer in a gravity column at 4 °C. Protein was eluted in 3 ml of elution buffer (300 mM imidazole, 500 mM NaCl, 100 mM Tris-HCl pH 7.0, 5 mM TCEP, 10% glycerol). Eluted protein was diluted in 40 ml of low-salt buffer (100 mM Tris-HCl, pH 7.0, 1 mM TCEP, 20% glycerol) just before loading into a 50-ml ÄKTA Superloop for ion-exchange purification on an ÄKTA pure 25 fast protein liquid chromatography system.
Ion-exchange chromatography was conducted on a 5-ml GE Healthcare HiTrap SP HP prepacked column (Cytiva, cat. no. 17115201). After the column was washed with low-salt buffer, the diluted protein was flowed through the column to bind. The column was then washed with 15 ml of low-salt buffer before being subjected to an increasing gradient up to a maximum of 80% high-salt buffer (1 M NaCl, 100 mM Tris-HCl pH 7.0, 5 mM TCEP, 20% glycerol) over 50 ml at a flow rate of 5 ml min−1. We collected 1-ml fractions during this ramp to high-salt buffer. Peaks were assessed by SDS–PAGE to identify fractions containing the desired protein, which were concentrated first using an Amicon Ultra 15-ml centrifugal filter (100-kDa cutoff, UFC910024) and then with a 0.5-ml 100-kDa cutoff Pierce concentrator (cat. no. 88503). The concentrated protein was quantified using a BCA assay (Thermo Fisher, cat. no. 23227) and determined to be 12.6 mg ml−1.
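To relate each 1-ml fraction to the salt concentration at which it eluted, the sketch below assumes a linear ramp starting from 0% high-salt buffer (the Methods state only an "increasing gradient") and ignores system dead and mixing volumes. `fraction_nacl_mM` is a hypothetical helper.

```python
def fraction_nacl_mM(fraction: int, gradient_ml: float = 50.0, final_b: float = 0.80,
                     buffer_b_nacl_mM: float = 1000.0, fraction_ml: float = 1.0) -> float:
    """Approximate NaCl at the midpoint of a fraction collected during a linear 0-to-final_b ramp."""
    midpoint_ml = (fraction - 0.5) * fraction_ml
    if not 0.0 <= midpoint_ml <= gradient_ml:
        raise ValueError("fraction lies outside the gradient")
    return final_b * (midpoint_ml / gradient_ml) * buffer_b_nacl_mM


ramp_minutes = 50.0 / 5.0  # 50-ml gradient at 5 ml min-1
print(f"ramp lasts {ramp_minutes:.0f} min")
for f in (1, 10, 25, 50):
    print(f"fraction {f}: ~{fraction_nacl_mM(f):.0f} mM NaCl")
```

Under these assumptions each fraction spans about 16 mM NaCl, which sets the resolution with which an elution peak can be placed on the salt axis.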
CIRCLE-seq off-target editing analysis
Off-target analysis using CIRCLE-seq was performed as previously described62,104. Briefly, gDNA from HEK293T or NIH3T3 cells was isolated using the Gentra Puregene Kit (Qiagen) according to the manufacturer’s instructions. Purified gDNA was sheared with a Covaris S2 instrument to an average length of 300 bp. The fragmented DNA was end-repaired, A-tailed and ligated to a uracil-containing stem-loop adapter using the KAPA HTP Library Preparation Kit, PCR Free (Roche). Adapter-ligated DNA was treated with Lambda Exonuclease and E. coli Exonuclease I, and then with USER enzyme and T4 polynucleotide kinase (New England Biolabs). Intramolecular circularization of the DNA was performed with T4 DNA ligase (New England Biolabs), and residual linear DNA was degraded by Plasmid-Safe ATP-dependent DNase (Lucigen). In vitro cleavage reactions were performed with 250 ng of Plasmid-Safe-treated circularized DNA, 90 nM NRCH or NG Cas9 nuclease protein, Cas9 nuclease buffer (New England Biolabs) and 90 nM synthetic chemically modified sgRNA (Synthego) in a total volume of 100 μl. Cleaved products were A-tailed, ligated to a hairpin adapter, treated with USER enzyme and amplified by PCR with barcoded universal primers (NEBNext Multiplex Oligos for Illumina, New England Biolabs) using Kapa HiFi Polymerase (Roche). Libraries were sequenced with 150-bp paired-end reads on an Illumina MiSeq instrument. CIRCLE-seq data were analyzed using the open-source CIRCLE-seq analysis software with default recommended parameters (https://github.com/tsailabSJ/circleseq) and the human genome assembly hg19 as the reference genome. The CIRCLE-seq-nominated sites (in hg19) were converted to hg38 to enable downstream analysis with tools that use hg38 as their reference genome; however, all coordinates in the corresponding Supplementary Tables are provided in hg19. Genomic region assignments for the identified off-target hits were performed with HOMER73.
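Setting up the 90 nM cleavage reaction requires converting the nuclease stock from mass to molar concentration. The sketch below uses the 12.6 mg ml−1 value reported for the purified protein together with a hypothetical molecular weight of 160 kDa for a tagged SpCas9-derived nuclease; the actual construct mass should be substituted.

```python
def stock_uM(mg_per_ml: float, mw_kda: float) -> float:
    """Convert a mass concentration to micromolar (mg/ml equals g/l)."""
    return mg_per_ml / (mw_kda * 1000.0) * 1e6


def volume_for_final_ul(final_nM: float, reaction_ul: float, stock_conc_uM: float) -> float:
    """C1*V1 = C2*V2, with the final concentration converted from nM to µM."""
    return (final_nM / 1000.0) * reaction_ul / stock_conc_uM


MW_KDA = 160.0  # hypothetical molecular weight of a tagged SpCas9-derived nuclease
protein_uM = stock_uM(12.6, MW_KDA)
vol = volume_for_final_ul(90.0, 100.0, protein_uM)
pmol = 90.0 * 100.0 / 1000.0  # nM x µl / 1000 = pmol
print(f"stock ~{protein_uM:.1f} µM; {pmol:.0f} pmol needs {vol:.3f} µl of undiluted stock")
```

Because the undiluted stock volume comes out well below 1 μl under this assumption, pipetting from an intermediate working dilution would be the practical choice; the equimolar sgRNA likewise corresponds to 9 pmol per reaction.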
WGS and data analysis
HEK293T cells were seeded in 96-well plates and transfected at 70% confluence with 66 ng of sgRNA and 200 ng of CBE or ABE base editor using Lipofectamine 3000 (Thermo Fisher) as described above, in six technical replicates and three biological replicates. At 3 d after transfection, cells were collected, gDNA was isolated with the QIAamp DNA mini kit (Qiagen) and technical replicates were pooled. PCR-free library preparation and WGS were performed by the Broad Institute Genomics Platform. Briefly, 350 ng of human DNA was acoustically sheared with an ultrasonicator (Covaris) to obtain 450-bp fragments. Libraries were created using a Kapa HyperPrep Plus kit according to the manufacturer’s protocols. All samples were sequenced with paired-end reads (2 × 150 bp) on the NovaSeq X platform to 160× coverage. Initial data processing and read alignment were performed by the Broad Institute Genomics Platform: reads were demultiplexed and aligned to hg38 using DRAGEN (v.3). The resulting WGS data were used to investigate base editor activity at the genomic coordinates nominated in the CIRCLE-seq analyses described above.
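For a sense of scale, the sketch below estimates the sequencing output implied by 160× coverage with 2 × 150-bp reads, assuming a haploid human genome of about 3.1 Gb and ignoring duplicates and unmapped reads. It also shows how many variant-supporting reads a given allele frequency corresponds to at that mean depth.

```python
GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)


def sequencing_requirement(coverage: float, read_len: int = 150, genome_bp: float = GENOME_BP):
    """Return (total aligned bases, read pairs) needed for a mean depth, ignoring duplicates."""
    bases = coverage * genome_bp
    pairs = bases / (2 * read_len)  # each 2 x 150-bp pair contributes 300 bases
    return bases, pairs


bases, pairs = sequencing_requirement(160)
print(f"~{bases / 1e9:.0f} Gb, ~{pairs / 1e9:.2f} billion read pairs per sample")
for freq in (0.005, 0.05):
    # expected reads carrying a variant at a site covered at the mean depth
    print(f"allele frequency {freq:.1%}: ~{160 * freq:.1f} supporting reads")
```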
All subsequent analyses were performed on the Muhee Cluster high-performance computing cluster (Brigham and Women’s Hospital). Editing frequencies were analyzed with pysam (v.0.22.1)128,129,130,131. The editing fraction at CIRCLE-seq-nominated loci was calculated for each sample independently. Loci covered by 30 or fewer sequencing reads were filtered out. At the remaining sites, editing (that is, the frequency of a CAA•TTG, GAG•CTC, GGA•TCC or GGG•CCC edit) was measured at every amenable nucleotide position. The untreated control group was used as a reference to exclude nucleotide positions with a background allele frequency of >2.5%; such variants were already present in the initial unedited cell population and are therefore unlikely to have resulted from base editing activity. Editing at each CIRCLE-seq-nominated locus was then assessed with a probability algorithm that accounts for editing at each nucleotide position within the locus to estimate the likelihood of the locus acquiring at least one interruption54. Although this pipeline can sensitively detect rare base changes above a 0.5% editing threshold, including those that arise naturally in cell culture, this lenient filter may overestimate base editing in treated samples. Because of this inherently high technical error, we generally report editing values using the >0.5% threshold but apply a ≥5% editing threshold (‘substantial’ editing) for detailed downstream analyses of mutated allele frequencies in WGS data. This ≥5% threshold is also used to classify edits as synonymous, nonsynonymous or nonsense.
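The filtering and locus-level scoring steps above can be sketched as follows. This is a simplified illustration, not the algorithm of ref. 54: it assumes that positions within a locus are edited independently, so that the probability of at least one interruption is one minus the product of per-position non-editing probabilities. The function names, position keys and frequencies are hypothetical.

```python
from typing import Dict, Optional

MIN_READS = 31          # loci with 30 or fewer reads are filtered out
BACKGROUND_MAX = 0.025  # positions above 2.5% in untreated cells are excluded


def locus_edit_probability(treated: Dict[int, float], untreated: Dict[int, float],
                           depth: int) -> Optional[float]:
    """P(locus carries >=1 interruption), assuming positions are edited independently."""
    if depth < MIN_READS:
        return None
    p_no_edit = 1.0
    for position, freq in treated.items():
        if untreated.get(position, 0.0) > BACKGROUND_MAX:
            continue  # pre-existing variant, not attributed to base editing
        p_no_edit *= 1.0 - freq
    return 1.0 - p_no_edit


def call_locus(prob: Optional[float], lenient: float = 0.005, substantial: float = 0.05) -> str:
    """Label a locus using the >0.5% reporting and >=5% 'substantial' thresholds."""
    if prob is None:
        return "filtered (low depth)"
    if prob >= substantial:
        return "substantial"
    if prob > lenient:
        return "detected"
    return "not detected"


# Hypothetical allele frequencies at three amenable positions of one locus.
treated = {10: 0.02, 13: 0.03, 16: 0.50}
untreated = {16: 0.48}  # position 16 is a pre-existing polymorphism
p = locus_edit_probability(treated, untreated, depth=100)
print(round(p, 4), call_locus(p))
```

In this example the background filter removes position 16; without it, a pre-existing heterozygous variant would dominate the locus score.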
The MANE.GRCh38.v1.3.refseq genomic dataset132 was used to determine the effect of base editing on the amino acid sequence at off-target loci in protein-coding regions. The normal tissue database of the Human Protein Atlas was used to check whether the coding loci or genes are expressed in brain-related tissues105. Essential genes were identified with the Cancer DepMap database74,75, and the effect of amino acid substitutions on protein folding and function was predicted with the AlphaMissense database76,77.
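Classifying a coding edit as synonymous, nonsynonymous or nonsense reduces to translating the reference and edited codons with the standard genetic code. The sketch below assumes that the codon and substituted base are already expressed on the coding strand; mapping genomic coordinates onto MANE transcripts and handling strand orientation are not shown. `classify_edit` is a hypothetical helper, and stop-loss edits fall into "nonsynonymous" in this simplified version.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_edit(codon: str, pos: int, new_base: str) -> str:
    """Classify a single-base change at codon position pos (0-2) on the coding strand."""
    codon = codon.upper()
    edited = codon[:pos] + new_base.upper() + codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[edited]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"


print(classify_edit("CAG", 2, "A"))  # CAG -> CAA, both glutamine
print(classify_edit("CAG", 0, "T"))  # CAG -> TAG, stop codon
print(classify_edit("GAA", 1, "G"))  # GAA (Glu) -> GGA (Gly)
```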
Husbandry of Htt.Q111 and YG8s mice
All animal procedures were carried out to minimize pain and discomfort, under IACUC protocols approved by Massachusetts General Hospital. Animals were housed at a controlled temperature (18–23 °C) and humidity (24–60%) on a 12-h light/dark cycle, in groups of 1–5 animals per cage of the same sex and strain. To collect tissues for DNA and RNA, animals were either anesthetized with Avertin and transcardially perfused via thoracotomy or euthanized with CO2. Both male and female animals were used in the study.
Heterozygous Htt.Q111 animals were maintained on a C57BL/6J background133. gDNA was isolated from tail biopsies using QuickExtract DNA Extraction Solution (VWR, cat. no. QE09050). Litters were genotyped to determine the size of the HTT CAG repeat in the Htt.Q111 allele by PCR using human-specific HTT primers (Supplementary Table 27) and the Taq PCR Core Kit with Q-Solution (Qiagen)134,135, with the following thermocycling conditions: initial denaturation at 95 °C (5 min); 30 cycles of 95 °C (30 s), 65 °C (30 s) and 72 °C (90 s); and final extension at 68 °C (10 min). PCR products were resolved on either a 1% agarose gel or an ABI3730xl automated DNA analyzer (Applied Biosystems) to check for the presence of expanded CAG repeats.
The average modal CAG repeat size (measured in the tail) of the mice used in the study was 118 (ranging from 117 to 119) for control and 114 (ranging from 108 to 120) for AAV-CBE-treated animals in the 12-week cohort, and 113 (ranging from 107 to 122) for control and 115 (ranging from 109 to 117) for AAV-CBE-treated animals in the 24-week cohort.
YG8s animals (also known as ‘Tg(FXN)YG8Pook/J’), carrying a single copy of a yeast artificial chromosome (YAC) human FXN transgene with either 300 or 800 GAA repeats, were maintained on a C57BL/6J genetic background. Animals used were hemizygous for the YAC transgene and wild type for endogenous Fxn. Pups were genotyped to determine the size of the FXN GAA repeat by PCR amplification (Supplementary Table 27) using the TaKaRa PCR Amplification Kit (Takara, cat. no. R011) with Q-Solution (Qiagen)136. PCR was conducted in 20-μl reactions containing 40 ng of DNA template with the following program: 3 min at 94 °C; 20 cycles of 20 s at 94 °C, 30 s at 64 °C and 5 min at 68 °C; then 9 cycles of 20 s at 94 °C and 5 min at 68 °C, with each subsequent elongation step extended by 15 s; and a final extension of 7 min at 68 °C. The average modal GAA repeat size (measured in the tail) of the mice used in the study was 355 (ranging from 343 to 378) for control and 369 (ranging from 341 to 413) for AAV-ABE-treated YG8s.300 mice, and 774 (ranging from 687 to 746) for control and 741 (ranging from 746 to 795) for AAV-ABE-treated YG8s.800 mice.
ICV injections
Neonatal ICV injections were performed as previously described83,137. Briefly, injections were performed with a 600 Series Microliter Hamilton syringe (Hamilton, cat. no. 87943). High-titer qualified AAV was obtained from the Viral Vector Core at UMass Medical School, concentrated using Amicon Ultra-15 centrifugal filter units (Millipore), quantified by quantitative PCR (AAVpro Titration Kit v.2, Clontech) and stored at 4 °C until use. A small amount of Fast Green dye was added to the AAV injection solution to assess ventricle targeting. Htt.Q111, YG8s or C57BL/6 pups were anesthetized on ice for 2–3 min, until they were immobile and unresponsive to a toe pinch. On postnatal day 0–2, up to 4.5 μl of injection mix (3.8 × 10¹³ vg kg−1) was injected freehand, with approximately half of the volume delivered into each ventricle. No overt adverse events were observed, and visual inspection during sample collection and brain dissection revealed no obvious brain abnormalities in AAV-BE-treated animals compared with animals treated with AAV-GFP or PBS or with untreated animals.
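To translate the weight-normalized dose into vector genomes per pup and the minimum titer that fits into the maximum injection volume, the sketch below assumes a hypothetical neonatal body mass of 1.3 g; actual pup weights on postnatal days 0–2 vary and should be used when available.

```python
DOSE_VG_PER_KG = 3.8e13
PUP_MASS_G = 1.3      # hypothetical neonatal body mass
MAX_VOLUME_UL = 4.5


def vg_per_pup(dose_vg_per_kg: float, mass_g: float) -> float:
    """Total vector genomes for one animal at a weight-normalized dose."""
    return dose_vg_per_kg * mass_g / 1000.0  # g -> kg


def min_titer_vg_per_ml(total_vg: float, volume_ul: float) -> float:
    """Lowest titer that delivers total_vg within volume_ul."""
    return total_vg / (volume_ul / 1000.0)  # µl -> ml


total = vg_per_pup(DOSE_VG_PER_KG, PUP_MASS_G)
per_ventricle = total / 2  # approximately half the volume goes into each ventricle
print(f"{total:.2e} vg per pup, {per_ventricle:.2e} per ventricle, "
      f">= {min_titer_vg_per_ml(total, MAX_VOLUME_UL):.2e} vg/ml")
```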
Nuclear isolation and sorting of tissues
Tissue collection and nuclear isolation were performed as previously described83. Briefly, at the endpoint for the experiment, Htt.Q111 or YG8s mice were euthanized and brain dissections were performed. For isolation of the cortex and striatum, cerebella were separated from the brain postmortem using surgical tweezers. Hemispheres were separated using a scalpel and the cortex was separated from underlying midbrain tissue with a curved scalpel and tweezers. For nuclear isolation, dissected tissue was homogenized using a glass Dounce homogenizer (Sigma, cat. no. D8938) (20 strokes with pestle A followed by 20 strokes with pestle B) in 2 ml of ice-cold EZ-PREP buffer (Sigma-Aldrich, cat. no. NUC-101). Samples were incubated for 5 min with an additional 2 ml of EZ-PREP buffer. Nuclei were centrifuged at 500g for 5 min, and the supernatant was removed. Samples were resuspended with gentle pipetting in 4 ml of ice-cold Nuclei Suspension Buffer consisting of 100 μg ml−1 BSA and 3.33 μM Vybrant DyeCycle Ruby (Thermo Fisher) in PBS and centrifuged at 500g for 5 min. The supernatant was removed, and nuclei were resuspended in 1–2 ml of Nuclei Suspension Buffer, passed through a 35-μm strainer and sorted into 200 μl of Agencourt DNAdvance lysis buffer using a MoFlo Astrios (Beckman Coulter) at the Broad Institute Flow Cytometry Core Facility. All steps were performed on ice or at 4 °C. gDNA was purified according to the Agencourt DNAdvance (Beckman Coulter) instructions for 200-μl volume.
Fragment analysis of CAG repeat instability
gDNA was isolated from mouse tissues using the DNeasy Blood & Tissue Kit (Qiagen, cat. no. 69506). PCR amplification and repeat instability analysis of the HTT CAG repeat locus were performed as described above; oligonucleotides used for repeat locus amplification are listed in Supplementary Table 27 (refs. 134,135,138). The forward primer was fluorescently labeled with 6-FAM (Applied Biosystems), and the resulting FAM-labeled PCR products encompassing the HTT CAG repeat were resolved on the ABI3730xl automated DNA analyzer (Applied Biosystems) with either GeneScan 500-LIZ (mouse tissues) or 1200-LIZ (human fibroblasts) internal size standards and analyzed with GeneMapper v.5 (Applied Biosystems). Frequencies of HTT alleles in HD fibroblasts were calculated relative to the predominant allele in the population, with the main allele frequency set to 1. Of note, the amplicons shown in the agarose gel images include the repeat region plus a total of ~136 bp of flanking sequence, for technical reasons related to amplification and HTS analysis. The amplicons used for the fragment analysis shown in the histograms were generated with primers that include a total of ~80 bp of flanking sequence and therefore differ from the amplicon sizes shown on the agarose gels by ~56 bp.
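The flanking-sequence lengths above determine how amplicon size maps to repeat number. The sketch below converts between the two using the approximate ~80-bp (fragment analysis) and ~136-bp (agarose gel) flanks and a 118-CAG allele as an example. It treats sizes as exact base-pair lengths; in practice GeneMapper calls are relative to the LIZ size standard, and repeat-containing fragments can migrate anomalously, so a calibration offset may be needed. The helper names are hypothetical.

```python
CAPILLARY_FLANK_BP = 80.0  # approximate flanking sequence in fragment-analysis amplicons
GEL_FLANK_BP = 136.0       # approximate flanking sequence in amplicons shown on agarose gels


def repeat_units(fragment_bp: float, flank_bp: float = CAPILLARY_FLANK_BP, unit_bp: int = 3) -> float:
    """Estimate repeat units from amplicon size by removing the flanking sequence."""
    if fragment_bp < flank_bp:
        raise ValueError("amplicon is shorter than its flanking sequence")
    return (fragment_bp - flank_bp) / unit_bp


def amplicon_bp(units: int, flank_bp: float, unit_bp: int = 3) -> float:
    """Inverse mapping: expected amplicon size for a given number of repeat units."""
    return units * unit_bp + flank_bp


cap = amplicon_bp(118, CAPILLARY_FLANK_BP)
gel = amplicon_bp(118, GEL_FLANK_BP)
print(cap, gel, gel - cap, repeat_units(cap))
```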
CAG expansion indices in Htt.Q111 mice were determined for both AAV9-CBE-treated mice and control animals not treated with base editor (untreated or vehicle control mice). CAG repeat instability was calculated from GeneMapper peak height data using a 5% peak height threshold113,135, considering only expansion peaks (that is, change in CAG ≥ 0 units, rightward shift on the histogram; CAG expansion index), only contraction peaks (that is, change in CAG ≤ 0 units, leftward shift on the histogram; CAG contraction index) or both expansion and contraction peaks (ICAG). Only traces with a modal peak height ≥1,000 were used, and only contractions of up to −40 CAG repeats were considered. For each trace, the change in CAG was determined relative to the modal allele size (‘main allele’) identified in the tail (stable tissue) of the same animal80,113. Of note, minor differences in germline CAG repeat size and in the rate of somatic instability between individual Htt.Q111 animals, as well as variation in editing efficiency among AAV9-CBE-treated animals, can contribute to variability in the instability index within each group. Additional technical factors, including sampling error, PCR bias arising from differences in repeat sequence and size between samples, and DNA quality affecting peak calling in fragment analysis, can also influence data quality and the interpretation of these analyses.
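The index calculation can be sketched as follows, based on our reading of the peak-height method in refs. 113,135. Peaks at or above a fraction of the tallest peak are kept, each kept height is normalized to the sum of kept heights, and normalized heights are weighted by the repeat change relative to the tail main allele. Applying the threshold relative to the tallest peak is an assumption, and `instability_indices` is a hypothetical helper. Because the main-allele peak has zero change, it contributes nothing to either partial index, which is why the ≥ 0 and ≤ 0 definitions above give the same result as strict inequalities.

```python
from typing import Dict, Tuple


def instability_indices(peaks: Dict[int, float], main_allele: int, rel_threshold: float = 0.05,
                        max_contraction: int = -40, min_modal_height: float = 1000.0
                        ) -> Tuple[float, float, float]:
    """Return (instability, expansion, contraction) indices from repeat size -> peak height data."""
    modal_height = max(peaks.values())
    if modal_height < min_modal_height:
        raise ValueError("trace fails the modal peak height filter")
    kept = {size: height for size, height in peaks.items()
            if height >= rel_threshold * modal_height and size - main_allele >= max_contraction}
    total = sum(kept.values())
    expansion = contraction = 0.0
    for size, height in kept.items():
        change = size - main_allele            # relative to the tail modal allele
        weighted = (height / total) * change   # normalized peak height x repeat change
        if change > 0:
            expansion += weighted
        elif change < 0:
            contraction += weighted
    return expansion + contraction, expansion, contraction


# Hypothetical striatal trace; the main allele comes from the same animal's tail.
trace = {117: 800.0, 118: 4000.0, 119: 1500.0, 120: 700.0, 121: 100.0}
ii, ei, ci = instability_indices(trace, main_allele=118)
print(f"instability {ii:.3f}, expansion {ei:.3f}, contraction {ci:.3f}")
```

Here the 121-repeat peak falls below 5% of the tallest peak and is dropped before normalization.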
Analysis of GAA repeat instability with long gel electrophoresis
Amplification of GAA repeats was performed according to protocols described by Long et al.136. PCR products were run on a 1% agarose gel, and band intensities were quantified with Image Lab and used to normalize the PCR product input for the long gel. Normalized amounts of the amplicons were loaded onto a 1% agarose gel in a large horizontal gel electrophoresis system (VWR, cat. no. 730-1112) and run at 70 V (~67 h for 300 repeats and ~120 h for 800 repeats). Long gels were imaged using a Typhoon 5 laser scanner (Amersham), and band intensities were calculated using ImageJ software to generate a repeat length histogram. For each lane, the change in GAA repeat size was determined relative to the modal GAA allele in the tail (stable tissue) of the same animal, which was run on the same gel. GAA repeat instability indices were calculated as described above for the HTT CAG repeat113, using a 10% intensity threshold. These analyses were performed for AAV9-ABEdCH-treated mice and control animals not treated with base editor (untreated, saline-treated or vehicle control mice).
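The Methods do not state how lane positions were converted to repeat lengths. One common approach, shown here only as an illustrative assumption, is to fit log10(size) as a linear function of migration distance using a DNA ladder run on the same gel, then subtract the amplicon's flanking sequence. Over very long runs the relationship can deviate from log-linear, in which case interpolating between neighboring ladder bands may be more accurate. The flank length `flank_bp` is assay-specific and hypothetical here, as is the synthetic ladder.

```python
import math
from typing import Sequence, Tuple


def fit_log_ladder(ladder: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(size_bp) = a + b * migration for (migration, size_bp) pairs."""
    xs = [m for m, _ in ladder]
    ys = [math.log10(bp) for _, bp in ladder]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def gaa_repeats(migration: float, coeffs: Tuple[float, float], flank_bp: float) -> float:
    """Convert a band position to GAA repeat units; flank_bp is a hypothetical assay parameter."""
    a, b = coeffs
    return (10 ** (a + b * migration) - flank_bp) / 3


# Synthetic ladder that follows an exact log-linear relation (illustration only).
ladder = [(d, 10 ** (4 - 0.01 * d)) for d in (50, 100, 150, 200)]
coeffs = fit_log_ladder(ladder)
print(coeffs, gaa_repeats(100, coeffs, flank_bp=400))
```

The per-repeat intensities obtained this way can then be summarized with the same normalized-height weighting used for the HTT CAG repeat, with a 10% threshold.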
Statistical analysis
Two-tailed t-tests with Welch’s correction for unequal variances were used for individual comparisons of sequencing data and mRNA levels. One-tailed t-tests with Welch’s correction were used to compare somatic instability (that is, instability index, expansion index and contraction index) in mouse tissues and FXN transcript levels in patient cells. One-sample t-tests and Wilcoxon tests were performed to compare editing frequencies quantified by HTS and WGS. Fisher’s exact test was used to compare FXN genotypes in the UK Biobank. t-tests and Pearson correlations were performed using GraphPad Prism v.9.4.1 or Microsoft Excel v.16.64. Error bars represent the standard deviation of ≥3 independent biological replicates, unless indicated otherwise.
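For reference, Welch's test replaces the pooled variance of Student's test with per-group standard errors and uses the Welch–Satterthwaite approximation for the degrees of freedom. The sketch below computes both quantities with the standard library; the sample values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a); variance uses n - 1
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


treated = [1.0, 2.0, 3.0]   # hypothetical expansion indices
control = [2.0, 4.0, 6.0]
t, df = welch_t(treated, control)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The P value follows from the t distribution with `df` degrees of freedom, one- or two-tailed as appropriate; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False, alternative="less")` returns the one-tailed version directly and can serve as a cross-check.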
To compare editing outcomes in nanopore sequencing data, the Kolmogorov–Smirnov test was used to compare distributions of the number of interruption products observed in each amplicon. For sample \(i\) containing \(n_i\) reads, we manually computed the empirical distribution function for the number of interruption products, \(F_{i,n_i}(x)\). For each comparison between samples \(i\) and \(j\), the Kolmogorov–Smirnov test statistic \(D_{i,j}\) was computed manually as \(D_{i,j}=\max_{x}\left|F_{i,n_i}(x)-F_{j,n_j}(x)\right|\). P values were then computed by numerically minimizing the following function with the BFGS algorithm: \(P_{\mathrm{KS}}=\arg\min_{P}\left(D_{i,j}-\sqrt{-\frac{n_i+n_j}{2n_in_j}\log\frac{P}{2}}\right)^{2}\).
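The objective minimized above reaches zero when \(D_{i,j}^2=-\frac{n_i+n_j}{2n_in_j}\log\frac{P}{2}\), which rearranges to the closed form \(P=2\exp\left(-\frac{2n_in_jD_{i,j}^2}{n_i+n_j}\right)\), the standard asymptotic two-sample Kolmogorov–Smirnov approximation. The sketch below computes the empirical distribution functions, \(D_{i,j}\) and this closed-form P, and evaluates the objective at that P to confirm it vanishes. Clipping P to 1 is our addition, since the raw expression exceeds 1 for small D; the sample values are hypothetical.

```python
import bisect
import math
from typing import Callable, Sequence


def ecdf(sample: Sequence[int]) -> Callable[[float], float]:
    """Empirical distribution function F(x) = fraction of observations <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def ks_statistic(a: Sequence[int], b: Sequence[int]) -> float:
    """D = max_x |F_a(x) - F_b(x)|; the step functions only change at observed values."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


def ks_objective(p: float, d: float, n_i: int, n_j: int) -> float:
    """Squared residual that is minimized over P in the Methods."""
    return (d - math.sqrt(-(n_i + n_j) / (2 * n_i * n_j) * math.log(p / 2))) ** 2


def ks_pvalue(d: float, n_i: int, n_j: int) -> float:
    """Closed-form minimizer of ks_objective, clipped to 1 (the clipping is our choice)."""
    return min(1.0, 2.0 * math.exp(-2.0 * d * d * n_i * n_j / (n_i + n_j)))


# Hypothetical numbers of interruption products per read in two samples.
a = [0, 0, 1, 1, 2]
b = [1, 2, 2, 3, 3]
d = ks_statistic(a, b)
p = ks_pvalue(d, len(a), len(b))
print(f"D = {d:.2f}, P = {p:.4f}, objective at P = {ks_objective(p, d, len(a), len(b)):.1e}")
```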
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The plasmids used in this study are available through Addgene (depositor: David R. Liu; Addgene IDs 232720–232724, https://www.addgene.org/browse/article/28252668/). DNA sequencing files can be accessed through the NCBI SRA (PRJNA1193010). Other databases used in this study include human genome assemblies hg19 and hg38, the MANE.GRCh38.v1.3.refseq genomic dataset, UK Biobank data-field 24062, the GENCODE mouse reference genome M32 (GRCm39), the Human Protein Atlas, Cancer DepMap and the AlphaMissense database. All data are available in the main text and the Supplementary Information. Source data are provided with this paper.
Code availability
Code for powTNRka software written for this paper can be found at https://github.com/alvin-hsu/powTNRka. Supporting source code, which includes all scripts needed to reproduce the data analysis, can be found at https://doi.org/10.7910/DVN/8NFSTC (ref. 139).
Change history
20 June 2025
In the version of the article initially published, an earlier, incorrect version of Supplementary Tables 1–24 and 27 was included. The correct file is now available online.
References
Paulson, H. Repeat expansion diseases. Handb. Clin. Neurol. 147, 105–123 (2018).
Halman, A., Dolzhenko, E. & Oshlack, A. STRipy: a graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data. Hum. Mutat. 43, 859–868 (2022).
Depienne, C. & Mandel, J. L. 30 years of repeat expansion disorders: what have we learned and what are the remaining challenges? Am. J. Hum. Genet. 108, 764 (2021).
Cheng, Y., Zhang, S. & Shang, H. Latest advances on new promising molecular-based therapeutic approaches for Huntington’s disease. J. Transl. Int. Med. 12, 134 (2024).
Ramakrishnan, S., Shah, M. & Gupta, V. Trinucleotide Repeat Disorders. In StatPearls [Internet] (StatPearls Publishing, updated 11 December 2024).
Saini, A. K. et al. Recent advances in the treatment strategies of Friedreich’s ataxia: a review of potential drug candidates and their underlying mechanisms. Curr. Pharm. Des. 30, 1472–1489 (2024).
Khristich, A. N. & Mirkin, S. M. On the wrong DNA track: molecular mechanisms of repeat-mediated genome instability. J. Biol. Chem. 295, 4134–4170 (2020).
Lin, Y., Dent, S. Y. R., Wilson, J. H., Wells, R. D. & Napierala, M. R loops stimulate genetic instability of CTG · CAG repeats. Proc. Natl Acad. Sci. USA 107, 692–697 (2010).
Reddy, K. et al. Processing of double-R-loops in (CAG)·(CTG) and C9orf72 (GGGGCC)·(GGCCCC) repeats causes instability. Nucleic Acids Res. 42, 10473–10487 (2014).
Gold, M. A. et al. Restarted replication forks are error-prone and cause CAG repeat expansions and contractions. PLoS Genet. 17, e1009863 (2021).
Lee, J. M. et al. Identification of genetic factors that modify clinical onset of Huntington’s disease. Cell 162, 516–526 (2015).
McAllister, B. et al. Exome sequencing of individuals with Huntington’s disease implicates FAN1 nuclease activity in slowing CAG expansion and disease onset. Nat. Neurosci. 25, 446–457 (2022).
Lee, J. M. et al. CAG repeat not polyglutamine length determines timing of Huntington’s disease onset. Cell 178, 887–900.e14 (2019).
Wheeler, V. C., Stone, J. C., Massey, T. H. & Pinto, R. M. Chapter 4 - The instability of the Huntington’s disease CAG repeat mutation. In Huntington’s Disease (eds Yang, X. W., Thompson, L. M. & Heiman, M.) 85–115 (Academic Press, 2024).
Handsaker, R. E. et al. Long somatic DNA-repeat expansion drives neurodegeneration in Huntington’s disease. Cell 188, 623–639.e19 (2025).
Sakamoto, N. et al. GGA•TCC-interrupted triplets in long GAA•TTC repeats inhibit the formation of triplex and sticky DNA structures, alleviate transcription inhibition, and reduce genetic instabilities. J. Biol. Chem. 276, 27178–27187 (2001).
Rolfsmeier, M. L. & Lahue, R. S. Stabilizing effects of interruptions on trinucleotide repeat expansions in Saccharomyces cerevisiae. Mol. Cell. Biol. 20, 173–180 (2000).
Xu, P., Pan, F., Roland, C., Sagui, C. & Weninger, K. Dynamics of strand slippage in DNA hairpins formed by CAG repeats: roles of sequence parity and trinucleotide interrupts. Nucleic Acids Res. 48, 2232–2245 (2020).
Sobczak, K. & Krzyzosiak, W. J. CAG repeats containing CAA interruptions form branched hairpin structures in spinocerebellar ataxia type 2 transcripts. J. Biol. Chem. 280, 3898–3910 (2004).
Ditch, S., Sammarco, M. C., Banerjee, A. & Grabczyk, E. Progressive GAA·TTC repeat expansion in human cell lines. PLoS Genet. 5, e1000704 (2009).
Ohshima, K. et al. A nonpathogenic GAAGGA repeat in the Friedreich gene: implications for pathogenesis. Neurology 53, 1854 (1999).
Choi, D. E. et al. Base editing strategies to convert CAG to CAA diminish the disease-causing mutation in Huntington’s disease. eLife 12, RP89782 (2024).
Nethisinghe, S. et al. Interruptions of the FXN GAA repeat tract delay the age at onset of Friedreich’s ataxia in a location dependent manner. Int. J. Mol. Sci. 22, 7507 (2021).
Eichler, E. E. et al. Length of uninterrupted CGG repeats determines instability in the FMR1 gene. Nat. Genet. 8, 88–94 (1994).
Yrigollen, C. M., Mendoza-Morales, G., Hagerman, R. & Tassone, F. Transmission of an FMR1 premutation allele in a large family identified through newborn screening: the role of AGG interruptions. J. Hum. Genet. 58, 553–559 (2013).
Latham, G. J., Coppinger, J., Hadd, A. G. & Nolin, S. L. The role of AGG interruptions in fragile X repeat expansions: a twenty-year perspective. Front. Genet. 5, 244 (2014).
Yrigollen, C. M. et al. AGG interruptions within the maternal FMR1 gene reduce the risk of offspring with fragile X syndrome. Genet. Med. 14, 729–736 (2012).
Villate, O. et al. Effect of AGG interruptions on FMR1 maternal transmissions. Front. Mol. Biosci. 7, 556545 (2020).
Pearson, C. E. et al. Interruptions in the triplet repeats of SCA1 and FRAXA reduce the propensity and complexity of slipped strand DNA (S-DNA) formation. Biochemistry 37, 2701–2708 (1998).
Pešović, J. et al. Repeat interruptions modify age at onset in myotonic dystrophy type 1 by stabilizing DMPK expansions in somatic cells. Front. Genet. 9, 601 (2018).
Cumming, S. A. et al. De novo repeat interruptions are associated with reduced somatic instability and mild or absent clinical features in myotonic dystrophy type 1. Eur. J. Hum. Genet. 26, 1635 (2018).
Mangin, A. et al. Robust detection of somatic mosaicism and repeat interruptions by long-read targeted sequencing in myotonic dystrophy type 1. Int. J. Mol. Sci. 22, 2616 (2021).
Santoro, M., Masciullo, M., Silvestri, G., Novelli, G. & Botta, A. Myotonic dystrophy type 1: role of CCG, CTC and CGG interruptions within DMPK alleles in the pathogenesis and molecular diagnosis. Clin. Genet. 92, 355–364 (2017).
Falik-Zaccai, T. C. et al. Predisposition to the fragile X syndrome in Jews of Tunisian descent is due to the absence of AGG interruptions on a rare Mediterranean haplotype. Am. J. Hum. Genet. 60, 103 (1997).
Nolin, S. L. et al. Fragile X full mutation expansions are inhibited by one or more AGG interruptions in premutation carriers. Genet. Med. 17, 358–364 (2015).
Findlay Black, H. et al. Frequency of the loss of CAA interruption in the HTT CAG tract and implications for Huntington disease in the reduced penetrance range. Genet. Med. 22, 2108 (2020).
Gao, R. et al. Instability of expanded CAG/CAA repeats in spinocerebellar ataxia type 17. Eur. J. Hum. Genet. 16, 215–222 (2007).
Choudhry, S., Mukerji, M., Srivastava, A. K., Jain, S. & Brahmachari, S. K. CAG repeat instability at SCA2 locus: anchoring CAA interruptions and linked single nucleotide polymorphisms. Hum. Mol. Genet. 10, 2437–2446 (2001).
Charles, P. et al. Are interrupted SCA2 CAG repeat expansions responsible for parkinsonism? Neurology 69, 1970–1975 (2007).
Fournier, C. et al. Interrupted CAG expansions in ATXN2 gene expand the genetic spectrum of frontotemporal dementias. Acta Neuropathol. Commun. 6, 41 (2018).
Kim, J. M. et al. Importance of low-range CAG expansion and CAA interruption in SCA2 parkinsonism. Arch. Neurol. 64, 1510–1518 (2007).
Wright, G. E. B. et al. Length of uninterrupted CAG, independent of polyglutamine size, results in increased somatic instability, hastening onset of Huntington disease. Am. J. Hum. Genet. 104, 1116–1126 (2019).
Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).
Komor, A. C., Badran, A. H. & Liu, D. R. Editing the genome without double-stranded DNA breaks. ACS Chem. Biol. 13, 383–388 (2018).
Komor, A. C., Badran, A. H. & Liu, D. R. CRISPR-based technologies for the manipulation of eukaryotic genomes. Cell 168, 20–36 (2017).
Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR–Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).
Huang, T. P., Newby, G. A. & Liu, D. R. Precision genome editing using cytosine and adenine base editors in mammalian cells. Nat. Protoc. 16, 1089–1128 (2021).
Newby, G. A. & Liu, D. R. In vivo somatic cell base editing and prime editing. Mol. Ther. 29, 3107–3124 (2021).
Al-Mahdawi, S. et al. Large interruptions of GAA repeat expansion mutations in Friedreich ataxia are very rare. Front. Cell. Neurosci. 12, 443 (2018).
Wright, G. E. B. et al. Interrupting sequence variants and age of onset in Huntington’s disease: clinical implications and emerging therapies. Lancet Neurol. 19, 930–939 (2020).
Thuronyi, B. W. et al. Continuous evolution of base editors with expanded target compatibility and improved activity. Nat. Biotechnol. 37, 1070 (2019).
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).
Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480.e30 (2020).
Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, aaf8729 (2016).
Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
Sürün, D. et al. Efficient generation and correction of mutations in human iPS cells utilizing mRNAs of CRISPR base editors and prime editors. Genes 11, 511 (2020).
MacLean, H. E., Warne, G. L. & Zajac, J. D. Spinal and bulbar muscular atrophy: androgen receptor dysfunction caused by a trinucleotide repeat expansion. J. Neurol. Sci. 135, 149–157 (1996).
Fujigasaki, H. et al. CAG repeat expansion in the TATA box-binding protein gene causes autosomal dominant cerebellar ataxia. Brain 124, 1939–1947 (2001).
Koide, R. et al. Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian atrophy (DRPLA). Nat. Genet. 6, 9–13 (1994).
Klockgether, T., Mariotti, C. & Paulson, H. L. Spinocerebellar ataxia. Nat. Rev. Dis. Prim. 5, 24 (2019).
Tsai, S. Q. & Joung, J. K. Defining and improving the genome-wide specificities of CRISPR–Cas9 nucleases. Nat. Rev. Genet. 17, 300–312 (2016).
Pacesa, M. et al. Structural basis for Cas9 off-target activity. Cell 185, 4067–4081 (2022).
Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat. Biotechnol. 38, 620–628 (2020).
Rees, H. A., Wilson, C., Doman, J. L. & Liu, D. R. Analysis and minimization of cellular RNA editing by DNA adenine base editors. Sci. Adv. 5, eaax5717 (2019).
Yu, Y. et al. Cytosine base editors with minimized unguided DNA and RNA off-target events and high on-target activity. Nat. Commun. 11, 2052 (2020).
Zhou, C. et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275–278 (2019).
Kozlowski, P., De Mezer, M. & Krzyzosiak, W. J. Trinucleotide repeats in human genome and exome. Nucleic Acids Res. 38, 4027–4039 (2010).
Tsai, S. Q. et al. CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nat. Methods 14, 607–614 (2017).
Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat. Biotechnol. 31, 822–826 (2013).
Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).
Anderson, E. M. et al. Systematic analysis of CRISPR-Cas9 mismatch tolerance reveals low levels of off-target activity. J. Biotechnol. 211, 56–65 (2015).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576 (2010).
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).
DepMap, Broad. DepMap 24Q2 Public. Figshare https://doi.org/10.25452/figshare.plus.27993248.v1 (2024).
Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
Sun, K. Y. et al. A deep catalogue of protein-coding variation in 983,578 individuals. Nature 631, 583–592 (2024).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Lo, H. S. et al. Allelic variation in gene expression is common in the human genome. Genome Res. 13, 1855 (2003).
Lee, J. M., Pinto, R. M., Gillis, T., St Claire, J. C. & Wheeler, V. C. Quantification of age-dependent somatic CAG repeat instability in Hdh CAG knock-in mice reveals different expansion dynamics in striatum and liver. PLoS ONE 6, e23647 (2011).
Wheeler, V. C. et al. Long glutamine tracts cause nuclear localization of a novel form of huntingtin in medium spiny striatal neurons in HdhQ92 and HdhQ111 knock-in mice. Hum. Mol. Genet. 9, 503–513 (2000).
Kovalenko, M. et al. Msh2 acts in medium-spiny striatal neurons as an enhancer of CAG instability and mutant huntingtin phenotypes in Huntington’s disease knock-in mice. PLoS ONE 7, e44273 (2012).
Levy, J. M. et al. Cytosine and adenine base editing of the brain, liver, retina, heart and skeletal muscle of mice via adeno-associated viruses. Nat. Biomed. Eng. 4, 97–110 (2020).
Hammond, S. L., Leek, A. N., Richman, E. H. & Tjalkens, R. B. Cellular selectivity of AAV serotypes for gene delivery in neurons and astrocytes by neonatal intracerebroventricular injection. PLoS ONE 12, e0188830 (2017).
Foust, K. D. et al. Intravascular AAV9 preferentially targets neonatal neurons and adult astrocytes. Nat. Biotechnol. 27, 59–65 (2009).
Mathiesen, S. N., Lock, J. L., Schoderboeck, L., Abraham, W. C. & Hughes, S. M. CNS transduction benefits of AAV-PHP.eB over AAV9 are dependent on administration route and mouse strain. Mol. Ther. Methods Clin. Dev. 19, 447 (2020).
Swiech, L. et al. In vivo interrogation of gene function in the mammalian brain using CRISPR-Cas9. Nat. Biotechnol. 33, 102 (2014).
Robbins, K. L., Glascock, J. J., Osman, E. Y., Miller, M. R. & Lorson, C. L. Defining the therapeutic window in a severe animal model of spinal muscular atrophy. Hum. Mol. Genet. 23, 4559–4568 (2014).
Arbab, M. et al. Base editing rescue of spinal muscular atrophy in cells and in mice. Science 380, eadg6518 (2023).
Meyer, K. et al. Improving single injection CSF delivery of AAV9-mediated gene therapy for SMA: a dose-response study in mice and nonhuman primates. Mol. Ther. 23, 477–487 (2015).
Aldous, S. G. et al. A CAG repeat threshold for therapeutics targeting somatic instability in Huntington’s disease. Brain 147, 1784–1798 (2024).
Belgrad, J. & Khvorova, A. More than 185 CAG repeats: a point of no return in Huntington’s disease biology. Brain 147, 1601–1603 (2024).
Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 901 (2020).
Miller, S. M. et al. Continuous evolution of SpCas9 variants compatible with non-G PAMs. Nat. Biotechnol. 38, 471–481 (2020).
Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science 368, 290–296 (2020).
Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
Chatterjee, P. et al. A Cas9 with PAM recognition for adenine dinucleotides. Nat. Commun. 11, 2474 (2020).
Clark, R. M. et al. Expansion of GAA triplet repeats in the human genome: unique origin of the FRDA mutation at the center of an Alu. Genomics 83, 373–383 (2004).
Kawakami, K. Tol2: a versatile gene transfer vector in vertebrates. Genome Biol. 8, S7 (2007).
Arbab, M., Srinivasan, S., Hashimoto, T., Geijsen, N. & Sherwood, R. I. Cloning-free CRISPR. Stem Cell Rep. 5, 908–917 (2015).
Rafehi, H. et al. An intronic GAA repeat expansion in FGF14 causes the autosomal-dominant adult-onset ataxia SCA50/ATX-FGF14. Am. J. Hum. Genet. 110, 105 (2023).
Pellerin, D. et al. Deep intronic FGF14 GAA repeat expansion in late-onset cerebellar ataxia. N. Engl. J. Med. 388, 128–141 (2023).
Kekou, K. et al. A dynamic trinucleotide repeat (TNR) expansion in the DMD gene. Mol. Cell. Probes 30, 254–260 (2016).
Lazzarotto, C. R. et al. Defining CRISPR–Cas9 genome-wide nuclease activities with CIRCLE-seq. Nat. Protoc. 13, 2615–2642 (2018).
Sjöstedt, E. et al. An atlas of the protein-coding genes in the human, pig, and mouse brain. Science 367, eaay5947 (2020).
Pollard, L. M. et al. Replication-mediated instability of the GAA triplet repeat mutation in Friedreich ataxia. Nucleic Acids Res. 32, 5962–5971 (2004).
Sakamoto, N., Ohshima, K., Montermini, L., Pandolfo, M. & Wells, R. D. Sticky DNA, a self-associated complex formed at long GAA*TTC repeats in intron 1 of the frataxin gene, inhibits transcription. J. Biol. Chem. 276, 27171–27177 (2001).
Neugebauer, M. E. et al. Evolution of an adenine base editor into a small, efficient cytosine base editor with low off-target activity. Nat. Biotechnol. 41, 673–685 (2022).
Zhang, E., Neugebauer, M. E., Krasnow, N. A. & Liu, D. R. Phage-assisted evolution of highly active cytosine base editors with enhanced selectivity and minimal sequence context preference. Nat. Commun. 15, 1697 (2024).
Virmouni, S. A. et al. A novel GAA-repeat-expansion-based mouse model of Friedreich’s ataxia. Dis. Model. Mech. 8, 225–235 (2015).
Virmouni, S. A., Sandi, C., Al-Mahdawi, S. & Pook, M. A. Cellular, molecular and functional characterisation of YAC transgenic mouse models of Friedreich ataxia. PLoS ONE 9, e107416 (2014).
Al-Mahdawi, S. et al. GAA repeat instability in Friedreich ataxia YAC transgenic mice. Genomics 84, 301–310 (2004).
Lee, J. M. et al. A novel approach to investigate tissue-specific trinucleotide repeat instability. BMC Syst. Biol. 4, 29 (2010).
Lee, A. Omaveloxolone: first approval. Drugs 83, 725–729 (2023).
Lynch, D. R. et al. Efficacy of omaveloxolone in Friedreich’s ataxia: delayed-start analysis of the MOXIe extension. Mov. Disord. 38, 313–320 (2023).
Gérard, C., Archambault, A. F., Bouchard, C. & Tremblay, J. P. A promising mouse model for Friedreich ataxia progressing like human patients. Behav. Brain Res. 436, 114107 (2023).
Menalled, L. B., Sison, J. D., Dragatsis, I., Zeitlin, S. & Chesselet, M. F. Time course of early motor and neuropathological anomalies in a knock-in mouse model of Huntington’s disease with 140 CAG repeats. J. Comp. Neurol. 465, 11–26 (2003).
Harding, I. H., Lynch, D. R., Koeppen, A. H. & Pandolfo, M. Central nervous system therapeutic targets in Friedreich ataxia. Hum. Gene Ther. 31, 1226 (2020).
Apolloni, S., Milani, M. & D’ambrosi, N. Neuroinflammation in Friedreich’s ataxia. Int. J. Mol. Sci. 23, 6297 (2022).
Hanson, E., Sheldon, M., Pacheco, B., Alkubeysi, M. & Raizada, V. Heart disease in Friedreich’s ataxia. World J. Cardiol. 11, 1–12 (2019).
Davis, J. R. et al. Efficient in vivo base editing via single adeno-associated viruses with size-optimized genomes encoding compact adenine base editors. Nat. Biomed. Eng. 6, 1272–1283 (2022).
Gaudelli, N. M. et al. Directed evolution of adenine base editors with increased activity and therapeutic application. Nat. Biotechnol. 38, 892–900 (2020).
Banskota, S. et al. Engineered virus-like particles for efficient in vivo delivery of therapeutic proteins. Cell 185, 250–265.e16 (2022).
An, M. et al. Engineered virus-like particles for transient delivery of prime editor ribonucleoprotein complexes in vivo. Nat. Biotechnol. 42, 1526–1537 (2024).
Sherwood, R. I. et al. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat. Biotechnol. 32, 171–178 (2014).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. https://doi.org/10.1038/s41587-019-0032-3 (2019).
Gotoh, O. An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982).
Heger, A., Marshall, J., Jacobs, K. & contributors. Pysam. GitHub https://github.com/pysam-developers/pysam?tab=readme-ov-file (2025).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Bonfield, J. K. et al. HTSlib: C library for reading/writing high-throughput sequencing data. Gigascience 10, giab007 (2021).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Morales, J. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604, 310–315 (2022).
Neto, J. L. et al. Genetic contributors to intergenerational CAG repeat instability in Huntington’s disease knock-in mice. Genetics 205, 503–516 (2017).
Pinto, R. M. et al. Mismatch repair genes Mlh1 and Mlh3 modify CAG instability in Huntington’s disease mice: genome-wide and candidate approaches. PLoS Genet. 9, e1003930 (2013).
Pinto, R. M. et al. In vivo CRISPR–Cas9 genome editing in mice identifies genetic modifiers of somatic CAG repeat instability in Huntington’s disease. Nat. Genet. 57, 314–322 (2025).
Long, A. et al. Somatic instability of the expanded GAA repeats in Friedreich’s ataxia. PLoS ONE 12, e0189990 (2017).
Porensky, P. N. et al. A single administration of morpholino antisense oligomer rescues spinal muscular atrophy in mouse. Hum. Mol. Genet. 21, 1625–1638 (2012).
Mouro Pinto, R. et al. In vivo CRISPR–Cas9 genome editing in mice identifies genetic modifiers of somatic CAG repeat instability in Huntington’s disease. Nat. Genet. 57, 314–322 (2025).
Hsu, A. powTNRka for TNR BE, v.1. Harvard Dataverse https://doi.org/10.7910/DVN/8NFSTC (2025).
Acknowledgements
We thank Z. McLean for his help with instability analysis in patient fibroblasts, and S. Calvo and V. Mootha for helpful discussions. This research has been conducted using the UK Biobank Resource under application no. 48511. This work uses data provided by patients and collected by the NHS as part of their care and support. We acknowledge funding from the following sources: the Netherlands Organisation for Scientific Research Rubicon Fellowship (M.A.); US National Institutes of Health (NIH) K99 Pathway to Independence Award no. NS119743-01A1 (M.A.); the Chan Zuckerberg Initiative (M.A.); the Lodish Family Foundation (M.A.); US NIH grant no. U19 NS132304 (M.A., R.M.P., D.R.L.); the Helen Hay Whitney Fellowship (G.A.N.); US NIH K99 Pathway to Independence Award no. HL163805 (G.A.N.); US NIH grant no. R01 NS126420 (R.M.P.); the Dake Family Foundation (R.M.P.); US NIH grant no. U01 AI142756 (D.R.L.); NHGRI grant no. U01 HG011755 (H.L.R.); US NIH grant no. RM1 HG009490 (D.R.L.); US NIH grant no. R01 EB022376 (D.R.L.); US NIH grant no. R35 GM118062 (D.R.L.); the Friedreich Ataxia Research Alliance (D.R.L.); the Bill and Melinda Gates Foundation (D.R.L.); and the Howard Hughes Medical Institute (D.R.L.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.
Author information
Contributions
Z.M., M.A., R.M.P. and D.R.L. conceptualized the project. Z.M., M.A., M.K., A.H., J.C.L.R., J.Z., T.Y., B.W., N.J.D., M.W., S.S., A.C., Y.A.T., L.G.F., M.B. and R.M.P. were responsible for the methodology. Z.M., M.A., M.K., J.C.L.R., G.A.N., A.C. and R.M.P. carried out the investigation. Z.M., M.A., M.K., J.Z., T.Y. and R.M.P. were responsible for visualization. M.A., R.M.P. and D.R.L. acquired funding. Z.M., M.A., R.M.P. and D.R.L. were responsible for project administration. Z.M., M.A., R.M.P. and D.R.L. supervised the project. Z.M., M.A. and D.R.L. wrote the paper. H.L.R., J.X. and G.G. provided advice.
Ethics declarations
Competing interests
Z.M., M.A. and D.R.L. have filed patent applications on this work. D.R.L. is a consultant, co-founder and/or equity owner of Beam Therapeutics, Prime Medicine, Pairwise Plants and nChroma Bio, companies that use or deliver genome or epigenome editing agents. The other authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Krishanu Saha, Fyodor Urnov and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Synonymous cytosine base editing of CAG repeats in vitro.
(a) Optimization of cytosine base editing strategies in HEK293T cells. Data are mean ± SD of biological triplicates. (b-c) Cumulative mean % of interrupted HTT alleles with at least a specified number of CAA interruptions induced by EA-evoA-NG in (b) HEK293T cells and (c) HD fibroblasts (GM04855, 20/48 CAGs). Data are mean ± SD of biological replicates ((b) n = 3, (c) n = 2). (d) Distribution of CAA interruptions throughout CAG repeats at interrupted HTT alleles in GM04855 fibroblasts (20/48 CAGs), shown as the % of interrupted alleles with a CAA interruption at a given position of the repeat tract. Data are mean ± SD of two biological replicates. (e) CAG repeat base editing at the pathogenic HTT allele in HD fibroblasts with 180 CAG repeats (GM09197), quantified in the 5′→3′ (spanning 59 CAGs) and 3′→5′ (spanning 81 CAGs) sequencing directions. Data are mean ± SD of two biological replicates. (f) A representative agarose gel showing the distribution of CAG allele sizes in CBE-treated (CBE) and untreated HD fibroblasts with 180 CAG repeats (GM09197); P – passage, L – ladder. The dashed lines indicate the starting CAG size. The experiment was performed twice with similar results. (g-h) CIRCLE-seq-nominated off-target hits in the human genome classified by (g) the identity of the targeted region annotated with HOMER and (h) the number of mismatches between the protein-coding sites and sgCTG. TSS – transcription start site, TTS – transcription termination site.
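The cumulative metric in Extended Data Fig. 1b-c and the positional profile in Extended Data Fig. 1d can be illustrated with a minimal Python sketch. It assumes each read is an in-frame CAG tract written 5′→3′, that the cumulative percentage uses all sequenced alleles as its denominator (so k = 1 gives the overall interruption frequency), and that the positional profile uses only interrupted alleles. The function names are hypothetical; this is not the authors' analysis code.

```python
def codons(tract: str) -> list[str]:
    """Split an in-frame repeat tract (5'->3') into complete triplets."""
    return [tract[i:i + 3] for i in range(0, len(tract) - 2, 3)]


def count_caa(tract: str) -> int:
    """Number of CAA interruptions in an in-frame CAG tract."""
    return sum(1 for c in codons(tract) if c == "CAA")


def cumulative_percent(tracts: list[str]) -> dict[int, float]:
    """% of ALL sequenced alleles carrying >= k CAA interruptions, k = 1..max.

    Assumption: the denominator is every sequenced allele.
    """
    counts = [count_caa(t) for t in tracts]
    max_k = max(counts, default=0)
    return {
        k: 100.0 * sum(1 for n in counts if n >= k) / len(counts)
        for k in range(1, max_k + 1)
    }


def position_profile(tracts: list[str]) -> list[float]:
    """% of INTERRUPTED alleles with a CAA at each codon position (0-based)."""
    interrupted = [t for t in tracts if count_caa(t) > 0]
    if not interrupted:
        return []
    n_positions = max(len(codons(t)) for t in interrupted)
    return [
        100.0 * sum(1 for t in interrupted
                    if pos < len(codons(t)) and codons(t)[pos] == "CAA")
        / len(interrupted)
        for pos in range(n_positions)
    ]


reads = ["CAGCAGCAG", "CAACAGCAG", "CAACAGCAA", "CAGCAACAG"]
print(cumulative_percent(reads))  # {1: 75.0, 2: 25.0}
```

With four toy reads, three carry at least one CAA and one carries two, giving 75% and 25% at k = 1 and k = 2. Averaging such profiles across biological replicates would then give the plotted means.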
Extended Data Fig. 2 In-silico off-target analysis for CAG repeat-targeting strategy.
(a-b) CRISPRitz-predicted off-target candidate sites in the (a) macaque (Macaca mulatta) and (b) human genomes, grouped by the position of mismatches between the genomic site and the sgCTG spacer sequence. Mismatch category A includes the five nucleotides most proximal to the PAM (positions 1-5), category B represents positions 6-10, and category C spans the last ten, PAM-distal nucleotides (positions 11-20) of the protospacer. A0-A5, B0-B5 and C0-C10 indicate the number of mismatches (0-5) between the sgCTG and a target site in categories A, B or C. Each square shows the number of loci in each mismatch subgroup. (c) Predicted in cellulo base editing off-target activity of our CBE strategy in the macaque genome, based on in silico CRISPRitz predictions of off-target edits and a subsequent WGS-transformation using the ratio of off-targets edited in HEK293T cells (>0.5% editing by WGS) versus predicted by CRISPRitz in the human genome. Heatmap colors represent predicted in cellulo editing per mismatch bin, based on corresponding editing frequencies per bin observed in WGS analysis of edited HEK293T cells.
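The A/B/C mismatch binning described in Extended Data Fig. 2a-b can be sketched as follows. The sketch assumes a 20-nt protospacer written 5′→3′ with the PAM immediately 3′ of it, so PAM-proximal position 1 is the last base of the string. The function names and the CTG-repeat spacer used in the test are hypothetical illustrations; they are neither the actual sgCTG sequence nor the CRISPRitz implementation.

```python
from collections import Counter


def mismatch_bins(spacer: str, site: str) -> tuple[int, int, int]:
    """Mismatches between a spacer and a genomic protospacer, by category.

    Both inputs are 20-nt protospacers written 5'->3' with the PAM directly
    3' of the last base, so PAM-proximal position 1 is the final character.
    A = positions 1-5, B = positions 6-10, C = positions 11-20.
    """
    if len(spacer) != 20 or len(site) != 20:
        raise ValueError("expected two 20-nt protospacer sequences")
    a = b = c = 0
    for pos in range(1, 21):
        i = 20 - pos  # string index of PAM-relative position `pos`
        if spacer[i].upper() != site[i].upper():
            if pos <= 5:
                a += 1
            elif pos <= 10:
                b += 1
            else:
                c += 1
    return a, b, c


def tally_bins(spacer: str, sites: list[str]) -> Counter:
    """Number of candidate loci in each (A, B, C) mismatch subgroup."""
    return Counter(mismatch_bins(spacer, s) for s in sites)


def bin_label(bins: tuple[int, int, int]) -> str:
    """Heatmap-style label such as 'A1B0C2'."""
    a, b, c = bins
    return f"A{a}B{b}C{c}"
```

Each heatmap square then corresponds to one key of the `Counter`, with the count as the number of loci in that subgroup.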
Extended Data Fig. 3 Cytosine base editing of HTT CAG repeats in Htt.Q111 mice.
(a) Cumulative mean % interrupted HTT alleles with at least a specified number of CAA interruptions in the cortex of AAV9-CBE-treated Htt.Q111 mice at 4, 12 and 24 weeks post-injection. Mean ± SD (4 weeks n = 4, 12 weeks n = 6, 24 weeks n = 7). (b) CAG-CBE editing in Htt.Q111 mice quantified in the 5′→3′ and 3′→5′ sequencing directions. Mean ± SD (CBE n = 3, untreated n = 4). (c) Expansion (ICAG(e), positive) and contraction (ICAG(c), negative) indices in tail, cortex and striatum of 12- and 24-week-old Htt.Q111 mice treated with AAV9-CBE, or controls. Box plots of independent animals (Tail: untreated n = 4, CBE n = 5; Striatum: untreated n = 4, CBE n = 6; Cortex: untreated n = 4, CBE n = 6). Median indicated by horizontal line, whiskers show min-max values. **P = 0.007, ***P = 0.0003, ****P < 0.0001, Welch’s one-tailed t-test. (d-f) CAG allele sizes in (d) tail, (e) cortex and (f) striatum of 12-week-old Htt.Q111 mice treated with AAV9-CBE, or controls. Dotted line indicates modal HTT allele. Mean distributions (Tail: untreated n = 8, CBE n = 5; Striatum: untreated n = 8, CBE n = 6; Cortex: untreated n = 4, CBE n = 6). (g) Frameshifting indels in AAV9-CBE-treated Htt.Q111 mice, 12 and 24 weeks post-treatment. Mean ± SD (untreated n = 3, 12 weeks n = 6, 24 weeks n = 7). (h) Fraction of interrupted HTT alleles with frameshifting indels in AAV9-CBE-treated Htt.Q111 mice at 12 and 24 weeks post-treatment. Mean ± SD (12 weeks n = 6, 24 weeks n = 7). (i) Frameshifting indel sequences (G, A and AG) at CAG repeats in cortex and striatum of Htt.Q111 mice, 12 and 24 weeks post-injection. Corresponding amino acid sequences (Poly-S – polyserine, Poly-A – polyalanine) noted. Median shown by horizontal line, min-max range by vertical lines (n = 26). (j) Frameshifting indels in HD fibroblasts (GM09197) treated with CBE. Mean ± SD (untreated n = 4, CBE n = 3). (k-l) CIRCLE-seq off-target hits in the mouse genome classified by (k) targeted region and (l) mismatch number with sgCTG. (m) Differential gene expression analysis in cortex of Htt.Q111 mice treated with AAV9-CBE + AAV9-GFP or AAV9-GFP vehicle at 12 weeks post-injection (RNA-seq). Each dot represents a transcript. Mean of four animals per group. r = 0.97 across 134,701 transcripts. (n) Whole-transcriptome C-to-U RNA off-target analysis in Htt.Q111 mice treated with AAV9-CBE + AAV9-GFP or AAV9-GFP vehicle. Mean ± SD of four animals. P = 0.0182, Welch’s two-tailed t-test. All data points represent independent biological replicates.
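The polyserine and polyalanine products in Extended Data Fig. 3i follow from reading-frame arithmetic in a pure CAG tract. A net −1 (or +2) indel moves downstream translation into the AGC (serine) frame, and a net −2 (or +1) indel moves it into the GCA (alanine) frame. The minimal sketch below demonstrates this. It models an uninterrupted tract only, it does not state which indel types were observed, and its function names are hypothetical.

```python
# Codons that occur in the three reading frames of a pure CAG tract.
CAG_FRAME_CODONS = {"CAG": "Q", "AGC": "S", "GCA": "A"}


def frame_after_indel(net_indel: int) -> int:
    """Downstream frame offset after a net indel (negative = deletion).

    0 keeps the CAG (Gln) frame; 1 gives AGC (Ser); 2 gives GCA (Ala).
    """
    return (-net_indel) % 3


def translate_cag_frame(repeat_units: int, offset: int) -> str:
    """Translate an uninterrupted (CAG)n tract read from base `offset`."""
    seq = "CAG" * repeat_units
    frame = seq[offset:]
    triplets = [frame[i:i + 3] for i in range(0, len(frame) - 2, 3)]
    return "".join(CAG_FRAME_CODONS[t] for t in triplets)


for indel in (-1, -2, +1):
    print(indel, translate_cag_frame(5, frame_after_indel(indel)))
# -1 SSSS
# -2 AAAA
# 1 AAAA
```

A single-nucleotide deletion (G or A) therefore yields a polyserine stretch, whereas deleting AG yields polyalanine, consistent with the Poly-S and Poly-A labels in the legend.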
Extended Data Fig. 4 Adenine base editing of FXN GAA repeats in vitro.
(a) Cumulative % of interrupted FXN alleles with at least a specified number of A•T > G•C interruptions induced by GAA-ABE in FXN-mESCs and measured across 30 (FXN-30GAA-mES) and 50 (FXN-60GAA-mES) GAA repeats. Data are mean ± SD of biological triplicates. (b) Distribution of A•T > G•C interruptions throughout GAA repeats at interrupted FXN alleles in FXN-mES cells, shown as % interrupted alleles with an A•T > G•C interruption at a given position in the GAA repeat tract. Data are mean ± SD of biological triplicates. (c) Composition of the A•T > G•C interruptions (GGG, GGA and GAG sequences) introduced at FXN alleles in HEK293T cells (9 GAAs) and FXN-mESCs (30 and 50 GAAs). Data are mean ± SD of biological triplicates. (d-e) CIRCLE-seq off-target hits in the mouse genome classified by (d) the number of mismatches with the sgGAA and (e) the identity of the targeted region annotated with HOMER. (f) Comparison of base editing frequencies quantified by WGS and amplicon sequencing (HTS) at selected sites. Each dot represents the average ratio of editing frequencies quantified by WGS and HTS at a single locus, obtained from biological triplicates. Data are plotted as log2 fold-change, with the median indicated by the horizontal line (n = 55). P < 0.0001, one-sample t and Wilcoxon test. (g) CRISPRitz-predicted off-target sites in the macaque (Macaca mulatta) genome, categorized by mismatch position between the sgGAA and genomic site. Category A represents the five nucleotides proximal to PAM (positions 1-5), category B covers the next five nucleotides (positions 6-10), and category C includes the last ten, PAM-distal nucleotides (positions 11-20) of the protospacer. A0-A5, B0-B5 and C0-C10 denote the number of mismatches (0-5) between the sgGAA and a target site in each category. Each square shows the number of loci in each mismatch subgroup.
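The GGG/GGA/GAG composition in Extended Data Fig. 4c can be tallied codon by codon. This sketch assumes reads are in the GAA frame and counts each non-GAA triplet as one interruption, so a GGG triplet, which carries two A•T>G•C conversions, counts once. Whether the study counted interruptions this way is an assumption, and the function names are hypothetical.

```python
from collections import Counter

# Triplets produced by one or two A->G conversions within a GAA unit.
EDITED_TRIPLETS = {"GGA", "GAG", "GGG"}


def interruption_composition(tract: str) -> Counter:
    """Count GGA/GAG/GGG interruptions in an in-frame GAA tract (5'->3')."""
    triplets = [tract[i:i + 3] for i in range(0, len(tract) - 2, 3)]
    return Counter(t for t in triplets if t in EDITED_TRIPLETS)


def mean_interruptions(tracts: list[str]) -> float:
    """Mean interruptions per EDITED allele; each edited triplet counts once."""
    per_allele = [sum(interruption_composition(t).values()) for t in tracts]
    edited = [n for n in per_allele if n > 0]
    return sum(edited) / len(edited) if edited else 0.0


def composition_percent(tracts: list[str]) -> dict[str, float]:
    """Share (%) of each interruption type across all edited triplets."""
    total: Counter = Counter()
    for t in tracts:
        total.update(interruption_composition(t))
    n = sum(total.values())
    return {k: 100.0 * total[k] / n for k in sorted(EDITED_TRIPLETS)} if n else {}
```

Counting whole triplets keeps the GGG category distinct from two separate single-base interruptions, which is what a composition plot by triplet identity requires.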
Extended Data Fig. 5 Adenine base editing of FXN GAA repeats.
(a) CRISPRitz-predicted off-target sites in the human genome, categorized by mismatch position between the sgGAA and genomic site. Category A represents the five nucleotides proximal to PAM (positions 1-5), category B covers the next five nucleotides (positions 6-10), and category C spans the last ten, PAM-distal nucleotides (positions 11-20) of the protospacer. A0-A5, B0-B5 and C0-C10 denote the number of mismatches (0-5) between the sgGAA and a target site in each category. Each square shows the number of loci in each mismatch subgroup. (b) Predicted in cellulo base editing off-target activity of our ABE strategy in the macaque genome, based on in silico CRISPRitz predictions of off-target edits and a subsequent WGS-transformation using the ratio of off-targets edited in HEK293T cells (>0.5% editing by WGS) versus predicted by CRISPRitz in the human genome. Heatmap colors represent predicted in cellulo editing efficiency per mismatch bin, based on the corresponding editing frequencies per bin observed in WGS analysis of edited HEK293T cells. (c) Average number of A•T > G•C interruptions in edited FXN alleles isolated from control or FRDA patient-derived fibroblasts, directly observed or estimated. Data are mean ± SD of biological triplicates. ns – not significant, Welch’s two-tailed t-test. (d) Observed and estimated FXN GAA repeat editing in FRDA patient-derived fibroblasts (GM04078, 541/420 GAAs) 5-16 days and 1-3 cell passages (P) after electroporation, normalized to untreated controls. Data are mean ± SD of biological replicates (unrelated sgRNA n = 3, ABE-treated n = 5). (e) Composition of A•T > G•C interruptions (GGG, GGA and GAG sequences) introduced at FXN alleles in control (8/9 GAA repeats) or FRDA patient-derived fibroblasts (330/380 or 541/420 GAAs). Data are mean ± SD of biological triplicates.
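The WGS-transformation described in Extended Data Fig. 5b (and Extended Data Fig. 2c) can be read as a per-bin scaling. Under that reading, the fraction of CRISPRitz-predicted human sites that were edited in HEK293T cells is applied to the number of predicted macaque sites in the same mismatch bin. The sketch below encodes this interpretation only. The dictionary layout, the (A, B, C) bin keys and the decision to skip bins without human predictions are assumptions, not the published pipeline.

```python
Bin = tuple[int, int, int]  # hypothetical key: (A, B, C) mismatch counts


def transform_predictions(
    human_predicted: dict[Bin, int],
    human_edited: dict[Bin, int],
    macaque_predicted: dict[Bin, int],
) -> dict[Bin, float]:
    """Expected number of edited macaque sites per mismatch bin.

    For each bin, the fraction of CRISPRitz-predicted human sites that were
    edited in HEK293T cells (>0.5% by WGS) scales the macaque prediction.
    Bins with no human predictions are skipped (no calibration available).
    """
    expected: dict[Bin, float] = {}
    for bin_, n_macaque in macaque_predicted.items():
        n_human = human_predicted.get(bin_, 0)
        if n_human == 0:
            continue
        rate = human_edited.get(bin_, 0) / n_human
        expected[bin_] = n_macaque * rate
    return expected
```

A bin in which no predicted human site was edited maps to zero expected macaque edits, and a bin absent from the human predictions is left out rather than extrapolated.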
Extended Data Fig. 6 Adenine base editing of FXN GAA repeats in YG8s mice.
(a) Transduction efficiency in the cortex of YG8s mice treated with AAV9-GFP at 0.4-2.5 × 10¹⁰ vg per mouse. Mean ± SD (0.4 and 2.5 × 10¹⁰ vg, n = 5; 1.5 × 10¹⁰ vg, n = 4). (b) FXN GAA repeat editing in heart, liver (Liv), striatum (Str), brainstem (Brst), and tail (T) of YG8s.300 mice treated with AAV9-ABEdCH at 24 weeks post-injection, observed by HTS or estimated, normalized to controls. Mean ± SD (heart and striatum n = 6, liver n = 12, brainstem n = 4, tail n = 3). (c) FXN GAA repeat editing in the cortex of YG8s.300 mice treated with AAV9-ABEdCH at 4 and 24 weeks post-injection, observed by HTS or estimated, normalized to uninjected controls. Mean ± SD (4 weeks n = 4, 24 weeks n = 10). (d) Composition of A•T > G•C interruptions (GGG, GGA and GAG sequences) introduced at FXN alleles in 24-week-old YG8s mice treated with AAV9-ABEdCH. Mean ± SD (n = 6). (e) Average number of A•T > G•C interruptions in edited FXN alleles in 24-week-old YG8s mice as observed by HTS or estimated. Mean ± SD (YG8s.300 n = 6, YG8s.800 n = 5). (f) Average number of A•T > G•C interruptions in edited FXN alleles in 24-week-old YG8s.300 mice measured by nanopore sequencing. Mean ± SD (n = 6). (g) Cumulative probabilities of a given number of A•T > G•C interruptions in FXN alleles isolated from the cortex of YG8s.300 mice treated with AAV9-ABEdCH (n = 6) or uninjected controls (reference probability, mean of n = 2), determined by nanopore sequencing (Kolmogorov-Smirnov test, P < 1.58 × 10⁻¹⁴). (h-i) Representative agarose gels used for quantification of GAA instability index in (h) YG8s.300 and (i) YG8s.800 mice treated with AAV9-ABEdCH (ABE), or controls (Ctrl); T – tail, Ctx – cortex. Each animal tissue was analyzed on the gel at least twice with similar results. (j) Expansion (IGAA(e), positive) and contraction (IGAA(c), negative) indices in tail and cortex of 24-week-old YG8s mice treated with AAV9-ABEdCH, or controls. Box plots of biological replicates (YG8s.300 n = 11, YG8s.800 n = 7). Median indicated with horizontal lines, whiskers show min-max values. *P = 0.0203, **P = 0.0022, ***P < 0.0003, Welch’s one-tailed t-test. All data points represent independent biological replicates.
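Expansion and contraction indices such as those in Extended Data Fig. 6j and Extended Data Fig. 3c are typically derived from a peak-height approach like that of Lee et al. (2010). Peak heights are normalized, each peak is weighted by its repeat-length change from the main allele, and positive and negative contributions are summed separately. The sketch below follows that general scheme. The relative peak-height cut-off of 0.2, the choice of the tallest retained peak as the reference allele and the input format are placeholder assumptions, not the study's parameters.

```python
def instability_indices(
    peaks: dict[int, float], rel_cutoff: float = 0.2
) -> tuple[float, float, float]:
    """Return (expansion, contraction, overall) instability indices.

    peaks maps repeat length -> peak height (or band intensity).
    Assumptions (placeholders): peaks below rel_cutoff x the tallest peak are
    ignored, and the tallest retained peak is the reference (main) allele.
    """
    if not peaks:
        raise ValueError("no peaks supplied")
    top = max(peaks.values())
    kept = {n: h for n, h in peaks.items() if h >= rel_cutoff * top}
    main = max(kept, key=kept.get)
    total = sum(kept.values())
    # Each peak contributes (normalized height) x (change from main allele).
    expansion = sum((n - main) * h / total for n, h in kept.items() if n > main)
    contraction = sum((n - main) * h / total for n, h in kept.items() if n < main)
    return expansion, contraction, expansion + contraction
```

Expansion is positive and contraction negative by construction, matching the sign convention stated for IGAA(e) and IGAA(c); their sum gives an overall index.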
Supplementary information
Supplementary Information
Supplementary Text, Discussion and Notes 1 and 2.
Supplementary Tables 1–24, 27
A master sheet of all supplementary tables from the manuscript (except for Supplementary Tables 25 and 26). The index of all the items is included in the first tab of this file.
Supplementary Table 25
Off-target sites for sgGAA in the nonhuman primate genome (Macaca mulatta) identified with CRISPRitz prediction tool.
Supplementary Table 26
Off-target sites for sgGAA in the human genome identified with CRISPRitz prediction tool.
Source data
Source Data Fig. 1
Statistical source data.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 1f
Unprocessed gel.
Source Data Extended Data Fig. 6h,i
Unprocessed gel.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Matuszek, Z., Arbab, M., Kesavan, M. et al. Base editing of trinucleotide repeats that cause Huntington’s disease and Friedreich’s ataxia reduces somatic repeat expansions in patient cells and in mice. Nat Genet 57, 1437–1451 (2025). https://doi.org/10.1038/s41588-025-02172-8
DOI: https://doi.org/10.1038/s41588-025-02172-8