WO2022008466A1 - Base editing tools - Google Patents
- Publication number
- WO2022008466A1 (PCT/EP2021/068559)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- base
- cas9
- editing
- dna
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/78—Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y305/00—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
- C12Y305/04—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
- C12Y305/04002—Adenine deaminase (3.5.4.2)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y305/00—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
- C12Y305/04—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
- C12Y305/04005—Cytidine deaminase (3.5.4.5)
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/80—Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
Definitions
- the present invention relates to the field of genetic modification technology, particularly to the area of CRISPR-Cas mediated modifications of genetic material, and more particularly the area of CRISPR-Cas mediated base editing of nucleic acid materials.
- CRISPR-Cas technology has developed a wide range of applications for the modification of genetic material, including editing.
- the genetic material can be edited across a range of cell types and organisms, from single prokaryotic cells to entire eukaryotic organisms.
- the simplicity and programmability of the RNA-guided CRISPR-associated nucleases have enabled the generation of double-stranded DNA breaks (DSBs) at precise target positions in the genome (see Jinek, M. et al., (2012) Science, 337(6096): 816 - 821).
- cellular DNA repair mechanisms introduce random insertions/deletions (indels), translocations or other stochastic rearrangements at the targeted site through non-homologous/microhomology-mediated end joining (NHEJ/MMEJ), leading to gene disruption and undesired modifications (see Jeggo, P. A. (1998) Advances in Genetics Vol. 38, pp. 185 - 218, Academic Press; Rouet, P. et al., (1994) Proceedings of the National Academy of Sciences, 91(13), 6064-6068; Lukacsovich, T. et al., (1994) Nucleic Acids Research, 22(25): 5649-5657).
- indels random insertions/deletions
- NHEJ/MMEJ non-homologous/microhomology-mediated end joining
- HDR homology-directed repair
- CRISPR-mediated DSBs are predominantly leveraged as a counter selection system to kill the unedited cells, after native or phage-recombinase-assisted homologous recombination (see Li, Q. et al., (2015) Metabolic Engineering, 31: 13 - 21; Tong, Y. et al., (2015) ACS Synthetic Biology, 4(9), 1020 - 1029; Yu, J. et al., 2015 Appl. Environ. Microbiol. DOI: 10.1128/AEM.04023-14; Wang, Y. et al., (2016) ACS Synthetic Biology, 5(7), 721-732).
- SNPs single nucleotide polymorphisms
- Komor, A.C., et al. (2016) “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage” Nature 533: 420-424.
- Komor et al. used a base editor comprising a dead SpyCas9 (dSpyCas9) fused to a cytidine deaminase in conjunction with an appropriate guide RNA (gRNA).
- dSpyCas9 dead SpyCas9
- gRNA guide RNA
- a bacteriophage-derived uracil DNA glycosylase inhibitor (UGI) was fused to the C-terminus of nCas9 (nickase), inhibiting the reversion of the U:G pair back to the original C:G pair by cellular uracil DNA glycosylases.
- UGI bacteriophage-derived uracil DNA glycosylase inhibitor
- the base editors manipulate the cellular DNA repair response to favour desired base-editing outcomes, resulting in permanent correction of ~15 - 75% of total cellular DNA with minimal (typically ≤ 1%) indel formation.
- the editing window was from -15 to -20 positions from the PAM, in E. coli.
- the cytidine deaminase was a deaminase from the apolipoprotein B mRNA-editing complex (APOBEC) family of deaminases, specifically the rat APOBEC1 (rAPOBEC1) fused to the N-terminus of nSpyCas9.
- APOBEC apolipoprotein B mRNA-editing complex
- rAPOBEC1 rat APOBEC1
- UGI followed by the LVA tag were fused to the C-terminus of nSpyCas9, forming the so-called base editor 3 (BE3) system.
- MACBETH is a multiplex automated Corynebacterium glutamicum base editing method using dead SpyCas9 (dSpyCas9) or nSpyCas9D10A and activation-induced cytidine deaminase (AID) without foreign DNA templates, achieving single-, double-, and triple-locus editing with efficiencies up to 100%, 87.2% and 23.3%, respectively.
- the blaKPC gene was confirmed in K. pneumoniae to be the major factor that contributed to the carbapenem resistance of a hypermucoviscous carbapenem-resistant K. pneumoniae strain.
- Truncated or extended guide RNAs were employed to expand the canonical 5-bp editing window to 7 bp. Bacterial adenine base editing was also achieved with a Cas9 fused to adenosine deaminase.
- base editors were engineered containing mutated cytidine deaminase domains that narrow the width of the editing window from ~5 nucleotides to as little as 1 - 2 nucleotides. This enabled discrimination of neighbouring C nucleotides, which would otherwise be edited with similar efficiency, and doubled the number of disease-associated target Cs able to be corrected preferentially over nearby non-target Cs.
- the base editor was used to make single-, double- and triple-point C to T mutations at target sites in S.
- the base editor was also highly efficient in the industrial strain, Streptomyces rapamycinicus, which produces the immunosuppressive agent rapamycin.
- the PmCDA1-assisted base editor dCas9-CDA-ULstr could edit cytosines preceded by guanosines with high efficiency, which is advantageous for editing Streptomyces genomes (with high GC content).
- Luo Y., et al., (2020) Microb. Cell Fact. 19: 93 describe cytosine base editors (CBEs) made by ligating CDA1 and UGI to the carboxy terminus of dCas9 or nCas9D10A; and adenine base editors (ABEs) made by linking a codon-optimized TadA-TadA*(opt) to the amino terminus of dCas9 or nCas9D10A.
- CBEs and ABEs are robust base editing systems for Rhodobacter sphaeroides 2.4.1 that allowed the efficient modification of multiplex genes in a stringent and chemically inducible manner.
- CBEs cytidine base editors
- ABEs adenine base editors
- Streptomyces which enable targeted C-to-T or A-to-G nucleotide substitutions, respectively, bypassing DSB and the need for a repair template.
- Successful genome editing is reported for Streptomyces at frequencies of around 50% using defective Cas9-guided base editors and up to 100% by using nicked Cas9-guided base editors.
- Multiplexing is also described for a nicked Cas9-guided base editor BE3 and programmed mutation of nine target genes simultaneously.
- the high-fidelity version of BE3 (HF-BE3) also improved editing specificity.
- the state of the art also includes base editing carried out in eukaryotes, such as in yeast, plants, mice, and human cells.
- base editing can directly, efficiently and precisely invert devastating point mutations whilst limiting the formation of DSB-mediated by-products.
- BE4 and SaBE4 represent “fourth-generation” base editors (BE4 and SaBE4) which are made by fusing BE3, BE4, SaBE3, or SaBE4 to Gam, a bacteriophage Mu protein that binds DSBs and greatly reduces indel formation during base editing, in most cases to below 1.5%, and further improves product purity.
- BE4 and SaBE4 editors were used to edit a haploid human cell line.
- BE represents a cytidine deaminase and a catalytically defective (i.e. nuclease deficient) Cas9.
- aaf8729 describes the activation-induced cytidine deaminase (AID) ortholog PmCDA1 engineered to form a synthetic complex (Target-AID) with nickase SpyCas9 (D10A). Specific point mutation was induced primarily at cytidines within the target range of five bases. Although the editor was highly effective in yeasts, it also induced insertions and deletions (indels) in mammalian cells. UGI, however, suppressed indel formation and improved the efficiency.
- BE3 third-generation base editor
- HF-BE3 high-fidelity base editor
- RNP ribonucleoprotein
- BE3 RNPs are delivered into both zebrafish embryos and the inner ear of live mice to achieve specific, DNA-free base editing in vivo.
- BE-PLUS a new base editing tool with broadened editing window and enhanced fidelity
- Cell Research 28(8): 855 - 861 describes a modified BE (BE-PLUS) made by fusing 10 copies of GCN4 peptide to nCas9(D10A) for recruiting scFv-APOBEC-UGI-GB1 to the target sites.
- the modified system tested in HEK293FT cells achieved base editing with a broadened window, resulting in an increased genome targeting scope with fewer unwanted indels and non-C-to-T conversions.
- SpCas9-NG a rationally engineered SpCas9 variant
- the SpyCas9-NG induced indels at endogenous target sites bearing NG PAMs in human cells. Fusion of SpCas9-NG and the activation-induced cytidine deaminase (AID) mediated C-to-T conversion at target sites with NG PAMs in human cells. Tian, S.
- Tan J., et al., (2019) Nature Comm. 10: 439 reports a fusion of PmCDA1 to the C-terminus of nSpyCas9 through a 16 amino acid linker (XTEN), exhibiting editing from -16 to -19 positions from the PAM, in yeast.
- XTEN 16 amino acid linker
- PmCDA1 was fused to the N-terminus of nSpyCas9 through a 16 amino acid linker (XTEN), exhibiting editing in a wider window (-13 to -21 positions from the PAM) and similar efficiencies to the C-terminus fusion, in yeast.
- both the activity window and editing efficiency remained unaltered, suggesting that the termini of CDA1 are inherently flexible and may act as linker-like sequences.
- WO2020/081568 UNIVERSITY OF MASSACHUSETTS (‘Programmable DNA Base Editing By Nme2cas9-deaminase Fusion Proteins’) describes novel tools for base-editing in e.g. HEK293T, K562 or C57BL/6NJ mouse cells using a mesophilic, type II-C Cas9 variant from Neisseria meningitidis (Nme2Cas9) that uses a 22-24 nt spacer to edit sites adjacent to an N4CC PAM.
- Nme2Cas9 Neisseria meningitidis
- C-to-T conversion is mediated by the nNme2Cas9-CBE4 (also called (C)BE4-nNme2Cas9(D16A)-UGI-UGI) editor or its optimised version YE1-BE3-nNme2Cas9(D16A)-UGI.
- These editors comprise either the wild-type or a mutant (called YE1) rat APOBEC1 cytidine deaminase enzyme, respectively.
- the potential base editing window of YE1-BE3-nNme2Cas9(D16A) is from nucleotides 2-8 in the displaced DNA strand, counting the nucleotide at the 5’ (PAM-distal) end of the 20 nt protospacer as nucleotide #1, which corresponds to positions -13 to -19 relative to the PAM.
- A-to-G conversion is mediated by the ABE7.10 nNme2Cas9(D16A) editor or its optimised version nNme2Cas9(D16A)-ABEmax.
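The correspondence between the two numbering schemes quoted above can be sketched as follows. This is a minimal illustration, assuming that position -1 is the protospacer nucleotide immediately adjacent to the PAM; the function name is hypothetical and not taken from the cited application.

```python
def protospacer_to_pam_relative(position: int, protospacer_length: int = 20) -> int:
    """Convert a 1-based protospacer position (1 = PAM-distal 5' end)
    into a negative PAM-relative position.

    Assumption: -1 denotes the nucleotide directly next to the PAM.
    """
    if not 1 <= position <= protospacer_length:
        raise ValueError("position lies outside the protospacer")
    return -(protospacer_length - position + 1)


# Nucleotides 2 and 8 of a 20 nt protospacer, as in the text above.
print([protospacer_to_pam_relative(p) for p in (2, 8)])  # [-19, -13]
```

Under this assumption, nucleotides 2-8 map onto positions -19 to -13, which matches the range stated for YE1-BE3-nNme2Cas9(D16A).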
- the most popular base editing systems are the “Target-AID” (activation-induced cytidine deaminase) and the “BE” (Base Editor). They comprise a fusion between the catalytically deactivated or nickase variant of the Streptococcus pyogenes Cas9 (dSpyCas9, nSpyCas9) with a cytidine deaminase enzyme (converts a C·G into a T·A base pair), such as the Petromyzon marinus cytosine deaminase PmCDA1 and its human orthologue (“Target-AID”) or the rat APOBEC1 (“BE”) (see Komor, A.
- uracil DNA glycosylase inhibitor from bacteriophage PBS is usually included for higher editing efficiencies (see Komor, A. C. et al., (2016) supra).
- UGI uracil DNA glycosylase inhibitor
- each of these base editors imposes an NGG PAM downstream of the target region, a C·G base pair within a narrow activity window at the PAM-distal end of the protospacer as well as, in the case of prokaryotes, the generation of a stop codon at this specific position.
- the rAPOBEC1 exhibits sequence context preferences, presenting low efficiencies at GC motifs (see Komor, A. et al., (2016) supra and Komor, A. et al., (2017) supra). Low product purity has also been reported, caused by unexpected C-to-non-T editing (Komor A. et al., (2016) supra; Nishida, K. et al., (2016) Science, 353(6305), aaf8729; Hess, G. T. et al., (2016) Nature Methods 13(12): 1036; Kim, K. et al., (2017) Nature Biotechnology 35(5): 435; Ma, Y.
- adenosine deaminase-based editors Zhang, Y., et al., (2020) “Programmable adenine deamination in bacteria using a Cas9-adenine-deaminase fusion” Chem. Sci. Vol 11, pages 1657-1664 describes a gene editor for use with gRNA for targeting and directly converting adenine to guanine in bacterial genomes.
- the gene editor is a fusion of an adenine deaminase and nSpyCas9 (D10A).
- the method achieves the conversion of adenine to guanine via an enzymatic deamination reaction and a subsequent DNA replication process rather than HR, which is utilized in conventional bacterial genetic manipulation methods.
- a systematic screening successfully targeted the possibly editable adenine sites of cntBC, the importer of the staphylopine/metal complex in Staphylococcus aureus.
- anti-CRISPR protein AcrIIC1Nme traps these Cas9 endonucleases in vivo in a DNA-bound, catalytically inactive state, robustly inhibiting targeting and resulting in a transcriptional silencing that is comparable to their catalytically “dead” variants (Thermo-dCas9 and Geo-dCas9).
- the present invention provides a base editor (“I”) comprising:
- the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 77% identity therewith; or
- the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9) having an amino acid sequence of SEQ ID NO: 3 or a sequence of at least 77% identity therewith.
- the base editors (I) of this invention have a base editing window of up to 19 nucleotides in the protospacer region, distal of the PAM. This is a much wider editing window compared to other known base editors.
- the base editors of the invention provide an increased possibility and flexibility in obtaining the desired edits in a given target sequence. For example, in generation of stop codons in a target gene sequence. Aside from introducing stop codons, base editors of the invention introduce multiple nucleotide substitutions, which can be beneficial for random mutagenesis studies but also for generating targeted modifications obtainable by the base conversion.
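As a rough illustration of stop-codon generation by C-to-T editing, the sketch below lists codons of a coding sequence that become stop codons when their cytosines are converted. It relies only on the standard genetic code: CAA, CAG and CGA become TAA, TAG and TGA by C-to-T conversion on the coding strand, and TGG becomes a stop codon when C-to-T conversion on the opposite strand reads as G-to-A on the coding strand. The function name is hypothetical, and the sketch ignores PAM availability and the editing window.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_targets(cds: str) -> list[tuple[int, str, str]]:
    """Return (codon index, codon, edited strand) for every in-frame codon
    that C-to-T base editing could turn into a stop codon.

    Illustrative only: PAM position and editing window are not considered.
    """
    cds = cds.upper()
    hits = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon
        if codon.replace("C", "T") in STOP_CODONS:
            # C-to-T on the coding strand (CAA, CAG, CGA)
            hits.append((start // 3, codon, "coding"))
        elif codon.replace("G", "A") in STOP_CODONS:
            # C-to-T on the template strand, read as G-to-A (TGG)
            hits.append((start // 3, codon, "template"))
    return hits


print(stop_codon_targets("ATGCAATGGGCT"))
# [(1, 'CAA', 'coding'), (2, 'TGG', 'template')]
```

A wider editing window makes it more likely that at least one such codon falls within reach of a given protospacer.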
- the inventors believe that the mechanism of appropriate base substitution (giving rise on transcription and translation to amino acid substitutions) may be as shown in Figure 1e of Tong T. et al., (2019) PNAS 116(41) 20366 - 20375.
- the base editors provide alternative PAM specificities, thus increasing the number of targetable sites. They also expand the range of targeted hosts, as they are the first base editors comprised of thermotolerant CRISPR variants, implying their possible application in not only mesophilic but also thermophilic prokaryotes.
- the base editors theoretically extend multiplexing possibilities for pairwise combinations of Cas9 orthologs with orthogonal guides. Thanks to their much smaller size (1087 GeoCas9,
- the unexpected effect of the invention can be said to lie in an increased specificity and range of possible base edits, a much wider window of editing possibility than with known Cas9 and Cas12a base editors, and a much wider temperature range of operation.
- the GeoCas9 is preferably (i) a dead GeoCas9 (dGeoCas9), or (ii) a modified GeoCas9; more preferably the GeoCas9 comprises DNA single strand nickase activity.
- the invention provides a base editor as herein defined, wherein the ThermoCas9 is a dead ThermoCas9 (dThermoCas9) or a modified ThermoCas9, e.g. having DNA single strand nickase activity.
- dThermoCas9 dead ThermoCas9
- modified ThermoCas9 e.g. having DNA single strand nickase activity.
- the deaminase may be a cytidine deaminase; optionally the human Target-AID (activation-induced cytidine deaminase), its orthologue from the sea lamprey Petromyzon marinus (PmCDA1) or the rat APOBEC1 (“rAPOBEC1”, “BE”).
- the base editor may further comprise at least one uracil DNA glycosylase inhibitor (UGI).
- linker between (i) the Cas9 and the UGI; and/or (ii) the Cas9 and the cytidine deaminase; and/or (iii) the cytidine deaminase and the UGI.
- the linker is the amino-acid sequence that connects the Cas9 enzyme with the deaminase enzyme in order to make the desired chimeric protein.
- Possible linkers of use in the present invention may be selected from one or more of SH3 and (x2)3xFLAG tag, SH3 and 1xFLAG tag, XTEN.
- the deaminase may be an adenine deaminase.
- the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 86% amino acid identity therewith; or the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9) having an amino acid sequence of SEQ ID NO: 3 or a sequence of at least 86% amino acid identity therewith.
- the Cas9 may be one which has a PAM sequence recognition preference selected from NNNNCRAA, NNNNCVAA or NNNNCCCA.
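The PAM preferences above are written in IUPAC nucleotide codes (N = any base, R = A or G, V = A, C or G). A minimal sketch of how a candidate PAM could be checked against such a pattern follows; the helper names are hypothetical, and only the IUPAC codes needed here plus Y are included.

```python
import re

# Subset of the IUPAC nucleotide codes, expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam: str) -> re.Pattern:
    """Translate an IUPAC PAM pattern such as NNNNCRAA into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in pam.upper()))


def matches_pam(candidate: str, pam: str) -> bool:
    """True if the candidate sequence matches the PAM pattern over its full length."""
    return pam_to_regex(pam).fullmatch(candidate.upper()) is not None


print(matches_pam("ACGTCAAA", "NNNNCRAA"))  # True
print(matches_pam("ACGTCTAA", "NNNNCRAA"))  # False
print(matches_pam("ACGTCCAA", "NNNNCVAA"))  # True
```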
- the Cas9 is a dead GeoCas9 (dGeoCas9) and the deaminase is PmCDA1 (TargetAID) or human CDA1.
- the Cas9 is dGeoCas9 and the cytidine deaminase is PmCDA1
- a stronger preference for Cs at the PAM distal end (similar to dSpyCas9 TargetAID).
- the Cas9 is a dead ThermoCas9 (dThermoCas9) and the deaminase is PmCDA1 (TargetAID) or human CDA1.
- the Cas9 is dThermoCas9 and the cytidine deaminase is PmCDA1
- the Cas9 is a nickase ThermoCas9 (e.g. nThermoCas9D8A) and the deaminase is rAPOBEC1; preferably wherein the Cas9 has a PAM sequence recognition preference of NNNNCCAA.
- rAPOBEC1-nThermoCas9(D8A)-UGI-UGI advantageously when tested in human cells has an editing window of activity from -5 to -29 positions, i.e.
- nickase GeoCas9 and nickase ThermoCas9 base editor embodiments of the invention may be used for editing in mammalian cells; dead GeoCas9 and dead ThermoCas9 base editor embodiments of the invention may provide less efficient base editing in mammalian cells,
- base editors of the invention appear to have a lower level of off-target events compared to counterpart base editors known in the art.
- the invention provides a base editor (“II”) comprising:
- the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having an amino acid sequence of SEQ ID NO: 9 or a sequence of at least 77% identity therewith; or
- the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9 ) having an amino acid sequence of SEQ ID NO: 11 or a sequence of at least 77% identity therewith.
- the deaminase may be a cytidine deaminase; optionally the human Target-AID (activation-induced cytidine deaminase), its orthologue from the sea lamprey Petromyzon marinus (PmCDA1) or the rat APOBEC1 (“BE”).
- Target-AID activation-induced cytidine deaminase
- BE rat APOBEC1
- the base editor may further comprise at least one uracil DNA glycosylase inhibitor (UGI).
- UGI uracil DNA glycosylase inhibitor
- a linker between (i) the Cas9 and the UGI; and/or (ii) Cas9 and the cytidine deaminase; and/or (iii) the cytidine deaminase and the UGI.
- the deaminase may be an adenine deaminase.
- the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having an amino acid sequence of SEQ ID NO: 9 or a sequence of at least 86% amino acid identity therewith; or the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9 ) having an amino acid sequence of SEQ ID NO: 11 or a sequence of at least 86% amino acid identity therewith.
- GeoCas9 Geobacillus stearothermophilus Cas9
- ThermoCas9 Geobacillus thermodenitrificans Cas9
- the Cas9 may have a PAM sequence recognition preference selected from NNNNCRAA, NNNNCVAA or NNNNCCCA.
- the Cas9 is an active GeoCas9 (GeoCas9) and the deaminase is e.g. PmCDA1 (TargetAID), where advantageously there is a window of base editing activity from -3 to -28 positions; i.e. up to 26 residues, or -9 to -24; i.e. up to 16 residues (for 1-step and 2-step incubation respectively), making a total possible window of -3 to -28, i.e. up to 26 residues.
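The window sizes quoted for the active GeoCas9 editor follow from counting both end positions. A minimal sketch, with hypothetical helper names, and assuming the two windows overlap so that their combination is a single contiguous range:

```python
def window_size(start: int, end: int) -> int:
    """Number of positions in an editing window, counting both ends."""
    return abs(end - start) + 1


def combined_window(*windows: tuple[int, int]) -> tuple[int, int]:
    """Smallest single window covering all given windows.

    Assumption: the windows overlap, so the bounding range has no gaps.
    Returned as (PAM-proximal position, PAM-distal position).
    """
    positions = [p for window in windows for p in window]
    return max(positions), min(positions)


one_step = (-3, -28)
two_step = (-9, -24)
print(window_size(*one_step))               # 26
print(window_size(*two_step))               # 16
print(combined_window(one_step, two_step))  # (-3, -28)
```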
- the Cas9 is an active ThermoCas9 (ThermoCas9) and the deaminase is e.g. PmCDA1 (TargetAID), where advantageously there is a window of base editing activity from -9 to -23 positions; i.e. up to 15 residues, or -5 to -27; i.e. up to 20 residues (for 1-step and 2-step incubation respectively), making a total possible window of -5 to -27, i.e. up to 23 residues
- the base editor may be a fusion protein generated by expression of a polynucleotide encoding the protein components in a suitable cell.
- An expressed fusion protein base editor may be isolated including at least partially purified. Such isolated base editor may be used in certain methods of the invention as described hereinafter.
- the invention also provides a polynucleotide (I) encoding a base editor (I) as hereinbefore defined.
- the invention includes an expression vector (I) comprising the polynucleotide (I).
- the expression vector (I) may further comprise a polynucleotide encoding a guide RNA (gRNA) which targets a DNA sequence.
- gRNA guide RNA
- the invention also provides a polynucleotide (II) encoding a base editor (II) as hereinbefore defined.
- the invention includes an expression vector (II) comprising the polynucleotide (II) and optionally an anti-CRISPR protein gene, e.g. the acrIIC1Nme gene.
- an anti-CRISPR protein gene e.g. the acrIIC1Nme gene.
- the expression vector (II) may further comprise a polynucleotide encoding a gRNA which targets a DNA sequence.
- the expression vector (II) may further comprise an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand.
- the invention further provides a system for base editing of a target DNA sequence, comprising expression vector (II) as a first expression vector; a second expression vector comprising a polynucleotide encoding (a) an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand, and (b) a gRNA for a target DNA sequence.
- the invention also provides a system for base editing of a target DNA sequence, comprising expression vector (II) as a first expression vector; a second expression vector comprising a polynucleotide encoding an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand; and a third expression vector comprising a polynucleotide encoding a gRNA for a target DNA sequence.
- the invention also provides a system for base editing of a target DNA sequence, comprising expression vector (II) as a first expression vector; a second expression vector comprising a polynucleotide encoding an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand.
- a gRNA is provided directly as a gRNA molecule.
- the invention further provides a system for base editing of a target DNA sequence comprising a base editor (I), and a gRNA for a target strand DNA.
- the invention also provides a system for base editing of a target DNA sequence comprising a base editor (II), and a gRNA for a target strand DNA.
- Such a system may further comprise an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand.
- a ribonucleoprotein complex comprising a base editor (I) and a gRNA for a target DNA strand.
- a ribonucleoprotein complex comprising a base editor (II), a gRNA for a target DNA strand, and an anti-CRISPR protein.
- the invention therefore provides a method of base editing comprising transforming a cell with a first expression vector (I); a second expression vector comprising a polynucleotide encoding a gRNA for a target DNA sequence.
- the first expression vector (I) further comprises a polynucleotide encoding a gRNA which targets a DNA sequence.
- the invention includes a method of base editing comprising transforming a cell with a first expression vector (I); and introducing into the cell a gRNA for a target DNA sequence.
- the invention further provides a method of base editing comprising transforming a cell with a first expression vector (II); a second expression vector comprising a polynucleotide encoding (a) an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand, and (b) a guide RNA for a target DNA sequence.
- the invention includes a method of base editing comprising transforming a cell with a first expression vector which is an expression vector (II); a second expression vector comprising a polynucleotide encoding an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand; and a third expression vector comprising a polynucleotide encoding a guide RNA for a target DNA sequence.
- the first expression vector (II) further comprises a polynucleotide encoding a gRNA which targets a DNA sequence.
- the invention also provides a method of base editing comprising transforming a cell with a first expression vector (II); a second expression vector comprising a polynucleotide encoding an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand; and introducing into the cell a gRNA for a target DNA sequence.
- the invention also provides a method of base editing comprising transforming a cell with a first expression vector which is an expression vector (II); and introducing into the cell a gRNA for a target DNA sequence.
- a method of base editing as described herein may be carried out in cells ex vivo or in vitro.
- expression is induced in the cell(s) for a period, following which genetic material in the cells is analysed to identify base edited cells, e.g. by polymerase chain reaction (PCR) using at least one suitable primer pair, purification and Sanger sequencing.
- PCR polymerase chain reaction
- Included in the invention is a method of base editing, comprising exposure of DNA to (a) a base editor (I), and (b) a gRNA for a target strand DNA.
- Also included in the invention is a method of base editing, comprising exposure of DNA to (a) a base editor (II), (b) a gRNA for a target strand DNA, and (c) an anti-CRISPR protein.
- the invention further provides a method of base editing, comprising exposure of DNA to any ribonucleoprotein complex as hereinbefore defined.
- This may be DNA in vitro, which may be isolated DNA, or wherein the DNA is comprised in a cell.
- this may be removed or inactivated, thereby providing a counter selection step for non-edited cells.
- this may be the small anti-CRISPR protein from Neisseria meningitidis (AcrIIC1Nme).
- This particular anti-CRISPR protein has been shown to be active in vitro and in vivo for a number of Cas9 nucleases (see Garcia, B. et al., (2019) supra).
- Counter-selection, when used, has the advantage that multiple single nucleotide mutations occur, often at up to 9 positions in a single read, which increases the chance of generating a stop codon and inactivating the gene. However, this does not necessarily generate clean mutants and so a re-streaking step may be needed.
- the gRNA is a single guide RNA (sgRNA); the sgRNA may comprise a spacer having at least 5 mismatches at the 5’ end thereof in comparison with the targeted protospacer.
- Figure 1 is a diagram comparing the base editing window for various known Cas9 base editors and those of the invention.
- the base editors referenced to the prior art in the diagram are as follows:
- Figure 2 is a BLASTP alignment of amino acid sequences of ThermoCas9 with GeoCas9. This shows that ThermoCas9 presents 88% amino acid sequence similarity to GeoCas9.
- Figure 3 is a ClustalW analysis. Sequences (1:2) Aligned. Score: 87.9852. This shows that ThermoCas9 presents 88% amino acid sequence similarity to GeoCas9.
- Figure 4 is a table showing pBLAST results of Cas9 protein sequences compared to ThermoCas9.
- Figures 5A and 5B are heatmaps of C·G to T·A base-editing by dGeoTarget-AID.
- the heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of DH10B_gfp single colonies transformed with the pdGeoTarget-AID_BE-G1/2/3/4 vectors (where BE-G1/2/3/4 represent the corresponding employed spacers).
- White boxes represent no base-editing, light to darker grey boxes represent increasing base-editing efficiencies, and black boxes represent 100% base-editing efficiency.
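The per-position percentages shown in such heatmaps can be thought of as in the sketch below, which counts, for each C in a protospacer, how many aligned sequences carry a T at that position. This is an illustrative reconstruction and not the analysis used for the figures: the function name and inputs are hypothetical, the sequences are assumed to be pre-aligned and of equal length, positions upstream of the protospacer are ignored, and -1 is assumed to be the nucleotide adjacent to the PAM.

```python
def c_to_t_conversion(protospacer: str, reads: list[str]) -> dict[int, float]:
    """Percentage of reads with T at each C of the protospacer,
    keyed by PAM-relative position (-1 = nucleotide next to the PAM)."""
    protospacer = protospacer.upper()
    length = len(protospacer)
    if not reads or any(len(read) != length for read in reads):
        raise ValueError("reads must be non-empty and aligned to the protospacer")
    result = {}
    for index, base in enumerate(protospacer):
        if base != "C":
            continue
        converted = sum(1 for read in reads if read[index].upper() == "T")
        result[-(length - index)] = 100.0 * converted / len(reads)
    return result


# Toy example with a 4 nt "protospacer" and four aligned reads.
print(c_to_t_conversion("ACGC", ["ATGC", "ATGT", "ACGC", "ATGC"]))
# {-3: 75.0, -1: 25.0}
```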
- Figures 6A and 6B are heatmaps of C·G to T·A base-editing by dThermoTarget-AID.
- the heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of E. coli DH10B_gfp single colonies transformed with the pdThermoTarget-AID_BE-T1/2/3/4/5/6 vectors (where BE-T1/2/3/4/5/6 represent the corresponding employed spacers).
- White boxes represent no base-editing, light to darker grey boxes represent increasing base editing efficiencies, and black boxes represent 100% base-editing efficiency.
- Figures 7A and 7B are heatmaps of C·G to T·A base-editing by AcrGeoTarget-AID.
- the heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of E. coli DH10B_gfp single colonies transformed with the pAcrGeoTarget-AID_BE-G1/2/3/4/5/6 vectors (where BE-G1/2/3/4/5/6 represent the corresponding employed spacers).
- White boxes represent no base-editing, light to darker grey boxes represent increasing base editing efficiencies, and black boxes represent 100% base-editing efficiency.
- Figures 8A and 8B are heatmaps of C·G to T·A base-editing by AcrThermoTarget-AID.
- the heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of E. coli DH10B_gfp single colonies transformed with the pAcrThermoTarget-AID_BE-G1/2/3/4/5 vectors.
- White boxes represent no base-editing, light to darker grey boxes represent increasing base-editing efficiencies, and black boxes represent 100% base editing efficiency.
- Figure 9 shows heatmaps of C·G to T·A base editing by nThermoBE4 in HEK293T cells.
- the heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of HEK293T cell populations transfected with the pnThermoBE4_BE-TE1/TE2/TE3/TV1/TV2/TV3/TD1/TD2/TD3 vectors.
- White boxes represent no base editing
- light to darker grey boxes represent increasing base-editing efficiencies
- black boxes represent 100% base-editing efficiency.
- the inventors have generated novel tools for base-editing, e.g. in bacteria (Escherichia coli), using thermostable, type II-C Cas9 variants from Geobacillus thermodenitrificans T12 (ThermoCas9) and Geobacillus stearothermophilus (GeoCas9) that use a 23 nt spacer to edit sites adjacent to an N4CVAA/N4CCCA or N4CRAA PAM, respectively.
- the inventors have investigated four novel editors that comprise a cytidine deaminase enzyme from the sea lamprey Petromyzon marinus (PmCDA1): (a) dThermoTarget-AID, (b) dGeoTarget-AID, (c) AcrThermoTarget-AID, and (d) AcrGeoTarget-AID.
- PmCDA1 sea lamprey Petromyzon marinus
- ThermoCas9/GeoCas9 in combination with the anti-CRISPR protein allows for counter-selection after base-editing to eliminate colonies that are unedited, weakly edited, or edited only at the PAM-distal end (where mismatches are tolerated). This also enriches for edits at the PAM-proximal end.
- nThermoCas9-rAPOBEC1 a novel tool for C-to-T conversion in mammalian cells
- D8A nickase ThermoCas9 variant
- rat APOBEC1 cytidine deaminase. This system also presents a much larger base editing window, from position -5 to -29 relative to the PAM (25 bp).
- the inventors combined cytidine deaminase PmCDA1 with wide-temperature range Cas9 orthologues from Geobacillus stearothermophilus (GeoCas9)
- thermostable Cas9 with increased lifetime in human plasma” Nature Communications 8(1): 142
- Geobacillus thermodenitrificans ThermoCas9
- the base editors created have PAM preferences of NNNNCRAA and NNNNCVAA/NNNNCCCA respectively and are different in many respects compared to the known base editors which comprise SpyCas9. For example, a different range of genomic targets is made available with base editors of the present invention.
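The PAM preferences quoted above can be checked against a candidate sequence by expanding the IUPAC ambiguity codes (N = any base, R = A or G, V = A, C or G). The following is a minimal sketch only; the function names and the dictionary layout are illustrative assumptions and are not taken from the application.

```python
import re

# IUPAC nucleotide codes needed for the PAMs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "V": "[ACG]", "N": "[ACGT]"}

# PAM preferences as stated in the text (written 5'->3').
PAMS = {
    "GeoCas9": ["NNNNCRAA"],
    "ThermoCas9": ["NNNNCVAA", "NNNNCCCA"],
}


def pam_regex(pam):
    """Translate an IUPAC PAM string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in pam))


def matches_pam(cas9, candidate):
    """Return True if the 8-nt candidate fits any PAM listed for `cas9`."""
    candidate = candidate.upper()
    return any(pam_regex(p).fullmatch(candidate) for p in PAMS[cas9])


if __name__ == "__main__":
    for seq in ("ACGTCAAA", "ACGTCCAA", "ACGTCCCA", "ACGTCTAA"):
        print(seq, matches_pam("GeoCas9", seq), matches_pam("ThermoCas9", seq))
```

Such a check only filters candidate sites by sequence; it says nothing about editing efficiency at a matching site.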
- the inventors use catalytically inactive variants of GeoCas9 or ThermoCas9 combined with PmCDA1 deaminase (selected due to its low sequence context preference) and UGI (for decreasing undesired background of PmCDA1-mediated base editing).
- the inventors use active GeoCas9 or ThermoCas9 variants, together with a small anti-CRISPR protein, by way of example from Neisseria meningitidis (AcrIIC1Nme), which is known to trap the Cas9 proteins in vitro in a DNA-bound but catalytically inactive state (see Pawluk et al., (2016) “Naturally occurring off-switches for CRISPR-Cas9” Cell 167(7): 1829-1838; and Harrington et al., (2017b) “A broad-spectrum inhibitor of CRISPR-Cas9” Cell 170(6): 1224-1233).
- AcrIIC1Nme Neisseria meningitidis
- base editing is carried out in cells by the PmCDA1 of the base editor, followed by a counter-selection step against non-base-edited cells by removal of the AcrIIC1Nme activity.
- the inventors have followed a 1-day protocol and targeted a gene that is not correlated to cell survival (GFP).
- the inventors have found that the base-editing window of the GeoCas9- and ThermoCas9-based base-editors is up to 19 nucleotides, much wider compared to the base-editing window of currently known base-editors used in bacteria.
- the use of the base editors of the invention as described herein therefore broadens the range of possible edits and hence provides more flexibility in obtaining the desired edits in a given target sequence, due to wider editing window (as well as different PAM requirements). For example, it increases the possibility that stop codons will be generated in a target gene.
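To illustrate why a wider window raises the chance of creating a stop codon, the sketch below lists which stop codons are reachable from a given codon by cytidine deamination. It relies only on the standard genetic code: C-to-T on the coding strand, or C-to-T on the complementary strand (read as G-to-A on the coding strand). The function name is a hypothetical helper, not part of the application.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_creating_edits(codon):
    """Return the stop codons reachable from `codon` by cytidine deamination.

    C->T models deamination on the coding strand; G->A models deamination
    of a C on the complementary strand, read on the coding strand.
    Any non-empty subset of the eligible bases may be converted.
    """
    codon = codon.upper()
    reachable = set()
    for old, new in (("C", "T"), ("G", "A")):
        positions = [i for i, base in enumerate(codon) if base == old]
        # Enumerate every non-empty subset of convertible positions.
        for mask in range(1, 2 ** len(positions)):
            edited = list(codon)
            for bit, pos in enumerate(positions):
                if (mask >> bit) & 1:
                    edited[pos] = new
            edited = "".join(edited)
            if edited in STOP_CODONS:
                reachable.add(edited)
    return sorted(reachable)


if __name__ == "__main__":
    for codon in ("CAA", "CAG", "CGA", "TGG", "ATG"):
        print(codon, stop_creating_edits(codon))
```

Glutamine (CAA, CAG), arginine (CGA) and tryptophan (TGG) codons are the candidates; the more of them fall inside the editing window, the more routes to gene inactivation exist.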
- base editors of the invention introduce multiple nucleotide substitutions, which can be beneficial for random mutagenesis studies, but also for generating targeted mutants obtainable by the base conversion (to be selected from the subset of possible modifications).
- the inventors foresee application across the range of bacterial species, whether mesophilic or thermophilic, but also in eukaryotic species.
- More information about the editing window for base editors of the invention is shown in Figure 1, where it is compared with various known base editors based on SpyCas9.
- the base-editing window is 5 to 6 nucleotides for the TargetAID base editors (from -15/-16 to -20 PAM-distal position) and 4 to 6 nucleotides for the rAPOBEC1 base editor (from -13 to -16/-18 PAM-distal position or from -11 to -17).
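The window widths quoted in the text follow from counting positions inclusively between the two PAM-relative coordinates. A minimal sketch of that arithmetic, using only coordinates stated in the text (the function name and dictionary labels are illustrative):

```python
def window_width(start, end):
    """Number of protospacer positions in an inclusive window.

    Positions are negative offsets counted upstream from the PAM,
    so -5 is closer to the PAM than -29.
    """
    return abs(start - end) + 1


# Coordinates as quoted in the text.
windows = {
    "TargetAID, -15 to -20": (-15, -20),
    "TargetAID, -16 to -20": (-16, -20),
    "rAPOBEC1, -13 to -16": (-13, -16),
    "rAPOBEC1, -13 to -18": (-13, -18),
    "nThermoCas9-rAPOBEC1, -5 to -29": (-5, -29),
}

if __name__ == "__main__":
    for label, (start, end) in windows.items():
        print(f"{label}: {window_width(start, end)} nt")
```

The last entry reproduces the 25 bp window stated for the nThermoCas9-rAPOBEC1 system.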
- the inventors have also applied a fusion protein base editor of the nickase ThermoCas9 variant (nThermoCas9), rAPOBEC1 enzyme and two copies of UGI in HEK293T cells for the purpose of base-editing.
- the active Geobacillus stearothermophilus (GeoCas9) component of base editors of the invention may be provided as a full length GeoCas9 protein or an active fragment thereof.
- a method of expressing and then purifying the GeoCas9 from E. coli is described in Harrington et al., (2017a) supra, and so an isolated, optionally purified, GeoCas9 may form an intermediate component in a method for making base editors of the invention involving in vitro coupling of base editor components together using appropriate linking chemistry for proteins and peptides, which is well known to a person of skill in the art.
- the amino acid sequence of the GeoCas9 reference protein may be as set forth in SEQ ID NO: 1, although equally other amino acid sequences of Cas9 from other strains of Geobacillus stearothermophilus available in online databases may be used as a reference sequence, e.g. Geobacillus LC300 Cas9 (see Harrington et al., (2017a) supra).
- the active Geobacillus thermodenitrificans (ThermoCas9) component of base editors of the invention may be provided as a full length ThermoCas9 protein or an active fragment thereof.
- a method of expressing and then purifying the ThermoCas9 from E. coli is described in Mougiakos et al., (2017) supra, and so an isolated, optionally purified, ThermoCas9 may form an intermediate component in a method for making base editors of the invention involving in vitro coupling of base editor components together.
- the amino acid sequence of the ThermoCas9 reference protein may be as set forth in SEQ ID NO: 3, although equally other amino acid sequences of Cas9 from other strains of Geobacillus thermodenitrificans available in online databases may be used as a reference sequence, e.g. Geobacillus 47C-IIb Cas9, Geobacillus 46C-IIa, Geobacillus LC300, Geobacillus jurassicus and others.
- thermophilic Cas9 Inactive or nickase thermophilic Cas9 components
- a nickase variant may be created via a mutation in either one of the HNH or the RuvC catalytic domains of the Cas9 nuclease. This has been shown for S. pyogenes Cas9 (SpyCas9) with the SpyCas9 mutants D10A and H840A, which have an inactive RuvC or HNH nuclease domain, respectively.
- SpyCas S. pyogenes Cas9
- D10A and H840A which have an inactive RuvC or HNH nuclease domain, respectively.
- the corresponding mutation positions in dGeoCas9 are D8A and H582A.
- the corresponding mutation positions in dThermoCas9 are D8A and H582A.
- the amino acid sequence of dGeoCas9 is SEQ ID NO: 5 (DNA sequence is SEQ ID NO: 6).
- the amino acid sequence of dThermoCas9 is SEQ ID NO: 7 (DNA sequence is SEQ ID NO: 8).
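Mutation labels such as D8A or H582A encode the wild-type residue, the 1-based position and the replacement residue. The sketch below applies such a label to a protein string and checks that the wild-type residue is where the label says it is. The toy sequence is a hypothetical placeholder and is not any of the SEQ ID NOs of the application.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a point substitution written as <wild-type><position><new>.

    Example label: "D8A" replaces aspartate at position 8 with alanine.
    Positions are 1-based, as in the text.
    """
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognised mutation label: {mutation!r}")
    wild_type, position, replacement = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    if position < 1 or position > len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[position - 1]}"
        )
    return protein[:position - 1] + replacement + protein[position:]


if __name__ == "__main__":
    toy = "MAAAAAADAA"  # hypothetical 10-residue placeholder with D at position 8
    print(apply_substitution(toy, "D8A"))
```

The residue check guards against applying a label to a sequence with different numbering, which matters when moving between SpyCas9 (D10A, H840A) and the GeoCas9 or ThermoCas9 coordinates (D8A, H582A).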
- thermophilic Cas9 components variants of the thermophilic Cas9 components
- variant as used herein in relation to any of the active, inactive or nickase Cas9 components of base editors of the invention.
- the term variant may apply in relation to the nucleotide gene sequences or the amino acid sequences of the Cas9 components.
- variants have certain differences in the nucleotide and amino acid sequences from the corresponding Cas9 reference sequences, as disclosed herein, but retain substantially same or similar structure and function to the reference Cas9.
- a variant also includes a GeoCas9 or ThermoCas9 of any of the active or nickase species thereof, which has sequence alterations that do not alter the function of the resulting protein, for example at the level of gene sequence silent nucleotide base changes due to redundancy of the genetic code. At the protein level such changes may be in non-conserved amino acid residues. Also encompassed are variant Cas9 components that are substantially identical, i.e. have only one or a number of sequence variations, for example in non-conserved amino acid residues compared to the respective reference sequences as described herein. The number of such amino acid changes may be selected from 1, 2, 3,
- amino acid changes for example. Such changes may be at least partly contiguous or non-contiguous along the length of the amino acid sequence.
- Variants of the Cas9 components of the base editors of the invention may also be defined in terms of degree of percentage identity to the respective reference sequences.
- a variant may include a Cas9 protein or polypeptide having at least 77% identity; preferably at least 86%; more preferably at least 90%; even more preferably at least 95% identity to the defined reference sequence.
- a Cas protein or polypeptide component of base editors of the invention may comprise an amino acid sequence with a percentage identity with any of the respective reference SEQ ID Nos as disclosed herein, as follows: at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5% or at least 99.8% identity therewith.
- the percentage amino acid sequence identity with the reference sequence is determinable as a function of the number of identical positions shared by the sequences in a selected comparison window, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
- the overall sequence identity may be determined using a global alignment algorithm known in the art, such as the Needleman-Wunsch algorithm in the program GAP (GCG Wisconsin Package, Accelrys).
- GAP GCG Wisconsin Package, Accelrys
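A percentage identity of this kind can be illustrated with a small Needleman-Wunsch global alignment. The sketch below is a simplification under stated assumptions: it uses a linear gap penalty and unit match/mismatch scores rather than the substitution matrix and gap model of the GAP program, and it takes the alignment length (including gap columns) as the denominator, which is one common convention among several.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_a.append(a[i - 1]); aligned_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1]); aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(a, b):
    """Identical aligned positions divided by alignment length, times 100."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
    print(percent_identity("ACGT", "AGT"))
```

For real Cas9 sequences of roughly a thousand residues, an established alignment package would normally be used instead of this quadratic-memory toy.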
- less than the full length of a Cas9 component may be used if it provides substantially similar structure and function as the full length Cas9 reference sequence or percentage identity variants thereof.
- Full length GeoCas9 has 1087 amino acids. Therefore fragments of these Cas9 species for use in the invention may comprise an amino acid sequence that is truncated at either the N terminus and/or the C-terminus, by a number of amino acids selected from 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
- a certain number of amino acid changes selected from addition, deletion and substitution may be provided.
- the number may be selected from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20.
- the base editors in accordance with the invention may have PAM preferences which correspond to those known or determinable for GeoCas9 or ThermoCas9.
- cytidine deaminases which can be used in the invention, including without limitation: activation-induced deaminase (AID) (see e.g., Longerich S. et al., (2006) Curr Opin. Immunol. 18(2): 164 - 74; Di Noia J. M. et al., (2007) Annu. Rev. Biochem. 76: 1 - 22); apolipoprotein B mRNA editing protein catalytic subunit 1 (APOBEC1) (see e.g., Harris et al., (2002) Mol. Cell 10(5): 1247 - 53; Blanc V. & Davidson N. O. (2003) The Journal of Biological Chemistry 278, 1395-1398; Petit V.
- AID activation-induced deaminase
- APOBEC1 apolipoprotein B mRNA editing protein catalytic subunit 1
- APOBEC3F see e.g. Hultquist J. F. et al., (2011) J Virol. 85(21): 11220 - 11234; Refsland E. W. et al., (2012) PLoS Pathog. 8(7): e1002800
- APOBEC3G see e.g., Malim M. & Emerman M. (2008) Cell Host Microbe 3(6): 388 - 398; Albin J. & Harris R. (2010) Expert Rev. Mol. Med. Jan 22; 12: e4; Wissing S. et al., (2010) Mol. Aspects Med.
- AID/APOBEC homologues have also been identified (e.g. Mm-AID, Gg-AID, Dr-AID, Tr-AID, and Ip-AID from mouse, chicken, zebrafish, pufferfish fugu, and channel catfish, respectively) and jawless vertebrates (e.g. CDA1L1_1 to CDA1L1_4, Gc-AID, Lc-AID, and Tn-AID from sea lamprey, nurse shark, the “living fossil” fish coelacanth, and the bony fish tetraodon, respectively) (Dancyger A. et al., (2012) FASEB J. 26(4): 1517 - 1525; Holland S. et al., (2016) Proc. Natl. Acad. Sci. U S A 115(14): E3211-E3220; Quinlan E. M. et al., (2017) Mol. Cell Biol. 37(20): e00077-1).
- Base editors of the invention may be used with any cytidine deaminase variant or engineered version of them (see e.g. Gehrke et al., (2016) Nature Biotechnology 36: 977 - 982; Wang M. et al., (2009) Nat. Struct. Mol. Biol. 16(7): 769-776; Wang M. et al., (2010) J. Exp. Med. 207(1): 141-53; Kohli R. M. et al., (2010) J. Biol. Chem. 24: 285(52): 40956 - 40964; Zuo et al., (2020) Nature Methods 17, 600-604).
- PmCDA1 may be used because these nine human-derived variants have specific preference for the nucleotide immediately upstream of the C targeted for conversion (most of them show preference for the TC motif). PmCDA1 shows less context preference (Tan J. et al., (2019) Nature Comm. 10: 439) as well as higher editing efficiency (in certain sequence contexts) and higher product purity compared to the widely used APOBEC1 (Komor et al., (2017) supra; Tan J. et al., (2019) supra). In addition, APOBEC1 base editors produce predominantly singly modified products, while CDA1 base editors mainly produce two simultaneous modifications in targets with multiple Cs (Tan J. et al., (2019) supra).
- Base editors of the invention may comprise a cytidine deaminase domain which is preferably a deaminase selected from the apolipoprotein B mRNA-editing complex (APOBEC) family of deaminases.
- the APOBEC family deaminase may therefore be selected from the group consisting of APOBEC1 deaminase, APOBEC2 deaminase, APOBEC3A deaminase, APOBEC3B deaminase, APOBEC3C deaminase, APOBEC3D deaminase, APOBEC3F deaminase, APOBEC3G deaminase, and APOBEC3H deaminase.
- CDA1 cytidine deaminase 1
- a cytidine deaminase domain comprises an amino acid sequence that is at least 85% identical to an amino acid sequence of SEQ ID NOs: 266 - 284, 607 - 610, 5724 - 5736, or 5738 - 5741 as set forth in WO2017/070632 A2.
- Truncated versions of cytidine deaminase inhibitors may be employed.
- a C-terminus-truncated PmCDA1 was fused to the N-terminus of nSpyCas9 (without the presence of a linker).
- Truncation of the nuclear export signal (NES) from the C-terminus of PmCDA1 showed small effects on editing efficiency and specificity, while larger deletions rendered editing more precise and substantially narrowed the activity window of the base editors.
- base editors of the present invention may use engineered cytidine deaminases in case a narrower window of editing is desired.
- WO 2019/241649, WO 2019/023680 and WO 2018/218166 each describe various cytidine deaminases which may be of use in the presently described invention.
- Adenine deaminases are known, having been engineered to convert adenosine to inosine, which is treated like guanosine by the cell, creating an A to G (or T to C) change.
- Adenine DNA deaminases do not exist in nature but have been created by directed evolution of the Escherichia coli TadA, a tRNA adenine deaminase.
- the evolved TadA domain may be fused to the relevant GeoCas9 or ThermoCas9 protein to create the adenine base editor.
- WO 2018/027078 describes an adenosine deaminase of use in the present invention, capable of deaminating adenine of deoxyadenosine in DNA useful for editing nucleobase pairs in double-stranded DNA sequences.
- a combination of a cytidine deaminase and adenine deaminase may be used that can concurrently introduce A-to-G and C-to-T substitutions, as e.g. described in Grünewald et al., (2020) Nature Biotechnology: Jun 1, there referred to as synchronous programmable adenine and cytosine editor (SPACE).
- SPACE synchronous programmable adenine and cytosine editor
- a "uracil DNA glycosylase inhibitor” or "UGI,” as used herein, means a protein which is capable of inhibiting a uracil-DNA glycosylase base-excision repair enzyme.
- UGI protein and nucleotide sequences are well known, and include, for example, those published in Wang et al., (1989) “Uracil-DNA glycosylase inhibitor gene of bacteriophage PBS2 encodes a binding protein specific for uracil-DNA glycosylase” J. Biol. Chem. 264: 1163-1171; Lundquist et al., (1997) “Site-directed mutagenesis and characterization of uracil-DNA glycosylase inhibitor protein. Role of specific carboxylic amino acids in complex formation with Escherichia coli uracil-DNA glycosylase” J. Biol. Chem.
- the UGI domain comprises a wild-type UGI or a UGI having an amino acid sequence identical to those set forth in SEQ ID NOs: 322-324 of WO 2017/070632 A2, or SEQ ID NO: 48 of WO 2019/005886 A1; or SEQ ID NO: 500 of WO 2019/168953.
- the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment.
- a UGI domain comprises a fragment of the amino acid sequence set forth in any of the aforementioned SEQ ID NOs.
- a UGI fragment comprises an amino acid sequence that comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid sequence as set forth in the aforementioned SEQ ID NOs.
- the UGI proteins comprising UGI or fragments of UGI or homologs of UGI or UGI fragments are referred to as "UGI variants.”
- a UGI variant shares homology to UGI, or a fragment thereof.
- a UGI variant may be at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to a wild type UGI or a UGI as set forth in any of the aforementioned SEQ ID Nos.
- a fragment of UGI may be at least 70% identical, at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% to the corresponding fragment of wild-type UGI or a UGI as set forth in any of the aforementioned SEQ ID NOs.
- UGI significantly increases the editing efficiency compared to the absence of UGI (indicatively Banno et al., (2016) Nat Microbiol. 3(4): 423 - 429). Moreover, Komor et al., (2017) supra and Li et al., (2016) supra have reported the implementation of two or four copies of the UGI gene, respectively, further increasing the editing efficiency.
- the base editors of this invention therefore may include multiple copies of UGI therein, e.g. 2, 3 or 4 copies.
- Base editors of the invention may be constructed along the lines already known in the art, and therefore may comprise various linkers, such as XTEN. Some base editors of the invention may be completely linkerless.
- the UGI is always positioned at the C-terminus of Cas9 in base editors of the invention, and since the deaminase may also be at the C-terminus of the Cas9, a linker needs to be placed between the Cas9 and the deaminase, and another linker between the deaminase and the UGI. Usually therefore there is no linker between the Cas9 and the UGI.
- CDA1 at the N-terminus of the base editor protein complex which is connected with a linker to the N-terminus of Cas9, which in turn is connected with the UGI via another linker at its C-terminus.
- PmCDA1 can be fused to the N-terminus of the Cas9, maintaining the same editing efficiencies but widening even more the activity window compared to the known SpyCas9 base editors.
- Gam may be fused to BE3, BE4, SaBE3, or SaBE4 in base editors of the invention.
- Gam is a bacteriophage Mu protein that binds DSBs and greatly reduces indel formation during base editing, in most cases to below 1.5%, and further improves product purity (Komor A. C. et al., (2017) Science Advances 3(8): eaao4774).
- a single-vector approach i.e. medium copy number plasmid carrying the AcrIIC1 under the same inducible promoter as before, the active ThermoCas9/GeoCas9 under the same inducible promoter as before and the guide RNA under the same constitutive promoter as before
- the Cas9-mediated targeting was shown to be stronger than the AcrIIC1-mediated inhibition. A single-vector approach may therefore provide higher tunability than the dual-vector approach.
- the inventors prefer a single-vector approach to enable counter-selection (i.e. high Cas9, low AcrIIC1) following a preceding base editing step (i.e. high Cas9, high AcrIIC1).
- the amino acid sequence of AcrIIC1Nme (for application in E. coli) is set forth in SEQ ID NO: 11; the DNA sequence in SEQ ID NO: 12.
- WO 2017/160689 describes various type II-C anti-CRISPRs which may be employed in the present invention.
- a crRNA of particular sequence is required, together with a tracrRNA.
- the two RNA molecules may be substituted by a crRNA-tracrRNA chimeric molecule, where the crRNA constitutes the 5’ part of the molecule, followed by a small connecting sequence (usually 5’-GAAA-3’), followed by the tracrRNA as the 3’ part of the molecule; this is commonly known as a single guide RNA or guide RNA (sgRNA or gRNA).
- sgRNA or gRNA single guide or guide RNA
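The chimeric layout described above (crRNA at the 5’ end, a short connecting sequence, tracrRNA at the 3’ end) can be sketched as a simple concatenation with input validation. The function name is an illustrative assumption, and the sequences in the usage example are short hypothetical placeholders, not the crRNA or tracrRNA of ThermoCas9 or GeoCas9.

```python
RNA_ALPHABET = set("ACGU")


def build_sgrna(crrna, tracrrna, connector="GAAA"):
    """Join crRNA (5' part), connector and tracrRNA (3' part) into one sgRNA.

    All sequences are written 5'->3' in the RNA alphabet.
    """
    parts = {"crRNA": crrna, "connector": connector, "tracrRNA": tracrrna}
    for name, seq in parts.items():
        if not seq:
            raise ValueError(f"{name} is empty")
        unexpected = set(seq.upper()) - RNA_ALPHABET
        if unexpected:
            raise ValueError(f"{name} contains non-RNA characters: {sorted(unexpected)}")
    return (crrna + connector + tracrrna).upper()


if __name__ == "__main__":
    # Hypothetical placeholder sequences, for illustration only.
    print(build_sgrna("ACGUACGU", "UUAAGGCC"))
```

Rejecting T in the input catches the common mistake of pasting the DNA template of a spacer where the RNA sequence is expected.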
- the base editors of the invention form a ribonucleoprotein complex with the crRNA/tracrRNA or gRNA, and these RNAs complementarily bind to a targeted sequence of the genomic DNA.
- the RNA targeting molecules are designed in sequence selection so as to recognize the complement of a so called “protospacer” sequence which is adjacent 3’ to the PAM sequence in the same strand.
- sgRNAs with truncated or extended spacer sequence may be used in the invention to shift the editing window, as has been observed in a study from Banno, S. et al., (2016) Nature Microbiology, 3(4): 423 - 429.
- the optimal sgRNA for ThermoCas9 (Mougiakos et al., (2017) Nature Communications 8(1): 1647) was used for both ThermoCas9 and GeoCas9.
- the sgRNA of GeoCas9 (Harrington et al., (2017a) supra) may be used for both Cas9 variants.
- an aptazyme could be fused to the sgRNA for strict control of the base editing efficiency, although it might reduce the on-target efficiency and present leaky activity (Tang W. et al., (2017) Nature Communications 8: 15939).
- the “base editing window” in accordance with the present invention is the set of positions in the protospacer sequence of the DNA strand which are open to reaction with the deaminase. Whether or not a base change occurs at a given position upstream of the PAM depends on the presence of the appropriate base: C for a cytidine deaminase, and A for an adenine deaminase. Also, even though there may be a multiplicity of appropriate bases in the editing window, this does not mean that all such bases are acted on simultaneously and/or completely. One may expect to see patterns of editing in the base editing window, and these may be revealed after the event by sequencing the relevant region of DNA.
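The definition above separates two questions: which positions lie in the window, and which of those carry an editable base. The sketch below lists the candidate positions for a given protospacer. It assumes, for illustration, that the protospacer is written 5’ to 3’ with the PAM immediately after its last base, so that the last base is position -1; the application's own numbering convention may differ. The example protospacer is hypothetical.

```python
def editable_positions(protospacer, window, target_base="C"):
    """Return PAM-relative positions of `target_base` inside `window`.

    `window` is a pair of negative positions, inclusive, in either order,
    e.g. (-5, -20). Position -1 is the base next to the PAM (assumption).
    """
    near, far = sorted(window, reverse=True)  # near = closest to the PAM
    length = len(protospacer)
    hits = []
    for index, base in enumerate(protospacer.upper()):
        position = index - length  # last base maps to -1
        if far <= position <= near and base == target_base:
            hits.append(position)
    return hits


if __name__ == "__main__":
    spacer = "CATG" * 5 + "CAT"  # hypothetical 23-nt protospacer
    print(editable_positions(spacer, (-5, -20)))           # candidate Cs
    print(editable_positions(spacer, (-5, -20), "A"))      # candidate As
```

The output lists candidates only; as the text notes, not every candidate base is necessarily converted, and the actual pattern has to be read from sequencing data.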
- the editing window may be any of the following, in nucleotide position, with respect to the PAM as can be seen in Figure 1.
- base editing within a protospacer window selected from any of the following individual window positions:
- Base editing may take place preferentially at the PAM distal end of any of the base editing window possibilities. This is usually when a dead or nickase version of the GeoCas9 or ThermoCas9 is used.
- Base editing may take place preferentially at the PAM proximal end of any of the base editing window possibilities. This is usually when an active GeoCas9 or ThermoCas9 is used in conjunction with an anti-CRISPR protein.
- the window of -8 to -5 of the PAM is less efficient than elsewhere upstream in the protospacer and as a consequence is less preferred.
- base editors of the invention may be synthesised as fusion proteins within cells in situ by expression within the cell from a suitable polynucleotide expression vector
- base editors can be produced externally of the cell by a combination of chemical synthesis and chemical coupling.
- base editors may be made by expressing protein or polypeptide components in recombinant expression systems, isolated and then covalently coupled using protein coupling chemistry well known in the art.
- Peptide ligation may be used to create base editors, involving peptide or protein synthesis and ligation. Two or more peptide ligation steps, and sequential peptide ligation may be used.
- Cell-free protein expression may be used to produce base editors or ribonucleoprotein base editors of the invention.
- Cell-free protein production is achieved by combining a crude lysate from growing cells, which contains all the necessary enzymes and machinery for protein synthesis (including transcription and translation), with an exogenous supply of essential amino acids, nucleotides, salts, and energy-generating factors, and introducing exogenous messages, including messenger RNA (mRNA) or DNA, as template into the system.
- mRNA messenger RNA
- Chemical synthesis may be used, especially solid-phase peptide synthesis (SPPS) followed by isolation and chemical ligation of segments of base editor proteins.
- SPPS solid-phase peptide synthesis
- Various methods will be well known to persons of skill in the art, including thioester-forming ligation, oxime and hydrazone-forming ligation, thiazolidine/oxazolidine-forming ligation, disulphide exchange (thioacid-capture ligation), or native chemical ligation (NCL).
- NCL native chemical ligation
- a thioester and a cysteinyl peptide are combined together to form a ligated peptide or protein.
- synthetic and recombinant building blocks can be used for NCL in a semisynthetic manner.
- segments can be acquired by recombinant protein expression and/or by synthesis and coupled together.
- RNP Ribonucleoprotein
- RNP delivery may be performed using a Neon electroporation system (ThermoFisher) following the manufacturer’s instructions.
- a modification of target nucleic acid may be carried out directly on cells without employing expression vectors encoding the base editor proteins and targeting RNA.
- the base editor proteins and targeting RNA may be introduced into the cells simultaneously, sequentially (in any order as desired), or separately.
- a ribonucleoprotein base editor complex may be introduced directly into cells.
- Polynucleotides of the present invention as described herein may be in isolated form. However, in order that expression of such a polynucleotide is carried out in a desired cell to undertake base editing, the polynucleotide encoding the base editor (and/or gRNA) is preferably provided in an expression construct. One or more expression vectors may be used in accordance with the invention to achieve the base editing required.
- Suitable expression vectors will vary according to the recipient cell and may incorporate regulatory elements which enable expression in the target cell and preferably which facilitate high-levels of expression. Such regulatory sequences may be capable of influencing transcription or translation of a gene or gene product, for example in terms of initiation, accuracy, rate, stability, downstream processing and mobility.
- Such elements may include, for example, strong and/or constitutive promoters, 5’ and 3’ UTR’s, transcriptional and/or translational enhancers, transcription factor or protein binding sequences, start sites and termination sequences, ribosome binding sites, recombination sites, polyadenylation sequences, sense or antisense sequences, sequences ensuring correct initiation of transcription and optionally poly-A signals ensuring termination of transcription and transcript stabilisation in the host cell.
- the regulatory sequences may be plant, animal, bacteria, fungal or virus-derived, and preferably may be derived from the same organism as the host cell. Clearly, appropriate regulatory elements will vary according to the host cell of interest. For example, regulatory elements which facilitate high-level expression in prokaryotic host cells such as in E. coli may include the pLac, T7, P(Bla), P(Cat), P(Kat), trp or tac promoters.
- Regulatory elements which facilitate high-level expression in eukaryotic host cells might include the AOX1 or GAL1 promoter in yeast or the CMV- or SV40-promoters, CMV-enhancer, SV40-enhancer, Herpes simplex virus VP16 transcriptional activator or inclusion of a globin intron in animal cells.
- constitutive high-level expression may be obtained using, for example, the Zea mays ubiquitin 1 promoter or 35S and 19S promoters of cauliflower mosaic virus.
- Suitable regulatory elements may be constitutive, whereby they direct expression under most environmental conditions or developmental stages, developmental stage specific or inducible.
- the promoter is inducible, to direct expression in response to environmental, chemical or developmental cues, such as temperature, light, chemicals, drought, and other stimuli.
- promoters may be chosen which allow expression of the protein of interest at particular developmental stages or in response to extra- or intra-cellular conditions, signals or externally applied stimuli.
- a range of promoters exist for use in E. coli which give high-level expression at particular stages of growth (e.g. osmY stationary phase promoter) or in response to particular stimuli (e.g. HtpG Heat Shock Promoter).
- Suitable expression vectors may comprise additional sequences encoding selectable markers which allow for the selection of said vector in a suitable host cell and/or under particular conditions.
- base editing of cells comprises transfecting, transforming or transducing the cell with any of the expression vectors as hereinbefore described.
- the methods of transfection, transformation or transduction are of the types well known to a person of skill in the art. Where one expression vector is used to generate expression of a ribonucleoprotein complex of the invention and the targeting RNA is added directly to the cell, the same or a different method of transfection, transformation or transduction may be used. Similarly, where one expression vector is used to generate expression of a ribonucleoprotein complex of the invention and another expression vector is used to generate the targeting RNA in situ via expression, the same or a different method of transfection, transformation or transduction may be used.
- an mRNA encoding the base editor protein polypeptide is introduced into a cell so that the base editor is expressed in the cell.
- the targeting RNA which guides the Cas protein complex to the desired target sequence is also introduced into the cell, whether simultaneously, separately or sequentially from the mRNA, such that the necessary ribonucleoprotein base editor complex is formed in the cell.
- the modification of the target nucleic acid may be made in vivo , that is in situ in a cell, whether an isolated cell or as part of a multicellular tissue, organ or organism.
- the method may desirably be carried out in vivo or alternatively may be carried out by isolating a cell from the whole tissue, organ or organism, treating the cell with ribonucleoprotein complex in accordance with the method and subsequently returning the cell treated with ribonucleoprotein complex to its former location, or a different location, whether within the same or a different organism.
- the ribonucleoprotein complex or the Cas protein or polypeptide requires an appropriate form of delivery into the cell.
- suitable delivery systems and methods are well known to persons skilled in the art, and include but are not limited to cytoplasmic or nuclear microinjection.
- sgRNA e.g. a U6 promoter plus sgRNA
- the thermocas9 and geocas9 genes are small enough (unlike spCas9) for base editor constructs of the invention to be inserted into AAV vectors, and the GeoCas9 and ThermoCas9 base editor fusions of the invention remain small enough for AAV vectors. Therefore, in preferred modes of delivery, an adeno-associated virus (AAV) is used; this delivery system is not disease-causing in humans and has been approved for clinical use in Europe.
- AAV Adeno-associated virus
- Methods of transformation of cells in accordance with the invention include the following: In vitro and ex vivo electroporation of (i) DNA plasmid(s) coding for the base editor protein complex and the desired sgRNA(s), (ii) mRNA(s) coding for the base editor protein complex and the desired sgRNA(s), or (iii) purified base editor protein complex molecules loaded with the desired sgRNA(s), into either the nucleus (nucleofection) or the cytoplasm of individual mammalian cells (see for example: Matano M, et al. (2015) Nat Med. 21: 256-62; Paquet D, et al.
- ssDNA or dsDNA vectors that code for the base editor protein complex and the desired sgRNA(s).
- Liposomes, for example Lipofectamine by Thermo Fisher Scientific; see also, for example, Yin H. et al. (2016) Nat. Biotechnol. 34: 328-333
- other lipoplexes/polyplexes, for example FuGENE-6 reagent by Promega, zwitterionic amino lipids (see for example, Miller J. et al. (2017) Angew Chem. Int. Ed. Engl. 56: 1059-1063), DNA/Ca2+ microcomplexes (see for example, Ebina H. et al., (2013) Sci. Rep.
- Covalent attachment (see for example, Ramakrishna S. et al. (2014) Genome Res. 24: 1020-1027; Axford D. et al. (2017) FASEB J. 31: 909.4) of cell-penetrating peptides (CPPs) to purified base editor protein complex molecules loaded with the desired sgRNA(s) for in vitro and ex vivo delivery.
- CPPs cell-penetrating peptides
- AID orthologs from bony and cartilaginous fish, as well as PmCDA1, are cold-adapted enzymes (Dancyger et al., 2012; Quinlan et al., 2017; Holland et al., 2018).
- PmCDA1 exhibits optimal activity at 14.5°C in vitro (Quinlan et al., 2017).
- all CDA1L1 enzymes from sea lamprey as well as from zebrafish AID (Dr-AID) or channel catfish (Ip-AID) are also cold-adapted, exhibiting an optimal temperature of 14-22°C and 20-25°C, respectively, while the human (Hs-AID) prefers 30-37°C (Dancyger A.
- Timing of base editing protocol: the inventors have found that methods of the invention permit a short protocol time. A minimum of about 20 hours is possible from transforming the cells to sending the PCR-amplified DNA for sequencing.
- a short protocol sometimes has the disadvantage of generating some mixed wild-type/mutant genotypes.
- this problem is readily overcome by a re-streaking on plates with base editing (dGeoTarget-AID, dThermoTarget-AID) or counter-selection conditions (AcrGeoTarget-AID, AcrThermoTarget-AID).
- a re-streaking protocol can provide higher editing efficiency and wider window of base editing activity.
- the present invention is of broad applicability and expression vectors of the present invention as described herein may be employed in any genetically tractable organism which can be transformed with the expression vectors.
- the invention involves the direct exposure of the organism to one or more individual components of the base editing systems of the invention, e.g. direct exposure to gRNA or to ribonucleoprotein, then the invention may be deployed to any suitable host cell which can be induced to take up the base editing component.
- Appropriate cells for base editing may be prokaryotic or eukaryotic.
- commonly used host cells may be selected for use in accordance with the present invention including prokaryotic or eukaryotic cells which are genetically accessible and which can be cultured, for example prokaryotic cells, fungal cells, plant cells and animal cells including human cells (but not embryonic stem cells).
- host cells will be selected from a prokaryotic cell, a fungal cell, a plant cell, a protist cell or an animal cell.
- where the host cell is a prokaryotic cell, it may be a bacterium, e.g. an Escherichia coli cell.
- the base editors and any of the base editing systems of the invention described herein may be used to modify genomes of bacterial cells.
- the bacteria may be thermophilic bacteria, for example bacteria selected from: Acidithiobacillus species including Acidithiobacillus caldus; Aeribacillus species including Aeribacillus pallidus; Alicyclobacillus species including Alicyclobacillus acidocaldarius, Alicyclobacillus acidoterrestris, Alicyclobacillus cycloheptanicus, Alicyclobacillus hesperidum; Anoxybacillus species including Anoxybacillus caldiproteolyticus, Anoxybacillus flavithermus, Anoxybacillus rupiensis, Anoxybacillus tepidamans; Bacillus species including Bacillus caldolyticus, Bacillus caldotenax, Bacillus caldovelox, Bacillus coagulans, Bacillus clausii, Bacillus lichen
- Thermovibrio species including Thermovibrio ammonificans, Thermovibrio ruber; Thermovirga species including Thermovirga lienii; and Thermus species including Thermus aquaticus, Thermus caldophilus, Thermus flavus, Thermus scotoductus, Thermus thermophilus; Thiobacillus neapolitanus.
- the bacteria may be mesophilic and selected from any of: Acidithiobacillus species including Acidithiobacillus caldus; Actinobacillus species including Actinobacillus succinogenes; Anaerobiospirillum species including Anaerobiospirillum succiniciproducens; Bacillus species including Bacillus alcaliphilus, Bacillus amyloliquefaciens, Bacillus circulans, Bacillus cereus, Bacillus clausii, Bacillus firmus, Bacillus halodurans, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus subtilis, Bacillus thuringiensis; Basfia species including Basfia succiniciproducens; Brevibacillus species including Brevibacillus brevis; Brevibacillus laterosporus; Clostridium species including Clostridium ace
- the base editors and methods of the invention may be used to modify the genome of yeast or fungi which may be mesophilic and wherein the fungus is selected from: an Aspergillus species including, but not limited to, Aspergillus nidulans, Aspergillus niger, Aspergillus terreus, Aspergillus oryzae and Aspergillus terreus; more preferably the Aspergillus species is Aspergillus nidulans or Aspergillus niger.
- the mesophilic fungal species could be a Candida species.
- the yeast or fungal species may be thermophilic, e.g.
- dThermoTarget-AID (a fusion of dead ThermoCas9 - a 121 amino acid linker - PmCDA1 - a 2 amino acid linker - UGI - a 3 amino acid degradation tag).
- dGeoTarget-AID (a fusion of dead GeoCas9 - a 121 amino acid linker - PmCDA1 - a 2 amino acid linker - UGI - a 3 amino acid degradation tag).
- AcrThermoTarget-AID (a fusion of active ThermoCas9 - a 121 amino acid linker - PmCDA1 - a 2 amino acid linker - UGI - a 3 amino acid degradation tag) plus expression of the anti-CRISPR protein AcrIIC1Nme as a separate gene in the same plasmid.
- AcrGeoTarget-AID (a fusion of active GeoCas9 - a 121 amino acid linker - PmCDA1 - a 2 amino acid linker - UGI - a 3 amino acid degradation tag) plus expression of the anti-CRISPR protein AcrIIC1Nme as a separate gene in the same plasmid.
- nThermoBE4 (a fusion of APOBEC1 - a 32 amino acid linker - nickase ThermoCas9 - a 10 amino acid linker - UGI - a 10 amino acid linker - UGI - a 4 amino acid linker - SV40 NLS).
- nThermoTarget-AID (a fusion of nickase ThermoCas9 - a 104 amino acid linker - PmCDA1 - a 10 amino acid linker - UGI).
- the target sequence (protospacers) are always on the genome.
- Example 1 “dGeoTarget-AID” base-editing system
- the 3’-end (minus the stop codon) of the deactivated GeoCas9 (D8A, H582A) endonuclease gene from Geobacillus stearothermophilus (Harrington et al., (2017a) supra) [SEQ ID NO: 6] was fused to a 363 bp long (SH3 and 3xFLAG tag) linker sequence, which was in turn fused to the Petromyzon marinus cytosine deaminase (PmCDA1) gene (Nishida et al., (2016) Science Vol 353 Issue 6305, aaf8729) - minus the stop codon, which was in turn fused to a 6 bp (SR) linker sequence, which was in turn fused to the uracil DNA glycosylase inhibitor (UGI) gene (minus the stop codon) from bacteri
- the LacI inhibits the expression of the dGeoCas9-PmCDA1-UGI-LVA fusion, by binding to the lac operator sequence upstream of the corresponding genetic sequence, while addition of IPTG blocks this binding and permits the expression of the dGeoCas9-PmCDA1-UGI-LVA fusion.
- the resulting pdGeoTarget-AID vector was employed as the basis for the construction of the experimental vectors described further on.
- the targeted base-editing efficiency of the dGeoTarget-AID system was examined.
- the E. coli DH10B_gfp strain was employed as the main experimental strain.
- the DH10B_gfp strain was constructed by integrating a gfp gene into the genome of the E. coli DH10B strain.
- Six spacers, designed to target protospacers within the sequence of the genomically integrated gfp gene, were incorporated separately into the 5’-end of the sgRNA module of the pdGeoTarget-AID vector (see Table 1).
- the selected protospacers were flanked by PAMs for which GeoCas9 was previously demonstrated to have variable levels of preference (Harrington et al., (2017a) supra).
- Figure 5A shows the percentages resulting from high-throughput in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR” and setting as threshold p ≤ 0.05. Single colonies were obtained from base-editing experiments of at least three independent biological replicates. No base-editing was observed in any C-position when pdGeoTarget-AID_BE-G5/6 vectors were applied.
- FIG. 5B Two single colonies (colony with number 2 for spacer BE-G1, and 14 for spacer BE-G2), previously screened as partially edited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pdGeoTarget-AID system. Several single colonies per streak were screened by PCR and Sanger sequencing (colonies 2a-2r for spacer BE-G1, and 14a-14s for spacer BE-G2), and the base-editing efficiencies are reported in the heatmaps.
- PmCDA1 Petromyzon marinus cytosine deaminase
- the genetic sequence of the dThermoCas9-PmCDA1-UGI-LVA fusion was cloned into a low copy number plasmid (pACYC184) under the transcriptional control of a synthetic, IPTG-inducible, tetracycline promoter (Ptet-lac; Ptet combined with lac operator).
- a sgRNA-expressing module transcribed from the strong, constitutive promoter PJ23119 and the lacI gene constitutively expressed from its native promoter (PlacI).
- the LacI inhibits the expression of the dThermoCas9-PmCDA1-UGI-LVA fusion, by binding to the lac operator sequence upstream of the corresponding genetic sequence, while addition of IPTG blocks this binding and permits the expression of the dThermoCas9-PmCDA1-UGI-LVA fusion.
- the resulting pdThermoTarget-AID vector was employed as the basis for the construction of the experimental vectors described further on.
- the targeted base-editing efficiency of the dThermoTarget-AID system was examined.
- the E. coli DH10B_gfp strain was employed as the main experimental strain; for its construction, a gfp gene was integrated into the genome of the E. coli DH10B strain.
- Six spacers, designed to target protospacers within the sequence of the genomically integrated gfp gene, were incorporated separately into the 5’-end of the sgRNA module of the pdThermoTarget-AID vector (see Table 2).
- the selected protospacers were flanked by PAMs for which ThermoCas9 was previously demonstrated to have variable levels of preference (Mougiakos, Mohanraju, and Bosma et al., (2017) supra).
- the expression of the dThermoTarget-AID fusion was induced with the addition of 50 µM IPTG during recovery and plating in order to trigger base-editing.
- Figure 6A shows the percentages resulting from high-throughput in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR” and setting as threshold p ≤ 0.05. Single colonies were obtained from base-editing experiments of at least three independent biological replicates.
- Figure 6B: two single colonies (colony number 13 for spacer BE-T1, and 15 for spacer BE-T4), previously screened as partially edited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pdThermoTarget-AID system.
- Several single colonies per streak were screened by PCR and Sanger sequencing (colonies 13a-13j for spacer BE-T1, and 15a-15j for spacer BE-T4), and the base-editing efficiencies are reported in the heatmaps.
- the dThermoTarget-AID preferentially edited Cs at the PAM-distal end of the protospacer, similar to the commonly used SpyCas9-base editors. However, a much broader window of activity (from -6 to -27 positions) was observed, extending the target spectrum from 4-6 bp to 22 bp.
- the dThermoTarget-AID base-editing tool can be used to generate C·G to T·A mutants in E. coli with efficiencies of up to 100% in single colonies, and activity window from -6 to -27 positions (22 bp) with 1-step incubation or from -5 to -27 positions (23 bp) with 2-step incubation.
- this novel base editor with unique PAM preferences compared to the currently available base-editing systems significantly expands both the targeting scope and the editing window in bacteria.
- an AcrIIC1Nme:GeoCas9-PmCDA1-UGI-LVA system was created (hereafter denoted as “AcrGeoTarget-AID”). Inducing AcrIIC1Nme expression in this system allows the GeoCas9-PmCDA1-UGI-LVA fusion to perform only base-editing, while stopping the induction of AcrIIC1Nme expression results in counter-selection of the unedited cells by the active GeoCas9 component of the GeoCas9-PmCDA1-UGI-LVA fusion.
- the dgeocas9 gene in the previously described pdGeoTarget-AID plasmids was substituted with the active geocas9 gene, whilst simultaneously cloning into the same vectors the acrIIC1Nme gene under the transcriptional control of the rhamnose-inducible promoter (Prha).
- Prha rhamnose-inducible promoter
- BE-G4 contains Cs only at either the extreme PAM-proximal end (C-2, C-3) or the extreme PAM-distal end (C-19, C-20), where base-editing events are unlikely or very likely to happen, respectively.
- Previous studies (Hsu et al., (2013) Nature Biotechnology 31: 827-832) for other Cas9 endonucleases demonstrate that 1 or 2 spacer-protospacer mismatches, especially at the PAM-distal end, are generally more tolerated than multiple consecutive mismatches.
- GeoCas9 probably tolerates spacer-protospacer mismatches at the PAM-distal end, resulting in this case from C-19 and/or C-20 base-editing, and could still cleave the possibly edited genomic target, triggering cell death.
- the mutant cells were eliminated from the population, resulting in lower base-editing efficiencies for the PAM-distal positions compared to the catalytically deactivated counterpart (dGeoTarget-AID).
- Figure 7A: base-editing (50 µM IPTG and 0.2% L-rhamnose) and counter-selection (1 mM IPTG) conditions were applied during recovery (R) and plating (P), respectively.
- no induction conditions or base-editing conditions 50 µM IPTG and 0.2% L-rhamnose
- the sgRNA module was always constitutively transcribed. The percentages resulted from high-throughput in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR” and setting as threshold p ≤ 0.05. Single colonies were obtained from base-editing experiments of at least three independent biological replicates.
- Figure 7B: two single colonies (colony number 7 for spacer BE-G1 P: counter-selection, and 1 for spacer BE-G2 P: counter-selection), previously screened as partially edited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pAcrGeoTarget-AID system.
- Several colonies were screened by PCR and Sanger sequencing (colonies 7a-7q for spacer BE-G1 P: counter-selection, and 1a-1s for spacer BE-G2 P: counter-selection), and the base-editing efficiencies are reported in the heatmaps.
- AcrGeoTarget-AID provided numerous colonies with 100% C·G to T·A conversion of at least one cytosine, even with only 1 hour of base-editing induction (counter-selection case) (Figure 7A), contrary to the results from dGeoTarget-AID for the same targets (Figure 5A).
- with dGeoTarget-AID, clean base-edited colonies only occurred with spacer BE-G2 in 2/20 screened colonies, whereas induction of AcrGeoTarget-AID led to 10/24 (P: base-editing) and 4/14 (P: counter-selection) colonies with at least one cytosine 100% converted to thymine.
- spacer BE-G1 showed 8/22 (P: counter-selection) colonies with complete deamination, while BE-G4 showed 4/33 (P: base-editing) and 5/32 (P: counter-selection) colonies.
- the editing window of AcrGeoTarget-AID is shifted towards the “seed region”, most probably due to lower spacer-protospacer mismatch tolerance at these positions (Figure 7A).
- Cs at PAM-proximal positions that had not been edited by dGeoTarget-AID were surprisingly edited up to 100% by AcrGeoTarget-AID (C-9 for BE-G1 and BE-G2; C-15 for BE-G3; C-3 and C-12 for BE-G6), while editing efficiencies of Cs at the PAM-distal end remained high (C-24 for BE-G2; C-19 and C-20 for BE-G4).
- the activity window of the AcrGeoTarget-AID was 26 bp (from -3 to -28).
- the AcrGeoTarget-AID editor mediates rapid and efficient base-editing in a surprisingly wide activity window (26 bp contrary to 4-6 bp with SpyCas9), enabling the generation of premature stop codons outside the restricted region of the current base editing tools.
- the AcrGeoTarget-AID system can be used to generate C·G to T·A mutants in E. coli with efficiencies of up to 100%, and activity window from -3 to -28 positions (26 bp) with 1-step incubation or from -9 to -24 positions (16 bp) with 2-step incubation.
- the dthermocas9 gene in the previously described pdThermoTarget-AID plasmids was substituted with the active thermocas9 gene, while simultaneously cloning in the same vectors the acrIIC1Nme gene under the transcriptional control of the rhamnose-inducible promoter (Prha).
- Prha rhamnose-inducible promoter
- Base-editing (50 µM IPTG and 0.2% L-rhamnose) and counter-selection (1 mM IPTG) conditions were applied during recovery (R) and plating (P), respectively.
- no induction conditions or base-editing conditions 50 µM IPTG and 0.2% L-rhamnose
- the sgRNA module was always constitutively transcribed.
- the percentages shown in Figure 8A resulted from high-throughput in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR” and setting as threshold p ≤ 0.05. Single colonies were obtained from base-editing experiments of at least three independent biological replicates.
- Streaking several colonies screened as partially edited or completely unedited (colonies with numbers 28-35 for BE-T1 P: base-editing; 32-34 for BE-T1 P: counter-selection; 16-18 for BE-T4 P: base-editing; 24-27 for BE-T4 P: counter-selection) were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pAcrGeoTarget-AID system (“Streaking”).
- Figure 8B shows two single colonies (colony with number 5 for spacer BE-T 1 P: counter-selection, and 7 for spacer BE-T4 P: counter-selection), previously screened as partially edited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pAcrGeoTarget-AID system.
- Several colonies were screened by PCR and Sanger sequencing (colonies 5a-5i for spacer BE-T1 P: counter-selection, and 7a-7j for spacer BE-T4 P: counter-selection), and the base-editing efficiencies are reported in the heatmaps.
- the preferred editing window of AcrThermoTarget-AID is shifted towards the “seed region”, due to lower mismatch tolerance leading to less efficient counter-selection at these positions (Figure 5A).
- dThermoTarget-AID with BE-T1 exhibited a strong preference for Cs at the PAM-distal end (C-19, C-23, and C-27), while AcrThermoTarget-AID edited scattered Cs (C-9, C-14, C-18, and C-19).
- no editing was observed at C-27 for AcrThermoTarget-AID.
- dThermoTarget-AID with BE-T4 performed substantial editing at C-24, while AcrThermoTarget-AID was completely unable to edit this position.
- the overall activity window of the AcrThermoTarget-AID was 15 bp (from -9 to -23).
- the AcrThermoTarget-AID editor mediates rapid and efficient base-editing in a wide activity window (15 bp contrary to 4-6 bp with SpyCas9), enabling the generation of premature stop codons outside the restricted region of the current base-editing tools.
- Example 5 Base-editing technology for human cells
- the genetic sequence of the rAPOBEC1-nThermoCas9(D8A)-UGI-UGI-NLS fusion was cloned into a plasmid without mammalian origin of replication (pCMV) under the transcriptional control of the constitutive, cytomegalovirus (CMV) promoter. Cloned in the same plasmid was a sgRNA expression module transcribed from the constitutive RNA-polymerase III U6 promoter.
- HEK293T Human Embryonic Kidney 293 cells
- targets the homeobox protein EMX1, the vascular endothelial growth factor A (VEGFA), and the DNA-methyltransferase 1 (DNMT1).
- EMX1 the homeobox protein
- VEGFA vascular endothelial growth factor A
- DNMT1 DNA-methyltransferase 1
- Three different cytosine-rich targeting-spacers were designed for each target gene (see Table 5) and incorporated separately into the 5’-end of the sgRNA module of the nThermoBE4 vector. All selected protospacers were flanked by an optimal PAM (5’-NNNNCCAA-3’), and HEK293T (ATCC CRL-3216™) cells were employed as the main experimental cell line.
- the resulting pnThermoBE4_TE1/TE2/TE3/TV1/TV2/TV3/TD1/TD2/TD3 vectors were transfected (Lipofectamine™ 3000 Transfection reagent, Thermo Fisher Scientific, Cat. No. L3000-008) into HEK293T cells.
- Lipofectamine™ 3000 Transfection reagent Thermo Fisher Scientific, Cat. No. L3000-008
- the genomic DNA of each transfected culture was isolated, and Q5 PCR amplification of the corresponding target regions was performed, followed by purification, Sanger sequencing and T7 endonuclease I assays (EnGen mutation detection kit, NEB).
- Heatmaps show the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of HEK293T cell populations transfected with the pnThermoBE4_BE-TE1/TE2/TE3/TV1/TV2/TV3/TD1/TD2/TD3 vectors. White boxes represent no base editing, light to darker grey boxes represent increasing base-editing efficiencies, and black boxes represent 100% base-editing efficiency. The percentages resulted from in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR”.
- C·G to T·A targeted base editing is provided by the described nThermoBE4 system, reaching on-target editing efficiencies up to 72% at the best edited site (Figure 9, protospacer targeted by the pnThermoBE4_BE-TV3 plasmid). 4 out of 9 targeted genomic regions were successfully base-edited in multiple positions across the protospacer sequence or immediately upstream (Figure 9).
- the nThermoBE4 preferentially edited Cs at the PAM-distal end of the protospacer, similar to the commonly used SpyCas9-base editors.
- the base-editing activity of the nThermoBE4 system at undesired loci within the genome of HEK293T cells was studied.
- the in silico-predicted off-target sites were screened in the populations with successful on-target C·G to T·A conversion (cultures of HEK293T cells harbouring the pnThermoBE4_BE-TE2/BE-TV2/BE-TV3/BE-TD3 vectors).
- Base-editing was observed in 1/4, 1/13, 1/11, and 1/1 of the predicted off-target sites for spacers BE-TE2, BE-TV2, BE-TV3, and BE-TD3, respectively.
- the “nThermoBE4” system exhibited less off-target activity (only 4/29 tested, “predicted” off-target sites were indeed edited) compared to the previously reported high off-target activity of the nSpyCas9-mediated BE4 system.
- 3 studies have reported that at least half of the off-target sites which were predicted for different protospacers were indeed edited (17/34, 9/13, or 21/21 tested, “predicted” off-target sites) (Komor et al., (2016) supra; Rees et al., (2017) supra; Kim et al., (2017) supra).
- ThermoCas9 is smaller than SpyCas9 by almost 300 amino acids, facilitating the use of convenient delivery systems, for example adenoviral vectors. It can reach different targets from SpyCas9 due to alternative PAM requirements, and it performs base editing in a broader window, facilitating the introduction of stop codons, e.g. for the functional characterization of genes.
- the ThermoCas9 base-editor presents reduced off-target activity, which is vital for applications in human cells.
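The off-target comparison above rests on simple pooled fractions. The following Python sketch re-derives them from the counts quoted in the text; the variable and function names are illustrative assumptions, and pooling across spacers is only one way of summarising the data.

```python
from fractions import Fraction

# (edited, tested) predicted off-target sites per spacer, as quoted in the text.
# The dictionary layout and names are illustrative assumptions.
nthermo_be4 = {
    "BE-TE2": (1, 4),
    "BE-TV2": (1, 13),
    "BE-TV3": (1, 11),
    "BE-TD3": (1, 1),
}

# Counts quoted in the text for three earlier nSpyCas9 BE4 studies.
nspy_be4_studies = [(17, 34), (9, 13), (21, 21)]


def pooled(counts):
    """Sum edited and tested sites and return (edited, tested, fraction)."""
    counts = list(counts)
    edited = sum(e for e, _ in counts)
    tested = sum(t for _, t in counts)
    return edited, tested, Fraction(edited, tested)


edited, tested, fraction = pooled(nthermo_be4.values())
print(f"nThermoBE4: {edited}/{tested} = {float(fraction):.1%}")

for e, t in nspy_be4_studies:
    print(f"nSpyCas9 BE4 study: {e}/{t} = {e / t:.1%}")
```

The pooled nThermoBE4 figure reproduces the 4/29 stated in the text, and each of the three quoted nSpyCas9 fractions is at least one half.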
Abstract
Thermostable, type II-C Cas9 variants of Geobacillus thermodenitrificans T12 (ThermoCas9) and Geobacillus stearothermophilus (GeoCas9) use 23 nt spacers to edit sites adjacent to an N4CVAA/N4CCCA or N4CRAA PAM, respectively. These Cas9 variants are linked to a cytidine deaminase enzyme from the sea lamprey Petromyzon marinus (PmCDA1) to provide: (a) dThermoTarget-AID, (b) dGeoTarget-AID, (c) AcrThermoTarget-AID, and (d) AcrGeoTarget-AID. Compared to known base editors, these systems exhibit much larger base editing windows: (a) from -5 to -27 positions relative to the PAM (23 bp); (b) from -5 to -24 positions relative to the PAM (20 bp); (c) from -5 to -27 positions relative to the PAM (23 bp); (d) from -3 to -28 positions relative to the PAM (26 bp). The first two editors employ the catalytically inactive ThermoCas9 or GeoCas9, respectively, while the last two editors co-express a small anti-CRISPR protein from Neisseria meningitidis (AcrIIC1Nme) with active ThermoCas9 or GeoCas9, respectively. Methods of gene editing are described.
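The PAM motifs and editing windows stated in the abstract can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the inventors' design tool: only the given strand is scanned, position -1 is assumed to be the base immediately 5' of the PAM, and the function names and demo sequence are hypothetical.

```python
import re

# IUPAC codes needed for the PAMs quoted in the abstract.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "V": "[ACG]", "R": "[AG]"}

PAMS = {"ThermoCas9": ["NNNNCVAA", "NNNNCCCA"], "GeoCas9": ["NNNNCRAA"]}

# Editing windows (PAM-proximal, PAM-distal) as stated in the abstract.
WINDOWS = {
    "dThermoTarget-AID": (-5, -27),
    "dGeoTarget-AID": (-5, -24),
    "AcrThermoTarget-AID": (-5, -27),
    "AcrGeoTarget-AID": (-3, -28),
}


def window_length(near, far):
    """Number of positions in an inclusive window, e.g. -5..-27 -> 23."""
    return abs(far - near) + 1


def find_targets(seq, cas, spacer_len=23):
    """Return (pam_start, protospacer, pam) for each PAM on the given strand."""
    seq = seq.upper()
    alternatives = ["".join(IUPAC[b] for b in pam) for pam in PAMS[cas]]
    # A lookahead lets overlapping PAM matches all be reported.
    pattern = re.compile("(?=(" + "|".join(alternatives) + "))")
    hits = []
    for match in pattern.finditer(seq):
        pam_start = match.start()
        if pam_start >= spacer_len:
            hits.append((pam_start, seq[pam_start - spacer_len:pam_start], match.group(1)))
    return hits


def cytosines_in_window(seq, pam_start, window):
    """List positions (relative to the PAM) of Cs inside the window."""
    near, far = window
    positions = []
    for pos in range(near, far - 1, -1):
        idx = pam_start + pos  # assumed convention: -1 is seq[pam_start - 1]
        if idx >= 0 and seq[idx].upper() == "C":
            positions.append(pos)
    return positions


# Hypothetical demo sequence: 5 nt upstream, 23 nt protospacer, 8 nt PAM.
demo = "TTTTT" + "CTGTCTGGTTCTGGTTGCTGGTT" + "GGTTCGAA"
for pam_start, protospacer, pam in find_targets(demo, "ThermoCas9"):
    print(protospacer, pam, cytosines_in_window(demo, pam_start, WINDOWS["dThermoTarget-AID"]))
```

A real design workflow would also scan the reverse complement and check each candidate against the PAM preferences reported for the chosen Cas9.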
Description
Base Editing Tools
FIELD OF THE INVENTION
The present invention relates to the field of genetic modification technology, particularly to the area of CRISPR-Cas mediated modifications of genetic material, and more particularly the area of CRISPR-Cas mediated base editing of nucleic acid materials.
BACKGROUND
Over the last 5 years, CRISPR-Cas technology has developed a wide range of applications for the modification of genetic material, including editing. The genetic material can be edited across a range of cell types and organisms, from single prokaryotic cells to entire eukaryotic organisms. The simplicity and programmability of the RNA-guided CRISPR-associated nucleases have enabled the generation of double-stranded DNA breaks (DSBs) at precise target positions in the genome (see Jinek, M. et al., (2012) Science, 337(6096): 816-821). In response to DSBs, cellular DNA repair mechanisms introduce random insertions/deletions (indels), translocations or other stochastic rearrangements at the targeted site through non-homologous/microhomology-mediated end joining (NHEJ/MMEJ), leading to gene disruption and undesired modifications (see Jeggo, P. A. (1998) Advances in Genetics Vol. 38, pp. 185-218, Academic Press; Rouet, P. et al., (1994) Proceedings of the National Academy of Sciences, 91(13): 6064-6068; Lukacsovich, T. et al., (1994) Nucleic Acids Research, 22(25): 5649-5657).
Alternatively, a precise DNA modification can occur through homology-directed repair (HDR), albeit only when a donor DNA template is provided (see Rudin, N. et al., (1989) Genetics, 122(3): 519-534; Rouet, P. et al., (1994) Molecular and Cellular Biology, 14(12): 8096-8106).
Most prokaryotes lack a functional NHEJ/MMEJ system (see Aravind, L. & Koonin, E. V. (2001) Genome Research 11(8): 1365-1374; Iliakis, G. et al., (2004) Cytogenetic and Genome Research, 104(1-4): 14-20; Bowater, R. & Doherty, A. J., (2006) PLoS Genetics, 2(2), e8; Cui, L. & Bikard, D. (2016) Nucleic Acids Research, 44(9): 4243-4251). Hence, a donor DNA is required to survive the lethal DSBs (Mougiakos, I. et al., (2016) Trends in Biotechnology, 34(7): 575-587; Jiang, W. et al., (2013) Nature Biotechnology, 31(3): 233; Barrangou, R. & van Pijkeren, J. P., (2016) Current Opinion in Biotechnology 37: 61-68). However, the low HDR efficiency often results in severe loss of transformation efficiency, especially in the case of non-model organisms, or escape mutant/mixed genotypes (see Wang, Y. et al., (2015) Journal of Biotechnology, 200: 1-5; Huang, H. et al., (2016) ACS Synthetic Biology, 5(12): 1355-1361; Li, Q. et al., (2016) Biotechnology Journal, 11(7): 961-972). As such, CRISPR-mediated DSBs are predominantly leveraged as a counter-selection system to kill the unedited cells, after native or phage-recombinase-assisted homologous recombination (see Li, Q. et al., (2015) Metabolic Engineering, 31: 13-21; Tong, Y. et al., (2015) ACS Synthetic Biology, 4(9): 1020-1029; Yu, J. et al., 2015 Appl. Environ. Microbiol. DOI: 10.1128/AEM.04023-14; Wang, Y. et al., (2016) ACS Synthetic Biology, 5(7): 721-732).
An exogenous DNA template is thus always required, which may also have implications for GMO regulation. Similarly, in mammalian cells, HDR is inefficient, highly restricted by cell type/state and outcompeted by the active NHEJ, which generates a strong, undesired genotypic background. Therefore, DSB-triggered NHEJ is mostly applied for gene disruption (see Chapman, J. R. et al., (2012) Molecular Cell, 47(4): 497-510; Cox, D. B. T. et al., (2015) Nature Medicine, 21(2): 121; Paquet, D. et al., Nature (2016) 533(7601): 125; Lin, S. et al., (2014) eLife, 3, e04766). Nevertheless, the vast majority of human diseases are caused by point mutations, implying the need for precise and efficient modification of a single base pair to reverse these pathogenic single nucleotide polymorphisms (SNPs) (see Goodwin, S. et al., (2016) Nature Reviews Genetics, 17(6): 333; Kosicki, M. et al., (2018) Nature Biotechnology, 36(8): 765; Landrum, M. J. et al., (2015) Nucleic Acids Research, 44(D1), D862-D868; Landrum, M. J. et al., (2013) Nucleic Acids Research, 42(D1), D980-D985).
The first scientific publication concerning base editing was Komor, A. C. et al., (2016) “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage” Nature 533: 420-424. Komor et al. used a base editor comprising a dead SpyCas9 (dSpyCas9) fused to a cytidine deaminase in conjunction with an appropriate guide RNA (gRNA). Various cytidine deaminases were tried, and a bacteriophage-derived uracil DNA glycosylase inhibitor (UGI) was fused to the C-terminus of nCas9 (nickase), inhibiting the reversion of the U:G pair back to the original C:G pair by cellular uracil DNA glycosylases. In four transformed human and murine cell lines, the base editors manipulate the cellular DNA repair response to favour desired base-editing outcomes, resulting in permanent correction of ~15 - 75% of total cellular DNA with minimal (typically ≤ 1%) indel formation.
Since Komor, A. C. et al., (2016) supra, there have been further publications of base editing technology. The following scientific publications concern base editing in bacteria, using base editors guided to the desired target site(s) by appropriate guide RNAs. Many have employed a cytidine deaminase for generating a site-specific mutation or premature stop codons in genes of bacterial species. Gene function is thereby investigated without lethal DSBs, without the requirement for a foreign DNA template and without dependence on additional or host-specific factors.
Banno, S. et al., (2018) “Deaminase-mediated multiplex genome editing in Escherichia coli” Nature Microbiology, 3(4): 423 - 429 reports on a cytidine deaminase from Petromyzon marinus (PmCDA1) fused to the C-terminus of nuclease-deficient SpyCas9 (dSpyCas9) via a 121 amino acid linker. This achieved specific point mutagenesis at the target sites in E. coli by introducing cytosine mutations without compromising cell growth. Cytosine-to-thymine substitutions were induced mainly within an approximately five-base window of target sequences on the protospacer adjacent motif-distal side. UGI in combination with a degradation tag (LVA tag) resulted in a robustly high mutation efficiency, which allowed simultaneous multiplex editing of six different genes.
The editing window was from -15 to -20 positions from the PAM, in E. coli.
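The PAM-relative numbering used for editing windows can be illustrated with a short sketch. It assumes that the protospacer is written 5'→3' with the PAM immediately 3' of it, so that the PAM-adjacent nucleotide is position -1 and the PAM-distal end of a 20 nt protospacer is position -20. The function name and the example sequence are hypothetical and are not taken from any of the cited publications.

```python
def cytosines_in_window(protospacer, near, far):
    """Return PAM-relative positions of cytosines inside an editing window.

    protospacer: 5'->3' sequence whose 3' end abuts the PAM (assumed layout).
    near, far:   window bounds as negative positions, e.g. -15 and -20.
    """
    lo, hi = min(near, far), max(near, far)
    length = len(protospacer)
    hits = []
    for i, base in enumerate(protospacer.upper()):
        pos = i - length  # index 0 -> -length (PAM-distal), last index -> -1
        if base == "C" and lo <= pos <= hi:
            hits.append(pos)
    return hits


example = "CACCGTTACGGATCCATGCA"  # hypothetical 20 nt protospacer
print(cytosines_in_window(example, -15, -20))
```

Under these assumptions, only cytosines whose position falls between the two bounds are reported as candidate targets; cytosines elsewhere in the protospacer are ignored.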
Zheng, K. et al., (2018) “Highly efficient base editing in bacteria using a Cas9-cytidine deaminase fusion” Commun. Biol. 1, 32 describes a nickase Cas9 (nSpyCas9D10A)-cytidine deaminase fusion protein to direct the conversion of cytosine to thymine within prokaryotic cells, resulting in high mutagenesis frequencies in E. coli and Brucella melitensis. In this case the cytidine deaminase was a deaminase from the apolipoprotein B mRNA-editing complex (APOBEC) family of deaminases, specifically the rat APOBEC1 (rAPOBEC1) fused to the N-terminus of nSpyCas9. UGI followed by the LVA tag were fused to the C-terminus of nSpyCas9, forming the so-called base editor 3 (BE3) system.
Gu, T. et al., (2018) “Highly efficient base editing in Staphylococcus aureus using an engineered CRISPR RNA-guided cytidine deaminase” Chemical Science, 9(12): 3248 - 3253 describes a fusion of a Cas9 nickase (nSpyCas9D10A) and a cytidine deaminase (rAPOBEC1) that was guided to a target genomic locus for gene inactivation via generating a premature stop codon. The base editor was highly efficient in generating gene inactivation and point mutations in drug-resistant strains of S. aureus which were the subject of study.
Chen, W. et al., (2018) “CRISPR/Cas9-based Genome Editing in Pseudomonas aeruginosa and Cytidine Deaminase-Mediated Base Editing in Pseudomonas Species” iScience 6: 222 - 231 discloses a fusion of the cytidine deaminase rAPOBEC1 and the nSpyCas9D10A. An editing system (pnCasPA-BEC) is produced which enables highly efficient gene inactivation and point mutations in a variety of Pseudomonas species, such as P. aeruginosa, Pseudomonas putida, Pseudomonas fluorescens, and Pseudomonas syringae.
Wang, Y. et al., (2018) “MACBETH: Multiplex automated Corynebacterium glutamicum base editing method” Metabolic Engineering 47: 200 - 210 describes a multiplex automated C. glutamicum base editing method (MACBETH) using dead SpyCas9 (dSpyCas9) or nSpyCas9D10A and activation-induced cytidine deaminase (AID) without foreign DNA templates, achieving single-, double-, and triple-locus editing with efficiencies up to 100%, 87.2% and 23.3%, respectively.
Wang, Y. et al., (2018) “CRISPR-Cas9 and CRISPR-Assisted Cytidine Deaminase Enable Precise and Efficient Genome Editing in Klebsiella pneumoniae” Applied and Environmental Microbiology Vol 84 issue 23 e01834-18 discloses a cytidine base-editing system (pBECKP) for precise C-to-T conversion in both chromosomal and plasmid-borne genes by a fusion of the cytidine deaminase rAPOBEC1 and a Cas9 nickase (nSpyCas9D10A). By using both the pBECKP and a lambda Red recombination system (pCasKP-pSGKP), the blaKPC gene was confirmed in K. pneumoniae to be the major factor that contributed to the carbapenem resistance of a hypermucoviscous carbapenem-resistant K. pneumoniae strain.
Wang, Y. et al., (2019) “Expanding targeting scope, editing window, and base transition capability of base editing in Corynebacterium glutamicum” Biotechnology and Bioengineering vol 116, issue 11 pages 3016 - 3029 reports on how four Cas9 variants (nVRER-Cas9(D10A), nxCas9 3.7, nCas9-NG and nSpyCas9 (D10A)) accepting different protospacer adjacent motif (PAM) sequences were used to increase the genome-targeting scope of bacterial base editing. The authors found that the PAM requirement of bacterial base editing could be relaxed from NGG to NG using the Cas9 variants in C. glutamicum. Truncated or extended guide RNAs were employed to expand the canonical 5-bp editing window to 7-bp. Bacterial adenine base editing was also achieved with a Cas9 fused to adenosine deaminase.
Tong, Y. et al., (2019) “Highly efficient DSB-free base editing for streptomycetes with CRISPR-BEST” PNAS 116(41): 20366 - 20375 describes both cytidine deaminase base editors and an adenosine deaminase base editor. The nickase SpyCas9 (D10A) was fused to the deaminase. The editing systems were tested at various gene loci in the bacterial species Streptomyces coelicolor, S. griseofuscus, and S. collinus Tü365 (used in multiplex).
Kim, Y. et al., (2017) “Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions” Nature Biotechnology vol 35, pages 371-376 describes a fusion protein containing a dSpyCas9, a cytidine deaminase (rAPOBEC1) and an inhibitor of base excision repair (UGI). Five C to T (or G to A) base editors were produced which use natural and engineered Cas9 variants with
different protospacer-adjacent motif (PAM) specificities to expand the number of sites that can be targeted by base editing. Additionally, base editors were engineered containing mutated cytidine deaminase domains that narrow the width of the editing window from ~5 nucleotides to as little as 1 - 2 nucleotides. This enabled discrimination of neighbouring C nucleotides, which would otherwise be edited with similar efficiency, and doubled the number of disease-associated target Cs able to be corrected preferentially over nearby non-target Cs.
Li, Q. et al., (2019) “CRISPR-Cas9D10A nickase-assisted base editing in the solvent producer Clostridium beijerinckii” Biotechnology and Bioengineering vol 116 pages 1475-1483 discloses a base editor (pCBEclos) for use in C. beijerinckii based on the fusion of cytidine deaminase (rAPOBEC1), nSpyCas9 (D10A) and UGI. By appropriate choice of target sequence, conversion of C·G to T·A was capable of creating missense or null mutations in a gene. The efficiency of the editing system was such that edited clones could sometimes be obtained directly following transformation, and otherwise usually required only a single re-streaking step.
Zhao, Y. et al., “Multiplex genome editing using a dCas9-cytidine deaminase fusion in Streptomyces” Sci. China Life Sci. (2019) https://doi.org/10.1007/s11427-019-1559-y discloses a base editor (dCas9-CDA-ULstr) comprising cytidine deaminase from Petromyzon marinus (PmCDA1), a nuclease-deficient Cas9 (dSpyCas9), the UGI and the protein degradation tag (LVA tag). The base editor was used to make single-, double- and triple-point C to T mutations at target sites in S. coelicolor with efficiency up to 100%, 60% and 20%, respectively. The base editor was also highly efficient in the industrial strain, Streptomyces rapamycinicus, which produces the immunosuppressive agent rapamycin. Compared with base editors derived from the cytidine deaminase rAPOBEC1, the PmCDA1-assisted base editor dCas9-CDA-ULstr could edit cytosines preceded by guanosines with high efficiency, which is advantageous for editing Streptomyces genomes (with high GC content).
Luo, Y. et al., (2020) Microb. Cell Fact. 19: 93 describes cytosine base editors (CBEs) made by ligating CDA1 and UGI to the carboxy terminus of dCas9 or nCas9D10A; and adenine base editors (ABEs) made by linking a codon-optimized TadA-TadA*(opt) to the amino terminus of dCas9 or nCas9D10A. These CBEs and ABEs are robust base editing systems for Rhodobacter sphaeroides 2.4.1 that allowed the efficient modification of multiplex genes in a stringent and chemically inducible manner.
Zhong, Z. et al., (2019) bioRxiv doi: http://dx.doi.org/10.1101/630137 describes cytidine base editors (CBEs) and adenine base editors (ABEs) used in Streptomyces which enable targeted C-to-T or A-to-G nucleotide substitutions, respectively, bypassing
DSB and the need for a repair template. Successful genome editing is reported for Streptomyces at frequencies of around 50% using defective Cas9-guided base editors and up to 100% by using nicked Cas9-guided base editors. Multiplexing is also described for a nicked Cas9-guided base editor BE3 and programmed mutation of nine target genes simultaneously. The high-fidelity version of BE3 (HF-BE3) used also improved editing specificity.
Following on from Komor, A. C. et al., (2016) supra, the state of the art also includes base editing carried out in eukaryotes, such as yeast, plants, mice, and human cells. Such base editing can directly, efficiently and precisely revert devastating point mutations whilst limiting the formation of DSB-mediated by-products.
Komor, A. C. et al., (2017) “Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity” Science Advances 3(8), eaao4774 describes “fourth-generation” base editors (BE4 and SaBE4), together with editors made by fusing BE3, BE4, SaBE3, or SaBE4 to Gam, a bacteriophage Mu protein that binds DSBs. The Gam fusion greatly reduces indel formation during base editing, in most cases to below 1.5%, and further improves product purity. The BE4 and SaBE4 editors were used to edit a haploid human cell line. As used hereinafter, the term of art “BE” represents a cytidine deaminase and a catalytically defective (i.e. nuclease deficient) Cas9.
Nishida, K. et al., (2016) “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems” Science 353 (6305): aaf8729 describes the activation-induced cytidine deaminase (AID) ortholog PmCDA1 engineered to form a synthetic complex (Target-AID) with nickase SpyCas9 (D10A). Specific point mutation was induced primarily at cytidines within the target range of five bases. Although the editor was highly effective in yeasts, it also induced insertions and deletions (indels) in mammalian cells. UGI, however, suppressed the indel formation and improved the efficiency.
Shimatani, Z. et al., (2017) “Targeted base editing in rice and tomato using a CRISPR-Cas9 cytidine deaminase fusion” Nature Biotechnology, 35(5): 441 - 443 describes the editor Target-AID for point mutagenesis at genomic regions specified by sgRNAs in the two crop plants. In rice, multiple herbicide-resistance point mutations were introduced by multiplexed editing using herbicide selection, whilst in tomato marker-free plants with homozygous heritable DNA substitutions were generated, demonstrating the feasibility of this example of base editing for crop improvement.
Kim, K. et al., (2017) “Highly efficient RNA-guided base editing in mouse embryos” Nature Biotechnology, 35(5): 435 - 437 describes electroporation or microinjection of BE mRNA or ribonucleoproteins targeting the Dmd or Tyr gene into mouse zygotes. F0 mice showed nonsense mutations with an efficiency of 44-57% and allelic frequencies of up to 100%, demonstrating an efficient method to generate mice with targeted point mutations.
Rees, H. A. et al., (2017) “Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery” Nature Communications 8: 15790 describes reducing off-target base editing by installing mutations into a third-generation base editor (BE3) to generate a high-fidelity base editor (HF-BE3). Purified BE3 and HF-BE3 are delivered as ribonucleoprotein (RNP) complexes into mammalian cells, establishing DNA-free base editing. RNP delivery of BE3 confers higher specificity even than plasmid transfection of HF-BE3, while maintaining comparable on-target editing levels. BE3 RNPs are delivered into both zebrafish embryos and the inner ear of live mice to achieve specific, DNA-free base editing in vivo.
Jiang, W. et al., (2018) “BE-PLUS: a new base editing tool with broadened editing window and enhanced fidelity” Cell Research 28(8): 855 - 861 describes a modified BE (BE-PLUS) made by fusing 10 copies of GCN4 peptide to nCas9(D10A) for recruiting scFv-APOBEC-UGI-GB1 to the target sites. The modified system tested in HEK293FT cells achieved base editing with a broadened window, resulting in an increased genome targeting scope with fewer unwanted indels and non-C-to-T conversions.
Sasaguri, H. et al., (2018) “Introduction of pathogenic mutations into the mouse Psen1 gene by Base Editor and Target-AID” Nature Communications 9(1): 2892 describes how either BE or Target-AID mRNA is injected, together with identical sgRNAs, into mouse zygotes. Both BE and Target-AID were found useful for generating mice harbouring pathogenic point mutations in vivo.
Zafra, M. P. et al., (2018) “Optimized base editors enable efficient editing in cells, organoids and mice” Nature Biotechnology 36(9): 888 - 893 describes reengineered sequences of BE3, BE4Gam, and xBE3 by codon optimization and incorporation of additional nuclear-localization sequences. The reengineered base editors enabled target modification in a wide range of mouse and human cell lines, and intestinal organoids. The optimized base editors also mediated efficient in vivo somatic editing in the liver in adult mice.
Nishimasu, H. et al., (2018) “Engineered CRISPR-Cas9 nuclease with expanded targeting space” Science 361(6408): 1259 - 1262 describes a rationally engineered SpCas9 variant (SpCas9-NG) that can recognize relaxed NG PAMs. The SpCas9-NG induced indels at endogenous target sites bearing NG PAMs in human cells. Fusion of SpCas9-NG and the activation-induced cytidine deaminase (AID) mediated C-to-T conversion at target sites with NG PAMs in human cells.
Tian, S. et al., (2018) “CRISPR-Cas9 Based Engineering of Actinomycetal Genomes” ACS Synth. Biol. 4(9), 1020-1029 describes a Cas9 base editor which was successfully used to target and deactivate two genes, actIORF1 (SCO5087) and actVB (SCO5092), from the actinorhodin biosynthetic gene cluster in Streptomyces coelicolor A3(2). When templates for HDR were provided at the same time as base editing, precise deletions of the targeted gene were observed with near 100% frequency.
Tan, J. et al., (2019) Nature Comm. 10: 439 reports a fusion of PmCDA1 to the C-terminus of nSpyCas9 through a 16 amino acid linker (XTEN), exhibiting editing from -16 to -19 positions from the PAM, in yeast. When the linker was completely removed, both the activity window and editing efficiency remained unaltered. Also, PmCDA1 was fused to the N-terminus of nSpyCas9 through a 16 amino acid linker (XTEN), exhibiting editing in a wider window (-13 to -21 positions from the PAM) and similar efficiencies to the C-terminus fusion, in yeast. Similarly, when the linker was completely removed, both the activity window and editing efficiency remained unaltered, suggesting that the termini of CDA1 are inherently flexible and may act as linker-like sequences.
WO2020/081568 UNIVERSITY OF MASSACHUSETTS (‘Programmable DNA Base Editing By Nme2cas9-deaminase Fusion Proteins’) describes novel tools for base-editing in e.g. HEK293T, K562 or C57BL/6NJ mouse cells using a mesophilic, type II-C Cas9 variant from Neisseria meningitidis (Nme2Cas9) that uses a 22-24 nt spacer to edit sites adjacent to an N4CC PAM. Specifically, C-to-T conversion is mediated by the nNme2Cas9-CBE4 (also called (C)BE4-nNme2Cas9(D16A)-UGI-UGI) editor or its optimised version YE1-BE3-nNme2Cas9(D16A)-UGI. These editors comprise either the wild-type or a mutant (called YE1) rat APOBEC1 cytidine deaminase enzyme, respectively. The potential base editing window of YE1-BE3-nNme2Cas9(D16A) is from nucleotides 2-8 in the displaced DNA strand, counting the nucleotide at the 5’ (PAM-distal) end of the 20 nt protospacer as nucleotide #1, which corresponds to positions -13 to -19 relative to the PAM. Moreover, A-to-G conversion is mediated by the ABE7.10 nNme2Cas9(D16A) editor or its optimised version nNme2Cas9(D16A)-ABEmax.
So far, to summarise, the most popular base editing systems are the “Target-AID” (activation-induced cytidine deaminase) and the “BE” (Base Editor). They comprise a fusion between the catalytically deactivated or nickase variant of the Streptococcus pyogenes Cas9 (dSpyCas9, nSpyCas9) and a cytidine deaminase enzyme (which converts a C·G into a T·A base pair), such as the Petromyzon marinus cytosine deaminase PmCDA1 and its human orthologue (“Target-AID”) or the rat APOBEC1 (“BE”) (see Komor, A. C. et al., (2016) supra; Nishida, K. et al., (2016) Science 353(6305): aaf8729). In addition, a uracil DNA glycosylase inhibitor (UGI) from bacteriophage PBS is usually included for higher editing efficiencies (see Komor, A. C. et al., (2016) supra). However, each of these base editors requires an NGG PAM downstream of the target region and a C·G base pair within a narrow activity window at the PAM-distal end of the protospacer, as well as, in the case of prokaryotes, the generation of a stop codon at this specific position. Moreover, the rAPOBEC1 exhibits sequence context preferences, presenting low efficiencies at GC motifs (see Komor, A. et al., (2016) supra and Komor, A. et al., (2017) supra). Low product purity has also been reported, caused by unexpected C- to non-T editing (Komor, A. et al., (2016) supra; Nishida, K. et al., (2016) Science, 353(6305), aaf8729; Hess, G. T. et al., (2016) Nature Methods 13(12): 1036; Kim, K. et al., (2017) Nature Biotechnology 35(5): 435; Ma, Y. et al., (2016) Nature Methods 13(12): 1029), indel formation by indel-prone end-joining processes (see Komor, A. et al., (2017) supra), and undesired off-target events (Komor, A. et al., (2016) supra; Nishida, K. et al., (2016) Science 353(6305): aaf8729; Gaudelli, N. M. et al., (2017) Nature, 551(7681): 464; Rees, H. A. et al., (2017) Nature Communications, 8: 15790).
Also known in the art are adenosine deaminase-based editors. Zhang, Y. et al., (2020) “Programmable adenine deamination in bacteria using a Cas9-adenine-deaminase fusion” Chem. Sci. Vol 11, pages 1657-1664 describes a gene editor for use with gRNA for targeting and directly converting adenine to guanine in bacterial genomes. The gene editor is a fusion of an adenine deaminase and nSpyCas9 (D10A). The method achieves the conversion of adenine to guanine via an enzymatic deamination reaction and a subsequent DNA replication process rather than HR, which is utilized in conventional bacterial genetic manipulation methods. A systematic screening successfully targeted the possibly editable adenine sites of cntBC, the importer of the staphylopine/metal complex in Staphylococcus aureus.
Mougiakos, I. “Enhancing CRISPRi specificity employing active ThermoCas9 and AcrIIC1” 3rd International Conference on CRISPR Technologies, Würzburg, Germany, 18th September 2019 describes how both ThermoCas9 and GeoCas9 are active in vivo at 37°C and can be used for introducing dsDNA breaks in E. coli, in a tunable and spacer-dependent manner. In addition, anti-CRISPR protein AcrIIC1Nme traps these Cas9 endonucleases in vivo in a DNA-bound, catalytically inactive state, robustly inhibiting targeting and resulting in a transcriptional silencing that is comparable to that of their catalytically "dead" variants (Thermo-dCas9 and Geo-dCas9).
Garcia, B. et al., (2019) Cell Reports-D-19-01748 (available at https://ssrn.com/abstract=3385124 or http://dx.doi.org/10.2139/ssrn.3385124) describes experiments using E. coli phage Mu plaque assays and shows that AcrIIC1 inhibits various type IIC Cas9 nucleases (Nme1Cas9, HpaCas9, BoeCas9, GeoCas9, CjeCas9, KlaCas9) and a type IIA Cas9 nuclease (SauCas9). No inhibition was observed for the type IIC CdiCas9, the type IIA SpyCas9 and the type IIB FnoCas9.
In general, there are difficulties in base editing when seeking to create stop codons using existing SpyCas9 base editors. One is the particular PAM recognition sequence of 5’-NGG-3’, which limits the targetable genomic regions. Another is that it is not sufficient for cytidines to be present within the protospacer: they must lie within the editing window, which for SpyCas9 is narrow (around positions -14 to -20, at the PAM-distal end).
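To illustrate why stop-codon generation is constrained, the following minimal sketch lists the codons in an in-frame coding sequence that a single C-to-T conversion could turn into a stop codon. It relies on the standard genetic code rather than on anything specific to the cited works: CAA, CAG and CGA become TAA, TAG and TGA when a C on the coding strand is edited, and TGG becomes TAG or TGA when a C on the opposite strand is edited (read as G-to-A on the coding strand). The function name and example sequence are hypothetical.

```python
# Codons that a single C->T change on the coding strand turns into a stop codon.
CODING_STRAND = {"CAA": "TAA", "CAG": "TAG", "CGA": "TGA"}
# Codon where C->T on the opposite strand (G->A on the coding strand) gives a stop.
OPPOSITE_STRAND = {"TGG": ("TAG", "TGA")}


def stop_codon_candidates(cds):
    """List (codon_index, codon, edited_strand) for an in-frame coding sequence."""
    cds = cds.upper()
    found = []
    # Walk complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon in CODING_STRAND:
            found.append((start // 3, codon, "coding"))
        elif codon in OPPOSITE_STRAND:
            found.append((start // 3, codon, "opposite"))
    return found


print(stop_codon_candidates("ATGCAAGGTTGGCGATAA"))  # hypothetical sequence
```

A codon reported here is only a candidate: it is editable in practice only if a suitable PAM places the relevant cytidine inside the editing window, which is the limitation described above.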
SUMMARY OF THE INVENTION
Accordingly, in a first aspect the present invention provides a base editor (“I”) comprising:
(a) a Cas9 lacking endonuclease activity and not generating DNA double strand breaks;
(b) a deaminase; wherein:
(i) the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 77% identity therewith; or
(ii) the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9) having an amino acid sequence of SEQ ID NO: 3 or a sequence of at least 77% identity therewith.
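The identity thresholds recited above can be illustrated with a minimal sketch that computes percent identity from a pairwise alignment that has already been produced by some alignment tool. This is an illustration only: the choice of denominator (all alignment columns in which at least one sequence has a residue) is an assumption, and other conventions exist. The function name and the toy sequences are hypothetical and are unrelated to SEQ ID NO: 1 or SEQ ID NO: 3.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over a pairwise alignment given as two equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap (assumed convention).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no residues")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


identity = percent_identity("MKV-LA", "MKVQLG")  # toy aligned fragments
print(round(identity, 1), identity >= 77.0)
```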
Advantageously, the base editors (I) of this invention have a base editing window of up to 19 nucleotides in the protospacer region, distal of the PAM. This is a much wider editing window compared to other known base editors. As such, the base editors of the invention provide an increased possibility and flexibility in obtaining the desired edits in a given target sequence, for example in the generation of stop codons in a target gene sequence. Aside from introducing stop codons, base editors of the invention introduce multiple nucleotide substitutions, which can be beneficial for random mutagenesis studies but also for generating targeted modifications obtainable by the base conversion. Without wishing to be bound by any particular theory, the inventors believe that the mechanism of appropriate base substitution (giving rise on transcription and translation to amino acid substitutions) may be as shown in Figure 1e of Tong, Y. et al., (2019) PNAS 116(41): 20366 - 20375.
Moreover, the base editors provide alternative PAM specificities, thus increasing the number of targetable sites. They also expand the range of targeted hosts, as they are the first base editors comprised of thermotolerant CRISPR variants, implying their possible application in not only mesophilic but also thermophilic prokaryotes. In addition, the base editors theoretically extend multiplexing possibilities for pairwise combinations of Cas9 orthologs with orthogonal guides. Thanks to their much smaller size (1087 aa GeoCas9, 1082 aa ThermoCas9) compared to the most widely used SpyCas9 (1368 aa)-mediated base editors, they are also expected to facilitate deliverability to eukaryotic hosts (overall size ~4.5 kb < ~4.7 kb adeno-associated virus cargo). Finally, their longer PAM (more restrictive requirements) and extended spacer length may lead to limited off-target editing, as observed in genome editing applications with other type II orthologues (Lee, C. M., Cradick, T. J. and Bao, G. (2016) “The Neisseria meningitidis CRISPR-Cas9 system enables specific genome editing in mammalian cells.” Mol. Ther. 24: 645 - 654; Amrani, N. et al., (2018) “NmeCas9 is an intrinsically high-fidelity genome-editing platform.” Genome Biology 19(1): 1 - 25; Friedland et al., (2015) “Characterization of Staphylococcus aureus Cas9: a smaller Cas9 for all-in-one adeno-associated virus delivery and paired nickase applications.” Genome Biology, 16(1): 257; Kim, E. et al., (2017) “In vivo genome editing with a small Cas9 orthologue derived from Campylobacter jejuni.” Nat. Commun. 8: 14500; Chen, F. et al., (2017) “Targeted activation of diverse CRISPR-Cas systems for mammalian genome editing via proximal CRISPR targeting.” Nat. Comm. 8(1): 1 - 12).
In general, the unexpected effect of the invention can be said to lie in an increased specificity and range of possible base edits, a much wider window of editing possibility than with known Cas9 and Cas12a base editors, and a much wider temperature range of operation.
The GeoCas9 is preferably (i) a dead GeoCas9 (dGeoCas9), or (ii) a modified GeoCas9; more preferably the GeoCas9 comprises DNA single strand nickase activity.
In particular embodiments, the invention provides a base editor as herein defined, wherein the ThermoCas9 is a dead ThermoCas9 (dThermoCas9) or a modified ThermoCas9, e.g. having DNA single strand nickase activity.
The deaminase may be a cytidine deaminase; optionally the human Target-AID (activation-induced cytidine deaminase), its orthologue from the sea lamprey Petromyzon marinus (PmCDA1) or the rat APOBEC1 (“rAPOBEC1”, “BE”). In such cases, the base editor may further comprise at least one uracil DNA glycosylase inhibitor (UGI).
There is preferably a linker between (i) the Cas9 and the UGI; and/or (ii) the Cas9 and the cytidine deaminase; and/or (iii) the cytidine deaminase and the UGI. The linker is the amino-acid sequence that connects the Cas9 enzyme with the deaminase enzyme in order to make the desired chimeric protein. However, there is a study in yeast (see Tan, J. et al., (2019) “Engineering of high-precision base editors for site-specific single nucleotide replacement” Nature Communications 10: 439) which found that even complete elimination of a linker in a base-editing system similar to that of the present invention did not alter the editing window and the editing efficiency of the base editor.
Possible linkers of use in the present invention may be selected from one or more of SH3 and (x2)3xFLAG tag, SH3 and 1xFLAG tag, XTEN.
There may be two UGIs.
Alternatively, the deaminase may be an adenine deaminase.
In preferred embodiments, the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 86% amino acid identity therewith; or the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9) having an amino acid sequence of SEQ ID NO: 3 or a sequence of at least 86% amino acid identity therewith.
The Cas9 may be one which has a PAM sequence recognition preference selected from NNNNCRAA, NNNNCVAA or NNNNCCCA.
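The PAM preferences above are written in IUPAC nucleotide codes, where N is any base, R is A or G, and V is A, C or G. A minimal sketch of how such motifs could be checked against a candidate sequence is given below; the helper names and test sequences are hypothetical and serve only to illustrate the notation.

```python
import re

# IUPAC nucleotide codes needed for the PAM motifs discussed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "V": "[ACG]", "N": "[ACGT]"}


def pam_regex(motif):
    """Translate an IUPAC PAM motif such as 'NNNNCRAA' into a compiled regex."""
    return re.compile("".join(IUPAC[code] for code in motif.upper()))


def matches_pam(motif, candidate):
    """True if the candidate sequence satisfies the motif over its full length."""
    return pam_regex(motif).fullmatch(candidate.upper()) is not None


for motif in ("NNNNCRAA", "NNNNCVAA", "NNNNCCCA"):
    print(motif, matches_pam(motif, "ACGTCCAA"))
```

With these definitions, a sequence ending in CCAA satisfies NNNNCVAA but not NNNNCRAA, because C is covered by V and not by R.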
In certain embodiments, the Cas9 is a dead GeoCas9 (dGeoCas9) and the deaminase is PmCDA1 (TargetAID) or human CDA1. When the Cas9 is dGeoCas9 and the cytidine deaminase is PmCDA1, advantageously there is a window of base editing activity from -10 to -24 positions; i.e. up to 15 residues, or from -5 to -24; i.e. up to 20 residues (for 1-step and 2-step incubation respectively), making a total possible window of -5 to -24, i.e. up to 20 residues. There is also an editing efficiency of up to 100%. Also found is a stronger preference for Cs at the PAM distal end (similar to dSpyCas9 TargetAID).
In other embodiments, the Cas9 is a dead ThermoCas9 (dThermoCas9) and the deaminase is PmCDA1 (TargetAID) or human CDA1. When the Cas9 is dThermoCas9 and the cytidine deaminase is PmCDA1, advantageously there is a window of base editing activity from -6 to -27 positions; i.e. up to 22 residues, or from -5 to -27; i.e. up to 23 residues (for 1-step and 2-step incubation respectively), making a total possible window of -5 to -27, i.e. up to 23 residues.
In other embodiments, the Cas9 is a nickase ThermoCas9 (e.g. nThermoCas9D8A) and the deaminase is rAPOBEC1; preferably wherein the Cas9 has a PAM sequence recognition preference of NNNNCCAA. rAPOBEC1-nThermoCas9(D8A)-UGI-UGI (nThermoCas9-BE4), when tested in human cells, advantageously has an editing window of activity from -5 to -29 positions, i.e. up to 25 residues, which is wider than that of the nSpyCas9 (D10A)-mediated BE4 system (-13 to -18) or the nSauCas9 (D10A)-mediated BE4 system (-10 to -18).
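The window sizes quoted in the preceding embodiments follow from counting both end positions of each range. The sketch below reproduces that arithmetic; the helper names are hypothetical, and the only inputs are the PAM-relative positions stated above.

```python
def window_width(first, last):
    """Number of positions in an inclusive PAM-relative window, e.g. -10 to -24."""
    return abs(first - last) + 1


def combined_window(*windows):
    """Smallest single window covering every (first, last) pair given."""
    positions = [p for window in windows for p in window]
    return max(positions), min(positions)  # (PAM-proximal end, PAM-distal end)


# dGeoCas9 with PmCDA1: 1-step and 2-step incubation windows.
geo_total = combined_window((-10, -24), (-5, -24))
print(geo_total, window_width(*geo_total))

# dThermoCas9 with PmCDA1: 1-step and 2-step incubation windows.
thermo_total = combined_window((-6, -27), (-5, -27))
print(thermo_total, window_width(*thermo_total))

# nThermoCas9-BE4 compared with the SpyCas9 and SauCas9 BE4 windows quoted above.
print(window_width(-5, -29), window_width(-13, -18), window_width(-10, -18))
```

On this counting, the quoted nSpyCas9 and nSauCas9 BE4 windows span 6 and 9 positions respectively, against 25 for nThermoCas9-BE4.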
For enhanced efficiency, nickase GeoCas9 and nickase ThermoCas9 base editor embodiments of the invention may be used for editing in mammalian cells; dead GeoCas9 and dead ThermoCas9 base editor embodiments of the invention may provide less efficient base editing in mammalian cells.
By way of additional advantage, base editors of the invention appear to have a lower level of off-target events compared to counterpart base editors known in the art.
In another aspect the invention provides a base editor (“II”) comprising:
(a) a catalytically active Cas9 for generating DNA double strand breaks;
(b) a deaminase; wherein:
(i) the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having an amino acid sequence of SEQ ID NO: 9 or a sequence of at least 77% identity therewith; or
(ii) the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9 ) having an amino acid sequence of SEQ ID NO: 11 or a sequence of at least 77% identity therewith.
The deaminase may be a cytidine deaminase; optionally the human Target-AID (activation-induced cytidine deaminase), its orthologue from the sea lamprey Petromyzon marinus (PmCDA1) or the rat APOBEC1 (“BE”).
The base editor may further comprise at least one uracil DNA glycosylase inhibitor (UGI). There may be two UGIs, but the inventors expect that more than two UGIs may be toxic to cells.
Preferably there is a linker between (i) the Cas9 and the UGI; and/or (ii) Cas9 and the cytidine deaminase; and/or (iii) the cytidine deaminase and the UGI.
Alternatively, the deaminase may be an adenine deaminase.
In preferred embodiments, the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having an amino acid sequence of SEQ ID NO: 9 or a sequence of at least 86% amino acid identity therewith; or the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9 ) having an amino acid sequence of SEQ ID NO: 11 or a sequence of at least 86% amino acid identity therewith.
The Cas9 may have a PAM sequence recognition preference selected from NNNNCRAA, NNNNCVAA or NNNNCCCA.
In certain embodiments, the Cas9 is an active GeoCas9 (GeoCas9) and the deaminase is e.g. PmCDA1 (TargetAID), where advantageously there is a window of base editing activity from -3 to -28 positions; i.e. up to 26 residues, or -9 to -24; i.e. up to 16 residues (for 1-step and 2-step incubation respectively), making a total possible window of -3 to -28, i.e. up to 26 residues.
In other embodiments, the Cas9 is an active ThermoCas9 (ThermoCas9) and the deaminase is e.g. PmCDA1 (TargetAID), where advantageously there is a window of base editing activity from -9 to -23 positions; i.e. up to 15 residues, or -5 to -27; i.e. up to 23 residues (for 1-step and 2-step incubation respectively), making a total possible window of -5 to -27, i.e. up to 23 residues.
In any of the aforementioned base editor aspects of the invention, the base editor may be a fusion protein generated by expression of a polynucleotide encoding the protein components in a suitable cell. An expressed fusion protein base editor may be isolated including at least partially purified. Such isolated base editor may be used in certain methods of the invention as described hereinafter.
The invention also provides a polynucleotide (I) encoding a base editor (I) as hereinbefore defined. The invention includes an expression vector (I) comprising the polynucleotide (I). The expression vector (I) may further comprise a polynucleotide encoding a guide RNA (gRNA) which targets a DNA sequence.
The invention also provides a polynucleotide (II) encoding a base editor (II) as hereinbefore defined. The invention includes an expression vector (II) comprising the polynucleotide (II) and optionally an anti-CRISPR protein gene, e.g. the acrIIC1Nme gene. As will be understood by a person of skill in the art, when an anti-CRISPR gene is present in the same expression vector as the base editor, the anti-CRISPR protein is expressed as a separate protein and not fused with the base editor. The expression vector (II) may further comprise a polynucleotide encoding a gRNA which targets a DNA sequence.
The expression vector (II) may further comprise an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand.
The invention further provides a system for base editing of a target DNA sequence, comprising expression vector (II) as a first expression vector; a second expression vector comprising a polynucleotide encoding (a) an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand, and (b) a gRNA for a target DNA sequence.
The invention also provides a system for base editing of a target DNA sequence, comprising expression vector (II) as a first expression vector; a second expression vector comprising a polynucleotide encoding an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand; and a third expression vector comprising a polynucleotide encoding a gRNA for a target DNA sequence.
The invention also provides a system for base editing of a target DNA sequence, comprising expression vector (II) as a first expression vector; a second expression vector comprising a polynucleotide encoding an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand. In this aspect of the system of the invention, a gRNA is provided directly as a gRNA molecule.
The invention further provides a system for base editing of a target DNA sequence comprising a base editor (I), and a gRNA for a target strand DNA.
The invention also provides a system for base editing of a target DNA sequence comprising a base editor (II), and a gRNA for a target strand DNA. Such a system may further comprise an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand.
The invention also provides a ribonucleoprotein complex (I) comprising a base editor (I) and a gRNA for a target DNA strand.
The invention also provides a ribonucleoprotein complex (II) comprising a base editor (II), a gRNA for a target DNA strand, and an anti-CRISPR protein.
The invention therefore provides a method of base editing comprising transforming a cell with a first expression vector (I); a second expression vector comprising a polynucleotide encoding a gRNA for a target DNA sequence. Optionally, the first expression vector (I) further comprises a polynucleotide encoding a gRNA which targets a DNA sequence.
The invention includes a method of base editing comprising transforming a cell with a first expression vector (I); and introducing into the cell a gRNA for a target DNA sequence.
The invention further provides a method of base editing comprising transforming a cell with a first expression vector (II); a second expression vector comprising a polynucleotide encoding (a) an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand, and (b) a guide RNA for a target DNA sequence.
The invention includes a method of base editing comprising transforming a cell with a first expression vector which is an expression vector (II); a second expression vector comprising a polynucleotide encoding an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand; and a third expression vector comprising a polynucleotide encoding a guide RNA for a target DNA sequence. Optionally, the first expression vector (II) further comprises a polynucleotide encoding a gRNA which targets a DNA sequence.
The invention also provides a method of base editing comprising transforming a cell with a first expression vector (II); a second expression vector comprising a polynucleotide encoding an anti-CRISPR protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand; and introducing into the cell a gRNA for a target DNA sequence.
The invention also provides a method of base editing comprising transforming a cell with a first expression vector which is an expression vector (II); and introducing into the cell a gRNA for a target DNA sequence.
A method of base editing as described herein may be carried out in cells ex vivo or in vitro.
In accordance with any of the aforementioned methods of the invention, after transformation of cell(s), expression is induced in the cell(s) for a period, following which genetic material in the cells is analysed to identify base edited cells, e.g. by polymerase chain reaction (PCR) using at least one suitable primer pair, purification and Sanger sequencing.
Included in the invention is a method of base editing, comprising exposure of DNA to (a) a base editor (I), and (b) a gRNA for a target strand DNA.
Also included in the invention is a method of base editing, comprising exposure of DNA to (a) a base editor (II), (b) a gRNA for a target strand DNA, and (c) an anti-CRISPR protein.
The invention further provides a method of base editing, comprising exposure of DNA to any ribonucleoprotein complex as hereinbefore defined. This may be DNA in vitro, which may be isolated DNA, or wherein the DNA is comprised in a cell.
In any of the aforementioned base editing methods of the invention involving an anti-CRISPR protein, this may be removed or inactivated, thereby providing a counter selection step for non-edited cells.
In any of the aforementioned expression vectors, systems, ribonucleoprotein complexes, or methods of the invention involving an anti-CRISPR protein, this may be the small anti-CRISPR protein from Neisseria meningitidis (AcrIIC1Nme). This particular anti-CRISPR protein has been shown to be active in vitro and in vivo against a number of Cas9 nucleases (see Garcia B. et al., (2019) supra).
Counter-selection, when used, has the advantage that multiple single nucleotide mutations occur, often at up to 9 positions in a single read, which increases the possibility of generating a stop codon and inactivating the gene. However, this does not necessarily generate clean mutants and so a re-streaking step may be needed.
In any of the aforementioned expression vectors, systems, ribonucleoprotein complexes or methods of the invention wherein the gRNA is a single guide RNA (sgRNA), the sgRNA may comprise a spacer having at least 5 mismatches at the 5’ end thereof in comparison with the targeted protospacer.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention are further described hereinafter with reference to the accompanying drawings, in which:
Figure 1 is a diagram comparing the base editing window for various known Cas9 base editors and those of the invention. The base editors referenced to the prior art in the diagram are as follows:
- 1 Banno, S. et al., (2018) Nature Microbiology 3(4): 423 - 429.
- 2 Wang, Y. et al., (2018) Metabolic Engineering 47: 200 - 210.
- 3 Wang, Y. et al., (2019) Biotechnology and Bioengineering 116(11): 3016 - 3029.
- 4 Zhao, Y. et al., (2019) Sci. China Life Sci. https://doi.org/10.1007/s11427-019-1559-y
- 5 Zheng, K. et al., (2018) Commun. Biol. 1: 32.
- 6 Gu, T. et al., (2018) Chemical Science 9(12): 3248 - 3253.
- 7 Chen, W. et al., (2018) iScience 6: 222 - 231.
- 8 Tong, T. et al., (2019) PNAS 116(41): 20366 - 20375.
- 9 Kim, Y. et al., (2017) Nature Biotechnology 35: 371 - 376.
- 10 WO2020/0081568 UNIVERSITY OF MASSACHUSETTS
Figure 2 is a BLASTP alignment of amino acid sequences of ThermoCas9 with GeoCas9. This shows that ThermoCas9 presents 88% amino acid sequence similarity to GeoCas9.
Figure 3 is a ClustalW analysis. Sequences (1:2) Aligned. Score: 87.9852. This shows that ThermoCas9 presents 88% amino acid sequence similarity to GeoCas9.
Figure 4 is a table showing pBLAST results of Cas9 protein sequences compared to ThermoCas9.
Figures 5A and 5B are heatmaps of C·G to T·A base-editing by dGeoTarget-AID. The heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of DH10B_gfp single colonies transformed with the pdGeoTarget-AID_BE-G1/2/3/4 vectors (where BE-G1/2/3/4 represent the corresponding employed spacers). White boxes represent no base-editing, light to darker grey boxes represent increasing base-editing efficiencies, and black boxes represent 100% base-editing efficiency.
Figures 6A and 6B are heatmaps of C·G to T·A base-editing by dThermoTarget-AID. The heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of E. coli DH10B_gfp single colonies transformed with the pdThermoTarget-AID_BE-T1/2/3/4/5/6 vectors (where BE-T1/2/3/4/5/6 represent the corresponding employed spacers). White boxes represent no base-editing, light to darker grey boxes represent increasing base-editing efficiencies, and black boxes represent 100% base-editing efficiency.
Figures 7A and 7B are heatmaps of C·G to T·A base-editing by AcrGeoTarget-AID. The heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of E. coli DH10B_gfp single colonies transformed with the pAcrGeoTarget-AID_BE-G1/2/3/4/5/6 vectors (where BE-G1/2/3/4/5/6 represent the corresponding employed spacers). White boxes represent no base-editing, light to darker grey boxes represent increasing base-editing efficiencies, and black boxes represent 100% base-editing efficiency.
Figures 8A and 8B are heatmaps of C·G to T·A base-editing by AcrThermoTarget-AID. The heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of E. coli DH10B_gfp single colonies transformed with the pAcrThermoTarget-AID_BE-G1/2/3/4/5 vectors. White boxes represent no base-editing, light to darker grey boxes represent increasing base-editing efficiencies, and black boxes represent 100% base-editing efficiency.
Figure 9 shows heatmaps of C·G to T·A base-editing by nThermoBE4 in HEK293T cells. The heatmaps depict the percentage of C·G to T·A conversion in every C-position within or immediately upstream of protospacers (y axis) in the genomes of HEK293T cell populations transfected with the pnThermoBE4_BE-TE1/TE2/TE3/TV1/TV2/TV3/TD1/TD2/TD3 vectors. White boxes represent no base-editing, light to darker grey boxes represent increasing base-editing efficiencies, and black boxes represent 100% base-editing efficiency.
DETAILED DESCRIPTION
The inventors have generated novel tools for base-editing, e.g. in bacteria (Escherichia coli), using thermostable, type II-C Cas9 variants from Geobacillus thermodenitrificans T12 (ThermoCas9) and Geobacillus stearothermophilus (GeoCas9) that use a 23 nt spacer to edit sites adjacent to an N4CVAA/N4CCCA or N4CRAA PAM, respectively. More specifically, the inventors have investigated four novel editors that comprise a cytidine deaminase enzyme from the sea lamprey Petromyzon marinus (PmCDA1): (a) dThermoTarget-AID, (b) dGeoTarget-AID, (c) AcrThermoTarget-AID, and (d) AcrGeoTarget-AID. Compared to known base editors, these systems exhibit much larger base editing windows: (a) from -5 to -27 positions relative to the PAM (23 bp); (b) from -5 to -24 positions relative to the PAM (20 bp); (c) from -5 to -27 positions relative to the PAM (23 bp); (d) from -3 to -28 positions relative to the PAM (26 bp). The first two editors employ the catalytically inactive ThermoCas9 or GeoCas9, respectively, while the last two editors co-express a small anti-CRISPR protein from Neisseria meningitidis (AcrIIC1Nme) with active ThermoCas9 or GeoCas9, respectively. Use of the active ThermoCas9/GeoCas9 in combination with the anti-CRISPR protein allows for counter selection after base-editing to eliminate colonies that are unedited, weakly edited or edited only at the PAM-distal end (where mismatches are tolerated). Thus, this also allows an enrichment of edits at the PAM-proximal end.
Additionally, described herein is a novel tool for C-to-T conversion in mammalian cells (named nThermoCas9-rAPOBEC1) that combines a nickase ThermoCas9 variant (D8A) with the rat APOBEC1 cytidine deaminase. This system also presents a much larger base editing window from -5 to -29 position relative to the PAM (25 bp).
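The window lengths quoted above follow from counting positions inclusively between the PAM-proximal and PAM-distal limits. The following is a minimal Python sketch of that arithmetic only; the dictionary and function names are illustrative assumptions and do not form part of the disclosure.

```python
# Editing windows as (PAM-proximal limit, PAM-distal limit), expressed as
# positions relative to the PAM, as quoted in the description above.
# The names EDITING_WINDOWS and window_length are hypothetical.
EDITING_WINDOWS = {
    "dThermoTarget-AID": (-5, -27),
    "dGeoTarget-AID": (-5, -24),
    "AcrThermoTarget-AID": (-5, -27),
    "AcrGeoTarget-AID": (-3, -28),
    "nThermoCas9-rAPOBEC1": (-5, -29),
}


def window_length(proximal: int, distal: int) -> int:
    """Number of positions in the window, counting both limits."""
    return abs(distal - proximal) + 1


if __name__ == "__main__":
    for name, (proximal, distal) in EDITING_WINDOWS.items():
        print(f"{name}: {window_length(proximal, distal)} positions")
```

Counting both limits is what makes a window of -5 to -27 span 23 positions rather than 22.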
In more detail, the inventors combined cytidine deaminase PmCDA1 with wide-temperature range Cas9 orthologues from Geobacillus stearothermophilus (GeoCas9) (see Harrington et al., (2017a) “A thermostable Cas9 with increased lifetime in human plasma” Nature Communications 8(1): 1424) or from Geobacillus thermodenitrificans (ThermoCas9) (see Mougiakos et al., (2017) “Characterizing a thermostable Cas9 for bacterial genome editing and silencing” Nature Communications 8(1): 1647). The base editors created have PAM preferences of NNNNCRAA and NNNNCVAA/NNNNCCCA respectively and are different in many respects compared to the known base editors which comprise SpyCas9. For example, a different range of genomic targets is made available with base editors of the present invention.
In one mode of application of the invention, the inventors use catalytically inactive variants of GeoCas9 or ThermoCas9 combined with PmCDA1 deaminase (selected due to its low sequence context preference) and UGI (for decreasing undesired background of PmCDA1-mediated base editing).
In another mode of application of the invention, the inventors use active GeoCas9 or ThermoCas9 variants, together with a small anti-CRISPR protein, by way of example from Neisseria meningitidis (AcrIIC1Nme), which is known to trap the Cas9 proteins in vitro in a DNA-bound but catalytically inactive state (see Pawluk et al., (2016) “Naturally occurring off-switches for CRISPR-Cas9” Cell 167(7): 1829-1838; and Harrington et al., (2017b) “A broad-spectrum inhibitor of CRISPR-Cas9” Cell 170(6): 1224-1233). In this mode, base editing is carried out in cells by the PmCDA1 of the base editor, followed by the step of counterselection of non-base-edited cells by removal of the AcrIIC1Nme activity. Contrary to most base-editing applications to date, the inventors have followed a 1-day protocol and targeted a gene that is not correlated to cell survival (GFP).
Remarkably, the inventors have found that the base-editing window of the GeoCas9- and ThermoCas9-based base-editors is up to 19 nucleotides, much wider compared to the base-editing window of currently known base-editors used in bacteria. The use of the base editors of the invention as described herein therefore broadens the range of possible edits and hence provides more flexibility in obtaining the desired edits in a given target sequence, due to wider editing window (as well as different PAM requirements). For example, it increases the possibility that stop codons will be generated in a target gene. As well as introducing stop codons, base editors of the invention introduce multiple nucleotide substitutions, which can be beneficial for random mutagenesis studies, but also for generating targeted mutants obtainable by the base conversion (to be selected from the subset of possible modifications). The inventors foresee application across the range of bacterial species, whether mesophilic or thermophilic, but also in eukaryotic species.
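The remark about stop codons can be illustrated with a short Python sketch. It relies only on the genetic code: a C-to-T change on the coding strand turns CAA, CAG or CGA into TAA, TAG or TGA, and a C-to-T change on the opposite strand reads as G-to-A on the coding strand, turning TGG into a stop codon. The function name and the example sequence are hypothetical, and the sketch deliberately ignores PAM position and the editing window, so it lists candidate codons rather than predicting actual edits.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_candidates(coding_seq: str) -> List[Tuple[int, str, str]]:
    """Return (codon index, original codon, edited codon) for each in-frame
    codon that becomes a stop codon after C-to-T conversion on the coding
    strand, or G-to-A conversion (a C-to-T edit on the opposite strand)."""
    seq = coding_seq.upper()
    hits = []
    # Walk the sequence codon by codon, ignoring any incomplete trailing codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon, nothing to create
        for edited in (codon.replace("C", "T"), codon.replace("G", "A")):
            if edited in STOP_CODONS:
                hits.append((i // 3, codon, edited))
    return hits


if __name__ == "__main__":
    # Toy open reading frame: ATG CAA TGG CGA TTT (illustrative only).
    for index, original, edited in stop_codon_candidates("ATGCAATGGCGATTT"):
        print(f"codon {index}: {original} -> {edited}")
```

A wider editing window covers more codons of this kind around a given PAM, which is why the chance of introducing a stop codon rises with window size.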
More information about the editing window for base editors of the invention is shown in Figure 1 and compared to various known base editors based on SpyCas9. Currently in the literature the base-editing window is 5 to 6 nucleotides for the TargetAID base editors (from -15/-16 to -20 PAM-distal position) and 4 to 6 nucleotides for the rAPOBEC1 base editor (from -13 to -16/-18 PAM-distal position or from -11 to -17). There is also the 8 nucleotide window in Klebsiella pneumoniae APOBEC1-XTEN (from -10 to -17 PAM-distal position).
The inventors have also applied a fusion protein base editor of the nickase ThermoCas9 variant (nThermoCas9), rAPOBEC1 enzyme and two copies of UGI in HEK293T cells for the purpose of base-editing.
Active thermophilic Cas9 components
The active Geobacillus stearothermophilus (GeoCas9) component of base editors of the invention may be provided as a full length GeoCas9 protein or an active fragment thereof. A method of expressing and then purifying the GeoCas9 from E. coli is described in Harrington et al., (2017a) supra and so an isolated, optionally purified, GeoCas9 may form an intermediate component in a method for making base editors of the invention involving in vitro coupling of base editor components together using appropriate linking chemistry for proteins and peptides which is well known to a person of skill in the art.
The amino acid sequence of the GeoCas9 reference protein may be as set forth in SEQ ID NO: 1, although equally other amino acid sequences of Cas9 from other strains of Geobacillus stearothermophilus available in online databases may be used as a reference sequence, e.g. Geobacillus LC300 Cas9 (see Harrington, B., et al., (2017a) supra).
The active Geobacillus thermodenitrificans (ThermoCas9) component of base editors of the invention may be provided as a full length ThermoCas9 protein or an active fragment thereof. A method of expressing and then purifying the ThermoCas9 from E. coli is described in Mougiakos et al., (2017) supra and so an isolated, optionally purified, ThermoCas9 may form an intermediate component in a method for making base editors of the invention involving in vitro coupling of base editor components together.
The amino acid sequence of the ThermoCas9 reference protein may be as set forth in SEQ ID NO: 3, although equally other amino acid sequences of Cas9 from other strains of Geobacillus thermodenitrificans available in online databases may be used as a reference sequence, e.g. Geobacillus 47C-IIb Cas9, Geobacillus 46C-IIa, Geobacillus LC300, Geobacillus jurassicus and others.
Inactive or nickase thermophilic Cas9 components
A nickase variant may be created via a mutation in either one of the HNH or the RuvC catalytic domains of the Cas9 nuclease. This has been shown for S. pyogenes Cas9 (SpyCas9) with SpyCas9 mutants D10A and H840A, which have an inactive RuvC or HNH nuclease domain, respectively. The corresponding mutation positions in dGeoCas9 are D8A and H582A. The corresponding mutation positions in dThermoCas9 are D8A and H582A.
When a combination of both of the D10A and H840A mutations is employed, this leads to a catalytically dead Cas9 variant (Standage-Beier, K. et al., 2015, ACS Synth. Biol. 4, 1217-1225; Jinek, M. et al., 2012, Science 337, 816-821; Xu, T. et al., 2015, Appl. Environ. Microbiol. 81, 4423-4431).
The amino acid sequence of dGeoCas9 is SEQ ID NO: 5 (DNA sequence is SEQ ID NO: 6). The amino acid sequence of dThermoCas9 is SEQ ID NO: 7 (DNA sequence is SEQ ID NO: 8).
Variants of the thermophilic Cas9 components
The term "variant" is used herein in relation to any of the active, inactive or nickase Cas9 components of base editors of the invention. The term may apply to the nucleotide gene sequences or to the amino acid sequences of the Cas9 components. Such variants have certain differences in the nucleotide and amino acid sequences from the corresponding Cas9 reference sequences, as disclosed herein, but retain substantially the same or similar structure and function to the reference Cas9.
A variant also includes a GeoCas9 or ThermoCas9 of any of the active or nickase species thereof, which has sequence alterations that do not alter the function of the resulting protein, for example at the level of gene sequence silent nucleotide base changes due to redundancy of the genetic code. At the protein level such changes may be in non-conserved amino acid residues. Also encompassed are variant Cas9 components that are substantially identical, i.e. have only one or a number of sequence variations, for example in non-conserved amino acid residues compared to the respective reference sequences as described herein. The number of such amino acid changes may be selected from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 amino acid changes, for example. Such changes may be at least partly contiguous or non-contiguous along the length of the amino acid sequence.
Variants of the Cas9 components of the base editors of the invention may also be defined in terms of degree of percentage identity to the respective reference sequences. For example, a variant may include a Cas9 protein or polypeptide having at least 77% identity; preferably at least 86%; more preferably at least 90%; even more preferably at least 95% identity to the defined reference sequence.
More particularly, a Cas protein or polypeptide component of base editors of the invention may comprise an amino acid sequence with a percentage identity with any of the respective reference SEQ ID Nos as disclosed herein, as follows: at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5% or at least 99.8% identity therewith.
The percentage amino acid sequence identity with the reference sequence is determinable as a function of the number of identical positions shared by the sequences in a selected comparison window, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
The overall sequence identity may be determined using a global alignment algorithm known in the art, such as the Needleman Wunsch algorithm in the program GAP (GCG Wisconsin Package, Accelrys).
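By way of illustration only, percentage identity from a global alignment can be sketched in Python as follows. This is a simplified Needleman-Wunsch implementation and not the GAP program itself: the scoring values (match +1, mismatch -1, linear gap -1) and the choice of the full alignment length, gaps included, as the denominator are assumptions made for this sketch, whereas production tools typically use substitution matrices and affine gap penalties. The function names and test peptides are hypothetical.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str]:
    """Return one optimal global alignment of a and b as two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    # Toy peptides, not taken from any SEQ ID NO.
    print(percent_identity("MKTAYIAK", "MKTAIAK"))
```

A variant would then be assessed by comparing the returned value against the chosen threshold, for example at least 86%.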
In some embodiments, less than the full length of a Cas9 component may be used if it provides substantially similar structure and function as the full length Cas9 reference sequence or percentage identity variants thereof. Full length GeoCas9 has 1087 amino acids. Therefore fragments of these Cas9 species for use in the invention may comprise an amino acid sequence that is truncated at either the N-terminus and/or the C-terminus, by a number of amino acids selected from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 from the relevant reference sequence or percentage identity variants of said reference sequence. Alternatively or additionally to the N- and/or C-terminal truncations, a certain number of amino acid changes selected from addition, deletion and substitution may be provided. The number may be selected from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20. There can be any combination of or number of addition, substitution or deletion of amino acids compared to the reference sequence or percentage identity variants of the reference sequence.
PAMs
The base editors in accordance with the invention may have PAM preferences which correspond to those known or determinable for GeoCas9 or ThermoCas9. For GeoCas9 the PAM is NNNNCRAA or NNNNGMAA (wherein R = A or G and M = C or A). For ThermoCas9 the PAM is NNNNCNAA or NNNNCMCA (wherein M = C or A).
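The degenerate PAM notation above uses the standard IUPAC nucleotide codes (N = any base, R = A or G, M = A or C). A minimal Python sketch of checking an 8 nt candidate site against these PAMs is given below; the function and variable names are hypothetical, and the sketch tests only the PAM string itself, not spacer design, strand orientation or editing efficiency.

```python
# IUPAC codes needed for the PAMs stated above (V is included because the
# NNNNCVAA form is also used elsewhere in this description).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "M": "AC", "V": "ACG", "N": "ACGT",
}

# PAM patterns as stated in the paragraph above.
PAM_PATTERNS = {
    "GeoCas9": ("NNNNCRAA", "NNNNGMAA"),
    "ThermoCas9": ("NNNNCNAA", "NNNNCMCA"),
}


def matches_pam(site: str, pattern: str) -> bool:
    """True if the DNA string site satisfies the degenerate PAM pattern."""
    site = site.upper()
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site, pattern)
    )


def compatible_cas9(site: str) -> list:
    """Names of the Cas9 orthologues whose PAM patterns accept the site."""
    return [
        name
        for name, patterns in PAM_PATTERNS.items()
        if any(matches_pam(site, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    for candidate in ("ACGTCAAA", "ACGTCCCA", "ACGTTTTT"):
        print(candidate, compatible_cas9(candidate))
```

A site such as ACGTCAAA satisfies both NNNNCRAA and NNNNCNAA, so it could in principle be targeted with either orthologue, whereas ACGTCCCA is accepted only by the ThermoCas9 pattern NNNNCMCA.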
Cytidine deaminase
There are many cytidine deaminases which can be used in the invention, including without limitation: activation-induced deaminase (AID) (see e.g., Longerich S. et al., (2006) Curr. Opin. Immunol. 18(2): 164 - 74; Di Noia J. M. et al., (2007) Annu. Rev. Biochem. 76: 1 - 22); apolipoprotein B mRNA editing protein catalytic subunit 1 (APOBEC1) (see e.g., Harris et al., (2002) Mol. Cell 10(5): 1247 - 53; Blanc V. & Davidson N. O. (2003) The Journal of Biological Chemistry 278, 1395 - 1398; Petit V. et al., (2009) J. Mol. Biol. 385(1): 65 - 78; Gonzalez C. et al., (2009) Retrovirology 6, 96; Ikeda T. et al., (2011) Nucleic Acids Research 39: 5538 - 5554; Ikeda T. et al., (2008) Nucleic Acids Research 36: 6859 - 6871; Niewiadomska et al., (2007) J. Virol. 81(17): 9577 - 9583); APOBEC3A (see e.g., Stenglein et al., (2010) Nat. Struct. Mol. Biol. 17(2): 222 - 229; Suspene R. et al., (2011) Proc. Natl. Acad. Sci. 108(12): 4858 - 4863; Bulliard Y. et al., (2011) J. Virol. 85(4): 1765 - 1776; Carpenter M. A. et al., (2012) J. Biol. Chem. 287(41): 34801 - 34808); APOBEC3B (see e.g., Malim M. & Emerman M. (2008) Cell Host Microbe 3(6): 388 - 398; Albin J. & Harris R. (2010) Expert Rev. Mol. Med. Jan 22;12:e4; Wissing S. et al., (2010) Mol. Aspects Med. 31(5): 383 - 397; Harris R. S. et al., (2012) J. Biol. Chem. 287(49): 40875 - 40883); APOBEC3C (see e.g. Baumert T. F. et al., (2007) Hepatology 46(3): 682 - 689; Kock J. & Blum H. (2008) J. Gen. Virol. 89(Pt 5): 1184 - 1191; Langlois et al., (2005) Nucleic Acids Res. 33(6): 1913 - 1923); APOBEC3D (see e.g. Hultquist J. F. et al., (2011) J. Virol. 85(21): 11220 - 11234; Refsland E. W. et al., (2012) PLoS Pathog. 8(7): e1002800); APOBEC3F (see e.g. Hultquist J. F. et al., (2011) J. Virol. 85(21): 11220 - 11234; Refsland E. W. et al., (2012) PLoS Pathog. 8(7): e1002800); APOBEC3G (see e.g., Malim M. & Emerman M. (2008) Cell Host Microbe 3(6): 388 - 398; Albin J. & Harris R. (2010) Expert Rev. Mol. Med. Jan 22;12:e4; Wissing S. et al., (2010) Mol. Aspects Med. 31(5): 383 - 397; Harris R. S. et al., (2012) J. Biol. Chem. 287(49): 40875 - 40883; Kock J. & Blum H. (2008) J. Gen. Virol. 89(Pt 5): 1184 - 1191; Hultquist J. F. et al., (2011) J. Virol. 85(21): 11220 - 11234); APOBEC3H (see e.g. Hultquist J. F. et al., (2011) J. Virol. 85(21): 11220 - 11234; Refsland E. W. et al., (2012) PLoS Pathog. 8(7): e1002800; Kock J. & Blum H. (2008) J. Gen. Virol. 89(Pt 5): 1184 - 1191).
AID/APOBEC homologues have also been identified (e.g. Mm-AID, Gg-AID, Dr-AID, Tr-AID, and Ip-AID from mouse, chicken, zebrafish, pufferfish fugu, and channel catfish, respectively) and jawless vertebrates (e.g. CDA1L1_1 to CDA1L1_4, Gc-AID, Lc-AID, and Tn-AID from sea lamprey, nurse shark, the “living fossil” fish coelacanth, and the bony fish tetraodon, respectively) (Dancyger A. et al., (2012) FASEB J. 26(4): 1517 - 1525; Holland S. et al., (2018) Proc. Natl. Acad. Sci. U S A 115(14): E3211-E3220; Quinlan E. M. et al., (2017) Mol. Cell Biol. 37(20): e00077-1).
Base editors of the invention may be used with any cytidine deaminase variant or engineered version thereof (see e.g. Gehrke et al., (2018) Nature Biotechnology 36: 977 - 982; Wang M. et al., (2009) Nat. Struct. Mol. Biol. 16(7): 769 - 776; Wang M. et al., (2010) J. Exp. Med. 207(1): 141 - 53; Kohli R. M. et al., (2010) J. Biol. Chem. 285(52): 40956 - 40964; Zuo et al., (2020) Nature Methods 17, 600 - 604). More particularly, PmCDA1 may be used because the nine human-derived deaminases listed above have a specific preference for the nucleotide immediately upstream of the C targeted for conversion (most of them show a preference for the TC motif). PmCDA1 shows less context preference (Tan J. et al., (2019) Nature Comm. 10: 439) as well as higher editing efficiency (in certain sequence contexts) and higher product purity compared to the widely used APOBEC1 (Komor et al., (2017) supra; Tan J. et al., (2019) supra). In addition, APOBEC1 base editors produce predominantly singly modified products, while CDA1 base editors mainly produce two simultaneous modifications in targets with multiple Cs (Tan J. et al., (2019) supra).
Base editors of the invention may comprise a cytidine deaminase domain which is preferably a deaminase selected from the apolipoprotein B mRNA-editing complex (APOBEC) family of deaminases. The APOBEC family deaminase may therefore be selected from the group consisting of APOBEC1 deaminase, APOBEC2 deaminase, APOBEC3A deaminase, APOBEC3B deaminase, APOBEC3C deaminase, APOBEC3D deaminase, APOBEC3F deaminase, APOBEC3G deaminase, and APOBEC3H deaminase. Also available is cytidine deaminase 1 (CDA1).
In accordance with the invention a cytidine deaminase domain comprises an amino acid sequence that is at least 85% identical to an amino acid sequence of SEQ ID NOs: 266 - 284, 607 - 610, 5724 - 5736, or 5738 - 5741 as set forth in WO2017/070632 A2.
Truncated versions of cytidine deaminases may be employed. For example, as described in Tan J. et al., (2019) supra, a C-terminus-truncated PmCDA1 was fused to the N-terminus of nSpyCas9 (without the presence of a linker). Truncation of the nuclear export signal (NES) from the C-terminus of PmCDA1 showed small effects on editing efficiency and specificity, while larger deletions rendered editing more precise and substantially narrowed the activity window of the base editors. Similarly, base editors of the present invention may use engineered cytidine deaminases in case a narrower window of editing is desired.
WO 2019/241649, WO 2019/023680 and WO 2018/218166 each describe various cytidine deaminases which may be of use in the presently described invention.
Adenine deaminase
Adenine deaminases are known, having been engineered to convert adenosine to inosine, which is treated like guanosine by the cell, creating an A to G (or T to C) change. Adenine DNA deaminases do not exist in nature but have been created by directed evolution of the Escherichia coli TadA, a tRNA adenine deaminase. Like cytosine base editors, the evolved TadA domain may be fused to the relevant GeoCas9 or ThermoCas9 protein to create the adenine base editor. For example, WO 2018/027078 describes an adenosine deaminase of use in the present invention, capable of deaminating adenine of deoxyadenosine in DNA useful for editing nucleobase pairs in double-stranded DNA sequences.
Also, a combination of a cytidine deaminase and adenine deaminase may be used that can concurrently introduce A-to-G and C-to-T substitutions, as e.g. described in Grünewald et al., (2020) Nature Biotechnology: Jun 1, there referred to as synchronous programmable adenine and cytosine editor (SPACE).
Uracil DNA Glycosylase Inhibitor (UGI)
A "uracil DNA glycosylase inhibitor" or "UGI," as used herein, means a protein which is capable of inhibiting a uracil-DNA glycosylase base-excision repair enzyme.
Suitable UGI protein and nucleotide sequences are well known, and include, for example, those published in Wang et al., (1989) “Uracil-DNA glycosylase inhibitor gene of bacteriophage PBS2 encodes a binding protein specific for uracil-DNA glycosylase” J. Biol. Chem. 264: 1163-1171; Lundquist et al., (1997) “Site-directed mutagenesis and characterization of uracil-DNA glycosylase inhibitor protein. Role of specific carboxylic amino acids in complex formation with Escherichia coli uracil-DNA glycosylase” J. Biol. Chem. 272: 21408-21419; Ravishankar et al., (1998) “X-ray analysis of a complex of Escherichia coli uracil DNA glycosylase (EcUDG) with a proteinaceous inhibitor. The structure elucidation of a prokaryotic UDG” Nucleic Acids Res. 26: 4880-4887; and Putnam et al., (1999) “Protein mimicry of DNA from crystal structures of the uracil-DNA glycosylase inhibitor protein and its complex with Escherichia coli uracil-DNA glycosylase” J. Mol. Biol. 287: 331-346.
The UGI domain comprises a wild-type UGI or a UGI having an amino acid sequence identical to those set forth in SEQ ID NOs: 322-324 of WO 2017/070632 A2, or SEQ ID NO: 48 of WO 2019/005886 A1 ; or SEQ ID NO: 500 of WO 2019/168953. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. For example, in some embodiments, a UGI domain comprises a fragment of the amino acid sequence set forth in any of the aforementioned SEQ ID NOs. In some embodiments, a UGI fragment comprises an amino acid sequence that comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid sequence as set forth in the aforementioned SEQ ID NOs. In some embodiments, the UGI proteins comprising UGI or fragments of UGI or homologs of UGI or UGI fragments are referred to as "UGI variants."
A UGI variant shares homology to UGI, or a fragment thereof. For example, a UGI variant may be at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to a wild-type UGI or a UGI as set forth in any of the aforementioned SEQ ID NOs. In some embodiments a fragment of UGI may be at least 70% identical, at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to the corresponding fragment of wild-type UGI or a UGI as set forth in any of the aforementioned SEQ ID NOs.
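As a rough illustration of how such identity thresholds might be checked, the following Python sketch computes percent identity between two sequences that have already been aligned to equal length. The function names, the gap convention and the choice of denominator (aligned columns without gaps) are assumptions made for illustration; other definitions of percent identity exist, and a real comparison would normally rely on an established alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs are assumed to be pre-aligned and therefore of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an illustrative convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the two aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned ten-residue toy sequences that differ at one position would be 90% identical under this definition, and so would satisfy a 70% threshold but not a 95% one.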
UGI significantly increases the editing efficiency compared to the absence of UGI (indicatively Banno et al., (2018) Nat Microbiol. 3(4): 423-429). Moreover, Komor et al., (2017) supra and Li et al., (2018) supra have reported the implementation of two or four copies of the UGI gene, respectively, further increasing the editing efficiency. The base editors of this invention therefore may include multiple copies of UGI therein, e.g. 2, 3 or 4 copies.
Linkers
Base editors of the invention may be constructed along the lines already known in the art, and therefore may comprise various linkers, such as XTEN. Some base editors of the invention may be completely linkerless. The UGI is always positioned at the C-terminus of Cas9 in base editors of the invention, and since the deaminase may also be at the C-terminus of the Cas9, a linker needs to be placed between the Cas9 and the deaminase, and another linker between the deaminase and the UGI. Usually therefore there is no linker between the Cas9 and the UGI. Other possibilities include a CDA1 at the N-terminus of the base editor protein complex which is connected with a linker to the N-terminus of Cas9, which in turn is connected with the UGI via another linker at its C-terminus. In some embodiments, PmCDA1 can be fused to the N-terminus of the Cas9, maintaining the same editing efficiencies but widening the activity window even further compared to the known SpyCas9 base editors.
Gam Protein
Gam may be fused to BE3, BE4, SaBE3, or SaBE4 in base editors of the invention. Gam is a bacteriophage Mu protein that binds DSBs and greatly reduces indel formation during base editing, in most cases to below 1.5%, and further improves product purity (Komor A. C. et al., (2017) Science Advances 3(8): eaao4774).
Anti-CRISPR
In methods of the invention which employ an anti-CRISPR, this has so far been tested by locating a polynucleotide encoding the anti-CRISPR on a separate expression vector from that for the base editor. However, the anti-CRISPR-containing expression vector possessed a high level of expression without the need for any particular induction conditions. Advantageously, a high copy number anti-CRISPR expression vector can be used; because of leaky expression, this leads to a high level of expression and good repression of the ThermoCas9 or GeoCas9 activities, leading to efficient base editing. Experiments have been successfully undertaken in E. coli using either a dual-plasmid approach or a single-plasmid approach.
Even with maximum induction of the active Cas9, almost complete inhibition of the targeting was observed, even when AcrIIC1 was not induced, due to the leaky expression of the AcrIIC1 in the high copy number plasmid.
In a single-vector approach (i.e. a medium copy number plasmid carrying the AcrIIC1 under the same inducible promoter as before, the active ThermoCas9/GeoCas9 under the same inducible promoter as before and the guide RNA under the same constitutive promoter as before), when the active ThermoCas9 or GeoCas9 and the AcrIIC1 expression was fully induced, the Cas9-mediated targeting was shown to be stronger than the AcrIIC1-mediated inhibition. Therefore a single-vector approach may provide a higher tunability than the dual-vector approach.
For base editing using AcrIIC1, the inventors prefer a single-vector approach to enable counter-selection (i.e. high Cas9, low AcrIIC1) following a preceding base editing step (i.e. high Cas9, high AcrIIC1). The amino acid sequence of AcrIIC1 Nme (for application in E. coli) is set forth in SEQ ID NO: 11; the DNA sequence in SEQ ID NO: 12.
WO 2017/160689 describes various type II-C anti-CRISPRs which may be employed in the present invention.
Targeting RNA
In order to provide the sequence-specific targeting for base editing in accordance with the invention, a crRNA of particular sequence is required, together with a tracrRNA. However, the two RNA molecules may be substituted by a crRNA-tracrRNA chimeric molecule, in which the crRNA constitutes the 5’ part of the molecule, followed by a small connecting sequence (usually 5’-GAAA-3’), followed by the tracrRNA as the 3’ part of the molecule; this is commonly known as a single guide RNA or guide RNA (sgRNA or gRNA).
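The layout of the chimeric molecule can be sketched in Python as a simple 5’ to 3’ concatenation. The function name and the validation step are assumptions for illustration, and any sequences passed in are placeholders rather than sequences of the invention; only the ordering (crRNA, connecting sequence, tracrRNA) and the usual GAAA connector are taken from the text.

```python
RNA_ALPHABET = set("ACGU")


def build_sgrna(crrna: str, tracrrna: str, connector: str = "GAAA") -> str:
    """Join crRNA (5' part), a short connector and tracrRNA (3' part), written 5'->3'."""
    parts = [crrna.upper(), connector.upper(), tracrrna.upper()]
    for part in parts:
        if not part:
            raise ValueError("each component must be non-empty")
        if set(part) - RNA_ALPHABET:
            raise ValueError("components must contain only A, C, G or U")
    return "".join(parts)
```

Calling `build_sgrna("ACGU", "UUGG")` on these placeholder strings would return the crRNA followed by GAAA and then the tracrRNA in a single string.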
The base editors of the invention form a ribonucleoprotein complex with the crRNA/tracrRNA or gRNA, and these RNAs complementarily bind to a targeted sequence of the genomic DNA. The RNA targeting molecules are designed in sequence selection so as to recognize the complement of a so called “protospacer” sequence which is adjacent 3’ to the PAM sequence in the same strand.
sgRNAs with truncated or extended spacer sequence may be used in the invention to shift the editing window, as has been observed in a study from Banno, S. et al., (2018) Nature Microbiology, 3(4): 423-429. As described in the examples herein, an optimal sgRNA for ThermoCas9 (Mougiakos et al., (2017) Nature Communications 8(1): 1647) was used for both ThermoCas9 and GeoCas9. Alternatively, the sgRNA of GeoCas9 (Harrington et al., (2017a) supra) may be used for both Cas9 variants. In addition, an aptazyme could be fused to the sgRNA for strict control of the base editing efficiency, although it might reduce the on-target efficiency and present leaky activity (Tang W. et al., (2017) Nature Communications 8: 15939).
Base editing window
The “base editing window” in accordance with the present invention is the set of positions in the protospacer sequence of the DNA strand which are open to reaction with the deaminase. Whether or not a base change occurs at a given position upstream of the PAM depends of course on the presence of the appropriate base: C for a cytidine deaminase, and A for an adenine deaminase. Also, even though there may be a multiplicity of appropriate bases in the editing window, this may not mean that all such bases are acted on simultaneously and/or completely. One may expect to see patterns of editing in the base editing window and these may be revealed after the event by sequencing of the relevant region of DNA concerned.
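By way of illustration only, the following Python sketch lists the positions within a chosen window that carry the appropriate base for a given deaminase. It assumes that the protospacer is written 5’ to 3’ with the PAM immediately 3’ of its last base, and that the last protospacer base is numbered -1; this numbering convention, the function name and the toy sequence are assumptions, and the sketch says nothing about whether a candidate base is actually edited.

```python
from typing import List, Tuple


def editable_positions(protospacer: str, window: Tuple[int, int], target_base: str) -> List[int]:
    """Return PAM-relative positions inside `window` that carry `target_base`.

    Assumed numbering: the protospacer base adjacent to the PAM is -1, the next
    one upstream is -2, and so on. `window` is (PAM-distal end, PAM-proximal end),
    e.g. (-20, -9), and is inclusive at both ends.
    """
    start, end = window
    if start > end:
        raise ValueError("window must be given as (distal, proximal), e.g. (-20, -9)")
    length = len(protospacer)
    hits = []
    for index, base in enumerate(protospacer.upper()):
        position = index - length  # index 0 maps to -length, last base maps to -1
        if start <= position <= end and base == target_base.upper():
            hits.append(position)
    return hits
```

With a toy ten-base protospacer `ACGTACGTAC` and a window of -9 to -5, a cytidine deaminase would have candidate C bases at -9 and -5, and an adenine deaminase a candidate A at -6.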
For base editors of the invention the editing window may be any of the following, in nucleotide position, with respect to the PAM as can be seen in Figure 1. For example, base editing within a protospacer window selected from any of the following individual window positions:
-29 to -14, -29 to -13, -29 to -12, -29 to -11, -29 to -10, -29 to -9, -29 to -8, -29 to -7, -29 to -6, -29 to -5; or
-28 to -14, -28 to -13, -28 to -12, -28 to -11, -28 to -10, -28 to -9, -28 to -8, -28 to -7, -28 to -6, -28 to -5; or
-27 to -14, -27 to -13, -27 to -12, -27 to -11, -27 to -10, -27 to -9, -27 to -8, -27 to -7, -27 to -6, -27 to -5; or
-26 to -14, -26 to -13, -26 to -12, -26 to -11, -26 to -10, -26 to -9, -26 to -8, -26 to -7, -26 to -6, -26 to -5; or
-25 to -14, -25 to -13, -25 to -12, -25 to -11, -25 to -10, -25 to -9, -25 to -8, -25 to -7, -25 to -6, -25 to -5; or
-24 to -14, -24 to -13, -24 to -12, -24 to -11, -24 to -10, -24 to -9, -24 to -8, -24 to -7, -24 to -6, -24 to -5; or
-23 to -14, -23 to -13, -23 to -12, -23 to -11, -23 to -10, -23 to -9, -23 to -8, -23 to -7, -23 to -6, -23 to -5; or
-22 to -14, -22 to -13, -22 to -12, -22 to -11, -22 to -10, -22 to -9, -22 to -8, -22 to -7, -22 to -6, -22 to -5; or
-21 to -14, -21 to -13, -21 to -12, -21 to -11, -21 to -10, -21 to -9, -21 to -8, -21 to -7, -21 to -6, -21 to -5; or
-20 to -9, -20 to -8, -20 to -7, -20 to -6, -20 to -5; or
-19 to -9, -19 to -8, -19 to -7, -19 to -6, -19 to -5; or
-18 to -9, -18 to -8, -18 to -7, -18 to -6, -18 to -5; or
-17 to -9, -17 to -8, -17 to -7, -17 to -6, -17 to -5; or
-16 to -9, -16 to -8, -16 to -7, -16 to -6, -16 to -5; or
-15 to -9, -15 to -8, -15 to -7, -15 to -6, -15 to -5; or
-14 to -9, -14 to -8, -14 to -7, -14 to -6, -14 to -5.
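The window possibilities listed above follow a regular pattern, which the following Python sketch reproduces: PAM-distal ends of -29 to -21 pair with PAM-proximal ends of -14 to -5, and PAM-distal ends of -20 to -14 pair with PAM-proximal ends of -9 to -5. The function name and the tuple representation are assumptions for illustration; the sketch simply enumerates the listed ranges.

```python
from typing import List, Tuple


def candidate_windows() -> List[Tuple[int, int]]:
    """Enumerate the listed editing windows as (distal, proximal) position pairs."""
    windows = []
    for start in range(-29, -20):       # distal ends -29 .. -21
        for end in range(-14, -4):      # proximal ends -14 .. -5
            windows.append((start, end))
    for start in range(-20, -13):       # distal ends -20 .. -14
        for end in range(-9, -4):       # proximal ends -9 .. -5
            windows.append((start, end))
    return windows
```

Enumerated this way, the list comprises 90 windows from the first group and 35 from the second, 125 in total.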
Base editing may take place preferentially at the PAM distal end of any of the base editing window possibilities. This is usually when a dead or nickase version of the GeoCas9 or ThermoCas9 is used.
Base editing may take place preferentially at the PAM proximal end of any of the base editing window possibilities. This is usually when an active GeoCas9 or ThermoCas9 is used in conjunction with an anti-CRISPR protein.
In preferred embodiments of the active GeoCas9 and anti-CRISPR protein form of base editing, the window of -8 to -5 of the PAM is less efficient than elsewhere upstream in the protospacer and as a consequence is less preferred.
Synthetic protein base editors
Whilst base editors of the invention may be synthesised as fusion proteins within cells in situ by expression within the cell from a suitable polynucleotide expression vector, base editors can also be produced externally of the cell by a combination of chemical synthesis and chemical coupling. Alternatively, base editors may be made by expressing protein or polypeptide components in recombinant expression systems, isolating them and then covalently coupling them using protein coupling chemistry well known in the art.
Peptide ligation may be used to create base editors, involving peptide or protein synthesis and ligation. Two or more peptide ligation steps, and sequential peptide ligation may be used.
Cell-free protein expression may be used to produce base editors or ribonucleoprotein base editors of the invention. Cell-free protein production is achieved by combining a crude lysate from growing cells, which contains all the necessary enzymes and machinery for protein synthesis (including transcription and translation), with the exogenous supply of essential amino acids, nucleotides, salts, and energy-generating factors and introducing exogenous messages including RNA (mRNA) or DNA as template into the system.
Chemical synthesis may be used, especially solid-phase peptide synthesis (SPPS) followed by isolation and chemical ligation of segments of base editor proteins. Various methods will be well known to persons of skill in the art, including thioester-forming ligation, oxime and hydrazone-forming ligation, thiazolidine/oxazolidine-forming ligation, disulphide exchange (thioacid-capture ligation), or native chemical ligation (NCL).
In NCL, a thioester and a cysteinyl peptide are combined together to form a ligated peptide or protein. In principle, synthetic and recombinant building blocks can be used for NCL in a semisynthetic manner. Thus, segments can be acquired by recombinant protein expression and/or by synthesis and coupled together. For more specific information see Hou, W., et al., (2017) “Progress in Chemical Synthesis of Peptides and Proteins” Trans. Tianjin Univ. 23, 401-419.
Ribonucleoprotein (RNP) complexes
RNP delivery may be performed using a Neon electroporation system (ThermoFisher) following the manufacturer’s instructions.
Base editing without expression or in cell-free systems
In accordance with methods and systems of base editing described herein, a modification of target nucleic acid may be carried out directly on cells without employing expression vectors encoding the base editor proteins and targeting RNA. The base editor proteins and targeting RNA may be introduced into the cells simultaneously, sequentially (in any order as desired), or separately. A ribonucleoprotein base editor complex may be introduced directly into cells.
Polynucleotides and expression vectors
Polynucleotides of the present invention as described herein may be in isolated form. However, in order that expression of such a polynucleotide is carried out in a desired cell to undertake base editing, the polynucleotide encoding the base editor (and/or gRNA) is preferably provided in an expression construct. One or more expression vectors may be used in accordance with the invention to achieve the base editing required.
Suitable expression vectors will vary according to the recipient cell and may incorporate regulatory elements which enable expression in the target cell and preferably which facilitate high-levels of expression. Such regulatory sequences may be capable of influencing transcription or translation of a gene or gene product, for example in terms of initiation, accuracy, rate, stability, downstream processing and mobility.
Such elements may include, for example, strong and/or constitutive promoters, 5’ and 3’ UTRs, transcriptional and/or translational enhancers, transcription factor or protein binding sequences, start sites and termination sequences, ribosome binding sites, recombination sites, polyadenylation sequences, sense or antisense sequences, sequences ensuring correct initiation of transcription and optionally poly-A signals ensuring termination of transcription and transcript stabilisation in the host cell. The regulatory sequences may be plant, animal, bacterial, fungal or virus-derived, and preferably may be derived from the same organism as the host cell. Clearly, appropriate regulatory elements will vary according to the host cell of interest. For example, regulatory elements which facilitate high-level expression in prokaryotic host cells such as in E. coli may include the pLac, T7, P(Bla), P(Cat), P(Kat), trp or tac promoters. Regulatory elements which facilitate high-level expression in eukaryotic host cells might include the AOX1 or GAL1 promoter in yeast or the CMV- or SV40-promoters, CMV-enhancer, SV40-enhancer, Herpes simplex virus VP16 transcriptional activator or inclusion of a globin intron in animal cells. In plants, constitutive high-level expression may be obtained using, for example, the Zea mays ubiquitin 1 promoter or 35S and 19S promoters of cauliflower mosaic virus.
Suitable regulatory elements may be constitutive, whereby they direct expression under most environmental conditions or developmental stages, developmental stage specific or inducible. Preferably, the promoter is inducible, to direct expression in response to environmental, chemical or developmental cues, such as temperature, light, chemicals, drought, and other stimuli. Suitably, promoters may be chosen which allow expression of the protein of interest at particular developmental stages or in response to extra- or intra-cellular conditions, signals or externally applied stimuli. For example, a range of promoters exist for use in E. coli which give high-level expression at particular stages of growth (e.g. osmY stationary phase promoter) or in response to particular stimuli (e.g. HtpG Heat Shock Promoter).
Suitable expression vectors may comprise additional sequences encoding selectable markers which allow for the selection of said vector in a suitable host cell and/or under particular conditions.
In accordance with preferred aspects of the invention, base editing of cells comprises transfecting, transforming or transducing the cell with any of the expression vectors as hereinbefore described. The methods of transfection, transformation or transduction are of the types well known to a person of skill in the art. Where there is one expression vector used to generate expression of a ribonucleoprotein complex of the invention and when the targeting RNA is added directly to the cell then the same or a different method of transfection, transformation or transduction may be used. Similarly, where there is one expression vector being used to generate expression of a ribonucleoprotein complex of the invention and when another expression vector is being used to generate the targeting RNA in situ via expression, then the same or a different method of transfection, transformation or transduction may be used.
In other embodiments, an mRNA encoding the base editor protein polypeptide is introduced into a cell so that the base editor is expressed in the cell. The targeting RNA which guides the Cas protein complex to the desired target sequence is also introduced into the cell, whether simultaneously, separately or sequentially from the mRNA, such that the necessary ribonucleoprotein base editor complex is formed in the cell.
Base editing of prokaryotes and archaea
The particular base editors and systems exemplified herein are readily adapted for the base editing of other bacteria and, more widely, of other microbes. A person of skill in the art will readily know how to select appropriate promoters, vectors, RBSs, codon optimized variants of the Cas9 orthologs, cytidine deaminase enzymes and UGIs.
Transformation protocols
Equally, the modification of the target nucleic acid may be made in vivo, that is in situ in a cell, whether an isolated cell or as part of a multicellular tissue, organ or organism. In the context of whole tissue and organs, and in the context of an organism, the method may desirably be carried out in vivo or alternatively may be carried out by isolating a cell from the whole tissue, organ or organism, treating the cell with ribonucleoprotein complex in accordance with the method and subsequently returning the cell treated with ribonucleoprotein complex to its former location, or a different location, whether within the same or a different organism.
In these embodiments, the ribonucleoprotein complex or the Cas protein or polypeptide requires an appropriate form of delivery into the cell. Such suitable delivery systems and methods are well known to persons skilled in the art, and include but are not limited to cytoplasmic or nuclear microinjection.
It has been reported that it is possible to clone as much as about 5.5 kb of cargo in AAV5, for example. The sizes of the components of an exemplary base editing system of the invention are:
• CMV enhancer: 380nt
• CMV promoter: 204nt
• Connecting sequence: 73nt
• APOBEC: 687nt
• Linker: 96nt
• nThermoCas9: 3246nt
• Linker: 30nt
• UGI: 249nt
• Linker: 30nt
• UGI: 249nt
• Linker: 12nt
• SV40 NLS plus stop codon: 24nt
The sum of the above modules is 5280 bp. In embodiments where an sgRNA, e.g. a U6 promoter plus sgRNA, is included in the same construct as the base editor components, then there would be an additional 450nt.
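The arithmetic above can be checked with a short Python sketch. The module names and lengths are taken from the list; the variable names, and the use of 5500 nt as a stand-in for "about 5.5 kb", are assumptions for illustration, since the reported AAV5 capacity is only approximate.

```python
# Module lengths in nucleotides, in the order listed above.
MODULES_NT = [
    ("CMV enhancer", 380),
    ("CMV promoter", 204),
    ("Connecting sequence", 73),
    ("APOBEC", 687),
    ("Linker", 96),
    ("nThermoCas9", 3246),
    ("Linker", 30),
    ("UGI", 249),
    ("Linker", 30),
    ("UGI", 249),
    ("Linker", 12),
    ("SV40 NLS plus stop codon", 24),
]

SGRNA_CASSETTE_NT = 450      # U6 promoter plus sgRNA, as stated in the text
APPROX_AAV5_CARGO_NT = 5500  # assumed numeric stand-in for "about 5.5 kb"


def total_length(modules) -> int:
    """Sum the lengths of (name, length) module pairs."""
    return sum(length for _, length in modules)


base_editor_nt = total_length(MODULES_NT)
with_sgrna_nt = base_editor_nt + SGRNA_CASSETTE_NT
```

On these figures the base editor cassette alone comes to 5280 nt, and adding the 450 nt sgRNA cassette gives 5730 nt; any comparison with the approximate cargo figure is indicative only.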
The ThermoCas9 and GeoCas9 genes are small enough to permit base editor constructs of the invention to be inserted into AAV vectors (unlike spCas9), and the GeoCas9 and ThermoCas9 base editor fusions of the invention are still small enough for AAV vectors. Therefore in preferred modes of delivery, an adeno-associated virus (AAV) is used; this delivery system is not disease causing in humans and has been approved for clinical use in Europe.
Methods of transformations of cells in accordance with the invention include the following:
In vitro and ex vivo electroporation of (i) DNA plasmid(s) coding for the base editor protein complex and the desired sgRNA(s), (ii) mRNA(s) coding for the base editor protein complex and the desired sgRNA(s), or (iii) purified base editor protein complex molecules loaded with the desired sgRNA(s), into either the nucleus (nucleofection) or the cytoplasm of individual mammalian cells (see for example: Matano M. et al. (2015) Nat. Med. 21: 256-62; Paquet D. et al. (2016) Nature 533: 125-129; Ousterout D. et al. (2015) Nat. Commun. 6: 6244; Ye L. et al. (2014) Proc. Natl. Acad. Sci. USA 111: 9591-9596; Wang T. et al. (2014) Science 343: 80-84; Zuckermann M. et al. (2015) Nat. Commun. 6: 7391; Truong D. J. et al. (2015) Nucleic Acids Res. 43: 6450-6458).
In vitro, ex vivo and in vivo delivery of adeno-associated (see for example, Esvelt K. et al. (2013) Nat. Methods 10: 1116-1121), lentiviral (see for example, Shalem O. et al. (2014) Science 343: 84-87) or adenoviral (see for example, Maddalo D. et al. (2014) Nature 516: 423-427) ssDNA or dsDNA vectors (AAV, LV or AV vectors respectively) that code for the base editor protein complex and the desired sgRNA(s).
Liposomes (for example, Lipofectamine by Thermo Fisher Scientific; and see also for example Yin H. et al. (2016) Nat. Biotechnol. 34: 328-333) and other lipoplexes/polyplexes (for example, FuGENE-6 reagent by Promega; zwitterionic amino lipids (see for example, Miller J. et al. (2017) Angew. Chem. Int. Ed. Engl. 56: 1059-1063); DNA/Ca2+ microcomplexes (see for example, Ebina H. et al. (2013) Sci. Rep. 3: 2510)) as carriers for in vitro, ex vivo and in vivo delivery of (i) DNA plasmid(s) coding for the base editor protein complex and the desired sgRNA(s), (ii) mRNA(s) coding for the base editor protein complex and the desired sgRNA(s), or (iii) purified base editor protein complex molecules loaded with the desired sgRNA(s).
Covalent attachment (see for example, Ramakrishna S. et al. (2014) Genome Res. 24: 1020-1027; Axford D. et al. (2017) FASEB J. 31: 909.4) of cell-penetrating peptides (CPPs) to purified base editor protein complex molecules loaded with the desired sgRNA(s) for in vitro and ex vivo delivery.
In vitro delivery of DNA nanoclews (see for example, Sun W. et al. (2015) Angew. Chem. Int. Ed. Engl. 54: 12029-12033) loaded with (and protecting) purified base editor protein complex molecules loaded with the desired sgRNA(s) (the latter partially complementary to the palindromic sequences that form the nanoclews).
In vitro, ex vivo and in vivo delivery of purified base editor protein complex molecules loaded with the desired sgRNA(s) and associated with gold nanoparticles (AuNPs) (see for example, Mout R. et al. (2017) ACS Nano 11: 2452-2458).
In vitro delivery of purified base editor protein complex molecules loaded with the desired sgRNA(s) via induced transduction by osmocytosis and propanebetaine (iTOP) (see for example, D'Astolfo D. et al. (2015) Cell 161: 674-690).
In vivo hydrodynamic delivery of (i) DNA plasmid(s) coding for the base editor protein complex and the desired sgRNA(s), or (ii) purified base editor protein complex molecules loaded with the desired sgRNA(s), into the blood stream of animal models. This approach has only been used in mice and, owing to its harsh and painful nature, cannot be used in humans (see for example, Yin H. et al. (2014) Nat. Biotechnol. 32: 551-553; Guan Y. et al. (2016) EMBO Mol. Med. 8: 477-488; Xue W. et al. (2014) Nature 514: 380-384).
Direct needle-based in vitro and ex vivo microinjection of (i) DNA plasmid(s) coding for the base editor protein complex and the desired sgRNA(s), (ii) mRNA(s) coding for the base editor protein complex and the desired sgRNA(s), or (iii) purified base editor protein complex molecules loaded with the desired sgRNA(s), into either the nucleus or the cytoplasm of individual cells. So far this approach has been used for rodent, monkey, sheep and porcine cells (see for example, Yang H. et al. (2013) Cell 154: 1370-1379; Niu Y. et al. (2014) Cell 156: 836-843; Crispo M. et al. (2015) PLoS One 10: e0136690; Chuang C. K. et al. (2017) Anim. Biotechnol. 28: 174-181).
Temperature
AID orthologs from bony and cartilaginous fish, as well as PmCDA1, are cold-adapted enzymes (Dancyger et al., 2012; Quinlan et al., 2017; Holland et al., 2018). Specifically, PmCDA1 exhibits optimal activity at 14.5°C in vitro (Quinlan et al., 2017). Moreover, all CDA1L1 enzymes from sea lamprey as well as AID from zebrafish (Dr-AID) or channel catfish (Ip-AID) are also cold adapted, exhibiting an optimal temperature of 14-22°C and 20-25°C, respectively, while the human enzyme (Hs-AID) prefers 30-37°C (Dancyger A. et al., (2012) FASEB J. 26(4): 1517-1525; Holland S. et al., (2018) Proc. Natl. Acad. Sci. U S A 115(14): E3211-E3220; Quinlan E. M. et al., (2017) Mol. Cell Biol. 37(20): e00077-1). Although PmCDA1 prefers low temperatures, it has been shown to be highly active at 37°C in E. coli (Banno, S. et al., (2018) Nature Microbiology, 3(4): 423-429) and given its small size it is expected to be active at even higher temperatures.
Timing of base editing protocol
The inventors have found that methods of the invention permit a short protocol time. A minimum of about 20 hours is possible from transforming the cells up to sending the PCR-amplified DNA for sequencing.
Re-streaking
A short protocol sometimes has the disadvantage of generating some mixed wild type/mutant genotypes. However, this problem is readily overcome by re-streaking on plates with base editing (dGeoTarget-AID, dThermoTarget-AID) or counter-selection conditions (AcrGeoTarget-AID, AcrThermoTarget-AID). A re-streaking protocol can provide higher editing efficiency and a wider window of base editing activity.
Base editing in cells
Advantageously, the present invention is of broad applicability and expression vectors of the present invention as described herein may be employed in any genetically tractable organism which can be transformed with the expression vectors. Where the invention involves the direct exposure of the organism to one or more individual components of the base editing systems of the invention, e.g. direct exposure to gRNA or to ribonucleoprotein, then the invention may be deployed to any suitable host cell which can be induced to take up the base editing component.
Appropriate cells for base editing may be prokaryotic or eukaryotic. In particular, commonly used host cells may be selected for use in accordance with the present invention including prokaryotic or eukaryotic cells which are genetically accessible and which can be cultured, for example prokaryotic cells, fungal cells, plant cells and animal cells including human cells (but not embryonic stem cells). Preferably, host cells will be selected from a prokaryotic cell, a fungal cell, a plant cell, a protist cell or an animal cell.
Where the host cell is a prokaryotic cell it may be a bacterium, e.g. an Escherichia coli cell.
The base editors and any of the base editing systems of the invention described herein may be used to modify genomes of bacterial cells. The bacteria may be thermophilic bacteria, for example bacteria selected from: Acidithiobacillus species including Acidithiobacillus caldus; Aeribacillus species including Aeribacillus pallidus; Alicyclobacillus species including Alicyclobacillus acidocaldarius, Alicyclobacillus acidoterrestris, Alicyclobacillus cycloheptanicus, Alicyclobacillus hesperidum; Anoxybacillus species including Anoxybacillus caldiproteolyticus, Anoxybacillus flavithermus, Anoxybacillus rupiensis, Anoxybacillus tepidamans; Bacillus species including Bacillus caldolyticus, Bacillus caldotenax, Bacillus caldovelox, Bacillus coagulans, Bacillus clausii, Bacillus licheniformis, Bacillus methanolicus, Bacillus smithii including Bacillus smithii ET 138, Bacillus subtilis, Bacillus thermocopriae, Bacillus thermolactis, Bacillus thermoamylovorans, Bacillus thermoleovorans; Caldibacillus species including Caldibacillus debilis; Caldicellulosiruptor species including Caldicellulosiruptor bescii, Caldicellulosiruptor hydrothermalis, Caldicellulosiruptor kristjanssonii, Caldicellulosiruptor kronotskyensis, Caldicellulosiruptor lactoaceticus, Caldicellulosiruptor obsidiansis, Caldicellulosiruptor owensensis, Caldicellulosiruptor saccharolyticus; Clostridium species including Clostridium clariflavum, Clostridium straminisolvens, Clostridium tepidiprofundi, Clostridium thermobutyricum, Clostridium thermocellum, Clostridium thermosuccinogenes, Clostridium thermopalmarium; Deinococcus species including Deinococcus cellulosilyticus, Deinococcus deserti, Deinococcus geothermalis, Deinococcus murrayi, Deinococcus radiodurans; Defluviitalea species including Defluviitalea phaphyphila; Desulfotomaculum species including Desulfotomaculum carboxydivorans, Desulfotomaculum nigrificans, Desulfotomaculum salinum, Desulfotomaculum solfataricum; Desulfurella species including Desulfurella acetivorans; Desulfurobacterium species including Desulfurobacterium thermolithotrophum; Geobacillus species including Geobacillus icigianus, Geobacillus caldoxylosilyticus, Geobacillus jurassicus, Geobacillus galactosidasius, Geobacillus kaustophilus, Geobacillus lituanicus, Geobacillus stearothermophilus, Geobacillus subterraneus, Geobacillus thermantarcticus, Geobacillus thermocatenulatus, Geobacillus thermodenitrificans, Geobacillus thermoglucosidans, Geobacillus thermoleovorans, Geobacillus toebii, Geobacillus uzenensis, Geobacillus vulcanii, Geobacillus zalihae; Hydrogenobacter species including Hydrogenobacter thermophilus; Hydrogenobaculum species including Hydrogenobaculum acidophilum; Ignavibacterium species including Ignavibacterium album; Lactobacillus species including Lactobacillus bulgaricus, Lactobacillus delbrueckii, Lactobacillus ingluviei, Lactobacillus thermotolerans; Marinithermus species including Marinithermus hydrothermalis; Moorella species including Moorella thermoacetica; Oceanithermus species including Oceanithermus desulfurans, Oceanithermus profundus; Paenibacillus species including Paenibacillus sp. J2, Paenibacillus marinum, Paenibacillus thermoaerophilus; Persephonella species including Persephonella guaymasensis, Persephonella hydrogeniphila, Persephonella marina; Rhodothermus species including Rhodothermus marinus, Rhodothermus obamensis, Rhodothermus profundi; Sulfobacillus species including Sulfobacillus acidophilus; Sulfurihydrogenibium species including Sulfurihydrogenibium azorense, Sulfurihydrogenibium kristjanssonii, Sulfurihydrogenibium rodmanii, Sulfurihydrogenibium yellowstonense; Symbiobacterium species including Symbiobacterium thermophilum, Symbiobacterium toebii; Thermoanaerobacter species including Thermoanaerobacter brockii, Thermoanaerobacter ethanolicus, Thermoanaerobacter italicus, Thermoanaerobacter kivui, Thermoanaerobacter marianensis, Thermoanaerobacter mathranii, Thermoanaerobacter pseudoethanolicus, Thermoanaerobacter wiegelii; Thermoanaerobacterium species including Thermoanaerobacterium aciditolerans, Thermoanaerobacterium aotearoense, Thermoanaerobacterium ethanolicus, Thermoanaerobacterium pseudoethanolicus, Thermoanaerobacterium saccharolyticum, Thermoanaerobacterium thermosaccharolyticum, Thermoanaerobacterium xylanolyticum; Thermobacillus species including Thermobacillus composti, Thermobacillus xylanilyticus; Thermocrinis species including Thermocrinis albus, Thermocrinis ruber; Thermodesulfatator species including Thermodesulfatator atlanticus, Thermodesulfatator autotrophicus, Thermodesulfatator indicus; Thermodesulfobacterium species including Thermodesulfobacterium commune, Thermodesulfobacterium hydrogeniphilum; Thermodesulfobium species including Thermodesulfobium narugense; Thermodesulfovibrio species including Thermodesulfovibrio aggregans, Thermodesulfovibrio thiophilus, Thermodesulfovibrio yellowstonii; Thermosipho species including Thermosipho africanus, Thermosipho atlanticus, Thermosipho melanesiensis; Thermotoga species including Thermotoga maritima, Thermotoga neapolitana, Thermotoga sp. RQ7; Thermovibrio species including Thermovibrio ammonificans, Thermovibrio ruber; Thermovirga species including Thermovirga lienii; and Thermus species including Thermus aquaticus, Thermus caldophilus, Thermus flavus, Thermus scotoductus, Thermus thermophilus; Thiobacillus neapolitanus.
The bacteria may be mesophilic and selected from any of: Acidithiobacillus species including Acidithiobacillus caldus; Actinobacillus species including Actinobacillus succinogenes; Anaerobiospirillum species including Anaerobiospirillum succiniciproducens; Bacillus species including Bacillus alcaliphilus, Bacillus amyloliquefaciens, Bacillus circulans, Bacillus cereus, Bacillus clausii, Bacillus firmus, Bacillus halodurans, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus subtilis, Bacillus thuringiensis; Basfia species including Basfia succiniciproducens; Brevibacillus species including Brevibacillus brevis, Brevibacillus laterosporus; Clostridium species including Clostridium acetobutylicum, Clostridium autoethanogenum, Clostridium beijerinckii, Clostridium carboxidivorans, Clostridium cellulolyticum, Clostridium ljungdahlii, Clostridium pasteurianum, Clostridium perfringens, Clostridium ragsdalei, Clostridium saccharobutylicum, Clostridium saccharoperbutylacetonicum; Corynebacterium species including Corynebacterium glutamicum; Desulfitobacterium species including Desulfitobacterium dehalogenans,
Desulfitobacterium hafniense; Desulfotomaculum species including Desulfotomaculum acetoxidans, Desulfotomaculum gibsoniae, Desulfotomaculum reducens, Desulfotomaculum ruminis; Enterobacter species including Enterobacter asburiae; Enterococcus species including Enterococcus faecalis; Escherichia species including Escherichia coli; Lactobacillus species including Lactobacillus acidophilus, Lactobacillus amylophilus, Lactobacillus amylovorus, Lactobacillus animalis, Lactobacillus arizonensis, Lactobacillus bavaricus, Lactobacillus brevis, Lactobacillus buchneri, Lactobacillus bulgaricus, Lactobacillus casei, Lactobacillus coryniformis, Lactobacillus crispatus, Lactobacillus curvatus, Lactobacillus delbrueckii, Lactobacillus fermentum, Lactobacillus gasseri, Lactobacillus helveticus, Lactobacillus johnsonii, Lactobacillus pentosus, Lactobacillus plantarum, Lactobacillus reuteri, Lactobacillus rhamnosus, Lactobacillus sakei, Lactobacillus salivarius, Lactobacillus sanfranciscensis; Mannheimia species including Mannheimia succiniciproducens; Paenibacillus species including Paenibacillus alvei, Paenibacillus beijingensis, Paenibacillus borealis, Paenibacillus dauci, Paenibacillus durus, Paenibacillus graminis, Paenibacillus larvae, Paenibacillus lentimorbus, Paenibacillus macerans, Paenibacillus mucilaginosus, Paenibacillus odorifer, Paenibacillus polymyxa, Paenibacillus stellifer, Paenibacillus terrae, Paenibacillus wulumuqiensis; Pediococcus species including Pediococcus acidilactici, Pediococcus claussenii, Pediococcus ethanolidurans, Pediococcus pentosaceus; Salmonella typhimurium; Sporolactobacillus species including Sporolactobacillus inulinus, Sporolactobacillus laevolacticus; Staphylococcus aureus; Streptococcus species including Streptococcus agalactiae, Streptococcus bovis, Streptococcus equisimilis, Streptococcus faecalis, Streptococcus mutans, Streptococcus oralis, Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus salivarius, Streptococcus 
thermophilus, Streptococcus sobrinus, Streptococcus uberis; Streptomyces species including Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, Streptomyces lividans, Streptomyces parvulus, Streptomyces venezuelae, Streptomyces vinaceus; Tetragenococcus species including Tetragenococcus halophilus; and Zymomonas species including Zymomonas mobilis.
The base editors and methods of the invention may be used to modify the genome of yeast or fungi, which may be mesophilic, wherein the fungus is selected from: an Aspergillus species including, but not limited to, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae and Aspergillus terreus; more preferably the Aspergillus species is Aspergillus nidulans or Aspergillus niger. Alternatively, the mesophilic fungal species could be a Candida species. Alternatively, the yeast or fungal species may be thermophilic, e.g. the fungi or yeast may be selected from: Aspergillus species including Aspergillus fumigatus, Aspergillus nidulans, Aspergillus terreus, Aspergillus versicolor; Canariomyces species including Canariomyces thermophila; Chaetomium species including Chaetomium mesopotamicum, Chaetomium thermophilum; Candida species including Candida bovina, Candida sloofii, Candida thermophila, Candida tropicalis, Candida krusei (= Issatchenkia orientalis); Cercophora species including Cercophora coronata, Cercophora septentrionalis; Coonemeria species including Coonemeria aegyptiaca; Corynascus species including Corynascus thermophilus; Geotrichum species including Geotrichum candidum; Kluyveromyces species including Kluyveromyces fragilis, Kluyveromyces marxianus; Malbranchea species including Malbranchea cinnamomea, Malbranchea sulfurea; Melanocarpus species including Melanocarpus albomyces; Myceliophthora species including Myceliophthora fergusii, Myceliophthora thermophila; Mycothermus species including Mycothermus thermophilus (= Scytalidium thermophilum/Torula thermophila); Myriococcum species including Myriococcum thermophilum; Paecilomyces species including Paecilomyces thermophila; Remersonia species including Remersonia thermophila; Rhizomucor species including Rhizomucor pusillus, Rhizomucor tauricus; Saccharomyces species including Saccharomyces cerevisiae; Schizosaccharomyces species including Schizosaccharomyces pombe; Scytalidium species including Scytalidium thermophilum; Sordaria species including Sordaria thermophila; Thermoascus species including Thermoascus aurantiacus, Thermoascus thermophilus; Thermomucor species including Thermomucor indicae-seudaticae; and Thermomyces species including Thermomyces ibadanensis, Thermomyces lanuginosus.
More particularly, the base editors which have been made by the inventors are as follows:
Intended for use in bacteria (and tested in E. coli):
1. dThermoTarget-AID (a fusion of: dead ThermoCas9 - a 121 amino acid linker - PmCDA1 - a 2 amino acid linker - UGI - a 3 amino acid degradation tag).
2. dGeoTarget-AID (a fusion of dead GeoCas9 - a 121 amino acid linker - PmCDA1 - a 2 amino acid linker - UGI - a 3 amino acid degradation tag).
3. AcrThermoTarget-AID (a fusion of active ThermoCas9 - a 121 amino acid linker - PmCDA1 - a 2 amino acid linker - UGI - a 3 amino acid degradation tag) plus expression of the anti-CRISPR protein AcrIIC1Nme as a separate gene in the same plasmid.
4. AcrGeoTarget-AID (a fusion of active GeoCas9 - a 121 amino acid linker - PmCDA1 - a 2 amino acid linker - UGI - a 3 amino acid degradation tag) plus expression of the anti-CRISPR protein AcrIIC1Nme as a separate gene in the same plasmid.
A single-plasmid approach is used: in E. coli, for the dThermoCas9 and dGeoCas9 base editors, the polynucleotide fusion (= dCas9+PmCDA1+UGI+LVA) and the sgRNA are in the same medium copy number plasmid (pACYC184).
Intended for use in mammalian, preferably human cells (and tested in HEK293T):
1. nThermoBE4 (a fusion of APOBEC1 - a 32 amino acid linker - nickase ThermoCas9 - a 10 amino acid linker - UGI - a 10 amino acid linker - UGI - a 4 amino acid linker - SV40 NLS).
2. nThermoTarget-AID (a fusion of nickase ThermoCas9 - a 104 amino acid linker - PmCDA1 - a 10 amino acid linker - UGI).
In HEK293T cells, the polynucleotide fusion (= rAPOBEC1+nThermoCas9+UGI+UGI) and the sgRNA are in the same plasmid (pCMV).
In all cases, the target sequences (protospacers) are on the genome.
For the AcrIIC1:ThermoCas9 and GeoCas9 base editors, the polynucleotide fusion (= Cas9+PmCDA1+UGI+LVA), the sgRNA and the acrIIC1Nme gene are in the same medium copy number plasmid (pACYC184).
EXAMPLES
Example 1: “dGeoTarget-AID” base-editing system
For the design and cloning of the vector that carries and expresses the “dGeoTarget-AID” system, the 3’-end (minus the stop codon) of the deactivated GeoCas9 (D8A, H582A) endonuclease gene from Geobacillus stearothermophilus (Harrington et al., (2017a) supra) [SEQ ID NO: 6] was fused to a 363 bp long (SH3 and 3xFLAG tag) linker sequence, which was in turn fused to the Petromyzon marinus cytosine deaminase (PmCDA1) gene (minus the stop codon) (Nishida et al., (2016) Science Vol 353 Issue 6305, aaf8729), which was in turn fused to a 6 bp (SR) linker sequence, which was in turn fused to the uracil DNA glycosylase inhibitor (UGI) gene (minus the stop codon) from bacteriophage PBS2 (Zhigang et al., (1991) Gene 99: 31 - 37), which was in turn fused to the LVA protein degradation tag sequence (Andersen et al., (1998) Appl. Environ. Microbiol. 64: 2240 - 2246) followed by a stop codon. The genetic sequence of the dGeoCas9-PmCDA1-UGI-LVA fusion was cloned into a low copy number plasmid (pACYC184) under the transcriptional control of a synthetic, IPTG-inducible, tetracycline promoter (Ptet-lac; Ptet combined with lac operator). Also cloned into the same plasmid were a sgRNA-expressing module transcribed from the strong, constitutive promoter PJ23119 and the lacI gene constitutively expressed from its native promoter (PlacI). LacI inhibits the expression of the dGeoCas9-PmCDA1-UGI-LVA fusion by binding to the lac operator sequence upstream of the corresponding genetic sequence, while addition of IPTG blocks this binding and permits the expression of the dGeoCas9-PmCDA1-UGI-LVA fusion. The resulting pdGeoTarget-AID vector was employed as the basis for the construction of the experimental vectors described further on.
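By way of illustration only, the reading-frame arithmetic behind the linker lengths can be checked with a short Python sketch. The `Part` class, the `linker_aa` helper and the part labels are hypothetical names introduced here for illustration and are not part of the described constructs. The only values taken from the description are the 363 bp and 6 bp linker lengths, which correspond to the 121 and 2 amino acid linkers listed for the dGeoTarget-AID fusion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Part:
    """One element of the fusion (illustrative container, not from the source)."""
    name: str
    length_bp: Optional[int] = None  # None where no length is stated in the text


def linker_aa(length_bp: int) -> int:
    """Amino acids encoded by an in-frame linker of the given length in bp."""
    if length_bp % 3 != 0:
        raise ValueError("linker length is not a multiple of 3; frame would shift")
    return length_bp // 3


# Order of parts as described for the dGeoCas9-PmCDA1-UGI-LVA fusion.
D_GEO_TARGET_AID: List[Part] = [
    Part("dGeoCas9 (D8A, H582A)"),
    Part("SH3 + 3xFLAG linker", 363),
    Part("PmCDA1"),
    Part("SR linker", 6),
    Part("UGI"),
    Part("LVA degradation tag"),
]

# Linker lengths in amino acids, keyed by part name.
LINKERS_AA = {
    p.name: linker_aa(p.length_bp)
    for p in D_GEO_TARGET_AID
    if p.length_bp is not None
}
```

Both linker lengths are multiples of three, so each downstream element stays in the same reading frame as the Cas9 gene.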
The targeted base-editing efficiency of the dGeoTarget-AID system was examined. The E. coli DH10B_gfp strain was employed as the main experimental strain. The DH10B_gfp strain was constructed by integrating a gfp gene into the genome of the E. coli DH10B strain. Six spacers, designed to target protospacers within the sequence of the genomically integrated gfp gene, were incorporated separately into the 5’-end of the sgRNA module of the pdGeoTarget-AID vector (see Table 1). The selected protospacers were flanked by PAMs for which GeoCas9 was previously demonstrated to have variable levels of preference (Harrington et al., (2017a) supra). Upon transformation of the resulting pdGeoTarget-AID_BE-G1/2/3/4/5/6 vectors into chemically competent E. coli DH10B_gfp cells, the expression of the dGeoTarget-AID fusion was induced with the addition of 50 µM IPTG during recovery and plating in order to trigger base-editing. The next day, several colonies were streaked on selection “master” plates with no inducers and were simultaneously screened for base-editing through Q5 PCR amplification of the targeted region with genome specific primers, purification of the amplified fragments, Sanger sequencing and high-throughput in silico analysis of the results employing a variation of the on-line tool “EditR” (Kluesner et al., (2018) CRISPR J. 1: 239 - 250). Each base-editing experiment was performed in at least three biological replicates.
The expression of the dGeoTarget-AID fusion was induced with the addition of 50 µM IPTG during recovery and plating, while the sgRNA module was constitutively transcribed. Figure 5A shows the percentages resulting from high-throughput in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR” and setting as threshold p ≤ 0.05. Single colonies were obtained from base-editing experiments of at least three independent biological replicates. No base-editing was observed in any C-position when pdGeoTarget-AID_BE-G5/6 vectors were applied. Streaking: several colonies (colonies number 23-27 for spacer BE-G1, and 23-28 for spacer BE-G2), which were screened as completely unedited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pdGeoTarget-AID system. A single colony per streak was screened by PCR and Sanger sequencing (colonies with numbers 23’-27’ for spacer BE-G1, and 23’-28’ for spacer BE-G2), and the base-editing efficiencies are reported in the heatmaps. Figure 5B shows two single colonies (colony number 2 for spacer BE-G1, and 14 for spacer BE-G2) which, previously screened as partially edited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pdGeoTarget-AID system. Several single colonies per streak were screened by PCR and Sanger sequencing (colonies 2a-2r for spacer BE-G1, and 14a-14s for spacer BE-G2), and the base-editing efficiencies are reported in the heatmaps.
C·G to T·A targeted base-editing with the dGeoTarget-AID system reached editing efficiencies of up to 100% within a single colony. After only one night of incubation, 4 out of 6 spacers resulted in base-editing within or immediately upstream of the protospacer region (Figure 5A). The base-editing efficiency (% mixed or clean edited colonies/screened colonies) when targeting protospacers with the most preferred PAMs (as previously reported by Harrington et al., 2017) was 7% (2/27), 25% (7/28), 11% (2/18), and 87% (27/31) for spacers BE-G1, BE-G2, BE-G3, and BE-G4, respectively. No editing was observed with the other 2 spacers (BE-G5 and BE-G6), due to the low preference of dGeoCas9 for their corresponding PAM (Harrington et al., (2017a) supra). The dGeoTarget-AID preferentially edited Cs at the PAM-distal end of the protospacer, similar to the commonly used SpyCas9-base editors. However, a much broader window of activity (from -10 to -24 positions) was observed, extending the target spectrum from 4-6 bp to 15 bp. Intriguingly, base-editing with BE-G2 preferentially led to the conversion of C-24, just outside of the protospacer region, reaching 100% efficiency. Possibly this base is part of the R-loop formed by the dGeoCas9:DNA complex and is thus accessible for deamination by the PmCDA1. Together, these results suggest that the dGeoTarget-AID editor mediates rapid and efficient base-editing in a wide activity window, increasing the freedom for generation of stop codons or multiple nucleotide substitutions in the target gene.
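The reported efficiencies follow directly from the colony counts, and the window width follows from counting the positions inclusively. A minimal sketch of that arithmetic is given below; the function names and the dictionary layout are illustrative assumptions, the counts are those stated above, and percentages are rounded to the nearest integer.

```python
def efficiency_percent(edited: int, screened: int) -> int:
    """Percentage of screened colonies that were mixed or clean edited."""
    if screened <= 0:
        raise ValueError("at least one colony must be screened")
    return round(100 * edited / screened)


def window_width(first: int, last: int) -> int:
    """Number of positions in an activity window, both ends included."""
    return abs(last - first) + 1


# (edited colonies, screened colonies) per spacer, as reported for dGeoTarget-AID.
D_GEO_COUNTS = {
    "BE-G1": (2, 27),
    "BE-G2": (7, 28),
    "BE-G3": (2, 18),
    "BE-G4": (27, 31),
}

D_GEO_EFFICIENCY = {
    spacer: efficiency_percent(edited, screened)
    for spacer, (edited, screened) in D_GEO_COUNTS.items()
}

# One-step activity window reported for dGeoTarget-AID: positions -10 to -24.
D_GEO_WINDOW_BP = window_width(-10, -24)
```

The same two helpers reproduce the figures given for the other editors, for example a window from -5 to -24 spans 20 positions.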
Most of the screened colonies were not clean base-editing mutants, as the majority exhibited mixed wild-type/mutant genotypes. We therefore investigated whether extended incubation of cells under conditions that induce base-editing would result in the occurrence of clean mutant cells. For this purpose, we selected from the “master” plates the streaks from the colonies which were screened to be completely unedited (Figure 5A, colonies with numbers 23-27 for spacer BE-G1 and 23-28 for spacer BE-G2) and streaked them on plates supplemented with 1 mM IPTG (a higher IPTG concentration than the 50 µM employed for the recovery and plating immediately after transformation). Upon screening of newly formed single colonies, many were shown to be mixed or clean mutants in multiple positions (Figure 5A). Additionally, selected from the “master” plates were streaks from colonies which were screened to have the highest number of partial edits (Figure 5A, colony 2 for spacer BE-G1 and colony 14 for spacer BE-G2) and these were streaked on plates with 1 mM IPTG, to investigate whether this could further increase the occurrence of base-edited colonies with clean genotypes. Indeed, 50% (9/18) and 95% (18/19) of the screened single colonies for BE-G1 and BE-G2, respectively, showed one or multiple (up to 6) cytosines 100% edited to thymines (Figure 5B). Interestingly, numerous positions that were not edited before (C-9 and C-15 for BE-G1; C-5, C-9, C-11, C-22 for BE-G2) exhibited 100% editing with this second incubation step, further expanding the activity window to positions -5 to -24 (20 bp) (Figure 5B). These observations indicate that extended base-editing conditions massively increase the number of clean mutants and widen the window of activity, not only towards the PAM-distal region but also towards the theoretical PAM-proximal “seed region”.
Altogether, the dGeoTarget-AID base-editing tool can be successfully employed to generate C·G to T·A mutants in E. coli with efficiencies of up to 100% in a single colony, and an activity window from -10 to -24 positions (15 bp) with 1-step incubation or from -5 to -24 positions (20 bp) with 2-step incubation. This novel base editor with alternative PAM preferences compared to the currently available base-editing systems significantly expands both the targeting scope and the editing window in bacteria.
Example 2: “dThermoTarget-AID” base-editing system
For the design and cloning of the vector that carries and expresses the “dThermoTarget-AID” system, the 3’-end (minus the stop codon) of the deactivated ThermoCas9 (D8A, H582A) endonuclease gene from Geobacillus thermodenitrificans T12 (Mougiakos, Mohanraju, and Bosma et al., (2017) supra) [SEQ ID NO: 8] was fused to a 363 bp long (SH3 and 3xFLAG tag) linker sequence, which was in turn fused to the Petromyzon marinus cytosine deaminase (PmCDA1) gene (minus the stop codon) (Nishida et al., (2016) supra), which was in turn fused to a 6 bp (SR) linker sequence, which was in turn fused to the uracil DNA glycosylase inhibitor (UGI) gene (minus the stop codon) from bacteriophage PBS2 (Zhigang et al., (1991) supra), which was in turn fused to the LVA protein degradation tag sequence (Andersen et al., (1998) supra) followed by a stop codon. The genetic sequence of the dThermoCas9-PmCDA1-UGI-LVA fusion was cloned into a low copy number plasmid (pACYC184) under the transcriptional control of a synthetic, IPTG-inducible, tetracycline promoter (Ptet-lac; Ptet combined with lac operator). In the same plasmid, we cloned a sgRNA-expressing module transcribed from the strong, constitutive promoter PJ23119 and the lacI gene constitutively expressed from its native promoter (PlacI). LacI inhibits the expression of the dThermoCas9-PmCDA1-UGI-LVA fusion by binding to the lac operator sequence upstream of the corresponding genetic sequence, while addition of IPTG blocks this binding and permits the expression of the dThermoCas9-PmCDA1-UGI-LVA fusion. The resulting pdThermoTarget-AID vector was employed as the basis for the construction of the experimental vectors described further on.
The targeted base-editing efficiency of the dThermoTarget-AID system was examined. The E. coli DH10B_gfp strain was employed as the main experimental strain; for its construction, a gfp gene was integrated into the genome of the E. coli DH10B strain. Six spacers, designed to target protospacers within the sequence of the genomically integrated gfp gene, were incorporated separately into the 5’-end of the sgRNA module of the pdThermoTarget-AID vector (see Table 2). The selected protospacers were flanked by PAMs for which ThermoCas9 was previously demonstrated to have variable levels of preference (Mougiakos, Mohanraju, and Bosma et al., (2017) supra). Upon transformation of the resulting pdThermoTarget-AID_BE-T1/2/3/4/5/6 vectors into chemically competent E. coli DH10B_gfp cells, the expression of the dThermoTarget-AID fusion was induced with the addition of 50 µM IPTG during recovery and plating in order to trigger base-editing. The next day, several colonies were streaked on selection “master” plates with no inducers and were simultaneously screened for base-editing through Q5 PCR amplification of the targeted region with genome specific primers, purification of the amplified fragments, Sanger sequencing and high-throughput in silico analysis of the results employing a variation of the on-line tool “EditR” (Kluesner et al., (2018) supra). Each base-editing experiment was performed in at least three biological replicates.
The expression of the dThermoTarget-AID fusion was induced with the addition of 50 µM IPTG during recovery and plating, while the sgRNA module was constitutively transcribed. Figure 6A shows the percentages resulting from high-throughput in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR” and setting as threshold p ≤ 0.05. Single colonies were obtained from base-editing experiments of at least three independent biological replicates.
Streaking
Several colonies (colonies number 34-41 for spacer BE-T1, and 36-44 for spacer BE-T4), which were screened as partially edited or completely unedited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pdThermoTarget-AID system. A single colony per streak was screened by PCR and Sanger sequencing (colonies with numbers 34’-41’ for spacer BE-T1, and 36’-44’ for spacer BE-T4), and the base-editing efficiencies are reported in the heatmaps. Figure 6B shows two single colonies (colony number 13 for spacer BE-T1, and 15 for spacer BE-T4) which, previously screened as partially edited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pdThermoTarget-AID system. Several single colonies per streak were screened by PCR and Sanger sequencing (colonies 13a-13j for spacer BE-T1, and 15a-15j for spacer BE-T4), and the base-editing efficiencies are reported in the heatmaps.
Demonstrated for the first time is C·G to T·A targeted base-editing with the described dThermoTarget-AID system, reaching base-editing efficiencies of up to 100% within a single colony. After only one night of incubation, all tested spacers (6/6) resulted in base-editing within or immediately upstream of the protospacer region (Figure 6A). The base-editing efficiency (% mixed or clean edited colonies/screened colonies) was 98% (40/41), 23% (10/44), 14% (3/21), 57% (25/44), 83% (30/36), and 11% (3/28) for BE-T1, BE-T2, BE-T3, BE-T4, BE-T5, and BE-T6, respectively. The dThermoTarget-AID preferentially edited Cs at the PAM-distal end of the protospacer, similar to the commonly used SpyCas9-base editors. However, a much broader window of activity (from -6 to -27 positions) was observed, extending the target spectrum from 4-6 bp to 22 bp.
Interestingly, base-editing with BE-T1 and BE-T4 led to the conversion of C-27 and C-24, just outside of the protospacer region, reaching efficiencies of 35% and 38%, respectively. These outcomes suggest that the dThermoTarget-AID editor mediates rapid and efficient base-editing in a wide activity window, increasing the freedom for generation of stop codons or multiple nucleotide substitutions in the target gene.
Most of the screened colonies were not clean base-editing mutants, as the majority exhibited mixed wild-type/mutant genotypes. Therefore, extended incubation of cells under conditions that induce base-editing was carried out, to see if this would result in the occurrence of clean mutant cells. For this purpose, the streaks from the colonies which were screened to be either completely unedited or partially edited (Figure 6A, colonies with numbers 34-41 for spacer BE-T1 and 36-44 for spacer BE-T4) were selected from the “master” plates and streaked on plates supplemented with 1 mM IPTG (a higher IPTG concentration than the 50 µM employed for the recovery and plating immediately after transformation). Upon screening of newly formed single colonies, most were shown to be mixed or clean mutants in multiple positions (Figure 6A). Additionally, selected from the “master” plates were the streaks from partially edited colonies (Figure 6A, colony 13 for spacer BE-T1 and colony 15 for spacer BE-T4) and these were streaked on plates with 1 mM IPTG, to investigate whether the occurrence of base-edited colonies with clean genotypes could be further increased. Indeed, 100% (10/10) and 50% (5/10) of the screened single colonies for BE-T1 and BE-T4, respectively, showed one or multiple (up to 5) cytosines 100% edited to thymines (Figure 6B). Notably, various positions that were not edited before (C-5, C-9, C-13, C-14, C-18, C-20, and C-21 for BE-T1; C-9, C-10, C-14, and C-24 for BE-T4) exhibited editing with this second incubation step, further expanding the activity window to positions -5 to -27 (23 bp) (Figure 6B). These observations indicate that extended base-editing conditions massively increase the number of clean mutants and widen the window of activity, not only towards the PAM-distal region but also towards the theoretical PAM-proximal “seed region”.
All in all, the dThermoTarget-AID base-editing tool can be used to generate C·G to T·A mutants in E. coli with efficiencies of up to 100% in single colonies, and an activity window from -6 to -27 positions (22 bp) with 1-step incubation or from -5 to -27 positions (23 bp) with 2-step incubation. Akin to dGeoTarget-AID, this novel base editor with unique PAM preferences compared to the currently available base-editing systems significantly expands both the targeting scope and the editing window in bacteria.
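To illustrate how an activity window translates into candidate target cytosines, the following Python sketch lists the Cs that fall inside a given window. It rests on assumptions that are not stated in this description: the input is the protospacer-containing strand written 5’ to 3’ and ending at the base immediately adjacent to the PAM, and position -1 is that adjacent base, with positions counted away from the PAM. The function name and the example sequence are hypothetical and do not correspond to any spacer in Tables 1 to 3.

```python
from typing import List


def cytosines_in_window(seq_5p_to_pam: str, near: int, far: int) -> List[int]:
    """List PAM-relative positions (negative numbers) of Cs in a window.

    seq_5p_to_pam: DNA written 5'->3', last base directly next to the PAM
                   (assumed convention: that base is position -1).
    near, far:     window limits as positive distances from the PAM,
                   e.g. 6 and 27 for a window from -6 to -27.
    """
    if near < 1 or far < near:
        raise ValueError("expected 1 <= near <= far")
    seq = seq_5p_to_pam.upper()
    positions = []
    for distance in range(near, min(far, len(seq)) + 1):
        if seq[-distance] == "C":
            positions.append(-distance)
    return positions


# Hypothetical 30-nt stretch (upstream flank plus protospacer), not from the source.
EXAMPLE = "ACGTCCATGGCTAAGTCGATTGCAACGTTA"

ONE_STEP = cytosines_in_window(EXAMPLE, 6, 27)   # window -6 to -27
TWO_STEP = cytosines_in_window(EXAMPLE, 5, 27)   # window -5 to -27
```

With this example sequence, widening the window from -6 to -5 adds one further candidate cytosine, which mirrors how the 2-step incubation enlarges the set of editable positions.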
Example 3: “AcrGeoTarget-AID” base-editing system
In order to increase the efficiency of the dGeoTarget-AID system to generate clean base-edited mutants, without the requirement for additional streaking steps, a system was devised that combines base-editing and counter-selection. The AcrIIC1Nme anti-CRISPR protein was previously shown (Harrington et al., (2017b) supra) to block GeoCas9 from introducing a double-stranded DNA break in a targeted protospacer, by binding stably to its HNH domain, without stopping GeoCas9 from binding to the targeted protospacer. An AcrIIC1Nme:GeoCas9-PmCDA1-UGI-LVA system was created (hereafter denoted as “AcrGeoTarget-AID”). Induction of AcrIIC1Nme expression in this system allows the GeoCas9-PmCDA1-UGI-LVA fusion to perform only base-editing, whereas stopping the induction of AcrIIC1Nme expression results in counter-selection of the unedited cells by the active GeoCas9 component of the GeoCas9-PmCDA1-UGI-LVA fusion. The dgeocas9 gene in the previously described pdGeoTarget-AID plasmids was replaced with the active geocas9 gene, whilst simultaneously cloning into the same vectors the acrIIC1Nme gene under the transcriptional control of the rhamnose-inducible promoter (Prha). Upon transformation of the 6 resulting pAcrGeoCas9Target-AID_BE-G1/2/3/4/5/6 vectors into E. coli DH10B_gfp cells, base-editing (50 µM IPTG and 0.2% L-rhamnose) and counter-selection (1 mM IPTG) conditions were applied during recovery (R) and plating (P), respectively. As controls, no induction conditions or base-editing conditions (50 µM IPTG and 0.2% L-rhamnose) were employed only during plating. The next day, several colonies were streaked on selection “master” plates with no inducers and were simultaneously screened for base-editing through Q5 PCR amplification of the targeted region with genome specific primers, purification of the amplified fragments, Sanger sequencing and high-throughput in silico analysis of the results employing a variation of the on-line tool “EditR” (Kluesner et al., (2018) supra), previously developed in our lab. Each base-editing experiment was performed in at least three biological replicates.
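The two-inducer logic described above can be summarised as a simple truth table. The sketch below is a deliberately simplified boolean model: it ignores inducer concentrations, leaky expression and timing, and the function name and returned labels are illustrative assumptions. IPTG is taken to switch on the GeoCas9-PmCDA1-UGI-LVA fusion (Ptet-lac) and L-rhamnose to switch on AcrIIC1Nme (Prha), as described for the AcrGeoTarget-AID vectors.

```python
def acr_geo_mode(iptg: bool, rhamnose: bool) -> str:
    """Expected operating mode of the AcrGeoTarget-AID system (simplified).

    iptg:     True if IPTG is present, so the active GeoCas9 fusion is expressed.
    rhamnose: True if L-rhamnose is present, so AcrIIC1Nme is expressed.
    """
    if not iptg:
        # Without the editor fusion there is neither deamination nor cleavage.
        return "no induction"
    if rhamnose:
        # AcrIIC1Nme blocks cleavage, while DNA binding and deamination continue.
        return "base-editing"
    # Active GeoCas9 without the anti-CRISPR cleaves targets that still match.
    return "counter-selection"


# Conditions named in the description, expressed as (IPTG, L-rhamnose).
CONDITIONS = {
    "50 uM IPTG + 0.2% L-rhamnose": (True, True),
    "1 mM IPTG": (True, False),
    "no inducers": (False, False),
}

MODES = {label: acr_geo_mode(*flags) for label, flags in CONDITIONS.items()}
```

In this model, the recovery step corresponds to the base-editing mode and the plating step to the counter-selection mode.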
Shown for the first time is C·G to T·A base-editing with an active type II CRISPR nuclease, reaching editing efficiencies of up to 100%. Substantial base-editing efficiencies were observed for the constructs expressing the BE-G1, BE-G2 and BE-G4 spacers, while minor base-editing efficiencies were observed for the constructs expressing the other three spacers (BE-G3, BE-G5, BE-G6) (see Table 3). AcrGeoTarget-AID-mediated base-editing coupled to counter-selection increases the efficiency compared to the dGeoTarget-AID for spacers BE-G1 and BE-G2, but that was not the case for spacer BE-G4. This phenomenon can be explained by the fact that BE-G4 contains Cs only at either the extreme PAM-proximal end (C-2, C-3) or the extreme PAM-distal end (C-19, C-20), where base-editing events are unlikely or very likely to happen, respectively. Previous studies (Hsu et al., (2013) Nature Biotechnology 31: 827 - 832) for other Cas9 endonucleases demonstrate that 1 or 2 spacer-protospacer mismatches, especially at the PAM-distal end, are generally more tolerated than multiple consecutive mismatches. So, GeoCas9 probably tolerates spacer-protospacer mismatches at the PAM-distal end, resulting in this case from C-19 and/or C-20 base-editing, and could still cleave the possibly edited genomic target, triggering cell death. Hence, not only the wild type but also the mutant cells were eliminated from the population, resulting in lower base-editing efficiencies for the PAM-distal positions compared to the catalytically deactivated counterpart (dGeoTarget-AID).
Table 3. Base-editing efficiency of AcrGeoTarget-AID with different spacers and induction conditions compared to dGeoTarget-AID.
Figure 7A: base-editing (50 µM IPTG and 0.2% L-rhamnose) and counter-selection (1 mM IPTG) conditions were applied during recovery (R) and plating (P), respectively. As controls, no induction conditions or base-editing conditions (50 µM IPTG and 0.2% L-rhamnose) were employed only during plating. The sgRNA module was always constitutively transcribed. The percentages resulted from high-throughput in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR” and setting as threshold p ≤ 0.05. Single colonies were obtained from base-editing experiments of at least three independent biological replicates. Streaking: several colonies (colonies with numbers 22-27 for BE-G1 P: base-editing; 19-22 for BE-G1 P: counter-selection; 20-24 for BE-G2 P: base-editing; 13 and 14 for BE-G2 P: counter-selection), which were screened as partially edited or completely unedited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pAcrGeoTarget-AID system. A single colony per streak was screened by PCR and Sanger sequencing (colonies with numbers 22’-27’ for BE-G1 P: base-editing; 19’-22’ for BE-G1 P: counter-selection; 20’-24’ for BE-G2 P: base-editing; 13’ and 14’ for BE-G2 P: counter-selection), and the base-editing efficiencies are reported in the heatmaps. Figure 7B shows two single colonies (colony number 7 for spacer BE-G1 P: counter-selection, and 1 for spacer BE-G2 P: counter-selection) which, previously screened as partially edited, were selected from spot-streaked “master” plates and were subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pAcrGeoTarget-AID system. Several colonies were screened by PCR and Sanger sequencing (colonies 7a-7q for spacer BE-G1 P: counter-selection, and 1a-1s for spacer BE-G2 P: counter-selection), and the base-editing efficiencies are reported in the heatmaps.
AcrGeoTarget-AID provided numerous colonies with 100% C·G to T·A conversion of at least one cytosine, even with only 1 hour of base-editing induction (counter-selection case) (Figure 7A), contrary to the results from dGeoTarget-AID for the same targets (Figure 5A). Regarding dGeoTarget-AID, clean base-edited colonies only occurred with spacer BE-G2 in 2/20 screened colonies, whereas induction of AcrGeoTarget-AID led to 10/24 (P: base-editing) and 4/14 (P: counter-selection) colonies with at least one cytosine 100% converted to thymine. Additionally, spacer BE-G1 showed 8/22 (P: counter-selection) colonies with complete deamination, while BE-G4 showed 4/33 (P: base-editing) and 5/32 (P: counter-selection) colonies.
In contrast to dGeoTarget-AID, the editing window of AcrGeoTarget-AID is shifted towards the “seed region”, most probably due to lower spacer-protospacer mismatch tolerance at these positions (Figure 7A). For example, Cs at PAM-proximal positions that had not been edited by dGeoTarget-AID were surprisingly edited up to 100% by the AcrGeoTarget-AID (C-9 for BE-G1 and BE-G2; C-15 for BE-G3; C-3 and C-12 for BE-G6), while editing efficiencies of Cs at the PAM-distal end remained high (C-24 for BE-G2; C-19 and C-20 for BE-G4). The activity window of the AcrGeoTarget-AID was 26 bp (from -3 to -28).
The AcrGeoTarget-AID editor mediates rapid and efficient base-editing in a surprisingly wide activity window (26 bp contrary to 4-6 bp with SpyCas9), enabling the generation of premature stop codons outside the restricted region of the current base editing tools.
It is remarkable that the one-step protocol used in this study is amongst the fastest reported base-editing applications to date. Even though the AcrGeoTarget-AID system generated a substantially higher number of clean C to T mutants than the dGeoTarget-AID system described here for the same targets, it still resulted in colonies with mixed wild-type/mutant genotypes. It was therefore investigated whether extended incubation of cells under conditions that induce counter-selection (1 mM IPTG) would result in the occurrence of more clean mutant cells (Li et al., (2019) Biotechnol Bioeng. 116: 1475 - 1483). For this purpose, the streaks from colonies that had been screened as mainly unedited (Figure 7A, colonies 22-27 for BE-G1 P: base-editing; 19-22 for BE-G1 P: counter-selection; 20-24 for BE-G2 P: base-editing; 13 and 14 for BE-G2 P: counter-selection) were selected from the “master” plates and streaked on plates with counter-selection conditions (1 mM IPTG). Upon screening of newly formed single colonies, many were shown to be mixed or clean mutants in multiple positions (Figure 7A). Additionally, the streaks from colonies that had been screened as partially edited in at least one position (Figure 7A, colony 7 for spacer BE-G1 (P: counter-selection); colony 1 for spacer BE-G2 (P: counter-selection)) were selected from the “master” plates and streaked on counter-selection plates (1 mM IPTG), to investigate whether the occurrence of base-edited colonies with clean genotypes could be increased further. Indeed, the resulting single colonies were screened to have one or multiple (up to 5) 100% edited cytosines, with 89% (17/19), 100% (17/17), 100% (16/16), and 100% (19/19) efficiencies, respectively (Figure 7B). Notably, various positions that were not edited before (C-15 for BE-G1; C-10 and C-22 for BE-G2) exhibited editing with this second incubation step, mostly reaching 100% efficiencies (Figure 7B). These observations denote that extended base-editing conditions massively increase the number of clean base-edited mutants, most of them being edited in multiple positions.
The AcrGeoTarget-AID system can be used to generate C·G to T·A mutants in E. coli with efficiencies of up to 100%, and an activity window from -3 to -28 positions (26 bp) with 1-step incubation or from -9 to -24 positions (16 bp) with 2-step incubation.
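The window sizes quoted above follow from counting PAM-relative positions inclusively. The following Python sketch is illustrative only and is not part of the described method; the function name `window_from_positions` is hypothetical.

```python
def window_from_positions(positions):
    """Return (nearest, farthest, size) for PAM-relative positions such as -3 or -28.

    Positions are negative integers counted away from the PAM, and the
    window size counts both end positions.
    """
    nearest = max(positions)   # closest to the PAM, e.g. -3
    farthest = min(positions)  # farthest from the PAM, e.g. -28
    return nearest, farthest, nearest - farthest + 1


# End positions as stated in the text for 1-step and 2-step incubation.
print(window_from_positions([-3, -28]))
print(window_from_positions([-9, -24]))
```

Passing the full list of edited cytosine positions for a spacer, rather than only the two end positions, gives the same result, since only the extremes are used.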
Example 4: “AcrThermoTarget-AID” base-editing system
Aiming to increase the efficiency of the dThermoTarget-AID system in generating clean base-edited mutants, without the requirement for additional streaking steps, a system that combines base-editing and counter-selection was devised. This is an AcrIIC1Nme:ThermoCas9-PmCDA1-UGI-LVA system (hereafter denoted as “AcrThermoTarget-AID”). It was expected that induction of AcrIIC1Nme expression in this system would allow the ThermoCas9-PmCDA1-UGI-LVA fusion to perform only base editing, while stopping the induction of AcrIIC1Nme expression would result in counter-selection of the unedited cells by the active ThermoCas9 component of the ThermoCas9-PmCDA1-UGI-LVA fusion. The dthermocas9 gene in the previously described pdThermoTarget-AID plasmids was substituted with the active thermocas9 gene, while the acrIIC1Nme gene was simultaneously cloned into the same vectors under the transcriptional control of the rhamnose-inducible promoter (Prha). Upon transformation of the 6 resulting pAcrGeoCas9Target-AID_BE-G1/2/3/4/5/6 vectors in E. coli DH10B _gfp cells, base-editing (50 µM IPTG and 0.2% L-rhamnose) and counter-selection (1 mM IPTG) conditions were applied during recovery (R) and plating (P), respectively. As controls, either no induction conditions or base-editing conditions (50 µM IPTG and 0.2% L-rhamnose) were employed only during plating. The next day, several colonies were streaked on selection “master” plates with no inducers and were simultaneously screened for base-editing through Q5 PCR amplification of the targeted region with genome-specific primers, purification of the amplified fragments, Sanger sequencing and high-throughput in silico analysis of the results employing a variation of the on-line tool “EditR” (Kluesner et al., (2018) supra). Each base-editing experiment was performed in at least three biological replicates.
C·G to T·A base-editing with an active type II CRISPR nuclease was achieved, reaching editing efficiencies of up to 100%. Substantial base-editing efficiencies were observed for constructs expressing the BE-T1, BE-T3, and BE-T4 spacers, while minor or no editing was observed with the other three spacers (BE-T2, BE-T5, and BE-T6) (see Table 4). Direct comparison of the editing efficiencies of the AcrThermoTarget-AID and dThermoTarget-AID base editors indicated that the former was more efficient for spacers BE-T3 and BE-T4, whereas the latter was more efficient for spacers BE-T1, BE-T2, BE-T5, and BE-T6.
Table 4. Base-editing efficiency of AcrThermoTarget-AID with different spacers and induction conditions compared to dThermoTarget-AID.
Base-editing (50 µM IPTG and 0.2% L-rhamnose) and counter-selection (1 mM IPTG) conditions were applied during recovery (R) and plating (P), respectively. As controls, either no induction conditions or base-editing conditions (50 µM IPTG and 0.2% L-rhamnose) were employed only during plating. The sgRNA module was always constitutively transcribed. The percentages shown in Figure 8A resulted from high-throughput in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR” with a threshold of p ≤ 0.05. Single colonies were obtained from base-editing experiments of at least three independent biological replicates. Streaking: several colonies screened as partially edited or completely unedited (colonies 28-35 for BE-T1 P: base-editing; 32-34 for BE-T1 P: counter-selection; 16-18 for BE-T4 P: base-editing; 24-27 for BE-T4 P: counter-selection) were selected from spot-streaked “master” plates and subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pAcrGeoTarget-AID system (“Streaking”). A single colony per streak was screened by PCR and Sanger sequencing (colonies 28’-35’ for BE-T1 P: base-editing; 32’-34’ for BE-T1 P: counter-selection; 16’-18’ for BE-T4 P: base-editing; 24’-27’ for BE-T4 P: counter-selection), and the base-editing efficiencies are reported in the heatmaps. In Figure 8B, two single colonies (colony 5 for spacer BE-T1 P: counter-selection, and colony 7 for spacer BE-T4 P: counter-selection), previously screened as partially edited, were selected from spot-streaked “master” plates and subsequently streaked on plates supplemented with 1 mM IPTG for maximum induction of the pAcrGeoTarget-AID system. Several colonies were screened by PCR and Sanger sequencing (colonies 5a-5i for spacer BE-T1 P: counter-selection, and 7a-7j for spacer BE-T4 P: counter-selection), and the base-editing efficiencies are reported in the heatmaps.
Contrary to dThermoTarget-AID (Figure 6A), AcrThermoTarget-AID provided various colonies with 100% C·G to T·A conversion of at least one cytosine, even with only 1 hour of base-editing induction (counter-selection case) (Figure 8A). With dThermoTarget-AID, 3/41, 1/44, 1/21, 3/44, and 5/36 colonies revealed at least one fully converted C at any position for spacers BE-T1, BE-T2, BE-T3, BE-T4, and BE-T5, respectively. However, induction of AcrThermoTarget-AID led to a higher number of colonies with complete deamination. Indicatively, 9/35 (P: base-editing; BE-T1), 11/34 (P: counter-selection; BE-T1), 1/15 (P: no induction; BE-T1), 1/28 (P: no induction; BE-T2), 1/30 (P: base-editing; BE-T3), 7/31 (P: counter-selection; BE-T3), 6/18 (P: base-editing; BE-T4), 8/27 (P: counter-selection; BE-T4), and 2/42 (P: counter-selection; BE-T5) colonies were 100% edited at one or more positions.
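For orientation, the colony counts listed above can be converted into percentages of clean colonies. The sketch below simply restates those counts; the dictionary layout and the helper name `percent` are illustrative assumptions and do not reproduce the EditR-based analysis itself.

```python
# (clean colonies, screened colonies) as listed in the text above.
D_THERMO = {
    "BE-T1": (3, 41), "BE-T2": (1, 44), "BE-T3": (1, 21),
    "BE-T4": (3, 44), "BE-T5": (5, 36),
}
ACR_THERMO = {
    ("BE-T1", "base-editing"): (9, 35),
    ("BE-T1", "counter-selection"): (11, 34),
    ("BE-T1", "no induction"): (1, 15),
    ("BE-T2", "no induction"): (1, 28),
    ("BE-T3", "base-editing"): (1, 30),
    ("BE-T3", "counter-selection"): (7, 31),
    ("BE-T4", "base-editing"): (6, 18),
    ("BE-T4", "counter-selection"): (8, 27),
    ("BE-T5", "counter-selection"): (2, 42),
}


def percent(clean, screened):
    """Percentage of screened colonies with at least one fully converted C."""
    return 100.0 * clean / screened


for spacer, counts in D_THERMO.items():
    print(f"dThermoTarget-AID {spacer}: {percent(*counts):.1f}%")
for (spacer, plating), counts in ACR_THERMO.items():
    print(f"AcrThermoTarget-AID {spacer} (P: {plating}): {percent(*counts):.1f}%")
```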
Unlike dThermoTarget-AID, the preferred editing window of AcrThermoTarget-AID is shifted towards the “seed region”, due to lower mismatch tolerance leading to less efficient counter-selection at these positions (Figure 8A). Indicatively, dThermoTarget-AID with BE-T1 exhibited a strong preference for Cs at the PAM-distal end (C-19, C-23, and C-27), while AcrThermoTarget-AID edited scattered Cs (C-9, C-14, C-18, and C-19). Remarkably, no editing was observed at C-27 for AcrThermoTarget-AID. Furthermore, dThermoTarget-AID with BE-T4 performed substantial editing at C-24, while AcrThermoTarget-AID was completely unable to edit this position. Moreover, dThermoTarget-AID with BE-T5 showed high editing efficiencies (approximately 83%) at C-20, whereas only 5-9% editing was observed for AcrThermoTarget-AID at this position, mimicking the case of dGeoTarget-AID and AcrGeoTarget-AID for the same spacer (BE-T5 = BE-G4). The overall activity window of AcrThermoTarget-AID was 15 bp (from -9 to -23).
The AcrThermoTarget-AID editor mediates rapid and efficient base-editing in a wide activity window (15 bp contrary to 4-6 bp with SpyCas9), enabling the generation of premature stop codons outside the restricted region of the current base-editing tools.
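By way of illustration, the codons that a C to T edit can convert into a premature stop codon can be enumerated as follows. This is a minimal sketch that assumes the standard genetic code (CAA, CAG and CGA edited on the coding strand; TGG edited via the opposite strand, which reads as G to A on the coding strand). It ignores PAM availability and the activity window of any particular editor, and the names `edited_forms` and `stop_sites` are hypothetical.

```python
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}


def edited_forms(codon, src, dst):
    """Yield each codon obtained by converting one or more `src` bases to `dst`."""
    options = [(base, dst) if base == src else (base,) for base in codon]
    for combo in product(*options):
        new = "".join(combo)
        if new != codon:
            yield new


def stop_sites(cds):
    """Return (codon_index, codon, strand, stop) for in-frame codons that
    C-to-T editing could turn into a stop codon."""
    cds = cds.upper()
    hits = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in STOPS:
            continue
        # C to T on the coding strand; C to T on the opposite strand
        # appears as G to A when read on the coding strand.
        for strand, src, dst in (("coding", "C", "T"), ("opposite", "G", "A")):
            for new in edited_forms(codon, src, dst):
                if new in STOPS:
                    hits.append((i // 3, codon, strand, new))
    return hits


# Hypothetical coding sequence used only to exercise the function.
print(stop_sites("ATGCAATGGCGATTT"))
```

A wider activity window increases the chance that at least one such codon lies within reach of an available protospacer.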
Even though the AcrThermoTarget-AID system generated a substantially higher number of clean C to T mutants than the dThermoTarget-AID system described here for the same targets, it still resulted in colonies with mixed wild-type/mutant genotypes. It was therefore investigated whether extended incubation of cells under conditions that induce counter-selection would result in the occurrence of more clean mutant cells. For this purpose, the streaks from colonies that had been screened as either partially or completely unedited (Figure 8A, colonies 28-35 for BE-T1 P: base-editing; and 32-34 for BE-T1 P: counter-selection) were selected from the “master” plates and streaked on plates with counter-selection conditions (1 mM IPTG). Upon screening of newly formed single colonies, many were shown to be mixed or clean mutants in up to 7 positions (Figure 8A). Additionally, partially edited colonies were selected and streaked on counter-selection plates (1 mM IPTG), to investigate the occurrence of base-edited colonies with clean genotypes. Streaks from a colony with partial edits obtained from BE-T1 (P: counter-selection; colony 5) were selected from the “master” plates, yielding one or multiple (up to 4) 100% edited cytosines in all tested colonies (9/9) (Figure 8B). Positions that were not edited before (C-9 and C-13) were now found to be edited with this second incubation step, mostly reaching 100% efficiencies. The streak from a colony from BE-T4 (P: counter-selection; colony 7) with a single mutated position (C-14) was also selected from the corresponding “master” plate and streaked, triggering expansion of the activity window, which spanned from position C-9 to a position residing outside of the spacer (C-24), with up to 5 edited cytidines, all 100% mutated (Figure 8B).
These observations demonstrate that extended base-editing conditions massively increase the number of clean base-edited mutants, most of them being edited in multiple positions.
The AcrThermoTarget-AID tool can be used to generate C·G to T·A mutants in E. coli with efficiencies of up to 100%, and an activity window from -9 to -23 positions (15 bp) with 1-step incubation or from -5 to -27 positions (23 bp) with 2-step incubation.
Example 5: Base-editing technology for human cells
Design and cloning of a vector that carries and expresses the “nThermoBE4” system: The 3’-end (minus the stop codon) of the rat Apolipoprotein B mRNA editing protein catalytic subunit 1 (rAPOBEC1) (Komor et al., (2016) supra) was fused to a 96 bp long (2xSGGS, XTEN, and 2xSGGS) linker sequence, which was in turn fused to the nickase Cas9 (D8A) endonuclease gene from Geobacillus thermodenitrificans T12 (Mougiakos, Mohanraju, and Bosma et al., (2017) supra) (minus the start and the stop codons), which was in turn fused to a 30 bp long (3xSGGS) linker sequence, which was in turn fused to the uracil DNA glycosylase inhibitor (UGI) gene (minus the start and the stop codons) from bacteriophage PBS2 (Zhigang et al., (1991) supra), which was in turn fused to a 30 bp long (3xSGGS) linker sequence, which was in turn fused to a second copy of the ugi gene (minus the start and the stop codons), which was in turn fused to a 12 bp long (SGGS) linker sequence, followed by an SV40 NLS signal and a stop codon. The genetic sequence of the rAPOBEC1-nThermoCas9(D8A)-UGI-UGI-NLS fusion was cloned into a plasmid without a mammalian origin of replication (pCMV) under the transcriptional control of the constitutive cytomegalovirus (CMV) promoter. A sgRNA expression module transcribed from the constitutive RNA-polymerase III U6 promoter was cloned in the same plasmid. Three well characterized, non-essential genes in the genome of Human Embryonic Kidney 293 (HEK293T) cells were chosen as targets: the homeobox protein EMX1, the vascular endothelial growth factor A (VEGFA), and the DNA-methyltransferase 1 (DNMT1). Three different cytosine-rich targeting spacers were designed for each target gene (see Table 5) and incorporated separately into the 5’-end of the sgRNA module of the nThermoBE4 vector. All selected protospacers were flanked by an optimal PAM (5’-NNNNCCAA-3’), and HEK293T (ATCC CRL-3216™) cells were employed as the main experimental cell line.
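A minimal sketch of how candidate protospacers flanked by the 5’-NNNNCCAA-3’ PAM might be located on one strand of a target sequence is given below. The spacer length of 23 nt, the function name `find_protospacers` and the example sequence are assumptions made for illustration; cytosine positions are numbered so that -1 is the base adjacent to the PAM, which is also an assumed convention.

```python
import re

# Lookahead so that overlapping PAM candidates are all reported.
PAM_RE = re.compile(r"(?=([ACGT]{4}CCAA))")


def find_protospacers(seq, spacer_len=23):
    """Return (protospacer, pam, cytosine_positions) for every
    5'-NNNNCCAA-3' PAM found on the given strand."""
    seq = seq.upper()
    hits = []
    for match in PAM_RE.finditer(seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough sequence upstream of the PAM
        proto = seq[pam_start - spacer_len:pam_start]
        # Index spacer_len - 1 is next to the PAM and maps to position -1.
        c_positions = [i - spacer_len for i, base in enumerate(proto) if base == "C"]
        hits.append((proto, match.group(1), c_positions))
    return hits


# Hypothetical sequence: 5 nt flank, 23 nt protospacer, PAM, 2 nt flank.
example = "GGGGG" + "ACCTG" * 4 + "ACC" + "TTTTCCAA" + "GG"
for proto, pam, c_positions in find_protospacers(example):
    print(proto, pam, c_positions)
```

To cover both strands, the same function can be applied to the reverse complement of the target sequence.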
The resulting pnThermoBE4_TE1/TE2/TE3/TV1/TV2/TV3/TD1/TD2/TD3 vectors were transfected (Lipofectamine™ 3000 Transfection reagent (Thermo Fisher Scientific, Cat. No. L3000-008)) into HEK293T cells. Three days post-transfection, the genomic DNA of each transfected culture was isolated, and Q5 PCR amplification of the corresponding target regions was performed, followed by purification, Sanger sequencing and T7 endonuclease I assays (EnGen mutation detection kit, NEB). Each T7 endonuclease I reaction was further subjected to 1% agarose gel electrophoresis and the results were visualized and subsequently analysed using the “ImageLab” software. The base-editing efficiencies were quantified using the free web tool “EditR” (Kluesner et al., (2018) supra). Finally, potential off-target sites were predicted in silico using the “ChopChop” on-line tool, and they were experimentally screened via Q5 PCR amplification of each predicted off-target region, followed by purification and Sanger sequencing.
Table 5. Characteristics of (proto)spacers used in this study for the ThermoCas9- based tool in human cells.
Figure 9 shows C·G to T·A base editing by nThermoBE4 in HEK293T cells.
Heatmaps show the percentage of C·G to T·A conversion in every C position within or immediately upstream of protospacers (y axis) in the genomes of HEK293T cell populations transfected with the pnThermoBE4_BE-TE1/TE2/TE3/TV1/TV2/TV3/TD1/TD2/TD3 vectors. White boxes represent no base editing, light to darker grey boxes represent increasing base-editing efficiencies, and black boxes represent 100% base-editing efficiency. The percentages resulted from in silico analysis of Sanger sequencing of several single colonies (x axis), employing a variation of the on-line tool “EditR”.
C·G to T·A targeted base editing is provided by the described nThermoBE4 system, reaching on-target editing efficiencies of up to 72% at the best edited site (Figure 9, protospacer targeted by the pnThermoBE4_BE-TV3 plasmid). 4 out of 9 targeted genomic regions were successfully base-edited in multiple positions across the protospacer sequence or immediately upstream (Figure 9). The nThermoBE4 preferentially edited Cs at the PAM-distal end of the protospacer, similar to the commonly used SpyCas9 base editors. However, the editing window (from -5 to -29 positions) was broader than that of the previously reported nSpyCas9-mediated BE4 system (from -13 to -18) and nSaCas9-mediated BE4 system (from -10 to -18) (Komor et al., (2017) supra). Interestingly, up to 7 cytosines were edited (including positions C-28 and C-29) when the pnThermoBE4_BE-TV2 was applied (Figure 9). C-28 and C-29 are part of the R-loop formed by the nThermoCas9:DNA complex and are thus accessible for deamination by rAPOBEC1. Together, these results show that the nThermoBE4 editor mediates efficient base editing in a wide activity window, increasing the freedom for multiple nucleotide substitutions in the target gene for applications where this is desired.
The base-editing activity of the nThermoBE4 system at undesired loci within the genome of HEK293T cells was studied. The in silico-predicted off-target sites were screened in the populations with successful on-target C·G to T·A conversion (cultures of HEK293T cells harbouring the pnThermoBE4_BE-TE2/BE-TV2/BE-TV3/BE-TD3 vectors). Base-editing was observed in 1/4, 1/13, 1/11, and 1/1 of the predicted off-target sites for spacers BE-TE2, BE-TV2, BE-TV3, and BE-TD3, respectively. In total, the “nThermoBE4” system exhibited less off-target activity (only 4/29 tested, “predicted” off-target sites were indeed edited) compared to the previously reported high off-target activity of the nSpyCas9-mediated BE4 system. For the latter, 3 studies have reported that at least half of the off-target sites which were predicted for different protospacers were indeed edited (17/34, 9/13, or 21/21 tested, “predicted” off-target sites) (Komor et al., (2016) supra; Rees et al., (2017) supra; Kim et al., (2017) supra). However, it has been shown that the base-editing off-target effects, especially in the case of cytidine deaminases, can probably be attributed to the over-activity of APOBEC1 and UGI rather than to Cas9 mismatch tolerance (Jin et al., (2019) supra; Zuo et al., (2019) supra).
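The pooled off-target figure of 4/29 follows from summing the per-spacer counts given above. The short sketch below reproduces that arithmetic and sets it next to the three reported nSpyCas9 BE4 ratios; the helper name `pooled` and the data layout are illustrative assumptions.

```python
# (edited off-target sites, tested predicted off-target sites) per spacer.
N_THERMO_BE4 = {
    "BE-TE2": (1, 4),
    "BE-TV2": (1, 13),
    "BE-TV3": (1, 11),
    "BE-TD3": (1, 1),
}
# Ratios cited in the text for the nSpyCas9-mediated BE4 system (three studies).
N_SPY_BE4_STUDIES = [(17, 34), (9, 13), (21, 21)]


def pooled(counts):
    """Sum (edited, tested) pairs and return (edited, tested, percent)."""
    counts = list(counts)
    edited = sum(e for e, _ in counts)
    tested = sum(t for _, t in counts)
    return edited, tested, 100.0 * edited / tested


edited, tested, pct = pooled(N_THERMO_BE4.values())
print(f"nThermoBE4: {edited}/{tested} = {pct:.1f}%")
for e, t in N_SPY_BE4_STUDIES:
    print(f"nSpyCas9 BE4 study: {e}/{t} = {100.0 * e / t:.1f}%")
```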
ThermoCas9 is smaller than SpyCas9 by almost 300 amino acids, facilitating the use of convenient delivery systems, for example adenoviral vectors. It can reach different targets from SpyCas9 due to alternative PAM requirements, and it performs base editing in a broader window, facilitating the introduction of stop codons, e.g. for the functional characterization of genes. Moreover, the ThermoCas9 base-editor presents reduced off-target activity, which is vital for applications in human cells.
Nucleotide and amino acid sequences
[SEQ ID NO: 1] Geobacillus stearothermophilus Cas9 AA Sequence
MRYKIGLDIGITSVGWAVMNLDIPRIEDLGVRIFDRAENPQTGESLALPRRLARSA
RRRLRRRKHRLERIRRLVIREGILTKEELDKLFEEKHEIDVWQLRVEALDRKLNND
ELARVLLHLAKRRGFKSNRKSERSNKENSTMLKHIEENRAILSSYRTVGEMIVKDP
KFALHKRNKGENYTNTIARDDLEREIRLIFSKQREFGNMSCTEEFENEYITIWASQ
RPVASKDDIEKKVGFCTFEPKEKRAPKATYTFQSFIAWEHINKLRLISPSGARGLT
DEERRLLYEQAFQKNKITYHDIRTLLHLPDDTYFKGIVYDRGESRKQNENIRFLELD
AYHQIRKAVDKVYGKGKSSSFLPIDFDTFGYALTLFKDDADIHSYLRNEYEQNGKR
MPNLANKVYDNELIEELLNLSFTKFGHLSLKALRSILPYMEQGEVYSSACERAGYT
FTGPKKKQKTMLLPNIPPIANPVVMRALTQARKVVNAIIKKYGSPVSIHIELARDLS
QTFDERRKTKKEQDENRKKNETAIRQLMEYGLTLNPTGHDIVKFKLWSEQNGRC
AYSLQPIEIERLLEPGYVEVDHVIPYSRSLDDSYTNKVLVLTRENREKGNRIPAEYL
GVGTERWQQFETFVLTNKQFSKKKRDRLLRLHYDENEETEFKNRNLNDTRYISR
FFANFIREHLKFAESDDKQKVYTVNGRVTAHLRSRWEFNKNREESDLHHAVDAVI
VACTTPSDIAKVTAFYQRREQNKELAKKTEPHFPQPWPHFADELRARLSKHPKES
IKALNLGNYDDQKLESLQPVFVSRMPKRSVTGAAHQETLRRYVGIDERSGKIQTV
VKTKLSEIKLDASGHFPMYGKESDPRTYEAIRQRLLEHNNDPKKAFQEPLYKPKK
NGEPGPVIRTVKIIDTKNQVIPLNDGKTVAYNSNIVRVDVFEKDGKYYCVPVYTMDI
MKGILPNKAIEPNKPYSEWKEMTEDYTFRFSLYPNDLIRIELPREKTVKTAAGEEIN
VKDVFVYYKTIDSANGGLELISHDHRFSLRGVGSRTLKRFEKYQVDVLGNIYKVRG
EKRVGLASSAHSKPGKTIRPLQSTRD*
[SEQ ID NO: 2] Geobacillus stearothermophilus Cas9 DNA Sequence
atgcgttataagattggcctggacatcggtattacctctgttggttgggcagtcatgaacctggatatccctcgtatcga agatctgggcgtgcgcatttttgaccgtgcggagaacccgcagaccggtgaatctctggctctgccgcgtcgtctgg cacgtagcgcacgccgccgcctgcgtcgtcgtaaacaccgtctggagcgtattcgtcgcctggttattcgtgaaggc atcctgacgaaagaagaactggataaactgttcgaagaaaaacacgagatcgacgtatggcagctgcgtgtaga agccctggaccgtaagctgaacaacgacgaactggcgcgtgtcctgctgcatctggcaaagcgtcgtggcttcaa atctaaccgtaaatctgaacgctccaataaagagaactccactatgctgaaacatattgaggagaaccgtgcaatt ctgtctagctaccgtaccgtgggcgaaatgattgttaaagacccgaaattcgcactgcataagcgtaacaaaggcg aaaactacaccaacaccattgcacgcgatgacctggaacgtgaaatccgtctgattttctccaaacagcgcgaatt cggcaacatgtcttgcaccgaagaattcgaaaacgaatatattaccatttgggcatctcagcgtccggtggcgtcta aagatgatatcgaaaaaaaagtaggcttttgtactttcgaaccgaaggaaaaacgtgcgccgaaagccacctata ccttccagtcttttatcgcgtgggaacatatcaacaaactgcgtctgatttctccgtctggcgcccgcggcctgaccga cgaagaacgtcgtctgctgtatgaacaagcattccagaaaaacaaaattacctaccacgatattcgtaccctgctg catctgccggacgacacctacttcaagggcatcgtttacgatcgcggtgaatctcgtaagcagaacgaaaacattc gtttcctggaactggatgcataccaccagatccgtaaagctgtagataaagtttacggcaagggtaaatccagcag cttcctgccgatcgactttgataccttcggttacgcgctgaccctgtttaaagacgatgcggatatccactcttacctgc gcaacgagtacgaacagaacggcaaacgtatgcctaacctggctaacaaagtttacgataacgagctgattgaa gaactgctgaacctgtccttcactaaattcggtcacctgtctctgaaagctctgcgttccatcctgccgtatatggaaca gggtgaagtctactcctccgcttgtgaacgtgcaggctacaccttcaccggtccgaaaaagaagcaaaaaactat gctgctgccgaacatcccgccgattgcgaaccctgtagtaatgcgtgcactgacccaggcgcgcaaagtagtcaa cgcgatcatcaaaaagtacggcagcccggtttccatccatatcgaactggcgcgcgacctgagccagacttttgac gagcgtcgtaaaactaaaaaggaacaggatgaaaaccgtaaaaaaaacgaaaccgcgatccgccagctgat ggaatacggtctgactctgaaccctactggtcacgatattgtgaagttcaagctgtggtctgaacagaacggtcgct gtgcttactctctgcagccgatcgagatcgaacgtctgctggagccaggttacgttgaagtagatcatgtgatcccgt actcccgctctctggatgattcttataccaacaaagttctggttctgactcgcgaaaaccgtgagaaaggcaaccgc atcccagctgaatatctgggtgttggcactgagcgttggcaacagttcgaaaccttcgtcctgaccaataaacagttc 
tctaaaaagaaacgtgaccgtctgctgcgtctgcactacgatgaaaacgaagagactgaattcaaaaaccgtaa cctgaacgatactcgctacatcagccgcttcttcgcaaacttcattcgtgaacacctgaaatttgcggaatccgacga taaacagaaagtttataccgtaaacggccgtgttaccgcccacctgcgttctcgctgggagttcaacaagaaccgt gaggaaagcgatctgcaccacgctgttgacgccgttattgtggcgtgcaccaccccaagcgatatcgctaaggtg accgcattctaccagcgtcgtgagcagaacaaggaactggccaaaaaaaccgaaccgcattttccgcagccgtg gccgcacttcgcggacgaactgcgtgctcgtctgtccaaacatcctaaagaaagcatcaaagctctgaacctgggt aactacgatgaccaaaaactggaatctctgcagccggtgtttgtcagccgtatgccgaaacgttctgttactggcgct gcgcaccaggaaacgctgcgccgttacgtgggcatcgacgaacgctccggtaaaatccagaccgtagtaaaaa ccaaactgtccgagattaaactggatgcatccggccacttcccaatgtacggtaaagaatccgatccacgcacttat
gaagccatccgccagcgtctgctggagcataacaacgacccgaaaaaggcattccaggagcctctgtacaaac cgaaaaaaaacggcgaaccgggcccggtaatccgtactgtaaaaattatcgacacgaaaaaccaggtgatccc tctgaacgatggtaaaaccgtggcctacaattccaacatcgttcgcgtggacgtgttcgaaaaagatggtaaatact actgtgtaccggtgtataccatggacatcatgaaaggcattctgccgaacaaagcgattgaaccgaacaagccgt actctgaatggaaagaaatgaccgaagattacacgtttcgtttcagcctgtatccgaacgacctgatccgcatcgaa ctgccgcgtgaaaaaaccgttaaaaccgctgcaggcgaagaaattaacgtgaaagacgtgttcgtttactataaa acgatcgactccgcaaacggcggcctggaactgatttctcacgaccaccgtttctctctgcgtggcgttggctctcgc accctgaaacgtttcgagaaatatcaagttgatgttctgggtaacatctataaagtgcgtggcgagaaacgtgtcggt ctggcgtcctccgcacacagcaaacctggcaaaaccattcgtccactgcaatctactcgtgactaa
[SEQ ID NO: 3] Geobacillus thermodenitrificans T12 Cas9 protein AA sequence
MKYKIGLDIGITSIGWAVINLDIPRIEDLGVRIFDRAENPKTGESLALPRRLARSARR
RLRRRKHRLERIRRLFVREGILTKEELNKLFEKKHEIDVWQLRVEALDRKLNNDEL
ARILLHLAKRRGFRSNRKSERTNKENSTMLKHIEENQSILSSYRTVAEMVVKDPKF
SLHKRNKEDNYTNTVARDDLEREIKLIFAKQREYGNIVCTEAFEHEYISIWASQRPF
ASKDDIEKKVGFCTFEPKEKRAPKATYTFQSFTVWEHINKLRLVSPGGIRALTDDE
RRLIYKQAFHKNKITFHDVRTLLNLPDDTRFKGLLYDRNTTLKENEKVRFLELGAY
HKIRKAIDSVYGKGAAKSFRPIDFDTFGYALTMFKDDTDIRSYLRNEYEQNGKRME
NLADKVYDEELIEELLNLSFSKFGHLSLKALRNILPYMEQGEVYSTACERAGYTFT
GPKKKQKTVLLPNIPPIANPVVMRALTQARKVVNAIIKKYGSPVSIHIELARELSQSF
DERRKMQKEQEGNRKKNETAIRQLVEYGLTLNPTGLDIVKFKLWSEQNGKCAYS
LQPIEIERLLEPGYTEVDHVIPYSRSLDDSYTNKVLVLTKENREKGNRTPAEYLGL
GSERWQQFETFVLTNKQFSKKKRDRLLRLHYDENEENEFKNRNLNDTRYISRFLA
NFIREHLKFADSDDKQKVYTVNGRITAHLRSRWNFNKNREESNLHHAVDAAIVAC
TTPSDIARVTAFYQRREQNKELSKKTDPQFPQPWPHFADELQARLSKNPKESIKA
LNLGNYDNEKLESLQPVFVSRMPKRSITGAAHQETLRRYIGIDERSGKIQTVVKKK
LSEIQLDKTGHFPMYGKESDPRTYEAIRQRLLEHNNDPKKAFQEPLYKPKKNGEL
GPIIRTIKIIDTTNQVIPLNDGKTVAYNSNIVRVDVFEKDGKYYCVPIYTIDMMKGILP
NKAIEPNKPYSEWKEMTEDYTFRFSLYPNDLIRIEFPREKTIKTAVGEEIKIKDLFAY
YQTIDSSNGGLSLVSHDNNFSLRSIGSRTLKRFEKYQVDVLGNIYKVRGEKRVGV
ASSSHSKAGETIRPL*
[SEQ ID NO: 4] Geobacillus thermodenitrificans T12 Cas9 DNA Sequence
atgaagtataaaatcggtcttgatatcggcattacgtctatcggttgggctgtcattaatttggacattcctcgcatcgaa gatttaggtgtccgcatttttgacagagcggaaaacccgaaaaccggggagtcactagctcttccacgtcgcctcg cccgctccgcccgacgtcgtctgcggcgtcgcaaacatcgactggagcgcattcgccgcctgttcgtccgcgaag gaattttaacgaaggaagagctgaacaagctgtttgaaaaaaagcacgaaatcgacgtctggcagcttcgtgttg aagcactggatcgaaaactaaataacgatgaattagcccgcatccttcttcatctggctaaacggcgtggatttaga tccaaccgcaagagtgagcgcaccaacaaagaaaacagtacgatgctcaaacatattgaagaaaaccaatcc attctttcaagttaccgaacggttgcagaaatggttgtcaaggatccgaaattttccctgcacaagcgtaataaagag gataattacaccaacactgttgcccgcgacgatcttgaacgggaaatcaaactgattttcgccaaacagcgcgaat atgggaacatcgtttgcacagaagcatttgaacacgagtatatttccatttgggcatcgcaacgcccttttgcttctaag gatgatatcgagaaaaaagtcggtttctgtacgtttgagcctaaagaaaaacgcgcgccaaaagcaacatacac attccagtccttcaccgtctgggaacatattaacaaacttcgtcttgtctccccgggaggcatccgggcactaaccga tgatgaacgtcgtcttatatacaagcaagcatttcataaaaataaaatcaccttccatgatgttcgaacattgcttaact tgcctgacgacacccgttttaaaggtcttttatatgaccgaaacaccacgctgaaggaaaatgagaaagttcgcttc cttgaactcggcgcctatcataaaatacggaaagcgatcgacagcgtctatggcaaaggagcagcaaaatcattt cgtccgattgattttgatacatttggctacgcattaacgatgtttaaagacgacaccgacattcgcagttacttgcgaa acgaatacgaacaaaatggaaaacgaatggaaaatctagcggataaagtctatgatgaagaattgattgaaga acttttaaacttatcgttttctaagtttggtcatctatcccttaaagcgcttcgcaacatccttccatatatggaacaaggc gaagtctactcaaccgcttgtgaacgagcaggatatacatttacagggccaaagaaaaaacagaaaacggtatt gctgccgaacattccgccgatcgccaatccggtcgtcatgcgcgcactgacacaggcacgcaaagtggtcaatg ccattatcaaaaagtacggctcaccggtctccatccatatcgaactggcccgggaactatcacaatcctttgatgaa cgacgtaaaatgcagaaagaacaggaaggaaaccgaaagaaaaacgaaactgccattcgccaacttgttgaa tatgggctgacgctcaatccaactgggcttgacattgtgaaattcaaactatggagcgaacaaaacggaaaatgtg cctattcactccaaccgatcgaaatcgagcggttgctcgaaccaggctatacagaagtcgaccatgtgattccatac agccgaagcttggacgatagctataccaataaagttcttgtgttgacaaaggagaaccgtgaaaaaggaaaccg caccccagctgaatatttaggattaggctcagaacgttggcaacagttcgagacgtttgtcttgacaaataagcagtt 
ttcgaaaaagaagcgggatcgactccttcggcttcattacgatgaaaacgaagaaaatgagtttaaaaatcgtaat ctaaatgatacccgttatatctcacgcttcttggctaactttattcgcgaacatctcaaattcgccgacagcgatgaca aacaaaaagtatacacggtcaacggccgtattaccgcccatttacgcagccgttggaattttaacaaaaaccggg aagaatcgaatttgcatcatgccgtcgatgctgccatcgtcgcctgcacaacgccgagcgatatcgcccgagtcac cgccttctatcaacggcgcgaacaaaacaaagaactgtccaaaaagacggatccgcagtttccgcagccttggc cgcactttgctgatgaactgcaggcgcgtttatcaaaaaatccaaaggagagtataaaagctctcaatcttggaaat tatgataacgagaaactcgaatcgttgcagccggtttttgtctcccgaatgccgaagcggagcataacaggagcg gctcatcaagaaacattgcggcgttatatcggcatcgacgaacggagcggaaaaatacagacggtcgtcaaaa agaaactatccgagatccaactggataaaacaggtcatttcccaatgtacgggaaagaaagcgatccaaggac
atatgaagccattcgccaacggttgcttgaacataacaatgacccaaaaaaggcgtttcaagagcctctgtataaa ccgaagaagaacggagaactaggtcctatcatccgaacaatcaaaatcatcgatacgacaaatcaagttattccg ctcaacgatggcaaaacagtcgcctacaacagcaacatcgtgcgggtcgacgtctttgagaaagatggcaaatat tattgtgtccctatctatacaatagatatgatgaaagggatcttgccaaacaaggcgatcgagccgaacaaaccgt actctgagtggaaggaaatgacggaggactatacattccgattcagtctatacccaaatgatcttatccgtatcgaat ttccccgagaaaaaacaataaagactgctgtgggggaagaaatcaaaattaaggatctgttcgcctattatcaaac catcgactcctccaatggagggttaagtttggttagccatgataacaacttttcgctccgcagcatcggttcaagaac cctcaaacgattcgagaaataccaagtagatgtgctaggcaacatctacaaagtgagaggggaaaagagagtt ggggtggcgtcatcttctcattcgaaagccggggaaactatccgtccgttataa
[SEQ ID NO: 5] dGeoCas9 (D8A, H582A) (applied in E. coli) amino acid sequence:
MRYKIGLAIGITSVGWAVMNLDIPRIEDLGVRIFDRAENPQTGESLALPRRLARSA
RRRLRRRKHRLERIRRLVIREGILTKEELDKLFEEKHEIDVWQLRVEALDRKLNND
ELARVLLHLAKRRGFKSNRKSERSNKENSTMLKHIEENRAILSSYRTVGEMIVKDP
KFALHKRNKGENYTNTIARDDLEREIRLIFSKQREFGNMSCTEEFENEYITIWASQ
RPVASKDDIEKKVGFCTFEPKEKRAPKATYTFQSFIAWEHINKLRLISPSGARGLT
DEERRLLYEQAFQKNKITYHDIRTLLHLPDDTYFKGIVYDRGESRKQNENIRFLELD
AYHQIRKAVDKVYGKGKSSSFLPIDFDTFGYALTLFKDDADIHSYLRNEYEQNGKR
MPNLANKVYDNELIEELLNLSFTKFGHLSLKALRSILPYMEQGEVYSSACERAGYT
FTGPKKKQKTMLLPNIPPIANPVVMRALTQARKVVNAIIKKYGSPVSIHIELARDLS
QTFDERRKTKKEQDENRKKNETAIRQLMEYGLTLNPTGHDIVKFKLWSEQNGRC
AYSLQPIEIERLLEPGYVEVDAVIPYSRSLDDSYTNKVLVLTRENREKGNRIPAEYL
GVGTERWQQFETFVLTNKQFSKKKRDRLLRLHYDENEETEFKNRNLNDTRYISR
FFANFIREHLKFAESDDKQKVYTVNGRVTAHLRSRWEFNKNREESDLHHAVDAVI
VACTTPSDIAKVTAFYQRREQNKELAKKTEPHFPQPWPHFADELRARLSKHPKES
IKALNLGNYDDQKLESLQPVFVSRMPKRSVTGAAHQETLRRYVGIDERSGKIQTV
VKTKLSEIKLDASGHFPMYGKESDPRTYEAIRQRLLEHNNDPKKAFQEPLYKPKK
NGEPGPVIRTVKIIDTKNQVIPLNDGKTVAYNSNIVRVDVFEKDGKYYCVPVYTMDI
MKGILPNKAIEPNKPYSEWKEMTEDYTFRFSLYPNDLIRIELPREKTVKTAAGEEIN
VKDVFVYYKTIDSANGGLELISHDHRFSLRGVGSRTLKRFEKYQVDVLGNIYKVRG
EKRVGLASSAHSKPGKTIRPLQSTRD*
[SEQ ID NO: 6] dGeoCas9 (D8A, H582A) (applied in E. coli) DNA sequence:
atgcgttataagattggcctggCcatcggtattacctctgttggttgggcagtcatgaacctggatatccctcgtatcgaagatctg ggcgtgcgcatttttgaccgtgcggagaacccgcagaccggtgaatctctggctctgccgcgtcgtctggcacgtagcgcacg ccgccgcctgcgtcgtcgtaaacaccgtctggagcgtattcgtcgcctggttattcgtgaaggcatcctgacgaaagaagaact ggataaactgttcgaagaaaaacacgagatcgacgtatggcagctgcgtgtagaagccctggaccgtaagctgaacaacg acgaactggcgcgtgtcctgctgcatctggcaaagcgtcgtggcttcaaatctaaccgtaaatctgaacgctccaataaagag aactccactatgctgaaacatattgaggagaaccgtgcaattctgtctagctaccgtaccgtgggcgaaatgattgttaaagac ccgaaattcgcactgcataagcgtaacaaaggcgaaaactacaccaacaccattgcacgcgatgacctggaacgtgaaat ccgtctgattttctccaaacagcgcgaattcggcaacatgtcttgcaccgaagaattcgaaaacgaatatattaccatttgggca tctcagcgtccggtggcgtctaaagatgatatcgaaaaaaaagtaggcttttgtactttcgaaccgaaggaaaaacgtgcgcc gaaagccacctataccttccagtcttttatcgcgtgggaacatatcaacaaactgcgtctgatttctccgtctggcgcccgcggcc tgaccgacgaagaacgtcgtctgctgtatgaacaagcattccagaaaaacaaaattacctaccacgatattcgtaccctgctg catctgccggacgacacctacttcaagggcatcgtttacgatcgcggtgaatctcgtaagcagaacgaaaacattcgtttcctg gaactggatgcataccaccagatccgtaaagctgtagataaagtttacggcaagggtaaatccagcagcttcctgccgatcg actttgataccttcggttacgcgctgaccctgtttaaagacgatgcggatatccactcttacctgcgcaacgagtacgaacagaa cggcaaacgtatgcctaacctggctaacaaagtttacgataacgagctgattgaagaactgctgaacctgtccttcactaaatt cggtcacctgtctctgaaagctctgcgttccatcctgccgtatatggaacagggtgaagtctactcctccgcttgtgaacgtgcag gctacaccttcaccggtccgaaaaagaagcaaaaaactatgctgctgccgaacatcccgccgattgcgaaccctgtagtaat gcgtgcactgacccaggcgcgcaaagtagtcaacgcgatcatcaaaaagtacggcagcccggtttccatccatatcgaact ggcgcgcgacctgagccagacttttgacgagcgtcgtaaaactaaaaaggaacaggatgaaaaccgtaaaaaaaacga aaccgcgatccgccagctgatggaatacggtctgactctgaaccctactggtcacgatattgtgaagttcaagctgtggtctga acagaacggtcgctgtgcttactctctgcagccgatcgagatcgaacgtctgctggagccaggttacgttgaagtagatGCtg tgatcccgtactcccgctctctggatgattcttataccaacaaagttctggttctgactcgcgaaaaccgtgagaaaggcaaccg catcccagctgaatatctgggtgttggcactgagcgttggcaacagttcgaaaccttcgtcctgaccaataaacagttctctaaa 
aagaaacgtgaccgtctgctgcgtctgcactacgatgaaaacgaagagactgaattcaaaaaccgtaacctgaacgatact cgctacatcagccgcttcttcgcaaacttcattcgtgaacacctgaaatttgcggaatccgacgataaacagaaagtttataccg taaacggccgtgttaccgcccacctgcgttctcgctgggagttcaacaagaaccgtgaggaaagcgatctgcaccacgctgtt gacgccgttattgtggcgtgcaccaccccaagcgatatcgctaaggtgaccgcattctaccagcgtcgtgagcagaacaagg aactggccaaaaaaaccgaaccgcattttccgcagccgtggccgcacttcgcggacgaactgcgtgctcgtctgtccaaaca tcctaaagaaagcatcaaagctctgaacctgggtaactacgatgaccaaaaactggaatctctgcagccggtgtttgtcagcc gtatgccgaaacgttctgttactggcgctgcgcaccaggaaacgctgcgccgttacgtgggcatcgacgaacgctccggtaa aatccagaccgtagtaaaaaccaaactgtccgagattaaactggatgcatccggccacttcccaatgtacggtaaagaatcc gatccacgcacttatgaagccatccgccagcgtctgctggagcataacaacgacccgaaaaaggcattccaggagcctctg tacaaaccgaaaaaaaacggcgaaccgggcccggtaatccgtactgtaaaaattatcgacacgaaaaaccaggtgatcc ctctgaacgatggtaaaaccgtggcctacaattccaacatcgttcgcgtggacgtgttcgaaaaagatggtaaatactactgtgt accggtgtataccatggacatcatgaaaggcattctgccgaacaaagcgattgaaccgaacaagccgtactctgaatggaa agaaatgaccgaagattacacgtttcgtttcagcctgtatccgaacgacctgatccgcatcgaactgccgcgtgaaaaaaccg ttaaaaccgctgcaggcgaagaaattaacgtgaaagacgtgttcgtttactataaaacgatcgactccgcaaacggcggcct
ggaactgatttctcacgaccaccgtttctctctgcgtggcgttggctctcgcaccctgaaacgtttcgagaaatatcaagttgatgtt ctgggtaacatctataaagtgcgtggcgagaaacgtgtcggtctggcgtcctccgcacacagcaaacctggcaaaaccattc gtccactgcaatctactcgtgactaa
[SEQ ID NO: 7] dThermoCas9 (D8A, H582A) (applied in E. coli) amino acid sequence:
MKYKIGLAIGITSIGWAVINLDIPRIEDLGVRIFDRAENPKTGESLALPRRLARSARRRLRRR
KHRLERIRRLFVREGILTKEELNKLFEKKHEIDVWQLRVEALDRKLNNDELARILLHLAKRR
GFRSNRKSERTNKENSTMLKHIEENQSILSSYRTVAEMVVKDPKFSLHKRNKEDNYTNTV
ARDDLEREIKLIFAKQREYGNIVCTEAFEHEYISIWASQRPFASKDDIEKKVGFCTFEPKEK
RAPKATYTFQSFTVWEHINKLRLVSPGGIRALTDDERRLIYKQAFHKNKITFHDVRTLLNLP
DDTRFKGLLYDRNTTLKENEKVRFLELGAYHKIRKAIDSVYGKGAAKSFRPIDFDTFGYAL
TMFKDDTDIRSYLRNEYEQNGKRMENLADKVYDEELIEELLNLSFSKFGHLSLKALRNILP
YMEQGEVYSTACERAGYTFTGPKKKQKTVLLPNIPPIANPVVMRALTQARKVVNAIIKKYG
SPVSIHIELARELSQSFDERRKMQKEQEGNRKKNETAIRQLVEYGLTLNPTGLDIVKFKLW
SEQNGKCAYSLQPIEIERLLEPGYTEVDAVIPYSRSLDDSYTNKVLVLTKENREKGNRTPA
EYLGLGSERWQQFETFVLTNKQFSKKKRDRLLRLHYDENEENEFKNRNLNDTRYISRFLA
NFIREHLKFADSDDKQKVYTVNGRITAHLRSRWNFNKNREESNLHHAVDAAIVACTTPSDI
ARVTAFYQRREQNKELSKKTDPQFPQPWPHFADELQARLSKNPKESIKALNLGNYDNEK
LESLQPVFVSRMPKRSITGAAHQETLRRYIGIDERSGKIQTVVKKKLSEIQLDKTGHFPMY
GKESDPRTYEAIRQRLLEHNNDPKKAFQEPLYKPKKNGELGPIIRTIKIIDTTNQVIPLNDGK
TVAYNSNIVRVDVFEKDGKYYCVPIYTIDMMKGILPNKAIEPNKPYSEWKEMTEDYTFRFS
LYPNDLIRIEFPREKTIKTAVGEEIKIKDLFAYYQTIDSSNGGLSLVSHDNNFSLRSIGSRTLK
RFEKYQVDVLGNIYKVRGEKRVGVASSSHSKAGETIRPL*
[SEQ ID NO: 8] dThermoCas9 (D8A, H582A) (applied in E. coli) DNA sequence: atgaagtataaaatcggtcttgCtatcggcattacgtctatcggttgggctgtcattaatttggacattcctcgcatcgaagatttag gtgtccgcatttttgacagagcggaaaacccgaaaaccggggagtcactagctcttccacgtcgcctcgcccgctccgcccg acgtcgtctgcggcgtcgcaaacatcgactggagcgcattcgccgcctgttcgtccgcgaaggaattttaacgaaggaagag ctgaacaagctgtttgaaaaaaagcacgaaatcgacgtctggcagcttcgtgttgaagcactggatcgaaaactaaataacg atgaattagcccgcatccttcttcatctggctaaacggcgtggatttagatccaaccgcaagagtgagcgcaccaacaaagaa aacagtacgatgctcaaacatattgaagaaaaccaatccattctttcaagttaccgaacggttgcagaaatggttgtcaaggat ccgaaattttccctgcacaagcgtaataaagaggataattacaccaacactgttgcccgcgacgatcttgaacgggaaatca aactgattttcgccaaacagcgcgaatatgggaacatcgtttgcacagaagcatttgaacacgagtatatttccatttgggcatc gcaacgcccttttgcttctaaggatgatatcgagaaaaaagtcggtttctgtacgtttgagcctaaagaaaaacgcgcgccaaa agcaacatacacattccagtccttcaccgtctgggaacatattaacaaacttcgtcttgtctccccgggaggcatccgggcacta
accgatgatgaacgtcgtcttatatacaagcaagcatttcataaaaataaaatcaccttccatgatgttcgaacattgcttaacttg cctgacgacacccgttttaaaggtcttttatatgaccgaaacaccacgctgaaggaaaatgagaaagttcgcttccttgaactc ggcgcctatcataaaatacggaaagcgatcgacagcgtctatggcaaaggagcagcaaaatcatttcgtccgattgattttgat acatttggctacgcattaacgatgtttaaagacgacaccgacattcgcagttacttgcgaaacgaatacgaacaaaatggaaa acgaatggaaaatctagcggataaagtctatgatgaagaattgattgaagaacttttaaacttatcgttttctaagtttggtcatcta tcccttaaagcgcttcgcaacatccttccatatatggaacaaggcgaagtctactcaaccgcttgtgaacgagcaggatataca tttacagggccaaagaaaaaacagaaaacggtattgctgccgaacattccgccgatcgccaatccggtcgtcatgcgcgca ctgacacaggcacgcaaagtggtcaatgccattatcaaaaagtacggctcaccggtctccatccatatcgaactggcccggg aactatcacaatcctttgatgaacgacgtaaaatgcagaaagaacaggaaggaaaccgaaagaaaaacgaaactgccat tcgccaacttgttgaatatgggctgacgctcaatccaactgggcttgacattgtgaaattcaaactatggagcgaacaaaacgg aaaatgtgcctattcactccaaccgatcgaaatcgagcggttgctcgaaccaggctatacagaagtcgacGCtgtgattccat acagccgaagcttggacgatagctataccaataaagttcttgtgttgacaaaggagaaccgtgaaaaaggaaaccgcaccc cagctgaatatttaggattaggctcagaacgttggcaacagttcgagacgtttgtcttgacaaataagcagttttcgaaaaagaa gcgggatcgactccttcggcttcattacgatgaaaacgaagaaaatgagtttaaaaatcgtaatctaaatgatacccgttatatc tcacgcttcttggctaactttattcgcgaacatctcaaattcgccgacagcgatgacaaacaaaaagtatacacggtcaacggc cgtattaccgcccatttacgcagccgttggaattttaacaaaaaccgggaagaatcgaatttgcatcatgccgtcgatgctgcca tcgtcgcctgcacaacgccgagcgatatcgcccgagtcaccgccttctatcaacggcgcgaacaaaacaaagaactgtcca aaaagacggatccgcagtttccgcagccttggccgcactttgctgatgaactgcaggcgcgtttatcaaaaaatccaaagga gagtataaaagctctcaatcttggaaattatgataacgagaaactcgaatcgttgcagccggtttttgtctcccgaatgccgaag cggagcataacaggagcggctcatcaagaaacattgcggcgttatatcggcatcgacgaacggagcggaaaaatacaga cggtcgtcaaaaagaaactatccgagatccaactggataaaacaggtcatttcccaatgtacgggaaagaaagcgatccaa ggacatatgaagccattcgccaacggttgcttgaacataacaatgacccaaaaaaggcgtttcaagagcctctgtataaacc gaagaagaacggagaactaggtcctatcatccgaacaatcaaaatcatcgatacgacaaatcaagttattccgctcaacgat 
ggcaaaacagtcgcctacaacagcaacatcgtgcgggtcgacgtctttgagaaagatggcaaatattattgtgtccctatctat acaatagatatgatgaaagggatcttgccaaacaaggcgatcgagccgaacaaaccgtactctgagtggaaggaaatgac ggaggactatacattccgattcagtctatacccaaatgatcttatccgtatcgaatttccccgagaaaaaacaataaagactgct gtgggggaagaaatcaaaattaaggatctgttcgcctattatcaaaccatcgactcctccaatggagggttaagtttggttagcc atgataacaacttttcgctccgcagcatcggttcaagaaccctcaaacgattcgagaaataccaagtagatgtgctaggcaac atctacaaagtgagaggggaaaagagagttggggtggcgtcatcttctcattcgaaagccggggaaactatccgtccgttata a
[SEQ ID NO: 9] nThermoCas9 (D8A) (applied in HEK293T, human codon optimized) amino acid sequence:
MKYKIGLAIGITSIGWAVINLDIPRIEDLGVRIFDRAENPKTGESLALPRRLARSARRRLRRR
KHRLERIRRLFVREGILTKEELNKLFEKKHEIDVWQLRVEALDRKLNNDELARILLHLAKRR
GFRSNRKSERTNKENSTMLKHIEENQSILSSYRTVAEMVVKDPKFSLHKRNKEDNYTNTV
ARDDLEREIKLIFAKQREYGNIVCTEAFEHEYISIWASQRPFASKDDIEKKVGFCTFEPKEK
RAPKATYTFQSFTVWEHINKLRLVSPGGIRALTDDERRLIYKQAFHKNKITFHDVRTLLNLP
DDTRFKGLLYDRNTTLKENEKVRFLELGAYHKIRKAIDSVYGKGAAKSFRPIDFDTFGYAL
TMFKDDTDIRSYLRNEYEQNGKRMENLADKVYDEELIEELLNLSFSKFGHLSLKALRNILP
YMEQGEVYSTACERAGYTFTGPKKKQKTVLLPNIPPIANPVVMRALTQARKVVNAIIKKYG
SPVSIHIELARELSQSFDERRKMQKEQEGNRKKNETAIRQLVEYGLTLNPTGLDIVKFKLW
SEQNGKCAYSLQPIEIERLLEPGYTEVDHVIPYSRSLDDSYTNKVLVLTKENREKGNRTPA
EYLGLGSERWQQFETFVLTNKQFSKKKRDRLLRLHYDENEENEFKNRNLNDTRYISRFLA
NFIREHLKFADSDDKQKVYTVNGRITAHLRSRWNFNKNREESNLHHAVDAAIVACTTPSDI
ARVTAFYQRREQNKELSKKTDPQFPQPWPHFADELQARLSKNPKESIKALNLGNYDNEK
LESLQPVFVSRMPKRSITGAAHQETLRRYIGIDERSGKIQTVVKKKLSEIQLDKTGHFPMY
GKESDPRTYEAIRQRLLEHNNDPKKAFQEPLYKPKKNGELGPIIRTIKIIDTTNQVIPLNDGK
TVAYNSNIVRVDVFEKDGKYYCVPIYTIDMMKGILPNKAIEPNKPYSEWKEMTEDYTFRFS
LYPNDLIRIEFPREKTIKTAVGEEIKIKDLFAYYQTIDSSNGGLSLVSHDNNFSLRSIGSRTLK
RFEKYQVDVLGNIYKVRGEKRVGVASSSHSKAGETIRPL*
[SEQ ID NO: 10] nThermoCas9 (D8A) (applied in HEK293T, human codon optimized)
DNA acid sequence: atgaaatacaagattggcctcgCCattggcataacctctataggctgggccgtgatcaaccttgatataccccgaatagagg accttggcgttaggatcttcgatcgcgcagagaatcccaagacaggcgaaagcctcgccctgcctaggcgtttggcacgttca gcaaggagacgcttgaggcgcaggaagcacaggcttgaacgaatccgaaggctctttgtgcgagaggggatcttgacaaa agaggaacttaataaacttttcgagaagaaacatgagattgatgtgtggcaactgagagtcgaggccctcgaccgcaagctc aacaatgacgagctggcaagaattttgttgcacttggcaaagaggcgagggttccggtcaaatcgaaaatcagaacgaact aataaagagaattcaaccatgttgaagcacatagaggagaatcagagcatattgtctagctatcgtacagtggctgagatggt cgttaaagaccccaagttcagcttgcataaaaggaacaaggaagacaactatacgaataccgtggctcgggatgacctcga gcgcgagattaagctcatctttgctaagcaaagggagtacggcaatattgtatgtactgaggcgttcgagcatgaatacatcag tatctgggccagccagcggcccttcgcctcaaaagacgacattgaaaagaaggtgggcttttgcaccttcgaacccaaggag aagcgtgctccgaaggccacctatacttttcaatcttttacagtgtgggagcacatcaataagctcaggctggttagccccggcg ggatacgagctctcacagacgacgagcgcaggttgatttataaacaggccttccacaagaacaagattacttttcacgacgtg agaacccttctcaatctgcccgatgatacgcgcttcaagggcctcttgtacgatcggaatacaaccctcaaagagaacgaaa aggtcagatttctggagcttggtgcataccacaagatccgcaaggcaattgattctgtatacgggaagggtgctgctaagagct tcagacccatcgacttcgacaccttcggatatgcccttactatgttcaaggatgatacggatatccgatcttatctgaggaatgag tatgagcagaacgggaagcgcatggagaacctggccgacaaggtttacgacgaggagctcatagaggagctgcttaatct gagcttcagcaaattcgggcacttgtctctgaaggccttgcggaatattctcccttacatggagcagggtgaggtgtatagcacg gcatgcgagcgggctgggtacaccttcactggccccaagaagaagcaaaagacagtgctgcttcccaatatcccaccaata gcaaaccccgtggtgatgagagctttgacccaagctcggaaggtcgtgaacgcaataattaagaaatatgggagccccgtgt
caatacacattgagctcgcgcgtgagctcagtcagtcattcgacgagagacggaagatgcaaaaggagcaagaggggaa tcggaagaagaatgagacagcaatcagacagctggtggagtacggactcaccctgaaccctaccggcctggatatcgtca agtttaagctgtggtccgagcagaatggtaagtgcgcatactctctgcagcccatagagattgaaagacttctggagcctggat acactgaggtggatcacgtcatcccctattctcgctctcttgatgactcatacacaaacaaggtgctggtactcactaaagaaaa tagggagaagggtaataggacgcctgcagagtacctggggttgggaagcgagcggtggcagcaatttgaaaccttcgtgct caccaacaaacaattcagcaagaagaaaagggacaggctgctgcgactgcactatgacgagaatgaggagaacgaattc aagaacagaaacctgaacgacacacgctacatatctcggtttctggccaatttcatccgtgagcacctgaagtttgcagattcc gacgataagcagaaggtctatacagtgaatggaaggatcactgctcacctgaggtctcggtggaacttcaataagaatcgtg aggagagtaacctgcaccacgctgtagacgccgctatagttgcatgtactactccctctgacattgctcgtgtgacggcgttttac cagagaagagagcagaacaaggagctttcaaagaaaaccgaccctcaattcccccaaccctggccacatttcgccgacg agcttcaagccaggctgagcaagaaccctaaagaatccattaaggcactgaacctcggcaactacgacaatgaaaagctt gagtcactgcaacctgtcttcgtgagcagaatgcccaaaagatcaattaccggtgctgcccaccaggagactcttaggcggta cattggaatagatgagagatctggcaagattcaaactgttgttaagaagaagcttagtgaaattcagctcgacaagaccggac actttcccatgtatggaaaggagtctgaccctcgcacttacgaggcaatacggcagcgcctgctggagcacaataacgatcct aagaaagcattccaggaacccctctacaagcccaagaagaatggggagcttgggccgattattagaactattaagataattg acaccaccaaccaggtgatcccactgaatgacgggaagaccgtagcatataattctaatattgttagggttgatgtgttcgaaa aggacggtaagtactactgcgttcccatatacactattgacatgatgaagggcattctgcccaataaagcaatagaacctaata agccctatagcgaatggaaagagatgactgaagattacacctttcggtttagcctgtatcccaacgacctgattaggatagagtt cccacgcgagaagaccattaaaacagccgttggagaggagattaagatcaaagacttgtttgcatactaccagacgatagat agcagcaacggcggcttgagccttgtgtcccacgacaataatttctccctgagaagtattggcagccgcactctgaagcgcttt gaaaagtatcaggtggacgtacttggaaatatttataaggttcggggagagaaaagggtgggcgtcgcttccagttcacactc caaggcaggcgagacaattcggcccctgtaa
[SEQ ID NO: 11] AcrIIC1Nme (applied in E. coli, codon optimised for E. coli K12) amino acid sequence:
MNKTYKIGKNAGYDGCGLCLAAISENEAIKVKYLRDICPDYDGDDKAEDWLRWGTDSRVKAAALEMEQYAYTSVGMASCWEFVEL
[SEQ ID NO: 12] AcrIIC1Nme (applied in E. coli, codon optimised for E. coli K12) DNA sequence: atgaataaaacatataaaatcggaaaaaatgcgggatatgacggctgcgggttatgtttagccgcgatctcagaaaatgaag ctattaaagttaagtatctgcgcgatatctgtccagactatgacggcgacgacaaagctgaggactggctgagatggggaac ggatagccgcgtgaaagctgcggctttagaaatggagcaatatgcttatacgtcggttgggatggcgtcatgttgggagtttgttg aactatga
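As an illustrative consistency check, which is not part of the original sequence listing, the coding sequence of SEQ ID NO: 12 can be translated with the standard genetic code and compared with the amino acid sequence of SEQ ID NO: 11. The following Python sketch assumes the standard codon table (NCBI translation table 1); the function and variable names are hypothetical and chosen for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence; '*' marks a stop codon."""
    # Remove line-wrap whitespace and normalise case before reading codons.
    cleaned = "".join(dna.split()).upper()
    if len(cleaned) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[cleaned[i:i + 3]] for i in range(0, len(cleaned), 3))


# SEQ ID NO: 12 as listed above, split where the listing wraps.
SEQ_ID_NO_12 = (
    "atgaataaaacatataaaatcggaaaaaatgcgggatatgacggctgcgggttatgtttagccgcgatctcagaaaatgaag"
    "ctattaaagttaagtatctgcgcgatatctgtccagactatgacggcgacgacaaagctgaggactggctgagatggggaac"
    "ggatagccgcgtgaaagctgcggctttagaaatggagcaatatgcttatacgtcggttgggatggcgtcatgttgggagtttgttg"
    "aactatga"
)

# SEQ ID NO: 11 as listed above, with the wrapping spaces removed.
SEQ_ID_NO_11 = (
    "MNKTYKIGKNAGYDGCGLCLAAISENEAIKVKYLRDICPDYDGDDKAEDWLRWGTDSRV"
    "KAAALEMEQYAYTSVGMASCWEFVEL"
)

# The DNA listing ends with a stop codon, hence the trailing '*'.
print(translate(SEQ_ID_NO_12) == SEQ_ID_NO_11 + "*")
```

If the two listings are internally consistent, the comparison prints True. The same helper may be applied to the other DNA and amino acid pairs listed above, since it ignores whitespace and letter case.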
Throughout the description and claims of this specification, the words “comprise” and “contain” and variations of them mean “including but not limited to”, and they are not intended to (and do not) exclude other moieties, additives, components, integers or steps. Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
Features, integers, characteristics, compounds, chemical moieties or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. The invention is not restricted to the details of any foregoing embodiments.
The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed. The reader's attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.
Claims
1. A base editor comprising:
(a) a Cas9 lacking endonuclease activity and not generating DNA double strand breaks;
(b) a deaminase; wherein:
(i) the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having a sequence of SEQ ID NO: 1 or a sequence of at least 77% identity therewith; or
(ii) the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9) having a sequence of SEQ ID NO: 3 or a sequence of at least 77% identity therewith.
2. A base editor as claimed in claim 1, wherein the GeoCas9 is (i) a dead GeoCas9 (dGeoCas9), or (ii) a modified GeoCas9; preferably comprising DNA single strand nickase activity.
3. A base editor as claimed in claim 1, wherein the ThermoCas9 is a dead ThermoCas9 (dThermoCas9) or a modified ThermoCas9, e.g. having DNA single strand nickase activity.
4. A base editor as claimed in any of claims 1 to 3, wherein the deaminase is a cytidine deaminase; optionally wherein the cytidine deaminase is Petromyzon marinus cytidine deaminase (PmCDA1), human CDA1, or rat APOBEC1 (“BE”).
5. A base editor as claimed in claim 4, wherein the base editor further comprises at least one uracil DNA glycosylase inhibitor (UGI).
6. A base editor as claimed in any preceding claim, wherein there is a linker (i) between the Cas9 and the UGI; and/or (ii) between the Cas9 and the cytidine deaminase; and/or (iii) between the cytidine deaminase and the UGI.
7. A base editor as claimed in claim 5 or claim 6, wherein there are two UGIs.
8. A base editor as claimed in any of claims 1 to 3, wherein the deaminase is an adenine deaminase.
9. A base editor as claimed in any preceding claim, wherein:
(i) the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having a sequence of SEQ ID NO: 1 or a sequence of at least 86% identity therewith; or
(ii) the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9) having a sequence of SEQ ID NO: 3 or a sequence of at least 86% identity therewith.
10. A base editor as claimed in any preceding claim, wherein the Cas9 has a PAM sequence recognition preference selected from NNNNCRAA, NNNNCVAA or NNNNCCCA.
11. A base editor as claimed in claim 1, wherein (i) the Cas9 is a dead GeoCas9 (dGeoCas9) and the deaminase is PmCDA1 or human CDA1; or (ii) the Cas9 is a dead ThermoCas9 (dThermoCas9) and the deaminase is PmCDA1 or human CDA1.
12. A base editor as claimed in claim 1, wherein (i) the Cas9 is a nickase ThermoCas9 (nThermoCas9) and the deaminase is rAPOBEC1; preferably wherein the Cas9 has a PAM sequence recognition preference of NNNNCCAA.
13. A base editor comprising:
(a) a catalytically active Cas9 for generating DNA double strand breaks;
(b) a deaminase; wherein:
(i) the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having a sequence of SEQ ID NO: 1 or a sequence of at least 77% identity therewith; or
(ii) the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9) having a sequence of SEQ ID NO: 3 or a sequence of at least 77% identity therewith.
14. A base editor as claimed in claim 13, wherein the deaminase is a cytidine deaminase; optionally wherein the cytidine deaminase is selected from Petromyzon marinus cytidine deaminase (PmCDA1), human CDA1, or rat APOBEC1 (“BE”).
15. A base editor as claimed in claim 14, wherein the base editor further comprises at least one uracil DNA glycosylase inhibitor (UGI).
16. A base editor as claimed in any of claims 13 to 15, wherein there is a linker (i) between the Cas9 and the UGI; and/or (ii) between the Cas9 and the cytidine deaminase; and/or (iii) between the cytidine deaminase and the UGI.
17. A base editor as claimed in claim 15 or claim 16, wherein there are two UGIs.
18. A base editor as claimed in claim 13, wherein the deaminase is an adenine deaminase.
19. A base editor as claimed in any of claims 13 to 18, wherein:
(i) the Cas9 is a Geobacillus stearothermophilus Cas9 (GeoCas9) having a sequence of SEQ ID NO: 1 or a sequence of at least 86% identity therewith; or
(ii) the Cas9 is a Geobacillus thermodenitrificans Cas9 (ThermoCas9) having a sequence of SEQ ID NO: 3 or a sequence of at least 86% identity therewith.
20. A base editor as claimed in any of claims 13 to 19, wherein the Cas9 has a PAM sequence recognition preference selected from NNNNCRAA, NNNNCVAA or NNNNCCCA.
21. A polynucleotide encoding a base editor of any of claims 1 to 12.
22. A polynucleotide encoding a base editor of any of claims 13 to 20.
23. An expression vector comprising a polynucleotide of claim 21.
24. An expression vector comprising a polynucleotide of claim 22.
25. An expression vector as claimed in claim 23, further comprising a polynucleotide encoding a guide RNA (gRNA) which targets a DNA sequence.
26. An expression vector as claimed in claim 24, further comprising a polynucleotide encoding a gRNA which targets a DNA sequence.
27. An expression vector as claimed in claim 24 or claim 26, further comprising an anti-crispr protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand.
28. A system for base editing of a target DNA sequence, comprising as a first expression vector, a vector of claim 24; a second expression vector comprising a polynucleotide encoding (a) an anti-crispr protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand, and (b) a gRNA for the target DNA sequence.
29. A system for base editing of a target DNA sequence, comprising as a first expression vector, a vector of claim 24; a second expression vector comprising a polynucleotide encoding an anti-crispr protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand; and a third expression vector comprising a polynucleotide encoding a gRNA for the target DNA sequence.
30. A system for base editing of a target DNA sequence, comprising as a first expression vector, a vector of claim 26; and a second expression vector comprising a polynucleotide encoding an anti-crispr protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand.
31. A system for base editing of a target DNA sequence comprising a base editor of any of claims 1 to 12, and a gRNA for a target strand DNA.
32. A system for base editing of a target DNA sequence comprising a base editor of any of claims 13 to 20, and a gRNA for a target strand DNA.
33. A system as claimed in claim 32, further comprising an anti-crispr protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand.
34. A ribonucleoprotein complex comprising a base editor of any of claims 1 to 12, and a gRNA for a target DNA strand.
35. A ribonucleoprotein complex comprising a base editor of any of claims 13 to 20, a gRNA for a target DNA strand, and an anti-crispr protein.
36. A method of base editing comprising transforming a cell with a first expression vector which is an expression vector of claim 23; a second expression vector comprising a polynucleotide encoding a gRNA for a target DNA sequence.
37. A method of base editing comprising transforming a cell with a first expression vector which is an expression vector of claim 23; and introducing into the cell a gRNA for a target DNA sequence.
38. A method of base editing comprising transforming a cell with an expression vector of claim 25.
39. A method of base editing comprising transforming a cell with a first expression vector which is an expression vector of claim 24; a second expression vector comprising a polynucleotide encoding (a) an anti-crispr protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand, and (b) a gRNA for a target DNA sequence.
40. A method of base editing comprising transforming a cell with a first expression vector which is an expression vector of claim 24; a second expression vector comprising a polynucleotide encoding an anti-crispr protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand; and a third expression vector comprising a polynucleotide encoding a gRNA for a target DNA sequence.
41. A method of base editing comprising transforming a cell with a first expression vector which is an expression vector of claim 26; and a second expression vector comprising a polynucleotide encoding an anti-crispr protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand.
42. A method of base editing comprising transforming a cell with a first expression vector which is an expression vector of claim 24; a second expression vector comprising a polynucleotide encoding an anti-crispr protein that prevents DNA double strand cleavage but does not inhibit DNA binding activity of the Cas9 to a target DNA strand; and introducing into the cell a gRNA for a target DNA sequence.
43. A method of base editing comprising transforming a cell with a first expression vector which is an expression vector of claim 24; and introducing into the cell a gRNA for a target DNA sequence.
44. A method of base editing as claimed in any of claims 36 to 41 which is carried out in cells ex vivo or in vitro.
45. A method of any of claims 36 to 44, wherein after transformation, expression is induced in the cells for a period, following which genetic material in the cells is analysed to identify base edited cells, e.g. by polymerase chain reaction (PCR) using at least one suitable primer pair.
46. A method of base editing, comprising exposing DNA to (i) a base editor of any of claims 1 to 20, and (ii) a guide RNA for a target strand DNA.
47. A method of base editing, comprising exposing DNA to (i) a base editor of any of claims 13 to 20, (ii) a gRNA for a target strand DNA, and (iii) an anti-crispr protein.
48. A method of base editing, comprising exposing DNA to a ribonucleoprotein complex of claim 34.
49. A method of base editing, comprising exposing DNA to a ribonucleoprotein complex of claim 35.
50. A method of base editing as claimed in any of claims 46 to 48, wherein the DNA is comprised in a cell.
51. A method of base editing as claimed in any of claims 39, 40, 41, 47 or 49, wherein any anti-crispr protein is removed or inactivated, thereby providing a counter selection step for non-edited cells.
52. An expression vector as claimed in claim 27, a system as claimed in any of claims 28, 29, 30 or 33, a ribonucleoprotein complex as claimed in claim 36, or a method as claimed in any of claims 39, 40, 41, 47 or 49, wherein the anti-crispr protein is the small anti-crispr protein from Neisseria meningitidis (AcrIIC1Nme).
53. An expression vector as claimed in any of claims 25, 26, 27 or 52, a system as claimed in any of claims 28 to 33, a ribonucleoprotein complex as claimed in claim 34 or claim 35, or a method as claimed in any of claims 36 to 49, wherein the gRNA is a single gRNA (sgRNA); preferably wherein the sgRNA comprises a spacer having at least 5 mismatches at the 5’ end thereof in comparison with the targeted protospacer.
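The PAM recognition preferences recited in claims 10, 12 and 20 are written with IUPAC degenerate nucleotide codes (N is any base, R is A or G, V is A, C or G). As a minimal sketch, assuming a candidate PAM is supplied 5' to 3' as a plain DNA string, the following Python helpers test whether a site satisfies such a pattern and scan a longer sequence for matching sites. The function names are hypothetical and are not taken from the specification.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam: str) -> str:
    """Convert a degenerate PAM such as 'NNNNCRAA' into a regex string."""
    return "".join(IUPAC[base] for base in pam.upper())


def matches_pam(pam: str, site: str) -> bool:
    """Return True if the whole of `site` satisfies the degenerate PAM."""
    return re.fullmatch(pam_to_regex(pam), site.upper()) is not None


def find_pam_sites(sequence: str, pam: str) -> list[tuple[int, str]]:
    """List (0-based start, matched site) for every site, overlaps included."""
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    return [(m.start(1), m.group(1)) for m in pattern.finditer(sequence.upper())]


# The three preferences recited in claims 10 and 20.
CLAIMED_PAMS = ["NNNNCRAA", "NNNNCVAA", "NNNNCCCA"]

for pam in CLAIMED_PAMS:
    print(pam, matches_pam(pam, "ACGTCCAA"))
```

On this reading, the NNNNCCAA preference of claim 12 is one of the sequences covered by NNNNCVAA, since V includes C, whereas it is not covered by NNNNCRAA.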
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2010348.7 | 2020-07-06 | | |
| GBGB2010348.7A GB202010348D0 (en) | 2020-07-06 | 2020-07-06 | Base editing tools |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022008466A1 (en) | 2022-01-13 |
Family
ID=72050585
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2021/068559 Ceased WO2022008466A1 (en) | 2020-07-06 | 2021-07-06 | Base editing tools |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB202010348D0 (en) |
| WO (1) | WO2022008466A1 (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2023227669A3 (en) * | 2022-05-26 | 2024-02-22 | UCB Biopharma SRL | Novel nucleic acid-editing proteins |
| WO2024183751A1 (en) * | 2023-03-07 | 2024-09-12 | 上海科技大学 | ISCBN-ωRNA EDITING SYSTEM AND USE THEREOF |
| CN119685293A (en) * | 2025-02-27 | 2025-03-25 | 天津科技大学 | A UNG mutant and fusion protein that can be used for base editing in Bacillus methanolicus, a base editing system and applications |
| WO2025091603A1 (en) * | 2023-10-31 | 2025-05-08 | 江南大学 | Construction and use of new crispr-cas12b-based base editor |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116825209B (en) * | 2023-07-14 | 2025-10-31 | 安徽农业大学 | Guide-RNAs gene editing tool for simplifying base editing library |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017070632A2 (en) | 2015-10-23 | 2017-04-27 | President And Fellows Of Harvard College | Nucleobase editors and uses thereof |
| WO2017160689A1 (en) | 2016-03-15 | 2017-09-21 | University Of Massachusetts | Anti-crispr compounds and methods of use |
| WO2018027078A1 (en) | 2016-08-03 | 2018-02-08 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
| WO2018197520A1 (en) * | 2017-04-24 | 2018-11-01 | Dupont Nutrition Biosciences Aps | Methods and compositions of anti-crispr proteins for use in plants |
| WO2018218166A1 (en) | 2017-05-25 | 2018-11-29 | The General Hospital Corporation | Using split deaminases to limit unwanted off-target base editor deamination |
| WO2019005886A1 (en) | 2017-06-26 | 2019-01-03 | The Broad Institute, Inc. | Crispr/cas-cytidine deaminase based compositions, systems, and methods for targeted nucleic acid editing |
| WO2019023680A1 (en) | 2017-07-28 | 2019-01-31 | President And Fellows Of Harvard College | Methods and compositions for evolving base editors using phage-assisted continuous evolution (pace) |
| WO2019168953A1 (en) | 2018-02-27 | 2019-09-06 | President And Fellows Of Harvard College | Evolved cas9 variants and uses thereof |
| WO2019241649A1 (en) | 2018-06-14 | 2019-12-19 | President And Fellows Of Harvard College | Evolution of cytidine deaminases |
| WO2020081568A1 (en) | 2018-10-15 | 2020-04-23 | University Of Massachusetts | Programmable dna base editing by nme2cas9-deaminase fusion proteins |
| WO2020092453A1 (en) * | 2018-10-29 | 2020-05-07 | The Broad Institute, Inc. | Nucleobase editors comprising geocas9 and uses thereof |
- 2020-07-06: GB application GBGB2010348.7A (GB202010348D0), not active, Ceased
- 2021-07-06: WO application PCT/EP2021/068559 (WO2022008466A1), not active, Ceased
Non-Patent Citations (141)
| Title |
|---|
| ALBIN J., HARRIS R., EXPERT REV. MOL. MED., vol. 12, 2010, pages e4 |
| AMRANI, N. ET AL.: "NmeCas9 is an intrinsically high-fidelity genome-editing platform", GENOME BIOLOGY, vol. 19, no. 1, 2018, pages 1 - 25, XP055666761, DOI: 10.1186/s13059-018-1591-1 |
| ANDERSEN ET AL., APPL. ENVIRON. MICROBIOL., vol. 64, 1998, pages 2240 - 2246 |
| ARAVIND, L., KOONIN, E. V., GENOME RESEARCH, vol. 11, no. 8, 2001, pages 1365 - 1374 |
| AXFORD D ET AL., FASEB J., vol. 31, 2017, pages 909 |
| BANNO ET AL., NAT MICROBIOL., vol. 3, no. 4, 2018, pages 423 - 429 |
| BANNO, S. ET AL., NATURE MICROBIOLOGY, vol. 3, no. 4, 2018, pages 423 - 429 |
| BANNO, S. ET AL.: "Deaminase-mediated multiplex genome editing in Escherichia coli", NATURE MICROBIOLOGY, vol. 3, no. 4, 2018, pages 423 - 429, XP036467259, DOI: 10.1038/s41564-017-0102-6 |
| BARRANGOU, R., VAN PIJKEREN, J. P., CURRENT OPINION IN BIOTECHNOLOGY, vol. 37, 2016, pages 61 - 68 |
| BAUMERT T. F. ET AL., HEPATOLOGY, vol. 46, no. 3, 2007, pages 682 - 689 |
| BLANC V., DAVIDSON N. O., THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 278, 2003, pages 1395 - 1398 |
| BOWATER, R., DOHERTY, A. J., PLOS GENETICS, vol. 2, no. 2, 2006, pages e8 |
| BULLIARD Y, J. VIROL., vol. 85, no. 4, 2011, pages 1765 - 1776 |
| CHAPMAN, J. R. ET AL., MOLECULAR CELL, vol. 47, no. 4, 2012, pages 497 - 510 |
| CHEN, F. ET AL.: "Targeted activation of diverse CRISPR-Cas systems for mammalian genome editing via proximal CRISPR targeting", NAT. COMM., vol. 8, no. 1, 2017, pages 1 - 12, XP002776524, DOI: 10.1038/ncomms14958 |
| CHEN, W. ET AL., ISCIENCE, vol. 6, 2018, pages 222 - 231 |
| CHEN, W. ET AL.: "CRISPR/Cas9-based Genome Editing in Pseudomonas aeruginosa and Cytidine Deaminase-Mediated Base Editing in Pseudomonas Species", ISCIENCE, vol. 6, 2018, pages 222 - 231 |
| CHUANG C. K. ET AL., ANIM. BIOTECHNOL., vol. 28, 2017, pages 174 - 181 |
| COX, D. B. T. ET AL., NATURE MEDICINE, vol. 21, no. 2, 2015, pages 121 |
| CRISPO M. ET AL., PLOS ONE, vol. 10, 2015, pages e0136690 |
| CUI, L., BIKARD, D., NUCLEIC ACIDS RESEARCH, vol. 44, no. 9, 2016, pages 4243 - 4251 |
| DANCYGER A. ET AL., FASEB J, vol. 26, no. 4, 2012, pages 1517 - 1525 |
| D'ASTOLFO D. ET AL., CELL, vol. 161, 2015, pages 674 - 690 |
| DI NOIA J. M. ET AL., ANNU. REV. BIOCHEM., vol. 76, 2007, pages 1 - 22 |
| EBINA H ET AL., SCI. REP, vol. 3, 2013, pages 2510 |
| ESVELT K ET AL., NAT. METHODS, vol. 10, 2013, pages 1116 - 1121 |
| FRIEDLAND ET AL.: "Characterization of Staphylococcus aureus Cas9: a smaller Cas9 for all-in-one adeno-associated virus delivery and paired nickase applications", GENOME BIOLOGY, vol. 16, no. 1, 2015, pages 257, XP055347837, DOI: 10.1186/s13059-015-0817-8 |
| GAUDELLI, N. M. ET AL., NATURE, vol. 551, no. 7681, 2017, pages 464 |
| GEHRKE ET AL., NATURE BIOTECHNOLOGY, vol. 36, 2018, pages 977 - 982 |
| GONZALEZ C., RETROVIROLOGY, vol. 6, 2009, pages 96 |
| GOODWIN, S. ET AL., NATURE REVIEWS GENETICS, vol. 17, no. 6, 2016, pages 333 |
| GRUNEWALD, NATURE BIOTECHNOLOGY, 2020 |
| GU, T ET AL.: "Highly efficient base editing in Staphylococcus aureus using an engineered CRISPR RNA-guided cytidine deaminase", CHEMICAL SCIENCE, vol. 9, no. 12, 2018, pages 3248 - 3253 |
| GU, T. ET AL., CHEMICAL SCIENCE, vol. 9, no. 12, 2018, pages 3248 - 3253 |
| GUAN Y. ET AL., EMBO MOL. MED., vol. 8, 2016, pages 477 - 488 |
| HARRINGTON ET AL.: "A broad-spectrum inhibitor of CRISPR-Cas9", CELL, vol. 170, no. 6, 2017, pages 1224 - 1233, XP085189781, DOI: 10.1016/j.cell.2017.07.037 |
| HARRIS ET AL., MOL. CELL, vol. 10, no. 5, 2002, pages 1247 - 53 |
| HARRIS R. S. ET AL., J. BIOL. CHEM., vol. 287, no. 49, 2012, pages 40875 - 40883 |
| HESS, G. T. ET AL., NATURE METHODS, vol. 13, no. 12, 2016, pages 1029 |
| HOLLAND S ET AL., PROC. NATL. ACAD. SCI. USA, vol. 115, no. 14, 2018, pages E3211 - E3220 |
| HOU, W. ET AL.: "Progress in Chemical Synthesis of Peptides and Proteins", TRANS. TIANJIN UNIV., vol. 23, 2017, pages 401 - 419, XP036310733, DOI: 10.1007/s12209-017-0068-8 |
| HSU ET AL., NATURE BIOTECHNOLOGY, vol. 31, 2013, pages 827 - 832 |
| HUANG, H. ET AL., ACS SYNTHETIC BIOLOGY, vol. 5, no. 12, 2016, pages 1355 - 1361 |
| HULTQUIST J. F. ET AL., J VIROL., vol. 85, no. 21, 2011, pages 11220 - 11234 |
| IKEDA T. ET AL., NUCLEIC ACIDS RESEARCH, vol. 36, 2008, pages 6859 - 6871 |
| IKEDA T. ET AL., NUCLEIC ACIDS RESEARCH, vol. 39, 2011, pages 5538 - 5554 |
| ILIAKIS, G. ET AL., CYTOGENETIC AND GENOME RESEARCH, vol. 104, no. 1-4, 2004, pages 14 - 20 |
| JIANG, W. ET AL., NATURE BIOTECHNOLOGY, vol. 31, no. 3, 2013, pages 233 |
| JIANG, W. ET AL.: "BE-PLUS: a new base editing tool with broadened editing window and enhanced fidelity", CELL RESEARCH, vol. 28, no. 8, 2018, pages 855 - 861, XP036675688, DOI: 10.1038/s41422-018-0052-4 |
| JINEK, M. ET AL., SCIENCE, vol. 337, no. 6096, 2012, pages 816 - 821 |
| KIM E ET AL.: "In vivo genome editing with a small Cas9 orthologue derived from Campylobacter jejuni", NAT. COMMUN., vol. 8, 2017, pages 14500 |
| KIM, K. ET AL.: "Highly efficient RNA-guided base editing in mouse embryos", NATURE BIOTECHNOLOGY, vol. 35, no. 5, 2017, pages 435 - 437, XP055482711, DOI: 10.1038/nbt.3816 |
| KIM, Y. ET AL., NATURE BIOTECHNOLOGY, vol. 35, no. 5, 2017, pages 371 - 376 |
| KIM, Y. ET AL.: "Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions", NATURE BIOTECHNOLOGY, vol. 35, 2017, pages 371 - 376, XP055484491, DOI: 10.1038/nbt.3803 |
| KLUESNER ET AL., CRISPR J, vol. 1, 2018, pages 239 - 250 |
| KOCK J.; BLUM H., J. GEN. VIROL., vol. 89, 2008, pages 1184 - 1191 |
| KOHLI R. M. ET AL., J. BIOL. CHEM., vol. 285, no. 52, 2010, pages 40956 - 40964 |
| KOMOR, A. C. ET AL.: "Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity", SCIENCE ADVANCES, vol. 3, no. 8, 2017, pages eaao4774, XP055453964, DOI: 10.1126/sciadv.aao4774 |
| KOMOR, A. C. ET AL.: "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage", NATURE, vol. 533, no. 7603, 2016, pages 420 - 424 |
| LANDRUM, M. J. ET AL., NUCLEIC ACIDS RESEARCH, vol. 42, no. D1, 2013, pages D980 - D985 |
| LANDRUM, M. J. ET AL., NUCLEIC ACIDS RESEARCH, vol. 44, no. D1, 2015, pages D862 - D868 |
| LANGLOIS ET AL., NUCLEIC ACIDS RES., vol. 33, no. 6, 2005, pages 1913 - 1923 |
| LEE CM; CRADICK TJ; BAO G: "The Neisseria meningitidis CRISPR-Cas9 system enables specific genome editing in mammalian cells", MOL. THER., vol. 24, 2016, pages 645 - 654, XP055449590, DOI: 10.1038/mt.2016.8 |
| LI ET AL., BIOTECHNOL BIOENG, vol. 116, 2019, pages 1475 - 1483 |
| LI, Q. ET AL., BIOTECHNOLOGY JOURNAL, vol. 11, no. 7, 2016, pages 961 - 972 |
| LI, Q. ET AL., METABOLIC ENGINEERING, vol. 31, 2015, pages 13 - 21 |
| LI, Q. ET AL.: "CRISPR-Cas9D10A nickase-assisted base editing in the solvent producer Clostridium beijerinckii.", BIOTECHNOLOGY AND BIOENGINEERING, vol. 116, 2019, pages 1475 - 1483 |
| LIN, S. ET AL., ELIFE, vol. 3, 2014, pages e04766 |
| LONGERICH S, CURR OPIN. IMMUNOL., vol. 18, no. 2, 2006, pages 164 - 74 |
| LUKACSOVICH, T., NUCLEIC ACIDS RESEARCH, vol. 22, no. 25, 1994, pages 5649 - 5657 |
| LUNDQUIST ET AL.: "Site-directed mutagenesis and characterization of uracil-DNA glycosylase inhibitor protein. Role of specific carboxylic amino acids in complex formation with Escherichia coli uracil-DNA glycosylase", J. BIOL. CHEM., vol. 272, 1997, pages 21408 - 21419 |
| LUO Y. ET AL., MICROB. CELL FACT., vol. 19, 2020, pages 93 |
| MADDALO D ET AL., NATURE, vol. 514, 2014, pages 380 - 384 |
| MALIM M.; EMERMAN M., CELL HOST MICROBE, vol. 3, no. 6, 2008, pages 388 - 398 |
| MATANO M ET AL., NAT MED., vol. 21, 2015, pages 256 - 62 |
| MILLER J ET AL., ANGEW CHEM. INT. ED. ENGL., vol. 56, 2017, pages 1059 - 1063 |
| MOUGIAKOS ET AL., NATURE COMMUNICATIONS, vol. 8, no. 1, 2017, pages 15939 |
| MOUGIAKOS ET AL.: "Characterizing a thermostable Cas9 for bacterial genome editing and silencing", NATURE COMMUNICATIONS, vol. 8, no. 1, 2017, pages 1647 |
| MOUGIAKOS IOANNIS ET AL: "Characterizing an antiCRISPR-based on/off switch for bacterial genome engineering", FEMS 2019 8TH CONGRESS OF EUROPEAN MICROBIOLOGISTS, 7 July 2019 (2019-07-07), pages 1 - 1639, XP055843376, ISBN: 978-1-138-20333-4, Retrieved from the Internet <URL:https://fems2019.org/fileadmin/user_upload/FEMS/fems2019_abstractbook.pdf> DOI: 10.4324/9781315471938 * |
| MOUGIAKOS, I., TRENDS IN BIOTECHNOLOGY, vol. 34, no. 7, 2016, pages 575 - 587 |
| MOUGIAKOS, I.: "Enhancing CRISPRi specificity employing active ThermoCas9 and AcrIIC1", INTERNATIONAL CONFERENCE ON CRISPR TECHNOLOGIES, WURZBURG, GERMANY, 18 September 2019 (2019-09-18) |
| MOUT R ET AL., ACS NANO, vol. 11, 2017, pages 2452 - 2458 |
| NIEWIADOMSKA ET AL., J VIROL., vol. 81, no. 17, 2007, pages 9577 - 9583 |
| NISHIDA, K. ET AL., SCIENCE, vol. 353, no. 6305, 2016, pages aaf8729 |
| NISHIDA, K. ET AL.: "Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems", SCIENCE, vol. 353, no. 6305, 2016, pages aaf8729 |
| NISHIMASU, H. ET AL.: "Engineered CRISPR-Cas9 nuclease with expanded targeting space", SCIENCE, vol. 361, no. 6408, 2018, pages 1259 - 1262, XP055578577, DOI: 10.1126/science.aas9129 |
| NIU Y. ET AL., CELL, vol. 156, 2014, pages 836 - 843 |
| PAQUET D ET AL., NATURE, vol. 533, 2016, pages 125 - 129 |
| PAWLUK ET AL.: "Naturally occurring off-switches for CRISPR-Cas9", CELL, vol. 167, no. 7, 2016, pages 1829 - 1838, XP029850707, DOI: 10.1016/j.cell.2016.11.017 |
| PETIT V. ET AL., J. MOL. BIOL., vol. 385, no. 1, 2009, pages 65 - 78 |
| PUTNAM ET AL.: "Protein mimicry of DNA from crystal structures of the uracil-DNA glycosylase inhibitor protein and its complex with Escherichia coli uracil-DNA glycosylase", J. MOL. BIOL., vol. 287, 1999, pages 331 - 346, XP004462617, DOI: 10.1006/jmbi.1999.2605 |
| QUINLAN E. M. ET AL., MOL. CELL BIOL., vol. 37, no. 20, 2017, pages e00077-17 |
| RAMAKRISHNA S ET AL., GENOME RES, vol. 24, 2014, pages 1020 - 1027 |
| RAVISHANKAR: "X-ray analysis of a complex of Escherichia coli uracil DNA glycosylase (EcUDG) with a proteinaceous inhibitor. The structure elucidation of a prokaryotic UDG", NUCLEIC ACIDS RES., vol. 26, 1998, pages 4880 - 4887 |
| REES, H. A. ET AL.: "A thermostable Cas9 with increased lifetime in human plasma", NATURE COMMUNICATIONS, vol. 8, no. 1, 2017, pages 15790 |
| REES, H. A.: "Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery", NATURE COMMUNICATIONS, vol. 8, 2017, pages 15790, XP055597104, DOI: 10.1038/ncomms15790 |
| REFSLAND E. W. ET AL., PLOS PATHOG, vol. 8, no. 7, 2012, pages e1002800 |
| ROUET, P. ET AL., MOLECULAR AND CELLULAR BIOLOGY, vol. 14, no. 12, 1994, pages 8096 - 8106 |
| ROUET, P. ET AL., PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 91, no. 13, 1994, pages 6064 - 6068 |
| RUDIN, N., GENETICS, vol. 122, no. 3, 1989, pages 519 - 534 |
| SASAGURI, H. ET AL.: "Introduction of pathogenic mutations into the mouse Psen1 gene by Base Editor and Target-AID", NATURE COMMUNICATIONS, vol. 9, no. 1, 2018, pages 2892 |
| SHALEM O ET AL., SCIENCE, vol. 343, 2014, pages 84 - 87 |
| SHIMATANI, Z. ET AL.: "Targeted base editing in rice and tomato using a CRISPR-Cas9 cytidine deaminase fusion", NATURE BIOTECHNOLOGY, vol. 35, no. 5, 2017, pages 441 - 443, XP055529795, DOI: 10.1038/nbt.3833 |
| STANDAGE-BEIER, K. ET AL., ACS SYNTH. BIOL, vol. 4, 2015, pages 1217 - 1225 |
| STENGLEIN ET AL., NAT. STRUCT. MOL. BIOL., vol. 17, no. 2, 2010, pages 222 - 229 |
| SUN W. ET AL., ANGEW CHEM. INT. ED. ENGL., vol. 54, 2015, pages 12029 - 12033 |
| SUSPENE R. ET AL., PROC. NATL. ACAD. SCI., vol. 108, no. 12, 2011, pages 4858 - 4863 |
| TAN J. ET AL., NATURE COMM, vol. 10, 2019, pages 439 |
| TIAN, S. ET AL.: "CRISPR-Cas9 Based Engineering of Actinomycetal Genomes", ACS SYNTH. BIOL., vol. 4, no. 9, 2018, pages 1020 - 1029 |
| TONG Y. ET AL.: "Highly efficient DSB-free base editing for streptomycetes with CRISPR-BEST", PNAS, vol. 116, no. 41, 2019, pages 20366 - 20375 |
| TONG, Y. ET AL., ACS SYNTHETIC BIOLOGY, vol. 4, no. 9, 2015, pages 1020 - 1029 |
| TRUONG D. J. ET AL., NUCLEIC ACIDS RES., vol. 43, 2015, pages 6450 - 6458 |
| WANG ET AL.: "Uracil-DNA glycosylase inhibitor gene of bacteriophage PBS2 encodes a binding protein specific for uracil-DNA glycosylase", J. BIOL. CHEM., vol. 264, 1989, pages 1163 - 1171 |
| WANG M. ET AL., J. EXP. MED., vol. 207, no. 1, 2010, pages 141 - 53 |
| WANG M. ET AL., NAT. STRUCT. MOL. BIOL., vol. 16, no. 7, 2009, pages 769 - 776 |
| WANG, Y. ET AL., BIOTECHNOLOGY AND BIOENGINEERING, vol. 116, 2019, pages 3016 - 3029 |
| WANG, Y. ET AL., JOURNAL OF BIOTECHNOLOGY, vol. 200, 2015, pages 1 - 5 |
| WANG, Y. ET AL., METABOLIC ENGINEERING, vol. 47, 2018, pages 200 - 210 |
| WANG, Y. ET AL.: "CRISPR-Cas9 and CRISPR-Assisted Cytidine Deaminase Enable Precise and Efficient Genome Editing in Klebsiella pneumoniae", APPLIED AND ENVIRONMENTAL MICROBIOLOGY, vol. 84, 2018, pages e01834-18 |
| WANG, Y. ET AL.: "Expanding targeting scope, editing window, and base transition capability of base editing in Corynebacterium glutamicum", BIOTECHNOLOGY AND BIOENGINEERING, vol. 116, 2019, pages 3016 - 3029 |
| WANG, Y. ET AL.: "MACBETH: Multiplex automated Corynebacterium glutamicum base editing method", METABOLIC ENGINEERING, vol. 47, 2018, pages 200 - 210 |
| WISSING S. ET AL., MOL. ASPECTS MED., vol. 31, no. 5, 2010, pages 383 - 397 |
| XU, T. ET AL., APPL. ENVIRON. MICROBIOL., vol. 81, 2015, pages 4423 - 4431 |
| YANG H. ET AL., CELL, vol. 154, 2013, pages 1370 - 1379 |
| YE L. ET AL., PROC. NATL. ACAD. SCI. USA, vol. 111, 2014, pages 9591 - 9596 |
| YIN H ET AL., NAT. BIOTECHNOL., vol. 32, 2014, pages 551 - 553 |
| YIN H ET AL., NAT. BIOTECHNOL., vol. 34, 2016, pages 328 - 333 |
| ZAFRA, M. P. ET AL.: "Optimized base editors enable efficient editing in cells, organoids and mice", NATURE BIOTECHNOLOGY, vol. 36, no. 9, 2018, pages 888 - 893, XP036929662, DOI: 10.1038/nbt.4194 |
| ZHANG, Y. ET AL.: "Programmable adenine deamination in bacteria using a Cas9-adenine-deaminase fusion", CHEM. SCI., vol. 11, 2020, pages 1657 - 1664 |
| ZHAO, Y. ET AL., SCI. CHINA LIFE SCI, 2019, Retrieved from the Internet <URL:https://doi.org/10.1007/s11427-019-1559-y> |
| ZHAO, Y. ET AL.: "Multiplex genome editing using a dCas9-cytidine deaminase fusion in Streptomyces", SCI. CHINA LIFE SCI., 2019, Retrieved from the Internet <URL:https://doi.org/10.1007/s11427-019-1559-y> |
| ZHENG, K. ET AL., COMMUN. BIOL., vol. 1, 2018, pages 32 |
| ZHIGANG ET AL., GENE, vol. 99, 1991, pages 31 - 37 |
| ZHONG Z. ET AL., BIORXIV, 2019 |
| ZUCKERMANN M ET AL., NAT. COMMUN., vol. 6, 2015, pages 7391 |
| ZUO ET AL., NATURE METHODS, vol. 17, 2020, pages 600 - 604 |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2023227669A3 (en) * | 2022-05-26 | 2024-02-22 | UCB Biopharma SRL | Novel nucleic acid-editing proteins |
| WO2024183751A1 (en) * | 2023-03-07 | 2024-09-12 | 上海科技大学 | ISCBN-ωRNA EDITING SYSTEM AND USE THEREOF |
| WO2025091603A1 (en) * | 2023-10-31 | 2025-05-08 | 江南大学 | Construction and use of new crispr-cas12b-based base editor |
| CN119685293A (en) * | 2025-02-27 | 2025-03-25 | 天津科技大学 | A UNG mutant and fusion protein that can be used for base editing in Bacillus methanolicus, a base editing system and applications |
| CN119685293B (en) * | 2025-02-27 | 2025-06-17 | 天津科技大学 | A UNG mutant and fusion protein, base editing system and application for base editing of Bacillus methanolicus |
Also Published As
| Publication number | Publication date |
|---|---|
| GB202010348D0 (en) | 2020-08-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11939605B2 (en) | | Thermostable CAS9 nucleases |
| US20250136960A1 (en) | | Cell data recorders and uses thereof |
| WO2022008466A1 (en) | | Base editing tools |
| US12312613B2 (en) | | Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants |
| US12286654B2 (en) | | Base editing enzymes |
| EP3467125B1 (en) | | Using rna-guided foki nucleases (rfns) to increase specificity for rna-guided genome editing |
| US20230348877A1 (en) | | Base editing enzymes |
| CN117222741A (en) | | Site-specific genome modification technology |
| US20250122490A1 (en) | | Nucleic Acid-Guided Nucleases |
| US20250243514A1 (en) | | Compositions, methods, and systems for dna modification |
| KR102913446B1 (en) | | Thermostable CAS9 nuclease |
| Glibauskaitė | | Directed evolution studies of a methylation-sensitive Cas9 for human genome editing |
| CN120699942A (en) | | Deaminases for base editing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21739145; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21739145; Country of ref document: EP; Kind code of ref document: A1 |