HK1261797A1 - Evolved cas9 proteins for gene editing - Google Patents
- Publication number: HK1261797A1
- Application number: HK19121702.5A
- Authority: HK (Hong Kong)
Description
RELATED APPLICATIONS
According to 35 U.S.C. § 119(e), the present application claims priority to the following U.S. provisional patent applications: U.S.S.N. 62/245,828, filed 10/23/2015; U.S.S.N. 62/279,346, filed 1/15/2016; U.S.S.N. 62/311,763, filed 22/2016; U.S.S.N. 62/322,178, filed 13/4/2016; U.S.S.N. 62/357,352, filed 30/6/2016; U.S.S.N. 62/370,700, filed 8/3/2016; U.S.S.N. 62/398,490, filed 9/22/2016; U.S.S.N. 62/408,686, filed 4/10/2016; and U.S.S.N. 62/357,332, filed 30/6/2016; each of which is incorporated herein by reference.
Background
Targeted editing of nucleic acid sequences (for example, the targeted cleavage of genomic DNA or the targeted introduction of specific modifications into it) is a highly promising approach for studying gene function. It also has the potential to provide new therapies for human genetic diseases.[1] An ideal nucleic acid editing technology has three characteristics:
(1) high efficiency in installing the desired modification;
(2) minimal off-target activity; and
(3) the ability to be programmed to edit precisely at any site in a given nucleic acid, e.g., any site in the human genome.[2]
Current genome engineering tools all effect sequence-specific DNA cleavage in a genome. These tools include engineered zinc finger nucleases (ZFNs),[3] transcription activator-like effector nucleases (TALENs),[4] and, most recently, the RNA-guided DNA endonuclease Cas9. This programmable cleavage can have two outcomes: mutation of the DNA at the cleavage site via non-homologous end joining (NHEJ), or replacement of the DNA surrounding the cleavage site via homology-directed repair (HDR).[6,7]
One drawback of the prior art is that NHEJ and HDR are both stochastic processes. They typically give modest gene editing efficiencies, along with unwanted gene alterations that can compete with the desired alteration.[8] In principle, many genetic diseases can be caused by a specific nucleotide change at a specific position in the genome (e.g., a C-to-T change in a specific codon of a disease-associated gene).[9] A programmable way to achieve such precise editing would therefore be both a powerful new research tool and a potential new approach to gene editing-based human therapeutics.
Another disadvantage of existing genome engineering tools is that they are limited in the DNA sequences they can target. With ZFNs or TALENs, a new protein must be engineered for each individual target sequence. Cas9, by contrast, can be directed to almost any target sequence simply by supplying a suitable guide RNA. Cas9 technology is nevertheless constrained by a strict requirement for a protospacer-adjacent motif (PAM), typically the nucleotide sequence 5'-NGG-3'. The PAM must be present immediately 3' of the target DNA sequence for the Cas9 protein to bind and act on that sequence. The PAM requirement therefore limits the sequences that the Cas9 protein can target efficiently.
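The PAM constraint described above can be made concrete with a short scan. The Python sketch below lists every position, on either strand of a DNA string, where a protospacer is immediately followed by a 5'-NGG-3' PAM. These are the sites a wild-type SpCas9 could in principle be programmed to reach.

Several parts are illustrative assumptions, not part of this disclosure:
- the function names;
- the 20-nt protospacer default, which is the length commonly used with S. pyogenes guide RNAs.

The scan also ignores guide efficiency and off-target considerations.

```python
import re

# Base-pairing complements for uppercase DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq: str, spacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """List (strand, start, protospacer, pam) for every NGG-adjacent site.

    A site is `spacer_len` nucleotides immediately followed, on the same
    strand, by a 5'-NGG-3' PAM. `start` is the 0-based index of the
    protospacer's 5' end in the scanned string: `seq` itself for '+',
    its reverse complement for '-'.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites


if __name__ == "__main__":
    print(find_ngg_sites("A" * 20 + "TGG"))  # one '+' strand site with PAM TGG
    print(find_ngg_sites("A" * 20 + "TGA"))  # NGA is not a canonical PAM: []
```

A sequence with no NGG within reach on either strand returns an empty list. That is the situation the PAM-relaxed Cas9 variants of this disclosure are meant to address.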
Summary of the Invention
Importantly, 80-90% of protein mutations that cause human disease result from the substitution, deletion, or insertion of only a single nucleotide.[6] Most current strategies for single-base gene correction rely on one of two approaches:
- engineered nucleases, which create double-strand breaks (DSBs) that are then resolved by stochastic homology-directed repair (HDR); or
- DNA-RNA chimeric oligonucleotides.[22]
The latter strategy designs an RNA/DNA sequence that base-pairs with a specific sequence in genomic DNA at every position except the nucleotide to be edited. The resulting mismatch is recognized and repaired by the cell's endogenous repair machinery, which changes the sequence of either the chimera or the genome.

Both strategies suffer from low gene editing efficiency and unwanted gene alterations. Both are subject to the randomness of HDR and to competition between HDR and non-homologous end joining (NHEJ).[23-25] HDR efficiency also varies with:
- the location of the target gene in the genome;[26]
- the state of the cell cycle;[27] and
- the type of cell or tissue.
A straightforward, programmable way to install specific types of base modification at precise locations in genomic DNA, with enzyme-like efficiency and without randomness, would therefore be a powerful new approach to gene editing-based research tools and human therapeutics.
Clustered regularly interspaced short palindromic repeat (CRISPR) systems are recently discovered prokaryotic adaptive immune systems.[10] They have been adapted to enable robust and general genome engineering in a variety of organisms and cell lines.[11] A CRISPR-Cas (CRISPR-associated) system is a protein-RNA complex that uses an RNA molecule (sgRNA) as a guide to localize the complex to a target DNA sequence through base pairing.[12] In the natural systems, the Cas protein then acts as an endonuclease to cleave the target DNA sequence.[13] For the system to function, the target DNA sequence must meet two conditions: it must be complementary to the sgRNA, and it must contain a "protospacer adjacent motif" (PAM) at the 3' end of the complementary region.[14] Not every desired target sequence has a PAM at its 3' end, and such sequences cannot be targeted effectively by wild-type Cas9 protein. The PAM requirement therefore limits the use of Cas9 technology.
Provided herein are novel Cas9 variants that exhibit activity on target sequences that do not comprise the canonical PAM sequence (5'-NGG-3', where N is any nucleotide) at their 3' end. Such Cas9 variants are therefore not limited to target sequences that include a canonical PAM at the 3' end.
Among the known Cas proteins, Streptococcus pyogenes Cas9 has been widely used as a genome engineering tool.[15] This Cas9 protein is a large, multi-domain protein containing two distinct nuclease domains. Point mutations can be introduced into Cas9 to abolish its nuclease activity. The result is a dead Cas9 (dCas9) that still binds DNA in the manner programmed by the sgRNA.[16] In principle, when such a Cas9 variant is fused to another protein or domain, it can bring that protein to almost any DNA sequence simply by co-expression with an appropriate sgRNA.

Accordingly, the disclosure also includes fusion proteins comprising such Cas9 variants and DNA-modifying domains. These include deaminase, nuclease, nickase, recombinase, methyltransferase, methylase, acetylase, acetyltransferase, transcriptional activator, and transcriptional repressor domains. The disclosure further includes the use of such fusion proteins to:
- correct disease-associated mutations in a genome (e.g., the genome of a human subject); or
- introduce mutations into a genome (e.g., the human genome) that reduce or prevent expression of a gene.
In some embodiments, any of the Cas9 proteins provided herein can be fused to a protein having enzymatic activity. In some embodiments, the enzymatic activity modifies the target DNA. In some embodiments, the enzymatic activity is nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer-forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, or glycosylase activity. In some cases, the enzymatic activity is nuclease activity. In some cases, the nuclease activity introduces a double-strand break in the target DNA. In some cases, the enzymatic activity modifies a target polypeptide associated with the target DNA. In some cases, the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, polyadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deglycosylating activity, myristoylation activity, or demyristoylation activity. In some cases, the target polypeptide is a histone, and the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, or deubiquitinating activity.
In some embodiments, any of the Cas9 proteins provided herein can be fused to a protein having enzymatic activity. In some embodiments, the enzymatic activity modifies a polypeptide (e.g., a histone) associated with DNA. In some embodiments, the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity (i.e., ubiquitination activity), deubiquitination activity, adenylation activity, polyadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), or deglycosylation activity. The enzymatic activities listed herein catalyze covalent modification of proteins. Such modifications are known in the art to alter the stability or activity of the target protein. For example, phosphorylation by a kinase can stimulate or silence protein activity, depending on the target protein.

Histones are target proteins of particular interest. Histone proteins are known in the art to bind DNA and form complexes known as nucleosomes. Histones can be modified (e.g., by methylation, acetylation, ubiquitination, or phosphorylation) to cause structural changes in the surrounding DNA. These changes control the accessibility of potentially large portions of DNA to interacting factors such as transcription factors and polymerases. A single histone can be modified in many different ways and in many different combinations. For example, trimethylation of lysine 27 of histone 3 (H3K27me3) is associated with DNA regions of repressed transcription, while trimethylation of lysine 4 of histone 3 (H3K4me3) is associated with DNA regions of active transcription.
Thus, site-directed modifying polypeptides having histone modification activity can be used for site-specific control of DNA structure and can be used to alter histone modification patterns in selected regions of a target DNA. Such methods are useful for research and clinical applications.
In some embodiments, the deaminase domain catalyzes the removal of an amine group from a molecule. In further embodiments, a cytidine deaminase domain deaminates cytosine to produce uracil. In other embodiments, the nuclease domain has enzymatic activity and can cleave the phosphodiester bonds between the nucleotide subunits of a nucleic acid. In some embodiments, recombinase domains, which recombine specific DNA sequences, can be used to manipulate the structure of a genome and to control gene expression. In further embodiments, methylase domains may be used to methylate their respective substrates, and acetylase domains may be used to acetylate theirs. In other embodiments, an acetyltransferase domain may be used to transfer acetyl groups. Examples of acetyltransferases include, but are not limited to:
- histone acetyltransferases (e.g., CBP histone acetyltransferase);
- choline acetyltransferase;
- chloramphenicol acetyltransferase;
- serotonin N-acetyltransferase;
- NatA acetyltransferase; and
- NatB acetyltransferase.

The present disclosure also encompasses transcriptional activator and transcriptional repressor domains. A transcriptional activator domain is a region of a transcription factor that, together with a DNA-binding domain, can activate transcription from a promoter. It does so through one or more interactions, typically with other transcription factors and RNA polymerase. A transcriptional repressor domain is a region of a transcription factor that, together with a DNA-binding domain, can repress transcription from a promoter through the same kinds of interactions.
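The cytidine deaminase embodiment above can be shown with a toy model. The sketch below converts chosen cytosines to uracil, then models two rounds of base pairing. It shows why a C-to-U deamination is ultimately read out as a C-to-T change.

Assumptions and simplifications:
- The helper names and the caller-chosen positions are hypothetical.
- The model ignores strand orientation, cellular uracil repair, and any positional constraints of a real fusion protein.

```python
# Watson-Crick partners; uracil (U) pairs with adenine, just as thymine does.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate_cytidines(strand: str, positions: list[int]) -> str:
    """Convert C to U at the given 0-based positions; other bases are left as-is.

    `positions` is a hypothetical input chosen by the caller, standing in for
    wherever a fused cytidine deaminase happens to act.
    """
    bases = list(strand.upper())
    for i in positions:
        if bases[i] == "C":
            bases[i] = "U"
    return "".join(bases)


def copy_strand(strand: str) -> str:
    """Return the base-paired partner strand, written position-by-position
    (aligned to `strand`, not reversed into 5'->3' order)."""
    return "".join(PAIR[b] for b in strand)


if __name__ == "__main__":
    original = "ACGCA"
    edited = deaminate_cytidines(original, [1])  # "AUGCA"
    first_copy = copy_strand(edited)              # U templates an A: "TACGT"
    second_copy = copy_strand(first_copy)         # A templates a T: "ATGCA"
    print(original, "->", second_copy)            # C at index 1 now reads as T
```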
The potential of the Cas9 system for genome engineering is immense. Its unique ability to bring proteins to specific genomic sites, as programmed by sgRNAs, can be developed into a variety of site-specific genome engineering tools beyond nucleases. These include transcriptional activators, transcriptional repressors, histone-modifying proteins, integrases, deaminases, and recombinases.[11] Some of these potential applications have recently been realized through dCas9 fusions with:
- transcriptional activators, yielding RNA-guided transcriptional activators;[17,18]
- transcriptional repressors;[16,19,20] and
- chromatin-modifying enzymes.[21]
Simple co-expression of these fusion proteins with various sgRNAs results in specific regulation of target gene expression. These pioneering studies have paved the way for designing and constructing readily programmable, sequence-specific effectors that manipulate genomes precisely.
Some aspects of the present disclosure provide strategies, systems, proteins, nucleic acids, compositions, cells, reagents, methods, and kits useful for the targeted binding, editing, and cleavage of nucleic acids. This includes editing a single site in the genome of a subject, e.g., a human subject.

In some embodiments, a recombinant Cas9 protein is provided that has two properties. First, it comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten changes as compared to a naturally occurring Cas9 protein. Second, it exhibits activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3', where N is any nucleotide) at its 3' end. Examples of such Cas9 protein mutations are given in Tables 3, 5, 8, and 9.

In some embodiments, fusion proteins of Cas9 and a nucleic acid editing enzyme or enzymatic domain, such as a deaminase domain, are provided. In some embodiments, methods for targeted nucleic acid binding, editing, and/or cleavage are provided. In some embodiments, reagents and kits are provided for generating targeted nucleic acid binding, editing, and/or cleaving proteins, such as fusion proteins of Cas9 variants and nucleic acid editing enzymes or domains.
Some aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of SEQ ID NOs: 9-262. The amino acid sequence of this Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, and 1362 of the Streptococcus pyogenes Cas9 amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any of the amino acid sequences provided in SEQ ID NOs: 10-262.

In some embodiments, the recombinant Cas9 protein comprises a RuvC domain and an HNH domain. In some embodiments, the amino acid sequence of the recombinant Cas9 protein differs from the amino acid sequence of a naturally occurring Cas9 protein. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of X262T, X294R, X409I, X480K, X543D, X694I, and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of A262T, K294R, S409I, E480K, E543D, M694I, and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
Other aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of SEQ ID NOs: 10-262. In this aspect:
- the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, and 1362 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any of the amino acid sequences provided in SEQ ID NOs: 10-262; and
- the amino acid sequence of the recombinant Cas9 protein differs from the amino acid sequence of a naturally occurring Cas9 protein.

In some embodiments, the recombinant Cas9 protein comprises a RuvC domain and an HNH domain. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of X262T, X267G, X294R, X405I, X409I, X480K, X543D, X694I, X1219V, X1224K, and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of A262T, S267G, K294R, F405I, S409I, E480K, E543D, M694I, E1219V, N1224K, and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
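The mutation notation and percent-identity language above can be sketched in Python. The snippet does three things:
- parses codes such as E480K or X1219V, where X means an unspecified wild-type residue;
- applies them to a sequence, checking the stated wild-type residue; and
- computes percent identity between two already-aligned sequences.

Assumptions:
- SEQ ID NO: 9 is not reproduced here, so the example uses only the first ten residues of S. pyogenes Cas9 as a toy input.
- The helper names are hypothetical.
- Sequences of different lengths would first need a proper sequence alignment, which this sketch does not perform.

```python
import re

# Point-mutation notation used in the text, e.g. "E480K" or "X1219V".
# A leading "X" means the wild-type residue is left unspecified.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a code such as 'E480K' into ('E', 480, 'K'); positions are 1-based."""
    m = _MUTATION.match(code)
    if m is None:
        raise ValueError(f"not a point mutation: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_mutations(seq: str, codes: list[str]) -> str:
    """Return a copy of `seq` with each substitution in `codes` applied.

    When the wild-type letter is not 'X', it is checked against `seq`, so a
    wrong position or the wrong reference sequence raises an error.
    """
    residues = list(seq)
    for code in codes:
        wt, pos, new = parse_mutation(code)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{code}: position {pos} is outside the sequence")
        if wt != "X" and residues[pos - 1] != wt:
            raise ValueError(f"{code}: residue {pos} is {residues[pos - 1]}, not {wt}")
        residues[pos - 1] = new
    return "".join(residues)


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length (no gaps)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    wild_type = "MDKKYSIGLD"  # first ten residues of S. pyogenes Cas9 (toy input)
    mutant = apply_mutations(wild_type, ["D10A"])
    print(mutant, percent_identity(wild_type, mutant))  # MDKKYSIGLA 90.0
```

Checking the stated wild-type letter catches numbering mistakes, for example applying SEQ ID NO: 9 positions to a homolog without first mapping them to the corresponding residues.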
It is to be understood that any amino acid mutation described herein (e.g., A262T) from a first amino acid residue (e.g., A) to a second amino acid residue (e.g., T) may also include a mutation from the first residue to a residue that is similar to (e.g., conserved with) the second. For example, a mutation from alanine to threonine (e.g., the A262T mutation) may also be a mutation from alanine to an amino acid that is similar in size and chemistry to threonine, such as serine. Additional pairs of similar amino acids include, but are not limited to:
- phenylalanine and tyrosine;
- asparagine and glutamine;
- methionine and cysteine;
- aspartic acid and glutamic acid; and
- arginine and lysine.
One of skill in the art would recognize that such conservative substitutions are likely to have minor effects on protein structure and to be well tolerated without compromising function.

In some embodiments, any amino acid mutation provided herein to a threonine may instead be a mutation to a serine. In some embodiments, any mutation provided herein to an arginine may instead be a mutation to a lysine. In some embodiments, any mutation provided herein to an isoleucine may instead be a mutation to an alanine, valine, methionine, or leucine. In some embodiments, any mutation provided herein to a lysine may instead be a mutation to an arginine. In some embodiments, any mutation provided herein to an aspartic acid may instead be a mutation to a glutamic acid or an asparagine. In some embodiments, any mutation provided herein to a valine may instead be a mutation to an alanine, isoleucine, methionine, or leucine.
In some embodiments, any amino acid mutation provided herein to a glycine may instead be a mutation to an alanine. It should be appreciated, however, that one of skill in the art would recognize additional conservative amino acid residues, and that mutations to any such conservative residues are also within the scope of the present disclosure.
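The conservative alternatives enumerated in the two preceding paragraphs can be captured as a lookup table keyed by the intended new residue. The sketch below (hypothetical helper names) expands a mutation into those alternatives. It drops any alternative equal to the wild-type residue, since, e.g., E543E would be no mutation at all. With the text's own examples it reproduces the alternative forms given in the following paragraphs: E1219A/I/M/L, E480R, and E543N. The table covers only the substitutions spelled out here; the text notes that other conservative choices exist.

```python
# Conservative alternatives for the *new* residue of a mutation, as enumerated
# in the text (T->S, R->K, I->A/V/M/L, K->R, D->E/N, V->A/I/M/L, G->A).
CONSERVATIVE_ALTERNATIVES = {
    "T": ["S"],
    "R": ["K"],
    "I": ["A", "V", "M", "L"],
    "K": ["R"],
    "D": ["E", "N"],
    "V": ["A", "I", "M", "L"],
    "G": ["A"],
}


def conservative_variants(mutation: str) -> list[str]:
    """Expand e.g. 'E1219V' into ['E1219A', 'E1219I', 'E1219M', 'E1219L'].

    Alternatives equal to the wild-type residue are skipped; new residues with
    no entry in the table yield an empty list.
    """
    wild_type, position, new = mutation[0], mutation[1:-1], mutation[-1]
    return [
        f"{wild_type}{position}{alt}"
        for alt in CONSERVATIVE_ALTERNATIVES.get(new, [])
        if alt != wild_type
    ]


if __name__ == "__main__":
    for m in ["E1219V", "E480K", "E543D", "A262T", "K294R"]:
        print(m, "->", conservative_variants(m))
```

K294R yields an empty list. The only listed alternative to arginine is lysine, which is the wild-type residue itself.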
In some embodiments, the Cas9 protein is a Cas9 domain of a fusion protein. In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid. In some embodiments, the mutation is X1219A, X1219I, X1219M, or X1219L.
In some embodiments, the amino acid sequence of the Cas9 protein comprises an E1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the mutation is E1219A, E1219I, E1219M, or E1219L.
In some embodiments, the amino acid sequence of the Cas9 protein comprises an X480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid. In some embodiments, the mutation is X480R.
In some embodiments, the amino acid sequence of the Cas9 protein comprises an E480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the mutation is E480R.
In some embodiments, the amino acid sequence of the Cas9 protein comprises an X543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid. In some embodiments, the mutation is X543N.
In some embodiments, the amino acid sequence of the Cas9 protein comprises an E543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the mutation is E543N.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X480K, X543D, and X1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X262T, X409I, X480K, X543D, X694I, and X1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X294R, X480K, X543D, X1219V, X1256K, and X1362P mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X294R, X480K, X543D, X1219V, and X1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X267G, X294R, X480K, X543D, X1219V, X1224K, and X1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X262T, X405I, X409I, X480K, X543D, X694I, and X1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the E480K, E543D, and E1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the A262T, S409I, E480K, E543D, M694I, and E1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the K294R, E480K, E543D, E1219V, Q1256K, and L1362P mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the K294R, E480K, E543D, E1219V, and Q1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the S267G, K294R, E480K, E543D, E1219V, N1224K, and Q1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the A262T, F405I, S409I, E480K, E543D, M694I, and E1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
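The combinations listed above can be compared mechanically by treating each one as a set of mutations. The sketch below uses the X-notation forms; the grouping and helper names are our own, not the disclosure's. It shows two things:
- every combination contains X480K, X543D, and X1219V; and
- together, the combinations touch exactly the twelve residue positions enumerated earlier for the recombinant Cas9 proteins.

```python
# The combinations listed above, in "X" notation (SEQ ID NO: 9 numbering).
COMBINATIONS = [
    ["X480K", "X543D", "X1219V"],
    ["X262T", "X409I", "X480K", "X543D", "X694I", "X1219V"],
    ["X294R", "X480K", "X543D", "X1219V", "X1256K", "X1362P"],
    ["X294R", "X480K", "X543D", "X1219V", "X1256K"],
    ["X267G", "X294R", "X480K", "X543D", "X1219V", "X1224K", "X1256K"],
    ["X262T", "X405I", "X409I", "X480K", "X543D", "X694I", "X1219V"],
]

# Residue positions enumerated earlier for the recombinant Cas9 proteins.
LISTED_POSITIONS = {262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, 1362}


def position(code: str) -> int:
    """Residue number of a code such as 'X1219V'."""
    return int(code[1:-1])


def shared_mutations(combos: list[list[str]]) -> set[str]:
    """Mutations present in every combination."""
    return set.intersection(*(set(c) for c in combos))


def positions_used(combos: list[list[str]]) -> set[int]:
    """All residue positions touched by any combination."""
    return {position(code) for combo in combos for code in combo}


if __name__ == "__main__":
    print(sorted(shared_mutations(COMBINATIONS), key=position))  # ['X480K', 'X543D', 'X1219V']
    print(positions_used(COMBINATIONS) == LISTED_POSITIONS)       # True
```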
The HNH nuclease domain of Cas9 cleaves the DNA strand that is complementary to the guide RNA (gRNA). Its active site consists of a ββα-metal fold. Histidine 840 activates a water molecule to attack the scissile phosphate, which is made more electrophilic by coordination with a magnesium ion. The result is cleavage of the 3'-5' phosphodiester bond. In some embodiments, the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
The RuvC domain of Cas9 cleaves the non-target DNA strand. It is encoded by sequentially separated segments that come together in the tertiary structure to form the RuvC cleavage domain, which adopts an RNase H fold. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262. In some embodiments, the amino acid sequence of the RuvC domain is identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
In some embodiments, the Cas9 protein comprises one or more mutations that affect (e.g., inhibit) the ability of Cas9 to cleave one or both strands of a DNA duplex. In some embodiments, the Cas9 protein comprises a D10A and/or H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises a D10X₁ and/or H840X₂ mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X₁ is any amino acid except D, and X₂ is any amino acid except H. In some embodiments, the Cas9 protein comprises a D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises a histidine (H) at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises an H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises an aspartic acid (D) at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
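The effect of the D10 and H840 mutations on strand cleavage can be summarized as a small decision function. The sketch below encodes the domain roles described above:
- HNH, with catalytic H840, cuts the strand paired with the gRNA;
- RuvC cuts the non-target strand.

It also relies on the well-established assignment of D10 as a RuvC catalytic residue, which this passage does not itself state. It treats any substitution at either position as fully inactivating, which is a simplification. The function names are hypothetical.

```python
# Simplified model: a substitution at a catalytic residue abolishes that domain's cut.
RUVC_CATALYTIC_RESIDUE = 10    # D10, RuvC domain: cuts the non-target strand
HNH_CATALYTIC_RESIDUE = 840    # H840, HNH domain: cuts the target (gRNA-paired) strand


def cleaved_strands(mutations: list[str]) -> set[str]:
    """Strands still cut by a Cas9 carrying `mutations` (codes like 'D10A')."""
    mutated = {int(code[1:-1]) for code in mutations}
    strands = set()
    if RUVC_CATALYTIC_RESIDUE not in mutated:
        strands.add("non-target")
    if HNH_CATALYTIC_RESIDUE not in mutated:
        strands.add("target")
    return strands


def describe(mutations: list[str]) -> str:
    """Label the expected cleavage phenotype under this simplified model."""
    cut = cleaved_strands(mutations)
    if len(cut) == 2:
        return "double-strand cleavage"
    if len(cut) == 1:
        return f"nickase ({next(iter(cut))} strand only)"
    return "nuclease-dead (dCas9)"


if __name__ == "__main__":
    for muts in ([], ["D10A"], ["H840A"], ["D10A", "H840A"]):
        print(muts, "->", describe(muts))
```

Under this model, a D10A variant that retains H840 is a nickase that cuts only the target strand. Adding H840A as well gives dCas9.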
In some embodiments, the Cas9 protein of the present disclosure exhibits activity (e.g., increased binding) on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, as compared to Streptococcus pyogenes Cas9 as provided in SEQ ID NO: 9.
Some aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 as provided in SEQ ID NO: 9. This recombinant Cas9 protein has the following properties:
- its amino acid sequence comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, and 1362 of the amino acid sequence provided in SEQ ID NO: 9;
- its amino acid sequence differs from the amino acid sequence of a naturally occurring Cas9 protein; and
- it exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, as compared to Streptococcus pyogenes Cas9 as provided in SEQ ID NO: 9.

In some embodiments, the Streptococcus pyogenes Cas9 comprises a RuvC domain and an HNH domain. In other embodiments, the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of the amino acid sequence provided in SEQ ID NO: 9.
For example, the Cas9 protein may exhibit increased binding to the target sequence, increased nuclease activity at the target sequence, or an increase in another activity. Which activity increases depends on whether the Cas9 protein is fused to another domain, such as a domain having enzymatic activity. In some embodiments, the enzymatic activity modifies the target DNA. In some embodiments, the enzymatic activity is nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer-forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, or glycosylase activity. In some cases, the enzymatic activity is nuclease activity. In some cases, the nuclease activity introduces a double-strand break in the target DNA. In some cases, the enzymatic activity modifies a target polypeptide associated with the target DNA. In some cases, the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, polyadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deglycosylating activity, myristoylation activity, or demyristoylation activity. In some cases, the target polypeptide is a histone, and the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, or deubiquitinating activity.
In some embodiments, any Cas9 protein may be fused to a protein with enzymatic activity. In some embodiments, the enzymatic activity modifies a polypeptide (e.g., a histone) associated with DNA. In some embodiments, the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity (i.e., ubiquitination activity), deubiquitination activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), or deglycosylation activity. The enzymatic activities listed herein catalyze covalent modification of proteins. It is known in the art that such modifications can alter the stability or activity of a target protein (e.g., phosphorylation by kinase activity can stimulate or silence protein activity, depending on the target protein). Histones are protein targets of particular interest. It is known in the art that histone proteins bind to DNA and form complexes known as nucleosomes. Histones can be modified (e.g., by methylation, acetylation, ubiquitination, or phosphorylation) to cause structural changes in the surrounding DNA, thereby controlling the accessibility of potentially large portions of DNA to interacting factors such as transcription factors, polymerases, and the like. A single histone can be modified in many different ways and in many different combinations. For example, trimethylation of lysine 27 of histone H3 (H3K27me3) is associated with DNA regions of repressed transcription, whereas trimethylation of lysine 4 of histone H3 (H3K4me3) is associated with DNA regions of active transcription.
Thus, site-directed modifying polypeptides having histone modification activity can be used for site-specific control of DNA structure and can be used to alter histone modification patterns in selected regions of a target DNA. Such methods are useful for research and clinical applications.
In some embodiments, the Cas9 protein is active on a target sequence whose 3' end is not directly adjacent to the canonical PAM sequence (5'-NGG-3'), or on a target sequence that lacks the canonical PAM sequence. On such a target, the Cas9 protein exhibits an activity that is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1000-fold, at least 5000-fold, at least 10000-fold, at least 50000-fold, at least 100000-fold, at least 500000-fold, or at least 1000000-fold greater than the activity of the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9 on the same target sequence.
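The fold-increase language reduces to a ratio of two activity measurements on the same target sequence. The following is a minimal sketch. It assumes that both activities come from the same assay, are expressed in the same units, and that the wild-type value is positive. The activity numbers are hypothetical, and the helper names are illustrative.

```python
# Fold thresholds recited in the embodiment above.
FOLD_THRESHOLDS = (5, 10, 50, 100, 500, 1_000, 5_000, 10_000,
                   50_000, 100_000, 500_000, 1_000_000)


def fold_increase(variant_activity: float, wild_type_activity: float) -> float:
    """Ratio of variant activity to SEQ ID NO: 9 activity on the same target."""
    if wild_type_activity <= 0:
        raise ValueError("wild-type activity must be positive to form a ratio")
    return variant_activity / wild_type_activity


def highest_threshold_met(fold: float):
    """Largest recited threshold that the fold increase reaches, or None."""
    met = [t for t in FOLD_THRESHOLDS if fold >= t]
    return met[-1] if met else None


# Hypothetical assay values in arbitrary units.
fold = fold_increase(250.0, 2.0)
print(fold, highest_threshold_met(fold))  # 125.0 100
```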
In some embodiments, the 3' end of the target sequence is directly adjacent to an AGC, GAG, TTT, GTG, CAA, CAC, GAT, TAA, ACG, CGA, or CGT sequence.
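Whether a candidate site sits next to the canonical PAM or to one of the sequences listed above is a string-matching question. The minimal sketch below makes two assumptions. First, it uses a 20-nucleotide protospacer, which is a common Streptococcus pyogenes Cas9 guide length and is not a value stated in this passage. Second, it scans only the strand it is given. The function names are illustrative.

```python
import re

CANONICAL_PAM = "NGG"
# Non-canonical PAMs listed in the embodiment above.
LISTED_PAMS = ("AGC", "GAG", "TTT", "GTG", "CAA", "CAC",
               "GAT", "TAA", "ACG", "CGA", "CGT")
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def pam_pattern(pam: str) -> str:
    """Translate a PAM such as 'NGG' into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_sites(dna: str, pam: str, protospacer_len: int = 20) -> list:
    """List (start, protospacer, pam) for every protospacer whose 3' end is
    directly followed by `pam` on the given strand; `start` is 0-based.

    The lookahead lets overlapping sites be reported. Only the strand passed in
    is scanned; call again on the reverse complement for the other strand.
    """
    regex = re.compile(f"(?=([ACGT]{{{protospacer_len}}})({pam_pattern(pam)}))")
    return [(m.start(), m.group(1), m.group(2)) for m in regex.finditer(dna.upper())]


def scan_all(dna: str, protospacer_len: int = 20) -> dict:
    """Sites for the canonical PAM and each listed non-canonical PAM."""
    return {pam: find_sites(dna, pam, protospacer_len)
            for pam in (CANONICAL_PAM,) + LISTED_PAMS}


toy = "A" * 20 + "GAT" + "C" * 5  # hypothetical sequence
sites = scan_all(toy)
print(sites["GAT"][0][::2])  # (0, 'GAT')
print(sites["NGG"])          # []
```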
In some embodiments, Cas9 protein activity is measured by a nuclease assay or a nucleic acid binding assay. Such assays are known in the art and will be apparent to those of skill in the art. As provided herein, a Cas9 protein can be fused to one or more domains that confer a protein activity. Examples include a nucleic acid editing activity (e.g., deaminase activity) and transcriptional activation activity, which can be measured by a deaminase assay or a transcriptional activation assay, respectively. In some embodiments, the Cas9 protein is fused to a deaminase domain, and its activity can be measured using a deaminase assay. In some embodiments, the Cas9 protein is fused to a transcriptional activation domain, and its activity can be measured using a transcriptional activation assay. In such an assay, a reporter such as GFP or luciferase is expressed upon binding of the Cas9 fusion protein to the target sequence.
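The drawings report reporter-assay results in two forms: the percentage of cells above background fluorescence and the mean fluorescence (Figures 4, 5, and 9). In Fig. 6, transfected cells are gated by iRFP fluorescence. The following is a minimal sketch of how these two summaries could be computed from per-cell flow cytometry values. It assumes a fixed, user-chosen transfection gate and a fixed GFP background threshold; how those thresholds were actually set is not described here. The data and the function name are hypothetical.

```python
from statistics import mean


def reporter_metrics(cells, transfection_gate: float, gfp_background: float):
    """Summarize a GFP reporter experiment from per-cell measurements.

    cells: iterable of (irfp, gfp) fluorescence pairs, one per cell.
    Cells with iRFP above `transfection_gate` are treated as transfected; among
    them, return (percent of cells with GFP above background, mean GFP).
    """
    gfp = [g for irfp, g in cells if irfp > transfection_gate]
    if not gfp:
        raise ValueError("no cells pass the transfection gate")
    percent_above = 100.0 * sum(g > gfp_background for g in gfp) / len(gfp)
    return percent_above, mean(gfp)


# Hypothetical (iRFP, GFP) readings in arbitrary units.
cells = [(50, 10), (500, 200), (600, 20), (700, 400), (800, 30)]
print(reporter_metrics(cells, transfection_gate=100, gfp_background=100))
# (50.0, 162.5)
```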
In some embodiments, the amino acid sequence of the Cas9 protein comprises any of the mutations provided herein. For example, in some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of X262T, X267G, X294R, X405I, X409I, X480K, X543D, X694I, X1219V, X1224K, X1256K, and X1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid. In other embodiments, the mutations are selected from A262T, S267G, K294R, F405I, S409I, E480K, E543D, M694I, E1219V, N1224K, Q1256K, and L1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
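In the substitution notation used above, the letter before the number is the residue at that position of SEQ ID NO: 9, the number is the residue position, and the letter after the number is the substituted residue. X stands for any amino acid. The minimal sketch below parses this notation and applies substitutions, checking the expected original residue unless it is X. The short example sequence is MDKKYSIG, the N-terminal residues encoded by the start of SEQ ID NO: 1. D2A and X5F are arbitrary illustrative substitutions, not mutations recited in this disclosure. The function names are hypothetical.

```python
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(code: str):
    """Split a substitution such as 'E480K' into ('E', 480, 'K').

    A leading 'X' (as in 'X480K') means the original residue is unconstrained.
    """
    match = SUBSTITUTION.fullmatch(code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence: str, codes) -> str:
    """Apply substitutions (1-based positions), checking the expected residue."""
    residues = list(sequence)
    for code in codes:
        original, position, new = parse_substitution(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        found = residues[position - 1]
        if original != "X" and found != original:
            raise ValueError(f"{code}: expected {original} at {position}, found {found}")
        residues[position - 1] = new
    return "".join(residues)


n_terminus = "MDKKYSIG"
print(parse_substitution("E1219V"))                     # ('E', 1219, 'V')
print(apply_substitutions(n_terminus, ["D2A", "X5F"]))  # MAKKFSIG
```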
In some embodiments, the amino acid sequence of the Cas9 protein comprises any of the mutations provided herein. For example, in some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of X262T, X294R, X409I, X480K, X543D, X694I, and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid. In other embodiments, the mutations are selected from A262T, K294R, S409I, E480K, E543D, M694I, and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X1219V mutation or the E1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X480K mutation or the E480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X543D mutation or the E543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the mutations X480K, X543D, and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the mutations E480K, E543D, and E1219V, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the mutations X262T, X409I, X480K, X543D, X694I, and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the mutations A262T, S409I, E480K, E543D, M694I, and E1219V, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the mutations X294R, X480K, X543D, X1219V, X1256K, and X1362P of the amino acid sequence provided in SEQ ID NO: 9, or the mutations K294R, E480K, E543D, E1219V, Q1256K, and L1362P, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the mutations X294R, X480K, X543D, X1219V, and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the mutations K294R, E480K, E543D, E1219V, and Q1256K, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the mutations X267G, X294R, X480K, X543D, X1219V, X1224K, and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the mutations S267G, K294R, E480K, E543D, E1219V, N1224K, and Q1256K, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the mutations X262T, X405I, X409I, X480K, X543D, X694I, and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the mutations A262T, F405I, S409I, E480K, E543D, M694I, and E1219V, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
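The combinations recited above can be encoded as sets, which makes their relationships easy to check. Each combination is drawn from the twelve residue-specific substitutions listed earlier (A262T through L1362P), and all of them share E480K, E543D, and E1219V. The minimal sketch below encodes only the residue-specific (non-X) forms. The variable and function names are illustrative.

```python
# Substitution sets recited in the embodiments above (numbering of SEQ ID NO: 9).
FULL_GROUP = {"A262T", "S267G", "K294R", "F405I", "S409I", "E480K",
              "E543D", "M694I", "E1219V", "N1224K", "Q1256K", "L1362P"}

COMBINATIONS = [
    {"E480K", "E543D", "E1219V"},
    {"A262T", "S409I", "E480K", "E543D", "M694I", "E1219V"},
    {"K294R", "E480K", "E543D", "E1219V", "Q1256K", "L1362P"},
    {"K294R", "E480K", "E543D", "E1219V", "Q1256K"},
    {"S267G", "K294R", "E480K", "E543D", "E1219V", "N1224K", "Q1256K"},
    {"A262T", "F405I", "S409I", "E480K", "E543D", "M694I", "E1219V"},
]


def summarize(combinations, group):
    """Return (substitutions outside the group, shared core, largest set size)."""
    outside = set().union(*(c - group for c in combinations))
    core = set.intersection(*combinations)
    return outside, core, max(len(c) for c in combinations)


outside, core, largest = summarize(COMBINATIONS, FULL_GROUP)
print(outside)       # set()
print(sorted(core))  # ['E1219V', 'E480K', 'E543D']
print(largest)       # 7
```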
In some embodiments, the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs 9-262. In some embodiments, the amino acid sequence of the HNH domain is identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs 9-262.
In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs 9-262. In some embodiments, the amino acid sequence of the RuvC domain is identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs 9-262.
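Percent-identity thresholds such as "at least 80%" depend on how identity is computed. The minimal sketch below assumes that the domain sequences have already been aligned by a separate tool; the alignment step itself is not shown. It uses one common convention: identical positions divided by the number of residues in the reference domain. Other conventions, such as dividing by alignment length, give slightly different numbers. The function name is illustrative.

```python
def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Percent identity of a query to a reference over a precomputed alignment.

    Both strings come from the same alignment ('-' marks a gap) and have equal
    length. Convention assumed here: identical residues in columns where the
    reference has a residue, divided by the number of reference residues.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    reference_residues = sum(r != "-" for r in aligned_reference)
    if reference_residues == 0:
        raise ValueError("reference contains no residues")
    matches = sum(q == r for q, r in zip(aligned_query, aligned_reference) if r != "-")
    return 100.0 * matches / reference_residues


print(percent_identity("MDKKYSIG", "MDKKYSIG"))  # 100.0
print(percent_identity("MDKAYSIG", "MDKKYSIG"))  # 87.5
print(percent_identity("MDK-YSIG", "MDKKYSIG"))  # 87.5
```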
In some embodiments, the Cas9 protein comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises a D10X1 and/or H840X2 mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X1 is any amino acid other than D, and X2 is any amino acid other than H. In some embodiments, the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises an H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises a D at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
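The residues at positions 10 and 840 determine which DNA strands the protein can cut. The subdomain roles are described in the Definitions section: HNH cleaves the strand complementary to the gRNA, and RuvC cleaves the non-complementary strand. The minimal sketch below rests on a simplifying assumption, stated in the code, that any substitution at these positions (not only alanine) abolishes the corresponding activity. The placeholder sequence builder is hypothetical.

```python
def catalytic_state(sequence: str) -> str:
    """Classify a Cas9 sequence (numbered as in SEQ ID NO: 9) by residues 10 and 840.

    Simplifying assumption: any residue other than D at position 10 inactivates
    the RuvC subdomain, and any residue other than H at position 840 inactivates
    the HNH subdomain.
    """
    if len(sequence) < 840:
        raise ValueError("sequence must include residue 840")
    ruvc_active = sequence[9] == "D"   # residue 10
    hnh_active = sequence[839] == "H"  # residue 840
    if ruvc_active and hnh_active:
        return "nuclease: cuts both strands"
    if hnh_active:
        return "nickase: cuts the gRNA-complementary strand (HNH)"
    if ruvc_active:
        return "nickase: cuts the non-complementary strand (RuvC)"
    return "dCas9: nuclease-inactive"


def with_residues(d10: str, h840: str, length: int = 1368) -> str:
    """Hypothetical placeholder sequence carrying only the two residues of interest."""
    residues = ["A"] * length
    residues[9], residues[839] = d10, h840
    return "".join(residues)


for d10, h840 in (("D", "H"), ("A", "H"), ("D", "A"), ("A", "A")):
    print(d10 + "10", h840 + "840", "->", catalytic_state(with_residues(d10, h840)))
```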
Some aspects of the disclosure provide fusion proteins comprising a Cas9 protein provided herein fused to a second protein, thereby forming a fusion protein. In some embodiments, the second protein is fused to the N-terminus of the Cas9 protein. In some embodiments, the second protein is fused to the C-terminus of the Cas9 protein. In some embodiments, the Cas9 domain and the effector domain are fused via a linker. The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In some embodiments, the linker is a polypeptide or is based on amino acids. In other embodiments, the linker is not peptide-like. In some embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, a disulfide bond, a carbon-heteroatom bond, etc.). In some embodiments, the linker is a carbon-nitrogen bond of an amide linkage.
In some embodiments, the linker joins two domains, e.g., a nuclease-inactivated Cas9 domain and an effector domain (e.g., a deaminase domain). More generally, a linker is a chemical group or molecule that connects two molecules or moieties, e.g., two domains of a fusion protein. In some embodiments, the linker comprises one or more amino acid residues. For example, a linker may comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 25, 30, 35, 40, 45, 50, or more amino acid residues. In some embodiments, the linker is 3, 9, 16, or 21 amino acids in length. In some embodiments, the linker comprises a (GGGGS)n (SEQ ID NO: 5), (G)n, (EAAAK)n (SEQ ID NO: 6), (GGS)n, SGSETPGTSESATPES (SEQ ID NO: 7) (also known as XTEN), or (XP)n motif, or a combination of any of these, wherein n is independently an integer from 1 to 30. In some embodiments, the linker comprises a (GGS)3 motif or an SGSETPGTSESATPES (SEQ ID NO: 7) (XTEN) motif.
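The repeat-motif linkers above can be expanded into concrete peptide sequences. (GGS)1 and (GGS)3 give 3 and 9 residues, and XTEN (SEQ ID NO: 7) gives 16 residues, matching three of the linker lengths listed above. The minimal sketch below joins linker and domain sequences N- to C-terminus. The domain sequences are short hypothetical stand-ins, not real effector or Cas9 sequences, and the helper names are illustrative. The (XP)n motif cannot be expanded literally, because X is any amino acid.

```python
XTEN = "SGSETPGTSESATPES"  # SEQ ID NO: 7


def repeat_motif(motif: str, n: int) -> str:
    """Build (motif)n, with n an integer from 1 to 30 as recited above."""
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer from 1 to 30")
    return motif * n


def assemble(*parts: str) -> str:
    """Join domain and linker sequences in order from N- to C-terminus."""
    return "".join(parts)


ggs3 = repeat_motif("GGS", 3)
print(ggs3, len(ggs3))  # GGSGGSGGS 9
print(len(XTEN))        # 16
# Hypothetical short stand-ins for an effector domain and a Cas9 domain.
print(assemble("EEEEE", ggs3, "MDKKYSIG"))  # EEEEEGGSGGSGGSMDKKYSIG
```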
In some embodiments, the Cas9 domain and the effector domain are fused via a nuclear localization sequence (NLS), such as an NLS comprising the amino acid sequence PKKKRKV (SEQ ID NO: 299), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 300), or SPKKKRKVEAS (SEQ ID NO: 284). In some embodiments, the NLS can be combined with any of the linkers listed above.
In some embodiments, the effector domain comprises an enzyme domain. In some embodiments, the effector domain comprises a nuclease, nickase, recombinase, deaminase, methyltransferase, methylase, acetylase, acetyltransferase, transcriptional activator, or transcriptional repressor domain. These domains have nuclease activity, nickase activity, recombinase activity, deaminase activity, methyltransferase activity, methylase activity, acetylase activity, acetyltransferase activity, transcriptional activation activity, or transcriptional repression activity, respectively. In some embodiments, the effector domain is a deaminase domain. In some embodiments, the deaminase is a cytosine deaminase or a cytidine deaminase. In some embodiments, the deaminase is an apolipoprotein B mRNA editing complex (APOBEC) family deaminase. In some embodiments, the deaminase is an APOBEC1 deaminase. In some embodiments, the deaminase is an APOBEC2 deaminase. In some embodiments, the deaminase is an APOBEC3 deaminase. In some embodiments, the deaminase is an APOBEC3A deaminase. In some embodiments, the deaminase is an APOBEC3D deaminase. In some embodiments, the deaminase is an APOBEC3E deaminase. In some embodiments, the deaminase is an APOBEC3F deaminase. In some embodiments, the deaminase is an APOBEC3G deaminase. In some embodiments, the deaminase is an APOBEC3H deaminase. In some embodiments, the deaminase is an APOBEC4 deaminase. In some embodiments, the deaminase is an activation-induced deaminase (AID). In some embodiments, the effector domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the deaminase domain of any of SEQ ID NOs: 263-291. In some embodiments, the deaminase is an APOBEC1 family deaminase.
In some embodiments, the deaminase is an activation-induced cytidine deaminase (AID). In some embodiments, the deaminase is an ACF1/ASE deaminase. In some embodiments, the deaminase is an adenosine deaminase. In some embodiments, the deaminase is an ADAT family deaminase.
Some aspects of the disclosure provide fusion proteins comprising a Cas9 protein fused to effector domains, such as a deaminase and a uracil glycosylase inhibitor (UGI). Some aspects of the present disclosure are based on the recognition that such fusion proteins can exhibit increased nucleic acid editing efficiency compared to fusion proteins that do not comprise a UGI domain. Domains such as deaminase domains and UGI domains have been described and are within the scope of the present disclosure. For example, such domains have been described in the following U.S. provisional applications: 62/245,828, filed October 23, 2015; 62/279,346, filed January 15, 2016; 62/311,763, filed March 22, 2016; 62/322,178, filed April 13, 2016; 62/357,352, filed June 30, 2016; 62/357,332, filed June 30, 2016; 62/370,700, filed August 3, 2016; 62/398,490, filed September 22, 2016; and 62/408,686, filed October 14, 2016. The entire contents of each of these applications are incorporated herein by reference. It is understood that the deaminase domains and UGI domains described in the aforementioned references are within the scope of the present disclosure and can be fused to any of the Cas9 proteins provided herein.
In some embodiments, the effector domain of the fusion protein is a nuclease domain. In some embodiments, the nuclease domain is a FokI DNA cleavage domain. In some embodiments, the fusion protein dimerizes. In certain embodiments, the fusion protein is active as a dimer. For example, two FokI DNA cleavage domains may dimerize to cleave a nucleic acid.
In some embodiments, the Cas9 protein is fused to a second Cas9 protein. In some embodiments, the second Cas9 protein is a Cas9 protein of any one of claims 1-345. In some embodiments, the second Cas9 protein is fused to the N-terminus of the fusion protein. In some embodiments, the second Cas9 protein is fused to the C-terminus of the fusion protein. In some embodiments, the Cas9 protein and the second Cas9 protein are fused via a second linker. In some embodiments, the second linker comprises a (GGGGS)n (SEQ ID NO: 5), (G)n, (EAAAK)n (SEQ ID NO: 6), (GGS)n, SGSETPGTSESATPES (SEQ ID NO: 7) (also known as XTEN), or (XP)n motif, or a combination of any of these, wherein n is independently an integer from 1 to 30. In some embodiments, the second linker comprises a (GGS)3 motif.
Some aspects of the disclosure provide complexes comprising a Cas9 protein or Cas9 fusion protein provided herein and a guide RNA that binds to a Cas9 protein or Cas9 fusion protein.
In some embodiments, the target sequence is a DNA sequence. In some embodiments, the target sequence is a sequence in a mammalian genome. In some embodiments, the target sequence is a sequence in the human genome. In some embodiments, the 3' end of the target sequence is not directly adjacent to the canonical PAM sequence (5'-NGG-3').
Some aspects of the disclosure provide methods of using the Cas9 proteins, fusion proteins, or complexes provided herein. For example, some aspects of the disclosure provide methods comprising contacting a DNA molecule with: (a) a Cas9 protein or fusion protein provided herein and a guide RNA, wherein the guide RNA is about 15-100 nucleotides in length and comprises at least 10 contiguous nucleotides that are complementary to a target sequence; or (b) a Cas9 protein or Cas9 fusion protein provided herein complexed with a gRNA. In some embodiments, the 3' end of the target sequence is not directly adjacent to the canonical PAM sequence (5'-NGG-3'). In some embodiments, the 3' end of the target sequence is directly adjacent to an AGC, GAG, TTT, GTG, or CAA sequence. In some embodiments, the target sequence comprises a sequence associated with a disease or disorder. In some embodiments, the target sequence comprises a point mutation associated with a disease or disorder. In some embodiments, contacting with the Cas9 protein, the Cas9 fusion protein, or the complex results in correction of the point mutation. In some embodiments, the contacting step is performed in vivo in a subject.
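Criterion (a) combines a length window with a requirement for contiguous complementarity. The minimal sketch below makes three assumptions. First, "about 15-100 nucleotides" is treated as the inclusive range 15-100. Second, complementarity means perfect Watson-Crick pairing. Third, the target sequence is supplied as the DNA strand that the guide pairs with. The target sequence and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest contiguous stretch shared by a and b (dynamic programming)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def guide_qualifies(guide_rna: str, target_strand: str,
                    min_len: int = 15, max_len: int = 100, min_run: int = 10) -> bool:
    """Check the length window and the contiguous-complementarity requirement."""
    guide = guide_rna.upper().replace("U", "T")
    if not min_len <= len(guide) <= max_len:
        return False
    # A guide pairs antiparallel with the target strand, so its complementary
    # stretch reads identically to a stretch of the target's reverse complement.
    return longest_common_run(guide, reverse_complement(target_strand)) >= min_run


target = "ACGTTGCAAGGCTTACCGAT"                     # hypothetical target strand
guide = reverse_complement(target).replace("T", "U")  # fully complementary guide
print(guide_qualifies(guide, target))                 # True
print(guide_qualifies("C" * 20, target))              # False
```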
Some aspects of the disclosure provide kits comprising a nucleic acid construct comprising: (a) a nucleotide sequence encoding a Cas9 protein or a Cas9 fusion protein provided herein; and (b) a heterologous promoter that drives expression of the sequence of (a). In some embodiments, the kit further comprises an expression construct encoding a guide RNA backbone, wherein the expression construct comprises a cloning site positioned to allow cloning of a nucleic acid sequence that is identical or complementary to a target sequence into the guide RNA backbone.
Some aspects of the disclosure provide polynucleotides encoding any of the Cas9 proteins, Cas9 fusion proteins, or guide RNAs that bind to a Cas9 protein or a Cas9 fusion protein provided herein. Some aspects of the disclosure provide vectors comprising such polynucleotides. In some embodiments, the vector comprises a heterologous promoter that drives expression of the polynucleotide.
Some aspects of the disclosure provide a cell comprising any of the Cas9 proteins, fusion proteins, nucleic acid molecules, and/or vectors provided herein.
The above summary is intended to illustrate, in a non-limiting manner, some embodiments, advantages, features and uses of the technology disclosed herein. Other embodiments, advantages, features, and uses of the technology disclosed herein will be apparent from the detailed description, drawings, examples, and claims.
Brief Description of Drawings
Figure 1 shows the activity of wild-type streptococcus pyogenes Cas9 on canonical and non-canonical PAM libraries.
Figure 2 shows the activity of an exemplary evolved Cas9 clone on a PAM library after directed evolution.
Figure 3 shows a comparison of wild-type Cas9 and evolved Cas9 in a mammalian GFP activation assay.
Figures 4A to 4B show the binding activity of wild-type Cas9 (pJH306) and evolved Cas9 proteins on the 5'-NGG-3' PAM sequence, using GFP as the readout. Based on the increase in GFP fluorescence signal, many evolved Cas9 proteins showed increased binding activity on the 5'-NGG-3' PAM relative to wild-type Cas9. Figure 4A is a graph showing Cas9 binding activity, measured as the percentage of cells above background fluorescence. Figure 4B is a graph showing Cas9 binding activity, measured as mean fluorescence. The Cas9 proteins used in these experiments were dCas9 proteins fused to a VPR transcriptional activator.
FIGS. 5A-5B show the binding activity of wild-type dCas9-VPR (pJH306) and evolved dCas9-VPR proteins on NNN PAM sequences, using GFP as the readout. Based on the increase in GFP fluorescence signal, many evolved Cas9 proteins showed increased binding activity on the NNN PAM library relative to wild-type Cas9. Figure 5A is a graph showing Cas9 binding activity, measured as the percentage of cells above background fluorescence. Figure 5B is a graph showing Cas9 binding activity, measured as mean fluorescence. The Cas9 proteins used in these experiments were dCas9 proteins fused to VPR.
Fig. 6 shows the activity of dCas9-VPR on all 64 PAM sequences, measured as the mean fluorescence of transfected cells gated by iRFP fluorescence. WT dCas9-VPR is pJH306.
Figure 7 shows an in vitro cleavage assay. On the gel, WT is wild-type Cas9 (SEQ ID NO: 9), and 1 is Cas9 carrying the E1219V mutation of SEQ ID NO: 9.
Figure 8 shows Cas9 fusion proteins that can be used to modulate PAM specificity. The figure shows one possible configuration of a Cas9-dCas9 system that can be used to increase targeting of Cas9 to non-canonical PAMs. Binding of dCas9 to a 5'-NGG-3' PAM or another PAM can localize Cas9 to a region near target 2. This positioning may help Cas9 cleave sites adjacent to previously inaccessible PAMs.
Fig. 9A to 9B show the binding activity of dCas9-VPR on NNNNN PAM libraries. FIG. 9A is a graph showing the binding activity of dCas9-VPR on NNNNN PAM libraries, measured as the percentage of cells above background fluorescence. FIG. 9B is a graph showing the binding activity of dCas9-VPR on NNNNN PAM libraries, measured as mean fluorescence.
Figure 10 shows Cas9 cleavage activity, using the percentage of cells with GFP loss as the readout. Cas9 proteins were tested using two sgRNAs that target a canonical 5'-NGG-3' PAM or a GAT PAM within the GFP gene.
Fig. 11 shows a double-stranded DNA substrate bound by a Cas9:DNA editing enzyme:sgRNA complex. The DNA editing enzyme may be, but is not limited to, a nuclease, nickase, recombinase, deaminase, methyltransferase, methylase, acetylase, or acetyltransferase.
Fig. 12A to 12D show the results of PAM depletion assays. pJH760 was tested in a PAM depletion assay against four new targets: re2 (FIG. 12A), VEGF (FIG. 12B), CLTA (FIG. 12C), and CCR5D (FIG. 12D).
FIG. 13 shows GFP cleavage in mammalian cells.
Figure 14 shows the results of a PAM depletion assay testing pJH760 (xCas9 v1.0) against the re2 target.
Figure 15 shows the results of a PAM depletion assay testing pJH760 (xCas9 v1.0) against the VEGF target.
Figure 16 shows the results of a PAM depletion assay testing pJH760 (xCas9 v1.0) against the CLTA target.
Figure 17 shows the results of a PAM depletion assay testing pJH760 (xCas9 v1.0) against the CCR5D target.
Figure 18 shows the results of PAM depletion assays testing four xCas9v3 mutants against the re2 target.
Figure 19 shows the results of PAM depletion assays testing four xCas9v3 mutants against the VEGF target.
Figure 20 shows the results of PAM depletion assays testing four xCas9v3 mutants against the CLTA target.
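Figures 12 and 14 to 20 report PAM depletion assays. One common way to summarize such an assay is to compare each PAM's frequency before and after exposure to the Cas9 variant. This assumes a plasmid library in which each member carries a target followed by a randomized PAM, and in which cleaved plasmids are lost before sequencing; the assay design is not detailed in this section. The following is a minimal sketch with hypothetical read counts. It is not necessarily the analysis used for those figures.

```python
from collections import Counter


def pam_depletion(before: Counter, after: Counter, pseudocount: float = 1.0) -> dict:
    """Depletion score per PAM: frequency before cleavage / frequency after.

    Scores above 1 mean plasmids carrying that PAM became rarer after exposure to
    the Cas9 variant, consistent with preferential cleavage. The pseudocount keeps
    PAMs with zero reads from producing division by zero.
    """
    pams = set(before) | set(after)
    before_total = sum(before.values()) + pseudocount * len(pams)
    after_total = sum(after.values()) + pseudocount * len(pams)
    return {
        pam: ((before[pam] + pseudocount) / before_total)
        / ((after[pam] + pseudocount) / after_total)
        for pam in pams
    }


# Hypothetical read counts for a two-PAM toy library.
before = Counter({"AGG": 99, "GAT": 99})
after = Counter({"AGG": 9, "GAT": 99})
scores = pam_depletion(before, after)
for pam in sorted(scores, key=scores.get, reverse=True):
    print(pam, round(scores[pam], 2))
# AGG 5.5
# GAT 0.55
```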
Detailed Description
Definitions
As used herein and in the claims, the singular forms "a," "an," and "the" include both singular and plural references unless the context clearly dictates otherwise. Thus, for example, reference to "an agent" includes a single agent and a plurality of such agents.
The term "Cas9" or "Cas9 nuclease" refers to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9 and/or the gRNA binding domain of Cas9). Cas9 nuclease is also sometimes referred to as Csn1 nuclease or CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, which are sequences complementary to previously encountered mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves a linear or circular dsDNA target complementary to the spacer. The target strand that is not complementary to the crRNA is first cut endonucleolytically and then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require a protein and both RNAs. However, a single guide RNA ("sgRNA", or simply "gRNA") can be engineered to incorporate aspects of both the crRNA and the tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM, or protospacer adjacent motif) to help distinguish self from non-self.
Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., "Complete genome sequence of an M1 strain of Streptococcus pyogenes." Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III." Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E., Science 337:816-821 (2012); the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including but not limited to Streptococcus pyogenes and Streptococcus thermophilus. Based on this disclosure, other suitable Cas9 nucleases and sequences will be apparent to those of skill in the art. Such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-; the entire contents of which are incorporated herein by reference. In some embodiments, the Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain.
A nuclease-inactivated Cas9 protein may be interchangeably referred to as a "dCas9" protein (for nuclease-"dead" Cas9). Methods for generating a Cas9 protein having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science 337:816-821 (2012); Qi et al., "Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression" (2013) Cell 152(5):1173-83; the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains: the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of Streptococcus pyogenes Cas9 (Jinek et al., Science 337:816-821 (2012); Qi et al., Cell 152(5):1173-83 (2013)).
In some embodiments, proteins comprising a fragment of Cas9 are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, a protein comprising Cas9 or a fragment thereof is referred to as a "Cas9 variant." A Cas9 variant shares homology with Cas9. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to wild-type Cas9. In some embodiments, a Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA cleavage domain) such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild-type Cas9. In some embodiments, wild-type Cas9 corresponds to Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1; SEQ ID NO: 1 (nucleotide); SEQ ID NO: 2 (amino acid)).
ATGGATAAGAAATACTCAATAGGCTTAGATATCGGCACAAATAGCGTCGGATGGGCGGTGATCACTGATGATTATAAGGTTCCGTCTAAAAAGTTCAAGGTTCTGGGAAATACAGACCGCCACAGTATCAAAAAAAATCTTATAGGGGCTCTTTTATTTGGCAGTGGAGAGACAGCGGAAGCGACTCGTCTCAAACGGACAGCTCGTAGAAGGTATACACGTCGGAAGAATCGTATTTGTTATCTACAGGAGATTTTTTCAAATGAGATGGCGAAAGTAGATGATAGTTTCTTTCATCGACTTGAAGAGTCTTTTTTGGTGGAAGAAGACAAGAAGCATGAACGTCATCCTATTTTTGGAAATATAGTAGATGAAGTTGCTTATCATGAGAAATATCCAACTATCTATCATCTGCGAAAAAAATTGGCAGATTCTACTGATAAAGCGGATTTGCGCTTAATCTATTTGGCCTTAGCGCATATGATTAAGTTTCGTGGTCATTTTTTGATTGAGGGAGATTTAAATCCTGATAATAGTGATGTGGACAAACTATTTATCCAGTTGGTACAAATCTACAATCAATTATTTGAAGAAAACCCTATTAACGCAAGTAGAGTAGATGCTAAAGCGATTCTTTCTGCACGATTGAGTAAATCAAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAGAGAAATGGCTTGTTTGGGAATCTCATTGCTTTGTCATTGGGATTGACCCCTAATTTTAAATCAAATTTTGATTTGGCAGAAGATGCTAAATTACAGCTTTCAAAAGATACTTACGATGATGATTTAGATAATTTATTGGCGCAAATTGGAGATCAATATGCTGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCTATTTTACTTTCAGATATCCTAAGAGTAAATAGTGAAATAACTAAGGCTCCCCTATCAGCTTCAATGATTAAGCGCTACGATGAACATCATCAAGACTTGACTCTTTTAAAAGCTTTAGTTCGACAACAACTTCCAGAAAAGTATAAAGAAATCTTTTTTGATCAATCAAAAAACGGATATGCAGGTTATATTGATGGGGGAGCTAGCCAAGAAGAATTTTATAAATTTATCAAACCAATTTTAGAAAAAATGGATGGTACTGAGGAATTATTGGTGAAACTAAATCGTGAAGATTTGCTGCGCAAGCAACGGACCTTTGACAACGGCTCTATTCCCCATCAAATTCACTTGGGTGAGCTGCATGCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTTAAAAGACAATCGTGAGAAGATTGAAAAAATCTTGACTTTTCGAATTCCTTATTATGTTGGTCCATTGGCGCGTGGCAATAGTCGTTTTGCATGGATGACTCGGAAGTCTGAAGAAACAATTACCCCATGGAATTTTGAAGAAGTTGTCGATAAAGGTGCTTCAGCTCAATCATTTATTGAACGCATGACAAACTTTGATAAAAATCTTCCAAATGAAAAAGTACTACCAAAACATAGTTTGCTTTATGAGTATTTTACGGTTTATAACGAATTGACAAAGGTCAAATATGTTACTGAGGGAATGCGAAAACCAGCATTTCTTTCAGGTGAACAGAAGAAAGCCATTGTTGATTTACTCTTCAAAACAAATCGAAAAGTAACCGTTAAGCAATTAAAAGAAGATTATTTCAAAAAAATAGAATGTTTTGATAGTGTTGAAATTTCAGGAGTTGAAGATAGATTTAATGCTTCATTAGGCGCCTACCATGATTTGCTAAAAATTATTAAAGATAAAGATTTTTTGGATAATGAAGAAAATGAAGATATCTTAGAGGATATTGTTTTAACATTGACCTTATTTGAAGATAGGGGGATGATTGAGGAAAGACTTAAAACATATGCTCACCTCTTTGATGATAAGGTGATGAAACAGCTTAAACGTCGCCGTTATACTGGTTGGGGACGTTTGTCTCGAAAATTGAT
TAATGGTATTAGGGATAAGCAATCTGGCAAAACAATATTAGATTTTTTGAAATCAGATGGTTTTGCCAATCGCAATTTTATGCAGCTGATCCATGATGATAGTTTGACATTTAAAGAAGATATTCAAAAAGCACAGGTGTCTGGACAAGGCCATAGTTTACATGAACAGATTGCTAACTTAGCTGGCAGTCCTGCTATTAAAAAAGGTATTTTACAGACTGTAAAAATTGTTGATGAACTGGTCAAAGTAATGGGGCATAAGCCAGAAAATATCGTTATTGAAATGGCACGTGAAAATCAGACAACTCAAAAGGGCCAGAAAAATTCGCGAGAGCGTATGAAACGAATCGAAGAAGGTATCAAAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCTGTTGAAAATACTCAATTGCAAAATGAAAAGCTCTATCTCTATTATCTACAAAATGGAAGAGACATGTATGTGGACCAAGAATTAGATATTAATCGTTTAAGTGATTATGATGTCGATCACATTGTTCCACAAAGTTTCATTAAAGACGATTCAATAGACAATAAGGTACTAACGCGTTCTGATAAAAATCGTGGTAAATCGGATAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAAAACTATTGGAGACAACTTCTAAACGCCAAGTTAATCACTCAACGTAAGTTTGATAATTTAACGAAAGCTGAACGTGGAGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAAACGCCAATTGGTTGAAACTCGCCAAATCACTAAGCATGTGGCACAAATTTTGGATAGTCGCATGAATACTAAATACGATGAAAATGATAAACTTATTCGAGAGGTTAAAGTGATTACCTTAAAATCTAAATTAGTTTCTGACTTCCGAAAAGATTTCCAATTCTATAAAGTACGTGAGATTAACAATTACCATCATGCCCATGATGCGTATCTAAATGCCGTCGTTGGAACTGCTTTGATTAAGAAATATCCAAAACTTGAATCGGAGTTTGTCTATGGTGATTATAAAGTTTATGATGTTCGTAAAATGATTGCTAAGTCTGAGCAAGAAATAGGCAAAGCAACCGCAAAATATTTCTTTTACTCTAATATCATGAACTTCTTCAAAACAGAAATTACACTTGCAAATGGAGAGATTCGCAAACGCCCTCTAATCGAAACTAATGGGGAAACTGGAGAAATTGTCTGGGATAAAGGGCGAGATTTTGCCACAGTGCGCAAAGTATTGTCCATGCCCCAAGTCAATATTGTCAAGAAAACAGAAGTACAGACAGGCGGATTCTCCAAGGAGTCAATTTTACCAAAAAGAAATTCGGACAAGCTTATTGCTCGTAAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTGATAGTCCAACGGTAGCTTATTCAGTCCTAGTGGTTGCTAAGGTGGAAAAAGGGAAATCGAAGAAGTTAAAATCCGTTAAAGAGTTACTAGGGATCACAATTATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGACTTTTTAGAAGCTAAAGGATATAAGGAAGTTAAAAAAGACTTAATCATTAAACTACCTAAATATAGTCTTTTTGAGTTAGAAAACGGTCGTAAACGGATGCTGGCTAGTGCCGGAGAATTACAAAAAGGAAATGAGCTGGCTCTGCCAAGCAAATATGTGAATTTTTTATATTTAGCTAGTCATTATGAAAAGTTGAAGGGTAGTCCAGAAGATAACGAACAAAAACAATTGTTTGTGGAGCAGCATAAGCATTATTTAGATGAGATTATTGAGCAAATCAGTGAATTTTCTAAGCGTGTTATTTTAGCAGATGCCAATTTAGATAAAGTTCTTAGTGCATATAACAAACATAGAGACAAACCAATACGTGAACAAGCAGAAAATATTATTCATTTATTTACGTTGACGAATCTTGGAGCTCCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTAAAC
GATATACGTCTACAAAAGAAGTTTTAGATGCCACTCTTATCCATCAATCCATCACTGGTCTTTATGAAACACGCATTGATTTGAGTCAGCTAGGAGGTGACTGA(SEQ ID NO:1)
(Single underlined: HNH domain; double underlined: RuvC domain)
In some embodiments, wild-type Cas9 corresponds to or comprises SEQ ID NO:3 (nucleotides) and/or SEQ ID NO:4 (amino acids):
ATGGATAAAAAGTATTCTATTGGTTTAGACATCGGCACTAATTCCGTTGGATGGGCTGTCATAACCGATGAATACAAAGTACCTTCAAAGAAATTTAAGGTGTTGGGGAACACAGACCGTCATTCGATTAAAAAGAATCTTATCGGTGCCCTCCTATTCGATAGTGGCGAAACGGCAGAGGCGACTCGCCTGAAACGAACCGCTCGGAGAAGGTATACACGTCGCAAGAACCGAATATGTTACTTACAAGAAATTTTTAGCAATGAGATGGCCAAAGTTGACGATTCTTTCTTTCACCGTTTGGAAGAGTCCTTCCTTGTCGAAGAGGACAAGAAACATGAACGGCACCCCATCTTTGGAAACATAGTAGATGAGGTGGCATATCATGAAAAGTACCCAACGATTTATCACCTCAGAAAAAAGCTAGTTGACTCAACTGATAAAGCGGACCTGAGGTTAATCTACTTGGCTCTTGCCCATATGATAAAGTTCCGTGGGCACTTTCTCATTGAGGGTGATCTAAATCCGGACAACTCGGATGTCGACAAACTGTTCATCCAGTTAGTACAAACCTATAATCAGTTGTTTGAAGAGAACCCTATAAATGCAAGTGGCGTGGATGCGAAGGCTATTCTTAGCGCCCGCCTCTCTAAATCCCGACGGCTAGAAAACCTGATCGCACAATTACCCGGAGAGAAGAAAAATGGGTTGTTCGGTAACCTTATAGCGCTCTCACTAGGCCTGACACCAAATTTTAAGTCGAACTTCGACTTAGCTGAAGATGCCAAATTGCAGCTTAGTAAGGACACGTACGATGACGATCTCGACAATCTACTGGCACAAATTGGAGATCAGTATGCGGACTTATTTTTGGCTGCCAAAAACCTTAGCGATGCAATCCTCCTATCTGACATACTGAGAGTTAATACTGAGATTACCAAGGCGCCGTTATCCGCTTCAATGATCAAAAGGTACGATGAACATCACCAAGACTTGACACTTCTCAAGGCCCTAGTCCGTCAGCAACTGCCTGAGAAATATAAGGAAATATTCTTTGATCAGTCGAAAAACGGGTACGCAGGTTATATTGACGGCGGAGCGAGTCAAGAGGAATTCTACAAGTTTATCAAACCCATATTAGAGAAGATGGATGGGACGGAAGAGTTGCTTGTAAAACTCAATCGCGAAGATCTACTGCGAAAGCAGCGGACTTTCGACAACGGTAGCATTCCACATCAAATCCACTTAGGCGAATTGCATGCTATACTTAGAAGGCAGGAGGATTTTTATCCGTTCCTCAAAGACAATCGTGAAAAGATTGAGAAAATCCTAACCTTTCGCATACCTTACTATGTGGGACCCCTGGCCCGAGGGAACTCTCGGTTCGCATGGATGACAAGAAAGTCCGAAGAAACGATTACTCCATGGAATTTTGAGGAAGTTGTCGATAAAGGTGCGTCAGCTCAATCGTTCATCGAGAGGATGACCAACTTTGACAAGAATTTACCGAACGAAAAAGTATTGCCTAAGCACAGTTTACTTTACGAGTATTTCACAGTGTACAATGAACTCACGAAAGTTAAGTATGTCACTGAGGGCATGCGTAAACCCGCCTTTCTAAGCGGAGAACAGAAGAAAGCAATAGTAGATCTGTTATTCAAGACCAACCGCAAAGTGACAGTTAAGCAATTGAAAGAGGACTACTTTAAGAAAATTGAATGCTTCGATTCTGTCGAGATCTCCGGGGTAGAAGATCGATTTAATGCGTCACTTGGTACGTATCATGACCTCCTAAAGATAATTAAAGATAAGGACTTCCTGGATAACGAAGAGAATGAAGATATCTTAGAAGATATAGTGTTGACTCTTACCCTCTTTGAAGATCGGGAAATGATTGAGGAAAGACTAAAAACATACGCTCACCTGTTCGACGATAAGGTTATGAAACAGTTAAAGAGGCGTCGCTATACGGGCTGGGGACGATTGTCGCGGAAACTTAT
CAACGGGATAAGAGACAAGCAAAGTGGTAAAACTATTCTCGATTTTCTAAAGAGCGACGGCTTCGCCAATAGGAACTTTATGCAGCTGATCCATGATGACTCTTTAACCTTCAAAGAGGATATACAAAAGGCACAGGTTTCCGGACAAGGGGACTCATTGCACGAACATATTGCGAATCTTGCTGGTTCGCCAGCCATCAAAAAGGGCATACTCCAGACAGTCAAAGTAGTGGATGAGCTAGTTAAGGTCATGGGACGTCACAAACCGGAAAACATTGTAATCGAGATGGCACGCGAAAATCAAACGACTCAGAAGGGGCAAAAAAACAGTCGAGAGCGGATGAAGAGAATAGAAGAGGGTATTAAAGAACTGGGCAGCCAGATCTTAAAGGAGCATCCTGTGGAAAATACCCAATTGCAGAACGAGAAACTTTACCTCTATTACCTACAAAATGGAAGGGACATGTATGTTGATCAGGAACTGGACATAAACCGTTTATCTGATTACGACGTCGATCACATTGTACCCCAATCCTTTTTGAAGGACGATTCAATCGACAATAAAGTGCTTACACGCTCGGATAAGAACCGAGGGAAAAGTGACAATGTTCCAAGCGAGGAAGTCGTAAAGAAAATGAAGAACTATTGGCGGCAGCTCCTAAATGCGAAACTGATAACGCAAAGAAAGTTCGATAACTTAACTAAAGCTGAGAGGGGTGGCTTGTCTGAACTTGACAAGGCCGGATTTATTAAACGTCAGCTCGTGGAAACCCGCCAAATCACAAAGCATGTTGCACAGATACTAGATTCCCGAATGAATACGAAATACGACGAGAACGATAAGCTGATTCGGGAAGTCAAAGTAATCACTTTAAAGTCAAAATTGGTGTCGGACTTCAGAAAGGATTTTCAATTCTATAAAGTTAGGGAGATAAATAACTACCACCATGCGCACGACGCTTATCTTAATGCCGTCGTAGGGACCGCACTCATTAAGAAATACCCGAAGCTAGAAAGTGAGTTTGTGTATGGTGATTACAAAGTTTATGACGTCCGTAAGATGATCGCGAAAAGCGAACAGGAGATAGGCAAGGCTACAGCCAAATACTTCTTTTATTCTAACATTATGAATTTCTTTAAGACGGAAATCACTCTGGCAAACGGAGAGATACGCAAACGACCTTTAATTGAAACCAATGGGGAGACAGGTGAAATCGTATGGGATAAGGGCCGGGACTTCGCGACGGTGAGAAAAGTTTTGTCCATGCCCCAAGTCAACATAGTAAAGAAAACTGAGGTGCAGACCGGAGGGTTTTCAAAGGAATCGATTCTTCCAAAAAGGAATAGTGATAAGCTCATCGCTCGTAAAAAGGACTGGGACCCGAAAAAGTACGGTGGCTTCGATAGCCCTACAGTTGCCTATTCTGTCCTAGTAGTGGCAAAAGTTGAGAAGGGAAAATCCAAGAAACTGAAGTCAGTCAAAGAATTATTGGGGATAACGATTATGGAGCGCTCGTCTTTTGAAAAGAACCCCATCGACTTCCTTGAGGCGAAAGGTTACAAGGAAGTAAAAAAGGATCTCATAATTAAACTACCAAAGTATAGTCTGTTTGAGTTAGAAAATGGCCGAAAACGGATGTTGGCTAGCGCCGGAGAGCTTCAAAAGGGGAACGAACTCGCACTACCGTCTAAATACGTGAATTTCCTGTATTTAGCGTCCCATTACGAGAAGTTGAAAGGTTCACCTGAAGATAACGAACAGAAGCAACTTTTTGTTGAGCAGCACAAACATTATCTCGACGAAATCATAGAGCAAATTTCGGAATTCAGTAAGAGAGTCATCCTAGCTGATGCCAATCTGGACAAAGTATTAAGCGCATACAACAAGCACAGGGATAAACCCATACGTGAGCAGGCGGAAAATATTATCCATTTGTTTACTCTTACCAACCTCGGCGCTCCAGCCGCATTCAAGTATTTTGACACAACGATAGATCGCA
AACGATACACTTCTACCAAGGAGGTGCTAGACGCGACACTGATTCACCAATCCATCACGGGATTATATGAAACTCGGATAGATTTGTCACAGCTTGGGGGTGACGGATCCCCCAAGAAGAAGAGGAAAGTCTCGAGCGACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGATTACAAGGATGACGATGACAAGGCTGCAGGA(SEQ ID NO:3)
(Single underlined: HNH domain; double underlined: RuvC domain)
In some embodiments, wild-type Cas9 corresponds to Cas9 of Streptococcus pyogenes (NCBI reference sequence: NC_002737.2, SEQ ID NO:282 (nucleotides); and UniProt reference sequence: Q99ZW2, SEQ ID NO:9 (amino acids)).
ATGGATAAGAAATACTCAATAGGCTTAGATATCGGCACAAATAGCGTCGGATGGGCGGTGATCACTGATGAATATAAGGTTCCGTCTAAAAAGTTCAAGGTTCTGGGAAATACAGACCGCCACAGTATCAAAAAAAATCTTATAGGGGCTCTTTTATTTGACAGTGGAGAGACAGCGGAAGCGACTCGTCTCAAACGGACAGCTCGTAGAAGGTATACACGTCGGAAGAATCGTATTTGTTATCTACAGGAGATTTTTTCAAATGAGATGGCGAAAGTAGATGATAGTTTCTTTCATCGACTTGAAGAGTCTTTTTTGGTGGAAGAAGACAAGAAGCATGAACGTCATCCTATTTTTGGAAATATAGTAGATGAAGTTGCTTATCATGAGAAATATCCAACTATCTATCATCTGCGAAAAAAATTGGTAGATTCTACTGATAAAGCGGATTTGCGCTTAATCTATTTGGCCTTAGCGCATATGATTAAGTTTCGTGGTCATTTTTTGATTGAGGGAGATTTAAATCCTGATAATAGTGATGTGGACAAACTATTTATCCAGTTGGTACAAACCTACAATCAATTATTTGAAGAAAACCCTATTAACGCAAGTGGAGTAGATGCTAAAGCGATTCTTTCTGCACGATTGAGTAAATCAAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAGAAAAATGGCTTATTTGGGAATCTCATTGCTTTGTCATTGGGTTTGACCCCTAATTTTAAATCAAATTTTGATTTGGCAGAAGATGCTAAATTACAGCTTTCAAAAGATACTTACGATGATGATTTAGATAATTTATTGGCGCAAATTGGAGATCAATATGCTGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCTATTTTACTTTCAGATATCCTAAGAGTAAATACTGAAATAACTAAGGCTCCCCTATCAGCTTCAATGATTAAACGCTACGATGAACATCATCAAGACTTGACTCTTTTAAAAGCTTTAGTTCGACAACAACTTCCAGAAAAGTATAAAGAAATCTTTTTTGATCAATCAAAAAACGGATATGCAGGTTATATTGATGGGGGAGCTAGCCAAGAAGAATTTTATAAATTTATCAAACCAATTTTAGAAAAAATGGATGGTACTGAGGAATTATTGGTGAAACTAAATCGTGAAGATTTGCTGCGCAAGCAACGGACCTTTGACAACGGCTCTATTCCCCATCAAATTCACTTGGGTGAGCTGCATGCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTTAAAAGACAATCGTGAGAAGATTGAAAAAATCTTGACTTTTCGAATTCCTTATTATGTTGGTCCATTGGCGCGTGGCAATAGTCGTTTTGCATGGATGACTCGGAAGTCTGAAGAAACAATTACCCCATGGAATTTTGAAGAAGTTGTCGATAAAGGTGCTTCAGCTCAATCATTTATTGAACGCATGACAAACTTTGATAAAAATCTTCCAAATGAAAAAGTACTACCAAAACATAGTTTGCTTTATGAGTATTTTACGGTTTATAACGAATTGACAAAGGTCAAATATGTTACTGAAGGAATGCGAAAACCAGCATTTCTTTCAGGTGAACAGAAGAAAGCCATTGTTGATTTACTCTTCAAAACAAATCGAAAAGTAACCGTTAAGCAATTAAAAGAAGATTATTTCAAAAAAATAGAATGTTTTGATAGTGTTGAAATTTCAGGAGTTGAAGATAGATTTAATGCTTCATTAGGTACCTACCATGATTTGCTAAAAATTATTAAAGATAAAGATTTTTTGGATAATGAAGAAAATGAAGATATCTTAGAGGATATTGTTTTAACATTGACCTTATTTGAAGATAGGGAGATGATTGAGGAAAGACTTAAAACATATGCTCACCTCTTTGATGATAAGGTGATGAAACAGCTTAAACGTCGCCGTTATACTGGTTGGGGACGTTTGTCTCGAAAATTGAT
TAATGGTATTAGGGATAAGCAATCTGGCAAAACAATATTAGATTTTTTGAAATCAGATGGTTTTGCCAATCGCAATTTTATGCAGCTGATCCATGATGATAGTTTGACATTTAAAGAAGACATTCAAAAAGCACAAGTGTCTGGACAAGGCGATAGTTTACATGAACATATTGCAAATTTAGCTGGTAGCCCTGCTATTAAAAAAGGTATTTTACAGACTGTAAAAGTTGTTGATGAATTGGTCAAAGTAATGGGGCGGCATAAGCCAGAAAATATCGTTATTGAAATGGCACGTGAAAATCAGACAACTCAAAAGGGCCAGAAAAATTCGCGAGAGCGTATGAAACGAATCGAAGAAGGTATCAAAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCTGTTGAAAATACTCAATTGCAAAATGAAAAGCTCTATCTCTATTATCTCCAAAATGGAAGAGACATGTATGTGGACCAAGAATTAGATATTAATCGTTTAAGTGATTATGATGTCGATCACATTGTTCCACAAAGTTTCCTTAAAGACGATTCAATAGACAATAAGGTCTTAACGCGTTCTGATAAAAATCGTGGTAAATCGGATAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAAAACTATTGGAGACAACTTCTAAACGCCAAGTTAATCACTCAACGTAAGTTTGATAATTTAACGAAAGCTGAACGTGGAGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAAACGCCAATTGGTTGAAACTCGCCAAATCACTAAGCATGTGGCACAAATTTTGGATAGTCGCATGAATACTAAATACGATGAAAATGATAAACTTATTCGAGAGGTTAAAGTGATTACCTTAAAATCTAAATTAGTTTCTGACTTCCGAAAAGATTTCCAATTCTATAAAGTACGTGAGATTAACAATTACCATCATGCCCATGATGCGTATCTAAATGCCGTCGTTGGAACTGCTTTGATTAAGAAATATCCAAAACTTGAATCGGAGTTTGTCTATGGTGATTATAAAGTTTATGATGTTCGTAAAATGATTGCTAAGTCTGAGCAAGAAATAGGCAAAGCAACCGCAAAATATTTCTTTTACTCTAATATCATGAACTTCTTCAAAACAGAAATTACACTTGCAAATGGAGAGATTCGCAAACGCCCTCTAATCGAAACTAATGGGGAAACTGGAGAAATTGTCTGGGATAAAGGGCGAGATTTTGCCACAGTGCGCAAAGTATTGTCCATGCCCCAAGTCAATATTGTCAAGAAAACAGAAGTACAGACAGGCGGATTCTCCAAGGAGTCAATTTTACCAAAAAGAAATTCGGACAAGCTTATTGCTCGTAAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTGATAGTCCAACGGTAGCTTATTCAGTCCTAGTGGTTGCTAAGGTGGAAAAAGGGAAATCGAAGAAGTTAAAATCCGTTAAAGAGTTACTAGGGATCACAATTATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGACTTTTTAGAAGCTAAAGGATATAAGGAAGTTAAAAAAGACTTAATCATTAAACTACCTAAATATAGTCTTTTTGAGTTAGAAAACGGTCGTAAACGGATGCTGGCTAGTGCCGGAGAATTACAAAAAGGAAATGAGCTGGCTCTGCCAAGCAAATATGTGAATTTTTTATATTTAGCTAGTCATTATGAAAAGTTGAAGGGTAGTCCAGAAGATAACGAACAAAAACAATTGTTTGTGGAGCAGCATAAGCATTATTTAGATGAGATTATTGAGCAAATCAGTGAATTTTCTAAGCGTGTTATTTTAGCAGATGCCAATTTAGATAAAGTTCTTAGTGCATATAACAAACATAGAGACAAACCAATACGTGAACAAGCAGAAAATATTATTCATTTATTTACGTTGACGAATCTTGGAGCTCCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTA
AACGATATACGTCTACAAAAGAAGTTTTAGATGCCACTCTTATCCATCAATCCATCACTGGTCTTTATGAAACACGCATTGATTTGAGTCAGCTAGGAGGTGACTGA(SEQ ID NO:282)
(Single underlined: HNH domain; double underlined: RuvC domain)
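Because SEQ ID NO:282 (nucleotides) encodes SEQ ID NO:9 (amino acids), the correspondence between the two can be checked by translating the coding sequence with the standard genetic code. The Python sketch below is illustrative only and is not part of the disclosure; it assumes an in-frame DNA coding sequence written 5' to 3' and uses NCBI translation table 1.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the bases ordered T, C, A, G at each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate an in-frame DNA coding sequence (5'->3') into one-letter amino acids."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    protein = []
    for pos in range(0, len(cds), 3):
        residue = CODON_TABLE[cds[pos:pos + 3]]  # KeyError for non-ACGT codons
        if residue == "*" and to_stop:
            break  # stop translating at the first stop codon
        protein.append(residue)
    return "".join(protein)
```

Applied to the first 105 nucleotides of SEQ ID NO:282, the function returns the N-terminal residues MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVL; residue 10 of this stretch is the aspartate referred to by the D10A mutation discussed below.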
In some embodiments, Cas9 refers to Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); or Neisseria meningitidis (NCBI Ref: YP_002342100.1), or Cas9 from any organism listed in Example 3.
In some embodiments, dCas9 corresponds to or comprises part or all of a Cas9 amino acid sequence having one or more mutations that inactivate the Cas9 nuclease activity. For example, in some embodiments, the dCas9 domain comprises a D10A and/or an H840A mutation.
dCas9 (D10A and H840A):
(Single underlined: HNH domain; double underlined: RuvC domain)
In some embodiments, Cas9 corresponds to or comprises part or all of a Cas9 amino acid sequence having one or more mutations that alter the Cas9 nuclease activity. In some embodiments, Cas9 may be a Cas9 nickase, i.e., a form of Cas9 that generates a single-stranded DNA break (nick), rather than a double-stranded DNA break, at a specific location determined by the target sequence of the co-expressed gRNA. For example, in some embodiments, the Cas9 domain comprises a D10A mutation (e.g., SEQ ID NO:301) and/or an H840A mutation (e.g., SEQ ID NO:302). Exemplary Cas9 nickases are shown below. However, it is understood that additional Cas9 nickases that generate single-stranded breaks in DNA duplexes would be apparent to one skilled in the art and are within the scope of the present disclosure.
Cas9 D10A nickase:
(Single underlined: HNH domain; double underlined: RuvC domain)
Cas9 H840A nickase:
(Single underlined: HNH domain; double underlined: RuvC domain)
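The relationship between the two catalytic residues and the nuclease, nickase, and dCas9 forms described above can be summarized as a small lookup. The following sketch is illustrative only: the function name `nuclease_status` is hypothetical, it assumes S. pyogenes Cas9 numbering (as in SEQ ID NO:9), and it simplistically treats any substitution at D10 (RuvC domain) or H840 (HNH domain) as inactivating that domain.

```python
# Catalytic residues of S. pyogenes Cas9 (1-based numbering, as in SEQ ID NO:9).
CATALYTIC_SITES = {10: ("D", "RuvC"), 840: ("H", "HNH")}


def nuclease_status(protein: str) -> str:
    """Classify a full-length SpCas9-like sequence by its catalytic residues.

    Simplifying assumption: any residue other than the wild-type one at a
    catalytic position is treated as inactivating that nuclease domain.
    """
    if len(protein) < max(CATALYTIC_SITES):
        raise ValueError("sequence too short to contain both catalytic sites")
    inactivated = []
    for position, (wild_type, domain) in CATALYTIC_SITES.items():
        residue = protein[position - 1]  # convert 1-based position to 0-based index
        if residue != wild_type:
            inactivated.append(f"{wild_type}{position}{residue} ({domain})")
    if not inactivated:
        return "nuclease"  # both domains intact: double-stranded break
    if len(inactivated) == 1:
        return "nickase: " + inactivated[0]  # one domain intact: single-stranded nick
    return "dCas9: " + ", ".join(inactivated)  # neither domain cleaves
```

For example, a sequence carrying only D10A is reported as `"nickase: D10A (RuvC)"`, while one carrying both D10A and H840A is reported as dCas9.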
In other embodiments, dCas9 variants are provided having mutations other than D10A and H840A that, for example, result in a nuclease-inactivated Cas9 (dCas9). Such mutations include, for example, other amino acid substitutions at D10 and H840, or other substitutions within the nuclease domains of Cas9 (e.g., substitutions in the HNH nuclease subdomain and/or the RuvC1 subdomain). In some embodiments, a variant or homolog of dCas9 (e.g., a variant of SEQ ID NO:9) is provided that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to SEQ ID NO:9. In some embodiments, variants of dCas9 (e.g., variants of SEQ ID NO:9) are provided having an amino acid sequence that is shorter or longer than SEQ ID NO:9 by about 5, about 10, about 15, about 20, about 25, about 30, about 40, about 50, about 75, about 100, or more amino acids.
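Percent identity thresholds such as those above are normally computed on an alignment produced by a dedicated tool, and conventions for counting gaps differ between tools. The sketch below assumes the two sequences have already been aligned to equal length (gaps written as "-") and counts gap columns as mismatches; it is one simple convention, not the method prescribed by the disclosure, and both function names are hypothetical.

```python
def percent_identity(query: str, reference: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Inputs must be pre-aligned to the same length; gap columns count as mismatches.
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for q, r in zip(query, reference) if q == r and q != gap)
    return 100.0 * matches / len(reference)


def length_difference(variant: str, reference: str) -> int:
    """Residues gained (positive) or lost (negative); pass ungapped sequences."""
    return len(variant) - len(reference)
```

A variant would then satisfy, for example, the "at least about 90% identical" embodiment when `percent_identity(aligned_variant, aligned_reference) >= 90`, where both aligned strings are hypothetical inputs from an external alignment step.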
In some embodiments, the Cas9 fusion proteins provided herein comprise the full-length amino acid sequence of a Cas9 protein, e.g., one of the sequences provided above. In other embodiments, however, the fusion proteins provided herein do not comprise the full-length Cas9 sequence, but only a fragment thereof. For example, in some embodiments, a Cas9 fusion protein provided herein comprises a Cas9 fragment that binds crRNA and tracrRNA, or sgRNA, but does not comprise a functional nuclease domain, e.g., it comprises only a truncated form of the nuclease domain or no nuclease domain at all. Exemplary amino acid sequences of suitable Cas9 domains and Cas9 fragments are provided herein, and additional suitable sequences of Cas9 domains and Cas9 fragments will be apparent to those skilled in the art. In some embodiments, the length of the Cas9 fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the length of the corresponding wild-type Cas9 protein. In some embodiments, the Cas9 fragment is at least 100 amino acids in length. In some embodiments, the Cas9 fragment comprises at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, or at least 1600 amino acids of a corresponding wild-type Cas9 protein.
In some embodiments, the Cas9 fragment comprises an amino acid sequence having at least 10, at least 15, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1100, or at least 1200 identical contiguous amino acid residues of a corresponding wild-type Cas9 protein.
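The two fragment criteria above, relative length and the longest stretch of identical contiguous residues shared with the wild-type protein, can both be computed directly. The sketch below is a hypothetical illustration: it uses a standard dynamic-programming longest-common-substring search (quadratic time, adequate for proteins of Cas9 size) and assumes one-letter amino acid strings.

```python
def longest_shared_run(fragment: str, wild_type: str) -> int:
    """Length of the longest run of identical contiguous residues found in both sequences."""
    best = 0
    previous = [0] * (len(wild_type) + 1)  # run lengths ending at the previous fragment residue
    for a in fragment:
        current = [0] * (len(wild_type) + 1)
        for j, b in enumerate(wild_type, start=1):
            if a == b:
                current[j] = previous[j - 1] + 1  # extend the diagonal run
                best = max(best, current[j])
        previous = current
    return best


def length_fraction(fragment: str, wild_type: str) -> float:
    """Fragment length as a percentage of the wild-type protein length."""
    if not wild_type:
        raise ValueError("wild-type sequence must be non-empty")
    return 100.0 * len(fragment) / len(wild_type)
```

For instance, a fragment containing the run DIGTNS shares a 6-residue contiguous stretch with a wild-type sequence beginning MDKKYSIGLDIGTNSVGW.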
As used herein, the term "deaminase" or "deaminase domain" refers to a protein or enzyme that catalyzes a deamination reaction. In some embodiments, the deaminase or deaminase domain is a cytidine deaminase, catalyzing the hydrolytic deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. In some embodiments, the deaminase or deaminase domain is a cytosine deaminase, catalyzing the hydrolytic deamination of cytosine to uracil. In some embodiments, the deaminase or deaminase domain is a naturally occurring deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the deaminase or deaminase domain is a variant of a naturally occurring deaminase from an organism, and the variant does not occur in nature. For example, in some embodiments, the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally occurring deaminase from an organism.
As used herein, the term "effective amount" refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease. In some embodiments, an effective amount of a fusion protein provided herein, e.g., a fusion protein comprising a nuclease-inactivated Cas9 domain and an effector domain (e.g., a deaminase domain), may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein. As will be appreciated by those skilled in the art, the effective amount of an agent (e.g., a fusion protein, nuclease, deaminase, recombinase, hybrid protein, protein dimer, complex of a protein (or protein dimer) and a polynucleotide, or polynucleotide) may vary depending on various factors, for example, the desired biological response (e.g., the specific allele, genome, or target site to be edited), the cells or tissues being targeted, and the agent being used.
The term "immediately adjacent" as used in the context of two nucleic acid sequences means that the two sequences are directly linked to each other as part of the same nucleic acid molecule and are not separated by any intervening nucleotides. Thus, two sequences are immediately adjacent when the 3' nucleotide of one sequence is directly linked to the 5' nucleotide of the other sequence by a phosphodiester bond.
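Since all sequences here are written 5' to 3', the definition above reduces to a string test: the first sequence followed directly by the second must occur, uninterrupted, within the molecule. A minimal sketch with a hypothetical function name:

```python
def immediately_adjacent(molecule: str, first: str, second: str) -> bool:
    """True if the 3' end of `first` is joined directly to the 5' end of `second`.

    All sequences are written 5'->3', so adjacency means `first + second`
    occurs in `molecule` with no intervening nucleotides.
    """
    if not first or not second:
        raise ValueError("both sequences must be non-empty")
    return (first + second).upper() in molecule.upper()
```

Order matters: in `"GGAAACCCTT"`, AAA is immediately adjacent to (5' of) CCC, but a single inserted T (`"GGAAATCCCTT"`) breaks adjacency.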
The term "linker" as used herein refers to a chemical group or molecule that connects two molecules or moieties, e.g., two domains of a fusion protein, such as a nuclease-inactivated Cas9 domain and an effector domain (e.g., a deaminase domain). In some embodiments, the linker connects the gRNA-binding domain of an RNA-programmable nuclease, including the Cas9 nuclease domain, to the catalytic domain of a nucleic acid editing protein. In some embodiments, a linker connects dCas9 and the nucleic acid editing protein. Typically, a linker is positioned between, or flanked by, the two groups, molecules, or other moieties and is attached to each by a covalent bond, thereby connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
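For peptide linkers, assembling a fusion protein amounts to concatenating the N-terminal domain, the linker, and the C-terminal domain. The sketch below is hypothetical: the function name is invented, the default 5-100 residue bounds mirror the range stated above (and can be widened, since longer or shorter linkers are contemplated), and the repeated GGGGS motif in the usage note is an illustrative placeholder rather than a linker specified here.

```python
def assemble_fusion(n_terminal: str, linker: str, c_terminal: str,
                    min_linker: int = 5, max_linker: int = 100) -> str:
    """Join two protein domains (one-letter codes) through a peptide linker."""
    if not n_terminal or not c_terminal:
        raise ValueError("both domains must be non-empty")
    if not min_linker <= len(linker) <= max_linker:
        raise ValueError(
            f"linker length {len(linker)} outside {min_linker}-{max_linker} residues"
        )
    # N-terminal domain, then linker, then C-terminal domain.
    return n_terminal + linker + c_terminal
```

A call such as `assemble_fusion(dcas9_seq, "GGGGS" * 3, deaminase_seq)` would place a 15-residue placeholder linker between two hypothetical domain sequences; swapping the argument order yields the corresponding fusion with the domains at opposite termini.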
The term "mutation" as used herein refers to the substitution of a residue within a sequence (e.g., a nucleic acid or amino acid sequence) with another residue, or the deletion or insertion of one or more residues within a sequence. Mutations are generally described herein by identifying the original residue, followed by the position of the residue within the sequence and the identity of the newly substituted residue; for example, D10A denotes substitution of the aspartate at position 10 with alanine. Various methods for preparing the amino acid substitutions (mutations) provided herein are well known in the art and are described, for example, by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
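The substitution notation described above (original residue, position, new residue) is regular enough to parse and apply programmatically. This is a hypothetical sketch: it handles single-residue substitutions only (not insertions or deletions), uses 1-based positions, and refuses to apply a mutation whose stated original residue does not match the sequence.

```python
import re
from typing import Iterable

# Original residue, 1-based position, substituted residue, e.g. "D10A".
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(protein: str, mutations: Iterable[str]) -> str:
    """Apply substitutions such as "D10A" to a one-letter amino acid sequence."""
    residues = list(protein)
    for mutation in mutations:
        match = SUBSTITUTION.fullmatch(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation notation: {mutation}")
        original, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {mutation}")
        found = residues[position - 1]
        if found != original:
            # Guards against numbering errors or the wrong reference sequence.
            raise ValueError(f"{mutation}: expected {original}, found {found}")
        residues[position - 1] = new
    return "".join(residues)
```

Applied to the N-terminal stretch MDKKYSIGLDIGTNSVGWAVITD, `apply_mutations(..., ["D10A"])` returns MDKKYSIGLAIGTNSVGWAVITD, whereas `["H10A"]` is rejected because residue 10 is D.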
The terms "nucleic acid" and "nucleic acid molecule" as used herein refer to a compound comprising a nucleobase and an acidic moiety (e.g., a nucleoside, a nucleotide, or a polymer of nucleotides). Typically, polymeric nucleic acids, e.g., nucleic acid molecules comprising three or more nucleotides, are linear molecules in which adjacent nucleotides are linked to each other via phosphodiester linkages. In some embodiments, "nucleic acid" refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides). In some embodiments, "nucleic acid" refers to an oligonucleotide chain comprising three or more individual nucleotide residues. As used herein, the terms "oligonucleotide" and "polynucleotide" are used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides). In some embodiments, "nucleic acid" encompasses RNA as well as single- and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, transcript, mRNA, tRNA, rRNA, siRNA, snRNA, plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. Alternatively, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome or fragment thereof, or a synthetic DNA, RNA, or DNA/RNA hybrid, or may include non-naturally occurring nucleotides or nucleosides. Furthermore, the terms "nucleic acid," "DNA," "RNA," and/or similar terms include nucleic acid analogs, e.g., analogs having a backbone other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs, such as analogs having chemically modified bases or sugars, and backbone modifications.
Unless otherwise indicated, nucleic acid sequences are presented in the 5' to 3' orientation. In some embodiments, the nucleic acid is or comprises natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyladenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages). In some embodiments, the RNA is an RNA associated with a Cas9 system. For example, the RNA may be a CRISPR RNA (crRNA), a trans-encoded small RNA (tracrRNA), a single guide RNA (sgRNA), or a guide RNA (gRNA).
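Because sequences are written 5' to 3', the partner strand of a DNA duplex is obtained by complementing each base and reversing the result, so that it too reads 5' to 3'. A minimal sketch (function name hypothetical; accepts A, C, G, T, and N only):

```python
DNA_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the opposite DNA strand, also written 5'->3'."""
    seq = seq.upper()
    if set(seq) - set("ACGTN"):
        raise ValueError("expected DNA characters A, C, G, T, or N")
    # Complement each base, then reverse so the result reads 5'->3'.
    return seq.translate(DNA_COMPLEMENT)[::-1]
```

For example, the start codon region ATGGATAAG of SEQ ID NO:282 pairs with CTTATCCAT on the opposite strand, and applying the function twice returns the original sequence.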
As used herein, the term "proliferative disease" refers to any disease in which cellular or tissue homeostasis is disturbed, as cells or cell populations exhibit abnormally elevated rates of proliferation. Proliferative diseases include hyperproliferative diseases, such as pre-neoplastic conditions and neoplastic diseases. Neoplastic diseases are characterized by abnormal proliferation of cells and include benign and malignant tumors. Malignant tumors are also known as cancers.
The terms "protein," "peptide," and "polypeptide" are used interchangeably herein and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. The term "fusion protein" as used herein refers to a hybrid polypeptide comprising protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an "amino-terminal fusion protein" or a "carboxy-terminal fusion protein," respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic acid editing protein. In some embodiments, a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent.
In some embodiments, a protein is in a complex with, or is associated with, a nucleic acid, e.g., RNA. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
The terms "RNA-programmable nuclease" and "RNA-guided nuclease" are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNAs that are not a target for cleavage. In some embodiments, an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex. Typically, the bound RNA is referred to as a guide RNA (gRNA). A gRNA can exist as a complex of two or more RNAs or as a single RNA molecule. A gRNA that exists as a single RNA molecule may be referred to as a single guide RNA (sgRNA), though "gRNA" is used interchangeably to refer to guide RNAs that exist either as single molecules or as a complex of two or more molecules. Typically, a gRNA that exists as a single RNA species comprises two domains: (1) a domain that shares homology with a target nucleic acid (e.g., and directs binding of a Cas9 complex to the target); and (2) a domain that binds a Cas9 protein. In some embodiments, domain (2) corresponds to a sequence known as a tracrRNA and comprises a stem-loop structure. For example, in some embodiments, domain (2) is identical or homologous to a tracrRNA as provided in Jinek et al., Science 337:816-821 (2012). Other examples of gRNAs (e.g., those including domain (2)) can be found in U.S. Provisional Patent Application U.S.S.N. 61/874,682, filed September 6, 2013, entitled "Switchable Cas9 Nucleases and Uses Thereof," and U.S. Provisional Patent Application U.S.S.N. 61/874,746, filed September 6, 2013, entitled "Delivery System For Functional Nucleases," the entire contents of each of which are incorporated herein by reference. In some embodiments, a gRNA comprises two or more of domains (1) and (2) and may be referred to as an "extended gRNA." For example, an extended gRNA will bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein.
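The two-domain layout of an sgRNA described above can be illustrated by concatenating a target-matching spacer, domain (1), with a Cas9-binding scaffold, domain (2). In the hypothetical sketch below, the spacer is supplied as the DNA target-matching sequence and converted to RNA by replacing T with U; the scaffold is passed in as an opaque string, and the placeholder used in the usage note is not a functional tracrRNA-derived sequence.

```python
def assemble_sgrna(spacer_dna: str, scaffold_rna: str) -> str:
    """Join domain (1), the spacer, to domain (2), the Cas9-binding scaffold.

    The spacer is given as DNA (5'->3') and written as RNA by replacing T
    with U; the scaffold is supplied already as RNA and is not validated.
    """
    spacer = spacer_dna.upper().replace("T", "U")
    if not spacer or set(spacer) - set("ACGU"):
        raise ValueError("spacer must contain only A, C, G, and T/U")
    return spacer + scaffold_rna  # domain (1) at the 5' end, domain (2) 3' of it
```

With a placeholder scaffold, `assemble_sgrna("gattaca", "XXXX")` returns `"GAUUACAXXXX"`; in practice the scaffold argument would hold a tracrRNA-derived sequence such as those referenced above.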
The gRNA comprises a nucleotide sequence that is complementary to a target site and mediates binding of the nuclease/RNA complex to that target site, providing the sequence specificity of the nuclease:RNA complex. In some embodiments, the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example, Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., Ferretti J.J. et al., "Complete genome sequence of an M1 strain of Streptococcus pyogenes," Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); Deltcheva E. et al., "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III," Nature 471:602-607 (2011); and Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E., "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity," Science 337:816-821 (2012)), the entire contents of each of which are incorporated herein by reference.
Because RNA-programmable nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites, these proteins are able to target, in principle, any sequence specified by the guide RNA. Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see, e.g., Cong, L. et al., Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al., RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W.Y. et al., Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature Biotechnology 31, 227-229 (2013); Jinek, M. et al., RNA-programmed genome editing in human cells. eLife 2, e00471 (2013); Jiang, W. et al., RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature Biotechnology 31, 233-239 (2013)), the entire contents of each of which are incorporated herein by reference.
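Targeting "any sequence specified by the guide RNA" can be illustrated by scanning a DNA strand for occurrences of the guide's target-matching sequence. The sketch below is hypothetical and makes one assumption not stated in this passage: S. pyogenes Cas9 additionally requires an NGG protospacer-adjacent motif (PAM) immediately 3' of the matched sequence, so the default `pam="NGG"` enforces that, and `pam=""` disables it. Only the strand given is searched; the other strand can be scanned by passing its reverse complement.

```python
from typing import List


def find_target_sites(dna: str, protospacer: str, pam: str = "NGG") -> List[int]:
    """Return 0-based start positions where `protospacer` occurs in `dna`
    immediately followed (3') by a sequence matching `pam` ("N" = any base)."""
    dna, protospacer, pam = dna.upper(), protospacer.upper(), pam.upper()
    if not protospacer:
        raise ValueError("protospacer must be non-empty")
    site_length = len(protospacer) + len(pam)
    hits = []
    for start in range(len(dna) - site_length + 1):
        if dna[start:start + len(protospacer)] != protospacer:
            continue  # guide-matching sequence absent at this offset
        adjacent = dna[start + len(protospacer):start + site_length]
        if all(p == "N" or p == b for p, b in zip(pam, adjacent)):
            hits.append(start)
    return hits
```

In the toy sequence `"TTT" + "ACGTACGTAC" + "TGG" + "AAA" + "ACGTACGTAC" + "TCC"` (a 10-nt match used for brevity; SpCas9 spacers are typically longer), both copies match the guide sequence but only the one at position 3 is followed by an NGG motif.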
As used herein, the term "subject" refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is a genetically engineered non-human subject. The subject may be of either sex, of any age, and at any stage of development.
The term "target site" refers to a sequence within a nucleic acid molecule that is deaminated by a deaminase or by a fusion protein comprising a deaminase (e.g., a dCas9-deaminase fusion protein provided herein).
The term "treatment" refers to a clinical intervention, as described herein, aimed at reversing, alleviating, delaying the onset of, or inhibiting the progression of a disease or disorder, or one or more symptoms thereof. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after the disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay the onset of symptoms or to inhibit the onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, e.g., to prevent or delay their recurrence.
The term "recombinant" as used herein in the context of a protein or nucleic acid refers to a protein or nucleic acid that does not occur in nature but is the product of human engineering. For example, in some embodiments, a recombinant protein or nucleic acid molecule comprises an amino acid or nucleotide sequence that comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations compared to any naturally occurring sequence.
The term "nucleic acid editing enzyme," as used herein, refers to a protein capable of modifying a nucleic acid or one or more nucleotide bases of a nucleic acid. For example, in some embodiments, the nucleic acid editing enzyme is a deaminase that can catalyze a C to T or G to A exchange. Other suitable nucleic acid editing enzymes that may be used in accordance with the present disclosure include, but are not limited to, nucleases, nickases, recombinases, deaminases, methyltransferases, methylases, acetylases, and acetyltransferases.
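The C-to-T and G-to-A exchanges named above are two views of the same event: a C deaminated on one strand reads as a G-to-A change on the complementary strand. The Python sketch below illustrates this under a deliberately simplified model (an assumption, not a mechanism stated in this disclosure): the uracil intermediate is written directly as T, and cellular repair is ignored. The helper names are hypothetical.

```python
# Simplified model (assumption): the deaminase turns a target C into U, which
# replication reads as T. We write T directly and ignore repair outcomes.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def deaminate_c_to_t(seq: str, position: int) -> str:
    """Replace the C at 0-based `position` with T; refuse any other base."""
    if seq[position] != "C":
        raise ValueError(f"position {position} is {seq[position]!r}, not 'C'")
    return seq[:position] + "T" + seq[position + 1:]


top = "GACCTA"
edited = deaminate_c_to_t(top, 2)
print(top, "->", edited)  # GACCTA -> GATCTA
# The same event read on the complementary strand appears as G -> A.
print(reverse_complement(top), "->", reverse_complement(edited))  # TAGGTC -> TAGATC
```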
Detailed Description
Some aspects of the disclosure provide a recombinant Cas9 protein that effectively targets DNA sequences that do not include the canonical PAM sequence (5'-NGG-3', where N is any nucleotide, e.g., A, T, G, or C) at their 3' end. In some embodiments, the Cas9 proteins provided herein comprise one or more mutations identified in a directed evolution experiment that used a library of target sequences with randomized PAMs. The recombinant, non-PAM-restricted Cas9 proteins provided herein can be used to target DNA sequences that do not contain the canonical PAM sequence at their 3' end, thus greatly extending the usefulness of Cas9 technology for gene editing.
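As an illustration of the PAM constraint described above, the following Python sketch checks whether the trinucleotide immediately 3' of a protospacer matches the canonical 5'-NGG-3' PAM. The function names, the example protospacer, and the 0-based, same-strand coordinate convention are assumptions made for this example; a target that fails the check is the kind of non-canonical-PAM site the evolved variants are intended to reach.

```python
import re

# Canonical SpCas9 PAM: 5'-NGG-3', where N is any of A, C, G, or T.
CANONICAL_PAM = re.compile(r"[ACGT]GG")


def flanking_pam(sequence: str, protospacer_end: int, pam_length: int = 3) -> str:
    """Return the bases immediately 3' of a protospacer on the same strand.

    `protospacer_end` is the 0-based, exclusive end index of the protospacer.
    """
    return sequence[protospacer_end:protospacer_end + pam_length].upper()


def has_canonical_pam(sequence: str, protospacer_end: int) -> bool:
    """True if the 3' flanking trinucleotide matches NGG exactly."""
    return CANONICAL_PAM.fullmatch(flanking_pam(sequence, protospacer_end)) is not None


target = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt protospacer
print(has_canonical_pam(target + "TGG", 20))  # True
print(has_canonical_pam(target + "GAG", 20))  # False
```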
Some aspects of the disclosure provide fusion proteins comprising a Cas9 protein and an effector domain, e.g., a DNA editing domain, such as a deaminase domain. Deamination of a nucleobase by a deaminase can cause a point mutation at a particular residue, which is referred to herein as nucleic acid editing. A fusion protein comprising a Cas9 protein or a variant thereof and a DNA editing domain can therefore be used for targeted editing of a nucleic acid sequence. Such fusion proteins can be used for targeted editing of DNA in vitro, e.g., for the production of mutant cells or animals; for introducing targeted mutations ex vivo, e.g., for repairing a genetic defect in a cell obtained from a subject that is subsequently reintroduced into the same or another subject; and for introducing targeted mutations in vivo, such as correcting a genetic defect or introducing an inactivating mutation in a disease-associated gene in a subject. Typically, the Cas9 protein of the fusion proteins described herein lacks nuclease activity and is instead a Cas9 fragment or a dCas9 protein. Also provided are methods of using the Cas9 fusion proteins described herein.
Non-limiting examples of nuclease-inactive Cas9 proteins are provided herein. One suitable nuclease-inactive Cas9 protein is the D10A/H840A Cas9 mutant:
MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD (SEQ ID NO: 262; see, e.g., Qi et al., Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 2013; 152(5):1173-83, the entire contents of which are incorporated herein by reference).
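Mutation names such as D10A and H840A use 1-based residue numbering, whereas Python strings are 0-based. A minimal sketch for checking that a sequence carries given substitutions might look like the following; `carries_substitutions` and `D10A_H840A` are hypothetical names, and the check assumes the sequence is numbered like SEQ ID NO: 9, with no insertions or deletions before the positions checked.

```python
from typing import Dict


def residue_at(protein: str, position: int) -> str:
    """Return the residue at a 1-based position (mutation names are 1-based)."""
    if not 1 <= position <= len(protein):
        raise IndexError(f"position {position} is outside 1..{len(protein)}")
    return protein[position - 1]


def carries_substitutions(protein: str, expected: Dict[int, str]) -> bool:
    """True if every {position: residue} pair in `expected` is present."""
    return all(residue_at(protein, pos) == aa for pos, aa in expected.items())


# D10A/H840A: alanine expected at positions 10 and 840.
D10A_H840A = {10: "A", 840: "A"}
```

For example, `carries_substitutions(seq_262, D10A_H840A)`, where `seq_262` is a string holding the sequence above, tests for alanine at both catalytic positions.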
Other suitable nuclease-inactive Cas9 proteins will be apparent to those of skill in the art based on this disclosure. Such additional exemplary nuclease-inactive Cas9 proteins include, but are not limited to, the D10A, D839A, H840A, N863A, D10A/D839A, D10A/H840A, D10A/N863A, D839A/H840A, D839A/N863A, D10A/D839A/H840A, and D10A/D839A/H840A/N863A mutant proteins (see, e.g., Mali et al., CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nature Biotechnology. 2013; 31(9):833-838, the entire contents of which are incorporated herein by reference).
Recombinant Cas9 protein
Some aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein provided in any one of SEQ ID NOs: 10-262, wherein the Cas9 protein comprises a RuvC domain and an HNH domain, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any of the amino acid sequences provided in SEQ ID NOs: 10-262, and wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein.
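To make "mutations at amino acid residues selected from the group" concrete, the sketch below reports which of the listed positions differ between a reference and a variant. It assumes both sequences are numbered like SEQ ID NO: 9 with no insertions or deletions; mapping the "corresponding" residues in SEQ ID NOs: 10-262 would require a sequence alignment, which this sketch does not perform. The names are illustrative only.

```python
from typing import List, Sequence

# Residues (1-based, SEQ ID NO: 9 numbering) listed in the paragraph above.
LISTED_POSITIONS = (122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, 1329)


def substitutions_at(reference: str, variant: str,
                     positions: Sequence[int] = LISTED_POSITIONS) -> List[str]:
    """Return substitutions such as 'E480K' found at the given positions.

    Assumes ungapped, equal-length sequences so that position i in the
    reference corresponds to position i in the variant.
    """
    if len(reference) != len(variant):
        raise ValueError("this sketch expects equal-length, ungapped sequences")
    changes = []
    for pos in positions:
        if pos > len(reference):
            continue  # position lies beyond the end of this sequence
        ref_aa, var_aa = reference[pos - 1], variant[pos - 1]
        if ref_aa != var_aa:
            changes.append(f"{ref_aa}{pos}{var_aa}")
    return changes
```

Under these assumptions, a variant has "at least one" such mutation when `len(substitutions_at(reference, variant)) >= 1`; changes at unlisted positions (for example, position 1) are ignored.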
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of X262T, X294R, X409I, X480K, X543D, X694I, and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid at the corresponding position.
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of A262T, K294R, S409I, E480K, E543D, M694I, and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the Cas9 protein comprises the X1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the Cas9 protein comprises the E1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of SEQ ID NO: 9. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 9. In some embodiments, the Cas9 protein comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
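Percent identity thresholds can be related to mutation counts with simple arithmetic. The sketch below computes identity position by position for two equal-length sequences; this ungapped comparison is a simplifying assumption, since identity between homologs of different lengths is normally computed from an alignment. For a 1368-residue protein (the length of wild-type SpCas9), seven substitutions give 1361/1368, or about 99.49% identity, which satisfies "at least 99%" but not "at least 99.5%".

```python
def percent_identity_ungapped(a: str, b: str) -> float:
    """Percent of positions with identical residues in two equal-length sequences.

    Simplification: no gaps or alignment; position i is compared with position i.
    """
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


reference = "A" * 1368          # hypothetical placeholder sequence
variant = "A" * 1361 + "C" * 7  # seven substitutions
print(round(percent_identity_ungapped(reference, variant), 2))  # 99.49
```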
Recombinant Cas9 protein with activity on non-canonical PAM
Some aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9 and comprising the RuvC and HNH domains of SEQ ID NO: 9, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of the amino acid sequence provided in SEQ ID NO: 9, wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein, and wherein the recombinant Cas9 protein exhibits activity (e.g., increased activity) on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, as compared to the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9.
In some embodiments, the Cas9 protein exhibits activity on a target sequence whose 3' end is not directly adjacent to a canonical PAM sequence (5'-NGG-3') that is increased at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1000-fold, at least 5000-fold, at least 10000-fold, at least 50000-fold, at least 100000-fold, at least 500000-fold, or at least 1000000-fold over the activity of the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9 on the same target sequence. In some embodiments, the 3' end of the target sequence is directly adjacent to an AGC, GAG, TTT, GTG, or CAA sequence. In some embodiments, Cas9 protein activity is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay. In some embodiments, the transcriptional activation assay is a reporter activation assay, such as a GFP activation assay. Exemplary methods of measuring binding activity (e.g., the binding activity of Cas9) using transcriptional activation assays are known in the art and will be apparent to those skilled in the art. For example, methods for measuring Cas9 activity using the tripartite activator VPR are described in Chavez A., et al., "Highly efficient Cas9-mediated transcriptional programming." Nature Methods 12, 326-328 (2015), the entire contents of which are incorporated herein by reference.
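Fold-increase thresholds like those above are ratios of measured activities. The following is a minimal sketch, assuming a reporter readout (e.g., GFP signal) from which a shared background value is subtracted before taking the ratio; the background-subtraction step, the function names, and the example numbers are hypothetical, not data or methods from this disclosure.

```python
from typing import Optional

FOLD_THRESHOLDS = (5, 10, 50, 100, 500, 1000, 5000, 10000,
                   50000, 100000, 500000, 1000000)


def fold_increase(variant_signal: float, wild_type_signal: float,
                  background: float = 0.0) -> float:
    """Background-subtracted ratio of variant to wild-type activity."""
    wild_type = wild_type_signal - background
    if wild_type <= 0:
        raise ValueError("wild-type signal must exceed background")
    return (variant_signal - background) / wild_type


def highest_threshold_met(fold: float) -> Optional[int]:
    """Largest listed fold threshold that `fold` meets, or None."""
    met = [t for t in FOLD_THRESHOLDS if fold >= t]
    return max(met) if met else None


fold = fold_increase(1250.0, 15.0, background=10.0)  # hypothetical readings
print(fold, highest_threshold_met(fold))  # 248.0 100
```

When wild-type activity is indistinguishable from background, the ratio is undefined, so the sketch raises an error rather than returning a misleading number.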
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of X262T, X294R, X409I, X480K, X543D, X694I, and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid at the corresponding position.
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of A262T, K294R, S409I, E480K, E543D, M694I, and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the Cas9 protein comprises the X1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the Cas9 protein comprises the E1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 2, 4, or 9. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
Some aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9 and comprising the RuvC and HNH domains of SEQ ID NO: 9, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of the amino acid sequence provided in SEQ ID NO: 9, wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein, and wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, as compared to the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 2, 4, or 9. In some embodiments, the Cas9 protein comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
Some aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9 and comprising the RuvC and HNH domains of SEQ ID NO: 9, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of the amino acid sequence provided in SEQ ID NO: 9, wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein, and wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, as compared to the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 2, 4, or 9. In some embodiments, the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein further comprises a histidine at position 840 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding histidine in any of the amino acid sequences provided in SEQ ID NOs: 10-262. Without wishing to be bound by any particular theory, the presence of the catalytic residue H840 allows Cas9 to cleave the strand that is not targeted for editing, i.e., the strand bound by the sgRNA. In some embodiments, a Cas9 protein having an amino acid residue other than histidine at position 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding position in any of the amino acid sequences provided in SEQ ID NOs: 10-262, may be altered so that the residue at that position is a histidine.
Some aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9 and comprising the RuvC and HNH domains of SEQ ID NO: 9, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, and 1256 of the amino acid sequence provided in SEQ ID NO: 9; wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein; and wherein the recombinant Cas9 protein exhibits activity (e.g., increased activity) on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, as compared to the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9.
In some embodiments, the Cas9 protein exhibits activity on a target sequence whose 3' end is not directly adjacent to the canonical PAM sequence (5'-NGG-3') that is increased at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1000-fold, at least 5000-fold, at least 10000-fold, at least 50000-fold, at least 100000-fold, at least 500000-fold, or at least 1000000-fold over the activity of the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9 on the same target sequence. In some embodiments, the 3' end of the target sequence is directly adjacent to an AAA, AAC, AAG, AAT, CAA, CAC, CAG, CAT, GAA, GAC, GAG, GAT, TAA, TAC, TAG, TAT, ACA, ACC, ACG, ACT, CCA, CCC, CCG, CCT, GCA, GCC, GCG, GCT, TCA, TCC, TCG, TCT, AGA, AGC, AGT, CGA, CGC, CGT, GGA, GGC, GGT, TGA, TGC, TGT, ATA, ATC, ATG, ATT, CTA, CTC, CTG, CTT, GTA, GTC, GTG, GTT, TTA, TTC, TTG, or TTT PAM sequence, i.e., any of the 60 trinucleotides other than the canonical AGG, CGG, GGG, and TGG.
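The 60 PAM sequences listed above are exactly the trinucleotides that remain after removing the four NGG sequences from all 64 possibilities. A short Python sketch that regenerates the set (the function name is illustrative):

```python
from itertools import product
from typing import List


def non_canonical_pams() -> List[str]:
    """All 64 trinucleotides NNN except the four canonical NGG PAMs."""
    trinucleotides = ("".join(bases) for bases in product("ACGT", repeat=3))
    return [t for t in trinucleotides if t[1:] != "GG"]


pams = non_canonical_pams()
print(len(pams))  # 60
print("AGG" in pams, "GAG" in pams)  # False True
```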
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of X262T, X267G, X294R, X405I, X409I, X480K, X543D, X694I, X1219V, X1224K, X1256K, and X1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid at the corresponding position.
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of A262T, S267G, K294R, F405I, S409I, E480K, E543D, M694I, E1219V, N1224K, Q1256K, and L1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the Cas9 protein comprises the X1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the Cas9 protein comprises the E1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the Cas9 protein comprises the E480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the Cas9 protein comprises the X543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the Cas9 protein comprises the E543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
In some embodiments, the amino acid sequence of the Cas9 protein comprises a combination of mutations selected from the group consisting of (X480K, X543D, and X1219V); (X262T, X409I, X480K, X543D, X694I, and X1219V); (X294R, X480K, X543D, X1219V, X1256K, and X1362P); (X294R, X480K, X543D, X1219V, and X1256K); (X267G, X294R, X480K, X543D, X1219V, X1224K, and X1256K); and (X262T, X405I, X409I, X480K, X543D, X694I, and X1219V) of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the amino acid sequence of the Cas9 protein comprises a combination of mutations selected from the group consisting of (E480K, E543D, and E1219V); (A262T, S409I, E480K, E543D, M694I, and E1219V); (K294R, E480K, E543D, E1219V, Q1256K, and L1362P); (K294R, E480K, E543D, E1219V, and Q1256K); (S267G, K294R, E480K, E543D, E1219V, N1224K, and Q1256K); and (A262T, F405I, S409I, E480K, E543D, M694I, and E1219V) of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
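The combinations above can be represented as lists of substitution strings. The sketch below applies such a list to a protein string, treating a leading X as a wildcard for the reference residue (matching the X notation used in the text) and otherwise checking that the stated reference residue is present. It assumes SEQ ID NO: 9 numbering and does not map corresponding positions in SEQ ID NOs: 10-262, which would require an alignment; the function name and the placeholder sequence are hypothetical.

```python
import re
from typing import List

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(protein: str, names: List[str]) -> str:
    """Apply substitutions such as 'E480K' using 1-based positions.

    A leading 'X' matches any reference residue; any other leading letter
    must equal the residue actually present at that position.
    """
    residues = list(protein)
    for name in names:
        match = SUBSTITUTION.fullmatch(name)
        if match is None:
            raise ValueError(f"unrecognized substitution {name!r}")
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise IndexError(f"{name}: position outside 1..{len(residues)}")
        if ref != "X" and residues[pos - 1] != ref:
            raise ValueError(f"{name}: found {residues[pos - 1]!r} at {pos}")
        residues[pos - 1] = alt
    return "".join(residues)


toy = "E" * 1300  # hypothetical placeholder, not a real Cas9 sequence
variant = apply_substitutions(toy, ["E480K", "E543D", "E1219V"])
print(variant[479], variant[542], variant[1218])  # K D V
```

Checking the reference residue guards against off-by-one numbering errors, which are easy to make when moving between 1-based mutation names and 0-based string indices.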
In some embodiments, the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 2, 4, or 9. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 2, 4, or 9. In some embodiments, the Cas9 protein comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises the D10A and H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
Some aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9 and comprising the RuvC and HNH domains of SEQ ID NO: 9, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, and 1362 of the amino acid sequence provided in SEQ ID NO: 9; wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein; and wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise a canonical PAM (5'-NGG-3') at its 3' end, as compared to the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 2, 4, or 9. In some embodiments, the Cas9 protein comprises the D10A and H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262.
Some aspects of the disclosure provide a recombinant Cas9 protein comprising an amino acid sequence at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9 and comprising the RuvC and HNH domains of SEQ ID NO: 9, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, and 1362 of the amino acid sequence provided in SEQ ID NO: 9; wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein; and wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise a canonical PAM (5'-NGG-3') at its 3' end, as compared to the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 2, 4, or 9. In some embodiments, the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein further comprises a histidine residue at position 840 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding histidine residue in any of the amino acid sequences provided in SEQ ID NOs: 10-262. Without wishing to be bound by any particular theory, the presence of the catalytic residue H840 allows Cas9 to cleave the strand that is not targeted for editing, i.e., the strand bound by the sgRNA.
In some embodiments, a Cas9 protein having an amino acid residue other than histidine at position 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding position in any of the amino acid sequences provided in SEQ ID NOs: 10-262, may be altered so that the residue at that position is a histidine.
Cas9 fusion protein
Some aspects of the disclosure provide fusion proteins comprising a Cas9 protein provided herein fused to a second protein, or "fusion partner," such as an effector domain. In some embodiments, the effector domain is fused to the N-terminus of the Cas9 protein. In some embodiments, the effector domain is fused to the C-terminus of the Cas9 protein. In some embodiments, the Cas9 protein and the effector domain are fused to each other via a linker. Suitable strategies for generating fusion proteins according to aspects of the present disclosure, with or without a linker, will be apparent to those skilled in the art in light of the present disclosure and knowledge in the art. For example, Gilbert et al., CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013; 154(2):442-51, showed that Cas9 fused at its C-terminus to VP64, using two NLSs (SPKKKRKVEAS, SEQ ID NO: 284) as a linker, can be used for transcriptional activation. Mali et al., CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat Biotechnol. 2013; 31(9):833-8, reported that a C-terminal fusion to VP64 without a linker can be used for transcriptional activation. Maeder et al., CRISPR RNA-guided activation of endogenous human genes. Nat Methods. 2013; 10:977-979, reported that C-terminal fusions to VP64 using a Gly4Ser (SEQ ID NO: 5) linker can be used as transcriptional activators. More recently, dCas9-FokI nuclease fusions were successfully generated and showed improved enzyme specificity compared to the parental Cas9 enzyme (Guilinger JP, Thompson DB, Liu DR. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat Biotechnol. 2014; 32(6):577-82; and Tsai SQ, Wyvekens N, Khayter C, Foden JA, Thapar V, Reyon D, Goodwin MJ, Aryee MJ, Joung JK. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat Biotechnol. 2014; 32(6):569-76. PMID: 24770325; SGSETPGTSESATPES (SEQ ID NO: 7) and (GGGGS)n (SEQ ID NO: 5) linkers were used for the FokI-dCas9 fusion proteins, respectively).
In some embodiments, the linker comprises a (GGGGS)n (SEQ ID NO: 5), (G)n, (EAAAK)n (SEQ ID NO: 6), (GGS)n, SGSETPGTSESATPES (SEQ ID NO: 7), or (XP)n motif, or a combination of any of these, wherein n is independently an integer between 1 and 30. In some embodiments, the effector domain comprises an enzyme domain. Suitable enzyme domains include, but are not limited to, nucleases, nickases, recombinases, deaminases, methyltransferases, methylases, acetylases, acetyltransferases, transcriptional activators, and transcriptional repressors.
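The peptide linker motifs and N- or C-terminal fusion orientations described above can be modeled as simple string assembly. In this sketch, protein sequences are plain one-letter strings, the function names and placeholder sequences are hypothetical, and practical details such as start methionines and NLS placement are deliberately ignored.

```python
SEQ_ID_NO_7_LINKER = "SGSETPGTSESATPES"  # SEQ ID NO: 7 as given in the text


def repeat_linker(motif: str, n: int) -> str:
    """Return a (motif)n linker, with n an integer from 1 to 30 as in the text."""
    if not isinstance(n, int) or not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return motif * n


def fuse(cas9: str, effector: str, linker: str = "", effector_at: str = "N") -> str:
    """Join an effector to the N- or C-terminus of Cas9, optionally via a linker."""
    if effector_at == "N":
        return effector + linker + cas9
    if effector_at == "C":
        return cas9 + linker + effector
    raise ValueError("effector_at must be 'N' or 'C'")


cas9, effector = "CAS", "EFF"  # hypothetical placeholder sequences
print(fuse(cas9, effector, repeat_linker("GGGGS", 3), effector_at="C"))
# CASGGGGSGGGGSGGGGSEFF
print(fuse(cas9, effector, SEQ_ID_NO_7_LINKER))  # EFFSGSETPGTSESATPESCAS
```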
In some embodiments, the linker comprises a polymer (e.g., polyethylene glycol, polyamide, polyester, etc.). In some embodiments, the linker comprises a monomer of an aminoalkanoic acid (e.g., glycine, acetic acid, alanine, β-alanine, 3-aminopropionic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In some embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). In some embodiments, the linker comprises a moiety that facilitates attachment, e.g., an electrophile that reacts with a nucleophile such as a thiol.
In some embodiments, the effector domain comprises an effector enzyme. Suitable effector enzymes that may be used in accordance with the present disclosure include nickases, recombinases, and deaminases. However, additional effector enzymes will be apparent to those skilled in the art and are within the scope of the present disclosure. In other embodiments, the effector domain comprises a domain that modulates transcriptional activity. Such a transcriptional regulatory domain may be, but is not limited to, a transcriptional activator or transcriptional repressor domain.
In some embodiments, the effector domain is a nucleic acid editing domain. In some embodiments, the effector domain is a deaminase domain. In some embodiments, the deaminase is a cytosine deaminase or a cytidine deaminase. In some embodiments, the deaminase is an apolipoprotein B mRNA editing complex (APOBEC) family deaminase. In some embodiments, the deaminase is an APOBEC1 deaminase. In some embodiments, the deaminase is an APOBEC2 deaminase. In some embodiments, the deaminase is an APOBEC3 deaminase. In some embodiments, the deaminase is an APOBEC3A deaminase. In some embodiments, the deaminase is an APOBEC3D deaminase. In some embodiments, the deaminase is an APOBEC3E deaminase. In some embodiments, the deaminase is an APOBEC3F deaminase. In some embodiments, the deaminase is an APOBEC3G deaminase. In some embodiments, the deaminase is an APOBEC3H deaminase. In some embodiments, the deaminase is an APOBEC4 deaminase. In some embodiments, the deaminase is an activation-induced deaminase (AID).
In some embodiments, the effector domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the deaminase domain of any one of SEQ ID NOs: 263-281.
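The identity thresholds above depend on how percent identity is computed, which this passage does not specify. The sketch below assumes the two sequences are already aligned position by position (equal length, no gaps); real comparisons would normally rely on a gapped alignment tool, which this does not implement. The function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Simplifying assumption: no gaps, so residue i of seq_a is compared
    only with residue i of seq_b.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_threshold(candidate: str, reference: str, threshold: float) -> bool:
    """True if candidate is at least `threshold` percent identical to reference."""
    return percent_identity(candidate, reference) >= threshold


# Toy example: first 20 residues of mouse APOBEC-1 (SEQ ID NO: 280) and a
# copy with the last residue substituted (one difference in 20 positions).
ref = "MSSETGPVAVDPTLRRRIEP"
var = "MSSETGPVAVDPTLRRRIEA"
print(percent_identity(ref, var))       # 95.0
print(meets_threshold(var, ref, 90.0))  # True
print(meets_threshold(var, ref, 99.0))  # False
```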
In some embodiments, the effector domain is a nuclease domain. In some embodiments, the nuclease domain is a FokI DNA cleavage domain. In some embodiments, the present disclosure provides dimers of the fusion proteins provided herein, e.g., dimers of fusion proteins comprising a dimerizing nuclease domain.
In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein provided in any one of SEQ ID NOs: 10-262, wherein the Cas9 protein comprises RuvC and HNH domains, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any of the amino acid sequences provided in SEQ ID NOs: 10-262, and wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 and comprises the RuvC and HNH domains of SEQ ID NO: 9, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of the amino acid sequence provided in SEQ ID NO: 9; wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein; and wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, as compared to the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9.
In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein provided in any one of SEQ ID NOs: 10-262, wherein the Cas9 protein comprises RuvC and HNH domains, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, and 1362 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, and wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 and comprises the RuvC and HNH domains of SEQ ID NO: 9, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, and 1256 of the amino acid sequence provided in SEQ ID NO: 9; wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein; and wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, as compared to the Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9.
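The embodiments above define variants by substitutions at numbered residues of SEQ ID NO: 9. As a minimal sketch, the helper below reports which listed positions differ between a reference and a variant. It assumes both sequences are aligned without gaps so that they share SEQ ID NO: 9 numbering (1-based); SEQ ID NO: 9 itself is not reproduced here, so a toy 10-residue pair and hypothetical positions are used, and the function names are illustrative only.

```python
# Positions (1-based, SEQ ID NO: 9 numbering) from the first embodiment above.
FIRST_SET = (122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, 1329)


def mutations_at(reference: str, variant: str, positions) -> list:
    """Return substitutions such as 'D2A' found at the given 1-based positions.

    Assumes reference and variant are aligned without gaps, so index i
    refers to the same residue in both sequences.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    found = []
    for pos in sorted(positions):
        if pos > len(reference):
            continue  # position lies beyond this (possibly partial) sequence
        ref_aa, var_aa = reference[pos - 1], variant[pos - 1]
        if ref_aa != var_aa:
            found.append(f"{ref_aa}{pos}{var_aa}")
    return found


def qualifies(reference: str, variant: str, positions, min_mutations: int = 1) -> bool:
    """True if the variant differs from the reference at >= min_mutations listed positions."""
    return len(mutations_at(reference, variant, positions)) >= min_mutations


# Toy 10-residue example with hypothetical positions 2, 5, and 9.
ref = "MDKKYSIGLD"
var = "MAKKFSIGLD"
print(mutations_at(ref, var, (2, 5, 9)))                # ['D2A', 'Y5F']
print(qualifies(ref, var, (2, 5, 9), min_mutations=2))  # True
```

Checking the "at least seven mutations" embodiment would then amount to calling `qualifies(..., FIRST_SET, min_mutations=7)` on full-length aligned sequences.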
Some aspects of the disclosure provide fusion proteins comprising (i) a nuclease-inactive Cas9 protein; and (ii) an effector domain. In some embodiments, the effector domain is a DNA-editing domain. In some embodiments, the effector domain has deaminase activity. In some embodiments, the effector domain comprises or is a deaminase domain. In some embodiments, the deaminase is a cytidine deaminase. In some embodiments, the deaminase is an apolipoprotein B mRNA editing complex (APOBEC) family deaminase. In some embodiments, the deaminase is an APOBEC1 family deaminase. In some embodiments, the deaminase is an activation-induced cytidine deaminase (AID). Some nucleic acid editing domains, and Cas9 fusion proteins comprising such domains, are described in detail herein. Other suitable effector domains will be apparent to those skilled in the art based on this disclosure. In some embodiments, the nucleic acid editing domain is a FokI nuclease domain.
The present disclosure provides Cas9:effector domain fusion proteins in various configurations. In some embodiments, the effector domain is fused to the N-terminus of the Cas9 protein. In some embodiments, the effector domain is fused to the C-terminus of the Cas9 protein. In some embodiments, the Cas9 protein and the effector domain are fused via a linker. In some embodiments, the linker comprises a (GGGGS)n (SEQ ID NO:5), (G)n, (EAAAK)n (SEQ ID NO:6), (GGS)n, or SGSETPGTSESATPES (SEQ ID NO:7) motif (see, e.g., Guilinger JP, Thompson DB, Liu DR. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat Biotechnol. 2014; 32(6):577-82; incorporated herein by reference in its entirety), or an (XP)n motif, or a combination of any of these, wherein n is independently an integer between 1 and 30. In some embodiments, n is independently 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30, or any combination thereof if more than one linker or more than one linker motif is present. Additional suitable linker motifs and linker configurations will be apparent to those skilled in the art. In some embodiments, suitable linker motifs and configurations include those described in Chen et al., Fusion protein linkers: property, design and functionality. Adv Drug Deliv Rev. 2013; 65(10):1357-69, the entire contents of which are incorporated herein by reference. Other suitable linker sequences will be apparent to those skilled in the art based on this disclosure and the knowledge in the art.
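The linker motifs above are short peptide repeats, so composing a linker is mostly string repetition. A minimal sketch, with n restricted to 1-30 as stated in the text; for the (XP)n motif, X may be any residue, and the choice X = A below is arbitrary and purely illustrative. The helper names are hypothetical.

```python
# Linker building blocks named in the text above (one-letter amino acid codes).
GGGGS = "GGGGS"               # (GGGGS)n, SEQ ID NO: 5
EAAAK = "EAAAK"               # (EAAAK)n, SEQ ID NO: 6
GGS = "GGS"                   # (GGS)n
XTEN16 = "SGSETPGTSESATPES"   # SEQ ID NO: 7


def repeat_linker(motif: str, n: int) -> str:
    """Return `motif` repeated n times; n must be an integer from 1 to 30."""
    if not isinstance(n, int) or not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return motif * n


def combined_linker(*segments: str) -> str:
    """Join several linker segments N- to C-terminally into one linker."""
    return "".join(segments)


print(repeat_linker(GGGGS, 3))  # GGGGSGGGGSGGGGS
print(repeat_linker("AP", 4))   # APAPAPAP, an (XP)n linker with X = A
print(combined_linker(repeat_linker(GGS, 2), XTEN16))  # GGSGGSSGSETPGTSESATPES
```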
In some embodiments, the general architecture of the exemplary Cas9 fusion proteins provided herein comprises the following structure:
[NH2]-[effector domain]-[Cas9]-[COOH]; or
[NH2]-[Cas9]-[effector domain]-[COOH],
wherein NH2 is the N-terminus of the fusion protein and COOH is the C-terminus of the fusion protein. Fig. 11 provides a schematic representation of a Cas9 protein fused to an effector domain (e.g., rAPOBEC1), complexed with an sgRNA and bound to a target nucleic acid sequence.
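The bracket notation above reads N- to C-terminus, so a fusion sequence can be sketched as an ordered concatenation of domain sequences with an optional linker between neighbors. The sketch below also records where each domain lands; the domain sequences are short placeholders, not real effector or Cas9 sequences, and `assemble_fusion` is a hypothetical helper.

```python
def assemble_fusion(domains, linker=""):
    """Join (name, sequence) pairs N- to C-terminally, placing `linker` between neighbors.

    Returns the fused sequence and a list of (name, start, end) 1-based spans.
    """
    parts, spans, pos = [], [], 1
    for i, (name, seq) in enumerate(domains):
        if i > 0 and linker:
            parts.append(linker)
            pos += len(linker)
        parts.append(seq)
        spans.append((name, pos, pos + len(seq) - 1))
        pos += len(seq)
    return "".join(parts), spans


# Placeholder sequences only; real effector and Cas9 domains are far longer.
effector, cas9 = ("effector", "AAAA"), ("Cas9", "CCCCCC")
seq, spans = assemble_fusion([effector, cas9], linker="GGGGS")
print(seq)    # AAAAGGGGSCCCCCC
print(spans)  # [('effector', 1, 4), ('Cas9', 10, 15)]
```

Swapping the order of the list gives the second architecture, [NH2]-[Cas9]-[effector domain]-[COOH].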
In some embodiments, any of the fusion proteins provided herein can comprise one or more nuclear localization sequences (NLSs). As used herein, a nuclear localization sequence refers to an amino acid sequence that facilitates import (e.g., via nuclear transport) of a protein (e.g., any of the fusion proteins provided herein comprising an NLS) into the cell nucleus. Typically, an NLS comprises one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Nuclear localization sequences are known in the art and will be apparent to those skilled in the art; see, for example, Kalderon D., et al., "A short amino acid sequence able to specify nuclear location." Cell (1984) 39(3 Pt 2):499-509; Dingwall C., et al., "The nucleoplasmin nuclear location sequence is larger and more complex than that of SV-40 large T antigen." J Cell Biol. (1988) 107(3):841-9; Makkerh J.P., et al., "Comparative mutagenesis of nuclear localization signals reveals the importance of neutral and acidic amino acids." Curr Biol. (1996) 6(8):1025-7; and Ray M., et al., "Quantitative tracking of protein trafficking to the nucleus using cytosolic protein delivery by nanoparticle-stabilized nanocapsules." Bioconjug. Chem. (2015) 26(6):1004-7; the entire contents of each of which are incorporated herein by reference. Additional nuclear localization sequences are described, for example, in Plank et al., PCT/EP2000/011690, the entire contents of which are incorporated herein by reference. In some embodiments, the NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO:299) or MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 300).
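Because NLSs are short, lysine/arginine-rich stretches, a quick sequence check is to search for an exact NLS such as PKKKRKV (SEQ ID NO: 299) and to measure how basic a stretch is. A minimal sketch; the basic-fraction score is a crude illustrative heuristic of my own, not a validated NLS predictor, and both function names are hypothetical.

```python
import re

SV40_NLS = "PKKKRKV"  # SEQ ID NO: 299


def find_motif(protein: str, motif: str = SV40_NLS) -> list:
    """1-based start positions of every exact (possibly overlapping) match of motif."""
    pattern = f"(?={re.escape(motif)})"
    return [m.start() + 1 for m in re.finditer(pattern, protein.upper())]


def basic_fraction(window: str) -> float:
    """Fraction of Lys/Arg residues in a window; a crude illustrative heuristic."""
    if not window:
        return 0.0
    return sum(aa in "KR" for aa in window.upper()) / len(window)


print(find_motif("SPKKKRKVEAS"))            # [2]  (linker of SEQ ID NO: 284)
print(round(basic_fraction(SV40_NLS), 3))   # 0.714
```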
Exemplary features that may be present are localization sequences, such as nuclear localization sequences, cytoplasmic localization sequences, export sequences, such as nuclear export sequences or other localization sequences, and sequence tags that may be used for solubilization, purification, or detection of the fusion protein. Suitable localization signal sequences and protein tag sequences are provided herein and include, but are not limited to, a Biotin Carboxylase Carrier Protein (BCCP) tag, myc-tag, calmodulin-tag, FLAG-tag, Hemagglutinin (HA) -tag, polyhistidine-tag, also known as histidine-tag or His-tag, Maltose Binding Protein (MBP) -tag, nus-tag, glutathione-S-transferase (GST) -tag, Green Fluorescent Protein (GFP) -tag, thioredoxin-tag, S-tag, Softag (e.g., Softag 1, Softag 3), strep-tag, biotin ligase tag, FlAsH tag, V5 tag, and SBP-tag. Additional suitable sequences will be apparent to those skilled in the art and are within the scope of the disclosure.
Any of the nuclear localization sequences provided herein can be fused to the fusion protein at any suitable location, for example, a location that facilitates translocation of the fusion protein into the nucleus of a cell without impairing the function of the fusion protein. In some embodiments, the NLS is fused to the N-terminus of the Cas9 protein of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the Cas9 protein of the fusion protein. In some embodiments, the NLS is fused to the N-terminus of the effector domain of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the effector domain of the fusion protein.
In some embodiments, the effector domain is a deaminase. For example, in some embodiments, the general architecture of an exemplary Cas9 fusion protein with a deaminase domain comprises the following structure:
[NH2]-[NLS]-[Cas9]-[deaminase]-[COOH],
[NH2]-[NLS]-[deaminase]-[Cas9]-[COOH],
[NH2]-[Cas9]-[NLS]-[deaminase]-[COOH],
[NH2]-[deaminase]-[NLS]-[Cas9]-[COOH],
[NH2]-[deaminase]-[Cas9]-[NLS]-[COOH], or
[NH2]-[Cas9]-[deaminase]-[NLS]-[COOH],
wherein NLS is a nuclear localization signal, NH2 is the N-terminus of the fusion protein, and COOH is the C-terminus of the fusion protein. In some embodiments, a linker is inserted between the Cas9 protein and the deaminase domain. In some embodiments, any "]-[" may represent one or more linkers. In some embodiments, the NLS is C-terminal to the deaminase and/or the Cas9 domain. In some embodiments, the NLS is located between the deaminase and Cas9 domains. Additional features, such as sequence tags, may also be present.
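The six architectures listed above are exactly the 3! = 6 orderings of the NLS, Cas9, and deaminase domains. A minimal sketch that regenerates them in the same bracket notation; linkers and tags are not modeled, and the helper name is hypothetical.

```python
from itertools import permutations


def architectures(domains=("NLS", "Cas9", "deaminase")) -> list:
    """Every N-to-C ordering of the given domains, in the bracket notation above."""
    return ["[NH2]-[" + "]-[".join(order) + "]-[COOH]"
            for order in permutations(domains)]


for arch in architectures():
    print(arch)
```

Passing a longer tuple (for example, adding a tag) enumerates the larger set of orderings in the same way.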
One exemplary suitable type of effector domain includes cytosine deaminases, such as those of the APOBEC family. The apolipoprotein B mRNA editing complex (APOBEC) family of cytosine deaminases comprises eleven proteins that serve to initiate mutagenesis in a controlled, beneficial manner.29 One family member, activation-induced cytidine deaminase (AID), is responsible for the maturation of antibodies by converting cytosines in ssDNA to uracils in a transcription-dependent, strand-biased fashion.30 Apolipoprotein B editing complex 3 (APOBEC3) provides protection against certain HIV-1 strains by deaminating cytosines in reverse-transcribed viral ssDNA. These proteins all require a Zn2+-coordinating motif (His-X-Glu-X23-26-Pro-Cys-X2-4-Cys; SEQ ID NO: 283) and a bound water molecule for catalytic activity. The Glu residue acts to activate the water molecule to a zinc hydroxide for nucleophilic attack in the deamination reaction. Each family member preferentially deaminates at its own particular "hotspot", ranging from WRC (W is A or T, R is A or G) for hAID to TTC for hAPOBEC3F.32 A recent crystal structure of the catalytic domain of APOBEC3G revealed a secondary structure comprising a five-stranded β-sheet core flanked by six α-helices, which is believed to be conserved across the entire family.33 The active center loops have been shown to be responsible for both ssDNA binding and determining "hotspot" identity.34 Overexpression of these enzymes has been linked to genomic instability and cancer, highlighting the importance of sequence-specific targeting.35
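The hotspot notation above (WRC for hAID, TTC for hAPOBEC3F) uses IUPAC ambiguity codes, which translate directly into regular-expression character classes. A minimal sketch that reports the target cytosine of each hotspot on the given strand only; the reverse complement is not scanned, and the helper name is hypothetical.

```python
import re

# IUPAC codes needed for the hotspots named above (W = A or T, R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}


def hotspot_cytosines(dna: str, motif: str) -> list:
    """0-based positions of the target C (last base of motif) for every hotspot match.

    Only the given strand is scanned; a lookahead is used so that overlapping
    hotspots are all reported.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return [m.start() + len(motif) - 1
            for m in re.finditer(f"(?={pattern})", dna.upper())]


print(hotspot_cytosines("AGCTTCAAC", "WRC"))  # [2, 8]
print(hotspot_cytosines("AGCTTCAAC", "TTC"))  # [5]
```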
Some aspects of the disclosure provide a systematic series of fusions between Cas9 and deaminase domains (e.g., cytosine deaminases such as APOBEC enzymes, or adenosine deaminases such as ADAT enzymes), generated to direct the enzymatic activity of these deaminases to specific sites in genomic DNA. The advantages of using Cas9 as the recognition agent are twofold: (1) the sequence specificity of Cas9 can be easily altered by simply changing the sgRNA sequence; and (2) Cas9 binds to its target sequence by denaturing the dsDNA, exposing a stretch of single-stranded DNA that is therefore a viable substrate for the deaminase. It is understood that other catalytic domains, or catalytic domains from other deaminases, can also be used to generate fusion proteins with Cas9, and that the disclosure is not limited in this regard.
Some aspects of the present disclosure are based on the recognition that Cas9:deaminase fusion proteins can efficiently deaminate nucleotides at positions 3-11 according to the numbering scheme in Fig. 11. It will be appreciated that one skilled in the art will be able to design suitable guide RNAs to target the fusion protein to a target sequence comprising the nucleotide to be deaminated. Either a PAM-dependent Cas9 protein or a Cas9 protein provided herein that is capable of targeting a sequence lacking the canonical PAM can be used for deamination of the target sequence.
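Given a candidate protospacer, the positions-3-to-11 window above tells which cytosines a Cas9:deaminase fusion is expected to reach. Fig. 11 is not reproduced here, so the sketch assumes that position 1 is the PAM-distal (5') end of a 20-nt protospacer written 5' to 3' on the strand sharing the guide's spacer sequence; that numbering convention should be checked against the figure. The function name and example sequence are hypothetical.

```python
def editable_cytosines(protospacer: str, window=(3, 11)) -> list:
    """1-based positions of C inside the editing window of a protospacer.

    Assumed numbering: position 1 is the PAM-distal (5') end, written 5'->3'
    on the strand that shares the guide RNA's spacer sequence.
    """
    lo, hi = window
    return [i for i, base in enumerate(protospacer.upper(), start=1)
            if base == "C" and lo <= i <= hi]


# Hypothetical 20-nt protospacer with C at positions 3, 4, 6, 8, 13, and 19.
print(editable_cytosines("GACCTCACGATTCGAAGTCA"))  # [3, 4, 6, 8]
```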
Provided below are some exemplary suitable nucleic acid editing domains, e.g., deaminase domains, that can be fused to Cas9 domains according to aspects of the present disclosure. Generally, these deaminases require a Zn2+-coordinating motif (His-X-Glu-X23-26-Pro-Cys-X2-4-Cys; SEQ ID NO: 283) and a bound water molecule for catalytic activity. The Glu residue acts to activate the water molecule to a zinc hydroxide for nucleophilic attack in the deamination reaction. It is understood that, in some embodiments, the active domain of the respective sequence may be used, e.g., the domain without a localization signal (nuclear localization signal, nuclear export signal, or cytoplasmic localization signal).
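The Zn2+-coordinating motif His-X-Glu-X23-26-Pro-Cys-X2-4-Cys maps directly onto a regular expression in which each X is "any residue". A minimal sketch; the spacer bounds default to the X23-26 and X2-4 ranges given in the text and are exposed as parameters, and the test strings are synthetic rather than real deaminase sequences. Function names are hypothetical.

```python
import re


def zn_motif(spacer=(23, 26), tail=(2, 4)):
    """Compile His-X-Glu-X(spacer)-Pro-Cys-X(tail)-Cys as a regular expression.

    Defaults follow the X23-26 and X2-4 ranges given in the text.
    """
    return re.compile(
        rf"H.E.{{{spacer[0]},{spacer[1]}}}PC.{{{tail[0]},{tail[1]}}}C"
    )


def find_zn_motifs(protein: str, pattern=None) -> list:
    """1-based position of the His residue for each non-overlapping motif match."""
    pattern = pattern or zn_motif()
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


# Synthetic sequences: a 24-residue spacer fits the 23-26 range, 30 does not.
ok = "AAHAE" + "G" * 24 + "PCGGC" + "AA"
too_long = "AAHAE" + "G" * 30 + "PCGGC" + "AA"
print(zn_motif().pattern)        # H.E.{23,26}PC.{2,4}C
print(find_zn_motifs(ok))        # [3]
print(find_zn_motifs(too_long))  # []
```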
Human AID:
(underlined: nuclear localization signal; double underlined: nuclear export signal)
Mouse AID:
(underlined: nuclear localization signal; double underlined: nuclear export signal)
Dog AID:
(underlined: nuclear localization signal; double underlined: nuclear export signal)
Cattle AID:
(underlined: nuclear localization signal; double underlined: nuclear export signal)
Mouse APOBEC-3:
(italics: nucleic acid editing domain)
Rat APOBEC-3:
(italics: nucleic acid editing domain)
Rhesus APOBEC-3G:
(italics: nucleic acid editing domain; underlined: cytoplasmic localization signal)
Chimpanzee APOBEC-3G:
(italics: nucleic acid editing domain; underlined: cytoplasmic localization signal)
Green monkey APOBEC-3G:
(italics: nucleic acid editing domain; underlined: cytoplasmic localization signal)
Human APOBEC-3G:
(italics: nucleic acid editing domain; underlined: cytoplasmic localization signal)
Human APOBEC-3F:
(italics: nucleic acid editing domain)
Human APOBEC-3B:
(italics: nucleic acid editing domain)
Human APOBEC-3C:
(italics: nucleic acid editing domain)
Human APOBEC-3A:
(italics: nucleic acid editing domain)
Human APOBEC-3H:
(italics: nucleic acid editing domain)
Human APOBEC-3D:
(italics: nucleic acid editing domain)
Human APOBEC-1:
MTSEKGPSTGDPTLRRRIEPWEFDVFYDPRELRKEACLLYEIKWGMSRKIWRSSGKNTTNHVEVNFIKKFTSERDFHPSMSCSITWFLSWSPCWECSQAIREFLSRHPGVTLVIYVARLFWHMDQQNRQGLRDLVNSGVTIQIMRASEYYHCWRNFVNYPPGDEAHWPQYPPLWMMLYALELHCIILSLPPCLKISRRWQNHLTFFRLHLQNCHYQTIPPHILLATGLIHPSVAWR(SEQ ID NO:279)
Mouse APOBEC-1:
MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSVWRHTSQNTSNHVEVNFLEKFTTERYFRPNTRCSITWFLSWSPCGECSRAITEFLSRHPYVTLFIYIARLYHHTDQRNRQGLRDLISSGVTIQIMTEQEYCYCWRNFVNYPPSNEAYWPRYPHLWVKLYVLELYCIILGLPPCLKILRRKQPQLTFFTITLQTCHYQRIPPHLLWATGLK(SEQ ID NO:280)
Rat APOBEC-1:
MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLK(SEQ ID NO:281)
In some embodiments, the fusion proteins provided herein comprise the full-length amino acid sequence of the effector domain, e.g., one of the sequences provided above. In other embodiments, however, the fusion proteins provided herein do not comprise the full-length sequence of the effector domain, but only a fragment thereof. For example, in some embodiments, a fusion protein provided herein comprises a Cas9 protein and a fragment of an effector domain, e.g., wherein the fragment comprises a nucleic acid editing domain. Exemplary amino acid sequences of nucleic acid editing domains are shown in italics in the above sequences, and additional suitable sequences for such domains will be apparent to those skilled in the art.
Additional suitable nucleic acid editing domains, e.g., deaminase domain sequences that can be fused to a nuclease-inactive Cas9 protein and used according to aspects of the present disclosure, will be apparent to those of skill in the art based on the present disclosure. In some embodiments, such additional domain sequences include deaminase domain sequences that are at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% similar to the sequences provided herein. Additional suitable Cas9 proteins, variants, and sequences will also be apparent to those skilled in the art. Examples of such additional suitable Cas9 proteins include, but are not limited to, Cas9 proteins with the following mutations: D10A, D10A/D839A/H840A, and D10A/D839A/H840A/N863A (see, e.g., Mali et al., CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nature Biotechnology. 2013; 31(9):833-838, the entire contents of which are incorporated herein by reference).
Based on the present disclosure and in conjunction with general knowledge in the art, further suitable strategies for generating fusion proteins comprising a Cas9 protein and an effector domain (e.g., a DNA editing domain) will be apparent to those skilled in the art. Suitable strategies for producing fusion proteins with or without linkers according to aspects of the present disclosure will also be apparent to those of skill in the art in view of the present disclosure and knowledge in the art. For example, Gilbert et al., CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013; 154(2):442-51, showed that C-terminal fusion of Cas9 to VP64 using 2 NLSs as a linker (SPKKKRKVEAS, SEQ ID NO:284) can be used for transcriptional activation. Mali et al., CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat Biotechnol. 2013; 31(9):833-8, reported that C-terminal fusion to VP64 without a linker can be used for transcriptional activation. And Maeder et al., CRISPR RNA-guided activation of endogenous human genes. Nat Methods. 2013; 10:977-979, reported that C-terminal fusions to VP64 using a Gly4Ser (SEQ ID NO:5) linker can be used as transcriptional activators. Recently, dCas9-FokI nuclease fusions have been successfully generated and show improved enzymatic specificity compared to the parental Cas9 enzyme (Guilinger JP, Thompson DB, Liu DR. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat Biotechnol. 2014; 32(6):577-82; and Tsai SQ, Wyvekens N, Khayter C, Foden JA, Thapar V, Reyon D, Goodwin MJ, Aryee MJ, Joung JK. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat Biotechnol. 2014; 32(6):569-76. PMID: 24770325; SGSETPGTSESATPES (SEQ ID NO:7) and (GGGGS)n (SEQ ID NO:5) linkers were used for the FokI-dCas9 fusion proteins, respectively).
In some embodiments, the Cas9 fusion protein comprises: (i) a Cas9 protein; and (ii) a transcriptional activator domain. In some embodiments, the transcriptional activator domain comprises a VPR. VPR is a VP64-SV40-P65-RTA triple activator. In some embodiments, the VPR comprises the VP64 amino acid sequence encoded by the nucleic acid sequence of SEQ ID NO: 292:
GAGGCCAGCGGTTCCGGACGGGCTGACGCATTGGACGATTTTGATCTGGATATGCTGGGAAGTGACGCCCTCGATGATTTTGACCTTGACATGCTTGGTTCGGATGCCCTTGATGACTTTGACCTCGACATGCTCGGCAGTGACGCCCTTGATGATTTCGACCTGGACATGCTGATTAACTCTAGATAG(SEQ ID NO:292)
In some embodiments, the VPR comprises the VP64 amino acid sequence shown as SEQ ID NO: 293:
EASGSGRADALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLINSR (SEQ ID NO:293)
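A quick consistency check on the statement that SEQ ID NO: 292 encodes the VP64 sequence of SEQ ID NO: 293 is to translate the DNA with the standard genetic code. The sketch below assumes the reading frame starts at the first nucleotide and supports only the standard code (no ambiguity codes or alternative tables); it is not a general-purpose translator.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks stop codons."""
    dna = "".join(dna.split()).upper()  # drop whitespace and line breaks
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


SEQ_ID_292 = (
    "GAGGCCAGCGGTTCCGGACGGGCTGACGCATTGGACGATTTTGATCTGGATATGCTG"
    "GGAAGTGACGCCCTCGATGATTTTGACCTTGACATGCTT"
    "GGTTCGGATGCCCTTGATGACTTTGACCTCGACATGCTC"
    "GGCAGTGACGCCCTTGATGATTTCGACCTGGACATGCTG"
    "ATTAACTCTAGATAG"
)
SEQ_ID_293 = "EASGSGRADALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLINSR"

print(translate(SEQ_ID_292) == SEQ_ID_293 + "*")  # True
```

The trailing `*` corresponds to the TAG stop codon at the end of SEQ ID NO: 292, which is not part of SEQ ID NO: 293.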
In some embodiments, the VPR comprises the VP64-SV40-P65-RTA amino acid sequence encoded by the nucleic acid sequence of SEQ ID NO: 294:
TCGCCAGGGATCCGTCGACTTGACGCGTTGATATCAACAAGTTTGTACAAAAAAGCAGGCTACAAAGAGGCCAGCGGTTCCGGACGGGCTGACGCATTGGACGATTTTGATCTGGATATGCTGGGAAGTGACGCCCTCGATGATTTTGACCTTGACATGCTTGGTTCGGATGCCCTTGATGACTTTGACCTCGACATGCTCGGCAGTGACGCCCTTGATGATTTCGACCTGGACATGCTGATTAACTCTAGAAGTTCCGGATCTCCGAAAAAGAAACGCAAAGTTGGTAGCCAGTACCTGCCCGACACCGACGACCGGCACCGGATCGAGGAAAAGCGGAAGCGGACCTACGAGACATTCAAGAGCATCATGAAGAAGTCCCCCTTCAGCGGCCCCACCGACCCTAGACCTCCACCTAGAAGAATCGCCGTGCCCAGCAGATCCAGCGCCAGCGTGCCAAAACCTGCCCCCCAGCCTTACCCCTTCACCAGCAGCCTGAGCACCATCAACTACGACGAGTTCCCTACCATGGTGTTCCCCAGCGGCCAGATCTCTCAGGCCTCTGCTCTGGCTCCAGCCCCTCCTCAGGTGCTGCCTCAGGCTCCTGCTCCTGCACCAGCTCCAGCCATGGTGTCTGCACTGGCTCAGGCACCAGCACCCGTGCCTGTGCTGGCTCCTGGACCTCCACAGGCTGTGGCTCCACCAGCCCCTAAACCTACACAGGCCGGCGAGGGCACACTGTCTGAAGCTCTGCTGCAGCTGCAGTTCGACGACGAGGATCTGGGAGCCCTGCTGGGAAACAGCACCGATCCTGCCGTGTTCACCGACCTGGCCAGCGTGGACAACAGCGAGTTCCAGCAGCTGCTGAACCAGGGCATCCCTGTGGCCCCTCACACCACCGAGCCCATGCTGATGGAATACCCCGAGGCCATCACCCGGCTCGTGACAGGCGCTCAGAGGCCTCCTGATCCAGCTCCTGCCCCTCTGGGAGCACCAGGCCTGCCTAATGGACTGCTGTCTGGCGACGAGGACTTCAGCTCTATCGCCGATATGGATTTCTCAGCCTTGCTGGGCTCTGGCAGCGGCAGCCGGGATTCCAGGGAAGGGATGTTTTTGCCGAAGCCTGAGGCCGGCTCCGCTATTAGTGACGTGTTTGAGGGCCGCGAGGTGTGCCAGCCAAAACGAATCCGGCCATTTCATCCTCCAGGAAGTCCATGGGCCAACCGCCCACTCCCCGCCAGCCTCGCACCAACACCAACCGGTCCAGTACATGAGCCAGTCGGGTCACTGACCCCGGCACCAGTCCCTCAGCCACTGGATCCAGCGCCCGCAGTGACTCCCGAGGCCAGTCACCTGTTGGAGGATCCCGATGAAGAGACGAGCCAGGCTGTCAAAGCCCTTCGGGAGATGGCCGATACTGTGATTCCCCAGAAGGAAGAGGCTGCAATCTGTGGCCAAATGGACCTTTCCCATCCGCCCCCAAGGGGCCATCTGGATGAGCTGACAACCACACTTGAGTCCATGACCGAGGATCTGAACCTGGACTCACCCCTGACCCCGGAATTGAACGAGATTCTGGATACCTTCCTGAACGACGAGTGCCTCTTGCATGCCATGCATATCAGCACAGGACTGTCCATCTTCGACACATCTCTGTTTTGA(SEQ ID NO:294)
In some embodiments, the VPR comprises the VP64-SV40-P65-RTA amino acid sequence as set forth in SEQ ID NO: 295:
SPGIRRLDALISTSLYKKAGYKEASGSGRADALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLINSRSSGSPKKKRKVGSQYLPDTDDRHRIEEKRKRTYETFKSIMKKSPFSGPTDPRPPPRRIAVPSRSSASVPKPAPQPYPFTSSLSTINYDEFPTMVFPSGQISQASALAPAPPQVLPQAPAPAPAPAMVSALAQAPAPVPVLAPGPPQAVAPPAPKPTQAGEGTLSEALLQLQFDDEDLGALLGNSTDPAVFTDLASVDNSEFQQLLNQGIPVAPHTTEPMLMEYPEAITRLVTGAQRPPDPAPAPLGAPGLPNGLLSGDEDFSSIADMDFSALLGSGSGSRDSREGMFLPKPEAGSAISDVFEGREVCQPKRIRPFHPPGSPWANRPLPASLAPTPTGPVHEPVGSLTPAPVPQPLDPAPAVTPEASHLLEDPDEETSQAVKALREMADTVIPQKEEAAICGQMDLSHPPPRGHLDELTTTLESMTEDLNLDSPLTPELNEILDTFLNDECLLHAMHISTGLSIFDTSLF(SEQ ID NO:295)
Some aspects of the disclosure provide fusion proteins comprising a transcriptional activator. In some embodiments, the transcriptional activator is VPR. In some embodiments, the VPR comprises a wild-type VPR or a VPR as set forth in SEQ ID NO: 293. In some embodiments, VPR proteins provided herein include fragments of VPR and proteins homologous to VPR or VPR fragments. For example, in some embodiments, the VPR comprises a fragment of the amino acid sequence set forth in SEQ ID NO: 293. In some embodiments, the VPR comprises an amino acid sequence that is homologous to the amino acid sequence set forth in SEQ ID NO: 293, or an amino acid sequence that is homologous to a fragment of the amino acid sequence set forth in SEQ ID NO: 293. In some embodiments, a protein comprising a VPR or a fragment of VPR, or a homologue of a VPR or VPR fragment, is referred to as a "VPR variant". VPR variants have homology to VPR or fragments thereof. For example, a VPR variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to a wild-type VPR or a VPR as shown in SEQ ID NO: 293. In some embodiments, a VPR variant comprises a fragment of a VPR such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to a wild-type VPR or a fragment of a VPR as set forth in SEQ ID NO: 293. In some embodiments, the VPR comprises the VPR set forth in SEQ ID NO: 293.
In some embodiments, the VPR comprises an amino acid sequence encoded by the nucleotide sequence set forth in SEQ ID NO: 292.
In some embodiments, VPR is a VP64-SV40-P65-RTA triple activator. In some embodiments, the VP64-SV40-P65-RTA comprises VP64-SV40-P65-RTA as shown in SEQ ID NO: 295. In some embodiments, the VP64-SV40-P65-RTA proteins provided herein include fragments of VP64-SV40-P65-RTA and proteins homologous to VP64-SV40-P65-RTA or VP64-SV40-P65-RTA fragments. For example, in some embodiments, VP64-SV40-P65-RTA comprises a fragment of the amino acid sequence set forth in SEQ ID NO: 295. In some embodiments, the VP64-SV40-P65-RTA comprises an amino acid sequence that is homologous to the amino acid sequence set forth in SEQ ID NO: 295, or an amino acid sequence that is homologous to a fragment of the amino acid sequence set forth in SEQ ID NO: 295. In some embodiments, a protein comprising VP64-SV40-P65-RTA or a fragment of VP64-SV40-P65-RTA, or a homolog of VP64-SV40-P65-RTA or of a VP64-SV40-P65-RTA fragment, is referred to as a "VP64-SV40-P65-RTA variant". A VP64-SV40-P65-RTA variant has homology to VP64-SV40-P65-RTA or a fragment thereof. For example, the VP64-SV40-P65-RTA variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the VP64-SV40-P65-RTA as set forth in SEQ ID NO: 295. In some embodiments, the VP64-SV40-P65-RTA variant comprises a fragment of VP64-SV40-P65-RTA such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of VP64-SV40-P65-RTA as set forth in SEQ ID NO: 295.
In some embodiments, the VP64-SV40-P65-RTA comprises the amino acid sequence set forth in SEQ ID NO: 295. In some embodiments, the VP64-SV40-P65-RTA comprises an amino acid sequence encoded by the nucleotide sequence set forth in SEQ ID NO: 294.
Some aspects of the present disclosure provide fusion proteins comprising (i) a Cas9 protein; and (ii) an effector domain. In some aspects, the fusion proteins provided herein further comprise (iii) a DNA-binding protein, e.g., a zinc finger domain, a TALE, or a second Cas9 protein. Without wishing to be bound by any particular theory, fusing a DNA-binding protein (e.g., a second Cas9 protein) to a fusion protein comprising (i) a Cas9 protein and (ii) an effector domain may increase the specificity of the fusion protein for a target nucleic acid sequence, or may increase the specificity or binding affinity of the fusion protein for a target nucleic acid sequence that does not comprise the canonical PAM (5'-NGG-3'). In some embodiments, the second Cas9 protein is any Cas9 protein provided herein. In some embodiments, the second Cas9 protein is fused to the N-terminus of the Cas9 protein of the fusion protein. In some embodiments, the second Cas9 protein is fused to the C-terminus of the Cas9 protein of the fusion protein. In some embodiments, the Cas9 protein and the second Cas9 protein are fused via a linker.
Further provided herein are complexes comprising any of the fusion proteins provided herein, a first guide RNA that binds to the Cas9 protein of the fusion protein, and a second guide RNA that binds to the second Cas9 protein of the fusion protein. In some embodiments, the first guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides complementary to the first target sequence, and the second guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides complementary to the second target sequence. In some embodiments, the first guide RNA and/or the second guide RNA is 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49, or 50 nucleotides in length. In some embodiments, the first guide RNA and the second guide RNA are different. In some embodiments, the first guide RNA comprises a sequence of 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39, or 40 contiguous nucleotides that is complementary to the first target sequence, and the second guide RNA comprises a sequence of 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39, or 40 contiguous nucleotides that is complementary to the second target sequence. In some embodiments, the first target sequence and the second target sequence are different. In some embodiments, the first target sequence and the second target sequence are DNA sequences. In some embodiments, the first target sequence and the second target sequence are in a mammalian genome. In some embodiments, the first target sequence and the second target sequence are in the human genome. In some embodiments, the first target sequence is within 30 nucleotides of the second target sequence. In some embodiments, the 3' end of the first target sequence is not directly adjacent to the canonical PAM sequence (5'-NGG-3'). In some embodiments, the 3' end of the second target sequence is not directly adjacent to the canonical PAM sequence (5'-NGG-3').
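As an illustration of the guide RNA constraints above (a guide of about 15-100 nucleotides containing at least 10 contiguous nucleotides complementary to its target, and two target sequences lying within a set distance of each other), the following Python sketch checks a guide against a target and measures the gap between two target sites. It is a minimal sketch, not a method from this disclosure: the helper names and example sequences are hypothetical, "complementary" is taken to mean antiparallel Watson-Crick pairing with a DNA target written 5'→3', and "within 30 nucleotides" is interpreted as the number of nucleotides separating the two sites on the same strand.

```python
# Minimal sketch; helper names and sequences are hypothetical, not from the disclosure.

# DNA base -> pairing RNA base (antiparallel Watson-Crick pairing).
PAIRING_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_reverse_complement(dna: str) -> str:
    """RNA sequence (5'->3') that pairs antiparallel with a DNA sequence written 5'->3'."""
    return "".join(PAIRING_RNA[base] for base in reversed(dna.upper()))


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest contiguous stretch shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def guide_matches(guide: str, target_dna: str,
                  min_len: int = 15, max_len: int = 100, min_run: int = 10) -> bool:
    """True if the guide length is in range and at least min_run contiguous
    guide nucleotides are complementary to the target."""
    if not min_len <= len(guide) <= max_len:
        return False
    return longest_common_run(guide.upper(), rna_reverse_complement(target_dna)) >= min_run


def gap_between_targets(dna: str, first: str, second: str) -> int:
    """Nucleotides separating the first occurrences of two target sites on one strand
    (0 if they touch or overlap)."""
    i, j = dna.find(first), dna.find(second)
    if i < 0 or j < 0:
        raise ValueError("target sequence not found")
    if i <= j:
        return max(0, j - (i + len(first)))
    return max(0, i - (j + len(second)))


# Hypothetical example: two 20-nt targets separated by 12 nt.
target1 = "GACGCATAAAGATGAGACGC"
target2 = "GGCACTGCGGCTGGAGGTGG"
dna = target1 + "T" * 12 + target2
guide1 = rna_reverse_complement(target1)
guide2 = rna_reverse_complement(target2)
print(guide_matches(guide1, target1), guide_matches(guide1, target2))  # True False
print(gap_between_targets(dna, target1, target2) <= 30)  # True
```

The longest-common-run check is used instead of requiring a perfect full-length match so that it mirrors the "at least 10 contiguous nucleotides" wording rather than a stricter rule.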
In some embodiments, the general architecture of the exemplary Cas9 fusion proteins provided herein has the following structure:
[NH2]-[effector domain]-[Cas9]-[second Cas9 protein]-[COOH];
[NH2]-[second Cas9 protein]-[Cas9]-[effector domain]-[COOH];
[NH2]-[Cas9]-[effector domain]-[second Cas9 protein]-[COOH];
[NH2]-[second Cas9 protein]-[effector domain]-[Cas9]-[COOH];
[NH2]-[UGI]-[effector domain]-[Cas9]-[second Cas9 protein]-[COOH];
[NH2]-[UGI]-[second Cas9 protein]-[Cas9]-[effector domain]-[COOH];
[NH2]-[UGI]-[Cas9]-[effector domain]-[second Cas9 protein]-[COOH];
[NH2]-[UGI]-[second Cas9 protein]-[effector domain]-[Cas9]-[COOH];
[NH2]-[effector domain]-[Cas9]-[second Cas9 protein]-[UGI]-[COOH];
[NH2]-[second Cas9 protein]-[Cas9]-[effector domain]-[UGI]-[COOH];
[NH2]-[Cas9]-[effector domain]-[second Cas9 protein]-[UGI]-[COOH]; or
[NH2]-[second Cas9 protein]-[effector domain]-[Cas9]-[UGI]-[COOH];
wherein NH2 is the N-terminus of the fusion protein and COOH is the C-terminus of the fusion protein. In some embodiments, "]-[" as used in the general architectures above indicates the presence of an optional linker sequence. In other examples, the general architecture of the exemplary Cas9 fusion proteins provided herein has one of the following structures:
[NH2]-[effector domain]-[Cas9]-[second Cas9 protein]-[COOH];
[NH2]-[second Cas9 protein]-[Cas9]-[effector domain]-[COOH];
[NH2]-[Cas9]-[effector domain]-[second Cas9 protein]-[COOH];
[NH2]-[second Cas9 protein]-[effector domain]-[Cas9]-[COOH];
[NH2]-[UGI]-[effector domain]-[Cas9]-[second Cas9 protein]-[COOH];
[NH2]-[UGI]-[second Cas9 protein]-[Cas9]-[effector domain]-[COOH];
[NH2]-[UGI]-[Cas9]-[effector domain]-[second Cas9 protein]-[COOH];
[NH2]-[UGI]-[second Cas9 protein]-[effector domain]-[Cas9]-[COOH];
[NH2]-[effector domain]-[Cas9]-[second Cas9 protein]-[UGI]-[COOH];
[NH2]-[second Cas9 protein]-[Cas9]-[effector domain]-[UGI]-[COOH];
[NH2]-[Cas9]-[effector domain]-[second Cas9 protein]-[UGI]-[COOH]; or
[NH2]-[second Cas9 protein]-[effector domain]-[Cas9]-[UGI]-[COOH];
wherein NH2 is the N-terminus of the fusion protein and COOH is the C-terminus of the fusion protein. In some embodiments, "-" as used in the general architectures above indicates the presence of an optional linker sequence. In some embodiments, the second Cas9 is a dCas9 protein. In some examples, the general architecture of the exemplary Cas9 fusion proteins provided herein comprises the structure shown in fig. 8. It is understood that any of the proteins in any general architecture of the exemplary Cas9 fusion proteins may be linked by one or more of the linkers provided herein. In some embodiments, the linkers are the same. In some embodiments, the linkers are different. In some embodiments, one or more proteins in any general architecture of the exemplary Cas9 fusion proteins are not fused via a linker. In some embodiments, the fusion protein further comprises a nuclear targeting sequence, such as a nuclear localization sequence (NLS). In some embodiments, the NLS is fused to the N-terminus of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the fusion protein. In some embodiments, the NLS is fused to the N-terminus of the second Cas9 protein. In some embodiments, the NLS is fused to the C-terminus of the second Cas9 protein. In some embodiments, the NLS is fused to the N-terminus of the Cas9 protein. In some embodiments, the NLS is fused to the C-terminus of the Cas9 protein. In some embodiments, the NLS is fused to the N-terminus of the effector domain. In some embodiments, the NLS is fused to the C-terminus of the effector domain. In some embodiments, the NLS is fused to the N-terminus of the UGI protein. In some embodiments, the NLS is fused to the C-terminus of the UGI protein. In some embodiments, the NLS is fused to the fusion protein through one or more linkers. In some embodiments, the NLS is fused to the fusion protein without a linker.
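The twelve general architectures listed above follow a simple pattern: four orderings of the effector domain, Cas9, and second Cas9 protein, each with no UGI, an N-terminal UGI, or a C-terminal UGI. The Python sketch below, which uses hypothetical function names and treats domain names as plain text labels, regenerates the listing in the same order; each "-" between bracketed domains marks a junction where an optional linker may sit.

```python
# Minimal sketch; function names are hypothetical and domains are plain text labels.

# The four orderings of the core domains used in the listing above (N- to C-terminus).
CORE_ORDERS = [
    ["effector domain", "Cas9", "second Cas9 protein"],
    ["second Cas9 protein", "Cas9", "effector domain"],
    ["Cas9", "effector domain", "second Cas9 protein"],
    ["second Cas9 protein", "effector domain", "Cas9"],
]


def render(domains):
    """Write an architecture N- to C-terminus; each '-' is a junction for an optional linker."""
    return "[NH2]-" + "-".join(f"[{d}]" for d in domains) + "-[COOH]"


def general_architectures():
    """No UGI, then UGI at the N-terminus, then UGI at the C-terminus, for each core order."""
    result = []
    for ugi_position in (None, "N", "C"):
        for core in CORE_ORDERS:
            domains = list(core)
            if ugi_position == "N":
                domains.insert(0, "UGI")
            elif ugi_position == "C":
                domains.append("UGI")
            result.append(render(domains))
    return result


for line in general_architectures():
    print(line)
```

Keeping the core orderings and the UGI placement as separate choices makes it easy to see that the listing is a 4 × 3 combination rather than twelve unrelated designs.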
Uracil glycosylase inhibitor fusion protein
Some aspects of the disclosure provide fusion proteins comprising a Cas9 protein fused to an effector domain (e.g., a deaminase) and a uracil glycosylase inhibitor (UGI). In some embodiments, the fusion protein comprises one of the following structures:
[ deaminase ] - [ optional linker sequence ] - [ Cas9 ] - [ optional linker sequence ] - [ UGI ];
[ deaminase ] - [ optional linker sequence ] - [ UGI ] - [ optional linker sequence ] - [ Cas9 ];
[ UGI ] - [ optional linker sequence ] - [ deaminase ] - [ optional linker sequence ] - [ Cas9 ];
[ UGI ] - [ optional linker sequence ] - [ Cas9 ] - [ optional linker sequence ] - [ deaminase ];
[ Cas9 ] - [ optional linker sequence ] - [ deaminase ] - [ optional linker sequence ] - [ UGI ]; or
[ Cas9 ] - [ optional linker sequence ] - [ UGI ] - [ optional linker sequence ] - [ deaminase ].
In some embodiments, the fusion protein does not comprise a linker sequence. In some embodiments, one or two optional linker sequences are present.
In some embodiments, the fusion protein further comprises a second Cas9 protein. For example, the second Cas9 protein can be any Cas9 protein provided herein. In some embodiments, the fusion protein comprises the following structure:
[ deaminase ] - [ Cas9 ] - [ UGI ];
[ deaminase ] - [ UGI ] - [ Cas9 ];
[ UGI ] - [ deaminase ] - [ Cas9 ];
[ UGI ] - [ Cas9 ] - [ deaminase ];
[ Cas9 ] - [ deaminase ] - [ UGI ];
[ Cas9 ] - [ UGI ] - [ deaminase ];
[ second Cas9 ] - [ deaminase ] - [ Cas9 ] - [ UGI ];
[ second Cas9 ] - [ deaminase ] - [ UGI ] - [ Cas9 ];
[ second Cas9 ] - [ UGI ] - [ deaminase ] - [ Cas9 ];
[ second Cas9 ] - [ UGI ] - [ Cas9 ] - [ deaminase ];
[ second Cas9 ] - [ Cas9 ] - [ deaminase ] - [ UGI ];
[ second Cas9 ] - [ Cas9 ] - [ UGI ] - [ deaminase ];
[ deaminase ] - [ second Cas9 ] - [ Cas9 ] - [ UGI ];
[ deaminase ] - [ second Cas9 ] - [ UGI ] - [ Cas9 ];
[ UGI ] - [ second Cas9 ] - [ deaminase ] - [ Cas9 ];
[ UGI ] - [ second Cas9 ] - [ Cas9 ] - [ deaminase ];
[ Cas9 ] - [ second Cas9 ] - [ deaminase ] - [ UGI ];
[ Cas9 ] - [ second Cas9 ] - [ UGI ] - [ deaminase ];
[ deaminase ] - [ Cas9 ] - [ second Cas9 ] - [ UGI ];
[ deaminase ] - [ UGI ] - [ second Cas9 ] - [ Cas9 ];
[ UGI ] - [ deaminase ] - [ second Cas9 ] - [ Cas9 ];
[ UGI ] - [ Cas9 ] - [ second Cas9 ] - [ deaminase ];
[ Cas9 ] - [ deaminase ] - [ second Cas9 ] - [ UGI ];
[ Cas9 ] - [ UGI ] - [ second Cas9 ] - [ deaminase ];
[ deaminase ] - [ Cas9 ] - [ UGI ] - [ second Cas9 ];
[ deaminase ] - [ UGI ] - [ Cas9 ] - [ second Cas9 ];
[ UGI ] - [ deaminase ] - [ Cas9 ] - [ second Cas9 ];
[ UGI ] - [ Cas9 ] - [ deaminase ] - [ second Cas9 ];
[ Cas9 ] - [ deaminase ] - [ UGI ] - [ second Cas9 ]; or
[ Cas9 ] - [ UGI ] - [ deaminase ] - [ second Cas9 ].
In some embodiments, "-" as used in the general architectures above indicates the presence of an optional linker sequence. In some embodiments, the fusion protein comprising UGI further comprises a nuclear targeting sequence, such as a nuclear localization sequence. In some embodiments, the fusion proteins provided herein further comprise a Nuclear Localization Sequence (NLS). In some embodiments, the NLS is fused to the N-terminus of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the fusion protein. In some embodiments, the NLS is fused to the N-terminus of the UGI protein. In some embodiments, the NLS is fused to the C-terminus of the UGI protein. In some embodiments, the NLS is fused to the N-terminus of the Cas9 protein. In some embodiments, the NLS is fused to the C-terminus of the Cas9 protein. In some embodiments, the NLS is fused to the N-terminus of the deaminase. In some embodiments, the NLS is fused to the C-terminus of the deaminase. In some embodiments, the NLS is fused to the N-terminus of the second Cas9 protein. In some embodiments, the NLS is fused to the C-terminus of the second Cas9 protein. In some embodiments, the NLS is fused to the fusion protein through one or more linkers. In some embodiments, the NLS is fused to the fusion protein without a linker.
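The structures listed above appear to be every possible N-to-C ordering of the domains: the 6 three-domain structures are the 3! orderings of deaminase, Cas9, and UGI, and the 24 four-domain structures are the 4! orderings once a second Cas9 is added. The following Python sketch (labels only; the function name is hypothetical) enumerates them with `itertools.permutations` as a cross-check of that count, writing each junction as "-" in the same bracket style.

```python
from itertools import permutations


def arrangements(domains):
    """Every N- to C-terminal ordering of the domains, in the bracket style used above."""
    return [" - ".join(f"[ {d} ]" for d in order) for order in permutations(domains)]


three_domain = arrangements(["deaminase", "Cas9", "UGI"])
four_domain = arrangements(["deaminase", "Cas9", "UGI", "second Cas9"])
print(len(three_domain), len(four_domain))  # 6 24
```

Generating the orderings rather than typing them by hand makes it straightforward to confirm that no structure is missing or repeated.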
In some embodiments, the UGI comprises a wild-type UGI or a UGI as set forth in SEQ ID NO: 553. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to UGI or to fragments of UGI. For example, in some embodiments, the UGI comprises a fragment of the amino acid sequence set forth in SEQ ID NO: 553. In some embodiments, the UGI comprises an amino acid sequence that is homologous to the amino acid sequence set forth in SEQ ID NO: 553, or an amino acid sequence that is homologous to a fragment of the amino acid sequence set forth in SEQ ID NO: 553. In some embodiments, a protein comprising a UGI or a fragment of a UGI, or a homolog of a UGI or of a fragment of a UGI, is referred to as a "UGI variant". A UGI variant shares homology with UGI or a fragment thereof. For example, a UGI variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the wild-type UGI or the UGI shown in SEQ ID NO: 553. In some embodiments, the UGI variant comprises a fragment of UGI such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the wild-type UGI or to a fragment of the UGI shown in SEQ ID NO: 553. In some embodiments, the UGI comprises the following amino acid sequence:
>sp|P14739|UNGI_BPPB2 Uracil-DNA glycosylase inhibitor
MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML(SEQ ID NO:553)
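The identity thresholds described for UGI variants can be illustrated with a simple calculation on SEQ ID NO: 553. This Python sketch assumes two sequences that are already aligned and of equal length, and counts identical positions; the disclosure does not specify an alignment method here, and practical identity calculations usually rely on a gapped alignment (e.g., BLAST or Needleman-Wunsch), so this ungapped version is a simplification. The single-substitution variant is hypothetical.

```python
# SEQ ID NO: 553 (UGI) as given above.
UGI = "MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML"


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length sequences (no gaps)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a, b))
    return 100.0 * identical / len(a)


# Hypothetical variant: one substitution (E11A) in the 84-residue UGI.
variant = UGI[:10] + "A" + UGI[11:]
pid = percent_identity(UGI, variant)
print(f"{pid:.2f}%")  # 98.81%
print(pid >= 95.0)    # True
```

With an 84-residue protein, each substitution lowers ungapped identity by about 1.19 percentage points, so a single change already falls below the 99% threshold while remaining above 98%.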
Suitable UGI proteins and nucleotide sequences are provided herein, and other suitable UGI sequences are known to those skilled in the art and include, for example, those described in Wang et al., Uracil-DNA glycosylase inhibitor gene of bacteriophage PBS2 encodes a binding protein specific for uracil-DNA glycosylase. J. Biol. Chem. 264:1163-; Lundquist et al., Site-directed mutagenesis and characterization of uracil-DNA glycosylase inhibitor protein. Role of specific carboxylic amino acids in complex formation with Escherichia coli uracil-DNA glycosylase. J. Biol. Chem. 272:21408-21419 (1997); Ravishankar et al., X-ray analysis of a complex of Escherichia coli uracil DNA glycosylase (EcUDG) with a proteinaceous inhibitor. The structure elucidation of a prokaryotic UDG. Nucleic Acids Res. 26:4880-4887 (1998); and Putnam et al., Protein mimicry of DNA from crystal structures of the uracil-DNA glycosylase inhibitor protein and its complex with Escherichia coli uracil-DNA glycosylase. J. Mol. Biol. 287:331-346 (1999); the entire contents of each of which are incorporated herein by reference.
It is understood that additional proteins may serve as uracil glycosylase inhibitors. For example, other proteins capable of inhibiting (e.g., sterically blocking) uracil-DNA glycosylase base-excision repair enzymes are within the scope of the present disclosure. In some embodiments, the uracil glycosylase inhibitor is a protein that binds DNA. In some embodiments, the uracil glycosylase inhibitor is a protein that binds single-stranded DNA. For example, the uracil glycosylase inhibitor can be a single-stranded DNA binding protein from Erwinia tasmaniensis. In some embodiments, the single-stranded DNA binding protein comprises the amino acid sequence of SEQ ID NO: 303. In some embodiments, the uracil glycosylase inhibitor is a uracil-binding protein. In some embodiments, the uracil glycosylase inhibitor is a protein that binds uracil in DNA. In some embodiments, the uracil glycosylase inhibitor is a catalytically inactive uracil DNA-glycosylase protein. In some embodiments, the uracil glycosylase inhibitor is a catalytically inactive uracil DNA-glycosylase protein that does not excise uracil from the DNA. For example, the uracil glycosylase inhibitor is UdgX. In some embodiments, UdgX comprises the amino acid sequence of SEQ ID NO: 304. As another example, the uracil glycosylase inhibitor is a catalytically inactive UDG. In some embodiments, the catalytically inactive UDG comprises the amino acid sequence of SEQ ID NO: 305. It is understood that other uracil glycosylase inhibitors will be apparent to those skilled in the art and are within the scope of this disclosure.
Erwinia tasmaniensis SSB (thermostable single-stranded DNA binding protein)
MASRGVNKVILVGNLGQDPEVRYMPNGGAVANITLATSESWRDKQTGETKEKTEWHRVVLFGKLAEVAGEYLRKGSQVYIEGALQTRKWTDQAGVEKYTTEVVVNVGGTMQMLGGRSQGGGASAGGQNGGSNNGWGQPQQPQGGNQFSGGAQQQARPQQQPQQNNAPANNEPPIDFDDDIP(SEQ ID NO:303)
UdgX (binds uracil in DNA but does not excise it)
MAGAQDFVPHTADLAELAAAAGECRGCGLYRDATQAVFGAGGRSARIMMIGEQPGDKEDLAGLPFVGPAGRLLDRALEAADIDRDALYVTNAVKHFKFTRAAGGKRRIHKTPSRTEVVACRPWLIAEMTSVEPDVVVLLGATAAKALLGNDFRVTQHRGEVLHVDDVPGDPALVATVHPSSLLRGPKEERESAFAGLVDDLRVAADVRP(SEQ ID NO:304)
UDG (catalytically inactive human UDG; binds uracil in DNA but does not excise it)
MIGQKTLYSFFSPSPARKRHAPSPEPAVQGTGVAGVPEESGDAAAIPAKKAPAGQEEPGTPPSSPLSAEQLDRIQRNKAAALLRLAARNVPVGFGESWKKHLSGEFGKPYFIKLMGFVAEERKHYTVYPPPHQVFTWTQMCDIKDVKVVILGQEPYHGPNQAHGLCFSVQRPVPPPPSLENIYKELSTDIEDFVHPGHGDLSGWAKQGVLLLNAVLTVRAHQANSHKERGWEQFTDAVVSWLNQNSNGLVFLLWGSYAQKKGSAIDRKRHHVLQTAHPSPLSVYRGFFGCRHFSKTNELLQKSGKKPIDWKEL(SEQ ID NO:305)
High fidelity Cas9
Some aspects of the disclosure provide high fidelity Cas9 proteins. In some embodiments, the high fidelity Cas9 protein has reduced electrostatic interactions between the Cas9 protein and the sugar-phosphate backbone of DNA compared to a wild-type Cas9 domain. In some embodiments, any of the Cas9 proteins provided herein comprises one or more mutations that reduce binding between the Cas9 protein and the sugar-phosphate backbone of DNA. In some embodiments, any of the Cas9 proteins provided herein comprises one or more mutations that reduce binding between the Cas9 protein and the sugar-phosphate backbone of DNA by at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95%. In some embodiments, any of the Cas9 proteins provided herein comprises one or more of the N497X, R661X, Q695X, and/or Q926X mutations of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X is any amino acid. In some embodiments, any of the Cas9 proteins provided herein comprises one or more of the N497A, R661A, Q695A, and/or Q926A mutations of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the Cas9 protein comprises the amino acid sequence shown as SEQ ID NO: 306. High fidelity Cas9 proteins have been described in the art and will be apparent to those skilled in the art. For example, high fidelity Cas9 proteins are described in "High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects," Nature 529, 490-495 (2016); and in Slaymaker, I.M., et al., "Rationally engineered Cas9 nucleases with improved specificity," Science 351, 84-88 (2015); the entire contents of each are incorporated herein by reference. It is understood, based on the present disclosure and knowledge in the art, that mutations can be made in any Cas9 protein to make a high fidelity Cas9 protein with reduced electrostatic interactions between the Cas9 protein and the sugar-phosphate backbone of DNA compared to a wild-type Cas9 domain.
High fidelity Cas9 domain, in which mutations relative to the Cas9 of SEQ ID NO: 9 are shown in bold and underlined.
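The mutation notation used in this section (e.g., D10A: the aspartate at position 10 replaced by alanine) can be applied programmatically. In this minimal Python sketch (the function name is hypothetical), each mutation is checked against the expected wild-type residue before substitution, which helps catch numbering errors between related sequences. A placeholder such as the X in N497X stands for any amino acid and must be replaced by a concrete residue before use. The demonstration uses only the first 20 residues of SEQ ID NO: 9 as printed in this section.

```python
import re

# <wild-type residue><1-based position><new residue>, e.g. "D10A".
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(seq: str, mutations) -> str:
    """Return seq with each point mutation applied, after checking the wild-type residue."""
    residues = list(seq)
    for m in mutations:
        match = MUTATION.match(m)
        if match is None:
            raise ValueError(f"unrecognized mutation: {m}")
        wild_type, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{m}: position out of range")
        if residues[pos - 1] != wild_type:
            raise ValueError(f"{m}: expected {wild_type} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


# First 20 residues of SEQ ID NO: 9 as printed in this section.
spcas9_start = "MDKKYSIGLDIGTNSVGWAV"
print(apply_mutations(spcas9_start, ["D10A"]))  # MDKKYSIGLAIGTNSVGWAV

try:
    apply_mutations(spcas9_start, ["D9A"])  # wrong position on purpose
except ValueError as err:
    print(err)  # D9A: expected D at 9, found L
```

Applied to the full SEQ ID NO: 9 string, a call such as `apply_mutations(seq, ["N497A", "R661A", "Q695A", "Q926A"])` would produce the corresponding quadruple mutant, provided the residues at those positions match.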
Cas9 protein with reduced PAM exclusivity
Some aspects of the disclosure provide Cas9 proteins with different PAM specificities. Typically, Cas9 proteins, such as Cas9 from Streptococcus pyogenes (SpCas9), require a canonical NGG PAM sequence to bind a particular nucleic acid region. This may limit the ability of the Cas9 protein to bind specific nucleotide sequences within a genome. Thus, in some embodiments, any Cas9 protein provided herein may be capable of binding a nucleotide sequence that does not contain a canonical (e.g., NGG) PAM sequence. For example, Cas9 proteins that bind non-canonical PAM sequences have been described in Kleinstiver, B.P., et al., "Engineered CRISPR-Cas9 nucleases with altered PAM specificities," Nature 523, 481-485 (2015); and Kleinstiver, B.P., et al., "Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition," Nature Biotechnology 33, 1293-1298 (2015); the entire contents of each are incorporated herein by reference.
In some embodiments, the Cas9 protein is a Cas9 protein from Staphylococcus aureus (SaCas9). In some embodiments, the SaCas9 protein is a nuclease-active SaCas9, a nuclease-inactive SaCas9 (SaCas9d), or a SaCas9 nickase (SaCas9n). In some embodiments, SaCas9 comprises the amino acid sequence of SEQ ID NO: 307. In some embodiments, SaCas9 comprises the N579X mutation of SEQ ID NO: 307, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 9-262, wherein X is any amino acid other than N. In some embodiments, SaCas9 comprises the N579A mutation of SEQ ID NO: 307, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, a SaCas9 protein, a SaCas9d protein, or a SaCas9n protein is capable of binding a nucleic acid sequence having a non-canonical PAM. In some embodiments, a SaCas9 protein, a SaCas9d protein, or a SaCas9n protein is capable of binding a nucleic acid sequence having an NNGRRT PAM sequence. In some embodiments, the SaCas9 protein comprises one or more of the E781X, N967X, or R1014X mutations of SEQ ID NO: 307, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X is any amino acid. In some embodiments, the SaCas9 protein comprises one or more of the E781K, N967K, or R1014H mutations of SEQ ID NO: 307, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the SaCas9 protein comprises the E781K, N967K, and R1014H mutations of SEQ ID NO: 307, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. It is understood that these mutations can be combined with any other mutation provided herein.
In some embodiments, the Cas9 protein of any of the fusion proteins provided herein comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of SEQ ID NOs: 307-309. In some embodiments, the Cas9 protein of any of the fusion proteins provided herein comprises the amino acid sequence of any one of SEQ ID NOs: 307-309. In some embodiments, the Cas9 protein of any of the fusion proteins provided herein consists of the amino acid sequence of any one of SEQ ID NOs: 307-309.
Exemplary SaCas9 sequence
In SEQ ID NO: 307, residue N579 can be mutated (e.g., to A579) to produce a SaCas9 nickase.
Exemplary SaCas9n sequence
Residue A579 of SEQ ID NO: 308 (which can be mutated from N579 of SEQ ID NO: 307 to produce a SaCas9 nickase) is underlined and in bold.
Exemplary SaKKH Cas9
Residue A579 of SEQ ID NO: 309 (which can be mutated from N579 of SEQ ID NO: 307 to produce a SaCas9 nickase) is underlined and in bold. Residues K781, K967, and H1014 of SEQ ID NO: 309 (which can be mutated from E781, N967, and R1014 of SEQ ID NO: 307 to yield SaKKH Cas9) are underlined and italicized.
In some embodiments, the Cas9 protein is a Cas9 protein from Streptococcus pyogenes (SpCas9). In some embodiments, the SpCas9 protein is a nuclease-active SpCas9, a nuclease-inactive SpCas9 (SpCas9d), or a SpCas9 nickase (SpCas9n). In some embodiments, SpCas9 comprises the amino acid sequence of SEQ ID NO: 9. In some embodiments, SpCas9 comprises the D10X mutation of SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X is any amino acid other than D. In some embodiments, SpCas9 comprises the D10A mutation of SEQ ID NO: 9, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the SpCas9 protein, SpCas9d protein, or SpCas9n protein is capable of binding a nucleic acid sequence having a non-canonical PAM. In some embodiments, the SpCas9 protein, SpCas9d protein, or SpCas9n protein is capable of binding a nucleic acid sequence having an NGG, NGA, or NGCG PAM sequence. In some embodiments, the SpCas9 protein comprises one or more of the D1135X, R1335X, and T1337X mutations of SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X is any amino acid. In some embodiments, the SpCas9 protein comprises one or more of the D1135E, R1335Q, and T1337R mutations of SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the SpCas9 protein comprises the D1135E, R1335Q, and T1337R mutations of SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the SpCas9 protein comprises one or more of the D1135V, R1335Q, and T1337R mutations of SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the SpCas9 protein comprises the D1135V, R1335Q, and T1337R mutations of SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the SpCas9 protein comprises one or more of the D1135X, G1218X, R1335X, and T1337X mutations of SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X is any amino acid. In some embodiments, the SpCas9 protein comprises one or more of the D1135V, G1218R, R1335Q, and T1337R mutations of SEQ ID NO: 9, or the corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 10-262. In some embodiments, the SpCas9 protein comprises the D1135V, G1218R, R1335Q, and T1337R mutations of SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262. It is understood that these mutations can be combined with any other mutation provided herein.
In some embodiments, the Cas9 protein of any fusion protein provided herein comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of SEQ ID NOs: 310-313. In some embodiments, the Cas9 protein of any of the fusion proteins provided herein comprises the amino acid sequence of any one of SEQ ID NOs: 310-313. In some embodiments, the Cas9 protein of any of the fusion proteins provided herein consists of the amino acid sequence of any one of SEQ ID NOs: 310-313.
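PAM requirements such as NGG, NGA, NGCG, and NNGRRT are written in IUPAC nucleotide codes (N = any base, R = A or G). The Python sketch below, with hypothetical helper names and example sequence, tests whether the bases immediately 3' of a target sequence match a given PAM pattern. It assumes the conventional reading of the PAM 5'→3' on the same strand as the target sequence as written, and it does not search the reverse strand.

```python
# IUPAC nucleotide codes -> allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pam(site: str, pam: str) -> bool:
    """True if site (same length as pam) fits the IUPAC PAM pattern."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), pam.upper()))


def pam_follows_target(dna: str, target_start: int, target_len: int, pam: str) -> bool:
    """Check the bases immediately 3' of a target sequence (0-based start) on the same strand."""
    end = target_start + target_len
    return matches_pam(dna[end:end + len(pam)], pam)


dna = "A" * 20 + "TGAG" + "C"  # hypothetical: a 20-nt target followed by TGAG
for pam in ("NGG", "NGA", "NGCG"):
    print(pam, pam_follows_target(dna, 0, 20, pam))
# NGG False
# NGA True
# NGCG False
```

Encoding each PAM as an IUPAC pattern lets the same check cover SpCas9 variants and SaCas9 (NNGRRT) without separate code paths.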
Exemplary SpCas9
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD(SEQ ID NO:9)
Exemplary SpCas9n
DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD(SEQ ID NO:310)
Exemplary SpEQR Cas9
Residues E1134, Q1334, and R1336 of SEQ ID NO: 311 (which may be mutated from D1135, R1335, and T1337 of SEQ ID NO: 9 to produce SpEQR Cas9) are underlined and bolded.
Exemplary SpVQR Cas9
Residues V1134, Q1334, and R1336 of SEQ ID NO: 312 (which may be mutated from D1135, R1335, and T1337 of SEQ ID NO: 9 to produce SpVQR Cas9) are underlined and bolded.
Exemplary SpVRER Cas9
Residues V1134, R1217, Q1334, and R1336 of SEQ ID NO: 313 (which can be mutated from D1135, G1218, R1335, and T1337 of SEQ ID NO: 9 to generate SpVRER Cas9) are underlined and bolded.
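As printed, SEQ ID NO: 310 begins DKKYSIGLAIG..., that is, without the initial methionine of SEQ ID NO: 9 (MDKKYSIGLDIG...), so the D10A nickase residue appears at position 9. This one-residue offset may explain why the captions above number residues such as 1134 and 1336 rather than 1135 and 1337; whether SEQ ID NOs: 311-313, which are not reproduced here, share the offset is an assumption. The Python sketch below (hypothetical helper names) infers the offset from a short shared anchor and maps positions accordingly.

```python
# Opening residues as printed in this section.
SEQ9_START = "MDKKYSIGLDIGTNSVGWAV"   # SEQ ID NO: 9
SEQ310_START = "DKKYSIGLAIGTNSVGWAV"  # SEQ ID NO: 310 (SpCas9n)


def numbering_offset(reference: str, other: str, anchor_len: int = 8) -> int:
    """Offset k such that residue p of reference corresponds to residue p - k of other,
    found by locating other's first anchor_len residues inside reference.
    Only handles the case where other lacks leading residues (k >= 0)."""
    k = reference.find(other[:anchor_len])
    if k < 0:
        raise ValueError("anchor not found; sequences may not be related by a simple shift")
    return k


offset = numbering_offset(SEQ9_START, SEQ310_START)
for pos in (10, 1135, 1218, 1335, 1337):
    print(f"SEQ ID NO: 9 position {pos} -> position {pos - offset}")
print(SEQ9_START[10 - 1], SEQ310_START[10 - offset - 1])  # D A
```

The anchor is taken from the start of the shorter sequence and kept short enough to avoid the edited residue at position 10, so the D10A difference does not interfere with finding the shift.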
Cas9 complex with guide RNA
Some aspects of the disclosure provide complexes comprising a Cas9 protein or Cas9 fusion protein provided herein and a guide RNA that binds to the Cas9 protein or Cas9 fusion protein. In some embodiments, the guide RNA is about 15-100 nucleotides in length and comprises a sequence of at least 10 contiguous nucleotides that is complementary to the target sequence. In some embodiments, the guide RNA is 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49, or 50 nucleotides long. In some embodiments, the guide RNA comprises a sequence of 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39, or 40 contiguous nucleotides complementary to the target sequence. In some embodiments, the target sequence is a DNA sequence. In some embodiments, the target sequence is a sequence in a mammalian genome. In some embodiments, the target sequence is a sequence in the human genome. In some embodiments, the 3' end of the target sequence is not directly adjacent to the canonical PAM sequence (5'-NGG-3').
Some aspects of the disclosure provide complexes comprising a first guide RNA that binds to a Cas9 protein of a fusion protein, and a second guide RNA that binds to a second Cas9 protein of the fusion protein. In some embodiments, the first guide RNA is about 15-100 nucleotides in length and comprises a sequence of at least 10 contiguous nucleotides that is complementary to the first target sequence, and the second guide RNA is about 15-100 nucleotides in length and comprises a sequence of at least 10 contiguous nucleotides that is complementary to the second target sequence. In some embodiments, the first guide RNA and/or the second guide RNA is 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49, or 50 nucleotides long. In some embodiments, the first guide RNA and the second guide RNA are different. In some embodiments, the first guide RNA comprises a sequence of 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39, or 40 contiguous nucleotides that is complementary to the first target sequence, and the second guide RNA comprises a sequence of 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39, or 40 contiguous nucleotides that is complementary to the second target sequence.
In some embodiments, the first target sequence and the second target sequence are different. In some embodiments, the first target sequence and the second target sequence are DNA sequences. In some embodiments, the first target sequence and the second target sequence are in a mammalian genome. In some embodiments, the first target sequence and the second target sequence are in the human genome. In some embodiments, the first target sequence is within 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, or 200 nucleotides of the second target sequence. In some embodiments, the 3' end of the first target sequence is not directly adjacent to the canonical PAM sequence (5'-NGG-3'). In some embodiments, the 3' end of the second target sequence is not directly adjacent to the canonical PAM sequence (5'-NGG-3').
Methods of using Cas9 fusion proteins
Some aspects of the disclosure provide methods of using the Cas9 proteins, fusion proteins, or complexes provided herein. For example, some aspects of the present disclosure provide methods comprising contacting a DNA molecule with (a) any of the Cas9 proteins or fusion proteins provided herein and at least one guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides complementary to a target sequence; or (b) a Cas9 protein or Cas9 fusion protein provided herein in complex with at least one gRNA. In some embodiments, the 3' end of the target sequence is not directly adjacent to the canonical PAM sequence (5'-NGG-3'). In some embodiments, the 3' end of the target sequence is directly adjacent to an AGC, GAG, TTT, GTG, or CAA sequence.
In some embodiments, the target DNA sequence comprises a sequence associated with a disease or disorder. In some embodiments, the target DNA sequence comprises a point mutation associated with a disease or disorder. In some embodiments, the activity of the Cas9 protein, Cas9 fusion protein, or complex results in correction of the point mutation. In some embodiments, the target DNA sequence comprises a T → C point mutation associated with a disease or disorder, and deamination of the mutant C base results in a sequence that is not associated with a disease or disorder. In some embodiments, the target DNA sequence encodes a protein, and the point mutation is located in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon. In some embodiments, deamination of the mutant C results in a change in the amino acid encoded by the mutant codon. In some embodiments, deamination of the mutant C results in a codon encoding the wild-type amino acid. In some embodiments, the contacting is performed in vivo in a subject. In some embodiments, the subject has or has been diagnosed with a disease or disorder. In some embodiments, the disease or disorder is cystic fibrosis, phenylketonuria, epidermolytic hyperkeratosis (EHK), Charcot-Marie-Tooth disease type 4J, neuroblastoma (NB), von Willebrand disease (vWD), myotonia congenita, hereditary renal amyloidosis, dilated cardiomyopathy (DCM), hereditary lymphedema, familial Alzheimer's disease, HIV, prion disease, chronic infantile neurological cutaneous and articular syndrome (CINCA), desmin-related myopathy (DRM), or a neoplastic disease associated with a mutant PI3KCA protein, a mutant CTNNB1 protein, a mutant HRAS protein, or a mutant p53 protein.
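The codon-level reasoning above can be checked with the standard genetic code. In this Python sketch, a hypothetical wild-type TGG (Trp) codon carries a T → C mutation giving CGG (Arg); modeling deamination of the mutant C as C → U, read as T after replication, restores TGG. The codon, the function name, and the simplification that only the coding strand is edited are illustrative assumptions.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def deaminate_c(codon: str, index: int) -> str:
    """Model C->U deamination at a 0-based codon position; U is read as T after replication."""
    if codon[index] != "C":
        raise ValueError("no C at that position")
    return codon[:index] + "T" + codon[index + 1:]


wild_type = "TGG"  # hypothetical codon, Trp (W)
mutant = "CGG"     # T->C point mutation at the first position, Arg (R)
corrected = deaminate_c(mutant, 0)
print(CODON_TABLE[wild_type], CODON_TABLE[mutant], CODON_TABLE[corrected])  # W R W
```

Building the table from the conventional TCAG ordering keeps all 64 codons in one compact string, with "*" marking stop codons.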
Some embodiments provide methods of using the Cas9 DNA-editing fusion proteins provided herein. In some embodiments, the fusion protein is used to introduce a point mutation into a nucleic acid by deaminating a target nucleobase, such as a C residue. In some embodiments, deamination of the target nucleobase results in correction of a genetic defect, e.g., correction of a point mutation that causes a loss of function in a gene product. In some embodiments, the genetic defect is associated with a disease or disorder, such as a lysosomal storage disorder or a metabolic disease, such as type I diabetes. In some embodiments, the methods provided herein are used to introduce an inactivating point mutation into a gene or allele encoding a gene product associated with a disease or disorder. For example, in some embodiments, provided herein are methods of using Cas9 DNA-editing fusion proteins to introduce inactivating point mutations into oncogenes (e.g., in the treatment of proliferative diseases). In some embodiments, the inactivating mutation creates a premature stop codon in the coding sequence, resulting in expression of a truncated gene product (e.g., a truncated protein that lacks the function of the full-length protein).
In some embodiments, the methods provided herein are directed to restoring the function of a dysfunctional gene through genome editing. The Cas9 deaminase fusion proteins provided herein can be used in vitro for gene editing-based human therapeutics, for example, by correcting disease-related mutations in human cell culture. One of skill in the art will appreciate that the fusion proteins provided herein (e.g., fusion proteins comprising a Cas9 domain and a nucleic acid deaminase domain) can be used to correct any single T → C or A → G point mutation. In the first case, deamination of the mutant C to U corrects the mutation. In the second case, deamination of the C that base pairs with the mutant G, followed by a round of replication, corrects the mutation.
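The two correction routes described above can be illustrated with a toy, idealized model in Python. This sketch is an illustration only. It writes both strands position-aligned, meaning the bottom strand is not reversed, so that index i always refers to the same base pair. It treats U exactly like T during copying and ignores DNA repair. The sequences and function names are hypothetical.

```python
# Watson-Crick partner of each base; U (from deaminated C) pairs with A, like T.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(strand: str, pos: int) -> str:
    """C -> U at 0-based index pos (the deaminase step)."""
    if strand[pos] != "C":
        raise ValueError("the deaminase acts on C")
    return strand[:pos] + "U" + strand[pos + 1:]


def copy_strand(template: str) -> str:
    """Synthesize the partner strand, written position-aligned with the
    template (not reversed), so index i on both strands is one base pair."""
    return "".join(PAIR[b] for b in template)


def correct_t_to_c(mutant_top: str, pos: int) -> str:
    """Pathogenic T->C: deaminate the mutant C itself. U templates A, which in
    turn templates T, restoring the wild-type base on the top strand."""
    return copy_strand(copy_strand(deaminate(mutant_top, pos)))


def correct_a_to_g(mutant_top: str, pos: int) -> str:
    """Pathogenic A->G: deaminate the C paired with the mutant G on the bottom
    strand; copying the edited bottom strand puts A back on the top strand."""
    bottom = copy_strand(mutant_top)
    return copy_strand(deaminate(bottom, pos))


print(correct_t_to_c("GACTA", 2))  # GATTA (wild type had T at index 2)
print(correct_a_to_g("GAGTC", 2))  # GAATC (wild type had A at index 2)
```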
An exemplary disease-associated mutation that may be corrected by providing a fusion protein in vitro or in vivo is the H1047R (A3140G) polymorphism in the PI3KCA protein (phosphoinositide-3-kinase, catalytic subunit α). The PI3KCA protein phosphorylates the 3-OH group of the inositol ring of phosphatidylinositol. The PI3KCA gene has been found to be mutated in many different cancers and is therefore considered a potent oncogene.50 In fact, the A3140G mutation is present in several NCI-60 cancer cell lines, such as the HCT116, SKOV3, and T47D cell lines, which are readily available from the American Type Culture Collection (ATCC).51
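The nucleotide-level name (A3140G) and the protein-level name (H1047R) are related by codon arithmetic. The following is a minimal Python sketch. It assumes the nucleotide is numbered from the A of the start codon in the coding sequence, which is a common convention but is not stated in the text. The text also does not say which histidine codon is present, so both His codons are shown.

```python
def codon_position(nt: int) -> tuple:
    """Map a 1-based coding-sequence nucleotide number (A of the ATG = 1)
    to (codon number, position within the codon)."""
    return (nt - 1) // 3 + 1, (nt - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, new_base: str) -> str:
    """Replace the base at a 1-based position within a codon."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


# Only the four codons needed here: His = CAT/CAC, Arg = CGT/CGC.
CODONS = {"CAT": "His", "CAC": "His", "CGT": "Arg", "CGC": "Arg"}

codon_no, pos = codon_position(3140)
print(codon_no, pos)                    # 1047 2
for his in ("CAT", "CAC"):
    mutant = substitute(his, pos, "G")  # the A3140G change
    print(his, "->", mutant, CODONS[his], "->", CODONS[mutant])
```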
In some embodiments, a cell carrying a mutation to be corrected, e.g., a cell carrying a point mutation such as an A3140G point mutation in exon 20 of the PI3KCA gene (which results in an H1047R substitution in the PI3KCA protein), is contacted with a Cas9 deaminase fusion protein and an appropriately designed sgRNA that targets the fusion protein to the corresponding mutation site in the PI3KCA gene. A control experiment can be performed in which the sgRNAs are designed to target the fusion enzyme to a non-C residue within the PI3KCA gene. Genomic DNA of the treated cells can be extracted, and the relevant sequences of the PI3KCA gene can be PCR-amplified and sequenced to assess the activity of the fusion protein in human cell culture.
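Once the PI3KCA amplicon has been sequenced, one simple readout of fusion protein activity is the fraction of reads carrying the edited base at the target position. The Python sketch below is a hypothetical illustration and not an analysis pipeline described in the disclosure. It assumes the reads have already been aligned to the amplicon without insertions or deletions. The reads themselves are invented.

```python
from collections import Counter


def base_frequencies(reads, index):
    """Fraction of each base call at a 0-based index across reads that are
    already aligned to the amplicon reference (no indels assumed).
    Ambiguous 'N' calls and reads too short to cover the index are skipped."""
    calls = Counter(r[index] for r in reads if len(r) > index and r[index] != "N")
    total = sum(calls.values())
    return {base: n / total for base, n in calls.items()} if total else {}


# Hypothetical amplicon reads; the target C sits at index 3 of the reference "ACGCA".
reads = ["ACGCA"] * 7 + ["ACGTA"] * 3 + ["ACGNA"]
freqs = base_frequencies(reads, 3)
print(f"C->T conversion: {freqs.get('T', 0.0):.0%}")  # C->T conversion: 30%
```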
It should be understood that examples of correcting point mutations in PI3KCA are provided for illustrative purposes and are not meant to limit the present disclosure. One skilled in the art will appreciate that the DNA editing fusion proteins disclosed herein can be used to correct other point mutations and mutations associated with other cancers as well as diseases other than cancer, including other proliferative diseases.
Successful correction of point mutations in disease-associated genes and alleles opens up new strategies for gene correction, with applications in therapeutics and basic research. The site-specific single-base modification system disclosed herein, which uses a fusion of Cas9 and a deaminase or deaminase domain, can also be used for "reverse" gene therapy, in which certain gene functions are intentionally inhibited or eliminated. In these cases, site-specific mutation of Trp (TGG), Gln (CAA and CAG), or Arg (CGA) codons to premature stop codons (TAA, TAG, TGA) can be used to eliminate protein function in vitro, ex vivo, or in vivo.
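The codon logic behind this "reverse" strategy can be checked exhaustively against the standard genetic code. The Python sketch below is an illustration and not part of the disclosed methods. It enumerates every sense codon that becomes a stop codon through a single C→T change on the coding strand. It also enumerates single G→A changes, which is how deamination of a C on the template strand appears in the coding-strand codon. The sketch recovers exactly the Trp (TGG), Gln (CAA, CAG), and Arg (CGA) codons named above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# C->U deamination on the coding strand reads as C->T; deamination of the C
# on the template strand appears as G->A in the coding-strand codon.
EDITS = {"C": "T", "G": "A"}


def stop_creating_edits():
    """All (sense codon, amino acid, resulting stop codon) reachable by one edit."""
    hits = []
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for i, base in enumerate(codon):
            if base in EDITS:
                new = codon[:i] + EDITS[base] + codon[i + 1:]
                if CODE[new] == "*":
                    hits.append((codon, aa, new))
    return hits


for codon, aa, stop in stop_creating_edits():
    print(f"{codon} ({aa}) -> {stop}")
# TGG (W) -> TAG
# TGG (W) -> TGA
# CAA (Q) -> TAA
# CAG (Q) -> TAG
# CGA (R) -> TGA
```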
The present disclosure provides methods for treating a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by a Cas9 DNA-editing fusion protein provided herein. For example, in some embodiments, methods are provided that include administering to a subject having such a disease (e.g., a cancer associated with a PI3KCA point mutation as described above) an effective amount of a Cas9 deaminase fusion protein that corrects the point mutation or introduces an inactivating mutation in a disease-associated gene. In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease. Other diseases that can be treated by correcting point mutations or introducing inactivating mutations into disease-associated genes are known to those of skill in the art, and the disclosure is not limited in this regard.
The present disclosure provides methods for treating additional diseases or disorders associated with or caused by point mutations that can be corrected by deaminase-mediated gene editing. Some such diseases are described herein, and other suitable diseases that can be treated using the strategies and fusion proteins provided herein will be apparent to those skilled in the art based on this disclosure. Exemplary suitable diseases and conditions are listed below. It will be appreciated that the numbering of particular positions or residues in the corresponding sequences will depend on the particular protein and numbering scheme used. The numbering may be different, for example in the precursor of the mature protein and the mature protein itself, and sequence differences from species to species may affect the numbering. One skilled in the art will be able to identify corresponding residues in any homologous protein and corresponding coding nucleic acid by methods well known in the art, such as by sequence alignment and determination of homologous residues. 
Exemplary suitable diseases and conditions include, but are not limited to: cystic fibrosis (see, e.g., Schwank et al., Functional repair of CFTR by CRISPR/Cas9 in intestinal stem cell organoids of cystic fibrosis patients. Cell Stem Cell. 2013; 13: 653-; and Wu et al., Correction of a genetic disease in mouse via use of CRISPR-Cas9. Cell Stem Cell. 2013; 13: 659-662, neither of which uses a deaminase fusion protein to correct the genetic defect); phenylketonuria, e.g., a phenylalanine to serine mutation (T>C mutation) at position 835 (mouse) or 240 (human) or a homologous residue of the phenylalanine hydroxylase gene (see, e.g., McDonald et al., Genomics. 1997; 39: 402-); Bernard-Soulier syndrome (BSS; giant platelet syndrome), e.g., a phenylalanine to serine mutation at residue 55 or a homologous residue, or a cysteine to arginine mutation at residue 24 or a homologous residue, in platelet membrane glycoprotein IX (see, e.g., Noris et al., British Journal of Haematology. 1997; 97: 312-; 93: 381-384); epidermolytic hyperkeratosis (EHK), e.g., a leucine to proline mutation (T>C mutation) at position 160 or 161 (if the initial methionine is counted) or a homologous residue in keratin 1 (see, e.g., Chipev et al., Cell. 1992; 70: 821-828; see also accession number P04264 in the UniProt database at www.uniprot.org); chronic obstructive pulmonary disease (COPD), e.g., a leucine to proline mutation (T>C mutation) at position 54 or 55 (if the initial methionine is counted) or a homologous residue in the processed form of α1-antitrypsin, or at residue 78 or a homologous residue in the unprocessed form (see, e.g., Poller et al., Genomics. 1993; 740-); Charcot-Marie-Tooth disease type 4J, e.g., an isoleucine to threonine mutation (T>C mutation) at position 41 or a homologous residue in FIG4 (see, e.g., Lenk et al., PLoS Genetics. 2011; 7: e1002104); neuroblastoma (NB), e.g., a leucine to proline mutation (T>C mutation) at position 197 or a homologous residue in caspase-9 (see, e.g., Kundu et al., 3 Biotech. 2013; 3: 225-); von Willebrand disease (vWD), e.g., a cysteine to arginine mutation (T>C mutation) at position 509 or a homologous residue in the processed form of von Willebrand factor, or at position 1282 or a homologous residue in the unprocessed form (see, e.g., Laver et al., Br. J. Haematol. 1992; 82: 66-72; see also accession number P04275 in the UniProt database); myotonia congenita, e.g., a cysteine to arginine mutation (T>C mutation) at position 277 or a homologous residue in the muscle chloride channel gene CLCN1 (see, e.g., Weinberger et al., The J. of Physiology. 2012; 590: 3449-); hereditary renal amyloidosis, e.g., a stop codon to arginine mutation (T>C mutation) at position 78 or a homologous residue in the processed form of apolipoprotein AII, or at position 101 or a homologous residue in the unprocessed form (see, e.g., Yazaki et al., Kidney Int. 2003; 64: 11-16); dilated cardiomyopathy (DCM), e.g., a tryptophan to arginine mutation (T>C mutation) at position 148 or a homologous residue in the FOXD4 gene (see, e.g., Minoretti et al., Int. J. of Mol. Med. 2007; 369-); hereditary lymphedema, e.g., a histidine to arginine mutation (A>G mutation) at position 1035 or a homologous residue in VEGFR3 tyrosine kinase (see, e.g., Irrthum et al., Am. J. Hum. Genet. 2000; 67: 295-); familial Alzheimer's disease, e.g., an isoleucine to valine mutation (A>G mutation) at position 143 or a homologous residue in presenilin 1 (see, e.g., Galvo et al., J. Alzheimer's Disease. 2011; 25: 425-); prion disease, e.g., a methionine to valine mutation (A>G mutation) at position 129 or a homologous residue in prion protein (see, e.g., Lewis et al., J. of General Virology. 2006; 2443-2449); chronic infantile neurologic cutaneous and articular syndrome (CINCA), e.g., a tyrosine to cysteine mutation (A>G mutation) at position 570 or a homologous residue in cryopyrin (see, e.g., Fujisawa et al., Blood. 2007; 109: 2903-2911); and desmin-related myopathy (DRM), e.g., an arginine to glycine mutation (A>G mutation) at position 120 or a homologous residue in αB-crystallin (see, e.g., Kumar et al., J. Biol. Chem. 1999; 274: 24137-24141). All references and database entries are incorporated herein by reference in their entirety.
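Each mutation above is annotated with the nucleotide change (T>C or A>G) that could produce it. Whether an amino acid substitution is compatible with a given single-nucleotide change can be checked against the standard genetic code, and the following Python sketch does this. It is an illustrative check only. It considers coding-strand changes in the standard nuclear genetic code and says nothing about which codon is actually present in a given patient allele. The function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_changes(aa_from: str, aa_to: str, ref: str, alt: str):
    """Codon pairs (before, after) in which a single ref->alt change on the
    coding strand turns an aa_from codon into an aa_to codon ('*' = stop)."""
    pairs = []
    for codon, aa in CODE.items():
        if aa != aa_from:
            continue
        for i, base in enumerate(codon):
            if base == ref:
                new = codon[:i] + alt + codon[i + 1:]
                if CODE[new] == aa_to:
                    pairs.append((codon, new))
    return pairs


# A few of the substitutions listed above (one-letter amino acid codes).
examples = [("F", "S", "T", "C"), ("L", "P", "T", "C"), ("C", "R", "T", "C"),
            ("*", "R", "T", "C"), ("H", "R", "A", "G"), ("R", "G", "A", "G")]
for aa_from, aa_to, ref, alt in examples:
    print(aa_from, "->", aa_to, f"({ref}>{alt}):",
          codon_changes(aa_from, aa_to, ref, alt))
```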
The present disclosure provides a list of genes comprising pathogenic T > C or A > G mutations that can be corrected using any of the Cas9 fusion proteins provided herein. The names of these genes, their respective SEQ ID NOs, their gene IDs, and the sequences flanking the mutation sites are provided herein; see Tables 4 and 5. Without wishing to be bound by any particular theory, the mutations provided in Tables 4 and 5 can be corrected using the Cas9 fusions provided herein, which are capable of binding target sequences lacking the canonical PAM sequence. In some embodiments, the Cas9-deaminase fusion protein exhibits activity on non-canonical PAMs and thus can correct all of the pathogenic T > C or A > G mutations listed in Table 4 and Table 5, respectively (SEQ ID NOs: 674-2539 and 3144-5083). In some embodiments, the Cas9-deaminase fusion protein recognizes the canonical PAM and thus can correct pathogenic T > C or A > G mutations with a canonical PAM, e.g., 5'-NGG-3'. One of skill in the art would understand how to design an RNA (e.g., a gRNA) to target any Cas9 protein or fusion protein provided herein to any target sequence to correct any mutation provided herein, e.g., the mutations provided in Tables 4 and 5. It will be apparent to those skilled in the art that targeting a Cas9:effector domain fusion protein disclosed herein to a target site, e.g., a site comprising a point mutation to be edited, typically requires co-expression of the Cas9:effector domain fusion protein and a guide RNA, such as an sgRNA. As explained in more detail elsewhere herein, the guide RNA typically comprises a tracrRNA framework that allows Cas9 to bind, and a guide sequence that confers sequence specificity on the Cas9:effector domain fusion protein.
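Whether a canonical-PAM Cas9 can reach a given mutation in Tables 4 and 5 depends on the three nucleotides immediately 3' of the chosen protospacer. The Python sketch below sorts candidate sites on that basis. It assumes a 20-nucleotide protospacer and a 5'-NGG-3' canonical PAM, as in the text. The site names and sequences are hypothetical.

```python
import re

CANONICAL = re.compile(r"[ACGT]GG")  # 5'-NGG-3'


def classify_site(site: str, spacer_len: int = 20) -> str:
    """site = protospacer immediately followed by its 3' PAM, written 5'->3'
    on the strand that carries the PAM."""
    pam = site[spacer_len:spacer_len + 3].upper()
    if len(pam) != 3:
        raise ValueError("site must include three PAM nucleotides")
    return "canonical" if CANONICAL.fullmatch(pam) else "non-canonical"


sites = {  # hypothetical sites built on the Doench 2 protospacer
    "NGG site": "GGAGCCCACCGAGTACCTGG" + "AGG",
    "TAA site": "GGAGCCCACCGAGTACCTGG" + "TAA",
}
for name, site in sites.items():
    print(name, classify_site(site))
```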
In some embodiments, the guide RNA comprises the structure 5'-[guide sequence]-guuuuagagcuagaaauagcaaguuaaaauaaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu-3' (SEQ ID NO:285), wherein the guide sequence comprises a sequence complementary to the target sequence. The guide sequence is typically 20 nucleotides long. Based on the present disclosure, suitable guide RNA sequences for targeting Cas9:effector domain fusion proteins to specific genomic target sites will be apparent to those skilled in the art. Such guide RNA sequences typically comprise a guide sequence that is complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited.
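The guide RNA structure and the 50-nucleotide window described above can be expressed directly in code. The following Python sketch uses hypothetical helpers and is not a design tool from the disclosure:

- `build_sgrna` joins a spacer to the scaffold exactly as printed in SEQ ID NO:285, written in lowercase RNA.
- `candidate_windows` lists every 20-nucleotide protospacer window on one strand that lies entirely within 50 nucleotides of the target base.

The window search does not filter by PAM, so windows can afterwards be checked against whichever PAM the chosen Cas9 variant accepts. Running it on the reverse complement covers the other strand. The 200-nucleotide test sequence is hypothetical.

```python
# sgRNA scaffold as printed in SEQ ID NO:285.
SCAFFOLD = ("guuuuagagcuagaaauagcaaguuaaaauaaaggcuaguccguuaucaacuugaaaaag"
            "uggcaccgagucggugcuuuuu")


def build_sgrna(protospacer_dna: str) -> str:
    """5'-[guide sequence]-[scaffold]-3', written as lowercase RNA (T -> U)."""
    guide = protospacer_dna.lower().replace("t", "u")
    return guide + SCAFFOLD


def candidate_windows(seq: str, target_idx: int, window: int = 50, length: int = 20):
    """All (start, sequence) windows of `length` nt on the given strand lying
    entirely within `window` nt upstream or downstream of the 0-based target."""
    lo = max(0, target_idx - window)
    hi = min(len(seq), target_idx + window + 1)
    return [(start, seq[start:start + length])
            for start in range(lo, hi - length + 1)]


sgrna = build_sgrna("GGAGCCCACCGAGTACCTGG")  # Doench 2 protospacer
print(sgrna[:20])                             # ggagcccaccgaguaccugg
print(len(candidate_windows("ACGT" * 50, 100)))  # 82
```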
Kits, Vectors, and Cells
Some aspects of the present disclosure provide kits comprising a nucleic acid construct comprising (a) a nucleotide sequence encoding a Cas9 protein or a Cas9 fusion protein provided herein; and (b) a heterologous promoter that drives expression of the sequence of (a). In some embodiments, the kit further comprises an expression construct encoding a guide RNA backbone, wherein the construct comprises a cloning site positioned to allow cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide RNA backbone.
Some aspects of the disclosure provide polynucleotides encoding the Cas9 proteins or fusion proteins provided herein. Some aspects of the disclosure provide vectors comprising such polynucleotides. In some embodiments, the vector comprises a heterologous promoter that drives expression of the polynucleotide.
Some aspects of the present disclosure provide cells comprising the Cas9 proteins, fusion proteins, nucleic acid molecules, and/or vectors provided herein.
The descriptions of exemplary embodiments of the reporter systems provided herein are for illustrative purposes only and are not meant to be limiting. Additional reporter systems, such as variations of the exemplary systems described in detail above, are also included in the present disclosure.
Examples
Example 1: PACE evolution of Cas9 without PAM sequence restriction
Construction of PAM libraries. Four different protospacer target sequences were synthesized: Doench 1, 5'-AAGAGAGACAGTACATGCCC-3' (SEQ ID NO: 286); Doench 2, 5'-GGAGCCCACCGAGTACCTGG-3' (SEQ ID NO: 287); G7, 5'-AGTCTCCTCAGCAAAACGAA-3' (SEQ ID NO: 288); and VEGF target 2, 5'-GACCCCCTCCACCCCGCCTC-3' (SEQ ID NO: 289). For each protospacer target sequence, a 3'-NNN PAM library was created. Whereas the canonical PAM sequence is 5'-NGG-3' (e.g., an exemplary [Doench 1]-[canonical PAM] target sequence may be 5'-[AAGAGAGACAGTACATGCCC]-[NGG]-3' (SEQ ID NO:291)), each 3'-NNN PAM library contained fully random PAM sequences, e.g., for Doench 1, 5'-AAGAGAGACAGTACATGCCCNNN-3' (SEQ ID NO:290), wherein N represents any nucleotide. Thus, each NNN PAM library included every possible PAM sequence at the 3' end of the corresponding protospacer target sequence.
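The NNN library design can be enumerated explicitly. Three fully random positions give 4^3 = 64 PAM variants per protospacer, of which only the four NGG variants are canonical. The Python sketch below builds the in-silico equivalent of the four libraries from the protospacers listed above. It illustrates the library composition and is not a cloning protocol.

```python
from itertools import product

PROTOSPACERS = {
    "Doench 1": "AAGAGAGACAGTACATGCCC",       # SEQ ID NO: 286
    "Doench 2": "GGAGCCCACCGAGTACCTGG",       # SEQ ID NO: 287
    "G7": "AGTCTCCTCAGCAAAACGAA",             # SEQ ID NO: 288
    "VEGF target 2": "GACCCCCTCCACCCCGCCTC",  # SEQ ID NO: 289
}


def nnn_library(protospacer: str) -> list:
    """[protospacer]-[NNN]: every possible 3-nt PAM appended 3' of the protospacer."""
    return [protospacer + "".join(pam) for pam in product("ACGT", repeat=3)]


libraries = {name: nnn_library(seq) for name, seq in PROTOSPACERS.items()}
for name, lib in libraries.items():
    n_canonical = sum(site[-2:] == "GG" for site in lib)  # NGG members
    print(f"{name}: {len(lib)} PAM variants, {n_canonical} canonical (NGG)")
```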
Cas9 activity on the PAM libraries was tested in an ω-dCas9 luciferase assay. In this bacterial luciferase activation assay, a fusion of the ω subunit of E. coli RNA polymerase (rpoZ) with dCas9 (see, e.g., Bikard et al., Nucleic Acids Res. 2013 Aug; 41(15): 7429-7437) drives production of luciferase encoded by a nucleic acid under the control of a weak promoter comprising the sequence targeted by the sgRNA. Each PAM library was cloned into this weak promoter, so that the [target sequence]-[PAM library] nucleic acid sequences served as the sgRNA-targeted sequences. The ω-dCas9 assay was performed on all four protospacer targets with both the canonical PAM and the random PAM libraries. FIG. 1 shows the activity of wild-type Streptococcus pyogenes dCas9 on the PAM libraries.
Evolution of Cas9 on PAM libraries. Streptococcus pyogenes dCas9 was fused to the ω subunit of RNA polymerase. The resulting ω-dCas9 fusion protein was cloned into an M13 phage-based selection phagemid (SP) containing the complete M13 phage genome except for a functional copy of the gene encoding pIII, which is essential for the generation of infectious phage particles. The phage gene encoding pIII was provided on a separate plasmid (helper plasmid, AP) under the control of a promoter transcriptionally activated by ω-dCas9. The PAM library was cloned into the promoter region of the helper plasmid. Host cells containing the helper plasmid were provided for the directed evolution of Cas9 proteins without PAM restriction. The amount of infectious phage produced by a given host cell after infection with a selection phage therefore depends on the activity of the encoded ω-dCas9 fusion protein on the helper plasmid promoter that drives pIII production. The helper plasmid thus confers a selective advantage on those selection phages that encode ω-dCas9 fusion protein variants with increased activity on different non-canonical PAM sequences.
A lagoon is provided, and a stream of host cells comprising the helper plasmid is passed through the lagoon. The host cells are contacted with the selection phagemid, resulting in a population of selection phage that propagates in the stream of host cells in the lagoon. Phage-infected host cells are removed from the lagoon, and fresh uninfected host cells are fed into the lagoon at a rate such that the average residence time of host cells in the lagoon is shorter than the average time between host cell divisions but longer than the average M13 phage life cycle.
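The flow-rate condition above can be stated as a simple inequality. For a well-mixed lagoon of volume V fed at flow rate Q, the mean residence time is tau = V/Q. The text requires that the phage life cycle be shorter than tau, and that tau be shorter than the host cell division time. The Python sketch below checks this condition. The volume, flow rates, and timing values are hypothetical placeholders, not parameters reported for these experiments.

```python
def residence_time_min(volume_ml: float, flow_ml_per_h: float) -> float:
    """Mean residence time (minutes) of a well-mixed vessel: tau = V / Q."""
    return 60.0 * volume_ml / flow_ml_per_h


def flow_rate_ok(volume_ml, flow_ml_per_h, division_time_min, phage_cycle_min):
    """True if phage life cycle < tau < host cell division time."""
    tau = residence_time_min(volume_ml, flow_ml_per_h)
    return phage_cycle_min < tau < division_time_min


# Hypothetical values for illustration only.
print(residence_time_min(40, 80))                                       # 30.0
print(flow_rate_ok(40, 80, division_time_min=45, phage_cycle_min=10))  # True
print(flow_rate_ok(40, 40, division_time_min=45, phage_cycle_min=10))  # False (tau = 60 min)
```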
To generate Cas9 variants during the directed evolution experiments, host cells in the lagoon were incubated under conditions that increase the mutation rate. The host cells carry a mutagenesis plasmid (MP) that increases the rate of mutagenesis, thereby introducing mutations into the ω-dCas9 fusion protein encoded by the selection phagemid during the phage life cycle. Because host cells flow through the lagoon with an average residence time shorter than the average time between host cell divisions, they do not accumulate mutations in their genome or on the helper plasmid as a result of the increased mutation rate conferred by the mutagenesis plasmid. The selection phage, however, replicate continuously in the host cell stream in the lagoon and thus accumulate mutations over time, producing new, evolved ω-dCas9 fusion protein variants.
If any of these evolved ω-dCas9 fusion protein variants carry mutations that confer increased activity on the helper plasmid containing the PAM library, host cells infected with a selection phage encoding that variant will produce more pIII. More pIII in turn leads to the production of more infectious selection phage particles. Over time, this gives mutant selection phage carrying such beneficial mutations a competitive advantage over selection phage lacking them. After a period of time, the selective pressure exerted by the helper plasmid thus makes selection phage that have acquired beneficial mutations the dominant replicating species in the host cell stream, while selection phage with no mutations or with deleterious mutations are washed out of the lagoon.
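The enrichment logic described above behaves like simple selection in a haploid population. The toy Python model below is an assumption made for illustration and is not a model from the disclosure. Suppose host cells infected with a beneficial variant yield `r` times as many infectious progeny as the rest of the population. Its frequency then changes each generation as p' = p*r / (p*r + 1 - p). The starting frequency and `r` are hypothetical.

```python
def enrichment(p0: float, r: float, generations: int) -> list:
    """Frequency of a beneficial phage variant over generations when it
    yields r times as many infectious progeny as the remainder."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * r / (p * r + (1 - p))
        freqs.append(p)
    return freqs


traj = enrichment(p0=1e-6, r=2.0, generations=30)
for gen in (0, 10, 20, 30):
    print(gen, f"{traj[gen]:.4f}")
```

With these placeholder values, the variant rises from one in a million to a majority in about 20 generations. This matches the qualitative behavior described above. Real PACE dynamics also depend on mutation supply, flow rate, and phage titer.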
Because the activity of the ω-dCas9 fusion protein on the PAM library was very low at the start of the experiment, the selection phagemids were propagated overnight for multiple rounds in host cells harboring helper plasmids containing the PAM library. This was done to evolve Cas9 variants with increased activity on non-canonical PAM sequences. At the end of the directed evolution experiment, the evolved selection phage population was isolated from the lagoon, and a representative number of clones were analyzed to identify Cas9 variants carrying beneficial mutations. Although any of the observed mutations may contribute to a beneficial phenotype, mutations shared by more than one clone, or by all clones, are of particular interest.
Mutations from Cas9 PACE. Several selection phage clones were isolated from directed evolution experiments using PAM library helper plasmids as described above. Mutations identified in the Cas9 amino acid sequences of some exemplary clones are provided in Table 1 (residue numbering according to SEQ ID NO: 9):
Table 1: Cas9 mutations identified in PACE (residue numbering according to SEQ ID NO: 9).
Clones 1-4 were tested in the ω-dCas9 luciferase activity assay described above. When tested on the PAM libraries as a whole, the different clones showed improved luciferase expression (FIG. 2: Cas9 activity of exemplary evolved clones on the PAM library after directed evolution).
Improvement of Cas9 activity on non-canonical PAM sequences. The activity of Cas9 protein on target sequences with non-canonical PAMs was evaluated in more detail. Clone 4, carrying the I122, D182, and E1219V mutations, was tested for activity against various [Doench 2 (5'-GGAGCCCACCGAGTACCTGG-3'; SEQ ID NO:287)]-[PAM] target sequences in an ω-dCas9 luciferase activation assay, and its activity was compared to that of wild-type dCas9. The data are shown in Table 2.
Table 2: Relative activity of clone 4 on various PAM sequences
Example 2: PACE evolution of Cas9 without any PAM sequence restriction
Because the activity of the ω-dCas9 fusion protein on the NNN-PAM libraries was very low, a second round of PACE experiments was performed. In these experiments, the ω-dCas9 fusion protein population was first diversified without selective pressure by providing a source of pIII independent of ω-dCas9 fusion protein activity. This initial diversification phase allows mutations to arise that were not accessible in PACE experiments in which selective pressure was applied throughout.
Selection phage carrying the ω-dCas9 fusion protein were propagated overnight in 1030 host cells carrying the MP6 mutagenesis plasmid, in the presence of arabinose, to create a library of mutated selection phage encoding a library of ω-dCas9 fusion protein variants. The dCas9 sequence of this fusion protein is provided by SEQ ID NO: 8 and carries the D10A and H840A mutations. During this initial diversification phase, pIII was expressed from a separate plasmid in the host cells. After overnight (12 h) diversification, 1030 host cells carrying the mutagenesis plasmid and a helper plasmid were grown to log phase and used as the source of host cells for the host cell stream through the lagoon. The helper plasmid contained an NNN PAM library cloned into the weak promoter as the guide RNA target sequence. Cells in the lagoon were infected with the diversified selection phage from the overnight incubation. The host cells in the lagoon were contacted with arabinose to maintain high-level expression of the mutagenesis genes on the mutagenesis plasmid.
The initial phage titer was about 10^8 pfu/mL. PACE experiments were performed for each of the four NNN-PAM libraries ([Doench 1]-[NNN-PAM], [Doench 2]-[NNN-PAM], [G7]-[NNN-PAM], and [VEGF target]-[NNN-PAM]). As described above, each library was cloned into a helper plasmid driving pIII expression from a weak promoter. Phage titers were monitored during the PACE experiments, and a slow drop in phage titer to 10^4 pfu/mL was observed. At this time point, the phage population was isolated from the lagoon and grown on 2208 host cells containing a separate pIII source (psp-driven pIII). After this low-stringency propagation period, a 1:100 dilution of the supernatant was added to fresh host cells in a new lagoon, with the helper plasmid as the sole source of pIII, and the PACE experiment was continued. No decrease in phage titer was observed after this low-stringency incubation in 2208 cells.
An exemplary PACE experiment was run for 72 hours, after which 24 surviving clones were isolated from the lagoon, sequenced, and characterized. The mutations identified included A262T, K294R, S409I, M694I, E480K, E543D, and E1219V (amino acid numbering according to SEQ ID NO: 9). In another exemplary experiment, surviving clones were isolated after 15 days of incubation. The activity of the identified dCas9 mutants was characterized in an ω-dCas9 luciferase assay. The clones with the best ω-dCas9 fusion protein activity on non-canonical PAM target sequences had the following mutations: E480K, E543D, E1219V, and T1329.
Cas9 mammalian GFP activation. Wild-type dCas9 (SEQ ID NO:9) and evolved Cas9 clones were tested in a dCas9-GFP assay in HEK293T cells. The cells were contacted with a reporter construct in which the GFP coding sequence is driven by a weak promoter comprising the [gRNA target sequence]-[PAM] sequence. Fusion proteins of dCas9 (wild-type and PACE variants) with the transcriptional activator VP64-p65-Rta (VPR) were generated, and the dCas9-VPR variants were tested for their ability to activate the GFP reporter in HEK293 cells.
HEK293T cells were transfected with four separate plasmids: a dCas9-VPR expression plasmid; a plasmid expressing an sgRNA targeting the gRNA target sequence of the GFP reporter plasmid; the GFP reporter plasmid; and an iRFP transfection control. In one experiment, HEK293 cells were contacted with a GFP reporter comprising a TAA PAM. In another experiment, HEK293 cells were contacted with a population of reporter plasmids containing an NNN PAM library. Cells were harvested 48 hours post-transfection, and GFP-expressing cells were quantified using a BD LSRFortessa cell analyzer.
FIG. 3: Cas9 mammalian GFP activation. Compared to WT Cas9, evolved Cas9 showed higher activity on the TAA PAM (21.08% vs. 0.60% of cells above the negative control) and on the NNN PAM library (22.76% vs. 3.38% of cells above the negative control).
Cleavage activity of evolved Cas9 on target sequences with non-canonical PAMs. To demonstrate that the PACE mutations also relax the PAM restriction of nuclease-active Cas9, nuclease-active Cas9 proteins were generated based on the provided sequences, i.e., without the D10A and H840A mutations but with various PACE mutations. The evolved Cas9 variants were tested in a Cas9 GFP assay, which assessed their ability to target and inactivate an emGFP gene integrated into the HEK293 cell genome using guide RNAs targeting sequences with non-canonical PAMs. Loss of GFP expression was observed in 6.45% of cells contacted with wild-type nuclease-active Cas9 (SEQ ID NO:9), compared with 54.55% of cells contacted with evolved Cas9 (E480K, E543D, E1219V, and T1329).
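The differences reported in the two mammalian-cell assays can be summarized as fold changes. The short Python calculation below uses only the percentages stated above. These are ratios of the fractions of cells scored as GFP-positive, or as having lost GFP, not ratios of per-cell activity.

```python
def fold_change(evolved_pct: float, wild_type_pct: float) -> float:
    """Ratio of the percentage of scored cells, evolved vs. wild-type Cas9."""
    return evolved_pct / wild_type_pct


# Percentages of cells reported above for evolved vs. wild-type Cas9.
RESULTS = {
    "GFP activation, TAA PAM": (21.08, 0.60),
    "GFP activation, NNN PAM library": (22.76, 3.38),
    "GFP loss (nuclease), non-canonical PAM": (54.55, 6.45),
}

for assay, (evolved, wild_type) in RESULTS.items():
    print(f"{assay}: {fold_change(evolved, wild_type):.1f}-fold, "
          f"+{evolved - wild_type:.2f} percentage points")
# GFP activation, TAA PAM: 35.1-fold, +20.48 percentage points
# GFP activation, NNN PAM library: 6.7-fold, +19.38 percentage points
# GFP loss (nuclease), non-canonical PAM: 8.5-fold, +48.10 percentage points
```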
Example 3: Cas9 variants without PAM restriction
Beneficial mutations conferring Cas9 activity on non-canonical PAM sequences were mapped onto the Streptococcus pyogenes wild-type sequence. The following is an exemplary Cas9 sequence (Streptococcus pyogenes Cas9, SEQ ID NO: 9), in which the D10 and H840 residues are marked with an asterisk after the corresponding amino acid residue. Residues D10 and H840 of SEQ ID NO: 9 can be mutated to produce a nuclease-inactive Cas9 (e.g., D10A and H840A) or a nickase Cas9 (e.g., D10A with H840 retained, or H840A with D10 retained). The HNH domain (bold and underlined) and RuvC domain (boxed) are indicated. The residues found to be mutated in clones isolated from multiple PACE experiments, amino acid residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329, are marked with an asterisk following the corresponding amino acid residue.
Beneficial mutations conferring Cas9 activity on non-canonical PAM sequences were also mapped onto additional exemplary wild-type sequences. The HNH domain (bold and underlined) and RuvC domain (boxed) are indicated. Residues homologous to the residues found mutated in SEQ ID NO: 9, amino acid residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329, are marked with an asterisk following the corresponding amino acid residue. In addition, the residues corresponding to amino acid residues 10 and 840, which are mutated in dCas9 variants, are also marked with asterisks.
The present disclosure provides Cas9 variants in which one or more amino acid residues identified by asterisks are mutated as described herein. In some embodiments, the D10 and H840 residues are mutated, e.g., to alanine residues, and the provided Cas9 variants include one or more additional mutations of the asterisked identified amino acid residues as provided herein. In some embodiments, the D10 residue is mutated, e.g., to an alanine residue, and the provided Cas9 variants include one or more additional mutations of the asterisked identified amino acid residues as provided herein.
Many Cas9 sequences from different species were aligned to determine whether corresponding homologous amino acid residues could be identified in other Cas9 proteins, allowing Cas9 variants with corresponding mutations of the homologous residues to be generated. The alignment was performed using the NCBI Constraint-based Multiple Alignment Tool (COBALT, available at st-va.ncbi.nlm.nih.gov/tools/COBALT) with the following parameters. Alignment parameters: gap penalties -11, -1; end-gap penalties -5, -1. CDD parameters: use RPS BLAST on; BLAST E-value 0.003; find conserved columns and recompute on. Query clustering parameters: use query clusters on; word size 4; maximum cluster distance 0.8; alphabet regular.
An exemplary alignment of four Cas9 sequences is provided below. The Cas9 sequences in the alignment are: sequence 1 (S1): 10|WP_010922251|gi|499224711| type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes]; sequence 2 (S2): 11|WP_039695303|gi|746743737| type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus gallolyticus]; sequence 3 (S3): 12|WP_045635197|gi|782887988| type II CRISPR RNA-guided endonuclease Cas9 [Chronic streptococci]; sequence 4 (S4): 13|5AXW_A|gi|924443546| Staphylococcus aureus Cas9. The HNH domain (bold and underlined) and RuvC domain (boxed) are indicated for each of the four sequences. Amino acid residues 10, 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 840, 1219, and 1329 in S1, and the homologous amino acids in the aligned sequences, are marked with an asterisk following the corresponding amino acid residue.
The alignment indicates that amino acid sequences and residues homologous to a reference Cas9 amino acid sequence or residue can be identified across Cas9 sequence variants, including but not limited to Cas9 sequences from different species. This is done by using alignment programs and algorithms known in the art to identify amino acid sequences or residues that align with the reference sequence or residue. The present disclosure provides Cas9 variants in which one or more amino acid residues identified by an asterisk in SEQ ID NO: 9 are mutated as described herein. Residues in a Cas9 sequence other than SEQ ID NO: 9 that correspond to the residues identified by asterisks in SEQ ID NO: 9 are referred to herein as "homologous" or "corresponding" residues. Such homologous residues can be identified by sequence alignment, e.g., as described above, by identifying sequences or residues that align with a reference sequence or residue. Similarly, mutations in a Cas9 sequence other than SEQ ID NO: 9 that correspond to the mutations identified herein in SEQ ID NO: 9, e.g., mutations of residues 10, 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 840, 1219, and 1329 of SEQ ID NO: 9, are referred to as "homologous" or "corresponding" mutations. For example, in the four aligned sequences above, the mutations corresponding to the D10A mutation in S1 are D10A in S2, D9A in S3, and D13A in S4. The mutations corresponding to H840A in S1 are H850A in S2, H842A in S3, and H560 in S4. The mutations corresponding to X1219V in S1 are X1228V in S2, X1226 in S3, and X903V in S4, and so on.
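Homologous residue numbers such as D9 in S3 or D13 in S4 follow from counting non-gap characters in each aligned row up to the column that contains the reference residue. The following is a minimal Python sketch of that bookkeeping. It assumes the alignment rows (e.g., exported from COBALT) are available as equal-length strings with '-' for gaps. The toy alignment is hypothetical.

```python
def map_residue(ref_row: str, other_row: str, ref_pos: int):
    """Given two rows of an alignment ('-' = gap), return the 1-based residue
    number in the other sequence aligned to residue ref_pos of the reference,
    or None if that column is a gap in the other sequence."""
    if len(ref_row) != len(other_row):
        raise ValueError("alignment rows must have equal length")
    i = j = 0  # running residue counters for the reference and the other row
    for a, b in zip(ref_row, other_row):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and i == ref_pos:
            return j if b != "-" else None
    raise ValueError("ref_pos exceeds the reference length")


# Toy alignment (hypothetical): the reference D3 aligns with D2 of the other sequence.
ref_row = "MKD-YS"
other_row = "M-DAYS"
print(map_residue(ref_row, other_row, 3))  # 2
print(map_residue(ref_row, other_row, 2))  # None (reference K2 aligns to a gap)
```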
A total of 250 Cas9 sequences (SEQ ID NOs: 10-262) from different species were aligned using the same algorithms and alignment parameters described above, and residues homologous to residues 10, 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 840, 1219, and 1329 were identified in the same manner. The alignment is provided below. The HNH domain (bold and underlined) and the RuvC domain (boxed) are indicated for each sequence. In the alignment, the residues corresponding to amino acid residues 10, 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 840, 1219, and 1329 of SEQ ID NO: 9 are boxed in SEQ ID NO: 10, allowing the corresponding amino acid residues to be identified in the other aligned sequences.
Provided herein are Cas9 variants having one or more mutations in amino acid residues homologous to amino acid residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of SEQ ID NO: 9. In some embodiments, the Cas9 variants provided herein comprise (i) mutations corresponding to the D10A and H840A mutations in SEQ ID NO: 9, which yield a dCas9 with no nuclease activity, and (ii) at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations of amino acid residues homologous to amino acid residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of SEQ ID NO: 9.
In some embodiments, the Cas9 variants provided herein comprise (i) a mutation corresponding to D10A in SEQ ID NO: 9, which yields a partially nuclease-inactive dCas9 that can nick the non-target strand but not the target strand, and (ii) at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations of amino acid residues homologous to amino acid residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of SEQ ID NO: 9.
Additional suitable Cas9 sequences, in which amino acid residues homologous to residues 10, 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 840, 1219, and/or 1329 of SEQ ID NO: 9 can be identified, are known to those of skill in the art. See, e.g., Fonfara et al., "Phylogeny of Cas9 determines functional exchangeability of dual-RNA and Cas9 among orthologous type II CRISPR-Cas systems," Nucleic Acids Res. 2013, doi:10.1093/nar/gkt1074, Supplementary Table S2 and Supplementary Figure S2, the entire contents of which are incorporated herein by reference. The present disclosure provides Cas9 variants of the sequences provided herein or known in the art that comprise one or more of the mutations provided herein (e.g., at least one, at least two, at least three, at least four, at least five, at least six, or at least seven), e.g., mutations of one or more amino acid residues homologous to amino acid residues 10, 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 840, 1219, and/or 1329 of SEQ ID NO: 9, such as Cas9 variants comprising the A262T, K294R, S409I, E480K, E543D, M694I, and/or E1219V mutations.
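As a non-authoritative illustration of the substitution notation used here and in the claims (wild-type residue, position, new residue, with "X" standing for any wild-type residue), the following Python sketch parses such strings and applies them to a sequence. The helper names are hypothetical, and the toy sequence is only a short fragment, not a full Cas9.

```python
import re
from typing import List, NamedTuple

# Wild-type residue, 1-based position, new residue (e.g., "A262T").
# "X" as the wild-type letter is treated as "any residue".
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


class Substitution(NamedTuple):
    wild_type: str
    position: int
    new: str


def parse_substitution(text: str) -> Substitution:
    match = _SUBSTITUTION.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a substitution like 'A262T': {text!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitutions(sequence: str, substitutions: List[str]) -> str:
    """Return a copy of `sequence` carrying the given substitutions."""
    residues = list(sequence)
    for text in substitutions:
        sub = parse_substitution(text)
        index = sub.position - 1
        if not 0 <= index < len(residues):
            raise IndexError(f"position {sub.position} is outside the sequence")
        # A named wild-type residue must match; "X" accepts whatever is there.
        if sub.wild_type != "X" and residues[index] != sub.wild_type:
            raise ValueError(
                f"{text}: expected {sub.wild_type} at {sub.position}, found {residues[index]}"
            )
        residues[index] = sub.new
    return "".join(residues)


toy = "MDKKYSIGLDIG"  # toy fragment, not a full Cas9 sequence
print(apply_substitutions(toy, ["D10A", "X5V"]))  # MDKKVSIGLAIG
```

Checking the stated wild-type residue guards against applying a mutation to the wrong numbering, which is exactly the problem the homologous-residue mapping addresses.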
Example 4: Evolution of Cas9 with broadened PAM specificity
Using phage-assisted continuous evolution (PACE) of Streptococcus pyogenes Cas9 on NNN PAM libraries, Cas9 variants with broadened PAM specificity were evolved that have higher activity on many non-canonical PAMs. These Cas9 variants retain their native DNA binding and cleavage activity and can be used with existing Cas9-based tools. It is hypothesized that modulating the interaction of Cas9 with DNA can modify and extend its PAM specificity. Other Cas9 proteins, such as Staphylococcus aureus Cas9, can also be engineered by such methods to alter and expand their PAM specificity. Other methods of modulating DNA binding, such as targeted mutagenesis of the Cas9 protein, fusion with DNA-binding proteins, and the use of multiple Cas9 proteins linked to each other, can also expand the range of PAMs that can be targeted.
Evolved Cas9. Phage were first evolved by overnight propagation in the presence of the mutagenesis plasmid (MP), and the resulting phage, containing the mutations described above, were then used in PACE. Twenty-four individual phage clones from the PACE run were sequenced, and the mutations found in the Cas9 gene are recorded in Table 3 below. Cas9 genes containing these mutations were cloned from the phage into plasmids to test DNA binding and cleavage activity.
Table 3: Cas9 mutations
GFP activation in human cell culture. Cas9 variants were tested using a GFP reporter activated by dCas9-VPR; in this assay, dCas9-VPR activates the GFP signal. The assay was first performed with a 5'-NGG-3' PAM (figs. 4A and 4B) and then with GFP reporter libraries that all contain the same Cas9 target site but have NNN at the PAM position (figs. 5A and 5B). Wild-type Cas9 without mutations (pJH306), Cas9 evolved by overnight propagation (pJH562), and Cas9 evolved by PACE (pJH599-pJH605) were tested. The mutations in each Cas9 are described in Table 3 above.
dCas9-VPR on all 64 PAM sequences. pJH306 (WT dCas9-VPR) and pJH599 (dCas9-VPR with the mutations A262T, S409I, E480K, E543D, M694I, and E1219V) were tested on all 64 PAM sequences (FIG. 6). GFP was activated using dCas9-VPR as described previously, with a different reporter plasmid in each well to determine the activation efficiency for each of the 64 PAM sequences. The mean GFP fluorescence of all transfected cells, gated by iRFP signal, was measured. For all PAM sequences, pJH599 showed improved or similar activation levels compared with pJH306.
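The per-PAM readout described above can be sketched in Python as follows, under stated assumptions: each well is represented as a hypothetical list of (GFP, iRFP) intensity pairs, and iRFP gating is reduced to a single fixed threshold, whereas real gating would be set in flow cytometry software. The function names are illustrative only.

```python
import math
from itertools import product
from statistics import mean

# All 64 NNN PAMs, one reporter well per PAM as in the 64-PAM experiment.
ALL_PAMS = ["".join(bases) for bases in product("ACGT", repeat=3)]


def gated_mean_gfp(cells, irfp_threshold):
    """Mean GFP of cells whose iRFP (transfection marker) exceeds the threshold.

    `cells` is a list of (gfp, irfp) intensity pairs for one well.
    """
    gfp_values = [gfp for gfp, irfp in cells if irfp > irfp_threshold]
    return mean(gfp_values) if gfp_values else math.nan


def activation_ratio(wt_wells, evolved_wells, irfp_threshold):
    """Evolved / wild-type gated mean GFP for every PAM well (NaN if undefined)."""
    ratios = {}
    for pam, wt_cells in wt_wells.items():
        wt = gated_mean_gfp(wt_cells, irfp_threshold)
        evolved = gated_mean_gfp(evolved_wells.get(pam, []), irfp_threshold)
        ratios[pam] = evolved / wt if wt else math.nan
    return ratios
```

A ratio of about 1 or higher for a PAM would correspond to "improved or similar" activation by the evolved variant relative to wild type.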
In vitro cleavage assay. Expressed and purified wild-type Cas9 (WT) and Cas9 with the E1219V mutation (1) were tested for their ability to cleave DNA with different PAMs (fig. 7). Cas9 was incubated with dsDNA containing the target site. Cleavage was measured by running the DNA on a gel and comparing the amount of uncut substrate with the amount of cleaved product, which runs faster because of its smaller size. The E1219V mutation was found to increase the cleavage activity of Cas9 on non-canonical PAMs while maintaining its activity on the 5'-NGG-3' PAM.
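A minimal sketch of how a cleaved fraction could be computed from gel band intensities, assuming background-subtracted intensities that are proportional to DNA mass (e.g., an intercalating stain). The function name and the example intensities are hypothetical.

```python
def fraction_cleaved(uncut: float, product_bands: list) -> float:
    """Fraction of substrate cleaved, from background-subtracted band intensities.

    Assumes band intensity is proportional to DNA mass, so the product bands
    from one cut molecule add up to the mass of that molecule and mass
    fractions equal molecule fractions.
    """
    cleaved = sum(product_bands)
    total = uncut + cleaved
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return cleaved / total


# Hypothetical lane: uncut band 30, two product bands 40 and 30 (arbitrary units).
print(fraction_cleaved(30.0, [40.0, 30.0]))  # 0.7
```

Summing both product bands avoids undercounting cleavage when the two fragments differ in length and therefore in staining.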
Evolution of other systems. In addition to Streptococcus pyogenes Cas9, other Cas9 systems, such as those from Staphylococcus aureus, Streptococcus thermophilus, Neisseria meningitidis, and T., can also be evolved. The data show that phage containing Staphylococcus aureus Cas9 can also be evolved to extend its PAM specificity using a system similar to that used to evolve Streptococcus pyogenes Cas9.
Adjusting PAM specificity. Cas9 can be modified to expand the range of PAMs that can be targeted by mutating neutral and negatively charged amino acids to positively charged amino acids. In general, incorporating mutations into the Cas9 protein that produce a net increase in positive charge can increase the DNA-binding affinity of Cas9. Other residues in Streptococcus pyogenes Cas9 that can be mutated, in combination with the Cas9 mutations provided herein, to increase PAM targeting include residues previously identified as altering PAM specificity (D1135, G1218, R1333, R1335, and T1337) (ref. 38) and residues that can increase Cas9 activity (S845 and L847) (ref. 37). Conversely, previously identified mutations of arginine, histidine, and lysine residues to alanine that increase Cas9 specificity (ref. 37), and previously identified mutations of asparagine, arginine, and glutamine residues to alanine (ref. 39), may reduce tolerance for non-canonical PAMs, because these mutations may weaken the interaction between Cas9 and DNA.
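The "net increase in positive charge" can be tallied with simple bookkeeping. The sketch below assumes idealized side-chain charges at neutral pH (Asp and Glu -1, Lys and Arg +1) and treats histidine as neutral, which is a simplification; the helper name is hypothetical.

```python
# Approximate side-chain charges at neutral pH. Histidine is treated as neutral,
# a simplifying assumption (its pKa is close to physiological pH).
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def net_charge_change(substitutions):
    """Sum of side-chain charge changes for substitutions such as 'E480K'."""
    delta = 0
    for text in substitutions:
        wild_type, new = text[0].upper(), text[-1].upper()
        delta += SIDE_CHAIN_CHARGE.get(new, 0) - SIDE_CHAIN_CHARGE.get(wild_type, 0)
    return delta


# Mutations listed for pJH599 in the 64-PAM experiment.
print(net_charge_change(["A262T", "S409I", "E480K", "E543D", "M694I", "E1219V"]))  # 3
```

Under this approximation, E480K contributes +2 and E1219V contributes +1, while E543D is charge-neutral.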
Fusions that modulate PAM specificity. Programmable DNA-binding proteins, such as zinc finger domains, TALEs, and other Cas9 proteins, can be fused to Cas9 to improve its ability to target nucleotide sequences with canonical or non-canonical PAMs, e.g., by increasing activity, specificity, or efficiency. For example, nuclease-null dCas9 can be fused to nuclease-active Cas9 to increase the ability of the nuclease-active Cas9 to target different PAM sequences; an example of such a fusion is shown in figure 8. The two Cas9 proteins may be from the same species or from different species. In addition, both Cas9 proteins can be nuclease-null dCas9 and can be further fused to effector proteins such as VP64, VP64-p65-Rta, FokI, GFP and other fluorescent proteins, deaminases, or any of the effector proteins provided herein.
Using Cas9 to localize other nucleases and DNA-binding proteins. Cas9 can also be used to overcome the natural binding specificity of other proteins by targeting them to their DNA targets. DNA nucleases, recombinases, deaminases, and other effectors usually have natural DNA specificity, and fusing them to Cas9 can overcome and extend that native specificity. The gRNA targets Cas9 near the target site of the DNA effector, which helps localize the effector to its target site.
dCas9-VPR on the NNNNN PAM library. To test whether the evolved Cas9 gained specificity at the fourth and fifth PAM positions, dCas9-VPR was tested on NNNNN PAM libraries. As with the NNN library, most constructs (e.g., pJH562, pJH599, pJH600, pJH601, pJH602, pJH603, and pJH605) showed improved activity. pJH599 consistently showed improvement in both the percentage of cells showing GFP activation (FIG. 9A) and the mean fluorescence of those cells (FIG. 9B).
Cas9 GFP cleavage. Wild-type Cas9 (pJH407) was compared with nuclease-active evolved Cas9 (pJH760) (fig. 10). pJH760 contains the same mutations as pJH599 but lacks the D10A and H840A nuclease-inactivating mutations and the VPR fusion. Cas9 cleaves a genome-integrated GFP gene, and activity is measured by loss of GFP signal in the cells. At a site with a 5'-NGG-3' PAM, pJH407 and pJH760 showed comparable activity. At a site with a GAT PAM, pJH760 showed significantly higher activity than pJH407.
Example 5: Cas9:DNA-editing enzyme fusion proteins
The present disclosure also provides Cas9 fused to a DNA-editing enzyme for targeted editing of DNA sequences. Fig. 11 shows a DNA-editing enzyme-dCas9:sgRNA complex bound to a double-stranded DNA substrate. The DNA-editing enzyme shown is a deaminase. In these sequences, the structural features shown in FIG. 11 are marked as follows: 36 bp, underlined; sgRNA target sequence, bold; PAM, boxed; 21 bp, italic.
Example 6: PAM depletion assay
In E. coli, a PAM sequence library is encoded on a plasmid that also carries an antibiotic resistance gene. If Cas9 can cleave the plasmid at a given PAM sequence, that plasmid is not replicated and is lost, so only uncleaved plasmids remain in the population. The plasmids cleaved by Cas9 can therefore be determined by high-throughput sequencing of the initial and final plasmid populations. The fraction of the library made up of each PAM sequence is obtained by dividing the number of reads containing that PAM sequence by the total number of reads. The depletion score is then calculated by dividing the fraction of the library containing the PAM sequence before selection by the fraction containing it after selection. A higher depletion score indicates higher cleavage activity of Cas9 for that PAM sequence. Fig. 12 shows the results of the PAM depletion assay.
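The depletion-score calculation described above can be sketched in Python, assuming that the sequencing reads have already been reduced to one PAM string per read (that preprocessing step is not shown). The optional pseudocount is an addition not described in this example; it is off by default, and a PAM that vanishes completely after selection is given an infinite score.

```python
from collections import Counter


def depletion_scores(reads_before, reads_after, pseudocount=0.0):
    """Depletion score per PAM = fraction before selection / fraction after selection.

    `reads_before` and `reads_after` are iterables of PAM strings, one per read.
    The pseudocount (not part of the described method) may be added to
    after-selection counts to avoid infinite scores.
    """
    before = Counter(reads_before)
    after = Counter(reads_after)
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("both read populations must be non-empty")
    scores = {}
    for pam, count in before.items():
        fraction_before = count / total_before
        fraction_after = (after[pam] + pseudocount) / total_after
        scores[pam] = fraction_before / fraction_after if fraction_after else float("inf")
    return scores


before = ["AGG"] * 50 + ["TTT"] * 50
after = ["AGG"] * 10 + ["TTT"] * 90
print(depletion_scores(before, after)["AGG"])  # 5.0
```

In this toy example AGG falls from half of the library to a tenth, giving a 5-fold depletion, while TTT becomes enriched (score below 1), as expected for a PAM that is not cleaved.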
Some PAM sequences that were not cleaved by wild-type Cas9 were cleaved by the evolved Cas9 (xCas9v1.0, pJH760). Notably, all PAM sequences of the form NGN and NNG, as well as GAA and GAT, showed greater than 10-fold depletion with xCas9. A single G at the second or third PAM position can therefore be sufficient for cleavage by the newly evolved Cas9, substantially expanding the sequence space that can be targeted using Cas9. The PAM depletion scores are given in Table 4.
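To make concrete how much of the 64-PAM space the NGN and NNG formats cover, the following Python sketch enumerates all NNN PAMs and keeps those matching either pattern, with "N" taken to mean any base. The helper name `matches` is hypothetical.

```python
from itertools import product

ALL_PAMS = ["".join(bases) for bases in product("ACGT", repeat=3)]


def matches(pam: str, pattern: str) -> bool:
    """True if `pam` fits `pattern`, where 'N' in the pattern matches any base."""
    return len(pam) == len(pattern) and all(
        p == "N" or p == b for b, p in zip(pam, pattern)
    )


ngn_or_nng = [pam for pam in ALL_PAMS if matches(pam, "NGN") or matches(pam, "NNG")]
print(len(ngn_or_nng))  # 28 (16 NGN + 16 NNG - 4 NGG counted twice)
```

Adding GAA and GAT brings the total to 30 of the 64 possible PAMs, compared with the 4 NGG PAMs of the canonical format.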
Table 4: PAM depletion scores
Example 7: GFP cleavage in mammalian cells
Wild-type or evolved Cas9 and gRNAs were transfected into mammalian cells containing a genome-integrated GFP gene. Different gRNAs target different sites with different PAM sequences, and cleavage of GFP by Cas9 results in loss of GFP signal. The GFP signal was quantified by flow cytometry after 5 days. As shown in fig. 13, the evolved Cas9 cleaved GFP in mammalian cells, consistent with the results of the GFP activation and PAM depletion assays. High-throughput sequencing around the cleavage sites confirmed the flow cytometry results: the frequency of insertions and deletions (indels) was proportional to the percent cleavage measured by flow cytometry.
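One way to quantify GFP loss and check its agreement with indel frequencies is sketched below. The background correction against an untreated control is an assumed normalization, not one stated in this example, and the Pearson coefficient is only one way to test the linear relationship described; all names and values are hypothetical.

```python
import math
from statistics import mean


def percent_gfp_loss(gfp_negative_fraction: float, control_negative_fraction: float) -> float:
    """Percent GFP loss corrected for GFP-negative cells in an untreated control."""
    if not 0.0 <= control_negative_fraction < 1.0:
        raise ValueError("control fraction must be in [0, 1)")
    corrected = (gfp_negative_fraction - control_negative_fraction) / (1.0 - control_negative_fraction)
    return 100.0 * max(corrected, 0.0)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-site values: indel % from sequencing vs. GFP loss % from cytometry.
print(pearson_r([10.0, 20.0, 30.0], [5.0, 10.0, 15.0]))  # 1.0
```

A coefficient near 1 across many sites would be consistent with indel frequency tracking the cleavage measured by flow cytometry.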
Example 8: Further evolution on HHH PAM libraries
Because SpCas9 prefers G residues at the second and third PAM positions, evolution was continued on HHH (H = A, C, or T) PAM libraries, starting from the end point of the previous evolution. After evolution, 13 colonies were sequenced and many new mutations were identified. Three mutations, E1219V, E480K, and E543D, were found in all clones. Many clones carried either the S267G/K294R/Q1256K mutations or the A262T/S409I mutations, but these two sets were never seen together, indicating that the clones took two different paths across the evolutionary landscape. The new mutations are given in Table 5.
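The library design and the clone analysis described above can be sketched in Python. The HHH enumeration follows directly from H = A, C, or T; the clone genotypes below are hypothetical placeholders, not the sequenced clones, and the helper names are illustrative only.

```python
from itertools import product

# H = A, C, or T: the 27 PAMs in an HHH library contain no G at any position.
HHH_PAMS = ["".join(bases) for bases in product("ACT", repeat=3)]


def shared_mutations(clones):
    """Mutations present in every sequenced clone."""
    return set.intersection(*clones.values()) if clones else set()


def clones_with_both(clones, group_a, group_b):
    """IDs of clones carrying at least one mutation from each group."""
    return [cid for cid, muts in clones.items() if muts & group_a and muts & group_b]


# Hypothetical genotypes, for illustration only (not the sequenced clones).
clones = {
    "c1": {"E1219V", "E480K", "E543D", "S267G", "K294R", "Q1256K"},
    "c2": {"E1219V", "E480K", "E543D", "A262T", "S409I"},
}
print(sorted(shared_mutations(clones)))  # ['E1219V', 'E480K', 'E543D']
print(clones_with_both(clones, {"S267G", "K294R", "Q1256K"}, {"A262T", "S409I"}))  # []
```

An empty result from `clones_with_both` is what "never seen together" means operationally for two mutation sets.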
Table 5: New mutations
Example 9: Further evolved Cas9 proteins for gene editing
pJH760, described in Example 6, was tested on several new targets in the PAM depletion assay. Four new targets were selected: re2 (GGGGCCACTAGGGACAGGAT (SEQ ID NO: 314)), a synthetic target previously used for GFP activation in mammalian cells; VEGF (GGGTGGGGGGAGTTTGCTCC (SEQ ID NO: 315)), a target within the VEGF gene; CLTA (GCAGATGTAGTGTTTCCACA (SEQ ID NO: 316)), a target within the CLTA gene; and CCR5D (TCACTATGCTGCCGCCCAGT (SEQ ID NO: 317)), a target within the CCR5D gene. The results of the PAM depletion assay are given in figures 14 to 17. In addition to canonical NGG sequences, the PAM depletion assay showed cleavage of most NGN and some NNG sequences.
The HHH PAM library was further evolved, starting from the end point of the previous evolution on the HHH (H = A, C, or T) PAM library. After evolution, 13 colonies were sequenced and many new mutations were identified. Three mutations, E1219V, E480K, and E543D, were found in all clones. Many clones carried either the K294R/Q1256K mutations or the A262T/S409I mutations, but these two sets were never seen together, indicating that the clones took two different paths across the evolutionary landscape. The new mutations are given in Tables 8 and 9 below.
As expected, activity varied across targets. PAM depletion scores are given in Table 10. NGN PAMs consistently showed cleavage activity on some targets. Among the xCas9v3.x mutants, which mutant had the best activity varied. Notably, xCas9v3.3 contains the K294R/Q1256K series of mutations, whereas the other three mutants (3.6, 3.7, and 3.8) contain the A262T/S409I series. xCas9v3.6 and 3.7 performed better than 3.8. Although 3.3 appears to have the highest activity in most cases, 3.6 and 3.7 perform better on certain PAM sequences. Figures 18 to 20 give the results of the PAM depletion assay for the three new targets described above.
An NNNNNNN PAM depletion library was constructed, and assays were performed to check for any fourth- or fifth-base specificity. As expected, initial results of the PAM depletion assay showed no preference at the fourth or fifth base.
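A simple way to look for fourth- or fifth-base specificity is to average depletion scores by the base at one PAM position and compare the group means. The sketch below assumes a dictionary of PAM strings to depletion scores; the grouping approach and the helper names are illustrative assumptions, not the analysis used in this example.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_base(scores, position):
    """Average depletion score grouped by the base at a 1-based PAM position."""
    groups = defaultdict(list)
    for pam, score in scores.items():
        groups[pam[position - 1]].append(score)
    return {base: mean(values) for base, values in sorted(groups.items())}


def max_min_ratio(means):
    """Highest over lowest group mean; a value near 1 suggests no preference.

    Assumes all group means are positive.
    """
    values = list(means.values())
    return max(values) / min(values)


# Hypothetical scores that do not depend on the fourth base.
scores = {"AGG" + b + "A": 10.0 for b in "ACGT"}
print(max_min_ratio(mean_score_by_base(scores, 4)))  # 1.0
```

Repeating the calculation with `position=5` gives the corresponding check for the fifth base.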
In summary, E1219V was found to be one of the earliest mutations fixed during evolution; in the crystal structure, this residue lies close to the PAM sequence. E480K and E543D were also seen in all clones early in evolution and may be important. K294R/Q1256K and A262T/S409I appear to represent two distinct evolutionary paths and may also be important. Their PAM sequence profiles appear to be slightly different, suggesting that they may contribute differently to determining PAM activity.
Table 6 lists diseases/disorders associated with T-to-C changes, i.e., human gene mutations that can be corrected by changing cytosine (C) to thymine (T). The gene name, gene symbol, and gene ID are indicated.
Table 7 lists diseases/disorders associated with A-to-G changes, i.e., human gene mutations that can be corrected by changing guanine (G) to adenine (A). The gene name, gene symbol, and gene ID are indicated.
Table 8: xCas9v3 mutations (K294R/Q1256K series)
Table 9: xCas9v3 mutations (A262T/S409I series)
Table 10: PAM depletion scores (xCas9v3.0-3.6 mutants)
Table 10 (continued): PAM depletion scores (xCas9v3.7-3.12 mutants)
References
1. Humbert O, Davis L, Maizels N. Targeted gene therapies: tools, applications, optimization. Crit Rev Biochem Mol. 2012;47(3):264-81. PMID:22530743.
2. Perez-Pinera P, Ousterout DG, Gersbach CA. Advances in targeted genome editing. Curr Opin Chem Biol. 2012;16(3-4):268-77. PMID:22819644.
3. Urnov FD, Rebar EJ, Holmes MC, Zhang HS, Gregory PD. Genome editing with engineered zinc finger nucleases. Nat Rev Genet. 2010;11(9):636-46. PMID:20717154.
4. Joung JK, Sander JD. TALENs: a widely applicable technology for targeted genome editing. Nat Rev Mol Cell Biol. 2013;14(1):49-55. PMID:23169466.
5. Charpentier E, Doudna JA. Biotechnology: Rewriting a genome. Nature. 2013;495(7439):50-1. PMID:23467164.
6. Pan Y, Xia L, Li AS, Zhang X, Sirois P, Zhang J, Li K. Biological and biomedical applications of engineered nucleases. Mol Biotechnol. 2013;55(1):54-62. PMID:23089945.
7. De Souza N. Primer: genome editing with engineered nucleases. Nat Methods. 2012;9(1):27. PMID:22312638.
8. Santiago Y, Chan E, Liu PQ, Orlando S, Zhang L, Urnov FD, Holmes MC, Guschin D, Waite A, Miller JC, Rebar EJ, Gregory PD, Klug A, Collingwood TN. Targeted gene knockout in mammalian cells by using engineered zinc-finger nucleases. Proc Natl Acad Sci U S A. 2008;105(15):5809-14. PMID:18359850.
9. Cargill M, Altshuler D, Ireland J, Sklar P, Ardlie K, Patil N, Lane CR, Lim EP, Kalyanaraman N, Nemesh J, Ziaugra L, Friedland L, Rolfe A, Warrington J, Lipshutz R, Daley GQ, Lander ES. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet. 1999;22(3):231-8. PMID:10391209.
10. Jansen R, van Embden JD, Gaastra W, Schouls LM. Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol. 2002;43(6):1565-75. PMID:11952905.
11. Mali P, Esvelt KM, Church GM. Cas9 as a versatile tool for engineering biology. Nat Methods. 2013;10(10):957-63. PMID:24076990.
12. Jore MM, Lundgren M, van Duijin E, Bultema JB, Westra ER, Waghmare SP, Wiedenheft B, Pul U, Wurm R, Wagner R, Beijer MR, Barendregt A, Shou K, Snijders AP, Dickman MJ, Doudna JA, Boekema EJ, Heck AJ, van der Oost J, Brouns SJ. Structural basis for CRISPR RNA-guided DNA recognition by Cascade. Nat Struct Mol Biol. 2011;18(5):529-36. PMID:21460843.
13. Horvath P, Barrangou R. CRISPR/Cas, the immune system of bacteria and archaea. Science. 2010;327(5962):167-70. PMID:20056882.
14. Wiedenheft B, Sternberg SH, Doudna JA. RNA-guided genetic silencing systems in bacteria and archaea. Nature. 2012;482(7385):331-8. PMID:22337052.
15. Gasiunas G, Siksnys V. RNA-dependent DNA endonuclease Cas9 of the CRISPR system: Holy Grail of genome editing? Trends Microbiol. 2013;21(11):562-7. PMID:24095303.
16. Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, Arkin AP, Lim WA. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 2013;152(5):1173-83. PMID:23452860.
17. Perez-Pinera P, Kocak DD, Vockley CM, Adler AF, Kabadi AM, Polstein LR, Thakore PI, Glass KA, Ousterout DG, Leong KW, Guilak F, Crawford GE, Reddy TE, Gersbach CA. RNA-guided gene activation by CRISPR-Cas9-based transcription factors. Nat Methods. 2013;10(10):973-6. PMID:23892895.
18. Mali P, Aach J, Stranges PB, Esvelt KM, Moosburner M, Kosuri S, Yang L, Church GM. CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat Biotechnol. 2013;31(9):833-8. PMID:23907171.
19. Gilbert LA, Larson MH, Morsut L, Liu Z, Brar GA, Torres SE, Stern-Ginossar N, Brandman O, Whitehead EH, Doudna JA, Lim WA, Weissman JS, Qi LS. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013;154(2):442-51. PMID:23849981.
20. Larson MH, Gilbert LA, Wang X, Lim WA, Weissman JS, Qi LS. CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat Protoc. 2013;8(11):2180-96. PMID:24136345.
21. Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, Norville JE, Church GM. RNA-guided human genome engineering via Cas9. Science. 2013;339(6121):823-6. PMID:23287722.
22. Cole-Strauss A, Yoon K, Xiang Y, Byrne BC, Rice MC, Gryn J, Holloman WK, Kmiec EB. Correction of the mutation responsible for sickle cell anemia by an RNA-DNA oligonucleotide. Science. 1996;273(5280):1386-9. PMID:8703073.
23. Tagalakis AD, Owen JS, Simons JP. Lack of RNA-DNA oligonucleotide (chimeraplast) mutagenic activity in mouse embryos. Mol Reprod Dev. 2005;71(2):140-4. PMID:15791601.
24. Ray A, Langer M. Homologous recombination: ends as the means. Trends Plant Sci. 2002;7(10):435-40. PMID:12399177.
25. Britt AB, May GD. Re-engineering plant gene targeting. Trends Plant Sci. 2003;8(2):90-5. PMID:12597876.
26. Vagner V, Ehrlich SD. Efficiency of homologous DNA recombination varies along the Bacillus subtilis chromosome. J Bacteriol. 1988;170(9):3978-82. PMID:3137211.
27. Saleh-Gohari N, Helleday T. Conservative homologous recombination preferentially repairs DNA double-strand breaks in the S phase of the cell cycle in human cells. Nucleic Acids Res. 2004;32(12):3683-8. PMID:15252152.
28. Lombardo A, Genovese P, Beausejour CM, Colleoni S, Lee YL, Kim KA, Ando D, Urnov FD, Galli C, Gregory PD, Holmes MC, Naldini L. Gene editing in human stem cells using zinc finger nucleases and integrase-defective lentiviral vector delivery. Nat Biotechnol. 2007;25(11):1298-306. PMID:17965707.
29. Conticello SG. The AID/APOBEC family of nucleic acid mutators. Genome Biol. 2008;9(6):229. PMID:18598372.
30. Reynaud CA, Aoufouchi S, Faili A, Weill JC. What role for AID: mutator, or assembler of the immunoglobulin mutasome? Nat Immunol. 2003;4(7):631-8.
31. Bhagwat AS. DNA-cytosine deaminases: from antibody maturation to antiviral defense. DNA Repair (Amst). 2004;3(1):85-9. PMID:14697763.
32. Navaratnam N, Sarwar R. An overview of cytidine deaminases. Int J Hematol. 2006;83(3):195-200. PMID:16720547.
33. Holden LG, Prochnow C, Chang YP, Bransteitter R, Chelico L, Sen U, Stevens RC, Goodman MF, Chen XS. Crystal structure of the anti-viral APOBEC3G catalytic domain and functional implications. Nature. 2008;456(7218):121-4. PMID:18849968.
34. Chelico L, Pham P, Petruska J, Goodman MF. Biochemical basis of immunological and retroviral responses to DNA-targeted cytosine deamination by activation-induced cytidine deaminase and APOBEC3G. J Biol Chem. 2009;284(41):27761-5. PMID:19684020.
35. Pham P, Bransteitter R, Goodman MF. Reward versus risk: DNA cytidine deaminases triggering immunity and disease. Biochemistry. 2005;44(8):2703-15. PMID:15723516.
36. Chen X, Zaro JL, Shen WC. Fusion protein linkers: property, design and functionality. Adv Drug Deliv Rev. 2013;65(10):1357-69. PMID:23026637.
37. Slaymaker, I.M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science, doi:10.1126/science.aad5227 (2015).
38. Kleinstiver, B.P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481-485, doi:10.1038/nature14592 (2015).
39. Pattanayak, V. et al. High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nature Biotechnology 31, 839-843, doi:10.1038/nbt.2673 (2013).
40. Shcherbakova, D.M. & Verkhusha, V.V. Near-infrared fluorescent proteins for multicolor in vivo imaging. Nature Methods 10, 751-754, doi:10.1038/nmeth.2521 (2013).
41. Kleinstiver, B.P. et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490-495, doi:10.1038/nature16526 (2016).
All publications, patents, patent applications, and database entries (e.g., sequence database entries) mentioned herein, such as those in the background, summary, detailed description, examples, and/or references sections, are incorporated by reference herein in their entirety as if each individual publication, patent, patent application, and database entry were specifically and individually incorporated by reference herein. In case of conflict, the present application, including any definitions herein, will control.
Equivalent embodiments and scope
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the examples described herein. The scope of the present disclosure is not intended to be limited by the foregoing description, but rather is as set forth in the following claims.
The articles "a," "an," and "the" may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include an "or" between two or more group members are deemed to be satisfied if one, more than one, or all of the group members are present, unless indicated to the contrary or otherwise evident from the context. Disclosure of a group comprising an "or" between two or more group members provides embodiments in which one member of the group is present, embodiments in which more than one member of the group is present, and embodiments in which all members are present. For the sake of brevity, these embodiments are not separately set forth herein, but it is to be understood that each of these embodiments is provided herein and may be specifically claimed or disclaimed.
It should be understood that the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, or descriptive terms from one or more claims or from one or more relevant portions of the specification are introduced into another claim. For example, a claim that depends from another claim may be amended to include one or more limitations that exist in any other claim that depends from the same base claim. Furthermore, where the claims recite a composition, it is to be understood that methods of making and using the composition according to any of the methods of making or using disclosed herein, or according to methods known in the art, if any, are included, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise.
Where elements are presented as lists, for example, in a Markush group format, it is to be understood that each possible subset of elements is also disclosed, and that any element or subset of elements can be removed from the group. It should also be noted that the term "comprising" is intended to be open-ended and allows for the inclusion of additional elements or steps. It will be understood that, in general, where an embodiment, product, or method is referred to as comprising a particular element, feature, or step, an embodiment, product, or method that consists of, or consists essentially of, such element, feature, or step is also provided. For the sake of brevity, these embodiments are not separately set forth herein, but it is to be understood that each of these embodiments is provided herein and may be specifically claimed or disclaimed.
Where ranges are given, endpoints are included. Further, it is to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values expressed as ranges can, in some embodiments, take on any specific value within the range recited, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise. For the sake of brevity, the values in each range are not individually set forth herein, but it is to be understood that each of these values is provided herein and can be specifically claimed or disclaimed. It is further understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values expressed as ranges can assume any subrange within the given range, wherein the endpoints of the subrange are expressed to the same degree of accuracy as the tenth of the unit of the lower limit of the range.
In addition, it is to be understood that any particular embodiment of the invention may be explicitly excluded from any one or more claims. Where a range is given, any value within the range can be explicitly excluded from any one or more claims. Any embodiment, element, feature, application, or aspect of the compositions and/or methods of the present invention may be excluded from any one or more claims. In the interest of brevity, all embodiments in which one or more elements, features, objects, or aspects are excluded are not explicitly set forth herein.
Claims (434)
1. A recombinant Cas9 protein comprising an amino acid sequence at least 80% identical to the amino acid sequence of Cas9 as provided in any one of SEQ ID NOs: 9-262,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations of amino acid residues selected from the group consisting of: amino acid residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, and 1362 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any of the amino acid sequences provided in SEQ ID NOs: 10-262, and
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein.
2. The Cas9 protein of claim 1, comprising an amino acid sequence at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of Cas9 as provided in any one of SEQ ID NOs: 9-262.
3. The Cas9 protein of claim 1, wherein the Cas9 protein comprises RuvC and HNH domains.
4. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: X262T, X267G, X294R, X405I, X409I, X480K, X543D, X694I, X1219V, X1224K, and X1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
5. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: A262T, S267G, K294R, F405I, S409I, E480K, E543D, M694I, E1219V, N1224K, Q1256K, and L1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
6. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the X1219V mutation of the amino acid sequence provided in SEQ ID No. 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID nos. 10-262, wherein X represents any amino acid.
7. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the E1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
8. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the X294R mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
9. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the K294R mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
10. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the X1256K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
11. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the Q1256K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
12. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the X694I mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
13. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the M694I mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
14. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the X543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
15. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the E543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
16. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the X480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
17. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the E480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
18. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the X409I mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
19. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the S409I mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
20. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the X262T mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
21. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises the A262T mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
22. A recombinant Cas9 protein comprising an amino acid sequence at least 80% identical to the amino acid sequence of Cas9 as provided in any one of SEQ ID NOs: 9-262,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of: residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, and 1362 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, and
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein.
23. The Cas9 protein of claim 22, comprising an amino acid sequence at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of Cas9 as provided in any one of SEQ ID NOs: 9-262.
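Claims 22 and 23 turn on percent identity to SEQ ID NOs: 9-262, but the claims do not fix the alignment method or the denominator. The sketch below makes one common choice explicit as an assumption: identical residues are counted over the columns of a pre-computed pairwise alignment and divided by the ungapped length of the reference row. The names `percent_identity`, `CLAIMED_IDENTITIES`, and `highest_identity_met` are illustrative only.

```python
from typing import Optional

# Identity thresholds recited in claims 22 and 23 (percent).
CLAIMED_IDENTITIES = (80.0, 85.0, 90.0, 92.0, 95.0, 96.0, 97.0, 98.0, 99.0, 99.5)


def percent_identity(ref_aligned: str, query_aligned: str) -> float:
    """Percent identity of a query to a reference over one pairwise alignment.

    Assumption: matches are counted per alignment column, and the denominator
    is the number of residues (non-gap characters) in the reference row.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned rows must have equal length")
    ref_length = sum(1 for c in ref_aligned if c != "-")
    if ref_length == 0:
        raise ValueError("reference row contains no residues")
    matches = sum(1 for r, q in zip(ref_aligned, query_aligned) if r != "-" and r == q)
    return 100.0 * matches / ref_length


def highest_identity_met(identity: float) -> Optional[float]:
    """Largest recited threshold that `identity` reaches, or None if below 80%."""
    met = [t for t in CLAIMED_IDENTITIES if identity >= t]
    return max(met) if met else None
```

Other conventions (dividing by alignment length, or by the shorter sequence) give different numbers for gapped alignments, so the denominator should be stated whenever a figure is reported.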
24. The Cas9 protein of claim 22 or 23, wherein the Cas9 protein comprises RuvC and HNH domains.
25. The Cas9 protein of any one of claims 22-24, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: X262T, X267G, X294R, X405I, X409I, X480K, X543D, X694I, X1219V, X1224K, X1256K, and X1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
26. The Cas9 protein of any one of claims 22-25, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: A262T, S267G, K294R, F405I, S409I, E480K, E543D, M694I, E1219V, N1224K, Q1256K, and L1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
27. The Cas9 protein of any one of claims 22-26, wherein the amino acid sequence of the Cas9 protein comprises the X1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
28. The Cas9 protein of any one of claims 22-27, wherein the amino acid sequence of the Cas9 protein comprises the E1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
29. The Cas9 protein of any one of claims 22-28, wherein the amino acid sequence of the Cas9 protein comprises the X480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
30. The Cas9 protein of any one of claims 22-29, wherein the amino acid sequence of the Cas9 protein comprises the E480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
31. The Cas9 protein of any one of claims 22-30, wherein the amino acid sequence of the Cas9 protein comprises the X543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
32. The Cas9 protein of any one of claims 22-31, wherein the amino acid sequence of the Cas9 protein comprises the E543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
33. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the X480K, X543D, and X1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
34. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the X262T, X409I, X480K, X543D, X694I, and X1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
35. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the X294R, X480K, X543D, X1219V, X1256K, and X1362P mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
36. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the X294R, X480K, X543D, X1219V, and X1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
37. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the X267G, X294R, X480K, X543D, X1219V, X1224K, and X1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
38. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the X262T, X405I, X409I, X480K, X543D, X694I, and X1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
39. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the E480K, E543D, and E1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
40. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the A262T, S409I, E480K, E543D, M694I, and E1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
41. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the K294R, E480K, E543D, E1219V, Q1256K, and L1362P mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
42. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the K294R, E480K, E543D, E1219V, and Q1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
43. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the S267G, K294R, E480K, E543D, E1219V, N1224K, and Q1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
44. The Cas9 protein of any one of claims 22-32, wherein the amino acid sequence of the Cas9 protein comprises the A262T, F405I, S409I, E480K, E543D, M694I, and E1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
45. The Cas9 protein of any one of claims 24-44, wherein the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
46. The Cas9 protein of any one of claims 24-45, wherein the amino acid sequence of the HNH domain is identical to the amino acid sequence of the HNH domain of SEQ ID NO: 9.
47. The Cas9 protein of any one of claims 24-46, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
48. The Cas9 protein of any one of claims 24-47, wherein the amino acid sequence of the RuvC domain is identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 9.
49. The Cas9 protein of any one of claims 22-48, wherein the Cas9 protein comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
50. The Cas9 protein of any one of claims 22-48, wherein the Cas9 protein comprises the D10X1 and/or H840X2 mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X1 is any amino acid other than D, and X2 is any amino acid other than H.
51. The Cas9 protein of any one of claims 22-50, wherein the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
52. The Cas9 protein of claim 51, wherein the Cas9 protein comprises H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
53. The Cas9 protein of any one of claims 22-51, wherein the Cas9 protein comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
54. The Cas9 protein of claim 53, wherein the Cas9 protein comprises D at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
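Claims 49-54 distinguish variants by the residue at position 10 (the RuvC active-site aspartate, D10) and at position 840 (the HNH active-site histidine, H840) of SEQ ID NO: 9. The sketch below reads those two residues from a sequence numbered like SEQ ID NO: 9 and labels the variant. Following the D10X1/H840X2 wording of claim 50, it assumes that any residue other than D at position 10, or other than H at position 840, counts as the recited substitution; the function name and the labels are descriptive choices, not terms defined in the patent.

```python
def catalytic_state(sequence: str) -> str:
    """Label a Cas9 sequence (numbered like SEQ ID NO: 9) by its residues 10 and 840."""
    if len(sequence) < 840:
        raise ValueError("sequence too short to contain residue 840")
    ruvc_intact = sequence[10 - 1] == "D"  # D10: RuvC catalytic residue retained
    hnh_intact = sequence[840 - 1] == "H"  # H840: HNH catalytic residue retained
    if ruvc_intact and hnh_intact:
        return "nuclease (RuvC and HNH intact)"
    if hnh_intact:
        return "nickase (RuvC substituted, HNH intact)"
    if ruvc_intact:
        return "nickase (HNH substituted, RuvC intact)"
    return "nuclease-dead (RuvC and HNH substituted)"
```

Under this reading, the D10A variant that keeps H840 (claim 52) and the H840A variant that keeps D10 are the two single-domain nickases, while carrying both substitutions yields a catalytically inactive protein.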
55. The Cas9 protein of any one of claims 22-54, wherein the Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
56. A recombinant Cas9 protein comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9, or a fragment thereof,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of: residues 262, 267, 294, 405, 409, 480, 543, 694, 1219, 1224, 1256, and 1362 of the amino acid sequence provided in SEQ ID NO: 9,
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein, and
wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
57. The Cas9 protein of claim 56, wherein the Cas9 protein comprises RuvC and HNH domains.
58. The Cas9 protein of claim 56 or 57, wherein, on a target sequence whose 3' end is not immediately adjacent to the canonical PAM sequence (5'-NGG-3'), the Cas9 protein exhibits an activity that is increased at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold relative to the activity of the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9 on the same target sequence.
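Claim 58 expresses the improvement as a fold increase over the Streptococcus pyogenes Cas9 of SEQ ID NO: 9 on the same non-NGG target. Assuming both activities have already been measured in the same assay and units (the claims do not prescribe a normalization), the largest recited threshold a variant clears can be computed as follows; the function name is hypothetical.

```python
from typing import Optional

# Fold-increase thresholds recited in claims 58 and 109.
CLAIMED_FOLDS = (5, 10, 50, 100, 500, 1_000, 5_000, 10_000, 50_000, 100_000, 500_000, 1_000_000)


def highest_fold_met(variant_activity: float, reference_activity: float) -> Optional[int]:
    """Largest recited fold threshold reached by variant/reference activity, or None."""
    if reference_activity <= 0:
        # A zero or undetectable reference signal makes the ratio undefined.
        raise ValueError("reference activity must be positive to form a ratio")
    fold = variant_activity / reference_activity
    met = [threshold for threshold in CLAIMED_FOLDS if fold >= threshold]
    return max(met) if met else None
```

Raising on a non-positive reference, rather than returning infinity, forces the caller to decide how to treat a wild-type signal at background, which matters for the largest thresholds.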
59. The Cas9 protein of claim 58, wherein the 3' end of the target sequence is directly adjacent to an AGC, GAG, TTT, GTG, or CAA sequence.
60. The Cas9 protein of claim 58 or 59, wherein the 3' end of the target sequence is directly adjacent to a sequence selected from the group consisting of: CAC, GAT, TAA, ACG, CGA, and CGT.
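Claims 58-60 classify targets by the three nucleotides immediately 3' of the target sequence. A minimal sketch, assuming the target is given by a 0-based start index and a length on the same strand as the PAM, reads that trinucleotide and checks it against the canonical 5'-NGG-3' motif and the sequences listed in claims 59 and 60. The default target length of 20 nt and all names here are illustrative assumptions, not values stated in the claims.

```python
# Trinucleotides recited in claims 59 and 60 (immediately 3' of the target sequence).
CLAIM_59_PAMS = frozenset({"AGC", "GAG", "TTT", "GTG", "CAA"})
CLAIM_60_PAMS = frozenset({"CAC", "GAT", "TAA", "ACG", "CGA", "CGT"})


def adjacent_trinucleotide(dna: str, target_start: int, target_length: int = 20) -> str:
    """Return the three nucleotides directly 3' of a target sequence.

    target_start is a 0-based index on the PAM-containing strand; the default
    target_length of 20 nt is an assumption made for illustration.
    """
    end = target_start + target_length
    if target_start < 0 or end + 3 > len(dna):
        raise ValueError("target plus a 3-nt PAM does not fit inside the DNA string")
    return dna[end:end + 3].upper()


def is_canonical_ngg(pam: str) -> bool:
    """True if the trinucleotide matches 5'-NGG-3' (any first base, then GG)."""
    return len(pam) == 3 and pam[1:].upper() == "GG"
```

A target followed by TGG is canonical; one followed by AGC is a non-NGG target of the kind recited in claim 59.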
61. The Cas9 protein of any one of claims 56-60, wherein the activity of the Cas9 protein is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay.
62. The Cas9 protein of claim 61, wherein the transcriptional activation assay is a GFP activation assay.
63. The Cas9 protein of any one of claims 56-62, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: X262T, X267G, X294R, X405I, X409I, X480K, X543D, X694I, X1219V, X1224K, X1256K, and X1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
64. The Cas9 protein of any one of claims 56-63, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: A262T, S267G, K294R, F405I, S409I, E480K, E543D, M694I, E1219V, N1224K, Q1256K, and L1362P of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
65. The Cas9 protein of any one of claims 56-64, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
66. The Cas9 protein of any one of claims 56-65, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the E1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
67. The Cas9 protein of any one of claims 56-66, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
68. The Cas9 protein of any one of claims 56-67, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the E480K mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
69. The Cas9 protein of any one of claims 56-68, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
70. The Cas9 protein of any one of claims 56-69, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the E543D mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
71. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X480K, X543D, and X1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
72. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X262T, X409I, X480K, X543D, X694I, and X1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
73. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X294R, X480K, X543D, X1219V, X1256K, and X1362P mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
74. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X294R, X480K, X543D, X1219V, and X1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
75. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X267G, X294R, X480K, X543D, X1219V, X1224K, and X1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
76. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X262T, X405I, X409I, X480K, X543D, X694I, and X1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
77. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the E480K, E543D, and E1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
78. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the A262T, S409I, E480K, E543D, M694I, and E1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
79. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the K294R, E480K, E543D, E1219V, Q1256K, and L1362P mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
80. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the K294R, E480K, E543D, E1219V, and Q1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
81. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the S267G, K294R, E480K, E543D, E1219V, N1224K, and Q1256K mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
82. The Cas9 protein of any one of claims 56-70, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the A262T, F405I, S409I, E480K, E543D, M694I, and E1219V mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
83. The Cas9 protein of any one of claims 57-82, wherein the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
84. The Cas9 protein of any one of claims 57-83, wherein the amino acid sequence of the HNH domain is identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
85. The Cas9 protein of any one of claims 57-84, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
86. The Cas9 protein of any one of claims 57-85, wherein the amino acid sequence of the RuvC domain is identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 9.
87. The Cas9 protein of any one of claims 56-86, wherein the Cas9 protein or fragment thereof comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
88. The Cas9 protein of any one of claims 56-86, wherein the Cas9 protein or fragment thereof comprises the D10X1 and/or H840X2 mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X1 is any amino acid other than D, and X2 is any amino acid other than H.
89. The Cas9 protein of any one of claims 56-88, wherein the Cas9 protein or fragment thereof comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
90. The Cas9 protein of claim 89, wherein the Cas9 protein or fragment thereof comprises H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
91. The Cas9 protein of any one of claims 56-89, wherein the Cas9 protein or fragment thereof comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
92. The Cas9 protein of claim 91, wherein the Cas9 protein or fragment thereof comprises D at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
93. A recombinant Cas9 protein comprising an amino acid sequence at least 80% identical to the amino acid sequence of Cas9 as provided in any one of SEQ ID NOs: 9-262,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of: residues 122, 137, 182, 262, 294, 409, 480, 543, 660, 694, 1219, and 1329 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, and
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein.
94. The Cas9 protein of claim 93, wherein the Cas9 protein comprises RuvC and HNH domains.
95. The Cas9 protein of claim 93 or 94, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: X262T, X294R, X409I, X480K, X543D, X694I, and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
96. The Cas9 protein of any one of claims 93-95, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: A262T, K294R, S409I, E480K, E543D, M694I, and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
97. The Cas9 protein of any one of claims 93-96, wherein the amino acid sequence of the Cas9 protein comprises the X1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
98. The Cas9 protein of any one of claims 93-97, wherein the amino acid sequence of the Cas9 protein comprises the E1219V mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
99. The Cas9 protein of any one of claims 94-98, wherein the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
100. The Cas9 protein of any one of claims 94-99, wherein the amino acid sequence of the HNH domain is identical to the amino acid sequence of the HNH domain of SEQ ID NO: 9.
101. The Cas9 protein of any one of claims 94-100, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
102. The Cas9 protein of any one of claims 94-101, wherein the amino acid sequence of the RuvC domain is identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 9.
103. The Cas9 protein of any one of claims 93-102, wherein the Cas9 protein comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
104. The Cas9 protein of any one of claims 93-102, wherein the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
105. The Cas9 protein of claim 104, wherein the Cas9 protein comprises H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
106. The Cas9 protein of any one of claims 93-105, wherein the Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
107. A recombinant Cas9 protein comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 as provided in SEQ ID NO:9 or a fragment thereof comprising the RuvC and HNH domains of SEQ ID NO:9,
wherein the amino acid sequence of the Cas protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: amino acid residues 122,137,182,262,294,409,480,543,660,694,1219 and 1329 of the amino acid sequence provided in SEQ ID NO 9,
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein, and
wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise a canonical PAM (5' -NGG-3') at its 3' end compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
108. The Cas9 protein of claim 107, wherein the Cas9 protein comprises RuvC and HNH domains.
109. The Cas9 protein of claim 107 or 108, wherein the Cas9 protein exhibits activity on a target sequence whose 3' end is not immediately adjacent to the canonical PAM sequence (5'-NGG-3') that is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1000-fold, at least 5000-fold, at least 10000-fold, at least 50000-fold, at least 100000-fold, at least 500000-fold, or at least 1000000-fold greater than the activity of the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9 on the same target sequence.
110. The Cas9 protein of claim 109, wherein the 3' end of the target sequence is directly adjacent to an AGC, GAG, TTT, GTG or CAA sequence.
111. The Cas9 protein of any one of claims 107-110, wherein the activity of the Cas9 protein is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay.
112. The Cas9 protein of claim 111, wherein the transcriptional activation assay is a GFP activation assay.
113. The Cas9 protein of any one of claims 107-112, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: X262T, X294R, X409I, X480K, X543D, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
114. The Cas9 protein of any one of claims 107-113, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: A262T, K294R, S409I, E480K, E543D, M694I and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
115. The Cas9 protein of any one of claims 107-114, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X1219 mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
116. The Cas9 protein of any one of claims 107-115, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the X1219 mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
117. The Cas9 protein of any one of claims 108-116, wherein the HNH domain has an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
118. The Cas9 protein of any one of claims 108-117, wherein the HNH domain has an amino acid sequence identical to the amino acid sequence of the HNH domain of SEQ ID NO: 9.
119. The Cas9 protein of any one of claims 108-118, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
120. The Cas9 protein of any one of claims 108-119, wherein the amino acid sequence of the RuvC domain is identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 9.
121. The Cas9 protein of any one of claims 107-120, wherein the Cas9 protein or fragment thereof comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
122. The Cas9 protein of any one of claims 107-121, wherein the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
123. The Cas9 protein of claim 122, wherein the Cas9 protein comprises an H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
124. A recombinant Cas9 protein comprising an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the Cas9 provided in any one of SEQ ID NOs: 9-262,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten mutations at amino acid residues selected from the group consisting of: 23, 108, 115, 141, 180, 230, 257, 262, 267, 284, 294, 324, 409, 455, 466, 474, 480, 543, 554, 654, 694, 711, 727, 763, 1063, 1100, 1219, 1244, 1256, 1289 and 1323 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, and wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein.
125. The Cas9 protein of claim 124, wherein the Cas9 protein comprises RuvC and HNH domains.
126. The Cas9 protein of claim 124 or 125, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten mutations selected from the group consisting of: X23N, X108G, X115H, X141Q, X180N, X230S, X257N, X262T, X267G, X284N, X294R, X324L, X409I, X455F, X466A, X474I, X480K, X543D, X554R, X654L, X694I, X711E, X727P, X763I, X1063V, X1100I, X1219V, X1244N, X1256K, X1289Q and X1323S of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
127. The Cas9 protein of any one of claims 124-126, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten mutations selected from the group consisting of: D23N, E108G, R115H, K141Q, D180N, P230S, D257N, A262T, S267G, D284N, K294R, R324L, S409I, L455F, T466A, T474I, E480K, E543D, K554R, R654L, M694I, A711E, L727P, M763I, I1063V, V1100I, E1219V, K1244N, Q1256K, K1289Q and A1323S of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
128. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X115H, X141Q, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
129. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
130. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X23N, X108G, X262T, X409I, X480K, X543D, X694I, X727P, X1219V and X1289Q of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
131. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X257N, X267G, X294R, X466A, X480K, X543D, X1063V, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
132. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X409I, X455F, X480K, X543D, X654L, X1100I, X1219V and X1323S of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
133. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
134. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X262T, X324L, X409I, X480K, X543D, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
135. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X294R, X480K, X543D, X711E, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
136. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X230S, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
137. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X180N, X284N, X474I, X480K, X543D, X554R, X763I, X1219V and X1244N of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
138. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X262T, X409I, X480K, X543D, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
139. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
140. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X230S, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
141. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X267G, X294R and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
142. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations X262T and X409I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
143. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations R115H, K141Q, S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
144. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
145. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations D23N, E108G, A262T, S409I, E480K, E543D, M694I, L727P, E1219V and K1289Q of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
146. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations D257N, S267G, K294R, T466A, E480K, E543D, I1063V, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
147. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations S409I, L455F, E480K, E543D, R654L, V1100I, E1219V and A1323S of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
148. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
149. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations A262T, R324L, S409I, E480K, E543D, M694I and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
150. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations K294R, E480K, E543D, A711E, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
151. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations P230S, S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
152. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations D180N, D284N, T474I, E480K, E543D, K554R, M763I, E1219V and K1244N of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
153. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations A262T, S409I, E480K, E543D, M694I and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
154. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
155. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations P230S, S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
156. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations S267G, K294R and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
157. The Cas9 protein of any one of claims 124-127, wherein the amino acid sequence of the Cas9 protein comprises the mutations A262T and S409I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
158. The Cas9 protein of any one of claims 124-157, wherein the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
159. The Cas9 protein of any one of claims 125-158, wherein the HNH domain has an amino acid sequence that is identical to the amino acid sequence of the HNH domain of SEQ ID NO: 9.
160. The Cas9 protein of any one of claims 125-159, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
161. The Cas9 protein of any one of claims 125-160, wherein the RuvC domain has an amino acid sequence identical to that of the RuvC domain of SEQ ID NO: 9.
162. The Cas9 protein of any one of claims 124-161, wherein the Cas9 protein comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
163. The Cas9 protein of any one of claims 124-162, wherein the Cas9 protein comprises the D10X₁ and/or H840X₂ mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X₁ is any amino acid except D, and X₂ is any amino acid except H.
164. The Cas9 protein of any one of claims 124-163, wherein the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
165. The Cas9 protein of claim 164, wherein the Cas9 protein comprises an H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
166. The Cas9 protein of any one of claims 124-164, wherein the Cas9 protein comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
167. The Cas9 protein of claim 166, wherein the Cas9 protein comprises a D at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
168. The Cas9 protein of any one of claims 124-167, wherein the Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
169. A recombinant Cas9 protein comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9, or a fragment thereof,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten mutations at amino acid residues selected from the group consisting of: 23, 108, 115, 141, 180, 230, 257, 262, 267, 284, 294, 324, 409, 455, 466, 474, 480, 543, 554, 654, 694, 711, 727, 763, 1063, 1100, 1219, 1244, 1256, 1289 and 1323 of the amino acid sequence provided in SEQ ID NO: 9,
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein, and
wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise a canonical PAM (5'-NGG-3') at its 3' end compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
170. The Cas9 protein of claim 169, wherein the Cas9 protein comprises RuvC and HNH domains.
171. The Cas9 protein of claim 169 or 170, wherein the Cas9 protein exhibits activity on a target sequence whose 3' end is not directly adjacent to the canonical PAM sequence (5'-NGG-3') that is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1000-fold, at least 5000-fold, at least 10000-fold, at least 50000-fold, at least 100000-fold, at least 500000-fold, or at least 1000000-fold greater than the activity of the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9 on the same target sequence.
172. The Cas9 protein of claim 171, wherein the 3' end of the target sequence is directly adjacent to an AGC, GAG, TTT, GTG or CAA sequence.
173. The Cas9 protein of claim 171 or 172, wherein the 3' end of the target sequence is directly adjacent to a sequence selected from the group consisting of: CAC, GAT, TAA, ACG, CGA and CGT.
174. The Cas9 protein of any one of claims 169-173, wherein the activity of the Cas9 protein is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay.
175. The Cas9 protein of claim 174, wherein the transcriptional activation assay is a GFP activation assay.
176. The Cas9 protein of any one of claims 169-175, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: X23N, X108G, X115H, X141Q, X180N, X230S, X257N, X262T, X267G, X284N, X294R, X324L, X409I, X455F, X466A, X474I, X480K, X543D, X554R, X654L, X694I, X711E, X727P, X763I, X1063V, X1100I, X1219V, X1244N, X1256K, X1289Q and X1323S of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
177. The Cas9 protein of any one of claims 169-176, wherein the amino acid sequence of the Cas9 protein or the fragment thereof comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations selected from the group consisting of: D23N, E108G, R115H, K141Q, D180N, P230S, D257N, A262T, S267G, D284N, K294R, R324L, S409I, L455F, T466A, T474I, E480K, E543D, K554R, R654L, M694I, A711E, L727P, M763I, I1063V, V1100I, E1219V, K1244N, Q1256K, K1289Q and A1323S of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
178. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X115H, X141Q, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
179. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
180. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X23N, X108G, X262T, X409I, X480K, X543D, X694I, X727P, X1219V and X1289Q of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
181. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X257N, X267G, X294R, X466A, X480K, X543D, X1063V, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
182. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X409I, X455F, X480K, X543D, X654L, X1100I, X1219V and X1323S of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
183. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
184. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X262T, X324L, X409I, X480K, X543D, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
185. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X294R, X480K, X543D, X711E, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
186. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X230S, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
187. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X180N, X284N, X474I, X480K, X543D, X554R, X763I, X1219V and X1244N of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
188. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X262T, X409I, X480K, X543D, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
189. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
190. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X230S, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
191. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X267G, X294R and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
192. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X262T and X409I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
193. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations R115H, K141Q, S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
194. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
195. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations D23N, E108G, A262T, S409I, E480K, E543D, M694I, L727P, E1219V and K1289Q of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
196. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations D257N, S267G, K294R, T466A, E480K, E543D, I1063V, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
197. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations S409I, L455F, E480K, E543D, R654L, V1100I, E1219V and A1323S of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
198. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
199. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations A262T, R324L, S409I, E480K, E543D, M694I and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
200. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations K294R, E480K, E543D, A711E, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
201. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations P230S, S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
202. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations D180N, D284N, T474I, E480K, E543D, K554R, M763I, E1219V and K1244N of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
203. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations A262T, S409I, E480K, E543D, M694I and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
204. A Cas9 protein according to any one of claims 169-177, wherein the amino acid sequence of the Cas protein or fragment thereof comprises the mutations S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID No. 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs 10-262.
205. A Cas9 protein according to any one of claims 169-177, wherein the amino acid sequence of the Cas protein or fragment thereof comprises the mutations in the amino acid sequence provided in SEQ ID No. 9, P230S, S267G, K294R, E480K, E543D, E1219V and Q1256K, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs 10-262.
206. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas protein or fragment thereof comprises the mutations S267G, K294R and Q1256K of the amino acid sequence provided in SEQ ID NO 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NO 10-262.
207. The Cas9 protein of any one of claims 169-177, wherein the amino acid sequence of the Cas protein or fragment thereof comprises the mutations A262T and S409I of the amino acid sequence provided in SEQ ID NO 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NO 10-262.
208. The Cas9 protein of any one of claims 170-207, wherein the HNH domain has an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
209. The Cas9 protein of any one of claims 170-208, wherein the HNH domain has an amino acid sequence identical to that of the HNH domain of SEQ ID NO: 9.
210. The Cas9 protein of any one of claims 170-209, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
211. The Cas9 protein of any one of claims 170-210, wherein the amino acid sequence of the RuvC domain is identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 9.
212. The Cas9 protein of any one of claims 169-211, wherein the Cas9 protein or fragment thereof comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
213. The Cas9 protein of any one of claims 169-211, wherein the Cas9 protein or fragment thereof comprises the D10X1 and/or H840X2 mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X1 is any amino acid other than D, and X2 is any amino acid other than H.
214. The Cas9 protein of any one of claims 169-213, wherein the Cas9 protein or fragment thereof comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
215. The Cas9 protein of claim 214, wherein the Cas9 protein or fragment thereof comprises H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
216. The Cas9 protein of any one of claims 169-214, wherein the Cas9 protein or fragment thereof comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
217. The Cas9 protein of claim 216, wherein the Cas9 protein or fragment thereof comprises D at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
218. A recombinant Cas9 protein comprising an amino acid sequence at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of Cas9 as provided in any one of SEQ ID NOs: 9-262,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations at amino acid residues selected from the group consisting of residues 175, 230, 257, 267, 294, 466, 480, 543, 711, 1063, 1207, 1219 and 1256 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, and
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein.
219. The Cas9 protein of claim 218, wherein the Cas9 protein comprises RuvC and HNH domains.
220. The Cas9 protein of claim 218 or 219, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X175T, X230F, X257N, X267G, X294R, X466A, X480K, X543D, X711E, X1063V, X1207G, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
221. The Cas9 protein of any one of claims 218-220, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of N175T, P230F, D257N, S267G, K294R, T466A, E480K, E543D, A711E, I1063V, E1207G, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
222. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations X230F, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
223. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations X294R, X480K, X543D, X711E, X1207G, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
224. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations X175T, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
225. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations X257N, X267G, X294R, X466A, X480K, X543D, X1063V, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
226. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
227. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations X294R and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
228. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations P230F, S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
229. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations K294R, E480K, E543D, A711E, E1207G, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
230. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations N175T, S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
231. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations D257N, S267G, K294R, T466A, E480K, E543D, I1063V, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
232. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
233. The Cas9 protein of any one of claims 218-221, wherein the amino acid sequence of the Cas9 protein comprises the mutations K294R and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
234. The Cas9 protein of any one of claims 218-233, wherein the HNH domain has an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
235. The Cas9 protein of any one of claims 219 and 234, wherein the HNH domain has an amino acid sequence identical to the amino acid sequence of the HNH domain of SEQ ID NO: 9.
236. The Cas9 protein of any one of claims 219 and 235, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
237. The Cas9 protein of any one of claims 219-236, wherein the RuvC domain has an amino acid sequence identical to that of the RuvC domain of SEQ ID NO: 9.
238. The Cas9 protein of any one of claims 218-237, wherein the Cas9 protein comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
239. The Cas9 protein of any one of claims 218-238, wherein the Cas9 protein comprises the D10X1 and/or H840X2 mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X1 is any amino acid other than D, and X2 is any amino acid other than H.
240. The Cas9 protein of any one of claims 218-239, wherein the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
241. The Cas9 protein of claim 240, wherein the Cas9 protein comprises H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
242. The Cas9 protein of any one of claims 218-240, wherein the Cas9 protein comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
243. The Cas9 protein of claim 242, wherein the Cas9 protein comprises D at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
244. The Cas9 protein of any one of claims 218-243, wherein the Cas9 protein exhibits increased activity on a target sequence that does not include the canonical PAM (5'-NGG-3') at its 3' end, compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
245. A recombinant Cas9 protein comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9 or a fragment thereof,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 175, 230, 257, 267, 294, 466, 480, 543, 711, 1063, 1207, 1219 and 1256 of the amino acid sequence provided in SEQ ID NO: 9,
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein, and
wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise a canonical PAM (5'-NGG-3') at its 3' end compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
246. The Cas9 protein of claim 245, wherein the Cas9 protein comprises RuvC and HNH domains.
247. The Cas9 protein of claim 245 or 246, wherein the Cas9 protein exhibits, on a target sequence whose 3' end is not directly adjacent to a canonical PAM sequence (5'-NGG-3'), an activity that is increased at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1000-fold, at least 5000-fold, at least 10000-fold, at least 50000-fold, at least 100000-fold, at least 500000-fold, or at least 1000000-fold relative to the activity of the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9 on the same target sequence.
248. The Cas9 protein of claim 247, wherein the 3' end of the target sequence is directly adjacent to an AGC, GAG, TTT, GTG or CAA sequence.
249. The Cas9 protein of claim 247 or 248, wherein the 3' end of the target sequence is directly adjacent to a sequence selected from the group consisting of CAC, GAT, TAA, ACG, CGA and CGT.
250. The Cas9 protein of any one of claims 246-249, wherein Cas9 protein activity is measured by a nuclease assay, deamination assay, or transcription activation assay.
251. The Cas9 protein of claim 250, wherein the transcription activation assay is a GFP activation assay.
252. The Cas9 protein of any one of claims 245-251, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X175T, X230F, X257N, X267G, X294R, X466A, X480K, X543D, X711E, X1063V, X1207G, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
253. The Cas9 protein of any one of claims 245-252, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of N175T, P230F, D257N, S267G, K294R, T466A, E480K, E543D, A711E, I1063V, E1207G, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
254. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X230F, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
255. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X294R, X480K, X543D, X711E, X1207G, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
256. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X175T, X267G, X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
257. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X257N, X267G, X294R, X466A, X480K, X543D, X1063V, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
258. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X294R, X480K, X543D, X1219V and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
259. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X294R and X1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
260. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations P230F, S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
261. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations K294R, E480K, E543D, A711E, E1207G, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
262. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations N175T, S267G, K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
263. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations D257N, S267G, K294R, T466A, E480K, E543D, I1063V, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
264. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations K294R, E480K, E543D, E1219V and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
265. The Cas9 protein of any one of claims 245-253, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations K294R and Q1256K of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
266. The Cas9 protein of any one of claims 246 and 265, wherein the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
267. The Cas9 protein of any one of claims 246-266, wherein the HNH domain has an amino acid sequence identical to that of the HNH domain of SEQ ID NO: 9.
268. The Cas9 protein of any one of claims 246-267, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
269. The Cas9 protein of any one of claims 247-268, wherein the amino acid sequence of the RuvC domain is identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 9.
270. The Cas9 protein of any one of claims 245-269, wherein the Cas9 protein or fragment thereof comprises the D10A and/or H840A mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
271. The Cas9 protein of any one of claims 245-269, wherein the Cas9 protein or fragment thereof comprises the D10X1 and/or H840X2 mutations of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X1 is any amino acid other than D, and X2 is any amino acid other than H.
272. The Cas9 protein of any one of claims 245-271, wherein the Cas9 protein or fragment thereof comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
273. The Cas9 protein of claim 272, wherein the Cas9 protein or fragment thereof comprises H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
274. The Cas9 protein of any one of claims 245-272, wherein the Cas9 protein or fragment thereof comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
275. The Cas9 protein of claim 274, wherein the Cas9 protein or fragment thereof comprises D at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
276. A recombinant Cas9 protein comprising an amino acid sequence at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of Cas9 as provided in any one of SEQ ID NOs: 9-262,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 108, 217, 262, 324, 409, 480, 543, 673, 694, 1219, 1264 and 1365 of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding amino acid residues in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, and
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein.
277. The Cas9 protein of claim 276, wherein the Cas9 protein comprises RuvC and HNH domains.
278. The Cas9 protein of claim 276 or 277, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X108G, X217A, X262T, X324L, X409I, X480K, X543D, X673E, X694I, X1219V, X1264Y and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
279. The Cas9 protein of any one of claims 276-278, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of E108G, S217A, A262T, R324L, S409I, E480K, E543D, K673E, M694I, E1219V, H1264Y and L1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
280. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations X108G, X262T, X409I, X480K, X543D, X673E, X694I, X1219V and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
281. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations X108G, X217A, X262T, X409I, X480K, X543D, X694I, X1219V and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
282. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations X262T, X324L, X409I, X480K, X543D, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
283. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations X262T, X409I, X480K, X543D, X694I, X1219V and X1264Y of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
284. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations X262T, X409I, X480K, X543D, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
285. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations X262T, X409I, X480K, X543D, X694I, X1219V and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
286. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations X108G, X262T, X409I, X480K, X543D, X673E, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
287. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations X108G, X262T, X409I, X480K, X543D, X694I, X1219V and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
288. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations X262T and X409I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
289. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations E108G, A262T, S409I, E480K, E543D, K673E, M694I, E1219V and L1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
290. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations E108G, S217A, A262T, S409I, E480K, E543D, M694I, E1219V and L1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
291. A Cas9 protein according to any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the corresponding mutation in any one of the amino acid sequences provided in SEQ id No. 9, mutations a262T, R324L, S409I, E480K, E543D, M694I and E1219V, or SEQ id NOs 10-262.
292. A Cas9 protein according to any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the corresponding mutation in any one of the mutations a262T, S409I, E480K, E543D, M694I, E121 1219V and H1264Y of the amino acid sequence provided in SEQ ID No. 9, or the amino acid sequences provided in SEQ ID NOs 10-262.
293. A Cas9 protein according to any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the corresponding mutation in any one of the mutations a262T, S409I, E480K, E543D, M694I and E121 1219V of the amino acid sequence provided in SEQ id No. 9, or the amino acid sequences provided in SEQ id NOs 10-262.
294. A Cas9 protein according to any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the corresponding mutation in any one of the amino acid sequences provided in SEQ ID No. 9, mutations a262T, S409I, E480K, E543D, M694I, E121 1219V and L1365I, or SEQ ID No. 10-262.
295. A Cas9 protein according to any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the corresponding mutation in any one of the mutations E108G, a262T, S409I, E480K, E543D, K673E, M694I and E1219V of the amino acid sequence provided in SEQ ID No. 9, or the amino acid sequences provided in SEQ ID NOs 10-262.
296. A Cas9 protein according to any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the corresponding mutation in any one of the mutations E108G, a262T, S409I, E480K, E543D, M694I, E1219V and L1365I of the amino acid sequence provided in SEQ ID No. 9, or the amino acid sequences provided in SEQ ID NOs 10-262.
297. The Cas9 protein of any one of claims 276-279, wherein the amino acid sequence of the Cas9 protein comprises the mutations a262T and S409I of the amino acid sequence provided in SEQ ID No. 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID nos. 10-262.
298. The Cas9 protein of any one of claims 276-297, wherein the HNH domain has an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
299. The Cas9 protein of any one of claims 277-298, wherein the HNH domain has an amino acid sequence identical to that of the HNH domain of SEQ ID NO: 9.
300. The Cas9 protein of any one of claims 277-299, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
301. The Cas9 protein of any one of claims 277-300, wherein the amino acid sequence of the RuvC domain is identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 9.
302. The Cas9 protein of any one of claims 276-301, wherein the Cas9 protein comprises the D10A and/or H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation(s) in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
303. The Cas9 protein of any one of claims 276-302, wherein the Cas9 protein comprises a D10X1 and/or H840X2 mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X1 is any amino acid except D, and X2 is any amino acid except H.
304. The Cas9 protein of any one of claims 276-303, wherein the Cas9 protein comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
305. The Cas9 protein of claim 304, wherein the Cas9 protein comprises an H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
306. The Cas9 protein of any one of claims 276-304, wherein the Cas9 protein comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
307. The Cas9 protein of claim 306, wherein the Cas9 protein comprises a D at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
308. The Cas9 protein of any one of claims 276-307, wherein the Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end, compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
309. A recombinant Cas9 protein comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of Streptococcus pyogenes Cas9 provided in SEQ ID NO: 9, or a fragment thereof,
wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations at amino acid residues selected from the group consisting of residues 108, 217, 262, 324, 409, 480, 543, 673, 694, 1219, 1264 and 1365 of the amino acid sequence provided in SEQ ID NO: 9,
wherein the amino acid sequence of the recombinant Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein, and
wherein the recombinant Cas9 protein exhibits increased activity on a target sequence that does not comprise a canonical PAM (5'-NGG-3') at its 3' end, compared to the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9.
310. The Cas9 protein of claim 309, wherein the Cas9 protein comprises RuvC and HNH domains.
311. A Cas9 protein according to claim 309 or 310, wherein the Cas9 protein exhibits activity on a target sequence whose 3' end is not directly adjacent to a canonical PAM sequence (5'-NGG-3'), which activity is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1000-fold, at least 5000-fold, at least 10000-fold, at least 50000-fold, at least 100000-fold, at least 500000-fold, or at least 1000000-fold increased over the activity of the Streptococcus pyogenes Cas9 provided as SEQ ID NO: 9 on the same target sequence.
312. The Cas9 protein of claim 311, wherein the 3' end of the target sequence is directly adjacent to an AGC, GAG, TTT, GTG or CAA sequence.
313. The Cas9 protein of claim 311 or 312, wherein the 3' end of the target sequence is directly adjacent to a sequence selected from the group consisting of CAC, GAT, TAA, ACG, CGA and CGT.
314. The Cas9 protein of any one of claims 310-313, wherein Cas9 protein activity is measured by a nuclease assay, deamination assay, or transcription activation assay.
315. The Cas9 protein of claim 314, wherein the transcriptional activation assay is a GFP activation assay.
316. The Cas9 protein of any one of claims 309-315, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X108G, X217A, X262T, X324L, X409I, X480K, X543D, X673E, X694I, X1219V, X1264Y and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
317. The Cas9 protein of any one of claims 309-316, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of E108G, S217A, A262T, R324L, S409I, E480K, E543D, K673E, M694I, E1219V, H1264Y and L1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
318. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X108G, X262T, X409I, X480K, X543D, X673E, X694I, X1219V and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
319. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X108G, X217A, X262T, X409I, X480K, X543D, X694I, X1219V and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
320. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X262T, X324L, X409I, X480K, X543D, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
321. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X262T, X409I, X480K, X543D, X694I, X1219V and X1264Y of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
322. The Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X262T, X409I, X480K, X543D, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
323. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X262T, X409I, X480K, X543D, X694I, X1219V and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
324. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X108G, X262T, X409I, X480K, X543D, X673E, X694I and X1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
325. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X108G, X262T, X409I, X480K, X543D, X694I, X1219V and X1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
326. The Cas9 protein of any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations X262T and X409I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X represents any amino acid.
327. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations E108G, A262T, S409I, E480K, E543D, K673E, M694I, E1219V and L1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
328. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations E108G, S217A, A262T, S409I, E480K, E543D, M694I, E1219V and L1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
329. The Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations A262T, R324L, S409I, E480K, E543D, M694I and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
330. The Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations A262T, S409I, E480K, E543D, M694I, E1219V and H1264Y of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
331. The Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations A262T, S409I, E480K, E543D, M694I and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
332. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations A262T, S409I, E480K, E543D, M694I, E1219V and L1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
333. The Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations E108G, A262T, S409I, E480K, E543D, K673E, M694I and E1219V of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
334. A Cas9 protein according to any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations E108G, A262T, S409I, E480K, E543D, M694I, E1219V and L1365I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
335. The Cas9 protein of any one of claims 309-317, wherein the amino acid sequence of the Cas9 protein or fragment thereof comprises the mutations A262T and S409I of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutations in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
336. The Cas9 protein of any one of claims 310-335, wherein the HNH domain has an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of any one of SEQ ID NOs: 9-262.
337. The Cas9 protein of any one of claims 310-330, wherein the HNH domain has an amino acid sequence identical to that of the HNH domain of SEQ ID NO: 9.
338. The Cas9 protein of any one of claims 310-337, wherein the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of any one of SEQ ID NOs: 9-262.
339. The Cas9 protein of any one of claims 310-338, wherein the RuvC domain has an amino acid sequence identical to that of the RuvC domain of SEQ ID NO: 9.
340. The Cas9 protein of any one of claims 309-339, wherein the Cas9 protein or fragment thereof comprises the D10A and/or H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation(s) in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
341. The Cas9 protein of any one of claims 309-339, wherein the Cas9 protein or fragment thereof comprises a D10X1 and/or H840X2 mutation of the amino acid sequence provided in SEQ ID NO: 9, or a corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262, wherein X1 is any amino acid except D, and X2 is any amino acid except H.
342. The Cas9 protein of any one of claims 309-341, wherein the Cas9 protein or fragment thereof comprises the D10A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
343. The Cas9 protein of claim 342, wherein the Cas9 protein or fragment thereof comprises an H at amino acid residue 840 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
344. The Cas9 protein of any one of claims 309-342, wherein the Cas9 protein or fragment thereof comprises the H840A mutation of the amino acid sequence provided in SEQ ID NO: 9, or the corresponding mutation in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
345. The Cas9 protein of claim 344, wherein the Cas9 protein or fragment thereof comprises a D at amino acid residue 10 of the amino acid sequence provided in SEQ ID NO: 9, or at the corresponding residue in any one of the amino acid sequences provided in SEQ ID NOs: 10-262.
346. A fusion protein comprising a Cas9 protein of any one of claims 1-345, wherein the Cas9 protein or fragment is fused to an effector domain, thereby forming a fusion protein.
347. The fusion protein of claim 346, wherein the effector domain is fused to the N-terminus of the Cas9 protein.
348. The fusion protein of claim 346, wherein the effector domain is fused to the C-terminus of the Cas9 protein.
349. The fusion protein of any one of claims 346-348, wherein the Cas9 protein and the effector domain are fused via a linker.
350. The fusion protein of any one of claims 346-349, wherein the effector domain is a nucleic acid editing domain.
351. The fusion protein of claim 349 or 350, wherein the linker comprises a (GGGGS)n (SEQ ID NO: 5), (G)n, (EAAAK)n (SEQ ID NO: 6), (GGS)n, SGSETPGTSESATPES (SEQ ID NO: 7), or (XP)n motif, or a combination of any of these, wherein n is independently an integer between 1 and 30.
352. The fusion protein of any one of claims 349-351, wherein the linker comprises a (GGS)3 motif.
353. The fusion protein of any one of claims 346-352, wherein the effector domain comprises an enzyme domain.
354. The fusion protein of any one of claims 346-353, wherein the effector domain comprises a nuclease domain, a nickase domain, a recombinase domain, a deaminase domain, a methyltransferase domain, a methylase domain, an acetylase domain, an acetyltransferase domain, a transcriptional activator domain or a transcriptional repressor domain.
355. The fusion protein of claim 354, wherein the effector domain is a domain comprising nuclease activity, nickase activity, recombinase activity, deaminase activity, methyltransferase activity, methylase activity, acetylase activity, acetyltransferase activity, transcription activation activity, or transcription repression activity.
356. The fusion protein of claim 350, wherein the effector domain is a deaminase domain.
357. The fusion protein of claim 356, wherein the deaminase is a cytosine deaminase or a cytidine deaminase.
358. The fusion protein of claim 357, wherein the deaminase is an apolipoprotein B mRNA editing complex (APOBEC) family deaminase.
359. The fusion protein of claim 357, wherein the deaminase is an APOBEC1 deaminase.
360. The fusion protein of claim 357, wherein the deaminase is an APOBEC2 deaminase.
361. The fusion protein of claim 357, wherein the deaminase is an APOBEC3 deaminase.
362. The fusion protein of claim 357, wherein the deaminase is an APOBEC3A deaminase.
363. The fusion protein of claim 357, wherein the deaminase is an APOBEC3D deaminase.
364. The fusion protein of claim 357, wherein the deaminase is an APOBEC3E deaminase.
365. The fusion protein of claim 357, wherein the deaminase is an APOBEC3F deaminase.
366. The fusion protein of claim 357, wherein the deaminase is an APOBEC3G deaminase.
367. The fusion protein of claim 357, wherein the deaminase is an APOBEC3H deaminase.
368. The fusion protein of claim 357, wherein the deaminase is an APOBEC4 deaminase.
369. The fusion protein of claim 357, wherein the deaminase is an activation-induced deaminase (AID).
370. The fusion protein of any one of claims 350-369, wherein the effector domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the deaminase domain of any one of SEQ ID NOs 263-281.
371. The fusion protein of claim 353, wherein the enzyme domain is a nuclease domain.
372. The fusion protein of claim 371, wherein the nuclease domain is a FokI DNA cleavage domain.
373. A dimer of the fusion protein of claim 371 or 372.
374. The fusion protein of any one of claims 346-373, wherein the fusion protein is fused to a second Cas9 protein.
375. The fusion protein of claim 374, wherein the second Cas9 protein is Cas 9.
376. The fusion protein of claim 374 or 375, wherein the second Cas9 protein is a Cas9 protein of any one of claims 1-345.
377. The fusion protein of claim 374 or 376, wherein the second Cas9 protein is fused to the N-terminus of the fusion protein.
378. The fusion protein of claim 374 or 376, wherein the second Cas9 protein is fused to the C-terminus of the fusion protein.
379. The fusion protein of any one of claims 375-378, wherein the Cas9 protein and the second Cas9 protein are fused via a second linker.
380. The fusion protein of claim 379, wherein the second linker comprises a (GGGGS)n (SEQ ID NO: 5), (G)n, (EAAAK)n (SEQ ID NO: 6), (GGS)n, SGSETPGTSESATPES (SEQ ID NO: 7), or (XP)n motif, or a combination of any of these, wherein n is independently an integer between 1 and 30.
381. The fusion protein of claim 379, wherein the second linker comprises a (GGS)3 motif.
382. A fusion protein comprising a Cas9 protein of any one of claims 22-345 fused to a second Cas9 protein.
383. The fusion protein of claim 382, wherein the second Cas9 protein is a Cas9 protein of any one of claims 22-345.
384. The fusion protein of claim 382, wherein the second Cas9 protein is a Cas9 protein of any one of claims 346-373.
385. The fusion protein of claim 383 or 384, wherein the second Cas9 protein is fused to the N-terminus of the Cas9 protein.
386. The fusion protein of claim 383 or 384, wherein the second Cas9 protein is fused to the C-terminus of the Cas9 protein.
387. The fusion protein of any one of claims 382-386, wherein the second Cas9 protein and the Cas9 protein are fused via a linker.
388. The fusion protein of claim 387, wherein the linker comprises a (GGGGS)n (SEQ ID NO: 5), (G)n, (EAAAK)n (SEQ ID NO: 6), (GGS)n, SGSETPGTSESATPES (SEQ ID NO: 7), or (XP)n motif, or a combination of any of these, wherein n is independently an integer between 1 and 30.
389. The fusion protein of claim 387, wherein the linker comprises a (GGS)3 motif.
390. The fusion protein of claim 387, wherein the linker is a SGSETPGTSESATPES (SEQ ID NO:7) sequence.
391. A complex comprising a Cas9 protein of any one of claims 22-345 or a fusion protein of any one of claims 346-390, and a guide RNA that binds to the Cas9 protein or Cas9 fusion protein.
392. The complex of claim 391, wherein the guide RNA is about 15-100 nucleotides in length and comprises a sequence of at least 10 contiguous nucleotides that is complementary to the target sequence.
393. The complex of claim 392, wherein the guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length.
394. The complex of any one of claims 391-393, wherein the guide RNA comprises a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides complementary to the target sequence.
395. The complex of any one of claims 391-394, wherein the target sequence is a DNA sequence.
396. The complex of claim 395, wherein the target sequence is a sequence in the genome of a mammal.
397. The complex of claim 396, wherein the target sequence is a sequence in the genome of a human.
398. The complex of any one of claims 392-397, wherein the 3' end of the target sequence is not immediately adjacent to the canonical PAM sequence (5'-NGG-3').
399. A complex comprising the fusion protein of any one of claims 374-390,
a first guide RNA that binds to the Cas9 protein of the fusion protein, and
a second guide RNA that binds to the second Cas9 protein of the fusion protein.
400. The complex of claim 399, wherein the first guide RNA is about 15-100 nucleotides in length and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a first target sequence, and the second guide RNA is about 15-100 nucleotides in length and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a second target sequence.
401. The complex of claim 400, wherein the first guide RNA and/or the second guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length.
402. The complex of any one of claims 399-401, wherein the first guide RNA and the second guide RNA are different.
403. The complex of any one of claims 399-402, wherein the first guide RNA comprises a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides complementary to the first target sequence, and wherein the second guide RNA comprises a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides complementary to the second target sequence.
404. The complex of any one of claims 400-403, wherein the first target sequence and the second target sequence are different.
405. The complex of any one of claims 400-404, wherein the first target sequence and the second target sequence are DNA sequences.
406. The complex of claim 405, wherein said first target sequence and said second target sequence are in the genome of a mammal.
407. The complex of claim 406, wherein said first target sequence and said second target sequence are in the human genome.
408. The complex of any one of claims 405-407, wherein the first target sequence is within 30 nucleotides of the second target sequence.
409. The complex of any one of claims 405-408, wherein the 3' end of the first target sequence is not immediately adjacent to the canonical PAM sequence (5'-NGG-3').
410. The complex of any one of claims 405-409, wherein the 3' end of the second target sequence is not immediately adjacent to the canonical PAM sequence (5'-NGG-3').
411. A method comprising contacting a DNA molecule with:
(a) the Cas9 protein of any one of claims 22-123 or the fusion protein of any one of claims 346-373, and a guide RNA, wherein the guide RNA is about 15-100 nucleotides in length and comprises a sequence of at least 10 contiguous nucleotides complementary to the target sequence; or
(b) The complex of any one of claims 391-410.
412. The method of claim 411, wherein the target sequence is a DNA sequence.
413. The method of claim 411 or 412, wherein the 3' end of the target sequence is not immediately adjacent to a canonical PAM sequence (5'-NGG-3').
414. The method of any one of claims 411, wherein the 3' end of the target sequence is immediately adjacent to an AGC, GAG, TTT, GTG or CAA sequence.
415. The method of any one of claims 411-414, wherein the target sequence comprises a sequence associated with a disease or disorder.
416. The method of claim 415, wherein the target DNA sequence comprises a point mutation associated with a disease or condition.
417. The method of claim 416, wherein the Cas9 protein, the Cas9 fusion protein, or the complex results in the correction of the point mutation.
418. The method of any one of claims 411-417, wherein the target DNA sequence comprises a T → C point mutation associated with a disease or disorder, and wherein deamination of the mutant C base results in a sequence not associated with a disease or disorder.
419. The method of claim 418, wherein the target DNA sequence encodes a protein, and wherein the point mutation is in a codon and results in a change in an amino acid encoded by the mutant codon as compared to the wild-type codon.
420. The method of claim 419, wherein deamination of mutant C results in an alteration of an amino acid encoded by the mutant codon.
421. The method of claim 420, wherein the deamination of mutant C results in a codon that encodes a wild-type amino acid.
422. The method of any one of claims 411-421, wherein the contacting is in vivo in a subject.
423. The method of claim 422, wherein the subject has or has been diagnosed with a disease or disorder.
424. The method of any one of claims 418-423, wherein the disease or disorder is cystic fibrosis, phenylketonuria, epidermolytic hyperkeratosis (EHK), Charcot-Marie-Tooth disease type 4J, neuroblastoma (NB), von Willebrand disease (vWD), myotonia congenita, hereditary renal amyloidosis, dilated cardiomyopathy (DCM), hereditary lymphedema, familial Alzheimer's disease, HIV, prion disease, chronic infantile neurological cutaneous and articular syndrome (CINCA), desmin-related myopathy (DRM), or a neoplastic disease associated with a mutant PIK3CA protein, a mutant CTNNB1 protein, a mutant HRAS protein, or a mutant p53 protein.
425. The method of any one of claims 415-423, wherein the disease or disorder is associated with a T>C or A>G mutation in a gene selected from the genes disclosed in Table 6 and Table 7, respectively.
426. The method of any one of claims 415-423, wherein the disease or disorder is associated with a T>C or A>G mutation in a gene selected from the genes disclosed in Table 6 and Table 7, respectively.
427. A kit comprising a nucleic acid construct comprising:
(a) a sequence encoding a Cas9 protein of any one of claims 22-123 or a fusion protein of any one of claims 346-373; and
(b) a heterologous promoter that drives expression of the sequence of (a).
428. The kit of claim 427, further comprising an expression construct encoding a guide RNA backbone, wherein the construct comprises a cloning site positioned to allow cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide RNA backbone.
429. A polynucleotide encoding a Cas9 protein of any one of claims 22-345 or a fusion protein of any one of claims 346-373.
430. A vector comprising the polynucleotide of claim 429.
431. The vector of claim 430, wherein the vector comprises a heterologous promoter that drives expression of the polynucleotide.
432. A cell comprising a Cas9 protein of any one of claims 22-345, a fusion protein of any one of claims 346-373, or a nucleic acid molecule encoding a Cas9 protein of any one of claims 22-345 or a fusion protein of any one of claims 346-373.
433. A mutant Cas9 protein that recognizes a non-canonical PAM sequence selected from the group consisting of: AAA, AAC, AAG, AAT, CAA, CAC, CAG, CAT, GAA, GAC, GAG, GAT, TAA, TAC, TAG, TAT, ACA, ACC, ACG, ACT, CCA, CCC, CCG, CCT, GCA, GCC, GCG, GCT, TCA, TCC, TCG, TCT, AGA, AGC, AGT, CGA, CGC, CGT, GGA, GGC, GGT, TGA, TGC, TGT, ATA, ATC, ATG, ATT, CTA, CTC, CTG, CTT, GTA, GTC, GTG, GTT, TTA, TTC, TTG and TTT.
434. The mutant Cas9 protein of claim 433, wherein the mutant Cas9 protein comprises a Cas9 protein of any one of claims 1-345.
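The substitution notation used throughout claims 285-297 and 316-335 (wild-type residue, 1-based position in SEQ ID NO: 9, replacement residue, with `X` standing for any wild-type residue) can be illustrated with a minimal Python sketch. It assumes positions are counted on SEQ ID NO: 9 itself; mapping a "corresponding mutation" onto SEQ ID NOs: 10-262 would first require a sequence alignment, which is not shown. The function name `apply_substitutions` and the toy sequence are hypothetical illustrations, not part of the claims.

```python
import re

# A substitution such as "A262T" or "X262T": wild-type residue (or "X" for
# any amino acid), 1-based position, replacement residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, mutations: list[str]) -> str:
    """Return a copy of `sequence` with each point substitution applied.

    A wild-type letter of "X" matches any residue; any other letter must match
    the residue actually present at that position, otherwise ValueError.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        wild_type, new_residue = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        found = residues[position - 1]
        if wild_type != "X" and found != wild_type:
            raise ValueError(f"{mutation}: expected {wild_type}, found {found}")
        residues[position - 1] = new_residue
    return "".join(residues)


# Toy 10-residue sequence (hypothetical; not SEQ ID NO: 9).
print(apply_substitutions("ACDEFGHIKL", ["D3A", "X9V"]))  # ACAEFGHIVL
```

Applied to the full SEQ ID NO: 9 sequence, a list such as `["A262T", "S409I", "E480K", "E543D", "M694I", "E1219V"]` would produce the combination recited in claim 293; the explicit wild-type check is there to catch off-by-one numbering errors.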
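Claims 298, 300, 336 and 338 recite tiered identity thresholds (80% up to 99.5%) for the HNH and RuvC domains, but they do not state how identity is computed. The sketch below assumes one common convention, identical positions divided by aligned columns with gaps counted as mismatches, over two sequences that have already been aligned; the alignment step itself and the function names are assumptions for illustration only.

```python
THRESHOLDS = (80.0, 85.0, 90.0, 92.0, 95.0, 96.0, 97.0, 98.0, 99.0, 99.5)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumed convention: columns where both sequences have a gap ("-") are
    ignored; every other column counts toward the denominator, and only
    identical non-gap residues count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list[float]:
    """Return the claimed identity tiers that a given percent identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


identity = percent_identity("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVWF")
print(identity, thresholds_met(identity))  # 95.0 [80.0, 85.0, 90.0, 92.0, 95.0]
```

Other conventions (for example, dividing by the length of the shorter ungapped sequence) can give different numbers near a threshold, so the convention should be fixed before comparing against the claimed tiers.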
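Claims 302-307 and 340-345 concern residues D10 and H840 of SEQ ID NO: 9, which lie in the RuvC and HNH nuclease domains, respectively. In S. pyogenes Cas9 the HNH domain cleaves the DNA strand complementary to the guide RNA (the target strand) and the RuvC domain cleaves the opposite strand. Inactivating one domain therefore yields a nickase, and inactivating both yields a nuclease-dead protein. The sketch below encodes that logic under the simplifying assumption that any residue other than D at position 10 or H at position 840 inactivates the corresponding domain; the function name is hypothetical.

```python
def catalytic_state(residue_10: str, residue_840: str) -> str:
    """Classify nuclease activity from the residues at positions 10 and 840.

    Simplifying assumption: D10 is required for RuvC activity and H840 for
    HNH activity; any other residue is treated as inactivating that domain.
    """
    ruvc_active = residue_10.upper() == "D"
    hnh_active = residue_840.upper() == "H"
    if ruvc_active and hnh_active:
        return "nuclease"  # both strands cut
    if hnh_active:
        return "nickase (target strand)"  # e.g. D10A: only HNH cuts
    if ruvc_active:
        return "nickase (non-target strand)"  # e.g. H840A: only RuvC cuts
    return "nuclease-dead"  # e.g. D10A plus H840A


for r10, r840 in [("D", "H"), ("A", "H"), ("D", "A"), ("A", "A")]:
    print(f"residue 10 = {r10}, residue 840 = {r840}: {catalytic_state(r10, r840)}")
```

Claims 305 and 307 (retaining H840 with D10A, or D10 with H840A) correspond to the two nickase branches above.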
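Several claims (308-313, 398, 409-410 and 413-414) turn on whether the 3' end of a target sequence is directly adjacent to the canonical 5'-NGG-3' PAM or to one of the alternative trinucleotides listed in claims 312 and 313. A minimal sketch of that check follows. It assumes the DNA is written 5'->3' on the PAM-containing strand with 0-based indexing; the function names and the example target are hypothetical.

```python
# PAM sets recited in claims 312 and 313 (5'->3').
CLAIM_312_PAMS = {"AGC", "GAG", "TTT", "GTG", "CAA"}
CLAIM_313_PAMS = {"CAC", "GAT", "TAA", "ACG", "CGA", "CGT"}


def pam_after_target(dna: str, target_start: int, target_length: int) -> str:
    """Return the three nucleotides directly 3' of the target sequence.

    Assumes `dna` is written 5'->3' on the PAM-containing strand and that
    `target_start` is a 0-based index into that string.
    """
    end = target_start + target_length
    if target_start < 0 or end + 3 > len(dna):
        raise ValueError("target plus a 3-nt PAM does not fit in the sequence")
    return dna[end:end + 3].upper()


def is_canonical_pam(pam: str) -> bool:
    """True for the canonical 5'-NGG-3' PAM (any base, then GG)."""
    return len(pam) == 3 and pam[0] in "ACGT" and pam[1:] == "GG"


target = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt target sequence
for flank in ("TGG", "GAT", "AGC"):
    pam = pam_after_target(target + flank, 0, len(target))
    print(pam, is_canonical_pam(pam), pam in CLAIM_312_PAMS, pam in CLAIM_313_PAMS)
```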
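Claims 347-352 and 377-390 describe effector or second Cas9 domains fused to the N- or C-terminus of the Cas9 protein through linkers built from repeated motifs. The sketch below assembles such linkers and orders the domains N- to C-terminally. The function names are hypothetical, the domain strings are placeholders rather than real sequences, and elements a real construct would carry (nuclear localization signals, tags) are deliberately omitted. The 16-residue sequence of SEQ ID NO: 7 is often called the XTEN linker in the base-editing literature.

```python
XTEN_16 = "SGSETPGTSESATPES"  # SEQ ID NO: 7


def repeat_motif(motif: str, n: int) -> str:
    """Return (motif)n for n in the claimed range of 1 to 30."""
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return motif * n


def xp_motif(x_residues: str) -> str:
    """Return (XP)n, pairing each chosen residue X with a proline."""
    if not 1 <= len(x_residues) <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return "".join(x + "P" for x in x_residues)


def build_fusion(cas9: str, effector: str, linker: str,
                 effector_at_n_terminus: bool = True) -> str:
    """Concatenate effector, linker and Cas9 in N- to C-terminal order."""
    if effector_at_n_terminus:
        return effector + linker + cas9
    return cas9 + linker + effector


# Placeholder strings stand in for real domain sequences (hypothetical).
linker = repeat_motif("GGS", 3)  # the (GGS)3 motif of claim 352
print(build_fusion("[Cas9 variant]", "[deaminase]", linker))
print(build_fusion("[Cas9 variant]", "[deaminase]", XTEN_16, effector_at_n_terminus=False))
```

Because the claims allow "a combination of any of these" motifs, a combined linker is simply a concatenation, for example `repeat_motif("GGS", 2) + XTEN_16`.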
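Claims 392-394 and 400-403 require a guide RNA of about 15-100 nucleotides containing a run of at least 10 contiguous nucleotides complementary to the target sequence. One way to measure that run is sketched below: take the RNA reverse complement of the target (because base pairing is antiparallel) and find the longest common substring with the guide by dynamic programming. Here `target_dna` is assumed to be the strand the guide pairs with, written 5'->3'. "About 15-100" is read as exactly 15-100, and the function names are hypothetical.

```python
# RNA base-pairing partner for each DNA/RNA base.
PAIR = {"A": "U", "C": "G", "G": "C", "T": "A", "U": "A"}


def rna_reverse_complement(dna: str) -> str:
    """RNA sequence (5'->3') that pairs antiparallel with `dna` (5'->3')."""
    return "".join(PAIR[base] for base in reversed(dna.upper()))


def longest_complementary_run(guide_rna: str, target_dna: str) -> int:
    """Longest stretch of contiguous guide nucleotides pairing with contiguous
    target nucleotides (longest common substring of the guide and the
    target's RNA reverse complement)."""
    guide = guide_rna.upper()
    partner = rna_reverse_complement(target_dna)
    best = 0
    previous = [0] * (len(partner) + 1)
    for i in range(1, len(guide) + 1):
        current = [0] * (len(partner) + 1)
        for j in range(1, len(partner) + 1):
            if guide[i - 1] == partner[j - 1]:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def meets_claim_392(guide_rna: str, target_dna: str) -> bool:
    """Length 15-100 nt and a complementary run of at least 10 nt."""
    return 15 <= len(guide_rna) <= 100 and longest_complementary_run(guide_rna, target_dna) >= 10


target = "ACGTTGCAAGGCTTACCGTA"  # hypothetical 20-nt target strand
guide = rna_reverse_complement(target) + "GUUUUAGAGCUA"  # arbitrary 3' extension
print(guide, longest_complementary_run(guide, target), meets_claim_392(guide, target))
```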
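Claim 408 requires the first target sequence to lie "within 30 nucleotides" of the second. The claim does not say how that distance is measured. The sketch below assumes it is the number of nucleotides separating the nearest ends of the two sites on a shared genomic coordinate axis, regardless of strand. Both the convention and the function names are assumptions.

```python
def gap_between(start_a: int, length_a: int, start_b: int, length_b: int) -> int:
    """Nucleotides separating two target intervals on the same coordinate axis.

    Intervals are half-open [start, start + length); overlapping or abutting
    intervals give 0.
    """
    end_a = start_a + length_a
    end_b = start_b + length_b
    return max(0, start_b - end_a, start_a - end_b)


def within_limit(start_a: int, length_a: int, start_b: int, length_b: int,
                 limit: int = 30) -> bool:
    """True if the two sites are separated by at most `limit` nucleotides."""
    return gap_between(start_a, length_a, start_b, length_b) <= limit


print(gap_between(100, 20, 135, 20), within_limit(100, 20, 135, 20))
print(gap_between(100, 20, 160, 20), within_limit(100, 20, 160, 20))
```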
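Claims 418-421 describe correcting a disease-associated T→C point mutation by deaminating the mutant C, so that the codon again encodes the wild-type amino acid. The sketch below models this on the coding strand using the standard genetic code. Deamination converts C to U, which is read as T after repair or replication, so the sketch writes T directly. It ignores strand choice, editing windows and bystander edits, and the codon pair is a hypothetical example rather than a mutation from the patent's tables.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third base); "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """One-letter amino acid encoded by a DNA codon under the standard code."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in BASES for base in codon):
        raise ValueError(f"not a DNA codon: {codon!r}")
    index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
    return AMINO_ACIDS[index]


def deaminate_c(codon: str, position: int) -> str:
    """Model C deamination at 0-based `position` as a net C->T change."""
    if codon[position].upper() != "C":
        raise ValueError("deamination target must be C")
    return codon[:position] + "T" + codon[position + 1:]


wild_type, mutant = "TGT", "CGT"  # hypothetical T->C change at the first base
edited = deaminate_c(mutant, 0)
print(translate_codon(wild_type), translate_codon(mutant), translate_codon(edited), edited == wild_type)
# C R C True
```

Here the mutant codon changes Cys to Arg, matching claim 419. Deaminating the mutant C restores a Cys codon, which is the outcome recited in claim 421.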
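The 60 trinucleotides listed in claim 433 are exactly the 64 possible NNN sequences minus the four canonical NGG PAMs (AGG, CGG, GGG, TGG). The short check below enumerates that set and compares it with the claimed list copied from claim 433. The function name is a hypothetical illustration.

```python
from itertools import product

CLAIM_433_LIST = (
    "AAA, AAC, AAG, AAT, CAA, CAC, CAG, CAT, GAA, GAC, GAG, GAT, TAA, TAC, TAG, TAT, "
    "ACA, ACC, ACG, ACT, CCA, CCC, CCG, CCT, GCA, GCC, GCG, GCT, TCA, TCC, TCG, TCT, "
    "AGA, AGC, AGT, CGA, CGC, CGT, GGA, GGC, GGT, TGA, TGC, TGT, "
    "ATA, ATC, ATG, ATT, CTA, CTC, CTG, CTT, GTA, GTC, GTG, GTT, TTA, TTC, TTG, TTT"
)


def non_ngg_trinucleotides() -> set[str]:
    """All 5'->3' trinucleotides except those matching NGG."""
    return {"".join(triplet) for triplet in product("ACGT", repeat=3)
            if triplet[1:] != ("G", "G")}


claimed = {pam.strip() for pam in CLAIM_433_LIST.split(",")}
computed = non_ngg_trinucleotides()
print(len(computed), claimed == computed)  # 60 True
```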
Applications Claiming Priority (9)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US62/245,828 | 2015-10-23 | ||
| US62/279,346 | 2016-01-15 | ||
| US62/311,763 | 2016-03-22 | ||
| US62/322,178 | 2016-04-13 | ||
| US62/357,332 | 2016-06-30 | ||
| US62/357,352 | 2016-06-30 | ||
| US62/370,700 | 2016-08-03 | ||
| US62/398,490 | 2016-09-22 | ||
| US62/408,686 | 2016-10-14 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1261797A1 | 2020-01-03 |