
WO2010001189A1 - The crystal structure of I-DmoI in complex with its DNA target, improved chimeric meganucleases and uses thereof - Google Patents


Info

Publication number
WO2010001189A1
Authority
WO
WIPO (PCT)
Prior art keywords
dmoi
dna
dna target
polypeptide
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IB2008/002756
Other languages
French (fr)
Inventor
Maria Josefina Marcaida Lopez
Francisco Jesus Prieto Lugo
Guillermo Montoya Blanco
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cellectis SA
Original Assignee
Cellectis SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cellectis SA
Priority to PCT/IB2008/002756
Publication of WO2010001189A1
Anticipated expiration
Current legal status: Ceased


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2299/00 Coordinates from 3D structures of peptides, e.g. proteins or enzymes

Definitions

  • The present invention relates to the three-dimensional structure of the meganuclease I-DmoI in complex with its DNA target.
  • The present invention also relates to I-DmoI enzymes with altered characteristics, such as altered target half-sites or altered catalytic properties, to chimeric meganucleases comprising portions of I-DmoI, and to the use of the three-dimensional structure of the meganuclease I-DmoI in complex with its DNA target in an in silico screening method.
  • Meganucleases are sequence-specific enzymes which recognize large (12-45 bp) DNA target sites. These enzymes are often encoded by introns or inteins behaving as mobile genetic elements.
  • Repair of a double-strand break (DSB) by homologous recombination with an intron- or intein-containing gene results in the insertion of the intron or intein at the site of the DSB, at a specific locus in living cells (Thierry and Dujon, 1992).
  • The use of meganuclease-induced recombination has long been limited by the repertoire of natural meganucleases.
  • Meganucleases are essentially represented by homing endonucleases, a family of endonucleases encoded by mobile genetic elements, whose function is to initiate DSB-induced recombination events in a process referred to as homing (Chevalier and Stoddard, 2001).
  • Several hundred homing endonucleases have been identified in bacteria, eukaryotes and archaea (Chevalier and Stoddard, 2001); however, the probability of finding a homing endonuclease cleavage site in a chosen gene is extremely low.
  • Sequence homology has been used to classify homing endonucleases into four families, the largest one having the conserved LAGLIDADG sequence motif. Homing endonucleases with only one such motif function as homodimers. In contrast, larger homing endonucleases containing two motifs are single chain proteins (Dalgaard et al., 1993; Jacquier and Dujon, 1985).
  • Structural information for several members of the LAGLIDADG endonuclease family indicates that these proteins adopt a similar active conformation, as homodimers or as monomers with two separate domains (Chevalier et al., 2001; Ichiyanagi et al., 2000; Silva et al., 1999; Spiegel et al., 2006).
  • The LAGLIDADG motifs form structurally conserved α-helices tightly packed at the center of the interdomain or intermonomer interface.
  • The last acidic residue of the LAGLIDADG motif participates in DNA cleavage by a metal-dependent mechanism of phosphodiester hydrolysis (Chevalier and Stoddard, 2001).
  • Homing endonucleases with one LAGLIDADG motif (L) are around 20 kDa in molecular mass and act as homodimers. Those with two copies (LL) range from 25 kDa (230 amino acids) to 50 kDa (HO, 545 amino acids), with 70 to 150 residues between each motif, and act as monomers. Cleavage of the target sequence occurs inside the recognition site, leaving a 4-nucleotide staggered cut with 3'-OH overhangs.
  • I-CeuI and I-CreI are homing endonucleases with one LAGLIDADG motif (mono-LAGLIDADG).
  • I-DmoI (194 amino acids, SWISSPROT accession number P21505 (SEQ ID NO: 2)), I-SceI, PI-PfuI and PI-SceI are homing endonucleases with two LAGLIDADG motifs.
  • Residue numbers refer to the amino acid numbering of the I-DmoI sequence SWISSPROT P21505 (SEQ ID NO: 2).
  • Several LAGLIDADG proteins have been crystallized, and they exhibit a striking conservation of the core structure that contrasts with a lack of similarity at the primary sequence level (Jurica et al., 1998; Chevalier et al., 2001; Chevalier et al., 2003; Moure et al., 2003; Moure et al., 2002; Ichiyanagi et al., 2000; Duan et al., 1997; Bolduc et al., 2003; Silva et al., 1999).
  • LAGLIDADG proteins, whether they cut as dimers like I-CreI or as monomers like I-DmoI, adopt a similar active conformation.
  • The LAGLIDADG motifs are central and form two packed α-helices, where a 2-fold (pseudo-)symmetry axis separates two monomers or apparent domains.
  • The LAGLIDADG motif corresponds to residues 13 to 21 in I-CreI, and to positions 12 to 20 and 109 to 117 in I-DmoI.
  • A four-stranded β-sheet provides a DNA-binding interface that drives the interaction of the protein with the half-site of the target DNA sequence.
  • I-DmoI is similar to I-CreI dimers, except that the A domain (residues 1 to 95) and the B domain
  • LAGLIDADG proteins including PI-SceI (Gimble et al.,
  • I-CreI (Seligman et al., 2002; Sussman et al., 2004; Rosen et al., 2006;
  • I-SceI (Doyon et al., 2006)
  • I-MsoI
  • Semi-rational design assisted by high-throughput screening methods has allowed the Applicants to derive thousands of novel proteins from I-CreI, a homodimeric protein from the LAGLIDADG family (Smith et al., 2006; Arnould et al., 2006).
  • Another strategy is to combine domains from distinct meganucleases. This approach has been illustrated by the creation of new meganucleases by domain swapping between I-CreI and I-DmoI, leading to the generation of a meganuclease that cleaves the hybrid sequence corresponding to the fusion of one half of each parent target sequence (Epinat et al., 2003; Chevalier et al.,
  • I-DmoI is a 22 kDa endonuclease from the hyperthermophilic archaeon Desulfurococcus mobilis. It is a monomeric protein comprising two similar domains, each of which has a LAGLIDADG motif. The structure of the protein alone, without its DNA target (henceforth referred to as D1234, SEQ ID NO: 7), has been solved (Silva et al., 1999).
  • E-DreI (Engineered I-DmoI/I-CreI)
  • E-DreI consists of the fusion of the first, or A, domain of I-DmoI to a single subunit of the I-CreI homodimer, joined by a flexible linker, to create the initial scaffold for the enzyme.
  • Chevalier et al. then made a number of residue modifications, based on the predictions of computational interface algorithms, to alleviate potential steric clashes predicted from a 3D model generated by combining elements of previously generated I-DmoI and I-CreI models.
  • Residues were identified between the facing surfaces of the two component molecules; in particular, residues at positions 47, 51, 55, 108, 193 and 194 were identified as potentially clashing. These residues were replaced with alanine, but the resulting protein was found to be insoluble.
  • Residue numbers refer to the E-DreI open reading frame, which comprises 101 residues (beginning at the first methionine) from domain A of I-DmoI fused to the last 156 residues of I-CreI, separated by a three-amino-acid NGN linker that mimics the native I-DmoI linker in length.
  • The interface was then optimised through a combination of computational redesign of residues 47, 51, 55, 108, 193 and 194, as well as residues 12, 13, 17, 19, 52, 105, 109 and 113, followed by an in vivo protein folding assay on selected sequences to determine the solubility of E-DreI enzymes modified at these residues.
  • A final scaffold was designed with modifications at I19, H51 and H55 of I-DmoI and at E8, L11, F16, K96 and L97 of I-CreI (corresponding to E105, L108, F113, K193 and L194 in E-DreI numbering).
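The E-DreI numbering can be cross-checked with a short sketch. It assumes only the layout stated above (101 I-DmoI residues, a three-residue NGN linker, then the last 156 residues of I-CreI) plus the fact that wild-type I-CreI is 163 residues long; the function name `edrei_position` is illustrative, not taken from the source.

```python
# Layout of the E-DreI open reading frame as described in the text:
# residues 1-101 from I-DmoI domain A, a 3-residue NGN linker, then the
# last 156 residues of I-CreI (wild-type I-CreI is 163 residues long).
DMO_PART = 101
LINKER = 3
ICREI_LENGTH = 163
CRE_PART = 156
CRE_START = ICREI_LENGTH - CRE_PART + 1  # first I-CreI residue retained (8)


def edrei_position(parent: str, residue: int) -> int:
    """Map a residue number of a parent enzyme onto E-DreI numbering."""
    if parent == "I-DmoI":
        if not 1 <= residue <= DMO_PART:
            raise ValueError(f"I-DmoI residue {residue} is not in E-DreI")
        return residue
    if parent == "I-CreI":
        if not CRE_START <= residue <= ICREI_LENGTH:
            raise ValueError(f"I-CreI residue {residue} is not in E-DreI")
        # skip the I-DmoI part and the linker, then count within I-CreI
        return DMO_PART + LINKER + (residue - CRE_START + 1)
    raise ValueError(f"unknown parent enzyme: {parent}")


if __name__ == "__main__":
    for aa, res in [("E", 8), ("L", 11), ("F", 16), ("K", 96), ("L", 97)]:
        print(f"I-CreI {aa}{res} -> E-DreI {aa}{edrei_position('I-CreI', res)}")
```

Run as a script, it prints E105, L108, F113, K193 and L194, matching the correspondence given above; the same arithmetic gives a total E-DreI length of 260 residues.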
  • The structure of E-DreI (Chevalier et al., 2002) in complex with its chimeric DNA target dre3 (C12D34, SEQ ID NO: 5, in the Applicants' nomenclature) was solved. E-DreI was shown to recognise and cut only this hybrid C12D34 (SEQ ID NO: 5) target. From this structure, a number of residues were predicted to make base-specific contacts between E-DreI and its hybrid target site: residues 25, 29, 31, 33, 34, 35, 37, 70, 75, 76, 77, 79 and 81 of I-DmoI, and residues 123, 125, 127, 130, 137, 135, 139, 141, 163, 165, 167 and 172 of I-CreI (E-DreI numbering).
  • DmoCre is a chimeric molecule built from the two homing endonucleases I-DmoI and I-CreI. It includes the N-terminal portion of I-DmoI linked to an I-CreI monomer.
  • DmoCre could have a tremendous advantage as a scaffold: mutations in the I-DmoI moiety could be combined with mutations in the I-CreI domain, and thousands of such variant I-CreI molecules have already been identified and profiled (Smith J et al., 2006; Arnould S et al., 2006; Arnould S et al., 2007).
  • DmoCre is a monomeric protein that corresponds to the A domain of I-DmoI up to residue F109, followed by I-CreI from residue L13. To avoid a steric clash, I107 of the I-DmoI domain was mutated to leucine. In addition, residues 47, 51 and 55 of I-DmoI, which were found to be close to residues 96 and 97 of I-CreI, were mutated to alanine, alanine and aspartic acid, respectively.
  • DmoCre has been shown to be active in vitro (Epinat et al., 2003) and was able to cleave the hybrid target C12D34 (SEQ ID NO: 5), composed of the left part of C1234 (SEQ ID NO: 4) or C1221 (SEQ ID NO: 6) (the palindromic target derived from C1234) and the right part of D1234 (SEQ ID NO: 7).
  • I-DmoI and DmoCre variants able to cleave their DNA target sequences more efficiently at 37°C were identified by random mutagenesis and screening in yeast cells (WO 2005/105989; Prieto et al., 2008).
  • The E-DreI and DmoCre chimeric enzymes therefore have the A domain of I-DmoI in common.
  • The E-DreI and DmoCre chimeric enzymes nevertheless differ significantly in other respects, as outlined above.
  • The inventors are interested in creating a new generation of chimeric enzymes that recognize a wider set of target sequences. By being able to target new
  • The Applicants thereby provide the tools to induce a DNA recombination event, the loss of a particular DNA segment, or cell death.
  • This double-strand break can be used for: repairing a specific sequence, modifying a specific sequence, restoring a functional gene in place of a mutated one, attenuating or activating an endogenous gene of interest, introducing a mutation into a site of interest, introducing an exogenous gene or a part thereof, inactivating or deleting an endogenous gene or a part thereof, translocating a chromosomal arm, or leaving the DNA unrepaired and degraded.
  • Modified meganuclease enzymes therefore give a user a wide variety of options for therapeutic, research or other productive uses.
  • The inventors have therefore sought to improve chimeric meganuclease enzymes comprising at least one I-DmoI domain by increasing the number of DNA targets these chimeric enzymes can recognize and cut.
  • The present invention relates to a polypeptide comprising the sequence of an I-DmoI endonuclease or a chimeric derivative thereof, including at least I-DmoI domain B, characterized in that it comprises the substitution of at least one of the residues at positions 124, 126, 154 and 155 of said I-DmoI domain B, and wherein the polypeptide recognises an I-DmoI DNA target half-site which differs from the wild-type I-DmoI DNA target half-site SEQ ID NO: 1 in at least one of positions -2, -3, -5, -6, -7.
  • Substitution of one amino acid residue for another is well known in the art to cause changes in the structure and activity of a protein.
  • For example, replacing a polar amino acid residue with a non-polar one would be expected to alter the interactions of this residue within the polypeptide, potentially affecting its three-dimensional structure or the conformation of an active or binding site, and also to affect the function of the residue if that function depends on a polar side chain.
  • Besides this gross replacement of one type of amino acid with another, more subtle alterations are also possible.
  • In an embodiment, the polypeptide further comprises the substitution, by any amino acid, of at least one of the residues at positions 119, 128 and 157 of the I-DmoI domain B, which alters the recognition by said polypeptide of an I-DmoI DNA target half-site differing from the wild-type I-DmoI DNA target half-site SEQ ID NO: 1 in at least one of positions -2, -3.
  • In an embodiment, the polypeptide further comprises the substitution, by any amino acid, of at least one of the residues at positions 115, 116, 117, 118, 120, 130, 150, 152, 153, 156, 158, 160, 164, 166, 167 and 170 of said I-DmoI domain B, which alters the recognition by said polypeptide of an I-DmoI DNA target half-site differing from the wild-type I-DmoI DNA target half-site SEQ ID NO: 1 in at least one of positions -1, -2, -3, -4, -5, -6, -7, -8, -9.
  • In an embodiment, the polypeptide is a chimeric I-DmoI endonuclease consisting of the fusion of said I-DmoI domain B to a sequence of a dimeric LAGLIDADG homing endonuclease or to a domain of another monomeric LAGLIDADG homing endonuclease.
  • In an embodiment, said I-DmoI domain B is fused to a domain selected from one of the enzymes in the group: I-Sce I, I-Chu I, I-Cre I, I-Csm I, PI-Sce I, PI-Tli I, PI-Mtu I, I-Ceu I, I-Sce II, I-Sce III, HO, PI-Civ I, PI-Ctr I, PI-Aae I, PI-Bsu I, PI-Dha I, PI-Dra I, PI-Mav I, PI-Mch I, PI-Mfu I, PI-Mfl I, PI-Mga I, PI-Mgo I, PI-Min I, PI-Mka I, PI-Mle I, PI-Mma I, PI-Msh I, PI-Msm I, PI-Mth I, PI-Mtu I, PI-Mxe I, PI-Npu
  • The polypeptide is characterized in that said I-DmoI domain B is at the NH2-terminus of said chimeric I-DmoI endonuclease.
  • The polypeptide is characterized in that said dimeric LAGLIDADG homing endonuclease is I-CreI.
  • The polypeptide is characterized in that it comprises a detectable tag at its NH2 and/or COOH terminus.
  • The polypeptide is characterized in that it comprises a Nuclear Localisation Signal (NLS).
  • the NLS comprises a peptide sequence selected from the group consisting of: SEQ ID NO: 19, 20, 21, 22, 23.
  • A nuclear localizing sequence is an amino acid sequence that targets the protein to the cell nucleus through the Nuclear Pore Complex, directing a newly synthesized protein into the nucleus via its recognition by cytosolic nuclear transport receptors.
  • NLSs consist of one or more short sequences of positively charged amino acids such as lysines or arginines.
  • The NLS is selected from the NLS sequences of the known proteins SV40 large T antigen -PKKKRKV- (SEQ ID NO: 19), nucleoplasmin -KR[PAATKKAGQA]KKKK- (SEQ ID NO: 20), p54 -RIRKKLR- (SEQ ID NO: 21), SOX9 -PRRRK- (SEQ ID NO: 22) and NS5A -PPRKKRTVV- (SEQ ID NO: 23).
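As an illustration, the NLS peptides listed above can be located in a candidate protein sequence with a plain motif search. This is a minimal sketch under stated assumptions: it treats the nucleoplasmin signal as the contiguous peptide KRPAATKKAGQAKKKK (dropping the brackets), whereas real bipartite NLSs tolerate variable spacers, and `find_nls` is a hypothetical helper name.

```python
import re

# NLS peptides as listed in the text; the bracketed stretch of the
# nucleoplasmin signal is treated here as part of one contiguous motif.
NLS_MOTIFS = {
    "SV40 large T antigen": "PKKKRKV",
    "nucleoplasmin": "KRPAATKKAGQAKKKK",
    "p54": "RIRKKLR",
    "SOX9": "PRRRK",
    "NS5A": "PPRKKRTVV",
}


def find_nls(protein: str) -> list[tuple[str, int]]:
    """Return (name, 1-based start) for every listed NLS found in `protein`."""
    seq = protein.upper()
    hits = []
    for name, motif in NLS_MOTIFS.items():
        # a zero-width lookahead also reports overlapping occurrences
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((name, match.start() + 1))
    return sorted(hits, key=lambda hit: hit[1])
```

For instance, `find_nls("MAPKKKRKVG")` returns `[('SV40 large T antigen', 3)]`, i.e. an SV40 signal starting at residue 3.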
  • A polynucleotide characterized in that it encodes a polypeptide according to the present invention.
  • A vector characterized in that it comprises a polynucleotide according to the present invention.
  • A host cell characterized in that it is modified by a polynucleotide or a vector according to the present invention.
  • A non-human transgenic animal characterized in that all or part of its cells are modified by a polynucleotide or a vector according to the present invention.
  • A transgenic plant characterized in that all or part of its cells are modified by a polynucleotide or a vector according to the present invention.
  • According to a seventh aspect of the present invention, there is provided the use of a polypeptide, a polynucleotide, a vector, a cell, a non-human animal or a plant according to the present invention for the selection and/or screening of meganucleases with novel DNA target specificity.
  • A method of identifying polypeptides comprising at least one domain of I-DmoI which can recognise and bind to an altered DNA target, comprising at least the steps of: i) applying a three-dimensional molecular modelling algorithm to at least the set of atomic coordinates set out in Table II and Figures 8 and 9, to determine the spatial coordinates of the DNA-interacting portions of a candidate polypeptide and its native DNA target, modelled from the set of atomic coordinates, and generating a model; ii) modifying at least one residue of the candidate polypeptide and altering the characteristics of the model accordingly; iii) electronically screening the modified candidate polypeptide of step ii) against a stored set of spatial coordinates representing the native DNA target sequence and at least one variant thereof; iv) calculating from said model the interaction energies of the modified candidate polypeptide of step ii) with the stored set of DNA targets; v) converting said interaction energies into a probability score to predict the preference of the modified polypeptide for
  • The inventors have thus developed a means to reduce the number of in vitro/in vivo experiments needed to identify I-DmoI enzymes, or chimeric derivatives thereof, that bind to altered DNA targets: the variants are first modelled in an in silico screen to identify candidate polypeptides for further in vitro/in vivo studies.
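Step v) is truncated in this text and does not say how interaction energies become probabilities. One common choice, assumed here rather than taken from the source, is Boltzmann weighting over the screened targets, p_i = exp(-E_i/RT) / sum_j exp(-E_j/RT). The sketch below applies it to hypothetical interaction energies in kcal/mol (lower means tighter binding); `binding_probabilities` is an illustrative name.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def binding_probabilities(energies: dict[str, float],
                          temperature: float = 298.15) -> dict[str, float]:
    """Boltzmann-weight interaction energies (kcal/mol) into probabilities.

    Lower energy = more favourable binding = higher probability.
    """
    rt = R_KCAL * temperature
    e_min = min(energies.values())  # shift by the minimum to avoid overflow
    weights = {t: math.exp(-(e - e_min) / rt) for t, e in energies.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical energies for a native target and two variants.
    scores = binding_probabilities({"native": -12.0, "var1": -11.0, "var2": -9.5})
    for target, p in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{target}: {p:.3f}")
```

The temperature enters only through RT: raising it flattens the distribution, lowering it concentrates the probability on the best-scoring target.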
  • The stored set of spatial coordinates of step iii) comprises the native DNA target in which at least one base is changed to each of the three alternative bases.
  • The stored set of spatial coordinates comprises all possible variants of the native DNA target sequence.
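The stored target set described above, in which each base of the native target is changed to each of the three alternative bases, can be generated programmatically. A minimal sketch, assuming targets are plain a/c/g/t strings written 5' to 3'; `single_base_variants` is an illustrative name.

```python
BASES = "acgt"


def single_base_variants(target: str) -> list[str]:
    """All sequences differing from `target` at exactly one position.

    Each position is changed to each of the three alternative bases, so an
    L-bp target yields 3 * L variants.
    """
    target = target.lower()
    variants = []
    for i, base in enumerate(target):
        for alt in BASES:
            if alt != base:
                variants.append(target[:i] + alt + target[i + 1:])
    return variants
```

By contrast, the set of all possible variants grows as 4^L (about 2.8 × 10^14 for a 24-bp target), which makes exhaustive enumeration impractical beyond a few variable positions.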
  • The modified residue of step ii) forms a direct contact between the candidate polypeptide and the native DNA target sequence.
  • The modified residue of step ii) forms an indirect contact between the candidate polypeptide and the native DNA target sequence.
  • The modified residue of step ii) forms a molecular interaction, selected from the group consisting of hydrogen bonds, polar contacts and van der Waals interactions, between said candidate polypeptide and said native DNA target sequence.
  • The candidate polypeptide comprises at least said I-DmoI domain B; in step ii), at least one of the residues at positions 115, 116, 117, 118, 119, 120, 124, 126, 128, 130, 150, 152, 153, 154, 156, 155, 157, 158, 160, 164, 166, 167, 170 of said I-DmoI domain B is altered; and the at least one altered DNA target differs from a native DNA target consisting of SEQ ID NO: 7 in at least one of positions -1, -2, -3, -4, -5, -6, -7, -8, -9.
  • The candidate polypeptide comprises at least the I-DmoI domain A, and in step ii) at least one of the residues at positions 15,
  • The at least one altered DNA target differs from a native DNA target consisting of SEQ ID NO: 7 in at least one of positions +1, +2, +3, +4, +5, +6, +7, +8, +9, +10, +11, +12, +13.
  • The candidate polypeptide consists of an A or B domain of I-DmoI fused to a dimeric LAGLIDADG homing endonuclease or to a domain of another monomeric LAGLIDADG homing endonuclease.
  • The candidate polypeptide consists of either said I-DmoI domain A or B fused to a domain selected from one of the enzymes in the group: I-Sce I, I-Chu I, I-Cre I, I-Csm I, PI-Sce I, PI-Tli I, PI-Mtu I, I-Ceu I, I-Sce II, I-Sce III, HO, PI-Civ I, PI-Ctr I, PI-Aae I, PI-Bsu I, PI-Dha I, PI-Dra I, PI-Mav I, PI-Mch I, PI-Mfu I, PI-Mfl I, PI-Mga I, PI-Mgo I, PI-Min I, PI-Mka I, PI-Mle I, PI-Mma I, PI-Msh I, PI-Msm I, PI-Mth I, PI-Mtu I,
  • Figure 1 shows the crystal structure of I-DmoI in complex with its target DNA.
  • Figure 2a shows a detailed view of the I-DmoI active site.
  • Figure 2c shows two manganese atoms in the digested DNA structure.
  • Figure 2d shows a schematic diagram of the hypothetical enzymatic mechanism proposed for I-DmoI.
  • Figure 3 shows a scheme of the protein-DNA contacts in the Ca2+- and Mn2+-bound structures.
  • Figure 4 shows the loops involved in DNA binding by I-DmoI.
  • The upper part of the figure depicts a ribbon diagram of the I-DmoI/DNA complex.
  • The lower part of the figure shows detailed insets of the three loops involved in DNA interactions.
  • Figure 5a shows a structural sequence alignment of the archaeal I-DmoI and the eukaryotic I-SceI and I-CreI homing endonucleases.
  • Figure 5b shows a comparison of the location of the protein-base contacts in the I-DmoI, I-SceI and I-CreI protein-DNA structures.
  • Figure 5c shows a schematic view of the protein-base contacts.
  • Figure 6a shows in silico binding patterns for I-DmoI, E-DreI, I-CreI and I-SceI.
  • Figure 6b shows the in silico R-10NNN binding pattern predicted by FoldX.
  • Figure 7a shows in vivo cleavage patterns for the I-DmoI recognition site.
  • Figure 7b shows the cleavage activities of wild-type I-DmoI and of two mesophilic I-DmoI variants (D1, D2).
  • Figure 8 lists the atomic coordinate data of I-DmoI in complex with its target in the presence of Mn2+, in PDB format.
  • Figure 9 lists the atomic coordinate data of I-DmoI in complex with its target in the presence of Ca2+, in PDB format.
  • Amino acid residues in a polypeptide sequence are designated herein according to the one-letter code, in which, for example, Q means Gln or glutamine residue, R means Arg or arginine residue and D means Asp or aspartic acid residue.
  • The term "hydrophobic amino acid" refers to leucine (L), valine (V), isoleucine (I), alanine (A), methionine (M), phenylalanine (F), tryptophan (W) and tyrosine (Y).
  • Nucleosides are designated as follows: the one-letter code is used to designate the base of a nucleoside: a is adenine, t is thymine, c is cytosine, and g is guanine.
  • r represents g or a (purine nucleotides)
  • k represents g or t
  • s represents g or c
  • w represents a or t
  • m represents a or c
  • y represents t or c (pyrimidine nucleotides)
  • d represents g, a or t
  • v represents g, a or c
  • b represents g, t or c
  • h represents a, t or c
  • n represents g, a, t or c.
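The degenerate codes above map directly onto sets of bases. The sketch below, with hypothetical helper names `matches` and `expand`, tests whether a concrete sequence fits a degenerate pattern and lists every sequence a pattern stands for.

```python
from itertools import product

# Degenerate nucleotide codes as defined above (lower case).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ga", "k": "gt", "s": "gc", "w": "at", "m": "ac", "y": "tc",
    "d": "gat", "v": "gac", "b": "gtc", "h": "atc", "n": "gatc",
}


def matches(pattern: str, sequence: str) -> bool:
    """True if every base of `sequence` is allowed by the code at that position."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[code]
               for code, base in zip(pattern.lower(), sequence.lower()))


def expand(pattern: str) -> list[str]:
    """List every concrete a/c/g/t sequence described by a degenerate pattern."""
    choices = [IUPAC[code] for code in pattern.lower()]
    return ["".join(bases) for bases in product(*choices)]
```

For example, a pattern with n at three positions expands to 64 sequences, the size of an NNN library.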
  • By "parent LAGLIDADG homing endonuclease" is intended a wild-type LAGLIDADG homing endonuclease or a functional variant thereof.
  • Said parent LAGLIDADG homing endonuclease may be a monomer or a dimer (homodimer or heterodimer) comprising two LAGLIDADG homing endonuclease core domains that associate into a functional endonuclease able to cleave a double-stranded DNA target of 22 to 24 bp.
  • By "homodimeric LAGLIDADG homing endonuclease" is intended a wild-type homodimeric LAGLIDADG homing endonuclease having a single LAGLIDADG motif and cleaving palindromic DNA target sequences, such as I-CreI or I-MsoI, or a functional variant thereof.
  • By "LAGLIDADG homing endonuclease variant" or "variant" is intended a protein obtained by replacing at least one amino acid of a LAGLIDADG homing endonuclease sequence with a different amino acid.
  • By "functional variant" is intended a LAGLIDADG homing endonuclease variant which is able to cleave a DNA target, preferably a new DNA target which is not cleaved by a wild-type LAGLIDADG homing endonuclease.
  • For example, such variants have amino acid variations at positions contacting the DNA target sequence or interacting directly or indirectly with said DNA target.
  • By "variant with novel specificity" is intended a variant having a pattern of cleaved targets (cleavage profile) different from that of the parent homing endonuclease.
  • The variant may cleave fewer targets (restricted profile) or more targets than the parent homing endonuclease.
  • The variant is able to cleave at least one target that is not cleaved by the parent homing endonuclease.
  • "Novel specificity" refers to the specificity of the variant towards the nucleotides of the DNA target sequence.
  • By "I-CreI" is intended the wild-type I-CreI having the sequence SWISSPROT P05725 or PDB accession code 1g9y (SEQ ID NO: 8).
  • By "I-DmoI" is intended the wild-type I-DmoI having the sequence SWISSPROT P21505 (SEQ ID NO: 2).
  • By "domain" or "core domain" is intended the "LAGLIDADG homing endonuclease core domain", which is the characteristic αββαββα fold of the homing endonucleases of the LAGLIDADG family, corresponding to a sequence of about one hundred amino acid residues. Said domain comprises four beta-strands folded into an antiparallel beta-sheet which interacts with one half of the DNA target. This domain is able to associate with another LAGLIDADG homing endonuclease core domain, which interacts with the other half of the DNA target, to form a functional endonuclease able to cleave said DNA target.
  • For example, in I-CreI, the LAGLIDADG homing endonuclease core domain corresponds to residues 6 to 94.
  • In monomeric homing endonucleases, two such domains are found in the sequence of the endonuclease; for example, in I-DmoI (194 amino acids), the A domain (residues 7 to 99) and the B domain (residues 104 to 194) are separated by a short linker (residues 100 to 103).
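A small lookup makes it easy to check which I-DmoI domain a residue number belongs to, for instance that the positions named in the claims (124, 126, 154, 155) and the second LAGLIDADG motif (109 to 117) fall in domain B. This sketch uses the boundaries of the definition above; another passage of the text gives the A domain as residues 1 to 95, so the exact limits should be treated as an assumption. `domain_of` is an illustrative name.

```python
# I-DmoI (194 residues) segment boundaries as given in the definition above.
I_DMOI_LENGTH = 194
SEGMENTS = [
    ("A", 7, 99),
    ("linker", 100, 103),
    ("B", 104, 194),
]


def domain_of(residue: int) -> str:
    """Name the I-DmoI segment that contains a residue number."""
    if not 1 <= residue <= I_DMOI_LENGTH:
        raise ValueError(f"residue {residue} is outside I-DmoI (1-{I_DMOI_LENGTH})")
    for name, start, end in SEGMENTS:
        if start <= residue <= end:
            return name
    return "N-terminal, outside the core domains"  # residues 1-6


if __name__ == "__main__":
    claimed = [124, 126, 154, 155]
    print({r: domain_of(r) for r in claimed})
```

Run as a script, it prints `{124: 'B', 126: 'B', 154: 'B', 155: 'B'}`.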
  • By "subdomain" is intended the region of a LAGLIDADG homing endonuclease core domain which interacts with a distinct part of a homing endonuclease DNA target half-site.
  • Two different subdomains behave independently or partly independently: a mutation in one subdomain does not alter the binding and cleavage properties of the other subdomain, or at least does not alter them in a number of cases. Therefore, two subdomains bind distinct parts of a homing endonuclease DNA target half-site.
  • By "beta-hairpin" is intended two consecutive beta-strands of the antiparallel beta-sheet of a LAGLIDADG homing endonuclease core domain, connected by a loop or a turn.
  • By "C1221" is intended the first half of the I-CreI target site, '12', repeated backwards so as to form the palindrome '1221'.
  • The cleavage activity of the variant of the invention may be measured by a direct repeat recombination assay, in yeast or mammalian cells, using a reporter vector, as described in PCT Application WO 2004/067736; Epinat et al., 2003; Chames et al., 2005 and Arnould et al., 2006.
  • The reporter vector comprises two truncated, non-functional copies of a reporter gene (direct repeats) and a chimeric DNA target sequence within the intervening sequence, cloned in a yeast or mammalian expression vector.
  • The DNA target sequence is derived from the parent homing endonuclease cleavage site by replacement of at least one nucleotide by a different nucleotide.
  • A panel of palindromic or non-palindromic DNA targets representing the different combinations of the 4 bases (g, a, c, t) at one or more positions of the DNA cleavage site is tested (4^n palindromic targets for n mutated positions).
  • Expression of the variant results in a functional endonuclease which is able to cleave the DNA target sequence. This cleavage induces homologous recombination between the direct repeats, resulting in a functional reporter gene whose expression can be monitored by an appropriate assay.
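The palindromic panel described above (4^n targets for n mutated positions) can be generated as follows. This is a minimal sketch, assuming a 24-bp target numbered -12 to -1 and +1 to +12 with no position 0, and using as template the palindromic I-CreI target sequence given in this section (SEQ ID NO: 4); the helper names are illustrative.

```python
from itertools import product

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

# Palindromic I-CreI target as given in the text (positions -12..-1, +1..+12).
C1221 = "tcaaaacgtcgtacgacgttttga"


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def index_of(position: int, half: int = 12) -> int:
    """0-based string index of a signed position; there is no position 0."""
    if -half <= position <= -1:
        return position + half
    if 1 <= position <= half:
        return position + half - 1
    raise ValueError(f"position {position} is outside the target")


def palindromic_panel(template: str, positions: list[int]) -> list[str]:
    """Vary positions -k/+k together so every target stays palindromic.

    A base b placed at -k is mirrored by its complement at +k, giving
    4**len(positions) targets.
    """
    half = len(template) // 2
    panel = []
    for bases in product("acgt", repeat=len(positions)):
        seq = list(template.lower())
        for k, base in zip(positions, bases):
            seq[index_of(-abs(k), half)] = base
            seq[index_of(abs(k), half)] = COMPLEMENT[base]
        panel.append("".join(seq))
    return panel
```

Varying positions ±2, ±3 and ±4, for example, yields 64 palindromic targets, one of which is the unmodified template.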
  • By "DNA target" or "cleavage site" is intended a 22 to 24 bp double-stranded palindromic, partially palindromic (pseudo-palindromic) or non-palindromic polynucleotide sequence that is recognized and cleaved by a LAGLIDADG homing endonuclease.
  • These terms refer to a distinct DNA location, preferably a genomic location, at which a double stranded break (cleavage) is to be induced by the endonuclease.
  • The DNA target is defined by the 5' to 3' sequence of one strand of the double-stranded polynucleotide.
  • For example, the palindromic DNA target sequence cleaved by wild-type I-CreI is defined by the sequence $5'\text{-}t_{-12}c_{-11}a_{-10}a_{-9}a_{-8}a_{-7}c_{-6}g_{-5}t_{-4}c_{-3}g_{-2}t_{-1}a_{+1}c_{+2}g_{+3}a_{+4}c_{+5}g_{+6}t_{+7}t_{+8}t_{+9}t_{+10}g_{+11}a_{+12}\text{-}3'$ (SEQ ID NO: 4).
  • Cleavage of the DNA target occurs at the nucleotides in positions +2 and -2, respectively for the sense and the antisense strand. Unless otherwise indicated, the position at which cleavage of the DNA target by a meganuclease variant occurs, corresponds to the cleavage site on the sense strand of the DNA target.
  • By "DNA target half-site", "half cleavage site" or "half-site" is intended the portion of the DNA target which is bound by each LAGLIDADG homing endonuclease core domain.
  • By "DC10NNN" (SEQ ID NO: 3) is intended the target sequence of DmoCre with variability at positions +8, +9 and +10, i.e. three variable nucleotides counted backwards from position +10.
  • DC4NNN (SEQ ID NO: 9) refers to the target sequence of DmoCre with variability at positions +2, +3 and +4.
  • DC7NNN (SEQ ID NO: 10) refers to the target sequence of DmoCre with variability at positions +5, +6 and +7.
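The library naming convention can be decoded mechanically. In this sketch (the function names are hypothetical), the number in a name such as DC10NNN is read as the last variable position and each N adds one position counted backwards from it, which reproduces the three definitions above; the positions are on the + side of the target.

```python
import re


def variable_positions(name: str) -> tuple[int, ...]:
    """Positions randomised in a DmoCre library named like 'DC10NNN'.

    The number is the last variable position; each 'N' adds one position
    counted backwards from it.
    """
    match = re.fullmatch(r"DC(\d+)(N+)", name)
    if match is None:
        raise ValueError(f"unrecognised library name: {name}")
    last = int(match.group(1))
    width = len(match.group(2))
    return tuple(range(last - width + 1, last + 1))


def library_size(name: str) -> int:
    """Number of distinct targets: four bases at each variable position."""
    return 4 ** len(variable_positions(name))
```

Each of the three libraries therefore contains 4^3 = 64 distinct targets.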
  • By "chimeric DNA target" or "hybrid DNA target" is intended the fusion of different halves of two parent meganuclease target sequences.
  • In addition, at least one half of said target may comprise the combination of nucleotides bound by separate subdomains (combined DNA target).
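At the sequence level, a hybrid target such as C12D34 (left half of C1234, right half of D1234) is a concatenation of half-sites. The sketch assumes both parents are written 5' to 3' on the same strand and, by default, splits each at its midpoint; `hybrid_target` is an illustrative name, and the sequences in the usage example are placeholders, not real targets.

```python
from typing import Optional


def hybrid_target(left_parent: str, right_parent: str,
                  left_split: Optional[int] = None,
                  right_split: Optional[int] = None) -> str:
    """Fuse the left half-site of one target with the right half-site of another.

    By default each parent is split at its midpoint; pass explicit split
    indices when the two half-sites have different lengths.
    """
    if left_split is None:
        left_split = len(left_parent) // 2
    if right_split is None:
        right_split = len(right_parent) // 2
    return left_parent[:left_split] + right_parent[right_split:]
```

For example, `hybrid_target("aaaacccc", "ggggtttt")` returns `"aaaatttt"`.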
  • By "mutation" is intended the substitution, deletion and/or addition of one or more nucleotides/amino acids in a nucleic acid/amino acid sequence.
  • By "homologous" is intended a sequence with enough identity to another one to lead to homologous recombination between the sequences, more particularly having at least 95% identity, preferably 97% identity and more preferably 99% identity.
  • Identity refers to sequence identity between two nucleic acid molecules or polypeptides. Identity can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base, then the molecules are identical at that position. A degree of similarity or identity between nucleic acid or amino acid sequences is a function of the number of identical or matching nucleotides at positions shared by the nucleic acid sequences.
  • Various alignment algorithms and/or programs may be used to calculate the identity between two sequences, including FASTA, or BLAST which are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with, e.g., default settings.
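The identity measure described above can be sketched for sequences that have already been aligned (for example by BLAST or FASTA). Conventions for the denominator vary between programs; this sketch, with hypothetical function names, counts every alignment column except double gaps, and its 95% default threshold comes from the definition of "homologous".

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def is_homologous(aligned_a: str, aligned_b: str, threshold: float = 95.0) -> bool:
    """Apply the identity threshold from the definition of 'homologous'."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACGT", "ACGA")` returns `75.0`.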
  • mammals as well as other vertebrates (e.g., birds, fish and reptiles).
  • Examples of mammalian species include humans and other primates (e.g., monkeys, chimpanzees), rodents (e.g., rats, mice, guinea pigs) and ungulates (e.g., cows, pigs, horses).
  • genetic disease refers to any disease, partially or completely, directly or indirectly, due to an abnormality in one or several genes.
  • Said abnormality can be a mutation, an insertion or a deletion.
  • Said mutation can be a point mutation.
  • Said abnormality can affect the coding sequence of the gene or its regulatory sequence.
  • Said abnormality can affect the structure of the genomic sequence or the structure or stability of the encoded mRNA. This genetic disease can be recessive or dominant.
  • Such genetic diseases include, but are not limited to, cystic fibrosis, Huntington's chorea, familial hypercholesterolemia (LDL receptor defect), hepatoblastoma, Wilson's disease, congenital hepatic porphyrias, inherited disorders of hepatic metabolism, Lesch-Nyhan syndrome, sickle cell anemia, thalassaemias, xeroderma pigmentosum, Fanconi's anemia, retinitis pigmentosa, ataxia telangiectasia, Bloom's syndrome, retinoblastoma, Duchenne's muscular dystrophy, and Tay-Sachs disease.
  • vectors which can be used in the present invention include, but are not limited to, a viral vector, a plasmid, an RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic nucleic acids.
  • Preferred vectors are those capable of autonomous replication (episomal vector) and/or expression of nucleic acids to which they are linked (expression vectors). Large numbers of suitable vectors are known to those of skill in the art and commercially available.
  • Viral vectors include retrovirus, adenovirus, parvovirus (e. g.
  • RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g., measles and Sendai), positive strand RNA viruses such as picornavirus and alphavirus, and double-stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, fowlpox and canarypox).
  • viruses include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example.
  • retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D-type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996).
  • vector refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked.
  • One type of preferred vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication.
  • Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as "expression vectors".
  • a vector according to the present invention comprises, but is not limited to, a YAC (yeast artificial chromosome), a BAC (bacterial artificial chromosome), a baculovirus vector, a phage, a phagemid, a cosmid, a viral vector, a plasmid, an RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic DNA.
  • expression vectors of utility in recombinant DNA techniques are often in the form of "plasmids" which refer generally to circular double stranded DNA loops which, in their vector form are not bound to the chromosome.
  • Vectors can comprise selectable markers, for example: neomycin phosphotransferase, histidinol dehydrogenase, dihydrofolate reductase, hygromycin phosphotransferase, herpes simplex virus thymidine kinase, adenosine deaminase, glutamine synthetase, and hypoxanthine-guanine phosphoribosyl transferase for eukaryotic cell culture; TRP1 for S. cerevisiae; tetracycline, rifampicin or ampicillin resistance in E. coli.
  • said vectors are expression vectors, wherein a sequence encoding a polypeptide of the invention is placed under the control of appropriate transcriptional and translational control elements to permit production or synthesis of said protein. Therefore, said polynucleotide is comprised in an expression cassette. More particularly, the vector comprises a replication origin, a promoter operatively linked to said encoding polynucleotide, a ribosome site, an RNA-splicing site (when genomic DNA is used), a polyadenylation site and a transcription termination site. It can also comprise an enhancer. Selection of the promoter will depend upon the cell in which the polypeptide is expressed.
EXAMPLE 1 - Materials and Methods
  • E. coli Rosetta(DE3)pLysS cells were transformed with plasmid pET24d(+) containing the I-DmoI ORF with a C-terminal 6His tag. His-tagged I-DmoI was overexpressed in LB medium at 25°C for 5 h after addition of 0.3 mM IPTG when the OD600 was around 0.6-0.8. Selenomethionine-labelled I-DmoI was expressed using the same strain.
  • the cells were collected from a 50 ml overnight culture grown in LB medium containing 30 mg ml⁻¹ kanamycin until OD600 ~1.0; at this point the cells were spun down, washed once with M9 minimal medium and finally resuspended in M9 minimal medium supplemented with thiamine (0.01 mg ml⁻¹), glucose [0.4% (w/v)], CaCl2 (0.0147 mg ml⁻¹), MgSO4 (0.246 mg ml⁻¹) and kanamycin (30 mg ml⁻¹).
  • the culture was shaken at 37°C for 30 min and selenomethionine (50 mg ml⁻¹; Molecular Dimensions) was then added together with lysine hydrochloride, threonine, phenylalanine, leucine, isoleucine and valine as described in Van Duyne et al. (1993). After a further 15 min of shaking, protein expression was induced for 5 h at 25°C by the addition of 0.3 mM IPTG.
  • the bacterial pellet was resuspended and the cells were disrupted by sonication in 50 mM sodium phosphate pH 8.0, 300 mM NaCl and 5% glycerol including protease inhibitors (Complete EDTA-free tablets, Roche).
  • the lysate was clarified by centrifugation (20 000 g for 1 h).
  • the supernatant was applied onto a Co²⁺-loaded HiTrap Chelating HP column (GE Healthcare) and the protein was eluted using an imidazole gradient (0-0.5 M).
  • the fractions containing I-Dmo-I were collected and the pH was adjusted to 6.0.
  • the sample was loaded onto a 5 ml HiTrap Heparin HP column (GE Healthcare) previously equilibrated with 20 mM sodium phosphate pH 6.0.
  • the sample was eluted with a continuous gradient from 0 to 1 M NaCl in 20 mM sodium phosphate pH 6.0 buffer.
  • the purified protein was subsequently concentrated using an Amicon Ultra system equipped with a 10 kDa cut-off filter and loaded onto a PD-10 Desalting column (GE Healthcare) pre-equilibrated with 5 mM Tris-HCl pH 8.0 and 150 mM NaCl.
  • the protein was concentrated to 16 mg ml⁻¹, flash frozen in liquid nitrogen and stored at -80°C.
  • the protein concentration was determined from the absorbance at 280 nm.
  • the purity of the samples was checked by SDS-PAGE and their homogeneity was evaluated using dynamic light scattering. Finally, the incorporation of selenomethionine was tested by mass spectrometry (data not shown).
  • the I-DmoI target DNA was purchased from Proligo and consisted of two strands of sequence 5'-GCCTTGCCGGGTAAGTTCCGGCGCG-3' (SEQ ID NO: 11) and 5'-CGCGCCGGAACTTACCCGGCAAGGC-3' (SEQ ID NO: 12). The construct forms a 25 bp blunt-end duplex.
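As a quick sanity check of the oligonucleotide design, the sketch below verifies that SEQ ID NO: 12 is the exact reverse complement of SEQ ID NO: 11, which is what a 25 bp blunt-ended duplex requires. The helper name is hypothetical; only the two sequences come from the text.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


top = "GCCTTGCCGGGTAAGTTCCGGCGCG"     # SEQ ID NO: 11
bottom = "CGCGCCGGAACTTACCCGGCAAGGC"  # SEQ ID NO: 12

# Equal lengths and perfect reverse complementarity give blunt ends.
assert len(top) == len(bottom) == 25
assert reverse_complement(top) == bottom
print("25 bp blunt-end duplex")
```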
  • the I-DmoI-DNA complex was formed after pre-warming the meganuclease and the oligonucleotide samples to 15°C and then mixing them in a 1.5:1 molar ratio (DNA:protein). The mixture was incubated for 50 min and then spun down for 5 min. The supernatant was stored at room temperature to avoid precipitation. To assess the presence of DNA in the complex with I-DmoI, the purified complex was analyzed by running a 15% SDS-PAGE gel and staining first with Coomassie and subsequently with SYBR Safe. The same protocol was followed in the presence of 2 mM Ca²⁺ or Mn²⁺.
  • Crystallization screening was performed immediately after complex formation using a Cartesian MicroSys robot (Genomic Solutions) and the sitting-drop method (96-well MRC plates) with nanodrops of 0.1 µl protein solution plus 0.1 µl reservoir solution and a reservoir volume of 60 µl.
  • the initial screens tested were Crystal Screens I and II, Crystal Screen Cryo and Crystal Screen Lite (Hampton Research), Wizard I and II, Wizard Cryo I and II, Precipitant Synergy Primary, Precipitant Synergy Expanded 67% and Precipitant Synergy Expanded 33% (Emerald BioSystems).
  • the final concentration of I-DmoI in the DNA-protein complex solution was 6 mg ml⁻¹.
  • Crystals were obtained in the nanodrops under several conditions (Crystal Screen I conditions 15 and 36, Crystal Screen II conditions 22, 35, 37 and 43, Crystal Screen Cryo conditions 15, 20 and 37, Crystal Screen Lite conditions 18, 28 and 41, Wizard I condition 21, Wizard Cryo I conditions 40 and 47, Wizard Cryo II condition 10, Precipitant Synergy Primary conditions 42 and 52 and Precipitant Synergy Expanded 67% condition 51).
  • Table I shows the data-collection statistics of the native I-DmoI-DNA crystals grown in 2 mM Mn²⁺.
  • the non-palindromic, twenty-four-base-pair target sequence 5'-GCCTTGCCGGGTAAGTTCCGGCGC-3' (SEQ ID NO: 13) is the natural I-DmoI target.
  • the inventors divided it into two equal halves, L and R.
  • the 64 degenerate targets derived from the LR sequence were obtained by mutating the nucleotides at positions +8, +9 and +10 in the R half.
  • oligonucleotides (5'-GCCTTGCCGGGTAAGTTCCNNNGC-3' (SEQ ID NO: 14) and the reverse complementary sequences) representing the target library LR(10NNN) were ordered from Sigma, annealed, and cloned using the Gateway protocol (Invitrogen) into the yeast vector pFL39-ADH-LACURAZ containing an I-SceI target site as a control (Arnould et al., 2006).
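The R-10NNN library described above can be enumerated programmatically. This sketch assumes that position +1 is the first base of the R half (index 12 of the 24-mer), so that positions +8 to +10 carry the wild-type bases GGC, which agrees with the NNN placement in SEQ ID NO: 14. Each target is labelled by the bottom-strand 5'-NNN-3' reading of nucleotides 10, 9 and 8, the convention used for the cleavage profiles of Figure 7. Function and variable names are illustrative, not taken from the inventors' work.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Natural 24-bp I-DmoI target, coding (top) strand, SEQ ID NO: 13.
WT_TARGET = "GCCTTGCCGGGTAAGTTCCGGCGC"
R_START = 12  # assumed 0-based index of position +1 (first base of the R half)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def r10nnn_library() -> dict[str, str]:
    """Map each R-10NNN label to its 24-bp top-strand target sequence."""
    start = R_START + 8 - 1  # 0-based index of position +8
    library = {}
    for bases in product("ACGT", repeat=3):
        top = "".join(bases)  # top-strand bases at +8, +9, +10
        target = WT_TARGET[:start] + top + WT_TARGET[start + 3:]
        # Label = bottom-strand 5'->3' reading of positions 10, 9, 8
        library["R-10" + reverse_complement(top)] = target
    return library


library = r10nnn_library()
print(len(library))                     # 64
print(library["R-10GCC"] == WT_TARGET)  # True: the natural target
```

Because the reverse complement is a one-to-one mapping, the 64 labels are unique, and the wild-type triplet GGC on the top strand comes out as R-10GCC.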
  • Yeast reporter vectors were transformed into S. cerevisiae strain FYBL2-7B (MATa, ura3Δ851, trp1Δ63, leu2Δ1, lys2Δ202).
  • I-DmoI WT: wild-type I-DmoI
  • D1 and D2: the two reported mesophilic I-DmoI variants
  • R-10NNN: the 64 targets derived from the I-DmoI target
  • a specificity logo is a diagrammatic representation of the specificity preference for each of the possible nucleotides in the I-DmoI (non-coding strand) SEQ ID NO: 15, E-DreI SEQ ID NO: 17, I-CreI SEQ ID NO: 15 and I-SceI SEQ ID NO: 18 DNA target sites.
  • the height of a given nucleotide is proportional to $\exp(-\Delta G_{nt}/RT)$, where $\Delta G_{nt}$ is the difference in interaction energy between the complex with the mutated DNA (carrying nucleotide nt at that position) and the wild-type complex, $R$ is the gas constant and $T$ the absolute temperature.
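A minimal sketch of the conversion from interaction energies to logo heights, assuming energies in kcal/mol, T = 298 K, and normalization of the Boltzmann weights exp(-ΔG/RT) over the four nucleotides at each position so that they read as probabilities. The energy values in the example are hypothetical, not FoldX output, and the function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1
T_DEFAULT = 298.0  # assumed temperature in K


def logo_column(delta_g: dict[str, float],
                temperature: float = T_DEFAULT) -> dict[str, float]:
    """Turn per-nucleotide ΔG values (kcal/mol, mutant minus wild type) at one
    position into normalized Boltzmann weights exp(-ΔG/RT) / Z."""
    rt = R_KCAL * temperature
    weights = {nt: math.exp(-g / rt) for nt, g in delta_g.items()}
    z = sum(weights.values())  # partition function over the four bases
    return {nt: w / z for nt, w in weights.items()}


# Hypothetical column: G is the wild-type base (ΔG = 0), the others cost energy.
column = logo_column({"G": 0.0, "A": 1.5, "C": 2.0, "T": 1.8})
print({nt: round(p, 3) for nt, p in column.items()})
```

Lower ΔG gives a taller letter; a position where all four bases have equal energy comes out flat (0.25 each), i.e. non-specific.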
  • Full-length I-DmoI in complex with a 25 bp double-stranded DNA was crystallized as an enzyme-substrate complex with calcium and as an enzyme-product complex with manganese. Protein expression, purification, protein-DNA complex formation and crystallization were carried out as described in Example 1 above. The phase problem was solved using the anomalous signal at the selenium peak wavelength.
  • the single-wavelength anomalous dispersion (SAD) method was applied to obtain initial phases at 2.8 Å resolution in crystals grown with Se-Met protein (see Example 1).
  • the three selenium atoms in I-DmoI were located using SHELX (Schneider and Sheldrick, 2002) and initial phases were obtained with SHARP (de la Fortelle and Bricogne, 1997).
  • the initial model was built in 2.6 Å 2Fo-Fc maps after solvent flattening using SOLOMON (Abrahams and Leslie, 1996).
  • the structures were finally refined to 2.0 and 2.1 Å in the same monoclinic space group using REFMAC (Murshudov et al., 1997) (Table II).
  • Figure 1 shows the crystal structure of I-DmoI in complex with its target DNA.
  • Panel a) shows the protein secondary structure; the complex is shown in two different orientations, together with the calcium ion.
  • the crystallization oligonucleotide construct is shown in panel b).
  • the individual bases are named with a strand label, strand A (coding strand) or strand B (non-coding strand), indicating the DNA strand to which they belong.
  • Fig. 1a: the overall fold of I-DmoI in complex with its DNA target shows a clear pseudo-two-fold axis between the two LAGLIDADG helices, dividing the protein into two domains, A (residues 5-98) and B (residues 103-195), joined by a four-residue linker. These domains contain the typical αββαββα topology of the LAGLIDADG family. Both domains have a similar size, and the β-strands form two antiparallel β-sheets composed of strands β1-4 in domain A and β5-7 in domain B.
  • the β-sheets form a concave surface with an inner cylindrical shape where the DNA molecule is accommodated.
  • RMSD (Root Mean Square Deviation) is a measure of the average distance between the backbones of superimposed proteins: the square root of the mean of the squared distances between corresponding backbone atoms.
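The RMSD definition can be written out as follows. This sketch assumes the two structures have already been superimposed and that the coordinate lists pair corresponding backbone atoms in the same order; the optimal superposition step (commonly done with the Kabsch algorithm) is deliberately left out, and the function name is illustrative.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD between two equally long lists of matched atom coordinates.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two atoms, each displaced by 1 Å along z: RMSD = sqrt((1 + 1) / 2)
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))  # 1.0
```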
  • panel a) shows a stereo comparison of the enzyme active site in the substrate-bound and product-bound structures.
  • the metal sites are labeled M1 for the position shared by calcium and manganese, and M2 for the second manganese atom.
  • Anomalous difference maps illustrate the presence of only one atom of calcium in the DNA bound structure b) and two atoms of manganese in the digested DNA structure c).
  • Panel d) shows a schematic diagram of the hypothetical enzymatic mechanism proposed for I-DmoI. Hydrolysis of the phosphodiester bonds would follow a sequential two-metal mechanism: while a single metal ion is bound in one site (site1) and the water nucleophile is positioned in the central site, a second metal ion would enter the second site (site2), displacing the water molecule previously located in that site to the central one.
  • Asp21 and Gly20, and Glu117 and Ala116, are contributed by the LAGLIDADG motifs of the enzyme.
  • Figure 3 shows a scheme of the Protein-DNA contacts in the Ca 2+ and Mn 2+ bound structures.
  • the cleavage sites are indicated by the shaded phosphates.
  • the portion of the DNA target that binds to domain A of I-DmoI consists of positions -2 to 13 of the 3' strand and positions 3 to 13 of the 5' strand, the remaining nucleotides being bound by domain B of I-DmoI.
  • Lines indicate polar contacts and van der Waals interactions respectively. Dots represent water molecules involved in the interaction. Amino acids depicted on blackened boxes represent hydrogen bond interactions with the bases and the other residues represent van der Waals interactions with the DNA (bases, riboses or phosphates).
  • DNA target provided a preliminary view of regions involved in DNA target recognition inside domain A, it did not yield a complete picture of the recognition mechanism.
  • Divalent metal ions play an essential role in the catalysis of endonucleases and other enzymes.
  • LAGLIDADG homing endonucleases
  • the general mechanism of cleavage of the phosphodiester bonds of DNA requires a nucleophile to attack the electron deficient phosphorus atom, a general base to activate the nucleophile, a general acid to protonate the leaving group, and positively charged groups to stabilize the phosphoanion transition state.
  • the presence of cations is dispensable for DNA binding (Fig. 3) (Dalgaard et al., 1994).
  • Figure 4 shows the loops involved in DNA binding by I-DmoI.
  • the upper part of the figure depicts a ribbon diagram of the I-DmoI/DNA complex. Domain A contains two loops that contact the DNA (L1a and L2a), whereas domain B has only one loop (L2b) engaged in contacts with the nucleic acid. L2a and L2b are primarily associated with the central bases of the target site, and L1a with bases outside that region, reflecting the asymmetry of target recognition by I-DmoI.
  • the lower part of the figure shows detailed insets of the three loops involved in DNA interactions. The protein-DNA interactions are displayed as dashed lines.
  • Figure 5 shows the structural basis of DNA recognition. a) Structural sequence alignment of the archaeal I-DmoI and the eukaryotic I-SceI and I-CreI homing endonucleases. Secondary structure elements of I-DmoI are shown above the alignment. Conserved residues are boxed with a black background, while homologous residues are boxed with a white background. Residues with a gray background are those involved in protein-base contacts in the crystal structures of the complexes. The sequence alignment was carried out with Clustal (Larkin et al., 2007) and the structural alignment with ESPript (Gouet et al., 1999). Panel b) compares the location of the protein-base contacts (regions colored in gray) in the I-DmoI, I-SceI and I-CreI protein-DNA structures. Panel c) shows a schematic view of the protein-base contacts.
  • the central cation should stabilize the phosphoanion transition state in the hydrolysis of both strands and facilitate the protonation of the O3' leaving group of each strand.
  • the absence of direct protein contacts between I-CreI and the nucleophilic water molecule made it difficult to identify a general base. It has been suggested that the extensive network of water molecules surrounding the active site participates in a concerted transfer of hydrogen atoms that activates the nucleophilic water molecule and protonates the leaving group.
  • one of the Mn²⁺ cations is coordinated by the side chain of Asp21, the carbonyl of Ala116, the 5' phosphate of -3C (strand B), the phosphate of 2A (strand A) and a water molecule outside the active site, whereas the second Mn²⁺ has similar interactions with the 5' phosphate of 3G (strand A), the phosphate of -2C (strand B), the main-chain carbonyl of Gly20, the side chain of Glu117 in the second LAGLIDADG motif, and another water molecule outside the active center.
  • I-SceI and I-CreI contain three metal sites in the active site.
  • I-DmoI contains only two metal sites.
  • the comparison of the I-DmoI Ca²⁺ and Mn²⁺ anomalous maps shows that only one of the metal sites overlaps.
  • the other sites, including the central one, were occupied by water molecules, whereas in I-SceI and I-CreI both can be occupied by a metal (Chevalier et al., 2004; Moure et al., 2003). Therefore, the structural organization of the I-DmoI active site presents a clear asymmetry in the calcium-bound structure, indicating a sequential mechanism for I-DmoI catalysis, as has also been suggested for I-SceI (Moure et al., 2003).
  • the non-coding strand would be cleaved before the final reaction takes place on the coding strand.
  • the central water could be the nucleophile that initiates the reaction, after prior activation by the electropositive environment generated by the metal present in the active site (Garcia-Viloca et al., 2004).
  • the entry of another catalytic metal in the second site would promote the transfer or regeneration of the central water, leading to the cleavage of the coding strand.
  • the Ca²⁺-bound structure would represent a snapshot of the activation state prior to cleavage of the phosphodiester bond in the non-coding strand, whereas the Mn²⁺-bound structure would depict the organization of the active site after cleavage of both strands.
  • the enzyme would produce a nick in the DNA non-coding strand before the coding strand would be cleaved resulting in the double strand cleavage.
  • although this possible mechanism could be discarded on the basis of the cleavage properties of the I-DmoI Asp21Asn and Glu117Gln single mutants (Lykke-Andersen et al., 1997), nicked intermediates were observed in I-SceI (Perrin et al., 1993) and in I-DmoI when the cleavage properties of a homodimeric I-DmoI mutant were studied using a plasmid as substrate (Silva et al., 2006), indicating that a sequential cleavage mechanism is possible for the monomeric members of the LAGLIDADG homing endonuclease family.
  • I-DmoI contains two LAGLIDADG helices and binds the nucleic acid as a monomer.
  • the protein forces a clear bend in the DNA molecule, forming an angle of approximately 140° between the longitudinal axes of the two DNA halves. This bend distorts the minor groove in the middle of the DNA molecule, positioning both strands in the enzyme's active site.
  • the crystal structure reveals the asymmetric nature of the I-DmoI DNA-binding cavity.
  • domain A contains a four-stranded β-sheet, whereas domain B contains only three strands (Fig. 1).
  • a detailed view of the protein DNA contacts in the loops shows that the L2a and L2b loops contact symmetric regions on the DNA major grooves (Fig. 4).
  • L1a interacts with bases 6-10 in the major groove, closer to the 5' end of strand B.
  • This protein-DNA interaction is absent in the other half of the DNA target.
  • the lack of the fourth β-strand in domain B eliminates a loop equivalent to L1a in that domain, so there are no protein-DNA contacts in the major groove closer to the 5' end of strand A. This difference implies that the target half associated with domain A (bases 1-13 in both strands) is recognized by a larger number of residues.
  • the protein-DNA contacts in the substrate and product bound structures were analyzed in detail with NUCPLOT (Luscombe et al., 1997) (Fig. 3, 4, 6 and 7).
  • a schematic representation of the interaction reveals few differences in the protein DNA interactions between both forms (Fig.3).
  • the main contacts in domain B with the nucleotide bases involve Arg124, Arg126, Asp154, Arg157 and Asp155.
  • Arg124 is positioned at a proper distance to make polar contacts with the bases of -7G (strand A) and -6G (strand B), whereas Arg126 hydrogen bonds the base of -5G (strand B).
  • the conformation of the Arg126 side chain is influenced by the interaction with Asp119, which does not contact the nucleotide bases but interacts with the phosphate backbone.
  • the rotamer of Asp119 forces a conformation of Arg126, indirectly inducing the recognition of the base at -5G (strand B).
  • the conformations and contacts of these residues are very similar both in the bound and cleaved DNA structures.
  • loop L2a presents Thr76 and Asp75, which are the only amino acids whose interactions provide specific recognition in the central four base pairs of the DNA.
  • Thr76 makes a polar contact with the base of 2A (strand A).
  • Asp75 hydrogen bonds the base of 3C (strand B).
  • the conformation of this side chain is influenced by the presence of Arg77, which makes a polar contact with the base of 3G (strand A) (Fig. 4, Loop2a).
  • Arg77, Arg81, Arg37, Tyr29, Arg33, Glu79, Glu35 and Ser34 are the remaining residues in domain A responsible for making direct contacts with the bases of the target DNA (Fig. 4, Loop2a and Loop1a).
  • the side chain of Glu79 makes a direct contact with the bases of 5A (strand B) and 5T (strand A), whereas the side chain of Glu35 contacts the base of 8C (strand B).
  • the conformation of the Glu35 side chain is favored by the interactions of the side chain of Ser83 and the main chain of Tyr36 with a water molecule, which contacts the DNA backbone.
  • Tyr29, Arg33 and Ser34 form a second spatial cluster of residues that interact with the bases of 6C (strand A), 9G (strand A) and 9C (strand B), respectively (Fig. 4, Loop1a). All these contacts between I-DmoI and the bases of its target DNA appear to be responsible for DNA target recognition; the remaining amino acids involved in contacts (Fig. 3) make hydrogen bonds or van der Waals interactions with the DNA backbone.
  • I-CreI and I-SceI, two well-characterized meganucleases representing the homodimeric and monomeric members of the LAGLIDADG family that bind pseudo-palindromic and non-palindromic targets respectively, were compared with I-DmoI (Fig. 5).
  • the alignment illustrates the differences in the primary and secondary structures among these enzymes regarding the location of residues involved in DNA binding (residues with gray background in Fig.5a).
  • Fig. 5b: a structural comparison of the DNA-binding residues of these homing endonucleases shows that, despite the similar structural scaffold, the residues responsible for DNA recognition are not topologically conserved (Fig. 5b).
  • the schematic comparison of the specific base-protein contacts in the different meganuclease-DNA complexes (Fig. 5c) illustrates how the homodimeric meganuclease accomplishes DNA recognition by generating a similar network of protein-DNA contacts on both sides of the pseudo-palindromic DNA, whereas the monomeric ones tend to maximize the interactions in one half of the target DNA.
  • the inventors analyzed the length of the binding site and the number of specific positions in the target DNA sequence of each meganuclease. To perform this study, the inventors used the latest version of FoldX (FoldX 2.8) (Schymkowitz et al., 2005). Each base was mutated to the other three possibilities and the resulting interaction energies were converted to a probability to predict the preference for each base at a given position (Fig. 6).
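The in silico mutagenesis step (each base replaced by each of the other three) amounts to enumerating all single-base substitutions of the target. The sketch below does only that enumeration; the interaction-energy calculation that the inventors performed with FoldX is not reproduced, and the function name is hypothetical.

```python
def single_base_mutants(target: str) -> list[tuple[int, str, str, str]]:
    """Enumerate every single-base substitution of a DNA target.

    Returns tuples of (0-based position, wild-type base, substituted base,
    mutant sequence). Each mutant would then be scored by an energy model.
    """
    seq = target.upper()
    mutants = []
    for i, wt in enumerate(seq):
        for nt in "ACGT":
            if nt != wt:
                mutants.append((i, wt, nt, seq[:i] + nt + seq[i + 1:]))
    return mutants


# Natural 24-bp I-DmoI target (SEQ ID NO: 13): 24 positions x 3 substitutions
mutants = single_base_mutants("GCCTTGCCGGGTAAGTTCCGGCGC")
print(len(mutants))  # 72
```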
  • Figure 6 shows the in silico binding patterns, and in particular the in silico binding specificities, for I-DmoI, E-DreI, I-CreI and I-SceI.
  • the energy-based logos display the different specificities of the meganucleases.
  • I-DmoI presents a short binding site with the highest specificity, while I-SceI has a long but quite tolerant binding site.
  • Base discrimination predicted by FoldX agrees notably well with the available experimental results for I-SceI (Doyon et al., 2006) and I-CreI (Argast et al., 1998).
  • the reference sequences for the logos are the wild-type coding strands, except for I-DmoI, where the non-coding strand was used.
  • the wild-type sequence is shown. b) In silico R-10NNN binding pattern predicted by FoldX.
  • the pattern was calculated using the wild-type I-DmoI structure and is based on the difference in interaction energy with the WT DNA sequence. The energies were calculated by adding up the change in interaction energy due to each individual mutation in the DNA. Hits are found for R-10GCC, R-10GCG, R-10GCA and R-10ACC (see the strand B (non-coding) sequence in Figure 1b or 7). Only the R-10GCG hit was not found experimentally for I-DmoI or the two mutants analyzed. The target triplet is shown inside each cell.
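The additive scoring described above (summing the energy change of each individual base substitution) can be sketched for the 64 triplets at positions +8 to +10. The ΔΔG table is invented purely for illustration and will not reproduce the hits listed in the text; triplets are written on the top strand here, whereas the R-10NNN labels use the bottom-strand reading. Names and the hit rule (summed ΔΔG at or below a threshold) are assumptions.

```python
from itertools import product

# Hypothetical ΔΔG values (kcal/mol) for each base at positions +8, +9, +10,
# relative to the wild-type top-strand bases G, G, C (0.0 = wild-type base).
DDG = {
    8: {"G": 0.0, "A": 0.9, "C": 2.1, "T": 1.7},
    9: {"G": 0.0, "A": 1.2, "C": 1.9, "T": 2.4},
    10: {"C": 0.0, "T": 0.6, "A": 1.4, "G": 1.1},
}


def triplet_ddg(triplet: str) -> float:
    """Additive model: sum of per-position ΔΔG for the bases at +8, +9, +10."""
    return sum(DDG[pos][base] for pos, base in zip((8, 9, 10), triplet))


def predicted_hits(threshold: float) -> list[str]:
    """Top-strand triplets whose summed ΔΔG is at or below the threshold."""
    triplets = ("".join(t) for t in product("ACGT", repeat=3))
    return sorted(t for t in triplets if triplet_ddg(t) <= threshold)


print(predicted_hits(0.0))  # ['GGC']: only the wild-type triplet
print(predicted_hits(1.0))
```

The additive model ignores coupling between neighboring substitutions, which is one reason such predictions can disagree with experiment for individual triplets.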
  • the inventors scanned the energy matrices from the analysis above along the Saccharomyces cerevisiae and Drosophila melanogaster genomes, using two different energy thresholds with respect to the wild-type interaction energy.
  • with the lower energy threshold, no hit was found in yeast (the I-SceI site in the sequenced yeast strain is disrupted by the insertion of an intron that contains the meganuclease), and very few were found in Drosophila melanogaster (Table III).
  • when the energy threshold was increased, weaker hits became apparent (Table III). This could be important in the context of a highly expressed enzyme or one with enhanced activity. Table III. DNA interaction analysis for I-DmoI, E-DreI, I-CreI and I-SceI.
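Scanning a genome with an energy matrix at a chosen threshold, as described above, can be sketched as a sliding window evaluated on both strands. This assumes an additive position-specific matrix (one dictionary of ΔΔG values per target position, 0 for the wild-type base) and skips windows containing ambiguous bases such as N. The toy matrix and sequence are hypothetical stand-ins for the FoldX-derived matrices and the S. cerevisiae and D. melanogaster genomes; function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def site_energy(site: str, matrix: list[dict[str, float]]) -> float:
    """Additive ΔΔG of a site relative to the wild-type target (0 = WT)."""
    return sum(column[base] for column, base in zip(matrix, site))


def scan(genome: str, matrix: list[dict[str, float]],
         threshold: float) -> list[tuple[int, str, float]]:
    """Return (start, strand, energy) for windows with energy <= threshold.

    Start is always given in top-strand coordinates; '-' means the site
    reads 5'->3' on the complementary strand.
    """
    width = len(matrix)
    genome = genome.upper()
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if set(window) - set("ACGT"):
            continue  # skip windows with N or other ambiguous bases
        reverse = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", reverse)):
            energy = site_energy(site, matrix)
            if energy <= threshold:
                hits.append((start, strand, energy))
    return hits


# Toy 4-bp "wild-type" target AACG scored by mismatch count (hypothetical).
toy_matrix = [{b: (0.0 if b == wt else 1.0) for b in "ACGT"} for wt in "AACG"]
print(scan("TTAACGTT", toy_matrix, 0.0))  # [(2, '+', 0.0), (4, '-', 0.0)]
```

Raising the threshold admits sites with more or costlier mismatches, mirroring the weaker hits that appear at the higher threshold in Table III.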
  • the binding event induces a deep kink in the nucleic acid molecule to force both strands in the active site. This kink is more pronounced than in other meganucleases of the LAGLIDADG family.
  • the structures of the enzyme with the bound and cleaved DNA molecule suggest a sequential catalytic mechanism.
  • I-DmoI exhibits poor activity at 37°C due to its thermophilic origin and is therefore not an appropriate tool for practical in vivo applications.
  • I-DmoI D1: Ile52Phe, Leu95Gln
  • I-DmoI D2: Ile52Phe, Ala92Thr, Phe101Cys
  • I-DmoI is a very specific meganuclease.
  • the inventors monitored the cleavage pattern of I-DmoI and of its two mesophilic derivatives with the R-10NNN target collection, which corresponds to all sixty-four possible triplets at target positions 8G (strand A), 9G (strand A) and 10C (strand A) (Fig. 7a).
  • the R-10NNN triplet is in contact with domain A (Fig. 1, 3, 4 and 7), in the region with the highest density of protein/DNA interactions.
  • the detailed interaction map of this region includes polar contacts of Arg33 with the 9G (strand A) base, of the Ser34 main-chain carbonyl with 9C (strand B), of the Ser34 main-chain amide with the 10G (strand B) base, and of the Glu35 side chain with the 8C (strand B) pyrimidine ring.
  • Figure 7 shows the in vivo cleavage patterns. a) I-DmoI recognition site.
  • the target sequence has been divided into two halves, L (left) and R (right).
  • the 64 R-10NNN targets were derived from the natural I-DmoI target and differ from it only at base pairs 8, 9 and 10 of the R half. •, cleavage positions. b) Cleavage activities of wild-type I-DmoI and of two mesophilic I-DmoI variants (D1, D2).
  • the 64 R-10NNN targets are identified in the top left panel by the 5'-NNN-3' bottom-strand sequence of nucleotides 10, 9 and 8.
  • the grey box identifies the natural target. Bottom left, profile of I-DmoI. Top right, profile of I-DmoI variant D1. Bottom right, profile of I-DmoI variant D2. Targets cleaved by the samples are boxed in solid black lines.
  • controls: no meganuclease expressed, at positions a1, a4, a7, b2, b5, b8, c3, c6, d1, d4, d7, ...; I-SceI CLS (a variant with moderate activity), at positions b1, b4, b7, c2, c5, c8, ...; and I-SceI WT, at positions c1, c4, c7, d2, d5, d8, ...
  • in silico screening methods are now possible using the information from this structure to predict the effects on the I-DmoI structure of residue changes therein, and also of changes in the DNA target.
  • Such in silico screening allows large numbers of potential enzymes to be screened against all possible targets and allows the more time consuming later in vitro/in vivo characterisation work to be focussed upon the candidate molecules identified during an initial in silico screen or later more focussed analysis of three-dimensional models of candidate polypeptides.
  • the inventors have previously conducted similar profiling with the I-CreI meganuclease as well as with hundreds of engineered derivatives (Arnould et al., 2006; Smith et al., 2006). Using a statistical approach, the inventors could also infer clues about the role of individual contacts between the protein and the target (Arnould et al., 2006; Smith et al., 2006). In earlier studies, the inventors used structural data to engineer the specificity of the homodimeric I-CreI protein. First, the inventors locally engineered subdomains of the I-CreI DNA-binding interface to cleave DNA targets differing from the I-CreI target by a few consecutive base pairs.
  • I-DmoI seems to have a very narrow specificity.
  • the inventors have shown that the I-CreI Asp75Asn meganuclease mutant had a narrow target specificity, showing strong cleavage for only 3 targets out of two similar collections of 64 targets derived from the wild-type I-CreI target (Arnould et al., 2006; Smith et al., 2006).
  • the narrow cleavage pattern of the I-DmoI D1 and D2 variants suggests that I-DmoI is at least as selective as I-CreI.
  • the induction of homologous gene targeting by sequence-specific endonucleases is seen today as an emerging technology with many applications (Paques and Duchateau, 2007).

Abstract

The present invention relates to the three-dimensional structure of the meganuclease I-DmoI in combination with its DNA target and, from this, to the prediction of residues in the I-DmoI enzyme which affect the binding, catalytic and other properties of this enzyme. The present invention also relates to I-DmoI enzymes with altered characteristics, such as altered target half-sites or altered catalytic properties, to chimeric meganucleases comprising portions of I-DmoI, and to the use of the three-dimensional structure of the meganuclease I-DmoI in combination with its DNA target in an in silico screening method.

Description

The crystal structure of I-DmoI in complex with its DNA target, improved chimeric meganucleases and uses thereof
The present invention relates to the three-dimensional structure of the meganuclease I-DmoI in combination with its DNA target. The present invention also relates to I-DmoI enzymes with altered characteristics, such as altered target half-sites or altered catalytic properties, to chimeric meganucleases comprising portions of I-DmoI, and to the use of the three-dimensional structure of the meganuclease I-DmoI in combination with its DNA target in an in silico screening method. Meganucleases are sequence-specific enzymes which recognize large (12-45 bp) DNA target sites. These enzymes are often encoded by introns or inteins behaving as mobile genetic elements. They recognize sites that usually correspond to intron-free or intein-free genes, where they produce a DNA double-strand break (DSB). Repair of the DSB by homologous recombination with the intron- or intein-containing gene results in the insertion of the intron or intein at the site of the break, at a specific locus in living cells (Thierry and Dujon, 1992).
Among the various strategies to engineer a given genetic locus, the use of rare-cutting DNA endonucleases such as meganucleases has emerged as a powerful tool to increase the rate of successful gene targeting: the endonuclease generates a DNA double-strand break (DSB), which stimulates a homologous recombination event at the site of the break.
Meganucleases have been used to stimulate homologous gene targeting in the vicinity of their target sequences in cultured cells and plants (Choulika et al., 1995; Puchta et al., 1996). These results open new ways to engineer a genome in a wide range of applications, such as the correction of mutations linked with monogenic inherited diseases. This strategy should alleviate the risks due to the randomly inserted transgenes used in current gene therapy approaches (Hacein-Bey-Abina et al., 2003).
The use of meganuclease-induced recombination has long been limited by the repertoire of natural meganucleases. In nature, meganucleases are essentially represented by homing endonucleases, a family of endonucleases encoded by mobile genetic elements, whose function is to initiate DSB-induced recombination events in a process referred to as homing (Chevalier and Stoddard, 2001). Several hundred homing endonucleases have been identified in bacteria, eukaryotes and archaea (Chevalier and Stoddard, 2001); however, the probability of finding a homing endonuclease cleavage site in a chosen gene is extremely low. Thus, the making of artificial meganucleases with custom-made substrate specificity is an intense area of research. Given their biological function and their specificity, homing endonucleases represent ideal scaffolds to engineer proteins with altered substrate/DNA target specificity that cleave or recombine specific DNA targets.
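One way to see why a natural cleavage site is so unlikely to fall in a chosen gene is a back-of-the-envelope estimate. The sketch below is illustrative only. It assumes a random genome with equal, independent base frequencies, an exact-match 22-bp site and a 3 × 10^9 bp genome, none of which come from the text. Real meganucleases tolerate some mismatches, so the result is an order-of-magnitude figure, not a prediction.

```python
def expected_site_count(site_length: int, genome_length: int, both_strands: bool = True) -> float:
    """Expected number of exact occurrences of one fixed site in a random genome.

    Simplifying assumption: every base is independent and equally likely (p = 1/4),
    which real genomes are not (composition bias, repeats).
    """
    windows = genome_length - site_length + 1  # candidate start positions per strand
    p_match = 0.25 ** site_length  # chance that one window matches exactly
    strands = 2 if both_strands else 1  # a non-palindromic site may sit on either strand
    return strands * windows * p_match


# Illustrative values only: a 22-bp site and a 3e9-bp genome.
print(f"{expected_site_count(22, 3_000_000_000):.1e}")
```

Under these assumptions a given 22-bp site is expected roughly 3 × 10^-4 times per genome. A particular gene, being a small fraction of the genome, is even less likely to contain it.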
Sequence homology has been used to classify homing endonucleases into four families, the largest one having the conserved LAGLIDADG sequence motif. Homing endonucleases with only one such motif function as homodimers. In contrast, larger homing endonucleases containing two motifs are single-chain proteins (Dalgaard et al., 1993; Jacquier and Dujon, 1985). Structural information for several members of the LAGLIDADG endonuclease family indicates that these proteins adopt a similar active conformation, either as homodimers or as monomers with two separate domains (Chevalier et al., 2001; Ichiyanagi et al., 2000; Silva et al., 1999; Spiegel et al., 2006). The LAGLIDADG motifs form structurally conserved α-helices tightly packed at the center of the interdomain or intermonomer interface. The last acidic residue of the LAGLIDADG motif participates in DNA cleavage by a metal-dependent mechanism of phosphodiester hydrolysis (Chevalier and Stoddard, 2001).
Homing endonucleases with one LAGLIDADG motif (L) are around 20 kDa in molecular mass and act as homodimers. Those with two copies (LL) range from 25 kDa (230 amino acids) to 50 kDa (HO, 545 amino acids), with 70 to 150 residues between the two motifs, and act as monomers. Cleavage of the target sequence occurs inside the recognition site, leaving a 4-nucleotide staggered cut with 3'-OH overhangs.
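The staggered-cut geometry can be made concrete with a toy model. In the sketch below, the two strands are cut 4 bases apart, so each product keeps a single-stranded 3' tail. The 22-bp sequence is hypothetical (it is not the I-DmoI target), and placing the cut at the centre of the site is an assumption made for illustration.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def complement(seq: str) -> str:
    """Base-by-base complement (not reversed), so it stays aligned under the top strand."""
    return "".join(COMPLEMENT[b] for b in seq.lower())


def staggered_cut(top: str, top_cut: int, overhang: int = 4):
    """Cut a duplex, given as its top strand written 5'->3', leaving 3' overhangs.

    The top strand is cut between indices top_cut - 1 and top_cut (0-based); the
    bottom strand is cut `overhang` bases to the left of that, so each product
    keeps a single-stranded 3' tail of `overhang` nucleotides. Bottom strands are
    returned 3'->5', aligned under the top strand.
    """
    if not overhang <= top_cut <= len(top):
        raise ValueError("cut position leaves no room for the overhang")
    top = top.lower()
    bottom = complement(top)
    bottom_cut = top_cut - overhang
    left = (top[:top_cut], bottom[:bottom_cut])  # top strand protrudes at its 3' end
    right = (top[top_cut:], bottom[bottom_cut:])  # bottom strand protrudes at its 3' end
    return left, right


# Hypothetical 22-bp site (NOT the I-DmoI target), cut so the 4-nt overhang is central.
site = "aaaaccccggggttttaaaacc"
(left_top, left_bottom), (right_top, right_bottom) = staggered_cut(site, 13)
print(left_top + " " + right_top)
print(left_bottom + " " + right_bottom)
```

In the printed output, the gaps in the two strands are offset by four columns. This offset is the 4-nucleotide stagger.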
I-CeuI and I-CreI (163 amino acids) are homing endonucleases with one LAGLIDADG motif (mono-LAGLIDADG). I-DmoI (194 amino acids, SWISSPROT accession number P21505 (SEQ ID NO: 2)), I-SceI, PI-PfuI and PI-SceI are homing endonucleases with two LAGLIDADG motifs. In the present invention, unless otherwise mentioned, residue numbers refer to the amino acid numbering of the I-DmoI sequence SWISSPROT P21505 (SEQ ID NO: 2).
Structural models from X-ray crystallography have been generated for I-CreI (PDB code 1g9y), I-DmoI (PDB code 1b24), PI-SceI and PI-PfuI. Structures of I-CreI and PI-SceI (Moure et al., 2002) bound to their DNA sites have also been elucidated, leading to a number of predictions about specific protein-DNA contacts.
LAGLIDADG proteins with a single motif, such as I-CreI, form homodimers and cleave palindromic or pseudo-palindromic DNA sequences, whereas the larger, double-motif proteins, such as I-SceI, are monomers and cleave non-palindromic targets. Several different LAGLIDADG proteins have been crystallized, and they exhibit a striking conservation of the core structure that contrasts with a lack of similarity at the primary sequence level (Jurica et al., 1998; Chevalier et al., 2001; Chevalier et al., 2003; Moure et al., 2003; Moure et al., 2002; Ichiyanagi et al., 2000; Duan et al., 1997; Bolduc et al., 2003; Silva et al., 1999).
In this core structure, two characteristic αββαββα folds, contributed by two monomers in dimeric LAGLIDADG proteins or by two domains in monomeric LAGLIDADG proteins, face each other with two-fold symmetry. DNA binding depends on the β-strands from each domain, folded into an antiparallel β-sheet that forms a saddle on the major groove of the DNA helix. The catalytic core is central, with contributions from both symmetric monomers/domains. In addition to this core structure, other domains can be found: for example, PI-SceI, an intein, has a protein-splicing domain and an additional DNA-binding domain (Moure et al., 2002; Grindl et al., 1998). Despite an apparent lack of sequence conservation between individual members of the LAGLIDADG family, structural comparisons indicate that LAGLIDADG proteins, whether they cut as dimers like I-CreI or as monomers like I-DmoI, adopt a similar active conformation. In all structures, the LAGLIDADG motifs are central and form two packed α-helices, where a two-fold (pseudo-)symmetry axis separates the two monomers or apparent domains.
The LAGLIDADG motif corresponds to residues 13 to 21 in I-CreI, and to positions 12 to 20 and 109 to 117 in I-DmoI. On either side of the LAGLIDADG α-helices, a four-stranded β-sheet provides a DNA-binding interface that drives the interaction of the protein with a half-site of the target DNA sequence. I-DmoI is similar to I-CreI dimers, except that the A domain (residues 1 to 95) and the B domain (residues 105 to 194) are separated by a linker (residues 96 to 104) (Epinat et al., 2003).
Given their high level of specificity, homing endonucleases represent ideal scaffolds for engineering tailored endonucleases. Several studies have shown that the DNA-binding domain of LAGLIDADG proteins (Chevalier et al., 2001) can be engineered. Several LAGLIDADG proteins, including PI-SceI (Gimble et al., 2003), I-CreI (Seligman et al., 2002; Sussman et al., 2004; Rosen et al., 2006; Arnould et al., 2006), I-SceI (Doyon et al., 2006) and I-MsoI (Ashworth et al., 2006), have been modified by rational or semi-rational mutagenesis and screening to acquire new sequence-binding or cleavage specificities. Semi-rational design assisted by high-throughput screening methods has allowed the Applicants to derive thousands of novel proteins from I-CreI, a homodimeric protein of the LAGLIDADG family (Smith et al., 2006; Arnould et al., 2006).
This combinatorial strategy has been shown to work through the generation of meganucleases cleaving natural DNA target sequences located within the human RAG1, XPC and HPRT genes (Smith et al., 2006; Arnould et al., 2007; WO2008/059382).
However, although the capacity to combine up to four sub-domains considerably increases the number of DNA sequences that can be targeted, it is still difficult to prepare a suite of enzymes which can act upon the complete range of sequences possible for a natural target sequence of a given size.
Another strategy is to combine domains from distinct meganucleases. This approach has been illustrated by the creation of new meganucleases by domain swapping between I-CreI and I-DmoI, leading to the generation of a meganuclease cleaving the hybrid sequence corresponding to the fusion of one half of each parent target sequence (Epinat et al., 2003; Chevalier et al., 2002). I-DmoI is a 22 kDa endonuclease from the hyperthermophilic archaeon Desulfurococcus mobilis. It is a monomeric protein comprising two similar domains, each of which has a LAGLIDADG motif. The structure of the protein alone, without its DNA target (henceforth referred to as D1234, SEQ ID NO: 7), has been solved (Silva et al., 1999).
Chevalier et al. (2002) built a chimeric protein based on the two endonucleases I-DmoI and I-CreI, which they called E-DreI (Engineered I-DmoI/I-CreI). E-DreI consists of the first (A) domain of I-DmoI fused, through a flexible linker, to a single subunit of the I-CreI homodimer, creating the initial scaffold for the enzyme. Chevalier et al. then made a number of residue modifications based on the predictions of computational interface algorithms, so as to alleviate potential steric clashes predicted from a 3D model generated by combining elements of previously generated I-DmoI and I-CreI models.
Residues at the interface between the facing surfaces of the two component molecules were examined; in particular, residues at positions 47, 51, 55, 108, 193 and 194 were identified as potentially clashing. These residues were replaced with alanine, but the resulting protein was found to be insoluble.
Residue numbers refer to the E-DreI open reading frame, which comprises 101 residues (beginning at the first methionine) from domain A of I-DmoI fused to the last 156 residues of I-CreI, separated by a three-amino-acid NGN linker that mimics the native I-DmoI linker in length.
The interface was then optimised through computational redesign of residues 47, 51, 55, 108, 193 and 194, as well as residues 12, 13, 17, 19, 52, 105, 109 and 113, followed by an in vivo protein-folding assay on selected sequences to determine the solubility of E-DreI enzymes modified at these residues. A final scaffold was designed with the modifications I19, H51 and H55 of I-DmoI and E8, L11, F16, K96 and L97 of I-CreI (corresponding to E105, L108, F113, K193 and L194).
The structure of E-DreI (Chevalier et al., 2002) in complex with its chimeric DNA target dre3 (C12D34 (SEQ ID NO: 5) in the Applicants' nomenclature) was solved. E-DreI was shown to recognise and cut only this hybrid C12D34 (SEQ ID NO: 5) target. From this structure, a number of residues were predicted to make base-specific contacts with the hybrid target site: residues 25, 29, 31, 33, 34, 35, 37, 70, 75, 76, 77, 79 and 81 of I-DmoI, and residues 123, 125, 127, 130, 137, 135, 139, 141, 163, 165, 167 and 172 of the I-CreI moiety.
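The E-DreI numbering runs continuously across both parents, so it can be convenient to convert it back to native I-CreI numbering. The sketch below derives the conversion from the ORF layout described above: 101 I-DmoI residues, a 3-residue linker, then the last 156 residues of I-CreI. It assumes the wild-type I-CreI length is 163 residues. That assumption reproduces the stated correspondences (E8/E105, L11/L108, F16/F113, K96/K193, L97/L194).

```python
# Layout of the E-DreI open reading frame as described above.
IDMOI_PART = 101     # residues 1-101 of I-DmoI (domain A)
LINKER = 3           # NGN linker
ICREI_LENGTH = 163   # assumption: wild-type I-CreI length (SWISSPROT P05725)
ICREI_PART = 156     # last 156 residues of I-CreI


def edrei_to_parent(position: int):
    """Map an E-DreI residue number to (parent, residue number in that parent)."""
    if position < 1:
        raise ValueError("residue numbers start at 1")
    if position <= IDMOI_PART:
        return ("I-DmoI", position)
    if position <= IDMOI_PART + LINKER:
        return ("linker", position - IDMOI_PART)
    first_crei = ICREI_LENGTH - ICREI_PART + 1  # I-CreI residue 8
    crei_residue = position - (IDMOI_PART + LINKER) + first_crei - 1
    if crei_residue > ICREI_LENGTH:
        raise ValueError(f"position {position} lies beyond the end of E-DreI")
    return ("I-CreI", crei_residue)


contacts = [123, 125, 127, 130, 137, 135, 139, 141, 163, 165, 167, 172]
print([edrei_to_parent(p)[1] for p in contacts])
```

Applied to the predicted contacts of the I-CreI moiety, this returns I-CreI residues 26, 28, 30, 33, 40, 38, 42, 44, 66, 68, 70 and 75.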
The Applicants have also previously conducted experiments with a DmoCre scaffold, seeking to broaden the range of DNA target sequences cleaved by engineered homing endonucleases. DmoCre is a chimeric molecule built from the two homing endonucleases I-DmoI and I-CreI: it includes the N-terminal portion of I-DmoI linked to an I-CreI monomer. DmoCre could have a tremendous advantage as a scaffold: mutations in the I-DmoI moiety could be combined with mutations in the I-CreI domain, and thousands of such I-CreI variants have already been identified and profiled (Smith J et al., 2006; Arnould S et al., 2006; Arnould S et al., 2007).
Based on the structure of the I-DmoI protein alone, without its DNA target (Silva et al., 1999), and on the structure of the complex between I-CreI and its DNA target C1234 (SEQ ID NO: 4) (Jurica et al., 1998; Chevalier et al., 2003), a chimeric DmoCre endonuclease has been built (Epinat et al., 2003). DmoCre is a monomeric protein that corresponds to the A domain of I-DmoI up to residue F109, followed by I-CreI from residue L13. To avoid a steric clash, I107 of the I-DmoI domain was mutated to leucine. In addition, residues 47, 51 and 55 of I-DmoI, which were found to be close to residues 96 and 97 of I-CreI, were mutated to alanine, alanine and aspartic acid, respectively.
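The DmoCre construction above amounts to a splice plus a few point substitutions. The sketch below expresses it as sequence operations. It assumes the reader supplies the full I-DmoI (P21505) and I-CreI (P05725) sequences as one-letter strings; these are not reproduced here, so `idmoi_seq` and `icrei_seq` are hypothetical placeholders. The sketch also does not check that position 109 of I-DmoI is F or that position 13 of I-CreI is L.

```python
def build_chimera(seq_a: str, last_a: int, seq_b: str, first_b: int, mutations=()):
    """Fuse residues 1..last_a of seq_a to residues first_b..end of seq_b.

    Residue numbers are 1-based, as in the text. `mutations` holds
    (residue_number, new_residue) pairs in seq_a numbering and is applied to
    the retained seq_a part only (a convention chosen for this sketch).
    """
    part_a = list(seq_a[:last_a])
    for number, new_residue in mutations:
        if not 1 <= number <= last_a:
            raise ValueError(f"residue {number} is not in the retained part of seq_a")
        part_a[number - 1] = new_residue
    return "".join(part_a) + seq_b[first_b - 1:]


# DmoCre as described above: I-DmoI up to F109, then I-CreI from L13,
# with I107L and residues 47, 51, 55 changed to A, A and D.
DMOCRE_MUTATIONS = [(107, "L"), (47, "A"), (51, "A"), (55, "D")]
# dmocre = build_chimera(idmoi_seq, 109, icrei_seq, 13, DMOCRE_MUTATIONS)
```

Keeping the junctions as explicit residue numbers makes it easy to try alternative fusion points, such as the different domain boundaries quoted elsewhere in this document.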
DmoCre has been shown to be active in vitro (Epinat et al., 2003) and to cleave the hybrid target C12D34 (SEQ ID NO: 5), composed of the left part of C1234 (SEQ ID NO: 4) or C1221 (SEQ ID NO: 6) (the palindromic target derived from C1234) and the right part of D1234 (SEQ ID NO: 7). Furthermore, I-DmoI and DmoCre variants able to cleave their DNA target sequences more efficiently at 37°C were identified by random mutagenesis and screening in yeast cells (WO 2005/105989; Prieto et al., 2008).
The E-DreI and DmoCre chimeric enzymes therefore have the A domain of I-DmoI in common, but differ significantly in other respects, as outlined above. The inventors are interested in creating a new generation of chimeric enzymes which recognize a wider set of target sequences. By being able to target new DNA sequences, and so induce a double-strand break at a site of interest comprising a DNA target sequence, the Applicants provide the tools to induce a DNA recombination event, the loss of a particular DNA segment, or cell death.
This double-strand break can be used for: repairing a specific sequence, modifying a specific sequence, restoring a functional gene in place of a mutated one, attenuating or activating an endogenous gene of interest, introducing a mutation into a site of interest, introducing an exogenous gene or a part thereof, inactivating or detecting an endogenous gene or a part thereof, translocating a chromosomal arm, or leaving the DNA unrepaired and degraded. Such modified meganuclease enzymes therefore give a user a wide variety of options in therapeutic, research or other productive uses.
Although the structure of I-DmoI has been solved previously (Silva et al., 1999), there is no specific information at the molecular level about I-DmoI DNA recognition and cleavage mechanisms, because no structure of the enzyme in complex with its DNA target has previously been available. The absence of these data has hampered the use of I-DmoI as a scaffold to engineer tailored specificities. Here the inventors report the crystal structure of I-DmoI in complex with its DNA target, solved in the presence of Ca2+ and Mn2+, revealing the molecular mechanisms that govern I-DmoI DNA binding and cleavage. The findings from the crystal structure interactions, in silico predictions and in vivo experiments open new possibilities to engineer custom specificities using I-DmoI as a scaffold. In particular, for the first time, the DNA-binding and catalytically important residues have been identified within the B domain. The inventors have therefore sought to improve chimeric meganuclease enzymes comprising at least one I-DmoI domain by increasing the number of DNA targets these chimeric enzymes can recognize and cut.
Therefore, the present invention relates to a polypeptide comprising the sequence of an I-DmoI endonuclease or a chimeric derivative thereof, including at least the I-DmoI domain B, characterized in that it comprises the substitution of at least one of the residues at positions 124, 126, 154 and 155 of said I-DmoI domain B, and wherein the polypeptide recognises an I-DmoI DNA target half-site which differs from the wild-type I-DmoI DNA target half-site SEQ ID NO: 1 in at least one of positions ±2, ±3, ±5, ±6, ±7.
Based on the inventors' work, the use of the B domain of I-DmoI in a modified I-DmoI enzyme or chimeric enzyme is possible for the first time. Using the three-dimensional model, the DNA-binding residues have been identified, and, from the inventors' work with I-DmoI and other meganucleases, modification of these residues will lead to altered specificity at the nucleotides of the I-DmoI DNA target half-site with which these residues have been shown to interact.
In particular, at least one of the residues at positions 124, 126, 154 and 155 of the I-DmoI domain B is substituted with any alternative amino acid.
The substitution of one amino acid residue for another is well known in the art to cause changes to the structure and activity of a protein. For instance the substitution of a non-polar amino acid residue for a polar amino acid would be expected to alter the interaction of this residue with the polypeptide in which it is present potentially affecting the three-dimensional structure thereof or conformation of an active/binding site and also to affect the function of the residue if this is linked to the presence of a polar side chain. As well as this gross replacement of one type of amino acid with another, more subtle alterations are also possible.
In particular, the polypeptide further comprises the substitution, by any amino acid, of at least one of the residues at positions 119, 128 and 157 of the I-DmoI domain B, which alters the recognition by said polypeptide of an I-DmoI DNA target half-site which differs from the wild-type I-DmoI DNA target half-site SEQ ID NO: 1 in at least one of positions ±2, ±3.
In particular, the polypeptide further comprises the substitution, by any amino acid, of at least one of the residues at positions 115, 116, 117, 118, 120, 130, 150, 152, 153, 156, 158, 160, 164, 166, 167 and 170 of said I-DmoI domain B, which alters the recognition by said polypeptide of an I-DmoI DNA target half-site which differs from the wild-type I-DmoI DNA target half-site SEQ ID NO: 1 in at least one of positions ±1, ±2, ±3, ±4, ±5, ±6, ±7, ±8, ±9. The inventors' model predicts further interacting residues, the alteration of which is expected to alter the interaction of the B domain of I-DmoI with its DNA target, and so potentially alter the DNA target that such a modified I-DmoI recognises. These residues are described in the present application, in particular in Figure 3.
In particular, the polypeptide is a chimeric-DmoI endonuclease consisting of the fusion of said I-DmoI domain B to a sequence of a dimeric LAGLIDADG homing endonuclease or to a domain of another monomeric LAGLIDADG homing endonuclease.
In particular, said I-DmoI domain B is fused to a domain selected from one of the enzymes in the group: I-Sce I, I-Chu I, I-Cre I, I-Csm I, PI-Sce I, PI-Tli I, PI-Mtu I, I-Ceu I, I-Sce II, I-Sce III, HO, PI-Civ I, PI-Ctr I, PI-Aae I, PI-Bsu I, PI-Dha I, PI-Dra I, PI-Mav I, PI-Mch I, PI-Mfu I, PI-Mfl I, PI-Mga I, PI-Mgo I, PI-Min I, PI-Mka I, PI-Mle I, PI-Mma I, PI-Msh I, PI-Msm I, PI-Mth I, PI-Mtu I, PI-Mxe I, PI-Npu I, PI-Pfu I, PI-Rma I, PI-Spb I, PI-Ssp I, PI-Fac I, PI-Mja I, PI-Pho I, PI-Tag I, PI-Thy I, PI-Tko I, PI-Tsp I, I-Mso I.
In particular, the polypeptide is characterized in that said I-DmoI domain B is at the NH2-terminus of said chimeric-DmoI endonuclease.
In particular, the polypeptide is characterized in that said dimeric LAGLIDADG homing endonuclease is I-CreI.
In particular, the polypeptide is characterized in that it comprises a detectable tag at its NH2 and/or COOH terminus.
In particular, the polypeptide is characterized in that it comprises a Nuclear Localisation Signal (NLS) at its NH2 and/or COOH terminus, or within the polypeptide but outside either enzymatic domain.
Preferably, the NLS comprises a peptide sequence selected from the group consisting of SEQ ID NO: 19, 20, 21, 22 and 23. A nuclear localisation signal (NLS) is an amino acid sequence which targets a newly synthesized protein to the cell nucleus through the Nuclear Pore Complex, via its recognition by cytosolic nuclear transport receptors. Typically, NLSs consist of one or more short sequences of positively charged amino acids such as lysines or arginines. In particular, the NLS is selected from the NLS sequences of the known proteins SV40 large T antigen -PKKKRKV- (SEQ ID NO: 19), nucleoplasmin -KR[PAATKKAGQA]KKKK- (SEQ ID NO: 20), p54 -RIRKKLR- (SEQ ID NO: 21), SOX9 -PRRRK- (SEQ ID NO: 22) and NS5A -PPRKKRTVV- (SEQ ID NO: 23).
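The terminal NLS options listed above can be handled as plain sequence operations. The following minimal sketch assumes one-letter protein strings. It writes the nucleoplasmin signal with its bracketed spacer spelled out as a contiguous sequence. On N-terminal fusion, it keeps the initiator methionine first; this is a choice made for the sketch, not something the text specifies. Internal placement between domains, which the text also allows, is not covered.

```python
# NLS peptides listed in the text (SEQ ID NO: 19-23); the nucleoplasmin
# spacer shown in brackets in the text is written out contiguously here.
NLS_SEQUENCES = {
    "SV40 large T antigen": "PKKKRKV",
    "nucleoplasmin": "KRPAATKKAGQAKKKK",
    "p54": "RIRKKLR",
    "SOX9": "PRRRK",
    "NS5A": "PPRKKRTVV",
}


def add_nls(protein: str, nls: str, terminus: str = "N") -> str:
    """Return `protein` with `nls` fused at its N- or C-terminus.

    For an N-terminal fusion the initiator methionine is kept first
    (an assumption of this sketch), so the NLS follows it directly.
    """
    if terminus == "N":
        body = protein[1:] if protein.startswith("M") else protein
        return "M" + nls + body
    if terminus == "C":
        return protein + nls
    raise ValueError("terminus must be 'N' or 'C'")


def find_nls(protein: str):
    """Names of the listed NLS peptides present verbatim in `protein`."""
    return [name for name, motif in NLS_SEQUENCES.items() if motif in protein]
```

For example, `add_nls("MAAA", NLS_SEQUENCES["SV40 large T antigen"])` gives `"MPKKKRKVAAA"`.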
According to a second aspect of the present invention there is provided a polynucleotide, characterized in that it encodes a polypeptide according to the present invention.
According to a third aspect of the present invention there is provided a vector, characterized in that it comprises a polynucleotide according to the present invention.
According to a fourth aspect of the present invention there is provided a host cell, characterized in that it is modified by a polynucleotide or a vector according to the present invention.
According to a fifth aspect of the present invention there is provided a non-human transgenic animal, characterized in that all or part of its cells are modified by a polynucleotide or a vector according to the present invention.
According to a sixth aspect of the present invention there is provided a transgenic plant, characterized in that all or part of its cells are modified by a polynucleotide or a vector according to the present invention.
According to a seventh aspect of the present invention there is provided the use of a polypeptide, a polynucleotide, a vector, a cell, a non-human animal or a plant, according to the present invention for the selection and/or the screening of meganucleases with novel DNA target specificity.
According to an eighth aspect of the present invention there is provided a method of identifying polypeptides comprising at least one domain of I-DmoI which can recognise and bind to an altered DNA target, comprising at least the steps of:
i) applying a three-dimensional molecular modelling algorithm to at least the set of atomic coordinates set out in Table II and Figures 8 and 9, to determine the spatial coordinates of the DNA-interacting portions of a candidate polypeptide and of its native DNA target modelled from the set of atomic coordinates, and generating a model;
ii) modifying at least one residue of the candidate polypeptide and altering the characteristics of the model accordingly;
iii) electronically screening the modified candidate polypeptide of step ii) against a stored set of spatial coordinates representing the native DNA target sequence and at least one variant thereof;
iv) calculating from said model the interaction energies of the modified candidate polypeptide of step ii) with the stored set of DNA targets;
v) converting said interaction energies into a probability score to predict the preference of the modified polypeptide for the stored set of DNA targets;
vi) identifying at least one DNA target sequence from the stored set of DNA targets which the modified candidate polypeptide recognises, when the probability score is greater than a predetermined threshold;
vii) identifying candidate polypeptides with altered specificity, wherein the at least one DNA target sequence of step vi) is not the native DNA target sequence.
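Steps iv) to vii) of this method can be sketched as follows. The text does not say how interaction energies are converted into a probability score. This sketch uses a Boltzmann weighting over the stored set of targets as one plausible choice. The temperature, threshold, energies and target labels are all hypothetical. In practice the energies would come from a structure-based tool applied to the model of step i) (Figure 6b mentions FoldX); that modelling step is not reproduced here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def target_probabilities(energies, temperature=298.0):
    """Step v (one possible scoring): Boltzmann-weight interaction energies
    (kcal/mol, more negative = tighter binding) into probabilities over targets."""
    rt = R_KCAL * temperature
    best = min(energies.values())  # shift by the minimum to avoid overflow
    weights = {t: math.exp(-(e - best) / rt) for t, e in energies.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def screen_candidate(energies, native, threshold=0.1):
    """Steps vi-vii: keep targets whose probability exceeds `threshold` and
    flag altered specificity if any of them is not the native target."""
    probs = target_probabilities(energies)
    recognised = sorted((t for t, p in probs.items() if p > threshold),
                        key=lambda t: probs[t], reverse=True)
    altered = any(t != native for t in recognised)
    return recognised, altered


# Hypothetical step-iv energies for one modified candidate (labels are placeholders).
energies = {"native": -10.0, "variant_1": -12.0, "variant_2": -8.0}
print(screen_candidate(energies, native="native"))
```

With these made-up numbers, only `variant_1` passes the threshold, so the candidate is flagged as having altered specificity. Because the probabilities are normalised over the stored set, a candidate's score for one target depends on which other targets are in that set.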
Based on their model, the inventors have therefore developed a new means of reducing the number of in vitro/in vivo experiments needed to identify I-DmoI enzymes, or chimeric derivatives thereof, that bind to altered DNA targets: these variants are first screened in silico to identify candidate polypeptides for further in vitro/in vivo studies. In particular, the stored set of spatial coordinates of step iii) comprises the native DNA target in which at least one base is changed to each of the three alternative bases.
In particular, the stored set of spatial coordinates comprises all possible variants of the native DNA target sequence. In particular, the modified residue of step ii) forms a direct contact between the candidate polypeptide and the native DNA target sequence.
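The two kinds of stored target sets described above can be enumerated as follows. This is a minimal sketch in which the function names are illustrative and the 22-bp example sequence is hypothetical, since the native target sequence is not reproduced here.

```python
from itertools import product

BASES = "acgt"


def single_base_variants(target):
    """First option: each base of the native target changed to each of the
    three alternatives (3 x len(target) sequences)."""
    target = target.lower()
    return [target[:i] + alt + target[i + 1:]
            for i, base in enumerate(target)
            for alt in BASES if alt != base]


def all_variants(length):
    """Second option: every possible sequence of a given length (4**length of them),
    generated lazily because the number grows very quickly."""
    return ("".join(p) for p in product(BASES, repeat=length))


# Hypothetical 22-bp target, used only to show the library sizes.
print(len(single_base_variants("acgtacgtacgtacgtacgtac")), 4 ** 22)
```

For a 22-bp target, the single-substitution library has only 66 members, whereas the complete library has about 1.8 × 10^13. Exhaustive enumeration is therefore realistic only for short windows: three positions give 4^3 = 64 targets, the size of the target collections used in the I-CreI profiling mentioned earlier.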
Alternatively, the modified residue of step ii) forms an indirect contact between the candidate polypeptide and the native DNA target sequence. In particular, the modified residue of step ii) forms a molecular interaction selected from the group: hydrogen bonds, polar contacts and van der Waals interactions, between said candidate polypeptide and said native DNA target sequence.
In particular, the candidate polypeptide comprises at least said I-DmoI domain B; in step ii), at least one of the residues at positions 115, 116, 117, 118, 119, 120, 124, 126, 128, 130, 150, 152, 153, 154, 156, 155, 157, 158, 160, 164, 166, 167 and 170 of said I-DmoI domain B is altered; and the at least one altered DNA target differs from a native DNA target consisting of SEQ ID NO: 7 in at least one of positions -1, -2, -3, -4, -5, -6, -7, -8, -9.
In particular, the candidate polypeptide comprises at least the I-DmoI domain A; in step ii), at least one of the residues at positions 15, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 41, 42, 43, 67, 68, 70, 71, 72, 73, 75, 76, 77, 79, 81, 83, 84 and 85 of the I-DmoI domain A is altered; and the at least one altered DNA target differs from a native DNA target consisting of SEQ ID NO: 7 in at least one of positions +1, +2, +3, +4, +5, +6, +7, +8, +9, +10, +11, +12, +13.
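Given a variant target, the position ranges above indicate which I-DmoI domain is proposed for modification. This can be checked mechanically. The sketch below assumes the target is numbered ..., -2, -1, +1, +2, ... with no position 0, a common convention for these targets that the text does not spell out. Its sequences are hypothetical, since the native target sequence is not reproduced here.

```python
# Target-position ranges linked to each I-DmoI domain in the method above.
DOMAIN_B_POSITIONS = set(range(-9, 0))   # -9 .. -1
DOMAIN_A_POSITIONS = set(range(1, 14))   # +1 .. +13


def numbered(seq, first):
    """Number a target written 5'->3' starting at position `first`,
    skipping 0 (assumed convention: ..., -2, -1, +1, +2, ...)."""
    positions, pos = {}, first
    for base in seq.lower():
        if pos == 0:
            pos = 1  # the numbering has no position 0
        positions[pos] = base
        pos += 1
    return positions


def domains_to_modify(native, variant):
    """Group the positions where `variant` differs from `native` by the
    I-DmoI domain whose residues are proposed for modification."""
    changed = sorted(p for p in native if variant.get(p) != native[p])
    contacted = DOMAIN_A_POSITIONS | DOMAIN_B_POSITIONS
    return {
        "A": [p for p in changed if p in DOMAIN_A_POSITIONS],
        "B": [p for p in changed if p in DOMAIN_B_POSITIONS],
        "other": [p for p in changed if p not in contacted],
    }


# Hypothetical 6-bp window spanning positions -3 .. +3 (not the real target).
native = numbered("acgtac", -3)
variant = numbered("aggtat", -3)
print(domains_to_modify(native, variant))
```

Here the change at -2 points to domain B residues and the change at +3 to domain A residues. Positions outside both ranges are reported separately rather than silently assigned.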
In particular, the candidate polypeptide consists of an A or B domain of I-DmoI fused to a dimeric LAGLIDADG homing endonuclease or to a domain of another monomeric LAGLIDADG homing endonuclease.
In particular, the candidate polypeptide consists of either said I-DmoI domain A or B fused to a domain selected from one of the enzymes in the group: I-Sce I, I-Chu I, I-Cre I, I-Csm I, PI-Sce I, PI-Tli I, PI-Mtu I, I-Ceu I, I-Sce II, I-Sce III, HO, PI-Civ I, PI-Ctr I, PI-Aae I, PI-Bsu I, PI-Dha I, PI-Dra I, PI-Mav I, PI-Mch I, PI-Mfu I, PI-Mfl I, PI-Mga I, PI-Mgo I, PI-Min I, PI-Mka I, PI-Mle I, PI-Mma I, PI-Msh I, PI-Msm I, PI-Mth I, PI-Mtu I, PI-Mxe I, PI-Npu I, PI-Pfu I, PI-Rma I, PI-Spb I, PI-Ssp I, PI-Fac I, PI-Mja I, PI-Pho I, PI-Tag I, PI-Thy I, PI-Tko I, PI-Tsp I, I-Mso I.
For a better understanding of the invention and to show how the same may be carried into effect, there will now be shown by way of example only, specific embodiments, methods and processes according to the present invention with reference to the accompanying drawings in which:
Figure 1. - shows the crystal structure of I-DmoI in complex with its target DNA.
Figure 2a. - shows a detailed view of the I-DmoI active site.
Figure 2b. - shows the presence of only one calcium atom in the DNA-bound structure.
Figure 2c. - shows two atoms of manganese in the digested DNA structure.
Figure 2d. - shows a schematic diagram of the hypothetical enzymatic mechanism proposed for I-DmoI.
Figure 3. - shows a scheme of the protein-DNA contacts in the Ca2+- and Mn2+-bound structures.
Figure 4. - shows the loops involved in DNA binding by I-DmoI.
The upper part of the figure depicts a ribbon diagram of the I-DmoI/DNA complex. The lower part of the figure shows detailed insets of the three loops involved in DNA interactions.
Figure 5a. - shows a structural sequence alignment between the archaeal I-DmoI and the eukaryotic I-SceI and I-CreI homing endonucleases.
Figure 5b. - shows a comparison of the location of the protein-base contacts in the I-DmoI, I-SceI and I-CreI protein-DNA structures.
Figure 5c. - shows a schematic view of the protein-base contacts.
Figure 6a. - shows in silico binding patterns for I-DmoI, E-DreI, I-CreI and I-SceI.
Figure 6b. - shows the in silico R-10NNN binding pattern predicted by FoldX.
Figure 7a. - shows in vivo cleavage patterns for the I-DmoI recognition site.
Figure 7b. - shows cleavage activities of wild-type I-DmoI and of two mesophilic I-DmoI variants (D1, D2).
Figure 8. - lists the atomic coordinate data of I-DmoI in combination with its target in the presence of Mn2+, in PDB format.
Figure 9. - lists the atomic coordinate data of I-DmoI in combination with its target in the presence of Ca2+, in PDB format.
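Figures 8 and 9 give coordinates in PDB format, and step i) of the method above starts from such coordinates. The minimal sketch below reads ATOM records by their fixed columns and lists protein-DNA atom pairs within a distance cut-off. Several choices are simplifying assumptions, not the inventors' criteria: the cut-offs, the N/O-only rule for "polar" pairs, and the DNA residue-name set. Hydrogen-bond geometry, waters and the metal ions are ignored, and the file name in the usage comment is hypothetical.

```python
import math

# Residue names used for DNA in PDB files (current and older conventions).
DNA_RESIDUES = {"DA", "DC", "DG", "DT", "A", "C", "G", "T"}


def parse_atoms(pdb_lines):
    """Read ATOM records using the fixed PDB column layout.

    HETATM records (e.g. metal ions, water) are skipped in this sketch.
    """
    atoms = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atoms.append({
            "name": line[12:16].strip(),
            "res_name": line[17:20].strip(),
            "chain": line[21],
            "res_seq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def protein_dna_contacts(atoms, cutoff=4.0, polar_cutoff=3.5):
    """List protein-DNA atom pairs within `cutoff` angstroms.

    A pair is labelled "polar" (a crude hydrogen-bond proxy) when both atoms
    are N or O and closer than `polar_cutoff`, otherwise "vdW".
    """
    dna = [a for a in atoms if a["res_name"] in DNA_RESIDUES]
    protein = [a for a in atoms if a["res_name"] not in DNA_RESIDUES]
    contacts = []
    for p in protein:
        for d in dna:
            dist = math.dist(p["xyz"], d["xyz"])
            if dist > cutoff:
                continue
            polar = p["name"][0] in "NO" and d["name"][0] in "NO" and dist <= polar_cutoff
            contacts.append((f'{p["res_name"]}{p["res_seq"]}:{p["name"]}',
                             f'{d["res_name"]}{d["res_seq"]}:{d["name"]}',
                             round(dist, 2), "polar" if polar else "vdW"))
    return contacts


# Usage (the file name is hypothetical):
# with open("idmoi_dna_mn.pdb") as handle:
#     for contact in protein_dna_contacts(parse_atoms(handle)):
#         print(contact)
```

The all-pairs loop is quadratic, which is acceptable for a single protein-DNA complex. A spatial index would only matter for much larger screens.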
There will now be described, by way of example, a specific mode contemplated by the inventors. In the following description, numerous specific details are set forth in order to provide a thorough understanding. It will be apparent, however, to one skilled in the art that the present invention may be practiced without limitation to these specific details. In other instances, well-known methods and structures have not been described so as not to obscure the description unnecessarily.
Definitions
- Amino acid residues in a polypeptide sequence are designated herein according to the one-letter code, in which, for example, Q means Gln or glutamine residue, R means Arg or arginine residue and D means Asp or aspartic acid residue.
- Hydrophobic amino acid refers to leucine (L), valine (V), isoleucine (I), alanine (A), methionine (M), phenylalanine (F), tryptophan (W) and tyrosine (Y).
- Nucleotides are designated as follows: one-letter code is used for designating the base of a nucleoside: a is adenine, t is thymine, c is cytosine, and g is guanine. For the degenerate nucleotides, r represents g or a (purine nucleotides), k represents g or t, s represents g or c, w represents a or t, m represents a or c, y represents t or c (pyrimidine nucleotides), d represents g, a or t, v represents g, a or c, b represents g, t or c, h represents a, t or c, and n represents g, a, t or c.
- by "meganuclease" is intended an endonuclease having a double-stranded DNA target sequence of 12 to 45 bp.
- by "parent LAGLIDADG homing endonuclease" is intended a wild-type LAGLIDADG homing endonuclease or a functional variant thereof. Said parent LAGLIDADG homing endonuclease may be a monomer or a dimer (homodimer or heterodimer) comprising two LAGLIDADG homing endonuclease core domains which are associated in a functional endonuclease able to cleave a double-stranded DNA target of 22 to 24 bp.
- by "homodimeric LAGLIDADG homing endonuclease" is intended a wild-type homodimeric LAGLIDADG homing endonuclease having a single LAGLIDADG motif and cleaving palindromic DNA target sequences, such as I-CreI or I-MsoI, or a functional variant thereof.
- by "LAGLIDADG homing endonuclease variant" or "variant" is intended a protein obtained by replacing at least one amino acid of a LAGLIDADG homing endonuclease sequence with a different amino acid.
- by "functional variant" is intended a LAGLIDADG homing endonuclease variant which is able to cleave a DNA target, preferably a new DNA target which is not cleaved by a wild type LAGLIDADG homing endonuclease. For example, such variants have amino acid variation at positions contacting the DNA target sequence or interacting directly or indirectly with said DNA target.
- by "homing endonuclease variant with novel specificity" is intended a variant having a pattern of cleaved targets (cleavage profile) different from that of the parent homing endonuclease. The variant may cleave fewer targets (restricted profile) or more targets than the parent homing endonuclease. Preferably, the variant is able to cleave at least one target that is not cleaved by the parent homing endonuclease. The terms "novel specificity", "modified specificity", "novel cleavage specificity" and "novel substrate specificity", which are equivalent and used interchangeably, refer to the specificity of the variant towards the nucleotides of the DNA target sequence.
- by "I-CreI" is intended the wild-type I-CreI having the sequence SWISSPROT P05725 or pdb accession code 1g9y (SEQ ID NO: 8).
- by "I-DmoI" is intended the wild-type I-DmoI having the sequence SWISSPROT number P21505 (SEQ ID NO: 2).
- by "domain" or "core domain" is intended the "LAGLIDADG homing endonuclease core domain", which is the characteristic αββαββα fold of the homing endonucleases of the LAGLIDADG family, corresponding to a sequence of about one hundred amino acid residues. Said domain comprises four beta-strands folded in an antiparallel beta-sheet which interacts with one half of the DNA target. This domain is able to associate with another LAGLIDADG homing endonuclease core domain, which interacts with the other half of the DNA target, to form a functional endonuclease able to cleave said DNA target. For example, in the case of the dimeric homing endonuclease I-CreI (163 amino acids), the LAGLIDADG homing endonuclease core domain corresponds to residues 6 to 94. In the case of monomeric homing endonucleases, two such domains are found in the sequence of the endonuclease; for example, in I-DmoI (194 amino acids), the A domain (residues 7 to 99) and the B domain (residues 104 to 194) are separated by a short linker (residues 100 to 103).
- by "subdomain" is intended the region of a LAGLIDADG homing endonuclease core domain which interacts with a distinct part of a homing endonuclease DNA target half-site. Two different subdomains behave independently or partly independently: a mutation in one subdomain does not alter the binding and cleavage properties of the other subdomain, or at least does not alter them in a number of cases. Therefore, two subdomains bind distinct parts of a homing endonuclease DNA target half-site.
- by "beta-hairpin" is intended two consecutive beta-strands of the antiparallel beta-sheet of a LAGLIDADG homing endonuclease core domain which are connected by a loop or a turn.
- by "C1221" is intended the first half of the I-CreI target site, "12", repeated backwards so as to form a palindrome, "1221".
- by "cleavage activity" is intended the cleavage activity of the variant of the invention, which may be measured by a direct repeat recombination assay in yeast or mammalian cells using a reporter vector, as described in PCT Application WO 2004/067736, Epinat et al., 2003, Chames et al., 2005 and Arnould et al., 2006. The reporter vector comprises two truncated, non-functional copies of a reporter gene (direct repeats) and a chimeric DNA target sequence within the intervening sequence, cloned in a yeast or a mammalian expression vector. The DNA target sequence is derived from the parent homing endonuclease cleavage site by replacement of at least one nucleotide by a different nucleotide. Preferably, a panel of palindromic or non-palindromic DNA targets representing the different combinations of the 4 bases (g, a, c, t) at one or more positions of the DNA cleavage site is tested (4^n palindromic targets for n mutated positions). Expression of the variant results in a functional endonuclease which is able to cleave the DNA target sequence. This cleavage induces homologous recombination between the direct repeats, resulting in a functional reporter gene whose expression can be monitored by an appropriate assay.
- by "DNA target", "DNA target sequence", "target sequence", "target-site", "target", "site", "recognition site", "recognition sequence", "homing recognition site", "homing site" or "cleavage site" is intended a 22 to 24 bp double-stranded palindromic, partially palindromic (pseudo-palindromic) or non-palindromic polynucleotide sequence that is recognized and cleaved by a LAGLIDADG homing endonuclease. These terms refer to a distinct DNA location, preferably a genomic location, at which a double-strand break (cleavage) is to be induced by the endonuclease. The DNA target is defined by the 5' to 3' sequence of one strand of the double-stranded polynucleotide.
For example, the palindromic DNA target sequence cleaved by wild-type I-CreI is defined by the sequence $5'\text{-}t_{-12}c_{-11}a_{-10}a_{-9}a_{-8}a_{-7}c_{-6}g_{-5}t_{-4}c_{-3}g_{-2}t_{-1}a_{+1}c_{+2}g_{+3}a_{+4}c_{+5}g_{+6}t_{+7}t_{+8}t_{+9}t_{+10}g_{+11}a_{+12}\text{-}3'$ (SEQ ID NO: 4), i.e. 5'-tcaaaacgtcgtacgacgttttga-3'. Cleavage of the DNA target occurs at the nucleotides in positions +2 and -2, respectively for the sense and the antisense strand. Unless otherwise indicated, the position at which cleavage of the DNA target by a meganuclease variant occurs corresponds to the cleavage site on the sense strand of the DNA target.
- by "DNA target half-site", "half cleavage site" or "half-site" is intended the portion of the DNA target that is bound by each LAGLIDADG homing endonuclease core domain.
- by "DC10NNN" (SEQ ID NO: 3) is intended the target sequence of DmoCre with variability at positions +8, +9 and +10 of the sequence, i.e. the three nucleotides counted backwards from position +10 are variable. Likewise, "DC4NNN" (SEQ ID NO: 9) refers to the target sequence of DmoCre with variability at positions +2, +3 and +4, and "DC7NNN" (SEQ ID NO: 10) refers to the target sequence of DmoCre with variability at positions +5, +6 and +7.
- by "chimeric DNA target" or "hybrid DNA target" is intended the fusion of different halves of two parent meganuclease target sequences. In addition, at least one half of said target may comprise the combination of nucleotides which are bound by separate subdomains (combined DNA target).
- by "mutation" is intended the substitution, deletion and/or addition of one or more nucleotides/amino acids in a nucleic acid/amino acid sequence.
- by "homologous" is intended a sequence with enough identity to another one to lead to homologous recombination between the sequences, more particularly having at least 95% identity, preferably 97% identity and more preferably 99% identity.
- "Identity" refers to sequence identity between two nucleic acid molecules or polypeptides. Identity can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base, the molecules are identical at that position. A degree of similarity or identity between nucleic acid or amino acid sequences is a function of the number of identical or matching nucleotides at positions shared by the sequences. Various alignment algorithms and/or programs may be used to calculate the identity between two sequences, including FASTA or BLAST, which are available as part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with, e.g., default settings.
- "individual" includes mammals, as well as other vertebrates (e.g., birds, fish and reptiles). The terms "mammal" and "mammalian", as used herein, refer to any vertebrate animal, including monotremes, marsupials and placentals, that suckles its young and either gives birth to living young (eutherian or placental mammals) or is egg-laying (prototherian mammals, i.e. monotremes).
Examples of mammalian species include humans and other primates (e.g., monkeys, chimpanzees), rodents (e.g., rats, mice, guinea pigs) and ruminants and other ungulates (e.g., cows, pigs, horses).
- "genetic disease" refers to any disease due, partially or completely, directly or indirectly, to an abnormality in one or several genes. Said abnormality can be a mutation, an insertion or a deletion. Said mutation can be a point mutation. Said abnormality can affect the coding sequence of the gene or its regulatory sequence. Said abnormality can affect the structure of the genomic sequence or the structure or stability of the encoded mRNA. The genetic disease can be recessive or dominant. Such genetic diseases include, but are not limited to, cystic fibrosis, Huntington's chorea, familial hypercholesterolemia (LDL receptor defect), hepatoblastoma, Wilson's disease, congenital hepatic porphyrias, inherited disorders of hepatic metabolism, Lesch-Nyhan syndrome, sickle cell anemia, thalassaemias, xeroderma pigmentosum, Fanconi's anemia, retinitis pigmentosa, ataxia telangiectasia, Bloom's syndrome, retinoblastoma, Duchenne's muscular dystrophy, and Tay-Sachs disease.
- "vectors": a vector which can be used in the present invention includes, but is not limited to, a viral vector, a plasmid, an RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic nucleic acids. Preferred vectors are those capable of autonomous replication (episomal vectors) and/or expression of nucleic acids to which they are linked (expression vectors). Large numbers of suitable vectors are known to those of skill in the art and are commercially available. Viral vectors include retrovirus, adenovirus, parvovirus (e.g., adeno-associated viruses), coronavirus, negative-strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g., measles and Sendai), positive-strand RNA viruses such as picornavirus and alphavirus, and double-stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, fowlpox and canarypox). Other viruses include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example. Examples of retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D-type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996). The term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of preferred vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication.
Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as "expression vectors". A vector according to the present invention comprises, but is not limited to, a YAC (yeast artificial chromosome), a BAC (bacterial artificial chromosome), a baculovirus vector, a phage, a phagemid, a cosmid, a viral vector, a plasmid, an RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic DNA. In general, expression vectors of utility in recombinant DNA techniques are often in the form of "plasmids", which refer generally to circular double-stranded DNA loops that, in their vector form, are not bound to the chromosome. Vectors can comprise selectable markers, for example: neomycin phosphotransferase, histidinol dehydrogenase, dihydrofolate reductase, hygromycin phosphotransferase, herpes simplex virus thymidine kinase, adenosine deaminase, glutamine synthetase, and hypoxanthine-guanine phosphoribosyl transferase for eukaryotic cell culture; TRP1 for S. cerevisiae; tetracycline, rifampicin or ampicillin resistance in E. coli.
Preferably, said vectors are expression vectors, wherein a sequence encoding a polypeptide of the invention is placed under the control of appropriate transcriptional and translational control elements to permit production or synthesis of said protein. Therefore, said polynucleotide is comprised in an expression cassette. More particularly, the vector comprises a replication origin, a promoter operatively linked to said encoding polynucleotide, a ribosome-binding site, an RNA-splicing site (when genomic DNA is used), a polyadenylation site and a transcription termination site. It can also comprise an enhancer. Selection of the promoter will depend upon the cell in which the polypeptide is expressed.

EXAMPLE 1 - Materials and Methods
1.1 Protein expression and purification
E. coli Rosetta(DE3)pLysS cells were transformed with plasmid pET24d(+) containing the I-DmoI ORF with a C-terminal 6His tag. His-tagged I-DmoI was overexpressed in LB medium at 24.85°C for 5 h after addition of 0.3 mM IPTG when the OD600 reached 0.6-0.8. Selenomethionine-labelled I-DmoI was expressed using the same strain.
The cells were collected from a 50 ml overnight culture grown in LB medium containing 30 µg ml⁻¹ kanamycin until OD600 ~ 1.0; at this point the cells were spun down, washed once with M9 minimal medium and finally resuspended in M9 minimal medium supplemented with thiamine (0.01 mg ml⁻¹), glucose [0.4% (w/v)], CaCl2 (0.0147 mg ml⁻¹), MgSO4 (0.246 mg ml⁻¹) and kanamycin (30 µg ml⁻¹). The culture was shaken at 36.85°C for 30 min, and selenomethionine (50 µg ml⁻¹; Molecular Dimensions) was then added together with lysine hydrochloride, threonine, phenylalanine, leucine, isoleucine and valine as described in Van Duyne et al. (1993). After an additional 15 min of shaking, protein expression was induced for 5 h at 24.85°C by the addition of 0.3 mM IPTG. The bacterial pellet was resuspended and the cells were disrupted by sonication in 50 mM sodium phosphate pH 8.0, 300 mM NaCl and 5% glycerol including protease inhibitors (Complete EDTA-free tablets, Roche). The lysate was clarified by centrifugation (20 000g for 1 h). The supernatant was applied onto a Co2+-loaded HiTrap Chelating HP column (GE Healthcare) and the protein was eluted using an imidazole gradient (0-0.5 M).
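The medium supplements above are given as mass per volume. The sketch below, offered only as an illustration, converts them to molar concentrations. It assumes the hydrated salts CaCl2·2H2O and MgSO4·7H2O, which the text does not specify; under that assumption the supplements correspond to roughly 0.1 mM CaCl2 and 1 mM MgSO4, and 0.4% (w/v) glucose to about 22 mM. The function and dictionary names are hypothetical.

```python
# Molar masses in g/mol. The hydrate forms are an ASSUMPTION: the text only says
# "CaCl2" and "MgSO4", without stating which salt form was weighed out.
MOLAR_MASS_G_PER_MOL = {
    "CaCl2.2H2O": 147.01,
    "MgSO4.7H2O": 246.47,
    "glucose": 180.16,
}


def mg_per_ml_to_mM(mg_per_ml: float, molar_mass: float) -> float:
    """Convert mg/ml (numerically equal to g/l) to millimolar."""
    return mg_per_ml / molar_mass * 1000.0


def percent_wv_to_mM(percent: float, molar_mass: float) -> float:
    """Convert % (w/v), i.e. grams per 100 ml, to millimolar."""
    return mg_per_ml_to_mM(percent * 10.0, molar_mass)  # 1% (w/v) = 10 mg/ml


if __name__ == "__main__":
    print(f"CaCl2: {mg_per_ml_to_mM(0.0147, MOLAR_MASS_G_PER_MOL['CaCl2.2H2O']):.2f} mM")
    print(f"MgSO4: {mg_per_ml_to_mM(0.246, MOLAR_MASS_G_PER_MOL['MgSO4.7H2O']):.2f} mM")
    print(f"glucose: {percent_wv_to_mM(0.4, MOLAR_MASS_G_PER_MOL['glucose']):.1f} mM")
```

If anhydrous salts were used instead, the same functions apply with the corresponding molar masses.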
The fractions containing I-DmoI were collected and the pH was adjusted to 6.0. The sample was loaded onto a 5 ml HiTrap Heparin HP column (GE Healthcare) previously equilibrated with 20 mM sodium phosphate pH 6.0. The sample was eluted with a continuous gradient from 0 to 1 M NaCl in 20 mM sodium phosphate pH 6.0 buffer.
The purified protein was subsequently concentrated using an Amicon Ultra system equipped with a 10 kDa cut-off filter and loaded onto a PD-10 desalting column (GE Healthcare) pre-equilibrated with 5 mM Tris-HCl pH 8.0 and 150 mM NaCl. The protein was concentrated to 16 mg ml⁻¹, flash-frozen in liquid nitrogen and stored at -80.15°C. The protein concentration was determined from the absorbance at 280 nm. The purity of the samples was checked by SDS-PAGE and their homogeneity was evaluated using dynamic light scattering. Finally, the incorporation of selenomethionine was tested by mass spectrometry (data not shown).

1.2 I-DmoI-DNA complex formation
The I-DmoI target DNA was purchased from Proligo and consisted of two strands of sequence 5'-GCCTTGCCGGGTAAGTTCCGGCGCG-3' (SEQ ID NO: 11) and 5'-CGCGCCGGAACTTACCCGGCAAGGC-3' (SEQ ID NO: 12), which anneal to form a 25 bp blunt-end duplex. Owing to the stability of I-DmoI at high temperature (Tm ~ 89.85°C, data not shown), the I-DmoI-DNA complex was formed after pre-warming the meganuclease and the oligonucleotide samples to 14.85°C and then mixing them in a 1.5:1 molar ratio (DNA:protein). The mixture was incubated for 50 min and then spun down for 5 min. The supernatant was stored at room temperature to avoid precipitation. To confirm the presence of DNA in the complex with I-DmoI, the purified complex was analyzed on a 15% SDS-PAGE gel stained first with Coomassie and subsequently with SYBR Safe. The same protocol was followed in the presence of 2 mM Ca2+ or Mn2+.
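As a quick sanity check, the two oligonucleotides of SEQ ID NO: 11 and 12 can be verified to be exact reverse complements of each other, which is what makes the annealed product a blunt-ended 25 bp duplex. The sketch also shows one way to compute stock volumes for the 1.5:1 DNA:protein molar ratio. The helper names, the DNA stock concentration and the 22 kDa molar mass used in the example are hypothetical placeholders, not values reported in this work.

```python
from typing import Tuple

SEQ_ID_11 = "GCCTTGCCGGGTAAGTTCCGGCGCG"
SEQ_ID_12 = "CGCGCCGGAACTTACCCGGCAAGGC"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_blunt_duplex(top: str, bottom: str) -> bool:
    """True if the two strands pair over their full length with no overhang."""
    return len(top) == len(bottom) and reverse_complement(top).upper() == bottom.upper()


def mg_per_ml_to_uM(mg_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert mg/ml (= g/l) to micromolar."""
    return mg_per_ml / molar_mass_g_per_mol * 1e6


def mixing_volumes_ul(protein_uM: float, dna_uM: float, protein_nmol: float,
                      dna_to_protein: float = 1.5) -> Tuple[float, float]:
    """Stock volumes (µl) of protein and DNA giving the requested DNA:protein ratio.

    1 µM equals 1 nmol/ml, so nmol / µM gives ml; multiply by 1000 for µl.
    """
    dna_nmol = protein_nmol * dna_to_protein
    return protein_nmol / protein_uM * 1000.0, dna_nmol / dna_uM * 1000.0


if __name__ == "__main__":
    print(len(SEQ_ID_11), is_blunt_duplex(SEQ_ID_11, SEQ_ID_12))  # 25 True
    # HYPOTHETICAL inputs: 16 mg/ml protein with an assumed molar mass of 22 kDa,
    # and an assumed 1 mM duplex stock.
    protein_uM = mg_per_ml_to_uM(16.0, 22_000.0)
    v_protein, v_dna = mixing_volumes_ul(protein_uM, 1000.0, protein_nmol=10.0)
    print(f"protein: {v_protein:.1f} µl, DNA: {v_dna:.1f} µl")
```

In practice the molar mass would be taken from the His-tagged construct sequence, and the duplex concentration from its own absorbance measurement.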
1.3 Crystallization
Crystallization screening was performed immediately after complex formation using a Cartesian MicroSys robot (Genomic Solutions) and the sitting-drop method (96-well MRC plates), with nanodrops of 0.1 µl protein solution plus 0.1 µl reservoir solution and a reservoir volume of 60 µl. The initial screens tested were Crystal Screens I and II, Crystal Screen Cryo and Crystal Screen Lite (Hampton Research), Wizard I and II, Wizard Cryo I and II, Precipitant Synergy Primary, Precipitant Synergy Expanded 67% and Precipitant Synergy Expanded 33% (Emerald BioSystems). The final concentration of I-DmoI in the DNA-protein complex solution was 6 mg ml⁻¹. Crystals were obtained in the nanodrops under several conditions (Crystal Screen I conditions 15 and 36, Crystal Screen II conditions 22, 35, 37 and 43, Crystal Screen Cryo conditions 15, 20 and 37, Crystal Screen Lite conditions 18, 28 and 41, Wizard I condition 21, Wizard Cryo I conditions 40 and 47, Wizard Cryo II condition 10, Precipitant Synergy Primary conditions 42 and 52 and Precipitant Synergy Expanded 67% condition 51).
These small crystals were collected under Al's oil and tested for diffraction using synchrotron radiation. The best-diffracting crystals were obtained using condition 37 of Hampton Research Crystal Screen Cryo (5.6% PEG 4000, 0.07 M sodium acetate pH 4.6, 30% glycerol). Crystals grown under these conditions were subjected to three cycles of optimization using 24-well Linbro plates with droplets containing 1 µl protein solution plus 1 µl reservoir solution and a reservoir volume of 500 µl. The best crystals grew in 5.6% PEG 4000, 0.07 M sodium acetate with a pH ranging from 4.5 to 5.5, and 30% glycerol (Fig. 1b). Plate-shaped clusters (approximately 0.2-0.4 × 0.1 × 0.05 mm) grew in 5-15 days and were easily separated into single crystals using an acupuncture needle. Further changes in the crystallization conditions (changes in the PEG and buffer, the use of the Hampton Additive Screen and one cycle of seeding) did not lead to single crystals. The presence of DNA in the crystals was confirmed by fluorescent detection using SYBR Gold (Invitrogen; Kettenberger & Cramer, 2006; Fig. 1c). Finally, crystallization trials were performed in the presence of 2 mM CaCl2 or MnCl2.

1.4 Data collection
Crystals were removed directly from the drop and flash-frozen in liquid nitrogen. The crystals were tested in-house using a Bruker FR-591 generator and diffracted to 3.5 Å resolution. Further data sets for the I-DmoI-DNA complex were collected using synchrotron radiation at the ID-29 (ESRF) and PX (SLS) beamlines. The diffraction data in Table I were recorded using an ADSC-Q315 detector at ID-29. The best data set was collected using Δφ = 1° and a wavelength of 0.979 Å. Processing and scaling were accomplished using HKL-2000 (Otwinowski & Minor, 1997). The statistics of the crystallographic data are summarized in Table I.
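The wavelength of 0.979 Å is consistent with the selenium peak used for SAD phasing (section 2.1). As an illustrative check, the sketch below converts wavelength to photon energy with E = hc/λ (hc ≈ 12.398 keV·Å). The Se K-edge reference value of about 12.658 keV is a standard tabulated figure, not a number from this document, and the exact peak position in a given crystal depends on the chemical environment of the selenium.

```python
HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV·Å
SE_K_EDGE_KEV = 12.658      # tabulated Se K absorption edge (approximate reference value)


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Photon energy (keV) for an X-ray wavelength given in ångström."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


if __name__ == "__main__":
    energy = wavelength_to_kev(0.979)
    offset_ev = (energy - SE_K_EDGE_KEV) * 1000.0
    print(f"{energy:.3f} keV, {offset_ev:+.1f} eV from the Se K edge")
```

A result a few eV above the tabulated edge is what one would expect for a peak (maximum f'') wavelength.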
Table I showing data-collection statistics of the native I-DmoI-DNA crystals grown in 2 mM Mn2+.
[Table I: image not reproduced in this text]
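† $R_{\mathrm{sym}} = \sum_h \sum_i \lvert I_{h,i} - \langle I_h \rangle \rvert \,/\, \sum_h \sum_i \lvert I_{h,i} \rvert$, where $I_{h,i}$ is the $i$-th measurement of reflection $h$ and $\langle I_h \rangle$ is the mean intensity of that reflection.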
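The R_sym expression reads as: for every unique reflection h, sum the absolute deviations of its repeated measurements from their mean, then divide by the summed measured intensities. The minimal sketch below implements that expression literally. The input layout (a mapping from Miller indices to the list of measured intensities) and the function name are assumptions for illustration; data-processing programs such as HKL-2000 apply additional conventions (for example for reflections measured only once) that are not reproduced here.

```python
from typing import Mapping, Sequence, Tuple

Hkl = Tuple[int, int, int]


def r_sym(observations: Mapping[Hkl, Sequence[float]]) -> float:
    """R_sym = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i |I_hi|."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if not intensities:
            continue
        mean_i = sum(intensities) / len(intensities)  # <I_h>
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(abs(i) for i in intensities)
    if denominator == 0:
        raise ValueError("no intensities supplied")
    return numerator / denominator


if __name__ == "__main__":
    toy = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 1, 0): [50.0, 50.0]}  # made-up data
    print(f"R_sym = {r_sym(toy):.3f}")  # R_sym = 0.050
```

The same function computes R_merge as written in the Table II footnote, since the two expressions are identical.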
The atomic coordinates in PDB format are listed in Figure 8 for I-DmoI in complex with its target in the presence of Mn2+, and in Figure 9 for I-DmoI in complex with its target in the presence of Ca2+.

1.5 Structure solution, model building and refinement
All data were collected at cryogenic temperature (-173.15°C) using synchrotron radiation. I-DmoI crystals were mounted and cryoprotected. The data sets were collected at the ID29 beamline of the ESRF (Grenoble) and at the PX beamline of the SLS (Villigen). Diffraction data were recorded on ADSC-Q4 or Mar225 CCD detectors, depending on the beamline. Processing and scaling were accomplished with HKL2000 (Otwinowski and Minor, 1997) and XDS (Kabsch, 1988). Statistics for the crystallographic data and structure solution are summarized in Table II.

Table II - Data collection, phasing and refinement statistics
[Table II: image not reproduced in this text]
a Values for the highest-resolution shell are given in parentheses.
b $R_{\mathrm{merge}} = \sum_h \sum_i \lvert I_{h,i} - \langle I_h \rangle \rvert \,/\, \sum_h \sum_i \lvert I_{h,i} \rvert$
c Calculated using MOLEMAN (Kleywegt et al., 2001).
d Calculated using PROCHECK (Laskowski et al., 1993).
* Se-SAD: single-wavelength anomalous dispersion of Se.
Reduced intensities were used to search for the Se substructure. A resolution cut-off of 4.0 Å was applied during the substructure solution. Thus, the three disulphides were treated as super-sulphur atoms. All nine Se positions were found with SnB (Weeks and Miller, 1999) and SHELXD (Schneider and Sheldrick, 2002). These positions were fed into SHARP (de La Fortelle and Bricogne, 1997). After solvent flattening with SOLOMON (Abrahams and Leslie, 1996), the initial 2.7 Å map, showing all the Se sites, was used for automatic model building with MAID (Levitt, 2001). The models of the I-DmoI/DNA complexes were rebuilt and refined to 2.1 Å (PDB entries 2vs7, 2vs8) using REFMAC5 (Murshudov et al., 1997) (Table II).

1.6 Construction of target clones
The non-palindromic 24 bp target sequence 5'-GCCTTGCCGGGTAAGTTCCGGCGC-3' (SEQ ID NO: 13) is the natural I-DmoI target. For nomenclature purposes, the inventors divided it into two equal parts, L and R. The 64 degenerate targets derived from the LR sequence were obtained by mutating the nucleotides at positions +8, +9 and +10 in the R sequence. Sixty-four pairs of oligonucleotides (5'-GCCTTGCCGGGTAAGTTCCNNNGC-3' (SEQ ID NO: 14) and the reverse complementary sequences) representing the target library LR(10NNN) were ordered from Sigma, annealed, and cloned using the Gateway protocol (Invitrogen) into the yeast reporter vector pFL39-ADH-LACURAZ containing an I-SceI target site as a control (Arnould et al., 2006). Yeast reporter vectors were transformed into S. cerevisiae strain FYBL2-7B (MATa, ura3Δ851, trp1Δ63, leu2Δ1, lys2Δ202).
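The 64-member library follows directly from the position numbering of the target (positions -12 to -1 and +1 to +12, with no position 0): three degenerate positions give 4^3 = 64 sequences. The sketch below enumerates them from SEQ ID NO: 13 and reproduces the degenerate pattern of SEQ ID NO: 14. The function names are illustrative, and the same helpers would apply to other libraries, such as DC4NNN or DC7NNN, given their own template sequences.

```python
from itertools import product
from typing import Iterable, List

WT_TARGET = "GCCTTGCCGGGTAAGTTCCGGCGC"  # SEQ ID NO: 13, natural 24 bp I-DmoI target


def index_of(position: int) -> int:
    """0-based string index of a target position (-12..-1, +1..+12; no position 0)."""
    if position == 0 or not -12 <= position <= 12:
        raise ValueError("positions run from -12 to -1 and from +1 to +12")
    return position + 12 if position < 0 else position + 11


def degenerate_pattern(template: str, positions: Iterable[int]) -> str:
    """Template with the chosen positions replaced by N."""
    idxs = {index_of(p) for p in positions}
    return "".join("N" if i in idxs else base for i, base in enumerate(template))


def degenerate_library(template: str, positions: Iterable[int]) -> List[str]:
    """All 4^n sequences obtained by varying the chosen positions over A, C, G, T."""
    idxs = [index_of(p) for p in positions]
    library = []
    for bases in product("ACGT", repeat=len(idxs)):
        seq = list(template)
        for i, base in zip(idxs, bases):
            seq[i] = base
        library.append("".join(seq))
    return library


if __name__ == "__main__":
    print(degenerate_pattern(WT_TARGET, (8, 9, 10)))  # GCCTTGCCGGGTAAGTTCCNNNGC
    library = degenerate_library(WT_TARGET, (8, 9, 10))
    print(len(library), library[0], WT_TARGET in library)
```

The wild-type target is one member of the library (GGC at positions +8 to +10), which gives a built-in positive control within the panel.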
1.7 Yeast screening
Wild-type (WT) I-DmoI and the two previously reported mesophilic I-DmoI variants, D1 and D2 (Prieto et al., 2008), were screened against the 64 I-DmoI-derived targets (R-10NNN) by mating meganuclease-expressing yeast clones with yeast strains harboring a reporter system, as previously described (Arnould et al., 2006). Meganuclease-induced recombination of the LacZ reporter system restores a functional beta-galactosidase gene, which can be detected by X-Gal staining.
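The interaction analysis of section 1.8, which follows, turns FoldX interaction energies into specificity logos and genome-wide site counts. A minimal sketch of both steps is given here under stated assumptions: each nucleotide at a position is weighted by exp(-ΔΔG/RT) and normalised over the four bases (the probability matrix that a logo tool such as seqLogo displays as information content), RT is taken at 298.15 K, and the energy of a candidate site is assumed to be the sum of its per-position ΔΔG values. The additive scoring, the temperature, the toy matrix and all function names are assumptions for illustration, not values or code from this work.

```python
import math
from typing import Dict, List

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)
BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def boltzmann_probabilities(ddg: Dict[str, float],
                            temperature_k: float = 298.15) -> Dict[str, float]:
    """Per-position nucleotide probabilities, p_b proportional to exp(-ddG_b / RT)."""
    rt = R_KCAL_PER_MOL_K * temperature_k
    weights = {b: math.exp(-ddg[b] / rt) for b in BASES}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def information_content(probs: Dict[str, float]) -> float:
    """Information content in bits (maximum 2 for DNA, no small-sample correction)."""
    return 2.0 + sum(p * math.log2(p) for p in probs.values() if p > 0)


def count_sites(genome: str, matrix: List[Dict[str, float]], threshold: float) -> int:
    """Count windows on both strands whose summed ddG is at or below the threshold.

    A perfectly palindromic site is counted once on each strand.
    """
    width = len(matrix)
    forward = genome.upper()
    hits = 0
    for strand in (forward, forward.translate(_COMPLEMENT)[::-1]):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            energy = sum(matrix[i][b] for i, b in enumerate(window))
            if energy <= threshold:
                hits += 1
    return hits


# Toy 3-position matrix in kcal/mol (made up); the reference site GAC scores 0.
TOY_MATRIX = [
    {"G": 0.0, "A": 3.0, "C": 3.0, "T": 3.0},
    {"A": 0.0, "C": 1.5, "G": 1.5, "T": 1.5},
    {"C": 0.0, "A": 2.5, "G": 2.5, "T": 2.5},
]

if __name__ == "__main__":
    for row in TOY_MATRIX:
        p = boltzmann_probabilities(row)
        print({b: round(v, 3) for b, v in p.items()}, round(information_content(p), 2))
    for cutoff in (2.0, 4.0):  # the two thresholds quoted in section 1.8
        print(cutoff, count_sites("TTGACTT", TOY_MATRIX, cutoff))
```

Raising the threshold admits sites with more mismatches, which is the trend the two thresholds in section 1.8 are meant to expose.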
1.8 Interaction analysis
The specificity logos in Fig. 6a were calculated with FoldX by mutating each base of the DNA to each of the other nucleotides. A specificity logo is a diagrammatic representation of the preference for each possible nucleotide at each position of the I-DmoI (non-coding strand, SEQ ID NO: 15), E-DreI (SEQ ID NO: 17), I-CreI (SEQ ID NO: 15) and I-SceI (SEQ ID NO: 18) DNA target sites. The height of a given nucleotide is proportional to $\exp(-\Delta\Delta G_{nt}/RT)$, where $\Delta\Delta G_{nt}$ is the difference in interaction energy between the complex with the mutated DNA and the wild-type complex. This function is displayed graphically as information content using the R package seqLogo. Matrices for each of the four meganucleases analyzed, containing the difference in interaction energy caused by each mutation in the DNA, were scanned through both the S. cerevisiae and the D. melanogaster genomes in order to determine how many potential binding sites are present. Two different energy thresholds were used: 2 kcal/mol, which sets the limit for reasonable binding, and 4 kcal/mol, to show the increase in sites with a more permissive threshold.

EXAMPLE 2 - Results
2.1 Structure determination
Full-length I-DmoI in complex with a 25 bp double-stranded DNA was crystallized as an enzyme-substrate complex with calcium and as an enzyme-product complex with manganese. Protein expression, purification, protein-DNA complex formation and crystallization were carried out as described in Example 1 above. The phase problem was solved using the anomalous signal at the selenium peak wavelength.
The single-wavelength anomalous dispersion (SAD) method was applied to obtain initial phases at 2.8 Å resolution in crystals grown with Se-Met protein (see Example 1). The three selenium atoms were located in I-DmoI using SHELX (Schneider and Sheldrick, 2002) and initial phases were obtained with SHARP (de La Fortelle and Bricogne, 1997). The initial model was built in 2.6 Å 2Fo-Fc maps after solvent flattening with SOLOMON (Abrahams and Leslie, 1996). The structures were finally refined to 2.0 and 2.1 Å in the same monoclinic space group using REFMAC (Murshudov et al., 1997) (Table II). There were three protein-DNA complexes in the asymmetric unit, and the quality of the electron density maps was excellent (see Fig. 1). The first and last residues observed in the electron density for the protein moiety are Glu5 and His195, whereas all the DNA nucleotides were clearly visible in both the Ca2+- and the Mn2+-bound structures.
Figure 1 shows the crystal structure of I-DmoI in complex with its target DNA. Panel a) shows the protein secondary structure; the complex is shown in two different orientations. The calcium ion is shown. The oligonucleotide construct used for crystallization is shown in panel b). Throughout the present Patent Application, the individual bases are named with a subscript strandA (coding strand) or strandB (non-coding strand) indicating the DNA strand to which they belong.
2.2 Overall structure of the I-DmoI/DNA complex
The overall fold of I-DmoI in complex with its DNA target (Fig. 1a) shows a clear pseudo two-fold axis between the two LAGLIDADG helices, dividing the protein into two domains, A (residues 5-98) and B (residues 103-195), joined by a four-residue linker. These domains contain the typical αββαββα topology of the LAGLIDADG family. Both domains have a similar size, and the β-strands form two antiparallel β-sheets, composed of strands β1-4 in domain A and β5-7 in domain B. The β-sheets form a concave surface with an inner cylindrical shape where the DNA molecule is accommodated. The substrate-bound and product-bound structures (Cα1-179 rmsd 0.21 Å) differ only in the DNA molecule, with both strands cleaved at the expected sites (Fig. 1b) in the Mn2+-bound complex. RMSD refers to root mean square deviation, a measure of the average distance between the backbone atoms of superimposed proteins. Both the cleaved and intact DNA molecules show a large bending of the double helix compared with a regular B-DNA structure or with analogous duplexes bound to other meganucleases. In general, the structure of the α-helical core of the protein scaffold is similar to the I-DmoI structure in the absence of the DNA duplex (Silva et al., 1999). However, there are structural differences in the overall conformation of the β-strands, the loops that join them and the connections with the core helices, which favor the embracing of the DNA molecule (Fig. 2). Figure 2 shows a detailed view of the I-DmoI active site. Panel a) shows a stereo comparison of the enzyme active site in the substrate- and product-bound structures. The metal sites are labeled M1, for the position shared between calcium and manganese, and M2, for the second manganese atom. Anomalous difference maps show the presence of only one calcium atom in the DNA-bound structure (panel b) and two manganese atoms in the cleaved-DNA structure (panel c). Panel d) shows a schematic diagram of the hypothetical enzymatic mechanism proposed for I-DmoI. Hydrolysis of the phosphodiester bonds would follow a sequential two-metal mechanism: a single metal ion is bound in one active site (site 1) while the water nucleophile is positioned in the central site.
A second metal ion would then enter the second site (site 2), displacing the water molecule previously located in that site to the central one. Asp21, Gly20, Glu117 and Ala116 are contributed by the LAGLIDADG motifs of the enzyme.
All these changes are reflected in a Cα1-179 rmsd of 1.27 Å between free I-DmoI and the I-DmoI/DNA complex. The structures of the I-DmoI/DNA complex also show differences when compared with the DNA complex of E-DreI (Chevalier et al., 2002), a chimeric enzyme that contains domain A of I-DmoI fused to an I-CreI monomer. A comparison between the wild-type structure and the I-DmoI domain A of the chimera reveals that DNA recognition differs because of changes in the DNA conformation, whereas the other half of the enzyme is completely different, both in the protein moiety and in the DNA backbone, which also adopts a different conformation (Fig. 3). Figure 3 shows a scheme of the protein-DNA contacts in the Ca2+- and Mn2+-bound structures. The cleavage sites are indicated by the shaded phosphates. In the Mn2+ structure, the portion of the DNA target that binds to domain A of I-DmoI consists of residues -2 to 13 of the 3' strand and residues 3 to 13 of the 5' strand, the remaining nucleotides being bound by domain B of I-DmoI. Lines indicate polar contacts and van der Waals interactions, respectively. Dots represent water molecules involved in the interaction. Amino acids depicted in blackened boxes make hydrogen-bond interactions with the bases, and the other residues make van der Waals interactions with the DNA (bases, riboses or phosphates). Although the model generated using E-DreI in complex with its DNA target provided a preliminary view of the regions involved in DNA target recognition inside domain A, it did not yield a complete picture of the recognition mechanism.
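For reference, the Cα RMSD values quoted in this section (0.21 Å and 1.27 Å) follow the usual definition RMSD = sqrt((1/N) Σ |r_i - r'_i|²) over N paired atoms. The sketch below computes it for coordinate lists that are assumed to be already superimposed; the least-squares superposition step itself (for example the Kabsch algorithm) is not implemented here, and the function name and toy coordinates are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between paired atoms of two superimposed models."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Made-up Cα positions (Å) for three residues in two superimposed models.
    model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model_2 = [(0.1, 0.0, 0.0), (3.8, 0.2, 0.0), (7.6, 0.0, 0.3)]
    print(f"RMSD = {rmsd(model_1, model_2):.3f} Å")
```

Because RMSD depends on the superposition, values from different programs are only comparable when the same atom selection (here Cα1-179) and fitting procedure are used.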
2.3 Active site and cleavage mechanism
Divalent metal ions play an essential role in catalysis by endonucleases and other enzymes. In LAGLIDADG homing endonucleases, the conserved acidic residues at the active sites coordinate these divalent cations. The general mechanism of cleavage of DNA phosphodiester bonds requires a nucleophile to attack the electron-deficient phosphorus atom, a general base to activate the nucleophile, a general acid to protonate the leaving group, and positively charged groups to stabilize the phosphoanion transition state. Interestingly, cations are dispensable for DNA binding (Fig. 3) (Dalgaard et al., 1994). Native electrophoresis showed that DNA binding could be accomplished in the absence of metal, suggesting that the ion is necessary only for catalysis. Figure 4 shows the loops involved in DNA binding by I-DmoI. The upper part of the figure depicts a ribbon diagram of the I-DmoI/DNA complex. Domain A contains two loops that contact the DNA (L1a and L2a), whereas domain B has only one loop (L2b) engaged in contacts with the nucleic acid. L2a and L2b are primarily associated with the central bases of the target site, and L1a with bases outside that region, reflecting the asymmetry of target recognition by I-DmoI. The lower part of the figure shows detailed insets of the three loops involved in DNA interactions. The protein-DNA interactions are displayed as dashed lines.
The active sites of I-SceI, I-CreI and I-DmoI are structurally similar and contain overlapping residues (Figures 5 and 2a).
Figure 5 shows the structural basis of DNA recognition. Panel a) shows a structural sequence alignment between the archaeal I-DmoI and the eukaryotic I-SceI and I-CreI homing endonucleases. Secondary structure elements of I-DmoI are shown above the alignment. Conserved residues are boxed with a black background, while homologous residues are boxed with a white background. Residues with a gray background are those involved in protein-base contacts in the crystal structures of the complexes. The sequence alignment was carried out with Clustal (Larkin et al., 2007) and the structural alignment with ESPript (Gouet et al., 1999). Panel b) compares the location of the protein-base contacts (regions colored in gray) in the I-DmoI, I-SceI and I-CreI protein-DNA structures. Panel c) shows a schematic view of the protein-base contacts.
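The percentage identity between aligned sequences, as defined earlier in this document, can be computed from an alignment such as the one in Figure 5a. The sketch below uses one common convention: identical positions divided by the number of columns in which neither sequence has a gap. Other conventions (for example dividing by the length of the shorter sequence) give different numbers, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # columns with a gap in either sequence are not compared
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```

The alignment itself (from Clustal or a structural superposition) determines which residues are paired, so the identity figure is only as reliable as the alignment it is computed from.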
However, exactly how bound divalent metal ions are used for cleavage by LAGLIDADG meganucleases, especially the monomeric ones, is still controversial (Stoddard, 2005). The presence of three cations in the active sites (two unshared cations and one shared between the two catalytic residues) in the high-resolution structures of I-CreI bound to DNA supported a two-metal catalytic mechanism in which one central metal site is shared between two overlapping active sites (Chevalier et al., 2004). The unshared cation in each active site positions a nucleophilic water molecule for the SN2 in-line attack on the scissile P-O3' bond. The central cation is expected to stabilize the phosphoanion transition state during hydrolysis of both strands and to facilitate protonation of the O3' leaving group of each strand. Because I-CreI makes no direct contacts with the nucleophilic water molecule, a general base could not readily be identified. It has been suggested that the extensive network of water molecules surrounding the active site participates in a concerted transfer of hydrogen atoms that activates the nucleophilic water molecule and protonates the leaving group.
The situation is different in I-DmoI. The inventors analyzed the positions of the cations in the structures with bound and with cleaved DNA. Anomalous diffraction data were collected from crystals grown in the presence of CaCl2 and MnCl2. Binding of non-activating calcium ions to the enzyme-DNA complex was visualized by collecting a highly redundant diffraction data set on a home X-ray source (Table II) and examining the resulting anomalous difference Fourier maps (Fig. 2b). A similar strategy was followed for the enzyme-DNA complex with Mn2+, using synchrotron radiation (Fig. 2c and Table II). In contrast to I-SceI and I-CreI, for which three and two non-activating Ca2+ cations, respectively, have been reported in the active center (Moure et al., 2003; Chevalier et al., 2004), only one anomalous peak could be detected in the I-DmoI active center. The Ca2+ ion is hexacoordinated by the phosphates of 2A (strand A) and -3C (strand B), three water molecules, and Asp21 of the first LAGLIDADG motif. Interestingly, the additional anomalous difference mapping in the presence of Mn2+ demonstrates binding of two metal ions with equivalent occupancy. The central site did not show any anomalous signal and was therefore modeled as a water molecule. One Mn2+ cation is coordinated by the side chain of Asp21, the carbonyl of Ala116, the 5' phosphate of -3C (strand B), the phosphate of 2A (strand A) and a water molecule outside the active site, whereas the second Mn2+ makes similar interactions with the 5' phosphate of 3G (strand A), the phosphate of -2C (strand B), the main-chain carbonyl of Gly20, the side chain of Glu117 in the second LAGLIDADG motif, and another water molecule outside the active center. Whereas I-SceI and I-CreI contain three metal sites in the active site (Chevalier et al., 2004; Moure et al., 2003), I-DmoI contains only two. In addition, comparison of the I-DmoI Ca2+ and Mn2+ anomalous maps shows that only one of the metal sites overlaps. The other sites, including the central one, were occupied by water molecules, whereas in I-SceI and I-CreI both can be occupied by a metal (Chevalier et al., 2004; Moure et al., 2003). The structural organization of the I-DmoI active site therefore shows a clear asymmetry in the calcium-bound structure, suggesting a sequential mechanism for I-DmoI catalysis, as has also been proposed for I-SceI (Moure et al., 2003): the non-coding strand would be cleaved before the final reaction takes place on the coding strand. In this mechanism (Fig. 2d), the central water could be the nucleophile that initiates the reaction, after activation by the electropositive environment generated by the metal present in the active site (Garcia-Viloca et al., 2004). Once the non-coding strand is cleaved, entry of another catalytic metal into the second site would promote the transfer or regeneration of the central water, leading to cleavage of the coding strand. The Ca2+-bound structure would thus represent a snapshot of the activated state prior to cleavage of the phosphodiester bond in the non-coding strand, whereas the Mn2+-bound structure would depict the organization of the active site after cleavage of both strands.
Hence, the enzyme would nick the non-coding strand before cleaving the coding strand, resulting in a double-strand break. Although this mechanism might seem to be ruled out by the cleavage properties of the I-DmoI Asp21Asn and Glu117Gln single mutants (Lykke-Andersen et al., 1997), nicked intermediates were observed with I-SceI (Perrin et al., 1993) and with I-DmoI when the cleavage properties of a homodimeric I-DmoI mutant were studied using a plasmid substrate (Silva et al., 2006), indicating that a sequential cleavage mechanism is possible for the monomeric members of the LAGLIDADG homing endonuclease family. The structures suggest an explanation for the loss of activity in the Asp21Asn and Glu117Gln single mutants: each mutation could create a hydrogen bond with the acidic residue of the non-mutated LAGLIDADG motif, disturbing the coordination of the shared central water in the active site and thereby abolishing catalysis.
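The metal coordination spheres described in this section (for example, the hexacoordinated Ca2+ ion) are typically read off a refined model by listing the ligand atoms that lie within a first-shell distance of the ion. The Python sketch below illustrates only that bookkeeping step. It assumes plain Cartesian coordinates in ångströms and a 2.8 Å cutoff, which is a typical but hypothetical choice for Ca2+/Mn2+-oxygen ligands rather than a parameter from the I-DmoI analysis. The toy coordinates and ligand labels are invented and are not I-DmoI data.

```python
import math

# Assumed first-shell cutoff (in ångströms) for Ca2+/Mn2+ to oxygen ligands.
# Illustrative value only; not taken from the I-DmoI structures.
FIRST_SHELL_CUTOFF = 2.8


def coordination_shell(metal_xyz, atoms, cutoff=FIRST_SHELL_CUTOFF):
    """Return (label, distance) for every atom within `cutoff` of the metal.

    `atoms` is an iterable of (label, (x, y, z)) pairs; results are sorted
    from the nearest to the farthest ligand.
    """
    shell = []
    for label, xyz in atoms:
        distance = math.dist(metal_xyz, xyz)  # Euclidean distance (Python 3.8+)
        if distance <= cutoff:
            shell.append((label, round(distance, 2)))
    return sorted(shell, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy octahedral geometry with hypothetical labels: six ligands at 2.4 Å
    # plus one atom too far away to count as a ligand.
    metal = (0.0, 0.0, 0.0)
    atoms = [
        ("ligand_1", (2.4, 0.0, 0.0)),
        ("ligand_2", (-2.4, 0.0, 0.0)),
        ("ligand_3", (0.0, 2.4, 0.0)),
        ("ligand_4", (0.0, -2.4, 0.0)),
        ("ligand_5", (0.0, 0.0, 2.4)),
        ("ligand_6", (0.0, 0.0, -2.4)),
        ("distant_atom", (4.0, 0.0, 0.0)),
    ]
    shell = coordination_shell(metal, atoms)
    print(f"coordination number: {len(shell)}")  # prints 6 for this toy geometry
```

In practice, a crystallographer would also check ligand chemistry and geometry. A bare distance cutoff is only a first filter.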
2.4 DNA target recognition
As mentioned before, I-DmoI contains two LAGLIDADG helices and binds the nucleic acid as a monomer. The protein forces a clear bend in the DNA molecule, forming an angle of approximately 140° between the longitudinal axes of the two DNA halves. This bend distorts the minor groove in the middle of the DNA molecule, positioning both strands in the enzyme's active site. The crystal structure reveals the asymmetric nature of the I-DmoI DNA-binding cavity. Interestingly, domain A contains a four-stranded β-sheet, whereas domain B contains only three strands (Fig. 1). A detailed view of the protein-DNA contacts in the loops shows that loops L2a and L2b contact symmetric regions of the DNA major grooves (Fig. 4). In contrast, L1a interacts with bases 6-10 in the major groove near the 5' end of strand B. This protein-DNA interaction is absent in the other half of the DNA target. Because domain B lacks the fourth β-strand, it has no loop equivalent to L1a, and there are no protein-DNA contacts in the major groove near the 5' end of strand A. As a result, the target half associated with domain A (bases 1-13 on both strands) is recognized by a larger number of residues.
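The ~140° value quoted above is the angle between two vectors, namely the longitudinal axes of the two DNA halves. The following is a minimal Python sketch of that calculation. It assumes the axis direction vectors have already been fitted (for example with a helix-analysis program, which is outside this sketch). It also assumes the vectors point from the kink outward, so that straight DNA would give 180°. Under that assumed convention, the deviation from straight DNA is 180° minus the returned value. The vectors in the usage lines are constructed, not measured.

```python
import math


def axis_angle_deg(u, v):
    """Angle in degrees between two 3D direction vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp the cosine to [-1, 1] so that rounding error cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Constructed axes pointing away from the kink (not fitted to the I-DmoI model).
    half_a = (1.0, 0.0, 0.0)
    theta = math.radians(140.0)
    half_b = (math.cos(theta), math.sin(theta), 0.0)
    angle = axis_angle_deg(half_a, half_b)
    print(f"inter-axis angle: {angle:.1f} deg, bend from straight: {180.0 - angle:.1f} deg")
```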
The protein-DNA contacts in the substrate- and product-bound structures were analyzed in detail with NUCPLOT (Luscombe et al., 1997) (Figs. 3, 4, 6 and 7). A schematic representation of the interactions reveals few differences in protein-DNA interactions between the two forms (Fig. 3). The main contacts of domain B with the nucleotide bases involve Arg124, Arg126, Asp154, Arg157 and Asp155. Arg124 is positioned at a suitable distance to make polar contacts with the bases of -7G (strand A) and -6G (strand B), whereas Arg126 hydrogen bonds the base of -5G (strand B). The conformation of the Arg126 side chain is influenced by its interaction with Asp119, which does not contact the nucleotide bases but interacts with the phosphate backbone. The rotamer of Asp119 constrains the conformation of Arg126, indirectly inducing recognition of the base at -5G (strand B). The conformations and contacts of these residues are very similar in the bound and cleaved DNA structures. In loop L2b, Asp154 hydrogen bonds the base of -5C (strand A). The conformation of the Asp154 side chain is predisposed by polar contacts with the side chain of Trp128, and the same network of interactions is observed in both structures. Finally, Arg157 hydrogen bonds the base of -3G (strand A) in both structures. The conformation of the Arg157 side chain is influenced by that of Asp155, whose side chain also hydrogen bonds the base of -2C (strand B) (Fig. 4, Loop2b).
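The contact analysis above relied on NUCPLOT, which applies its own hydrogen-bond and non-bonded contact criteria. As a much simpler illustration of the underlying idea, the hypothetical Python helper below flags polar (N/O) protein atoms lying within a fixed distance of polar base atoms. The 3.5 Å cutoff, the `Atom` record and the toy coordinates are all assumptions. The output is a rough proxy and does not reproduce NUCPLOT.

```python
import math
from dataclasses import dataclass

POLAR_ELEMENTS = {"N", "O"}


@dataclass(frozen=True)
class Atom:
    residue: str  # free-text label, e.g. "Arg157" or "-3G(A)"
    name: str     # atom name, e.g. "NH1" or "O6"
    element: str  # chemical element symbol
    xyz: tuple    # Cartesian coordinates in ångströms


def polar_contacts(protein_atoms, base_atoms, cutoff=3.5):
    """Return (protein residue, atom, base, atom, distance) for N/O pairs within cutoff.

    Distance-only heuristic: no angle or donor/acceptor checks are applied.
    """
    contacts = []
    for p in protein_atoms:
        if p.element not in POLAR_ELEMENTS:
            continue
        for b in base_atoms:
            if b.element not in POLAR_ELEMENTS:
                continue
            d = math.dist(p.xyz, b.xyz)
            if d <= cutoff:
                contacts.append((p.residue, p.name, b.residue, b.name, round(d, 2)))
    return sorted(contacts, key=lambda c: c[-1])


if __name__ == "__main__":
    # Toy coordinates with hypothetical labels (not from the I-DmoI model).
    protein = [
        Atom("ArgX", "NH1", "N", (0.0, 0.0, 0.0)),
        Atom("ArgX", "CZ", "C", (0.5, 0.5, 0.0)),
    ]
    bases = [
        Atom("baseY", "O6", "O", (2.9, 0.0, 0.0)),
        Atom("baseY", "C5", "C", (1.0, 1.0, 0.0)),
        Atom("baseZ", "N7", "N", (5.0, 0.0, 0.0)),
    ]
    for contact in polar_contacts(protein, bases):
        print(contact)
```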
In domain A, loop L2a contains Thr76 and Asp75, which are the only amino acids making specific contacts with the central four base pairs of the DNA. The side chain of Thr76 makes a polar contact with the base of 2A (strand A), while Asp75 hydrogen bonds the base of 3C (strand B). The conformation of the Asp75 side chain is influenced by Arg77, which makes a polar contact with the base of 3G (strand A) (Fig. 4, Loop2a). Together with Arg77, the residues Arg81, Arg37, Tyr29, Arg33, Glu79, Glu35 and Ser34 make the remaining direct contacts between domain A and the bases of the target DNA (Fig. 4, Loop2a and Loop1a). Arg81 hydrogen bonds 6G (strand B). This residue is grouped with Arg37, and this cluster of basic residues is flanked by Glu79 and Glu35. The side chain of Glu79 directly contacts the bases of 5A (strand B) and 5T (strand A), whereas the side chain of Glu35 contacts the base of 8C (strand B). The conformation of the Glu35 side chain is favored by the interactions of the Ser83 side chain and the Tyr36 main chain with a water molecule that contacts the DNA backbone. Tyr29, Arg33 and Ser34 form a second spatial cluster of residues, which interact with the bases of 6C (strand A), 9G (strand A) and 9C (strand B), respectively (Fig. 4, Loop1a). These contacts between I-DmoI and the bases of its target DNA appear to be responsible for target recognition. The remaining amino acids involved in contacts (Fig. 3) make hydrogen bonds or van der Waals interactions with the DNA backbone.
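The asymmetry described in this section can be made concrete by tallying the base contacts named for domains A and B. The sketch below transcribes those contacts by hand. Backbone-only contacts and contact strengths are ignored. Arg37, for which no specific base is named, is left out. The tuple layout is an illustrative choice. Contacts are counted per target half, using the convention that positive positions belong to the half bound by domain A.

```python
from collections import Counter

# Base contacts as named in the text: (residue, domain, position, strand).
# Positive positions: half bound by domain A; negative: half bound by domain B.
BASE_CONTACTS = [
    ("Arg124", "B", -7, "A"), ("Arg124", "B", -6, "B"),
    ("Arg126", "B", -5, "B"), ("Asp154", "B", -5, "A"),
    ("Arg157", "B", -3, "A"), ("Asp155", "B", -2, "B"),
    ("Thr76", "A", 2, "A"), ("Asp75", "A", 3, "B"),
    ("Arg77", "A", 3, "A"), ("Glu79", "A", 5, "B"),
    ("Glu79", "A", 5, "A"), ("Arg81", "A", 6, "B"),
    ("Tyr29", "A", 6, "A"), ("Glu35", "A", 8, "B"),
    ("Arg33", "A", 9, "A"), ("Ser34", "A", 9, "B"),
]


def contacts_per_half(contacts):
    """Count base contacts and distinct contacting residues in each target half."""
    n_contacts = Counter()
    residues = {"A-half": set(), "B-half": set()}
    for residue, domain, position, _strand in contacts:
        half = "A-half" if position > 0 else "B-half"
        # Sanity check: in this transcription each domain contacts only its own half.
        if half != f"{domain}-half":
            raise ValueError(f"unexpected contact: {residue} at {position}")
        n_contacts[half] += 1
        residues[half].add(residue)
    return dict(n_contacts), {h: len(r) for h, r in residues.items()}


if __name__ == "__main__":
    counts, distinct = contacts_per_half(BASE_CONTACTS)
    print(counts)    # {'B-half': 6, 'A-half': 10}
    print(distinct)  # {'A-half': 9, 'B-half': 5}
```

On this simple count, the domain A half carries 10 of the 16 named base contacts, made by 9 distinct residues.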
2.5 Pseudo-palindromic versus non-palindromic DNA recognition
A structure-based sequence alignment of I-DmoI with I-CreI and I-SceI was performed to compare the molecular basis of I-DmoI DNA recognition with that of other members of the LAGLIDADG family. I-CreI and I-SceI are two well-characterized meganucleases. They represent, respectively, the homodimeric members of the LAGLIDADG family, which bind pseudo-palindromic targets, and the monomeric members, which bind non-palindromic targets (Fig. 5). The alignment illustrates the differences in primary and secondary structure among these enzymes with respect to the location of the residues involved in DNA binding (residues on a gray background in Fig. 5a). A structural comparison of the DNA-binding residues of these homing endonucleases shows that, despite the similar structural scaffold, the residues responsible for DNA recognition are not topologically conserved (Fig. 5b). The schematic comparison of the specific base-protein contacts in the different meganuclease-DNA complexes (Fig. 5c) illustrates how the homodimeric meganuclease achieves DNA recognition by generating a similar network of protein-DNA contacts on both sides of the pseudo-palindromic DNA. In contrast, the monomeric ones tend to maximize interactions in one half of the target DNA.
2.6 I-DmoI sequence specificity modelling
The differences in target recognition led the inventors to assess the specificity of I-DmoI and compare it with that of I-SceI, I-CreI and the chimeric E-DreI. The inventors analyzed the length of the binding site and the number of specific positions in the target DNA sequence of each meganuclease. For this study, the inventors used the latest version of FoldX (FoldX 2.8) (Schymkowitz et al., 2005). Each base was mutated to each of the three alternatives. The resulting interaction energies were converted into probabilities to predict the preference for each base at a given position (Fig. 6).
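The text does not spell out how the FoldX interaction energies were converted into base probabilities. One common choice, shown here purely as an assumption, is Boltzmann weighting of each base's energy change relative to the wild-type base: p(b) = exp(-ΔΔG_b/RT) / Σ exp(-ΔΔG_b'/RT). The temperature and the input values below are illustrative only.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)


def base_probabilities(ddg, temperature=298.0):
    """Convert per-base interaction-energy changes into Boltzmann probabilities.

    `ddg` maps each base to its change in interaction energy (kcal/mol)
    relative to the wild-type base; lower values mean more favourable binding.
    """
    rt = R_KCAL_PER_MOL_K * temperature
    weights = {base: math.exp(-energy / rt) for base, energy in ddg.items()}
    total = sum(weights.values())
    return {base: w / total for base, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical position where every substitution costs 2 kcal/mol.
    probs = base_probabilities({"G": 0.0, "A": 2.0, "C": 2.0, "T": 2.0})
    for base, p in probs.items():
        print(base, round(p, 3))
```

With these made-up numbers, the wild-type G receives a probability well above the 2/3 cutoff that the next paragraph uses to call a base specific. When all four energies are equal, each base gets 0.25.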
Figure 6 shows the in silico binding patterns. a) In silico binding specificities of I-DmoI, E-DreI, I-CreI and I-SceI. The energy-based logos display the different specificities of the meganucleases. I-DmoI presents a short binding site with the highest specificity, while I-SceI has a long but rather tolerant binding site. Base discrimination predicted by FoldX agrees notably well with the available experimental results for I-SceI (Doyon et al., 2006) and I-CreI (Argast et al., 1998). The reference sequences for the logos are the wild-type coding strands, except for I-DmoI, for which the non-coding strand was used. The wild-type sequence is shown under each logo. b) In silico R-10NNN binding pattern predicted by FoldX. The pattern was calculated using the wild-type I-DmoI structure and is based on the difference in interaction energy relative to the wild-type DNA sequence. The energies were calculated by adding up the change in interaction energy caused by each individual mutation in the DNA. Hits are found for R-10GCC, R-10GCG, R-10GCA and R-10ACC (see the coding strand B sequence in Figure 1b or 7). Of these, only the R-10GCG hit was not found experimentally for I-DmoI or for the two mutants analyzed. The target triplet is shown inside each cell.
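The R-10NNN pattern in panel b) was obtained by summing the energy change of each individual base substitution. The sketch below illustrates that additive model over all 64 triplets at positions 10, 9 and 8. The per-position ΔΔG tables are invented for illustration and are not FoldX results. They were chosen only so that the toy output mirrors the four in silico hits listed above. The 1.0 kcal/mol hit threshold is likewise an assumption.

```python
from itertools import product

BASES = "ACGT"


def triplet_energies(position_ddg):
    """Additive model: energy of each NNN triplet = sum of per-position ΔΔG values.

    `position_ddg` is a list of three dicts (positions 10, 9 and 8, in that order),
    each mapping a base to its energy change relative to the wild type.
    """
    energies = {}
    for triplet in product(BASES, repeat=len(position_ddg)):
        energies["".join(triplet)] = sum(
            table[base] for table, base in zip(position_ddg, triplet)
        )
    return energies


def predicted_hits(energies, threshold):
    """Triplets whose summed energy change does not exceed `threshold`."""
    return sorted(t for t, e in energies.items() if e <= threshold)


if __name__ == "__main__":
    # Invented ΔΔG tables (kcal/mol); the wild-type triplet GCC scores 0.
    toy_ddg = [
        {"G": 0.0, "A": 0.5, "C": 2.0, "T": 2.0},  # position 10
        {"C": 0.0, "A": 2.0, "G": 2.0, "T": 2.0},  # position 9
        {"C": 0.0, "A": 0.6, "G": 0.7, "T": 2.0},  # position 8
    ]
    energies = triplet_energies(toy_ddg)
    print(len(energies), predicted_hits(energies, threshold=1.0))
```

The additive assumption ignores coupling between neighbouring substitutions. This is the simplification stated in the legend above.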
These predictions were validated by comparison with the experimental logos of I-SceI and I-CreI (Fig. 6a) (Argast et al., 1998; Doyon et al., 2006). A base was considered specific when its probability was higher than 2/3, and the recognition site was defined as the sequence between the first and last specific nucleotides. The inventors defined a specificity index as the sum of the probabilities of the specific bases normalized by the length of the recognition site (see Materials and Methods). Higher values of this index indicate a more specific meganuclease. As can be observed (Table III and Fig. 6), I-DmoI is a very specific homing endonuclease, the most specific of the group. Although its recognition site is the shortest, it contains few non-specific bases. I-CreI represents the opposite case: its binding site spans twenty-two bases but contains nine variable positions. I-DmoI also shows a clear preference for cytosine and guanine, reflecting the high GC content of the thermophilic organism in which it is encoded. No clear correlation was found between the interaction energy and either the recognition site length or the number of specific bases. To evaluate the specificity of these enzymes in real genomes, the inventors searched for putative binding sites in two model organisms.
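The specificity definitions above translate directly into code. The sketch below assumes each position is described by a dictionary of base probabilities, for instance from a conversion of FoldX energies. A position is called specific when its most probable base exceeds 2/3. The summed probabilities of those specific bases are then divided by the length of the recognition site. The exact formula in the Materials and Methods may differ in detail, and the example matrix is hypothetical.

```python
SPECIFIC_CUTOFF = 2 / 3


def recognition_site(prob_matrix, cutoff=SPECIFIC_CUTOFF):
    """Return (first, last) indices of specific positions, or None if there are none."""
    specific = [i for i, probs in enumerate(prob_matrix) if max(probs.values()) > cutoff]
    if not specific:
        return None
    return specific[0], specific[-1]


def specificity_index(prob_matrix, cutoff=SPECIFIC_CUTOFF):
    """Sum of specific-base probabilities divided by the recognition-site length."""
    site = recognition_site(prob_matrix, cutoff)
    if site is None:
        return 0.0
    first, last = site
    length = last - first + 1
    total = 0.0
    for probs in prob_matrix[first:last + 1]:
        best = max(probs.values())
        if best > cutoff:  # only specific positions contribute to the sum
            total += best
    return total / length


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    strong_g = {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05}
    strong_c = {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}
    matrix = [uniform, strong_g, uniform, strong_c, uniform]
    print(recognition_site(matrix), round(specificity_index(matrix), 3))
```

In the toy matrix, the non-specific position between the two specific ones lengthens the site without adding to the sum, so it lowers the index. This is the same trade-off the text uses to contrast I-DmoI with I-CreI.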
The inventors scanned the energy matrices derived from the analysis above along the Saccharomyces cerevisiae and Drosophila melanogaster genomes, using two different energy thresholds relative to the wild-type interaction energy. With the lower energy threshold, no hit was found in yeast (the I-SceI site in the sequenced yeast strain is disrupted by the insertion of an intron that encodes the meganuclease), and very few were found in Drosophila melanogaster (Table III). When the energy threshold was raised, weaker hits became apparent (Table III). This could be important for a highly expressed enzyme or one with enhanced activity.
Table III. DNA interaction analysis for I-DmoI, E-DreI, I-CreI and I-SceI.
[Table III: image not reproduced]
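A genome scan of the kind described above can be sketched as a sliding window. Each window and its reverse complement is scored with a position-specific energy matrix. Windows whose energy lies within a chosen margin of the wild-type energy are reported, and the two thresholds in the text correspond to two margin values. The matrix, margins and sequence below are hypothetical. A real scan of the yeast or fly genome would need to handle performance and sequence quality more carefully than this sketch does.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_energy(matrix, site):
    """Sum of per-position energies; `matrix` is a list of base->energy dicts."""
    return sum(position[base] for position, base in zip(matrix, site))


def scan_genome(genome, matrix, wt_energy, margin):
    """Return (start, strand, energy) for windows within `margin` of wt_energy."""
    k = len(matrix)
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            energy = site_energy(matrix, site)
            if energy - wt_energy <= margin:
                hits.append((start, strand, energy))
    return hits


if __name__ == "__main__":
    # Hypothetical 3-bp matrix favouring GCC: 0 for the preferred base, 1 otherwise.
    matrix = [{b: (0.0 if b == wt else 1.0) for b in "ACGT"} for wt in "GCC"]
    genome = "TTGCCAAGGCTT"
    print(scan_genome(genome, matrix, wt_energy=0.0, margin=0.0))
```

Raising `margin` from 0.0 to 1.0 on the toy sequence increases the number of reported windows from two to six. This mirrors, in miniature, how weaker hits appear when the energy threshold is relaxed.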
The binding event induces a deep kink in the nucleic acid, forcing both strands into the active site. This kink is more pronounced than in other meganucleases of the LAGLIDADG family. In addition, the structures of the enzyme with bound and cleaved DNA suggest a sequential catalytic mechanism.
All these structural data provide valuable preliminary information for altering I-DmoI specificity, with the final goal of engineering custom meganucleases on this scaffold. Initial attempts have shown that this enzyme can be turned into a mesophilic meganuclease, a fundamental requirement for biotechnological purposes.
2.7 Sequence specificity in vivo
The cleavage properties of I-DmoI and of the D1 and D2 mutants have been analyzed previously (Prieto et al., 2008). Owing to its thermophilic origin, I-DmoI exhibits poor activity at 37 °C and is therefore not an appropriate tool for practical in vivo applications. The inventors previously described two I-DmoI variants with enhanced activity at 37 °C: D1 (Ile52Phe, Leu95Gln) and D2 (Ile52Phe, Ala92Thr, Phe101Cys). Their mutations lie outside the DNA-binding regions (Prieto et al., 2008) and affect neither the DNA-binding interface nor the structural integrity of the protein. In addition, since these two proteins share only the Ile52Phe mutation, any effect of the other mutations should be detectable as a difference in binding pattern. The I-DmoI D1 and D2 variants display cleavage activity at 37 °C, and a thermodynamic analysis showed that the mutants are significantly destabilized (Prieto et al., 2008). This decrease in stability, together with the increase in activity, could compromise enzyme specificity and increase toxicity. These two mutants were modelled with FoldX, giving destabilization energies compatible with the experimental results and revealing binding patterns indistinguishable from that of the wild type (data not shown).
This in silico analysis indicates that I-DmoI is a very specific meganuclease. To evaluate this specificity experimentally, the inventors monitored the cleavage pattern of I-DmoI and of its two mesophilic derivatives on the R-10NNN target collection. This collection comprises all sixty-four possible triplets at target positions 8G (strand A), 9G (strand A) and 10C (strand A) (Fig. 7a). The R-10NNN triplet is contacted by domain A (Figs. 1, 3, 4 and 7), in the region with the highest density of protein/DNA interactions. The detailed interaction map of this region includes the following polar contacts: Arg33 with the base of 9G (strand A), the Ser34 main-chain carbonyl with 9C (strand B), the Ser34 main-chain amide with the base of 10G (strand B), and the Glu35 side chain with the pyrimidine ring of 8C (strand B).
Cleavage was measured with a previously described yeast assay that directly monitors meganuclease-induced recombination at 37 °C (Arnould et al., 2006; Smith et al., 2006). I-DmoI does not tolerate changes in the R-10NNN triplet at 37 °C in vivo, and the inventors could detect only residual cleavage of the wild-type R-10GCC target (Fig. 7).
Figure 7 shows the in vivo cleavage patterns. a) I-DmoI recognition site. For in vivo nomenclature purposes, the target sequence has been divided into two halves, L (left) and R (right). Sixty-four targets (R-10NNN) were derived from the natural I-DmoI target, each differing from it only by three base pairs at positions 8, 9 and 10 of the R half. •, cleavage positions. b) Cleavage activities of wild-type I-DmoI and of two mesophilic I-DmoI variants (D1, D2). The 64 R-10NNN targets are identified in the top left panel by the 5'-NNN-3' bottom-strand sequence of nucleotides 10, 9 and 8. The grey box identifies the natural target. Bottom left, profile of I-DmoI. Top right, profile of I-DmoI variant D1. Bottom right, profile of I-DmoI variant D2. Targets cleaved by the samples are boxed in solid black lines. Controls: no meganuclease expressed, at positions a1, a4, a7, b2, b5, b8, c3, c6, d1, d4, d7, ...; I-SceI CLS (a variant with moderate activity), at positions b1, b4, b7, c2, c5, c8, ...; and wild-type I-SceI, at positions c1, c4, c7, d2, d5, d8, ...
This target remained the preferred recognition site of both mesophilic variants. However, D1 tolerates two substitutions: C to A at position 8 of the non-coding strand (R-10GCA) and G to A at position 10 of the non-coding strand (R-10ACC). D2 tolerates only one, G to A at position 10 of the non-coding strand (R-10ACC). These results agree well with the in silico calculations based on the I-DmoI structure (Fig. 6). The wild-type DNA triplet displays the best interaction energy, and two of the weak in silico positives correspond to the other two experimental hits (R-10GCA and R-10ACC). An extra positive found in silico (R-10GCG) could not be confirmed experimentally. Consequently, the amino acids contacting the R-10NNN region of the target appear to be essential determinants of I-DmoI target recognition. This region of the DNA contains a dense network of specific polar and van der Waals interactions that influence binding in adjacent regions of the DNA molecule (Fig. 3).
2.8 In silico screening methods
The crystal structure of I-DmoI, and the new details it provides about how this homing endonuclease recognizes and cleaves its target DNA, now make in silico screening methods possible. The structural information can be used to predict the effects of residue changes on the I-DmoI structure, and of changes in the DNA target. Such in silico screening allows large numbers of potential enzymes to be screened against all possible targets. The more time-consuming in vitro/in vivo characterisation work can then be focussed on the candidate molecules identified in an initial in silico screen, or in a later, more focussed analysis of three-dimensional models of candidate polypeptides.
The inventors have previously conducted similar profiling with the I-CreI meganuclease as well as with hundreds of engineered derivatives (Arnould et al., 2006; Smith et al., 2006). Using a statistical approach, the inventors could also infer clues about the role of individual contacts between the protein and the target (Arnould et al., 2006; Smith et al., 2006). In earlier studies, the inventors used structural data to engineer the specificity of the homodimeric I-CreI protein. First, they locally engineered sub-domains of the I-CreI DNA-binding interface to cleave DNA targets differing from the I-CreI target by a few consecutive base pairs. Then, mutations from locally engineered variants were combined into heterodimeric mutants able to cleave chosen targets differing from the I-CreI cleavage site over their entire length (Arnould et al., 2006; Smith et al., 2006). During the first step, engineering relied essentially on mutating residues shown to contact the DNA target (Chevalier et al., 2003; Jurica et al., 1998). The structure of the I-DmoI/DNA complex opens new possibilities for an approach similar to the one used with I-CreI, which could present significant advantages over the I-CreI scaffold. The previously described I-CreI engineered derivatives are heterodimers. They are therefore obtained by co-expressing two different monomers in the target cell (Arnould et al., 2006; Arnould et al., 2007; Smith et al., 2006). This results in the formation of three molecular species (the heterodimer and two homodimers) and, as a consequence, in an overall loss of specificity and efficiency. Previous studies with heterodimeric zinc-finger nucleases have shown that such homodimeric by-products can be highly toxic (Beumer et al., 2006; Bibikova et al., 2003). As a monomeric homing endonuclease, I-DmoI could bypass this drawback. Furthermore, I-DmoI seems to have a very narrow specificity.
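The loss of efficiency caused by co-expressing two monomers follows from simple counting. Assume, for illustration only, that monomers A and B are present at fractions f and 1 - f and pair at random with no preference for the heterodimer. The dimer species then form in the binomial proportions f², 2f(1 - f) and (1 - f)². At best, therefore, half of the dimers are the intended heterodimer. Real assembly also depends on interface affinities and expression levels, which this sketch ignores.

```python
def dimer_fractions(frac_a):
    """Fractions of AA, AB and BB dimers under random pairing of monomers A and B.

    `frac_a` is A's share of the monomer pool (0 <= frac_a <= 1). Random pairing
    with no heterodimer preference is an assumption of this sketch.
    """
    if not 0.0 <= frac_a <= 1.0:
        raise ValueError("frac_a must lie between 0 and 1")
    frac_b = 1.0 - frac_a
    return {
        "AA": frac_a ** 2,
        "AB": 2.0 * frac_a * frac_b,
        "BB": frac_b ** 2,
    }


if __name__ == "__main__":
    for share in (0.5, 0.7):
        fractions = dimer_fractions(share)
        print(share, {k: round(v, 3) for k, v in fractions.items()})
```

At equal expression (f = 0.5), the model gives 25% AA, 50% AB and 25% BB. A single-chain, monomeric enzyme such as I-DmoI avoids this partitioning altogether.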
In recent reports, the inventors have shown that the I-CreI Asp75Asn meganuclease mutant has a narrow target specificity, strongly cleaving only 3 targets in two similar collections of 64 targets derived from the wild-type I-CreI target (Arnould et al., 2006; Smith et al., 2006). The narrow cleavage patterns of the I-DmoI D1 and D2 variants suggest that I-DmoI is at least as selective as I-CreI. The induction of homologous gene targeting by sequence-specific endonucleases is seen today as an emerging technology with many applications (Paques and Duchateau, 2007). However, increasing emphasis is being placed on the specificity of such endonucleases (Miller et al., 2007; Paques and Duchateau, 2007; Szczepek et al., 2007) in order to meet the stringent requirements of therapeutic applications. The I-DmoI scaffold could be a very good starting point for engineering highly specific endonucleases for such purposes.
References
Abrahams, J.P. and Leslie, A.G. (1996) Methods used in the structure determination of bovine mitochondrial F1 ATPase. Acta Crystallogr D Biol Crystallogr, 52, 30-42.
Argast, G.M., Stephens, K.M., Emond, M.J. and Monnat, R.J., Jr. (1998) I-PpoI and I-CreI homing site sequence degeneracy determined by random mutagenesis and sequential in vitro enrichment. J Mol Biol, 280, 345-353.
Arnould, S., Chames, P., Perez, C., Lacroix, E., Duclert, A., Epinat, J.C., Stricher, F., Petit, A.S., Patin, A., Guillier, S., Rolland, S., Prieto, J., Blanco, F.J., Bravo, J., Montoya, G., Serrano, L., Duchateau, P. and Paques, F. (2006) Engineering of large numbers of highly specific homing endonucleases that induce recombination on novel DNA targets. J Mol Biol, 355, 443-458.
Arnould, S., Perez, C., Cabaniols, J.P., Smith, J., Gouble, A., Grizot, S., Epinat, J.C., Duclert, A., Duchateau, P. and Paques, F. (2007) Engineered I-CreI derivatives cleaving sequences from the human XPC gene can induce highly efficient gene correction in mammalian cells. J Mol Biol, 371, 49-65.
Ashworth, J., Havranek, J.J., Duarte, C.M., Sussman, D., Monnat, R.J., Jr., Stoddard, B.L. and Baker, D. (2006) Computational redesign of endonuclease DNA binding and cleavage specificity. Nature, 441, 656-659.
Beumer, K., Bhattacharyya, G., Bibikova, M., Trautman, J. K. and Carroll, D. (2006) Efficient gene targeting in Drosophila with zinc-finger nucleases. Genetics, 172, 2391-2403.
Bibikova, M., Beumer, K., Trautman, J.K. and Carroll, D. (2003) Enhancing gene targeting with designed zinc finger nucleases. Science, 300, 764.
Bolduc, J.M., Spiegel, P.C., Chatterjee, P., Brady, K.L., Downing, M.E., Caprara, M.G., Waring, R.B. and Stoddard, B.L. (2003) Structural and biochemical analyses of DNA and RNA binding by a bifunctional homing endonuclease and group I intron splicing factor. Genes Dev, 17, 2875-2888.
Chames, P., Epinat, J.C., Guillier, S., Patin, A., Lacroix, E. and Paques, F. (2005) In vivo selection of engineered homing endonucleases using double-strand break induced homologous recombination. Nucleic Acids Res, 33, e178.
Chevalier, B., Sussman, D., Otis, C., Noel, A.J., Turmel, M., Lemieux, C., Stephens, K., Monnat, R.J., Jr. and Stoddard, B.L. (2004) Metal-dependent DNA cleavage mechanism of the I-CreI LAGLIDADG homing endonuclease. Biochemistry, 43, 14015-14026.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr. and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol, 329, 253-269.
Chevalier, B.S., Kortemme, T., Chadsey, M.S., Baker, D., Monnat, R.J. and Stoddard, B.L. (2002) Design, activity, and structure of a highly specific artificial endonuclease. Mol Cell, 10, 895-905.
Chevalier, B.S., Monnat, R.J., Jr. and Stoddard, B.L. (2001) The homing endonuclease I-CreI uses three metals, one of which is shared between the two active sites. Nat Struct Biol, 8, 312-316.
Chevalier, B.S. and Stoddard, B.L. (2001) Homing endonucleases: structural and functional insight into the catalysts of intron/intein mobility. Nucleic Acids Res, 29, 3757-3774.
Choulika, A., Perrin, A., Dujon, B. and Nicolas, J.F. (1995) Induction of homologous recombination in mammalian chromosomes by using the I-SceI system of Saccharomyces cerevisiae. Mol Cell Biol, 15, 1968-1973.
Dalgaard, J.Z., Garrett, R. A. and Belfort, M. (1993) A site-specific endonuclease encoded by a typical archaeal intron. Proc Natl Acad Sci U S A, 90, 5414-5417.
Dalgaard, J.Z., Garrett, R.A. and Belfort, M. (1994) Purification and characterization of two forms of I-DmoI, a thermophilic site-specific endonuclease encoded by an archaeal intron. J Biol Chem, 269, 28885-28892.
de la Fortelle, E. and Bricogne, G. (1997) Macromolecular Crystallography. Academic Press, New York.
Doyon, J.B., Pattanayak, V., Meyer, C.B. and Liu, D.R. (2006) Directed evolution and substrate specificity profile of homing endonuclease I-SceI. J Am Chem Soc, 128, 2477-2484.
Duan, X., Gimble, F. and Quiocho, F. (1997) Crystal structure of PI-SceI, a homing endonuclease with protein splicing activity. Cell, 89, 555-564.
Epinat, J.C., Arnould, S., Chames, P., Rochaix, P., Desfontaines, D., Puzin, C., Patin, A., Zanghellini, A., Paques, F. and Lacroix, E. (2003) A novel engineered meganuclease induces homologous recombination in yeast and mammalian cells. Nucleic Acids Res, 31, 2952-2962.
Garcia-Viloca, M., Gao, J., Karplus, M. and Truhlar, D.G. (2004) How enzymes work: analysis by modern rate theory and computer simulations. Science, 303, 186-195.
Gimble, F.S., Moure, C.M. and Posey, K.L. (2003) Assessing the plasticity of DNA target site recognition of the PI-SceI homing endonuclease using a bacterial two-hybrid selection system. J Mol Biol, 334, 993-1008.
Gouble, A., Smith, J., Bruneau, S., Perez, C., Guyot, V., Cabaniols, J.P., Leduc, S., Fiette, L., Ave, P., Micheau, B., Duchateau, P. and Paques, F. (2006) Efficient in toto targeted recombination in mouse liver by meganuclease-induced double-strand break. J Gene Med, 8, 616-622.
Gouet, P., Courcelle, E., Stuart, D.I. and Metoz, F. (1999) ESPript: analysis of multiple sequence alignments in PostScript. Bioinformatics, 15, 305-308.
Grindl, W., Wende, W., Pingoud, V. and Pingoud, A. (1998) The protein splicing domain of the homing endonuclease PI-SceI is responsible for specific DNA binding. Nucleic Acids Res, 26, 1857-1862.
Hacein-Bey-Abina, S., Von Kalle, C., Schmidt, M., McCormack, M.P., Wulffraat, N., Leboulch, P., Lim, A., Osborne, C.S., Pawliuk, R., Morillon, E., Sorensen, R., Forster, A., Fraser, P., Cohen, J.I., de Saint Basile, G., Alexander, I., Wintergerst, U., Frebourg, T., Aurias, A., Stoppa-Lyonnet, D., Romana, S., Radford-Weiss, I., Gross, F., Valensi, F., Delabesse, E., Macintyre, E., Sigaux, F., Soulier, J., Leiva, L.E., Wissler, M., Prinz, C., Rabbitts, T.H., Le Deist, F., Fischer, A. and Cavazzana-Calvo, M. (2003) LMO2-associated clonal T cell proliferation in two patients after gene therapy for SCID-X1. Science, 302, 415-419.
Ichiyanagi, K., Ishino, Y., Ariyoshi, M., Komori, K. and Morikawa, K. (2000) Crystal structure of an archaeal intein-encoded homing endonuclease PI-PfuI. J Mol Biol, 300, 889-901.
Jacquier, A. and Dujon, B. (1985) An intron-encoded protein is active in a gene conversion process that spreads an intron into a mitochondrial gene. Cell, 41, 383-394.
Jurica, M.S., Monnat, R.J., Jr. and Stoddard, B.L. (1998) DNA recognition and cleavage by the LAGLIDADG homing endonuclease I-CreI. Mol Cell, 2, 469-476.
Kabsch, W. (1988) Automatic indexing of rotation diffraction patterns. J Appl Cryst, 21, 67-71.
Kleywegt, G.J., Zou, J.Y., Kjeldgaard, M. and Jones, T.A. (2001) Around O. In: Rossmann, M.G. and Arnold, E. (eds) International Tables for Crystallography, Vol. F. Crystallography of Biological Macromolecules, Chapter 17.1, pp. 353-356, 366-367. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J. and Higgins, D.G. (2007) Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947-2948.
Laskowski, R.A., MacArthur, M.W., Moss, D.S. and Thornton, J.M. (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst, 26, 283-291.
Levitt, D.G. (2001) A new software routine that automates the fitting of protein X-ray crystallographic electron-density maps. Acta Crystallogr D Biol Crystallogr, 57, 1013-1019.
Luscombe, N.M., Laskowski, R.A. and Thornton, J.M. (1997) NUCPLOT: a program to generate schematic diagrams of protein-nucleic acid interactions. Nucleic Acids Res, 25, 4940-4945.
Lykke-Andersen, J., Garrett, R.A. and Kjems, J. (1997) Mapping metal ions at the catalytic centres of two intron-encoded endonucleases. EMBO J, 16, 3272-3281.
Miller, J.C., Holmes, M.C., Wang, J., Guschin, D.Y., Lee, Y.L., Rupniewski, L., Beausejour, C.M., Waite, A.J., Wang, N.S., Kim, K.A., Gregory, P.D., Pabo, C.O. and Rebar, E.J. (2007) An improved zinc-finger nuclease architecture for highly specific genome editing. Nat Biotechnol, 25, 778-785.
Moure, CM., Gimble, F.S. and Quiocho, F.A. (2003) The crystal structure of the gene targeting homing endonuclease I-Scel reveals the origins of its target site specificity. J MoI Biol, 334, 685-695. Moure, C M., F. S. Gimble, Quiocho FA. (2002). Crystal structure of the intein homing endonuclease PI-SceI bound to its recognition sequence. Nat Struct Biol 9(10): 764-70.
Murshudov, G.N., Vagin, A. A. and Dodson, E.J. (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr, 53, 240-255.
Otwinowski, Z. and Minor, W. (1997) Processing of X-ray Diffraction Data Collected in Oscillation Mode. Methods in Enzymology, Academic Press, New York.
Paques, F. and Duchateau, P. (2007) Meganucleases and DNA double-strand break-induced recombination: perspectives for gene therapy. Curr Gene Ther, 7, 49-66.
Perrin, A., Buckle, M. and Dujon, B. (1993) Asymmetrical recognition and activity of the I-SceI endonuclease on its site and on intron-exon junctions. EMBO J, 12, 2939-2947.
Prieto, J., Epinat, J.C., Redondo, P., Ramos, E., Padro, D., Cedrone, F., Montoya, G., Paques, F. and Blanco, F.J. (2008) Generation and Analysis of Mesophilic Variants of the Thermostable Archaeal I-DmoI Homing Endonuclease. J Biol Chem, 283, 4364-4374.
Puchta, H., Dujon, B. and Hohn, B. (1996) Two different but related mechanisms are used in plants for the repair of genomic double-strand breaks by homologous recombination. Proc Natl Acad Sci U S A, 93, 5055-5060.
Redondo, P., Prieto, J., Ramos, E., Blanco, F.J. and Montoya, G. (2007) Crystallization and preliminary X-ray diffraction analysis on the homing endonuclease I-Dmo-I in complex with its target DNA. Acta Crystallogr Sect F Struct Biol Cryst Commun, 63, 1017-1020.
Rosen, L.E., Morrison, H.A., Masri, S., Brown, M.J., Springstubb, B., Sussman, D., Stoddard, B.L. and Seligman, L.M. (2006) Homing endonuclease I-CreI derivatives with novel DNA target specificities. Nucleic Acids Res, 34, 4791-4800.
Schneider, T.R. and Sheldrick, G.M. (2002) Substructure solution with SHELXD. Acta Crystallogr D Biol Crystallogr, 58, 1772-1779.
Schymkowitz, J., Borg, J., Stricher, F., Nys, R., Rousseau, F. and Serrano, L. (2005) The FoldX web server: an online force field. Nucleic Acids Res, 33, W382-388.
Seligman, L.M., Chisholm, K.M., Chevalier, B.S., Chadsey, M.S., Edwards, S.T., Savage, J.H. and Veillet, A.L. (2002) Mutations altering the cleavage specificity of a homing endonuclease. Nucleic Acids Res, 30, 3870-3879.
Silva, G.H., Belfort, M., Wende, W. and Pingoud, A. (2006) From monomeric to homodimeric endonucleases and back: engineering novel specificity of LAGLIDADG enzymes. J Mol Biol, 361, 744-754.
Silva, G.H., Dalgaard, J.Z., Belfort, M. and Van Roey, P. (1999) Crystal structure of the thermostable archaeal intron-encoded endonuclease I-DmoI. J Mol Biol, 286, 1123-1136.
Smith, J., Grizot, S., Arnould, S., Duclert, A., Epinat, J.C., Chames, P., Prieto, J., Redondo, P., Blanco, F.J., Bravo, J., Montoya, G., Paques, F. and Duchateau, P. (2006) A combinatorial approach to create artificial homing endonucleases cleaving chosen sequences. Nucleic Acids Res, 34, e149.
Spiegel, P.C., Chevalier, B., Sussman, D., Turmel, M., Lemieux, C. and Stoddard, B.L. (2006) The structure of I-CeuI homing endonuclease: evolving asymmetric DNA recognition from a symmetric protein scaffold. Structure, 14, 869-880.
Sussman, D., Chadsey, M., Fauce, S., Engel, A., Bruett, A., Monnat, R., Jr., Stoddard, B.L. and Seligman, L.M. (2004) Isolation and characterization of new homing endonuclease specificities at individual target site positions. J Mol Biol, 342, 31-41.
Stoddard, B.L. (2005) Homing endonuclease structure and function. Q Rev Biophys, 38, 49-95.
Szczepek, M., Brondani, V., Buchel, J., Serrano, L., Segal, D.J. and Cathomen, T. (2007) Structure-based redesign of the dimerization interface reduces the toxicity of zinc-finger nucleases. Nat Biotechnol, 25, 786-793.
Thierry, A. and Dujon, B. (1992) Nested chromosomal fragmentation in yeast using the meganuclease I-SceI: a new method for physical mapping of eukaryotic genomes. Nucleic Acids Res, 20, 5625-5631.
Van Duyne, G.D., Standaert, R.F., Karplus, P.A., Schreiber, S.L. and Clardy, J. (1993) Atomic structures of the human immunophilin FKBP-12 complexes with FK506 and rapamycin. J Mol Biol, 229, 105-124.
Weeks, C.M. and Miller, R. (1999) The design and implementation of SnB v2.0. J. Appl. Cryst, 32, 120-124.

Claims

1) A polypeptide comprising the sequence of an I-DmoI endonuclease or a chimeric derivative thereof, including at least the I-DmoI domain B, characterized in that it comprises the substitution of at least one of the residues in positions 124, 126, 154, 155 of said I-DmoI domain B, and wherein said polypeptide recognises an I-DmoI DNA target half-site which differs from a wild type I-DmoI DNA target half-site SEQ ID NO: 1 in at least one of positions ±2, ±3, ±5, ±6, ±7.
2) A polypeptide according to claim 1, wherein at least one of the residues in positions 124, 126, 154, 155 is substituted by any amino acid.
3) A polypeptide according to claim 1 or 2, further comprising the substitution of at least one of the residues in positions 119, 128, 157 by any amino acid, which alters the recognition of said polypeptide for an I-DmoI DNA target half-site which differs from a wild type I-DmoI DNA target half-site SEQ ID NO: 1 in at least one of positions ±2, ±3.
4) A polypeptide according to any one of claims 1, 2 or 3, further comprising the substitution of at least one of the residues in positions 115, 116, 117, 118, 120, 130, 150, 152, 153, 156, 158, 160, 164, 166, 167, 170 by any amino acid, which alters the recognition of said polypeptide for an I-DmoI DNA target half-site which differs from a wild type I-DmoI DNA target half-site SEQ ID NO: 1 in at least one of positions ±1, ±2, ±3, ±4, ±5, ±6, ±7, ±8, ±9.
5) A polypeptide according to any one of claims 1 to 4, wherein said polypeptide is a chimeric-DmoI endonuclease consisting of the fusion of said I-DmoI domain B to a sequence of a dimeric LAGLIDADG homing endonuclease or to a domain of another monomeric LAGLIDADG homing endonuclease.
6) A polypeptide according to any one of claims 1 to 5, wherein said I-DmoI domain B is fused to a domain selected from one of the enzymes in the group: I-Sce I, I-Chu I, I-Cre I, I-Csm I, PI-Sce I, PI-Tli I, PI-Mtu I, I-Ceu I, I-Sce II, I-Sce III, HO, PI-Civ I, PI-Ctr I, PI-Aae I, PI-Bsu I, PI-Dha I, PI-Dra I, PI-Mav I, PI-Mch I, PI-Mfu I, PI-Mfl I, PI-Mga I, PI-Mgo I, PI-Min I, PI-Mka I, PI-Mle I, PI-Mma I, PI-Msh I, PI-Msm I, PI-Mth I, PI-Mtu I, PI-Mxe I, PI-Npu I, PI-Pfu I, PI-Rma I, PI-Spb I, PI-Ssp I, PI-Fac I, PI-Mja I, PI-Pho I, PI-Tag I, PI-Thy I, PI-Tko I, PI-Tsp I, I-MsoI.
7) A polypeptide according to any preceding claim, characterized in that said I-DmoI domain B is at the NH2-terminus of said chimeric-DmoI endonuclease.
8) A polypeptide according to any one of claims 1 to 7, characterized in that said dimeric LAGLIDADG homing endonuclease is I-CreI.
9) A polypeptide according to any one of claims 1 to 8, characterized in that it comprises a detectable tag and/or a Nuclear Localisation Signal at its NH2 and/or COOH terminus.
10) A polynucleotide, characterized in that it encodes a polypeptide according to any one of claims 1 to 9.
11) A vector, characterized in that it comprises the polynucleotide according to claim 10.
12) A host cell, characterized in that it is modified by the polynucleotide according to claim 10 or the vector according to claim 11.
13) A non-human transgenic animal, characterized in that all or part of its cells are modified by the polynucleotide according to claim 10 or the vector according to claim 11.
14) A transgenic plant, characterized in that all or part of its cells are modified by the polynucleotide according to claim 10 or the vector according to claim 11.
15) Use of a polypeptide according to any one of claims 1 to 9, a polynucleotide according to claim 10, a vector according to claim 11, a cell according to claim 12, a non-human animal according to claim 13 or a plant according to claim 14, for the selection and/or the screening of meganucleases with novel DNA target specificity.
16) A method of identifying polypeptides comprising at least one I-DmoI domain, which can recognise and bind to an altered DNA target, comprising the steps of:
i) applying a 3-dimensional molecular modelling algorithm to at least the set of atomic coordinates set out in Table II and figures 8 and 9 to determine the spatial coordinates of the DNA-interacting portions of a candidate polypeptide and its native DNA target, modelled from said set of atomic coordinates, and generating a model;
ii) modifying at least one residue of said candidate polypeptide and altering the characteristics of said model accordingly;
iii) electronically screening the modified candidate polypeptide of step ii) against a stored set of spatial coordinates representing said native DNA target sequence and at least one variant thereof;
iv) calculating from said model the interaction energies of the modified candidate polypeptide of step ii) with said stored set of DNA targets;
v) converting said interaction energies into a probability score to predict the preference of said modified polypeptide for said stored set of DNA targets;
vi) identifying at least one DNA target sequence from the stored set of DNA targets which said modified candidate polypeptide recognises, when said probability score is greater than a predetermined threshold;
vii) identifying candidate polypeptides with altered specificity, wherein said at least one DNA target sequence of step vi) is not said native DNA target sequence.
17) The method of claim 16, wherein said stored set of spatial coordinates of step iii) comprises said native DNA target in which at least one base therein is changed to the three alternate possible bases.
18) The method of claim 16 or 17, wherein said stored set of spatial coordinates comprises all possible variants of said native DNA target sequence.
19) The method according to any one of claims 16, 17 or 18, wherein said modified residue of step ii) forms a direct contact between said candidate polypeptide and said native DNA target sequence.
20) The method according to any one of claims 16, 17 or 18, wherein said modified residue of step ii) forms an indirect contact between said candidate polypeptide and said native DNA target sequence.
21) The method according to any one of claims 16, 17 or 18, wherein said modified residue of step ii) forms a molecular interaction selected from the group: hydrogen bond, polar contact and van der Waals interaction, between said candidate polypeptide and said native DNA target sequence.
22) The method according to any one of claims 16 to 21, wherein said candidate polypeptide comprises at least said I-DmoI domain B; and wherein in step ii) at least one of the residues in positions 115, 116, 117, 118, 119, 120, 124, 126, 128, 130, 150, 152, 153, 154, 156, 155, 157, 158, 160, 164, 166, 167, 170 of said I-DmoI domain B is altered; and wherein said at least one altered DNA target half-site differs from a native DNA target consisting of SEQ ID NO: 7 in at least one of positions -1, -2, -3, -4, -5, -6, -7, -8, -9.
23) The method according to any one of claims 16 to 22, wherein said candidate polypeptide comprises at least the I-DmoI domain A; and wherein in step ii) at least one of the residues in positions 15, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 41, 42, 43, 67, 68, 70, 71, 72, 73, 75, 76, 77, 79, 81, 83, 84, 85 of said I-DmoI domain A is altered; and wherein said at least one altered DNA target half-site differs from a native DNA target consisting of SEQ ID NO: 7 in at least one of positions +1, +2, +3, +4, +5, +6, +7, +8, +9, +10, +11, +12, +13.
24) The method of claim 22 or 23, wherein said candidate polypeptide consists of an A or B domain of I-DmoI fused to a dimeric LAGLIDADG homing endonuclease or to a domain of another monomeric LAGLIDADG homing endonuclease.
25) The method according to claim 22 or 23, wherein said candidate polypeptide consists of said I-DmoI domain fused to a domain selected from one of the enzymes in the group: I-Sce I, I-Chu I, I-Cre I, I-Csm I, PI-Sce I, PI-Tli I, PI-Mtu I, I-Ceu I, I-Sce II, I-Sce III, HO, PI-Civ I, PI-Ctr I, PI-Aae I, PI-Bsu I, PI-Dha I, PI-Dra I, PI-Mav I, PI-Mch I, PI-Mfu I, PI-Mfl I, PI-Mga I, PI-Mgo I, PI-Min I, PI-Mka I, PI-Mle I, PI-Mma I, PI-Msh I, PI-Msm I, PI-Mth I, PI-Mtu I, PI-Mxe I, PI-Npu I, PI-Pfu I, PI-Rma I, PI-Spb I, PI-Ssp I, PI-Fac I, PI-Mja I, PI-Pho I, PI-Tag I, PI-Thy I, PI-Tko I, PI-Tsp I, I-MsoI.
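Steps iii) to vii) of claim 16, together with the stored target sets of claims 17 and 18, can be illustrated with a minimal Python sketch. It assumes that interaction energies (in kcal/mol, lower meaning more favourable) have already been computed from the model for every stored target. The claim does not say how energies are converted into a probability score. The Boltzmann weighting used here is therefore an assumption, as are the kT value (about 298 K), the threshold, the toy 3-bp sequences and all function names.

```python
import math
from itertools import product

BASES = "ACGT"


def single_base_variants(target: str) -> list[str]:
    """Targets differing from `target` at exactly one base (claim 17): 3 per position."""
    variants = []
    for i, base in enumerate(target):
        for alt in BASES:
            if alt != base:
                variants.append(target[:i] + alt + target[i + 1:])
    return variants


def all_variants(length: int) -> list[str]:
    """Every possible target of the given length (claim 18): 4**length sequences."""
    return ["".join(bases) for bases in product(BASES, repeat=length)]


def boltzmann_probabilities(energies: dict[str, float], kt: float = 0.593) -> dict[str, float]:
    """Turn interaction energies (kcal/mol, lower = more favourable) into
    probabilities that sum to 1.

    ASSUMPTION: claim 16 does not specify the conversion; Boltzmann weighting
    with kT ~= 0.593 kcal/mol (about 298 K) is used for illustration only.
    """
    e_min = min(energies.values())  # shift by the minimum to avoid overflow
    weights = {seq: math.exp(-(e - e_min) / kt) for seq, e in energies.items()}
    total = sum(weights.values())
    return {seq: w / total for seq, w in weights.items()}


def recognised_targets(energies: dict[str, float], threshold: float,
                       kt: float = 0.593) -> list[str]:
    """Step vi): targets whose probability score exceeds the threshold."""
    probs = boltzmann_probabilities(energies, kt)
    return [seq for seq, p in probs.items() if p > threshold]


def has_altered_specificity(energies: dict[str, float], native: str,
                            threshold: float, kt: float = 0.593) -> bool:
    """Step vii): at least one recognised target is not the native target."""
    return any(seq != native for seq in recognised_targets(energies, threshold, kt))


if __name__ == "__main__":
    native = "ACG"  # hypothetical 3-bp stand-in for a real target
    print(len(single_base_variants(native)), len(all_variants(len(native))))  # 9 64
    # Hypothetical energies for one modified candidate polypeptide.
    energies = {"ACG": -10.0, "TCG": -12.0, "AAG": -5.0}
    print(recognised_targets(energies, threshold=0.5))  # ['TCG']
    print(has_altered_specificity(energies, native, threshold=0.5))  # True
```

With these hypothetical energies, the variant TCG receives about 97% of the probability mass. It is therefore the only target above the 0.5 threshold, and the candidate would be flagged as having altered specificity. Changing the conversion or the threshold changes which targets pass step vi).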
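Claims 22 and 23 assign domain B to the negative half of the target (positions -1 to -9) and domain A to the positive half (+1 to +13). The Python sketch below reports which signed positions of a variant target differ from a native target, and which domain the claims associate with each of those positions. It assumes the target is written as a plain string with the negative half first and no position 0. The sequences and function names are hypothetical placeholders and do not represent SEQ ID NO: 7.

```python
def position_label(index: int, n_left: int) -> int:
    """Map a 0-based string index to a signed target position.

    ASSUMPTION: the first n_left bases are positions -n_left..-1 and the rest
    are +1, +2, ...; there is no position 0.
    """
    return index - n_left if index < n_left else index - n_left + 1


def differing_positions(native: str, variant: str, n_left: int) -> list[int]:
    """Signed positions at which `variant` differs from `native`."""
    if len(native) != len(variant):
        raise ValueError("native and variant must have the same length")
    return [position_label(i, n_left)
            for i, (a, b) in enumerate(zip(native, variant)) if a != b]


def domain_for_position(position: int) -> str:
    """Domain linked to a position by claims 22 (negative: B) and 23 (positive: A)."""
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return "B" if position < 0 else "A"


if __name__ == "__main__":
    diffs = differing_positions("ACGTTA", "TCGTTC", n_left=3)  # toy 6-bp target
    print(diffs)  # [-3, 3]
    print({p: domain_for_position(p) for p in diffs})  # {-3: 'B', 3: 'A'}
```

For a 22-base target numbered as in claims 22 and 23, `n_left=9` yields positions -9 to -1 followed by +1 to +13.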
PCT/IB2008/002756 2008-07-03 2008-07-03 The crystal structure of i-dmoi in complex with its dna target, improved chimeric meganucleases and uses thereof Ceased WO2010001189A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2008/002756 WO2010001189A1 (en) 2008-07-03 2008-07-03 The crystal structure of i-dmoi in complex with its dna target, improved chimeric meganucleases and uses thereof

Publications (1)

Publication Number Publication Date
WO2010001189A1 true WO2010001189A1 (en) 2010-01-07

Family

ID=40352676

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/002756 Ceased WO2010001189A1 (en) 2008-07-03 2008-07-03 The crystal structure of i-dmoi in complex with its dna target, improved chimeric meganucleases and uses thereof

Country Status (1)

Country Link
WO (1) WO2010001189A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004031346A2 (en) * 2002-09-06 2004-04-15 Fred Hutchinson Cancer Research Center Methods and compositions concerning designed highly-specific nucleic acid binding proteins

Non-Patent Citations (8)

Title
AAGAARD C ET AL: "PROFILE OF THE DNA RECOGNITION SITE OF THE ARCHAEAL HOMING ENDONUCLEASE I-DMOL", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 25, no. 8, 15 April 1997 (1997-04-15), pages 1523 - 1530, XP000942177, ISSN: 0305-1048 *
LUCAS PATRICK ET AL: "Rapid evolution of the DNA-binding site in LAGLIDADG homing endonucleases", NUCLEIC ACIDS RESEARCH, vol. 29, no. 4, 15 February 2001 (2001-02-15), pages 960 - 969, XP002516751, ISSN: 0305-1048 *
MARCAIDA MARÍA JOSÉ ET AL: "Crystal structure of I-DmoI in complex with its target DNA provides new insights into meganuclease engineering.", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA 4 NOV 2008, vol. 105, no. 44, 4 November 2008 (2008-11-04), pages 16888 - 16893, XP002516753, ISSN: 1091-6490 *
MOURE CARMEN M ET AL: "Crystal structures of I-SceI complexed to nicked DNA substrates: snapshots of intermediates along the DNA cleavage reaction pathway", NUCLEIC ACIDS RESEARCH, vol. 36, no. 10, June 2008 (2008-06-01), pages 3287 - 3296, XP002516752, ISSN: 0305-1048 *
REDONDO PILAR ET AL: "Crystallization and preliminary X-ray diffraction analysis on the homing endonuclease I-Dmo-I in complex with its target DNA", ACTA CRYSTALLOGRAPHICA SECTION F STRUCTURAL BIOLOGY AND CRYSTALLIZATION COMMUNICATIONS, vol. 63, no. Part 12, December 2007 (2007-12-01), pages 1017 - 1020, XP002516750, ISSN: 1744-3091(print) 1744-3091(ele *
SILVA G H ET AL: "Analysis of the LAGLIDADG interface of the monomeric homing endonuclease I-DmoI", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 32, no. 10, 1 June 2004 (2004-06-01), pages 3156 - 3168, XP002364698, ISSN: 0305-1048 *
SILVA G H ET AL: "Crystal structure of the thermostable archaeal intron-encoded endonuclease I-DmoI", JOURNAL OF MOLECULAR BIOLOGY, LONDON, GB, vol. 286, no. 4, 5 March 1999 (1999-03-05), pages 1123 - 1136, XP004462690, ISSN: 0022-2836 *
SMITH JULIANNE ET AL: "A combinatorial approach to create artificial homing endonucleases cleaving chosen sequences", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 34, no. 22, 27 November 2006 (2006-11-27), pages e149 - 1, XP002457876, ISSN: 0305-1048 *

Cited By (19)

Publication number Priority date Publication date Assignee Title
US10316304B2 (en) 2009-11-27 2019-06-11 Basf Plant Science Company Gmbh Chimeric endonucleases and uses thereof
US9404099B2 (en) 2009-11-27 2016-08-02 Basf Plant Science Company Gmbh Optimized endonucleases and uses thereof
DE112010004583T5 (en) 2009-11-27 2012-10-18 Basf Plant Science Company Gmbh Chimeric endonucleases and applications thereof
WO2011064736A1 (en) 2009-11-27 2011-06-03 Basf Plant Science Company Gmbh Optimized endonucleases and uses thereof
DE112010004582T5 (en) 2009-11-27 2012-11-29 Basf Plant Science Company Gmbh Optimized endonucleases and applications thereof
DE112010004584T5 (en) 2009-11-27 2012-11-29 Basf Plant Science Company Gmbh Chimeric endonucleases and applications thereof
WO2011064750A1 (en) 2009-11-27 2011-06-03 Basf Plant Science Company Gmbh Chimeric endonucleases and uses thereof
US8685737B2 (en) 2011-04-27 2014-04-01 Amyris, Inc. Methods for genomic modification
WO2012149470A1 (en) 2011-04-27 2012-11-01 Amyris, Inc. Methods for genomic modification
US9701971B2 (en) 2011-04-27 2017-07-11 Amyris, Inc. Methods for genomic modification
WO2013102875A1 (en) 2012-01-06 2013-07-11 Basf Plant Science Company Gmbh In planta recombination
EP2612918A1 (en) 2012-01-06 2013-07-10 BASF Plant Science Company GmbH In planta recombination
EP3733847B1 (en) 2012-10-23 2022-06-01 Toolgen Incorporated Composition for cleaving a target dna comprising a guide rna specific for the target dna and cas protein-encoding nucleic acid or cas protein, and use thereof
US12473559B2 (en) 2012-10-23 2025-11-18 Toolgen Incorporated Cas9/RNA complexes for inducing modifications of target endogenous nucleic acid sequences in nucleuses of eukaryotic cells
WO2015095804A1 (en) 2013-12-19 2015-06-25 Amyris, Inc. Methods for genomic integration
EP4219731A2 (en) 2016-05-18 2023-08-02 Amyris, Inc. Compositions and methods for genomic integration of nucleic acids into exogenous landing pads
WO2023081756A1 (en) 2021-11-03 2023-05-11 The J. David Gladstone Institutes, A Testamentary Trust Established Under The Will Of J. David Gladstone Precise genome editing using retrons
WO2023141602A2 (en) 2022-01-21 2023-07-27 Renagade Therapeutics Management Inc. Engineered retrons and methods of use
WO2024044723A1 (en) 2022-08-25 2024-02-29 Renagade Therapeutics Management Inc. Engineered retrons and methods of use

Similar Documents

Publication Publication Date Title
WO2010001189A1 (en) The crystal structure of i-dmoi in complex with its dna target, improved chimeric meganucleases and uses thereof
Moure et al. The crystal structure of the gene targeting homing endonuclease I-SceI reveals the origins of its target site specificity
CN103608027B (en) For generating method of fine and close TALE-nuclease and uses thereof
JP2024001024A (en) Streamlined meganucleases with altered sequence specificity and DNA-binding affinity
EP2126066B1 (en) Laglidadg homing endonuclease variants having novel substrate specificity and use thereof
US20180170984A1 (en) Site-specific dna base editing using modified apobec enzymes
WO2004031346A2 (en) Methods and compositions concerning designed highly-specific nucleic acid binding proteins
JP2019062898A (en) Rationally designed single chain meganucleases having non-palindrome recognition sequences
EP2231697B1 (en) Improved chimeric meganuclease enzymes and uses thereof
CN101384712A (en) Meganuclease variants cleaving a DNA target sequence from xeroderma pigmentosum gene and uses thereof
JP2011505809A (en) A rationally designed meganuclease with a recognition sequence found in the DNase hypersensitive region of the human genome
Thyme et al. Reprogramming homing endonuclease specificity through computational design and directed evolution
Joshi et al. Evolution of I-SceI homing endonucleases with increased DNA recognition site specificity
WO2012007848A2 (en) Meganuclease variants cleaving a dna target sequence in the was gene and uses thereof
Zhao Characterization of bacterial homing endonuclease I-Ssp6803I
Schomburg et al. uracil-DNA glycosylase 3.2.2.27
Silva Structural and biochemical analysis of the thermostable archaeal intron-encoded endonuclease I-DmoI
SG193850A1 (en) Meganuclease variants cleaving a dna target sequence from a glutamine synthetase gene and uses thereof

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08874866

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 08874866

Country of ref document: EP

Kind code of ref document: A1