HK1182130B

HK1182130B - Genome editing of a rosa locus using zinc-finger nucleases

Info

Publication number: HK1182130B
Application number: HK13109374.2A
Authority: HK
Inventors: X. Cui; G. Davis; P. D. Gregory; M. C. Holmes; E. J. Weinstein
Original assignee: Sangamo BioSciences, Inc.; Sigma-Aldrich Co.
Priority date: 2010-04-26
Filing date: 2011-04-25
Publication date: 2017-09-08

Description

Genome editing of Rosa sites using zinc finger nucleases

Cross Reference to Related Applications

This application claims the benefit of U.S. Provisional Application No. 61/343,287, filed April 26, 2010, the disclosure of which is incorporated herein by reference in its entirety.

Statement of Rights to Inventions Made Under Federally Sponsored Research

Not applicable.

Technical Field

The present disclosure is in the field of genome engineering, including somatic and heritable gene insertions/disruptions, genomic alterations, generation of alleles carrying random mutations, and/or insertion of transgenes into a Rosa locus.

Background

Rosa gene products are ubiquitously expressed at all stages of development. As such, this locus has been widely used for expressing endogenous sequences from endogenous or introduced promoters and for producing transgenic mice, for example from embryonic stem cells. See, e.g., Strathdee et al. (2006) PLoS ONE 1:e4; Nyabi et al. (2009) Nucleic Acids Res. 37:e55.

However, conventional methods of targeted insertion can require complicated assembly of targeting vectors. Thus, there remains a need for methods of inserting into and/or modifying Rosa genes in a targeted fashion. Precisely targeted, site-specific cleavage of genomic loci offers an efficient supplement and/or alternative to conventional homologous recombination. Creation of a double-strand break (DSB) increases the frequency of homologous recombination at the targeted locus more than 1000-fold. More simply, imprecise repair of a site-specific DSB by non-homologous end joining (NHEJ) can also result in gene disruption. Creation of two such DSBs results in deletion of arbitrarily large regions. The modular DNA recognition preferences of zinc finger proteins allow for the rational design of site-specific multi-finger DNA binding proteins. Fusion of the nuclease domain of the Type IIS restriction enzyme FokI to site-specific zinc finger proteins allows for the creation of site-specific nucleases. See, for example, U.S. Patent Publications 20030232410; 20050208489; 20050026157; 20050064474; 20060188987; 20060063231; 20070134796; 2008015164; 20080131962; 2008015996 and International Publications WO 07/014275 and 2008/133938, all of which describe the use of zinc finger nucleases and are incorporated by reference in their entireties for all purposes.

SUMMARY

Disclosed herein are compositions and methods for targeted insertion into a Rosa gene locus. The compositions and methods described herein can be used for genome editing, including but not limited to: cleaving one or more genes in an animal cell, resulting in targeted alterations (insertion, deletion, and/or substitution mutations) in one or more genes, including incorporation of these targeted alterations into the germline; targeted introduction of non-endogenous nucleic acid sequences; partial or complete inactivation of one or more genes in an animal; methods of inducing homology-directed repair; generation of transgenic animals (e.g., rodents); and/or generation of random mutations encoding novel allelic forms of animal genes.

In one aspect, described herein are zinc finger proteins (ZFPs) that bind to a target site in a Rosa gene in a genome (e.g., a rodent genome), wherein the ZFPs comprise one or more engineered zinc finger binding domains. In one embodiment, the ZFP is a zinc finger nuclease (ZFN) that cleaves a target genomic region of interest, wherein the ZFN comprises one or more engineered zinc finger binding domains and a nuclease cleavage domain or cleavage half-domain. Cleavage domains and cleavage half-domains can be obtained, for example, from various restriction endonucleases and/or homing endonucleases. In one embodiment, the cleavage half-domain is derived from a Type IIS restriction endonuclease (e.g., FokI). In certain embodiments, the zinc finger domain recognizes a target site in a Rosa gene, e.g., Rosa26.

The ZFNs can bind to and/or cleave the Rosa gene in a non-coding sequence (such as, e.g., a leader, trailer, or intron) within or near the coding region of the gene, or within a non-transcribed region upstream or downstream of the coding region.

In another aspect, described herein are compositions comprising one or more zinc finger nucleases described herein. In certain embodiments, the composition comprises one or more zinc finger nucleases and a pharmaceutically acceptable excipient.

In another aspect, described herein are polynucleotides encoding one or more ZFNs described herein. The polynucleotide may be, for example, an mRNA.

In another aspect, described herein are ZFN expression vectors comprising a polynucleotide encoding one or more of the ZFNs described herein operably linked to a promoter.

In another aspect, described herein are host cells comprising one or more ZFN expression vectors as described herein. The host cell may be stably transformed or transiently transfected with one or more ZFP expression vectors, or a combination thereof. In one embodiment, the host cell is an embryonic stem cell. In other embodiments, the one or more ZFP expression vectors express one or more ZFNs in the host cell. In another embodiment, the host cell may further comprise an exogenous polynucleotide donor sequence. In any embodiment, a host cell described herein can comprise an embryo cell, for example one or more cells of a mouse, rat, rabbit, or other mammalian embryo.

In another aspect, described herein is a method for cleaving one or more Rosa genes in a cell, the method comprising: (a) introducing, into the cell, one or more polynucleotides encoding one or more ZFNs that bind to a target site in the one or more genes, under conditions such that the ZFNs are expressed and the one or more genes are cleaved.

In yet another aspect, described herein is a method for introducing an exogenous sequence into the genome of a cell, the method comprising the steps of: (a) introducing one or more polynucleotides encoding one or more ZFNs that bind to a target site in a Rosa gene into a cell under conditions such that the ZFNs are expressed and the one or more genes are cleaved; and (b) contacting the cell with an exogenous polynucleotide, such that cleavage of the gene(s) stimulates integration of the exogenous polynucleotide into the genome by homologous recombination. In certain embodiments, the exogenous polynucleotide is physically integrated into the genome. In other embodiments, the exogenous polynucleotide is integrated into the genome by copying of the exogenous sequence into the host cell genome via a nucleic acid replication process (e.g., homology-directed repair of the double-strand break). In other embodiments, genomic integration occurs by homology-independent targeted integration (e.g., "end capture"). In certain embodiments, the one or more nucleases are fusions between the cleavage domain of a Type IIS restriction endonuclease and an engineered zinc finger binding domain. In certain embodiments, the exogenous sequence is integrated into a Rosa gene of a small mammal (e.g., a rabbit or a rodent such as a mouse or rat).

In another embodiment, described herein is a method of modifying one or more Rosa gene sequences in the genome of a cell, the method comprising (a) providing a cell comprising one or more Rosa sequences; and (b) expressing a first and a second zinc finger nuclease (ZFN) in the cell, wherein the first ZFN cleaves at a first cleavage site and the second ZFN cleaves at a second cleavage site, wherein the gene sequence is located between the first cleavage site and the second cleavage site, and wherein cleavage at the first and second cleavage sites results in modification of the gene sequence by non-homologous end joining and/or homology-directed repair. Optionally, the cleavage results in insertion of an exogenous sequence (transgene) that is also introduced into the cell. In other embodiments, non-homologous end joining results in a deletion between the first and second cleavage sites. The size of the deletion in the gene sequence is determined by the distance between the first and second cleavage sites. Accordingly, deletions of any size, in any genomic region of interest, can be obtained. Deletions of 25, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 nucleotide pairs, or any integer value of nucleotide pairs within this range, can be obtained. In addition, deletions of any integer value of nucleotide pairs greater than 1,000 nucleotide pairs can be obtained using the methods and compositions disclosed herein.
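The relationship between the two cleavage sites and the deletion size can be illustrated with a minimal sketch. It assumes an idealized outcome in which the two free ends are rejoined cleanly with no additional bases gained or lost, which actual NHEJ repair does not guarantee. The function name, the coordinate convention (0-based positions between bases), and the toy sequence are hypothetical and are not taken from the disclosure.

```python
from typing import Tuple


def delete_between(sequence: str, first_cut: int, second_cut: int) -> Tuple[str, int]:
    """Idealized deletion of the segment lying between two cleavage sites.

    Cut positions are 0-based coordinates between bases (an assumption of
    this sketch). Returns the rejoined sequence and the deletion size.
    """
    if not 0 <= first_cut <= second_cut <= len(sequence):
        raise ValueError("require 0 <= first_cut <= second_cut <= len(sequence)")
    # Join the sequence upstream of the first cut to the sequence
    # downstream of the second cut.
    edited = sequence[:first_cut] + sequence[second_cut:]
    return edited, second_cut - first_cut


# Hypothetical toy locus: 4 bp flank, 25 bp target segment, 4 bp flank.
locus = "AAAA" + "C" * 25 + "GGGG"
edited, deleted = delete_between(locus, 4, 29)
print(edited, deleted)
```

In this idealized model the deletion size is simply the distance between the two cut positions, so moving either cleavage site changes the deletion by the same number of nucleotide pairs.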

The methods of modifying an endogenous Rosa gene as described herein can be used to generate models of animal (e.g., human) disease, for example, by inactivating the gene (partially or completely); by generating random mutations at defined positions of the gene that allow for the identification or selection of transgenic animals (e.g., rats, rabbits, or mice) carrying novel allelic forms of the gene; by inserting humanized genes (as a non-limiting example, to study drug metabolism); or by inserting a mutant allele of interest, for example, to examine the phenotypic effect of such an allele.

In yet another aspect, described herein is a method for germline disruption of one or more target Rosa genes, the method comprising modifying one or more Rosa sequences in the genome of one or more cells of an embryo by any of the methods described herein, wherein the modified gene sequences are present in at least a portion of the gametes of the sexually mature animal. In certain embodiments, the animal is a small mammal, such as a rodent or rabbit.

In another aspect, described herein is a method of producing one or more heritable mutant alleles in at least one Rosa locus of interest, the method comprising modifying one or more Rosa loci in the genome of one or more cells of an animal embryo by any of the methods described herein; raising the embryo to sexual maturity; and allowing the sexually mature animal to produce offspring, wherein at least some of the offspring comprise the mutant alleles. In certain embodiments, the animal is a small mammal, e.g., a rabbit or a rodent such as a rat, mouse, or guinea pig.

In any of the methods described herein, the polynucleotide encoding the zinc finger nuclease may comprise DNA, RNA, or a combination thereof. In certain embodiments, the polynucleotide comprises a plasmid. In other embodiments, the polynucleotide encoding the nuclease comprises mRNA.

In yet another aspect, provided herein is a method for site-specific integration of a nucleic acid sequence into a Rosa locus of a chromosome. In certain embodiments, the method comprises: (a) injecting an embryo with (i) at least one DNA vector, wherein the DNA vector comprises an upstream sequence and a downstream sequence flanking the nucleic acid sequence to be integrated, and (ii) at least one RNA molecule encoding a zinc finger nuclease that recognizes the site of integration in the Rosa locus; and (b) culturing the embryo to allow expression of the zinc finger nuclease, wherein a double-strand break introduced into the site of integration by the zinc finger nuclease is repaired, via homologous recombination with the DNA vector, so as to integrate the nucleic acid sequence into the chromosome.

Suitable embryos can be obtained from several different vertebrate species, including mammals, birds, reptiles, amphibians, and fish. In general, a suitable embryo is one that can be collected, injected, and cultured to allow expression of the zinc finger nuclease. In some embodiments, suitable embryos may include embryos from small mammals (e.g., rodents, rabbits, etc.), companion animals, livestock, and primates. Non-limiting examples of rodents include mice, rats, hamsters, gerbils, and guinea pigs. Non-limiting examples of companion animals include cats, dogs, rabbits, hedgehogs, and ferrets. Non-limiting examples of livestock include horses, goats, sheep, pigs, llamas, alpacas, and cattle. Non-limiting examples of primates include capuchin monkeys, chimpanzees, lemurs, macaques, marmosets, tamarins, spider monkeys, squirrel monkeys, and vervet monkeys. In other embodiments, suitable embryos may include embryos from fish, reptiles, amphibians, or birds. Alternatively, a suitable embryo may be an insect embryo, for example, a Drosophila embryo or a mosquito embryo.

Also provided are embryos comprising at least one DNA vector, wherein the DNA vector comprises an upstream sequence and a downstream sequence flanking a nucleic acid sequence to be integrated, and at least one RNA molecule encoding a zinc finger nuclease that recognizes the chromosomal site of integration. Also provided are organisms derived from any of the embryos described herein.

Kits comprising the ZFPs of the invention are also provided. The kit may comprise nucleic acids encoding the ZFPs (e.g., RNA molecules, or ZFP-encoding genes contained in a suitable expression vector), donor molecules, suitable host cell lines, instructions for performing the methods of the invention, and the like.

Brief Description of Drawings

FIG. 1 depicts a Southern blot showing the results of NHEJ repair following cleavage at the rat Rosa26 locus, as analyzed by a Surveyor™ mismatch assay (Transgenomic). "G" indicates reactions in which cells were transfected with GFP ZFNs, and the numbered lanes indicate specific ZFN pairs. Arrows indicate lanes in which NHEJ has occurred.

FIG. 2 depicts insertion of Rosa-targeted donor nucleotides into mouse genomic DNA.

Detailed description of the invention

Described herein are compositions and methods for genome editing (e.g., cleavage of genes; alteration of genes, for example by cleavage followed by insertion (physical insertion or insertion by replication via homology-directed repair) of an exogenous sequence and/or cleavage followed by non-homologous end joining (NHEJ); partial or complete inactivation of one or more genes; generation of alleles with random mutations to create altered expression of endogenous genes; etc.) and for alterations of the genome that are carried into the germline in vivo (e.g., in small mammals such as mice, rats, or rabbits). Also disclosed are methods of making and using these compositions (reagents), for example to edit (alter) one or more genes in a target animal (e.g., small mammal) cell. Thus, the methods and compositions described herein provide efficient methods for targeted gene alteration (e.g., knock-in) and/or knockout (partial or complete) of one or more genes and/or for randomized mutation of the sequence of any target allele, and therefore allow for the generation of animal models of human disease.

The compositions and methods described herein provide rapid, complete, and permanent targeted disruption of endogenous loci in animals without the need for labor-intensive selection and/or screening, and with minimal off-target effects. Whole-animal gene knockouts can also be readily generated in a single step by injecting ZFN mRNA or ZFN expression cassettes.

General

The practice of the methods, as well as the preparation and use of the compositions disclosed herein, employ, unless otherwise indicated, conventional techniques in molecular biology, biochemistry, chromatin structure and analysis, computational chemistry, cell culture, recombinant DNA, and related fields as are within the skill of the art. These techniques are fully explained in the literature. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989, and Third Edition, 2001; Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1987 and periodic updates; the series Methods in Enzymology, Academic Press, San Diego; Wolffe, Chromatin Structure and Function, Third Edition, Academic Press, San Diego, 1998; Methods in Enzymology, Vol. 304, "Chromatin" (P. M. Wassarman and A. P. Wolffe, eds.), Academic Press, San Diego, 1999; and Methods in Molecular Biology, Vol. 119, "Chromatin Protocols" (P. B. Becker, ed.), Humana Press, Totowa, 1999.

Definitions

The terms "nucleic acid," "polynucleotide," and "oligonucleotide" are used interchangeably and refer to a deoxyribonucleotide or ribonucleotide polymer, in linear or circular conformation, and in either single- or double-stranded form. For the purposes of the present disclosure, these terms are not to be construed as limiting with respect to the length of a polymer. The terms can encompass known analogues of natural nucleotides, as well as nucleotides that are modified in the base, sugar, and/or phosphate moieties (e.g., phosphorothioate backbones). In general, an analogue of a particular nucleotide has the same base-pairing specificity; i.e., an analogue of A will base-pair with T.

The terms "polypeptide", "peptide" and "protein" are used interchangeably to refer to a polymer of amino acid residues. The term also applies to amino acid polymers in which one or more amino acids are chemical analogs or modified derivatives of the corresponding naturally occurring amino acid.

"binding" refers to a sequence-specific, non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). Not all binding interaction components need to be sequence specific (e.g., to DNA)Phosphate residues in the backbone) as long as the interaction is sequence specific overall. This interaction is generally characterized by 10^-6M^-1Or lower decomposition constant (K)_d). "affinity" refers to the strength of binding: increased binding affinity with lower K_dIt is related.

A "binding protein" is a protein that is capable of non-covalent binding to another molecule. The binding protein may bind to, for example, a DNA molecule (DNA binding protein), an RNA molecule (RNA binding protein), and/or a protein molecule (protein binding protein). In the case of a protein binding protein, it may bind to itself (to form homodimers, homotrimers, etc.) and/or it may bind to one or more molecules of a different protein. The binding protein may have more than one type of binding activity. For example, zinc finger proteins have DNA binding, RNA binding, and protein binding activity.

A "zinc finger DNA binding protein" (or binding domain) is a protein, or a domain within a larger protein, that binds DNA in a sequence-specific manner through one or more zinc fingers, which are regions of amino acid sequence within the binding domain whose structure is stabilized through coordination of a zinc ion. The term zinc finger DNA binding protein is often abbreviated as zinc finger protein or ZFP.

Zinc finger binding domains can be "engineered" to bind to a predetermined nucleotide sequence. Non-limiting examples of methods for engineering zinc finger proteins are design and selection. A designed zinc finger protein is a protein not occurring in nature whose design/composition results principally from rational criteria. Rational criteria for design include application of substitution rules and computerized algorithms for processing information in a database storing information on existing ZFP designs and binding data. See, for example, U.S. Patents 6,140,081; 6,453,242; and 6,534,261; see also WO 98/53058; WO 98/53059; WO 98/53060; WO 02/016536; and WO 03/016496.

A "selected" zinc finger protein is a protein not found in nature whose production results primarily from an empirical process such as phage display, interaction trap, or hybrid selection. See, e.g., US 5,789,538; US 5,925,523; US 6,007,988; US 6,013,453; US 6,200,759; WO 95/19431; WO 96/06166; WO 98/53057; WO 98/54311; WO 00/27878; WO 01/60970; WO 01/88197; and WO 02/099084.

The term "sequence" refers to a nucleotide sequence of any length, which can be DNA or RNA; can be linear, circular, or branched; and can be either single-stranded or double-stranded. The term "donor sequence" refers to a nucleotide sequence that is inserted into a genome. A donor sequence can be of any length, for example between 2 and 10,000 nucleotides in length (or any integer value therebetween or thereabove), preferably between about 100 and 1,000 nucleotides in length (or any integer value therebetween), more preferably between about 200 and 500 nucleotides in length.

"homologous, non-identical sequence" refers to a first sequence that shares a degree of sequence identity with a second sequence, but differs in sequence from the second sequence. For example, a polynucleotide comprising the wild-type sequence of a mutant gene is homologous to, and different from, the sequence of the mutant gene. In certain embodiments, the degree of homology between the two sequences is sufficient to allow homologous recombination to occur therebetween using standard cellular mechanisms. Two homologous, non-identical sequences can be of any length, and their degree of non-homology can be as little as a single nucleotide (e.g., for correcting genomic point mutations by targeted homologous recombination) or as much as 10 or more kilobases (e.g., for inserting a gene in a predetermined ectopic site in a chromosome). Two polynucleotides comprising homologous non-identical sequences need not be the same length. For example, between 20 and 10,000 nucleotides or nucleotide pairs of the exogenous polynucleotide (i.e., the donor polynucleotide) can be used.

Techniques for determining the identity of nucleic acid and amino acid sequences are known in the art. Typically, such techniques involve determining the mRNA nucleotide sequence for a gene and/or determining the amino acid sequence encoded thereby, and comparing these sequences to a second nucleotide or amino acid sequence. Genomic sequences can also be determined and compared in this manner. In general, identity refers to the exact nucleotide-to-nucleotide or amino acid-to-amino acid correspondence of two polynucleotide or polypeptide sequences, respectively. Two or more sequences (polynucleotides or amino acids) can be compared by determining their percent identity. The percent identity of two sequences, whether nucleic acid or amino acid sequences, is the number of exact matches between two aligned sequences divided by the length of the shorter sequence and multiplied by 100.
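The percent identity calculation described above can be sketched as follows. The sketch assumes the two sequences have already been aligned to equal length with "-" marking gaps, and that the "length of the shorter sequence" is counted without gap characters; the function name and both conventions are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Exact matches divided by the shorter ungapped length, times 100.

    Inputs are two pre-aligned sequences of equal length; the gap symbol
    and the ungapped-length convention are assumptions of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count aligned positions where both sequences carry the same residue.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Length of each sequence with gap characters removed.
    length_a = len(aligned_a) - aligned_a.count(gap)
    length_b = len(aligned_b) - aligned_b.count(gap)
    shorter = min(length_a, length_b)
    if shorter == 0:
        raise ValueError("each sequence must contain at least one residue")
    return 100.0 * matches / shorter


# Hypothetical alignment: 6 exact matches, shorter sequence has 7 residues.
print(round(percent_identity("ACGTACGT", "ACGTTCG-"), 1))
```

The same function applies to amino acid sequences, since it only compares aligned characters.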

Alternatively, the degree of sequence similarity between polynucleotides can be determined by hybridization of the polynucleotides under conditions that allow formation of stable duplexes between homologous regions, followed by digestion with single-strand-specific nuclease(s) and size determination of the digested fragments. Two nucleic acid sequences, or two polypeptide sequences, are substantially homologous to each other when the sequences exhibit at least about 70% to 75%, preferably 80% to 82%, more preferably 85% to 90%, even more preferably 92%, still more preferably 95%, and most preferably 98% sequence identity over a defined length of the molecules, as determined using the methods above. As used herein, substantially homologous also refers to sequences showing complete identity to a specified DNA or polypeptide sequence. DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et al., supra; Nucleic Acid Hybridization: A Practical Approach, editors B. D. Hames and S. J. Higgins, (1985) Oxford; Washington, DC; IRL Press.

Selective hybridization of two nucleic acid fragments can be determined as follows. The degree of sequence identity between two nucleic acid molecules affects the efficiency and strength of hybridization events between such molecules. A partially identical nucleic acid sequence will at least partially inhibit the hybridization of a completely identical sequence to a target molecule. Inhibition of hybridization of the completely identical sequence can be assessed using hybridization assays that are well known in the art (e.g., Southern (DNA) blot, Northern (RNA) blot, solution hybridization, or the like; see Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.). Such assays can be conducted using varying degrees of selectivity, for example, using conditions varying from low to high stringency. If conditions of low stringency are employed, the absence of non-specific binding can be assessed using a secondary probe that lacks even a partial degree of sequence identity (for example, a probe having less than about 30% sequence identity with the target molecule), such that, in the absence of non-specific binding events, the secondary probe will not hybridize to the target.

When utilizing a hybridization-based detection system, a nucleic acid probe is chosen that is complementary to a reference nucleic acid sequence, and then, by selection of appropriate conditions, the probe and the reference sequence selectively hybridize, or bind, to each other to form a duplex molecule. A nucleic acid molecule that is capable of hybridizing selectively to a reference sequence under moderately stringent hybridization conditions typically hybridizes under conditions that allow detection of a target nucleic acid sequence of at least about 10 to 14 nucleotides in length having at least approximately 70% sequence identity with the sequence of the selected nucleic acid probe. Stringent hybridization conditions typically allow detection of target nucleic acid sequences of at least about 10 to 14 nucleotides in length having a sequence identity of greater than about 90% to 95% with the sequence of the selected nucleic acid probe. Hybridization conditions useful for probe/reference sequence hybridization, where the probe and reference sequence have a specific degree of sequence identity, can be determined as is known in the art (see, for example, Nucleic Acid Hybridization: A Practical Approach, editors B. D. Hames and S. J. Higgins, (1985) Oxford; Washington, DC; IRL Press).

Conditions for hybridization are well known to those skilled in the art. Hybridization stringency refers to the extent to which hybridization conditions are unfavorable for the formation of hybrids containing mismatched nucleotides, with higher stringency being associated with lower tolerance for mismatched hybrids. Factors that affect hybridization stringency are well known to those skilled in the art and include, but are not limited to, temperature, pH, ionic strength, and concentration of organic solvents (e.g., formamide and dimethyl sulfoxide). As is known to those skilled in the art, hybridization stringency is increased by higher temperatures, lower ionic strength, and lower solvent concentrations.

With respect to stringency conditions for hybridization, it is well known in the art that numerous equivalent conditions can be employed to establish a particular stringency by varying, for example, the following factors: the length and nature of the sequences, the base composition of the various sequences, the concentrations of salts and other hybridization solution components, the presence or absence of blocking agents in the hybridization solution (e.g., dextran sulfate and polyethylene glycol), the hybridization reaction temperature and time parameters, and the wash conditions. A particular set of hybridization conditions is selected following standard methods in the art (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.).

"recombination" refers to the process of genetic information exchange between two polynucleotides. For purposes of this disclosure, "Homologous Recombination (HR)" refers to a particular form of such exchange that occurs during repair of a double-strand break in a cell, for example, via a homology-directed repair mechanism. This process requires nucleotide sequence homology, uses a "donor" molecule to become a repair template for a "target" molecule (i.e., a molecule that undergoes a double strand break), and is variously referred to as "non-cross-gene conversion" or "short-sequence gene conversion" because it results in the transfer of genetic information from the donor to the target. Without wishing to be bound by any particular theory, this transfer may involve correction of mismatch of heteroduplex DNA formed between the fragmented target and donor, and/or "dependent on the strand adhesion synthesized" in which the donor is used to resynthesize genetic information that will be part of the target, and/or related processes. This particular HR often results in a change in the sequence of the target molecule such that part or all of the sequence of the donor polynucleotide is incorporated into the target polynucleotide.

In the methods of the present disclosure, one or more targeted nucleases as described herein create a double-strand break in the target sequence (e.g., cellular chromatin) at a predetermined site, and a "donor" polynucleotide having homology to the nucleotide sequence in the region of the break can be introduced into the cell. The presence of the double-strand break has been shown to facilitate integration of the donor sequence. The donor sequence may be physically integrated or, alternatively, the donor polynucleotide is used as a template for repair of the break via homologous recombination, resulting in the introduction of all or part of the nucleotide sequence of the donor into the cellular chromatin. Thus, a first sequence in cellular chromatin can be altered and, in certain embodiments, can be converted into a sequence present in a donor polynucleotide. Accordingly, use of the terms "replace" or "replacement" is to be understood as representing replacement of one nucleotide sequence by another (i.e., replacement of a sequence in the informational sense), and does not necessarily require physical or chemical replacement of one polynucleotide by another.

In any of the methods described herein, additional zinc finger protein pairs may be used for additional double-strand cleavage of additional target sites within the cell.

In embodiments of the methods for targeted recombination and/or replacement and/or alteration of a sequence of a region of interest in cellular chromatin, chromosomal sequences are altered by homologous recombination with exogenous "donor" nucleotide sequences. If sequences homologous to the break region are present, such homologous recombination can be stimulated by double strand breaks present in cellular chromatin.

In any of the methods described herein, the first nucleotide sequence (the "donor sequence") can contain sequences that are homologous, but not identical, to genomic sequences in the region of interest, thereby stimulating homologous recombination to insert a non-identical sequence in the region of interest. Thus, in certain embodiments, portions of the donor sequence that are homologous to sequences in the region of interest exhibit between about 80% and 99% (or any integer therebetween) sequence identity to the genomic sequence that is replaced. In other embodiments, the homology between the donor and genomic sequences is higher than 99%, for example if only 1 nucleotide differs between donor and genomic sequences of over 100 contiguous base pairs. In certain cases, a non-homologous portion of the donor sequence can contain sequences not present in the region of interest, such that new sequences are introduced into the region of interest. In these instances, the non-homologous sequence is generally flanked by sequences of 50 to 1,000 base pairs (or any integer value therebetween), or any number of base pairs greater than 1,000, that are homologous or identical to sequences in the region of interest. In other embodiments, the donor sequence is non-homologous to the first sequence and is inserted into the genome by non-homologous recombination mechanisms.
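The donor layout described above, a non-homologous insert flanked by sequences homologous to the region of interest, can be sketched in code. This is a simplified illustration only: the function names, the 50-base-pair arm length (chosen from the low end of the 50 to 1,000 base pair range), the toy genome, and the idealized repair step that copies the insert exactly at the break are all assumptions of the sketch, not a description of the actual repair process.

```python
def build_donor(genome: str, cut_site: int, insert: str, arm_length: int) -> str:
    """Assemble a donor: left homology arm + insert + right homology arm.

    Arms are copied from the genome on either side of cut_site (0-based,
    between-base coordinate; a convention assumed for this sketch).
    """
    if arm_length < 1:
        raise ValueError("arm_length must be at least 1")
    if cut_site - arm_length < 0 or cut_site + arm_length > len(genome):
        raise ValueError("homology arms would extend past the genome ends")
    left_arm = genome[cut_site - arm_length:cut_site]
    right_arm = genome[cut_site:cut_site + arm_length]
    return left_arm + insert + right_arm


def integrate(genome: str, cut_site: int, donor: str, arm_length: int) -> str:
    """Idealized homology-directed repair: place the donor insert at the break."""
    left_arm = donor[:arm_length]
    right_arm = donor[len(donor) - arm_length:]
    insert = donor[arm_length:len(donor) - arm_length]
    # Require both arms to match the genome flanking the break exactly.
    if genome[cut_site - arm_length:cut_site] != left_arm:
        raise ValueError("left homology arm does not match the genome")
    if genome[cut_site:cut_site + arm_length] != right_arm:
        raise ValueError("right homology arm does not match the genome")
    return genome[:cut_site] + insert + genome[cut_site:]


# Hypothetical 120 bp toy genome, break at position 60, 6 bp insert.
toy_genome = "ACGT" * 30
donor = build_donor(toy_genome, cut_site=60, insert="TTTTTT", arm_length=50)
edited = integrate(toy_genome, cut_site=60, donor=donor, arm_length=50)
print(len(donor), len(edited))
```

Requiring exact arm matches is a deliberate simplification; the disclosure also contemplates arms that are homologous but not identical to the genomic sequence.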

Any of the methods described herein can be used for partial or complete inactivation of one or more target sequences in a cell by targeted integration of a donor sequence that disrupts expression of a gene of interest. Cell lines having partially or fully inactivated genes are also provided.

Furthermore, targeted integration methods as described herein can also be used to integrate one or more exogenous sequences. The exogenous nucleic acid sequence may include, for example, one or more genes or cDNA molecules, or any type of coding or non-coding sequence, as well as one or more control elements (e.g., promoters). In addition, the exogenous nucleic acid sequence may produce one or more RNA molecules (e.g., small hairpin RNAs (shRNAs), inhibitory RNAs (RNAis), microRNAs (miRNAs), etc.).

"Cleavage" refers to the breakage of the covalent backbone of a DNA molecule. Cleavage can be initiated by a variety of methods, including but not limited to enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can result in either blunt ends or staggered ends. In certain embodiments, fusion polypeptides are used for targeted double-stranded DNA cleavage.

A "cleavage half-domain" is a polypeptide sequence that, in conjunction with a second polypeptide (either identical or different), forms a complex having cleavage activity, preferably double-stranded cleavage activity. The terms "first and second cleavage half-domains", "+ and - cleavage half-domains" and "right and left cleavage half-domains" are used interchangeably to refer to pairs of cleavage half-domains that dimerize.

An "engineered cleavage half-domain" is a cleavage half-domain that has been modified to form an obligate heterodimer with another cleavage half-domain (e.g., another engineered cleavage half-domain). See also U.S. patent publication nos. 2005/0064474; 2007/0218528, and 2008/0131962, which are incorporated herein by reference in their entirety.

"Chromatin" is the nucleoprotein structure comprising the genome of a cell. Cellular chromatin includes nucleic acid (primarily DNA) as well as protein, including histones and non-histone chromosomal proteins. The majority of eukaryotic cellular chromatin exists in the form of nucleosomes, in which the nucleosome core comprises approximately 150 base pairs of DNA associated with an octamer comprising two each of histones H2A, H2B, H3 and H4; and linker DNA (of variable length depending on the organism) extends between the nucleosome cores. A molecule of histone H1 is generally associated with the linker DNA. For the purposes of this disclosure, the term "chromatin" is meant to encompass all types of cellular nucleoprotein, whether prokaryotic or eukaryotic. Cellular chromatin includes both chromosomal and episomal chromatin.

A "chromosome" is a chromatin complex that includes all or part of the genome of a cell. The genome of a cell is usually characterized by its karyotype, which is the collection of all chromosomes that comprise the genome of the cell. The genome of the cell may include one or more chromosomes.

An "episome" is a replicating nucleic acid, nucleoprotein complex or other structure comprising a nucleic acid that is not part of the chromosomal karyotype of a cell. Examples of episomes include plasmids and certain viral genomes.

A "target site" or "target sequence" is a nucleic acid sequence that defines a portion of a nucleic acid to which a binding molecule will bind, provided that sufficient conditions for binding exist. For example, the sequence 5'-GAATTC-3' is a target site for the EcoRI restriction endonuclease.
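As an illustration of how a target site defines positions within a larger nucleic acid, the following minimal sketch locates every occurrence of the EcoRI site 5'-GAATTC-3' in a sequence. The function name and the example sequence are hypothetical. Because GAATTC is palindromic, scanning one strand is sufficient for this particular site; a non-palindromic site would also require scanning the reverse complement.

```python
def find_sites(sequence: str, site: str = "GAATTC") -> list[int]:
    """Return the 0-based start index of every (possibly overlapping) occurrence of `site`."""
    seq = sequence.upper()
    site = site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


# Hypothetical sequence containing two EcoRI sites.
example = "ttGAATTCaaGAATTC"
print(find_sites(example))
```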

An "exogenous" molecule is a molecule that is not normally present in a cell, but can be introduced into a cell by one or more genetic, biochemical, or other methods. "Normal presence in the cell" is determined with respect to the particular developmental stage and environmental conditions of the cell. Thus, for example, a molecule that is present only during embryonic development of muscle is an exogenous molecule with respect to an adult muscle cell. Similarly, a molecule induced by heat shock is an exogenous molecule with respect to a cell that has not been heat shocked. An exogenous molecule may comprise, for example, a functioning version of a malfunctioning endogenous molecule or a malfunctioning version of a normally functioning endogenous molecule. An exogenous molecule may also be a molecule that is normally found in another species, such as a human sequence introduced into the genome of an animal.

An exogenous molecule may be, among other things, a small molecule, such as one generated by a combinatorial chemistry process, or a macromolecule such as a protein, nucleic acid, carbohydrate, lipid, glycoprotein, lipoprotein, polysaccharide, any modified derivative of the above molecules, or any complex comprising one or more of the above molecules. Nucleic acids include DNA and RNA, and may be single-stranded or double-stranded; may be linear, branched or circular; and may be of any length. Nucleic acids include those capable of forming duplexes, as well as triplex-forming nucleic acids. See, for example, U.S. Pat. Nos. 5,176,996 and 5,422,251. Proteins include, but are not limited to, DNA-binding proteins, transcription factors, chromatin remodeling factors, methylated DNA binding proteins, polymerases, methylases, demethylases, acetylases, deacetylases, kinases, phosphatases, integrases, recombinases, ligases, topoisomerases, gyrases, and helicases.

An exogenous molecule may be the same type of molecule as an endogenous molecule, e.g., an exogenous protein or nucleic acid. For example, an exogenous nucleic acid may comprise an infecting viral genome, a plasmid or episome introduced into a cell, or a chromosome that is not normally present in the cell. Methods for introducing exogenous molecules into cells are known to those of skill in the art and include, but are not limited to, lipid-mediated transfer (i.e., liposomes, including neutral and cationic lipids), electroporation, direct injection, cell fusion, particle bombardment, calcium phosphate co-precipitation, DEAE-dextran-mediated transfer, and viral vector-mediated transfer.

In contrast, an "endogenous" molecule is one that is normally present in a particular cell at a particular developmental stage under particular environmental conditions. For example, an endogenous nucleic acid can comprise a chromosome, the genome of a mitochondrion, chloroplast or other organelle, or a naturally occurring episomal nucleic acid. Additional endogenous molecules may include proteins, such as transcription factors and enzymes.

A "fusion" molecule is a molecule in which two or more subunit molecules are linked, preferably covalently. The subunit molecules may be of the same chemical type, or may be of different chemical types. Examples of the first type of fusion molecule include, but are not limited to, fusion proteins (e.g., a fusion between a ZFP DNA-binding domain and a cleavage domain) and fusion nucleic acids (e.g., a nucleic acid encoding the fusion protein described above). Examples of the second type of fusion molecule include, but are not limited to, a fusion between a triplex-forming nucleic acid and a polypeptide, and a fusion between a minor groove binder and a nucleic acid.

Expression of a fusion protein in a cell can result from delivery of the fusion protein to the cell or from delivery of a polynucleotide encoding the fusion protein to the cell, wherein the polynucleotide is transcribed and the transcript is translated to produce the fusion protein. Trans-splicing, polypeptide cleavage, and polypeptide ligation can also be involved in expression of a protein in a cell. Methods for delivery of polynucleotides and polypeptides to cells are set forth elsewhere in this disclosure.

A "gene", for the purposes of this disclosure, includes a DNA region encoding a gene product (see below), as well as all DNA regions that regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Accordingly, a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, origins of replication, matrix attachment sites, and locus control regions.

"Gene expression" refers to the conversion of the information contained in a gene into a gene product. A gene product can be the direct transcriptional product of a gene (e.g., mRNA, tRNA, rRNA, antisense RNA, ribozyme, structural RNA, or any other type of RNA) or a protein produced by translation of an mRNA. Gene products also include RNAs that are modified by processes such as capping, polyadenylation, methylation and editing, and proteins modified by, for example, methylation, acetylation, phosphorylation, ubiquitination, ADP-ribosylation, myristoylation and glycosylation.

"Modulation" of gene expression refers to a change in the activity of a gene. Modulation of expression may include, but is not limited to, gene activation and gene repression. Gene editing (e.g., cleavage, alteration, inactivation, random mutation) can be used to modulate expression. Gene inactivation refers to any reduction in gene expression as compared to a cell that does not include a ZFP as described herein. Thus, gene inactivation may be partial or complete.

A "region of interest" is any region of cellular chromatin, such as, for example, a gene or a non-coding sequence within or adjacent to a gene, in which it is desirable to bind an exogenous molecule. Binding may be for the purposes of targeted DNA cleavage and/or targeted recombination. A region of interest can be present, for example, in a chromosome, an episome, an organellar genome (e.g., mitochondrial, chloroplast), or an infecting viral genome. A region of interest may be within the coding region of a gene, within transcribed non-coding regions such as, for example, leader sequences, trailer sequences or introns, or within non-transcribed regions, either upstream or downstream of the coding region. A region of interest may be as small as a single nucleotide pair, or up to 2,000 nucleotide pairs in length, or any integer value of nucleotide pairs.

The terms "operative linkage" and "operatively linked" (or "operably linked") are used interchangeably with reference to a juxtaposition of two or more components (e.g., sequence elements), in which the components are arranged such that both components function normally and such that at least one of the components can mediate a function that is exerted upon at least one of the other components. By way of illustration, a transcriptional regulatory sequence, such as a promoter, is operatively linked to a coding sequence if the transcriptional regulatory sequence controls the level of transcription of the coding sequence in response to the presence or absence of one or more transcriptional regulatory factors. A transcriptional regulatory sequence is generally operatively linked in cis with a coding sequence, but need not be directly adjacent to it. For example, an enhancer is a transcriptional regulatory sequence that is operatively linked to a coding sequence, even though the two are not contiguous.

With respect to fusion polypeptides, the term "operatively linked" can refer to the fact that each of the components performs the same function when linked to the other component as it would if it were not so linked. For example, with respect to a fusion polypeptide in which a ZFP DNA-binding domain is fused to a cleavage domain, the ZFP DNA-binding domain and the cleavage domain are in operative linkage if, in the fusion polypeptide, the ZFP DNA-binding domain portion is able to bind its target site and/or its binding site, while the cleavage domain is able to cleave DNA in the vicinity of the target site.

A "functional fragment" of a protein, polypeptide, or nucleic acid is a protein, polypeptide, or nucleic acid whose sequence is not identical to the full-length protein, polypeptide, or nucleic acid, yet which retains the same function as the full-length protein, polypeptide, or nucleic acid. A functional fragment may possess more, fewer, or the same number of residues as the corresponding native molecule, and/or may contain one or more amino acid or nucleotide substitutions. Methods for determining the function of a nucleic acid (e.g., coding function, ability to hybridize to another nucleic acid) are well known in the art. Similarly, methods for determining protein function are well known. For example, the DNA-binding function of a polypeptide can be determined by filter-binding, electrophoretic mobility-shift, or immunoprecipitation assays. DNA cleavage can be assayed by gel electrophoresis. See Ausubel et al, supra. The ability of a protein to interact with another protein can be determined, for example, by co-immunoprecipitation, two-hybrid assays, or complementation, both genetic and biochemical. See, e.g., Fields et al (1989) Nature 340:245-246; U.S. Pat. No. 5,585,245 and PCT WO 98/44350.

Zinc finger nucleases

Described herein are Zinc Finger Nucleases (ZFNs) that can be used for genome editing (e.g., cleavage, alteration, inactivation, and/or random mutation) of one or more Rosa genes. ZFNs include Zinc Finger Proteins (ZFPs) and nuclease (cleavage) domains (e.g., cleavage half-domains).

A. Zinc finger proteins

Zinc finger binding domains can be engineered to bind to a sequence of choice. See, e.g., Beerli et al (2002) Nature Biotechnol. 20:135-141; Pabo et al (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al (2001) Nature Biotechnol. 19:656-; Segal et al (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al (2000) Curr. Opin. Struct. Biol. 10:411-416. An engineered zinc finger binding domain can have a novel binding specificity compared to a naturally occurring zinc finger protein. Engineering methods include, but are not limited to, rational design and various types of selection. Rational design includes, for example, the use of databases comprising triplet (or quadruplet) nucleotide sequences and individual zinc finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc fingers that bind the particular triplet or quadruplet sequence. See, for example, commonly owned U.S. Pat. Nos. 6,453,242 and 6,534,261, which are incorporated herein by reference in their entirety.

Exemplary selection methods, including phage display and two-hybrid systems, are disclosed in U.S. Pat. Nos. 5,789,538; 5,925,523; 6,007,988; 6,013,453; 6,410,248; 6,140,466; 6,200,759 and 6,242,568; as well as WO 98/37186; WO 98/53057; WO 00/27878; WO 01/88197 and GB 2,338,237. In addition, enhancement of binding specificity for zinc finger binding domains has been described, for example, in commonly owned WO 02/077227.

Selection of target sites, as well as ZFPs and methods for the design and construction of fusion proteins (and polynucleotides encoding these proteins), are known to those skilled in the art and are described in detail in U.S. Patent Application Publication Nos. 20050064474 and 20060188987, the disclosures of which are incorporated by reference in their entirety.

In addition, as disclosed in these and other references, zinc finger domains and/or multi-fingered zinc finger proteins can be linked together using any suitable linker sequence, including, for example, linkers of 5 or more amino acids in length (e.g., TGEKP (SEQ ID NO:1), TGGQRP (SEQ ID NO:2), TGQKP (SEQ ID NO:3), and/or TGSQKP (SEQ ID NO: 4)). For exemplary 6 or more amino acid length linker sequences, see also U.S. patent No. 6,479,626; 6,903,185, and 7,153,949. The proteins described herein may include any combination of suitable linkers between zinc fingers of the individual proteins.

As described below, in certain embodiments, a four-, five- or six-finger binding domain is fused to a cleavage half-domain, such as, for example, the cleavage domain of a Type IIS restriction endonuclease, e.g., FokI. One or more pairs of such zinc finger/nuclease half-domain fusions can be used for targeted cleavage, as disclosed, for example, in U.S. Patent Publication No. 20050064474.

For targeted cleavage, the proximal edges of the binding sites can be separated by 5 or more nucleotide pairs, and each of the fusion proteins can bind to an opposite strand of the DNA target. All pairwise combinations can be used for targeted cleavage of a Rosa gene. In accordance with the present disclosure, ZFNs can be targeted to any sequence in the genome of an animal.

In some embodiments, the DNA binding domain is an engineered domain from a TAL effector derived from the plant pathogen Xanthomonas (see Boch et al (2009) Science 326:1509-1512 and Moscou and Bogdanove (2009) Science 326:1501).

B. Cleavage domains

ZFNs also comprise a nuclease (cleavage domain, cleavage half-domain). The cleavage domain portion of the fusion proteins disclosed herein can be obtained from any endonuclease or exonuclease. Exemplary endonucleases from which a cleavage domain can be derived include, but are not limited to, restriction endonucleases and homing endonucleases. See, e.g., 2002-; and Belfort et al (1997) Nucleic Acids Res. 25:3379-3388. Additional enzymes that cleave DNA are known (e.g., S1 nuclease; mung bean nuclease; pancreatic DNase I; micrococcal nuclease; yeast HO endonuclease; see also Linn et al (eds.) Nucleases, Cold Spring Harbor Laboratory Press, 1993). One or more of these enzymes (or functional fragments thereof) may be used as a source of cleavage domains and cleavage half-domains.

Similarly, a cleavage half-domain may be derived from any nuclease or portion thereof, as set forth above, that requires dimerization for cleavage activity. In general, two fusion proteins are required for cleavage if the fusion proteins comprise cleavage half-domains. Alternatively, a single protein comprising two cleavage half-domains may be used. The two cleavage half-domains may be derived from the same endonuclease (or functional fragments thereof), or each cleavage half-domain may be derived from a different endonuclease (or functional fragment thereof). In addition, the target sites for the two fusion proteins are preferably disposed with respect to each other such that binding of the two fusion proteins to their respective target sites places the cleavage half-domains in a spatial orientation that allows them to form a functional cleavage domain, e.g., by dimerizing. Thus, in certain embodiments, the proximal edges of the target sites are separated by 5 to 8 nucleotides or by 15 to 18 nucleotides. However, any integer number of nucleotides or nucleotide pairs may intervene between the two target sites (e.g., from 2 to 50 nucleotide pairs or more). In general, the site of cleavage lies between the target sites.
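The spacing ranges named above can be checked with simple coordinate arithmetic. The following is a minimal sketch, assuming 0-based coordinates on a single reference strand in which the left target site ends at an inclusive index and the right target site starts at a later index; the function names and the example coordinates are hypothetical.

```python
def gap_between_sites(left_site_end: int, right_site_start: int) -> int:
    """Number of nucleotides lying between the proximal edges of two target sites.

    left_site_end is the index of the last nucleotide of the left site (inclusive);
    right_site_start is the index of the first nucleotide of the right site.
    """
    if right_site_start <= left_site_end:
        raise ValueError("target sites overlap or are in the wrong order")
    return right_site_start - left_site_end - 1


def in_exemplary_spacing(gap: int) -> bool:
    """True if the gap falls in the 5 to 8 or 15 to 18 nucleotide ranges given in the text."""
    return 5 <= gap <= 8 or 15 <= gap <= 18


# Hypothetical pair: left site occupies indices 0..11, right site starts at index 18.
gap = gap_between_sites(11, 18)
print(gap, in_exemplary_spacing(gap))
```

Other gaps (e.g., from 2 to 50 nucleotide pairs or more) remain within the disclosure; the check only flags the two ranges singled out for certain embodiments.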

Restriction endonucleases (restriction enzymes) are present in many species and are capable of sequence-specific binding to DNA (at a recognition site) and of cleaving DNA at or near the site of binding. Certain restriction enzymes (e.g., Type IIS) cleave DNA at sites removed from the recognition site and have separable binding and cleavage domains. For example, the Type IIS enzyme FokI catalyzes double-stranded cleavage of DNA at 9 nucleotides from its recognition site on one strand and at 13 nucleotides from its recognition site on the other. See, for example, U.S. Pat. Nos. 5,356,802; 5,436,150 and 5,487,994; as well as Li et al (1992) Proc. Natl. Acad. Sci. USA 89:4275-4279; Li et al (1993) Proc. Natl. Acad. Sci. USA 90:2764-2768; Kim et al (1994a) Proc. Natl. Acad. Sci. USA 91:883-887; Kim et al (1994b) J. Biol. Chem. 269:31,978-31,982. Thus, in one embodiment, fusion proteins comprise the cleavage domain (or cleavage half-domain) from at least one Type IIS restriction enzyme and one or more zinc finger binding domains, which may or may not be engineered.
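The 9/13 offset described for wild-type FokI can be sketched as coordinate arithmetic. The following minimal example assumes the FokI recognition sequence 5'-GGATG-3' (this sequence is not stated in the text above and is supplied here as an assumption), with both cuts made downstream of the site and both reported in top-strand coordinates. The function name and the example sequence are hypothetical.

```python
RECOGNITION_SITE = "GGATG"  # assumption: FokI recognition sequence, not given in the text
TOP_OFFSET = 9              # nucleotides from the recognition site on one strand
BOTTOM_OFFSET = 13          # nucleotides from the recognition site on the other strand


def foki_cut_boundaries(top_strand: str):
    """Return (top_cut, bottom_cut) as boundary indices in top-strand coordinates.

    A boundary index k means the backbone is broken between positions k-1 and k.
    Returns None if no recognition site is found or the sequence is too short.
    """
    seq = top_strand.upper()
    start = seq.find(RECOGNITION_SITE)
    if start == -1:
        return None
    site_end = start + len(RECOGNITION_SITE)  # first index after the site
    top_cut = site_end + TOP_OFFSET
    bottom_cut = site_end + BOTTOM_OFFSET
    if bottom_cut > len(seq):
        return None
    return top_cut, bottom_cut


# Hypothetical sequence: 2 leading bases, the site, then 20 unspecified bases.
example = "AA" + "GGATG" + "N" * 20
cuts = foki_cut_boundaries(example)
print(cuts, "stagger:", cuts[1] - cuts[0])
```

The difference between the two boundaries is the length of the staggered end, which is consistent with the statement elsewhere in this disclosure that cleavage can produce staggered ends.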

An exemplary Type IIS restriction enzyme whose cleavage domain is separable from the binding domain is FokI. This particular enzyme is active as a dimer. Bitinaite et al (1998) Proc. Natl. Acad. Sci. USA 95:10,570-10,575. Accordingly, for the purposes of the present disclosure, the portion of the FokI enzyme used in the disclosed fusion proteins is considered a cleavage half-domain. Thus, for targeted double-stranded cleavage and/or targeted replacement of cellular sequences using zinc finger-FokI fusions, two fusion proteins, each comprising a FokI cleavage half-domain, can be used to reconstitute a catalytically active cleavage domain. Alternatively, a single polypeptide molecule containing a zinc finger binding domain and two FokI cleavage half-domains can also be used. Parameters for targeted cleavage and targeted sequence alteration using zinc finger-FokI fusions are provided elsewhere in this disclosure.

The cleavage domain or cleavage half-domain can be any portion of a protein that retains cleavage activity or retains the ability to multimerize (e.g., dimerize) to form a functional cleavage domain.

Exemplary Type IIS restriction enzymes are described in International Publication WO 07/014275, which is incorporated by reference herein in its entirety. Additional restriction enzymes also contain separable binding and cleavage domains, and these are contemplated by the present disclosure. See, e.g., Roberts et al (2003) Nucleic Acids Res. 31:418-420.

In certain embodiments, the cleavage domain comprises one or more engineered cleavage half-domains (also referred to as dimerization domain mutants) that minimize or prevent homodimerization, as described, for example, in U.S. patent publication nos. 20050064474; 20060188987 and 20080131962, the disclosures of all of which are incorporated herein by reference in their entirety. Amino acid residues at positions 446, 447, 479, 483, 484, 486, 487, 490, 491, 496, 498, 499, 500, 531, 534, 537, and 538 of FokI are all targets for affecting FokI cleavage half-domain dimerization.

Exemplary engineered cleavage half-domains of FokI that form obligate heterodimers include pairs in which the first cleavage half-domain includes mutations at amino acid residues 490 and 538 of FokI and the second cleavage half-domain includes mutations at amino acid residues 486 and 499.

Thus, in one embodiment, the mutation at position 490 replaces Glu (E) with Lys (K); the mutation at position 538 replaces Ile (I) with Lys (K); the mutation at position 486 replaces Gln (Q) with Glu (E); and the mutation at position 499 replaces Ile (I) with Leu (L). More specifically, the engineered cleavage half-domains described herein are prepared by mutating positions 490 (E→K) and 538 (I→K) in one cleavage half-domain to produce an engineered cleavage half-domain designated "E490K:I538K", and by mutating positions 486 (Q→E) and 499 (I→L) in the other cleavage half-domain to produce an engineered cleavage half-domain designated "Q486E:I499L". The engineered cleavage half-domains described herein are obligate heterodimer mutants in which aberrant cleavage is minimized or abolished. See, for example, Example 1 of WO 07/139898. In certain embodiments, the engineered cleavage half-domain comprises mutations at positions 486, 499 and 496 (numbered relative to wild-type FokI), for instance mutations that replace the wild-type Gln (Q) residue at position 486 with a Glu (E) residue, the wild-type Ile (I) residue at position 499 with a Leu (L) residue, and the wild-type Asn (N) residue at position 496 with an Asp (D) or Glu (E) residue (also referred to as "ELD" and "ELE" domains, respectively). In other embodiments, the engineered cleavage half-domain comprises mutations at positions 490, 538 and 537 (numbered relative to wild-type FokI), for instance mutations that replace the wild-type Glu (E) residue at position 490 with a Lys (K) residue, the wild-type Ile (I) residue at position 538 with a Lys (K) residue, and the wild-type His (H) residue at position 537 with a Lys (K) or Arg (R) residue (also referred to as "KKK" and "KKR" domains, respectively).
In other embodiments, the engineered cleavage half-domain comprises mutations at positions 490 and 537 (numbered relative to wild-type FokI), for instance mutations that replace the wild-type Glu (E) residue at position 490 with a Lys (K) residue and the wild-type His (H) residue at position 537 with a Lys (K) or Arg (R) residue (also referred to as "KIK" and "KIR" domains, respectively). (See U.S. application No. 12/931,660.)
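The shorthand domain names above (ELD, ELE, KKK, KKR, KIK, KIR) can be read as the residues found at three FokI positions after mutation. The following is a minimal sketch of that bookkeeping, using only the wild-type residues and positions given in the text; the function names and the mutation-string format (wild-type residue, position, new residue) are illustrative conventions and not part of the disclosure.

```python
# Wild-type FokI residues at the positions discussed in the text.
WILD_TYPE = {486: "Q", 490: "E", 496: "N", 499: "I", 537: "H", 538: "I"}


def apply_mutations(mutations):
    """Apply mutations such as 'E490K' and return a position -> residue mapping."""
    residues = dict(WILD_TYPE)
    for mutation in mutations:
        wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
        if WILD_TYPE.get(position) != wild_type:
            raise ValueError(f"{mutation} does not match the wild-type residue")
        residues[position] = new
    return residues


def shorthand(mutations, positions):
    """Concatenate the residues at the given positions, in the given order."""
    residues = apply_mutations(mutations)
    return "".join(residues[p] for p in positions)


# Positions are listed in the order used by the names in the text.
print(shorthand(["Q486E", "I499L", "N496D"], (486, 499, 496)))  # ELD
print(shorthand(["E490K", "I538K", "H537R"], (490, 538, 537)))  # KKR
print(shorthand(["E490K", "H537K"], (490, 538, 537)))           # KIK
```

In the "KIK" and "KIR" domains the middle letter is the unchanged wild-type Ile at position 538, which is why only two mutations are listed for those variants.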

The engineered cleavage half-domains described herein can be prepared using any suitable method, for example, by site-directed mutagenesis of wild-type cleavage half-domains (FokI) as described in U.S. Patent Publication No. 20050064474.

C. Additional methods for targeted cleavage

Any nuclease having a target site in any Rosa gene can be used in the methods disclosed herein. For example, homing endonucleases and meganucleases have very long recognition sequences, some of which are likely to be present, on a statistical basis, once in a human-sized genome. Any such nuclease having a target site in a Rosa gene can be used instead of, or in addition to, a zinc finger nuclease for targeted cleavage.
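The statistical argument above can be made concrete with a rough estimate. The following minimal sketch assumes a random sequence with equal frequencies of the four bases, which is a simplification of any real genome; the function name and the genome size used in the example are illustrative assumptions.

```python
def expected_occurrences(site_length: int, genome_size: float, both_strands: bool = True) -> float:
    """Expected number of occurrences of one specific site in a random sequence.

    Assumes each base occurs with probability 1/4 independently (a simplification),
    and approximates the number of possible start positions by genome_size.
    """
    per_strand = genome_size / 4 ** site_length
    return 2 * per_strand if both_strands else per_strand


# Illustrative comparison for an assumed genome of 3e9 base pairs.
for length in (6, 12, 18):
    print(length, expected_occurrences(length, 3e9))
```

Under these assumptions a 6-bp site is expected several hundred thousand times or more, whereas a site of roughly 16 to 18 bp is expected about once or less, which is the sense in which long recognition sequences can be unique in a genome of this size.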

Exemplary homing endonucleases include I-SceI, I-CeuI, PI-PspI, PI-Sce, I-SceIV, I-CsmI, I-PanI, I-SceII, I-PpoI, I-SceIII, I-CreI, I-TevI, I-TevII, and I-TevIII. Their recognition sequences are known. See also U.S. Pat. No. 5,420,032; U.S. Pat. No. 6,833,252; Belfort et al (1997) Nucleic Acids Res. 25:3379-; Dujon et al (1989) Gene 82:115-118; Perler et al (1994) Nucleic Acids Res. 22:1125-1127; Jasin (1996) Trends Genet. 12:224-228; Gimble et al (1996) J. Mol. Biol. 263:163-180; Argast et al (1998) J. Mol. Biol. 280:345-353 and the New England Biolabs catalogue.

Although the cleavage specificity of most homing endonucleases is not absolute with respect to their recognition sites, the sites are of sufficient length that a single cleavage event per mammalian-sized genome can be obtained by expressing a homing endonuclease in a cell containing a single copy of its recognition site. It has also been reported that the specificity of homing endonucleases and meganucleases can be engineered to bind non-natural target sites. See, e.g., Chevalier et al (2002) Molec. Cell 10:895-905; Epinat et al (2003) Nucleic Acids Res. 31:2952-2962; Ashworth et al (2006) Nature 441:656-659; Paques et al (2007) Current Gene Therapy 7:49-66.

Delivery

The ZFNs described herein can be delivered to a target cell by any suitable means, including, for example, by injection of ZFN mRNA. See Hammerschmidt et al (1999) Methods Cell Biol. 59:87-115.

Methods of delivering proteins comprising zinc fingers are described, for example, in U.S. Pat. Nos. 6,453,242; 6,503,717; 6,534,261; 6,599,692; 6,607,882; 6,689,558; 6,824,978; 6,933,113; 6,979,539; 7,013,219 and 7,163,824, the disclosures of all of which are incorporated herein by reference in their entirety.

The ZFNs described herein can also be delivered using vectors containing sequences encoding one or more of the ZFNs. Any vector system may be used, including but not limited to plasmid vectors, retroviral vectors, lentiviral vectors, adenoviral vectors, poxvirus vectors, herpesvirus vectors, adeno-associated virus vectors, and the like. See also U.S. Pat. Nos. 6,534,261; 6,607,882; 6,824,978; 6,933,113; 6,979,539; 7,013,219 and 7,163,824, which are incorporated herein by reference in their entirety. Furthermore, it will be apparent that any of these vectors may comprise sequences encoding one or more ZFNs. Thus, when one or more pairs of ZFNs are introduced into a cell, the ZFNs may be carried on the same vector or on different vectors. When multiple vectors are used, each vector may comprise a sequence encoding one or more ZFNs.

Conventional viral and non-viral gene transfer methods can be used to introduce nucleic acids encoding engineered ZFPs into cells. Such methods can also be used to administer nucleic acids encoding ZFPs to cells in vitro. In certain embodiments, nucleic acids encoding ZFPs are administered in vivo or ex vivo.

Non-viral vector delivery systems include electroporation, lipofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Sonoporation using, for example, the Sonitron 2000 system (Rich-Mar) can also be used for delivery of nucleic acids. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. Additional exemplary nucleic acid delivery systems include those provided by Amaxa Biosystems (Cologne, Germany), Maxcyte, Inc. (Rockville, Maryland), BTX Molecular Delivery Systems (Holliston, MA) and Copernicus Therapeutics Inc. (see, for example, US 6,008,336). Lipofection is described in, for example, US 5,049,386, US 4,946,787 and US 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Delivery can be to cells (ex vivo administration) or to target tissues (in vivo administration). The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al, Cancer Gene Ther. 2:291-297 (1995); Behr et al, Bioconjugate Chem. 5:382-389 (1994); Remy et al, Bioconjugate Chem. 5:647-654 (1994); Gao et al, Gene Therapy 2:710-722 (1995); Ahmad et al, Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).

Additional methods of delivery include packaging the nucleic acids to be delivered into EnGeneIC delivery vehicles (EDVs). These EDVs are specifically delivered to target tissues using bispecific antibodies in which one arm of the antibody has specificity for the target tissue and the other arm has specificity for the EDV. The antibody brings the EDVs to the target cell surface, and the EDV is then taken into the cell by endocytosis. Once in the cell, the contents are released (see MacDiarmid et al (2009) Nature Biotechnology 27(7):643).

As indicated above, the disclosed methods and compositions can be used in any type of cell. Progeny, variants and derivatives of animal cells may also be used.

Applications

The disclosed methods and compositions can be used for genome editing of any one or more Rosa genes. In certain applications, the methods and compositions can be used to inactivate genomic Rosa sequences. In other applications, the methods and compositions allow for the generation of random mutations, including novel allelic forms of genes with different expression compared to unedited genes, or for the integration of humanized genes, which in turn allows for the generation of animal models. In other applications, the methods and compositions can be used to create random mutations at defined positions in a gene, allowing the identification or selection of animals that carry novel allelic forms of those genes. In other applications, the methods and compositions allow targeted integration of an exogenous (donor) sequence into any selected region of the genome, e.g., a mouse or rat Rosa gene. Regulatory sequences (e.g., promoters) can be integrated in a targeted manner at a site of interest. By "integration" is meant both physical insertion (e.g., into the genome of a host cell) and integration by copying of the donor sequence into the host cell genome via nucleic acid replication processes. Donor sequences can also comprise nucleic acids such as shRNAs, miRNAs, and the like. These small nucleic acid donors can be used to study their effects on genes of interest within the genome. Additional donor sequences of interest include human genes that encode proteins relevant to disease models. Non-limiting examples of such genes include human Factor VIII and human Factor IX. Insertion of these genes into the Rosa locus thus allows researchers to investigate these proteins in greater detail in vivo.

Genome editing (e.g., inactivation, integration, and/or targeted or random mutation) can be achieved, for example, by a single cleavage event, by cleavage followed by non-homologous end joining, by cleavage followed by homology-directed repair mechanisms, by cleavage followed by physical integration of a donor sequence, by cleavage at two sites followed by joining so as to delete the sequence between the two cleavage sites, by targeted recombination of a missense or nonsense codon into the coding region, by targeted recombination of an irrelevant sequence (i.e., a "stuffer" sequence) into the gene or its regulatory region so as to disrupt the gene or regulatory region, or by targeted recombination of a splice acceptor sequence into an intron to cause mis-splicing of the transcript. See U.S. Patent Publication Nos. 20030232410; 20050208489; 20050026157; 20050064474; 20060188987; 20060063231; and International Publication WO 07/014275, the disclosures of which are incorporated by reference in their entireties for all purposes.
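
As an illustration of the two-site deletion route listed above, the following minimal sketch joins the sequence upstream of a first cut to the sequence downstream of a second cut. The function name, the toy sequence, and the cut coordinates are hypothetical and are not taken from this disclosure. The model assumes perfectly precise rejoining, whereas end joining in cells frequently adds or removes a few bases at the junction.

```python
def delete_between_cuts(sequence: str, first_cut: int, second_cut: int) -> str:
    """Return the sequence left after joining the ends at two cut sites.

    Cut positions are 0-based indices of the base immediately 3' of each
    break, so sequence[first_cut:second_cut] is the excised fragment.
    """
    if not 0 <= first_cut <= second_cut <= len(sequence):
        raise ValueError("cut positions must satisfy 0 <= first <= second <= len")
    return sequence[:first_cut] + sequence[second_cut:]


# Hypothetical toy locus; lowercase marks the region between the two cuts.
locus = "AAAACCCC" + "ggggtttt" + "CCCCAAAA"
edited = delete_between_cuts(locus, 8, 16)
print(edited)
```

The excised fragment can be arbitrarily long in this model, which mirrors the point that the size of the deletion is set only by where the two cleavage sites are placed.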

There are a variety of applications for ZFN-mediated genome editing of Rosa genes. The methods and compositions described herein allow for the generation of models of human disease. For example, editing the p53 gene allows for the generation of a "cancer rat" that provides an animal model for studying cancer and testing cancer therapies.

Examples

Example 1: Construction of restriction fragment length polymorphism (RFLP) donor nucleic acids for targeted integration into the rRosa26 nucleic acid region of the rat genome

Plasmids were also constructed to target integration of NotI and PmeI RFLP sites into the rRosa26 region of the rat genome. The design and construction of the plasmids were as described above. The PCR primer pairs used to amplify the rRosa26 regions of homology are described in Table 1.

Zinc finger designs targeting the indicated target sites in rat Rosa26 are shown in Tables 2 and 3. Nucleotides in the target site that are contacted by the ZFP recognition helices are indicated in uppercase letters; non-contacted nucleotides are indicated in lowercase.

Table 2: Rat Rosa26 zinc finger designs

Table 3: Rat Rosa26 target sites
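
The uppercase/lowercase convention used for the target sites can be read mechanically. The sketch below is illustrative only: the function names and the example site string are hypothetical and do not reproduce any entry of the tables in this disclosure.

```python
def contacted_bases(target_site: str) -> list[tuple[int, str]]:
    """List (1-based position, base) for every contacted (uppercase) nucleotide."""
    return [(i, base) for i, base in enumerate(target_site, start=1) if base.isupper()]


def contacted_span(target_site: str) -> tuple[int, int]:
    """Return the 1-based first and last contacted positions in the site."""
    positions = [i for i, _ in contacted_bases(target_site)]
    if not positions:
        raise ValueError("no uppercase (contacted) nucleotides in target site")
    return positions[0], positions[-1]


# Hypothetical target site written in the convention described above.
site = "ctGGGAAGGCTGGGAAat"
core = "".join(base for _, base in contacted_bases(site))
print(core, contacted_span(site))
```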

Rat C6 cells were transfected with a GFP control or with each of the 8 pairs of ZFNs. DNA was prepared from the cells one day after transfection. ZFN cleavage was assayed with the Surveyor™ nuclease, using the products amplified with the respective primers, as described, for example, in U.S. Patent Publication Nos. 20080015164; 20080131962 and 20080159996. The results are presented in Figure 1. Arrows indicate that cleavage was found only in samples containing ZFN pairs, and not in the control samples, in which cells were transfected with ZFNs specific for GFP.

Example 2: Zinc finger nucleases specific for the mouse Rosa26 locus

Zinc finger designs targeting the indicated target sites in mouse Rosa26 are shown in Tables 4 and 5. Nucleotides in the target site that are contacted by the ZFP recognition helices are indicated in uppercase letters; non-contacted nucleotides are indicated in lowercase.

Table 4: mouse Rosa26 zinc finger design

Table 5: mouse Rosa26 target site

Cel-I analysis was performed as described above for ZFN pairs 18473/18477 and 18473/25096, and the percent NHEJ was as follows: 26.5% NHEJ with ZFN pair 18473/18477 and 35.70% NHEJ with ZFN pair 18473/25096.
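
This disclosure does not state how the NHEJ percentages were derived from the gel. One commonly used estimator for mismatch-nuclease assays is sketched below as an assumption, not as the method used here. If a fraction p of alleles is modified and the PCR products reanneal at random, the fraction of duplexes that remain uncleaved is approximately (1 - p)^2, so p = 1 - sqrt(1 - f), where f is the fraction of signal in the cleaved bands. The function and parameter names are hypothetical.

```python
import math


def percent_nhej(parent_band: float, cleaved_bands: float) -> float:
    """Estimate percent modified alleles from band intensities.

    parent_band   -- intensity of the uncleaved (full-length) band
    cleaved_bands -- summed intensity of the cleavage-product bands
    Assumes random reannealing, so uncleaved fraction = (1 - p)**2.
    """
    if parent_band < 0 or cleaved_bands < 0:
        raise ValueError("band intensities must be non-negative")
    total = parent_band + cleaved_bands
    if total == 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = cleaved_bands / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Hypothetical intensities: half of the signal is in the cleaved bands.
estimate = percent_nhej(parent_band=50.0, cleaved_bands=50.0)
print(round(estimate, 1))
```

Because of the square-root correction, the estimated percentage of modified alleles is lower than the raw fraction of cleaved signal whenever cleavage is partial.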

Example 3: targeted integration of donor polynucleotides into the mouse genome at the Rosa26 site

The Rosa donors were constructed by cloning PCR products made using the following oligonucleotides: for the 527 bp left arm, the oligonucleotide used for PCR was 5'-ggctcgagtgagtcatcagacttctaagatcagg-3' (SEQ ID NO: 31); for the 413 bp left arm donors, 5'-ggctcgagttttgataaggctgcagaag-3' (SEQ ID NO:32) was used in conjunction with the reverse primer 5'-ctgaattcgaatgggcgggagtcttctgggca-3' (SEQ ID NO: 33).

For the 640 bp right arm, the oligonucleotide used for PCR was 5'-ccaagcttggaggtaggtggggtgagg-3' (SEQ ID NO: 34); for the 200 bp arm, 5'-ccaagcttagtcgctctgagttgttatc-3' (SEQ ID NO: 35); and for the 100 bp arm, 5'-ccaagctttctgggagttctctgctgcc-3' (SEQ ID NO:36), in conjunction with the reverse primer 5'-cattcgaattcagaaagactggagttgcagatc-3' (SEQ ID NO: 37). Individual arm amplicons were joined by fusion PCR and cloned to generate donors with varying homology arms. Neuro2a cells (200,000) were then co-transfected with 400 ng each of SBS 18473 and 18477 along with 2 μg of the indicated donor in solution SF, using the Amaxa-Shuttle Neuro2a high-efficiency protocol.
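
The primers listed above carry short 5' tails, and a simple scan shows which common restriction sites they contain. The sketch below searches each primer for the recognition sequences of XhoI (CTCGAG), EcoRI (GAATTC), and HindIII (AAGCTT). The choice of these three enzymes is an assumption made for illustration, since this disclosure does not name the enzymes used for cloning, and the function name is hypothetical.

```python
# Recognition sequences of three common restriction enzymes.
SITES = {"XhoI": "CTCGAG", "EcoRI": "GAATTC", "HindIII": "AAGCTT"}

# Primer sequences as given in Example 3.
PRIMERS = {
    "SEQ ID NO:31": "ggctcgagtgagtcatcagacttctaagatcagg",
    "SEQ ID NO:32": "ggctcgagttttgataaggctgcagaag",
    "SEQ ID NO:33": "ctgaattcgaatgggcgggagtcttctgggca",
    "SEQ ID NO:34": "ccaagcttggaggtaggtggggtgagg",
    "SEQ ID NO:35": "ccaagcttagtcgctctgagttgttatc",
    "SEQ ID NO:36": "ccaagctttctgggagttctctgctgcc",
    "SEQ ID NO:37": "cattcgaattcagaaagactggagttgcagatc",
}


def find_sites(primer: str) -> dict[str, int]:
    """Map enzyme name to the 0-based index of its first site in the primer."""
    seq = primer.upper()
    hits = {}
    for enzyme, site in SITES.items():
        position = seq.find(site)
        if position != -1:
            hits[enzyme] = position
    return hits


for name, primer in PRIMERS.items():
    print(name, find_sites(primer))
```

Under this scan the left-arm forward primers carry an XhoI site, the right-arm forward primers carry a HindIII site, and both reverse primers carry an EcoRI site, which is consistent with the EcoRI digest used to score integration in this example.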

Genomic DNA was harvested 72 hours after transfection, and 100 ng was used for PCR with 5'-cccagctacagcctcgattt-3', 5'-cacaaatggcgtgttttggt-3' and 5 μCi each of 32P-dATP and 32P-dCTP per sample, at an annealing temperature of 68 °C with a two-minute extension at 72 °C, for 28 cycles. After purification on a G-50 column, 10 μL of each 50 μL reaction was digested with EcoRI at 37 °C for two hours and loaded on a 10% polyacrylamide gel.

As shown in Figure 2, donor nucleotides were inserted into the Rosa locus at the indicated frequencies.

All patents, patent applications, and patent publications mentioned herein are hereby incorporated by reference in their entirety.

Although the disclosure has been provided by way of illustration and example in some detail for purposes of clarity of understanding, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit or scope of the disclosure. Accordingly, the foregoing description and examples should not be construed as limiting.

Claims

1. A pair of fusion proteins comprising a first and a second fusion protein, each fusion protein comprising a FokI nuclease domain and an engineered zinc finger domain, wherein the engineered zinc finger domain binds to a target site in a rat Rosa intron, the Rosa intron comprising the sequences of SEQ ID NOs: 75-90, wherein each zinc finger domain comprises zinc finger DNA-binding domains designated F1 to F4, F1 to F5, or F1 to F6, ordered from N-terminus to C-terminus, wherein at least one zinc finger DNA-binding domain of the pair of fusion proteins comprises RSANLTR (SEQ ID NO:42), QNAHRKT (SEQ ID NO:22), or TSSNLSR (SEQ ID NO:71), and wherein the recognition regions of the pair of zinc finger domains are selected from the group consisting of:

(i) the first zinc finger domain of the fusion protein pair is F1: DRSDLSR (SEQ ID NO:38), F2: RSDDLTR (SEQ ID NO:39), F3: TSGHLSR (SEQ ID NO:40), F4: RSDNLSV (SEQ ID NO:41), and F5: RSANLTR (SEQ ID NO:42); and the second zinc finger domain is F1: QSDHLTK (SEQ ID NO:43), F2: NSSNLSR (SEQ ID NO:44), F3: RSDHLTK (SEQ ID NO:45), F4: NSDHLSR (SEQ ID NO:46), and F5: RSDHLSR (SEQ ID NO:47);

(ii) the first zinc finger domain of the fusion protein pair is F1: RSDHLSE (SEQ ID NO:48), F2: RSAALAR (SEQ ID NO:49), F3: RSDHLST (SEQ ID NO:50), F4: QNAHRIT (SEQ ID NO:51), and F5: RSAVLSE (SEQ ID NO:52); and the second zinc finger domain is F1: QSGDLTR (SEQ ID NO:17), F2: TSGSLTR (SEQ ID NO:18), F3: RSANLTR (SEQ ID NO:42), F4: RSDHLTK (SEQ ID NO:45), and F5: NSDHLSR (SEQ ID NO:46);

(iii) the first zinc finger domain of the fusion protein pair is F1: RSANLTR (SEQ ID NO:42), F2: QSGDLTR (SEQ ID NO:17), F3: QSGDLTR (SEQ ID NO:17), F4: RSANLAR (SEQ ID NO:53), and F5: RSDNLRE (SEQ ID NO:54); and the second zinc finger domain is F1: RSDHLST (SEQ ID NO:50), F2: DNRDRIK (SEQ ID NO:55), F3: RSDTLSE (SEQ ID NO:56), F4: QSSHLAR (SEQ ID NO:57), and F5: QNAHRKT (SEQ ID NO:22);

(iv) the first zinc finger domain of the fusion protein pair is F1: QSGDLTR (SEQ ID NO:17), F2: QSGDLTR (SEQ ID NO:17), F3: RSDNLTR (SEQ ID NO:58), F4: RSDNLSE (SEQ ID NO:21), and F5: QNAHRKT (SEQ ID NO:22); and the second zinc finger domain is F1: DRSDLSR (SEQ ID NO:38), F2: RSDHLST (SEQ ID NO:50), F3: DNRDRIK (SEQ ID NO:55), F4: RSDTLSE (SEQ ID NO:56), and F5: QSSHLAR (SEQ ID NO:57);

(v) the first zinc finger domain of the fusion protein pair is F1: QSGDLTR (SEQ ID NO:17), F2: RSDNLTR (SEQ ID NO:58), F3: RSDNLSE (SEQ ID NO:21), F4: QNAHRKT (SEQ ID NO:22), F5: RSDHLSE (SEQ ID NO:48), and F6: TSSTRKT (SEQ ID NO:59); and the second zinc finger domain is F1: TSGNLTR (SEQ ID NO:60), F2: QSGNLAR (SEQ ID NO:61), F3: RSDALSV (SEQ ID NO:62), F4: DSSRHRTR (SEQ ID NO:63), and F5: RSDVLSE (SEQ ID NO:64);

(vi) the first zinc finger domain of the fusion protein pair is F1: RSDNLSE (SEQ ID NO:21), F2: QNAHRKT (SEQ ID NO:22), F3: RSDHLSE (SEQ ID NO:48), F4: TSSTRKT (SEQ ID NO:59), and F5: TSGHLSR (SEQ ID NO:40); and the second zinc finger domain is F1: TSGNLTR (SEQ ID NO:60), F2: QSGNLAR (SEQ ID NO:61), F3: RSDALSV (SEQ ID NO:62), and F4: DSSRTR (SEQ ID NO:63);

(vii) the first zinc finger domain of the fusion protein pair is F1: QRSNLVR (SEQ ID NO:65), F2: RSDHLTQ (SEQ ID NO:66), F3: QSGHLQR (SEQ ID NO:67), and F4: DRSHLAR (SEQ ID NO:68); and the second zinc finger domain is F1: RSDVLSE (SEQ ID NO:64), F2: QRNHRTT (SEQ ID NO:69), F3: TKRSLIE (SEQ ID NO:70), F4: TSSNLSR (SEQ ID NO:71), F5: RSDDLSK (SEQ ID NO:25), and F6: DNRDRIK (SEQ ID NO:55);

(viii) the first zinc finger domain of the fusion protein pair is F1: RSDHLSA (SEQ ID NO:72), F2: QSGHLSR (SEQ ID NO:24), F3: RSDHLSR (SEQ ID NO:47), F4: QNDNRIK (SEQ ID NO:73), and F5: QSGNLAR (SEQ ID NO:61); and the second zinc finger domain is F1: NNRDLIN (SEQ ID NO:74), F2: TSSNLSR (SEQ ID NO:71), F3: RSDVLSE (SEQ ID NO:64), F4: QRNHRTT (SEQ ID NO:69), F5: TKRSLIE (SEQ ID NO:70), and F6: TSSNLSR (SEQ ID NO:71).

2. The pair of fusion proteins of claim 1, wherein the FokI domain is naturally occurring or engineered.

3. A polynucleotide encoding a pair of fusion proteins according to claim 1 or 2.

4. A cell comprising a pair of fusion proteins according to claim 1 or 2 or a polynucleotide according to claim 3, which cell is not a human embryonic cell.

5. The cell of claim 4, wherein the cell is an embryonic cell.

6. A composition comprising a pair of fusion proteins according to claim 1 or 2 or a polynucleotide according to claim 3 and a pharmaceutically acceptable excipient.

7. A method for cleaving one or more Rosa genes in a cell, the method comprising:

introducing one or more fusion protein pairs according to claim 1 or 2 or one or more polynucleotides according to claim 3 into said cell such that one or more Rosa genes are cleaved.

8. A method of introducing an exogenous polynucleotide sequence into the genome of a cell, the method comprising:

cleaving one or more Rosa genes according to the method of claim 7; and

contacting the cell with an exogenous polynucleotide sequence;

wherein cleavage of the one or more genes stimulates integration of the exogenous polynucleotide sequence into the genome by homologous recombination.

9. The method of claim 8, wherein the exogenous polynucleotide sequence is physically integrated into the genome.

10. The method of claim 9, wherein the exogenous polynucleotide sequence is integrated into the genome via a nucleic acid replication process.

11. The method of claim 9, wherein the exogenous polynucleotide sequence is integrated into the genome via homology-independent targeted integration.

12. A method of modifying a Rosa gene sequence in the genome of a cell, the method comprising:

cleaving one or more Rosa genes according to the method of claim 8, wherein

(i) a first ZFN cleaves at a first cleavage site, and a second ZFN cleaves at a second cleavage site;

(ii) the Rosa gene sequence is located between the first cleavage site and the second cleavage site;

(iii) cleavage of the first and second cleavage sites results in modification of the gene sequence by non-homologous end joining or homology-directed repair.

13. The method of claim 12, wherein the modification comprises a deletion.

14. The method of claim 13, wherein the modification comprises insertion of an exogenous sequence.

15. A method of producing a transgenic animal, the method comprising:

modifying a Rosa gene sequence in an embryonic cell according to any one of claims 12 to 14; and

allowing the embryo to develop into an animal.

16. The method of claim 15, wherein the modification comprises one or more random mutations at specified positions.

17. The method of claim 15, wherein the modification comprises insertion of a humanized gene.

18. The method of claim 17, wherein the humanized gene is associated with drug metabolism.

19. The method of any one of claims 15 to 18, wherein the animal is a sexually mature animal and the modified gene sequence is present in at least a portion of a gamete of the sexually mature animal.

20. A method of generating one or more heritable mutant alleles in at least one target Rosa locus, the method comprising:

producing a transgenic animal according to the method of any one of claims 15-17, wherein the embryo is raised to sexual maturity; and

allowing said sexually mature animal to produce offspring; wherein at least some of the progeny comprise the mutant allele.

21. A kit comprising a pair of fusion proteins according to claim 1 or 2 or a polynucleotide according to claim 3.

22. The kit of claim 21, further comprising an additional component selected from the group consisting of: one or more exogenous sequences, instructions for use, and combinations thereof.