GB2360285A

GB2360285A - Design and synthesis of Zinc Finger Proteins

Info

Publication number: GB2360285A
Application number: GB0111280A
Authority: GB
Inventors: Stephen P Eisenberg; Casey Christopher Case; George Norbert Cox III; Andrew Jamieson; Edward J Rebar
Original assignee: Sangamo Biosciences Inc
Current assignee: Sangamo Therapeutics Inc
Priority date: 1999-01-12
Filing date: 2000-01-12
Publication date: 2001-09-19
Anticipated expiration: 2020-01-12
Also published as: GB2360285B; GB0111280D0

Abstract

There is described a method for designing Zinc Finger Proteins (ZFPs) which essentially comprises (a) providing a database comprising a plurality of designations of ZFPs and sub-designations of each of three fingers, and a corresponding nucleic acid with triplets bound by ZFP in contiguous order, (b) providing triplets in contiguous order as a target for the design of the ZFP, (c) identifying ZFP fingers that bind specifically to triplets, (d) outputting the designations and sub-designations of the identified ZFPs and (e) synthesizing proteins on this basis. A sub-score is assigned to each triplet from the correspondence regime and the sub-scores combined. A target with a high score is preferably selected as the target site for the design of the ZFP (step (c)).

Description

SELECTION OF SITES FOR TARGETING BY ZINC FINGER PROTEINS AND METHODS OF DESIGNING ZINC FINGER PROTEINS TO BIND TO PRESELECTED SITES

TECHNICAL FIELD

The invention resides in the technical fields of bioinformatics and protein engineering.

BACKGROUND

Zinc finger proteins (ZFPs) are proteins that can bind to DNA in a sequence-specific manner. Zinc fingers were first identified in the transcription factor TFIIIA from the oocytes of the African clawed toad, Xenopus laevis. An exemplary motif characterizing one class of these proteins (C2H2 class) is -Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His (where X is any amino acid). A single finger domain is about 30 amino acids in length, and several structural studies have demonstrated that it contains an alpha helix containing the two invariant histidine residues and two invariant cysteine residues in a beta turn co-ordinated through zinc. To date, over 10,000 zinc finger sequences have been identified in several thousand known or putative transcription factors. Zinc finger domains are involved not only in DNA recognition, but also in RNA binding and in protein-protein binding. Current estimates are that this class of molecules will constitute about 2% of all human genes.

The x-ray crystal structure of Zif268, a three-finger domain from a murine transcription factor, has been solved in complex with a cognate DNA sequence and shows that each finger can be superimposed on the next by a periodic rotation. The structure suggests that each finger interacts independently with DNA over 3 base-pair intervals, with side-chains at positions -1, 2, 3 and 6 on each recognition helix making contacts with their respective DNA triplet subsites. The amino terminus of Zif268 is situated at the 3' end of the DNA strand with which it makes most contacts. Recent results have indicated that some zinc fingers can bind to a fourth base in a target segment.

If the strand with which a zinc finger protein makes most contacts is designated the target strand, some zinc finger proteins bind to a three base triplet in the target strand and a fourth base on the nontarget strand. The fourth base is complementary to the base immediately 3' of the three base subsite.
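By way of illustration only, the relationship between a triplet subsite and its possible fourth base can be sketched in a few lines of Python. The function name `fourth_base` and the convention that the target strand is supplied as a 5'-to-3' string with a 0-based triplet start index are assumptions made for this sketch, not part of the described methods.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def fourth_base(target_strand, triplet_start):
    """Return the nontarget-strand base that a finger may additionally contact.

    target_strand: target strand written 5'->3' (assumed convention).
    triplet_start: 0-based index of the first base of the three-base subsite.
    The fourth base is the complement of the base immediately 3' of the triplet.
    Returns None when the triplet sits at the 3' end of the supplied strand.
    """
    strand = target_strand.upper()
    next_index = triplet_start + 3
    if next_index >= len(strand):
        return None
    return COMPLEMENT[strand[next_index]]


if __name__ == "__main__":
    print(fourth_base("GCGTGGGCG", 0))
```

In this sketch the triplet GCG at index 0 is followed by T on the target strand, so the fourth base on the nontarget strand is its complement.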

The structure of the Zif268-DNA complex also suggested that the DNA sequence specificity of a zinc finger protein might be altered by making amino acid substitutions at the four helix positions (-1, 2, 3 and 6) on each of the zinc finger recognition helices. Phage display experiments using zinc finger combinatorial libraries to test this observation were published in a series of papers in 1994 (Rebar et al., Science 263, 671-673 (1994); Jamieson et al., Biochemistry 33, 5689-5695 (1994); Choo et al., PNAS 91, 11163-11167 (1994)). Combinatorial libraries were constructed with randomized side-chains in either the first or middle finger of Zif268 and then used to select for an altered Zif268 binding site in which the appropriate DNA subsite was replaced by an altered DNA triplet. Further, correlation between the nature of introduced mutations and the resulting alteration in binding specificity gave rise to a partial set of substitution rules for design of ZFPs with altered binding specificity.

Greisman & Pabo, Science 275, 657-661 (1997) discuss an elaboration of the phage display method in which each finger of Zif268 was successively randomized and selected for binding to a new triplet sequence. This paper reported selection of ZFPs for a nuclear hormone response element, a p53 target site and a TATA box sequence.

A number of papers have reported attempts to produce ZFPs to modulate particular target sites. For example, Choo et al., Nature 372, 645 (1994), report an attempt to design a ZFP that would repress expression of a bcr-abl oncogene. The target segment to which the ZFPs would bind was a nine base sequence 5'GCA GAA GCC3' chosen to overlap the junction created by a specific oncogenic translocation fusing the genes encoding bcr and abl. The intention was that a ZFP specific to this target site would bind to the oncogene without binding to the abl or bcr component genes. The authors used phage display to screen a mini-library of variant ZFPs for binding to this target segment. A variant ZFP thus isolated was then reported to repress expression of a stably transfected bcr-abl construct in a cell line.

Pomerantz et al., Science 267, 93-96 (1995) reported an attempt to design a novel DNA binding protein by fusing two fingers from Zif268 with a homeodomain from Oct-1. The hybrid protein was then fused with a transcriptional activator for expression as a chimeric protein. The chimeric protein was reported to bind a target site representing a hybrid of the subsites of its two components. The authors then constructed a reporter vector containing a luciferase gene operably linked to a promoter and a hybrid site for the chimeric DNA binding protein in proximity to the promoter. The authors reported that their chimeric DNA binding protein could activate expression of the luciferase gene.

Liu et al., PNAS 94, 5525-5530 (1997) report forming a composite zinc finger protein by using a peptide spacer to link two component zinc finger proteins each having three fingers. The composite protein was then further linked to a transcriptional activation domain. It was reported that the resulting chimeric protein bound to a target site formed from the target segments bound by the two component zinc finger proteins. It was further reported that the chimeric zinc finger protein could activate transcription of a reporter gene when its target site was inserted into a reporter plasmid in proximity to a promoter operably linked to the reporter. Choo et al., WO 98/53058, WO 98/53059, and WO 98/53060 (1998) discuss selection of zinc finger proteins to bind to a target site within the HIV Tat gene.

Choo et al. also discuss selection of a zinc finger protein to bind to a target site encompassing a site of a common mutation in the oncogene ras. The target site within ras was thus constrained by the position of the mutation.

None of the above studies provided criteria for systematically evaluating the respective merits of the different potential target sites within a candidate gene. The phage display studies by Rebar et al., supra, Jamieson et al., supra and Choo et al., PNAS (1994), supra, all focused on alterations of the natural Zif268 binding site, 5'GCG TGG GCGc3', and were not made with reference to a predetermined target gene. The selection of target site by Choo et al., Nature (1994), supra, was constrained solely by the intent that the site overlap the interface between bcr and abl segments and did not involve a comparison of different potential target sites. Likewise, Greisman & Pabo chose certain target sites because of their known regulatory roles and did not consider the relative merits of different potential target segments within a preselected target gene. Similarly, the choice of target site within ras by Choo et al. (1998), supra, was constrained by the position of a mutation. No criterion is provided for the selection by Choo et al. (1998) of a target site in HIV Tat. Finally, both Pomerantz et al., supra and Liu et al., supra constructed artificial hybrid target sites for composite zinc fingers and then inserted the target sites into reporter constructs.

SUMMARY OF THE INVENTION

The invention provides methods of selecting a target site within a target sequence for targeting by a zinc finger protein. Some such methods comprise providing a target nucleic acid to be targeted by a zinc finger protein and outputting a target site within the target nucleic acid comprising 5'NNx aNy bNzc3'. Each of (x, a), (y, b) and (z, c) is (N, N) or (G, K), provided at least one of (x, a), (y, b) and (z, c) is (G, K). N and K are IUPAC-IUB ambiguity codes. In some methods, a plurality of segments within the target nucleic acid are selected and a subset of the plurality of segments comprising 5'NNx aNy bNzc3' is output. Typically the target nucleic acid comprises a target gene.

In some methods, at least two of (x, a), (y, b) and (z, c) are (G, K). In some methods, all three of (x, a), (y, b) and (z, c) are (G, K). Some methods further comprise identifying a second segment of the gene comprising 5'NNx aNy bNzc3', wherein each of (x, a), (y, b) and (z, c) is (N, N) or (G, K); at least one of (x, a), (y, b) and (z, c) is (G, K); and N and K are IUPAC-IUB ambiguity codes. In some methods, in the second segment at least two of (x, a), (y, b) and (z, c) are (G, K). In some methods, in the second segment all three of (x, a), (y, b) and (z, c) are (G, K). In some methods, the first and second segments are separated by fewer than 5 bases in the target site.

Some methods further comprise synthesizing a zinc finger protein comprising first, second and third fingers that bind to the bNz, aNy and NNx triplets respectively. In some such methods, the synthesizing step comprises synthesizing a first zinc finger protein comprising three zinc fingers that respectively bind to the NNx, aNy and bNz triplets in the target segment and a second three fingers that respectively bind to the NNx, aNy and bNz triplets in the second target segment. In some methods, each of the first, second and third fingers is selected or designed independently. In some methods, a finger is designed from a database containing designations of zinc finger proteins, subdesignations of finger components, and nucleic acid sequences bound by the zinc finger proteins. In some methods, a finger is selected by screening variants of a zinc finger binding protein for specific binding to the target site to identify a variant that binds to the target site.

Some methods further comprise contacting a sample containing the target nucleic acid with the zinc finger protein, whereby the zinc finger protein binds to the target site, revealing the presence of the target nucleic acid or a particular allelic form thereof. In some methods, a sample containing the target nucleic acid is contacted with the zinc finger protein, whereby the zinc finger protein binds to the target site, thereby modulating expression of the target nucleic acid.

In some methods, the target site occurs in a coding region. In some methods, the target site occurs within or proximal to a promoter, enhancer, or transcription start site. In some methods, the target site occurs outside a promoter, regulatory sequence or polymorphic site within the target nucleic acid.

In another aspect, the invention provides alternate methods for selecting a target site within a polynucleotide for targeting by a zinc finger protein. These methods comprise providing a polynucleotide sequence and selecting a potential target site within the polynucleotide sequence, the potential target site comprising contiguous first, second and third triplets of bases at first, second and third positions in the potential target site. A plurality of subscores are then determined by applying a correspondence regime between triplets and triplet position in a sequence of three contiguous triplets, wherein each triplet has first, second and third corresponding positions, and each combination of triplet and triplet position has a particular subscore. A score is then calculated for the potential target site by combining subscores for the first, second, and third triplets. The selecting, determining and calculating steps are then repeated at least once on a further potential target site comprising first, second and third triplets at first, second and third positions of the further potential target site to determine a further score. Output is then provided of at least one potential target site with its score. In some methods, output is provided of the potential target site with the highest score. In some methods, output is provided of the n potential target sites with the highest scores, and the method further comprises providing user input of a value for n. In some methods, the subscores are combined by forming the product of the subscores. In some methods, the correspondence regime comprises 64 triplets, each having first, second, and third corresponding positions, and 192 subscores.
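By way of illustration only, the scoring steps described above can be sketched in Python. The dictionary layout, the function names, the placeholder subscore values, and the convention that the triplets of a nine-base window are assigned to positions 1, 2 and 3 in the order they appear in the supplied string are all assumptions made for this sketch.

```python
from itertools import product
from math import prod

BASES = "ACGT"
# All 64 possible triplets.
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]


def make_regime(default=1.0, overrides=None):
    """Build a table of 64 triplets x 3 positions = 192 subscores.

    overrides maps (triplet, position) to a subscore; every other
    combination receives the placeholder default.
    """
    regime = {(t, pos): default for t in TRIPLETS for pos in (1, 2, 3)}
    regime.update(overrides or {})
    return regime


def score_site(site, regime):
    """Score a nine-base site as the product of its three subscores."""
    if len(site) != 9:
        raise ValueError("expected a nine-base site")
    triplets = [site[i:i + 3] for i in (0, 3, 6)]
    return prod(regime[(t, pos)] for pos, t in enumerate(triplets, start=1))


def top_sites(sequence, regime, n=1):
    """Score every nine-base window and return the n highest as (score, start, site)."""
    scored = [
        (score_site(sequence[i:i + 9], regime), i, sequence[i:i + 9])
        for i in range(len(sequence) - 8)
    ]
    scored.sort(key=lambda row: (-row[0], row[1]))
    return scored[:n]


if __name__ == "__main__":
    regime = make_regime(overrides={("GCG", 1): 10, ("TGG", 2): 8, ("GCG", 3): 10})
    print(top_sites("AGCGTGGGCGA", regime, n=2))
```

The sketch expects an uppercase sequence containing only A, C, G and T. The value of `n` corresponds to the user input mentioned above.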

In some methods, the subscores in the correspondence regime are determined by assigning a first value as the subscore of a subset of triplets and corresponding positions, for each of which there is an existing zinc finger protein that comprises a finger that specifically binds to the triplet from the same position in the existing zinc finger protein as the corresponding position of the triplet in the correspondence regime; assigning a second value as the subscore of a subset of triplets and corresponding positions, for each of which there is an existing zinc finger protein that comprises a finger that specifically binds to the triplet from a different position in the existing zinc finger protein than the corresponding position of the triplet in the correspondence regime; and assigning a third value as the subscore of a subset of triplets and corresponding positions for which there is no existing zinc finger protein comprising a finger that specifically binds to the triplet. In some methods, a context parameter is combined with the subscore of at least one of the first, second and third triplets to give a scaled subscore of the at least one triplet. In some methods, the context parameter is combined with the subscore when the target site comprises a base sequence 5'NNGK3', wherein NNG is the at least one triplet.
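By way of illustration only, the three-value assignment and the context parameter can be sketched in Python. The numeric values (10, 8, 1 and a multiplier of 1.25), the choice of multiplication as the way the context parameter is combined, and all names are hypothetical placeholders chosen for this sketch; the text above does not fix them.

```python
from itertools import product

TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]


def build_regime(known, same_value=10.0, other_value=8.0, none_value=1.0):
    """Assign one of three subscores to each (triplet, position) combination.

    known: set of (triplet, position) pairs for which an existing finger
    binds that triplet from that position (hypothetical input format).
    """
    known_triplets = {triplet for triplet, _ in known}
    regime = {}
    for triplet in TRIPLETS:
        for pos in (1, 2, 3):
            if (triplet, pos) in known:
                regime[(triplet, pos)] = same_value    # same finger position
            elif triplet in known_triplets:
                regime[(triplet, pos)] = other_value   # different finger position
            else:
                regime[(triplet, pos)] = none_value    # no existing finger
    return regime


def scaled_subscore(regime, triplet, pos, next_base, context=1.25):
    """Scale the subscore when the triplet and following base form 5'NNGK3'.

    triplet is written 5'->3'; next_base is the base immediately 3' of it,
    or None if there is none. K means G or T.
    """
    subscore = regime[(triplet, pos)]
    if triplet[2] == "G" and next_base in ("G", "T"):
        subscore *= context
    return subscore


if __name__ == "__main__":
    regime = build_regime({("GCG", 1)})
    print(scaled_subscore(regime, "GCG", 1, "T"))
```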

In another aspect, the invention provides methods of designing a zinc finger protein. Such methods use a database comprising designations for a plurality of zinc finger proteins, each protein comprising at least first, second and third fingers, and subdesignations for each of the three fingers of each of the zinc finger proteins; and a corresponding nucleic acid sequence for each zinc finger protein, each sequence comprising at least first, second and third triplets specifically bound by the at least first, second and third fingers respectively in each zinc finger protein, the first, second and third triplets being arranged in the nucleic acid sequence (3'-5') in the same respective order as the first, second and third fingers are arranged in the zinc finger protein (N-terminal to C-terminal). A target site is provided for design of a zinc finger protein, the target site comprising contiguous first, second and third triplets in a 3'-5' order. For the first, second and third triplet in the target site, first, second and third sets of zinc finger protein(s) in the database are identified, the first set comprising zinc finger protein(s) comprising a finger specifically binding to the first triplet in the target site, the second set comprising zinc finger protein(s) comprising a finger specifically binding to the second triplet in the target site, the third set comprising zinc finger protein(s) comprising a finger specifically binding to the third triplet in the target site. Designations and subdesignations of the zinc finger proteins in the first, second, and third sets so identified are then output.

Some methods further comprise producing a zinc finger protein that binds to the target site comprising a first finger from a zinc finger protein from the first set, a second finger from a zinc finger protein from the second set, and a third finger from a zinc finger protein from the third set. Some methods further comprise identifying subsets of the first, second and third sets. The subset of the first set comprises zinc finger protein(s) comprising a finger that specifically binds to the first triplet in the target site from the first finger position of a zinc finger protein in the database. The subset of the second set comprises zinc finger protein(s) comprising a finger that specifically binds to the second triplet in the target site from the second finger position in a zinc finger protein in the database. The subset of the third set comprises zinc finger protein(s) comprising a finger that specifically binds to the third triplet in the target site from a third finger position in a zinc finger protein in the database. Designations and subdesignations of the subsets of the first, second and third sets are output. A zinc finger protein comprising a first finger from the first subset, a second finger from the second subset, and a third finger from the third subset is then produced. In some of the above methods of design, the target site is provided by user input. In some methods, the target site is provided by one of the target site selection methods described above.
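By way of illustration only, the lookup of sets and position-matched subsets can be sketched in Python. The record layout, the function name, and the example designations ("ZFP-A", "A1" and so on) are hypothetical placeholders, not entries from any actual database. The target triplets are assumed to be supplied in the same order as the fingers that are to bind them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZFPRecord:
    name: str        # designation of the zinc finger protein
    fingers: tuple   # subdesignations of its fingers, N- to C-terminal
    triplets: tuple  # triplet bound by each finger, in the same order


def find_candidates(database, target_triplets):
    """For each target triplet, list fingers in the database that bind it.

    'set' holds every matching finger regardless of its position in the
    source protein; 'subset' holds only fingers that bind the triplet from
    the same finger position as the triplet occupies in the target site.
    Each hit is (protein designation, finger position label, subdesignation).
    """
    results = []
    for pos, triplet in enumerate(target_triplets):
        any_position, same_position = [], []
        for record in database:
            for finger_pos, (finger, bound) in enumerate(
                zip(record.fingers, record.triplets)
            ):
                if bound == triplet:
                    hit = (record.name, f"F{finger_pos + 1}", finger)
                    any_position.append(hit)
                    if finger_pos == pos:
                        same_position.append(hit)
        results.append(
            {"triplet": triplet, "set": any_position, "subset": same_position}
        )
    return results


if __name__ == "__main__":
    db = [
        ZFPRecord("ZFP-A", ("A1", "A2", "A3"), ("GCG", "TGG", "GCG")),
        ZFPRecord("ZFP-B", ("B1", "B2", "B3"), ("TGG", "GAA", "GCA")),
    ]
    for entry in find_candidates(db, ("GCG", "TGG", "GCA")):
        print(entry)
```

A designed protein would then take one finger from each of the three lists, preferring the position-matched subset where it is not empty.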

The invention further provides computer program products for implementing any of the methods described above. One computer program product implements methods for selecting a target site within a polynucleotide for targeting by a zinc finger protein. Such a product comprises (a) code for providing a polynucleotide sequence; (b) code for selecting a potential target site within the polynucleotide sequence, the potential target site comprising first, second and third triplets of bases at first, second and third positions in the potential target site; (c) code for calculating a score for the potential target site from a combination of subscores for the first, second, and third triplets, the subscores being obtained from a correspondence regime between triplets and triplet position, wherein each triplet has first, second and third corresponding positions, and each corresponding triplet and position has a particular subscore; (d) code for repeating steps (b) and (c) at least once on a further potential target site comprising first, second and third triplets at first, second and third positions of the further potential target site to determine a further score; (e) code for providing output of at least one of the potential target sites with its score; and (f) a computer readable storage medium for holding the codes.

The invention further provides computer systems for implementing any of the methods described above. One such system for selecting a target site within a polynucleotide for targeting by a zinc finger protein comprises (a) a memory; (b) a system bus; and (c) a processor. The processor is operatively disposed to: (1) provide or receive a polynucleotide sequence; (2) select a potential target site within the polynucleotide sequence, the potential target site comprising first, second and third triplets of bases at first, second and third positions in the potential target site; (3) calculate a score for the potential target site from a combination of subscores for the first, second, and third triplets, the subscores being obtained from a correspondence regime between triplets and triplet position, wherein each triplet has first, second and third corresponding positions, and each corresponding triplet and position has a particular subscore; (4) repeat steps (2) and (3) at least once on a further potential target site comprising first, second and third triplets at first, second and third positions of the further potential target site to determine a further score; and (5) provide output of at least one of the potential target sites with its score.

A further computer program product for producing a zinc finger protein comprises: (a) code for providing a database comprising designations for a plurality of zinc finger proteins, each protein comprising at least first, second and third fingers; subdesignations for each of the three fingers of each of the zinc finger proteins; and a corresponding nucleic acid sequence for each zinc finger protein, each sequence comprising at least first, second and third triplets specifically bound by the at least first, second and third fingers respectively in each zinc finger protein, the first, second and third triplets being arranged in the nucleic acid sequence (3'-5') in the same respective order as the first, second and third fingers are arranged in the zinc finger protein (N-terminus to C-terminus); (b) code for providing a target site for design of a zinc finger protein, the target site comprising at least first, second and third triplets; (c) for the first, second and third triplet in the target site, code for identifying first, second and third sets of zinc finger protein(s) in the database, the first set comprising zinc finger protein(s) comprising a finger specifically binding to the first triplet in the target site, the second set comprising a finger specifically binding to the second triplet in the target site, the third set comprising a finger specifically binding to the third triplet in the target site; (d) code for outputting designations and subdesignations of the zinc finger proteins in the first, second, and third sets identified in step (c); and (e) a computer readable storage medium for holding the codes.

The invention further provides a system for producing a zinc finger protein. The system comprises (a) a memory; (b) a system bus; and (c) a processor.

The processor is operatively disposed to: (1) provide a database comprising designations for a plurality of zinc finger proteins, each protein comprising at least first, second and third fingers; subdesignations for each of the three fingers of each of the zinc finger proteins; and a corresponding nucleic acid sequence for each zinc finger protein, each sequence comprising at least first, second and third triplets specifically bound by the at least first, second and third fingers respectively in each zinc finger protein, the first, second and third triplets being arranged in the nucleic acid sequence (3'-5') in the same respective order as the first, second and third fingers are arranged in the zinc finger protein (N-terminus to C-terminus); (2) provide a target site for design of a zinc finger protein, the target site comprising at least first, second and third triplets; (3) for the first, second and third triplet in the target site, identify first, second and third sets of zinc finger protein(s) in the database, the first set comprising zinc finger protein(s) comprising a finger specifically binding to the first triplet in the target site, the second set comprising a finger specifically binding to the second triplet in the target site, the third set comprising a finger specifically binding to the third triplet in the target site; and (4) output designations and subdesignations of the zinc finger proteins in the first, second, and third sets identified in step (3).

BRIEF DESCRIPTION OF THE FIGURES

Fig. 1 shows a chart providing data that the presence and number of subsites in a target site bound by a zinc finger protein correlates with binding affinity.

Fig. 2 shows a three finger zinc finger protein bound to a target site containing three D-able subsites.

Fig. 3 shows the process of assembling a nucleic acid encoding a designed ZFP.

Figs. 4 and 5 show computer systems for implementing methods of target site selection and zinc finger protein design.

Fig. 6 shows a flow chart of a method for selecting a target site containing a D-able subsite within a target sequence.

Fig. 7A shows a flow chart for selecting a target site within a target sequence using a correspondence regime.

Fig. 7B shows a flow chart for designing a ZFP to bind a desired target site using a database.

Fig. 8A is an entity representation diagram of a ZFP database.

Fig. 8B is a representation of a ZFP database.

DEFINITIONS

A zinc finger DNA binding protein is a protein or segment within a larger protein that binds DNA in a sequence-specific manner as a result of stabilization of protein structure through coordination of a zinc ion. The term zinc finger DNA binding protein is often abbreviated as zinc finger protein or ZFP.

A designed zinc finger protein is a protein not occurring in nature whose design/composition results principally from rational criteria. Rational criteria for design include application of substitution rules and computerized algorithms for processing information in a database storing information of existing ZFP designs and binding data.

A selected zinc finger protein is a protein not found in nature whose production results primarily from an empirical process such as phage display.

The term naturally-occurring is used to describe an object that can be found in nature as distinct from being artificially produced by man. For example, a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory is naturally-occurring. Generally, the term naturally-occurring refers to an object as present in a non-pathological (undiseased) individual, such as would be typical for the species.

A nucleic acid is operably linked when it is placed into a functional relationship with another nucleic acid sequence. For instance, a promoter or enhancer is operably linked to a coding sequence if it increases the transcription of the coding sequence. Operably linked means that the DNA sequences being linked are typically contiguous and, where necessary to join two protein coding regions, contiguous and in reading frame. However, since enhancers generally function when separated from the promoter by up to several kilobases or more and intronic sequences may be of variable lengths, some polynucleotide elements may be operably linked but not contiguous.

A specific binding affinity between, for example, a ZFP and a specific target site means a binding affinity of at least 1 x 10^6 M^-1.

The terms "modulating expression", "inhibiting expression" and "activating expression" of a gene refer to the ability of a zinc finger protein to activate or inhibit transcription of a gene. Activation includes prevention of subsequent transcriptional inhibition (i.e., prevention of repression of gene expression) and inhibition includes prevention of subsequent transcriptional activation (i.e., prevention of gene activation).

Modulation can be assayed by determining any parameter that is indirectly or directly affected by the expression of the target gene. Such parameters include, e.g., changes in RNA or protein levels, changes in protein activity, changes in product levels, changes in downstream gene expression, changes in reporter gene transcription (luciferase, CAT, beta-galactosidase, GFP (see, e.g., Mistili & Spector, Nature Biotechnology 15:961-964 (1997))); changes in signal transduction, phosphorylation and dephosphorylation, receptor-ligand interactions, second messenger concentrations (e.g., cGMP, cAMP, IP3, and Ca2+), cell growth, and neovascularization, in vitro, in vivo, and ex vivo. Such functional effects can be measured by any means known to those skilled in the art, e.g., measurement of RNA or protein levels, measurement of RNA stability, identification of downstream or reporter gene expression, e.g., via chemiluminescence, fluorescence, colorimetric reactions, antibody binding, inducible markers, ligand binding assays, changes in intracellular second messengers such as cGMP and inositol triphosphate (IP3), changes in intracellular calcium levels, cytokine release, and the like.

A "regulatory domain" refers to a protein or a protein subsequence that has transcriptional modulation activity. Typically, a regulatory domain is covalently or non covalently linked to a ZFP to modulate transcription. Alternatively, a ZFP can act alone, without a regulatory domain, or with multiple regulatory domains to modulate transcription.

A D-able subsite within a target site has the motif 5'NNGK3'. A target site containing one or more such motifs is sometimes described as a D-able target site. A zinc finger appropriately designed to bind to a D-able subsite is sometimes referred to as a D-able finger. Likewise a zinc finger protein containing at least one finger designed or selected to bind to a target site including at least one D-able subsite is sometimes referred to as a D-able zinc finger protein.

DETAILED DESCRIPTION

I. General

In one aspect, the invention is directed to methods of selecting appropriate segments within a preselected target gene for design of a zinc finger protein intended for use in modulating or detecting the gene. The size of a potential target gene can vary widely from around 100 to several 100,000 bp. A zinc finger protein can bind to a small subsequence or target site within such a gene. For example, zinc finger proteins containing three fingers typically bind to nine or ten bases of a target gene. The invention provides criteria and methods for selecting optimum subsequence(s) from a target gene for targeting by a zinc finger protein.

Some of the methods of target site selection seek to identify one or more target segments having a DNA motif containing one or more so-called D-able subsites. A D-able subsite is defined by a characteristic DNA sequence formula as discussed in detail below. A zinc finger protein is able to bind such a motif in a manner such that at least one component finger of the zinc finger protein contacts an additional base outside the three base subsite usually bound by a finger. If two D-able sites are present in the target segment, then two component fingers of a zinc finger protein can each bind to four bases of the target site. If three D-able subsites are present in the target segment, then three component fingers of a zinc finger protein can each bind to four bases in the target site. In general, zinc finger proteins binding to target sites containing at least one D-able subsite show higher binding affinity than zinc finger proteins that bind to target segments lacking a D-able subsite. Likewise, zinc finger proteins binding to a target site with two D-able subsites generally show higher binding affinity than zinc finger proteins that bind to a target site with one D-able subsite, and zinc finger proteins binding to a target site with three D-able subsites generally show higher binding affinity than zinc finger proteins that bind to a target site with two D-able subsites. Although an understanding of mechanism is not required for practice of the invention, it is believed that the higher binding affinity results from the additional interactions possible between a zinc finger and four bases in a target segment relative to the interactions possible between a zinc finger and three bases in a target segment. In general, the potential for high affinity binding of target segments with D-able subsites makes them the target sites of choice from within target genes for design of zinc finger proteins because higher binding affinity often results in a greater extent of, and/or greater specificity in, modulation of a target gene.

Other methods of the invention are directed to selection of target segments within target genes by additional or alternative criteria to the D-able subsite. The principal criteria for selection of target segments in such methods are provided in the form of a correspondence regime between different triplets of three bases and the three possible positions of a triplet within a nine-base site (i.e., bases 1-3, 4-6 and 7-9). An exemplary correspondence regime is shown in Table 1. The correspondence regime provides different values for different combinations of triplet and triplet position within a target site. A potential target site within a target gene is evaluated by determining a score for the site by combining subscores for its component triplets obtained from the correspondence regime. The scores of different potential target sites are compared, with a high score indicating desirability of a particular segment as a target site for design of a zinc finger binding protein.
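The scoring idea above can be sketched in a few lines. The sketch assumes that subscores are combined by multiplication (the text says only that they are combined), and the `REGIME` values and `DEFAULT_SUBSCORE` are hypothetical placeholders, not the values of Table 1.

```python
from math import prod

# Hypothetical correspondence regime: subscore for (triplet, position).
# Position 1 = bases 1-3, position 2 = bases 4-6, position 3 = bases 7-9.
# These numbers are placeholders and are NOT taken from Table 1.
REGIME = {
    ("GCG", 1): 10, ("GCG", 2): 8, ("GCG", 3): 10,
    ("TGG", 1): 1, ("TGG", 2): 10, ("TGG", 3): 1,
}
DEFAULT_SUBSCORE = 1  # assumed value for triplets missing from the regime


def site_score(site: str) -> int:
    """Score a nine-base site (written 5'->3') as the product of its subscores."""
    site = site.upper()
    if len(site) != 9:
        raise ValueError("a target site must be exactly nine bases long")
    triplets = [site[i:i + 3] for i in (0, 3, 6)]
    return prod(
        REGIME.get((triplet, position), DEFAULT_SUBSCORE)
        for position, triplet in enumerate(triplets, start=1)
    )
```

Under these placeholder values, `site_score("GCGTGGGCG")` is larger than `site_score("TGGGCGTGG")`, so the first site would be ranked as the more desirable target.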

In another aspect, the invention provides methods of designing zinc finger proteins that bind to a preselected target site. These methods can, of course, be used following the preselection of target sites according to the procedures and criteria described above. The methods of design use a database containing information about previously characterized zinc finger proteins. This information includes names or other designations of previously characterized zinc finger proteins, the amino acid sequence of their component fingers, and the nucleotide triplets bound by each finger of the proteins.

Information in the database is accessed using an algorithm that allows one to select fingers from different previous designs for combination in a novel zinc finger protein having specificity for a chosen target site.
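As a rough illustration of such a lookup, the sketch below builds a toy database from two proteins named in this description (Zif268 and Sp-1) and lists, for each triplet of a target site, the previously characterized fingers that bind it. The data layout and the function name `find_fingers` are assumptions made for illustration; amino acid sequences are omitted for brevity, and the actual algorithm of the invention may rank or filter candidates differently.

```python
# Toy database: designation -> {finger number: triplet bound, written 5'->3'}.
# Finger 1 (N-terminal) binds the 3'-most triplet of the protein's own site.
DATABASE = {
    "Zif268": {1: "GCG", 2: "TGG", 3: "GCG"},  # site 5'GCG TGG GCG3'
    "Sp-1": {1: "GGG", 2: "GCG", 3: "GGG"},    # site 5'GGG GCG GGG3'
}


def find_fingers(target_site: str):
    """Return, for each triplet of a nine-base site (5'->3'), the list of
    (designation, finger number) pairs known to bind that triplet."""
    site = target_site.upper()
    triplets = [site[0:3], site[3:6], site[6:9]]
    return [
        [
            (name, finger)
            for name, fingers in DATABASE.items()
            for finger, bound in fingers.items()
            if bound == triplet
        ]
        for triplet in triplets
    ]
```

For the site 5'GGG TGG GCG3', the first triplet is matched only by Sp-1 fingers, the second only by Zif268 finger 2, and the third by fingers of both proteins, which is the kind of mix-and-match the database is meant to support.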

II. Zinc Finger Proteins

Zinc finger proteins are formed from zinc finger components. For example, zinc finger proteins can have one to thirty-seven fingers, commonly having 2, 3, 4, 5 or 6 fingers. A zinc finger protein recognizes and binds to a target site (sometimes referred to as a target segment) that represents a relatively small subsequence within a target gene. Each component finger of a zinc finger protein can bind to a subsite within the target site. The subsite includes a triplet of three contiguous bases all on the same strand (sometimes referred to as the target strand). The subsite may or may not also include a fourth base on the opposite strand that is the complement of the base immediately 3' of the three contiguous bases on the target strand. In many zinc finger proteins, a zinc finger binds to its triplet subsite substantially independently of other fingers in the same zinc finger protein. Accordingly, the binding specificity of a zinc finger protein containing multiple fingers is usually approximately the aggregate of the specificities of its component fingers. For example, if a zinc finger protein is formed from first, second and third fingers that individually bind to triplets XXX, YYY, and ZZZ, the binding specificity of the zinc finger protein is 3'XXX YYY ZZZ5'.

The relative order of fingers in a zinc finger protein from N-terminal to C-terminal determines the relative order of triplets in the 3' to 5' direction in the target. For example, if a zinc finger protein comprises from N-terminal to C-terminal the first, second and third fingers mentioned above, then the zinc finger protein binds to the target segment 3'XXXYYYZZZ5'. If the zinc finger protein comprises the fingers in another order, for example, second finger, first finger, third finger, then the zinc finger protein binds to a target segment comprising a different permutation of triplets, in this example, 3'YYYXXXZZZ5' (see Berg & Shi, Science 271, 1081-1086 (1996)). The assessment of binding properties of a zinc finger protein as the aggregate of its component fingers is, however, only approximate, due to context-dependent interactions of multiple fingers binding in the same protein.

Two or more zinc finger proteins can be linked to have a target specificity that is the aggregate of that of the component zinc finger proteins (see e.g., Kim & Pabo, PNAS 95, 2812-2817 (1998)). For example, a first zinc finger protein having first, second and third component fingers that respectively bind to XXX, YYY and ZZZ can be linked to a second zinc finger protein having first, second and third component fingers with binding specificities AAA, BBB and CCC. The binding specificity of the combined first and second proteins is thus 3'XXXYYYZZZ___AAABBBCCC5', where the underline indicates a short intervening region (typically 0-5 bases of any type). In this situation, the target site can be viewed as comprising two target segments separated by an intervening segment.
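The ordering rule can be made concrete with a small helper. The sketch below assumes each triplet is written 5'->3' and that fingers are listed from N-terminal to C-terminal; the function name `target_site` is illustrative only.

```python
def target_site(finger_triplets):
    """Given the triplets (each written 5'->3') bound by fingers listed in
    N-terminal to C-terminal order, return the target site written 5'->3'.

    The N-terminal finger binds the 3'-most triplet, so the list is reversed."""
    return "".join(reversed(finger_triplets))
```

As a check against an example given later in this description, the two-finger TTK protein has finger 1 recognizing GAT and finger 2 recognizing AAG, and `target_site(["GAT", "AAG"])` gives the stated site 5'AAG GAT3'.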

Linkage can be accomplished using any of the following peptide linkers:

TGEKP (Liu et al., 1997, supra); (G4S)n (Kim et al., PNAS 93, 1156-1160 (1996)); GGRRGGGS; LRQRDGERP; LRQKDGGGSERP; LRQKD(G3S)2ERP. Alternatively, flexible linkers can be rationally designed using computer programs capable of modeling both DNA-binding sites and the peptides themselves, or by phage display methods. In a further variation, noncovalent linkage can be achieved by fusing two zinc finger proteins with domains promoting heterodimer formation of the two zinc finger proteins. For example, one zinc finger protein can be fused with fos and the other with jun (see Barbas et al., WO 95/19431).

Linkage of two zinc finger proteins is advantageous for conferring a unique binding specificity within a mammalian genome. A typical mammalian diploid genome consists of 3 x 10^9 bp. Assuming that the four nucleotides A, C, G, and T are randomly distributed, a given 9 bp sequence is present ~23,000 times. Thus a ZFP recognizing a 9 bp target with absolute specificity would have the potential to bind to ~23,000 sites within the genome. An 18 bp sequence is present once in 3.4 x 10^10 bp, or about once in a random DNA sequence whose complexity is ten times that of a mammalian genome.
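These figures can be reproduced with a short calculation. The sketch below assumes, beyond what the text states, that a site may start at any base pair and may lie on either strand (hence the factor of 2); with that assumption the numbers quoted above follow directly.

```python
GENOME_BP = 3e9  # diploid mammalian genome size quoted in the text


def expected_occurrences(site_length: int, genome_bp: float = GENOME_BP) -> float:
    """Expected number of occurrences of one specific sequence of the given
    length in random DNA, counting both strands (assumption)."""
    return 2 * genome_bp / 4 ** site_length


def bp_per_occurrence(site_length: int) -> float:
    """Length of random double-stranded DNA expected to contain the
    sequence once, counting both strands (assumption)."""
    return 4 ** site_length / 2
```

With these definitions, `expected_occurrences(9)` is roughly 23,000 and `bp_per_occurrence(18)` is roughly 3.4 x 10^10, about ten times the genome size used here.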

A component finger of a zinc finger protein typically contains about 30 amino acids and has the following motif (N-C):

Cys-(X)2-4-Cys-X.X.X.X.X.X.X.X.X.X.X.X-His-(X)3-5-His
                        -1 1 2 3 4 5 6 7

The two invariant histidine residues and two invariant cysteine residues in a single beta turn are co-ordinated through zinc (see Berg & Shi, Science 271, 1081-1085 (1996)). The above motif shows a numbering convention that is standard in the field for the region of a zinc finger conferring binding specificity. The amino acid on the left (N-terminal side) of the first invariant His residue is assigned the number +6, and other amino acids further to the left are assigned successively decreasing numbers. The alpha helix begins at residue 1 and extends to the residue following the second conserved histidine. The entire helix is therefore of variable length, between 11 and 13 residues.

The process of designing or selecting a nonnaturally occurring or variant ZFP typically starts with a natural ZFP as a source of framework residues. The process of design or selection serves to define nonconserved positions (i.e., positions -1 to +6) so as to confer a desired binding specificity. One suitable ZFP is the DNA binding domain of the mouse transcription factor Zif268. The DNA binding domain of this protein has the amino acid sequence:

YACPVESCDRRFSRSDELTRHIRIHTGQKP (F1)
FQCRICMRNFSRSDHLTTHIRTHTGEKP (F2)
FACDICGRKFARSDERKRHTKIHLRQK (F3)

and binds to a target 5' GCG TGG GCG 3'.

Another suitable natural zinc finger protein as a source of framework residues is Sp-1. The Sp-1 sequence used for construction of zinc finger proteins corresponds to amino acids 531 to 624 in the Sp-1 transcription factor. This sequence is 94 amino acids in length. The amino acid sequence of Sp-1 is as follows:

PGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERP
FMCTWSYCGKRFTRSDELQRHKRTHTGEKK
FACPECPKRFMRSDHLSKHIKTHQNKKG

Sp-1 binds to a target site 5'GGG GCG GGG3'.

An alternate form of Sp-1, an Sp-1 consensus sequence, has the following amino acid sequence:

meklrngsgd
PGKKKQHACPECGKSFSKSSHLRAHQRTHTGERP
YKCPECGKSFSRSDELQRHQRTHTGEKP
YKCPECGKSFSRSDHLSKHQRTHQNKKG

(lower case letters are a leader sequence from Shi & Berg, Chemistry and Biology 1, 83-89 (1995)). The optimal binding sequence for the Sp-1 consensus sequence is 5'GGGGCGGGG3'. Other suitable ZFPs are described below.

There are a number of substitution rules that assist rational design of some zinc finger proteins (see Desjarlais & Berg, PNAS 90, 2256-2260 (1993); Choo & Klug, PNAS 91, 11163-11167 (1994); Desjarlais & Berg, PNAS 89, 7345-7349 (1992); Jamieson et al., supra; Choo et al., WO 98/53057, WO 98/53058, WO 98/53059, WO 98/53060). Many of these rules are supported by site-directed mutagenesis of the three finger domain of the ubiquitous transcription factor, Sp-1 (Desjarlais and Berg, 1992, 1993). One of these rules is that a 5' G in a DNA triplet can be bound by a zinc finger incorporating arginine at position 6 of the recognition helix. Another substitution rule is that a G in the middle of a subsite can be recognized by including a histidine residue at position 3 of a zinc finger. A further substitution rule is that asparagine can be incorporated to recognize A in the middle of a triplet; aspartic acid, glutamic acid, serine or threonine can be incorporated to recognize C in the middle of a triplet; and amino acids with small side chains such as alanine can be incorporated to recognize T in the middle of a triplet. A further substitution rule is that the 3' base of a triplet subsite can be recognized by incorporating the following amino acids at position -1 of the recognition helix: arginine to recognize G, glutamine to recognize A, glutamic acid (or aspartic acid) to recognize C, and threonine to recognize T. Although these substitution rules are useful in designing zinc finger proteins, they do not take into account all possible target sites.

Furthermore, the assumption underlying the rules, namely that a particular amino acid in a zinc finger is responsible for binding to a particular base in a subsite, is only approximate. Context-dependent interactions between proximate amino acids in a finger, or binding of multiple amino acids to a single base or vice versa, can cause variation of the binding specificities predicted by the existing substitution rules.
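The substitution rules listed above lend themselves to a simple lookup table. The sketch below encodes them with one-letter amino acid codes; it assumes that all of the middle-base rules apply at position 3 (the text states this explicitly only for histidine), and, as the caveat above makes clear, the suggestions are only a starting point.

```python
# Candidate residues keyed by recognition-helix position, then by DNA base.
RULES = {
    6: {"G": ["R"]},                                     # 5' base of the triplet
    3: {"G": ["H"], "A": ["N"],                          # middle base (position 3
        "C": ["D", "E", "S", "T"], "T": ["A"]},          # assumed for all four)
    -1: {"G": ["R"], "A": ["Q"], "C": ["E", "D"], "T": ["T"]},  # 3' base
}


def suggest_residues(triplet: str) -> dict:
    """Map a triplet (written 5'->3') to candidate residues at helix
    positions 6, 3 and -1; an empty list means no rule is stated."""
    five_prime, middle, three_prime = triplet.upper()
    return {
        6: RULES[6].get(five_prime, []),
        3: RULES[3].get(middle, []),
        -1: RULES[-1].get(three_prime, []),
    }
```

For the triplet GCG, for instance, the table suggests arginine at positions 6 and -1 and one of D, E, S or T at position 3; for a triplet with a 5' base other than G, position 6 comes back empty, reflecting the gaps in the rules noted above.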

The technique of phage display provides a largely empirical means of generating zinc finger proteins with a desired target specificity (see e.g., Rebar, US 5,789,538; Choo et al., WO 96/06166; Barbas et al., WO 95/19431 and WO 98/54311; Jamieson et al., supra). The method can be used in conjunction with, or as an alternative to, rational design. The method involves the generation of diverse libraries of mutagenized zinc finger proteins, followed by the isolation of proteins with desired DNA binding properties using affinity selection methods. To use this method, the experimenter typically proceeds as follows. First, a gene for a zinc finger protein is mutagenized to introduce diversity into regions important for binding specificity and/or affinity. In a typical application, this is accomplished via randomization of a single finger at positions -1, +2, +3, and +6, and sometimes accessory positions such as +1, +5, +8 and +10. Next, the mutagenized gene is cloned into a phage or phagemid vector as a fusion with gene III of a filamentous phage, which encodes the coat protein pIII. The zinc finger gene is inserted between segments of gene III encoding the membrane export signal peptide and the remainder of pIII, so that the zinc finger protein is expressed as an amino-terminal fusion with pIII in the mature, processed protein. When using phagemid vectors, the mutagenized zinc finger gene may also be fused to a truncated version of gene III encoding, minimally, the C-terminal region required for assembly of pIII into the phage particle. The resultant vector library is transformed into E. coli and used to produce filamentous phage which express variant zinc finger proteins on their surface as fusions with the coat protein pIII. If a phagemid vector is used, then this step requires superinfection with helper phage.
The phage library is then incubated with target DNA site, and affinity selection methods are used to isolate phage which bind target with high affinity from bulk phage. Typically, the DNA target is immobilized on a solid support, which is then washed under conditions sufficient to remove all but the tightest binding phage. After washing, any phage remaining on the support are recovered via elution under conditions which disrupt zinc finger-DNA binding. Recovered phage are used to infect fresh E. coli, which is then amplified and used to produce a new batch of phage particles. Selection and amplification are then repeated as many times as is necessary to enrich the phage pool for tight binders such that these may be identified using sequencing and/or screening methods. Although the method is illustrated for pIII fusions, analogous principles can be used to screen ZFP variants as pVIII fusions.

Zinc finger proteins are often expressed with a heterologous domain as fusion proteins. Common domains for addition to the ZFP include, e.g., transcription factor domains (activators, repressors, co-activators, co-repressors), silencers, oncogenes (e.g., myc, jun, fos, myb, max, mad, rel, ets, bcl, myb, mos family members etc.); DNA repair enzymes and their associated factors and modifiers; DNA rearrangement enzymes and their associated factors and modifiers; chromatin associated proteins and their modifiers (e.g. kinases, acetylases and deacetylases); and DNA modifying enzymes (e.g., methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases, polymerases, endonucleases) and their associated factors and modifiers. A preferred domain for fusing with a ZFP when the ZFP is to be used for repressing expression of a target gene is the KRAB repression domain from the human KOX-1 protein (Thiesen et al., New Biologist 2, 363-374 (1990); Margolin et al., Proc. Natl. Acad. Sci. USA 91, 4509-4513 (1994); Pengue et al., Nucl. Acids Res. 22:2908-2914 (1994); Witzgall et al., Proc. Natl. Acad. Sci. USA 91, 4514-4518 (1994)). Preferred domains for achieving activation include the HSV VP16 activation domain (see, e.g., Hagmann et al., J. Virol. 71, 5952-5962 (1997)), nuclear hormone receptors (see, e.g., Torchia et al., Curr. Opin. Cell. Biol. 10:373-383 (1998)), the p65 subunit of nuclear factor kappa B (Bitko & Barik, J. Virol. 72:5610-5618 (1998) and Doyle & Hunt, Neuroreport 8:2937-2942 (1997); Liu et al., Cancer Gene Ther. 5:3-28 (1998)), or artificial chimeric functional domains such as VP64 (Seipel et al., EMBO J. 11, 4961-4968 (1992)).

An important factor in the administration of polypeptide compounds, such as the ZFPs, is ensuring that the polypeptide has the ability to traverse the plasma membrane of a cell, or the membrane of an intra-cellular compartment such as the nucleus. Cellular membranes are composed of lipid-protein bilayers that are freely permeable to small, nonionic lipophilic compounds and are inherently impermeable to polar compounds, macromolecules, and therapeutic or diagnostic agents. However, proteins and other compounds such as liposomes have been described, which have the ability to translocate polypeptides such as ZFPs across a cell membrane.

For example, "membrane translocation polypeptides" have amphiphilic or hydrophobic amino acid subsequences that have the ability to act as membrane translocating carriers. In one embodiment, homeodomain proteins have the ability to translocate across cell membranes. The shortest internalizable peptide of a homeodomain protein, Antennapedia, was found to be the third helix of the protein, from amino acid position 43 to 58 (see, e.g., Prochiantz, Current Opinion in Neurobiology 6:629-634 (1996)). Another subsequence, the h (hydrophobic) domain of signal peptides, was found to have similar cell membrane translocation characteristics (see, e.g., Lin et al., J. Biol. Chem. 270:14255-14258 (1995)).

Examples of peptide sequences which can be linked to a ZFP of the invention, for facilitating uptake of ZFP into cells, include, but are not limited to: an 11 amino acid peptide of the tat protein of HIV; a 20 residue peptide sequence which corresponds to amino acids 84-103 of the p16 protein (see Fahraeus et al., Current Biology 6:84 (1996)); the third helix of the 60-amino acid long homeodomain of Antennapedia (Derossi et al., J. Biol. Chem. 269:10444 (1994)); the h region of a signal peptide such as the Kaposi fibroblast growth factor (K-FGF) h region (Lin et al., supra); or the VP22 translocation domain from HSV (Elliot & O'Hare, Cell 88:223-233 (1997)).

Other suitable chemical moieties that provide enhanced cellular uptake may also be chemically linked to ZFPs.

Toxin molecules also have the ability to transport polypeptides across cell membranes. Often, such molecules are composed of at least two parts (called "binary toxins"): a translocation or binding domain or polypeptide and a separate toxin domain or polypeptide. Typically, the translocation domain or polypeptide binds to a cellular receptor, and then the toxin is transported into the cell. Several bacterial toxins, including Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate cyclase (CYA), have been used in attempts to deliver peptides to the cell cytosol as internal or amino terminal fusions (Arora et al., J. Biol. Chem. 268:3334-3341 (1993); Perelle et al., Infect. Immun. 61:5147-5156 (1993); Stenmark et al., J. Cell Biol. 113:1025-1032 (1991); Donnelly et al., PNAS 90:3530-3534 (1993); Carbonetti et al., Abstr. Annu. Meet. Am. Soc. Microbiol. 95:295 (1995); Sebo et al., Infect. Immun. 63:3851-3857 (1995); Klimpel et al., PNAS U.S.A. 89:10277-10281 (1992); and Novak et al., J. Biol. Chem. 267:17186-17193 (1992)).

Such subsequences can be used to translocate ZFPs across a cell membrane. ZFPs can be conveniently fused to or derivatized with such sequences.

Typically, the translocation sequence is provided as part of a fusion protein. Optionally, a linker can be used to link the ZFP and the translocation sequence. Any suitable linker can be used, e.g., a peptide linker.

III. Selection of Target Gene

Zinc finger proteins can be used to modulate the expression of any target polynucleotide sequence. The sequence can be, for example, genomic, cDNA or RNA or an expressed sequence tag (EST). Typically, the target polynucleotide includes a gene or a fragment thereof. The term gene is used broadly to include, for example, exonic regions, intronic regions, 5' UTRs, 3' UTRs, 5' flanking sequences, 3' flanking sequences, promoters, enhancers, transcription start sites, ribosome binding sites, regulatory sites, and poly-adenylation sites. Target genes can be cellular, viral or from other sources, including purely theoretical sequences. Target gene sequences can be obtained from databases, such as GenBank, or from the published literature, or can be obtained de novo.

Target genes include genes from pathological viruses and microorganisms for which repression of expression can be used to abort infection. Examples of pathogenic viruses include hepatitis (A, B, or C), herpes virus (e.g., VZV, HSV-1, HSV-6, HSV-II, and CMV, Epstein Barr virus), HIV, ebola, adenovirus, influenza virus, flaviviruses, echovirus, rhinovirus, coxsackie virus, coronavirus, respiratory syncytial virus, mumps virus, rotavirus, measles virus, rubella virus, parvovirus, vaccinia virus, HTLV virus, dengue virus, papillomavirus, molluscum virus, poliovirus, rabies virus, JC virus and arboviral encephalitis virus. Some examples of pathogenic bacteria include chlamydia, rickettsial bacteria, mycobacteria, staphylococci, streptococci, pneumonococci, meningococci and gonococci, klebsiella, proteus, serratia, pseudomonas, legionella, diphtheria, salmonella, bacilli, cholera, tetanus, botulism, anthrax, plague, leptospirosis, and Lyme disease bacteria.

Target genes also include genes from human or other mammals that contribute to disease. Some such genes are oncogenes, tumor suppressors or growth factors that contribute to cancer. Examples of oncogenes include hMSH2 (Fishel et al., Cell 75, 1027-1038 (1993)) and hMLH1 (Papadopoulos et al., Science 263, 1625-1628 (1994)). Some examples of growth factors include fibroblast growth factor, platelet derived growth factor, GM-SCF, VEGF, EPO, Erb-B2, and hGH. Other human genes contribute to disease by rendering a subject susceptible to infection by a microorganism or virus. For example, certain alleles of the gene encoding the CCR5 receptor render a subject susceptible to infection by HIV. Other human genes, such as that encoding amyloid precursor protein or ApoE, contribute to other diseases, such as Alzheimer's disease.

Target genes also include genes of human or other mammals that provide defense mechanisms against diseases due to other sources. For example, tumor repressor genes provide protection against cancer. Expression of such genes is desirable and zinc finger proteins are used to activate expression.

Target genes also include genes that are normally turned off or expressed at low levels but which through activation can be used to substitute for another defective gene present in some individuals. For example, the fetal hemoglobin genes, which are normally inactive in adult humans, can be activated to substitute for the defective beta globin gene in individuals with sickle cell anemia.

Target genes also include plant genes for which repression or activation leads to an improvement in plant characteristics, such as improved crop production, disease or herbicide resistance. For example, repression of expression of the FAD2-1 gene results in an advantageous increase in oleic acid and decrease in linoleic and linolenic acids.

IV. Design of Zinc Finger Proteins To Bind D-able Subsites

1. Methods

The invention provides methods that select a target gene, and identify a target site within the gene containing one to six (or more) D-able subsites. A zinc finger protein can then be synthesized that binds to the preselected site. These methods of target site selection are premised, in part, on the present inventors' recognition that the presence of one or more D-able subsites in a target segment confers the potential for higher binding affinity in a zinc finger protein selected or designed to bind to that site relative to zinc finger proteins that bind to target segments lacking D-able subsites. Experimental evidence supporting this insight is provided in Examples 2-9.

A D-able subsite is a region of a target site that allows an appropriately designed single zinc finger to bind to up to four bases rather than up to three of the target site. Such a zinc finger binds to a triplet of bases on one strand of a double-stranded target segment (target strand) and a fourth base on the other strand (see Fig. 2). For a single zinc finger to bind a four base target segment imposes constraints both on the sequence of the target strand and on the amino acid sequence of the zinc finger. The target site within the target strand should include the "D-able" subsite motif 5'NNGK3', in which N and K are conventional IUPAC-IUB ambiguity codes. A zinc finger for binding to such a site should include an arginine residue at position -1 and an aspartic acid (or less preferably a glutamic acid) at position +2. The arginine residue at position -1 interacts with the G residue in the D-able subsite. The aspartic acid (or glutamic acid) residue at position +2 of the zinc finger interacts with the opposite strand base complementary to the K base in the D-able subsite. It is the interaction between aspartic acid (symbol D) and the opposite strand base (fourth base) that confers the name D-able subsite. As is apparent from the D-able subsite formula, there are two subtypes of D-able subsites: 5'NNGG3' and 5'NNGT3'. For the former subsite, the aspartic acid or glutamic acid at position +2 of a zinc finger interacts with a C in the opposite strand to the D-able subsite. In the latter subsite, the aspartic acid or glutamic acid at position +2 of a zinc finger interacts with an A in the opposite strand to the D-able subsite. In general, NNGG is preferred over NNGT.
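The two subtypes can be distinguished mechanically. The following sketch (the function name `dable_subtype` is illustrative) checks a four-base subsite written 5'->3' against the motif NNGK and reports the subtype together with the opposite-strand base contacted by the residue at position +2.

```python
def dable_subtype(subsite: str):
    """Classify a four-base subsite (5'->3') against the motif NNGK.

    Returns (subtype, opposite_strand_base), or None if the subsite is
    not D-able. K stands for G or T in the IUPAC-IUB code."""
    s = subsite.upper()
    if len(s) != 4 or s[2] != "G":
        return None
    if s[3] == "G":
        return ("GG", "C")  # +2 residue contacts C on the opposite strand
    if s[3] == "T":
        return ("GT", "A")  # +2 residue contacts A on the opposite strand
    return None
```

For example, `dable_subtype("AAGG")` reports the preferred GG subtype, while `dable_subtype("AAGC")` returns `None` because C is not covered by K.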

In the design of a zinc finger protein with three fingers, a target site should be selected in which at least one finger of the protein, and preferably two or three fingers, have the potential to bind a D-able subsite in a target site. Such can be achieved by selecting a target site from within a larger target gene having the formula 5'NNx aNy bNzc3', wherein:

each of the sets (x, a), (y, b) and (z, c) is either (N, N) or (G, K);
at least one of (x, a), (y, b) and (z, c) is (G, K); and
N and K are IUPAC-IUB ambiguity codes.

In other words, at least one of the three sets (x, a), (y, b) and (z, c) is the set (G, K), meaning that the first position of the set is G and the second position is G or T. Those of the three sets (if any) which are not (G, K) are (N, N), meaning that the first position of the set can be occupied by any nucleotide and the second position of the set can be occupied by any nucleotide. As an example, the set (x, a) can be (G, K) and the sets (y, b) and (z, c) can both be (N, N).

In the formula 5'NNx aNy bNzc3', the triplets NNx, aNy and bNz represent the triplets of bases on the target strand bound by the three fingers in a zinc finger protein. The complements of the highlighted bases are the sites of potential fourth base binding on the nontarget strand. If only one of x, y and z is a G, and this G is followed by a K, the target site includes a single D-able subsite. For example, if only x is G and a is K, the site reads 5'NNG KNy bNzc3' with the D-able subsite highlighted. If both x and y but not z are G and a and b are K, then the target site has two overlapping D-able subsites as follows: 5'NNG KNG KNz c3', with one such site being represented in bold and the other in italics. If all three of x, y and z are G and a, b and c are K, then the target segment includes three D-able subsites, as follows: 5'NNG KNG KNG K3', the D-able subsites being represented by bold, italics and underline.

The methods of the invention thus work by selecting a target gene and systematically searching within the possible subsequences of the gene for target sites conforming to the formula 5'NNx aNy bNzc3', wherein each of (x, a), (y, b) and (z, c) is (N, N) or (G, K); at least one of (x, a), (y, b) and (z, c) is (G, K); and N and K are IUPAC-IUB ambiguity codes.

In some such methods, every possible subsequence of 10 contiguous bases on either strand of a potential target gene is evaluated to determine whether it conforms to the above formula, and, if so, how many D-able subsites are present. Typically, such a comparison is performed by computer, and a list of target sites conforming to the formula is output. Optionally, such target sites can be output in different subsets according to how many D-able subsites are present.
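A minimal sketch of such a search is given below. It slides a 10-base window along both strands and counts how many of the pairs (x, a), (y, b) and (z, c) equal (G, K). The function names and the output tuple layout are illustrative assumptions, and start positions on the "-" strand are reported in reverse-complement coordinates.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def count_dable(window: str) -> int:
    """Count D-able subsites in a 10-base window 5'NNx aNy bNzc3'.

    (x, a), (y, b), (z, c) sit at 0-based index pairs (2, 3), (5, 6), (8, 9)."""
    return sum(1 for i in (2, 5, 8) if window[i] == "G" and window[i + 1] in "GT")


def find_target_sites(gene: str):
    """List (strand, start, window, n_dable) for every 10-base window on
    either strand that contains at least one D-able subsite."""
    gene = gene.upper()
    results = []
    for strand, seq in (("+", gene), ("-", reverse_complement(gene))):
        for start in range(len(seq) - 9):
            window = seq[start:start + 10]
            n = count_dable(window)
            if n:
                results.append((strand, start, window, n))
    return results
```

Grouping the returned tuples by their last element gives the optional subsets with one, two or three D-able subsites.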

In a variation, the methods of the invention identify first and second target segments, each independently conforming to the above formula. The two target segments in such methods are constrained to be adjacent or proximate (i.e., within about 0-5 bases) of each other in the target gene. The strategy underlying selection of proximate target segments is to allow the design of a zinc finger protein formed by linkage of two component zinc finger proteins specific for the first and second target segments respectively. These principles can be extended to select target sites to be bound by zinc finger proteins with any number of component fingers. For example, a suitable target site for a nine-finger protein would have three component segments, each conforming to the above formula.
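Pairing proximate segments can be sketched as follows. The sketch assumes that candidate segments are given by their start positions on one strand, that each segment occupies nine bases, and that the gap is measured from the end of the first segment to the start of the second; these conventions and the function name `proximate_pairs` are assumptions for illustration.

```python
def proximate_pairs(starts, segment_length=9, max_gap=5):
    """Return (first_start, second_start, gap) for every pair of
    non-overlapping segments separated by 0..max_gap bases."""
    ordered = sorted(starts)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            gap = second - (first + segment_length)
            if gap > max_gap:
                break  # later starts are even further away
            if gap >= 0:
                pairs.append((first, second, gap))
    return pairs
```

With candidate starts at 0, 9, 12 and 30, only the pairs (0, 9) and (0, 12) qualify; the segment at 30 is too far from the others, and the segments at 9 and 12 overlap.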

The target sites identified by the above methods can be subject to further evaluation by other criteria or can be used directly for design or selection (if needed) and production of a zinc finger protein specific for such a site. A further criterion for evaluating potential target sites is proximity to particular regions within a gene. If a zinc finger protein is to be used to repress a cellular gene on its own (i.e., without linking the zinc finger protein to a repressing moiety), then the optimal location appears to be at the site of transcription initiation, or within about 50 bp upstream or downstream, or alternatively within an enhancer element to interfere with the formation of the transcription complex (Kim & Pabo, J. Biol. Chem. (1997)) or compete for an essential enhancer binding protein. If, however, a ZFP is fused to a functional domain such as the KRAB repressor domain or the VP16 activator domain, the choice of location of the binding site is considerably more flexible and can be outside known regulatory regions.

For example, a KRAB domain can repress transcription of a promoter up to at least 3 kb from where KRAB is bound. Thus, target sites can be selected that do not include or overlap segments of significance within target genes, such as regulatory sequences, or polymorphic sites. Other criteria for further evaluating target segments include the prior availability of zinc finger proteins binding to such segments or related segments, and/or ease of designing new zinc finger proteins to bind a given target segment.

Implementation of such criteria in the selection process is discussed in further detail below.

Once a target segment has been selected, a zinc finger protein that binds to the segment can be provided by a variety of approaches. The simplest approach is to provide a precharacterized zinc finger protein from an existing collection that is already known to bind to the target site. However, in many instances, such a zinc finger protein does not exist. An alternative approach uses information in a database of existing zinc finger proteins and binding specificities to design new zinc finger proteins. This approach is described in more detail below. A further approach is to design a zinc finger protein based on substitution rules as discussed above. A still further alternative is to select a zinc finger protein with specificity for a given target by an empirical process such as phage display. In some such methods, each component finger of a zinc finger protein is designed or selected independently of other component fingers. For example, each finger can be obtained from a different pre-existing ZFP, or each finger can be subject to separate randomization and selection.

Once a zinc finger protein has been selected, designed, or otherwise provided to a given target segment, the zinc finger protein or the DNA encoding it are synthesized. Exemplary methods for synthesizing and expressing DNA encoding zinc finger proteins are described below. The zinc finger protein or a polynucleotide encoding it can then be used for modulation of expression, or analysis of the target gene containing the target site to which the zinc finger protein binds.

2. D-able Zinc Finger Proteins

A zinc finger protein is described as D-able if it contains a finger that can bind to the fourth base of at least one D-able subsite, that is, a polynucleotide sequence 5'NNGK3'. A preferred framework for designing D-able zinc fingers is the human wild type Sp-1 DNA binding domain. The target for the human transcription factor Sp-1 is 5'GGG GCG GGG3', and fingers 1 and 2 of this protein have an R-1 D+2 arrangement.

Designed ZFPs can be identical to Sp-1 except in the recognition helix of each of the three fingers, where the sequences are designed to recognize each of the triplets with which they interact. The mouse ZFP Zif268, which binds the site GCG TGG GCG, is also suitable, having the R-1 D+2 arrangement in all three fingers.

Framework residues for the design of zinc finger proteins capable of binding to D-able subsites can also be obtained or derived from ZFPs from several alternative sources. For example, the TTK transcriptional regulatory protein of the fruit fly Drosophila melanogaster has been well characterized with regard to both the sequences of its recognition helices and its DNA site. The protein has only two fingers and binds to a six base target, so finger 2 interacts with the first DNA triplet and finger 1 recognizes the second triplet of the site. The site is 5'AAG GAT3' with a GG type D-able subsite present at the junction of the first and second triplet, and finger 2 has the R-1 D+2 sequence. Other suitable ZFPs are found in the unicellular eukaryote Saccharomyces cerevisiae. The ADR gene product is known to regulate expression of the ADH gene by binding within the ADH promoter. As described above for TTK, the ADR ZFP binding domain has two fingers, and binds to a six base target, TTGGAG. The finger 2 recognition helix has the R-1 D+2 sequence, appropriate for a ZFP binding to a target site with a D-able subsite.

IV. Selection of Target Sites by a Correspondence Regime

The invention further provides additional or alternative methods for selecting a target site from within a target gene. These methods are premised, in part, on the insights that different three-base subsites (triplets) bound by individual fingers have different desirabilities for zinc finger protein design, that these different desirabilities can be expressed as numerical values, and that the numerical values for the three individual triplets comprising a target site can be combined to give an overall score for the target site. The relative merits of different target sites can then be compared from their relative scores.

The methods work by providing a polynucleotide sequence, typically a gene or cDNA, within which one wishes to select a target site for detection or modulation by a ZFP. In practice, one typically provides two sequences for the two strands of a polynucleotide sequence, but for simplicity, the method is illustrated for a single polynucleotide sequence. From within such a polynucleotide sequence, a potential target site of at least 9 bases comprising contiguous first, second and third triplets of bases is selected. The triplets are contiguous in that the first triplet occupies bases 7-9, the second triplet bases 4-6 and the third triplet bases 1-3 of a site, with the first base in the 5'-3' orientation being designated base 1. This designation of triplets as first, second, and third is arbitrary and could be reversed. However, by designating the first triplet as occupying bases 7-9, the second triplet bases 4-6 and the third triplet bases 1-3, the first, second, and third fingers of a three finger ZFP in an N-C terminal orientation bind to the first, second and third triplets of a target site.
Viewed in another manner, the first, second and third fingers in a zinc finger protein ordered from N terminal to C terminal are respectively specific for the first, second and third triplets in a target site ordered in the 3'-5' orientation.

A subscore is then determined for each triplet from a correspondence regime between triplets and corresponding positions within a target site. An exemplary correspondence regime is provided in Table 1. The correspondence regime is a matrix providing three values for each triplet at its three possible positions within a nine base target site. The table provides three values for each of the 64 possible triplets. For example, consider a potential target site 5'AAA AAG AAC3'. The AAC triplet occurs in the first position (bases 7-9) of the target site and is assigned a subscore of 1 from Table 1. The AAG triplet occurs in the second position of the target site (bases 4-6) and is assigned a subscore of 8. The AAA triplet occurs in the third position of the target site (bases 1-3) and is assigned a subscore of 8. The subscores of the three triplets in the potential target site are then combined, e.g., by multiplication or addition or some other function. For example, multiplication of the three triplet subscores gives a combined score of 1 x 8 x 8 = 64.

The process is then repeated for a second potential target site. Subscores are determined for each of the three component triplets of the second potential target site, and a combined score is calculated for the second potential target site. The process can then be repeated for further potential target sites. Optionally, the process can be repeated for every possible contiguous subsequence of at least 9 bases in either strand of a target gene of interest. When scores of all potential target sites of interest have been determined, the scores are compared. In general, a high score indicates desirability of a target site for design of a ZFP.
One or more of the target sites identified with high scores can be output together with the score.

The designation of values in the correspondence regime can reflect any criteria that make one triplet subsite more desirable than another for zinc finger protein design or selection. The values in the exemplary correspondence regime of Table 1 reflect availability of previously characterized ZFPs known to bind a given nucleotide triplet. If for a given triplet in a given position of a target site, there exist one or more previously characterized ZFPs that specifically bind to a target segment including the triplet at the given position, then the combination of the triplet and given position is assigned a score of 10. If for a given triplet at a given position, there are no previously characterized ZFPs that specifically bind a target site including the triplet at the given position, but there are one or more previously characterized ZFPs that specifically bind to the triplet at a different position, then the triplet is assigned a score of 8. If for a given triplet and a given position, there are no previously characterized ZFPs that bind the triplet either at the given position or another position, the triplet and position are assigned a value of 1.
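The scoring procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used by the inventors: the function names, the dictionary layout of the regime, and the `known` input format are assumptions made for the example. Only the three Table 1 values quoted in the worked example (1, 8 and 8) are used.

```python
from math import prod

def split_triplets(site):
    """Map position -> triplet for a 9-base site written 5'->3'.

    Position 1 is bases 7-9, position 2 is bases 4-6, position 3 is bases 1-3.
    """
    if len(site) != 9:
        raise ValueError("expected a 9-base target site")
    return {1: site[6:9], 2: site[3:6], 3: site[0:3]}

def score_site(site, regime):
    """Combine the three subscores by multiplication."""
    return prod(regime[(t, pos)] for pos, t in split_triplets(site).items())

def subscore_from_known(triplet, position, known):
    """Apply the illustrative 10 / 8 / 1 rule.

    `known` is a set of (triplet, position) pairs for which a previously
    characterized ZFP exists (hypothetical input format).
    """
    if (triplet, position) in known:
        return 10   # a prior ZFP binds this triplet at this position
    if any(t == triplet for t, _ in known):
        return 8    # a prior ZFP binds this triplet, but at another position
    return 1        # no prior ZFP binds this triplet at all

# Only the three Table 1 values quoted in the worked example are used here.
example_regime = {("AAC", 1): 1, ("AAG", 2): 8, ("AAA", 3): 8}
print(score_site("AAAAAGAAC", example_regime))  # 1 x 8 x 8
```

Multiplication is used because the text's example multiplies; addition or another combining function could be substituted in `score_site` without changing the rest.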

The values 10, 8 and 1 are only illustrative, and other values could be used. Furthermore, a more sophisticated assignment of values can be used which also takes into account different binding affinities, specificities and presence of D-able sites, among other factors. In such a scheme, combinations of triplets and positions for which prior ZFPs exist with strong binding affinities are typically given higher values than combinations of triplets and positions for which there are prior ZFPs with lower binding affinities.

The selection of potential target sites within a larger sequence and calculation of scores is typically performed by a suitably programmed computer, which outputs one or more potential target site(s) with their score(s). Optionally, user input can be provided to such a computer to specify how many potential target sites should be output. For example, the user can elect to have n potential target sites with the highest scores output, where n is at the discretion of the user. The user can also specify a threshold score, which must be equaled or exceeded for a potential target site to be output.
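A sketch of such a program is shown below. It enumerates every 9-base window on both strands, scores each window by multiplying its three subscores, and applies the optional `n` and `threshold` user inputs. The function names and the `subscore(triplet, position)` callback are assumptions made for illustration. Start coordinates on the "-" strand are given relative to the reverse-complemented sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def candidate_sites(seq):
    """Yield (strand, start, site) for every 9-base window on both strands."""
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - 8):
            yield strand, i, s[i:i + 9]

def rank_sites(seq, subscore, n=None, threshold=None):
    """Return (score, strand, start, site) tuples, highest score first.

    subscore(triplet, position) supplies a value from a correspondence regime.
    n limits the output to the n best sites; threshold drops lower scores.
    """
    scored = []
    for strand, start, site in candidate_sites(seq):
        score = (subscore(site[6:9], 1)      # first triplet, bases 7-9
                 * subscore(site[3:6], 2)    # second triplet, bases 4-6
                 * subscore(site[0:3], 3))   # third triplet, bases 1-3
        if threshold is None or score >= threshold:
            scored.append((score, strand, start, site))
    scored.sort(key=lambda r: -r[0])  # stable: ties keep scan order
    return scored if n is None else scored[:n]
```

For example, `rank_sites(gene, my_subscore, n=5)` would return the five best sites, and `rank_sites(gene, my_subscore, threshold=640)` every site scoring at least 640, where `gene` and `my_subscore` are supplied by the user.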

In a variation of the above method, a potential target site can be evaluated based both on values in a correspondence table and on the presence of one or more D-able subsites. This is achieved by user input of a context parameter to provide a scaled score for one or more combinations of triplet and a particular position, if the context of the triplet indicates presence of a D-able subsite. For example, a triplet 5'NNG3' followed by an A does not provide a D-able subsite. However, 5'NNG3' followed by a K does provide a D-able site. The user can elect to input a context parameter that increases the value of the subscore for the 5'NNG3' triplet when 5'NNG3' is followed by a K. The scaled subscore for this triplet is then combined with subscores or scaled subscores for other triplets to give an overall score for a potential target site.
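One way to apply such a context parameter is sketched below, assuming the parameter is a simple multiplier and that K denotes G or T (the standard IUPAC code). The function name and signature are hypothetical. Note that the base following the first triplet (bases 7-9) lies outside the 9-base site, so the flanking sequence is consulted when available.

```python
def scaled_subscores(seq, start, subscore, context=1.0):
    """Subscores for positions 1-3 of the 9-base site at seq[start:start+9].

    A triplet ending in G whose next 3' base is K (G or T) forms a D-able
    subsite; its subscore is multiplied by `context`.
    """
    result = {}
    for pos, offset in ((1, 6), (2, 3), (3, 0)):
        i = start + offset
        triplet = seq[i:i + 3]
        value = subscore(triplet, pos)
        following = seq[i + 3] if i + 3 < len(seq) else None
        if triplet[2] == "G" and following in ("G", "T"):
            value *= context
        result[pos] = value
    return result
```

With the TTK site 5'AAG GAT3' at the 5' end of a sequence, the AAG triplet is followed by G, so only its subscore is scaled.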

In a further variation, a computer performing the above analysis is programmed to output certain target segments receiving high scores in pairs determined by their physical proximity to each other. Paired target segments both of which receive high scores that occur within about five bases of each other are appropriate targets for the design of six-finger zinc proteins formed by linkage of two component zinc finger proteins each having three fingers.
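The pairing step might look like the following sketch. It assumes that "within about five bases" means a gap of 0 to 5 bases between the 3' end of one 9-base site and the 5' end of the next on the same strand; that reading, the function name and the `(start, score)` input format are assumptions made for illustration.

```python
def paired_sites(hits, min_score, max_gap=5, site_len=9):
    """Return (start1, start2, gap) for non-overlapping high-scoring pairs.

    hits: iterable of (start, score) for sites on one strand.
    """
    good = sorted(h for h in hits if h[1] >= min_score)  # ordered by start
    pairs = []
    for i, (s1, _) in enumerate(good):
        for s2, _ in good[i + 1:]:
            gap = s2 - (s1 + site_len)
            if gap > max_gap:
                break            # later sites are even further away
            if gap >= 0:         # skip overlapping sites
                pairs.append((s1, s2, gap))
    return pairs
```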

Potential target sites identified by the above methods can be subject to further evaluation or can be used directly for design or selection (if needed) and production of zinc finger proteins. Zinc finger proteins can be designed and synthesized to such target sites using the same methods described above for potential target segments containing D-able subsites.

V. Database Design of ZFPs

The invention provides methods for design of ZFPs to a preselected target site. These methods are suitable for use in conjunction with the methods of target site selection described above, or with other methods of target site selection.

In designing a new ZFP, it is generally advantageous to make use of information inherent in precharacterized ZFPs and their target sites, thereby minimizing the need for de novo design or selection. As with target site selection, several factors are involved in this process. Design is facilitated when, for each triplet subsite in a target site, fingers are not only available in existing ZFPs, but such fingers also contact their respective triplet subsites from the same location in the existing proteins as in the proposed design. For example, consider three existing pairs of ZFP and target site:

5'GCG TGG GAC3', bound by a ZFP with fingers F1-F2-F3 (where F3 interacts with GCG, F2 with TGG, and F1 with GAC), 5'AAG GAG GTG3', bound by a ZFP with fingers F4-F5-F6, and 5'CCG TGA GCA3', bound by a ZFP with fingers F7-F8-F9, and a target site 5'GCG GAG GCA3' for which a ZFP is to be designed. In this situation, the novel protein F7-F5-F3 binds to 5'GCG GAG GCA3' with each finger in the novel protein occurring in the same relative position in the novel protein as it did in the database proteins from which it was obtained. This design is advantageous because the analogous environment of each finger in the novel ZFP with that of its previous ZFP means that the finger is likely to bind with similar specificity and affinity in the novel ZFP as in the parent. Thus, the general rule that the binding characteristics of a zinc finger protein are the aggregate of its component fingers is likely to hold.

Novel zinc finger proteins can also be designed from component fingers that are available in existing proteins, but not at the same positions as in the protein to be designed. For example, using the set of existing ZFP-site pairs described above, the protein F3-F7-F5 can be designed to bind sequence 5'GAG GCA GCG3'. In the novel protein, the fingers occupy different positions than in their respective parental proteins.
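The two designs can be checked mechanically. The sketch below records which triplet each finger recognizes in the three example ZFP-site pairs and derives the site bound by an assembled protein. It is illustrative only: the helper names are hypothetical, and the third example site is read here as 5'CCG TGA GCA3', which is an assumption because that site is partly illegible in the source text (its CCG triplet is not used by either design).

```python
# Finger -> triplet recognized, from the three example ZFP/site pairs.
# Fingers are N- to C-terminal; the first finger binds the 3'-most triplet.
FINGER_TRIPLET = {
    "F1": "GAC", "F2": "TGG", "F3": "GCG",   # 5'GCG TGG GAC3'
    "F4": "GTG", "F5": "GAG", "F6": "AAG",   # 5'AAG GAG GTG3'
    "F7": "GCA", "F8": "TGA", "F9": "CCG",   # 5'CCG TGA GCA3' (assumed reading)
}

def parent_position(finger):
    """Position (1-3) the finger occupies in its parental protein."""
    return (int(finger[1:]) - 1) % 3 + 1

def bound_site(fingers):
    """Site (5'->3') bound by fingers listed N- to C-terminal."""
    return " ".join(FINGER_TRIPLET[f] for f in reversed(fingers))

def same_position(fingers):
    """For each finger, whether it keeps its parental position."""
    return [parent_position(f) == i for i, f in enumerate(fingers, start=1)]

print(bound_site(["F7", "F5", "F3"]), same_position(["F7", "F5", "F3"]))
print(bound_site(["F3", "F7", "F5"]), same_position(["F3", "F7", "F5"]))
```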

Although to an approximation a given finger retains its triplet specificity and affinity irrespective of which position it occupies in a ZFP, in practice, contextual effects are more likely to cause changes in specificity and/or affinity of a finger for its triplet subsite when the finger occupies different positions in different zinc finger proteins. Therefore, although ZFPs formed from component fingers occupying different positions than in previously characterized ZFPs typically still bind to the site, the specificity or affinity is sometimes different (typically lower) than expected.

Finally, for preselected target sites including a triplet for which no preexisting finger is available, completely novel fingers can be designed or selected using rules-based approaches or phage display.

The invention provides methods of systematically using a database containing information about existing ZFPs in the design of new ZFPs for a preselected target site according to the principles described above. The organization of a typical database is shown in Table 9. The database typically includes designations for each of a collection of precharacterized ZFPs. The ZFPs can be natural ZFPs or variant ZFPs. The designation can be, for example, the name or a symbol representing each ZFP. The database also includes subdesignations for each of the fingers in a ZFP. Typically, the subdesignations are in the form of amino acid residues occupying selected positions in a finger or fingers. For example, in Table 9 the subdesignations are the amino acids occupying positions -1 through +6 according to conventional numbering. The database further includes a target nucleic acid segment bound by each zinc finger protein. The nucleic acid segment usually includes three triplets of three bases. The three triplets of bases can be included joined as one sequence or as separate sequences. If bases in a nine base target site are numbered consecutively from the 5' end, a first triplet occupies bases 7-9, a second triplet occupies bases 4-6 and a third triplet occupies bases 1-3. According to this designation of triplet position within a target segment, the first finger of a zinc finger protein (i.e., closest to N-terminus) binds to the first triplet, the second finger to the second triplet, and the third finger to the third triplet. The database can also include additional information such as the binding affinity or dissociation constant of a ZFP for its target site, although such is not essential.

A target site is provided for design of a zinc finger protein using the database. In some methods, the target site is provided by user input. In other methods, the target site is provided as output from any of the methods of target site selection described above. The target site typically comprises at least 9 bases forming at least three triplets. The three component triplets are designated first, second and third triplets respectively occupying bases 7-9, 4-6 and 1-3 of the target site, with the 5' base being assigned as base 1. For the first triplet in the target site, the computer searches the database for zinc finger protein(s) containing fingers that bind to the triplet. The computer stores records relating to the zinc finger protein(s) thereby identified, and their finger(s) that bind to the first triplet. Optionally, the computer distinguishes between zinc finger proteins containing a finger that binds to the first triplet of the target site at the first finger position and in other positions. If so, the computer stores the two subsets of zinc finger protein(s) as separate records. The process is then repeated for the second triplet in the target site. The computer identifies zinc finger protein(s) containing a finger that specifically binds to the second triplet. Optionally, the computer distinguishes between zinc finger(s) that bind the second triplet from the second position of an existing zinc finger protein or at a different position. Finally, the computer identifies zinc finger protein(s) containing a finger that specifically binds to the third triplet of the target site.

Optionally, the computer distinguishes between zinc finger(s) that bind the third triplet from the third position of an existing zinc finger protein or from another position. After searching for ZFPs that bind to each of the first, second and third triplets in the target segment, the computer outputs designations for the ZFPs that have been identified and subdesignations of the fingers that bind to the first, second and third triplets. Optionally, the computer provides separate output of a subset of ZFPs that bind the first triplet from the first finger position, and a subset of ZFPs that bind the first triplet from other positions; and corresponding subsets of ZFPs that bind the second triplet from the second finger position and from other positions, and of ZFPs that bind the third triplet from the third finger position and from other positions.
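The search just described can be sketched as follows. The record layout, the class and function names, and the ZFP designations "ZFP-A" to "ZFP-C" are hypothetical; the finger labels stand in for the amino acid subdesignations of Table 9, which are not reproduced here. The third database site is read as CCGTGAGCA, an assumption where the source text is partly illegible.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ZFPRecord:
    name: str        # designation of the ZFP
    fingers: tuple   # subdesignations, N- to C-terminal
    site: str        # 9-base target site written 5'->3'

    def triplet_at(self, position):
        """Triplet bound by the finger at `position` (1, 2 or 3)."""
        start = 9 - 3 * position   # position 1 -> bases 7-9, and so on
        return self.site[start:start + 3]

def search(database, target):
    """For each target triplet, list (ZFP, finger) hits split by position."""
    out = {}
    for pos in (1, 2, 3):
        start = 9 - 3 * pos
        triplet = target[start:start + 3]
        same, other = [], []
        for rec in database:
            for q in (1, 2, 3):
                if rec.triplet_at(q) == triplet:
                    hit = (rec.name, rec.fingers[q - 1])
                    (same if q == pos else other).append(hit)
        out[pos] = {"triplet": triplet, "same": same, "other": other}
    return out

DB = [
    ZFPRecord("ZFP-A", ("F1", "F2", "F3"), "GCGTGGGAC"),
    ZFPRecord("ZFP-B", ("F4", "F5", "F6"), "AAGGAGGTG"),
    ZFPRecord("ZFP-C", ("F7", "F8", "F9"), "CCGTGAGCA"),  # assumed reading
]

for pos, hits in search(DB, "GCGGAGGCA").items():
    print(pos, hits)
```

Keeping the "same" and "other" lists separate mirrors the optional separate output of the two subsets for each triplet.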

The information output by the computer can be used in the design and synthesis of novel zinc finger proteins that bind to a preselected target. For example, if the output includes a ZFP1 with a finger X that binds the first triplet of the target, ZFP2 that includes a finger Y that binds to the second triplet of the target, and ZFP3 that includes a finger Z that binds to the third triplet of the target, a novel ZFP can be synthesized comprising the fingers XYZ in that order (N-terminal to C-terminal). If the computer outputs multiple different zinc finger proteins that contain multiple different fingers that bind to a given triplet, the user can select between the fingers depending on whether a finger binds to a particular triplet position from the same position in the database protein as in the ZFP to be designed. For example, a ZFP1 containing fingers XYZ, in which X binds to a first triplet in a target site, is generally preferred to a ZFP2 containing fingers ABC, in which finger C binds to the first triplet in a target site. Thus one would typically use finger X rather than C to occupy the first finger position in a ZFP designed to bind the target segment. Often the computer program identifies two ZFPs, each containing a finger that binds a particular triplet, and in each ZFP, the finger occupies the same position in the database protein from which it derives as in the intended design ZFP. In such cases, one often chooses between the two fingers based on the binding affinity for their respective targets, with higher binding affinity being preferred. Optionally, the computer also provides output of proposed amino acid substitutions to one or more fingers for the corresponding triplet(s) bound by the finger(s).
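The selection preference can be expressed as a two-level sort key: same-position fingers first, then the lowest dissociation constant (highest affinity). The sketch below is illustrative; the tuple layout and the Kd values in the usage note are hypothetical, and candidates without a recorded Kd are ranked last within their group.

```python
def pick_finger(candidates):
    """Choose a finger for one position of the design.

    candidates: (finger, same_position, kd_nM) tuples, where kd_nM may be
    None if the database holds no affinity for the parent ZFP.
    """
    def key(c):
        _, same_position, kd = c
        return (not same_position, float("inf") if kd is None else kd)
    return min(candidates, key=key)[0]
```

For instance, `pick_finger([("C", False, 0.5), ("X", True, 2.0)])` selects X, because position takes precedence over affinity, while two same-position candidates are separated by Kd alone.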

Although database analysis is primarily illustrated for precharacterized zinc finger proteins having three fingers, such databases can alternatively or additionally store information concerning zinc finger proteins with fewer or greater numbers of fingers. Likewise, such databases can be used in the design of zinc finger proteins having fewer or greater than three fingers. For example, some databases of the invention store information concerning ZFPs with only two fingers as well as or instead of information concerning ZFPs with three fingers. ZFPs with only two fingers have corresponding target sites with only two triplets. The information relating to two-finger ZFPs can be used in the design of three-finger ZFPs that bind to nine base target sites in essentially the same manner described above. However, there is no exact correspondence between the relative positions of two fingers in a two-finger protein with the relative positions of three fingers in a three-finger zinc finger protein. This issue can be addressed in two ways.

First, all fingers in a two-finger protein can be effectively treated as occupying different positions than fingers in a three-finger protein. Accordingly, if a two-finger protein contains a finger that binds to a given triplet, the computer outputs this information and indicates that the finger does not occur at the same position in the database two-finger protein as in the three-finger protein to be designed. Alternatively, the first (N-terminal) finger in a two-finger protein can be considered the equivalent of either the first or second finger in a three-finger protein. The second finger in a two-finger protein can be considered the equivalent of either the second or third finger in a three-finger protein.

Accordingly, if the computer identifies a two-finger protein with a first (N-terminal) finger binding to a first triplet in a target site for which a zinc finger protein is to be designed, the computer can output that the two-finger protein supplies an appropriate finger and at the same position in the database protein as in the three-finger protein to be designed.
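The two conventions for two-finger proteins can be captured in a small helper that returns the positions of a three-finger design at which a database finger counts as being "at the same position". The function name and the `lenient` flag are assumptions made for this sketch.

```python
def equivalent_positions(finger_index, n_fingers, lenient=True):
    """Positions (1-3) in a three-finger design matched by a database finger.

    finger_index: 1-based position of the finger in its database protein.
    n_fingers: number of fingers in the database protein (2 or 3).
    lenient=False applies the first convention (a two-finger protein never
    matches); lenient=True applies the second (finger 1 matches positions
    1 or 2, finger 2 matches positions 2 or 3).
    """
    if n_fingers == 3:
        return {finger_index}
    if n_fingers == 2:
        return {finger_index, finger_index + 1} if lenient else set()
    raise ValueError("only two- and three-finger proteins are handled")
```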

VII. Production of ZFPs

ZFP polypeptides and nucleic acids encoding the same can be made using routine techniques in the field of recombinant genetics. Basic texts disclosing the general methods of use in this invention include Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 1994). In addition, nucleic acids less than about 100 bases can be custom ordered from any of a variety of commercial sources, such as The Midland Certified Reagent Company (mcrc@oligos.com), The Great American Gene Company (http://www.genco.com), ExpressGen Inc. (www.expressgen.com), Operon Technologies Inc. (Alameda, CA). Similarly, peptides can be custom ordered from any of a variety of sources, such as PeptidoGenic (pkim@ccnet.com), HTI Bio-products, inc. (http://www.htibio.com), BMA Biomedicals Ltd (U.K.), Bio.Synthesis, Inc.

Oligonucleotides can be chemically synthesized according to the solid phase phosphoramidite triester method first described by Beaucage & Caruthers, Tetrahedron Letts. 22:1859-1862 (1981), using an automated synthesizer, as described in Van Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of oligonucleotides is by either denaturing polyacrylamide gel electrophoresis or by reverse phase HPLC. The sequence of the cloned genes and synthetic oligonucleotides can be verified after cloning using, e.g., the chain termination method for sequencing double-stranded templates of Wallace et al., Gene 16:21-26 (1981).

Two alternative methods are typically used to create the coding sequences required to express newly designed DNA-binding peptides. One protocol is a PCR-based assembly procedure that utilizes six overlapping oligonucleotides (Fig. 3). Three oligonucleotides (oligos 1, 3, and 5 in Figure 3) correspond to "universal" sequences that encode portions of the DNA-binding domain between the recognition helices. These oligonucleotides typically remain constant for all zinc finger constructs. The other three "specific" oligonucleotides (oligos 2, 4, and 6 in Fig. 3) are designed to encode the recognition helices. These oligonucleotides contain substitutions primarily at positions -1, 2, 3 and 6 on the recognition helices, making them specific for each of the different DNA-binding domains.

The PCR synthesis is carried out in two steps. First, a double stranded DNA template is created by combining the six oligonucleotides (three universal, three specific) in a four cycle PCR reaction with a low temperature annealing step, thereby annealing the oligonucleotides to form a DNA "scaffold." The gaps in the scaffold are filled in by high-fidelity thermostable polymerase; the combination of Taq and Pfu polymerases also suffices. In the second phase of construction, the zinc finger template is amplified by external primers designed to incorporate restriction sites at either end for cloning into a shuttle vector or directly into an expression vector.

An alternative method of cloning the newly designed DNA-binding proteins relies on annealing complementary oligonucleotides encoding the specific regions of the desired ZFP. This particular application requires that the oligonucleotides be phosphorylated prior to the final ligation step. This is usually performed before setting up the annealing reactions. In brief, the "universal" oligonucleotides encoding the constant regions of the proteins (oligos 1, 2 and 3 of above) are annealed with their complementary oligonucleotides. Additionally, the "specific" oligonucleotides encoding the finger recognition helices are annealed with their respective complementary oligonucleotides. These complementary oligos are designed to fill in the region which was previously filled in by polymerase in the above-mentioned protocol. The complementary oligos to the common oligos 1 and finger 3 are engineered to leave overhanging sequences specific for the restriction sites used in cloning into the vector of choice in the following step. The second assembly protocol differs from the initial protocol in the following aspects: the "scaffold" encoding the newly designed ZFP is composed entirely of synthetic DNA, thereby eliminating the polymerase fill-in step; additionally, the fragment to be cloned into the vector does not require amplification. Lastly, the design of leaving sequence-specific overhangs eliminates the need for restriction enzyme digests of the inserting fragment. Alternatively, changes to ZFP recognition helices can be created using conventional site-directed mutagenesis methods.

Both assembly methods require that the resulting fragment encoding the newly designed ZFP be ligated into a vector. Ultimately, the ZFP-encoding sequence is cloned into an expression vector. Expression vectors that are commonly utilized include, but are not limited to, a modified pMAL-c2 bacterial expression vector (New England BioLabs) or a eukaryotic expression vector, pcDNA (Promega). The final constructs are verified by sequence analysis.

Any suitable method of protein purification known to those of skill in the art can be used to purify ZFPs of the invention (see Ausubel, supra; Sambrook, supra). In addition, any suitable host can be used for expression, e.g., bacterial cells, insect cells, yeast cells, mammalian cells, and the like.

Expression of a zinc finger protein fused to a maltose binding protein (MBP-ZFP) in bacterial strain JM109 allows for straightforward purification through an amylose column (NEB). High expression levels of the zinc finger chimeric protein can be obtained by induction with IPTG since the MBP-ZFP fusion in the pMal-c2 expression plasmid is under the control of the tac promoter (NEB). Bacteria containing the MBP-ZFP fusion plasmids are inoculated into 2xYT medium containing 10 µM ZnCl2, 0.02% glucose, plus 50 µg/ml ampicillin and shaken at 37°C. At mid-exponential growth IPTG is added to 0.3 mM and the cultures are allowed to shake. After 3 hours the bacteria are harvested by centrifugation, disrupted by sonication or by passage through a french pressure cell or through the use of lysozyme, and insoluble material is removed by centrifugation. The MBP-ZFP proteins are captured on an amylose-bound resin, washed extensively with buffer containing 20 mM Tris-HCl (pH 7.5), 200 mM NaCl, 5 mM DTT and 50 µM ZnCl2, then eluted with maltose in essentially the same buffer (purification is based on a standard protocol from NEB). Purified proteins are quantitated and stored for biochemical analysis.

The dissociation constants of the purified proteins, e.g., Kd, are typically characterized via electrophoretic mobility shift assays (EMSA) (Buratowski & Chodosh, in Current Protocols in Molecular Biology, pp. 12.2.1-12.2.7 (Ausubel ed., 1996)).

Affinity is measured by titrating purified protein against a fixed amount of labeled double-stranded oligonucleotide target. The target typically comprises the natural binding site sequence flanked by the 3 bp found in the natural sequence and additional, constant flanking sequences. The natural binding site is typically 9 bp for a three-finger protein and 2 x 9 bp + intervening bases for a six-finger ZFP. The annealed oligonucleotide targets possess a 1 base 5' overhang which allows for efficient labeling of the target with T4 phage polynucleotide kinase. For the assay the target is added at a concentration of 1 nM or lower (the actual concentration is kept at least 10-fold lower than the expected dissociation constant), purified ZFPs are added at various concentrations, and the reaction is allowed to equilibrate for at least 45 min. In addition the reaction mixture also contains 10 mM Tris (pH 7.5), 100 mM KCl, 1 mM MgCl2, 0.1 mM ZnCl2, 5 mM DTT, 10% glycerol, 0.02% BSA. (NB: in earlier assays poly d(IC) was also added at 10-100 µg/µl.)

The equilibrated reactions are loaded onto a 10% polyacrylamide gel, which has been pre-run for 45 min in Tris/glycine buffer, then bound and unbound labeled target is resolved by electrophoresis at 150 V. (Alternatively, 10-20% gradient Tris-HCl gels, containing a 4% polyacrylamide stacker, can be used.) The dried gels are visualized by autoradiography or phosphorimaging and the apparent Kd is determined by calculating the protein concentration that gives half-maximal binding.

The assays can also include determining active fractions in the protein preparations. Active fractions are determined by stoichiometric gel shifts where proteins are titrated against a concentration of target DNA. Titrations are done at 100, 50, and 25% of target (usually at micromolar levels).
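Reading the apparent Kd off a titration can be sketched as below. The text specifies only "the protein concentration that gives half-maximal binding"; the log-linear interpolation between the two bracketing data points, the function name and the default `maximum` of 1.0 (full saturation) are assumptions made for this illustration. The approach relies on the target concentration being well below the Kd, as the assay conditions require.

```python
import math

def apparent_kd(concs, fractions, maximum=1.0):
    """Protein concentration giving half-maximal binding.

    concs: protein concentrations in ascending order (any consistent unit).
    fractions: fraction of labeled target shifted at each concentration.
    Interpolates linearly in log10(concentration) between the two points
    that bracket half of `maximum`.
    """
    half = maximum / 2.0
    points = list(zip(concs, fractions))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= half <= f_hi and f_hi > f_lo:
            t = (half - f_lo) / (f_hi - f_lo)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("titration does not bracket half-maximal binding")
```

A two-fold dilution series spanning the expected Kd gives enough points on either side of the midpoint for the interpolation to be meaningful.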

IX. Applications of Designed ZFPs

ZFPs that bind to a particular target gene, and the nucleic acids encoding them, can be used for a variety of applications. These applications include therapeutic methods in which a ZFP or a nucleic acid encoding it is administered to a subject and used to modulate the expression of a target gene within the subject (see copending application Townsend & Townsend & Crew Attorney Docket 019496-002200, filed January 12, 1998). The modulation can be in the form of repression, for example, when the target gene resides in a pathological infecting microorganism, or in an endogenous gene of the patient, such as an oncogene or viral receptor, that is contributing to a disease state. Alternatively, the modulation can be in the form of activation when activation of expression or increased expression of an endogenous cellular gene can ameliorate a diseased state. For such applications, ZFPs, or more typically, nucleic acids encoding them are formulated with a pharmaceutically acceptable carrier as a pharmaceutical composition.

Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition (see, e.g., Remington's Pharmaceutical Sciences, 17th ed. 1985). The ZFPs, alone or in combination with other suitable components, can be made into aerosol formulations (i.e., they can be "nebulized") to be administered via inhalation.

Aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen, and the like. Formulations suitable for parenteral administration, such as, for example, by intravenous, intramuscular, intradermal, and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives. Compositions can be administered, for example, by intravenous infusion, orally, topically, intraperitoneally, intravesically or intrathecally. The formulations of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampules and vials. Injection solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind previously described.

The dose administered to a patient should be sufficient to effect a beneficial therapeutic response in the patient over time. The dose is determined by the efficacy and Kd of the particular ZFP employed, the target cell, and the condition of the patient, as well as the body weight or surface area of the patient to be treated. The size of the dose also is determined by the existence, nature, and extent of any adverse side-effects that accompany the administration of a particular compound or vector in a particular patient.

In other applications, ZFPs are used in diagnostic methods for sequence-specific detection of target nucleic acid in a sample. For example, ZFPs can be used to detect variant alleles associated with a disease or phenotype in patient samples. As an example, ZFPs can be used to detect the presence of particular mRNA species or cDNA in a complex mixture of mRNAs or cDNAs. As a further example, ZFPs can be used to quantify copy number of a gene in a sample. For example, detection of loss of one copy of a p53 gene in a clinical sample is an indicator of susceptibility to cancer. In a further example, ZFPs are used to detect the presence of pathological microorganisms in clinical samples. This is achieved by using one or more ZFPs specific to genes within the microorganism to be detected. A suitable format for performing diagnostic assays employs ZFPs linked to a domain that allows immobilization of the ZFP on an ELISA plate. The immobilized ZFP is contacted with a sample suspected of containing a target nucleic acid under conditions in which binding can occur. Typically, nucleic acids in the sample are labeled (e.g., in the course of PCR amplification). Alternatively, unlabelled probes can be detected using a second labelled probe. After washing, bound labelled nucleic acids are detected.

ZFPs also can be used for assays to determine the phenotype and function of gene expression. Current methodologies for determination of gene function rely primarily upon either overexpressing or removing (knocking out completely) the gene of interest from its natural biological setting, and observing the effects. The phenotypic effects observed indicate the role of the gene in the biological system.

One advantage of ZFP-mediated regulation of a gene relative to conventional knockout analysis is that expression of the ZFP can be placed under small molecule control. By controlling expression levels of the ZFPs, one can in turn control the expression levels of a gene regulated by the ZFP to determine what degree of repression or stimulation of expression is required to achieve a given phenotypic or biochemical effect. This approach has particular value for drug development. By putting the ZFP under small molecule control, problems of embryonic lethality and developmental compensation can be avoided by switching on the ZFP repressor at a later stage in mouse development and observing the effects in the adult animal. Transgenic mice having target genes regulated by a ZFP can be produced by integration of the nucleic acid encoding the ZFP at any site in trans to the target gene. Accordingly, homologous recombination is not required for integration of the nucleic acid. Further, because the ZFP is trans-dominant, only one chromosomal copy is needed and therefore functional knock-out animals can be produced without backcrossing.

X. Computer Systems and Programs

Fig. 4 depicts a representative computer system suitable for the present invention. Fig. 4 shows basic subsystems of a computer system 10 suitable for use with the present invention. In Fig. 4, computer system 10 includes a bus 12 which interconnects major subsystems such as a central processor 14, a system memory 16, an input/output controller 18, an external device such as a printer 20 via a parallel port 22, a display screen 24 via a display adapter 26, a serial port 28, a keyboard 30, a fixed disk drive 32 and a floppy disk drive 33 operative to receive a floppy disk 33A. Many other devices can be connected such as a scanner 60 (not shown) via I/O controller 18, a mouse 36 connected to serial port 28 or a network interface 40. Many other devices or subsystems (not shown) may be connected in a similar manner. Also, it is not necessary for all of the devices shown in Fig. 4 to be present to practice the present invention, as discussed below. The devices and subsystems may be interconnected in different ways from that shown in Fig. 4. The operation of a computer system such as that shown in Fig. 4 is readily known in the art and is not discussed in detail in the present application. Source code to implement the present invention may be operably disposed in system memory 16 or stored on storage media such as a fixed disk 32 or a floppy disk 33A.

Fig. 5 is an illustration of representative computer system 10 of Fig. 4 suitable for embodying the methods of the present invention. Fig. 5 depicts but one example of many possible computer types or configurations capable of being used with the present invention. Fig. 5 shows computer system 10 including display screen 24, cabinet 20, keyboard 30, a scanner 60, and mouse 36. Mouse 36 and keyboard 30 illustrate "user input devices." Other examples of user input devices are a touch screen, light pen, track ball, data glove, etc.

In a preferred embodiment, system 10 includes a Pentium® class based computer, running Windows® Version 3.1, Windows95® or Windows98® operating system by Microsoft Corporation. However, the method is easily adapted to other operating systems without departing from the scope of the present invention.

Mouse 36 may have one or more buttons such as buttons 37. Cabinet 20 houses familiar computer components such as disk drive 33, a processor, storage means, etc. As used in this specification, "storage means" includes any storage device used in connection with a computer system such as disk drives, magnetic tape, solid state memory, bubble memory, etc. Cabinet 20 may include additional hardware such as input/output (I/O) interface 18 for connecting computer system 10 to external devices such as a scanner 60, external storage, other computers or additional peripherals. Fig. 5 is representative of but one type of system for embodying the present invention. Many other system types and configurations are suitable for use in conjunction with the present invention.

Fig. 6 depicts a flowchart 301 of simplified steps in a representative embodiment for selecting a target site containing a D-able subsite within a target sequence for targeting by a zinc finger protein. In a step 302, a target sequence to be targeted by a zinc finger protein is provided. Then, in a step 303, a potential target site within the target sequence is selected for evaluation. In a decisional step 304, the potential target site is evaluated to determine whether it contains a D-able subsite. Such a target site conforms to the formula 5'-NNx aNy bNzc-3', wherein each of (x, a), (y, b) and (z, c) is (N, N) or (G, K); at least one of (x, a), (y, b) and (z, c) is (G, K); and N and K are IUPAC-IUB ambiguity codes.

If the potential target site does contain a D-able subsite, the potential target site is stored as a record in step 305. The method continues with a further decisional step 306. If evaluation of further potential target sites is required by the user, a further iteration of the method is performed starting from step 303. If sufficient potential target sites have already been evaluated, records of target sites stored in step 305 are then output in step 307.
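The evaluation in decisional step 304 can be sketched in a few lines of Python. This is a minimal illustration rather than the program of the specification: it assumes a 10-base window read 5' to 3' so that the formula 5'-NNx aNy bNzc-3' places the (x, a), (y, b) and (z, c) pairs at bases 3-4, 6-7 and 9-10, and the function names `dable_positions` and `scan_target` are hypothetical.

```python
def dable_positions(site):
    """Return 0-based indices of the G in every (G, K) pair of a 10-base site.

    The site is read 5'->3' against the formula NNx aNy bNzc, so the pairs
    (x, a), (y, b), (z, c) sit at 0-based indices (2, 3), (5, 6), (8, 9).
    K is the ambiguity code for G or T.
    """
    site = site.upper()
    if len(site) != 10:
        raise ValueError("expected a 10-base potential target site")
    hits = []
    for i in (2, 5, 8):
        if site[i] == "G" and site[i + 1] in ("G", "T"):
            hits.append(i)
    return hits


def scan_target(target):
    """Slide a 10-base window over the target (steps 303-306) and keep every
    window with at least one D-able subsite (the records of step 305)."""
    target = target.upper()
    records = []
    for start in range(len(target) - 9):
        window = target[start:start + 10]
        if dable_positions(window):
            records.append((start, window))
    return records
```

Calling `scan_target` on a sequence returns the stored records that step 307 would output; only one strand is scanned in this sketch.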

Fig. 7A depicts a flowchart of simplified steps in another representative embodiment for selecting a target site within a polynucleotide for targeting by a zinc finger protein. In a step 402A, a polynucleotide target sequence is provided for analysis. Then, in a step 404, a potential target site within the polynucleotide sequence is selected. The potential target site comprises first, second and third triplets of bases at first, second and third positions in the potential target site. Then, in a step 406, a plurality of subscores are determined by applying a correspondence regime between triplets and triplet position, wherein each triplet has first, second and third corresponding positions, and each corresponding triplet and position is assigned a particular subscore. Next there is an optional decisional step 408 in which the user can elect to scale one or more of the subscores with a scaling factor in step 410. Thereafter in a step 412, a score is determined from the subscores (scaled as appropriate) for the first, second, and third triplets. Then, in a decisional step 414, a check is performed to determine if any further potential target sites are to be examined. If so, then processing continues with step 404. Otherwise, in a step 416, at least one of the potential target sites and its score are provided as output.

Fig. 7B depicts a flowchart of simplified steps in a representative embodiment for producing a zinc finger protein. In a step 450 a database comprising designations for a plurality of zinc finger proteins is provided. Each protein in the database comprises at least first, second and third fingers. The database further comprises subdesignations for each of the three fingers of each of the zinc finger proteins and a corresponding nucleic acid sequence for each zinc finger protein. Each sequence comprises at least first, second and third triplets specifically bound by the at least first, second and third fingers respectively in each zinc finger protein. The first, second and third triplets have an arrangement in the nucleic acid sequence in the same respective order (3'-5') as the first, second and third fingers are arranged in the zinc finger protein (N-terminal to C-terminal).

In a step 452, a target site for design of a zinc finger protein comprising at least first, second and third triplets is provided. Then, in a step 454, a first set of zinc finger proteins with a finger that binds to the first triplet in the target sequence is identified. There follows an optional step 456 of identifying first and second subsets of the set determined in 454. The first subset comprises zinc finger protein(s) with a finger that binds the first triplet from the first finger position in the zinc finger protein. The second subset comprises zinc finger protein(s) with a finger that binds the first triplet from other than the first finger position in the zinc finger protein. The method continues at step 458. In this step, a further set of zinc finger proteins is identified, this set comprising a finger that binds to the second triplet in the target site. This step is followed by an optional step 460 of identifying first and second subsets of the set identified in step 458. The first subset comprises zinc finger protein(s) that bind to the second triplet from the second position within a zinc finger protein. The second subset comprises zinc finger protein(s) that bind the second triplet from other than the second position of a zinc finger protein. The method continues at step 462. In 462, a set of zinc finger proteins is identified comprising a finger that binds to the third triplet of the target site. In an optional step 464, first and second subsets of the set identified in step 462 are identified. The first subset comprises zinc finger protein(s) containing a finger that binds to the third triplet from the third finger position of the zinc finger protein. The second subset comprises zinc finger protein(s) containing a finger that binds to the third triplet from other than the third finger position of the zinc finger protein. The method continues at step 466 in which the sets of zinc finger proteins identified in steps 454, 458 and 462 are separately output. There is a further optional step 468 in which the first and second subsets of zinc finger proteins identified in steps 456, 460 and 464 are output.
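The lookup of steps 454 to 468 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the specification: each database record is assumed to hold a designation, a 9-base target site written 5' to 3', and three finger subdesignations; the first triplet is taken as bases 7-9, the second as bases 4-6 and the third as bases 1-3. The names `ZFPRecord`, `triplet` and `find_fingers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZFPRecord:
    name: str        # designation of the ZFP
    site: str        # 9-base target site, written 5'->3'
    fingers: tuple   # subdesignations of fingers (F1, F2, F3)


def triplet(site, position):
    """Triplet 1 = bases 7-9, triplet 2 = bases 4-6, triplet 3 = bases 1-3."""
    start = 9 - 3 * position
    return site[start:start + 3]


def find_fingers(database, target_site):
    """For each triplet position of the target, collect every finger in the
    database that binds the same triplet. Fingers used from the matching
    position go in the 'ordered' subset, all others in 'disordered'."""
    result = {}
    for pos in (1, 2, 3):
        wanted = triplet(target_site, pos)
        ordered, disordered = [], []
        for rec in database:
            for fpos in (1, 2, 3):
                if triplet(rec.site, fpos) == wanted:
                    hit = (rec.name, fpos, rec.fingers[fpos - 1])
                    (ordered if fpos == pos else disordered).append(hit)
        result[pos] = {"ordered": ordered, "disordered": disordered}
    return result
```

The union of the two lists for a position corresponds to the set output in step 466, and the lists themselves to the optional subsets output in step 468.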

Fig. 8A is a key to the Entity Representation Diagram (ERD) that will be used to describe the contents of the ZFP database. A representative table 502 includes one or more key attributes 504 and one or more non-key attributes 506. Representative table 502 includes one or more records where each record includes fields corresponding to the listed attributes. The contents of the key fields taken together identify an individual record. In the ERD, each table is represented by a rectangle divided by a horizontal line. The fields or attributes above the line are key while the fields or attributes below the line are non-key fields. An identifying relationship 508 signifies that the key attribute of a parent table 510 is also a key attribute of a child table 512. A non-identifying relationship 514 signifies that the key attribute of a parent table 516 is also a non-key attribute of a child table 518. Where (FK) appears in parenthesis, it indicates that an attribute of one table is a key attribute of another table. For both the non-identifying and the identifying relationships, one record in the parent table corresponds to one or more records in the child table.

Fig. 8B depicts a representative ZFP database 550 according to a particular embodiment of the present invention. Database 550 can typically include designations for each of a collection of precharacterized ZFPs. The ZFPs can be natural ZFPs or variant ZFPs. The designation can be, for example, the name or a symbol representing each ZFP. For example, ZFP 552 of database 550 in Fig. 8B is designated "ZFP001."

The database 550 also includes subdesignations for each of the fingers in a ZFP, such as subdesignation 554, Finger 1 of ZFP001 552. Typically, the subdesignations are in the form of amino acid residues occupying selected positions in a finger. Further, the ZFPs have subdesignations that are the amino acids occupying positions -1 through +6 according to conventional numbering. The database can further include a target nucleic acid segment bound by each zinc finger protein. The nucleic acid segment usually includes three triplets of three bases. The three triplets of bases can be included joined as one sequence or as separate sequences. If bases in a nine base target site are numbered consecutively from the 5' end, a first triplet occupies bases 7-9, a second triplet occupies bases 4-6 and a third triplet occupies bases 1-3. According to this designation of triplet position within a target segment, the first finger of a zinc finger protein (i.e., closest to the N terminus) binds to the first triplet, the second finger to the second triplet, and the third finger to the third triplet. The database can also include additional information such as the binding affinity or dissociation constant of a ZFP for its target site, although such is not essential. Further, database 550 can include other arrangements and relationships among the ZFPs, fingers and nucleic acids than are depicted in Fig. 8B without departing from the scope of the present invention.
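One way to picture such a database is as a parent table of ZFPs and a child table of fingers joined by an identifying relationship. The sketch below uses Python's standard `sqlite3` module; the table names, column names and the helper functions are assumptions made for illustration and are not taken from Fig. 8B.

```python
import sqlite3

# Hypothetical schema: the finger table's key includes the parent's key
# (an identifying relationship); kd_nm is optional additional information.
SCHEMA = """
CREATE TABLE zfp (
    zfp_name    TEXT PRIMARY KEY,
    target_site TEXT NOT NULL,      -- nine bases, written 5'->3'
    kd_nm       REAL
);
CREATE TABLE finger (
    zfp_name TEXT NOT NULL REFERENCES zfp(zfp_name),
    position INTEGER NOT NULL CHECK (position IN (1, 2, 3)),
    residues TEXT NOT NULL,         -- amino acids at positions -1 through +6
    PRIMARY KEY (zfp_name, position)
);
"""


def build_database(rows):
    """rows: iterable of (name, site, kd_nm, (finger1, finger2, finger3))."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for name, site, kd_nm, fingers in rows:
        conn.execute("INSERT INTO zfp VALUES (?, ?, ?)", (name, site, kd_nm))
        for position, residues in enumerate(fingers, start=1):
            conn.execute("INSERT INTO finger VALUES (?, ?, ?)",
                         (name, position, residues))
    return conn


def triplet_for_finger(conn, name, position):
    """Finger 1 binds bases 7-9, finger 2 bases 4-6, finger 3 bases 1-3."""
    row = conn.execute(
        "SELECT substr(target_site, 10 - 3 * ?, 3) FROM zfp WHERE zfp_name = ?",
        (position, name),
    ).fetchone()
    return row[0] if row else None
```

Storing the target site as one joined sequence and deriving triplets on demand is only one of the two layouts the text allows; separate triplet columns would serve equally well.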

EXAMPLES

Example 1: SEARCH PROTOCOLS FOR DNA MOTIFS

This Example illustrates how a target segment is selected from a longer gene. The search procedure is implemented using a computer program that allows one to specify one or more DNA sequence motifs in a protocol. Normal procedure is to input the DNA sequence of a gene or cDNA and then search the sequence multiple times for different motifs, from the most to the least desirable. Thus, of the exemplary protocols listed below one would typically perform protocol 1 first, and if that fails to yield an adequate number of potential target segments, one then tries protocol 2, and so forth.

Protocol 1 searches a target gene for a target site formed from two separate segments, each of 9 or 10 bases. The two segments can be separated by zero to three intervening bases. Each segment includes a D-able subsite of the form NNGG (shown in bold). Each three base subsite within a segment begins with a G. The target sites identified by this analysis can be used directly for ZFP design or can be subject to further analysis, for example, to identify which target segments possess additional D-able subsites. In a target site formed from two segments, each of ten bases, a total of six D-able subsites can be present. All target sites below are shown from 5' to 3' and the nomenclature "{0,3}" indicates that 0-3 nucleotides of any type may be present.

GNGGNNGNN(N){0,3}GNGGNNGNNN
GNGGNNGNN(N){0,3}GNNGNGG
GNGGNNGNN(N){0,3}GNGGNNGNGG
GNNGNGGNN(N){0,3}GNGGNNGNNN
GNNGNGGNN(N){0,3}GNNGNGGNNN
GNNGNGGNN(N){0,3}GNGGNNGNGG
GNNGNNGNGG(N){0,3}GNGGNNGNNN
GNNGNNGNGG(N){0,3}GNNGNGGNNN
GNNGNNGNGG(N){0,3}GNGGNNGNGG
GNNGNNGNGGNGGNNGNNN
GNNGNNGNGGNNGNGGNNN
GNNGNNGNGGNNGNNGNGG

Protocol 2 is a second procedure for evaluating target sites within a target gene. This procedure again searches for a target site formed from two segments, each of 9 or 10 bases. Each segment contains at least one D-able subsite of the form KNGG. Protocol 2 differs from protocol 1 in that protocol 2 does not require that three base subsites begin with a G. Rather, in protocol 2, three base subsites begin with either a G or T (K in IUPAC-IUB ambiguity code). Target sites are shown from 5' to 3', and the symbols "{0,3}" and "{0,2}" indicate intervening segments of 0-3 and 0-2 bases respectively.

KNGGNNKNN(N){0,3}KNGGNNKNNN
KNGGNNKNN(N){0,3}KNNKNGGNNN
KNGGNNKNN(N){0,3}KNNKNNKNGG
KNNKNGGNN(N){0,3}KNGGNNKNNN
KNNKNGGNN(N){0,3}KNNKNGGNNN
KNNKNGGNN(N){0,3}KNNKNNKNGG
KNNKNNKNGG(N){0,2}KNGGNNKNNN
KNNKNNKNGG(N){0,2}KNNKNGGNNN
KNNKNNKNGG(N){0,2}KNNKNNKNGG
KNNKNNKNGGNGGNNKNNN
KNNKNNKNGGNNKNGGNNN
KNNKNNKNGGNNKNNKNGG

Protocol 3 is the same as protocol two except that protocol three selects target sites with either a KNGG or a KNGT D-able subsite. Target sites are shown from 5' to 3'.

KNGKNNKNN(N){0,3}KNGKNNKNNN
KNGKNNKNN(N){0,3}KNNKNGK
KNGKNNKNN(N){0,3}KNNKNNKNGK
KNNKNGKNN(N){0,3}KNGKNNKNNN
KNNKNGKNN(N){0,3}KNNKNGKNNN
KNNKNGKNN(N){0,3}KNNKNNKNGK
KNNKNNKNGK(N){0,2}KNGKNNKNNN
KNNKNNKNGK(N){0,2}KNNKNGK
KNNKNNKNGK(N){0,2}KNNKNNKNGK
KNNKNNKNGKNGKNNKNNN
KNNKNNKNGKNNKNGK
KNNKNNKNGKNNKNNKNGK

Protocol 4 is more general than any of the protocols described above, and does not require that target sites contain a D-able subsite. Protocol 4 similarly requires two segments, each of 9 bases within 0-3 bases of each other, of the form GNN GNN GNN.

Protocol 5 is the same as protocol 4 except that it searches for target sites formed from two target segments of formula 5'KNN KNN KNN3' within 0-3 bases of each other.
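Motif searches of this kind map naturally onto regular expressions. The following sketch, written with Python's standard `re` module, illustrates protocols 4 and 5 only; it is an assumption-laden illustration rather than the search program used in this Example. The helper names `motif_to_regex` and `paired_segments` are hypothetical, and only the G, K and N codes are translated.

```python
import re

# Translation of the ambiguity codes used in the protocols.
IUPAC = {"G": "G", "K": "[GT]", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Turn a motif such as 'GNNGNNGNN' into a regular expression fragment."""
    return "".join(IUPAC[code] for code in motif)


def paired_segments(sequence, segment, max_gap=3):
    """Find two segments matching `segment` separated by 0..max_gap bases.

    Each gap length is searched separately with a lookahead so that
    overlapping candidates are all reported.
    Returns sorted tuples (start, gap, first_segment, second_segment).
    """
    sequence = sequence.upper()
    seg = motif_to_regex(segment)
    hits = []
    for gap in range(max_gap + 1):
        pattern = re.compile(rf"(?=({seg})[ACGT]{{{gap}}}({seg}))")
        for match in pattern.finditer(sequence):
            hits.append((match.start(), gap, match.group(1), match.group(2)))
    return sorted(hits)


# Protocol 4: paired_segments(seq, "GNNGNNGNN")
# Protocol 5: paired_segments(seq, "KNNKNNKNN")
```

Running the stricter motif first and falling back to the more general one mirrors the order in which the protocols are tried; the D-able protocols could be expressed the same way by passing their longer motifs.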

Example 2

This example illustrates that zinc finger proteins that bind to target segments including at least one D-able subsite generally bind with higher affinity than zinc finger proteins binding to target segments lacking D-able subsites, provided the ZFP has a D residue at position +2. Fifty-three ZFPs, each having three fingers, were selected from a collection without regard to binding affinity or binding to a D-able subsite. The dissociation constants of the selected ZFPs were determined by binding of the ZFPs to a target segment comprising three contiguous nucleotide triplets respectively bound by the three fingers of the ZFP plus at least one flanking base from the target sequence on either side. All ZFPs had the human Sp1 framework. The binding affinities of these 53 ZFPs were arbitrarily divided into 4 groups, listed as Kd values in Table 2.

Table 2

Dissociation Constants (Kd)
>1,000 nM    100-1,000 nM    10-99 nM    < or = 10 nM
11 1 8 11

According to this classification only about 25% (14/53) of these proteins had high affinities (Kd less than or equal to 100 nM) for their respective targets. Of these 14 proteins, all had at least one D-able subsite within the target.

Example 3

We searched the sequence of the soybean (Glycine max) FAD2-1 cDNA for paired proximate 9 base target segments using protocols 2 and 3. Five target segments were chosen, and either one or two ZFPs were designed to bind to each of the targets. The targets chosen and the Kd values for the respective designed ZFPs are shown in Table 3. D-able subsites are shown in bold. Sequences are shown from 5' to 3'.

Table 3

TARGET NAME    SEQUENCE         PROTEIN NAME    Kd (nM)
FAD1           GAG GTA GAG C    FAD 1A          10
FAD1           GAG GTA GAG C    FAD 1B          10
FAD2           GTC GTG TGG A    FAD 2A          100
FAD3           GTT GAG GAA G    FAD 3A          100
FAD3           GTT GAG CAA G    FAD 3B          100
FAD4           GAG GTC CAA G    FAD 4A          10
FAD4           GAG GTG CAA G    FAD 4B          2
FAD5           TAG GTG GTG A    FAD 5A          10

Of the 8 ZFPs made, all bound with high affinity (Kd less than or equal to 100 nM) to their targets, showing that selecting a target with a D-able subsite within a 9 bp target allows one to efficiently design a high affinity ZFP. Moreover, all of the ZFPs binding to target sites with two D-able subsites bound more strongly than ZFPs binding to target sites with only one D-able subsite.

Example 4

This example provides further evidence that D-able subsites confer high binding affinity. Fifty-three target segments were identified by protocol 5 listed above, which does not require that a D-able subsite be present in a target site. Fifty-three ZFPs were designed to bind to these respective sites. Thirty-three target segments were identified by protocol 3 above, which does require a D-able subsite, and thirty-three ZFPs were designed to bind to these respective sites. Table 4 compares the Kds of ZFPs designed by the different procedures.

Table 4

Search      Dissociation Constants (Kd)
Protocol    >1,000 nM    100-1,000 nM    10-100 nM    < or = 10 nM
#5          33           18              11           3
#3          0            ?               15           16

Table 4 shows that 31 of 33 ZFPs designed by protocol 3 have high binding affinity (Kd less than 100 nM). By contrast, only 14 of 53 ZFPs designed by protocol 5 have high binding affinity. These data show that high affinity ZFPs (Kd < 100 nM) can be designed more efficiently to targets if the search protocol includes D-able subsite criteria than if the search protocol does not require a D-able subsite.

Example 5

The relationship between the affinity of the ZFP and the presence of one or more D-able subsites in the target was analyzed for about 300 designed ZFPs specific mostly to different sites. In this and subsequent analyses, only one ZFP was included per target site, this being the ZFP with the highest affinity.

Table 5 and Fig. 1 show the average Kd of different categories of ZFP categorized by number and type of D-able subsites in the 9 base target site bound. In Table 5, and later in Tables 6, 7 and 8, s.e.m. is standard error of the mean and n is number of proteins examined.

Table 5

D-able subsites/    Aver. Kd (nM)    (+/-) s.e.m.    n
9 base target
0                   828              66)             24
1 GT                46               226)            OS
1 GG                I')g             -')5)           -1)4
2 GT                100              'W)             02
1 GG + 1 GT         208              198)            OA
2 GG                15               6)

The 22 ZFPs designed to targets with two GG type D-able subsites have the strongest binding affinity with an average Kd = 15 nM. Of the 50 ZFPs having a Kd < 100 nM, 49 have at least one D-able subsite. The table supports the following conclusions: (1) ZFPs binding to a target site with one D-able subsite bind more strongly than ZFPs binding to a target site lacking a D-able subsite; (2) ZFPs binding to a target site with two D-able subsites bind more strongly than ZFPs that bind to a target site with one D-able subsite; and (3) ZFPs with a target site with a GG D-able subsite bind more strongly than ZFPs with a target site with a GT D-able subsite.

Example 6

Another factor affecting binding affinity of designed ZFPs is whether a target site has the form GNN GNN GNN rather than KNN KNN KNN. This example shows that D-able subsites confer high binding affinity even in the context of a GNN GNN GNN motif. For this analysis, we selected a population of 59 ZFPs, each of which binds to a different target site of the form GNN GNN GNN. Table 6 shows the Kd values of designed ZFPs as a function of the presence of D-able subsites within a GNN GNN GNN target.

Table 6

D-able subsites/    Average Kd    (+/-) s.e.m.    n
9 bp target
0                   787           88)             17
1 GG                66            14)             2 3)
2 GG                17            7)              is
1 GG + 1 GT         5)            2

The presence of a D-able subsite strongly affects binding affinity of a ZFP even when the target has the GNN GNN GNN motif.

Example 7

This Example provides further evidence that the effect of D-able subsites in conferring increased binding affinity is additive with any effects of G residues in conferring higher binding affinity relative to other residues. For this analysis, we selected 101 zinc finger proteins binding to different target sites from our collection, and classified these target sites by the number of G residues present. The target sites contained from 2-8 G residues in a 9 base sequence. Table 7 shows that in general, the more G residues that are present in a target site, the stronger the binding affinity of the ZFP for that site.

Table 7

# Gs/             Aver. Kd, nM    (+/-) s.e.m.    n
9 base target
2 >1000 4 68 1 58) 8
4                 447             84)             26
5                 195             58)             2
6                 83              (66)            15
7                 46              (26)            9
8 1 1

We analyzed these data further by asking whether the presence or absence of a D-able subsite affected average Kd values of the designed ZFPs. Each category of 9 base target from Table 7 was subdivided into targets containing or not containing D-able subsites. The result of this analysis is shown in Table 8.

Table 8

K' d, 1]M 809 + 191 167 -4- 27) 4 867 +87 640 _+ 169 98 1M 6 >1000 8 +66

The table shows that when target sites having the same number of G residues but different numbers of D-able subsites are compared, the sites including D-able subsite(s) confer higher binding affinity. For 9 base target sites having 4 or more Gs, the average Kd is approximately 100 nM or less if the target has at least one D-able subsite. Particularly notable is the comparison between target sites having 5 G residues.

such target sites lacking a D-able subsite had an average Kd of 640 nM. 23 such target sites with two D-able subsites had an average Kd of nM.

Example 10: The ZFP Prediction Module

This example illustrates selection of a target segment within a target gene using a correspondence regime, and use of a database to design a ZFP that binds to the selected target segment. The ZFP Prediction Module facilitates both the site selection and ZFP design processes by taking as input (i) the DNA sequence of interest, (ii) various data tables, (iii) design parameters and (iv) output parameters, and providing as output a list of potential ZFP target sites in the sequence of interest and a summary of fingers which have been designed to subsites in each target site. This section will describe program inputs, outputs, and scoring protocols for the program. For clarity, the descriptions will be divided into site selection and design functions.

I. Selection of target sites within the DNA region of interest:

Inputs:

1) The target DNA sequence.

2) A scores table listing each of the possible three-base pair subsites and scores for its three possible locations in a 9-bp target site, as shown in Table 1. The scores table is provided by the user at run-time and may be customized and updated to reflect the user's most current understanding of the DNA-sequence preferences of the zinc finger motif.

3) A 'ZFP data table' which contains target sites, amino acid sequences, and reference data for ZFPs. This table is required for this portion of the program only if output parameter (ii) is selected below. An example of a ZFP data table is provided in Table 9.

4) An optional context parameter - the "enhancement factor for 'D-able' triplets" - entered by the user at run-time. This parameter multiplies - by the enhancement factor - the score for any 'xxG' subsite flanked by a 3' G or T.

5) Output parameters - supplied by the user - specifying:
i) the number of target sites to include in the output;
ii) whether the program should specifically highlight those target sites (if any) for which three-finger proteins have already been designed;
iii) whether the program should re-order output target sites according to their relative positions in the input target sequence;
iv) whether the program should highlight targetable pairs of 9-bp DNA sites (adjacent, nonoverlapping site pairs separated by n or fewer bases, where n is typically 5, 4, 3, 2 or 1).

Output: A set of potential target sites in the target DNA sequence ranked by score. If specified, a list of any target sites for which three-finger proteins have already been designed. If specified, the list of output sites re-ordered according to location in the input sequence. If specified, a list of all targetable pairs of 9-bp DNA sites.
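The "targetable pairs" output can be illustrated with a short Python sketch. It assumes each candidate site is identified by its 0-based start position in the input sequence and that the gap is counted as the number of bases between the end of the first site and the start of the second; the function name `targetable_pairs` and its parameters are hypothetical.

```python
def targetable_pairs(starts, max_gap, site_len=9):
    """Return (first_start, second_start, gap) for every pair of
    nonoverlapping sites separated by max_gap or fewer bases.

    starts   -- 0-based start positions of candidate 9-bp sites
    max_gap  -- the user-chosen n (largest allowed number of intervening bases)
    """
    ordered = sorted(set(starts))
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            gap = second - (first + site_len)
            if gap < 0:
                continue          # the two sites overlap
            if gap > max_gap:
                break             # later sites are even farther away
            pairs.append((first, second, gap))
    return pairs
```

For example, with candidate sites starting at positions 0, 4, 9, 12 and 30 and n = 3, the site at 0 pairs with the sites at 9 (no intervening bases) and 12 (three intervening bases), while the site at 4 overlaps both and the site at 30 is too distant.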

The site selection portion of the program assigns a score to every possible 9-bp sequence in a given target DNA fragment, the score reflecting ease of targetability based on using information from previously designed zinc finger proteins. In evaluating a given 9 base sequence, the program first splits the target into its component subsites, and then consults the scores table to obtain a score for each subsite at its location in the potential target site. Finally, it multiplies the subsite scores to obtain an overall score for the 9-bp target site. For example, using the test sequence 5'AGTGCGCGGTGC3' and the scores table in Table 1, the output sites (5'-3') and scores are:

site         subsites       score
AGTGCGCGG    AGT GCG CGG    1 x 10 x 1 = 10
GTGCGCGGT    GTG CGC GGT    10 x 1 x 10 = 100
TGCGCGGTG    TGC GCG GTG    10 x 10 x 10 = 1000
GCGCGGTGC    GCG CGG TGC    10 x 1 x 8 = 80

In this example, the best site is 5'TGC GCG GTG3', with a score of 1000. The program also assigns scores to potential targets in the opposite (antisense) strand, but for the sake of simplicity these sites are ignored in this example.

An optional factor, the "enhancement factor for 'D-able' triplets", can be provided to alter the above scoring protocol to account for the context factor - the D-able contact - in evaluating target sites. If this feature is chosen, the program performs the following check when assigning subsite scores: if a subsite is of the form xxG, then if the adjacent base (on the 3' side) is T or G, the score of the xxG subsite is multiplied by the enhancement factor; otherwise, the subsite score remains the same. [If the subsite is of the form xxA, xxC or xxT, its score also remains unchanged.] For example, if the user inputs an enhancement factor for 'D-able' triplets of 1.25, then the scores above are adjusted as follows:

site          subsites        score
AGTGCGCGGt    AGT GCG CGGt    1 x 10 x (1 x 1.25) = 12.5 (CGG is D-able)
GTGCGCGGTg    GTG CGC GGTg    10 x 1 x 10 = 100 (no D-able subsites)
TGCGCGGTGc    TGC GCG GTGc    10 x (10 x 1.25) x 10 = 1250 (GCG is D-able)
GCGCGGTGC#    GCG CGG TGC#    10 x (1 x 1.25) x 8 = 100 (CGG is D-able)

[When using this option, the program considers the identity of the base immediately to the 3' side of the target site (in lower case). For the last site, this base is undefined in this example and this is noted by the pound sign '#' at this position.]

After assigning scores to all 9-base pair sequences in the target DNA, the program then prints out the top scores, with the number of sites outputted determined by the user. As specified by the user, the program can also provide:

i) a list of those target sites for which three-finger proteins have already been designed;

ii) the list of output sites re-ordered according to location in the input sequence;

iii) a list of all targetable pairs of 9-bp DNA sites (adjacent, nonoverlapping site pairs separated by five, three or fewer bases).
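The scoring procedure, with and without the enhancement factor, can be reproduced with the following Python sketch. It is an illustration only: `EXAMPLE_SCORES` holds just the subsite scores that can be read off the worked example for the test sequence and is not the scores table of Table 1, and the names `score_site` and `rank_sites` are hypothetical. Only the sense strand is scored.

```python
# Subsite scores read off the worked example; NOT the full scores table.
# Keys are (triplet, slot): slot 0 = 5' triplet, 1 = middle, 2 = 3' triplet.
EXAMPLE_SCORES = {
    ("AGT", 0): 1, ("GCG", 1): 10, ("CGG", 2): 1,
    ("GTG", 0): 10, ("CGC", 1): 1, ("GGT", 2): 10,
    ("TGC", 0): 10, ("GTG", 2): 10,
    ("GCG", 0): 10, ("CGG", 1): 1, ("TGC", 2): 8,
}


def score_site(sequence, start, scores, enhancement=1.0):
    """Multiply the three subsite scores of the 9-bp site at `start`.

    An xxG subsite whose 3' neighbour is G or T has its score multiplied
    by the enhancement factor; an undefined neighbour leaves it unchanged.
    """
    total = 1.0
    for slot in range(3):
        begin = start + 3 * slot
        subsite = sequence[begin:begin + 3]
        subscore = scores[(subsite, slot)]
        neighbour = sequence[begin + 3:begin + 4]   # '' past the end
        if subsite.endswith("G") and neighbour in ("G", "T"):
            subscore *= enhancement
        total *= subscore
    return total


def rank_sites(sequence, scores, enhancement=1.0, top=None):
    """Score every 9-bp window and return (score, site) pairs, best first."""
    sequence = sequence.upper()
    ranked = [
        (score_site(sequence, i, scores, enhancement), sequence[i:i + 9])
        for i in range(len(sequence) - 8)
    ]
    ranked.sort(key=lambda pair: -pair[0])
    return ranked[:top]
```

With the test sequence AGTGCGCGGTGC, `rank_sites` places TGCGCGGTG first at 1000 without enhancement and at 1250 with an enhancement factor of 1.25, matching the tables above.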

II. Design of proteins for the chosen target site

Inputs:

Sites from the site-selection portion of the program (or otherwise determined).

The ZFP data table, which contains target sites, amino acid sequences, and reference data for existing high-affinity ZFPs.

An output parameter supplied by the user, specifying whether the program should restrict its output either:

(i) to only those proteins (if any) whose target sites are completely identical to sites in the output, or

(ii) to only those proteins (if any) whose target sites match output sites at two or more of the three-bp subsites.

Output: In the absence of restrictions (i) or (ii): for each potential 9-base pair target site, a listing of three sets of ZFPs and their component fingers from the ZFP data table which respectively bind to the three triplet subsites within the target site. For each subsite, the set of ZFPs can be subdivided into two subsets. One subset contains ZFPs and their fingers that bind a triplet at a given position from the corresponding finger position in a parental ZFP. The other subset contains ZFPs and their fingers that bind a triplet at a given position from a noncorresponding position within a parental ZFP. A first finger position (N-C) corresponds to a first triplet position. The ZFP design portion of the program facilitates the design process by allowing the user to rapidly review all fingers known to bind subsites in a given 9-base target site.

Given the optimal design target from the above example (5'TGCGCGGTG3'), and the short ZFP data table provided in Table 9, the output (in the absence of restrictions (i) or (ii)) would be as follows:

site 5'TGCGCGGTG3'
ZFPs -- PREVIOUS DESIGN:

ORDERED:
Triplet 3: [1] 5'TGCGGGGCA3' ERDHLRT
Triplet 2: [3] 5'GGGGCGGGG3' RSDELQR
Triplet 1: [4] 5'GAGTGTGTG3' RKDSLVR

DISORDERED:
Triplet 2: RSDELTR [2] (3); RSDERKR [2] (1)

The 'ordered' output shows that, in the ZFP data table, there is one instance where the TGC subsite is contacted by a zinc finger in the third triplet of a target site.

The finger in this case is ERDHLRT, and the site is 5'TGCGGGGCA3'. There is also one similar instance for each of the other two subsites, GCG and GTG. The fingers in these cases are, respectively, RSDELQR and RKDSLVR. This information is used to propose the three finger protein F1-RKDSLVR, F2-RSDELQR, F3-ERDHLRT as a design to bind the target 5'TGCGCGGTG3'.

The 'disordered' output shows that there are two cases in the ZFP data table in which fingers contact a GCG subsite, but not at the center subsite of a target. Rather, in one case GCG is contacted at the 5' end, and in the other at the 3' end, and in these cases the finger sequences are RSDELTR and RSDERKR. These are alternate designs for binding GCG in the target site.

Table 1: The scores table

subsite     subsite score, by location      subsite     subsite score, by location
sequence    in 9 base site (base pairs)     sequence    in 9 base site (base pairs)
            7-9    4-6    1-3                           7-9    4-6    1-3
AAA         10     8      8                 CAA         8      8      10
AAG         8      8      10                CAG         1      1      1
AAC         1      1      1                 CAC         1      1      1
AAT         8      10     10                CAT         1      1      1
AGA         10     8      8                 CGA         1      1      1
AGG         1      1      1                 CGG         1      1      1
AGC         1      1      1                 CGC         1      1      1
AGT         1      1      1                 CGT         1      1      1
ACA         8      10     8                 CCA         1      1      1
ACG         1      1      1                 CCG         1      1      1
ACC         1      1      1                 CCC         1      1      1
ACT         1      1      1                 CCT         1      1      1
ATA         8      10     8                 CTA         1      1      1
ATG         1      1      1                 CTG         1      1      1
ATC         1      1      1                 CTC         1      1      1
ATT         1      1      1                 CTT         1      1      1
GAA         10     10     10                TAA         8      8      10
GAG         10     10     10                TAG         10     10     8
GAC         10     10     8                 TAC         10     8      10
GAT         10     10     10                TAT         1      1      1
GGA         10     10     10                TGA         10     10     8
GGG         10     10     10                TGG         10     10     10
GGC         10     10     10                TGC         8      10     10
GGT         10     10     10                TGT         10     10     8
GCA         10     10     10                TCA         10     8      8
GCG         10     10     10                TCG         8      10     8
GCC         10     10     8                 TCC         10     8      10
GCT         10     10     10                TCT         1      1      1
GTA         10     10     10                TTA         10     10     8
GTG         10     10     10                TTG         8      10     8
GTC         10     10     10                TTC         1      1      1
GTT         10     10     10                TTT         8      10     8

Table 9: Exemplary ZFP data table

#    target site    ZFP sequence                     reference information
                    F1         F2         F3
1    TGCGGGGCA      RSADLTR    RSDHLTR    ERDHLRT    SBS design GR-223, Kd: 8 nM
2    GCGTGGGCG      RSDELTR    RSDHLTT    RSDERKR    Zif268, Kd: 0.04 nM
3    GGGGCGGGG      KTSHLRA    RSDELQR    RSDHLSK    SP1, Kd: 25 nM
4    GAGTGTGTG      RKDSLVR    MHLAS      RSDNUR     SBS design GL-8.3.1, Kd: 32 nM

Other examples of zinc finger proteins, the sequences of their fingers and target sites bound appropriate for inclusion in such a database are discussed in the references cited in the Background Section.
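The lookup performed by the design portion of the program might be sketched as below. This is an assumption-laden illustration rather than the program itself: the names `ZFP_TABLE`, `triplets_by_finger` and `find_fingers` are hypothetical, finger Fn is assumed to read the n-th triplet counted from the 3' end of the site (so F3 reads the 5' triplet), and the two fingers of entry 4 that are not legible in the table are stored as `None`. Each hit is reported as (finger sequence, table entry number, finger number in the parental ZFP).

```python
# Table 9 entries: id -> (target site 5'->3', (F1, F2, F3)).
# None marks finger sequences that are not legible in the source table.
ZFP_TABLE = {
    1: ("TGCGGGGCA", ("RSADLTR", "RSDHLTR", "ERDHLRT")),
    2: ("GCGTGGGCG", ("RSDELTR", "RSDHLTT", "RSDERKR")),
    3: ("GGGGCGGGG", ("KTSHLRA", "RSDELQR", "RSDHLSK")),
    4: ("GAGTGTGTG", ("RKDSLVR", None, None)),
}


def triplets_by_finger(site):
    """Map finger number to triplet; F3 reads the 5' triplet, F1 the 3' triplet."""
    return {3 - i: site[3 * i:3 * i + 3] for i in range(3)}


def find_fingers(target):
    """Collect fingers binding each target triplet from the same or another position."""
    ordered = {1: [], 2: [], 3: []}
    disordered = {1: [], 2: [], 3: []}
    for position, triplet in triplets_by_finger(target).items():
        for entry_id, (site, fingers) in ZFP_TABLE.items():
            for finger_no, bound in triplets_by_finger(site).items():
                finger = fingers[finger_no - 1]
                if bound != triplet or finger is None:
                    continue
                hit = (finger, entry_id, finger_no)
                bucket = ordered if finger_no == position else disordered
                bucket[position].append(hit)
    return ordered, disordered


ordered, disordered = find_fingers("TGCGCGGTG")
# Propose a design from the first 'ordered' hit at each position (F1, F2, F3).
design = tuple(ordered[n][0][0] for n in (1, 2, 3))
print(design)
```

Taking the first 'ordered' hit is only one possible choice; the 'disordered' hits remain available as alternate fingers for the same triplet.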

For the avoidance of doubt, the present invention, as the skilled person will understand, includes a system for selecting a target sequence within a polynucleotide for targeting by a zinc finger protein, comprising a processor operatively disposed to:

(1) provide or receive a polynucleotide sequence;

(2) select a potential target site within the polynucleotide sequence, the potential target site comprising first, second and third triplets of bases at first, second and third positions in the potential target site;

(3) calculate a score for the potential target site from a combination of subscores for the first, second, and third triplets, the subscores being obtained from a correspondence regime between triplets and triplet position, wherein each triplet has first, second and third corresponding positions, and each corresponding triplet and position has a particular subscore;

(4) repeat steps (2) and (3) at least once on a further potential target site comprising first, second and third triplets at first, second and third positions of the further potential target site to determine a further score;

(5) provide output of at least one of the potential target sites with its score.

Other aspects of the invention are a computer program element comprising computer program code means to make a computer execute procedure to perform the method steps of any one of claims 1 to 37, 47 or 48 (in the claims appended hereto), i.e. to perform the method steps of any of the methods of the invention. Such an element may be embodied on a computer readable medium. The skilled person will also appreciate that the invention includes within its scope electrical or optical signals representing instructions or statements to make a computer execute procedure to perform the method steps of any of the methods of the present invention, wherein the electrical or optical signals are adapted to support transmission thereof over a communication network.

Although the foregoing invention has been described in detail for purposes of clarity of understanding, it will be obvious that certain modifications may be practiced within the scope of the appended claims. All publications and patent documents cited herein are hereby incorporated by reference in their entirety for all purposes to the same extent as if each were so individually denoted.

Claims

CLAIMS:

1. A method of synthesizing a zinc finger protein comprising:

(a) providing a database comprising designations for a plurality of zinc finger proteins, each protein comprising at least first, second and third fingers, and subdesignations for each of the three fingers of each of the zinc finger proteins; and a corresponding nucleic acid sequence for each zinc finger protein, each sequence comprising at least first, second and third triplets specifically bound by the at least first, second and third fingers respectively in each zinc finger protein, the first, second and third triplets being arranged in the nucleic acid sequence (3'-5') in the same respective order as the first, second and third fingers are arranged in the zinc finger protein (N-terminal to C-terminal);

(b) providing a target site for design of a zinc finger protein, the target site comprising continuous first, second and third triplets in a 3'-5' order;

(c) for the first, second and third triplet in the target site, identifying first, second and third sets of zinc finger protein(s) in the database, the first set comprising zinc finger protein(s) comprising a finger specifically binding to the first triplet in the target site, the second set comprising zinc finger protein(s) comprising a finger specifically binding to the second triplet in the target site, the third set comprising zinc finger protein(s) comprising a finger specifically binding to the third triplet in the target site;

(d) outputting designations and subdesignations of the zinc finger proteins in the first, second, and third sets identified in step (c); and

(e) synthesizing a zinc finger protein (ZFP), or a nucleic acid encoding the ZFP, that binds to the target site, said ZFP comprising a first finger from a zinc finger protein from the first set, a second finger from a zinc finger protein from the second set, and a third finger from a zinc finger protein from the third set.

2. A method of claim 1 further comprising identifying subsets of the first, second and third sets, the subset of the first set comprising zinc finger protein(s) comprising a finger that specifically binds to the first triplet in the target site from the first finger position of a zinc finger protein in the database; the subset of the second set comprising zinc finger protein(s) comprising a finger that specifically binds to the second triplet in the target site from the second finger position in a zinc finger protein in the database; the subset of the third set comprising zinc finger protein(s) comprising a finger that specifically binds to the third triplet in the target site from a third finger position in a zinc finger protein in the database; wherein the outputting step comprises outputting designations and subdesignations of the subset of the first, second and third sets; and the synthesizing step comprises synthesizing a zinc finger protein or a nucleic acid encoding the ZFP, said ZFP comprising a first finger from the first subset, a second finger from the second subset, and a third finger from the third subset.

3. A method of claim 2, wherein the outputting comprises outputting the designations and subdesignations of the subsets of the first, second and third sets, and the first, second and third sets minus their respective subsets.

4. A method of claim 5, wherein each of the subsets is a null set.

5. A method of any one of claims 1 to 4, wherein the target site is provided by user input.

6. A method of any of claims 1 to 5 wherein the target site is provided by a method comprising:

(a) evaluating subsequences of a target nucleic acid for conformance with the formula 5'-NNx aNy bNzc-3', wherein:

(i) each of (x, a), (y, b) and (z, c) is (N, N) or (G, K);

(ii) at least one of (x, a), (y, b) and (z, c) is (G, K); and

(iii) N and K are IUPAC-IUB ambiguity codes;

(b) selecting a subsequence that conforms to the formula as a target site in the target nucleic acid.
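The conformance test of claim 6 can be illustrated with a short sketch. It assumes an upper-case A/C/G/T input and reads the formula as a 10-base window in which x, y and z are the third bases of the three triplets and a, b and c are the bases immediately 3' of each; the names `conforms` and `find_conforming_sites` are hypothetical.

```python
K_BASES = {"G", "T"}  # IUPAC-IUB ambiguity code K


def conforms(window):
    """True if a 10-base window fits 5'-NNx aNy bNzc-3' with at least one (G, K) pair."""
    if len(window) != 10:
        return False
    # (x, a), (y, b), (z, c): last base of each triplet and the base after it.
    pairs = [(window[2], window[3]), (window[5], window[6]), (window[8], window[9])]
    return any(first == "G" and second in K_BASES for first, second in pairs)


def find_conforming_sites(sequence):
    """Return (start index, 10-base window) for every conforming subsequence."""
    return [(i, sequence[i:i + 10])
            for i in range(len(sequence) - 9)
            if conforms(sequence[i:i + 10])]


sites = find_conforming_sites("AGTGCGCGGTGC")
print(sites)
```

Because N matches any base, only the three (G, K) positions need to be inspected; the remaining positions of the window are unconstrained.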


7. A method of synthesizing a zinc finger protein comprising:

(a) providing a database comprising designations for a plurality of zinc finger proteins, each protein comprising at least first and second fingers, subdesignations for each of the fingers of each of the zinc finger proteins, and a corresponding nucleic acid sequence for each zinc finger protein, each sequence comprising first and second triplets specifically bound by the first and second fingers respectively, the triplets being arranged in the nucleic acid sequence (3'-5') in the same respective order as the first and second fingers are arranged in the zinc finger protein (N-terminal to C-terminal);

(b) providing a target site for design of a zinc finger protein, the target site comprising contiguous first and second triplets ordered 3'-5' in the target site;

(c) for the first and second triplet in the target site, identifying first and second sets of zinc finger protein(s) in the database, the first set comprising zinc finger protein(s) comprising a finger specifically binding to the first triplet in the target site, the second set comprising zinc finger protein(s) comprising a finger specifically binding to the second triplet in the target site;

(d) outputting designations and subdesignations of the zinc finger proteins in the first and second sets identified in step (c); and

(e) synthesizing a zinc finger protein (ZFP), or a nucleic acid encoding the ZFP, said ZFP comprising a first finger from a zinc finger protein from the first set and a second finger from a zinc finger protein from the second set.

8. A method of producing a zinc finger protein comprising:

(a) providing a database comprising:

designations for a plurality of zinc finger proteins, each protein comprising at least first and second fingers;

subdesignations for each of the fingers of each of the zinc finger proteins; and

a corresponding nucleic acid sequence for each zinc finger protein, each sequence comprising first and second triplets specifically bound by the first and second fingers respectively, the triplets being arranged in the nucleic acid sequence (3'-5') in the same respective order as the first and second fingers are arranged in the zinc finger protein (N-terminal to C-terminal);

(b) providing a target site for design of a zinc finger protein, the target site comprising contiguous first, second and third triplets ordered 3'-5' in the target site;

(c) for the first and third triplet in the target site, identifying first and second sets of zinc finger protein(s) in the database, the first set comprising zinc finger protein(s) comprising a finger specifically binding to the first triplet in the target site, the second set comprising zinc finger protein(s) comprising a finger specifically binding to the third triplet in the target site;

(d) outputting designations and subdesignations of the zinc finger proteins in the first and second sets identified in step (c); and

(e) synthesizing a zinc finger protein (ZFP), or a nucleic acid encoding the ZFP, said ZFP comprising a first finger from a zinc finger protein from the first set and a third finger from a zinc finger protein from the second set.

9. A method of synthesizing a zinc finger protein or a nucleic acid encoding the ZFP, involving providing a target nucleic acid site comprising contiguous first, second and third triplets ordered 3'-5' in the target site, and substantially as hereinbefore described.