WO2008031098A1

WO2008031098A1 - Binary amino acid libraries for fibronectin type iii polypeptide monobodies

Info

Publication number: WO2008031098A1
Application number: PCT/US2007/078039
Authority: WO
Inventors: Shohei Koide; Kaori Esaki; Akiko Koide
Original assignee: University of Chicago
Current assignee: University of Chicago
Priority date: 2006-09-09
Filing date: 2007-09-10
Publication date: 2008-03-13
Anticipated expiration: 2009-03-09

Abstract

The invention generally relates to libraries of fibronectin type III polypeptide monobodies. The libraries include a plurality of different monobodies generated by creating diversity in the surface loop regions of a natural molecular scaffold using only two amino acids.

Description

BINARY AMINO ACID LIBRARIES FOR MONOBODIES

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 60/843,357, filed September 9, 2006, which is incorporated by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with support from the United States Government by grant Nos. NIH/NIDDK No. R01-DK63090; NIH/NIGMS No. R01-GM072688; NIH/NIGMS No. U54-GM074946. The United States Government has certain rights in this invention.

FIELD OF THE INVENTION

The invention generally relates to libraries of artificial antibodies or monobodies, i.e., single domain antibody mimics. The libraries include a plurality of different monobodies generated by creating diversity in the surface loop regions of natural molecular scaffolds.

BACKGROUND

The ability to engineer proteins, e.g., antibodies, that bind to a given target is a major goal in protein engineering. Although antibodies are traditionally generated by immunizing animals, recent advances in protein engineering technologies have made it possible to engineer antibody fragments and antibody-like molecules, collectively referred to as "affinity reagents," using recombinant DNA technologies without the use of animals. Such affinity reagents (or antibody mimics) are useful in molecular and cellular biology, biotechnology and the pharmaceutical industry. Usually affinity reagents are selected from combinatorial libraries in which a portion (or portions) of a protein/peptide is diversified. Because it is impossible to make a library that completely covers all possible sequences, the design of combinatorial libraries is an extremely important parameter that determines the success and failure of affinity reagent engineering.

Central to recombinant affinity reagent engineering is the concept of "molecular scaffolds". This concept arises from the mechanism by which the antigen-binding site of antibodies is constructed. The antigen-binding site in natural antibodies consists primarily of six surface loops ("complementarity determining regions" or CDRs) that are highly variable in the antibody repertoire, while the rest of the protein ("framework") is much less variable. The antibody framework can be considered a "molecular scaffold" that presents the CDRs, and antibodies with different affinity and/or specificity are generated by changing the amino acid sequences of the CDRs.

In recombinant affinity reagents, the amino acid sequence diversity is usually generated at the DNA level. The DNA sequences (codons) for positions at which amino acid diversity is introduced are changed to a combination of nucleotides (degenerate codons) that code for a set of amino acids. In the classical approach, the NNN (N represents an equal mixture of A, T, G and C) codon was used, which contains 64 possible sequences for all 20 amino acids. The size of recombinant libraries is usually limited by the transformation efficiency (i.e., the efficiency by which DNA can be introduced into a host organism), typically < 10¹⁰ for E. coli (Sidhu et al., 2000, Phage display for selection of novel binding peptides, Methods Enzymol. 328: 333-363, incorporated by reference). Thus, the library size limit determines the degree of coverage of possible sequence combination for a given library.

The six CDRs of standard antibodies typically contain approximately 30 residues. Thus, there are 20³⁰ possible amino acid sequences, and if one uses the NNK codon (K represents an equal mixture of G and T; 32 possible combinations that encode all 20 amino acids) 32³⁰ DNA sequences are required to include all possible DNA combinations. Clearly these sizes are far beyond the practical limit for library size.
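As an illustrative check of the figures above, the following Python sketch expands the degenerate codons NNN and NNK against the standard genetic code and compares the resulting sequence counts with a library size of 10¹⁰. The helper names (`expand`, `encoded`) and the small IUPAC lookup are introduced here for illustration only and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

# IUPAC symbols used in the text: N = A/C/G/T, K = G/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}

def expand(degenerate):
    """List every concrete codon covered by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]

def encoded(degenerate):
    """Set of translation products ('*' marks a stop codon)."""
    return {CODON_TABLE[c] for c in expand(degenerate)}

for name in ("NNN", "NNK"):
    codons = expand(name)
    amino_acids = encoded(name) - {"*"}
    stops = sum(CODON_TABLE[c] == "*" for c in codons)
    print(f"{name}: {len(codons)} codons, {len(amino_acids)} amino acids, {stops} stop codon(s)")

POSITIONS = 30          # approximate number of CDR residues cited in the text
LIBRARY_LIMIT = 10**10  # approximate transformation-limited library size
print(f"amino acid sequences: {20**POSITIONS:.2e}")
print(f"NNK DNA sequences:    {32**POSITIONS:.2e}")
print(f"library size limit:   {LIBRARY_LIMIT:.2e}")
```

The comparison shows why full randomization of all CDR positions cannot be sampled exhaustively by a transformation-limited library.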

Two complementary approaches have been advanced to reduce the total number of possible sequences in a library. One is to reduce the number of positions that are diversified, and the other is to reduce the number of amino acids used for each position. The former approach, i.e., fully randomizing a few positions at a time, has not been very successful in engineering high-affinity binding proteins, probably because protein-protein interaction typically requires a sizable interface (Lo Conte et al., 1999, The atomic structure of protein-protein recognition sites, J. Mol. Biol., 285:2177-2198).

The latter approach, i.e., the use of "reduced amino acid sets," has been pioneered by Sidhu and coworkers at Genentech (Fellouse et al., 2004, Synthetic antibodies from a four-amino-acid code: a dominant role for tyrosine in antigen recognition, Proc Natl Acad Sci USA, 101:12467-12472; Fellouse et al., 2005, Molecular recognition by a binary code, J. Mol. Biol., 348(5):1153). They have demonstrated that one can engineer specific Fab fragments from combinatorial libraries that contain as few as two amino acids for each position. In their Fab "binary" library, Sidhu et al. used a combination of Ser and Tyr for a total of 28-36 positions and obtained a series of Fabs with an apparent dissociation constant (K_d) in the range from ~20 nM to low μM.

BRIEF DESCRIPTION OF THE INVENTION

The invention provides methods of engineering binding proteins using molecular scaffolds. The problem of providing a combinatorial library of affinity reagents of practical size is solved, in one aspect, by using the tenth FN3 domain of human fibronectin (FN3fn10) as the molecular scaffold and diversifying up to three loops (BC, DE and FG) of FN3fn10 to construct a combinatorial library. In accordance with the invention, it has been shown that monobodies with novel binding functions can be engineered by screening phage-display libraries of FN3fn10 in which loop regions are diversified by varying loop sequence length or by replacing one or more amino acids with serine or tyrosine or a combination of both. In another aspect of the invention, binary amino acid (e.g., serine and tyrosine) libraries of FN3 that produce high affinity monobodies are provided.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may be better understood and appreciated by reference to the detailed description of specific embodiments presented herein in conjunction with the accompanying drawings.

FIG. 1 is a schematic graphic illustrating the concept of a molecular scaffold, showing the variable loops and the target protein that binds them.

FIGS. 2A-B are schematic drawings of the structure of the 10th type III domain of human fibronectin (FN3fn10), with β-strands labeled as A-G and the loop regions BC, DE and FG labeled (Koide, et al., 1998, The fibronectin type III domain as a scaffold for novel binding proteins, J. Mol. Biol., 284:1141-1151).

FIG. 3A is the amino acid sequence and restriction sites of FN3; FIG. 3B is the amino acid sequence and restriction sites of the "shaved" FN3 with introduced serines in boldface, and including the stability-enhancing mutation D7K.

FIG. 4 shows tables of S/Y monobodies generated for a variety of target proteins and the associated K_d values.

FIG. 5 is a graph illustrating the binding of S/Y monobodies for ySUMO and MBP.

FIG. 6 is a graphic illustrating varying loop length.

FIG. 7 shows schematic vector maps of a phage display vector and a yeast surface display vector in accordance with the present invention.

FIG. 8 is a representative sensorgram for the interaction of an MBP-binding monobody (MBP-1, SEQ ID NO:63) with MBP. The monobody was first immobilized to a sensor chip NTA through the His₁₀ tag. MBP (300 nM) was injected and its association and dissociation were monitored on a Biacore 2000 instrument. The black curves show the experimental data, and the red curves show the best fit with a K_d of 20 nM.

FIG. 9 is a table of alignments of the loop sequence regions of different clone monobodies for the three targets, MBP, hSUMO4, and ySUMO, and the respective K_d values.

FIG. 10A shows the titration curves for three MBP-binding monobodies tested using yeast surface display, showing the level of MBP binding (PE fluorescence) normalized with respect to the level of monobody display (FITC fluorescence). FIG. 10B is an SPR sensorgram of the interaction between the MBP-74 monobody and MBP. FIGS. 10C-E are bar graphs showing the binding specificity of monobodies tested with three different targets and yeast surface display, where 10C is hSUMO4, 10D is ySUMO, and 10E is MBP.

FIG. 11A is a schematic of the x-ray crystal structure of the MBP monobody MBP-74 fusion protein. FIGS. 11B and 11C are schematics of the "binding complex" of MBP and monobody with the MBP fusion partner. FIG. 11D is the epitope shown on the MBP surface. FIG. 11E is the epitope of MBP mapped by NMR spectroscopy. FIG. 11F is a comparison of the backbone conformation of the recognition loops between monobody MBP-74 and wild-type FN3fn10.

FIGS. 12A-C are diagrams of the stick model of the binding interface of the MBP-74 monobody and MBP. FIG. 12D is a bar graph of the buried surface areas of the monobody residues.

FIG. 13A is the superposition of βCD bound to MBP and the monobody paratope residues. FIG. 13B is the comparison of the MBP epitope for βCD with that for the MBP-74 monobody.

FIG. 14 shows the HSQC spectra of [²H, ¹³C, ¹⁵N]-MBP in the absence and presence of the unlabeled MBP-74 monobody, recorded by NMR spectroscopy.

FIG. 15 is a representation of the interaction between the MBP residues (right side) and the monobody residues (left side).

FIG. 16A is a list of the clones occurring by mutation of the MBP-74 monobody sorted by using phage display to bind to MBP. FIG. 16B is a list of the clones occurring by mutation of the MBP-74 monobody sorted by using phage display for binding to V5 epitope tag.

FIG. 17A is a model of the paratope of the MBP-74 monobody. FIG. 17B is the paratope of Fab-YSd1. FIG. 17C shows bar graphs comparing the amino acid compositions of the binding interfaces for the MBP-74 monobody/MBP complex and the Fab-YSd1.hDR5 complex; the upper panel shows the buried surface areas for the paratopes plotted for different amino acid types, and the lower panel shows those for the epitopes.

FIG. 18A depicts a schematic drawing of the monobody scaffold with β-strands labeled, and the FG loop, where amino acid diversity was introduced into the library. FIG. 18B is the schematic alignment of the wild-type FN3 FG loop sequence, the single-loop binary library, and two MBP-binding clones selected from the library. FIG. 18C is an SPR sensorgram for MBP-SL1, where MBP-SL1 was immobilized on a surface and the association and dissociation of MBP were monitored. FIG. 18D is a table outlining the binding parameters obtained from the SPR analysis for two single-loop monobodies and for MBP-74.

DETAILED DESCRIPTION

The invention provides simplified combinatorial libraries of antibody mimics or monobodies utilizing a molecular (protein) scaffold. In an illustrated embodiment, the invention provides a combinatorial library for a fibronectin type III (FN3) scaffold. The library includes certain antibody mimics or monobodies in which the loop regions of the scaffold, corresponding to an antigen binding site or pocket, are varied in length and substitution using a binary set, i.e., only two amino acids, such as serine and tyrosine. The number of total possible sequences that are encoded in the library is much smaller and potentially yields a greater percentage of high affinity binders than unrestricted libraries. In other words, the combinatorial libraries embodying the principles of the invention are binary libraries of FN3 that produce high affinity monobodies, i.e., single domain antibody mimics.

It was surprisingly found that many specific binders were generated, given that the number of amino acids in the FN3fn10 loops is so much smaller than that of the prior Fab scaffold. As noted, Fellouse et al. (Fellouse et al., 2005, Molecular recognition by a binary code, J. Mol. Biol., 348(5):1153) demonstrated that an S/Y library produces binders with K_d values from 17 nM to low μM. They diversified approximately 30-40 positions in four loops. Even with the large number of residues that can potentially form a binding interface, the affinity of the S/Y Fab molecules was not very high.

In contrast, the FN3 scaffold has only up to three loops that can be diversified. The maximal number of residues that can potentially form a binding interface is 20-25 residues. Therefore, the interface size of FN3 monobodies is expected to be significantly smaller than that of Fabs. Furthermore, the smaller number of diversified positions in the FN3 monobodies results in a smaller number of possible amino acid sequences that can be coded, which, in turn, results in a smaller chemical diversity of the interface. Taken together, the success of the S/Y FN3 monobody libraries would not have been predicted from the work of Fellouse et al.
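The effect of restricting each diversified position to two amino acids can be made concrete with a short calculation. The Python sketch below counts the sequences encodable by a binary code for the position counts mentioned above and compares them with the library size limit of about 10¹⁰ cited in the Background. The function names (`sequence_space`, `coverage`) and the particular position counts chosen are illustrative assumptions, and loop-length variation, which enlarges the sequence space, is ignored.

```python
LIBRARY_LIMIT = 10**10  # approximate practical library size cited in the Background

def sequence_space(positions, alphabet_size=2):
    """Number of distinct sequences for `positions` sites with a fixed alphabet."""
    return alphabet_size ** positions

def coverage(positions, alphabet_size=2, library_size=LIBRARY_LIMIT):
    """Upper bound on the fraction of the sequence space a library could sample."""
    return min(1.0, library_size / sequence_space(positions, alphabet_size))

# Position counts taken from the ranges quoted in the text (illustrative picks).
cases = [
    ("FN3 binary, 20 positions", 20, 2),
    ("FN3 binary, 25 positions", 25, 2),
    ("Fab binary, 28 positions", 28, 2),
    ("Fab binary, 36 positions", 36, 2),
    ("20 amino acids, 30 positions", 30, 20),
]
for label, n, k in cases:
    print(f"{label}: {sequence_space(n, k):.3e} sequences, "
          f"coverage bound {coverage(n, k):.3g}")
```

Under these assumptions, a binary code over 20-25 positions yields a sequence space that a single library can in principle sample completely, whereas full randomization over 30 positions cannot be sampled to any meaningful extent.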

Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of the invention set forth in the following description or illustrated in the appended drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," or "having" and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.

The invention will now be further described through the following detailed description of the invention, which detailed description is illustrative of certain embodiments of the invention and is not intended to limit the scope of the invention as set forth in the appended claims. While the following detailed description describes the invention through reference to embodiments utilizing certain molecular scaffolds and engineered polypeptides, it should be understood that other scaffold peptides are also suitable for use with the teachings of the invention.

Further, no admission is made that any reference, including any patent or patent document, cited in this specification constitutes prior art. In particular, it will be understood that, unless otherwise stated, reference to any document herein does not constitute an admission that any of these documents forms part of the common general knowledge in the art in the United States or in any other country. Any discussion of the references states what their authors assert, and the applicant reserves the right to challenge the accuracy and pertinency of any of the documents cited herein.

As will be understood by one skilled in the art, for any and all purposes, particularly in terms of providing a written description, any and all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. As only one example, a range of 20% to 40%, can be broken down into ranges of 20% to 32.5% and 32.5% to 40%, 20% to 27.5% and 27.5% to 40%, etc. For further example, if a peptide or polypeptide is stated as having 7 to 300 amino acids, it is intended that values such as 7 to 25, 8 to 30, 9 to 90 or 50 to 300 are expressly enumerated in this specification. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. Further, as will also be understood by one skilled in the art, all language such as "up to," "at least," "greater than," "less than," "more than" and the like include the number recited and refer to ranges which can be subsequently broken down into subranges as discussed above. In the same manner, all ratios disclosed herein also include all subratios falling within the broader ratio. These are only examples of what is specifically intended.

Unless otherwise noted, technical terms are used according to conventional usage. Definitions of common terms in molecular biology may be found in many standard references, e.g., Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8), Robert A. Meyers (ed.), incorporated here by reference.

However, as used herein, the following definitions may be useful in aiding the skilled practitioner in understanding the invention:

As used herein, "monobody" or "monobodies" is intended to refer to an artificial or synthetic single domain antibody or antibody mimic, i.e., to a binding protein/polypeptide that has a sequence variable domain, and especially to a protein/polypeptide that has one or more variable loop regions. Particularly, in the context of an illustrated embodiment encompassing the principles of the invention, the term refers to a polypeptide which includes a β-strand domain lacking in disulfide bonds and containing a plurality of β-strands, and two or more loop regions each connecting one β-strand to another β-strand, wherein at least one of the two or more loop regions is characterized by activity in binding a target protein or target molecule. Suitably, polypeptide monobodies of the invention can include three or more loop regions. The size of such polypeptide monobodies is suitably less than about 30 kDa, more suitably less than about 20 kDa. Monobodies, formed using a small β-sheet protein scaffold, such as the tenth fibronectin type III domain from human fibronectin (FN3fn10), have been previously described (Koide, et al., 1998, The fibronectin type III domain as a scaffold for novel binding proteins, J. Mol. Biol., 284:1141-1151).

The terms "target" and "target molecule," as used herein, refer to any biomolecule of interest for which a binding protein is sought. Exemplary targets include, but are not limited to, secreted peptide growth factors, pharmaceutical agents, cell signaling molecules, blood proteins, portions of cell surface receptor molecules, portions of nuclear receptors, steroid molecules, viral proteins, antibodies, portions of antibodies, carbohydrates, enzymes, active sites of enzymes, binding sites of enzymes, portions of enzymes, small molecule drugs, cells, bacterial cells, proteins, molecular affinity's of proteins, surfaces of proteins involved in protein-protein interactions, cell surface epitopes, diagnostic proteins, diagnostic markers, plant proteins, peptides involved in protein-protein interactions, and foods.

As used herein, the term "library" refers to any collection of different proteins/polypeptides. In certain embodiments, a library may be a collection of polypeptides that have been modified to favor the inclusion of certain amino acid residues, or polypeptides of certain lengths.

As used herein in connection with numerical values, the terms "approximately" and "about" are meant to encompass variations of ± 20% or ± 10% or less of the indicated value.

As is conventional, the terms "a" and "an" mean "one or more" when used herein, including in the claims.

By "molecular scaffold" or "scaffold" is meant a core molecule or framework, particularly a polypeptide used to select or design a polypeptide frame with specific and favorable properties, such as binding affinity. One or more additional chemical moieties can be covalently attached to, modified in, or eliminated from the core molecule to form a plurality or library of molecules with common structural elements. Characteristics of a scaffold can include having chemical positions where moieties can be attached that do not interfere with binding of the scaffold to a protein binding site, such that the scaffold or library members can be modified to improve binding affinity and/or specificity. When designing polypeptides or proteins from a scaffold, amino acid residues that are important for the framework's favorable binding properties are retained, while others may be varied to provide a peptide with improved linkage to generate tailor-made target specific binding proteins.

As used herein, the term "modulating" or "modulate" refers to an effect of altering a biological activity, especially a biological activity associated with a particular biomolecule. For example, an agonist or antagonist of a particular biomolecule modulates the activity of that biomolecule, e.g., an enzyme.

In the following description of embodiments of the methods of the invention, process steps are carried out at room temperature and atmospheric pressure unless otherwise specified. Standard techniques are used for recombinant nucleic acid methods, polynucleotide synthesis, cell culture, and transgene incorporation (e.g., electroporation, microinjection, lipofection). Enzymatic reactions, oligonucleotide synthesis, and purification steps are performed according to the manufacturer's specifications. The techniques and procedures are performed according to conventional methods in the art and various general references that are provided throughout this document. The procedures herein are generally well known in the art, some of which are provided for the convenience of the reader.

The monobodies in accordance with the invention are suitably based on the structure of a fibronectin module of type III (FN3), a common domain found in mammalian blood and structural proteins, as a scaffold. This domain has been estimated to occur in 2% of the proteins sequenced to date, including but not limited to fibronectins, tenascin, intracellular cytoskeletal proteins, and prokaryotic enzymes (Bork and Doolittle, 1992, Proposed acquisition of an animal protein domain by bacteria, Proc. Natl. Acad. Sci. USA, 89:8990; Bork et al., 1997, Nature Biotech., 15:553; Meinke et al., 1993, Cellulose-binding polypeptides from Cellulomonas fimi: endoglucanase D (CenD), a family A beta-1,4-glucanase, J. Bacteriol. 175:1910; Watanabe et al., 1990, Gene cloning of chitinase A1 from Bacillus circulans WL-12 revealed its evolutionary relationship to Serratia chitinase and to the type III homology units of fibronectin, J. Biol. Chem., 265:15659, incorporated herein by reference). Suitably, these scaffolds include, as templates, the tenth module of human FN3 (FN3fn10), which comprises 94 amino acid residues. FN3fn10 does not contain disulfide bonds or metal binding sites, is highly stable and undergoes reversible unfolding (Koide, et al., 1998, The fibronectin type III domain as a scaffold for novel binding proteins, J. Mol. Biol., 284:1141-1151; Main et al., 1992, The three-dimensional structure of the tenth type III module of fibronectin: an insight into RGD-mediated interactions, Cell 71:671-678; Plaxco et al., 1996, Rapid refolding of a proline-rich all-beta-sheet fibronectin type III module, Proc Natl Acad Sci USA, 93(20):10703-6). It is small enough for structural analysis, yet large enough to accommodate multiple binding domains so as to achieve tight binding and/or high specificity for its target.

As described in more detail below, the monobodies, in accordance with the invention, exhibit improved biophysical properties, such as stability under reducing conditions and solubility at high concentrations. In addition, these monobodies may be readily expressed and folded in prokaryotic systems, such as E. coli, in eukaryotic systems, such as yeast, and in in vitro translation systems, such as the rabbit reticulocyte lysate system. Moreover, these monobodies are extremely amenable to affinity maturation techniques involving multiple cycles of selection, including in vitro selection using RNA-protein fusion technology (Roberts and Szostak, 1997, RNA-peptide fusions for the in vitro selection of peptides and proteins, Proc. Natl. Acad. Sci USA 94:12297; Szostak et al., U.S. Ser. No. 09/007,005 and U.S. Ser. No. 09/247,190; Szostak et al. WO98/31700, incorporated herein by reference), phage display (see, for example, Smith and Petrenko, Phage Display, Chem. Rev. 97:317, 1997, incorporated herein by reference), and yeast display systems (see, for example, Boder and Wittrup, 1997, Yeast surface display for screening combinatorial polypeptide libraries, Nature Biotech., 15:553, incorporated herein by reference).

In an illustrated embodiment, the molecular scaffold for formation of monobodies in accordance with the invention is the fibronectin type III domain (FN3). As noted, one suitable wild-type FN3 scaffold is the tenth FN3 domain of human fibronectin (FN3fn10), which is illustrated in FIGS. 2A-B, and has an amino acid sequence according to FIG. 3A. An even more suitable scaffold is the synthetic "shaved" FN3 scaffold, which is the tenth FN3 domain of human fibronectin in which serines have been introduced for underrepresented amino acids and which includes the stability-enhancing mutation D7K, as shown in FIG. 3B.

Both the "shaved" and wild-type FN3fn10 are characterized by the same structure, namely seven β-strand domain sequences (designated A through G) and six loop regions (AB loop, BC loop, CD loop, DE loop, EF loop, and FG loop), as illustrated in FIGS. 2A-B, which connect the seven β-strand domain sequences. In accordance with the invention, three loops, the BC, the DE, and the FG, were varied. As shown in FIGS. 2A-B, the BC loop, DE loop, and FG loop are all located at the same end of the polypeptide monobody, and the BC loop corresponds to residues 21-31, the DE loop corresponds to residues 51-56, and the FG loop corresponds to residues 75-88. It is noted that residue 75 of the FG loop has not previously been included and varied in the FG loop. The exposed loop sequences of the FN3fn10 framework tolerate randomization, facilitating the generation of diverse pools of antibody mimics or monobodies. Despite the fact that the FN3fn10 module is not an immunoglobulin, its overall fold is close to that of the variable region of the IgG heavy chain, making it possible to display three fibronectin loops analogous to CDRs in relative orientations similar to those of native antibodies. Because of this structure, the antibody mimics or monobodies embodying the principles of the invention possess antigen binding properties that are similar in nature and affinity to those of antibodies, and a loop randomization and shuffling strategy may be employed in vitro that is similar to the process of affinity maturation of antibodies in vivo. The randomization of these three loops does not have an adverse effect on the overall fold or stability of the FN3fn10 framework or scaffold itself.

FN3fn10-based monobodies have been previously reported for several targets, e.g., ubiquitin, estrogen receptor, integrins (Koide, et al., 1998, The fibronectin type III domain as a scaffold for novel binding proteins, J. Mol. Biol., 284:1141-1151; Koide et al., 2002, Probing protein conformational changes by using designer binding proteins: application to the estrogen receptor, Proc. Natl. Acad Sci. USA, 99:1253-1258; Zhao et al., 2003, Mutation of Leu-536 in human estrogen receptor-alpha alters the coupling between ligand binding, transcription activation, and receptor conformation, J Biol Chem, 278:27278-86; Richards et al., 2003, Engineered fibronectin type III domain with a RGDWXE sequence binds with enhanced affinity and specificity to human alphavbeta3 integrin, J Mol Biol 326:1475-88), SH3 domains (Karatan et al., 2004, Molecular recognition properties of FN3 monobodies that bind the Src SH3 domain, Chem Biol 11:835-844), and TNFα (Phylos). Such monobodies were generated and selected by randomly varying the amino acid sequences of the loop regions of FN3fn10.

The monobodies in accordance with the invention are generated by diversifying the amino acid sequences in up to three loops (BC, DE and FG) of FN3fn10 to construct a combinatorial library. At least one loop region sequence includes an amino acid sequence in which a serine or tyrosine or a combination of both (Ser/Tyr or S/Y) is substituted for the loop sequences of the wild-type. Up to all three loops are suitably replaced with S/Y. The loop region sequence can be varied by replacement of one or more amino acids with serine and/or tyrosine from a corresponding loop region in a wild-type or mutant FN3 scaffold. It is also contemplated that serine/tryptophan may be used in the same manner as described for S/Y to generate libraries of monobodies.
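One way a Ser/Tyr binary position can be encoded at the DNA level is with a degenerate codon. The Python sketch below assumes the codon TMT (M being an equal mixture of A and C in IUPAC notation), which yields TCT (serine) and TAT (tyrosine). The text above does not state which codon is used, so this choice, the six-position loop, and the helper names are assumptions made purely for illustration.

```python
from itertools import product

# Assumption: each diversified position is encoded by the degenerate codon TMT.
M_BASES = "AC"                        # IUPAC symbol M = A or C
SY_CODONS = {"TCT": "S", "TAT": "Y"}  # standard genetic code: TCT = Ser, TAT = Tyr

def expand_tmt():
    """Return the concrete codons covered by TMT."""
    return ["T" + m + "T" for m in M_BASES]

def binary_loop_variants(n_positions):
    """Yield (dna, protein) pairs for a loop in which every position is TMT."""
    for codons in product(expand_tmt(), repeat=n_positions):
        dna = "".join(codons)
        protein = "".join(SY_CODONS[c] for c in codons)
        yield dna, protein

# Hypothetical example: a fully diversified six-residue loop.
variants = list(binary_loop_variants(6))
print(len(variants), "variants")
for dna, protein in variants[:4]:
    print(dna, protein)
```

Because each codon maps to exactly one of the two residues and no stop codon is possible, the number of DNA sequences equals the number of protein sequences for such a design.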

Based on the FN3fn10 scaffold, the monobodies described herein have no disulfide bonds, which have been reported to retard or prevent proper folding of antibody fragments under certain conditions. Since the scaffolds utilized in the methods of the invention do not rely on disulfides for native fold stability, they are stable under reducing conditions, unlike antibodies and their fragments, which unravel upon disulfide bond breakdown.

The three loops of FN3fn10 corresponding to the antigen-binding loops of the IgG heavy chain run between amino acid residues 21-31, 51-56, and 75-88. The lengths of the first and the third loops, 11 and 12 residues, respectively, fall within the range of the corresponding antigen-recognition loops found in antibody heavy chains, that is, 10-12 and 3-25 residues, respectively. Accordingly, once randomized with S/Y and selected for high antigen affinity, these two loops make contacts with antigens equivalent to the contacts of the corresponding loops in natural antibodies.

The second loop of FN3fn10 is only 6 residues long, whereas the corresponding loop in antibody heavy chains ranges from 16-19 residues. To optimize antigen binding, therefore, the second loop of FN3fn10 may be extended by 10-13 residues (in addition to being randomized) to obtain the greatest possible flexibility and affinity in antigen binding. Indeed, in general, the lengths as well as the sequences of the CDR-like loops of the monobodies may be randomized during in vitro or in vivo affinity maturation (as described below).

As noted, the loop regions can also be diversified by varying the lengths of the regions, suitably to between 4 and 25 amino acids. As shown in the examples below, all loops appeared to prefer near-wild-type lengths. Specifically, the DE loop is suitably full length as in the wild-type. The BC and FG loops were more amenable to length variations, as seen in FIG. 6.

The engineering of the monobodies in accordance with the invention can be accomplished at the DNA level via recombinant techniques. Such techniques afford the deletion, insertion, or replacement of amino acids from a corresponding loop region in a wild-type or other synthetic FN3 scaffold. In other words, recombinant techniques permit diversification of the amino acid residues and the loop length. Deletions can be a deletion of one or more amino acid residues down to substantially four amino acid residues appearing in a particular loop region. Insertions can be an insertion of one or more amino acid residues which are serine or tyrosine, up to about 25 amino acid residues, or suitably up to about 15 amino acid residues. Replacements can be replacements of one or more amino acid residues with serine or tyrosine in a particular loop region.
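The three kinds of loop modification described above can be sketched at the amino acid level as simple string operations with length checks. In the Python sketch below, the function names, the random choice between S and Y, and the placeholder loop "AAAAAA" are all hypothetical and chosen for illustration; the placeholder is not a sequence from the FN3 scaffold, and the bounds of 4 and 25 residues are taken from the ranges stated in the text.

```python
import random

MIN_LEN, MAX_LEN = 4, 25  # loop-length bounds stated in the text
BINARY = "SY"             # serine / tyrosine

def replace_positions(loop, positions, rng):
    """Replace the residues at the given 0-based positions with S or Y."""
    residues = list(loop)
    for p in positions:
        residues[p] = rng.choice(BINARY)
    return "".join(residues)

def insert_residues(loop, index, count, rng):
    """Insert `count` S/Y residues before `index`, keeping the loop <= MAX_LEN."""
    if len(loop) + count > MAX_LEN:
        raise ValueError("insertion would exceed the maximum loop length")
    new = "".join(rng.choice(BINARY) for _ in range(count))
    return loop[:index] + new + loop[index:]

def delete_residues(loop, index, count):
    """Delete `count` residues starting at `index`, keeping the loop >= MIN_LEN."""
    if len(loop) - count < MIN_LEN:
        raise ValueError("deletion would go below the minimum loop length")
    return loop[:index] + loop[index + count:]

rng = random.Random(0)   # seeded only so the demonstration is repeatable
loop = "AAAAAA"          # placeholder loop, not a real FN3 loop sequence
print(replace_positions(loop, [1, 3, 4], rng))
print(insert_residues(loop, 3, 2, rng))
print(delete_residues(loop, 2, 2))
```

In a real library the residue at each diversified position would be determined by the degenerate codons in the synthesized DNA rather than by a random number generator; the random choice here merely stands in for one member of such a library.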

In generating a binary amino acid library in accordance with the invention, the use of serine and tyrosine represents a bias toward amino acids that are prevalent in protein-binding interfaces (See, Mian et al., 1991, Structure, function and properties of antibody binding sites, J. Mol. Biol., 217:133-151, incorporated by reference). Further, the construction design eliminates those amino acids that are under-represented and those that may cause undesired complexities.

Specifically, the deletions, insertions, and replacements (relative to wild-type) on FN3 scaffolds can be achieved using recombinant techniques beginning with a known nucleotide sequence. A synthetic gene for the tenth domain of FN3 of human fibronectin (FIG. 3A, SEQ ID NO:1) was designed which includes convenient restriction sites for ease of mutagenesis and uses specific codons for high-level protein expression (Gribskov et al., 1984, The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression, Nucleic Acids Res., 12(1 Pt 2):539-49). As seen in FIG. 3A, the residue numbering is according to Main et al. Restriction enzyme sites designed are shown above the amino acid sequence. β-strands are denoted by underlines. The N-terminal "mq" sequence has been added for a subsequent cloning into an expression vector. The His-tag fusion protein has an additional sequence preceding the FN3 sequence shown above. This gene is substantially identical to the gene disclosed in co-pending U.S. patent applications Ser. No. 09/096,749 to Koide, filed June 12, 1998, and No. 10/190,162, to Koide, filed July 3, 2002, which are hereby incorporated by reference in their entireties. The "shaved" FN3 gene used as the starting material in the invention is similarly illustrated in FIG. 3B (SEQ ID NO:2).

The gene was assembled as follows: first the gene sequence was divided into five parts with boundaries at designed restriction sites (FIG. 3); for each part, a pair of oligonucleotides that code opposite strands and have complementary overlaps of about 15 bases was synthesized; the two oligonucleotides were annealed and single strand regions were filled in using the Klenow fragment of DNA polymerase; the double-stranded oligonucleotide was cloned into the pET3a vector (Novagen) using restriction enzyme sites at the termini of the fragment and its sequence was confirmed by an Applied Biosystems DNA sequencer using the dideoxy termination protocol provided by the manufacturer. These steps were repeated for each of the five parts to obtain the whole gene. Although this approach takes more time to assemble a gene than the one-step polymerase chain reaction (PCR) method, no mutations occurred in the gene. Mutations would likely have been introduced by the low fidelity replication of Taq polymerase, and would have required time-consuming gene-editing. Recombinant DNA manipulations were performed according to Molecular Cloning, Laboratory Manual (Sambrook et al., 1989, 2002, incorporated herein by reference), unless otherwise stated.
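The anneal-and-fill-in step described above can be sketched in a few lines of Python. This is a minimal illustration only: the 45-base sequence is made up for the example and is not SEQ ID NO:1, and the function names and the `min_overlap` parameter are hypothetical. The sketch models the two oligonucleotides as strings written 5' to 3', finds their complementary 3' overlap, and returns the top strand of the duplex that a polymerase fill-in would produce.

```python
# Illustrative sketch only: the sequence below is made up, not SEQ ID NO:1.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def anneal_and_fill_in(top, bottom, min_overlap=10):
    """Model annealing of two oligos that code opposite strands, followed by
    polymerase fill-in of the single-stranded regions.

    Both oligos are written 5'->3'. Returns the top strand of the resulting
    duplex, or raises ValueError if no complementary 3' overlap is found.
    """
    bottom_as_top = reverse_complement(bottom)  # bottom oligo read on the top strand
    longest = min(len(top), len(bottom_as_top))
    for size in range(longest, min_overlap - 1, -1):
        if top[-size:] == bottom_as_top[:size]:
            return top + bottom_as_top[size:]
    raise ValueError("oligos share no complementary overlap of the required length")


# Hypothetical 45-base gene part split into two oligos with a 15-base overlap.
part = "GGATCCGTTTCTGATGTTCCGAGCGACCTGGAAGTTGTTGCTGCG"
top_oligo = part[:30]
bottom_oligo = reverse_complement(part[15:])
duplex_top = anneal_and_fill_in(top_oligo, bottom_oligo)
print(duplex_top == part)
```

In this model each of the five gene parts would be rebuilt the same way before cloning; the restriction-site handling and sequencing confirmation described in the text are not represented.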

Mutations are introduced to the FN3 gene using Kunkel mutagenesis (Kunkel et al., 1987, Rapid and efficient site-directed mutagenesis without phenotypic selection. Methods Enzymol, 154:367-382, incorporated herein by reference). Kunkel mutagenesis can be utilized to randomly produce a plurality of mutated monobody coding sequences which can be used to prepare a combinatorial library of polypeptide monobodies for screening. Basically, targeted loop regions (or C-terminal or N-terminal tail regions) can be randomized using a degenerate codon.

As described above, monobodies in accordance with the invention can be isolated using cell-display-based library technology, wherein the monobodies are selected by exposing a library of polypeptides displayed on the surface of phage, yeast or other host cell, to a target molecule of interest, and isolating those variants that bind to the target. For example, phage display is a well-known method in the art by which variant polypeptides are displayed as fusion proteins to at least a portion of the coat protein on the surface of phage particles. In the examples below, a library of monobodies is created on the surface of filamentous phage viruses by adding monobody genes to the gene that encodes the phage's coat protein. Phage display can be used for high throughput screening of protein interactions. Each phage expresses and displays multiple copies of a single antibody fragment on its surface. Because each phage possesses both the surface-displayed monobody and the DNA that encodes that fragment, the monobody that binds to a target can be identified by amplifying the associated DNA. Similarly, yeast surface display may be used to isolate high affinity monobodies against a variety of targets. In one embodiment of the invention, the FN3 phage display system may be constructed as described in the examples below.

As explained in the examples below, the S/Y library contained ~10¹⁰ sequences. The sequences were selected through three rounds of phage display and an optional round of yeast surface display. The library so constructed is capable of producing binding proteins, i.e., monobodies, to a variety of targets as seen in the table of FIG. 4. Specific binding of monobodies is also shown in FIG. 5.

Nucleic acid molecules encoding the polypeptide monobody can be incorporated into host cells using conventional recombinant DNA technology. Recombinant molecules can be introduced into cells via transformation, particularly transduction, conjugation, mobilization, or electroporation. The DNA sequences are cloned into the vector using standard cloning procedures in the art, as described by Sambrook et al. (1989, 2002). Generally, this involves inserting the DNA molecule into an expression system to which the DNA molecule is heterologous (i.e., not normally present). The heterologous DNA molecule is inserted into the expression system or vector in sense orientation and correct reading frame. The vector contains the necessary elements (promoters, suppressors, operators, transcription termination sequences, etc.) for the transcription and translation of the inserted protein-coding sequences. See, e.g., U.S. Pat. No. 4,237,224 to Cohen and Boyer, which describes the production of expression systems in the form of recombinant plasmids using restriction enzyme cleavage and ligation with DNA ligase, incorporated herein by reference. These recombinant plasmids are then introduced by means of transformation and replicated in unicellular cultures including prokaryotic organisms and eukaryotic cells grown in tissue culture.

A variety of host-vector systems may be utilized to express the polypeptide monobody or fusion protein which includes a polypeptide monobody. Primarily, the vector system must be compatible with the host cell used. Host-vector systems include but are not limited to the following: bacteria transformed with bacteriophage DNA, plasmid DNA, or cosmid DNA; microorganisms such as yeast containing yeast vectors; and mammalian cell systems infected with virus (e.g., vaccinia virus, adenovirus, etc.). The expression elements of these vectors vary in their strength and specificities. Depending upon the host-vector system utilized, any one of a number of suitable transcription and translation elements can be used. Once the DNA molecule encoding the polypeptide monobody has been cloned into an expression system, it is ready to be incorporated into a host cell. Such incorporation can be carried out by the various forms of transformation noted above, depending upon the vector/host cell system. Suitable host cells include, but are not limited to, bacteria, yeast cells, mammalian cells, etc.

Monobodies in accordance with the present invention are well suited for expression as fusion proteins in combinatorial libraries to be screened, i.e., using a yeast or mammalian two-hybrid system. Yeast and mammalian two-hybrid systems have been established as standard methods to identify and characterize protein interactions in the nucleus of yeast cells (Fields & Song, 1989, A novel genetic system to detect protein-protein interactions, Nature, 340:245-246; Uetz & Hughes, 2000, Systematic and large-scale 2-hybrid screens, Current Opinion in Microbiology, 3(3):303-308). These approaches have previously been adapted for combinatorial library screening of specific peptide libraries (Colas & Brent, 1998, The impact of two-hybrid and related methods on biotechnology, Trends Biotechnol., (8):355-63; Mendelsohn & Brent, 1994, Applications of interaction traps/two-hybrid systems to biotechnology research, Curr Opin Biotechnol., 5(5):482-6, incorporated by reference). One version of the yeast two-hybrid system has been described (Chien et al., 1991, The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest, Proc Natl Acad Sci, 88(21):9578-82, incorporated herein by reference) and is commercially available from Clontech (Palo Alto, Calif.). The two-hybrid system or related methodology can be used to screen activation domain libraries for polypeptide monobodies that interact with a known target protein or polypeptide. The inventors successfully selected monobodies using a yeast two-hybrid system as described in the examples below.

In an aspect, the present invention relates to a combinatorial library which includes a plurality of fusion polypeptides. Each of the fusion polypeptides within the combinatorial library includes a transcriptional activation domain fused to a fibronectin type III (FN3) polypeptide monobody as described above, with at least one loop region sequence including a combinatorial amino acid sequence which varies by deletion, insertion, or replacement of one or more amino acids with serine or tyrosine or a combination of both from a corresponding loop region in a wild-type FN3 domain of fibronectin. The size of the combinatorial library will necessarily vary depending on the size of the combinatorial sequence introduced into the monobody coding sequence (i.e., the number of mutations introduced into a particular loop coding sequence). For purposes of screening, however, the combinatorial library is preferably at least about 10³ in size, affording at least about 10⁵ transformed cells. Therefore, while some redundancy may exist for each individual combinatorial amino acid sequence, considering the total number of transformants, the combinatorial sequence in each individual transformant differs from substantially all other combinatorial sequences present in the combinatorial array of transformants.

The combinatorial sequence in each polypeptide monobody can be the result of deletions, insertions, or replacements of the type described above. In certain aspects of the invention, the combinatorial amino acid sequence is at least about 4 amino acids in length, including one or more deletions, insertions, or replacements. In other aspects of the present invention, the combinatorial amino acid sequence is at least about 25 amino acids in length, including one or more deletions, insertions, or replacements.

Virtually any target protein that does not self-activate the reporter gene can be used. The two-hybrid system is not suitable for membrane-bound targets. For such targets, the split ubiquitin (Johnsson & Varshavsky, 1994, Split ubiquitin as a sensor of protein interactions in vivo, Proc Natl Acad Sci USA, 91(22):10340-4) or dihydrofolate reductase reconstitution can be used (Pelletier et al., 1998, Oligomerization domain-directed reassembly of active dihydrofolate reductase from rationally designed fragments, Proc Natl Acad Sci USA, 95(21):12141-6).

The target protein can be any protein or polypeptide. As explained in the examples below, targets used in the invention included APC22356, APC35945, MBP (maltose binding protein), APC25517, yeast SUMO (ySUMO) and human SUMO4 (hSUMO4).

Having identified that certain monobodies bind to a target protein, the monobodies in accordance with the invention may be developed to bind any target of interest.

Monobodies or nucleic acids encoding them may be of therapeutic value, i.e., used for therapeutic administration to modulate or modify the activity of the target protein in vivo. Monobodies may be employed in place of antibodies in all areas in which antibodies are used, including research, therapeutic and diagnostic fields. For purposes of therapeutic usage, it is suitable for polypeptide monobodies to be prepared in substantially pure form. This can be performed according to standard procedures. Typically, this involves recombinant expression of the desired polypeptide monobody by a host cell, propagation of the host cells, lysing the host cells, and recovery of supernatant by centrifugation to remove host cell debris. The supernatant can be subjected to sequential ammonium sulfate precipitation. The fraction containing the polypeptide monobody of the invention is subjected to gel filtration in an appropriately sized dextran or polyacrylamide column to separate the polypeptide monobodies. If necessary, the protein fraction may be further purified by HPLC. The isolation and purification of polypeptide monobodies, in particular, has previously been reported by Koide et al. (Koide et al., 1998, The fibronectin type III domain as a scaffold for novel binding proteins, J. Mol. Biol. 284:1141-1151).

Whether the polypeptide monobodies themselves or nucleic acids encoding them are administered alone or in combination with pharmaceutically or physiologically acceptable carriers, excipients, or stabilizers, or in solid or liquid form such as tablets, capsules, powders, solutions, suspensions, or emulsions, they may, suitably formulated, be administered orally, parenterally, subcutaneously, intravenously, intramuscularly, intraperitoneally, by intranasal instillation, by intracavitary or intravesical instillation, intraocularly, intraarterially, intralesionally, or by application to mucous membranes, such as that of the nose, throat, and bronchial tubes. For most therapeutic purposes, the polypeptide monobodies or nucleic acids can be administered intravenously.

Pharmaceutical compositions of the monobodies or nucleic acids in accordance with the present invention may be formulated in accordance with routine procedures as compositions adapted for intravenous administration to human beings. Typically, compositions for intravenous administration are solutions in sterile isotonic aqueous buffer. Where necessary, the composition also may include a solubilizing agent and a local anesthetic such as lidocaine to ease pain at the site of the injection. Generally, ingredients are supplied either separately or mixed together in unit dosage form; for example, as a dry lyophilized powder or water free concentrate in a hermetically sealed container such as an ampoule or sachette indicating the quantity of active agent. Solutions or suspensions of ingredients can be prepared in a physiologically acceptable diluent with a pharmaceutical carrier. Such carriers include sterile liquids, such as water and oils, with or without the addition of a surfactant and other pharmaceutically and physiologically acceptable carrier, including adjuvants, excipients or stabilizers. Illustrative oils are those of petroleum, animal, vegetable, or synthetic origin, for example, peanut oil, soybean oil, or mineral oil. In general, water, saline, aqueous dextrose and related sugar solution, and glycols, such as propylene glycol or polyethylene glycol, are preferred liquid carriers, particularly for injectable solutions.

Where the composition is to be administered by infusion, it can be dispensed by an infusion bottle containing sterile pharmaceutical grade water or saline. Where the composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients may be mixed prior to administration.

For use as aerosols, the polypeptide monobodies or nucleic acids in solution or suspension may be packaged in a pressurized aerosol container together with suitable propellants, for example, hydrocarbon propellants like propane, butane, or isobutane with conventional adjuvants. The materials of the present invention also may be administered in a non-pressurized form such as in a nebulizer or atomizer.

A number of known delivery techniques can also be utilized for the delivery into cells, of either the polypeptide monobodies themselves or nucleic acid molecules which encode them. Regardless of the particular method of the invention which is practiced, when it is desirable to contact a cell (i.e., to be treated) with a polypeptide monobody or its encoding nucleic acid, it is preferred the contacting be carried out by delivery of the polypeptide monobody or its encoding nucleic acid into the cell.

One approach for delivering a polypeptide monobody or its encoding DNA into cells involves the use of liposomes. Basically, this involves providing a liposome which includes the polypeptide monobody or its encoding DNA to be delivered, and then contacting the target cell with the liposome under conditions effective for delivery of the polypeptide monobody or DNA into the cell.

Liposomes are vesicles comprised of one or more concentrically ordered lipid bilayers which encapsulate an aqueous phase. They are normally not leaky, but can become leaky if a hole or pore occurs in the membrane, if the membrane is dissolved or degrades, or if the membrane temperature is increased to the phase transition temperature. Current methods of drug delivery via liposomes require that the liposome carrier ultimately become permeable and release the encapsulated drug at the target site. This can be accomplished, for example, in a passive manner wherein the liposome bilayer degrades over time through the action of various agents in the body. Every liposome composition will have a characteristic half-life in the circulation or at other sites in the body and, thus, by controlling the half-life of the liposome composition, the rate at which the bilayer degrades can be somewhat regulated.

In contrast to passive drug release, active drug release involves using an agent to induce a permeability change in the liposome vesicle. Liposome membranes can be constructed so that they become destabilized when the environment becomes acidic near the liposome membrane (Wang & Huang, 1987, pH-sensitive immunoliposomes mediate target-cell-specific delivery and controlled expression of a foreign gene in mouse, Proc Natl Acad Sci USA, 84(22):7851-5, incorporated by reference). When liposomes are endocytosed by a target cell, for example, they can be routed to acidic endosomes which will destabilize the liposome and result in drug release.

Alternatively, the liposome membrane can be chemically modified such that an enzyme is placed as a coating on the membrane which slowly destabilizes the liposome. Since control of drug release depends on the concentration of enzyme initially placed in the membrane, there is no real effective way to modulate or alter drug release to achieve "on demand" drug delivery. The same problem exists for pH-sensitive liposomes in that as soon as the liposome vesicle comes into contact with a target cell, it will be engulfed and a drop in pH will lead to drug release.

This liposome delivery system can also be made to accumulate at a target organ, tissue, or cell via active targeting (e.g., by incorporating an antibody or hormone on the surface of the liposomal vehicle). This can be achieved according to known methods.

Different types of liposomes can be prepared according to Bangham et al. (Bangham et al., 1965, Cation permeability of phospholipid model membranes: effect of narcotics, Nature, 208(5017):1295-1297); U.S. Pat. Nos. 5,653,996 to Hsu et al.; 5,643,599 to Lee et al.; 5,885,613 to Holland et al.; 5,631,237 to Dzau et al.; and U.S. Pat. No. 5,059,421 to Loughrey et al., incorporated herein by reference, as well as any other approach demonstrated in the art. An alternative approach for delivery of polypeptide monobodies involves the conjugation of the desired polypeptide monobody to a polymer that is stabilized to avoid enzymatic degradation of the conjugated monobody. Conjugated proteins or polypeptides of this type are described in U.S. Pat. No. 5,681,811 to Ekwuribe, incorporated by reference.

Yet another approach for delivery of polypeptide monobodies involves preparation of fused proteins according to U.S. Pat. No. 5,817,789 to Heartlein et al., incorporated by reference. The protein can include a ligand domain and, e.g., a polypeptide monobody which has activity to bind a cellular target (e.g., a nuclear receptor or other cellular protein). The ligand domain is specific for receptors located on a target cell. Thus, when the protein is delivered intravenously or otherwise introduced into blood or lymph, the protein will adsorb to the targeted cell, and the targeted cell will internalize the protein. An exemplary approach is the HIV Tat protein.

When it is desirable to achieve heterologous expression of a desirable polypeptide monobody in a target cell, DNA molecules encoding the polypeptide monobody can be delivered into the cell. Basically, this includes providing a nucleic acid molecule encoding the polypeptide monobody and then introducing the nucleic acid molecule into the cell under conditions effective to express the polypeptide monobody in the cell. Preferably, this is achieved by inserting the nucleic acid molecule into an expression vector before it is introduced into the cell.

When transforming mammalian cells for heterologous expression of a polypeptide monobody, an adenovirus vector can be employed. Adenovirus gene delivery vehicles can be readily prepared and utilized given the disclosure provided in Berkner (Berkner, 1988, Development of adenovirus vectors for the expression of heterologous genes, Biotechniques, 6(7):616-29) and Rosenfeld et al. (Rosenfeld et al., 1991, Adenovirus-mediated transfer of a recombinant alpha 1-antitrypsin gene to the lung epithelium in vivo, Science, 252(5004):374), incorporated by reference. Adeno-associated viral gene delivery vehicles can be constructed and used to deliver a gene to cells. The use of adeno-associated viral gene delivery vehicles in vivo is described in Flotte et al. (Flotte et al., 1993, Prospects for virus-based gene therapy for cystic fibrosis, J Bioenerg Biomembr., 25(1):37-42) and Kaplitt et al. (Kaplitt et al., 1994, Long-term gene expression and phenotypic correction using adeno-associated virus vectors in the mammalian brain, Nature Genetics, 8(2):148-54). Additional types of adenovirus vectors are described in U.S. Pat. No. 6,057,155 to Wickham et al.; 6,033,908 to Bout et al.; 6,001,557 to Wilson et al.; 5,994,132 to Chamberlain et al.; 5,981,225 to Kochanek et al.; 5,885,808 to Spooner et al.; and U.S. Pat. No. 5,871,727 to Curiel, incorporated by reference.

Retroviral vectors which have been modified to form infective transformation systems can also be used to deliver nucleic acid encoding a desired polypeptide monobody into a target cell. One such type of retroviral vector is disclosed in U.S. Pat. No. 5,849,586 to Kriegler et al., incorporated by reference.

Regardless of the type of infective transformation system employed, it should be targeted for delivery of the nucleic acid to a specific cell type. For example, for delivery of the nucleic acid into tumor cells, a high titer of the infective transformation system can be injected directly within the tumor site so as to enhance the likelihood of tumor cell infection. The infected cells will then express the desired polypeptide monobody, allowing the polypeptide monobody to modify the activity of its target protein.

Dosages to be administered can be determined according to known procedures, including those which balance both drug efficacy and degree of side effects. The amount of monobody agent to be administered depends on the precise formulation selected; the disease or condition of the patient and its severity; the route of administration; the health and weight of the patient; the existence of other concurrent treatment, if any; the frequency of treatment, the nature of the effect desired, for example, inhibition of tumor metastasis; and the judgment of the skilled practitioner. A dose of a monobody agent for treating a patient is an amount sufficient to modulate or modify the activity of the target protein. The number of variables in regard to an individual treatment regimen is large, and a considerable range of doses is expected.

The present invention is further explained by the following examples, which should not be construed by way of limiting the scope of the present invention.

EXAMPLES

Example 1: Phage display vector construction

"Shaved" FN3fn10 was constructed by replacing under-represented amino acids with Ser, i.e., introducing the D3S, P25S, A26S, V27S, T28S, R30S, V75S, T76S, G77S, R78S, G79S, D80S, P82S, and A83S mutations in the FN3fn10 gene by Kunkel mutagenesis. A DNA fragment that encodes the signal sequence of DsbA (Steiner et al., 2006, Signal sequences directing cotranslational translocation expand the range of proteins amenable to phage display, Nat Biotechnol 24:823-31) was fused to the gene for "shaved" FN3fn10 using PCR, and the fusion gene was cloned into the phage display vector pAS38 (Koide et al., 1998, The fibronectin type III domain as a scaffold for novel binding proteins, J. Mol. Biol., 284:1141-1151), resulting in pDsbFNshavedp3. The vector map is shown in FIG. 7.

Example 2: Phage library construction

The S/Y libraries were constructed using the TMT degenerate codon (M = A/C in equal proportions), which yields the codons TAT (Tyr) and TCT (Ser) and thus encodes the combination of S and Y. The initial S/Y library was constructed by introducing diversity at all positions in the three loops (BC, DE, and FG) as follows. In the BC-loop (residues 21-31), residues 25-28 were replaced with 4 to 10 residues of the S/Y combination and residue 30 was replaced with a single S/Y combination. Val27 was unchanged because of its importance for the FN3fn10 stability. Thus the BC loop length was varied from 11 (= original) to 17. In the DE-loop (residues 51-56), residues 52-55 were replaced with 4 to 10 residues of the S/Y combination. In the FG-loop (residues 75-88), residues 75-85 were replaced with 7 to 15 residues of the S/Y combination, and K86 was replaced with Ser.
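As a rough check on the design of the initial library, the following Python sketch expands the TMT degenerate codon and counts the theoretical number of distinct S/Y sequences from the insert lengths stated above. It is an illustration under simplifying assumptions: every insert length is treated as equally allowed, the three loops are treated as independent, and the function names are hypothetical rather than part of any published tool.

```python
# Illustrative sketch: expands the TMT degenerate codon and counts the
# theoretical diversity of the initial S/Y library from the loop lengths
# stated in Example 2. Function names are hypothetical.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC"}  # M = A or C
CODON_TABLE = {"TAT": "Y", "TCT": "S"}  # only the two codons TMT can produce


def expand_codon(degenerate):
    """List every concrete codon encoded by a degenerate codon."""
    codons = [""]
    for symbol in degenerate:
        codons = [prefix + base for prefix in codons for base in IUPAC[symbol]]
    return codons


def binary_diversity(min_len, max_len, fixed_extra=0):
    """Number of distinct S/Y sequences over a range of loop insert lengths.

    fixed_extra counts additional single S/Y positions outside the insert.
    """
    return sum(2 ** (n + fixed_extra) for n in range(min_len, max_len + 1))


amino_acids = sorted(CODON_TABLE[c] for c in expand_codon("TMT"))
bc = binary_diversity(4, 10, fixed_extra=1)   # residues 25-28 plus residue 30
de = binary_diversity(4, 10)                  # residues 52-55
fg = binary_diversity(7, 15)                  # residues 75-85
print(amino_acids, bc, de, fg, bc * de * fg)
```

Under these assumptions the product of the three loop counts is on the order of 10¹¹, so a library of ~10¹⁰ members would sample a substantial fraction of, but not all of, the theoretical sequence space.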

The second-generation library was constructed as follows. The BC loop contained 4-8 residues of the S/Y combination for residues 25-28 and a single S/Y combination for R30. In the DE-loop (G52-S55), G52 was replaced with the S/Y/G combination and residues 53-55 were replaced with 3 residues of the S/Y combination. The FG loop was diversified in the same manner as the first-generation library as described above.

All libraries were constructed by Kunkel mutagenesis using the phagemid vector described above as the template, following published procedures (Sidhu et al., 2000, Phage display for selection of novel binding peptides, Methods Enzymol 328:333-363, incorporated by reference).

Example 3: Target protein preparation

Target proteins were conjugated with a cleavable biotinylation reagent as follows. A target protein was dissolved in 50 mM MOPS/NaOH buffer, pH 7.0, containing 100 mM NaCl at a concentration of 1 mg/mL and mixed with 1/10 volume of 1 mg/mL EZ-Link Sulfo-NHS-SS-Biotin (Pierce) dissolved in the same buffer. After incubating for 45 min, the reaction was terminated by an addition of 1/10 volume of 1 M Tris-HCl (pH 8). The solution was dialyzed against Tris/NaCl buffer (20 mM Tris, 100 mM NaCl, pH 8), and its concentration was determined by measuring absorbance at 280 nm.

Example 4: Phage library sorting

In the first round of sorting, 1 nmol of biotinylated target was incubated with 0.5 mg of Streptavidin (SAV)-coated magnetic beads (Promega Corp.) for 15 min. After washing the beads in TBS (50 mM Tris-HCl buffer with 150 mM NaCl, pH 7.5), SAV was blocked with 5 μM biotin solution in TBS. To the target-beads complex, 10¹²-10¹³ phagemid particles suspended in 1 mL TBST (TBS containing 0.5% (v/v) Tween20) containing 0.5% BSA (termed TBST-BSA) were added and the mixture was incubated for 15 min. After discarding the phagemid solution and washing the beads twice with TBST, half of the beads were used to infect 3 mL of log-phase XL-1 Blue cells (Invitrogen), and phagemids were amplified in the presence of 0.2 mM IPTG in the 2xYT media containing ampicillin.

The second and third rounds of selection were performed using a King Fisher magnetic beads handler (Thermo). For the second round, approximately 10¹¹ amplified phagemid particles from the first round were mixed with 10 pmol of biotinylated target in 100 μL TBS-BSA (i.e., the target concentration was 100 nM). After incubating for 15 min, 20 μg of SAV-beads were added to capture the biotinylated target-phage complexes and incubated for 15 min. After washing the beads 4 times with TBS-BSA, the target molecules (with bound phagemids) were released from the beads by cleaving the disulfide linker between the protein and biotin with elution buffer (20 mM Tris (pH 8), 100 mM DTT). Recovered phagemids were used to infect 1.2 mL of log-phase XL-1 Blue cells, and phagemids were amplified in the presence of 0.2 mM IPTG in 2xYT + ampicillin. The third round was performed in the same manner as the second round except that the target concentration was reduced to 20-50 nM.
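The target concentrations quoted for the second and third rounds follow from simple unit arithmetic, sketched below in Python. The helper names are hypothetical, and the third-round amounts assume the same 100 μL reaction volume as the second round, which the text implies but does not state outright.

```python
def molar_concentration_nM(amount_pmol, volume_uL):
    """Convert an amount in pmol and a volume in microliters to nM.

    1 pmol/uL equals 1 uM, i.e. 1000 nM.
    """
    return amount_pmol * 1000.0 / volume_uL


def amount_for_target_pmol(concentration_nM, volume_uL):
    """Amount of target (pmol) needed to reach a concentration in a volume."""
    return concentration_nM * volume_uL / 1000.0


round2 = molar_concentration_nM(10, 100)          # 10 pmol in 100 uL
round3_low = amount_for_target_pmol(20, 100)      # assumes the same 100 uL volume
round3_high = amount_for_target_pmol(50, 100)
print(round2, round3_low, round3_high)
```

With these inputs, 10 pmol in 100 μL corresponds to 100 nM, and a 20-50 nM third round in the same volume would call for 2-5 pmol of biotinylated target.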

Example 5: Yeast surface display vector construction

A yeast surface display vector for monobodies was constructed from pYD1 (Invitrogen) using yeast homologous recombination. The gene for a monobody of interest was inserted between the NheI and XhoI sites of pYD1 using yeast homologous recombination (Ma et al., 1987, Plasmid construction by homologous recombination in yeast, Gene 58, 201-16; Raymond et al., 1999, General method for plasmid construction using homologous recombination, Biotechniques 26:134-8, 140-1, incorporated by reference). Yeast cells were transformed using the method of Gietz (Gietz & Woods, 2001, Genetic transformation of yeast, Biotechniques 30:816-828, incorporated by reference). The resulting plasmid, pGalAgaMB, expresses Aga2-monobody-V5 tag-His tag fusion protein under the control of a galactose-inducible promoter. The plasmid map is shown in FIG. 7.

Example 6: Yeast surface display and screening

Monoclonal anti-V5 IgG and anti-mouse IgG-fluorescein isothiocyanate (FITC) conjugate were purchased from Sigma. Neutravidin-PE (NAV-PE) conjugate was purchased from Molecular Probes. Yeast surface display was performed generally according to Boder and Wittrup (Boder & Wittrup, 2000, Yeast surface display for directed evolution of protein expression, affinity, and stability, Methods Enzymol 328, 430-44, incorporated by reference).

After three rounds of phage display sorting of a monobody library, the mixture of genes for the enriched monobodies was amplified using primers that contain identical sequences to the 5' and 3' flanking regions of the monobody gene in the yeast display vector pGalAgaMB. Saccharomyces cerevisiae EBY100 was transformed with the PCR fragments and linearized pGalAgaMB to make a sub-library. Yeast cells were grown in the presence of 2% galactose at 30 °C for 24 hours in order to induce the Aga2-monobody fusion proteins on the yeast cell surface. Then they were incubated with 50 nM of a biotinylated target for 40 min. After cells were collected and washed, they were stained with monoclonal anti-V5 antibody and NAV-PE, followed by FITC-conjugated anti-mouse antibody. The cell population exhibiting the top 0.5% of target binding (as judged by the PE fluorescence) among the population exhibiting the top 50% Aga2-monobody expression (as judged by the FITC fluorescence) was recovered using a FACS Aria instrument (Becton-Dickinson).
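The two-step sort criterion described above (top 50% by FITC, then top 0.5% of those by PE) can be illustrated with a small Python sketch. This is only a rank-based approximation: on an actual cell sorter the gates are drawn on fluorescence plots, and the function name, the tuple layout, and the synthetic log-normal events below are assumptions made for the example, not measured data.

```python
import random


def sort_gate(cells, expression_fraction=0.5, binding_fraction=0.005):
    """Select cells the way the two-step gate in Example 6 is described.

    cells is a list of (fitc, pe) tuples: FITC reports display level and PE
    reports target binding. First keep the top expression_fraction of cells
    by FITC, then the top binding_fraction of those by PE.
    """
    by_expression = sorted(cells, key=lambda cell: cell[0], reverse=True)
    n_expressing = max(1, round(len(by_expression) * expression_fraction))
    expressing = by_expression[:n_expressing]

    by_binding = sorted(expressing, key=lambda cell: cell[1], reverse=True)
    n_binding = max(1, round(len(by_binding) * binding_fraction))
    return by_binding[:n_binding]


# Synthetic events for illustration only (not measured data).
rng = random.Random(0)
events = [(rng.lognormvariate(5, 1), rng.lognormvariate(3, 1)) for _ in range(10000)]
collected = sort_gate(events)
print(len(collected))
```

Applying the expression gate first keeps non-displaying cells from being ranked on PE signal at all, so the binding cut is taken only among cells that present the Aga2-monobody fusion.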

Example 7: Yeast surface display analysis of individual clones and K_d measurements

Experiments were performed essentially following the methods of Boder and Wittrup (Boder & Wittrup, 2000, Yeast surface display for directed evolution of protein expression, affinity, and stability, Methods Enzymol 328, 430-44, incorporated by reference). Individual yeast clones displaying a monobody were grown and incubated with various concentrations of biotinylated target protein for 40 min at 4°C. Subsequent washing and staining were performed as described above. The fluorescence signals were measured using a FACSCalibur equipped with a high-throughput sampler. Typically 10⁵ cells were analyzed. The PE fluorescence intensity for each cell was normalized by dividing it by its FITC fluorescence intensity. The normalized PE fluorescence intensity of the top 10% of cells was used for K_d determination.
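By way of illustration only, the normalization and titration analysis described above can be sketched in Python. The sketch assumes a simple 1:1 binding isotherm, signal = F_max * c / (K_d + c), and a log-spaced grid search for K_d; neither the binding model nor the fitting procedure is specified in this example, and the function names are hypothetical.

```python
def top_fraction_mean(pe, fitc, fraction=0.10):
    """Mean PE/FITC ratio of the top `fraction` of cells.

    Each cell's PE signal (target binding) is divided by its FITC signal
    (monobody expression) before the top cells are selected.
    """
    ratios = sorted((p / f for p, f in zip(pe, fitc) if f > 0), reverse=True)
    n_top = max(1, int(len(ratios) * fraction))
    return sum(ratios[:n_top]) / n_top


def fit_kd(concs_nM, signals, kd_grid=None):
    """Least-squares fit of signal = Fmax * c / (Kd + c).

    Assumption: 1:1 binding. Kd is found by grid search; for each candidate
    Kd the optimal Fmax has a closed-form least-squares solution.
    """
    if kd_grid is None:
        # Candidate Kd values from 0.1 nM to 10 uM, 100 points per decade.
        kd_grid = [10 ** (x / 100) for x in range(-100, 401)]
    best = None
    for kd in kd_grid:
        frac = [c / (kd + c) for c in concs_nM]
        fmax = sum(f * s for f, s in zip(frac, signals)) / sum(f * f for f in frac)
        sse = sum((s - fmax * f) ** 2 for f, s in zip(frac, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]
```

In use, `top_fraction_mean` would be applied to the per-cell PE and FITC values at each target concentration, and the resulting signals passed to `fit_kd` together with the concentrations.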

The amino acid sequences of isolated monobodies were deduced by PCR-amplifying their genes and determining their DNA sequences.

Table 1 lists the amino acid sequences (with corresponding SEQ ID NOs) obtained for the three loop regions, BC (amino acids 21-31), DE (amino acids 51-56) and FG (amino acids 75-88), of isolated monobodies that bind specific targets: maltose binding protein (MBP), yeast SUMO (ySUMO) or human SUMO4 (hSUMO4). The loop regions refer to those found in the "shaved" FN3 protein (SEQ ID NO:2). K_d values derived from either yeast display or Biacore for specific clones can be found in FIG. 4.

Table 1: Sequences of the loop regions of monobody clones.

Example 8: Expression and purification of monobodies

An E. coli expression vector, pHFT2, was constructed by replacing the His₆ segment of the expression vector pHFT1 (Huang et al., 2006, Conformation-specific affinity purification of proteins using engineered binding proteins: Application to the estrogen receptor, Protein Expr Purif 47, 348-354, incorporated by reference) with a His₁₀ sequence.

A monobody gene obtained from library sorting was cloned between the BamHI and XhoI sites by standard PCR cloning methods. BL21(DE3) cells (Novagen) were transformed with the resulting monobody expression vector.

Bacterial cell culture and protein expression, induced by the addition of IPTG to the media, were performed as described previously (Koide et al., 1998, The fibronectin type III domain as a scaffold for novel binding proteins, J. Mol. Biol., 284:1141-1151 and Koide et al., 2001, Stabilization of a fibronectin type III domain by the removal of unfavorable electrostatic interactions on the protein surface, Biochemistry 40:10326-33). Alternatively, the auto-induction method of Studier (Studier, 2005, Protein production by auto-induction in high density shaking cultures, Protein Expr Purif, 41:207-34) was utilized.

Expressed monobodies were purified using nickel affinity chromatography (Ni-Sepharose 6 Fast Flow; GE Healthcare) following the manufacturer's instructions. Small-scale purification was performed using Ni-NTA agarose magnetic beads (Qiagen) on a KingFisher instrument (Thermo) following the manufacturers' instructions.

Example 9: K_d determination by Surface Plasmon Resonance (SPR)

SPR measurements were performed on a Biacore 2000 instrument. A purified monobody was immobilized on a sensor chip NTA (preloaded with nickel) through the His₁₀ tag on the monobody. A target at approximately 300 nM was injected and its association and dissociation were monitored. Sensorgrams were analyzed using BIAevaluation software (Biacore).

Example 10: Characterization of monobody clones to three distinct proteins

After three rounds of phage-display library sorting against three protein targets, maltose-binding protein (MBP), human SUMO4 (hSUMO4), and yeast SUMO (ySUMO), the enriched pools of monobody clones were transferred into the yeast-display format and subjected to one round of sorting. The Y/S monobodies exhibited a clear and distinct consensus sequence for each target, with monobodies to different targets showing distinct loop length distributions (FIG. 9). As demonstrated in FIG. 9, the selected monobody clones used all the designed lengths of the BC and FG loops except the 12-residue length for FG, suggesting that the BC and FG loops do not have a preferred length. In contrast, the DE loop was four residues long in all of the selected clones, suggesting a strong preference imposed by the scaffold.

Amino acid sequences of individual clones identified at least two classes of monobodies for each target (FIG. 9 and Table 1). The amino acid sequences of both the BC and FG loops are distinctly different between classes of monobodies (e.g. MBP-74 versus MBP-32), suggesting that these loops influence each other's sequence and cooperatively form a binding interface.

Example 11: Characterization of monobody clones

The K_d values of representative clones, as determined using yeast surface display, ranged from 5 nM to 90 nM (FIG. 9 and 10A). The monobodies showed low levels of cross-reactivity to non-cognate targets, although some clones cross-reacted with multiple targets (e.g. hSUMO4-39 and MBP-73, FIG. 10C-E). The selected monobodies can discriminate between hSUMO4 and ySUMO, which have 40% sequence identity (FIG. 10C-D), showing that Y/S monobodies achieve a good level of binding specificity.

X-ray structure determination of a Y/S-monobody-target complex with designed 3D domain swapping.

A method in which a monobody is fused to its target with a short or no linker between them, thereby maintaining a 1:1 stoichiometry, was used to crystallize the monobody bound to its target. It was hypothesized that such fused proteins would form an oligomer via 3D domain swapping and that increasing the probability of higher order structure formation would enhance crystallization. This strategy was tested with the MBP-74 monobody and MBP. Two different linker lengths (zero and three residues) were tested between the C-terminus of MBP and the N-terminus of the monobody. Gel filtration revealed that the fusion protein with no linker formed predominantly an octamer, while the fusion protein with a three-residue linker formed a mixture of tetramer and octamer.

The MBP-monobody fusion protein was crystallized in 20% polyethyleneglycol-1000, 0.1 M Na/K phosphate buffer and 0.2 M NaCl, pH 6.5, using the sitting drop vapor diffusion method. Crystals were frozen in an 80% mixture of this solution combined with 20% glycerol as cryoprotectant. The X-ray diffraction data were collected at APS beamline 24-ID (Advanced Photon Source at Argonne National Laboratory). Crystal data and data collection statistics are summarized in Table 3. X-ray diffraction data were processed and scaled with HKL2000. The structures were determined by molecular replacement using a multi-copy search with two different models with the program MOLREP in CCP4. The MBP structure (PDB ID: 1DMB) was used as a search model, along with the FN3 structure (PDB ID: 1FNA). The rigid body refinement was carried out with CNS1.1. The SigmaA-weighted 2Fobs-Fcalc and Fobs-Fcalc Fourier maps were calculated and examined. The model building was carried out using the Turbo-Frodo program. The simulated annealing and the search for water molecules were performed in CNS1.1. The TLS (Translation/Libration/Screw) and bulk solvent parameters, restrained temperature factor, and final positional refinement were completed with REFMAC5. Molecular graphics were generated using PyMOL.

Table 2. Properties of the monobody paratope and those of related proteins.

Parameters were calculated using the protein-protein interaction server (Jones and Thornton, 1996, Proc Natl Acad Sci U S A 93, 13-20) and the program SC (Lawrence and Colman, 1993, J Mol Biol 234, 946-50), except for five V_HHs (1JTT, 1RJC, 1RI8, 1ZVY and 1ZV5), for which the values were obtained from De Genst et al., 2006, Proc Natl Acad Sci U S A 103, 4586-91.

The parameters for the entire paratope and those for the paratope from which the scaffold residues are omitted (in parentheses) are shown for the monobody.

The average and standard deviation for eight V_HH structures that bind to a cleft (PDB IDs: 1KXQ, 1KXT, 1SQ2, 1JTT, 1RJC, 1RI8, 1ZVY, and 1ZV5).

This excludes information for 1KXQ due to its incompatibility with the protein-protein interaction server.

The X-ray structure of the crystallized MBP/MBP-74 fusion protein with no linker was determined at 2.35 Å resolution (Table 3). The connecting segment between the MBP and monobody portions had well-defined electron density (Fig. 11B), allowing the connectivity of the two portions to be established unambiguously. Surprisingly, the fusion protein formed a continuous helical rod in the crystal with a 4-fold screw axis along its length (Figs. 11A and 11B), rather than an oligomer, indicating a structural rearrangement during crystallization.

Table 3. Statistics for the crystal structure of the MBP-74 monobody/MBP complex

The monobody scaffold (excluding the three recognition loops) and the corresponding part of the wild-type FN3 (PDB code, 1FNF) had a Cα rmsd value of 0.54 Å, indicating that the FN3fn10 scaffold is essentially unaffected by the extensive mutations in the loops. The backbone conformations of the BC and FG loops of the monobody, where all mutations are located, are very different from those of their counterparts in the uncomplexed wild-type protein, suggesting their inherent plasticity (Fig. 3F). MBP was in the open form, similar to that of the MBP/β-cyclodextrin (βCD) complex (PDB code, 1DMB; Cα rmsd = 0.81 Å).

The recognition loops of the monobody segment interact with the sugar-binding cleft of the MBP segment of an adjacent fusion protein (Fig. 11C). Hereafter, this combination of monobody and MBP is referred to as the "binding complex" and the interface between them as the "binding interface." Following the convention for antibody-antigen complexes, the interaction interface of MBP is referred to as the epitope and that of the monobody as the paratope.

This binding complex was confirmed to represent the interaction in solution using two independent methods. First, the monobody-binding epitope of MBP was mapped by heteronuclear NMR spectroscopy (FIG. 14). Monobody binding affected the backbone ¹H, ¹⁵N and ¹³C resonances of MBP residues that overlap with the epitope seen in the crystal structure (FIG. 11D and E). The NMR perturbation results indicate that the monobody binds to a single, dominant epitope on MBP. Note that an epitope mapped by this NMR method is usually twice as large as the actual structural epitope, because this method detects residues that are affected by both direct and indirect contacts (Huang et al., 1998, J. Mol. Biol. 281:61-67). Second, the binding of MBP-74 to MBP was competitively inhibited by βCD, which binds to the cleft.

Binding interface

The binding interface buries 749 Å² of monobody surface and comprises 16 residues of the monobody and 22 residues of MBP (interface residues are defined as those with buried surface area > 5 Å²; FIG. 15). Although the DE loop contains the wild-type sequence, all three recognition loops are involved in the interaction. Both the epitope and the paratope are bisected by a deep unfilled cavity, resulting in two distinct sets of contacts (FIG. 11D and FIG. 12A). The monobody FG loop and a part of the scaffold interact with the "bottom" lobe of MBP, and the BC and DE loops together with a single Tyr residue from the FG loop interact with the "top" lobe (FIG. 12A). The contacts made by the monobody scaffold residues are potentially due to lattice packing, because the contact residues are mostly polar and charged and the NMR epitope mapping data show little to no chemical shift perturbation in this area (FIG. 11E).

The FG-loop residues contribute the bulk of the interface surface (513 Å²; FIG. 12D) and mediate contact with MBP almost exclusively through the side chain atoms. Of the seven Tyr residues in this loop (FIG. 2B), three interact closely with aromatic residues of MBP (FIG. 12B). They form a central hydrophobic patch that is surrounded by a more polar periphery consisting of the hydroxyl groups of five Tyr residues (FIG. 12B). The remaining Tyr residues do not contribute to this contact: Y82 lies against the backside of the Tyr cluster that forms the binding interface, and Y77 stretches away.

The BC and DE loops together with Y77 interact with three charged residues on the "top" lobe of MBP (FIG. 12A and C). This interface nearly completely buries K42, E44 and E45 of MBP and accounts for 149 Å² of the monobody interface surface area. In contrast to the FG loop, the majority of the contacts here are mediated by the backbone atoms. The carbonyl groups of S27 and V29 of the BC loop and the hydroxyl group of Y77 form hydrogen bonds with the buried K42 of MBP. In addition, nearby E44 and E45 of MBP form hydrogen bonds with the hydroxyl groups of Tyr77 and Ser53, respectively. These Glu residues may compensate for the burial of K42's ε-amino group.

In addition to the binding interface, each monobody molecule forms a large interface (520 Å²) with the MBP molecule to which it is fused, as well as a 214 Å² contact with a symmetry-related MBP. Together with the binding interface, 28% of the total monobody surface is buried in the crystal structure.

Substrate mimicry by the monobody

A structural comparison of the MBP/monobody complex with MBP complexed with its βCD substrate revealed that the MBP epitopes for βCD and the monobody share 12 residues that have nearly identical conformations in the two structures (FIG. 13B). Many βCD structural elements are closely mimicked by the monobody binding loop structures (FIG. 13A). In particular, the aromatic ring of the FG loop Tyr shows striking overlap with the sugar rings of βCD, and they utilize the same hydrophobic contacts. Further, many of the hydroxyl groups of βCD are emulated by the Tyr hydroxyls and backbone carbonyls, which resulted in conservation of a similar hydrogen bonding pattern.

Dominant functional contribution of Tyr residues in the paratope

Whether the loop residues of MBP-74 could be substituted with other amino acids was then tested. Two secondary libraries were constructed in which either the BC or FG loop was diversified with a mixture of Y, S, F, A, D, and V at each position. All the residues in the BC loop tolerate extensive mutations (FIG. 16A). In contrast, only Y82 and S83 in the FG loop can be replaced with other amino acids (FIG. 16C). Y82 can be changed to F, but not to S, A, D, or V. The phenolic ring of Y82 packs against the interface Tyr residues and appears to support the paratope structure. S83 was completely replaced with either A or D, suggesting that A and D are preferred over S at this position. These results clearly indicate the essential contributions of most of the FG loop residues.

Example 12: NMR of MBP and a monobody

NMR spectroscopy

¹H,¹⁵N-HSQC and HNCO spectra of the free ²H/¹³C/¹⁵N-enriched MBP (0.3 mM) and those of a mixture of ²H/¹³C/¹⁵N-enriched MBP (92 μM) and the unlabeled MBP-74 monobody (165 μM) were collected on a Varian (Palo Alto, CA) INOVA 600 NMR spectrometer using pulse sequences provided by the manufacturer. The HNCO resonances of the free MBP were in good agreement with previously established assignments generously provided by Drs. Lewis Kay and Vitali Tugarinov (University of Toronto). Residues affected by monobody binding were identified by comparing the two HNCO spectra. Amide cross peaks were classified into four categories: strongly affected, a peak that migrates more than two linewidths; weakly affected, a peak exhibiting a significantly reduced intensity at the same position as in the free spectrum, or a peak that has a corresponding peak in the complex spectrum in the vicinity (within two linewidths) of its original position in the free spectrum; not affected; and excluded from analysis, a peak that overlaps with another in the spectra.
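By way of illustration only, the four-category classification described above can be written as a simple decision rule. The two-linewidth boundary is taken from the text; the numerical thresholds for "same position" (`min_shift`) and "significantly reduced intensity" (`reduced_below`) are not given in this example and are assumptions, as is the function name.

```python
def classify_peak(shift_in_linewidths, intensity_ratio, overlapped,
                  min_shift=0.1, reduced_below=0.5):
    """Classify an amide cross peak by comparing free and complex spectra.

    shift_in_linewidths: peak displacement, in units of linewidth.
    intensity_ratio: intensity in the complex / intensity in free MBP.
    overlapped: True if the peak overlaps with another peak.
    min_shift, reduced_below: assumed thresholds, not from the source.
    """
    if overlapped:
        return "excluded"
    if shift_in_linewidths > 2.0:
        return "strongly affected"
    # Small displacement (within two linewidths) or reduced intensity
    # at an unchanged position.
    if shift_in_linewidths > min_shift or intensity_ratio < reduced_below:
        return "weakly affected"
    return "not affected"
```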

The NMR results confirmed that the monobody binds to a single, dominant epitope on MBP.

Example 13: Production of monobodies from a single-loop binary library

The ability to produce specific monobodies that bind to a target was tested using a library in which only one loop of the FN3 scaffold (the FG loop) was diversified with only tyrosine and serine (Y/S). Tyr/Ser binary diversity was introduced in the FG loop (FIG. 18A). The loop length was varied between 7 and 13 residues in this library, replacing the 11-residue segment of the wild-type FG loop (Val 75 to Ser 85). Together, the number of sequences encoded by the design is about 2 x 10⁴. A phage-display library, the single-loop Y/S library, was constructed and contained 10⁹ independent clones. The actual size of this library far exceeds the number of unique sequences, and thus it is likely that all unique sequences are present in the library.
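By way of illustration only, the number of encoded sequences can be checked with a short calculation. The sketch assumes that every loop position is either Tyr or Ser and that each loop length from 7 to 13 is included once; the variable names are hypothetical.

```python
# Loop lengths sampled in the single-loop Y/S library (7 to 13 residues).
loop_lengths = range(7, 14)

# Two choices (Tyr or Ser) per position gives 2**n sequences for length n.
sequences_per_length = {n: 2 ** n for n in loop_lengths}
total_sequences = sum(sequences_per_length.values())

# Number of independent clones reported for the phage-display library.
independent_clones = 10 ** 9
clones_per_sequence = independent_clones / total_sequences

print(total_sequences)  # 16256, i.e. about 2 x 10^4
print(f"{clones_per_sequence:.0f} clones per encoded sequence on average")
```

The large excess of clones over encoded sequences is what supports the statement that all unique sequences are likely to be present.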

The library was sorted using maltose-binding protein (MBP) as a target. The progress of sorting was monitored by determining the number of recovered phages (Table 4). The "enrichment ratio" is the ratio of the number of phages recovered from sorting performed with the target to that from sorting without the target, and a high ratio indicates a substantial enrichment of the phages specifically binding to the target. In our experience, usually more than 50% of clones are target-specific binders when the enrichment ratio is 10 or greater. After the single-loop Y/S library was sorted, the enrichment ratio was 60 in the fourth round, and 90% of the sorted clones were MBP binders as judged by phage ELISA. FIG. 18B shows the amino acid sequences of the selected clones. There were only two unique sequences among the 15 selected clones. MBP-SL1, which appeared 14 times, had a pattern of tyrosine-serine followed by multiple tyrosines (FIG. 18B), and its loop length was two residues shorter than that of the wild type. The other clone, MBP-SL2, which appeared only once, had a similar pattern but Y81 was replaced with Ser, and its loop length was one residue shorter than that of MBP-SL1. K_d values of the single-loop monobodies, as measured by surface plasmon resonance (SPR), were 213 nM and 184 nM, respectively (FIG. 18C and 18D). The monobodies were also specific, as they showed no binding to ribonuclease A, cytochrome C, or yeast SUMO, even at a target concentration of 5 μM in SPR experiments.

Table 4. Phage numbers and enrichment ratios for monobody library sorting

NP, not performed. ND, not determined. The enrichment ratio is determined as the number of recovered phages from target(+) selection over that from target(-) selection.

It was noted that the YSYYYYXY motif, where X stands for either Y or S, appeared only in two instances even though our library design allowed for the motif to be encoded in different positions within a loop and the motif is certainly encoded in the context of longer loop length. These results suggest that the position of the sequence motif and the loop length are both important for presenting the motif in a functional manner.
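By way of illustration only, the number of designed Y/S loop sequences that contain the YSYYYYXY motif at any position can be counted by enumeration. The sketch assumes the library design described in this example (loop lengths of 7 to 13 residues, each position Tyr or Ser); the function name is hypothetical.

```python
import re
from itertools import product

# YSYYYYXY, where X is either Y or S.
MOTIF = re.compile(r"YSYYYY[YS]Y")


def count_motif_sequences(length):
    """Count Y/S sequences of the given length containing the motif anywhere."""
    return sum(
        1
        for residues in product("YS", repeat=length)
        if MOTIF.search("".join(residues))
    )


# One count per loop length in the library design (7 to 13 residues).
counts = {n: count_motif_sequences(n) for n in range(7, 14)}
for n, c in counts.items():
    print(n, c)
```

Because the motif is eight residues long, no 7-residue loop can contain it, whereas every longer loop length encodes it at one or more positions. Recovery of the motif in only two clones is therefore not explained by its absence from the longer loops of the design.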

The FG loop sequence of MBP-SL1 is identical to that of an MBP-binding monobody that was previously obtained from the library described above in which three loops were randomized. That MBP-binding monobody, MBP-74, had two mutated loops, and its FG loop was the same as that of MBP-SL1 (FIG. 18B), but their BC loop sequences were distinct. The affinity of the monobodies MBP-SL1 (SEQ ID NO:65) and MBP-SL2 (SEQ ID NO:66) was only slightly reduced relative to that of monobody MBP-74, whose K_d is 135 nM (FIG. 18D), suggesting the dominant role of the FG loop in binding to MBP.

The affinity and specificity of the single-loop Y/S monobodies were comparable to those of monobodies produced from the so-called "hard randomized libraries" (Koide et al., 1998, The fibronectin type III domain as a scaffold for novel binding proteins, J. Mol. Biol., 284:1141-1151; Karatan et al., 2004, Molecular recognition properties of FN3 monobodies that bind the Src SH3 domain, Chem Biol, 11:835-44) and to those of similar types of antibody mimics (Vogt and Skerra, 2004, Construction of an artificial receptor protein ("anticalin") based on the human apolipoprotein D, Chembiochem., 5(2):191-9). This observation suggests that including all 20 amino acids at each position in a combinatorial library is likely not highly beneficial over the binary diversity approach.

Randomization with all 20 amino acids has long been the standard protocol for making a display library. Including all 20 amino acid types greatly increases chemical diversity over binary diversity; however, it also makes the number of encoded sequences very large. For example, for randomizing 9 residues (the same loop length as clone MBP-SL1), the theoretical library size is 3.5 x 10¹³, which far exceeds the practical library size achievable with current phage display techniques (~10¹¹). Consequently, only a small fraction of the encoded sequences can be experimentally sampled. To determine whether having all 20 amino acids in the loop but sampling them sparsely increases the chances of obtaining binders, a single-loop library with the loop length fixed at 9 residues was made using all 20 amino acids (the All-9 library) and sorted for MBP binders. The enrichment factor was 1 to 2 for the All-9 library, indicating no target-dependent enrichment. The results showed that the Y/S library surpassed the All-9 library, at least for producing binders to the test target, MBP.
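By way of illustration only, the library sizes discussed above can be compared numerically. The quoted figure of 3.5 x 10¹³ equals 32⁹, which would correspond to counting nucleotide sequences when each of the 9 positions is encoded by a 32-codon degenerate set (for example NNK); this reading is an assumption, since the codon scheme is not stated here. The variable names are hypothetical.

```python
LOOP_LENGTH = 9

# Protein-level diversity: 20 amino acids at each of 9 positions.
protein_level = 20 ** LOOP_LENGTH

# Assumed DNA-level diversity: 32 codons (e.g. NNK) at each position.
codon_level = 32 ** LOOP_LENGTH

# Binary Y/S diversity for the same loop length.
binary_level = 2 ** LOOP_LENGTH

# Approximate practical limit for phage display cited in the text.
practical_limit = 1e11

print(f"20-amino-acid sequences: {protein_level:.2e}")
print(f"32-codon sequences:      {codon_level:.2e}")
print(f"Y/S binary sequences:    {binary_level}")
print(f"fraction sampled at the practical limit: {practical_limit / codon_level:.2%}")
```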

The Y/S binary diversity encoded in a single loop produced binding proteins, thus defining the ultimate baseline for the effectiveness of the Y/S binary diversity. Interestingly, the "hard randomized" library failed in the same context. There may be two reasons for the failure of the All-9 library to produce MBP-binding monobodies. First, the All-9 library (containing 10¹⁰ independent clones) contained only about 0.1% of all sequences encoded by the design (3.5 x 10¹³). This sparse sampling of the encoded sequences resulted in a failure to recover MBP-binding sequences that are clearly present among them, for example, the sequence of MBP-SL1. Second, some of the additional amino acids available in the All-9 library could disrupt a binding surface. For example, glutamic acid and lysine are known to be underrepresented in protein-protein binding interfaces and are also known to prevent protein-protein interactions in protein crystallization. Thus, a single Glu or Lys in the monobody loop of MBP-SL1 would severely reduce its binding affinity. A large fraction of the sequences encoded by the All-9 library contain one or more "counter-binding" amino acids. It is important to note that these two problems are related. If one could sample all the sequences of the All-9 library, one would be able to select a binding clone even in the presence of a larger number of nonbinding clones. Because the size of experimental combinatorial libraries is limited, encoding all amino acid types in a library is indeed counterproductive. The effectiveness of the Y/S binary library arises from its high content of binding sequences, the absence of the counter-binding amino acids, and the small library size that allows nearly complete sampling.
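By way of illustration only, the fraction of 9-residue sequences that contain at least one Glu or Lys can be estimated. The sketch assumes independent positions and considers two hypothetical encoding schemes that are not specified in the text: equal usage of all 20 amino acids, and a 32-codon NNK set in which Glu (GAG) and Lys (AAG) are each encoded by one codon.

```python
def fraction_with_any(p_per_position, length):
    """Probability that at least one position carries a given residue type,
    assuming each position is drawn independently with probability p."""
    return 1 - (1 - p_per_position) ** length


LOOP_LENGTH = 9

# Assumption 1: all 20 amino acids equally likely; Glu + Lys = 2 of 20.
uniform = fraction_with_any(2 / 20, LOOP_LENGTH)

# Assumption 2: NNK codons; Glu (GAG) + Lys (AAG) = 2 of 32 codons.
nnk = fraction_with_any(2 / 32, LOOP_LENGTH)

print(f"equal amino acid usage: {uniform:.1%} contain Glu or Lys")
print(f"NNK codon usage:        {nnk:.1%} contain Glu or Lys")
```

Under either assumption a substantial share of the encoded sequences carries at least one of these two residues, in line with the qualitative statement above.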

Example 14: Use of serine and tryptophan

Construction of libraries is carried out as set forth in Examples 1-13 except that the binary pair of amino acids is serine and tryptophan. The results demonstrate high affinity monobodies for a variety of target proteins.

In summary, the present invention provides methods of engineering binding proteins using molecular scaffolds. The invention uses the tenth FN3 domain of human fibronectin (FN3fn10) as the molecular scaffold and diversifies up to three loops (BC, DE and FG) of FN3fn10 to construct a combinatorial library. The library includes certain antibody mimics or monobodies in which the loop regions of the scaffold, corresponding to an antigen binding site or pocket, are varied using only two amino acids, serine and tyrosine. In other words, the combinatorial libraries in accordance with the present invention are binary libraries of FN3 that produce high affinity monobodies.

While the invention has now been described and exemplified with some specificity, those skilled in the art will appreciate the various modifications, including variations, additions, and omissions that may be made in what has been described. Accordingly, it is intended that these modifications also be encompassed by the present invention and that the scope of the present invention be limited solely by the broadest interpretation that lawfully can be accorded the appended claims.

All patents, publications and references cited herein are hereby fully incorporated by reference. In case of conflict between the present disclosure and incorporated patents, publications and references, the present disclosure should control.

Claims

1. A fibronectin type III (FN3) polypeptide monobody comprising a plurality of FN3 β-strand domain sequences that are linked to a plurality of loop region sequences, wherein one or more of the loop region sequences vary by deletion, insertion or replacement of one or more amino acids with serine, tyrosine or a combination of both from the corresponding loop region sequences in wild-type FN3.

2. The monobody of claim 1, wherein the variable loop region comprises a variable BC, DE or FG loop or combination thereof:

wherein the BC loop is represented by: A₁-A₂-A₃-...-A_{n-1}-A_n where n=4-25, and wherein A_{n-1} is V₂₉ and wherein each of A₁, A₂, A₃ to A_n is the amino acid of the native loop, S or Y;

wherein the DE loop is represented by:

A₁-A₂-A₃-...-A_{n-1}-A_n where n=4-25 and wherein each of A₁, A₂, A₃ to A_n is the amino acid of the native loop, S or Y;

wherein the FG loop is represented by:

where n=4-25, and wherein each of A₁, A₂, A₃ to A_n is the amino acid of the native loop, S or Y.

3. The monobody of claim 2, wherein at least one loop region is varied, and wherein the one loop region is the BC loop.

4. The monobody of claim 2, wherein at least one loop region is varied, and wherein the one loop region is the DE loop.

5. The monobody of claim 2, wherein at least one loop region is varied, and wherein the one loop region is the FG loop.

6. The monobody of claim 2, wherein at least two loop regions are varied, and wherein at least one of the loop regions is BC.

7. The monobody of claim 2, wherein at least two loop regions are varied, and wherein at least one of the loop regions is DE.

8. The monobody of claim 2, wherein at least two loop regions are varied, and wherein at least one of the loop regions is FG.

9. The monobody of claim 2, wherein at least three loop regions are varied.

10. The monobody of claim 1, wherein at least one loop region varies by deletion, insertion or replacement of one or more amino acids with serine, tyrosine or a combination of both.

11. The monobody of claim 1, wherein at least two loop regions vary by deletion, insertion or replacement of one or more amino acids with serine, tyrosine or a combination of both.

12. The monobody of claim 1, wherein at least three loop regions vary by deletion, insertion or replacement of one or more amino acids with serine, tyrosine or a combination of both.

13. The monobody of claim 1, wherein the monobody is contained within a pharmaceutical composition.

14. A polynucleotide encoding the polypeptide monobody of claim 1.

15. The polynucleotide of claim 14, wherein the polynucleotide sequence is contained within a vector.

16. The polynucleotide of claim 15, wherein the vector is an adenovirus vector.

17. The polynucleotide of claim 15, wherein the vector is a retrovirus vector.

18. The polynucleotide of claim 14, wherein the polynucleotide sequence is expressed within a host cell.

19. The polynucleotide of claim 18, wherein the host cell is a cancer cell.

20. A host cell comprising the vector of claim 15.

21. The host cell of claim 20, wherein the host cell is a cancer cell.

22. A pharmaceutical composition comprising the monobody of claim 1.

23. A library encoding FN3 polypeptide monobodies comprising a plurality of nucleic acid species, wherein the species encode a plurality of FN3 β-strand domain sequences that are linked to a plurality of loop region sequences, wherein one or more of the loop region sequences vary by deletion, insertion or replacement of one or more amino acids with serine, tyrosine or a combination of both from corresponding loop region sequences in wild-type FN3.

24. The library of claim 22, wherein one of the loop region sequences varies by deletion, insertion or replacement of one or more amino acids with serine, tyrosine or a combination of both.

25. The library of claim 23, wherein the one loop region is BC.

26. The library of claim 23, wherein the one loop region is DE.

27. The library of claim 23, wherein the one loop region is FG.

28. The library of claim 22, wherein two of the loop region sequences vary by deletion, insertion or replacement of one or more amino acids with serine, tyrosine or a combination of both.

29. The library of claim 22, wherein three of the loop region sequences vary by deletion, insertion or replacement of one or more amino acids with serine, tyrosine or a combination of both.

30. The library of claim 22, wherein the plurality of nucleic acid species are expressed as a polypeptide display library.

31. A polypeptide display library derived from the nucleic acid library of claim 22.

32. A method of screening the library of claim 31, comprising: displaying the library of claim 31 on a phage or yeast surface; contacting the phage or yeast with a target protein; and determining those members of the library that interact with the target protein.

33. A method of modulating or modifying the activity of a target protein, comprising: administering the monobody of claim 1 to a subject in need thereof.

34. The method of claim 32, wherein the subject has cancer.