US20120129715A1

US20120129715A1 - Gb1 peptidic libraries and methods of screening the same

Info

Publication number: US20120129715A1
Application number: US13/294,072
Authority: US
Inventors: Sachdev S. Sidhu; Maruti Uppalapati
Original assignee: Individual
Current assignee: University of Toronto
Priority date: 2010-11-12
Filing date: 2011-11-10
Publication date: 2012-05-24
Also published as: WO2012078313A3; US20120178643A1; WO2012078313A2; EP2638063A2; US20160223560A1; CA2817579A1; US9285372B2; EP2638063A4

Abstract

GB1 peptidic libraries and methods of screening the same for specific binding to a target protein are provided. Libraries of polynucleotides that encode GB1 peptidic compounds are provided. These libraries find use in a variety of applications in which specific binding to target molecules, e.g., target proteins, is desired. Also provided are methods of screening the libraries for binding to a target.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

Pursuant to 35 U.S.C. §119(e), this application claims priority to the filing date of U.S. provisional application Ser. No. 61/413,318, filed Nov. 12, 2010, the disclosure of which is herein incorporated by reference.
This application is related to copending U.S. application entitled “GB1 peptidic compounds and methods for making and using the same” filed on Nov. 10, 2011 to Sidhu et al. (attorney reference number RFLX-003) and accorded Ser. No. ______, and U.S. provisional application Ser. No. 61/413,331 filed Nov. 12, 2010, which are entirely incorporated herein by reference.
This application is related to copending U.S. application entitled “Methods and compositions for identifying D-peptidic compounds that specifically bind target proteins” filed on Nov. 10, 2011 to Ault-Riché et al. (attorney reference number RFLX-002) and accorded Ser. No. ______, and U.S. provisional application Ser. No. 61/413,316 filed Nov. 12, 2010, which are entirely incorporated herein by reference.

INTRODUCTION

Essentially all biological processes depend on molecular recognition mediated by proteins. The ability to manipulate the interactions of such proteins is of interest for both basic biological research and for the development of therapeutics and diagnostics.
Libraries of polypeptides can be prepared, e.g., by manipulating the immune system or via chemical synthesis, from which specificity of binding to target molecules can be selected. Molecular diversity from which specificity can be selected is large for polypeptides having numerous possible sequence combinations of amino acids. In addition, proteins can form large binding surfaces with multiple contacts to a target molecule that lead to highly specific and high-affinity binding events. For example, antibodies are a class of protein that has yielded highly specific and tight-binding ligands for various target antigens.
Because of the diversity of target molecules of interest and the binding properties of proteins, the screening of peptidic libraries to identify molecules with useful functions is of interest.

SUMMARY

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a ribbon structure of a GB1 protein that illustrates a 4β-1α motif (Mayo et al., Nature Structural Biology, 5(6), 1998, p. 470-475).

FIGS. 2A and 2B depict six different libraries that include a GB1 scaffold, both in a ribbon representation (top) and a space-filling representation (bottom). Amino acids at several positions of the GB1 scaffold that are selected for mutation are highlighted in a darker shade (top). The space-filling representations of Library 1 to Library 6 (bottom) illustrate six different potential binding surfaces (shown in a darker shade) on the GB1 scaffold.

FIG. 3 illustrates the underlying sequence of the GB1 scaffold domain (SEQ ID NO: 1) of FIGS. 2A-2B and the positions of the variant amino acids (shown in the grey blocks) in Libraries 1 to 6. The asterisks indicate positions at which mutations may include insertion of amino acids.

FIG. 4A depicts the phage display of a GB1 peptidic compound fusion of coat protein p3 that includes a hinge and dimerization format. FIG. 4B illustrates display levels of various formats of the GB1 peptidic compound fusion on the phage particles.

FIG. 5 illustrates the design of phage display Library 1 (SEQ ID NOs: 225, 226 and 197-199).

FIG. 6 illustrates the design of phage display Library 2 (SEQ ID NOs: 225, 226 and 200-202).

FIG. 7 illustrates the design of phage display Library 3 (SEQ ID NOs: 225, 226 and 203-209).

FIG. 8 illustrates the design of phage display Library 4 (SEQ ID NOs: 225, 226 and 210-216).

FIG. 9 illustrates the design of phage display Library 5 (SEQ ID NOs: 225, 226 and 217-220).

FIG. 10 illustrates the design of phage display Library 6 (SEQ ID NOs: 225, 226 and 221-224).

FIGS. 5 to 10 illustrate the design of phage display libraries based on Libraries 1 to 6 illustrated in FIGS. 2A-2B. Ribbon (left) and space filling (right) structural representations depict the variant amino acid positions in dark. Oligonucleotide and amino acid sequences (SEQ ID NOs: 225 and 226) show the GB1 scaffold in the context of the fusion protein with GGS linkers at the N- and C-termini of the scaffold. Also shown are the oligonucleotide sequences synthesized for use in preparation of the libraries by Kunkel mutagenesis that include KHT codons at variant amino acid positions to encode variable regions of GB1 peptidic compounds.
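
To make the KHT randomization concrete: in IUPAC nucleotide notation K stands for G or T and H for A, C or T, so KHT represents six concrete codons. The following Python sketch (not part of the original disclosure; the function names are illustrative) expands a degenerate codon and translates each concrete codon with the standard genetic code.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def expand_codon(degenerate: str) -> list[str]:
    """List every concrete codon represented by a degenerate codon such as 'KHT'."""
    return ["".join(bases) for bases in product(*(IUPAC[n] for n in degenerate.upper()))]


def encoded_residues(degenerate: str) -> set[str]:
    """One-letter amino acid codes (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand_codon(degenerate)}


if __name__ == "__main__":
    print(expand_codon("KHT"))              # ['GAT', 'GCT', 'GTT', 'TAT', 'TCT', 'TTT']
    print(sorted(encoded_residues("KHT")))  # ['A', 'D', 'F', 'S', 'V', 'Y']
```

With the standard code, KHT gives exactly one codon each for Ala, Asp, Phe, Ser, Val and Tyr and no stop codon, so every randomized position samples the same six residues. Other degenerate codons written with standard IUPAC letters can be checked the same way.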

FIG. 11 illustrates binding results from four rounds of phage display screening of Libraries 1 to 6 against L-VEGF and D-VEGF.

FIG. 12 illustrates binding assay results of individual clones identified from phage display screening of subject libraries against VEGF proteins. 10 nM or 100 nM VEGF protein was added to binding solutions in a competition binding assay.

FIG. 13 illustrates exemplary bifunctional libraries having two potential binding surfaces. A: Solvent exposed residues of surface 1 (S1) and surface 5 (S5) are shown in dark. B: Solvent exposed residues of surface 4 (S4) and surface 3 (S3) are shown in dark. C: Solvent exposed residues of surface 2 (S2) and surface 6 (S6) are shown in dark.

DEFINITIONS

As used herein, the term “peptidic” refers to a moiety that is composed of amino acid residues. The term “peptidic” includes compounds or libraries in which the conventional backbone has been replaced with non-naturally occurring or synthetic backbones, and peptides in which one or more naturally occurring amino acids have been replaced with one or more non-naturally occurring or synthetic amino acids, or with a D-amino acid. Any of the depictions of sequences found herein (e.g., using one-letter or three-letter codes) may represent an L-amino acid or a D-amino acid version of the sequence. Unless noted otherwise, capital and lowercase letter codes are not used to distinguish L- and D-amino acid residues.

As used herein, the terms “polypeptide” and “protein” are used interchangeably. The term “polypeptide” also includes post-translationally modified polypeptides or proteins. The term “polypeptide” includes polypeptides in which the conventional backbone has been replaced with non-naturally occurring or synthetic backbones, and peptides in which one or more of the conventional amino acids have been replaced with one or more non-naturally occurring or synthetic amino acids. In some instances, polypeptides may be of any length, e.g., 2 or more amino acids, 4 or more amino acids, 10 or more amino acids, 20 or more amino acids, 30 or more amino acids, 40 or more amino acids, 50 or more amino acids, 60 or more amino acids, 100 or more amino acids, 300 or more amino acids, 500 or more or 1000 or more amino acids.

As used herein, the term “scaffold” or “scaffold domain” refers to a peptidic framework from which a library of compounds arose, and against which the compounds are able to be compared. When a compound of a library arises from amino acid mutations at various positions within a scaffold, the amino acids at those positions are referred to as “variant amino acids.” Such variant amino acids may confer on the resulting peptidic compounds different functions, such as specific binding to a target protein.

As used herein, the term “mutation” refers to a deletion, insertion, or substitution of one or more amino acid residues or nucleotide residues relative to a reference sequence or motif, such as a scaffold sequence or motif.

As used herein, the terms “GB1 scaffold domain” and “GB1 scaffold” refer to a scaffold that has a structural motif similar to the B1 domain of Protein G (GB1), where the structural motif is characterized by a motif including a four stranded β-sheet packed against a helix (also referred to as a 4β-1α motif). The arrangement of four β-strands and one α-helix may form a hairpin-helix-hairpin motif. An exemplary GB1 scaffold domain is depicted in FIG. 1. GB1 scaffold domains include members of the family of IgG binding B domains, e.g., Protein L B1 domain. Amino acid sequences of exemplary B domains that may be employed herein as GB1 scaffold domains are found in the Wellcome Trust Sanger Institute Pfam database (The Pfam protein families database: Finn et al., Nucleic Acids Research (2010) Database Issue 38:D211-222), see, e.g., Family: IgG_binding_B (PF01378) (pfam.sanger.ac.uk/family/PF01378.10#tabview=tab0) or in NCBI's protein database. Exemplary GB1 scaffold domain sequences include those described by SEQ ID NOs: 227-261. A GB1 scaffold domain may be a native sequence of a member of the B domain protein family, a B domain sequence with pre-existing amino acid sequence modifications (such as additions, deletions and/or substitutions), or a fragment or analogue thereof. A GB1 scaffold domain may be L-peptidic, D-peptidic or a combination thereof. In some cases, a “GB1 scaffold domain” may also be referred to as a “parent amino acid sequence.”

As used herein, the term “GB1 peptidic compound” refers to a compound composed of peptidic residues that has a parent GB1 scaffold domain.

As used herein, the terms “parent amino acid sequence” and “parent polypeptide” refer to a polypeptide comprising an amino acid sequence from which a variant GB1 peptidic compound arose and against which the variant GB1 peptidic compound is being compared. In some cases, the parent polypeptide lacks one or more of the modifications disclosed herein and differs in function compared to a variant GB1 peptidic compound as disclosed herein. The parent polypeptide may comprise a native GB1 sequence or GB1 scaffold sequence with pre-existing amino acid sequence modifications (such as additions, deletions and/or substitutions).

As used herein, the term “variable region” refers to a continuous sequence of residues that includes one or more variant amino acids. A variable region may also include one or more conserved amino acids at fixed positions. As used herein, the term “fixed region” refers to a continuous sequence of residues that does not include any mutations or variant amino acids, and is conserved across a library of compounds.

As used herein, the term “variable domain” refers to a domain that includes all of the variant amino acids of a GB1 scaffold. The variable domain may include one or more variable regions, and may encompass a continuous or a discontinuous sequence of residues. The variable domain may be part of the scaffold domain.
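
As a rough illustration of how variant positions map onto variable regions and a variable domain, the sketch below groups a set of variant positions into continuous runs. The positions are hypothetical (they are not the library designs of FIG. 3), and the `max_fixed_gap` parameter is an assumption used to model conserved residues at fixed positions lying inside a variable region.

```python
def variable_regions(variant_positions, max_fixed_gap=0):
    """Group variant positions into continuous runs (candidate variable regions).

    max_fixed_gap: how many conserved residues at fixed positions may lie between
    two variant positions of the same region (0 means strictly consecutive).
    Returns a list of (first_position, last_position) tuples.
    """
    regions = []
    for pos in sorted(set(variant_positions)):
        if regions and pos - regions[-1][1] - 1 <= max_fixed_gap:
            regions[-1][1] = pos          # extend the current region
        else:
            regions.append([pos, pos])    # start a new region
    return [tuple(region) for region in regions]


if __name__ == "__main__":
    # Hypothetical variant positions.
    print(variable_regions([4, 5, 6, 9, 10, 30]))                   # [(4, 6), (9, 10), (30, 30)]
    print(variable_regions([4, 5, 6, 9, 10, 30], max_fixed_gap=2))  # [(4, 10), (30, 30)]
```

Taken together, the returned regions make up the variable domain, which in this example is discontinuous in the primary sequence even though the residues may be adjacent in the folded structure.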

As used herein, the term “discontinuous sequence of residues” refers to a sequence of residues that is not continuous with respect to the primary sequence of a peptidic compound. A peptidic compound may fold to form a secondary or tertiary structure, e.g., a 4β-1α motif, where the amino acids of a discontinuous sequence of residues are adjacent to each other in space, i.e., contiguous. As used herein, the term “continuous sequence of residues” refers to a sequence of residues that is continuous in terms of the primary sequence of a peptidic compound.

As used herein, the term “non-core mutation” refers to an amino acid mutation of a GB1 peptidic compound that is located at a position in the 4β-1α structure that is not part of the hydrophobic core of the structure. Amino acid residues in the hydrophobic core of a GB1 peptidic compound are not significantly solvent exposed but rather tend to form intramolecular hydrophobic contacts. Unless explicitly defined otherwise, a hydrophobic core residue or core position, as described herein, of a GB1 scaffold domain that is described by SEQ ID NO: 1 is defined by one of positions 2, 4, 6, 19, 25, 29, 33, 38, 42, 51 and 53 of the GB1 scaffold. The methodology used to specify hydrophobic core residues in GB1 is described by Dahiyat et al. (“Probing the role of packing specificity in protein design,” Proc. Natl. Acad. Sci. USA, 1997, 94, 10172-10177), where a PDB structure was used to calculate which side chains expose less than 10% of their surface area to solvent. Such methods can be modified for use with the GB1 scaffold domain.
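
The core definition above can be applied mechanically when choosing variant positions. The Python sketch below, offered as an illustration rather than the procedure used in the original work, encodes the listed core positions of SEQ ID NO: 1 and the less-than-10% exposure cutoff attributed to Dahiyat et al. The per-residue exposure values are assumed to come from a separate structure-based solvent-accessibility calculation, which is not shown, and the design positions in the usage line are hypothetical.

```python
# Hydrophobic core positions of the GB1 scaffold, numbered as in SEQ ID NO: 1
# (taken from the definition above).
GB1_CORE_POSITIONS = frozenset({2, 4, 6, 19, 25, 29, 33, 38, 42, 51, 53})

# Criterion attributed above to Dahiyat et al.: a side chain exposing less than
# 10% of its surface area to solvent is treated as a core residue.
CORE_EXPOSURE_CUTOFF = 0.10


def is_core_by_exposure(relative_exposure: float) -> bool:
    """relative_exposure: fraction (0-1) of the side-chain surface exposed to solvent."""
    return relative_exposure < CORE_EXPOSURE_CUTOFF


def split_by_core(positions):
    """Partition proposed variant positions into (core, non_core) sorted lists."""
    unique = sorted(set(positions))
    core = [p for p in unique if p in GB1_CORE_POSITIONS]
    non_core = [p for p in unique if p not in GB1_CORE_POSITIONS]
    return core, non_core


if __name__ == "__main__":
    # Hypothetical design: positions 4 and 42 fall in the core, the rest do not.
    print(split_by_core([4, 5, 7, 9, 42]))  # ([4, 42], [5, 7, 9])
```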

As used herein, the term “surface mutation” refers to an amino acid mutation in a GB1 scaffold that is located at a position in the 4β-1α structure that is solvent exposed. Such variant amino acid residues at surface positions of a GB1 peptidic compound are capable of interacting directly with a target molecule, whether or not such an interaction occurs.

As used herein, the term “boundary mutation” refers to an amino acid mutation of a GB1 scaffold that is located at a position in the 4β-1α structure that is at the boundary between the hydrophobic core and the solvent-exposed surface. Such variant amino acid residues at boundary positions of a GB1 peptidic compound may partly contact hydrophobic core residues and/or be partly solvent exposed and capable of some interaction with a target molecule, whether or not such an interaction occurs. One criterion for describing core, surface and boundary residues of a GB1 peptidic structure is described by Mayo et al., Nature Structural Biology, 5(6), 1998, 470-475. Such methods and criteria can be modified for use with the GB1 scaffold domain.

As used herein, the term “linking sequence” refers to a continuous sequence of amino acid residues, or analogs thereof, that connects two peptidic motifs. In certain embodiments, a linking sequence is the loop connecting two β-strands in a β-hairpin motif.

As used herein, the term “phage display” refers to a technique by which variant peptidic compounds are displayed as fusions to a coat protein on the surface of phage, e.g., filamentous phage particles. The term “phagemid” refers to a plasmid vector having a bacterial origin of replication, e.g., ColE1, and a copy of an intergenic region of a bacteriophage. The phagemid may be based on any known bacteriophage, including filamentous bacteriophage. In some instances, the plasmid will also contain a selectable marker for antibiotic resistance. Segments of DNA cloned into these vectors can be propagated as plasmids. When cells harboring these vectors are provided with all genes necessary for the production of phage particles, the mode of replication of the plasmid changes to rolling-circle replication to generate copies of one strand of the plasmid DNA and package phage particles. The phagemid may form infectious or non-infectious phage particles. This term includes phagemids which contain a phage coat protein gene or fragment thereof linked to a heterologous polypeptide gene as a gene fusion such that the heterologous polypeptide is displayed on the surface of the phage particle.

As used herein, the term “phage vector” refers to a double-stranded replicative form of a bacteriophage that contains a heterologous gene and is capable of replication. The phage vector has a phage origin of replication allowing phage replication and phage particle formation. In some cases, the phage is a filamentous bacteriophage, such as an M13, f1, fd, Pf3 phage or a derivative thereof, a lambdoid phage, such as lambda, 21, phi80, phi81, 82, 424, 434, etc., or a derivative thereof, a Baculovirus or a derivative thereof, a T4 phage or a derivative thereof, or a T7 phage or a derivative thereof.

As used herein, the term “stable” refers to a compound that is able to maintain a folded state under physiological conditions at a certain temperature, such that it retains at least one of its normal functional activities, for example binding to a target protein. The stability of the compound can be determined using standard methods. For example, the “thermostability” of a compound can be determined by measuring its thermal melting temperature (“Tm”), i.e., the temperature in degrees Celsius at which half of the compound molecules are unfolded. In some instances, the higher the Tm, the more stable the compound.
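
Because Tm is defined as the temperature at which half of the molecules are unfolded, it can be read off a melt curve once the signal has been converted to a fraction unfolded. The sketch below assumes such normalized data (the conversion from a raw spectroscopic signal is not shown) and uses simple linear interpolation between the two measurements that bracket 0.5; fitting a full two-state unfolding model is a common, more robust alternative. The melt data in the usage lines are synthetic.

```python
import math


def estimate_tm(temperatures, fraction_unfolded):
    """Temperature at which the unfolded fraction first reaches 0.5.

    Assumes paired measurements sorted by increasing temperature, with
    fraction_unfolded already normalized to the range 0-1. Uses linear
    interpolation between the two points that bracket 0.5.
    """
    points = list(zip(temperatures, fraction_unfolded))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("the unfolded fraction never crosses 0.5 in the measured range")


if __name__ == "__main__":
    # Synthetic two-state melt with a true midpoint of 72.5 degrees C (hypothetical data).
    temps = list(range(60, 91))
    fracs = [1.0 / (1.0 + math.exp(-(t - 72.5) / 2.0)) for t in temps]
    print(round(estimate_tm(temps, fracs), 2))  # 72.5
```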

The compounds of the subject libraries may contain one or more asymmetric centers and may thus give rise to enantiomers, diastereomers, and other stereoisomeric forms that may be defined, in terms of absolute stereochemistry, as (R)- or (S)- or, as (D)- or (L)- for amino acids and polypeptides. The present invention is meant to include all such possible isomers, as well as their racemic and optically pure forms. When the compounds described herein contain olefinic double bonds or other centers of geometric asymmetry, and unless specified otherwise, it is intended that the compounds include both E and Z geometric isomers. Likewise, all tautomeric forms are also intended to be included.

As used herein, the term “a target protein” refers to all members of the target family, and fragments and enantiomers thereof, and protein mimics thereof. The target proteins of interest that are described herein are intended to include all members of the target family, and fragments and enantiomers thereof, and protein mimics thereof, unless explicitly described otherwise. The target protein may be any protein of interest, such as a therapeutic or diagnostic target, including but not limited to: hormones, growth factors, receptors, enzymes, cytokines, osteoinductive factors, colony stimulating factors and immunoglobulins. The term “target protein” is intended to include recombinant and synthetic molecules, which can be prepared using any convenient recombinant expression methods or using any convenient synthetic methods, or purchased commercially, as well as fusion proteins containing a target molecule, as well as synthetic L- or D-proteins.

As used herein, the term “protein mimic” refers to a peptidic compound that mimics a binding property of a protein of interest, e.g., a target protein. In general terms, the target protein mimic includes an essential part of the original target protein (e.g., an epitope or essential residues thereof) that is necessary for forming a potential binding surface, such that the target protein mimic and the original target protein are each capable of binding specifically to a binding moiety of interest, e.g., an antibody or a D-peptidic compound. In some embodiments, the part or parts of the original target protein that are essential for binding are displayed on a scaffold such that the potential binding surface of the original target protein is mimicked. Any suitable scaffold for displaying the minimal essential part of the target protein may be used, including but not limited to antibody scaffolds, scFv, anticalins, non-antibody scaffolds, and mimetics of protein secondary and tertiary structures. In some embodiments, a target protein mimic includes residues or fragments of the original target protein that are incorporated into a protein scaffold, where the scaffold mimics a structural motif of the target protein. For example, by incorporating residues of the target protein at desirable positions of a convenient scaffold, the protein mimic may present a potential binding surface that mimics that of the original target protein. In some embodiments, the native structure of the fragments of the original target protein is retained using methods of conformational constraint. Any convenient methods of conformationally constraining a peptidic compound may be used, such as, but not limited to, bioconjugation, dimerization (e.g., via a linker), multimerization, or cyclization.

DETAILED DESCRIPTION

GB1 peptidic libraries and methods of screening the same for the identification of compounds that specifically bind to target proteins are provided. The subject libraries include a plurality of GB1 peptidic compounds, where each GB1 peptidic compound has a scaffold domain of the same structural motif as the B1 domain of Protein G (GB1), where the structural motif of GB1 is characterized by a motif that includes an arrangement of four β-strands and one α-helix around a hydrophobic core (also referred to as a 4β-1α motif). The GB1 peptidic compounds of the subject libraries include mutations at non-core positions, e.g., variant amino acids at positions within a GB1 scaffold domain that are not part of the hydrophobic core of the structure. A 4β-1α motif is depicted in FIG. 1.
A variety of libraries of GB1 peptidic compounds are provided. For library diversity, both the positions of the mutations and the nature of the mutation at each variable position of the scaffold may be varied. In some instances, the mutations are included at non-core positions, although mutations at core positions may also be included. The mutations may confer different functions on the resulting GB1 peptidic compounds, such as specific binding to a target molecule. The mutations may be selected at positions of a GB1 scaffold domain that are solvent exposed such that the variant amino acids at these positions can form part of a potential target molecule binding surface, although mutations at selected core and/or boundary positions may also be included. In a subject library, the mutations may be concentrated in a variable domain that defines one of several distinct potential binding surfaces of the GB1 scaffold domain. Libraries of GB1 peptidic compounds are provided that include distinct arrangements of mutations concentrated at various surfaces of the 4β-1α motif, for example, as depicted in FIGS. 2A-2B. The subject libraries may include compounds that specifically bind to a target molecule via one of the several potential binding sites of the GB1 scaffold domain. Mutations may be included at the potential binding surface to provide for specific binding to a target molecule without significantly disrupting the GB1 peptidic structure.
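
Varying both the positions and the residues allowed at each position sets the theoretical diversity of a library. The arithmetic below is a hedged illustration that assumes each variant position is randomized independently and every variant is equally likely; the choice of 10 variant positions is hypothetical, while 6 residues per position corresponds to the KHT codon described for FIGS. 5 to 10. The coverage estimate uses the Poisson approximation 1 - exp(-N/D) for N transformants and a diversity of D.

```python
import math


def theoretical_diversity(variant_positions: int, residues_per_position: int) -> int:
    """Distinct sequences when each variant position is randomized independently."""
    return residues_per_position ** variant_positions


def expected_coverage(transformants: float, diversity: float) -> float:
    """Expected fraction of the theoretical diversity sampled at least once,
    assuming every variant is equally likely (Poisson approximation)."""
    return 1.0 - math.exp(-transformants / diversity)


if __name__ == "__main__":
    # Hypothetical library: 10 variant positions, each encoded by a KHT codon (6 residues).
    d = theoretical_diversity(10, 6)
    print(d)                                      # 60466176
    print(round(expected_coverage(3 * d, d), 3))  # 0.95
```

Because each KHT codon maps to exactly one residue, DNA-level and protein-level diversity coincide in this example; with a codon set that encodes some residues more than once, the two would differ.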
In the subject methods, a GB1 peptidic library is contacted with a target molecule to screen for a compound of the library that specifically binds to the target with high affinity. The subject methods and libraries find use in a variety of applications, including screening applications.
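
Screening is typically iterative; FIG. 11, for example, summarizes four rounds of phage display screening. The toy model below, which is not taken from this document, illustrates why repeated rounds of contacting the library with a target enrich binders: if binders are retained more efficiently than non-binders, their odds in the pool multiply by the ratio of the two retention probabilities in every round. All parameter values are hypothetical, and the model ignores amplification bias between rounds.

```python
def binder_fraction_by_round(initial_fraction, binder_retention, background_retention, rounds):
    """Fraction of target binders in the pool before selection and after each round.

    Simplifying assumptions (hypothetical, for illustration only): binders are
    retained with probability binder_retention, non-binders with probability
    background_retention, and re-amplification between rounds does not change
    the composition of the pool.
    """
    fractions = [initial_fraction]
    f = initial_fraction
    for _ in range(rounds):
        kept_binders = f * binder_retention
        kept_background = (1.0 - f) * background_retention
        f = kept_binders / (kept_binders + kept_background)
        fractions.append(f)
    return fractions


if __name__ == "__main__":
    # One binder per million clones; binders retained 1000-fold better than background.
    for rnd, frac in enumerate(binder_fraction_by_round(1e-6, 0.1, 1e-4, rounds=4)):
        print(rnd, f"{frac:.3g}")
```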
Before certain embodiments are described in greater detail, it is to be understood that this invention is not limited to certain embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing certain embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, representative illustrative methods and materials are now described.
All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
It is noted that, as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
Each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.
In further describing the various aspects of the invention, the structures and sequences of members of the various libraries are described first in greater detail, followed by a description of methods of screening and applications in which the libraries find use.

Libraries

As summarized above, aspects of the invention include libraries of GB1 peptidic compounds where each GB1 peptidic compound has a scaffold domain of the same structural motif as the B1 domain of Protein G (GB1), where the structural motif of GB1 is characterized by a motif that includes an arrangement of four β-strands and one α-helix (also referred to as a 4β-1α motif) around a hydrophobic core. The GB1 peptidic compounds of the subject libraries include mutations at various non-core positions of the 4β-1α motif, e.g., variant amino acids at non-core positions within a GB1 scaffold domain. In many embodiments, the four β-strands and one α-helix motifs of the structure are arranged in a hairpin-helix-hairpin motif, e.g., β1-β2-α1-β3-β4 where β1-β4 are β-strand motifs and α1 is a helix motif. A GB1 peptidic hairpin-helix-hairpin motif is depicted in FIG. 1.
A GB1 scaffold domain may be any polypeptide, or fragment thereof, that includes the 4β-1α motif, whether naturally occurring or synthetic. The GB1 scaffold domain may be a native sequence of a member of the IgG-binding B domain protein family, an IgG-binding B domain sequence with pre-existing amino acid sequence modifications (such as additions, deletions and/or substitutions), or a fragment or analogue thereof. GB1 scaffold domains include those described in the following references: Gronenborn et al., FEBS Letters 398 (1996), 312-316; Kotz et al., Eur. J. Biochem. 271, 1623-1629 (2004); Malakauskas et al., Nature Structural Biology, 5(6), 1998, p. 470-475; Minor Jr. et al., Nature, 367, 1994, 660-663; Nauli et al., Nature Structural Biology, 8(7), 2001, 602-605; Smith et al., Biochemistry, 1994, 33, 5510-5517; Wunderlich et al., J. Mol. Biol. (2006) 363, 545-557; and analogs or fragments thereof; and those scaffolds described in the definitions section above. In certain embodiments, a GB1 scaffold domain has an amino acid sequence as set forth in one of SEQ ID NOs: 1 and 227-261. In certain embodiments, a GB1 scaffold domain includes a sequence having 60% or more amino acid sequence identity, such as 70% or more, 80% or more, 90% or more, 95% or more or 98% or more amino acid sequence identity, to an amino acid sequence set forth in one of SEQ ID NOs: 1 and 227-261. A GB1 scaffold domain sequence may include 1 or more, such as 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, 15 or more, or even 20 or more additional peptidic residues compared to a native IgG-binding B domain sequence. Alternatively, a GB1 scaffold domain sequence may include fewer peptidic residues compared to a native IgG-binding B domain sequence, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, or even fewer residues.
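
Sequence-identity thresholds such as 60% or 90% depend on how identity is computed, which the text does not specify. One simple convention, assumed here rather than taken from the source, scores two rows of the same alignment (for example the Pfam rows that follow) over the columns where both rows have a residue. Under that convention SEQ ID NOs: 257 and 258 share 49 of 55 aligned residues.

```python
GAP_CHARACTERS = set(".-")


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of the same multiple sequence alignment.

    Columns where either row has a gap are skipped; identity is reported
    relative to the remaining aligned residue pairs.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("rows from the same alignment must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in GAP_CHARACTERS or b in GAP_CHARACTERS:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    seq_257 = "....TYKLILNGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE..."
    seq_258 = "....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE..."
    print(round(percent_identity(seq_257, seq_258), 1))  # 89.1
```

Skipping gapped columns inflates identity for short fragments such as SEQ ID NO: 233; dividing by the length of the full reference sequence instead is a stricter alternative.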
Exemplary GB1 scaffold domain sequences from the Wellcome Trust Sanger Institute Pfam database are shown in the following sequence alignments:

B4U242_STREM/244-298
(SEQ ID NO: 227)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...

B4U242_STREM/316-370
(SEQ ID NO: 228)
....TYRLVIKGVTFSGETATKAVDAATAEQ.TFRQYANDNGITGEWAYDTATKTFTVTE...

C0MA37_STRE4/228-282
(SEQ ID NO: 229)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...

C0MA37_STRE4/300-354
(SEQ ID NO: 230)
....TYRLVIKGVTFSGETATKAVDAATAEQ.TFRQYANDNGVTGEWAYDAATKTFTVTE...

C0MCK9_STRS7/228-282
(SEQ ID NO: 231)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...

C0MCK9_STRS7/300-354
(SEQ ID NO: 232)
....TYRLVIKGVTFSGETSTKAVDAATAEQ.TFRQYANDNGVTGEWAYDAATKTFTVTE...

Q1JGB6_STRPD/117-137
(SEQ ID NO: 233)
ANIP........................AEK.AFRQYANDNGVDGV.................

Q53291_PEPMA/330-384
(SEQ ID NO: 234)
....TYKLILNGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...

Q53291_PEPMA/400-454
(SEQ ID NO: 235)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...

Q53337_9STRE/3-57
(SEQ ID NO: 236)
....TYKLVINGKTLKGETTTKTVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...

Q53974_STRDY/258-312
(SEQ ID NO: 237)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANENGVDGVWTYDDATKTFTVTE...

Q53975_STRDY/224-278
(SEQ ID NO: 238)
....TYKLVVKGNTFSGETTTKAIDTATAEK.EFKQYATANNVDGEWSYDDATKTFTVTE...

Q53975_STRDY/294-348
(SEQ ID NO: 239)
....TYKLIVKGNTFSGETTTKAVDAETAEK.AFKQYATANNVDGEWSYDDATKTFTVTE...

Q53975_STRDY/364-418
(SEQ ID NO: 240)
....TYKLIVKGNTFSGETTTKAIDAATAEK.EFKQYATANGVDGEWSYDDATKTFTVTE...

Q53975_STRDY/434-488
(SEQ ID NO: 241)
....TYKLIVKGNTFSGETTTKAVDAETAEK.AFKQYANENGVYGEWSYDDATKTFTVTE...

Q53975_STRDY/504-558
(SEQ ID NO: 242)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANENGVDGVWTYDDATKTFTVTE...

Q54181_STRSG/1-45
(SEQ ID NO: 243)
..............MKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...

Q54181_STRSG/131-185
(SEQ ID NO: 244)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...

Q54181_STRSG/61-115
(SEQ ID NO: 245)
....TYKLVINGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...

Q56192_STAXY/238-290
(SEQ ID NO: 246)
....TYKLILNGKTLKGETTTEAVDAATARSFNFPILENSSSVPGDPLESTCMH......VEH

Q56193_STAXY/238-293
(SEQ ID NO: 247)
....TYKLILNGKTLKGETTTEAVDAATARSFNFPILENSSSVPGDPLESTCRHASFAQA...

Q56212_STRSZ/228-282
(SEQ ID NO: 248)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...

Q56212_STRSZ/300-354
(SEQ ID NO: 249)
....TYRLVIKGVTFSGETATKAVDAATAEQ.AFRQYANDNGVTGEWAYDAATKTFTVTE...

Q76K19_STRSZ/232-286
(SEQ ID NO: 250)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...

Q76K19_STRSZ/304-358
(SEQ ID NO: 251)
....TYRLVIKGVTFSGETATKAVDAATAEQ.TFRQYANDNGITGEWAYDTATKTFTVTE...

Q93EM8_STRDY/224-278
(SEQ ID NO: 252)
....TYKLVVKGNTFSGETTTKAIDTATAEK.EFKQYATANNVDGEWSYDDATKTFTVTE...

Q93EM8_STRDY/294-348
(SEQ ID NO: 253)
....TYKLIVKGNTFSGETTTKAIDAATAEK.EFKQYATANNVDGEWSYDYATKTFTVTE...

Q93EM8_STRDY/364-418
(SEQ ID NO: 254)
....TYKLIVKGNTFSGETTTKAIDAATAEK.EFKQYATANNVDGEWSYDDATKTFTVTE...

Q93EM8_STRDY/434-488
(SEQ ID NO: 255)
....TYKLIVKGNTFSGETTTKAVDAETAEK.AFKQYATANNVDGEWSYDDATKTFTVTE...

Q93EM8_STRDY/504-558
(SEQ ID NO: 256)
....TYKLVINGKTLKGETTTKAVDVETAEK.AFKQYANENGVDGVWTYDDATKTFTVTE...

SPG1_STRSG/228-282
(SEQ ID NO: 257)
....TYKLILNGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...

SPG1_STRSG/298-352
(SEQ ID NO: 258)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...

SPG2_STRSG/303-357
(SEQ ID NO: 259)
....TYKLILNGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...

SPG2_STRSG/373-427
(SEQ ID NO: 260)
....TYKLVINGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...

SPG2_STRSG/443-497
(SEQ ID NO: 261)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...

In some embodiments, the GB1 scaffold domain is described by the following sequence: (T/S)Y(K/R)L(Z1)(Z1)(N/K)G(K/N/V/A)T(L/F)(K/S)GET(T/A/S)T(K/E)(A/T)(V/I)D(A/T/V)(A/E)(T/V)AE(K/Q)(A/E/T/V)F(K/R)(Q/D)YA(N/T)(A/D/E/K)N(G/N)(Z3)(D/T)G(E/V)W(A/T/S)YD(D/A/Y/T)ATKT(Z1)T(Z1)TE (SEQ ID NO:262), where each Z1 is independently a hydrophobic residue and Z3 is a non-aromatic hydrophobic residue. In some embodiments, the GB1 scaffold domain is described by the following sequence: (T/S)Y(K/R)L(I/V)(L/I/V)(N/K)G(K/N/V/A)T(L/F)(K/S)GET(T/A/S)T(K/E)(A/T)(V/I)D(A/T/V)(A/E)(T/V)AE(K/Q)(A/E/T/V)F(K/R)(Q/D)YA(N/T)(A/D/E/K)N(G/N)(V/I)(D/T)G(E/V)W(A/T/S)YD(D/A/Y/T)ATKTFTVTE (SEQ ID NO:263). In certain embodiments, the GB1 scaffold domain is described by the following sequence: TYKL(I/V)(L/I/V)(N/K)G(K/N)T(L/F)(K/S)GET(T/A)T(K/E)AVD(A/T/V)(A/E)TAE(K/Q)(A/E/T/V)F(K/R)QYA(N/T)(A/D/E/K)N(G/N)VDG(E/V)W(A/T/S)YD(D/A)ATKTFTVTE (SEQ ID NO:264). A mutation in a scaffold domain may include a deletion, insertion, or substitution of an amino acid residue at any convenient position to produce a sequence that is distinct from the reference scaffold domain sequence.
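Because each parenthesized group such as (N/K) lists the residues allowed at a single position, a fully specified degenerate sequence like SEQ ID NO: 264 can be checked mechanically by turning each group into a regular-expression character class. The following is a minimal Python sketch under that reading; the function names are hypothetical, and the class placeholders Z1 and Z3 used in SEQ ID NO: 262 are not handled because the text defines them only descriptively.

```python
import re

# Degenerate pattern for SEQ ID NO: 264, copied from the text above.
SEQ_264 = (
    "TYKL(I/V)(L/I/V)(N/K)G(K/N)T(L/F)(K/S)GET(T/A)T(K/E)AVD(A/T/V)(A/E)"
    "TAE(K/Q)(A/E/T/V)F(K/R)QYA(N/T)(A/D/E/K)N(G/N)VDG(E/V)W(A/T/S)YD(D/A)"
    "ATKTFTVTE"
)

# Wild-type GB1 scaffold domain (SEQ ID NO: 1).
GB1 = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"


def degenerate_to_regex(pattern: str) -> str:
    """Turn each '(X/Y/Z)' group into the character class '[XYZ]'."""
    return re.sub(r"\(([A-Z/]+)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]",
                  pattern)


def matches_scaffold(sequence: str, pattern: str) -> bool:
    """True if the whole sequence is described by the degenerate pattern."""
    return re.fullmatch(degenerate_to_regex(pattern), sequence) is not None


print(matches_scaffold(GB1, SEQ_264))            # True
print(matches_scaffold("S" + GB1[1:], SEQ_264))  # False: position 1 must be T
```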
In some embodiments, the GB1 scaffold domain is described by the following sequence: T(Z2)K(Z1)(Z1)(Z1)(N/V)(G/L/I)(K/G)(Q/T/D)(L/A/R)(K/V)(G/E/V)(E/V)(A/T/R/I/P/V)(T/I)(R/W/L/K/V/T/I)E(A/L/I)VDA(A/G)(T/E)(A/V/F)EK(V/I/Y)(F/L/W/I/A)K(L/Q)(Z1)(Z3)N(A/D)(K/N)(T/G)(V/I)(E/D)G(V/E)(W/F)TY(D/K)D(E/A)(T/I)KT(Z1)T(Z1)TE (SEQ ID NO:265), where each Z1 is independently a hydrophobic residue, Z2 is an aromatic hydrophobic residue, and Z3 is a non-aromatic hydrophobic residue.
In some embodiments, the GB1 scaffold domain is described by the following sequence: T(Y/F/W/A)K(L/V/I/M/F/Y/A)(L/V/I/F/M)(L/V/I/F/M/A/Y/S)(N/V)(G/L/I)(K/G)(Q/T/D)(L/A/R)(K/V)(G/E/V)(E/V)(A/T/R/I/P/V)(T/I)(R/W/L/K/V/T/I)E(A/L/I)VDA(A/G)(T/E)(A/V/F)EK(V/I/Y)(F/L/W/I/A)K(L/Q)(W/F/L/M/Y/I)(L/V/I/A)N(A/D)(K/N)(T/G)(V/I)(E/D)G(V/E)(W/F)TY(D/K)D(E/A)(T/I)KT(L/V/I/F/M/W)T(L/V/I/F/M)TE (SEQ ID NO: 266).
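The same reading of the notation gives the number of distinct sequences a degenerate formula covers: the product of the number of allowed residues at each position. Below is a short sketch for SEQ ID NO: 266, assuming each parenthesized group is exactly one position; the helper names are hypothetical. It also checks that the wild-type GB1 sequence (SEQ ID NO: 1) is one of the sequences the formula describes.

```python
import math
import re

# SEQ ID NO: 266, copied from the text above.
SEQ_266 = (
    "T(Y/F/W/A)K(L/V/I/M/F/Y/A)(L/V/I/F/M)(L/V/I/F/M/A/Y/S)(N/V)(G/L/I)"
    "(K/G)(Q/T/D)(L/A/R)(K/V)(G/E/V)(E/V)(A/T/R/I/P/V)(T/I)(R/W/L/K/V/T/I)"
    "E(A/L/I)VDA(A/G)(T/E)(A/V/F)EK(V/I/Y)(F/L/W/I/A)K(L/Q)(W/F/L/M/Y/I)"
    "(L/V/I/A)N(A/D)(K/N)(T/G)(V/I)(E/D)G(V/E)(W/F)TY(D/K)D(E/A)(T/I)KT"
    "(L/V/I/F/M/W)T(L/V/I/F/M)TE"
)
GB1 = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"  # SEQ ID NO: 1


def position_options(pattern):
    """Split a degenerate pattern into one list of allowed residues per position."""
    tokens = re.findall(r"\(([A-Z/]+)\)|([A-Z])", pattern)
    return [group.split("/") if group else [letter] for group, letter in tokens]


def count_sequences(pattern):
    """Number of distinct sequences the pattern describes (product of choices)."""
    return math.prod(len(options) for options in position_options(pattern))


options = position_options(SEQ_266)
print(len(options))  # 55 positions, the same length as SEQ ID NO: 1
print(all(res in opts for res, opts in zip(GB1, options)))  # True
print(count_sequences(SEQ_266))
```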

The subject libraries are designed to maximize diversity while minimizing structural perturbations of the GB1 scaffold domain. The positions to be mutated are selected to ensure that the GB1 peptidic compounds of the subject libraries can maintain a folded state under physiological conditions. Another aspect of generating diversity in the subject libraries is the selection of amino acid positions to be mutated such that the amino acids can form a potential binding surface in the GB1 scaffold domain, whether or not the residues actually contact a target protein. One way of determining whether an amino acid position is part of a potential binding surface involves examining the three-dimensional structure of the GB1 scaffold domain using a computer program such as UCSF Chimera. Other ways include crystallographic and genetic mutational analysis. Any convenient method may be used to determine whether an amino acid position is part of a potential binding surface.
The mutations may be found at positions in the GB1 scaffold domain where the amino acid residue is at least partially solvent-exposed. Solvent-exposed positions can be determined using protein-modeling software together with three-dimensional structural information obtained from a crystal structure. For example, solvent-exposed residues may be identified from the Protein Data Bank (PDB) structure 3GB1 by estimating the solvent accessible surface area (SASA) of each residue with the GETarea tool (Fraczkiewicz & Braun, "Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules," J. Comput. Chem. 1998, 19, 319-333). This tool calculates, for each residue, the ratio of its SASA in the structure to its SASA in a random coil. A ratio cutoff of 0.4 was used to select the following solvent-accessible residues (shown in bold): TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE (SEQ ID NO:1).
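The 0.4 cutoff amounts to a simple filter over per-residue ratios. The sketch below assumes the SASA values have already been computed (for example by GETarea on 3GB1) and treats a ratio equal to the cutoff as exposed, which is an assumption since the text does not say how ties are handled. The numbers in the example dictionary are hypothetical placeholders, not values measured for GB1.

```python
from typing import Dict, List, Tuple


def solvent_exposed(
    sasa: Dict[int, Tuple[float, float]], threshold: float = 0.4
) -> List[int]:
    """Return positions whose SASA(structure) / SASA(random coil) >= threshold.

    sasa maps a residue position to (SASA in structure, SASA in random coil).
    """
    return sorted(
        pos
        for pos, (in_structure, in_coil) in sasa.items()
        if in_coil > 0 and in_structure / in_coil >= threshold
    )


# Hypothetical values for illustration only, NOT measured from 3GB1.
example = {
    1: (95.0, 140.0),   # ratio ~0.68 -> exposed
    3: (20.0, 170.0),   # ratio ~0.12 -> buried
    5: (4.0, 150.0),    # ratio ~0.03 -> buried
    9: (120.0, 165.0),  # ratio ~0.73 -> exposed
}
print(solvent_exposed(example))  # [1, 9]
```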
The mutations of the GB1 scaffold domain may be concentrated at one of several different potential binding surfaces of the scaffold domain. Several distinct arrangements of mutations at non-core positions of the hairpin-helix-hairpin scaffold domain are provided. In some instances, the majority of the mutations are at non-core positions of the GB1 scaffold domain (e.g., solvent-exposed or boundary positions); however, in some cases one or more mutations may be located at hydrophobic core positions. In certain embodiments, mutations at hydrophobic core positions may be tolerated without significantly disrupting the GB1 scaffold structure, such as when those core mutations are selected in a loop region. In such cases the loop region may form a structure or conformation that is different from that of the parent scaffold.
In certain embodiments, the GB1 scaffold may have loop regions that are independently selected from any one of the loop sequences set forth in Table 1 (SEQ ID NOs: 67-196 and 267-272). Any of the loop sequences 1-4 of Table 1 may be incorporated, at the positions indicated in Table 1, into any convenient GB1 scaffold domain (e.g., SEQ ID NO: 1) to produce another GB1 scaffold domain.
In certain embodiments, mutations at boundary positions may also be tolerated without significantly disrupting the GB1 scaffold structure. Mutations at such positions may confer desirable properties upon the resulting GB1 compound variants, such as stability, a certain structural property, or specific binding to a target molecule.
The positions of the mutations in the GB1 scaffold domain may be described herein either by reference to a structural motif or region, or by reference to a position number in the primary sequence of the scaffold domain. FIG. 3 illustrates the alignment of the position numbering scheme for a GB1 scaffold domain relative to its β1, β2, α1, β3 and β4 motifs, and relative to the mutations of certain libraries of the invention. Positions marked with an asterisk indicate exemplary positions at which mutations that include the insertion of one or more amino acids may be included. Any GB1 scaffold domain sequence may be substituted for the scaffold sequence depicted in FIG. 3, and the positions of the mutations that define a subject library may be transferred from one scaffold to another by any convenient method. For example, a sequence alignment method may be used to place any GB1 scaffold domain sequence within the framework of the position numbering scheme illustrated in FIG. 3. Alignment methods based on structural motifs such as β-strands and α-helices may also be used to place a GB1 scaffold domain sequence within the framework of the position numbering scheme illustrated in FIG. 3.
In some cases, a first GB1 scaffold domain sequence may be aligned with a second GB1 scaffold domain sequence that is one or more amino acids longer or shorter. For example, the second GB1 scaffold domain may have one or more additional amino acids at the N-terminus or C-terminus relative to the first GB1 scaffold, or may have one or more additional amino acids in one of the loop regions of the structure. In such cases, a numbering scheme such as the one described below for insertion mutations may be used to relate the two scaffold domain sequences.
Another aspect of the diversity of the subject libraries is the size of the library, i.e., the number of distinct compounds of the library. In some embodiments, a subject library includes 50 or more distinct compounds, such as 100 or more, 300 or more, 1×10³ or more, 1×10⁴ or more, 1×10⁵ or more, 1×10⁶ or more, 1×10⁷ or more, 1×10⁸ or more, 1×10⁹ or more, 1×10¹⁰ or more, 1×10¹¹ or more, or 1×10¹² or more distinct compounds.
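To put these library sizes in context, the number of possible sequences grows exponentially with the number of variant positions. A small sketch, assuming every variant position independently draws from the six-residue set A, D, F, S, V and Y used in the embodiments below and ignoring insertions; `theoretical_diversity` is a hypothetical helper.

```python
def theoretical_diversity(n_positions: int, alphabet_size: int = 6) -> int:
    """Distinct sequences when each of n positions independently takes one of
    alphabet_size residues (e.g. the six residues A, D, F, S, V, Y)."""
    return alphabet_size ** n_positions


for n in (5, 10, 15):
    print(n, theoretical_diversity(n))
```

Ten positions already give 6^10 = 60,466,176 combinations and fifteen give 470,184,984,576 (about 4.7×10¹¹), so at the upper end of the mutation counts described below even a 10¹⁰-member library covers only a fraction of the theoretical sequence space.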
A subject library may include GB1 peptidic compounds each having a hairpin-helix-hairpin scaffold domain described by formula (I):
P1-α1-P2 (I)
where P1 and P2 are independently beta-hairpin domains and α1 is a helix domain and P1, α1 and P2 are connected independently by linking sequences of between 1 and 10 residues in length. In some embodiments, in formula (I), P1 is β1-β2 and P2 is β3-β4 such that the compounds are described by formula (II):
β1-β2-α1-β3-β4 (II)
where β1, β2, β3 and β4 are independently beta-strand domains and α1 is a helix domain, and β1, β2, α1, β3 and β4 are connected independently by linking sequences of between 1 and 10 residues in length, such as, between 2 and 8 residues, or between 3 and 6 residues in length. In certain embodiments, each linking sequence is independently of 3, 4, 5, 6, 7 or 8 residues in length, such as 4 or 5 residues in length.
In certain embodiments, the linking sequences may form a loop or a turn structure. For example, the two antiparallel β-strands of a hairpin motif may be connected via a loop. Mutations in a linking sequence that include insertion or deletion of one or more amino acid residues may be tolerated without significantly disrupting the GB1 scaffold structure. In some embodiments, in formulas (I) and (II), each compound of the subject library includes mutations in one or more linking sequences. In certain embodiments, 80% or more, 90% or more, 95% or more, or even 100% of the mutations are at positions within the regions of the linking sequences. In certain embodiments, in formulas (I) and (II), at least one of the linking sequences is one or more (e.g., 2 or more) residues longer than the corresponding linking sequence of the GB1 scaffold. In certain embodiments, in formulas (I) and (II), at least one of the linking sequences is one or more residues shorter than the corresponding linking sequence of the GB1 scaffold.
In some embodiments, one or more positions in the scaffold may be selected as positions at which to include insertion mutations, e.g., mutations that include the insertion of 1 or 2 additional amino acid residues in addition to the amino acid residue being substituted. In certain embodiments, the insertion mutations are selected for inclusion in one or more loop regions, or at the N-terminus or C-terminus of the scaffold. The inserted variant amino acids may be referred to using a letter designation appended to the numbered position of the mutation; e.g., the two inserted amino acids of an insertion mutation of 2 amino acids at position 38 may be referred to as positions 38a and 38b.
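The letter designation can be generated mechanically when the number of residues inserted at each numbered position is known. A minimal sketch, assuming the inserted residues directly follow the numbered position they belong to, as in the 38a/38b example; `position_labels` is a hypothetical helper.

```python
from string import ascii_lowercase
from typing import Dict, List


def position_labels(scaffold_length: int, insertions: Dict[int, int]) -> List[str]:
    """Label each residue of a variant using the scaffold numbering.

    insertions maps a scaffold position to the number of extra residues
    inserted there; the extra residues get letter suffixes (38a, 38b, ...).
    """
    labels = []
    for pos in range(1, scaffold_length + 1):
        labels.append(str(pos))
        for k in range(insertions.get(pos, 0)):
            labels.append(f"{pos}{ascii_lowercase[k]}")
    return labels


labels = position_labels(55, {38: 2})
print(labels[36:41])  # ['37', '38', '38a', '38b', '39']
print(len(labels))    # 57
```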
In certain embodiments, the subject library includes a mutation at position 38 that includes insertion of 0, 1 or 2 variant amino acids. In certain embodiments, the subject library includes a mutation at position 19 that includes insertion of 0, 1 or 2 variant amino acids. In certain embodiments, the subject library includes a mutation at position 1 that includes insertion of 2 variant amino acids, and mutations at positions 19 and 47 that each include insertion of 0, 1 or 2 variant amino acids. In certain embodiments, the subject library includes mutations at positions 9 and 38 that each include insertion of 0, 1 or 2 variant amino acids, and a mutation at position 55 that includes insertion of 1 variant amino acid. In certain embodiments, the subject library includes a mutation at position 9 that includes insertion of 0, 1 or 2 variant amino acids, and a mutation at position 55 that includes insertion of 1 variant amino acid. In certain embodiments, the subject library includes a mutation at position 1 that includes insertion of 1 variant amino acid and a mutation at position 47 that includes insertion of 0, 1 or 2 variant amino acids.
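Variable-length insertions also multiply the theoretical diversity. The sketch below assumes that a mutation "that includes insertion of 0, 1 or 2 variant amino acids" means the substituted residue plus k extra variant residues for each allowed k, all drawn from the same alphabet, and that different loop lengths count as distinct sequences. The function names and the example design are hypothetical.

```python
import math
from typing import Iterable, Sequence


def positional_options(alphabet_size: int, insert_counts: Iterable[int]) -> int:
    """Variants at one mutated position: the substituted residue followed by
    k inserted variant residues, summed over every allowed k."""
    return sum(alphabet_size ** (1 + k) for k in insert_counts)


def library_diversity(alphabet_size: int, designs: Sequence[Iterable[int]]) -> int:
    """Product over mutated positions; designs[i] lists the allowed insertion
    counts at the i-th mutated position ((0,) means substitution only)."""
    return math.prod(positional_options(alphabet_size, ks) for ks in designs)


print(positional_options(6, (0,)))       # 6: plain substitution
print(positional_options(6, (0, 1, 2)))  # 258 = 6 + 36 + 216
# Hypothetical design: 9 plain substitutions plus one position with 0-2 insertions.
print(library_diversity(6, [(0,)] * 9 + [(0, 1, 2)]))
```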
In some cases, when an insertion mutation (e.g., insertion of one or more additional variant amino acids) is made in a GB1 scaffold, the resulting GB1 compound variants may be aligned with the parent GB1 scaffold in different ways. For example, an insertion mutation including 2 additional variant amino acids at position 38 of the GB1 scaffold may lead to GB1 compound variants where the loop region between the α1 and β3 regions can be aligned with the GB1 scaffold domain in two or more distinct ways. In other words, the resulting GB1 compounds may encompass various distinct loop sequences and/or structures that align differently with the parent GB1 scaffold domain. In some cases, the various distinct loop sequences are produced when the insertion mutation is in a variable loop region (e.g., where most of the loop region is being mutated).
In some embodiments, each compound of a subject library includes 4 or more, such as, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, or 15 or more mutations at different positions of a hairpin-helix-hairpin scaffold domain. The mutations may involve the deletion, insertion, or substitution of the amino acid residue at the position of the scaffold being mutated. The mutations may include substitution with any naturally or non-naturally occurring amino acid, or an analog thereof.
In some embodiments, each compound of a subject library includes 3 or more different non-core mutations, such as, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, or 12 or more different non-core mutations in a region outside of the β1-β2 region.
In some embodiments, each compound of a subject library includes 3 or more different non-core mutations, such as, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, or 11 or more different non-core mutations in the α1 region.
In some embodiments, each compound of a subject library includes 3 or more different non-core mutations, such as 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or 10 or more different non-core mutations in the β3-β4 region.
In some embodiments, each compound of a subject library includes 5 or more different non-core mutations, such as 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, or 12 or more different non-core mutations in the α1-β3 region.
In certain embodiments, each compound of a subject library includes ten or more different mutations, where the ten or more different mutations are located at positions selected from the group consisting of positions 21-24, 26, 27, 30, 31, 34, 35 and 37-41.
In certain embodiments, each compound of a subject library includes ten or more different mutations, where the ten or more different mutations are located at positions selected from the group consisting of positions 18-24, 26-28, 30-32, 34 and 35.
In certain embodiments, each compound of a subject library includes ten or more different mutations, where the ten or more different mutations are located at positions selected from the group consisting of positions 1, 18-24 and 45-49. In certain embodiments, each compound of a subject library includes ten or more different mutations, where the ten or more different mutations are located at positions selected from the group consisting of positions 7-12, 36-41, 54 and 55.
In certain embodiments, each compound of a subject library includes ten or more different mutations, where the ten or more different mutations are located at positions selected from the group consisting of positions 3, 5, 7-14, 16, 52, 54 and 55.
In certain embodiments, each compound of a subject library includes ten or more different mutations, where the ten or more different mutations are located at positions selected from the group consisting of positions 1, 3, 5, 7, 41, 43, 45-50, 52 and 54.
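Each of the position sets above can be checked against a candidate sequence by expanding the range notation and comparing the candidate with the scaffold residue by residue. A sketch, assuming an equal-length variant of SEQ ID NO: 1 carrying substitutions only (insertions such as 38a would need an alignment step first); the helper names are hypothetical.

```python
from typing import Set

GB1 = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"  # SEQ ID NO: 1


def parse_positions(spec: str) -> Set[int]:
    """Expand a list such as '21-24, 26, 34 and 37-41' into a set of positions."""
    positions = set()
    for part in spec.replace(" and ", ",").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            positions.update(range(start, end + 1))
        else:
            positions.add(int(part))
    return positions


def mutated_positions(variant: str, scaffold: str = GB1) -> Set[int]:
    """1-based positions where an equal-length variant differs from the scaffold."""
    if len(variant) != len(scaffold):
        raise ValueError("this sketch handles substitutions only (no indels)")
    return {i + 1 for i, (a, b) in enumerate(zip(variant, scaffold)) if a != b}


def meets_design(variant: str, spec: str, minimum: int = 10) -> bool:
    """True if at least `minimum` mutations fall inside the listed positions."""
    return len(mutated_positions(variant) & parse_positions(spec)) >= minimum


residues = list(GB1)
for p in (21, 22, 23, 24, 26, 27, 30, 31, 34, 35):
    residues[p - 1] = "W"  # GB1 has no W at any of these positions
variant = "".join(residues)
print(meets_design(variant, "21-24, 26, 27, 30, 31, 34, 35 and 37-41"))  # True
```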
In certain embodiments, each compound of a subject library includes five or more different mutations in the α1 region. In certain embodiments, five or more different mutations are located at positions selected from the group consisting of positions 22-24, 26, 27, 30, 31, 34 and 35.
In certain embodiments, each compound of a subject library includes ten or more different mutations in the α1 region. In certain embodiments, the ten or more different mutations are located at positions selected from the group consisting of positions 22-24, 26, 27, 28, 30, 31, 32, 34 and 35.
In certain embodiments, each compound of a subject library includes three or more different mutations in the β3-β4 region. In certain embodiments, the three or more different mutations are located at positions selected from the group consisting of positions 41, 54 and 55. In certain embodiments, the three or more different mutations are located at positions selected from the group consisting of positions 52, 54 and 55.
In certain embodiments, each compound of a subject library includes five or more different mutations in the β3-β4 region. In certain embodiments, the five or more different mutations are located at positions selected from the group consisting of positions 45-49. In certain embodiments, each compound of a subject library includes nine or more different mutations in the β3-β4 region. In certain embodiments, the nine or more different mutations are located at positions selected from the group consisting of positions 41, 43, 45-50, 52 and 54.
In certain embodiments, each compound of a subject library includes two or more different mutations in the region between the α1 and β3 regions, e.g., mutations in the linking sequence between α1 and β3. In certain embodiments, the two or more different mutations are located at positions selected from the group consisting of positions 37-40.
In certain embodiments, each compound of a subject library includes three or more, four or more, five or more, six or more, or ten or more different mutations in the β1-β2 region. In certain embodiments, the ten or more different mutations in the β1-β2 region are located at positions selected from the group consisting of positions 3, 5, 7-14 and 16.
In some embodiments, each compound of a subject library is described by a formula independently selected from the group consisting of:
F1-V1-F2 (III);
F3-V2-F4 (IV);
V3-F5-V4-F6-V5-F7 (V);
F8-V6-F9-V7-F10-V8 (VI);
V9-F11-V10 (VII); and
V11-F12-V12 (VIII)
where F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11 and F12 are fixed regions and V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11 and V12 are variable regions;
where each fixed region is common to all compounds of the same formula and each compound of the library has a distinct variable region.
In certain embodiments, each compound of a subject library is described by formula (III), where:
F1 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence TYKLILNGKTLKGETTTEA (SEQ ID NO: 2);
F2 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence TYDDATKTFTVTE (SEQ ID NO: 3); and
V1 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence VDAATAEKVFKQYANDNGVDGEW (SEQ ID NO: 4), where each compound of the library comprises 10 or more mutations (e.g., 11, 12, 13, 14 or 15 or more mutations) in the V1 variable region.
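The text does not say how sequence identity is to be computed. For regions of equal length, one straightforward reading is ungapped identity: the fraction of aligned positions that carry the same residue. A minimal sketch under that assumption; `percent_identity` is a hypothetical helper.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length sequences only")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


F1_REF = "TYKLILNGKTLKGETTTEA"  # SEQ ID NO: 2
F1_I5V = "TYKLVLNGKTLKGETTTEA"  # one substitution (I5V) for illustration
print(f"{percent_identity(F1_REF, F1_I5V):.1f}")  # 94.7
```

A single substitution in the 19-residue F1 region leaves 18/19, about 94.7% identity, which satisfies every threshold listed. Regions of different lengths would need a gapped alignment, which this sketch does not attempt.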
In certain embodiments, in formula (III), V1 comprises a sequence of the following formula: VXXXXAXXVFXXYAXXNXXXXXW (SEQ ID NO: 5), where each X is a variant amino acid.
In certain embodiments, in formula (III), F1 comprises the sequence TYKLILNGKTLKGETTTEA (SEQ ID NO: 2), F2 comprises the sequence TYDDATKTFTVTE (SEQ ID NO: 3), and V1 comprises a sequence of the following formula: VXXXXAXXVFXXYAXXNXXXXXW (SEQ ID NO: 6) where each X is independently selected from the group consisting of A, D, F, S, V and Y.
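A library member of formula (III) can be assembled by filling each X of the V1 template and flanking the result with the fixed regions F1 and F2. The sketch below samples one member at random from the six-residue set. It ignores the optional insertions at position 19 of V1, and because a sampled residue can coincide with the wild-type one, not every X is necessarily a mutation. `sample_member` is a hypothetical name.

```python
import random

F1 = "TYKLILNGKTLKGETTTEA"               # SEQ ID NO: 2
F2 = "TYDDATKTFTVTE"                     # SEQ ID NO: 3
V1_TEMPLATE = "VXXXXAXXVFXXYAXXNXXXXXW"  # SEQ ID NO: 6
ALPHABET = "ADFSVY"                      # allowed variant residues


def sample_member(rng: random.Random) -> str:
    """Fill every X of the V1 template with a residue from ALPHABET and
    join the result between the fixed regions F1 and F2."""
    v1 = "".join(rng.choice(ALPHABET) if c == "X" else c for c in V1_TEMPLATE)
    return F1 + v1 + F2


member = sample_member(random.Random(0))
print(len(member))  # 55
print(member)
```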
In certain embodiments, in formula (III), the mutation at position 19 of V1 includes insertion of 0, 1 or 2 variant amino acids.
In certain embodiments, each compound of a subject library is described by formula (IV), where:
F3 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence TYKLILNGKTLKGETT (SEQ ID NO: 7);
F4 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence GVDGEWTYDDATKTFTVTE (SEQ ID NO: 8); and
V2 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence TEAVDAATAEKVFKQYANDN (SEQ ID NO: 9), where each compound of the library comprises 10 or more mutations (e.g., 11, 12, 13, 14 or 15 or more mutations) in the V2 variable region.
In certain embodiments, in formula (IV), V2 comprises a sequence of the formula: TXXXXXXXAXXXFXXXAXXN (SEQ ID NO: 10), where each X is a variant amino acid.
In certain embodiments, in formula (IV), F3 comprises the sequence TYKLILNGKTLKGETT (SEQ ID NO: 7), F4 comprises the sequence GVDGEWTYDDATKTFTVTE (SEQ ID NO: 8), and V2 comprises a sequence of the formula: TXXXXXXXAXXXFXXXAXXN (SEQ ID NO: 11) where each X is independently selected from the group consisting of A, D, F, S, V and Y.
In certain embodiments, in formula (IV), the mutation at position 3 of V2 includes insertion of 0, 1 or 2 variant amino acids.
In certain embodiments, each compound of a subject library is described by formula (V), where:
F5 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence KLILNGKTLKGETT (SEQ ID NO: 12);
F6 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence EKVFKQYANDNGVDGEWT (SEQ ID NO: 13);
F7 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence FTVTE (SEQ ID NO: 14);
V3 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence TY;
V4 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence TEAVDAATA (SEQ ID NO: 15); and
V5 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence YDDATKT (SEQ ID NO: 16);
where each compound of the library comprises one or more mutations in the V3 variable region, 3 or more mutations (e.g., 4, 5, 6 or 7 or more mutations) in the V4 variable region, and 3 or more mutations (e.g., 4 or 5 or more mutations) in the V5 variable region.
In certain embodiments, in formula (V), V3 comprises a sequence of the formula XY, V4 comprises a sequence of the formula TXXXXXXXA (SEQ ID NO: 17), and V5 comprises a sequence of the formula YXXXXXT (SEQ ID NO: 18) where each X is a variant amino acid.
In certain embodiments, in formula (V), F5 comprises the sequence KLILNGKTLKGETT (SEQ ID NO: 12), F6 comprises the sequence EKVFKQYANDNGVDGEWT (SEQ ID NO: 13), F7 comprises the sequence FTVTE (SEQ ID NO: 14), V3 comprises a sequence of the formula XY, V4 comprises a sequence of the formula TXXXXXXXA (SEQ ID NO: 19), and V5 comprises a sequence of the formula YXXXXXT (SEQ ID NO: 20) where each X is independently selected from the group consisting of A, D, F, S, V and Y.
In certain embodiments, in formula (V), the mutation at position 1 of V3 includes insertion of 2 variant amino acids, and the mutations at positions 3 and 4 of V4 and V5, respectively, each include insertion of 0, 1 or 2 variant amino acids.
In certain embodiments, each compound of a subject library is described by formula (VI), where:
F8 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence TYKLI (SEQ ID NO: 21);
F9 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence ETTTEAVDAATAEKVFKQYAN (SEQ ID NO: 22);
F10 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence TYDDATKTFT (SEQ ID NO: 23);
V6 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence LNGKTLKG (SEQ ID NO: 24);
V7 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence DNGVDGEW (SEQ ID NO: 25);
V8 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence VTE;
where each compound of the library comprises 3 or more mutations (e.g., 4, 5 or 6 or more mutations) in the V6 variable region, 3 or more mutations (e.g., 4, 5 or 6 or more mutations) in the V7 variable region; and one or more mutations (e.g., 2 or more mutations) in the V8 variable region.
In certain embodiments, in formula (VI), V6 comprises a sequence of the formula LXXXXXXG (SEQ ID NO: 26), V7 comprises a sequence of the formula DXXXXXXW (SEQ ID NO: 27), and V8 comprises a sequence of the formula VXX where each X is a variant amino acid.
In certain embodiments, in formula (VI), F8 comprises the sequence TYKLI (SEQ ID NO: 21), F9 comprises the sequence ETTTEAVDAATAEKVFKQYAN (SEQ ID NO: 22), F10 comprises the sequence TYDDATKTFT (SEQ ID NO: 23), V6 comprises a sequence of the formula LXXXXXXG (SEQ ID NO: 28), V7 comprises a sequence of the formula DXXXXXXW (SEQ ID NO: 29), and V8 comprises a sequence of the formula VXX where each X is independently selected from the group consisting of A, D, F, S, V and Y.
In certain embodiments, in formula (VI), the mutations at position 4 of V6 and V7 each include insertion of 0, 1 or 2 variant amino acids, and the mutation at position 3 of V8 includes insertion of 1 variant amino acid.
In certain embodiments, each compound of a subject library is described by formula (VII), where:
F11 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence EAVDAATAEKVFKQYANDNGVDGEWTYDDATKT (SEQ ID NO: 30);
V9 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence TYKLILNGKTLKGETTT (SEQ ID NO: 31); and
V10 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to an amino acid sequence FTVTE (SEQ ID NO: 32);
where each compound of the library comprises 6 or more mutations (e.g., 7, 8, 9, 10 or 11 or more mutations) in the V9 variable region, and 2 or more mutations (e.g., 3 or more mutations) in the V10 variable region.
In certain embodiments, in formula (VII), V9 comprises a sequence of the formula TYXLXLXXXXXXXXTXT (SEQ ID NO: 33), and V10 comprises a sequence of the formula FXVXX (SEQ ID NO: 34), where each X is a variant amino acid.
In certain embodiments, in formula (VII), F11 comprises the sequence EAVDAATAEKVFKQYANDNGVDGEWTYDDATKT (SEQ ID NO: 30); V9 comprises a sequence of the formula TYXLXLXXXXXXXXTXT (SEQ ID NO: 35), and V10 comprises a sequence of the formula FXVXX (SEQ ID NO: 36), where each X is independently selected from the group consisting of A, D, F, S, V and Y.
In certain embodiments, in formula (VII), the mutation at position 9 of V9 includes insertion of 0, 1 or 2 variant amino acids, and the mutation at position 5 of V10 includes insertion of 1 variant amino acid.
In certain embodiments, each compound of a subject library is described by formula (VIII), where:
F12 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence KTLKGETTTEAVDAATAEKVFKQYANDNGVD (SEQ ID NO: 37);
V11 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence TYKLILNG (SEQ ID NO: 38);
V12 comprises a sequence having 60% or more (e.g., 70% or more, 80% or more, 90% or more, or 95% or more) amino acid sequence identity to the amino acid sequence GEWTYDDATKTFTVTE (SEQ ID NO: 39);
where each compound of the library comprises 3 or more mutations (e.g., 4 or more mutations) in the V11 variable region, and 5 or more mutations (e.g., 6, 7, 8, 9 or 10 or more mutations) in the V12 variable region.
In certain embodiments, in formula (VIII), V11 comprises a sequence of the formula XYXLXLXG (SEQ ID NO: 40), and V12 comprises a sequence of the formula GXWXYXXXXXXFXVXE (SEQ ID NO: 41), where each X is a variant amino acid.
In certain embodiments, in formula (VIII), F12 comprises the sequence KTLKGETTTEAVDAATAEKVFKQYANDNGVD (SEQ ID NO: 37), V11 comprises a sequence of the formula XYXLXLXG (SEQ ID NO: 42), and V12 comprises a sequence of the formula GXWXYXXXXXXFXVXE (SEQ ID NO: 43), where each X is independently selected from the group consisting of A, D, F, S, V and Y.
In certain embodiments, in formula (VIII), the mutation at position 8 of V12 includes insertion of 0, 1 or 2 variant amino acids, and the mutation at position 1 of V11 includes insertion of 2 variant amino acids.
In some embodiments, each compound of the subject library includes a peptidic sequence of between 30 and 80 residues, such as between 40 and 70, between 45 and 60 residues, or between 52 and 58 residues. In certain embodiments, each compound of the subject library includes a peptidic sequence of 52, 53, 54, 55, 56, 57 or 58 residues. In certain embodiments, the peptidic sequence is of 55, 56, or 57 residues, such as 56 residues.
In certain embodiments, each compound of the subject library includes a GB1 scaffold domain and a variable domain. The variable domain may be a part of the GB1 scaffold domain and may be either a continuous or a discontinuous sequence of residues. A variable domain that is defined by a discontinuous sequence of residues may include contiguous variant amino acids at positions that are arranged close in space relative to each other in the structure of the compound. The variable domain may form a potential binding interface of the compounds. The variable domain may define a binding surface area of a suitable size for forming protein-protein interactions. The variable domain may include a surface area of between 600 and 1800 Å², such as between 800 and 1600 Å², between 1000 and 1400 Å², between 1100 and 1300 Å², or about 1200 Å².
The individual sequences of the members of any one of the subject libraries can be determined as follows. Any GB1 scaffold as defined herein may be selected as a scaffold for a subject library. The positions of the mutations in the GB1 scaffold domain may be selected as described herein, e.g., as depicted in FIG. 3 for Libraries 1 to 6, where the GB1 scaffold domain may be aligned with the framework of FIG. 3 as described above. The nature of the mutation at each variant amino acid position may be selected, e.g., substitution with any naturally occurring amino acid, or substitution with a limited number of representative amino acids that provide a reasonable diversity of physicochemical properties (e.g., hydrophobicity, hydrophilicity, size, solubility). Certain variant amino acid positions may be selected as positions where mutations can include the insertion or deletion of amino acids, e.g., the insertion of 1 or 2 amino acids where the variant amino acid position occurs in a loop or turn region of the scaffold. In certain embodiments, the mutations can include the insertion of amino acids at one or more positions selected from positions 1, 9, 19, 38, 47 and 55. After selection of the GB1 scaffold, selection of the positions of variant amino acids, and selection of the nature of the mutations at each position, the individual sequences of the members of the library can be determined.
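Once the scaffold, the variant positions, and the allowed residues at each position are fixed, the member sequences can be listed as a Cartesian product of the per-position choices. A minimal substitution-only sketch (no insertions or deletions), with hypothetical names and a deliberately tiny example so the output is easy to check:

```python
from itertools import product
from typing import Dict, Iterator, Sequence

GB1 = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"  # SEQ ID NO: 1


def enumerate_library(scaffold: str, variant_positions: Sequence[int],
                      choices: Dict[int, str]) -> Iterator[str]:
    """Yield every sequence obtained by substituting, at each 1-based variant
    position, one of the residues allowed for that position."""
    residues = list(scaffold)
    for combo in product(*(choices[p] for p in variant_positions)):
        for pos, aa in zip(variant_positions, combo):
            residues[pos - 1] = aa
        yield "".join(residues)


members = list(enumerate_library(GB1, (1, 3), {1: "AD", 3: "FS"}))
print(len(members))     # 4
print(members[0][:5])   # AYFLI
```

With realistic designs the product grows as described for library size, so a generator is used rather than building the whole list up front.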
In some embodiments, two or more of the subject libraries may be combined to produce a larger library. The combination library may include members that have any one of two or more distinct arrangements of mutations that define two or more potential binding surfaces of the GB1 scaffold. In some embodiments, each compound of the library is described by one of formulas (III) to (VIII), as defined above, and the library includes at least one member described by formula (III), at least one member described by formula (IV), at least one member described by formula (V), at least one member described by formula (VI), at least one member described by formula (VII), and at least one member described by formula (VIII).
In certain embodiments, each compound of the library is described by one of formulas (III) to (VIII), where only two of the formulas (III) to (VIII) are represented by the members of the library. In certain embodiments, each compound of the library is described by one of formulas (III) to (VIII), where 5 or less, such as 4 or less or 3 or less of the formulas (III) to (VIII) are represented by the members of the library.
In some embodiments, the subject library includes a combination of libraries 1 to 6 depicted in FIG. 3, e.g., a combination of 2 or more, such as 3 or more, 4 or more, or 5 or more of libraries 1 to 6. In some embodiments, the subject library includes a combination of any 2 of the libraries 1 to 6 depicted in FIG. 3, e.g., a combination of libraries 1 and 2, a combination of libraries 2 and 3, a combination of libraries 1 and 3, a combination of libraries 4 and 5, a combination of libraries 5 and 6, a combination of libraries 4 and 6, a combination of any one of libraries 1-3 and any one of libraries 4-6. In some embodiments, the subject library includes a combination of any 3 of the libraries 1 to 6 depicted in FIG. 3, e.g., a combination of libraries 1-3, a combination of libraries 4-6, a combination of any 2 libraries of 1-3 and any one library of 4-6, or a combination of any one library of 1-3 and any 2 libraries of 4-6. In some embodiments, the subject library includes a combination of all of libraries 1 to 6 depicted in FIG. 3.
In some embodiments, the subject library is bifunctional in the sense that the GB1 compounds of the library have two potential binding surfaces. Such libraries can be screened to identify compounds having specific binding properties for two target molecules. In certain embodiments, the compounds may include a first potential binding surface for a first target molecule and a second potential binding surface for a second target molecule. In certain embodiments, the first target molecule is a therapeutic target protein and the second target molecule is an endogenous protein or receptor (e.g., an IgG, FcRn, or serum albumin protein) that is capable of modulating the pharmacokinetic properties (e.g., in vivo half-life) of a GB1 compound upon recruitment. In some embodiments, any convenient endogenous protein target may be selected as one of the targets to be screened. In certain embodiments, the compounds of the library include two potential binding surfaces for the same target molecule, where the overall binding affinity of the compound may be modulated via an avidity effect.
GB1 has binding affinity for human IgG fragments, e.g., hFc binds to the α1 helix motif and hFab binds to the second beta-strand (β2) motif. In some embodiments, the IgG-binding properties of the GB1 scaffold are utilized to provide one potential binding surface of the subject bifunctional libraries. In certain embodiments, the bifunctional library has an IgG binding surface that includes the α1 helix motif and a target binding surface, such as surface 5 or 6.
Any suitable combinations of potential binding surfaces may be utilized to produce the subject bifunctional libraries. In some cases, the two potential binding surfaces of a bifunctional library are selected to minimize any potential steric interactions between the first and second target molecules, e.g., by binding the targets on opposite sides of the scaffold. In some embodiments, a pair of potential binding surfaces of the subject bifunctional library is selected from surfaces 1 and 5, surfaces 3 and 4, surfaces 2 and 6, surfaces 1 and 6, surfaces 2 and 5, and surfaces 2 and 4, where the individual surfaces 1 to 6 are shown in FIGS. 2A and 2B. FIG. 13 illustrates exemplary pairs of potential binding surfaces for use in the subject bifunctional libraries.
The subject bifunctional library may include one or more variable domains on each of the potential binding surfaces of the library. Any convenient variable domains as described herein for surfaces 1-6 may be employed in the subject bifunctional libraries. In some embodiments, the subject bifunctional library includes 3 or more mutations, such as 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 10 or more, 12 or more or 14 or more mutations in the variable domain of a first surface, and 3 or more mutations, such as 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 10 or more, 12 or more or 14 or more mutations in the variable domain of a second surface. Any suitable mutations in the variable domains may be selected, as described above for the mutations of surfaces 1-6 (see e.g., FIG. 3).
The subject bifunctional library may be screened for specific binding to first and second target molecules using a variety of strategies. For example, the libraries can be screened for binding to first and second target molecules using simultaneous screening, consecutive screening or convergent screening strategies. In some embodiments, the bifunctional library is screened for simultaneous binding of first and second targets to first and second surfaces, respectively. In some embodiments, a first library is screened for binding of a first target to a first surface to produce a second generation library based on a scaffold that binds the first target. In certain embodiments, such binding of a first target protein to a first surface is inherent in the scaffold and does not require screening, although affinity maturation of the binding of the first target may be performed. The second generation library based on the scaffold that binds the first target is then screened for binding to a second target at a second surface. In some embodiments, a convergent screening strategy is utilized where a first library is screened for binding to a first target and a second library is screened for binding to a second target. Utilizing the results of these screens, first and second binding surfaces are then incorporated into the same GB1 scaffold to produce bifunctional GB1 compounds. Such bifunctional compounds and libraries can be optimized by affinity maturation.
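The convergent strategy can be pictured with a small sketch: mutations found on two different surfaces in separate screens are merged into one scaffold, and the merge is refused if both screens changed the same position to different residues. This is an illustrative assumption about how hits might be combined, not a procedure given in the text; the helper names `merge_surfaces` and `apply_mutations`, the toy 9-residue string and the hit positions are all hypothetical.

```python
from typing import Dict

Mutations = Dict[int, str]  # 1-based position -> substituted residue


def merge_surfaces(first: Mutations, second: Mutations) -> Mutations:
    """Combine two surface mutation sets, refusing conflicting substitutions."""
    shared = first.keys() & second.keys()
    conflicts = sorted(p for p in shared if first[p] != second[p])
    if conflicts:
        raise ValueError(f"conflicting substitutions at positions {conflicts}")
    return {**first, **second}


def apply_mutations(scaffold: str, mutations: Mutations) -> str:
    """Return the scaffold with each listed position replaced."""
    residues = list(scaffold)
    for pos, aa in mutations.items():
        residues[pos - 1] = aa
    return "".join(residues)


# Hypothetical hits from two separate screens on different surfaces of a toy scaffold.
hit_surface_a = {2: "W", 3: "R"}
hit_surface_b = {7: "F", 9: "D"}
combined = merge_surfaces(hit_surface_a, hit_surface_b)
print(apply_mutations("MTYKLILNG", combined))  # MWRKLIFND
```

Choosing surfaces on opposite sides of the scaffold, as described for bifunctional libraries, makes position conflicts of this kind unlikely in the first place.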
Also provided are affinity maturation libraries, e.g., second generation GB1 peptidic libraries based on a parent GB1 peptidic compound that binds to a certain target molecule, where the libraries can be screened to optimize binding affinity and specificity, or any other desirable property, such as protein folding, protease stability, thermostability, compatibility with a pharmaceutical formulation, etc.
In some embodiments, the affinity maturation library is a GB1 peptidic library as described above, except that a fraction of the variant amino acid positions are held as fixed positions while the remaining variant amino acid positions define the new library. The mutations of these variant amino acids that define the affinity maturation library may include substitution with all 20 naturally occurring amino acids. The variant amino acids that are held as fixed become part of a new scaffold domain. In certain embodiments, the affinity maturation library is a GB1 peptidic library described herein, where 70% or more of the variant amino acids, such as 75% or more, 80% or more, or 85% or more are held fixed. In certain embodiments, the affinity maturation library is a GB1 peptidic library described herein, where 8 or more of the variant amino acids, such as 9 or more, 10 or more, or 11 or more, or 12 or more are held fixed. In some cases, the affinity maturation library includes 6 or less, such as 5 or less, 4 or less, or 3 or less variant amino acids. In certain embodiments, the affinity maturation library includes 4 remaining variant amino acids. In certain embodiments, the remaining variant amino acids are contiguous. In certain embodiments, the remaining variant amino acids form a continuous sequence of residues in the GB1 scaffold domain. In certain embodiments, the affinity maturation library is based on one of the GB1 peptidic libraries 1 to 6 as described in FIGS. 2-3, where a fraction of the variant amino acid positions are held as fixed positions while the remaining variant amino acid positions define the new library. In some cases, one or more of the variant amino acids that are held fixed may be different from the amino acids of the GB1 scaffold shown in FIG. 3. Further, any GB1 scaffold domain may be substituted for the scaffold domain shown in FIG. 3. 
The scaffold domain of an affinity maturation library may be selected based on an initial selection for binding to a target molecule.
In some instances, a GB1 peptidic compound that is identified after initially screening a subject library for binding to a certain target molecule may be selected as a scaffold for an affinity maturation library. Any convenient methods of affinity maturation may be used. In some cases, a number of affinity maturation libraries are prepared that include mutations at limited subsets of possible variant positions (e.g., mutations at 4 of 15 variable positions), while the rest of the variant positions are held as fixed positions. The positions of the mutations may be tiled through the scaffold sequence to produce a series of libraries such that mutations at every variant position are represented and a diverse range of amino acids is substituted at every position (e.g., all 20 naturally occurring amino acids). Mutations that include deletion or insertion of one or more amino acids may also be included at variant positions of the affinity maturation libraries. An affinity maturation library may be prepared and screened using any convenient method, e.g., phage display library screening, to identify members of the library having an improved property, e.g., increased binding affinity for a target molecule, protein folding, protease stability, thermostability, compatibility with a pharmaceutical formulation, etc.
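A minimal sketch of the tiling idea, assuming non-overlapping windows of 4 positions over 15 variant positions (the numbers used in the example above) and all 20 amino acids at each varied position. The positions are numbered 1 to 15 purely for illustration; in practice they would be the variant positions of the parent compound, and the helper name `tile_positions` is hypothetical. With 4 of 15 positions varied, 11 (about 73%) are held fixed, within the "70% or more" range given for affinity maturation libraries.

```python
from typing import List


def tile_positions(variant_positions: List[int], window: int) -> List[List[int]]:
    """Split variant positions into consecutive windows of at most `window` positions.

    Each window defines one affinity maturation sub-library; all other variant
    positions are held fixed at the parent residue.
    """
    ordered = sorted(variant_positions)
    return [ordered[i:i + window] for i in range(0, len(ordered), window)]


parent_variant_positions = list(range(1, 16))  # 15 variant positions, numbered for illustration
sublibraries = tile_positions(parent_variant_positions, window=4)
for positions in sublibraries:
    fixed = len(parent_variant_positions) - len(positions)
    theoretical_size = 20 ** len(positions)  # all 20 natural amino acids at each varied position
    print(positions, f"fixed={fixed}", f"variants={theoretical_size}")
```

Each 4-position sub-library has 20⁴ = 160,000 theoretical protein variants, small enough to be covered comfortably by a typical phage library, while the series of windows together touches every variant position.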
In some embodiments, in an affinity maturation library, most or all of the variant amino acid positions in the variable regions of the parent GB1 compound are held as fixed positions, and contiguous mutations are introduced at positions adjacent to these variable regions. Such mutations may be introduced at positions in the parent GB1 compound that were previously considered fixed positions in the original GB1 scaffold. Such mutations may be used to optimize the GB1 compound variants for any desirable property, such as protein folding, protease stability, thermostability, compatibility with a pharmaceutical formulation, etc.
Fusion polypeptides including GB1 peptidic compounds can be displayed on the surface of a cell or virus in a variety of formats and multivalent forms. In one embodiment, a bivalent moiety (for example, a hinge and dimerization sequence from a Fab template, such as an anti-MBP (maltose binding protein) Fab scaffold) is used for displaying GB1 peptidic compound variants on the surface of a phage particle. Optionally, sequences encoding other polypeptide tags useful for purification or detection, such as a FLAG tag, can be fused at the 3′ end of the nucleic acid sequence encoding the GB1 peptidic compound.

Polynucleotide Libraries

Also provided is a library of polynucleotides that encodes a library of GB1 peptidic compounds as described above. In some embodiments, each polynucleotide of the library encodes a distinct GB1 peptidic compound that includes three or more, such as four or more or five or more mutations at non-core positions in a region outside of the β1-β2 region.
In some embodiments, each polynucleotide of the library encodes a GB1 peptidic compound that includes 30 or more, 40 or more, or 50 or more amino acids. In some embodiments, each polynucleotide of the library encodes a GB1 peptidic compound where the compound includes three or more variant amino acids at non-core positions, and where each variant amino acid is encoded by a random codon. In certain embodiments, the random codon is selected from the group consisting of NNK (where N=A, G, C and T, and K=G and T) and KHT (where K=G and T, and H=A, C and T).
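The codon arithmetic behind NNK and KHT can be checked with a short Python sketch that expands each degenerate codon using IUPAC base codes and translates it with the standard genetic code. Under those assumptions, NNK gives 32 codons covering all 20 amino acids plus one stop codon (TAG), while KHT gives 6 codons encoding A, D, F, S, V and Y with no stop codon. The helper names `expand_codon` and `summarize` are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... (bases in TCAG order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC degenerate base codes used in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "H": "ACT"}


def expand_codon(degenerate: str) -> list:
    """List every concrete codon matched by a degenerate codon such as NNK."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate))]


def summarize(degenerate: str) -> tuple:
    """Return (number of codons, sorted amino acids encoded, number of stop codons)."""
    translated = [CODON_TABLE[c] for c in expand_codon(degenerate)]
    amino_acids = sorted(set(translated) - {"*"})
    return len(translated), amino_acids, translated.count("*")


for scheme in ("NNK", "KHT"):
    n_codons, aas, n_stops = summarize(scheme)
    print(scheme, n_codons, "codons,", len(aas), "amino acids,", n_stops, "stop")
```

The trade-off is visible in the output: NNK samples the full amino acid alphabet at the cost of an occasional stop codon and uneven codon redundancy, whereas KHT restricts each position to six chemically diverse residues with no stops.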
In certain embodiments, the subject library of polynucleotides is a library of replicable expression vectors that includes a nucleic acid sequence encoding a gene fusion, where the gene fusion encodes a fusion protein including the GB1 peptidic compound fused to all or a portion of a viral coat protein. Also included is a library of diverse replicable expression vectors comprising a plurality of gene fusions encoding a plurality of different fusion proteins including a plurality of the GB1 peptidic compounds generated with diverse sequences as described above. The vectors can include a variety of components and can be constructed to allow for movement of the GB1 domain between different vectors and/or to provide for display of the fusion proteins in different formats. Examples of vectors include phage vectors and ribosome display vectors. The phage vector has a phage origin of replication allowing phage replication and phage particle formation. In certain embodiments, the phage is a filamentous bacteriophage, such as an M13, f1, fd, Pf3 phage or a derivative thereof, or a lambdoid phage, such as lambda, 21, phi80, phi81, 82, 424, 434, etc., or a derivative thereof.
Any convenient display methods may be used to display GB1 peptidic compounds encoded by the subject library of polynucleotides, such as cell-based display techniques and cell-free display techniques. In certain embodiments, cell-based display techniques include phage display, bacterial display, yeast display and mammalian cell display. In certain embodiments, cell-free display techniques include mRNA display and ribosome display.
In certain embodiments, the library of polynucleotides is a library that encodes 50 or more distinct compounds, such as 100 or more, 300 or more, 1×10³ or more, 1×10⁴ or more, 1×10⁵ or more, 1×10⁶ or more, 1×10⁷ or more, 1×10⁸ or more, 1×10⁹ or more, 1×10¹⁰ or more, 1×10¹¹ or more, or 1×10¹² or more distinct compounds, where each polynucleotide of the library encodes a GB1 peptidic compound that comprises three or more, such as four or more or five or more different non-core mutations at positions in a region outside of the β1-β2 region. In certain embodiments, the library of polynucleotides is a library of replicable expression vectors.
In some embodiments, each polynucleotide of the library encodes a GB1 peptidic compound comprising ten or more variant amino acids at non-core positions, wherein each variant amino acid is encoded by a random codon. In certain embodiments, the random codon is selected from the group consisting of NNK and KHT.
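To relate the number of randomized positions to the library sizes listed above, the sketch below computes the theoretical DNA diversity and the fraction of stop-codon-free sequences for ten randomized positions. The per-codon counts (32 codons with 1 stop for NNK, 6 codons with no stop for KHT) follow from the standard genetic code; the coverage estimate assumes every sequence is equally likely to be cloned, which real libraries only approximate, and the 1×10¹⁰ transformant figure is an illustrative choice.

```python
import math

# Codons per randomized position and how many of them are stop codons,
# obtained by expanding each degenerate codon under the standard genetic code.
CODON_STATS = {"NNK": (32, 1), "KHT": (6, 0)}


def theoretical_dna_diversity(scheme: str, n_positions: int) -> int:
    """Number of distinct DNA sequences across n randomized codons."""
    codons, _ = CODON_STATS[scheme]
    return codons ** n_positions


def stop_free_fraction(scheme: str, n_positions: int) -> float:
    """Fraction of library DNA sequences with no stop codon at any randomized position."""
    codons, stops = CODON_STATS[scheme]
    return ((codons - stops) / codons) ** n_positions


def expected_coverage(library_size: float, diversity: int) -> float:
    """Expected fraction of distinct sequences sampled at least once,
    assuming every sequence is equally likely to be cloned (an idealization)."""
    return 1.0 - math.exp(-library_size / diversity)


for scheme in ("NNK", "KHT"):
    d = theoretical_dna_diversity(scheme, 10)
    print(scheme, f"diversity={d:.3g}",
          f"stop-free={stop_free_fraction(scheme, 10):.3f}",
          f"coverage@1e10={expected_coverage(1e10, d):.3g}")
```

With ten NNK positions the theoretical diversity (32¹⁰, about 1.1×10¹⁵) far exceeds even the largest library sizes listed above, and roughly 27% of clones carry a stop codon, whereas ten KHT positions (6¹⁰, about 6×10⁷ sequences) can in principle be sampled exhaustively.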

Phage Display Libraries

The subject libraries may be prepared using any convenient methods, such as methods that find use in the preparation of libraries of peptidic compounds, for example phage display methods.
In some embodiments, the subject library is a phage display library. A utility of phage display is that large libraries of randomized protein variants can be rapidly and efficiently sorted for those sequences that bind to a target protein. Display of polypeptide libraries on phage may be used for screening for polypeptides with specific binding properties. Polyvalent phage display methods may be used for displaying polypeptides through fusions to either gene III or gene VIII of filamentous phage. Wells and Lowman (1992) Curr. Opin. Struct. Biol. 3:355-362 and references cited therein. In monovalent phage display, a polypeptide library is fused to gene III or a portion thereof and expressed at low levels in the presence of wild-type gene III protein so that phage particles display one copy or none of the fusion proteins. Avidity effects are reduced relative to polyvalent phage so that sorting is on the basis of intrinsic ligand affinity, and phagemid vectors are used, which simplify DNA manipulations. Lowman and Wells (1991) Methods: A companion to Methods in Enzymology 3:205-216. In phage display, the phenotype of the phage particle, including the displayed polypeptide, corresponds to the genotype inside the phage particle, the DNA enclosed by the phage coat proteins.
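The difference between monovalent and polyvalent display can be illustrated with a simple model in which the number of fusion copies per phage follows a Poisson distribution. That distributional assumption and the mean display levels (0.1 and 3 copies per particle) are illustrative and not taken from the text; the point is that at a low mean display level, almost all displaying particles carry a single copy, which is why avidity effects are reduced.

```python
import math


def display_profile(mean_copies: float) -> tuple:
    """Return (P(0 copies), P(1 copy), P(2 or more copies)) under a Poisson model."""
    p0 = math.exp(-mean_copies)
    p1 = mean_copies * p0
    return p0, p1, 1.0 - p0 - p1


for mean in (0.1, 3.0):  # illustrative low and high mean display levels
    p0, p1, p_multi = display_profile(mean)
    displaying = 1.0 - p0
    print(f"mean={mean}: no display={p0:.3f}, "
          f"multivalent among displaying phage={p_multi / displaying:.3f}")
```

At a mean of 0.1 copies, about 90% of particles display nothing and fewer than 5% of the displaying particles are multivalent; at a mean of 3 copies, most displaying particles carry two or more copies.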
In some embodiments, each GB1 peptidic compound of a subject library is fused to at least a portion of a viral coat protein. Examples of viral coat proteins include infectivity protein PIII, major coat protein PVIII, p3, Soc, Hoc, gpD (of bacteriophage lambda), minor bacteriophage coat protein 6 (pVI) (filamentous phage; J. Immunol. Methods, 1999, 231(1-2):39-51), variants of the M13 bacteriophage major coat protein (P8) (Protein Sci 2000 April; 9(4):647-54). The fusion protein can be displayed on the surface of a phage and suitable phage systems include M13KO7 helper phage, M13R408, M13-VCS, and Phi X 174, pJuFo phage system (J. Virol. 2001 August; 75(15):7107-13), hyperphage (Nat. Biotechnol. 2001 January; 19(1):75-8). In certain embodiments, the helper phage is M13KO7, and the coat protein is the M13 Phage gene III coat protein. In certain embodiments, the host is E. coli or protease deficient strains of E. coli. Vectors, such as the fth1 vector (Nucleic Acids Res. 2001 May 15; 29(10):E50-0) can be useful for the expression of the fusion protein.

Display of Fusion Polypeptides

Any convenient methods for displaying fusion polypeptides including GB1 peptidic compounds on the surface of bacteriophage may be used, for example, methods as described in patent publication numbers WO 92/01047, WO 92/20791, WO 93/06213, WO 93/11236 and WO 93/19172.
The expression vector also can have a secretory signal sequence fused to the DNA encoding each GB1 peptidic compound. This sequence may be located immediately 5′ to the gene encoding the fusion protein, and will thus be translated at the amino terminus of the fusion protein. However, in certain cases, the signal sequence has been demonstrated to be located at positions other than 5′ to the gene encoding the protein to be secreted. This sequence targets the protein to which it is attached across the inner membrane of the bacterial cell. The DNA encoding the signal sequence may be obtained as a restriction endonuclease fragment from any gene encoding a protein that has a signal sequence. Suitable prokaryotic signal sequences may be obtained from genes encoding, for example, LamB or OmpF (Wong et al., Gene 68:1931 (1983)), MalE, PhoA and other genes. Prokaryotic signal sequences for practicing this invention include the E. coli heat-stable enterotoxin II (STII) signal sequence as described by Chang et al., Gene 55:189 (1987), and the malE signal sequence.
The vector may also include a promoter to drive expression of the fusion protein. Promoters most commonly used in prokaryotic vectors include the lacZ promoter system, the alkaline phosphatase phoA promoter, the bacteriophage λ PL promoter (a temperature-sensitive promoter), the tac promoter (a hybrid trp-lac promoter that is regulated by the lac repressor), the tryptophan promoter, and the bacteriophage T7 promoter. While these are the most commonly used promoters, other suitable microbial promoters may be used as well.
The vector can also include other nucleic acid sequences, for example, sequences encoding gD tags, c-Myc epitopes, FLAG tags, poly-histidine tags, fluorescence proteins (e.g., GFP), or beta-galactosidase protein which can be useful for detection or purification of the fusion protein expressed on the surface of the phage or cell. Nucleic acid sequences encoding, for example, a gD tag, also provide for positive or negative selection of cells or virus expressing the fusion protein. In some embodiments, the gD tag is fused to a GB1 peptidic compound which is not fused to the viral coat protein. Nucleic acid sequences encoding, for example, a polyhistidine tag, are useful for identifying fusion proteins including GB1 peptidic compounds that bind to a specific target using immunohistochemistry. Tags useful for detection of target binding can be fused to either a GB1 peptidic compound not fused to a viral coat protein or a GB1 peptidic compound fused to a viral coat protein.
Other useful components of the vectors used to practice this invention are phenotypic selection genes. The phenotypic selection genes are those encoding proteins that confer antibiotic resistance upon the host cell. By way of illustration, the ampicillin resistance gene (ampr) and the tetracycline resistance gene (tetr) are readily employed for this purpose.
The vector can also include nucleic acid sequences containing unique restriction sites and suppressible stop codons. The unique restriction sites are useful for moving GB1 peptidic compound domains between different vectors and expression systems. The suppressible stop codons are useful to control the level of expression of the fusion protein and to facilitate purification of GB1 peptidic compounds. For example, an amber stop codon can be read as Gln in a supE host to enable phage display, while in a non-supE host it is read as a stop codon to produce soluble GB1 peptidic compounds without fusion to phage coat proteins. These synthetic sequences can be fused to GB1 peptidic compounds in the vector.
In some cases, vector systems that allow the nucleic acid encoding a GB1 peptidic compound of interest to be easily removed from the vector system and placed into another vector system, may be used. For example, appropriate restriction sites can be engineered in a vector system to facilitate the removal of the nucleic acid sequence encoding the GB1 peptidic compounds. The restriction sequences are usually chosen to be unique in the vectors to facilitate efficient excision and ligation into new vectors. GB1 peptidic compound domains can then be expressed from vectors without extraneous fusion sequences, such as viral coat proteins or other sequence tags.
Between the nucleic acid encoding a GB1 peptidic compound (gene 1) and the viral coat protein (gene 2), DNA encoding a termination codon may be inserted; such termination codons include UAG (amber), UAA (ochre) and UGA (opal) (Microbiology, Davis et al., Harper & Row, New York, 1980, pp. 237, 245-47 and 374). The termination codon expressed in a wild-type host cell results in the synthesis of the gene 1 protein product without the gene 2 protein attached. However, growth in a suppressor host cell results in the synthesis of detectable quantities of fused protein. Such suppressor host cells are well known, e.g., the E. coli suppressor strain described by Bullock et al., BioTechniques 5:376-379 (1987). Any acceptable method may be used to place such a termination codon into the mRNA encoding the fusion polypeptide.
The suppressible codon may be inserted between the first gene, encoding the GB1 peptidic compound, and a second gene, encoding at least a portion of a phage coat protein. Alternatively, the suppressible termination codon may be inserted adjacent to the fusion site by replacing the last amino acid triplet in the GB1 peptidic compound or the first amino acid triplet in the phage coat protein. When the plasmid containing the suppressible codon is grown in a suppressor host cell, it results in the detectable production of a fusion polypeptide containing the polypeptide and the coat protein. When the plasmid is grown in a non-suppressor host cell, the GB1 peptidic compound domain is synthesized substantially without fusion to the phage coat protein due to termination at the inserted suppressible triplet UAG, UAA, or UGA. In the non-suppressor cell, the GB1 peptidic compound domain is synthesized and secreted from the host cell because it lacks the fused phage coat protein that would otherwise anchor it to the host membrane.
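The effect of an amber codon at the fusion junction can be illustrated by translating a toy construct with and without amber suppression. The sketch assumes, as in a supE host, that only the amber codon (UAG, written TAG in DNA) is read as Gln, while UAA and UGA still terminate translation; the 15-nucleotide construct standing in for the GB1 gene, the junction and the coat protein gene is hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... (bases in TCAG order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf: str, suppressor_host: bool) -> str:
    """Translate an ORF up to the first stop codon that is not suppressed.

    In a supE-like suppressor host, the amber codon TAG is read as Gln (Q);
    TAA (ochre) and TGA (opal) still terminate translation.
    """
    protein = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon == "TAG" and suppressor_host:
            protein.append("Q")
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy construct: "GB1 gene" (ATG GCT) + amber codon + "coat protein gene" (GGT) + ochre stop.
construct = "ATGGCT" + "TAG" + "GGT" + "TAA"
print(translate(construct, suppressor_host=True))   # MAQG: read-through gives the fusion
print(translate(construct, suppressor_host=False))  # MA: soluble product without coat protein
```

The same DNA therefore yields a displayed fusion in a suppressor strain and a free GB1 peptidic compound in a non-suppressor strain, without any recloning.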

Methods of Screening

Also provided are methods of screening libraries of the compounds, e.g., as described above, for binding to a target protein. In addition, the libraries may be selected for improved binding affinity to a certain target protein, e.g., as described above, for the preparation and screening of affinity maturation libraries. The target proteins may include any type of protein of interest in research or therapeutic applications. Aspects of these screening methods may include determining whether a compound of the subject libraries specifically binds to a target protein of interest. Screening methods may include screening for inhibition of a biological activity. Such methods may include: (i) contacting a sample containing a target protein with a library of the invention; and (ii) determining whether a compound of the library specifically binds to the target protein.
The determining step may be carried out by any one or more of a variety of protocols for characterizing specific binding or the inhibition of binding.
For example, screening may be a cell-based assay, an enzyme assay, an ELISA or another related biological assay for assessing specific binding or the inhibition of binding; determining or assessment steps suitable for such assays are well known and involve routine protocols.
Screening may also include in silico methods, in which one or more physical and/or chemical attributes of compounds of the library of interest are expressed in a computer-readable format and evaluated by any one or more of a variety of molecular modeling and/or analysis programs and algorithms suitable for this purpose. In some embodiments, the in silico method includes inputting one or more parameters related to the D-target protein, such as but not limited to, the three-dimensional coordinates of a known X-ray crystal structure of the D-target protein. In some embodiments, the in silico method includes inputting one or more parameters related to the compounds of the L-peptidic library, such as but not limited to, the three-dimensional coordinates of a known X-ray crystal structure of a parent scaffold domain of the library. In some instances, the in silico method includes generating one or more parameters for each compound in a peptidic library in a computer readable format, and evaluating the capabilities of the compounds to specifically bind to the target protein. The in silico methods include, but are not limited to, molecular modelling studies, biomolecular docking experiments, and virtual representations of molecular structures and/or processes, such as molecular interactions. The in silico methods may be performed as a pre-screen (e.g., prior to preparing a L-peptidic library and performing in vitro screening), or as a validation of binding compounds identified after in vitro screening.
Thus the screening methods of the invention can be carried out in vitro or in vivo. For example, when the compound is in a cell, the cell may be in vitro or in vivo, and the determining of whether the compound is capable of specifically binding to a target protein in the cell includes: (i) contacting the cell with a library of the invention; and (ii) assessing whether a compound of the library specifically binds to the target protein.
As such, determining whether a GB1 peptidic compound of a subject library is capable of specifically binding a target protein may be carried out by any number of methods, as well as combinations thereof.
In some embodiments, the subject method includes:
(a) contacting a target protein with a library including 50 or more distinct GB1 peptidic compounds, where each compound includes a β1-β2 region and three or more, such as four or more or five or more mutations at non-core positions in a region outside of the β1-β2 region; and
(b) identifying a compound of the library that specifically binds to the target protein.
In some embodiments, in the subject method, the target protein is a D-protein. In some embodiments, in the subject method, the target protein is an L-protein.

Phage Display Screening Methods

Screening for the ability of a fusion polypeptide including a GB1 peptidic compound of a subject library to bind a target molecule can also be performed in solution phase. For example, a target protein can be labeled with a detectable moiety, such as biotin. Phage that bind to the target molecule in solution can be separated from unbound phage by a molecule that binds to the detectable moiety, such as streptavidin-coated beads where biotin is the detectable moiety. The affinity of binders (GB1 peptidic compound fusions that bind to the target protein) can be estimated from the concentration of target protein used, using any convenient formulas and criteria.
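How target concentration shapes solution-phase selection can be sketched with the standard equilibrium expression for 1:1 binding, fraction bound = [T]/([T] + Kd), assuming target is in excess so that its free concentration is roughly its total concentration. The Kd values and concentrations below are illustrative, and `fraction_bound` is a hypothetical helper name; the calculation shows that lowering the target concentration increases the discrimination between a tight (1 nM) and a weaker (100 nM) binder.

```python
def fraction_bound(kd_nM: float, target_nM: float) -> float:
    """Equilibrium fraction of a displayed binder occupied by target.

    Assumes 1:1 binding and free target concentration ~ total target (target in excess).
    """
    return target_nM / (target_nM + kd_nM)


for target in (100.0, 1.0):  # e.g., an early round versus a later, more stringent round
    tight = fraction_bound(1.0, target)
    weak = fraction_bound(100.0, target)
    print(f"[target]={target} nM: Kd 1 nM bound={tight:.3f}, "
          f"Kd 100 nM bound={weak:.3f}, ratio={tight / weak:.1f}")
```

At 100 nM target both binders are substantially occupied (ratio about 2), whereas at 1 nM target the tight binder is captured about 50 times more efficiently, which is why reducing target concentration across rounds favors higher-affinity binders.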
In some embodiments, the target protein may be attached to a suitable matrix such as agarose beads, acrylamide beads, glass beads, cellulose, various acrylic copolymers, hydroxyalkyl methacrylate gels, polyacrylic and polymethacrylic copolymers, nylon, neutral and ionic carriers, and the like. Attachment of the target protein to the matrix may be accomplished by any convenient methods, e.g., methods as described in Methods in Enzymology, 44 (1976). After attachment of the target protein to the matrix, the immobilized target is contacted with the library expressing the GB1 peptidic compound containing fusion polypeptides under conditions suitable for binding of at least a portion of the phage particles with the immobilized target. In some instances, the conditions, including pH, ionic strength, temperature and the like will mimic physiological conditions. Bound particles (“binders”) to the immobilized target are separated from those particles that do not bind to the target by washing. Wash conditions can be adjusted to result in removal of all but the higher affinity binders. Binders may be dissociated from the immobilized target by a variety of methods. These methods include competitive dissociation using the wild-type ligand, altering pH and/or ionic strength, and methods known in the art. Selection of binders may involve elution from an affinity matrix with a ligand. Elution with increasing concentrations of ligand should elute displayed binding GB1 peptidic compounds of increasing affinity.
The binders can be isolated and then reamplified or expressed in a host cell and subjected to another round of selection for binding of target molecules. Any number of rounds of selection or sorting can be utilized. One of the selection or sorting procedures can involve isolating binders that bind to an antibody to a polypeptide tag such as antibodies to the gD protein, FLAG or polyhistidine tags. Another selection or sorting procedure can involve multiple rounds of sorting for stability, such as binding to a target protein that specifically binds to folded GB1 peptidic compound containing polypeptide and does not bind to unfolded polypeptide followed by selecting or sorting the stable binders for binding to a target protein.
In some cases, suitable host cells are infected with the binders and helper phage, and the host cells are cultured under conditions suitable for amplification of the phagemid particles. The phagemid particles are then collected and the selection process is repeated one or more times until binders having the desired affinity for the target molecule are selected. In certain embodiments, two or more rounds of selection are conducted.
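The effect of repeated rounds can be sketched with a simple model in which each round recovers a fixed fraction of binders and a smaller fixed fraction of non-binding background, and amplification does not change the relative proportions. The starting frequency and recovery values below, and the helper name `enrich`, are hypothetical; with them, the binder fraction rises from one in a million to roughly one half after two rounds and above 99% after three.

```python
def enrich(binder_fraction: float, binder_recovery: float,
           background_recovery: float, rounds: int) -> list:
    """Binder fraction after each round of selection, assuming unbiased amplification."""
    history = []
    f = binder_fraction
    for _ in range(rounds):
        bound = f * binder_recovery                     # binders retained after washing
        background = (1.0 - f) * background_recovery    # non-specific carry-over
        f = bound / (bound + background)
        history.append(f)
    return history


# Hypothetical numbers: 1 binder per million clones, 10% binder recovery, 0.01% background.
for round_number, f in enumerate(enrich(1e-6, 0.1, 1e-4, rounds=4), start=1):
    print(f"round {round_number}: binder fraction = {f:.3g}")
```

Under this model each round multiplies the binder-to-background ratio by the recovery ratio (here 1000), so the number of rounds needed depends mainly on the starting frequency and the stringency of washing.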
After binders are identified by binding to the target protein, the nucleic acid can be extracted. Extracted DNA can then be used directly to transform E. coli host cells or alternatively, the encoding sequences can be amplified, for example using PCR with suitable primers, and then inserted into a vector for expression.
Any convenient strategy may be used to select for high affinity binders to a target protein. In certain embodiments, the process of screening is carried out by automated systems to allow for high-throughput screening of library candidates.
In certain embodiments, compounds of the subject peptidic library specifically bind to a target protein with high affinity, e.g., as determined by an SPR binding assay or an ELISA assay. The compounds of the subject peptidic library may exhibit an affinity for a target protein (expressed as a dissociation constant, KD) of 1 μM or less, such as 300 nM or less, 100 nM or less, 30 nM or less, 10 nM or less, 5 nM or less, 2 nM or less, 1 nM or less, 300 pM or less, or even less. The compounds of the subject peptidic libraries may exhibit a specificity for a target protein, e.g., as determined by comparing the affinity of the compound for the target protein with that for a reference protein (e.g., an albumin protein), of 5:1 or more, 10:1 or more, such as 30:1 or more, 100:1 or more, 300:1 or more, 1000:1 or more, or even more.
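The affinity and specificity cutoffs above can be applied to measured values as sketched below. This is an illustrative helper, not a procedure from the source; it assumes affinity is expressed as a dissociation constant (KD, lower is tighter) and computes specificity as KD(reference) / KD(target), which equals the ratio of affinity for the target to affinity for the reference. Function names and example values are hypothetical.

```python
from typing import Optional

# Cutoffs listed in the text, expressed in nM (1 uM = 1000 nM, 300 pM = 0.3 nM).
AFFINITY_CUTOFFS_NM = [1000.0, 300.0, 100.0, 30.0, 10.0, 5.0, 2.0, 1.0, 0.3]
SPECIFICITY_CUTOFFS = [5.0, 10.0, 30.0, 100.0, 300.0, 1000.0]


def tightest_affinity_cutoff(kd_nm: float) -> Optional[float]:
    """Smallest listed KD cutoff (nM) that the measured KD satisfies, if any."""
    met = [c for c in AFFINITY_CUTOFFS_NM if kd_nm <= c]
    return min(met) if met else None


def specificity_ratio(kd_target_nm: float, kd_reference_nm: float) -> float:
    """Affinity for the target relative to the reference, i.e. KD(ref) / KD(target)."""
    return kd_reference_nm / kd_target_nm


def highest_specificity_cutoff(ratio: float) -> Optional[float]:
    """Largest listed specificity cutoff (X in X:1) that the ratio meets, if any."""
    met = [c for c in SPECIFICITY_CUTOFFS if ratio >= c]
    return max(met) if met else None


if __name__ == "__main__":
    # Hypothetical compound: KD of 8 nM for the target and 4 uM for albumin.
    ratio = specificity_ratio(8.0, 4000.0)
    print(tightest_affinity_cutoff(8.0), ratio, highest_specificity_cutoff(ratio))
```

For the hypothetical compound this prints `10.0 500.0 300.0`: the KD falls in the "10 nM or less" tier and the 500:1 ratio meets the "300:1 or more" tier.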

Target Molecules

Once the subject libraries are prepared, they can be selected and/or screened for binding to one or more target molecules. In addition, the libraries may be selected for improved binding affinity to a particular target molecule. The target molecules may be any type of protein-binding or antigenic molecule, such as proteins, nucleic acids, carbohydrates or small molecules. In certain embodiments, the target molecule is a therapeutic target molecule or a diagnostic target molecule, or a fragment or mimic thereof.
In certain embodiments, the target molecule is a hormone, a growth factor, a receptor, an enzyme, a cytokine, an osteoinductive factor, a colony stimulating factor or an immunoglobulin.
In certain embodiments, the target molecule may be one or more of the following: growth hormone, bovine growth hormone, insulin-like growth factors, human growth hormone including n-methionyl human growth hormone, parathyroid hormone, thyroxine, insulin, proinsulin, amylin, relaxin, prorelaxin, glycoprotein hormones such as follicle stimulating hormone (FSH), luteinizing hormone (LH), hematopoietic growth factor, Her-2, fibroblast growth factor, prolactin, placental lactogen, tumor necrosis factors, mullerian inhibiting substance, mouse gonadotropin-associated polypeptide, inhibin, activin, vascular endothelial growth factors, integrin, nerve growth factors such as NGF-beta, insulin-like growth factor-I and II, erythropoietin, osteoinductive factors, interferons, colony stimulating factors, interleukins (e.g., an IL-4 or an IL-8 protein), bone morphogenetic proteins, LIF, SCF, FLT-3 ligand, kit-ligand, SH3 domain, apoptosis protein, hepatocyte growth factor, hepatocyte growth factor receptor, neutravidin, maltose binding protein, angiostatin, aFGF, bFGF, TGF-alpha, TGF-beta, HGF, TNF-alpha, angiogenin, IL-8, thrombospondin, the 16-kilodalton N-terminal fragment of prolactin, and endostatin.
In certain embodiments, the target molecule may be a therapeutic target protein for which structural information is known, such as, but not limited to: Raf kinase (a target for the treatment of melanoma), Rho kinase (a target in the prevention of pathogenesis of cardiovascular disease), nuclear factor kappaB (NF-κB, a target for the treatment of multiple myeloma), vascular endothelial growth factor (VEGF) receptor kinase (a target for action of anti-angiogenic drugs), Janus kinase 3 (JAK-3, a target for the treatment of rheumatoid arthritis), cyclin dependent kinase (CDK) 2 (CDK2, a target for prevention of stroke), FMS-like tyrosine kinase (FLT) 3 (FLT-3; a target for the treatment of acute myelogenous leukemia (AML)), epidermal growth factor receptor (EGFR) kinase (a target for the treatment of cancer), protein kinase A (PKA, a therapeutic target in the prevention of cardiovascular disease), p21-activated kinase (a target for the treatment of breast cancer), mitogen-activated protein kinase (MAPK, a target for the treatment of cancer and arthritis), c-Jun NH₂-terminal kinase (JNK, a target for treatment of diabetes), AMP-activated kinase (AMPK, a target for prevention and treatment of insulin resistance), lck kinase (a target for immunosuppression), phosphodiesterase PDE4 (a target in treatment of inflammatory diseases such as rheumatoid arthritis and asthma), Abl kinase (a target in treatment of chronic myeloid leukemia (CML)), phosphodiesterase PDE5 (a target in treatment of erectile dysfunction), a disintegrin and metalloproteinase 33 (ADAM33, a target for the treatment of asthma), human immunodeficiency virus (HIV)-1 protease and HIV integrase (targets for the treatment of HIV infection), respiratory syncytial virus (RSV) integrase (a target for the treatment of infection with RSV), X-linked inhibitor of apoptosis (XIAP, a target for the treatment of neurodegenerative disease and ischemic injury), thrombin (a therapeutic target in the treatment and prevention of thromboembolic disorders), tissue type plasminogen activator (a target in prevention of neuronal death after injury of central nervous system), matrix metalloproteinases (targets of anti-cancer agents preventing angiogenesis), beta secretase (a target for the treatment of Alzheimer's disease), src kinase (a target for the treatment of cancer), fyn kinase, lyn kinase, zeta-chain associated protein 70 (ZAP-70) protein tyrosine kinase, extracellular signal-regulated kinase 1 (ERK-1), p38 MAPK, CDK4, CDK5, glycogen synthase kinase 3 (GSK-3), KIT tyrosine kinase, FLT-1, FLT-4, kinase insert domain-containing receptor (KDR) kinase, and cancer osaka thyroid (COT) kinase.
In certain embodiments, the target molecule is a target protein that is selected from the group consisting of a VEGF protein, a RANKL protein, a NGF protein, a TNF-alpha protein, a SH2 domain containing protein, a SH3 domain containing protein, an IgE protein, a BLyS protein (Oren et al., “Structural basis of BLyS receptor recognition”, Nature Structural Biology 9, 288-292, 2002), a PCSK9 protein (Ni et al., “A proprotein convertase subtilisin-like/kexin type 9 (PCSK9) C-terminal domain antibody antigen-binding fragment inhibits PCSK9 internalization and restores low density lipoprotein uptake”, J. Biol. Chem. 2010 Apr. 23; 285(17):12882-91), a DLL4 protein (Garber, “Targeting Vessel Abnormalization in Cancer”, JNCI Journal of the National Cancer Institute 2007 99(17):1284-1285), an Ang2 (Angiopoietin-2) protein, a Clostridium difficile Toxin A or B protein (e.g., Ho et al., “Crystal structure of receptor-binding C-terminal repeats from Clostridium difficile toxin A”, (2005) Proc. Natl. Acad. Sci. USA 102: 18373-18378), a CTLA4 protein (Cytotoxic T-Lymphocyte Antigen 4), and fragments thereof. In certain embodiments, the target protein is a VEGF protein. In certain embodiments, the target protein is a SH2 domain containing protein (e.g., a 3BP2 protein) or a SH3 domain containing protein (e.g., an ABL or a Src protein).

Utility

The libraries of the invention, e.g., as described above, find use in a variety of applications. Applications of interest include, but are not limited to, screening applications and research applications.
The screening methods, e.g., as described above, find use in a variety of applications, including selection and/or screening of the subject libraries in a wide range of research and therapeutic applications, such as therapeutic lead identification and affinity maturation, identification of diagnostic reagents, development of high throughput screening assays, development of drug delivery systems for the delivery of toxins or other therapeutic moieties. The subject screening methods may be exploited in multiple settings.
In some cases, the subject libraries may find use as research tools to analyze the roles of proteins of interest in modulating various biological processes, e.g., angiogenesis, inflammation, cellular growth, metabolism, regulation of transcription and regulation of phosphorylation. For example, antibody libraries have been useful tools in many such areas of biological research and have led to the development of effective therapeutic agents; see Sidhu and Fellouse, “Synthetic therapeutic antibodies,” Nature Chemical Biology, 2006, 2(12), 682-688.
The subject libraries may be exploited as research tools in the development of clinical diagnostics, e.g., in vitro diagnostics (e.g., for targeting various biomarkers), or in vivo tumor imaging agents. The screening of libraries of binding molecules (e.g., aptamers and antibodies) has found use in the development of such clinical diagnostics, see for example, Jayasena, “Aptamers: An Emerging Class of Molecules That Rival Antibodies in Diagnostics,” Clinical Chemistry. 1999; 45:1628-1650.
The following examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL

1. Phage Display of GB1 Peptidic Libraries

1.1 Cloning

The wild-type sequence of the Protein G B1 domain (Gronenborn et al., Science 253, 657-61, 1991) was prepared (Genscript USA Inc.) with an N-terminal FLAG tag and a C-terminal 10×His tag, each separated from the GB1 domain by a Gly-Gly-Ser linker, as shown below:
DYKDDDDK-GGS-TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE-GGS-HHHHHHHHHH-amber stop (SEQ ID NO: 44)
This sequence was synthesized with NcoI and XbaI restriction sites at the 5′ and 3′ ends, respectively, and cloned into a display vector as an N-terminal fusion to a truncated protein 3 (p3) of M13 filamentous phage. Features of the vector include a ptac promoter and an StII secretion leader sequence (MKKNIAFLLASMFVFSIATNAYA; SEQ ID NO: 45). Because of the amber stop codon, this construct allows display of GB1 on phage in amber suppressor bacterial strains and is useful for expression of the free protein in non-suppressor strains.

1.2 Optimization of Phage Display Levels

The presence of the His tag and amber stop codon at the C-terminus of the protein allows proteins and mutants to be purified without additional mutagenesis. In addition, to optimize display of GB1 peptidic compounds, two additional constructs were tested for GB1 display levels: (i) a construct lacking the His tag and amber stop, and (ii) a construct carrying a hinge and dimerization sequence derived from a Fab template (DKTHTCGRP; SEQ ID NO: 46) for dimeric display.
The following oligonucleotides were prepared (Integrated DNA Technologies Inc.) for site-directed mutagenesis:

i) For removal of the 10×His tag and amber stop:
5′-GTT ACC GAA GGC GGT TCT TCT AGA AGT GGT TCC GGT-3′	SEQ ID NO: 47
V T E G G S S R S G S G	SEQ ID NO: 48

ii) For insertion of the Fab dimerization sequence in place of the His tag and amber stop:
5′-TT ACC GAA GGC GGT TCT GAC AAA ACT CAC ACA TGC GGC CGG CCC AGT GGT TCC GGT GAT T-3′	SEQ ID NO: 49
V T E G G S D K T H T C G R P S G S G D F	SEQ ID NO: 50
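The translations listed under each mutagenic oligonucleotide can be checked by translating the DNA with the standard genetic code. The sketch below is an illustrative helper (the function name is hypothetical); it ignores spaces and the 5′/3′ labels and assumes reading frame 0 unless another frame is given.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(oligo: str, frame: int = 0) -> str:
    """Translate an oligonucleotide written as in the text (spaces and 5'/3' labels allowed)."""
    dna = "".join(ch for ch in oligo.upper() if ch in "ACGT")
    dna = dna[frame:]
    # Translate complete codons only; a trailing partial codon is ignored.
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    oligo_i = "5′-GTT ACC GAA GGC GGT TCT TCT AGA AGT GGT TCC GGT-3′"  # SEQ ID NO: 47
    print(translate(oligo_i))  # matches SEQ ID NO: 48
```

Oligonucleotide ii) begins and ends with partial codons, so `translate(oligo_ii, frame=2)` returns only its internal residues, TEGGSDKTHTCGRPSGSGD, of SEQ ID NO: 50.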
Site-directed mutagenesis was performed by the method of Kunkel et al. (Methods Enzymol., 1987, 154, 367-82), and the sequences were confirmed by DNA sequencing. To compare display levels, phage for each construct were harvested from a 25 mL overnight culture as described previously (Fellouse & Sidhu, “Making antibodies in bacteria. Making and using antibodies” Howard & Kaser, Eds., CRC Press, Boca Raton, Fla., 2007). Phage concentrations were estimated spectrophotometrically (OD₂₆₈ = 1 for 5×10¹² phage/ml) and normalized to the lowest concentration. Three-fold serial dilutions of phage for each construct were prepared and added to NUNC Maxisorp plates previously coated with anti-FLAG antibody (5 μg/ml) and blocked with BSA (0.2% BSA in PBS). The plates were washed, and bound phage were detected with anti-M13-HRP. The HRP signal was plotted as a function of phage concentration.
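The concentration estimate, normalization and dilution series described above reduce to simple arithmetic. This is a minimal sketch using the conversion stated in the text (OD₂₆₈ = 1 for 5×10¹² phage/ml); the construct names and absorbance readings in the example are hypothetical.

```python
PHAGE_PER_ML_AT_OD1 = 5e12  # conversion given in the text: OD268 = 1 for 5x10^12 phage/ml


def phage_per_ml(od268: float) -> float:
    """Estimate phage concentration from absorbance at 268 nm."""
    return od268 * PHAGE_PER_ML_AT_OD1


def fold_dilution_to_lowest(concs: dict[str, float]) -> dict[str, float]:
    """Fold dilution that brings each stock down to the lowest concentration."""
    lowest = min(concs.values())
    return {name: c / lowest for name, c in concs.items()}


def threefold_series(start: float, points: int) -> list[float]:
    """Concentrations in a three-fold serial dilution starting at `start`."""
    return [start / 3 ** i for i in range(points)]


if __name__ == "__main__":
    readings = {"his_tag": 0.8, "no_his_tag": 0.4, "fab_dimer": 0.2}  # hypothetical OD268
    concs = {name: phage_per_ml(od) for name, od in readings.items()}
    print(fold_dilution_to_lowest(concs))
    print(threefold_series(min(concs.values()), 4))
```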

2 Preparation of GB1 Loop Libraries

The mutational and insertion tolerance of the GB1 loops was tested by randomizing the loops and beta-turns and selecting for stably folded proteins. Loop lengths were varied from 4 to 6 residues, and each randomized position was encoded by an NNK codon. The beta-turn and loop regions of GB1 are indicated below, with positions numbered according to SEQ ID NO: 1:

(SEQ ID NO: 1)
TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE
B1: positions 8-11 (GKTL); L1: positions 18-21 (EAVD); L2: positions 37-40 (GVDG); B2: positions 46-49 (DATK)

Regions B1 and L2 are adjacent in the folded structure, as are regions L1 and B2, and these loop/turn regions were randomized together to produce libraries for screening. Site-directed mutagenesis (Kunkel 1987) was first used to introduce triple stop codons (TAA TAA TAA) into each loop pair. Because the wild-type protein is more stable, any wild-type template carried through mutagenesis would have a selective advantage over the rest of the library; using stop templates ensures that unmutated template encodes a truncated protein that is not displayed. The following oligonucleotides were used to make the stop templates (Integrated DNA Technologies, Inc.):
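What an NNK codon encodes can be checked by expanding it with the IUPAC nucleotide codes and translating each member. This sketch (helper names are hypothetical) shows that NNK comprises 32 codons covering all 20 amino acids plus a single stop codon (TAG), and estimates the fraction of randomized loops of a given length that contain no stop codon, assuming all NNK codons are equally represented.

```python
from collections import Counter
from itertools import product

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
         "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
         "V": "ACG", "N": "ACGT"}
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand(degenerate: str) -> list[str]:
    """All concrete codons represented by a degenerate codon such as 'NNK'."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]


def residue_counts(degenerate: str) -> Counter:
    """How many member codons encode each residue ('*' = stop)."""
    return Counter(CODON_TABLE[c] for c in expand(degenerate))


def stop_free_fraction(degenerate: str, loop_length: int) -> float:
    """Fraction of loops with no stop codon, assuming equal codon frequencies."""
    counts = residue_counts(degenerate)
    total = sum(counts.values())
    return ((total - counts["*"]) / total) ** loop_length


if __name__ == "__main__":
    nnk = residue_counts("NNK")
    print(len(expand("NNK")), len(nnk) - 1, nnk["*"])  # codons, amino acids, stops
    for n in (4, 5, 6):
        print(n, round(stop_free_fraction("NNK", n), 3))
```

Under the equal-frequency assumption, roughly 0.88, 0.85 and 0.83 of 4-, 5- and 6-residue NNK loops are free of stop codons.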

B1-L2 stop template
SEQ ID NO: 51
5′- TAC AAA CTG ATT CTG AAC TAA TAA TAA AAA GGT GAA ACC ACG AC-3′ (For B1)

SEQ ID NO: 52
5′- G TAC GCC AAC GAT AAT TAA TAA TAA GAA TGG ACC TAC GAT G-3′ (For L2)

L1-B2 stop template
SEQ ID NO: 53
5′- GGT GAA ACC ACG ACC TAA TAA TAA GCA GCA ACG GCA GAA AAA-3′ (For L1)

SEQ ID NO: 54
5′- GT GAA TGG ACC TAC GAT TAA TAA TAA ACC TTC ACG GTT ACC G-3′ (For B2)

These stop templates were then mutated to construct the loop libraries using methods described in previous protocols (Kunkel 1987). The following oligonucleotides were used for randomization (Integrated DNA Technologies, Inc.):

Library B1-L2
SEQ ID NO: 55
5′- TAC AAA CTG ATT CTG AAC NNK NNK NNK NNK AAA GGT GAA ACC ACG AC-3′

SEQ ID NO: 56
5′- TAC AAA CTG ATT CTG AAC NNK NNK NNK NNK NNK AAA GGT GAA ACC ACG AC-3′

SEQ ID NO: 57
5′- TAC AAA CTG ATT CTG AAC NNK NNK NNK NNK NNK NNK AAA GGT GAA ACC ACG AC-3′

SEQ ID NO: 58
5′- G TAC GCC AAC GAT AAT NNK NNK NNK NNK GAA TGG ACC TAC GAT G-3′

SEQ ID NO: 59
5′- G TAC GCC AAC GAT AAT NNK NNK NNK NNK NNK GAA TGG ACC TAC GAT G-3′

SEQ ID NO: 60
5′- G TAC GCC AAC GAT AAT NNK NNK NNK NNK NNK NNK GAA TGG ACC TAC GAT G-3′

Library L1-B2
SEQ ID NO: 61
5′- GGT GAA ACC ACG ACC NNK NNK NNK NNK GCA GCA ACG GCA GAA AAA-3′

SEQ ID NO: 62
5′- GGT GAA ACC ACG ACC NNK NNK NNK NNK NNK GCA GCA ACG GCA GAA AAA-3′

SEQ ID NO: 63
5′- GGT GAA ACC ACG ACC NNK NNK NNK NNK NNK NNK GCA GCA ACG GCA GAA AAA-3′

SEQ ID NO: 64
5′- GT GAA TGG ACC TAC GAT NNK NNK NNK NNK ACC TTC ACG GTT ACC G-3′

SEQ ID NO: 65
5′- GT GAA TGG ACC TAC GAT NNK NNK NNK NNK NNK ACC TTC ACG GTT ACC G-3′

SEQ ID NO: 66
5′- GT GAA TGG ACC TAC GAT NNK NNK NNK NNK NNK NNK ACC TTC ACG GTT ACC G-3′

The number of transformants was 1×10⁹ for Library B1-L2 and 1×10¹⁰ for Library L1-B2. Selections were performed using the methods described below, except that the library was added directly to selection wells coated with anti-FLAG antibody (5 μg/ml diluted in PBT) and there was no preincubation step. Selections on anti-FLAG were performed to identify folded variants, because misfolded proteins are cleaved and thereby lose the N-terminal FLAG tag. Only three rounds of selection (8 washes per round) were performed, as good enrichment was observed in the pool ELISA at Rounds 2 and 3.
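To put the transformant numbers in context, the theoretical codon-level diversity of a loop-pair library can be compared with the number of transformants. This is a rough sketch under assumptions the source does not state: the 4-, 5- and 6-residue oligonucleotides for each loop are combined independently across the two loops, every NNK codon sequence is counted separately, and all variants are equally likely, so a Poisson approximation gives the chance that a given variant is present. Function names are hypothetical.

```python
import math


def codon_diversity(codons_per_position: int, loop_lengths) -> int:
    """Number of distinct codon sequences for one loop over the allowed lengths."""
    return sum(codons_per_position ** n for n in loop_lengths)


def fraction_sampled(diversity: float, transformants: float) -> float:
    """Poisson estimate of the chance that a given variant is present at least once."""
    return -math.expm1(-transformants / diversity)


if __name__ == "__main__":
    per_loop = codon_diversity(32, range(4, 7))  # NNK, loop lengths 4-6
    pair = per_loop ** 2                         # two loops randomized together
    for name, transformants in (("B1-L2", 1e9), ("L1-B2", 1e10)):
        print(name, f"{pair:.3g}", f"{fraction_sampled(pair, transformants):.2g}")
```

Under these assumptions each loop pair spans about 1.2×10¹⁸ codon sequences, so either library samples only a small fraction of its theoretical sequence space.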
The results of anti-FLAG selections of the loop libraries showed that all loops tolerated mutations, including insertion mutations, while maintaining the structure of the scaffold. The following exemplary loop sequences were identified following anti-FLAG selection:

TABLE 1

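Anti-FLAG selection of loop libraries B1-L2 and L1-B2. Residue positions are numbered according to SEQ ID NO: 1; columns a and b denote inserted residues.

Loop 1 (B1)	SEQ	Loop 3 (L2)	SEQ	Loop 2 (L1)	SEQ	Loop 4 (B2)	SEQ
8 9 10 11 a b	ID NO:	37 38 39 40 a b	ID NO:	18 19 20 21 a b	ID NO:	46 47 48 49 a b	ID NO: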

W P C G V	67	Q V G S	104	G R R T	135	L I P N C Y	166

E V G G V	68	G V W S Q G	105	F E C G W G	136	S S A L K R	167

S S A W R	69	W G C R	106	D R G S	137	E L G G	168

C R G T	70	S T L G G	107	T C T P	138	C A R R H C	169

W G E E	71	F V L A H S	108	V E G G	139	C W P S G	170

G S K T G	72	R H A M	109	S L D E R	140	G A S I N C	171

A S T G	73	T K F C	110	G G A E	141	G C G R	172

G G R W R	74	F C G S R G	111	A F E A E	142	Y K C T D D	173

R G G E	75	M F T E	112	P E S I M R	143	C R G P R	174

S D H S	76	G V G G	113	G E V T	144	S S V G	175

S D G M	77	L R G L	114	S S V D G	145	A C L G G	176

N A H R	78	R R I Q C G	115	V G G A	146	Q N C E M	177

C G E P E	79	Q N L V	116	G W C A P R	147	K E R G A G	178

T H G A	80	Y T D A L S	117	G E C W G	148	P D E M V	179

T G L V R	81	K A V S V R	118	H H G C R A	149	N S D Q Q	180

G A C V R	82	H G R T A G	119	C D D R	150	G A G G	181

G Q Q H	83	R G V V	120	D W G R	151	Q G C G E	182

G T S R E	84	V W L G	121	T R G N	152	C P S R	183

C A T T W	85	G E D A	122	D S S A	153	S D G C	184

G V A G	86	S V W E C	123	L S C Q	154	A G S S P	185

C A R Y G	87	S K Y V L G	124	C V E T R	155	A P Q V G	186

L D F L C	88	A P L R M Q	125	V V G E	156	G C S A	187

C N T R	89	Y G W K H	126	R P T S D M	157	G C R G E S	188

L P S R	90	G C G S R L	127	W E D T C V	158	P R P D A	189

R D I Y	91	D A M C K G	128	S C L G	159	S G N L G G	190

G W G G A W	92	R G K Y	129	K E V K Q	160	R G M A	191

L C V P I N	93	E G G G	130	D S S V	161	E G G G	192

W E K E D	94	D S S C G	131	C T L K	162	R R D D E	193

W G S Q	95	G I G V A	132	P S G H	163	L P Y P	194

G D H A F S	96	M C S S G	133	W S Q C	164	G R A G	195

W G G G A C	97	C P T R	134	Q C N N	165	Y R L G R	196

G C V K	98	S I I L	267

E G H S A	99	Q R Y D	268

G Y G G R	100	K E Y Y N M	269

C C G L	101	G G H S	270

K D G G	102	E F F S	271

T S N G V	103	G V V L K	272

3. Preparation of GB1 Peptidic Libraries

The solvent accessible surface area (SASA) of each residue in Protein Data Bank (PDB) structure 3GB1 was estimated using the GETarea tool (Fraczkiewicz & Braun, “Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules,” J. Comput. Chem. 1998, 19, 319-333). This tool also calculates the ratio of each residue's SASA in the structure to its SASA in a random coil. A ratio cutoff of 0.4 was used to select solvent-accessible residues of TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE (SEQ ID NO: 1).
Various contiguous stretches of solvent-accessible residues were selected for randomization (shown in dark in FIGS. 5 to 10) taking into account the oligonucleotide length and homology requirements for Kunkel mutagenesis. The parent sequence is also shown in FIG. 3 with the numbering scheme and loop/beta-turn regions defined.
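The residue selection described above can be sketched as a threshold on the per-residue SASA ratio followed by grouping of consecutive accessible positions. This sketch assumes that residues with a ratio at or above the cutoff count as solvent accessible and that the ratios have already been computed (for example with GETarea); the function name and example ratios are hypothetical, not values for 3GB1.

```python
def accessible_stretches(ratios, cutoff=0.4, min_length=1):
    """Return 1-based (start, end) runs of residues whose SASA ratio is >= cutoff."""
    runs = []
    start = None
    for pos, ratio in enumerate(ratios, start=1):
        if ratio >= cutoff:
            if start is None:
                start = pos  # a new accessible stretch begins
        elif start is not None:
            if pos - start >= min_length:
                runs.append((start, pos - 1))
            start = None
    # Close a stretch that runs to the final residue.
    if start is not None and len(ratios) - start + 1 >= min_length:
        runs.append((start, len(ratios)))
    return runs


if __name__ == "__main__":
    example = [0.1, 0.5, 0.6, 0.2, 0.45, 0.3, 0.9, 0.8, 0.7]  # hypothetical ratios
    print(accessible_stretches(example))                # [(2, 3), (5, 5), (7, 9)]
    print(accessible_stretches(example, min_length=2))  # [(2, 3), (7, 9)]
```

In the source, the choice of stretches also accounted for oligonucleotide length and Kunkel homology requirements, which this sketch does not model.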
In addition to substitution, selected loop positions were mutated to include insertion of 0, 1 or 2 additional amino acid residues: Library 1: +0-2 insertions at position 38; Library 2: +0-2 insertions at position 19; Library 3: +2 insertions at position 1, +0-2 insertions at positions 19 and 47; Library 4: +0-2 insertions at positions 9 and 38, +1 insertion at position 55; Library 5: +0-2 insertions at position 9, +1 insertion at position 55; Library 6: +1 insertion at position 1, +0-2 insertions at position 47.
The following oligonucleotides were prepared (Integrated DNA Technologies) to make the libraries using the Kunkel mutagenesis method:

Library 1:

(SEQ ID NO: 197)
5′-ACGACCGAAGCAGTG KHT KHT KHT KHT GCA KHT KHT GTT TTC KHT KHT

TAC GCC KHT KHT AAT KHT KHT KHT KHT KHT TGGACCTACGATGAT-3′

(SEQ ID NO: 198)
5′-ACGACCGAAGCAGTG KHT KHT KHT KHT GCA KHT KHT GTT TTC KHT KHT

TAC GCC KHT KHT AAT KHT KHT KHT KHT KHT KHT TGGACCTACGATGAT-3′

(SEQ ID NO: 199)
5′-ACGACCGAAGCAGTG KHT KHT KHT KHT GCA KHT KHT GTT TTC KHT KHT

TAC GCC KHT KHT AAT KHT KHT KHT KHT KHT KHT KHT

TGGACCTACGATGAT-3′

These oligonucleotides include the variable regions where each variant amino acid position is encoded by a KHT codon. SEQ ID NOs: 197-199 include insertion mutations of +0, 1 or 2 additional variant amino acids, respectively, at the position equivalent to position 38 of the scaffold.
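Expanding the KHT codon shows what it encodes: K is G or T and H is A, C or T, so KHT comprises six codons (GAT, GCT, GTT, TAT, TCT, TTT) encoding Asp, Ala, Val, Tyr, Ser and Phe, with no stop codons. The sketch below (helper names hypothetical; only a subset of IUPAC codes is included) verifies this and counts the KHT positions in SEQ ID NO: 197 to give its theoretical amino-acid diversity, 6 raised to the number of KHT positions.

```python
from itertools import product

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT", "H": "ACT", "N": "ACGT"}
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SEQ_197 = ("5′-ACGACCGAAGCAGTG KHT KHT KHT KHT GCA KHT KHT GTT TTC KHT KHT "
           "TAC GCC KHT KHT AAT KHT KHT KHT KHT KHT TGGACCTACGATGAT-3′")


def encoded_residues(degenerate: str) -> set[str]:
    """Residues encoded by a degenerate codon ('*' would indicate a stop)."""
    codons = ("".join(p) for p in product(*(IUPAC[b] for b in degenerate)))
    return {CODON_TABLE[c] for c in codons}


def count_codon(oligo: str, codon: str = "KHT") -> int:
    """Number of randomized positions, assuming each is written as a separate token."""
    return sum(1 for token in oligo.split() if token == codon)


if __name__ == "__main__":
    print(sorted(encoded_residues("KHT")))  # ['A', 'D', 'F', 'S', 'V', 'Y']
    n = count_codon(SEQ_197)
    print(n, 6 ** n)                        # 15 470184984576
```

With 15, 16 and 17 KHT positions, SEQ ID NOs: 197-199 each span roughly 4.7×10¹¹ to 1.7×10¹³ amino-acid sequences, more than the 3.5×10¹⁰ transformants reported for the pooled libraries.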

Library 2:

(SEQ ID NO: 200)
5′-GGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT GCA KHT KHT KHT

TTC KHT KHT KHT GCC KHT KHT AATGGCGTGGATGGT-3′

(SEQ ID NO: 201)
5′-GGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT KHT GCA KHT KHT

KHT TTC KHT KHT KHT GCC KHT KHT AATGGCGTGGATGGT-3′

(SEQ ID NO: 202)
5′-GGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT KHT KHT GCA KHT

KHT KHT TTC KHT KHT KHT GCC KHT KHT AATGGCGTGGATGGT-3′

These oligonucleotides include the variable regions where each variant amino acid position is encoded by a KHT codon. SEQ ID NOs: 200-202 include insertion mutations of +0, 1 or 2 additional variant amino acids, respectively, at the position equivalent to position 19 of the scaffold.

Library 3:

(SEQ ID NO: 203)

5′-GATGATAAAGGCGGTAGC KHT KHT KHT TACAAACTGATTCTG

AAC-3′

(SEQ ID NO: 204)

5′-AAAGGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT

GCAGAAAAAGTTTTCAAA-3′

(SEQ ID NO: 205)

5′-AAAGGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT

KHT GCAGAAAAAGTTTTCAAA-3′

(SEQ ID NO: 206)

5′-AAAGGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT

KHT KHT GCAGAAAAAGTTTTCAAA-3′

(SEQ ID NO: 207)

5′-GATGGTGAATGGACCTAC KHT KHT KHT KHT KHT

ACCTTCACGGTTACCGAA-3′

(SEQ ID NO: 208)

5′-GATGGTGAATGGACCTAC KHT KHT KHT KHT KHT KHT

ACCTTCACGGTTACCGAA-3′

(SEQ ID NO: 209)

5′-GATGGTGAATGGACCTAC KHT KHT KHT KHT KHT KHT KHT

ACCTTCACGGTTACCGAA-3′

These oligonucleotides include the variable regions where each variant amino acid position is encoded by a KHT codon. SEQ ID NO: 203 includes an insertion mutation of +2 variant amino acids at the position equivalent to position 1 of the scaffold. SEQ ID NOs: 204-206 include mutations of +0, 1 or 2 additional variant amino acids, respectively, at the position equivalent to position 19 of the scaffold. SEQ ID NOs: 207-209 include mutations of +0, 1 or 2 additional variant amino acids, respectively, at the position equivalent to position 47 of the scaffold.

Library 4

(SEQ ID NO: 210)

5′-ACGTACAAACTGATTCTG KHT KHT KHT KHT KHT KHT

GGTGAAACCACGACCGAA-3′

(SEQ ID NO: 211)

5′-ACGTACAAACTGATTCTG KHT KHT KHT KHT KHT KHT KHT

GGTGAAACCACGACCGAA-3′

(SEQ ID NO: 212)

5′-ACGTACAAACTGATTCTG KHT KHT KHT KHT KHT KHT KHT

KHT GGTGAAACCACGACCGAA-3′

(SEQ ID NO: 213)

5′-AAACAGTACGCCAACGAT KHT KHT KHT KHT KHT KHT

TGGACCTACGATGATGCG-3′

(SEQ ID NO: 214)

5′-AAACAGTACGCCAACGAT KHT KHT KHT KHT KHT KHT KHT

TGGACCTACGATGATGCG-3′

(SEQ ID NO: 215)

5′-AAACAGTACGCCAACGAT KHT KHT KHT KHT KHT KHT KHT

KHT TGGACCTACGATGATGCG-3′

(SEQ ID NO: 216)

5′-ACGAAAACCTTCACGGTT KHT KHT KHT GGCGGTTCTGACAAA

ACT-3′

These oligonucleotides include the variable regions where each variant amino acid position is encoded by a KHT codon. SEQ ID NOs: 210-212 include mutations of +0, 1 or 2 additional variant amino acids, respectively, at the position equivalent to position 9 of the scaffold. SEQ ID NOs: 213-215 include mutations of +0, 1 or 2 additional variant amino acids, respectively, at the position equivalent to position 38 of the scaffold. SEQ ID NO: 216 includes an insertion mutation of +1 variant amino acid at the position equivalent to position 55 of the scaffold (three KHT codons replace positions 54 and 55).
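The position and insertion assignments can be checked by translating the fixed flanks of an oligonucleotide and locating them in the scaffold (SEQ ID NO: 1). The sketch below, with a hypothetical function name, handles oligonucleotides with a single run of KHT codons whose flanks both fall within the scaffold, such as SEQ ID NOs: 210-215; it reports the first and last scaffold positions replaced and the number of inserted residues (KHT codons minus replaced positions).

```python
from itertools import product

SCAFFOLD = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"  # SEQ ID NO: 1
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate complete codons of a plain ACGT string in frame 0."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def map_single_run(oligo: str, codon: str = "KHT") -> tuple[int, int, int]:
    """Return (first, last, insertions), positions numbered as in SEQ ID NO: 1."""
    tokens = oligo.replace("5′-", "").replace("-3′", "").split()
    flank5, middle, flank3 = tokens[0], tokens[1:-1], tokens[-1]
    if not middle or any(t != codon for t in middle):
        raise ValueError("expected one contiguous run of degenerate codons")
    left, right = translate(flank5), translate(flank3)
    left_start = SCAFFOLD.find(left)
    if left_start < 0:
        raise ValueError("5' flank does not match the scaffold")
    first = left_start + len(left)        # 0-based index of first replaced residue
    right_start = SCAFFOLD.find(right, first)
    if right_start < 0:
        raise ValueError("3' flank does not match the scaffold")
    replaced = right_start - first        # number of scaffold residues replaced
    return first + 1, right_start, len(middle) - replaced
```

For SEQ ID NO: 210 it returns (7, 12, 0), and the two extra KHT codons of SEQ ID NO: 212 give (7, 12, 2). SEQ ID NO: 216 is outside its scope because its 3′ flank encodes the Gly-Gly-Ser linker rather than scaffold residues.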

Library 5

(SEQ ID NO: 217)
5′-AAAGGCGGTAGCACGTAC KHT CTG KHT CTG KHT KHT KHT KHT KHT KHT

KHT KHT ACC KHT ACCGAAGCAGTGGATGCA-3′

(SEQ ID NO: 218)
5′-AAAGGCGGTAGCACGTAC KHT CTG KHT CTG KHT KHT KHT KHT KHT KHT

KHT KHT KHT ACC KHT ACCGAAGCAGTGGATGCA-3′

(SEQ ID NO: 219)
5′-AAAGGCGGTAGCACGTAC KHT CTG KHT CTG KHT KHT KHT KHT KHT KHT

KHT KHT KHT KHT ACC KHT ACCGAAGCAGTGGATGCA-3′

(SEQ ID NO: 220)
5′-GATGCGACGAAAACCTTC KHT GTT KHT KHT KHT

GGCGGTTCTGACAAAACT-3′

These oligonucleotides include the variable regions where each variant amino acid position is encoded by a KHT codon. SEQ ID NOs: 217-219 include mutations of +0, 1 or 2 additional variant amino acids, respectively, at the position equivalent to position 9 of the scaffold. SEQ ID NO: 220 includes an insertion mutation of +1 variant amino acid at the position equivalent to position 55 of the scaffold (three KHT codons replace positions 54 and 55).

Library 6

(SEQ ID NO: 221)
5′-GATGATAAAGGCGGTAGC KHT KHT TAC KHT CTG KHT CTG KHT

GGCAAAACCCTGAAAGGT-3′

(SEQ ID NO: 222)
5′-GATAATGGCGTGGATGGT KHT TGG KHT TAC KHT KHT KHT KHT KHT KHT

TTC KHT GTT KHT GAAGGCGGTTCTGACAAA-3′

(SEQ ID NO: 223)
5′-GATAATGGCGTGGATGGT KHT TGG KHT TAC KHT KHT KHT KHT KHT KHT

KHT TTC KHT GTT KHT GAAGGCGGTTCTGACAAA-3′

(SEQ ID NO: 224)
5′-GATAATGGCGTGGATGGT KHT TGG KHT TAC KHT KHT KHT KHT KHT KHT

KHT KHT TTC KHT GTT KHT GAAGGCGGTTCTGACAAA-3′

These oligonucleotides include the variable regions where each variant amino acid position is encoded by a KHT codon. SEQ ID NO: 221 includes an insertion mutation of +1 variant amino acid at the position equivalent to position 1 of the scaffold. SEQ ID NOs: 222-224 include mutations of +0, 1 or 2 additional variant amino acids, respectively, at the position equivalent to position 47 of the scaffold.

The libraries were prepared by the method described above, using the GB1 template carrying the Fab dimerization sequence (Fellouse & Sidhu, 2007). Oligonucleotides with 0, 1 or 2 insertions share the same homology regions and therefore compete for binding to the template, so they were pooled in an equimolar ratio and treated as a single oligonucleotide for mutagenesis. The constructed libraries were pooled for a total diversity of 3.5×10¹⁰ transformants. Selections were performed against L-VEGF and D-VEGF using the method described below, except that 10 selection wells were used for Round 1.
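Pooling the 0/1/2-insertion oligonucleotides at an equimolar ratio amounts to taking a volume of each stock inversely proportional to its concentration. A minimal sketch follows; the function name, stock concentrations and 200 pmol target amount are hypothetical, since the source does not give them.

```python
def equimolar_volumes(stock_um: dict[str, float], pmol_each: float) -> dict[str, float]:
    """Volume (ul) of each oligonucleotide stock that supplies pmol_each picomoles.

    1 uM equals 1 pmol/ul, so volume (ul) = amount (pmol) / concentration (uM).
    """
    return {name: pmol_each / conc for name, conc in stock_um.items()}


if __name__ == "__main__":
    # Hypothetical stock concentrations (uM) for the three Library 1 oligonucleotides.
    stocks = {"SEQ ID NO: 197": 100.0, "SEQ ID NO: 198": 80.0, "SEQ ID NO: 199": 50.0}
    print(equimolar_volumes(stocks, 200.0))  # 2.0, 2.5 and 4.0 ul respectively
```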
Selections were also performed against 3BP2-SH2, ABL-SH3 and v-Src-SH3 proteins using similar methods to those described below.
Individual clones were analyzed by direct-binding ELISA as described below and by single-point competitive ELISA (Fellouse & Sidhu, 2007).

4. Methods of Screening of Phage Display Libraries

4.1 Library Selections Against VEGF Protein and Negative Selection with BSA
The selection procedure is essentially the same as described in previous protocols (Fellouse & Sidhu, 2007) with some minor changes. Although the method below is described for L-VEGF, the method can be adapted to screen for binding to any target. The media and buffer recipes are the same as in the described protocol.
1. Coat NUNC Maxisorp plate wells with 100 μl of L-VEGF (5 μg/ml in PBS) for 2 h at room temperature. Coat 5 wells for selection and 1 well for phage pool ELISA.
2. Remove the coating solution and block for 1 h with 200 μl of PBS, 0.2% BSA. At the same time, block an uncoated well as a negative control for pool ELISA. Also block 7 wells for pre-incubation of library on a separate plate.
3. Remove the block solution from the pre-incubation plate and wash four times with PT buffer.
4. Add 100 μl of library phage solution (precipitated and resuspended in PBT buffer) to each blocked well of the pre-incubation plate. Incubate at room temperature for 1 h with gentle shaking.
5. Remove the block solution from the selection plate and wash four times with PT buffer.
6. Transfer the library phage solution from the pre-incubation plate to the selection plate (5 selection wells + 2 controls for pool ELISA).
7. Remove the phage solution and wash 8-10 times with PT buffer (increased based on pool ELISA signal from previous round).
8. To elute bound phage from selection wells, add 100 μl of 100 mM HCl. Incubate 5 min at room temperature. Transfer the HCl solution to a 1.5-ml microfuge tube. Adjust to neutral pH with 11 μl of 1.0 M Tris-HCl, pH 11.0.
9. In the meantime add 100 μl of anti-M13 HRP conjugate (1:5000 dilution in PBT buffer) to the control wells and incubate for 30 min.
10. Wash control wells four times with PT buffer. Add 100 μl of freshly prepared TMB substrate. Allow color to develop for 5-10 min.
11. Stop the reaction with 100 μl of 1.0 M H₃PO₄ and read absorbance at 450 nm in a microtiter plate reader. The enrichment ratio can be calculated as the ratio of the signal from the coated well to that from the uncoated well.
12. Add 250 μl eluted phage solution to 2.5 ml of actively growing E. coli XL1-Blue (OD₆₀₀<0.8) in 2YT/tet medium. Incubate for 20 min at 37° C. with shaking at 200 rpm.
13. Add M13KO7 helper phage to a final concentration of 10¹⁰ phage/ml. Incubate for 45 min at 37° C. with shaking at 200 rpm.
14. Transfer the culture from the antigen-coated wells to 25 volumes of 2YT/carb/kan medium and incubate overnight at 37° C. with shaking at 200 rpm.
15. Isolate phage by precipitation with PEG/NaCl solution and resuspend in 1.0 ml of PBT buffer.
16. Repeat the selection cycle for 4 rounds.
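The pool ELISA readout in steps 9-11 gives an enrichment ratio for each round: the A450 of the target-coated well divided by that of the uncoated well. A minimal sketch for tracking it across rounds follows; the function name and readings are hypothetical.

```python
def enrichment_ratio(a450_coated: float, a450_uncoated: float) -> float:
    """Pool ELISA enrichment: signal on the target-coated well over the uncoated well."""
    if a450_uncoated <= 0:
        raise ValueError("uncoated-well absorbance must be positive")
    return a450_coated / a450_uncoated


if __name__ == "__main__":
    # Hypothetical (coated, uncoated) A450 readings for rounds 1-4.
    readings = [(0.09, 0.08), (0.15, 0.07), (0.95, 0.08), (1.60, 0.09)]
    for rnd, (coated, uncoated) in enumerate(readings, start=1):
        print(f"round {rnd}: enrichment {enrichment_ratio(coated, uncoated):.1f}")
```

A ratio that rises across rounds, as in these hypothetical readings, is the kind of pool ELISA signal that step 7 uses to increase the number of washes in the next round.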
4.2. Negative Selection with GST Tagged Protein
A more stringent negative selection procedure is as follows. The selection process is essentially the same as described above except that:
i) For Rounds 1 and 2 the libraries were pre-incubated on GST coated (10 μg/ml in PBS) and blocked wells.
ii) For Rounds 3 and 4, the libraries were pre-incubated with 0.2 mg/ml GST in solution for 1 hr before transfer to selection wells.
iii) The control wells for pool ELISA were coated with GST (5 μg/ml in PBS)

4.3. Selections of Libraries Against Anti-FLAG

Misfolded proteins are degraded in the periplasm and will not be displayed on phage (Missiakas & Raina, “Protein misfolding in the cell envelope of Escherichia coli: new signaling pathways,” Trends in Biochemical Sciences, 1997, 22, 59-63). Stably folded proteins can therefore be selected for display of the N-terminal FLAG tag.
The selections were performed on the GB1 Loop libraries by a method similar to the one described above except that the library was directly added to selection wells coated with anti-FLAG antibody (5 μg/ml diluted in PBT) and there was no preincubation step. Only three rounds of selection were performed as good enrichment was observed in Pool ELISA at Rounds 2 and 3.

5. Analysis of Single-Clones by Direct Binding ELISA

The following protocol is an adapted version of previous protocols (Fellouse & Sidhu 2007; Tonikian et al., “Identifying specificity profiles for peptide recognition modules from phage-displayed peptide libraries,” Nat. Protoc., 2007, 2, 1368-86):
1. Inoculate 450 μl aliquots of 2YT/carb/KO7 medium in 96-well microtubes with single colonies harboring phagemids and grow for 21 hrs at 37° C. with shaking at 200 rpm.
2. Centrifuge at 4,000 rpm for 10 min and transfer phage supernatants to fresh tubes.
3. For each clone, coat 3 wells of a 384-well NUNC Maxisorp plate with 2 μg/ml of L-VEGF, Neutravidin or Erbin-GST, respectively, and leave a fourth well uncoated. Incubate for 2 hrs at room temperature, then block all 4 wells.
4. Wash the plate four times with PT buffer.
5. Transfer 30 μl of phage supernatant to each well and incubate for 2 hrs at room temperature with gentle shaking.
6. Wash four times with PT buffer.
7. Add 30 μl of anti-M13-HRP conjugate (diluted 1:5000 in PBT buffer). Incubate 30 min with gentle shaking.
8. Wash four times with PT buffer.
9. Add 30 μl of freshly prepared TMB substrate. Allow color to develop for 5-10 min.
10. Stop the reaction with 100 μl of 1.0 M H₃PO₄ and read absorbance at 450 nm in a microtiter plate reader.
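The four-well layout in step 3 allows each clone's target signal to be compared with its signals on the unrelated proteins and on the uncoated well. A minimal sketch of such a comparison follows; the function name, the 3-fold cutoff and the readings are hypothetical choices, not criteria stated in the source.

```python
CONTROL_WELLS = ("Neutravidin", "Erbin-GST", "uncoated")


def specific_binder(signals: dict[str, float], target: str = "L-VEGF",
                    min_fold: float = 3.0) -> bool:
    """True if the target A450 exceeds every control well by at least min_fold."""
    background = max(signals[well] for well in CONTROL_WELLS)
    return signals[target] >= min_fold * background


if __name__ == "__main__":
    clones = {  # hypothetical A450 readings
        "clone_1": {"L-VEGF": 1.50, "Neutravidin": 0.10, "Erbin-GST": 0.12, "uncoated": 0.08},
        "clone_2": {"L-VEGF": 1.40, "Neutravidin": 0.90, "Erbin-GST": 0.11, "uncoated": 0.07},
    }
    for name, wells in clones.items():
        print(name, specific_binder(wells))  # clone_1 True, clone_2 False
```

Taking the maximum over the control wells means a clone that binds any unrelated protein, such as clone_2 on Neutravidin, is not counted as specific.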
Although the particular embodiments have been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.
Accordingly, the preceding merely illustrates the principles of the invention. Various arrangements may be devised which, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Furthermore, all examples and conditional language recited herein are principally intended to aid the reader in understanding the principles of the invention and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. The scope of the present invention, therefore, is not intended to be limited to the exemplary embodiments shown and described herein. Rather, the scope and spirit of present invention is embodied by the appended claims.

Claims

1. A library comprising 50 or more distinct GB1 peptidic compounds;

wherein each compound of the library comprises a β1-β2 region and has three or more different non-core mutations in a region outside of the β1-β2 region.

2. The library according to claim 1, wherein the library comprises 1×10⁴ or more distinct compounds.

3-4. (canceled)

5. The library according to claim 1, wherein each compound of the library comprises six or more different non-core mutations in a region outside of the β1-β2 region.

6. The library according to claim 1, wherein each compound of the library comprises ten or more different mutations.

7-13. (canceled)

14. The library according to claim 6, wherein the ten or more different mutations are located at positions selected from the group consisting of positions 21-24, 26, 27, 30, 31, 34, 35, 37-41.

15. The library according to claim 6, wherein the ten or more different mutations are located at positions selected from the group consisting of positions 18-24, 26-28, 30-32, 34 and 35.

16. The library according to claim 6, wherein the ten or more different mutations are located at positions selected from the group consisting of positions 1, 18-24 and 45-49.

17. The library according to claim 6, wherein the ten or more different mutations are located at positions selected from the group consisting of positions 7-12, 36-41, 54 and 55.

18. The library according to claim 6, wherein the ten or more different mutations are located at positions selected from the group consisting of positions 3, 5, 7-14, 16, 52, 54 and 55.

19. The library according to claim 6, wherein the ten or more different mutations are located at positions selected from the group consisting of positions 1, 3, 5, 7, 41, 43, 45-50, 52 and 54.

20. The library according to claim 6, wherein each compound of the library comprises five or more different mutations in the α1 region.

21-23. (canceled)

24. The library according to claim 6, wherein each compound of the library comprises three or more different mutations in the β3-β4 region.

25-30. (canceled)

31. The library according to claim 6, wherein each compound of the library comprises two or more different mutations in the region between the α1 and β3 regions.

32. (canceled)

33. The library according to claim 6, wherein each compound of the library comprises ten or more different mutations in the β1-β2 region.

34-35. (canceled)

36. The library according to claim 33, wherein P1 is β1-β2 and P2 is β3-β4 such that the compound is described by the formula (II):

β1-β2-α1-β3-β4 (II)

wherein β1, β2, β3 and β4 are independently beta-strand domains; and

β1, β2, α1, β3 and β4 are connected independently by linking sequences of between 1 and 10 residues in length.

37-38. (canceled)

39. The library according to claim 36, wherein each compound of the library is described by a formula independently selected from the group consisting of:

F1-V1-F2 (III);

F3-V2-F4 (IV);

V3-F5-V4-F6-V5-F7 (V);

F8-V6-F9-V7-F10-V8 (VI);

V9-F11-V10 (VII); and

V11-F12-V12 (VIII)

wherein F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11 and F12 are fixed regions and V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11 and V12 are variable regions;

wherein each fixed region is common to all compounds of the same formula and each compound of the library has a distinct variable region.

40. The library according to claim 39, wherein each compound of the library is described by formula (III), wherein:

F1 comprises a sequence having 75% or more amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 2;

F2 comprises a sequence having 75% or more amino acid sequence identity to an amino acid sequence set forth in SEQ ID NO: 3; and

V1 comprises a sequence that comprises 10 or more mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 4.

41. The library according to claim 40, wherein V1 comprises a sequence of the formula:

VXXXXAXXVFXXYAXXNXXXXXW (SEQ ID NO: 5)

wherein each X is independently a mutation that comprises substitution with a variant amino acid, wherein the mutation at position 19 of V1 comprises insertion of 0, 1 or 2 additional variant amino acids.

42. (canceled)

43. The library according to claim 39, wherein each compound of the library is described by formula (IV), wherein:

F3 comprises a sequence having 75% or more amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 7;

F4 comprises a sequence having 75% or more amino acid sequence identity to an amino acid sequence set forth in SEQ ID NO: 8; and

V2 comprises a sequence that comprises 10 or more mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 9.

44. The library according to claim 39, wherein V2 comprises a sequence of the formula:

TXXXXXXXAXXXFXXXAXXN (SEQ ID NO: 10)

wherein each X is independently a mutation that comprises substitution with a variant amino acid, wherein the mutation at position 3 of V2 comprises insertion of 0, 1 or 2 additional variant amino acids.

45. (canceled)

46. The library according to claim 39, wherein each compound of the library is described by formula (V), wherein:

F5 comprises a sequence having 75% or more amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 12;

F6 comprises a sequence having 75% or more amino acid sequence identity to an amino acid sequence set forth in SEQ ID NO: 13;

F7 comprises a sequence having 75% or more amino acid sequence identity to an amino acid sequence set forth in SEQ ID NO: 14;

V3 comprises a sequence that comprises one or more mutations compared to a parent amino acid sequence that is TY;

V4 comprises a sequence that comprises 7 or more mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 15; and

V5 comprises a sequence that comprises 5 or more mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 16.

47. The library according to claim 46, wherein:

V3 comprises a sequence of the formula XY;

V4 comprises a sequence of the formula TXXXXXXXA (SEQ ID NO: 17); and

V5 comprises a sequence of the formula YXXXXXT (SEQ ID NO: 18);

wherein each X is independently a mutation that comprises substitution with a variant amino acid, wherein the mutation at position 1 of V3 comprises insertion of 2 additional variant amino acids and the mutations at positions 3 and 4 of V4 and V5 each independently comprise insertion of 0, 1 or 2 additional variant amino acids.

48. (canceled)

49. The library according to claim 39, wherein each compound of the library is described by formula (VI), wherein:

F8 comprises a sequence having 75% or more amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 21;

F9 comprises a sequence having 75% or more amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 22;

F10 comprises a sequence having 75% or more amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 23;

V6 comprises a sequence that comprises 6 or more mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 24;

V7 comprises a sequence that comprises 6 or more mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 25; and

V8 comprises a sequence that comprises 2 or more mutations compared to a parent amino acid sequence that is VTE.

50. The library according to claim 49, wherein:

V6 comprises a sequence of the formula LXXXXXXG (SEQ ID NO: 26);

V7 comprises a sequence of the formula DXXXXXXW (SEQ ID NO: 27); and

V8 comprises a sequence of the formula VXX;

wherein each X is independently a mutation that comprises substitution with a variant amino acid, wherein the mutations at position 4 of V6 and V7 each independently comprise insertion of 0, 1 or 2 additional variant amino acids and the mutation at position 3 of V8 comprises insertion of 1 additional variant amino acid.

51. (canceled)

52. The library according to claim 39, wherein each compound of the library is described by formula (VII), wherein:

F11 comprises a sequence having 75% or more amino acid sequence identity to an amino acid sequence set forth in SEQ ID NO: 30;

V9 comprises a sequence that comprises at least 11 mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 31; and

V10 comprises a sequence that comprises 3 or more mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 32.

53. The library according to claim 52, wherein:

V9 comprises a sequence of the formula TYXLXLXXXXXXXXTXT (SEQ ID NO: 33); and

V10 comprises a sequence of the formula FXVXX (SEQ ID NO: 34);

wherein each X is independently a mutation that comprises substitution with a variant amino acid, wherein the mutation at position 9 of V9 comprises insertion of 0, 1 or 2 additional variant amino acids and the mutation at position 5 of V10 comprises insertion of 1 additional variant amino acid.

54. (canceled)

55. The library according to claim 39, wherein each compound of the library is described by formula (VIII), wherein:

F12 comprises a sequence having 75% or more amino acid sequence identity to an amino acid sequence set forth in SEQ ID NO: 37;

V11 comprises a sequence that comprises 4 or more mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 38; and

V12 comprises a sequence that comprises 10 or more mutations compared to a parent amino acid sequence set forth in SEQ ID NO: 39.

56. The library according to claim 55, wherein:

V11 comprises a sequence of the formula XYXLXLXG (SEQ ID NO: 40); and

V12 comprises a sequence of the formula GXWXYXXXXXXFXVXE (SEQ ID NO: 41);

wherein each X is independently a mutation that comprises substitution with a variant amino acid, wherein the mutation at position 8 of V12 comprises insertion of 0, 1 or 2 additional variant amino acids and the mutation at position 1 of V11 comprises insertion of 1 additional variant amino acid.

57-65. (canceled)

66. A library of polynucleotides that encodes 50 or more distinct compounds, wherein each polynucleotide encodes a GB1 peptidic compound that comprises a β1-β2 region and has three or more different non-core mutations at positions in a region outside of the β1-β2 region.

67-72. (canceled)

73. The library according to claim 66, wherein each polynucleotide encodes a GB1 peptidic compound comprising ten or more variant amino acids at non-core positions, wherein each variant amino acid is encoded by a random codon.

74. (canceled)

75. A method comprising:

contacting a target protein with a library comprising:

50 or more distinct GB1 peptidic compounds, wherein each compound of the library comprises a β1-β2 region and has three or more different non-core mutations in a region outside of the β1-β2 region; and

identifying a compound of the library that specifically binds to the target protein.

76-79. (canceled)
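Claims 1, 66 and 75 turn on counting non-core mutations that fall outside the β1-β2 region, and claims 14-19 restrict mutations to position lists such as "21-24, 26, 27". The sketch below shows how a variant could be checked against such criteria. The region boundaries and the set of core positions are not given in these claims, so they are passed in as parameters; the toy sequences, the one-letter "region" and all function names are hypothetical, and only equal-length substitution variants are handled.

```python
def parse_positions(spec: str) -> set[int]:
    """Expand a claim-style position list such as "21-24, 26, 27, 37-41" into a set."""
    positions = set()
    for part in spec.replace(" and ", ",").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            positions.update(range(start, end + 1))
        else:
            positions.add(int(part))
    return positions


def mutated_positions(parent: str, variant: str) -> list[int]:
    """1-based positions where an equal-length variant differs from its parent."""
    if len(parent) != len(variant):
        raise ValueError("substitution-only comparison needs equal-length sequences")
    return [i for i, (p, v) in enumerate(zip(parent, variant), start=1) if p != v]


def non_core_mutations_outside(parent, variant, region, core):
    """Mutated positions that lie outside `region` and are not in `core`."""
    return [i for i in mutated_positions(parent, variant)
            if i not in region and i not in core]


def meets_threshold(parent, variant, region, core, minimum=3):
    """True if the variant has at least `minimum` such mutations (three in claim 1)."""
    return len(non_core_mutations_outside(parent, variant, region, core)) >= minimum


# Toy example: arbitrary sequences, hypothetical region and core sets.
parent = "ACDEFGHIKL"
variant = "ACDQFGWIRL"          # substitutions at positions 4, 7 and 9
region = parse_positions("1-3")  # stand-in for the excluded region
print(non_core_mutations_outside(parent, variant, region, core={9}))  # [4, 7]
print(meets_threshold(parent, variant, region, core=set()))           # True
```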
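Claims 41, 44, 47, 50, 53 and 56 define variable regions as templates in which each X is a variant position and some X positions also admit 0, 1 or 2 inserted variant residues. The sketch below enumerates the length variants of such a template and estimates theoretical amino acid diversity. It assumes every X may be any of the 20 standard amino acids, which the claims do not state; the function names are hypothetical. Summing over distinct patterns is exact when every pattern has a different length and otherwise an upper bound, because two equal-length patterns can encode the same sequence.

```python
from itertools import product


def expand_template(template, insertions):
    """Yield every length variant of a variable-region formula.

    template   -- formula such as "VXXXXAXXVFXXYAXXNXXXXXW"; 'X' marks a variant position
    insertions -- 1-based position -> allowed counts of extra variant residues there
    """
    for pos in insertions:
        if template[pos - 1] != "X":
            raise ValueError(f"position {pos} is not a variant 'X' position")
    sites = sorted(insertions)
    for counts in product(*(insertions[p] for p in sites)):
        extra = dict(zip(sites, counts))
        yield "".join(ch + "X" * extra.get(i, 0)
                      for i, ch in enumerate(template, start=1))


def theoretical_diversity(template, insertions, alphabet_size=20):
    """Sum of alphabet_size**(number of X) over distinct length-variant patterns."""
    patterns = set(expand_template(template, insertions))
    return sum(alphabet_size ** p.count("X") for p in patterns)


# V1 of claim 41 (SEQ ID NO: 5): 15 X positions, 0-2 insertions at position 19.
v1 = "VXXXXAXXVFXXYAXXNXXXXXW"
print(sorted(len(p) for p in expand_template(v1, {19: (0, 1, 2)})))  # [23, 24, 25]
print(theoretical_diversity(v1, {19: (0, 1, 2)}) == 20**15 + 20**16 + 20**17)  # True
```

Deduplicating patterns matters for templates such as V4 of claim 47 (TXXXXXXXA), where insertions at adjacent positions 3 and 4 produce identical patterns.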
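Claim 73 recites that each variant amino acid is encoded by a random codon but does not name the codon. NNK (N = A/C/G/T, K = G/T) is one degenerate codon widely used in phage-display libraries; it is used below purely as an assumption. The sketch expands a degenerate codon, translates it with the standard genetic code, and estimates the fraction of clones free of stop codons over a given number of randomized positions. The function names are hypothetical, and the IUPAC map covers only the symbols needed here.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Subset of IUPAC nucleotide ambiguity codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}


def expand_degenerate(codon: str) -> list[str]:
    """All concrete codons represented by a degenerate codon such as "NNK"."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in codon))]


def codon_usage(codon: str) -> dict[str, int]:
    """Number of concrete codons encoding each amino acid ('*' = stop)."""
    counts: dict[str, int] = {}
    for c in expand_degenerate(codon):
        aa = CODON_TABLE[c]
        counts[aa] = counts.get(aa, 0) + 1
    return counts


def fraction_without_stop(codon: str, n_positions: int) -> float:
    """Probability that n independent randomized positions contain no stop codon."""
    usage = codon_usage(codon)
    p_stop = usage.get("*", 0) / sum(usage.values())
    return (1 - p_stop) ** n_positions


nnk = codon_usage("NNK")
print(sum(nnk.values()), nnk["*"], len(set(nnk) - {"*"}))  # 32 1 20
```

Under this assumption NNK covers all 20 amino acids with 32 codons and a single stop (TAG), so about 62% of clones with 15 randomized positions would be stop-free.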