WO2018005720A1

WO2018005720A1 - Method of determining the molecular binding between libraries of molecules

Info

Publication number: WO2018005720A1
Application number: PCT/US2017/039862
Authority: WO
Inventors: Kettner John Frederick GRISWOLD; Neng FANG
Original assignee: Harvard University
Current assignee: Harvard University
Priority date: 2016-06-30
Filing date: 2017-06-29
Publication date: 2018-01-04
Anticipated expiration: 2018-12-30

Abstract

A method for determining the molecular binding between a target molecule library and a binder molecule library is provided including contacting the target molecule library comprising a plurality of mRNA-protein conjugates with the binder molecule library comprising a plurality of mRNA-protein conjugates under conditions where one or more proteins in the target library bind to one or more proteins in the binder library, and determining the molecular binding of the target and binder protein pairs by determining the sequences of the mRNAs conjugated to the target and binder protein pairs.

Description

METHOD OF DETERMINING THE MOLECULAR BINDING BETWEEN LIBRARIES OF MOLECULES

RELATED APPLICATION DATA

This application claims priority to U.S. Provisional Application Number 62/356,81 filed on June 30, 2016, which is hereby incorporated herein by reference in its entirety for all purposes.

STATEMENT OF GOVERNMENT INTERESTS

This invention was made with government support under Grant Nos. 1RM1HG008525 and R01MH103910 from the National Institutes of Health. The government has certain rights in the invention.

FIELD

The present invention relates in general to methods of determining the molecular binding between libraries of molecules.

BACKGROUND

Methods for attaching genetic information in the form of nucleic acids to protein sequences are known in the art as protein display methods. Protein display methods are typically employed for purposes of affinity maturation of proteins. In a typical affinity maturation scheme, a library of proteins is displayed and passed over a surface saturated with excess copies of a single target molecule. Proteins that remain attached to the target molecule after washing are subsequently amplified and diversified. Over multiple rounds of affinity maturation, high affinity proteins for the target molecule are enriched. These affinity maturation and enrichment processes are typically limited to single targets because of the difficulty of precisely assigning interactions between libraries of affinity matured binder molecules and libraries of target molecules. Although single target affinity maturation leads to binding molecules with slow off rates, the specificity of affinity matured binder molecules cannot be determined without further studies.

SUMMARY

The disclosure provides methods of determining the molecular binding between libraries of molecules. The disclosure provides methods for the determination of molecular interactions from the pairing of nucleic acid barcodes following molecule-molecule binding events. The disclosure provides methods for retrieval of nucleic acid barcode sequences encoding the pair of binding molecules.

The disclosure provides a method for determining the molecular binding between a target molecule library and a binder molecule library including the steps of: (a) contacting the target molecule library comprising a plurality of mRNA-protein conjugates with the binder molecule library comprising a plurality of mRNA-protein conjugates under conditions where one or more proteins in the target library bind to one or more proteins in the binder library, wherein the mRNA of each mRNA-protein conjugate comprises sequences of an open reading frame (ORF) encoding the conjugated protein and a 3' untranslated region (3' UTR) sequence motif comprising a barcode sequence, (b) crosslinking the bound target-binder protein pairs to form protein-protein complexes, (c) transferring the cross-linked protein-protein complexes of the target-binder protein pairs with their respective mRNAs conjugated thereto into a gel support, and (d) determining the molecular binding of the target and binder protein pairs by determining the sequences of the mRNAs conjugated to the target and binder protein pairs.

The disclosure provides that the plurality of mRNA-protein conjugates of the target library are immobilized to a solid support. The disclosure provides that the plurality of mRNA-protein conjugates of the target library are immobilized to a solid support via a protein immobilization tag. The disclosure provides that the protein immobilization tag is fused with the protein of the mRNA-protein conjugate of the target library. The disclosure provides that the protein immobilization tag forms a covalent bond to a surface attachment ligand on the solid support. The disclosure provides that the surface attachment ligand is cleavable from the solid support. The disclosure provides that the mRNA-protein conjugates are obtained by in vitro transcription of DNA molecules to mRNA molecules, in vitro translation of the transcribed mRNA molecules and mRNA display. The disclosure provides that ribosomal release factors are depleted during in vitro translation. The disclosure provides that each DNA molecule comprises a sequence of an open reading frame (ORF) encoding the conjugated protein of the target library or the binder library and a sequence encoding the 3' untranslated region (3' UTR) sequence motif. The disclosure provides that the 3' UTR sequence motif of the DNA molecule for the target library comprises sequences encoding a polypeptide linker, polystop codons, PCR priming sequences, isothermal PCR priming sequences, an enzyme recognition site, a barcode sequence, and DNA scaffold sequences for in vitro transcription, translation and mRNA display. The disclosure provides that the 3' UTR sequence motif of the DNA molecule for the binder library comprises sequences encoding a polypeptide linker, polystop codons, PCR priming sequences, isothermal PCR priming sequences, an enzyme recognition site, a unique molecule identifier sequence (UMI), a barcode sequence, and DNA scaffold sequences for in vitro transcription, translation and mRNA display.
The disclosure provides that a puromycin oligo is attached to the 3' UTR of the in vitro transcribed mRNA for mRNA display. The disclosure further provides degrading non-attached mRNA by a 3' to 5' directed riboexonuclease after the attachment step. The disclosure provides that the riboexonuclease is RNase R, exoribonuclease II, or exonuclease T. The disclosure provides that the attaching is via UV crosslinking or ligation. The disclosure provides that the mRNA is covalently attached to the C-terminus of the encoded protein. The disclosure provides that the mRNA-protein conjugates are passivated by reverse transcription of the mRNAs to form an mRNA-cDNA duplex prior to the contacting step. The disclosure further provides cleaving the cross-linked protein-protein complexes of the target-binder protein pairs from the solid support. The disclosure further provides casting the cross-linked protein-protein complexes of the target-binder protein pairs in a gel. The disclosure further provides in gel PCR amplification of the mRNA-cDNA duplex conjugated to the target-binder protein pairs. The disclosure provides that the amplified PCR products are sequenced through the ORF and barcode regions to determine the molecular identity of the target-binder protein pairs. The disclosure provides that the barcode sequence is unique to the ORF within the mRNA. The disclosure further provides in gel PCR amplification of the unique barcoded region in the mRNA-cDNA duplex conjugated to the target-binder protein pairs. The disclosure provides that the PCR amplicons are joined. The disclosure provides that the joining is carried out via restriction enzyme digestion of the enzyme recognition site and ligation. The disclosure provides that the restriction enzyme is a type IIS endonuclease. The disclosure provides that the joining is carried out via recombination at the enzyme recognition sites. The disclosure provides that the joining is carried out via overlap extension PCR. The disclosure provides that the molecular identity of the target-binder protein pairs of the protein-protein complexes is determined by sequencing the unique barcoded regions in joined PCR amplicons. The disclosure provides that the barcode sequence is an orthogonal PCR priming sequence. The disclosure provides that the barcode sequence is one of a set of mutually orthogonal 20 base pair PCR priming sequences. The disclosure provides that the enzyme recognition site is a type IIS endonuclease recognition site. The disclosure provides that the in vitro transcription and translation are carried out in prokaryotic or eukaryotic systems. The disclosure provides that the joined PCR amplicons are further amplified for sequencing and clonal isolation of the desired binder. The disclosure provides that the clonal isolation of the desired binder is carried out first by selecting all the binders that bind to a target, and second by selecting a single binder that binds to the target. The disclosure provides that the steps can be repeated to enrich for affinity maturation of a target molecule against a plurality of diversified binder molecules. The disclosure provides that the ORF within the mRNA can be barcoded in either the 5' UTR or 3' UTR. The disclosure provides that the method can be used for determining the molecular binding of DNA barcoded chemical libraries.

Further features and advantages of certain embodiments of the present invention will become more fully apparent in the following description of embodiments and drawings thereof, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. The foregoing and other features and advantages of the present embodiments will be more fully understood from the following detailed description of illustrative embodiments taken in conjunction with the accompanying drawings in which:

FIGS. 1A and 1B depict schematics of mRNA display. FIG. 1A depicts a normal translation termination scheme. FIG. 1B depicts an exemplary mRNA display scheme where the ribosome translates puromycin linked mRNA without release factors 1, 2 and 3.

FIG. 2 depicts a schematic of preparation of UTR barcoded ORFeome libraries using Gibson assembly.

FIG. 3 depicts a cloning schematic of barcoded ORFeome transfer to gBlock.

FIG. 4 depicts schematics of tagging Illumina PCR priming sequences to barcoded amplicon sequences for next generation sequencing (NGS) determination and verification of the barcoded amplicon sequences.

FIG. 5 depicts schematics of a target-binder pair immobilized on a solid support. The target and binder are each conjugated to its encoding mRNA, which can be passivated as an mRNA-cDNA duplex. The target molecule is immobilized to the solid support via a protein immobilization tag. The protein immobilization tag can be covalently linked to a ligand on the surface of the solid support.

FIG. 6 depicts schematics of stabilizing the protein-protein complexes bound to the solid support via crosslinking.

FIG. 7 depicts schematics of gel casting of the crosslinked protein-protein complexes which are cleaved off of the solid support and polony amplification of the barcoded mRNAs conjugated to the proteins.

FIG. 8 depicts schematics of in gel polony amplification of colocalized nucleic acids.

FIG. 9 depicts schematics of covalent joining of colocalized nucleic acid polonies by Golden Gate in gel reactions.

FIG. 10 depicts schematics of in gel PCR and elution of amplicon DNA containing paired barcodes for NGS library preparation.

FIG. 11 depicts schematics of interactome map recovery from NGS data.

FIG. 12 depicts schematics of clonal recovery procedure for binder species against each target.

DETAILED DESCRIPTION

The present disclosure is directed to methods of determining the molecular binding between libraries of molecules. The disclosure provides methods for affinity maturation between libraries of molecules. Affinity maturation generally refers to a process where the binding affinity between a target and a binder molecule is increased by modification of the binder molecule. The disclosure provides methods for multiplexing affinity maturation of one or more molecule libraries, such as protein target libraries against one or more molecule libraries, such as binder libraries. The disclosure provides that the target and binder molecules are nucleic acid barcoded proteins, polypeptides or small molecules. The disclosure provides that determining the interaction between the target and binder molecule pairs is by determining the sequence of the barcoded nucleic acids associated with the respective target and binder molecules. Multiplexing affinity maturation of protein target libraries against binder libraries requires three fundamental functionalities including 1) an affinity maturation process, 2) the ability to determine all pairwise interactions between and within protein libraries and 3) retrieval of protein binder pairs between and within diverse libraries according to interaction data. The disclosure provides methods that in addition to facilitating library affinity maturation, enable the determination of pairwise molecular interactions by DNA sequencing. Further, the disclosure provides methods that facilitate retrieval and recovery of populations of individual binding pairs from the pool of the molecule libraries.

Affinity Maturation Methods

The disclosure provides methods for affinity maturation of molecules such as proteins. Methods for attaching genetic information in the form of nucleic acids to protein sequences have been described in the art as protein display methods. Protein display methods are typically employed for purposes of affinity maturation of proteins. In a typical affinity maturation scheme, a library of proteins is displayed and passed/panned over a surface saturated with excess copies of a single target molecule. Proteins that remain attached to the targets after washing are amplified and diversified. Over multiple rounds of the affinity maturation process, high affinity proteins for the target are enriched.

Variations of protein display methods include mRNA display, a minimalist protein display method by which protein identifying nucleic acid sequences can be paired with proteins without requiring modification of the protein sequence. During mRNA display, mRNA molecules are covalently attached to the C-terminus of their encoded proteins. mRNA display works by covalently attaching a puromycin antibiotic molecule to the 3' end of the mRNA, such that during the terminal process of translation, the mRNAs become covalently attached to the C-terminal end of the encoded proteins by ribosomal peptidyl transfer of the puromycin moiety. Affinity maturation is achieved by iterating mRNA display in the desired conditions. cDNA display is an extension of mRNA display techniques wherein a moiety is introduced in the 5' end of a reverse transcription primer. The moiety is generally proximal to the protein such that the moiety can react in a bioorthogonal way with the mRNA displayed protein at an unnatural amino acid, or with a functional group on the puromycin linker that facilitates covalent attachment of the cDNA to the protein. The mRNA is optionally digested and replaced with DNA to form a duplex. Affinity maturation is achieved by iterating cDNA display in the desired conditions.

Ribosome display is a display methodology based on known stalling behavior of ribosome-mRNA-protein complexes (RMPCs). In ribosome display, the non-covalent RMPC is produced from protein coding mRNA transcripts that encode a C-terminal SecM or equivalent ribosomal stalling sequence, generally preceded by a flexible linker for steric purposes. The RMPC is used directly in panning experiments, and mRNA left behind by the complex is reverse transcribed for PCR amplification and diversification of libraries. Affinity maturation is achieved by iterating ribosome display in the desired conditions.

Additional protein display techniques include Snap Display, Halo Display, Clip Display, ACP Display and MCP Display, where either the C or N terminus of a protein tag reacts with a 5' cDNA or an mRNA terminal moiety to form a covalent bond with the protein tag. These protein tags form robust, physiologically irreversible covalent bonds with their respective ligands upon binding. Exemplary tag and ligand pairs include Halo tag domains that react with haloalkane ligands, Clip tags that react with benzylcytosine ligands, MCP and ACP tags that react with Acetyl-CoA ligand derivatives, and Snap tags that react with benzylguanine ligands. The cDNA or mRNA may be attached to the protein tags by in vitro compartmentalized translation reactions, or they may be produced in a pooled fashion during the production of protein-ribosome-mRNA complexes wherein the protein tag is coupled to the mRNA or cDNA in cis. Protein tagged DNA or mRNA complexes can be panned over a substrate coated with binder molecules. The mRNA-protein complex can be panned after lysis of the cells, reverse transcribed and/or PCR amplified in an iterative fashion to facilitate affinity maturation and enrichment between the target and binder molecules.

In Cis display, a sequence specific RNA binding protein is added to the C or N terminus of the displayed protein sequence, and it binds to a bioorthogonal sequence motif in the 5' or 3' UTR region of the mRNA encoding the displayed protein. Strict mRNA to protein pairing is accomplished by compartmentalized expression under in vitro compartmentalized translation conditions. In cells such as yeast and mammalian cells, a centromere containing chromosome can guarantee single copy genetic information, thereby preventing cross coupling of mRNA to other protein species. The mRNA-protein complex can be panned after lysis of the cells, reverse transcribed and PCR amplified to facilitate affinity maturation and enrichment.

Phage display is a protein display technique that uses bacteriophages to connect proteins with the genetic information that encodes them. Typically, the phage is an M13 phage which contains native proteins in the genome. Exemplary coating proteins on M13 phage include pIII, pVIII, pVI, pVII and pIX, which can be fused to the protein library to be affinity matured. Some bacteriophages used in phage display are M13 and fd filamentous phage, T4, T7, and λ phage. The DNA packaging behavior of these phages is such that single copy genome packaging can be guaranteed. Phage expressed in E. coli is panned on a surface containing the binder for affinity maturation and enrichment. After washing, the bound phage is allowed to infect a fresh E. coli culture and the process is repeated to achieve affinity maturation and enrichment.

Yeast display is a protein display method within a general class of techniques called cell surface display, where the protein of interest is displayed on the cell surface. Generally, the protein is displayed by fusion of the protein library to a membrane protein. Classically, cells are panned over a substrate of affinity enrichment binder target. After washing, bound cells are allowed to inoculate fresh media and the process is repeated to achieve affinity maturation and enrichment.

The disclosed methods include molecules for affinity maturation that are not limited solely to proteins and polypeptides. In fact, there exists a class of small molecule display techniques, generally termed DNA barcoded chemical libraries, which permit the production, enrichment and diversification of small molecule libraries using hybridization based combinatorial chemistry. Different methods for DNA barcoded small molecule library preparation have been described in Kleiner, R. E., Dumelin, C. E. & Liu, D. R., Small-molecule discovery from DNA-encoded chemical libraries, Chem. Soc. Rev. 40, 5707-17 (2011), hereby incorporated by reference in its entirety. The general approach to DNA barcoded library synthesis involves alternating library synthesis steps with enzyme-catalyzed DNA polymerization or ligation of short DNA sequences used to barcode each synthetic step. Library diversity is generated over repeated cycles of division, synthesis, and pooling, a traditional method in combinatorial synthesis.
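The division, synthesis, and pooling cycle described above can be illustrated with a minimal Python sketch. It assumes that each cycle appends one short DNA tag per building block and that the tags are simply concatenated; the function name `split_and_pool`, the building-block names and the tag sequences are hypothetical and are not taken from the cited reference.

```python
from itertools import product

def split_and_pool(building_blocks_per_cycle):
    """Enumerate every compound and its concatenated DNA barcode.

    building_blocks_per_cycle: list of dicts, one per synthesis cycle,
    mapping a building-block name to the short DNA tag that records it.
    """
    library = {}
    # Each compound is one choice of building block per cycle.
    for combo in product(*(cycle.items() for cycle in building_blocks_per_cycle)):
        names = tuple(name for name, _ in combo)
        barcode = "".join(tag for _, tag in combo)
        library[barcode] = names
    return library

# Hypothetical building blocks and tags, for illustration only.
cycles = [
    {"A1": "ACGT", "A2": "TGCA"},
    {"B1": "GGAA", "B2": "CCTT", "B3": "AATT"},
    {"C1": "GATC", "C2": "CTAG"},
]
library = split_and_pool(cycles)
print(len(library))             # 2 * 3 * 2 = 12 compounds
print(library["ACGTGGAAGATC"])  # ('A1', 'B1', 'C1')
```

Library size grows as the product of the number of building blocks per cycle, which is why a few cycles are enough to generate large diversity.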

Protein-Protein Interaction Methods

In the literature, proteins involved in protein-protein interactions are called either bait or prey. Binding of the bait and prey constitutes a protein-protein interaction. These interactions can be determined by a variety of methods known in the art.

Co-immunoprecipitation (Co-IP) is a technique used to isolate protein complexes through the use of affinity interactions. Co-IP is typically used in a serial manner to evaluate interactions one protein at a time. It generally requires prior knowledge of likely interacting partners and an antibody to blot for its detection. Co-IP may also be multiplexed using protein and peptide molecule arrays.

Mass spectrometry (MS) can be used to determine the identity of co-immunoprecipitated proteins. By targeting a known member with an antibody and utilizing mass spectrometry, it is possible to pull the entire protein complex out of solution and thereby identify unknown members of the complex. This concept of pulling protein complexes out of solution is sometimes referred to as a "pull-down". The bait protein may be bound by an antibody, or be a constructed protein that can be affinity isolated through a moiety or domain.

The Yeast Two Hybrid (Y2H) system is another popular method of determining protein-protein interactions that is based on in vivo interactions in yeast. Y2H generally employs a reporter gene whose transcription is governed by a pair of transcriptional cofactors. When the two transcriptional cofactors are each fused to a bait and a prey protein, the degree to which the cofactors activate transcription is dependent upon the degree of interaction between the bait and prey proteins. Where the reporter gene is a fluorescent protein, the degree of bait and prey interaction can be quantitated by determining the fluorescent signal via, e.g., fluorescence activated cell sorting and plate based fluorimeter assays of clonal yeast cell populations in wells on multiwell plates. Barcode Fusion Genetics-Yeast Two-Hybrid (BFG-Y2H) is a method by which a full matrix of protein pairs can be screened in a single multiplexed strain pool (Yachie, N. et al., Pooled-matrix protein interaction screens using Barcode Fusion Genetics, Mol. Syst. Biol. 12, 863 (2016), hereby incorporated by reference in its entirety). BFG-Y2H uses Cre recombination to fuse DNA barcodes from distinct plasmids, generating chimeric protein-pair barcodes that can be quantified via next-generation sequencing. BFG-Y2H matrices range in scale from ~25 K to 2.5 M protein pairs. BFG-Y2H increases the efficiency of protein matrix screening, with quality that is on par with state-of-the-art Y2H methods.

In single molecule interaction sequencing (SMI-seq) developed by Gu et al. (Gu, L. et al., Multiplex single-molecule interaction profiling of DNA-barcoded proteins, Nature 515, 554-557 (2014), hereby incorporated by reference in its entirety), proteins are displayed with DNA barcodes by Halo tag display that is performed in cis by a stalled ribosome-mRNA-protein complex. Protein-protein interactions from a panning step are stabilized by cross linking of the protein-protein interface, and the complexes are cast into gels. Polony PCR performed on interacting molecules generates co-localized barcode polonies. Using fluorescent in situ sequencing, the barcode identity co-localization data for each molecule is sufficient to perform library on library screens of interaction in vitro. Furthermore, dissociation constants of molecules can be calculated from the ratio of co-localized to non-co-localized barcode polonies.

The methods of the instant disclosure differ from the SMI-seq method in several ways. First, whereas the SMI-seq method enables the determination of interacting molecules by in situ sequencing of DNA barcodes for each molecule, this disclosure provides methods for the in situ joining and recovery of two colocalized barcodes to facilitate ex-situ determination of molecule interactions using ex-situ DNA sequencing methods that include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization instruments. Further, whereas the SMI-seq method permits determination of interactions and affinity maturation, it does not provide methods for the retrieval of the specific subset of binder molecule sequences from the library as herein described by the instant disclosure. This disclosure provides further function beyond the SMI-seq method, including but not limited to: DNA library based sequence retrieval, and interaction profiling by ex-situ DNA sequencing.
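The text above states that dissociation constants can be calculated from the ratio of co-localized to non-co-localized barcode polonies, but does not give the calculation. The following Python sketch shows one way such an estimate could be made, assuming simple 1:1 equilibrium binding in which the free binder is in excess at a known concentration, so that the bound fraction of target is f = [B] / (Kd + [B]). This model, the function name `estimate_kd` and the example counts are assumptions for illustration, not the procedure of the cited reference.

```python
def estimate_kd(colocalized, target_only, binder_conc_nM):
    """Estimate Kd (nM) from polony counts under an assumed 1:1 binding model.

    colocalized:    target polonies that co-localize with a binder polony
    target_only:    target polonies with no co-localized binder polony
    binder_conc_nM: free binder concentration during panning (assumed known
                    and in excess over target)
    """
    if colocalized <= 0:
        raise ValueError("no co-localized polonies; Kd cannot be estimated")
    fraction_bound = colocalized / (colocalized + target_only)
    # Rearranged from f = [B] / (Kd + [B]):  Kd = [B] * (1 - f) / f
    return binder_conc_nM * (1.0 - fraction_bound) / fraction_bound

# Hypothetical counts: 250 of 1000 target polonies are co-localized at 10 nM binder.
kd = estimate_kd(colocalized=250, target_only=750, binder_conc_nM=10.0)
print(kd)  # 30.0
```

In practice the estimate would also depend on factors this sketch ignores, such as crosslinking efficiency and the rate of chance co-localization.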

Small Molecule-Protein Interactions

Interaction-dependent PCR (IDPCR) (McGregor, L. M., Gorin, D. J., Dumelin, C. E. & Liu, D. R., Interaction-dependent PCR: identification of ligand-target pairs from libraries of ligands and libraries of targets in a single solution-phase experiment, J. Am. Chem. Soc. 132, 15522-4 (2010), hereby incorporated by reference in its entirety) is a small molecule-protein interaction profiling method based on selectively amplifying DNA sequences encoding ligand-target pairs. IDPCR requires DNA-linked ligands and DNA-linked targets, which, in the publication, were prepared in individual wells by simple nonspecific conjugation. Binding between DNA-linked targets and DNA-linked ligands induces formation of an extendable duplex, where extension links codes that identify the ligand and target.

Pooled DNA Library Retrieval Methods

Methods for retrieval of DNA sequences from libraries involve the addition of orthogonal PCR priming sites to individual DNA sequences. The assignment of the orthogonal PCR primers may be random or preassigned. In the dial-out PCR methods described by Shendure and colleagues (Schwartz, J. J., Lee, C. & Shendure J., Accurate gene synthesis with tag-directed retrieval of sequence-verified DNA molecules, Nat. Methods 9, 913-5 (2012); Klein, J. C. et al., Multiplex pairwise assembly of array-derived DNA oligonucleotides, Nucleic Acids Res. gkv1177 (2015), doi: 10.1093/nar/gkv1177, hereby incorporated by reference in their entireties), a barcode library is flanked at the 3' end by a common PCR sequence that targets the library. 5' common adapters are added to facilitate library sequencing. Determination of the assignment of PCR primers to library members is accomplished by next generation sequencing (NGS). Desired sequences can be retrieved independently from the library by PCR employing the determined combination of orthogonal primers assigned to the library member. Alternatively, the orthogonal priming sites may be preassigned to specific members of the library. In the preassigned case, no DNA sequencing of the library-priming site pairs is necessary.
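For the randomly assigned case, the bookkeeping step (using NGS reads to learn which priming tags flank which library member, then looking up the tags needed to retrieve a desired member) can be sketched as follows. This is a minimal illustration that assumes reads have already been parsed into a forward tag, an insert and a reverse tag; the function names, the `min_reads` threshold and the sequences are hypothetical and do not reproduce the cited dial-out PCR software.

```python
from collections import Counter, defaultdict

def assign_tags(reads, min_reads=2):
    """Map each (forward_tag, reverse_tag) pair to its most common insert.

    reads: iterable of (forward_tag, insert_sequence, reverse_tag) tuples.
    Tag pairs whose top insert is seen fewer than min_reads times are dropped.
    """
    by_tag = defaultdict(Counter)
    for fwd, insert, rev in reads:
        by_tag[(fwd, rev)][insert] += 1
    assignments = {}
    for tag_pair, counts in by_tag.items():
        insert, n = counts.most_common(1)[0]
        if n >= min_reads:
            assignments[tag_pair] = insert
    return assignments

def primers_for(assignments, wanted_insert):
    """Return the tag pairs whose PCR amplification would retrieve wanted_insert."""
    return [pair for pair, insert in assignments.items() if insert == wanted_insert]

# Hypothetical parsed reads, for illustration only.
reads = [
    ("ACGTAC", "ATGGCC", "TTGACA"),
    ("ACGTAC", "ATGGCC", "TTGACA"),
    ("ACGTAC", "ATGGCA", "TTGACA"),  # minority read, likely an error
    ("GGATCC", "ATGTTT", "CAGTCA"),
    ("GGATCC", "ATGTTT", "CAGTCA"),
    ("CTAGCT", "ATGGCC", "AGCTAG"),  # seen once, below min_reads
]
assignments = assign_tags(reads)
print(primers_for(assignments, "ATGGCC"))  # [('ACGTAC', 'TTGACA')]
```

In the preassigned case the `assignments` table is known in advance, so only the lookup step is needed.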

Comparison of Methods

The disclosed methods have the advantage over the conventional methods for determining interactions between proteins by DNA sequencing because the disclosed methods can simultaneously facilitate affinity maturation, interaction profiling and sequence recovery (Table 1).

Table 1. A comparison of the methods.

Library Sequence Retrieval

Interaction Profiling by Ex-Situ DNA Sequencing: X X X

Library on Library: X X X X

The disclosure provides general methods for multiplexing affinity maturation of target molecule libraries against binder molecule libraries, where the interactions between the target and binder molecules can be determined by determining the barcoded nucleic acid sequences associated with the respective target and binder molecules, and the exact subsets of the molecules in the binder library can be recovered in either pooled or clonal forms by PCR and sequencing.
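Determining interactions from the barcoded nucleic acid sequences amounts to counting joined target-binder barcode pairs in the sequencing data. The Python sketch below assumes each read has already been parsed into a target barcode, a binder barcode and the binder's unique molecule identifier (UMI), and that lookup tables from barcode to protein name are available; all names and sequences shown are hypothetical.

```python
from collections import defaultdict

def interaction_counts(parsed_reads, target_names, binder_names):
    """Count distinct UMIs observed for each (target, binder) barcode pair.

    parsed_reads: iterable of (target_barcode, binder_barcode, umi) tuples.
    target_names / binder_names: dicts mapping barcode -> protein name.
    Reads with an unknown barcode are ignored.
    """
    umis = defaultdict(set)
    for target_bc, binder_bc, umi in parsed_reads:
        if target_bc in target_names and binder_bc in binder_names:
            pair = (target_names[target_bc], binder_names[binder_bc])
            umis[pair].add(umi)  # PCR duplicates share a UMI and count once
    return {pair: len(seen) for pair, seen in umis.items()}

# Hypothetical lookup tables and reads, for illustration only.
target_names = {"AACC": "targetA", "GGTT": "targetB"}
binder_names = {"CAGT": "binder1", "TGAC": "binder2"}
parsed_reads = [
    ("AACC", "CAGT", "UMI01"),
    ("AACC", "CAGT", "UMI01"),  # PCR duplicate of the read above
    ("AACC", "CAGT", "UMI02"),
    ("GGTT", "TGAC", "UMI03"),
    ("NNNN", "CAGT", "UMI04"),  # unknown target barcode, skipped
]
counts = interaction_counts(parsed_reads, target_names, binder_names)
print(counts)  # {('targetA', 'binder1'): 2, ('targetB', 'binder2'): 1}
```

The resulting table of pair counts is the raw material for an interactome map, and the binder barcodes of interest identify which library members to recover by PCR.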

Embodiments of the present invention are directed to nucleic acid sequences having an open reading frame (ORF) sequence encoding the protein molecules and 3' or 5' UTRs. The UTR sequences comprise two or more orthogonal primer binding sites that each hybridizes to an orthogonal primer. As used herein, the term "orthogonal primer binding site" is intended to include, but is not limited to, a nucleic acid sequence located within the UTR sequences of the present invention which hybridizes to a complementary orthogonal primer. An "orthogonal primer pair" refers to a set of two primers of identical sequence that bind to both orthogonal primer binding sites within the UTR sequences. Orthogonal primer pairs are designed to be mutually non-hybridizing to other orthogonal primer pairs, to have a low potential to cross-hybridize with one another or to form secondary structures, and to have similar melting temperatures (Tms) to one another. Orthogonal primer pair design and software useful for designing orthogonal primer pairs are known to those skilled in the art. In certain exemplary embodiments, overlap extension PCR or assembly PCR is used to produce a nucleic acid sequence of interest from a plurality of nucleic acid sequences. "Assembly PCR" refers to the synthesis of long, double stranded nucleic acid sequences by performing PCR on a pool of oligonucleotides or nucleic acid sequences having overlapping segments. Assembly PCR is discussed further in Stemmer et al. (1995) Gene 164:49. In certain aspects, PCR assembly is used to assemble single stranded nucleic acid sequences (e.g., ssDNA) into a nucleic acid sequence of interest. In other aspects, PCR assembly is used to assemble double stranded nucleic acid sequences (e.g., dsDNA) into a nucleic acid sequence of interest.
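The design criteria for orthogonal primers stated above (low cross-hybridization and similar Tms) can be screened with a crude check such as the following Python sketch. It is not the design software referred to in the text. It assumes equal-length primers, uses Hamming distance to each other primer and to its reverse complement as a rough proxy for cross-hybridization, and estimates Tm with the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only approximate for short oligonucleotides. The thresholds and toy sequences are hypothetical.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def wallace_tm(seq):
    """Wallace rule estimate of Tm in degrees C: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def is_orthogonal_set(primers, min_distance=8, max_tm_spread=4):
    """Crude screen: similar Tms, and every pair is dissimilar both
    directly and against the other primer's reverse complement."""
    tms = [wallace_tm(p) for p in primers]
    if max(tms) - min(tms) > max_tm_spread:
        return False
    for a, b in combinations(primers, 2):
        if hamming(a, b) < min_distance:
            return False
        if hamming(a, reverse_complement(b)) < min_distance:
            return False
    return True

# Toy 20-mers chosen for easy hand checking, not realistic primers.
p1 = "AAAAACCCCCAAAAACCCCC"
p2 = "CCCCCAAAAACCCCCAAAAA"
print(wallace_tm(p1))                                   # 60
print(is_orthogonal_set([p1, p2]))                      # True
print(is_orthogonal_set([p1, reverse_complement(p1)]))  # False
```

A production design would also score partial and shifted alignments, secondary structure and nearest-neighbor thermodynamics, none of which this sketch attempts.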

"Complementary" or "substantially complementary" refers to the hybridization or base pairing or the formation of a duplex between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single-stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% of the nucleotides of the other strand, usually at least about 90% to 95%, and more preferably from about 98 to 100%. Alternatively, substantial complementarity exists when an RNA or DNA strand will hybridize under selective hybridization conditions to its complement. Typically, selective hybridization will occur when there is at least about 65% complementarity over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90% complementarity. See Kanehisa (1984) Nucl. Acids Res. 12:203.

"Complex" refers to an assemblage or aggregate of molecules in direct or indirect contact with one another. In one aspect, "contact," or more particularly, "direct contact," in reference to a complex of molecules or in reference to specificity or specific binding, means two or more molecules are close enough so that attractive noncovalent interactions, such as van der Waals forces, hydrogen bonding, ionic and hydrophobic interactions, and the like, dominate the interaction of the molecules. In such an aspect, a complex of molecules is stable in that under assay conditions the complex is thermodynamically more favorable than a non-aggregated, or non-complexed, state of its component molecules. As used herein, "complex" refers to a duplex or triplex of polynucleotides or a stable aggregate of two or more proteins. In regard to the latter, a complex can be formed by an antibody specifically binding to its corresponding antigen.

"Duplex" refers to at least two oligonucleotides and/or polynucleotides that are fully or partially complementary and that undergo Watson-Crick type base pairing among all or most of their nucleotides so that a stable complex is formed. The terms "annealing" and "hybridization" are used interchangeably to mean the formation of a stable duplex. In one aspect, stable duplex means that a duplex structure is not destroyed by a stringent wash, e.g., conditions including a temperature of about 5°C less than the T_m of a strand of the duplex and low monovalent salt concentration, e.g., less than 0.2 M, or less than 0.1 M. "Perfectly matched" in reference to a duplex means that the polynucleotide or oligonucleotide strands making up the duplex form a double stranded structure with one another such that every nucleotide in each strand undergoes Watson-Crick base pairing with a nucleotide in the other strand. The term "duplex" comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, PNAs, and the like, that may be employed. A "mismatch" in a duplex between two oligonucleotides or polynucleotides means that a pair of nucleotides in the duplex fails to undergo Watson-Crick bonding.

"Hybridization" refers to the process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide. The term "hybridization" may also refer to triple-stranded hybridization. The resulting (usually) double-stranded polynucleotide is a "hybrid" or "duplex." "Hybridization conditions" will typically include salt concentrations of less than about 1 M, more usually less than about 500 mM and even more usually less than about 200 mM. Hybridization temperatures can be as low as 5°C, but are typically greater than 22°C, more typically greater than about 30°C, and often in excess of about 37°C. Hybridizations are usually performed under stringent conditions, i.e., conditions under which a probe will hybridize to its target subsequence. Stringent conditions are sequence-dependent and are different in different circumstances. Longer fragments may require higher hybridization temperatures for specific hybridization. As other factors may affect the stringency of hybridization, including base composition and length of the complementary strands, presence of organic solvents and extent of base mismatching, the combination of parameters is more important than the absolute measure of any one alone. Generally, stringent conditions are selected to be about 5°C lower than the T_m for the specific sequence at a defined ionic strength and pH. Exemplary stringent conditions include salt concentration of at least 0.01 M to no more than 1 M Na ion concentration (or other salts) at a pH of 7.0 to 8.3 and a temperature of at least 25°C. For example, conditions of 5× SSPE (750 mM NaCl, 50 mM Na phosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C are suitable for allele-specific probe hybridizations. For stringent conditions, see, for example, Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Press (1989) and Anderson, Nucleic Acid Hybridization, 1st Ed., BIOS Scientific Publishers Limited (1999).

"Hybridizing specifically to" or "specifically hybridizing to" or like expressions refer to the binding, duplexing, or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA.

"Kit" refers to any delivery system for delivering materials or reagents for carrying out a method of the invention. In the context of assays, such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., primers, enzymes, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the assay, etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials for assays of the invention. Such contents may be delivered to the intended recipient together or separately. For example, a first container may contain an enzyme for use in an assay, while a second container contains primers.

"Ligation" means to form a covalent bond or linkage between the termini of two or more nucleic acids, e.g., oligonucleotides and/or polynucleotides, in a template-driven reaction. The nature of the bond or linkage may vary widely and the ligation may be carried out enzymatically or chemically. As used herein, ligations are usually carried out enzymatically to form a phosphodiester linkage between a 5' carbon of a terminal nucleotide of one oligonucleotide with a 3' carbon of another oligonucleotide. A variety of template-driven ligation reactions are described in the following references: Whitely et al., U.S. Pat. No. 4,883,750; Letsinger et al., U.S. Pat. No. 5,476,930; Fung et al., U.S. Pat. No. 5,593,826; Kool, U.S. Pat. No. 5,426,180; Landegren et al., U.S. Pat. No. 5,871,921; Xu and Kool (1999) Nucl. Acids Res. 27:875; Higgins et al., Meth. in Enzymol. (1979) 68:50; Engler et al. (1982) The Enzymes 15:3; and Namsaraev, U.S. Patent Pub. 2004/0110213.

"Amplifying" includes the production of copies of a nucleic acid molecule via repeated rounds of primed enzymatic synthesis. "In situ" amplification indicates that the amplification takes place with the template nucleic acid molecule positioned on a support or in a gel, rather than in solution. In situ amplification methods are described in U.S. Pat. No. 6,432,360.

"Support" can refer to a matrix upon which molecules of the present invention are placed. The support can be solid or semi-solid or a gel. "Semi-solid" refers to a compressible matrix with both a solid and a liquid component, wherein the liquid occupies pores, spaces or other interstices between the solid matrix elements. Semi-solid supports can be selected from polyacrylamide, cellulose, polyamide (nylon) and cross-linked agarose, dextran and polyethylene glycol.

"Primer" includes an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3' end along the template so that an extended duplex is formed. The sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide. Usually primers are extended by a DNA polymerase. Primers usually have a length in the range of from 3 to 36 nucleotides, also 5 to 24 nucleotides, also from 14 to 36 nucleotides. Primers within the scope of the invention include orthogonal primers, amplification primers, construction primers and the like. Pairs of primers can flank a sequence of interest or a set of sequences of interest.

Primers and probes can be degenerate in sequence. Primers within the scope of the present invention bind adjacent to a target sequence (e.g., an oligonucleotide sequence of an oligonucleotide set or a nucleic acid sequence of interest).

In other embodiments, temporary orthogonal primers/primer binding sites may be removed using enzymatic cleavage. For example, orthogonal primers/primer binding sites may be designed to include a restriction endonuclease cleavage site. After amplification, the pool of nucleic acids may be contacted with one or more endonucleases to produce double stranded breaks, thereby removing the primers/primer binding sites. In certain embodiments, the forward and reverse primers may be removed by the same or different restriction endonucleases. Any type of restriction endonuclease may be used to remove the primers/primer binding sites from nucleic acid sequences. A wide variety of restriction endonucleases having specific binding and/or cleavage sites are commercially available, for example, from New England Biolabs (Ipswich, Mass.). In various embodiments, restriction endonucleases that produce 3' overhangs, 5' overhangs or blunt ends may be used. When using a restriction endonuclease that produces an overhang, an exonuclease (e.g., RecJf, Exonuclease I, Exonuclease T, S1 nuclease, P1 nuclease, mung bean nuclease, CEL I nuclease, etc.) may be used to produce blunt ends. In an exemplary embodiment, an orthogonal primer/primer binding site that contains a binding and/or cleavage site for a type IIS restriction endonuclease may be used to remove the temporary orthogonal primer binding site.

As used herein, the term "restriction endonuclease recognition site" is intended to include, but is not limited to, a particular nucleic acid sequence to which one or more restriction enzymes bind, resulting in cleavage of a DNA molecule either at the restriction endonuclease recognition sequence itself, or at a sequence distal to the restriction endonuclease recognition sequence. Restriction enzymes include, but are not limited to, type I enzymes, type II enzymes, type IIS enzymes, type III enzymes and type IV enzymes. The REBASE database provides a comprehensive database of information about restriction enzymes, DNA methyltransferases and related proteins involved in restriction-modification. It contains both published and unpublished work with information about restriction endonuclease recognition sites and restriction endonuclease cleavage sites, isoschizomers, commercial availability, crystal and sequence data (see Roberts et al. (2005) Nucl. Acids Res. 33:D230, incorporated herein by reference in its entirety for all purposes).

In certain aspects, primers of the present invention include one or more restriction endonuclease recognition sites that enable type IIS enzymes to cleave the nucleic acid several base pairs 3' to the restriction endonuclease recognition sequence. As used herein, the term "type IIS" refers to a restriction enzyme that cuts at a site remote from its recognition sequence. Type IIS enzymes are known to cut at distances from their recognition sites ranging from 0 to 20 base pairs. Examples of Type IIS endonucleases include, for example, enzymes that produce a 3' overhang, such as, for example, Bsr I, Bsm I, BstF5 I, BsrD I, Bts I, Mnl I, BciV I, Hph I, Mbo II, Eci I, Acu I, Bpm I, Mme I, BsaX I, Bcg I, Bae I, Bfi I, TspDT I, TspGW I, Taq II, Eco57 I, Eco57M I, Gsu I, Ppi I, and Psr I; enzymes that produce a 5' overhang such as, for example, BsmA I, Ple I, Fau I, Sap I, BspM I, SfaN I, Hga I, Bbv I, Fok I, BceA I, BsmF I, Ksp632 I, Eco31 I, Esp3 I, Aar I; and enzymes that produce a blunt end, such as, for example, Mly I and Btr I. Type IIS endonucleases are commercially available and are well known in the art (New England Biolabs, Beverly, Mass.). Information about the recognition sites, cut sites and conditions for digestion using type IIS endonucleases may be found, for example, on the Worldwide web at neb.com/nebecomm/enzymefindersearchbytypeIIs.asp. Restriction endonuclease sequences and restriction enzymes are well known in the art and restriction enzymes are commercially available (New England Biolabs, Ipswich, Mass.).
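As a minimal sketch of how a Type IIS enzyme cuts at a defined distance from its recognition site, the following Python fragment locates BsaI sites and reports the resulting overhang. The recognition sequence and offsets used, GGTCTC(1/5), are the published BsaI specificity rather than a value taken from this disclosure, and the function name is hypothetical. Only sites on the given strand are searched, which is a simplification.

```python
# Published BsaI specificity GGTCTC(1/5): the top strand is cut 1 nt and the
# bottom strand 5 nt downstream of the site, leaving a 4-nt 5' overhang.
SITE = "GGTCTC"
TOP_OFFSET = 1
BOTTOM_OFFSET = 5


def bsai_overhangs(seq: str) -> list:
    """Return (site_start, overhang) for each BsaI site found on the given strand."""
    seq = seq.upper()
    results = []
    start = seq.find(SITE)
    while start != -1:
        site_end = start + len(SITE)
        top_cut = site_end + TOP_OFFSET
        bottom_cut = site_end + BOTTOM_OFFSET
        # Skip sites too close to the end of the molecule to be cut completely.
        if bottom_cut <= len(seq):
            results.append((start, seq[top_cut:bottom_cut]))
        start = seq.find(SITE, start + 1)
    return results
```

Because the overhang lies outside the recognition sequence, its four bases can be chosen freely, which is what allows two digested amplicons to be designed with matching ends.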

Primers (e.g., orthogonal primers, amplification primers, construction primers and the like) suitable for use in the methods disclosed herein may be designed with the aid of a computer program, such as, for example, DNAWorks, Gene2Oligo, or using the parameters software described herein. Typically primers are from about 5 to about 500, about 10 to about 100, about 10 to about 50, or about 10 to about 30 nucleotides in length. In certain exemplary embodiments, a set of orthogonal primers or a plurality of sets of orthogonal primers are designed so as to have substantially similar melting temperatures to facilitate manipulation of a complex reaction mixture. The melting temperature may be influenced, for example, by primer length and nucleotide composition. In certain exemplary embodiments, a plurality of sets of orthogonal primers are designed such that each set of orthogonal primers is mutually non-hybridizing with one another. Methods for designing orthogonal primers are described further herein.
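The two design criteria above, similar melting temperatures and mutual non-hybridization, can be illustrated with a rough screening sketch in Python. It is not the design software referred to above. The melting temperature uses the Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is an assumption suitable only for short oligonucleotides, and the thresholds `max_tm_spread` and `max_complementary_run` are hypothetical parameters.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees C (Wallace rule)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest substring common to a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def check_primer_set(primers, max_tm_spread=4, max_complementary_run=6) -> bool:
    """True if Tm values are close and no two primers share a long complementary run."""
    tms = [wallace_tm(p) for p in primers]
    tm_ok = max(tms) - min(tms) <= max_tm_spread
    # Two primers can hybridize where one matches the reverse complement of the other.
    cross_ok = all(
        longest_shared_run(a.upper(), reverse_complement(b)) <= max_complementary_run
        for a, b in combinations(primers, 2)
    )
    return tm_ok and cross_ok
```

A production design tool would typically use nearest-neighbor thermodynamics and also screen for self-complementarity; this sketch only shows the shape of the check.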

"Solid support," "support," and "solid phase support" are used interchangeably and refer to a material or group of materials having a rigid or semi-rigid surface or surfaces. In many embodiments, at least one surface of the solid support will be substantially flat, although in some embodiments it may be desirable to physically separate synthesis regions for different compounds with, for example, wells, raised regions, pins, etched trenches, or the like. According to other embodiments, the solid support(s) will take the form of beads, resins, gels, microspheres, or other geometric configurations. Microarrays usually comprise at least one planar solid phase support, such as a glass microscope slide. Semisolid supports and gel supports are also useful in the present invention.

"Specific" or "specificity" in reference to the binding of one molecule to another molecule, such as a target to a binder, means the recognition, contact, and formation of a stable complex between the two molecules, together with substantially less recognition, contact, or complex formation of that molecule with other molecules. In one aspect, "specific" in reference to the binding of a first molecule to a second molecule means that to the extent the first molecule recognizes and forms a complex with another molecule in a reaction or sample, it forms the largest number of the complexes with the second molecule. In certain aspects, this largest number is at least fifty percent. Generally, molecules involved in a specific binding event have areas on their surfaces or in cavities giving rise to specific recognition between the molecules binding to each other. Examples of specific binding include antibody-antigen interactions, enzyme-substrate interactions, formation of duplexes or triplexes among polynucleotides and/or oligonucleotides, receptor-ligand interactions, and the like. As used herein, "contact" in reference to specificity or specific binding means two molecules are close enough that weak non-covalent chemical interactions, such as van der Waals forces, hydrogen bonding, base-stacking interactions, ionic and hydrophobic interactions, and the like, dominate the interaction of the molecules.

In various embodiments, the methods disclosed herein comprise amplification of nucleic acids including, for example, oligonucleotides, subassemblies and/or polynucleotide constructs (e.g., nucleic acid sequences of interest). Amplification may be carried out at one or more stages during an assembly scheme and/or may be carried out one or more times at a given stage during assembly. Amplification methods may comprise contacting a nucleic acid with one or more primers that specifically hybridize to the nucleic acid under conditions that facilitate hybridization and chain extension. Exemplary methods for amplifying nucleic acids include the polymerase chain reaction (PCR) (see, e.g., Mullis et al. (1986) Cold Spring Harb. Symp. Quant. Biol. 51 Pt 1:263 and Cleary et al. (2004) Nature Methods 1:241; and U.S. Pat. Nos. 4,683,195 and 4,683,202), anchor PCR, RACE PCR, ligation chain reaction (LCR) (see, e.g., Landegren et al. (1988) Science 241:1077-1080; and Nakazawa et al. (1994) Proc. Natl. Acad. Sci. U.S.A. 91:360-364), self sustained sequence replication (Guatelli et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:1874), transcriptional amplification system (Kwoh et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:1173), Q-Beta Replicase (Lizardi et al. (1988) BioTechnology 6:1197), recursive PCR (Jaffe et al. (2000) J. Biol. Chem. 275:2619; and Williams et al. (2002) J. Biol. Chem. 277:7790), the amplification methods described in U.S. Pat. Nos. 6,391,544, 6,365,375, 6,294,323, 6,261,797, 6,124,090 and 5,612,199, or any other nucleic acid amplification method using techniques well known to those of skill in the art.

"Polymerase chain reaction," or "PCR," refers to a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA. In other words, PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates. Usually, the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument. Particular temperatures, durations at each step, and rates of change between steps depend on many factors well-known to those of ordinary skill in the art, e.g., exemplified by the references: McPherson et al., editors, PCR: A Practical Approach and PCR2: A Practical Approach (IRL Press, Oxford, 1991 and 1995, respectively). For example, in a conventional PCR using Taq DNA polymerase, a double stranded target nucleic acid may be denatured at a temperature greater than 90°C, primers annealed at a temperature in the range 50-75°C, and primers extended at a temperature in the range 72-78°C. The term "PCR" encompasses derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, assembly PCR and the like. Reaction volumes range from a few hundred nanoliters, e.g., 200 nL, to a few hundred microliters, e.g., 200 microliters. "Reverse transcription PCR," or "RT-PCR," means a PCR that is preceded by a reverse transcription reaction that converts a target RNA to a complementary single stranded DNA, which is then amplified, e.g., Tecott et al., U.S. Pat. No. 5,168,038. "Real-time PCR" means a PCR for which the amount of reaction product, i.e., amplicon, is monitored as the reaction proceeds. There are many forms of real-time PCR that differ mainly in the detection chemistries used for monitoring the reaction product, e.g., Gelfand et al., U.S. Pat. No. 5,210,015 ("Taqman"); Wittwer et al., U.S. Pat. Nos. 6,174,670 and 6,569,627 (intercalating dyes); Tyagi et al., U.S. Pat. No. 5,925,517 (molecular beacons). Detection chemistries for real-time PCR are reviewed in Mackay et al., Nucleic Acids Research, 30:1292-1305 (2002). "Nested PCR" means a two-stage PCR wherein the amplicon of a first PCR becomes the sample for a second PCR using a new set of primers, at least one of which binds to an interior location of the first amplicon. As used herein, "initial primers" in reference to a nested amplification reaction mean the primers used to generate a first amplicon, and "secondary primers" mean the one or more primers used to generate a second, or nested, amplicon. "Multiplexed PCR" means a PCR wherein multiple target sequences (or a single target sequence and one or more reference sequences) are simultaneously amplified in the same reaction mixture, e.g., Bernard et al. (1999) Anal. Biochem. 273:221-228 (two-color real-time PCR). Usually, distinct sets of primers are employed for each sequence being amplified. "Quantitative PCR" means a PCR designed to measure the abundance of one or more specific target sequences in a sample or specimen. Techniques for quantitative PCR are well-known to those of ordinary skill in the art, as exemplified in the following references: Freeman et al., Biotechniques, 26:112-126 (1999); Becker-Andre et al., Nucleic Acids Research, 17:9437-9447 (1989); Zimmerman et al., Biotechniques, 21:268-279 (1996); Diviacco et al., Gene, 122:3013-3020 (1992); Becker-Andre et al., Nucleic Acids Research, 17:9437-9446 (1989); and the like.

In certain embodiments, methods of determining the sequence of one or more nucleic acid sequences of interest are provided. Determination of the sequence of a nucleic acid sequence of interest can be performed using a variety of sequencing methods known in the art including, but not limited to, sequencing by hybridization (SBH), sequencing by ligation (SBL), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, fluorescent in situ sequencing (FISSEQ), FISSEQ beads (U.S. Pat. No. 7,425,431), wobble sequencing (PCT/US05/27695), multiplex sequencing (U.S. Ser. No. 12/027,039, filed Feb. 6, 2008; Porreca et al. (2007) Nat. Methods 4:931), polymerized colony (POLONY) sequencing (U.S. Pat. Nos. 6,432,360, 6,485,944 and 6,511,803, and PCT/US05/06425); nanogrid rolling circle sequencing (ROLONY) (U.S. Ser. No. 12/120,541, filed May 14, 2008), allele-specific oligo ligation assays (e.g., oligo ligation assay (OLA), single template molecule OLA using a ligated linear probe and a rolling circle amplification (RCA) readout, ligated padlock probes, and/or single template molecule OLA using a ligated circular padlock probe and a rolling circle amplification (RCA) readout) and the like. High-throughput sequencing methods, e.g., cyclic array sequencing using platforms such as Roche 454, Illumina Solexa, AB-SOLiD, Helicos, Polonator platforms and the like, can also be utilized. High-throughput sequencing methods are described in U.S. Ser. No. 61/162,913, filed Mar. 24, 2009. A variety of light-based sequencing technologies are known in the art (Landegren et al. (1998) Genome Res. 8:769-76; Kwok (2000) Pharmacogenomics 1:95-100; and Shi (2001) Clin. Chem. 47:164-172).

Preparation of Barcoded Polypeptide Libraries by mRNA Display

The disclosure provides methods that use mRNA display of both the protein target library and the protein binder library that can facilitate tandem identification of molecular species following their interaction.

In different versions of mRNA display, several methods are employed to covalently attach the puromycin oligo to the 3' end of the mRNA. In most methods, a 3' puromycin modified oligo is attached to a common region of the 3' UTR of the mRNA. Attachment can be accomplished by UV crosslinking of a 5' terminal psoralen to the mRNA, or by enzymatic ligation of the 5' terminus of the puromycin oligo to the 3' OH group of the mRNA sequence.

Existing methods for mRNA display cannot support a 3' UTR because they require the explicit absence of stop codons to prevent ribosome recycling, and to stall the ribosome for efficient puromycin incorporation. In the psoralen cross-linked mRNA scheme, a secondary structure formed by the puromycin oligo and the mRNA stalls the ribosome for peptidyl transfer of the puromycin. In the enzymatic ligation scheme, the ribosome stalls at the mRNA-DNA oligo interface prior to the peptidyl transfer of the puromycin.

The disclosure provides a novel process of mRNA display which facilitates display of 3' UTR barcodes unique to each protein target and its protein binder, respectively, without in-library variation of the encoded C-terminal protein sequence. In the disclosed scheme, the 3' UTR barcode region of both target and binder mRNAs serves multiple purposes. First, the barcode region standardizes the amplicon length and amplification conditions of the identifying barcode sequences, without changing the underlying protein sequence. This greatly simplifies PCR of the barcodes and the sequencing processes. Second, placement of the barcode in the 3' UTR as opposed to the 5' UTR of the mRNA ensures that the barcodes are in standard proximity during protein-protein interactions regardless of the length of the protein coding sequence involved. Third, the 3' UTR barcode regions in the mRNA of binder and target libraries contain a common Type IIS endonuclease recognition sequence in tandem with the barcode that facilitates covalent joining of the Type IIS endonuclease digested target and binder amplicons after PCR amplification. Finally, the barcode sequences provide orthogonal priming sequences with common PCR conditions that allow for the enrichment of the individual sequences of interest within a particular library in a pooled or unpooled fashion.

In exemplary embodiments, the disclosure provides the 3' UTR of the coding sequences having the following common motifs:

Motif for 3' UTR of the Target Library

5' {Linker, PolyStop, JCT-Fwd, Barcode, TypeIIS, Com1} 3'

Where Linker is a flexible glycine serine polypeptide encoding sequence, PolyStop is a repeated sequence of stop codons, JCT-Fwd, JCT-Rev, Com1, and Com2 are orthogonal 20 base pair isothermal PCR priming sequences, Barcode represents a set of mutually orthogonal 20 base pair PCR priming sequences, and TypeIIS is a Type IIS endonuclease recognition site, for example a site recognized by the following enzymes: HgaI, BbvI, BcoDI, BsmAI, BsmFI, FokI, SfaNI, BbsI, BfuAI, BsaI, BsmBI, BspMI, BtgZI, EarI, BspQI, SapI, or other Type IIS DNA endonucleases.

Motif for 3' UTR of the Binder Library

5' {Linker, PolyStop, JCT-Rev, UMI, Barcode, TypeIIS, Com2} 3'

Where Linker is a flexible glycine serine polypeptide encoding sequence, PolyStop is a repeated sequence of stop codons, JCT-Fwd, JCT-Rev, Com1, and Com2 are orthogonal 20 base pair isothermal PCR priming sequences, UMI represents a random 20 base pair unique molecule identifier sequence, Barcode represents a set of mutually orthogonal 20 base pair PCR priming sequences, and TypeIIS is a Type IIS endonuclease recognition site, for example a site recognized by the following enzymes: HgaI, BbvI, BcoDI, BsmAI, BsmFI, FokI, SfaNI, BbsI, BfuAI, BsaI, BsmBI, BspMI, BtgZI, EarI, BspQI, SapI, or other Type IIS DNA endonucleases.
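For illustration only, the two 3' UTR motifs can be sketched as ordered concatenations of named elements. Every nucleotide sequence below is a hypothetical placeholder chosen only to have the stated length (the disclosure does not give the actual sequences here), the choice of the BsaI site for TypeIIS is one example from the enzyme list, and the helper names are assumptions.

```python
import random

# Hypothetical placeholder sequences, not the sequences used in the disclosure.
PARTS = {
    "Linker": "GGTGGCAGC" * 2,            # codons for Gly-Gly-Ser, repeated twice
    "PolyStop": "TAATAGTGA",              # three consecutive stop codons
    "JCT-Fwd": "GATTACA" * 2 + "GATTAC",  # 20-nt placeholder priming sequence
    "JCT-Rev": "CTGA" * 5,                # 20-nt placeholder priming sequence
    "Barcode": "AGTC" * 5,                # 20-nt placeholder barcode
    "TypeIIS": "GGTCTC",                  # BsaI recognition site (example choice)
    "Com1": "TTGACCGA" * 2 + "TTGA",      # 20-nt placeholder priming sequence
    "Com2": "CCATGGTA" * 2 + "CCAT",      # 20-nt placeholder priming sequence
}

TARGET_ORDER = ("Linker", "PolyStop", "JCT-Fwd", "Barcode", "TypeIIS", "Com1")
BINDER_ORDER = ("Linker", "PolyStop", "JCT-Rev", "UMI", "Barcode", "TypeIIS", "Com2")


def random_umi(length=20, rng=None):
    """Random unique molecule identifier of the stated 20-nt length."""
    rng = rng or random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


def assemble_utr(order, parts):
    """Concatenate the named elements 5'->3' in motif order."""
    return "".join(parts[name] for name in order)


target_utr = assemble_utr(TARGET_ORDER, PARTS)
binder_utr = assemble_utr(BINDER_ORDER, {**PARTS, "UMI": random_umi()})
```

In a library, `Barcode` would differ for each member while the remaining elements stay constant, so that all barcode amplicons share one length.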

The mRNA display strategy of the disclosure supports the conjugation of 3' UTR engineered mRNA sequences with the encoded proteins without translation through the UTR region. This is accomplished by depleting all ribosome release factors from the in vitro translation system, and introducing multiple stop codons at the end of the protein coding sequence (applicable to both eukaryotic and prokaryotic in vitro translation systems). The disclosed system has an equivalent property to conventional mRNA display methods where the ribosome will stall at the end of the protein coding sequence until the puromycin is incorporated, at which point the protein-mRNA conjugate is free to translocate from the P-site with additional assistance from the ribosome recycling factor (FIGS. 1A and 1B).

Preparation of UTR Barcoded ORFeome Libraries

The disclosure provides that when the proteins are of a known set of sequences, the UTR barcode can be preassigned to these proteins, each with a unique barcode. In certain variations, barcodes can be randomly assigned to proteins. The UTR barcodes can be fused with the ORF encoding the proteins by methods known in the art, for example, by Golden Gate insertion, Gibson assembly, or the like. In the case of randomly assigned barcodes, the Barcode-ORF pairs must be sequenced fully to recover a barcode association map. Additionally, the unique Barcode-ORF pairs must be enriched from the population.
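A barcode association map for randomly assigned barcodes might be recovered as in the following sketch, which assumes that sequencing has already been reduced to (barcode, ORF identifier) pairs. The input format and function name are hypothetical. Barcodes observed with more than one ORF are dropped, since they do not identify a protein uniquely.

```python
from collections import defaultdict


def barcode_association_map(reads):
    """Map each barcode to its ORF, keeping only barcodes seen with a single ORF.

    reads: iterable of (barcode, orf_id) pairs recovered by sequencing.
    """
    seen = defaultdict(set)
    for barcode, orf_id in reads:
        seen[barcode].add(orf_id)
    return {bc: next(iter(orfs)) for bc, orfs in seen.items() if len(orfs) == 1}
```

For example, reads pairing `BC3` with both `geneA` and `geneC` would cause `BC3` to be excluded, while an ORF may still legitimately carry several distinct barcodes.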

According to an exemplary embodiment of the disclosure, the UTR barcoding scheme presupposes a unique assignment of each barcode sequence to each displayed protein. The entire ORFeome library containing plasmid DNAs encoding ORFs of the protein targets and binders can be barcoded with preassigned barcodes in a one pot process, provided that the sequence 3' or 5' to the ORF sequence is constant. The plasmids are digested at the site 3' or 5' to the ORF, as close as possible to the ORF sequence. The linearized plasmids can be recircularized by Gibson Assembly using a pool of barcoding oligonucleotides described herein and as illustrated in FIG. 2.

The "UTR Barcodes" as illustrated in FIG. 2 are contained in the following constructs of the libraries. A pair of target and binder molecules can also be named as a pair of bait and prey.

An exemplary Bait has the following motif:

{Linker, PolyStop, JCT-Fwd, Barcode, TypeIIS, Com2, T7 Terminator}

An exemplary Prey has the following motif:

{Linker, PolyStop, JCT-Rev, Barcode, TypeIIS, Com1, T7 Terminator}

Where Linker is a flexible glycine serine polypeptide encoding sequence, PolyStop is a repeated sequence of stop codons, JCT-Fwd, JCT-Rev, Com1, and Com2 are orthogonal 20 base pair isothermal PCR priming sequences, Barcode represents a set of mutually orthogonal 20 base pair PCR priming sequences, and TypeIIS is a Type IIS endonuclease recognition sequence, for example a sequence recognized by the following enzymes: HgaI, BbvI, BcoDI, BsmAI, BsmFI, FokI, SfaNI, BbsI, BfuAI, BsaI, BsmBI, BspMI, BtgZI, EarI, BspQI, SapI, or other Type IIS DNA endonucleases.

UTR Barcoding Oligo Design

UTR barcoding oligo for Bait according to an exemplary embodiment of the disclosure has the following motif:

5' {ORF (30bp of homology), Sequence 5' to cut site, Linker, PolyStop, JCT-Fwd, Barcode, Type IIS, Com1, T7 Terminator, Sequence 3' to cut site (30bp of homology)} 3'

UTR barcoding oligo for Prey according to an exemplary embodiment of the disclosure has the following motif:

5' {ORF (30bp of homology), Sequence of AttL2 5' to cut site, Linker, PolyStop, JCT-Rev, Barcode, Type IIS, Com2, T7 Terminator, Sequence 3' to cut site (30bp of homology)} 3'

Where ORF is the open reading frame of the DNA sequence of a set of genes collectively termed 'ORFeome', AttL1 and AttL2 are orthogonal Phage Recombinase recognition sequences, Linker is a flexible glycine serine polypeptide encoding sequence, PolyStop is a repeated sequence of stop codons, JCT-Fwd, JCT-Rev, Com1, and Com2 are orthogonal 20 base pair isothermal PCR priming sequences, Barcode represents a set of mutually orthogonal 20 base pair PCR priming sequences, and Type IIS is a Type IIS endonuclease recognition site, for example a site recognized by the following enzymes: HgaI, BbvI, BcoDI, BsmAI, BsmFI, FokI, SfaNI, BbsI, BfuAI, BsaI, BsmBI, BspMI, BtgZI, EarI, BspQI, SapI, or other Type IIS DNA endonucleases.

The disclosure provides that Gibson Assembly with barcoding pools permits direct assignment of barcodes to specific ORF sequences. In general, only barcoded sequences will be recircularized. These recircularized sequences can be selectively amplified by rolling circle amplification (RCA), inverse PCR, transformation or plasmid purification from E. coli.

Methods for cloning ORFeomes are known in the art and have been described in Rajagopala, S. V. et al., The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology, BMC Genomics 11, 470 (2010) and Brasch, M. A., Hartley, J. L. & Vidal, M., ORFeome cloning and systems biology: standardized mass production of the parts from the parts-list, Genome Res. 14, 2001-9 (2004), hereby incorporated by reference in their entireties. For example, ORFeomes that are maintained in Gateway vectors have been described in Hartley, J. L., DNA Cloning Using In Vitro Site-Specific Recombination, Genome Res. 10, 1788-1795 (2000), hereby incorporated by reference in its entirety. In these ORFeomes, the ORFs are flanked by Att sequences that serve as recombinase recognition sites in the cloning system. In pDONR Gateway vectors, the sequence 3' to the ORF is termed AttL2. To maintain Gateway cloning functionality, the inserted barcodes rescue the AttL sequences that were destroyed during digestion.

Commercially available restriction enzymes cannot recognize regions proximal to ORFs in AttL2, and many enzymes cleave internal to member ORF sequences. In contrast, Cas9 and in vitro prepared guide RNAs can cleave within 6 base pairs of the ORF sequence. The disclosure provides using Cas9 guide RNA for cloning of barcoded ORFeome libraries. In the Cas9 guide RNA mediated digestion, only 2 amino acids of C-terminal scar sequence are generated. In vitro Cas9 guide RNA digestion of ORFeome plasmids is a more general and robust digestion strategy for pooled barcoding of ORFeomes, which is contemplated by the disclosure. The disclosure provides that UTR barcoding DNA libraries of oligos as large as 1×10⁶ sequences and up to 150bp in length can be ordered from a commercial source such as the Agilent Oligo Library Service.

One challenge to modifying pooled ORF sequences for mRNA display is that both the length and sequence of ORFs are highly variable, thus precluding PCR and restriction cloning. To facilitate pooled manipulation of ORFeome pools, the disclosure provides DNA scaffolds which were designed with AttR cloning sites that provide the requisite transcription and translation motifs for prokaryotic and eukaryotic in vitro transcription and translation, as well as fusion with a protein immobilization tag either N-terminal or C-terminal to the proteins. These DNA scaffolds facilitate one pot preparation of the entire ORFeomes for general expression and mRNA display (FIG. 3).

An exemplary DNA scaffold sequence is shown as follows:

5' {AttR2, T7 Promoter, IRES, Kozak, protein immobilization tag, Linker, AttR1} 3'

Where AttR2 and AttR1 are orthogonal phage recombinase recognition sequences, IRES is an internal ribosomal entry sequence, Kozak is the Kozak consensus sequence, HaloTag is an exemplary protein immobilization tag (a haloalkane dehalogenase mutant from Rhodococcus), and Linker is a flexible glycine-serine polypeptide encoding sequence.

Puromycin Oligo Composition and UTR Design for Attachment

The disclosure provides attaching a puromycin oligo to a common region (Com) of the 3' UTR. The sequence Com1 or Com2 may vary depending on the puromycin oligo conjugation method used. It is desirable to keep the 3' UTR short where possible. In most methods, a 3' puromycin modified oligo is attached to a common region of the 3' UTR. Attachment processes disclosed herein include UV crosslinking of a 5' terminal psoralen to the mRNA 3' UTR, and enzymatic ligation of the 5' terminal of the puromycin oligo to the 3' OH terminal of the mRNA 3' UTR sequence.

In the UV crosslinking method, the puromycin oligo DNA sequences for UV crosslinking are as follows:

5' {Psoralen, Com1*, C9, C9, Puromycin} 3'

5' {Psoralen, Com2*, C9, C9, Puromycin} 3'

Where Psoralen is the UV active crosslinking motif, Com1 and Com2 are orthogonal 20 base pair isothermal priming sequences, C9 represents a C9 flexible linker sequence, and Puromycin is the puromycin antibiotic. * indicates the reverse complement sequence of Com.

The sequence UAA is appended to the 3' terminal sequence of the UTR barcode to facilitate psoralen crosslinking on hybridization. Introducing local adenine nucleobases has been demonstrated to increase the efficiency of psoralen UV crosslinking, as described in Kurz M et al., Psoralen photo-crosslinked mRNA-puromycin conjugates: a novel template for the rapid and facile preparation of mRNA-protein fusions, Nucleic Acids Res., 2000 Sep 15;28(18):E83, hereby incorporated by reference in its entirety.

In the Splint ligation method, the puromycin oligo DNA sequences for Splint ligation are as follows:

5' {Com1(11-20), C9, C9, Puromycin} 3'

5' {Com2(11-20), C9, C9, Puromycin} 3'

Where Com1(11-20) and Com2(11-20) are the latter 10 base pairs of the Com1 and Com2 sequences. Com1 and Com2 are orthogonal 20 base pair isothermal priming sequences, C9 represents a C9 flexible linker sequence, and Puromycin is the puromycin antibiotic.

The splint oligo sequences for Splint ligation are as follows:

5' {Com1*} 3'

5' {Com2*} 3'

For splint ligation, the Com1 and Com2 sequences in the UTR are truncated by 10 base pairs from the 3' end. This has the effect that the full Com1 and Com2 sequences are produced upon splint ligation. This design shortens the overall UTR length.
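The splint design described above can be checked with a small sketch. The 20-nucleotide COM1 sequence below is made up for illustration and is not taken from the disclosure, and a DNA alphabet is used throughout for simplicity even though the UTR itself is RNA. The sketch only shows that a UTR ending in Com1(1-10) and a puromycin oligo starting with Com1(11-20) together restore the full Com1 sequence, which the splint Com1* then bridges.

```python
# Illustrative sketch only. COM1 is a hypothetical sequence, not one from the disclosure.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence containing only A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


COM1 = "GATTACAGGCTCATCGTTAC"  # hypothetical 20-nt common priming sequence
assert len(COM1) == 20

utr_3prime_end = COM1[:10]          # Com1(1-10): left at the 3' end of the truncated UTR
puro_oligo_5prime = COM1[10:]       # Com1(11-20): 5' portion of the puromycin oligo
splint = reverse_complement(COM1)   # Com1*: hybridizes across the ligation junction

# Ligation joins the 3' OH of the UTR to the 5' end of the puromycin oligo.
ligated_junction = utr_3prime_end + puro_oligo_5prime


def splint_bridges(junction: str, splint_oligo: str) -> bool:
    """True if the splint is fully complementary to the ligated junction."""
    return reverse_complement(splint_oligo) == junction
```

Because only half of Com is carried in the UTR, the transcript is 10 nucleotides shorter than it would be with a full-length Com site, which is the shortening effect the text describes.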

Methods for Preparation of DNA Barcoded Diverse Protein Binder Libraries

The disclosure provides that protein binder libraries with sequence diversity up to, but not limited to, 1x10¹⁵ can be produced with a variety of methods. Exemplary protein binders and scaffolds are described in Table 2.

Table 2. A list of protein binders and scaffolds.

Protein Binder | Scaffold
Antibody | Fv Domain from IgG
Single-chain variable fragment | Fv Domain from IgG
Single Domain Antibody | Variable domain from IgG of the Camelid Family, Variable domain from IgG Heavy chain in Cartilaginous Fish
Affibodies | Z domain of Protein A
Affilins | Gamma-B crystallin, Ubiquitin
Affimers | Cystatin
Affitins | Sac7d (from Sulfolobus acidocaldarius)
Alphabodies | Triple helix coiled coil
Anticalins | Lipocalins
Avimers | A domains of various membrane receptors
DARPins | Ankyrin repeat motif
Fynomers | SH3 domain of Fyn
Monobodies | 10th type III domain of fibronectin

The disclosure contemplates that it is generally desirable that the synthetic ORFeome libraries for affinity maturation have a common sequence length and therefore are amenable to PCR based amplicon preparation. The methods for assembly and preparation of fixed length UTR barcoded libraries of DNA sequences are known in the art and include overlap extension assembly, although Golden Gate assembly may be preferable. Degenerate fixed length libraries can be barcoded in either the 5' UTR or 3' UTR.

The disclosure provides that a complete mRNA display ready protein library including barcodes and UMI sequences can be made in a one pot Golden Gate reaction without subsequent amplification. Diverse libraries can be made and diversified by recombination, error prone PCR, and other methods known to a person skilled in the art. The disclosure provides that in the mRNA display process according to certain embodiments, as many as 1x10¹⁵ unique sequences may be simultaneously displayed against a target. In the initial panning of the library against the targets, the diversity of library sequences greatly exceeds the diversity limits of the orthogonal PCR barcodes, which may be as diverse as 1x10⁶ sequences. Even in the final panning, the diversity of library sequences may remain as high as 1x10⁸ sequences.

The conditions of panning can be tuned such that the barcode sequence diversity is estimated to exceed the diversity of binder library sequences. This diversity and addressability bottleneck poses a problem in terms of both orthogonal single step retrieval and unique identification of protein binder sequences for interaction assignment.

The disclosure provides two strategies to solve this problem. First, the clonal retrieval is redesigned as a two-step process, where both the target and binder barcodes combine to yield up to, but not limited to, a 1x10¹² address space. Second, a completely degenerate 20 base pair sequence (the UMI) is employed, whose diversity of greater than 1x10¹² unique sequences exceeds the maximum binder library diversity by orders of magnitude. By separating the functions of molecular identification and orthogonal PCR based retrieval from the single barcode sequence, the UMI sequence enables a unique identification of every binder sequence, and the orthogonal PCR priming barcode pool can be allowed to be less diverse than the original binder library, yet still sufficiently diverse to enable clonal retrieval PCR after selection in a small number of steps. In conclusion, the libraries of protein binders with predetermined composition are barcoded and identified in a preassigned manner using sequence homology based methods, and degenerate protein binder libraries are barcoded in a randomly assigned manner that ensures that each library sequence is assigned a unique UMI barcode.
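The diversity figures in this passage can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes that each position of the degenerate 20 base pair sequence is drawn uniformly from four bases, and that the target and binder barcode pools each hold 1x10⁶ orthogonal sequences. The function name and the uniform-draw model are assumptions made for this example, not part of the disclosure.

```python
import math

UMI_LENGTH = 20
umi_space = 4 ** UMI_LENGTH  # number of distinct fully degenerate 20-mers

target_barcodes = 10 ** 6    # assumed size of the orthogonal target barcode pool
binder_barcodes = 10 ** 6    # assumed size of the orthogonal binder barcode pool
two_step_address_space = target_barcodes * binder_barcodes


def shared_umi_fraction(library_size: int, space: int) -> float:
    """Probability that a given library member shares its UMI with at least
    one other member, assuming UMIs are drawn uniformly at random."""
    # Computes 1 - (1 - 1/space)**(library_size - 1) in a numerically stable way.
    return -math.expm1((library_size - 1) * math.log1p(-1.0 / space))


final_panning_diversity = 10 ** 8
collision_fraction = shared_umi_fraction(final_panning_diversity, umi_space)
```

Under these assumptions the degenerate 20-mer space (4²⁰) is slightly larger than 1x10¹², matching the figure quoted in the text, and at a final panning diversity of 1x10⁸ fewer than one member in ten thousand would be expected to share a UMI with another member.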

Removal of Stop Codons from the Protein Binder Library

When degenerate NNK codon protein libraries are assembled, they can contain internal stop codons which prematurely truncate translation of the binding protein. NNK refers to a degenerate codon where N is the nucleobase adenine, thymine, cytosine or guanine, and K is thymine or guanine. Premature translation termination may, depending on the mRNA display method, lead to expression of competing non-mRNA displayed protein. Additionally, internal stop codons can generate protein binders of different lengths, which are undesirable in downstream applications because proteins of different length may introduce variability in immunoprecipitation and complicate detection and staining properties within the library.
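To give a sense of how often premature stop codons arise, the following sketch enumerates the NNK codon set and estimates the fraction of library members with no internal stop. It assumes the standard genetic code and that every degenerate position is sampled uniformly and independently; the function name and the example of ten randomized codons are chosen only for illustration.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

# N = A/C/G/T at positions 1 and 2, K = G/T at position 3.
nnk_codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
nnk_stops = [c for c in nnk_codons if c in STOP_CODONS]


def fraction_without_stop(num_nnk_codons: int) -> float:
    """Expected fraction of sequences with no stop codon among
    num_nnk_codons independently and uniformly sampled NNK codons."""
    p_stop = len(nnk_stops) / len(nnk_codons)
    return (1.0 - p_stop) ** num_nnk_codons


full_length_fraction_10 = fraction_without_stop(10)
```

Of the 32 NNK codons only TAG is a stop codon, so each randomized position carries a 1 in 32 chance of truncation. With ten randomized positions roughly 73% of constructs would be expected to be full length under these assumptions, which is why a filtering step for the naive library is useful.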

The disclosure provides methods to filter out constructs that contain premature stop codons from the naive binder library. According to certain embodiments, the entire mRNA display library is fused with a C-terminal pulldown tag, such that only the full length protein binders are recovered following affinity purification using the pulldown tag, RT-PCR, and gel purification. The refined library is then ready to use in the initial panning against the target library and subsequent affinity maturation and evolution steps.

Sequence Verification of Barcode Assignment

The disclosure provides methods to prepare for sequence verification of the barcode-library sequences. According to certain embodiments, a single adapter Tn5 tagmentation reaction is first performed. This transposase mediated reaction will add the Illumina P5 priming sequence randomly within the ORF library coding sequence. Following single end adapter tagmentation, a limited cycle PCR is performed, which primes the common sequence with an Illumina P7 sequencing adapter and the tagmentation site to produce a mappable coding sequence-barcode pair. The paired end sequencing primers P5 and P7 are illustrated as 'Illumina PE1' and 'Illumina PE2' respectively (FIG. 4).

The following examples are set forth as being representative of the present disclosure. These examples are not to be construed as limiting the scope of the present disclosure as these and other equivalent embodiments will be apparent in view of the present disclosure, figures and accompanying claims.

EXAMPLES

Expression and Preparation of mRNA Displayed Soluble Protein Targets and Binders for Panning

In this example, DNA libraries are prepared at a concentration ready for direct in vitro transcription by T7 RNA Polymerase. Approximately one microgram of library DNA is prepared for transcription. 10s to 100s of micrograms of RNA can be produced from a single in vitro transcription reaction. After transcription, the mRNA is processed to remove template DNA from the reaction. DNAse I treatment is the most common method for removing the template DNA. However, the DNAse must be cleared from the solution by methods such as ethanol precipitation, LiCl precipitation or column based RNA cleanup methods. In the alternative, LiCl precipitation can readily recover pure mRNA without the need for DNAse treatment.

After cleanup, the mRNA is ready for puromycin conjugation. As described herein, there are two major methods that covalently couple puromycin oligonucleotides to the mRNA, which are Splint ligation and UV psoralen crosslinking. The removal of free unconjugated mRNA is critical to ensure that downstream translation produces only mRNA conjugated protein and not free protein.

Several enzymatic approaches were devised to clear unconjugated mRNA from the reaction. When properly conjugated either by UV crosslinking or Splint ligation, the 3' end of the mRNA is not accessible to exoribonuclease. This property was exploited to clear unconjugated mRNA with enzymes. Specifically, it was found that RNase R, exoribonuclease II and exonuclease T can all degrade unconjugated free mRNA preferentially over crosslinked and Splint ligated mRNA. An alternative strategy for clearing unconjugated free mRNA is to pull down using the common puromycin oligo sequence, for example, a poly A sequence that can be targeted by poly T beads.

After subsequent elution and enzymatic cleanup, the mRNA-puromycin conjugate is ready for in vitro translation. A variety of methods are known in the art for puromycin-peptide fusion based in vitro translation and mRNA display. These methods involve stalling by mRNA secondary structure, stalling by an mRNA-cDNA duplex, or stalling the ribosome at stop codons with release factor (RF) depletion, as described elsewhere herein. In vitro translation may take place in different lysates depending on the protein to be displayed and the required expression conditions. For example, if the protein is prokaryotic in origin, or contains a single domain with no obligate disulfide bonds, then the mRNA-puromycin conjugate will be translated in a prokaryotic system such as the PURE Expression system. According to one embodiment, the release factor (RF) depletion strategy is employed, which uses the delta RF PURE system.

If the protein to be displayed has multiple domains and has obligate disulfide bonds, or is eukaryotic in origin, then a eukaryotic expression system will be used. A variety of commercially available eukaryotic systems are available. According to one embodiment, the commercially available human HeLa cell lysate was used as the eukaryotic expression system for in vitro translation (IVT). This system is also used for validated microarray production. Typically, the eukaryotic IVT systems require RNase and protease inhibitors. In the system as herein disclosed, the Release Factor Regeneration inhibitor was used. Other eukaryotic IVT systems can also be used which include human HeLa cell lysate, rabbit reticulocyte lysate, wheat germ lysate, and insect cell lysate.

Following in vitro translation, the mRNA conjugated to the protein must be passivated to an mRNA-cDNA duplex by reverse transcription. This prevents mRNA-mRNA interactions from biasing the observed interactions, and also avoids in situ reverse transcription, which is relatively inefficient. Reverse transcription with MuLV Reverse Transcriptase is performed following desalting. The denaturation, annealing, and extension steps in the reverse transcription are kept at or below 37 degrees Celsius to prevent thermal denaturation of the protein-mRNA conjugated complex.

Expression and Preparation of mRNA Displayed Membrane Protein Targets for Panning

The disclosure provides methods for determining binding between libraries of a variety of proteins including membrane proteins. For example, properly folded membrane proteins can be mRNA displayed using the translation protocol described herein by performing the in vitro translation in the presence of empty nanodisc assemblies and in the absence of detergent. Protocols for expressing high levels of membrane proteins in nanodiscs in in vitro translation systems are known to a person skilled in the art and have been described in Roos, C. et al., High-level cell-free production of membrane proteins with nanodiscs, Methods Mol. Biol. 1118, 109-30 (2014) hereby incorporated by reference in its entirety. Traditionally, nanodiscs are composed of a membrane scaffold protein and a phospholipid, such as palmitoyl-oleoyl-glycero-phosphocholine (POPC) or dimyristoyl-glycero-phosphocholine (DMPC), as described in Denisov, I. G., Grinkova, Y. V., Lazarides, A. A. & Sligar, S. G., Directed self-assembly of monodisperse phospholipid bilayer Nanodiscs with controlled size, J. Am. Chem. Soc. 126, 3477-87 (2004) hereby incorporated by reference in its entirety. Nanodiscs can be used in cell-free expression reactions to directly integrate the nascent membrane protein into the nanodisc without any detergent added. The optimal concentration of the nanodisc is dependent on the membrane protein expression rate.

The example provides methods to mRNA display membrane proteins. Specifically, ORFs of the membrane proteins are pooled together and engineered to express separately both as N-terminal and C-terminal protein tagged fusion proteins. The UTR region of the membrane protein ORFeome pool is the same as the non-membrane protein UTR in the ORFeome pool. Using mRNA display with nanodiscs according to methods described herein allows library on library interactions between molecules and different classes of membrane proteins to be characterized in a pooled fashion.

Method for Production of DNA Barcoded Small Molecule Libraries

The disclosure provides methods for determining binding between molecules of DNA barcoded chemical libraries. DNA barcoded chemical libraries permit the production, enrichment, and diversification of small molecule libraries using hybridization based combinatorial chemistry in combination with affinity maturation techniques. Different methods for DNA directed small molecule library preparation are known to a person skilled in the art and have been described and reviewed in Kleiner, R. E., Dumelin, C. E. & Liu, D. R., Small-molecule discovery from DNA-encoded chemical libraries, Chem. Soc. Rev. 40, 5707-17 (2011) hereby incorporated by reference in its entirety.

The example uses the general approach to DNA barcoded library synthesis, which involves alternating library synthesis steps with enzyme-catalyzed DNA polymerization or ligation of short DNA sequences used to barcode each synthetic step of small molecules. Library diversity is generated over repeated cycles of division, synthesis, and pooling, a well-established method in combinatorial synthesis. DNA-barcoded combinatorial libraries have been made containing up to 8x10⁸ individual library members as described in Clark, M. A. et al., Design, synthesis and selection of DNA-encoded small-molecule libraries, Nat. Chem. Biol. 5, 647-54 (2009) hereby incorporated by reference in its entirety.

In the synthetic scheme used to produce the 8x10⁸ member library, the initial DNA duplex sequence is arbitrary, as is the length of the 3' overhang, which formed a substrate for subsequent ligation to barcoded tags. Further, the DNA coding sequences were also arbitrary in length and size. The barcoding tags were short double-stranded DNA sequences consisting of a variable region flanked by constant 3' overhangs. The 3' overhangs were unique to each cycle of synthesis, so that each set of tags could only ligate to the set from the preceding cycle, and not to truncated sequences. Because of these degrees of freedom in the synthetic scheme, the split pool ligation reaction can proceed such that the composed DNA barcode follows the UTR barcode composition:

5' {JCT-Rev, Coding, Barcode, Type IIS, Com2} 3'

Where JCT-Rev and Com2 are orthogonal 20 base pair isothermal PCR priming sequences, Coding is the assembled molecule coding sequence, Barcode represents a set of mutually orthogonal 20 base pair PCR priming sequences, and Type IIS is a Type IIS endonuclease recognition sequence, for example a sequence recognized by the following enzymes: HgaI, BbvI, BcoDI, BsmAI, BsmFI, FokI, SfaNI, BbsI, BfuAI, BsaI, BsmBI, BspMI, BtgZI, EarI, BspQI, SapI, or other Type IIS DNA endonucleases.

mRNA Display Panning and Protein Binder Evolution

The disclosure provides that in an affinity maturation process, the proteins of the target library are attached to a surface. According to an exemplary embodiment of the disclosure, an ideal surface attachment is accomplished using a common feature on the target molecule. Additionally, the surface area for panning should be scaled according to the product of the target and binder library complexity to adequately sample possible interactions (FIG. 5).

According to some embodiments, the common feature on the target molecule for attachment may be a common protein immobilization tag, including but not limited to Clip Tag, Snap Tag, MCP Tag, ACP Tag or Halo Tag domains. These protein domains form robust, physiologically irreversible covalent bonds with their respective ligands upon binding. These protein domains can be engineered at either the C-terminal or N-terminal of the target proteins. According to certain embodiments, pools of N-terminal and C-terminal protein tag and target protein fusions can be screened together. Halo Tag domains react with haloalkane ligands, Clip Tags react with benzylcytosine ligands, MCP and ACP tags react with Acetyl-CoA ligand derivatives, and Snap Tags react with benzylguanine ligands.

In an exemplary embodiment of the mRNA display scheme of the example, the panning substrate is a Sepharose resin having a protein immobilization tag ligand functionalized on the surface of the resin. For example, Sepharose45 resin provides a high surface area to volume ratio of haloalkane ligands.

To prepare the resin for each panning step, the pooled protein immobilization tagged target library was first loaded onto the resin. Next, the remaining protein immobilization tag ligand surface of the resin was saturated with free protein immobilization tags. Due to the nature of this affinity maturation and evolution scheme, the protein immobilization tagged target library only needed to be mRNA displayed in the last panning step for interaction mapping.

After the resin was loaded with the target library of protein-mRNA conjugates and the mRNA attached to the target proteins was passivated, a degenerate binder library of passivated mRNA displayed protein-mRNA conjugates was panned through the resin in the presence of competitive negative selection components. The negative selection components included free protein immobilization tags and mRNA-cDNA duplexes. The standard mRNA display evolution scheme as described herein is followed for all panning steps.

Crosslinking of Protein-Protein Complexes and Cleavage of Complexes from Solid Supports

Bound target binder pairs from the panning were treated with a protein crosslinking reagent to stabilize the protein-protein complexes of the target binder pairs (FIG. 6). Next, the crosslinked protein-protein complexes were cleaved from the resin support. Chemical protein cross-linkers are known to a person skilled in the art and have been described in the literature such as Hermanson, G. T., Bioconjugate Techniques (Elsevier, 2013), doi: 10.1016/B978-0-12-382239-0.00006-6 hereby incorporated by reference in its entirety. Table 3 provides a non-limiting exemplary list of crosslinking targets and reacting groups used for the experiment.

Table 3. A list of crosslinking targets and reacting groups.

According to some embodiments, cleavage of the crosslinked protein-protein complexes from the resin support can be accomplished in two general ways, either by cleavage of the target protein from its protein tag, or by cleavage of the tag reacted ligand from the resin support. The former approach, cleavage from the protein tag, is limited from two standpoints. First, it assumes that the crosslinking treatment does not already crosslink the tag domain to the target protein. Second, in a C-terminal tagged construct, cleavage of the target would leave behind the tag and mRNA barcode on the resin, and thus would not allow subsequent barcode-barcode coupling. Therefore, the latter cleavage scheme, which targets the ligand-resin interface, is an ideal approach for releasing the crosslinked protein-protein complexes. The latter cleavage scheme can be generalized across various crosslinking schemes, and is compatible with both N-terminal and C-terminal tagged constructs. Exemplary cleavable bonds and cleavage conditions are known in the art and have been discussed in Leriche, G., Chisholm, L. & Wagner, A., Cleavable linkers in chemical biology, Bioorganic Med. Chem. 20, 571-582 (2012) hereby incorporated by reference in its entirety. Tables 4 and 5 provide a non-limiting list of exemplary cleavable bonds and groups.

Table 4. A List of Exemplary Protein Tag Cleavable Groups

Cleavage conditions | Cleavable group
Enzymes | TEV, trypsin, thrombin, cathepsin B, cathepsin D, cathepsin K, caspase 1, matrix metalloproteinase.

Table 5. A List of Exemplary Labile bonds for Ligand-Resin Interfaces

Cleavage conditions | Cleavable group
Nucleophilic/basic reagents | Dialkyl dialkoxysilane, cyanoethyl group, sulfone, ethylene glycolyl disuccinate, 2-N-acyl nitrobenzenesulfonamide, α-thiophenylester, unsaturated vinyl sulfide, sulfonamide after activation, malondialdehyde (MDA)-indole derivative, levulinoyl ester, hydrazone, acylhydrazone, alkyl thioester.
Reducing reagents | Disulfide bridges, azo compounds.
Photo-irradiation | 2-Nitrobenzyl derivatives, phenacyl ester, 8-quinolinyl benzenesulfonate, coumarin, phosphotriester, bis-arylhydrazone, bimane bi-thiopropionic acid derivatives.
Electrophilic/acidic reagents | Paramethoxybenzyl derivative, tert-butylcarbamate analogue, dialkyl or diaryl dialkoxysilane, orthoester acetal, aconityl, hydrazine, β-thiopropionate, phosphoramidate, imine, vinyl ether, polyketal, alkyl 2-(diphenylphosphino)benzoate derivatives.
Organometallic and metal catalyst | Allyl ester, 8-hydroxyquinoline ester, picolinate ester.
Oxidizing reagents | Vicinal diols, selenium compounds.

Methods for Target-Binder cDNA Amplification and Barcode Coupling

The disclosure provides methods for amplifying the mRNA-cDNA duplexes conjugated to the crosslinked protein-protein target binder pairs. Generally, it is desirable to amplify both the target and binder barcodes of the mRNA-cDNA duplexes prior to barcode transfer to increase the efficiency of the reaction. According to some embodiments, two solid phase strategies were applied to amplify DNA in situ. In the two dimensional case, an isothermal or thermally cycled PCR reaction occurs on the surface of a solid phase substrate, such as silicon or silica, where the PCR primer oligonucleotides are coupled to the surface of the substrate. In the three dimensional case, an isothermal or thermally cycled PCR reaction occurs in a porous solid phase substrate volume. Exemplary substrates may include polyacrylamide, polyacrylate, a PEG matrix structure, or alternative matrix structures, where the PCR oligonucleotide primers are covalently integrated with the porous substrate.

The disclosure provides isothermal DNA amplification methods that include nicked strand displacement PCR, where the DNA polymerase is PhiX29 DNA Polymerase, Bstl DNA Polymerase, or Kappa Hifi DNA polymerase. The disclosure provides thermal cycled DNA amplification methods that include polymerase chain reaction (PCR), where the DNA polymerase is Taq DNA polymerase, Tth DNA polymerase, Pfu DNA polymerase, Kappa DNA polymerase, Phusion DNA polymerase, Q5 DNA polymerase, or other DNA polymerases known to a person skilled in the art.

Methods for Barcode Transfer from Polonies

The disclosure provides methods for barcode transfer from polonies that use recombinase and integrase mediated UTR barcode transfer, where the recombinase and integrase recognition sequences immediately flank the barcode sequences in the UTR. Recombinases are sequence specific and have been described in Gaj, T., Sirk, S. J. & Barbas, C. F., Expanding the scope of site-specific recombinases for genetic and metabolic engineering, Biotechnol. Bioeng. 111, 1-15 (2014) hereby incorporated by reference in its entirety. Table 6 lists non-limiting examples of recombinases and target sites.

Table 6. A List of Recombinases and Target Sites.

Recombinase | Origin | Target site
Flp | S. cerevisiae | FRT
Tn3 | E. coli | res site I

The disclosure provides methods for barcode transfer from polonies that use overlap extension PCR. According to certain embodiments, overlap extension PCR amplifies a shared complementary sequence existing at the 3' terminus of the sense or antisense strand of cDNA in the polonies. The DNA polymerase used in the overlap extension PCR is Taq DNA polymerase, Tth DNA polymerase, Pfu DNA polymerase, Kappa DNA polymerase, Phusion DNA polymerase, Q5 DNA polymerase, or other DNA polymerases.

The disclosure provides methods for barcode transfer from polonies that use Golden Gate Assembly methods. Golden Gate Assembly can be used for 3' and/or 5' UTR barcodes, where a type IIS restriction endonuclease digestion of polonies produces ligation compatible ends that are located immediately adjacent to the target and binder barcodes. Golden Gate Assembly can be used to couple the target and binder barcodes. The type IIS restriction endonuclease includes, for example, HgaI, BbvI, BcoDI, BsmAI, BsmFI, FokI, SfaNI, BbsI, BfuAI, BsaI, BsmBI, BspMI, BtgZI, EarI, BspQI, SapI, or other Type IIS DNA endonucleases. The DNA ligase includes, for example, Taq DNA ligase, T7 DNA ligase, T3 DNA ligase, T4 DNA ligase, E. coli DNA ligase, and other DNA ligases.

The disclosure provides methods for barcode transfer from polonies that use Gibson Assembly methods. Gibson Assembly can be used for 3' and/or 5' UTR barcodes, where sequences immediately adjacent to the target and binder barcodes in the UTR are homologous ends. Enzymes used with Gibson Assembly include a 5' to 3' exonuclease, for example, T5 exonuclease, T7 exonuclease, Lambda exonuclease, exonuclease VIII or other 5' to 3' exonucleases known to a person skilled in the art. The DNA polymerase used with Gibson Assembly includes, for example, Taq DNA polymerase, Tth DNA polymerase, Pfu DNA polymerase, Kappa DNA polymerase, Phusion DNA polymerase, Q5 DNA polymerase, or other DNA polymerases. The DNA ligase used with Gibson Assembly includes, for example, Taq DNA ligase, T7 DNA ligase, T3 DNA ligase, T4 DNA ligase, E. coli DNA ligase, and other DNA ligases.

The disclosure provides methods for barcode transfer from polonies that use Splint ligation methods. In Splint ligation, splint oligos containing reverse complementary sequences immediately adjacent to both the target and binder barcodes are used to ligate two cDNAs to couple the target and binder barcodes. The ligases used with the Splint ligation are T3 DNA ligase, T7 DNA ligase, E. coli DNA ligase, Taq DNA ligase, or other DNA ligases.

The disclosure provides that the most challenging aspect of the library against library screening method is the recovery of a clean map of each binder to its target protein. In the herein disclosed method, the target-binder complex brings two mRNA-cDNA duplex barcodes together in proximity. Methods for joining single molecule DNA sequences, such as proximity ligation and overlap extension PCR, are not robust and can lead to erroneous DNA coupling in trans. The use of polony PCR of interacting molecules to prepare colocalized Fisseq amplicons has been described in Gu, L. et al., Multiplex single-molecule interaction profiling of DNA-barcoded proteins, Nature 515, 554-557 (2014) hereby incorporated by reference in its entirety. The herein disclosed methods use polony PCR to generate colocalized amplicons for subsequent isothermal digestion and ligation. Polony PCR methods as herein disclosed provide an excellent solution for amplification of single molecule sequences, whose subsequent ligation is remarkably robust and spatially isolated from cross-reactions (FIG. 7).

The disclosure provides methods for in gel ligation to couple target binder barcodes that utilize a type IIS restriction endonuclease to digest the polonies, producing complementary overhangs between the binder and target species for ligation in the gel.

According to some embodiments, covalent joining of colocalized DNA polonies by Golden Gate ligation produces a fixed length tandem barcode sequence that can be amplified for sequencing or clonal isolation of the desired binder (FIG. 8 and FIG. 9).

Interaction Sequencing

According to certain embodiments of the disclosure, the UTR motifs are designed such that when the barcodes are paired in a single cDNA, they can be sequenced by PCR primers using the universal JCT-fwd and JCT-rev sites immediately adjacent to the barcode and UMI sequences (FIG. 10). According to certain embodiments, the in gel ligated cDNAs can be directly prepared for Illumina sequencing by in gel PCR and electroelution or passive elution of the amplicon DNAs from the gel. The process may be done in multiple PCR steps.
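As a rough illustration of how sequencing reads of the coupled barcodes might be turned into an interaction table, the sketch below extracts a target-binder barcode pair from each read and counts the pairs. The layout assumed here (JCT-fwd, a 20-base target barcode, a 20-base binder barcode, JCT-rev) and all sequences are hypothetical placeholders chosen for the example; the actual arrangement is the one shown in FIG. 10, and UMI handling is omitted.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical priming sites; the real JCT-fwd/JCT-rev sequences are not given here.
JCT_FWD = "ACGTTGCAAGCTTGCATGCA"
JCT_REV = "TTGACCGGTAACGGATCCAA"
BARCODE_LEN = 20  # assumed length of each barcode in the tandem pair


def extract_barcode_pair(read: str) -> Optional[Tuple[str, str]]:
    """Return (target_barcode, binder_barcode) if the read contains a
    fixed-length tandem barcode between the two priming sites, else None."""
    start = read.find(JCT_FWD)
    if start == -1:
        return None
    insert_start = start + len(JCT_FWD)
    end = read.find(JCT_REV, insert_start)
    if end == -1:
        return None
    insert = read[insert_start:end]
    if len(insert) != 2 * BARCODE_LEN:
        return None  # discard reads whose insert is not the expected fixed length
    return insert[:BARCODE_LEN], insert[BARCODE_LEN:]


def count_interactions(reads: Iterable[str]) -> Counter:
    """Tally how often each (target, binder) barcode pair is observed."""
    pairs = (extract_barcode_pair(r) for r in reads)
    return Counter(p for p in pairs if p is not None)
```

Rejecting inserts of unexpected length relies on the fixed-length tandem barcode produced by the ligation step; a production pipeline would also need to tolerate sequencing errors, which this sketch does not attempt.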

Protein Binder Sequencing

The disclosure provides that when desired, the protein binder can be sequenced to link the UMI sequence information to the variable protein scaffold sequence. According to certain embodiments, the sequencing involves sequencing from JCT-fwd to the JCT-rev sequence adjacent to the binder coding region (FIG. 10 and FIG. 11). In addition to the UTR motifs designed for interactome sequencing, the UTR region flanking the binder sequence is universal such that the whole binder barcode-barcode pairs can be sequenced and recovered. The DNA in the gel can be directly prepared for Illumina sequencing by in gel PCR and electroelution or passive elution of amplicon DNAs from the gel. The process may be done in multiple PCR steps.

Clonal Recovery Procedure for Binder Species against each Target

The disclosure provides that, subject to sequencing results of the barcode-barcode amplicons, a clonal population of each target binding protein can be recovered. According to certain embodiments, the clonal recovery procedure has two steps. First, the barcode-barcode binder amplicon is generically amplified as a mixed pool. The mixed pool is distributed across microplates for as many target binding binders as desired (one binder to be recovered per well). A subsequent PCR against the target barcode recovers all the binders for a particular target. Next, a final PCR against the binder barcode can isolate the individual binder species that bind to a particular target. In certain embodiments, the two PCR steps may be unnecessary if all the binder species are uniquely barcoded.
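By way of non-limiting illustration only, the two selection steps can be modeled in Python as successive filters on a mixed pool. Each PCR is idealized as perfect selection of the amplicons carrying the primed barcode, which is a simplifying assumption. The Amplicon record, its field names, and the function names are hypothetical and introduced only for this sketch.

```python
from typing import NamedTuple, Optional


class Amplicon(NamedTuple):
    """Hypothetical record for one barcode-barcode binder amplicon."""
    target_barcode: str
    binder_barcode: str
    binder_orf: str


def pcr_select(pool, *, target_barcode: Optional[str] = None,
               binder_barcode: Optional[str] = None):
    """Idealized PCR: keep only amplicons matching the primed barcode(s)."""
    return [
        a for a in pool
        if (target_barcode is None or a.target_barcode == target_barcode)
        and (binder_barcode is None or a.binder_barcode == binder_barcode)
    ]


def recover_clone(pool, target_barcode: str, binder_barcode: str):
    """Two-step recovery performed in one well of the microplate.

    Step 1: PCR against the target barcode recovers all binders for
            that target.
    Step 2: PCR against the binder barcode isolates one binder species.
    Returns (binders_for_target, isolated_clone).
    """
    binders_for_target = pcr_select(pool, target_barcode=target_barcode)
    isolated = pcr_select(binders_for_target, binder_barcode=binder_barcode)
    return binders_for_target, isolated
```

In this model, when every binder barcode occurs only once in the pool, the second filter alone returns the same clone, which corresponds to the case in which the two PCR steps may be unnecessary.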

Methods for Multiplexed Enrichment of High Affinity-Specificity Binding Molecule Libraries

For many applications, a pooled enrichment of binders is sufficient for generating widely applicable reagents. For example, a pool of protoprimer oligos, termed OLS (oligo library synthesis), is ordered. This pool of protoprimers is amplified and processed to mature primers. A pool of barcode-barcode-binder amplicons is enriched by PCR and is suitable for downstream modifications for affinity maturation.

Terms and symbols of nucleic acid chemistry, biochemistry, genetics, and molecular biology used herein follow those of standard treatises and texts in the field, e.g., Kornberg and Baker, DNA Replication, Second Edition (W.H. Freeman, New York, 1992); Lehninger, Biochemistry, Second Edition (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, Second Edition (Wiley-Liss, New York, 1999); Eckstein, editor, Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, New York, 1991); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); and the like.

OTHER EMBODIMENTS

Other embodiments will be evident to those of skill in the art. It should be understood that the foregoing description is provided for clarity only and is merely exemplary. The spirit and scope of the present invention are not limited to the above examples, but are encompassed by the following claims. All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference.

Claims

1. A method for determining the molecular binding between a target molecule library and a binder molecule library, the method comprising the steps of:

(a) contacting the target molecule library comprising a plurality of mRNA-protein conjugates with the binder molecule library comprising a plurality of mRNA-protein conjugates under conditions where one or more proteins in the target library bind to one or more proteins in the binder library, wherein the mRNA of each mRNA-protein conjugate comprises sequences of an open reading frame (ORF) encoding the conjugated protein and a 3' untranslated region (3' UTR) sequence motif comprising a barcode sequence,

(b) crosslinking the bound target-binder protein pairs to form protein-protein complexes,

(c) transferring the cross-linked protein-protein complexes of the target-binder protein pairs with their respective mRNAs conjugated thereto into a gel support, and

(d) determining the molecular binding of the target and binder protein pairs by determining the sequences of the mRNAs conjugated to the target and binder protein pairs.

2. The method of claim 1, wherein the plurality of mRNA-protein conjugates of the target library are immobilized to a solid support.

3. The method of claim 2, wherein the plurality of mRNA-protein conjugates of the target library are immobilized to a solid support via a protein immobilization tag.

4. The method of claim 3, wherein the protein immobilization tag is fused with the protein of the mRNA-protein conjugate of the target library.

5. The method of claim 3, wherein the protein immobilization tag forms a covalent bond to a surface attachment ligand on the solid support.

6. The method of claim 5, wherein the surface attachment ligand is cleavable from the solid support.

7. The method of claim 1, wherein the mRNA-protein conjugates are obtained by in vitro transcription of DNA molecules to mRNA molecules, in vitro translation of the transcribed mRNA molecules and mRNA display.

8. The method of claim 7, wherein ribosomal release factors are depleted during in vitro translation.

9. The method of claim 5, wherein each DNA molecule comprises a sequence of an open reading frame (ORF) encoding the conjugated protein of the target library or the binder library and a sequence encoding the 3' untranslated region (3' UTR) sequence motif.

10. The method of claim 9, wherein the 3' UTR sequence motif of the DNA molecule for the target library comprises sequences encoding a polypeptide linker, polystop codons, PCR priming sequences, isothermal PCR priming sequences, an enzyme recognition site, a barcode sequence, and DNA scaffold sequences for in vitro transcription, translation and mRNA display.

11. The method of claim 9, wherein the 3' UTR sequence motif of the DNA molecule for the binder library comprises sequences encoding a polypeptide linker, polystop codons, PCR priming sequences, isothermal PCR priming sequences, an enzyme recognition site, a unique molecule identifier sequence (UMI), a barcode sequence, and DNA scaffold sequences for in vitro transcription, translation and mRNA display.

12. The method of claim 9, wherein a puromycin oligo is attached to the 3' UTR of the in vitro transcribed mRNA for mRNA display.

13. The method of claim 12, further comprising, after the attachment step, degrading non-attached mRNA by a 3' to 5' directed riboexonuclease.

14. The method of claim 13, wherein the riboexonuclease is RNase R, exoribonuclease II, or exonuclease T.

15. The method of claim 12, wherein the attaching is via UV crosslinking or ligation.

16. The method of claim 12, wherein the mRNA is covalently attached to the C-terminus of the encoded protein.

17. The method of claim 1, wherein the mRNA-protein conjugates are passivated by reverse transcription of the mRNAs to form an mRNA-cDNA duplex prior to the contacting step.

18. The method of claim 2, further comprising cleaving the cross-linked protein-protein complexes of the target-binder protein pairs from the solid support.

19. The method of claim 1, further comprising casting the cross-linked protein-protein complexes of the target-binder protein pairs in a gel.

20. The method of claim 1, further comprising in gel PCR amplification of the mRNA-cDNA duplex conjugated to the target-binder protein pairs.

21. The method of claim 20, wherein the amplified PCR products are sequenced through the ORF and barcode regions to determine the molecular identity of the target-binder protein pairs.

22. The method of claim 1, wherein the barcode sequence is unique to the ORF within the mRNA.

23. The method of claim 22, further comprising in gel PCR amplification of the unique barcoded region in the mRNA-cDNA duplex conjugated to the target-binder protein pairs.

24. The method of claim 23, wherein the PCR amplicons are joined.

25. The method of claim 24, wherein the joining is carried out via restriction enzyme digestion of the enzyme recognition site and ligation.

26. The method of claim 25, wherein the restriction enzyme is a type IIS endonuclease.

27. The method of claim 24, wherein the joining is carried out via recombination at the enzyme recognition sites.

28. The method of claim 24, wherein the joining is carried out via overlap extension PCR.

29. The method of claim 24, wherein the molecular identity of the target-binder protein pairs of the protein-protein complexes is determined by sequencing the unique barcoded regions in joined PCR amplicons.

30. The method of claim 1, wherein the barcode sequence is an orthogonal PCR priming sequence.

31. The method of claim 30, wherein the barcode sequence is a set of mutually orthogonal 20 base pair PCR priming sequences.

32. The method of claim 10, wherein the enzyme recognition site is a type IIS endonuclease recognition site.

33. The method of claim 11, wherein the enzyme recognition site is a type IIS endonuclease recognition site.

34. The method of claim 7, wherein the in vitro transcription and translation are carried out in prokaryotic or eukaryotic systems.

35. The method of claim 24, wherein the joined PCR amplicons are further amplified for sequencing and clonal isolation of the desired binder.

36. The method of claim 35, wherein the clonal isolation of the desired binder is carried out first by selecting all the binders that bind to a target and second by selecting a single binder that binds to the target.

37. The method of claim 1, wherein the steps can be repeated to enrich for affinity maturation of a target molecule against a plurality of diversified binder molecules.

38. The method of claim 1, wherein the ORF within the mRNA can be barcoded in either the 5' UTR or 3' UTR.

39. The method of claim 1, wherein the method can be used for determining the molecular binding of DNA barcoded chemical libraries.