CN107828876B

CN107828876B - Application of tagged proteins that can covalently bind to substrates in CLIP

Info

Publication number: CN107828876B
Application number: CN201710979254.6A
Authority: CN
Inventors: 俞洋; 顾嘉琦
Original assignee: Institute of Biophysics of CAS
Current assignee: Institute of Biophysics of CAS
Priority date: 2017-10-19
Filing date: 2017-10-19
Publication date: 2021-05-28
Anticipated expiration: 2037-10-19
Also published as: CN107828876A

Abstract

The invention discloses the application of a tag protein that can covalently bind a substrate in CLIP. It provides the use, in CLIP technology, of a fusion protein carrying such a tag, purified by washing with denaturants. Using the tagged fusion protein and washing with protein denaturing reagents, the invention obtains an RNA library that can be used for sequencing, and the results show that previously published experimental results are reproduced well. The membrane transfer and gel excision steps used to recover fragments in traditional CLIP are omitted. Because the tag forms a covalent bond with its specific binding partner immobilized on a medium, stringent washing that traditionally could not be performed directly on magnetic beads can be incorporated into the procedure. The operation is therefore simpler, takes less time, optimizes the CLIP steps, and reduces the losses that can occur during handling.

Description

Application of tag protein capable of covalently binding substrate in CLIP

Technical Field

The invention belongs to the fields of RNA biology and molecular biology and relates to the application of a tag protein capable of covalently binding a substrate in CLIP (cross-linking immunoprecipitation) experiments. In particular, it relates to a CLIP technique in which a fusion protein carrying such a tag is used and the covalently cross-linked protein-RNA complex is purified by washing with denaturing reagents, so that steps such as SDS-PAGE (SDS-polyacrylamide gel electrophoresis), cellulose acetate membrane transfer and isotope labeling are omitted and a cDNA library for sequencing is finally obtained.

Background

Molecular biology methods for studying RNA-binding proteins and the RNAs they bind work, in brief, by purifying complexes of RNA and RNA-binding protein. The complex can be purified either by precipitating the RNA or by precipitating the RNA-binding protein. Because of the limitations of techniques for precipitating RNA, researchers have developed more methods that obtain the complexes by precipitating the RNA-binding protein. Initially, RNA immunoprecipitation (RIP) of RNA-binding proteins was widely used by researchers studying RNA-protein interactions.

RIP first uses an antibody against the target protein to precipitate the complexes of that RNA-binding protein and its RNA from a tissue or cell. The RNA is then purified, isolated and analyzed by downstream techniques. RIP follows the same general idea as ChIP but differs in the objects studied, and there are many variations of the steps involved. In addition, RNA degrades easily in vitro, which places higher demands on the reagents and handling used in RNA experiments. Combining RIP with microarray technology gives RIP-Chip; combining it with high-throughput sequencing gives RIP-seq. RIP has certain limitations because the protein-nucleic acid complexes are washed only mildly: RIP data contain many false positives, and the specific binding sites between RNA and protein cannot be determined [see Jayaseelan, S., F. Doyle, and S.A. Tenenbaum, Profiling post-transcriptionally networked mRNA subsets using RIP-Chip and RIP-Seq. Methods, 2014. 67(1): p. 13-9].

In 2003, the Darnell laboratory published the first article on studying RNA-protein interactions by means of UV cross-linking. The initial CLIP protocol combined the RIP and ChIP protocols: before immunoprecipitation of the RNA-protein complexes, UV cross-linking, a method previously used to study DNA-protein interactions, was applied. Because of the technical limitations of the time, high-throughput sequencing could not be applied, so the data obtained were limited. Even so, using CLIP they identified the RNA binding sites of the neuron-specific RNA-binding proteins and splicing factors NOVA1 and NOVA2 in mouse brain tissue [see Ule, J., et al., CLIP identifies Nova-regulated RNA networks in the brain. Science, 2003. 302(5648): p. 1212-5]. The conclusions reached by CLIP were validated by knocking out the relevant genes in mouse brain tissue. In 2005, they applied high-throughput sequencing, performed deep sequencing on the library obtained by CLIP, and drew a preliminary genome-wide map of the protein-RNA interaction network of the Nova protein from the sequencing results. They named the combination of CLIP and high-throughput sequencing HITS-CLIP [see Ule, J., et al., Nova regulates brain-specific splicing to shape the synapse. Nat Genet, 2005. 37(8): p. 844-52]. At about the same time, another group used CLIP combined with high-throughput sequencing to study the genome-wide RNA-protein interaction network of the RbFox2 protein and named the approach CLIP-seq [see Yeo, G.W., et al., An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol, 2009. 16(2): p. 130-7]. Since then, high-throughput sequencing has been applied increasingly to the study of genome-wide RNA-protein interaction networks. In 2009, the Darnell laboratory used CLIP-seq to analyze the miRNA-mRNA interactions bound by the Ago2 protein and obtained genome-wide information on miRNA-mRNA interactions [see Chi, S.W., et al., Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature, 2009. 460(7254): p. 479-86]. In the same year, Xue, Y., et al. applied CLIP to the PTB protein to study the mechanism of alternative splicing of mRNA precursors in the nucleus [see Xue, Y., et al., Genome-wide analysis of PTB-RNA interactions reveals a strategy used by the general splicing repressor to modulate exon inclusion or skipping. Mol Cell, 2009. 36(6): p. 996-1006].

CLIP offers several improvements over RIP. First, both cell and tissue samples are treated by ultraviolet irradiation, which forms covalent bonds between RNA and protein. Breaking a covalent bond requires high energy, so the protein-RNA complex can withstand stringent purification, and UV cross-linking therefore improves the signal-to-noise ratio of the sequencing data. Moreover, unlike the formalin (formaldehyde) cross-linking used in ChIP, UV cross-linking does not create chemical bridges between macromolecules that would degrade the signal-to-noise ratio. UV cross-linking acts only over a distance of a few angstroms, that is, it forms covalent bonds only between proteins and the nucleic acids they directly interact with. Second, CLIP introduced RNase digestion. Because RNA that is not protected by protein is more readily digested by RNases, a mild RNase digestion degrades the portion of RNA that is not bound to protein and leaves only the protein-bound portion. The truncated RNA is easier to reverse transcribe, to sequence and to subject to domain analysis, and precipitation of unwanted protein-RNA-protein complexes is avoided. For this reason, nuclease treatment has also been added to later RIP protocols. The RNases reported in the literature include RNase A, RNase T1, RNase I and others, each with advantages and disadvantages. Although the RNA in the complex is truncated, the results suffer some interference because the subsequent purification contains no step that removes the RNase. Finally, CLIP samples can withstand washing under stringent conditions, such as multiple washes with PBST, SDS-PAGE gel purification and transfer to cellulose acetate membranes. These steps all remove proteins bound non-specifically to the antibody, reduce unbound and non-cross-linked RNA, and lower the background of the data.

CLIP was developed on the basis of ChIP and RIP. The conventional CLIP procedure is as follows. After cell culture, proteins and RNA are cross-linked by UV irradiation. After cell lysis, the complex is treated with RNase to degrade the RNA that is not protected by protein and to trim the RNA to a uniform length. The protein is then immunoprecipitated in vitro with a specific antibody, precipitating the RNA-protein complexes. After gentle washing, an RNA linker is added to the 3' end of the RNA, and the 5' end of the RNA is labeled with the isotope ³²P. The protein-RNA complexes are then denatured, run on an SDS-PAGE gel and transferred to a nitrocellulose membrane, which binds protein but not free RNA. After membrane transfer, the band carrying the isotope-labeled protein-RNA complexes is located by development and excised, and the RNA-protein complexes are extracted. The protein is then digested with proteinase K to purify the RNA. A linker sequence is then added to the 5' end of the RNA, the RNA is reverse transcribed into cDNA by means of the linker sequence, and the cDNA is amplified by PCR. The resulting cDNA library can be subjected to high-throughput sequencing, and the data obtained are analyzed [see Ule, J., et al., CLIP identifies Nova-regulated RNA networks in the brain. Science, 2003. 302(5648): p. 1212-5].

After CLIP was combined with high-throughput sequencing, many new CLIP variants were developed. One widely used technique is PAR-CLIP (Photoactivatable-Ribonucleoside-Enhanced CLIP) [see Hafner, M., et al., Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 2010. 141(1): p. 129-41]. PAR-CLIP changes the cell culture step: 4-thiouridine (4SU) or 6-thioguanosine (6SG) is added exogenously during cell culture so that U or G is labeled, and the U sites of nascent transcripts or steady-state RNA are replaced by 4SU. The wavelength of the ultraviolet light used for cross-linking also changes from 254 nm to a longer wavelength (generally 365 nm), which in theory improves the efficiency of cross-linking between proteins and nucleic acids. Different nucleases are also used to trim the RNA. 4SU is the more commonly used reagent because it increases the efficiency of UV cross-linking and because uracil in nucleic acids binds protein at a very high frequency in vivo. Replacement of U in RNA by 4SU results in a T → C conversion at that position in the cDNA library during reverse transcription, which can be identified during data analysis. This transition often serves as an indicator of the cross-linking site and is also often used as a signal for subtracting the non-cross-linked background. However, the 4SU used in PAR-CLIP also has many limitations. First, 4SU can only be used in cell culture; its use in tissues, animals or clinical specimens has not been reported. Second, 4SU is toxic to cells. After 4SU is added to the medium, the cells are cultured for 24 to 48 hours, during which 4SU may affect cell metabolism and lead to apoptosis. Finally, although the mRNA library obtained after addition of 4SU has been reported not to change significantly, unknown effects may remain undiscovered, so its use requires caution.
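The T → C signal described above can be illustrated with a short Python sketch. It assumes, for illustration only, that a read has already been aligned to the reference without gaps and on the same strand; the function name `t_to_c_positions` and the example sequences are hypothetical and are not taken from any PAR-CLIP analysis software.

```python
def t_to_c_positions(reference: str, read: str) -> list:
    """Return 0-based positions where the reference has T and the read has C."""
    if len(reference) != len(read):
        # Assumption of this sketch: ungapped alignment of equal length.
        raise ValueError("reference and read must have the same length")
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper()))
        if ref_base == "T" and read_base == "C"
    ]


# Hypothetical example: one T in the reference is read as C.
reference = "ACGTTGCATGT"
read = "ACGTCGCATGT"
print(t_to_c_positions(reference, read))
```

Positions reported by such a function would be candidate cross-linking sites; reads without any T → C conversion could be treated as non-cross-linked background, as the text describes.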

Following PAR-CLIP, the iCLIP (individual-nucleotide resolution CLIP) technique emerged [see Konig, J., et al., iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol, 2010. 17(7): p. 909-15]. iCLIP is a CLIP technique with single-base resolution. In traditional CLIP, when RNA is reverse transcribed into cDNA, the reverse transcriptase usually fails to read through the cross-linking site. The covalent bond formed between an amino acid residue of the RNA-binding protein and the RNA is not easily broken after the protein is degraded, so the reverse transcriptase meets steric hindrance and may fall off the template; the site thus becomes an obstacle to reverse transcription. Reverse transcriptase sometimes bypasses this obstacle, which may produce a mutation or deletion in the cDNA at the site, but more often reverse transcription terminates. Although reverse transcription is then incomplete, the mutated, deleted and truncated fragments can provide information on the cross-linking site. In the traditional CLIP procedure, however, PCR amplification requires linker sequences at both the 5' and 3' ends of the cDNA, and both are derived from reverse transcription of the RNA template. A cDNA whose reverse transcription terminates early is therefore likely to lack the 5' linker sequence, which leads to failure of library construction. To improve this step, the iCLIP protocol adds the linker directly to the reverse transcription primer. The resulting cDNA is circularized and then cleaved enzymatically, so that linker sequences end up at both ends of the cDNA. This avoids loss of cDNA information caused by failed reverse transcription. iCLIP thus changes the traditional mode of library construction. Most cDNAs from the reverse transcription reaction stop at a cross-linking position, and circularization captures them; after the circle is cut, the 5' start of the sequence marks the position where reverse transcription halted at the cross-link. iCLIP can therefore be used for data analysis at single-nucleotide resolution.
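How a truncation point translates into a cross-linking position can be sketched in Python. The sketch assumes 0-based, end-exclusive alignment coordinates and the common convention that the cross-linked nucleotide lies one position upstream of the read start; both the convention as coded here and the function name `crosslink_position` are illustrative assumptions rather than details given in this document.

```python
from collections import Counter


def crosslink_position(read_start: int, read_end: int, strand: str) -> int:
    """Infer the cross-link position from an aligned read.

    Coordinates are 0-based and end-exclusive (assumption of this sketch).
    The cross-linked nucleotide is taken to be the one immediately upstream
    of the read's 5' end on the transcribed strand.
    """
    if strand == "+":
        return read_start - 1
    if strand == "-":
        # On the minus strand the read's 5' end is at read_end - 1,
        # so the upstream neighbour is at genomic position read_end.
        return read_end
    raise ValueError("strand must be '+' or '-'")


# Hypothetical aligned reads: (start, end, strand).
reads = [(100, 140, "+"), (100, 132, "+"), (250, 290, "-")]
site_counts = Counter(crosslink_position(s, e, strand) for s, e, strand in reads)
print(site_counts)
```

Counting how many independent cDNAs truncate at the same position gives a simple per-nucleotide measure of cross-linking.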

The iCLIP technique improves the efficiency of cDNA library construction, shortens the operating time of CLIP and reduces the likelihood of failure. However, because library construction requires circularization of single-stranded cDNA, the efficiency of the circularizing enzyme in this step is critical. If that efficiency is low, the composition of the whole cDNA library is affected, and even the success rate of sequencing. To improve this, in 2016 the Gene W. Yeo laboratory optimized the library construction procedure and developed the eCLIP (enhanced CLIP) technique [see Van Nostrand, E.L., et al., Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods, 2016. 13(6): p. 508-14]. eCLIP does not change the overall CLIP procedure; it changes library construction. It follows the iCLIP library construction procedure but, instead of circularizing after reverse transcription, adds a linker sequence directly to the 3' end of the cDNA. The cDNA library is then amplified by PCR. The authors reported that the eCLIP library construction method increases the efficiency of cDNA library construction and reduces the probability of library failure.

In addition, the irCLIP (infrared-CLIP) technique replaced the traditional use of isotopes in CLIP. irCLIP uses a 3' RNA linker carrying a Biotin label, which allows the RNA to be located on a fluorescence scanner [see Zarnegar, B.J., et al., irCLIP platform for efficient characterization of protein-RNA interactions. Nat Methods, 2016. 13(6): p. 489-92].

Although CLIP has been improved many times, the procedure still requires SDS-PAGE electrophoresis and cellulose acetate membrane transfer. These two steps are time-consuming, demanding to perform and prone to sample loss. The main goal in improving CLIP is therefore to shorten the overall operating time and reduce the number of steps. If overall efficiency improves, the number of cells needed at the start can also be reduced. Reducing the amount of cells or tissue in turn lets researchers carry out CLIP experiments on cells or tissues that are difficult to obtain, making it possible to study the interaction of target proteins and RNA in cells at an early embryonic stage or at a particular stage of cell differentiation.

Disclosure of Invention

The invention aims to provide a novel application of a tag protein capable of being covalently bound with a substrate. The tag protein capable of covalently binding to a substrate herein refers to a tag protein which can react with a substrate to form a covalent bond and cannot be dissociated after binding.

The invention provides a new application of a tag protein capable of being covalently bound with a substrate, in particular to an application of the tag protein capable of being covalently bound with the substrate in an ultraviolet cross-linking co-immunoprecipitation technology.

In the application, the tag protein capable of being covalently bound with the substrate is used as a fusion tag of a target protein in the ultraviolet crosslinking co-immunoprecipitation technology.

Further, in the present invention, the application is for studying the interaction between RNA and RNA-binding protein (i.e., the target protein), and the UV cross-linked co-immunoprecipitation technology purifies RNA-protein complex in a non-gel-running and non-membrane-transfer manner. That is, in contrast to the conventional CLIP technology, the RNA-protein complex of the present invention is purified without passing through SDS-polyacrylamide gel electrophoresis, without passing through a membrane transfer of a cellulose acetate membrane, and without using an isotope.

Further, in the present invention, the fusion protein formed by the tag protein capable of covalently binding a substrate and the target protein (i.e., the RNA-binding protein) is covalently bound to a medium (e.g., magnetic beads) carrying the specific binding substance of the tag protein, and the RNA-protein complex is then purified by protein denaturing washes.

Here, protein denaturing washing means that hydrogen bonds within protein molecules are destroyed by physical means (such as heating or shaking) and/or chemical means (such as guanidine hydrochloride, urea, sodium dodecyl sulfate or Trizol reagent), so that the protein is denatured and its original higher-order structure unfolds. Because a covalent bond forms between the tag protein and its specific binding substance immobilized on the medium (such as magnetic beads), and the cross-link between RNA and protein is also covalent, complexes held together by covalent bonds remain coupled to the medium during washing and the sequence stays intact, whereas non-covalently bound substances, such as impurities and non-specifically bound nucleic acids, are released from the complex and washed away.
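The selection principle described above can be expressed as a toy Python model: only species attached to the medium through an unbroken chain of covalent bonds survive a denaturing wash. The class `BoundSpecies`, the function `denaturing_wash` and the listed species are hypothetical names chosen for this illustration; the model ignores wash efficiency and kinetics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundSpecies:
    name: str
    covalent: bool  # True if held on the medium by covalent bonds only


def denaturing_wash(species):
    """Keep only covalently attached species; everything else is washed away."""
    return [s for s in species if s.covalent]


# Hypothetical contents of the beads after immunoprecipitation.
on_beads = [
    BoundSpecies("fusion protein with UV cross-linked RNA", covalent=True),
    BoundSpecies("non-cross-linked RNA held by the protein", covalent=False),
    BoundSpecies("non-specifically bound nucleic acid", covalent=False),
    BoundSpecies("contaminating protein", covalent=False),
]

retained = denaturing_wash(on_beads)
print([s.name for s in retained])
```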

Accordingly, the invention provides a method for obtaining target RNA of RNA binding protein by using tag protein capable of being covalently bound with substrate and carrying out ultraviolet crosslinking co-immunoprecipitation by a non-gel purification mode.

The invention provides a method for obtaining the target RNA of an RNA-binding protein by performing ultraviolet cross-linking co-immunoprecipitation with a tag protein capable of covalently binding a substrate and a non-gel purification mode; specifically, purification of the RNA-protein complex involves no SDS-polyacrylamide gel electrophoresis, no transfer to a cellulose acetate membrane, and no isotope.

The method specifically comprises the following steps:

(1) allowing the receptor cell to express a fusion protein formed by linking a tag protein capable of covalently binding to a substrate and a target protein via a linker peptide;

(2) carrying out ultraviolet irradiation on the receptor cells expressing the fusion protein in the step (1) to form covalent bonds between RNA and protein in the RNA-protein complex for crosslinking, and collecting a cell sample;

(3) lysing the cell sample collected in step (2), and then performing immunoprecipitation through a medium (such as magnetic beads) connected with a specific conjugate of the tag protein capable of covalently binding to the substrate, to obtain a precipitated sample containing a "target RNA-fusion protein-medium";

(4) subjecting the precipitated sample obtained in step (3) to a non-denaturing wash and then to a dephosphorylation treatment (e.g., treatment with alkaline phosphatase);

(5) performing denaturation washing on the sample treated in the step (4);

(6) subjecting the sample treated in step (5) to an enzymatic digestion treatment with a protease capable of specifically recognizing the linker peptide, thereby separating the RNA-protein complex formed by the target protein and the target RNA from the medium;

(7) removing the target protein (e.g., removing the target protein by a protease digestion treatment) from the RNA-protein complex formed by the target protein and the target RNA obtained in step (6), thereby obtaining the target RNA.

Further, the fusion protein in step (1), wherein the tag protein capable of covalently binding to a substrate may be located at the N-terminus or the C-terminus.

Further, the step (1) is as follows: connecting the coding gene of the tag protein capable of being covalently combined with the substrate with the coding gene of the target protein (namely RNA binding protein) through the coding gene of the connecting peptide to obtain a fusion gene; the fusion gene is capable of expressing a fusion protein formed by connecting the tag protein capable of covalently binding a substrate and the target protein by the connecting peptide; introducing the fusion gene into a receptor cell to obtain a recombinant cell; culturing the recombinant cell to express the fusion protein from the introduced fusion gene.

In the method, in the step (3), the step of performing RNase (or other nuclease capable of hydrolyzing RNA) treatment is further included after lysing the cell sample collected in the step (2).

In the present invention, the non-denaturing washing in step (4) is specifically: 2 washes with PBS buffer containing 0.1% Triton X-100 and 500 mM NaCl, followed by 3 washes with PBS buffer containing 0.1% Triton X-100, where % denotes volume percent.

In the present invention, the denaturing washing in step (5) may comprise one or more denaturing washes. Specifically, in the present invention, two denaturing washes are performed.

More specifically, the first of the two denaturing washes is: removal of alkaline phosphatase with Trizol reagent (Invitrogen); 2 washes with 8 M guanidine hydrochloride (5 minutes each); 2 washes with 8 M urea (5 minutes each). The second denaturing wash is: 3 washes with SDS washing solution (formulation: 10% SDS; 50 mM Tris-HCl pH 7.0; 1 mM EDTA; 1 mM DTT; balance water; % denotes mass percent, i.e., 10% means 10 g/100 ml), each with shaking at 65 °C for 5 minutes; then 2 washes with 8 M urea (5 minutes each at room temperature).

In the method, a step of performing a linker reaction on the 3' end of the RNA on the sample after the first denaturing washing may be further included between the two denaturing washes. This step may, of course, be carried out after successful isolation of the target RNA from the RNA-protein complex.
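For planning purposes, the washing conditions stated above can be written down as a small Python table. This is only a sketch: the `Wash` class and `stated_minutes` helper are hypothetical names, a single Trizol treatment is assumed because the text gives no repeat count, and steps with no stated duration are recorded as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Wash:
    stage: str
    buffer: str
    repeats: int
    minutes_each: Optional[int] = None  # None where the text states no duration


SCHEDULE = [
    Wash("non-denaturing", "PBS + 0.1% Triton X-100 + 500 mM NaCl", 2),
    Wash("non-denaturing", "PBS + 0.1% Triton X-100", 3),
    Wash("first denaturing", "Trizol reagent", 1),  # repeat count is an assumption
    Wash("first denaturing", "8 M guanidine hydrochloride", 2, 5),
    Wash("first denaturing", "8 M urea", 2, 5),
    Wash("second denaturing", "SDS washing solution, 65 C with shaking", 3, 5),
    Wash("second denaturing", "8 M urea, room temperature", 2, 5),
]


def stated_minutes(stage: str) -> int:
    """Sum the incubation time explicitly stated for one stage."""
    return sum(
        w.repeats * w.minutes_each
        for w in SCHEDULE
        if w.stage == stage and w.minutes_each is not None
    )


for stage in ("first denaturing", "second denaturing"):
    print(stage, stated_minutes(stage), "min")
```

The totals cover only the stated incubation times and exclude handling, the Trizol step and the 3' linker reaction performed between the two denaturing washes.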

In the method, the "protease capable of specifically recognizing the linker peptide" may be Tev enzyme, enterokinase, factor Xa, 3C protease of human rhinovirus type 14 (HRV 3C protease or PreScission protease), thrombin, etc., and the linker peptide may be specifically a recognition sequence specific to one or more (e.g., 2 to 3) of these enzymes, respectively. Wherein, the specific sequence of the connecting peptide recognized by the Tev enzyme is Glu-Asn-Leu-Tyr-Phe-Gln-Gly/Ser ("/" represents "or"). The specific sequence of the connecting peptide recognized by the enterokinase is Asp-Asp-Asp-Asp-Lys. The specific sequence of the connecting peptide recognized by the Xa factor is Ile-Glu/Asp-Gly-Arg. The 3C protease (HRV 3C protease or PreScission protease) of human rhinovirus type 14 recognizes the linker peptide with the specific sequence Leu-Glu-Val-Leu-Phe-Gln-Gly-Pro. The specific sequence of the connecting peptide recognized by the thrombin is A-B-Pro-Arg-X-Y (wherein, A and B are hydrophobic amino acids, and X and Y are non-acidic amino acids), and the common recognition sequence is Leu-Val-Pro-Arg-Gly-Ser.

In one embodiment of the invention, the linker peptide specifically consists of three tandem copies of the Tev enzyme recognition sequence, Glu-Asn-Leu-Tyr-Phe-Gln-Gly/Ser ("/" represents "or"). More specifically, the linker peptide is the amino acid sequence encoded by the DNA fragment at positions 2322-2396 of sequence 1 in the sequence table. Accordingly, the "protease capable of specifically recognizing the linker peptide" is specifically Tev enzyme. The research of the invention shows that Tev enzyme can tolerate 2 M urea.
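The recognition sequences listed above can be checked against a candidate linker peptide with a short Python sketch. The one-letter patterns are direct transcriptions of the three-letter sequences in the text (only the common thrombin site is included); the function `find_sites` and the example linker are hypothetical, and the example is not the linker encoded in the sequence table.

```python
import re

# One-letter regular expressions for the recognition sequences in the text.
RECOGNITION_PATTERNS = {
    "Tev enzyme": "ENLYFQ[GS]",          # Glu-Asn-Leu-Tyr-Phe-Gln-Gly/Ser
    "enterokinase": "DDDDK",             # Asp-Asp-Asp-Asp-Lys
    "factor Xa": "I[ED]GR",              # Ile-Glu/Asp-Gly-Arg
    "HRV 3C protease": "LEVLFQGP",       # Leu-Glu-Val-Leu-Phe-Gln-Gly-Pro
    "thrombin (common site)": "LVPRGS",  # Leu-Val-Pro-Arg-Gly-Ser
}


def find_sites(peptide: str) -> dict:
    """Map each protease to the 0-based start positions of its sites."""
    return {
        name: [m.start() for m in re.finditer(pattern, peptide)]
        for name, pattern in RECOGNITION_PATTERNS.items()
    }


# Hypothetical linker: three tandem Tev sites flanked by Gly-Ser spacers.
example_linker = "GS" + "ENLYFQG" + "ENLYFQS" + "ENLYFQG" + "GS"
sites = find_sites(example_linker)
print(sites["Tev enzyme"])
```

A linker designed for one protease should report sites for that enzyme only, which is a quick way to confirm that an alternative protease would not cut it.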

The use of the methods described above for studying protein-RNA interaction is also within the scope of the invention.

The invention further provides a method for identifying a target RNA of an RNA binding protein.

The method for identifying the target RNA of an RNA-binding protein, named GoldCLIP, specifically comprises the following steps: obtaining the target RNA by the method described above, constructing a cDNA library, carrying out high-throughput sequencing on the cDNA library, and finally mapping the reads to genes according to the sequencing results.

In the present invention, after the target RNA is obtained, the cDNA library is constructed by the method described in the iCLIP literature [see Konig, J., et al. (2011), on protein-RNA interactions with individual nucleotide resolution]. Briefly: the RNA is reverse transcribed with addition of a tag sequence unique to each sample, and the product is purified on a TBE-urea denaturing gel. The reverse transcription primer contains not only the sample-specific tag sequence but also a 5' linker sequence, a 3' linker sequence and a BamHI restriction site between the two linker sequences. The resulting single-stranded cDNA is circularized with an ssDNA circularizing enzyme, and an oligonucleotide complementary to the primer is then annealed to the primer sequence in a PCR instrument. The product is then digested with BamHI to obtain a linear product with a linker sequence at the 5' end, the tag sequence, and a linker sequence at the 3' end. Finally, the cDNA is amplified by PCR and subjected to high-throughput sequencing.
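The circularization and BamHI linearization steps can be followed on strings with a minimal Python sketch. All sequences below (`TAG`, `PART_A`, `PART_B`, `INSERT`) are hypothetical placeholders, not the primers used in the invention or in the iCLIP literature; the only enzyme fact relied on is that BamHI recognizes GGATCC and cuts between the first G and GATCC.

```python
BAMHI_SITE = "GGATCC"  # BamHI cuts between the first G and GATCC


def circularize_and_linearize(cdna: str) -> str:
    """Join the two cDNA ends into a circle, then open it at the BamHI site.

    Sketch assumption: the cDNA contains exactly one BamHI site and the
    site does not span the ligation junction.
    """
    if cdna.count(BAMHI_SITE) != 1:
        raise ValueError("sketch expects exactly one BamHI site in the cDNA")
    cut = cdna.index(BAMHI_SITE) + 1  # position just after the first G
    # Opening the circle at `cut` rotates the sequence around that point.
    return cdna[cut:] + cdna[:cut]


# Hypothetical placeholder sequences (5' to 3').
TAG = "CATCA"          # sample-specific tag
PART_A = "AAACCCAAAC"  # linker that ends up at the 3' end of the product
PART_B = "TTTGTTTGTT"  # linker that ends up at the 5' end of the product
INSERT = "ACACACTCTC"  # cDNA copied from the target RNA

primer = TAG + PART_A + BAMHI_SITE + PART_B
cdna = primer + INSERT  # reverse transcription extends the primer's 3' end
product = circularize_and_linearize(cdna)
print(product)
```

In this model the linear product reads linker, insert, tag, linker from 5' to 3', so the insert is flanked on both sides even when reverse transcription stopped early.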

The tag protein capable of covalently binding a substrate described above may be any of the following: Halo tag protein, CLIP tag protein, SNAP tag protein, Spy tag protein and the like. The Halo tag is a genetically engineered dehalogenase from the microorganism Rhodococcus rhodochrous, developed by Promega; its substrate is a halogen-containing ligand. The SNAP tag is derived from human alkylguanine-DNA alkyltransferase, a DNA repair protein, genetically engineered by NEB; its substrate is benzylguanine. The CLIP tag is derived from the SNAP tag; its substrate is a benzylcytosine derivative. The Spy tag is a short 11 amino acid peptide derived from the CnaB2 domain of the fibronectin-binding protein (FbaB) of Streptococcus pyogenes; its substrate is the SpyCatcher protein.

Correspondingly, the specific binding substance of the tag protein capable of being covalently bound with the substrate is any one of the following substances:

(A) when the tag protein capable of being covalently bound to the substrate is a Halo tag protein, the specific conjugate of the tag protein capable of being covalently bound to the substrate is a halogen-containing ligand;

(B) when the tag protein capable of being covalently bound to the substrate is a CLIP tag protein, the specific binder of the tag protein capable of being covalently bound to the substrate is a benzyl cytosine derivative;

(C) when the tag protein capable of being covalently bound to the substrate is a SNAP tag protein, the specific binder of the tag protein capable of being covalently bound to the substrate is benzylguanine;

(D) when the tag protein capable of being covalently bound to the substrate is a Spy tag protein, the specific conjugate of the tag protein capable of being covalently bound to the substrate is a SpyCatcher protein.
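The correspondences (A) to (D) amount to a simple lookup, sketched here in Python. The dictionary `TAG_TO_BINDER` and the function `binder_for` are hypothetical names; the pairs themselves are taken from the text above.

```python
# Tag protein -> specific binding substance to immobilize on the medium.
TAG_TO_BINDER = {
    "Halo": "halogen-containing ligand",
    "CLIP": "benzylcytosine derivative",
    "SNAP": "benzylguanine",
    "Spy": "SpyCatcher protein",
}


def binder_for(tag: str) -> str:
    """Return the specific binding substance for a covalent tag."""
    try:
        return TAG_TO_BINDER[tag]
    except KeyError:
        raise ValueError(f"no covalent binding partner listed for tag {tag!r}") from None


print(binder_for("Halo"))
```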

In one embodiment of the invention, the tag protein capable of covalently binding to a substrate is specifically a Halo tag protein. More specifically, the amino acid sequence of the Halo tag protein is specifically shown as a sequence 1 in a sequence table.

In one embodiment of the invention, the target protein is specifically a PTB protein. The Halo tag protein is located at the N-terminus of the PTB protein. The receptor cell is a HEK293 cell.

Under normal physiological conditions, the active center of the Halo tag protein binds only to halogen-containing ligands, through a covalent bond that cannot be reopened. Thus, when the target protein is fused to the Halo tag, it can be covalently labeled by adding a fluorescent Halo ligand in vitro, and if the Halo ligand is attached to magnetic beads, the Halo-tagged protein can be covalently coupled to the beads. The Halo tag has a molecular weight of 33 kDa and can be fused to the N-terminus or the C-terminus of a target protein. The fusion protein can be expressed in prokaryotic or eukaryotic expression systems. The invention exploits the ability of the Halo tag to bind its ligand covalently to design a new CLIP scheme.

The invention relates to the use of Halo-tagged fusion proteins in CLIP (ultraviolet cross-linking co-immunoprecipitation) with purification by denaturants. The traditional CLIP procedure purifies the protein-RNA complex by SDS-PAGE electrophoresis, cellulose acetate membrane transfer and isotope labeling, which takes a long time and easily loses sample. The invention instead uses Halo-tagged fusion proteins and washing with protein denaturing agents to obtain RNA libraries that can be used for sequencing. The technical steps are as follows. Cell lines carrying the Halo-tagged fusion protein are UV cross-linked and samples are collected. After cell lysis, the RNA is trimmed with nuclease, and immunoprecipitation is then performed with beads. The immunoprecipitated RNA is dephosphorylated and a linker sequence is added to its 3' end. Residual RNA contaminants are then removed by washing with strong protein denaturants. Finally, the protein is digested to obtain purified RNA. The RNA is constructed into a cDNA library, which can then be subjected to high-throughput sequencing. The present invention is also known as GoldCLIP (Gel-omitted and Ligation-Dependent Cross-Linking and Immunoprecipitation).

The invention uses the PTB protein as the target protein to validate the GoldCLIP technique and shows that it reproduces previously published experimental results well. The membrane-transfer and gel-excision steps used to recover fragments in traditional CLIP are omitted. Because the Halo tag forms a covalent bond with the ligand on the magnetic beads, stringent washes that could not previously be performed directly on magnetic beads are incorporated into the procedure. The operation is therefore simpler and faster, the CLIP steps are streamlined, and losses that may occur during handling are reduced.

Drawings

FIG. 1 is the technical flow chart of UV cross-linking co-immunoprecipitation using Halo-tagged fusion proteins with non-gel purification. The Halo-tagged target protein is introduced into a cell line by viral infection, and the cells are UV cross-linked and collected. The cells are lysed and treated with nuclease to trim protein-bound RNA. Immunoprecipitation is then performed with magnetic beads carrying the Halo ligand. The magnetic beads are washed gently to remove non-specifically bound material; the 3' phosphate group is removed with alkaline phosphatase, and the alkaline phosphatase is then removed with Trizol reagent, which constitutes the first protein denaturing wash. Next, a linker is added to the 3' end of the RNA, and a second, stringent protein denaturing wash is performed to remove residual RNA contaminants. Tev protease digestion releases the protein-nucleic acid complex, the protein is then digested, and the resulting RNA is purified. The cDNA library is then constructed by the methods described in the iCLIP literature, and the resulting cDNA is subjected to high-throughput sequencing.

FIG. 2 is a map of plasmid MSCV-NHalo-3xTev-PTB-T2-puro.

FIG. 3 shows the results of Halo-PTB immunoprecipitation. After the Halo-PTB UV254 and UV365 samples were immunoprecipitated with Beads, they were labeled with Alexa 660 Ligand, separated by PAGE and detected with an Odyssey two-color infrared fluorescence imaging system. in: Halo-tagged proteins before immunoprecipitation; ub: Halo-tagged proteins not bound to the Beads after immunoprecipitation; asterisk: Halo-PTB (~98 kD); triangle: Halo-YFP (~65 kD).

FIG. 4 shows the data obtained by high-throughput sequencing of PTB. The PTB high-throughput data are the valid sequence data remaining after processing with the Pyicoclip software. The data volumes of the two replicate experiments are shown in sequence for PTB UV365, PTB UV254 and YFP UV254.

FIG. 5 shows the whole-genome alignment of the PTB sequences. After whole-genome alignment, the PTB UV254 and PTB UV365 data are plotted as the percentage distribution of the data across the functional regions of the genome.

FIG. 6 shows the correlation analysis between replicate PTB data sets: the correlation between the two replicates of PTB UV254, and the correlation between the two replicates of PTB UV365.

FIG. 7 shows the motif analysis of the PTB data: the top 5 motif sequences in the PTB UV365 rep1 data; in the PTB UV365 rep2 data; in the PTB UV254 rep1 data; in the PTB UV254 rep2 data; in the YFP UV254 rep1 data; and in the YFP UV254 rep2 data.

Detailed Description

The experimental procedures used in the following examples are all conventional procedures unless otherwise specified.

Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.

The operational steps of the GoldCLIP technique provided by the present invention are generally as follows. The Halo-tagged target protein is introduced into a cell line by viral infection, and the cells are UV cross-linked and collected. The cells are lysed and treated with nuclease to trim protein-bound RNA. Immunoprecipitation is then performed with magnetic beads carrying the Halo ligand. The beads are washed gently to remove non-specifically bound material, the 3' phosphate group is removed with alkaline phosphatase, and the alkaline phosphatase is then removed with Trizol reagent (first denaturing wash). Next, a linker is added to the 3' end of the RNA, and residual RNA contaminants are removed by stringent protein denaturing washes (second denaturing wash). The protein-RNA complex is then released by Tev protease digestion, the protein is digested, and the resulting RNA is purified. The cDNA library is then constructed by the methods described in the iCLIP literature. The RNA is reverse transcribed with a primer that adds a sample-specific tag sequence, and the cDNA is purified on a denaturing TBE-urea gel. The reverse transcription primer contains not only the sample-specific tag sequence but also a linker sequence at the 5' end, a linker sequence at the 3' end and a BamHI cleavage site between the two linker sequences. The resulting single-stranded cDNA is circularized with a single-stranded DNA circularization ligase, and an oligonucleotide complementary to the primer is then annealed to the primer sequence in a PCR instrument. The product is then digested with BamHI to give a linear product carrying a linker sequence at the 5' end, the tag sequence, and a linker sequence at the 3' end. The cDNA is then PCR-amplified and subjected to high-throughput sequencing. The technical flow chart is shown in FIG. 1.

Example 1. CLIP experiment on Halo-tagged PTB protein in the HEK293 cell line

In this example, the application of the GoldCLIP technology provided by the present invention to RNA binding proteins was further investigated by performing CLIP experiments on Halo-tagged PTB proteins in HEK293 cell lines.

(1) Expression of Halo-PTB fusion proteins in HEK293 cell lines

The plasmid MSCV-NHalo-3xTev-PTB-T2-puro, which carries the gene encoding the Halo-PTB fusion protein, was introduced into the HEK293 cell line, and a stable cell line was obtained by puromycin selection. In the fusion protein, the Halo tag protein and the PTB protein are connected by 3 tandem Tev protease recognition sequences (Glu-Asn-Leu-Tyr-Phe-Gln-Gly/Ser). The plasmid map is shown in FIG. 2 and the full sequence is shown as sequence 2 in the sequence listing, in which positions 1419-2309 encode the Halo tag protein shown as sequence 1, positions 2322-2342, 2349-2369 and 2376-2396 encode the 3 tandem Tev protease recognition sequences, and positions 2598-4193 encode the human PTB protein. Expression of Halo-PTB in the cell line was examined with Alexa 660 Ligand (Promega, G8471).
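The coordinates quoted for sequence 2 can be sanity-checked with a few lines of arithmetic. The following is a minimal sketch, assuming the positions are 1-based and inclusive (the text does not say so explicitly); the function and variable names are illustrative only.

```python
# Coordinates quoted in the text for sequence 2.
# Assumption: positions are 1-based and inclusive.
SEGMENTS = {
    "Halo tag": (1419, 2309),
    "Tev site 1": (2322, 2342),
    "Tev site 2": (2349, 2369),
    "Tev site 3": (2376, 2396),
    "PTB": (2598, 4193),
}


def segment_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the interval; fails if it is not in frame."""
    length = segment_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"interval of {length} nt is not a multiple of 3")
    return length // 3


for name, (start, end) in SEGMENTS.items():
    print(f"{name}: {segment_length(start, end)} nt, {codon_count(start, end)} codons")
```

Under that assumption the Halo tag segment spans 891 nt, that is 297 codons, which matches the 297 residues of sequence 1, and each Tev site spans 21 nt, matching the 7-residue recognition sequence. The PTB segment spans 532 codons; the last of these (positions 4191-4193, `tag` in sequence 2) is a stop codon.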

As a control, a Halo-YFP fusion protein was also expressed in the HEK293 cell line. The plasmid introduced into the HEK293 cell line for this control was obtained by replacing positions 2598-4193 of sequence 2 in the sequence listing with the YFP coding gene (sequence 3); all other operations were the same as above.

(2) Harvesting UV-crosslinked cells

Cells carrying Halo-PTB were grown in two 100 mm cell culture dishes. When the cells in one dish had grown to confluence, they were cross-linked with UV at 254 nm. When the cells in the other dish had reached 80% confluence, 4SU (final concentration 200 μM) was added, incubation was continued for 24 hours, and the cells were then cross-linked with UV at 365 nm. Cells carrying Halo-YFP were grown in one 100 mm cell culture dish and cross-linked with UV at 254 nm once they had grown to confluence. After UV cross-linking, the cells were scraped off with a cell scraper, washed twice with pre-cooled PBS and collected in a 15 ml centrifuge tube. The collected cells were either kept on ice for immediate use in the following steps or stored at -80 ℃.

(3) Co-immunoprecipitation of Halo-PTB with Beads

The cells were suspended in lysis buffer (formulation: 50 mM Tris-HCl pH 7.5; 100 mM NaCl; 1 mM DTT; 2 mM CaCl₂; 10% glycerol; 1× protease inhibitor [Promega]; 0.2% Triton X-100; 0.5 U/μl Micrococcal Nuclease [NEB]; the balance water; % denotes volume percentage), disrupted on ice with a Dounce homogenizer and left on ice for 10 minutes. The lysate was then incubated at 37 ℃ for 3 minutes and immediately placed on ice. EDTA and EGTA were added to a final concentration of 2 mM each, and the mixture was centrifuged at high speed for 10 minutes. The supernatant was transferred to a fresh centrifuge tube, the RNA input and the input for western blotting were set aside, and 300 μl of Beads were added for immunoprecipitation overnight at 4 ℃. The next day, the efficiency of immunoprecipitation was examined with Alexa 660 Ligand (Promega, G8471). Because the Halo tag forms a covalent bond with the magnetic beads, the efficiency of the immunoprecipitation can be estimated from how much the amount of Halo-tagged protein remaining after precipitation (unbound) is reduced relative to the amount in the sample before precipitation (input). The results are shown in FIG. 3.
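The efficiency estimate described above amounts to one minus the ratio of unbound to input signal. A minimal sketch follows, assuming that the two lanes are loaded with equal fractions of the sample and that band intensity is proportional to protein amount; the intensity values in the example are hypothetical, not measurements from FIG. 3.

```python
def depletion_efficiency(input_signal: float, unbound_signal: float) -> float:
    """Fraction of Halo-tagged protein removed from the lysate by the beads.

    Assumes both signals come from equal sample fractions and scale
    linearly with the amount of Halo-tagged protein.
    """
    if input_signal <= 0:
        raise ValueError("input_signal must be positive")
    if unbound_signal < 0:
        raise ValueError("unbound_signal must not be negative")
    return 1.0 - unbound_signal / input_signal


# Hypothetical band intensities (arbitrary units), for illustration only.
efficiency = depletion_efficiency(input_signal=1000.0, unbound_signal=150.0)
print(f"Estimated immunoprecipitation efficiency: {efficiency:.0%}")
```

A value near 1 would indicate that most of the tagged protein was captured; a value near 0 would indicate that most of it stayed in the unbound fraction.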

(4) On-beads dephosphorylation

The Beads bound to Halo-PTB protein or Halo-YFP protein were washed with PBST (i.e., a non-denaturing wash, specifically 2 washes with PBS buffer containing 0.1% Triton X-100 and 500 mM NaCl followed by 3 washes with PBS buffer containing 0.1% Triton X-100; % denotes volume percentage) and then washed 3 times with 1× NEB CutSmart buffer. Then 80 μl of the alkaline phosphatase (CIP) and DNase I reaction system (8 μl 10× NEB CutSmart; 8 μl 10 mM CaCl₂; 5 μl CIP [NEB, M0290S]; 5 μl DNase I [Promega]; 2 μl RNase inhibitor [Promega]; 52 μl H₂O) was added, and the reaction was carried out in a Thermomixer at 37 ℃ for 30 minutes (mixing at 1000 rpm for 15 seconds every 3 minutes).
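The component volumes of the dephosphorylation mix can be checked against the stated 80 μl total, and the final concentrations follow from simple dilution. The sketch below is illustrative only and ignores any buffer carried over on the beads, which is an assumption.

```python
# Component volumes (in microliters) of the CIP / DNase I reaction system.
COMPONENTS_UL = {
    "10x NEB CutSmart": 8,
    "10 mM CaCl2": 8,
    "CIP": 5,
    "DNase I": 5,
    "RNase inhibitor": 2,
    "H2O": 52,
}


def final_concentration(stock: float, stock_volume_ul: float, total_volume_ul: float) -> float:
    """Dilution rule C1 * V1 = C2 * V2, solved for C2 (same unit as stock)."""
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock * stock_volume_ul / total_volume_ul


total_ul = sum(COMPONENTS_UL.values())
cutsmart_x = final_concentration(10, COMPONENTS_UL["10x NEB CutSmart"], total_ul)
cacl2_mm = final_concentration(10, COMPONENTS_UL["10 mM CaCl2"], total_ul)

print(f"Total volume: {total_ul} ul")
print(f"CutSmart: {cutsmart_x:g}x, CaCl2: {cacl2_mm:g} mM")
```

The volumes add up to the stated 80 μl, giving 1× CutSmart buffer and 1 mM CaCl₂ in the reaction.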

(5) First protein denaturing wash

After completion of the dephosphorylation reaction, 750 μl Trizol LS (Invitrogen) and 200 μl CHCl₃ were added, followed by vortexing for 15 seconds. The beads were then washed 2 times with 8 M guanidine hydrochloride (5 minutes each), 2 times with 8 M urea (5 minutes each), and finally 3 times with PNK buffer (formulation: 20 mM Tris-HCl pH 7.0; 10 mM MgCl₂; 0.1% Triton X-100; the balance water; % denotes volume percentage).

(6) On-beads 3' RNA linker ligation

RNA Linker premix (formulation: 2 μl RNase inhibitor; 2 μl 3' RNA Linker [100 pmol/μl]; 36 μl H₂O; the 3' RNA Linker has the sequence 5'P/AGGTCGGAAGAGCGGTTCAG/3'ddC/ and was synthesized by IDT) was added to the beads, followed by 55 μl of RNA ligation premix (formulation: 10 μl 10× T4 RNA ligase buffer [NEB]; 10 μl BSA; 10 μl 10 mM ATP; 10 μl DMSO; 10 μl T4 RNA ligase [Ambion, AM2141]; 5 μl 0.1 M DTT). The reaction was carried out overnight at 16 ℃ in a Thermomixer (mixing at 1000 rpm for 15 seconds every 3 minutes). The next day, 12 μl of ligation reaction premix (formulation: 1 μl 3' RNA Linker [100 pmol/μl]; 5 μl T4 RNA ligase [Ambion, AM2141]; 5 μl 10 mM ATP; 0.1 M DTT) was added, and the reaction was continued at 25 ℃ for 3 hours (mixing at 1000 rpm for 15 seconds every 3 minutes).
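Some volumes in this step did not survive in the text (the total of the RNA Linker premix and the DTT volume in the second-day premix). The sketch below shows how the listed volumes can be checked against the stated totals, and what the gaps would be if each premix consists only of the listed components. That is an assumption, so the inferred values are not stated in the source and should be treated as unconfirmed.

```python
def missing_volume(stated_total_ul: float, known_volumes_ul: list) -> float:
    """Volume left over once the known components are subtracted from the total.

    Assumes the premix contains nothing beyond the listed components.
    """
    remainder = stated_total_ul - sum(known_volumes_ul)
    if remainder < 0:
        raise ValueError("known components exceed the stated total")
    return remainder


# RNA Linker premix: RNase inhibitor, 3' RNA Linker, H2O (total not legible).
linker_premix_total = sum([2, 2, 36])

# RNA ligation premix: stated total 55 ul, all component volumes legible.
ligation_gap = missing_volume(55, [10, 10, 10, 10, 10, 5])

# Second-day premix: stated total 12 ul, DTT volume not legible.
second_day_dtt = missing_volume(12, [1, 5, 5])

print(f"Sum of RNA Linker premix components: {linker_premix_total} ul")
print(f"Unaccounted volume in the 55 ul ligation premix: {ligation_gap} ul")
print(f"Inferred DTT volume in the 12 ul premix: {second_day_dtt} ul")
```

The 55 μl ligation premix is fully accounted for by its listed components, which supports reading the other premixes the same way.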

(7) Second protein denaturation wash

The ligation reaction mixture was removed from the magnetic beads, and the beads were washed 1 time with PBST and 3 times with SDS wash buffer (formulation: 10% SDS; 50 mM Tris-HCl pH 7.0; 1 mM EDTA; 1 mM DTT; the balance water; % denotes mass per volume, i.e., 10% means 10 g/100 ml), each wash being performed in a Thermomixer at 65 ℃ for 5 minutes with shaking at 1000 rpm. The beads were then washed 2 times with 8 M urea, shaking for 5 minutes at room temperature.

(8) Separating PTB protein from magnetic beads through Tev enzyme digestion reaction

The beads were washed 3 times with 1× Tev digestion buffer (formulation: 50 mM Tris-HCl pH 8.0; 1 mM EDTA; 1 mM DTT; 1% Triton X-100; 2 M urea; the balance water; % denotes volume percentage). The magnetic beads were resuspended in 200 μl of 1× Tev digestion buffer, 5 μl of RNase inhibitor and 5 μl of Tev protease (5 μg/μl) were added, and the reaction was carried out at 25 ℃ for 2 hours in a Thermomixer (mixing at 1000 rpm for 15 seconds every 3 minutes). The reaction solution was transferred to a new 1.5 mL centrifuge tube. The magnetic beads were then resuspended in 100 μl of 1× Tev digestion buffer, 2 μl of RNase inhibitor and 2 μl of Tev protease (5 μg/μl) were added, and the reaction was carried out at 30 ℃ for 2 hours in a Thermomixer (mixing at 1000 rpm for 15 seconds every 3 minutes). This reaction solution was combined with the previous one.

(9) Extraction of RNA

PK reaction mixture (formulation: 4 μl 5 M NaCl; 20 μl 1 M Tris-HCl pH 7.0; 8 μl 10% SDS; 8 μl H₂O; 60 μl proteinase K [NEB, P8102S]) was added to the combined Tev reaction solution, and the mixture was incubated in a 37 ℃ water bath for 30 minutes. RNA was extracted with 400 μl phenol-chloroform solution (Sigma). After vortexing and centrifugation, the supernatant was transferred to a new 1.5 mL centrifuge tube, 50 μl 3 M NaAc pH 5.5, 1 μl GlycoBlue and 1 mL of 1:1 (vol/vol) ethanol:isopropanol were added, and the tube was left overnight at -20 ℃. After centrifugation the next day, the pellet was washed 2 times with 75% ethanol and resuspended in 16 μl H₂O to give the RNA library (i.e., the collection of target RNAs that interact with the PTB protein).

(10) Construction of cDNA library

The cDNA library was constructed with reference to the iCLIP method; see Konig, J., et al. (2011). "iCLIP - transcriptome-wide mapping of protein-RNA interactions with individual nucleotide resolution." J Vis Exp (50). Briefly, RNA was first reverse transcribed into cDNA using Superscript III (Invitrogen). The reverse transcription primer used was /5Phos/NN[Index]NNNAGATCGGAAGAGCGTGgatCCTGAACCGC, where Index is one of AACC, ACAA, ATTG, AGGT, CGCC, CCGG, CTAA, CATT, GCCA, GACC, GGTT, GTGG, TCCG, TGCC, TATT, TTAA. The cDNA was then run on a 6% TBE-urea gel (Invitrogen), and the desired 85 nt-200 nt fragments were recovered by gel excision and overnight ethanol precipitation. The recovered cDNA was circularized with CircLigase II (Epicentre). Then 30 μl of oligo annealing mix was added, and the oligo was annealed to the target sequence using a PCR instrument. The oligo sequence is 5'-GTTCAGGATCCACGACGCTCTTCAAAA-3'. The reaction was then digested with BamHI-HF (NEB) for 1 hour. Finally, the cDNA library was amplified with 2× Phusion Mix. The forward amplification primer was 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' and the reverse primer was 5'-CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT-3'.
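The primer template and index list above can be expanded into the 16 concrete primer sequences and checked for the BamHI site. The following is a minimal sketch; the helper names are illustrative, the 5' phosphate modification is left out of the string, and the pairwise-distance check is an added diagnostic that the text does not mention.

```python
from itertools import combinations

# Index sequences and primer parts as listed in the text.
INDEXES = [
    "AACC", "ACAA", "ATTG", "AGGT", "CGCC", "CCGG", "CTAA", "CATT",
    "GCCA", "GACC", "GGTT", "GTGG", "TCCG", "TGCC", "TATT", "TTAA",
]
PRIMER_5P = "NN"
PRIMER_3P = "NNNAGATCGGAAGAGCGTGgatCCTGAACCGC"
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def rt_primer(index: str) -> str:
    """Reverse transcription primer for one sample index (N = random base)."""
    if index not in INDEXES:
        raise ValueError(f"unknown index: {index}")
    return PRIMER_5P + index + PRIMER_3P


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_index_distance(indexes: list) -> int:
    """Smallest pairwise Hamming distance within the index set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


for index in INDEXES:
    primer = rt_primer(index)
    assert BAMHI_SITE in primer.upper()  # every primer carries the BamHI site
    print(index, primer)

print("Minimum pairwise index distance:", min_index_distance(INDEXES))
```

For the list above the minimum pairwise distance is 1 (for example AACC and GACC), so a single sequencing error inside the index can turn one sample tag into another; demultiplexing that requires an exact index match cannot detect such an error.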

(11) High throughput sequencing

The cDNA library is sent to a sequencing platform for sequencing; the main platforms used are Illumina HiSeq 2500 and Illumina HiSeq X10, with read formats such as SE50 or PE250.

(12) Data processing

Adapter sequences were removed with Cutadapt (v1.10, http://journal.embnet.org/index.php/embnetjournal/article/view/200; cutadapt -a AGATCGGAAGAGCGGTTCAG -e 0.2 -q 20 -m 24), low-quality bases at the 3' end were trimmed, and sequences shorter than 24 bp were discarded. Duplicate sequences were then collapsed, and the barcode sequence and random sequence at the 5' end were removed. Peaks were then called on the trimmed data with the Pyicoclip program (Pyicoclip: -P-value 0.001 -region .bed). The results obtained with Pyicoclip are shown in FIG. 4. As can be seen, the two biological replicates of UV365-irradiated cells containing Halo-PTB (experimental group) yielded approximately 10 million peaks with the method of the invention, and the two biological replicates of UV254-irradiated cells containing Halo-PTB (experimental group) yielded approximately 20 million peaks. In contrast, the two biological replicates of UV-irradiated cells containing Halo-YFP (control) yielded almost no peaks.
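The duplicate-collapsing and barcode-removal step can be sketched in a few lines. This is an illustration only, not the pipeline actually used: it assumes that each trimmed read starts with the 9-nt region derived from the reverse transcription primer (2 random bases, the 4-nt index, 3 random bases), as in iCLIP, and the example reads are made up.

```python
BARCODE_LEN = 9  # assumed layout: NN + 4-nt index + NNN at the 5' end


def collapse_duplicates(reads: list) -> list:
    """Keep one copy of each identical read, preserving first-seen order.

    Reads that share random bases, index and insert are treated as
    PCR duplicates of one cDNA molecule.
    """
    return list(dict.fromkeys(reads))


def split_barcode(read: str) -> tuple:
    """Return (index, random_bases, insert) for a single read."""
    if len(read) <= BARCODE_LEN:
        raise ValueError("read is too short to contain barcode and insert")
    random_bases = read[0:2] + read[6:9]
    index = read[2:6]
    insert = read[BARCODE_LEN:]
    return index, random_bases, insert


# Made-up reads: the first two are identical and collapse into one.
reads = [
    "ACAACCGTTTTTCTCTCT",
    "ACAACCGTTTTTCTCTCT",
    "TGAACCAAGTTTCTCTCT",
]
unique_reads = collapse_duplicates(reads)
for read in unique_reads:
    print(split_barcode(read))
```

The third read has the same insert as the first but different random bases, so it is kept as an independent molecule.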

Each sequence was then aligned to the human genome (hg19) using Bowtie (v1.1.2 (REF); bowtie -f -p 8 -v 2 -k 1 --best --sam --un). The whole-genome alignment of the PTB sequences is shown in FIG. 5. As can be seen, in Halo-PTB cells irradiated with either UV365 or UV254, peaks in introns account for more than 60% of all peaks, followed by the intergenic region, which accounts for about 10% of the total.

(13) Data correlation analysis

The correlation between the two replicate experiments for the Halo-PTB UV254, UV365 and novu data was assessed with the Pearson correlation coefficient. The results are shown in FIG. 6: R² = 0.97 for the two PTB UV254 data sets and R² = 0.99 for the two PTB UV365 data sets. This demonstrates that the experiment is reproducible and the data are reliable.
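For reference, the Pearson correlation coefficient between two replicates can be computed as follows. This is a minimal sketch in plain Python; the per-region counts in the example are hypothetical and are not the data behind FIG. 6.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical read counts per region in two replicates.
rep1 = [12, 40, 7, 95, 30]
rep2 = [10, 44, 9, 90, 28]
r = pearson_r(rep1, rep2)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

R² is simply the square of r, so values close to 1 indicate that the two replicates rank and scale the regions almost identically.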

(14) Motifs analysis

Motifs in the peak sequences were identified with HOMER (-p 8 -rna -S 10 -len 5,6,7). The method compares the sequence information against a background, which was generated from three sets of random sequences in the same gene regions. FIG. 7 lists the top 5 motifs found. The results show that, for both PTB UV254 and PTB UV365, the method recovers UC-rich motifs, consistent with published results [see Xue, Y., et al., Genome-wide analysis of PTB-RNA interactions reveals a strategy used by the general splicing repressor to modulate exon inclusion or skipping. Mol Cell, 2009. 36(6): p. 996-1006].
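The idea of comparing peak sequences with a background can be illustrated with a simple k-mer enrichment ratio. This is not HOMER's algorithm, only a much simpler stand-in for the foreground-versus-background comparison; the pseudocount and the toy sequences are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(sequences: list, k: int) -> Counter:
    """Count every overlapping k-mer, treating T as U (RNA alphabet)."""
    counts = Counter()
    for seq in sequences:
        rna = seq.upper().replace("T", "U")
        for i in range(len(rna) - k + 1):
            counts[rna[i:i + k]] += 1
    return counts


def kmer_enrichment(peaks: list, background: list, k: int, pseudocount: float = 1.0) -> list:
    """Rank k-mers by (frequency in peaks) / (frequency in background)."""
    fg = kmer_counts(peaks, k)
    bg = kmer_counts(background, k)
    vocabulary = 4 ** k  # number of possible k-mers over A, C, G, U
    fg_total = sum(fg.values()) + pseudocount * vocabulary
    bg_total = sum(bg.values()) + pseudocount * vocabulary
    scores = {}
    for kmer, count in fg.items():
        fg_freq = (count + pseudocount) / fg_total
        bg_freq = (bg[kmer] + pseudocount) / bg_total
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy sequences for illustration; real input would be the peak sequences.
peaks = ["UCUCUCUU", "CUCUCUCU"]
background = ["AGGAGACA", "GAUACAGG"]
ranked = kmer_enrichment(peaks, background, k=5)
print(ranked[:3])
```

With real data, UC-rich k-mers ranking at the top would be the simple-count analogue of the UC-rich motifs reported in FIG. 7.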

The results of this example show that the CLIP method based on UV cross-linking co-immunoprecipitation of Halo-tagged fusion proteins with non-gel purification reproduces previously published experimental results well. The membrane-transfer and gel-excision steps used to recover fragments in traditional CLIP are omitted. Because the Halo tag forms a covalent bond with the ligand on the magnetic beads, stringent washes that could not previously be performed directly on magnetic beads are incorporated into the procedure. The operation is therefore simpler and faster, the CLIP steps are streamlined, and losses that may occur during handling are reduced.

<110> Institute of Biophysics, Chinese Academy of Sciences

<120> Use of a tag protein capable of covalently binding a substrate in CLIP

<130> GNCLN171827

<160> 3

<170> PatentIn version 3.5

<210> 1

<211> 297

<212> PRT

<213> Artificial sequence

<220>

<223>

<400> 1

Met Ala Glu Ile Gly Thr Gly Phe Pro Phe Asp Pro His Tyr Val Glu

1 5 10 15

Val Leu Gly Glu Arg Met His Tyr Val Asp Val Gly Pro Arg Asp Gly

20 25 30

Thr Pro Val Leu Phe Leu His Gly Asn Pro Thr Ser Ser Tyr Val Trp

35 40 45

Arg Asn Ile Ile Pro His Val Ala Pro Thr His Arg Cys Ile Ala Pro

50 55 60

Asp Leu Ile Gly Met Gly Lys Ser Asp Lys Pro Asp Leu Gly Tyr Phe

65 70 75 80

Phe Asp Asp His Val Arg Phe Met Asp Ala Phe Ile Glu Ala Leu Gly

85 90 95

Leu Glu Glu Val Val Leu Val Ile His Asp Trp Gly Ser Ala Leu Gly

100 105 110

Phe His Trp Ala Lys Arg Asn Pro Glu Arg Val Lys Gly Ile Ala Phe

115 120 125

Met Glu Phe Ile Arg Pro Ile Pro Thr Trp Asp Glu Trp Pro Glu Phe

130 135 140

Ala Arg Glu Thr Phe Gln Ala Phe Arg Thr Thr Asp Val Gly Arg Lys

145 150 155 160

Leu Ile Ile Asp Gln Asn Val Phe Ile Glu Gly Thr Leu Pro Met Gly

165 170 175

Val Val Arg Pro Leu Thr Glu Val Glu Met Asp His Tyr Arg Glu Pro

180 185 190

Phe Leu Asn Pro Val Asp Arg Glu Pro Leu Trp Arg Phe Pro Asn Glu

195 200 205

Leu Pro Ile Ala Gly Glu Pro Ala Asn Ile Val Ala Leu Val Glu Glu

210 215 220

Tyr Met Asp Trp Leu His Gln Ser Pro Val Pro Lys Leu Leu Phe Trp

225 230 235 240

Gly Thr Pro Gly Val Leu Ile Pro Pro Ala Glu Ala Ala Arg Leu Ala

245 250 255

Lys Ser Leu Pro Asn Cys Lys Ala Val Asp Ile Gly Pro Gly Leu Asn

260 265 270

Leu Leu Gln Glu Asp Asn Pro Asp Leu Ile Gly Ser Glu Ile Ala Arg

275 280 285

Trp Leu Ser Thr Leu Glu Ile Ser Gly

290 295

<210> 2

<211> 9099

<212> DNA

<213> Artificial sequence

<220>

<223>

<400> 2

tgaaagaccc cacctgtagg tttggcaagc tagcttaagt aacgccattt tgcaaggcat 60

ggaaaataca taactgagaa tagagaagtt cagatcaagg ttaggaacag agagacagca 120

gaatatgggc caaacaggat atctgtggta agcagttcct gccccggctc agggccaaga 180

acagatggtc cccagatgcg gtcccgccct cagcagtttc tagagaacca tcagatgttt 240

ccagggtgcc ccaaggacct gaaatgaccc tgtgccttat ttgaactaac caatcagttc 300

gcttctcgct tctgttcgcg cgcttctgct ccccgagctc aataaaagag cccacaaccc 360

ctcactcggc gcgccagtcc tccgatagac tgcgtcgccc gggtacccgt attcccaata 420

aagcctcttg ctgtttgcat ccgaatcgtg gactcgctga tccttgggag ggtctcctca 480

gattgattga ctgcccacct cgggggtctt tcatttggag gttccaccga gatttggaga 540

cccctgccca gggaccaccg acccccccgc cgggaggtaa gctggccagc ggtcgtttcg 600

tgtctgtctc tgtctttgtg cgtgtttgtg ccggcatcta atgtttgcgc ctgcgtctgt 660

actagttagc taactagctc tgtatctggc ggacccgtgg tggaactgac gagttctgaa 720

cacccggccg caaccctggg agacgtccca gggactttgg gggccgtttt tgtggcccga 780

cctgaggaag ggagtcgatg tggaatccga ccccgtcagg atatgtggtt ctggtaggag 840

acgagaacct aaaacagttc ccgcctccgt ctgaattttt gctttcggtt tggaaccgaa 900

gccgcgcgtc ttgtctgctg cagcgctgca gcatcgttct gtgttgtctc tgtctgactg 960

tgtttctgta tttgtctgaa aattagggcc agactgttac cactccctta agtttgacct 1020

taggtcactg gaaagatgtc gagcggatcg ctcacaacca gtcggtagat gtcaagaaga 1080

gacgttgggt taccttctgc tctgcagaat ggccaacctt taacgtcgga tggccgcgag 1140

acggcacctt taaccgagac ctcatcaccc aggttaagat caaggtcttt tcacctggcc 1200

cgcatggaca cccagaccag gtcccctaca tcgtgacctg ggaagccttg gcttttgacc 1260

cccctccctg ggtcaagccc tttgtacacc ctaagcctcc gcctcctctt cctccatccg 1320

ccccgtctct cccccttgaa cctcctcgtt cgaccccgcc tcgatcctcc ctttatccag 1380

ccctcactcc ttctctaggc gccggaatta gatccaccat ggcagaaatc ggtactggct 1440

ttccattcga cccccattat gtggaagtcc tgggcgagcg catgcactac gtcgatgttg 1500

gtccgcgcga tggcacccct gtgctgttcc tgcacggtaa cccgacctcc tcctacgtgt 1560

ggcgcaacat catcccgcat gttgcaccga cccatcgctg cattgctcca gacctgatcg 1620

gtatgggcaa atccgacaaa ccagacctgg gttatttctt cgacgaccac gtccgcttca 1680

tggatgcctt catcgaagcc ctgggtctgg aagaggtcgt cctggtcatt cacgactggg 1740

gctccgctct gggtttccac tgggccaagc gcaatccaga gcgcgtcaaa ggtattgcat 1800

ttatggagtt catccgccct atcccgacct gggacgaatg gccagaattt gcccgcgaga 1860

ccttccaggc cttccgcacc accgacgtcg gccgcaagct gatcatcgat cagaacgttt 1920

ttatcgaggg tacgctgccg atgggtgtcg tccgcccgct gactgaagtc gagatggacc 1980

attaccgcga gccgttcctg aatcctgttg accgcgagcc actgtggcgc ttcccaaacg 2040

agctgccaat cgccggtgag ccagcgaaca tcgtcgcgct ggtcgaagaa tacatggact 2100

ggctgcacca gtcccctgtc ccgaagctgc tgttctgggg caccccaggc gttctgatcc 2160

caccggccga agccgctcgc ctggccaaaa gcctgcctaa ctgcaaggct gtggacatcg 2220

gcccgggtct gaatctgctg caagaagaca acccggacct gatcggcagc gagatcgcgc 2280

gctggctgtc gacgctcgag atttccggcg agccaaccac tgaggatctg tactttcaga 2340

gcatcgatga aaatctgtac ttccagggga tcgatgagaa cctgtacttt caggggagcg 2400

cctggtccca cccccagttc gaaaagggcg gcggaagcgg aggaggctcc ggaggttccg 2460

cttggtccca cccgcagttc gagaagaccg gcgccattta tcaaacaagt ttgtacaaaa 2520

aagcaggctc caccatggga accaattcag tcgactggat cctcatggct agcatgactg 2580

gtggacagca aatgggtatg gacggcattg tcccagatat agccgttggt acaaagcggg 2640

gatctgacga gcttttctct acttgtgtca ctaacggacc gtttatcatg agcagcaact 2700

cggcttctgc agcaaacgga aatgacagca agaagttcaa aggtgacagc cgaagtgcag 2760

gcgtcccctc tagagtgatc cacatccgga agctccccat cgacgtcacg gagggggaag 2820

tcatctccct ggggctgccc tttgggaagg tcaccaacct cctgatgctg aaggggaaaa 2880

accaggcctt catcgagatg aacacggagg aggctgccaa caccatggtg aactactaca 2940

cctcggtgac ccctgtgctg cgcggccagc ccatctacat ccagttctct aaccacaagg 3000

agctgaagac cgacagctct cccaaccagg cgcgggccca ggcggccctg caggcggtga 3060

actcggtcca gtcggggaac ctggccttgg ctgcctcggc ggcggccgtg gacgcaggga 3120

tggcgatggc cgggcagagc cctgtgctca ggatcatcgt ggagaacctc ttctaccctg 3180

tgaccctgga tgtgctgcac cagattttct ccaagttcgg cacagtgttg aagatcatca 3240

ccttcaccaa gaacaaccag ttccaggccc tgctgcagta tgcggacccc gtgagcgccc 3300

agcacgccaa gctgtcgctg gacgggcaga acatctacaa cgcctgctgc acgctgcgca 3360

tcgacttttc caagctcacc agcctcaacg tcaagtacaa caatgacaag agccgtgact 3420

acacacgccc agacctgcct tccggggaca gccagccctc gctggaccag accatggccg 3480

cggccttcgg cctttccgtt ccgaacgtcc acggcgccct ggcccccctg gccatcccct 3540

cggcggcggc ggcagctgcg gcggcaggtc ggatcgccat cccgggcctg gcgggggcag 3600

gaaattctgt attgctggtc agcaacctca acccagagag agtcacaccc caaagcctct 3660

ttattctttt cggcgtctac ggtgacgtgc agcgcgtgaa gatcctgttc aataagaagg 3720

agaacgccct agtgcagatg gcggacggca accaggccca gctggccatg agccacctga 3780

acgggcacaa gctgcacggg aagcccatcc gcatcacgct ctcgaagcac cagaacgtgc 3840

agctgccccg cgagggccag gaggaccagg gcctgaccaa ggactacggc aactcacccc 3900

tgcaccgctt caagaagccg ggctccaaga acttccagaa catattcccg ccctcggcca 3960

ctctgcacct ctccaacatc ccgccctcag tctccgagga ggatctcaag gtcctgtttt 4020

ccagcaatgg gggcgtcgtc aaaggattca agttcttcca gaaggaccgc aagatggcac 4080

tgatccagat gggctccgtg gaggaggcgg tccaggccct cattgacctg cacaaccacg 4140

acctcgggga gaaccaccac ctgcgggtct ccttctccaa gtccaccatc tagctcgaga 4200

tatctagacc cagctttctt gtacaaagtg gttcgataaa ttgacgtaag ctagtctaga 4260

cggaattcta ccgggtaggg gaggcgcttt tcccaaggca gtctggagca tgcgctttag 4320

cagccccgct gggcacttgg cgctacacaa gtggcctctg gcctcgcaca cattccacat 4380

ccaccggtag gcgccaaccg gctccgttct ttggtggccc cttcgcgcca ccttctactc 4440

ctcccctagt caggaagttc ccccccgccc cgcagctcgc gtcgtgcagg acgtgacaaa 4500

tggaagtagc acgtctcact agtctcgtgc agatggacag caccgctgag caatggaagc 4560

gggtaggcct ttggggcagc ggccaatagc agctttgctc cttcgctttc tgggctcaga 4620

ggctgggaag gggtgggtcc gggggcgggc tcaggggcgg gctcaggggc ggggcgggcg 4680

cccgaaggtc ctccggaggc ccggcattct gcacgcttca aaagcgcacg tctgccgcgc 4740

tgttctcctc ttcctcatct ccgggccttt cgacctgcag cccaagctta ccatgaccga 4800

gtacaagccc acggtgcgcc tcgccacccg cgacgacgtc cccagggccg tacgcaccct 4860

cgccgccgcg ttcgccgact accccgccac gcgccacacc gtcgatccgg accgccacat 4920

cgagcgggtc accgagctgc aagaactctt cctcacgcgc gtcgggctcg acatcggcaa 4980

ggtgtgggtc gcggacgacg gcgccgcggt ggcggtctgg accacgccgg agagcgtcga 5040

agcgggggcg gtgttcgccg agatcggccc gcgcatggcc gagttgagcg gttcccggct 5100

ggccgcgcag caacagatgg aaggcctcct ggcgccgcac cggcccaagg agcccgcgtg 5160

gttcctggcc accgtcggcg tctcgcccga ccaccagggc aagggtctgg gcagcgccgt 5220

cgtgctcccc ggagtggagg cggccgagcg cgccggggtg cccgccttcc tggagacctc 5280

cgcgccccgc aacctcccct tctacgagcg gctcggcttc accgtcaccg ccgacgtcga 5340

ggtgcccgaa ggaccgcgca cctggtgcat gacccgcaag cccggtgcct gacgcccgcc 5400

ccacgacccg cagcgcccga ccgaaaggag cgcacgaccc catgcatcga taaaataaaa 5460

gattttattt agtctccaga aaaagggggg aatgaaagac cccacctgta ggtttggcaa 5520

gctagcttaa gtaacgccat tttgcaaggc atggaaaata cataactgag aatagagaag 5580

ttcagatcaa ggttaggaac agagagacag cagaatatgg gccaaacagg atatctgtgg 5640

taagcagttc ctgccccggc tcagggccaa gaacagatgg tccccagatg cggtcccgcc 5700

ctcagcagtt tctagagaac catcagatgt ttccagggtg ccccaaggac ctgaaatgac 5760

cctgtgcctt atttgaacta accaatcagt tcgcttctcg cttctgttcg cgcgcttctg 5820

ctccccgagc tcaataaaag agcccacaac ccctcactcg gcgcgccagt cctccgatag 5880

actgcgtcgc ccgggtaccc gtgtatccaa taaaccctct tgcagttgca tccgacttgt 5940

ggtctcgctg ttccttggga gggtctcctc tgagtgattg actacccgtc agcgggggtc 6000

tttcatgggt aacagtttct tgaagttgga gaacaacatt ctgagggtag gagtcgaata 6060

ttaagtaatc ctgactcaat tagccactgt tttgaatcca catactccaa tactcctgaa 6120

atagttcatt atggacagcg cagaaagagc tggggagaat tgtgaaattg ttatccgctc 6180

acaattccac acaacatacg agccggaagc ataaagtgta aagcctgggg tgcctaatga 6240

gtgagctaac tcacattaat tgcgttgcgc tcactgcccg ctttccagtc gggaaacctg 6300

tcgtgccagc tgcattaatg aatcggccaa cgcgcgggga gaggcggttt gcgtattggg 6360

cgctcttccg cttcctcgct cactgactcg ctgcgctcgg tcgttcggct gcggcgagcg 6420

gtatcagctc actcaaaggc ggtaatacgg ttatccacag aatcagggga taacgcagga 6480

aagaacatgt gagcaaaagg ccagcaaaag gccaggaacc gtaaaaaggc cgcgttgctg 6540

gcgtttttcc ataggctccg cccccctgac gagcatcaca aaaatcgacg ctcaagtcag 6600

aggtggcgaa acccgacagg actataaaga taccaggcgt ttccccctgg aagctccctc 6660

gtgcgctctc ctgttccgac cctgccgctt accggatacc tgtccgcctt tctcccttcg 6720

ggaagcgtgg cgctttctca tagctcacgc tgtaggtatc tcagttcggt gtaggtcgtt 6780

cgctccaagc tgggctgtgt gcacgaaccc cccgttcagc ccgaccgctg cgccttatcc 6840

ggtaactatc gtcttgagtc caacccggta agacacgact tatcgccact ggcagcagcc 6900

actggtaaca ggattagcag agcgaggtat gtaggcggtg ctacagagtt cttgaagtgg 6960

tggcctaact acggctacac tagaagaaca gtatttggta tctgcgctct gctgaagcca 7020

gttaccttcg gaaaaagagt tggtagctct tgatccggca aacaaaccac cgctggtagc 7080

ggtggttttt ttgtttgcaa gcagcagatt acgcgcagaa aaaaaggatc tcaagaagat 7140

cctttgatct tttctacggg gtctgacgct cagtggaacg aaaactcacg ttaagggatt 7200

ttggtcatga gattatcaaa aaggatcttc acctagatcc ttttaaatta aaaatgaagt 7260

tttaaatcaa tctaaagtat atatgagtaa acttggtctg acagttacca atgcttaatc 7320

agtgaggcac ctatctcagc gatctgtcta tttcgttcat ccatagttgc ctgactcccc 7380

gtcgtgtaga taactacgat acgggagggc ttaccatctg gccccagtgc tgcaatgata 7440

ccgcgagacc cacgctcacc ggctccagat ttatcagcaa taaaccagcc agccggaagg 7500

gccgagcgca gaagtggtcc tgcaacttta tccgcctcca tccagtctat taattgttgc 7560

cgggaagcta gagtaagtag ttcgccagtt aatagtttgc gcaacgttgt tgccattgct 7620

acaggcatcg tggtgtcacg ctcgtcgttt ggtatggctt cattcagctc cggttcccaa 7680

cgatcaaggc gagttacatg atcccccatg ttgtgcaaaa aagcggttag ctccttcggt 7740

cctccgatcg ttgtcagaag taagttggcc gcagtgttat cactcatggt tatggcagca 7800

ctgcataatt ctcttactgt catgccatcc gtaagatgct tttctgtgac tggtgagtac 7860

tcaaccaagt cattctgaga atagtgtatg cggcgaccga gttgctcttg cccggcgtca 7920

atacgggata ataccgcgcc acatagcaga actttaaaag tgctcatcat tggaaaacgt 7980

tcttcggggc gaaaactctc aaggatctta ccgctgttga gatccagttc gatgtaaccc 8040

actcgtgcac ccaactgatc ttcagcatct tttactttca ccagcgtttc tgggtgagca 8100

aaaacaggaa ggcaaaatgc cgcaaaaaag ggaataaggg cgacacggaa atgttgaata 8160

ctcatactct tcctttttca atattattga agcatttatc agggttattg tctcatgagc 8220

ggatacatat ttgaatgtat ttagaaaaat aaacaaatag gggttccgcg cacatttccc 8280

cgaaaagtgc cacctgacgt ctaagaaacc attattatca tgacattaac ctataaaaat 8340

aggcgtatca cgaggccctt tcgtctcgcg cgtttcggtg atgacggtga aaacctctga 8400

cacatgcagc tcccggagac ggtcacagct tgtctgtaag cggatgccgg gagcagacaa 8460

gcccgtcagg gcgcgtcagc gggtgttggc gggtgtcggg gctggcttaa ctatgcggca 8520

tcagagcaga ttgtactgag agtgcaccat atgcggtgtg aaataccgca cagatgcgta 8580

aggagaaaat accgcatcag gcgccattcg ccattcaggc tgcgcaactg ttgggaaggg 8640

cgatcggtgc gggcctcttc gctattacgc cagctggcga aagggggatg tgctgcaagg 8700

cgattaagtt gggtaacgcc agggttttcc cagtcacgac gttgtaaaac gacggcgcaa 8760

ggaatggtgc atgcaaggag atggcgccca acagtccccc ggccacgggg cctgccacca 8820

tacccacgcc gaaacaagcg ctcatgagcc cgaagtggcg agcccgatct tccccatcgg 8880

tgatgtcggc gatataggcg ccagcaaccg cacctgtggc gccggtgatg ccggccacga 8940

tgcgtccggc gtagaggcga ttagtccaat ttgttaaaga caggatatca gtggtccagg 9000

ctctagtttt gactcaacaa tatcaccagc tgaagcctat agagtacgag ccatagataa 9060

aataaaagat tttatttagt ctccagaaaa aggggggaa 9099

<210> 3

<211> 720

<212> DNA

<213> Artificial sequence

<220>

<223>

<400> 3

atggtgagca agggcgagga gctgttcacc ggggtggtgc ccatcctggt cgagctggac 60

ggcgacgtaa acggccacaa gttcagcgtg tccggcgagg gcgagggcga tgccacctac 120

ggcaagctga ccctgaagtt catctgcacc accggcaagc tgcccgtgcc ctggcccacc 180

ctcgtgacca ccttcggcta cggcctgcaa tgcttcgccc gctaccccga ccacatgaag 240

ctgcacgact tcttcaagtc cgccatgccc gaaggctacg tccaggagcg caccatcttc 300

ttcaaggacg acggcaacta caagacccgc gccgaggtga agttcgaggg cgacaccctg 360

gtgaaccgca tcgagctgaa gggcatcgac ttcaaggagg acggcaacat cctggggcac 420

aagctggagt acaactacaa cagccacaac gtctatatca tggccgacaa gcagaagaac 480

ggcatcaagg tgaacttcaa gatccgccac aacatcgagg acggcagcgt gcagctcgcc 540

gaccactacc agcagaacac ccccatcggc gacggccccg tgctgctgcc cgacaaccac 600

tacctgagct accagtccgc cctgagcaaa gaccccaacg agaagcgcga tcacatggtc 660

ctgctggagt tcgtgaccgc cgccgggatc actctcggca tggacgagct gtacaagtaa 720
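The listing above gives each sequence in lines of 10-base groups followed by a running base count. As a minimal sketch, assuming that layout holds for every line, the running counts can be checked against the bases actually present. The function names `parse_listing_line` and `check_listing` are hypothetical and are not part of the patent. The example input is the first two lines of SEQ ID NO: 3.

```python
import re

def parse_listing_line(line):
    """Split one listing line into (bases, stated cumulative count).

    Assumes lowercase a/c/g/t/n bases in space-separated groups,
    followed by the running base count, as in the listing above.
    """
    match = re.fullmatch(r"\s*([acgtn ]+?)\s+(\d+)\s*", line)
    if match is None:
        raise ValueError(f"not a sequence-listing line: {line!r}")
    bases = match.group(1).replace(" ", "")
    return bases, int(match.group(2))

def check_listing(lines):
    """Return the total base count, raising if any running count disagrees."""
    total = 0
    for line in lines:
        if not line.strip():
            continue  # the listing separates sequence lines with blank lines
        bases, stated = parse_listing_line(line)
        total += len(bases)
        if total != stated:
            raise ValueError(f"count mismatch: found {total}, listing says {stated}")
    return total

# First two lines of SEQ ID NO: 3, copied from the listing.
SEQ3_HEAD = [
    "atggtgagca agggcgagga gctgttcacc ggggtggtgc ccatcctggt cgagctggac 60",
    "",
    "ggcgacgtaa acggccacaa gttcagcgtg tccggcgagg gcgagggcga tgccacctac 120",
]

if __name__ == "__main__":
    print(check_listing(SEQ3_HEAD))
```

Running the same check over a complete sequence and comparing the result with its `<211>` length field would catch bases dropped or duplicated during extraction.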

Claims

1. A method for obtaining the target RNA of an RNA-binding protein by ultraviolet cross-linking and co-immunoprecipitation using a tag protein that can covalently bind to a substrate, with purification performed without gel electrophoresis and without membrane transfer, comprising the following steps:

(1) expressing a fusion protein in a recipient cell, wherein the fusion protein is formed by linking a Halo tag protein, which can covalently bind to a substrate, to the target protein through a connecting peptide;

The connecting peptide is the specific recognition sequence of TEV protease;

(2) irradiating the recipient cells expressing the fusion protein of step (1) with ultraviolet light, so that the RNA and the protein in each RNA-protein complex are cross-linked by covalent bonds, and collecting the cell sample;

(3) lysing the cell sample collected in step (2), and then performing immunoprecipitation with a medium to which a specific binder of the tag protein is attached, to obtain a "target RNA-fusion protein-medium" precipitated sample; the specific binder of the tag protein that can covalently bind to a substrate is a halogen-containing ligand;

(4) performing non-denaturing washing on the precipitated sample obtained in step (3), and then performing dephosphorylation treatment;

The non-denaturing washing is as follows: washing twice with PBS buffer containing 0.1% Triton X-100 and 500 mM NaCl, then washing three times with PBS buffer containing 0.1% Triton X-100, wherein % represents volume percentage;

(5) performing denaturing washing on the sample processed in step (4);

The denaturing washing consists of two denaturing washings;

The first of the two denaturing washings is: removing alkaline phosphatase with Trizol reagent; washing twice with 8 M guanidine hydrochloride, 5 minutes each time; and washing twice with 8 M urea, 5 minutes each time. The second of the two denaturing washings is: washing three times with SDS washing solution, shaking at 65°C for 5 minutes each time; and washing twice with 8 M urea at room temperature, 5 minutes each time;

The SDS washing solution is 10% SDS;

(6) digesting the sample treated in step (5) with a protease capable of specifically recognizing the connecting peptide, so that the RNA-protein complex formed by the target protein and the target RNA is separated from the medium; the "protease capable of specifically recognizing the connecting peptide" is TEV protease;

(7) removing the target protein from the RNA-protein complex formed by the target protein and the target RNA obtained in step (6), thereby obtaining the target RNA.

2. The method according to claim 1, wherein in step (3), the medium is magnetic beads.

3. Use of the method according to claim 1 or 2 in studying the interaction between protein and RNA.

4. A method for identifying the target RNA of an RNA-binding protein, comprising the following steps: obtaining the target RNA using the method according to claim 1 or 2, then constructing a cDNA library, then performing high-throughput sequencing on the cDNA library, and finally performing gene mapping according to the sequencing results.
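The washing schedule recited in steps (4) and (5) of claim 1 can be written out as plain data, which makes the order and number of washes easier to audit. The following is a minimal sketch, not part of the claimed method: the `Wash` class and helper functions are hypothetical names introduced for illustration, and only the solutions, repeat counts, durations and temperatures come from claim 1. The claim gives no duration for the non-denaturing washes, so none is assumed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Wash:
    stage: str
    solution: str
    repeats: int
    minutes_each: Optional[int] = None  # None: claim 1 states no duration
    condition: str = ""

# Order follows claim 1. Dephosphorylation takes place after the
# non-denaturing washes, and Trizol removal of alkaline phosphatase opens
# the first denaturing washing; neither is a counted wash, so both are
# omitted from this table.
WASHES = [
    Wash("non-denaturing", "PBS + 0.1% Triton X-100 + 500 mM NaCl", 2),
    Wash("non-denaturing", "PBS + 0.1% Triton X-100", 3),
    Wash("denaturing 1", "8 M guanidine hydrochloride", 2, 5),
    Wash("denaturing 1", "8 M urea", 2, 5),
    Wash("denaturing 2", "10% SDS", 3, 5, "shaking, 65 C"),
    Wash("denaturing 2", "8 M urea", 2, 5, "room temperature"),
]

def total_washes(washes):
    """Count every individual wash across all stages."""
    return sum(w.repeats for w in washes)

def timed_minutes(washes):
    """Sum incubation time over washes whose duration the claim specifies."""
    return sum(w.repeats * w.minutes_each
               for w in washes if w.minutes_each is not None)

if __name__ == "__main__":
    for w in WASHES:
        timing = f"{w.minutes_each} min each" if w.minutes_each else "duration not stated"
        print(f"{w.stage:15} {w.repeats} x {w.solution} ({timing}) {w.condition}")
    print("total washes:", total_washes(WASHES))
    print("timed minutes:", timed_minutes(WASHES))
```

Keeping the schedule as data rather than prose means a change to one wash, such as a different repeat count, is made in one place and the totals follow from it.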