Background
Molecular biology methods for studying RNA-binding proteins and the RNAs they bind generally work by purifying RNA-protein complexes. These complexes can be purified either by precipitating the RNA or by precipitating the RNA-binding protein. Because techniques for precipitating RNA are limited, researchers have developed more methods that obtain RNA-protein complexes by precipitating the RNA-binding protein. RNA immunoprecipitation (RIP) of RNA-binding proteins was the first of these to be widely used to study RNA-protein interactions.
In RIP, an antibody against the target protein precipitates the complexes formed by that protein and its RNAs from a tissue or cell lysate. The RNA is then purified and analyzed with downstream techniques. RIP follows the same general idea as ChIP but differs in the molecules studied, and many variants of its steps exist. In addition, RNA degrades easily in vitro, which places higher demands on the reagents and handling used in RNA experiments. Combining RIP with microarrays gives RIP-Chip, and combining it with high-throughput sequencing gives RIP-seq. Because RIP complexes can only be washed under mild conditions, the technique has limitations: RIP data contain many false positives, and they give no information on the specific sites at which the protein binds the RNA [see Jayaseelan, S., F. Doyle, and S.A. Tenenbaum, Profiling post-transcriptionally networked mRNA subsets using RIP-Chip and RIP-seq. Methods, 2014. 67(1): p. 13-9].
In 2003, the Darnell laboratory published the first study of RNA-protein interactions using UV cross-linking. The original CLIP protocol combined elements of the RIP and ChIP protocols: before the RNA-protein complexes were immunoprecipitated, they were UV cross-linked, a method previously used to study DNA-protein interactions. Because high-throughput sequencing was not available at the time, the amount of data obtained was limited. Even so, using CLIP they identified the RNA binding sites of the neuron-specific RNA-binding proteins and splicing factors NOVA1 and NOVA2 in mouse brain tissue [see Ule, J., et al., CLIP identifies Nova-regulated RNA networks in the brain. Science, 2003. 302(5648): p. 1212-5]. The conclusions drawn from CLIP were validated by knocking out the corresponding genes in mice. In 2005, they applied high-throughput sequencing to the CLIP library and used the results to draw a preliminary genome-wide map of Nova-RNA interactions. They called the combination of CLIP and high-throughput sequencing HITS-CLIP [see Ule, J., et al., Nova regulates brain-specific splicing to shape the synapse. Nat Genet, 2005. 37(8): p. 844-52]. Around the same time, another research group used CLIP combined with high-throughput sequencing to study the genome-wide RNA interaction network of the RbFox2 protein and named the method CLIP-seq [see Yeo, G.W., et al., An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol, 2009. 16(2): p. 130-7]. Since then, high-throughput sequencing has been increasingly applied to the study of genome-wide RNA-protein interaction networks.
In 2009, the Darnell laboratory used CLIP-seq to analyze the interactions between Ago2-bound miRNAs and mRNAs, obtaining genome-wide information on miRNA-mRNA interactions [see Chi, S.W., et al., Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature, 2009. 460(7254): p. 479-86]. In the same year, Xue, Y., et al. applied CLIP to the PTB protein to study how alternative splicing of pre-mRNAs is regulated in the nucleus [see Xue, Y., et al., Genome-wide analysis of PTB-RNA interactions reveals a strategy used by the general splicing repressor to modulate exon inclusion or skipping. Mol Cell, 2009. 36(6): p. 996-1006].
CLIP offers several improvements over RIP. First, cell or tissue samples are irradiated with UV light, which forms covalent bonds between RNA and protein. Breaking these bonds requires a great deal of energy, so the protein-RNA complexes can withstand stringent purification, which improves the signal-to-noise ratio of the sequencing data. Moreover, unlike the formaldehyde (formalin) cross-linking used in ChIP, UV cross-linking does not create chemical bridges between macromolecules that would degrade the signal-to-noise ratio. UV light cross-links only over distances of a few angstroms, that is, only between a protein and the nucleic acid it directly contacts. Second, CLIP introduced RNase digestion. Because RNA not protected by protein is digested more readily, a mild RNase treatment degrades the unbound portion of the RNA and leaves only the protein-bound portion. The shortened RNA is easier to reverse transcribe, sequence, and analyze for binding regions, and co-precipitation of unrelated RNA-protein complexes is avoided. For this reason, nuclease treatment was later also added to RIP protocols. The RNases reported in the literature include RNase A, RNase T1, RNase I and others, each with advantages and disadvantages. Although the RNA in the complex is trimmed, the results can still be affected, because no step removes the RNase during subsequent purification. Finally, CLIP samples can withstand stringent washing, such as multiple PBST washes, SDS-PAGE gel purification, and transfer to nitrocellulose membranes. These steps remove proteins bound non-specifically to the antibody and reduce unbound and non-cross-linked RNA, lowering the background of the data.
CLIP was developed from ChIP and RIP. The conventional CLIP procedure is as follows. Cultured cells are UV irradiated to cross-link proteins and RNA. After cell lysis, the lysate is treated with RNase to degrade the RNA not protected by protein and to trim the RNA to a uniform length. The protein is then immunoprecipitated with a specific antibody, which pulls down the RNA-protein complexes. After gentle washing, an RNA linker is ligated to the 3' end of the RNA, and the 5' end of the RNA is labeled with 32P. The RNA-protein complexes are then denatured, separated by SDS-PAGE, and transferred to a nitrocellulose membrane; because nitrocellulose binds proteins but not free RNA, only RNA cross-linked to protein is retained. After transfer, the membrane is exposed, the band containing the radiolabeled RNA-protein complexes is cut out, and the complexes are extracted. The protein is then digested with proteinase K to purify the RNA, and a linker is ligated to the 5' end of the RNA. The RNA is reverse transcribed into cDNA using the linker sequences, and the cDNA is amplified by PCR. The resulting cDNA library can then be sequenced by high-throughput sequencing and the data analyzed [see Ule, J., et al., CLIP identifies Nova-regulated RNA networks in the brain. Science, 2003. 302(5648): p. 1212-5].
After CLIP was combined with high-throughput sequencing, many new CLIP variants were developed. One widely used variant is PAR-CLIP (photoactivatable-ribonucleoside-enhanced CLIP) [see Hafner, M., et al., Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 2010. 141(1): p. 129-41]. PAR-CLIP modifies the cell culture step: a photoactivatable ribonucleoside, 4-thiouridine (4SU) or 6-thioguanosine (6SG), is added to the culture medium so that U or G in newly synthesized transcripts is labeled, and cross-linking is performed with long-wavelength UV light (typically 365 nm) instead of 254 nm. In principle this improves the efficiency of protein-nucleic acid cross-linking. Different nucleases are also used to trim the RNA. 4SU is the more commonly used reagent, both because it increases UV cross-linking efficiency and because uridines are very frequently bound by proteins in vivo. When RNA containing cross-linked 4SU is reverse transcribed, the 4SU position is read as C instead of U, so T-to-C conversions appear at these positions in the cDNA library. This conversion serves as a marker of the cross-linking site and is also used to subtract the non-cross-linked background during data analysis. However, 4SU in PAR-CLIP also has limitations. First, 4SU can only be used in cell culture; its use in tissues, animals, or clinical specimens has not been reported. Second, 4SU is toxic to cells: after it is added to the medium, the cells are cultured for 24 to 48 hours, during which 4SU may affect cellular metabolism and cause apoptosis. Finally, although the mRNA profile reportedly does not change significantly after 4SU addition, unknown effects cannot be ruled out, so 4SU should be used with caution.
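The T-to-C signal described above can be illustrated with a short Python sketch that tallies, per reference position, how many aligned reads show C where the reference has T. It assumes reads are already aligned to the transcript strand without insertions or deletions, and the reference and reads are made-up toy sequences; a real PAR-CLIP analysis must also handle indels, reverse-strand reads, and genomic SNPs that mimic conversions.

```python
from collections import Counter

# Toy sketch (assumption): reads are aligned ungapped to the transcript
# strand, and each alignment is given as (read_sequence, 0-based start).


def t_to_c_positions(reference: str, read: str, start: int) -> list[int]:
    """Return reference positions where the reference has T but the read has C."""
    if start < 0 or start + len(read) > len(reference):
        raise ValueError("read does not fit within the reference")
    positions = []
    for offset, base in enumerate(read):
        if reference[start + offset] == "T" and base == "C":
            positions.append(start + offset)
    return positions


def count_conversions(reference: str, alignments: list[tuple[str, int]]) -> Counter:
    """Tally T-to-C conversions per position across all aligned reads."""
    counts: Counter = Counter()
    for read, start in alignments:
        counts.update(t_to_c_positions(reference, read, start))
    return counts


ref = "ACGTTTAGCTTA"
reads = [("GTCTAG", 2), ("TCTAGC", 3), ("CGTTTA", 1)]
print(count_conversions(ref, reads))
```

Here two of the three toy reads carry a C at reference position 4, so that position is reported with a count of 2, which is the kind of recurrent conversion a PAR-CLIP analysis treats as a candidate cross-link site.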
After PAR-CLIP came iCLIP (individual-nucleotide resolution CLIP) [see Konig, J., et al., iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol, 2010. 17(7): p. 909-15], a CLIP technique with single-nucleotide resolution. In conventional CLIP, reverse transcriptase often fails to read through the cross-linking site when the RNA is reverse transcribed into cDNA. The covalent bond between the RNA and an amino acid residue of the RNA-binding protein is not broken when the protein is digested, so the residual peptide sterically hinders the reverse transcriptase, which may dissociate from the template; the site therefore acts as a barrier to reverse transcription. The enzyme sometimes reads through this barrier, which can introduce mutations or deletions into the cDNA at these sites, but more often reverse transcription terminates there. Although such cDNAs are incomplete, the deletions and termination points carry information about the cross-linking site. In conventional CLIP, however, PCR amplification requires linker sequences at both the 5' and 3' ends of the cDNA, and both are copied by reverse transcription from linkers ligated to the RNA template. A cDNA whose reverse transcription stops at the cross-link therefore lacks the sequence derived from the 5' RNA linker and is lost during library construction. To solve this, iCLIP builds the linker sequences directly into the reverse transcription primer. The resulting cDNA is circularized and then linearized by restriction digestion, which places linker sequences at both ends of the cDNA. This avoids losing the cDNAs that terminate prematurely and changes the way the library is built. Because most cDNAs stop at the cross-link position and are still captured by circularization, the 5' end of each sequenced insert after linearization marks the position where reverse transcription stopped, next to the cross-linking site. iCLIP data can therefore be analyzed at single-nucleotide resolution.
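The read-start logic above can be sketched in Python. This follows the common iCLIP convention that the cross-linked nucleotide is the one immediately upstream of the read's 5' end; the `Alignment` record, its 0-based half-open coordinates, and the assumption that barcodes are already trimmed are illustrative choices, not part of the original text.

```python
from collections import Counter
from typing import NamedTuple


class Alignment(NamedTuple):
    """A mapped read; 0-based, half-open coordinates (assumption)."""
    chrom: str
    start: int   # leftmost mapped base, inclusive
    end: int     # rightmost mapped base, exclusive
    strand: str  # "+" or "-"


def crosslink_site(aln: Alignment) -> tuple[str, int, str]:
    """Position one nucleotide upstream of the read's 5' end."""
    if aln.strand == "+":
        # 5' end is at `start`; upstream is the lower coordinate.
        pos = aln.start - 1
    elif aln.strand == "-":
        # 5' end is at `end - 1`; upstream on the minus strand is `end`.
        pos = aln.end
    else:
        raise ValueError(f"unknown strand: {aln.strand!r}")
    return (aln.chrom, pos, aln.strand)


def pileup(alignments: list[Alignment]) -> Counter:
    """Count how many reads support each putative cross-link site."""
    return Counter(crosslink_site(a) for a in alignments)


reads = [
    Alignment("chr1", 100, 130, "+"),
    Alignment("chr1", 100, 125, "+"),
    Alignment("chr1", 200, 230, "-"),
]
print(pileup(reads))
```

The two plus-strand reads differ in length but start at the same position, so both support the same site; truncated cDNAs of varying length that share a start are exactly what makes read starts informative in iCLIP.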
iCLIP improves the efficiency of cDNA library construction, shortens the CLIP workflow, and reduces the likelihood of failure. However, because the single-stranded cDNA must be circularized to build the library, the efficiency of the circularization enzyme in this step is critical: if it is low, the composition of the whole cDNA library is affected and sequencing may even fail. To address this, in 2016 Gene W. Yeo's laboratory optimized library construction and developed eCLIP (enhanced CLIP) [see Van Nostrand, E.L., et al., Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). 2016. 13(6): p. 508-14]. eCLIP keeps the overall CLIP procedure unchanged and modifies only library construction. Starting from the iCLIP library protocol, it omits circularization after reverse transcription and instead ligates a linker sequence directly to the 3' end of the cDNA, after which the library is amplified by PCR. The authors reported that the eCLIP library construction method increased the efficiency of cDNA library construction and reduced the probability of library failure.
In addition, irCLIP (infrared-CLIP) replaced the isotope labeling used in conventional CLIP. irCLIP uses a 3' RNA linker carrying an infrared dye and a biotin label, so the position of the RNA can be visualized on an infrared fluorescence scanner [see Zarnegar, B.J., et al., irCLIP platform for efficient characterization of protein-RNA interactions. Nat Methods, 2016. 13(6): p. 489-92].
Although CLIP has been improved many times, the procedure still requires SDS-PAGE and transfer to a nitrocellulose membrane. Both steps are time-consuming and technically demanding, and samples are easily lost during them. The main goal in improving CLIP is therefore to shorten the overall procedure and reduce the number of steps. If the overall efficiency improves, fewer starting cells are needed. Reducing the amount of cells or tissue required would let researchers perform CLIP on cells or tissues that are hard to obtain, making it possible to study target protein-RNA interactions in cells at early embryonic stages or at particular stages of cell differentiation.
Disclosure of Invention
The invention aims to provide a new use of a tag protein capable of covalently binding a substrate. Here, a tag protein capable of covalently binding a substrate means a tag protein that reacts with its substrate to form a covalent bond and does not dissociate after binding.
The invention provides a new use of a tag protein capable of covalently binding a substrate, specifically its use in ultraviolet cross-linking and immunoprecipitation (CLIP).
In this use, the tag protein capable of covalently binding a substrate serves as a fusion tag on the target protein in CLIP.
Further, in the present invention, the use is for studying the interaction between RNA and an RNA-binding protein (the target protein), and the UV cross-linking and immunoprecipitation purifies the RNA-protein complex without gel electrophoresis or membrane transfer. That is, unlike conventional CLIP, the RNA-protein complex of the present invention is purified without SDS-polyacrylamide gel electrophoresis, without transfer to a nitrocellulose membrane, and without isotope labeling.
Further, in the present invention, the fusion protein formed by the tag protein and the target protein (the RNA-binding protein) is covalently bound to a specific binding partner of the tag protein that is immobilized on a solid support (e.g., magnetic beads), and the RNA-protein complex is then purified by protein-denaturing washes.
Here, protein-denaturing washing means washing under physical conditions (e.g., heating or shaking) and/or with chemical agents (e.g., guanidine hydrochloride, urea, sodium dodecyl sulfate, or Trizol reagent) that disrupt the hydrogen bonds within protein molecules, so that the higher-order structure of the protein unfolds. The tag protein forms a covalent bond with its specific binding partner immobilized on the support (e.g., magnetic beads), and the cross-linked RNA and protein are also joined covalently. The covalently linked complexes therefore remain coupled to the support and keep their sequences intact during washing, while non-covalently bound material, such as impurities and non-specifically bound nucleic acids, is released from the complexes and washed away.
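The retention argument above can be summarized as a toy model: a molecule stays on the beads through a denaturing wash only if every link between it and the bead is covalent. This is an illustrative sketch, not part of the invention; the species listed and the classification of each link are simplifications chosen for this example, and the antibody case is included only to contrast with conventional capture, which relies on non-covalent antibody binding.

```python
from enum import Enum


class Link(Enum):
    COVALENT = "covalent"
    NONCOVALENT = "non-covalent"


def survives_denaturing_wash(chain: list[Link]) -> bool:
    """A molecule stays on the beads only if every link back to the bead is covalent."""
    return all(link is Link.COVALENT for link in chain)


# Each chain lists the links from the bead surface to the molecule of interest
# (simplified, illustrative classification).
SPECIES = {
    # bead ligand-Halo (covalent), Halo-linker-PTB (peptide bonds), PTB-RNA (UV cross-link)
    "cross-linked target RNA": [Link.COVALENT, Link.COVALENT, Link.COVALENT],
    # RNA associated with the target protein but not cross-linked
    "non-cross-linked RNA": [Link.COVALENT, Link.COVALENT, Link.NONCOVALENT],
    # protein adsorbed directly to the bead surface
    "non-specifically bound protein": [Link.NONCOVALENT],
    # the same cross-linked RNA captured through an antibody-antigen interaction
    "cross-linked RNA, antibody capture": [Link.NONCOVALENT, Link.COVALENT],
}

retained = [name for name, chain in SPECIES.items() if survives_denaturing_wash(chain)]
print(retained)
```

Only the fully covalent chain survives, which is why a stringent denaturing wash can be applied on the beads when the tag is covalently captured but would strip an antibody-bound complex.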
Accordingly, the invention provides a method for obtaining the target RNA of an RNA-binding protein by CLIP using a tag protein capable of covalently binding a substrate together with non-gel purification; specifically, the RNA-protein complex is purified without SDS-polyacrylamide gel electrophoresis, without transfer to a nitrocellulose membrane, and without isotope labeling.
The method specifically comprises the following steps:
(1) causing a recipient cell to express a fusion protein in which a tag protein capable of covalently binding a substrate is linked to a target protein via a linker peptide;
(2) UV irradiating the recipient cells expressing the fusion protein from step (1) to covalently cross-link RNA and protein within the RNA-protein complexes, and collecting the cell sample;
(3) lysing the cell sample collected in step (2) and performing immunoprecipitation with a support (e.g., magnetic beads) coupled to a specific binding partner of the tag protein, to obtain a precipitated sample containing "target RNA-fusion protein-support" complexes;
(4) subjecting the precipitated sample from step (3) to non-denaturing washes and then to dephosphorylation (e.g., treatment with alkaline phosphatase);
(5) subjecting the sample treated in step (4) to denaturing washes;
(6) digesting the sample treated in step (5) with a protease that specifically recognizes the linker peptide, thereby releasing the RNA-protein complex formed by the target protein and the target RNA from the support;
(7) removing the target protein (e.g., by protease digestion) from the RNA-protein complex obtained in step (6), thereby obtaining the target RNA.
Further, in the fusion protein of step (1), the tag protein capable of covalently binding a substrate may be located at the N-terminus or the C-terminus.
Further, step (1) is as follows: the gene encoding the tag protein is joined to the gene encoding the target protein (the RNA-binding protein) through a sequence encoding the linker peptide to obtain a fusion gene, which expresses the fusion protein in which the tag protein and the target protein are connected by the linker peptide; the fusion gene is introduced into a recipient cell to obtain a recombinant cell; and the recombinant cell is cultured to express the fusion protein from the introduced fusion gene.
In the method, step (3) further comprises treating the lysate with RNase (or another nuclease that hydrolyzes RNA) after lysing the cell sample collected in step (2).
In the present invention, the non-denaturing washes in step (4) are specifically: two washes with PBS containing 0.1% Triton X-100 and 500 mM NaCl, followed by three washes with PBS containing 0.1% Triton X-100, where % denotes volume percent.
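The wash buffers above can be prepared by simple C1V1 = C2V2 dilution. This Python sketch assumes stock solutions of 10% (v/v) Triton X-100 and 5 M NaCl with PBS making up the rest of the volume; these stock concentrations are not given in the text. PBS itself already contains NaCl, and the text does not say whether 500 mM is the total or the added concentration, so the sketch treats it as salt added on top of PBS.

```python
# Assumed stocks (not from the source): 10% (v/v) Triton X-100 and 5 M NaCl.
TRITON_STOCK_PERCENT = 10.0
NACL_STOCK_MM = 5000.0


def stock_volume(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock (mL) from C1 * V1 = C2 * V2, i.e. V1 = C2 * V2 / C1."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume_ml / stock_conc


def wash_buffer(final_volume_ml: float, added_nacl_mm: float = 0.0) -> dict[str, float]:
    """Recipe in mL for PBS + 0.1% (v/v) Triton X-100, optionally with extra NaCl."""
    triton = stock_volume(0.1, TRITON_STOCK_PERCENT, final_volume_ml)
    nacl = stock_volume(added_nacl_mm, NACL_STOCK_MM, final_volume_ml)
    return {
        "10% Triton X-100": triton,
        "5 M NaCl": nacl,
        "PBS": final_volume_ml - triton - nacl,
    }


print(wash_buffer(50, added_nacl_mm=500))  # high-salt wash, 50 mL
print(wash_buffer(50))                     # standard wash, 50 mL
```

For 50 mL of high-salt wash this gives 0.5 mL of 10% Triton X-100, 5 mL of 5 M NaCl, and 44.5 mL of PBS.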
In the present invention, step (5) may comprise one or more denaturing washes; specifically, two denaturing washes are used.
More specifically, the first denaturing wash consists of: removing the alkaline phosphatase with Trizol reagent (Invitrogen); two washes with 8 M guanidine hydrochloride (5 minutes each); and two washes with 8 M urea (5 minutes each). The second denaturing wash consists of: three washes with SDS wash solution (10% SDS, 50 mM Tris-HCl pH 7.0, 1 mM EDTA, 1 mM DTT, balance water, where % denotes mass per volume, i.e., 10% means 10 g/100 ml), each with shaking at 65 ℃ for 5 minutes; followed by two washes with 8 M urea (5 minutes each at room temperature).
Between the two denaturing washes, the method may further comprise ligating a linker to the 3' end of the RNA in the sample after the first denaturing wash. This ligation may, of course, instead be carried out after the target RNA has been released from the RNA-protein complex.
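The solids needed for the denaturing wash solutions above follow from mass = molarity × volume × molar mass, or from the w/v definition for SDS. This Python sketch uses the standard molar masses of urea (60.06 g/mol) and guanidine hydrochloride (95.53 g/mol); the 50 mL batch size is only an example. Because 8 M solutions contain a large volume of solute, the solid is normally dissolved and then brought up to the final volume rather than added to a fixed volume of water.

```python
# Standard molar masses (g/mol); the batch volume below is only an example.
MOLAR_MASS_G_PER_MOL = {
    "urea": 60.06,
    "guanidine hydrochloride": 95.53,
}


def grams_for_molar(reagent: str, molarity: float, volume_ml: float) -> float:
    """mass (g) = molarity (mol/L) * volume (L) * molar mass (g/mol)."""
    return molarity * (volume_ml / 1000.0) * MOLAR_MASS_G_PER_MOL[reagent]


def grams_for_w_v(percent: float, volume_ml: float) -> float:
    """Mass (g) for a % (w/v) solution, where 10% means 10 g per 100 mL."""
    return percent * volume_ml / 100.0


volume = 50.0
print(f"8 M urea: {grams_for_molar('urea', 8, volume):.2f} g")
print(f"8 M guanidine-HCl: {grams_for_molar('guanidine hydrochloride', 8, volume):.2f} g")
print(f"10% SDS: {grams_for_w_v(10, volume):.1f} g")
```

For 50 mL this comes to about 24.02 g of urea, 38.21 g of guanidine hydrochloride, and 5.0 g of SDS.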
In the method, the protease that specifically recognizes the linker peptide may be TEV protease, enterokinase, factor Xa, the 3C protease of human rhinovirus type 14 (HRV 3C protease or PreScission protease), thrombin, or the like, and the linker peptide may contain the specific recognition sequence of one or more (e.g., 2 to 3) of these enzymes. TEV protease recognizes Glu-Asn-Leu-Tyr-Phe-Gln-Gly/Ser ("/" means "or"). Enterokinase recognizes Asp-Asp-Asp-Asp-Lys. Factor Xa recognizes Ile-Glu/Asp-Gly-Arg. HRV 3C protease (PreScission protease) recognizes Leu-Glu-Val-Leu-Phe-Gln-Gly-Pro. Thrombin recognizes A-B-Pro-Arg-X-Y (where A and B are hydrophobic amino acids and X and Y are non-acidic amino acids); the commonly used recognition sequence is Leu-Val-Pro-Arg-Gly-Ser.
In one embodiment of the invention, the linker peptide consists of three tandem TEV protease recognition sequences (Glu-Asn-Leu-Tyr-Phe-Gln-Gly/Ser, where "/" means "or"). More specifically, the linker peptide is the amino acid sequence encoded by positions 2322-2396 of sequence 2 in the sequence listing. Accordingly, the protease that specifically recognizes the linker peptide is TEV protease. The inventors found that TEV protease tolerates 2 M urea.
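The recognition sequences listed above can be checked in a candidate linker with a small Python sketch using regular expressions in one-letter amino acid code. The "/" alternatives become character classes, thrombin is represented only by its common Leu-Val-Pro-Arg-Gly-Ser form, and the example linker (three TEV sites joined by hypothetical Ser-Gly spacers) is illustrative, not the actual linker of the embodiment.

```python
import re

# One-letter motifs for the recognition sequences listed in the text.
# Thrombin is represented only by the common LVPRGS form (simplification).
PROTEASE_SITES = {
    "TEV protease": "ENLYFQ[GS]",
    "enterokinase": "DDDDK",
    "factor Xa": "I[ED]GR",
    "HRV 3C protease": "LEVLFQGP",
    "thrombin": "LVPRGS",
}


def find_sites(protein: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition motif in a protein sequence."""
    hits = {}
    for name, motif in PROTEASE_SITES.items():
        # Zero-width lookahead so that overlapping matches are also reported.
        starts = [m.start() for m in re.finditer(f"(?={motif})", protein.upper())]
        if starts:
            hits[name] = starts
    return hits


# Hypothetical linker: three tandem TEV sites separated by SG spacers.
linker = "ENLYFQG" + "SG" + "ENLYFQG" + "SG" + "ENLYFQG"
print(find_sites(linker))
```

With this toy linker, only TEV sites are reported, at positions 0, 9 and 18, confirming that the linker would not be cut by the other proteases in the list.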
The use of the methods described above for studying protein-RNA interaction is also within the scope of the invention.
The invention further provides a method for identifying a target RNA of an RNA binding protein.
The method for identifying the target RNA of an RNA-binding protein, named GoldCLIP, specifically comprises: obtaining the target RNA by the method above, constructing a cDNA library, sequencing the library by high-throughput sequencing, and finally mapping the sequencing reads to the genome.
In the present invention, after the target RNA is obtained, the cDNA library is constructed by the method described in the iCLIP literature [see Konig, J., et al. (2011), iCLIP: transcriptome-wide mapping of protein-RNA interactions with individual nucleotide resolution]. Briefly, the RNA is reverse transcribed, adding a sample-specific tag sequence to each sample, and the cDNA is purified on a denaturing TBE-urea gel. The reverse transcription primer contains the sample-specific tag sequence, a 5' linker sequence, a 3' linker sequence, and a BamHI site between the linker sequences. The resulting single-stranded cDNA is circularized with an ssDNA circularization ligase, and an oligonucleotide complementary to the primer is then annealed to the primer region in a thermal cycler. The product is digested with BamHI, giving cDNA with a linker sequence at the 5' end followed by the tag sequence, and a linker sequence at the 3' end. The cDNA is then amplified by PCR and sequenced by high-throughput sequencing.
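Because each sample receives its own tag sequence during reverse transcription, the pooled reads must be split by tag after sequencing. This Python sketch assumes, hypothetically, that the tag occupies the first 4 bases of each read, that the tags and sample names are those in `SAMPLE_TAGS`, and that one mismatch is tolerated; the real tags are not given in the text, and any random nucleotides in the primer would need to be handled separately.

```python
from typing import Optional

# Hypothetical 4-nt sample tags and sample names, chosen for illustration.
SAMPLE_TAGS = {
    "ACGT": "Halo-PTB_UV254",
    "TGCA": "Halo-PTB_4SU_UV365",
    "GATC": "Halo-YFP_UV254",
}
TAG_LEN = 4


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign(read: str, max_mismatch: int = 1) -> tuple[Optional[str], str]:
    """Return (sample, insert); sample is None if no unique tag matches."""
    if len(read) < TAG_LEN:
        return (None, read)
    tag, insert = read[:TAG_LEN], read[TAG_LEN:]
    matches = [sample for t, sample in SAMPLE_TAGS.items()
               if hamming(tag, t) <= max_mismatch]
    return (matches[0] if len(matches) == 1 else None, insert)


def demultiplex(reads: list[str]) -> dict[str, list[str]]:
    """Group read inserts by sample; unmatched reads go to 'unassigned'."""
    out: dict[str, list[str]] = {}
    for read in reads:
        sample, insert = assign(read)
        out.setdefault(sample or "unassigned", []).append(insert)
    return out


print(demultiplex(["ACGTAAAA", "TGCTCCCC", "NNNNGGGG", "AC"]))
```

The tags here differ from one another at every position, so a single sequencing error still assigns a read unambiguously, while reads whose tag matches nothing, or that are too short, fall into the unassigned group.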
The tag protein capable of covalently binding a substrate may be any of the following: Halo tag protein, CLIP tag protein, SNAP tag protein, Spy tag protein, and the like. The Halo tag, from Promega, is a genetically engineered haloalkane dehalogenase from Rhodococcus rhodochrous; its substrate is a halogen-containing ligand. The SNAP tag is derived from human O6-alkylguanine-DNA alkyltransferase, a DNA repair protein, genetically engineered by NEB; its substrate is benzylguanine. The CLIP tag is derived from the SNAP tag; its substrate is a benzylcytosine derivative. The Spy tag is a short 13-amino-acid peptide derived from the CnaB2 domain of the fibronectin-binding protein FbaB of Streptococcus pyogenes; its binding partner is the SpyCatcher protein.
Correspondingly, the specific binding partner of the tag protein capable of covalently binding a substrate is any one of the following:
(A) when the tag protein is a Halo tag protein, the specific binding partner is a halogen-containing ligand;
(B) when the tag protein is a CLIP tag protein, the specific binding partner is a benzylcytosine derivative;
(C) when the tag protein is a SNAP tag protein, the specific binding partner is benzylguanine;
(D) when the tag protein is a Spy tag protein, the specific binding partner is the SpyCatcher protein.
In one embodiment of the invention, the tag protein capable of covalently binding a substrate is the Halo tag protein, whose amino acid sequence is shown as sequence 1 in the sequence listing.
In one embodiment of the invention, the target protein is the PTB protein, the Halo tag is located at the N-terminus of PTB, and the recipient cell is a HEK293 cell.
Under normal physiological conditions, the active site of the Halo tag binds halogen-containing ligands only through a covalent bond, and this bond cannot be broken. Therefore, a Halo-tagged target protein can be covalently labeled in vitro by adding a fluorescent Halo ligand, and if the Halo ligand is attached to magnetic beads, the Halo-tagged protein can be covalently coupled to the beads. The Halo tag has a molecular weight of 33 kDa and can be fused to either the N-terminus or the C-terminus of a target protein. The fusion protein can be expressed in prokaryotic or eukaryotic expression systems. The invention exploits the covalent binding of the Halo tag to its ligand to design a new CLIP scheme.
The invention relates to the use of Halo-tagged fusion proteins in CLIP (UV cross-linking and immunoprecipitation) with denaturant-based purification. Conventional CLIP purifies the protein-RNA complexes by SDS-PAGE, transfer to a nitrocellulose membrane, and isotope labeling, which takes a long time and easily loses sample. The invention uses Halo-tagged fusion proteins and washes with protein denaturants to obtain RNA libraries suitable for sequencing. The technical steps are as follows. Cell lines expressing a Halo-tagged fusion protein are UV cross-linked and collected. After cell lysis, the RNA is trimmed with nuclease. The complexes are then immunoprecipitated with Halo ligand-coupled magnetic beads. The immunoprecipitated RNA is dephosphorylated, and a linker sequence is ligated to its 3' end. Residual RNA contaminants are then removed by washing with strong protein denaturants. Finally, the protein is digested to obtain purified RNA, which is made into a cDNA library for high-throughput sequencing. The technique of the present invention is called GoldCLIP (Gel-Omitted and Ligation-Dependent Cross-Linking and Immunoprecipitation).
Using the PTB protein as the target protein, the inventors validated GoldCLIP and showed that it reproduces previously published experimental results well. Operationally, it omits the membrane transfer and gel excision steps used to recover fragments in conventional CLIP. Because the Halo tag forms a covalent bond with the ligand on the magnetic beads, the method adds stringent washes that could not previously be performed directly on the beads. The procedure is therefore simpler and faster, streamlines the CLIP workflow, and reduces sample loss during handling.
Detailed Description
The experimental procedures used in the following examples are all conventional procedures unless otherwise specified.
Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.
The operational steps of the GoldCLIP technique provided by the present invention are generally as follows: and transferring the target protein with the Halo label into a cell line by a virus infection method, and performing ultraviolet crosslinking to collect a sample. Cells were lysed and nuclease treated to trim protein-bound RNA. Immunoprecipitation was then performed by magnetic beads attached with Halo ligands. The beads were then washed gently to wash away non-specific bands, and then alkaline phosphatase was removed by Trizol reagent (first denaturing wash) after removing the phosphate group at the 3' end by alkaline phosphatase. Then, a linker is added to the 3' -end of the RNA, and the remaining RNA contaminants are removed by a vigorous method such as protein denaturation washing (second denaturation washing). The protein was then digested by Tev protease, and the resulting RNA was purified. The cDNA library was then constructed by the methods described in the iCLIP literature. RNA reverse transcription and the addition of each sample independent tag sequence, run TEB-Urea denatured gel purification. The primer sequence used for reverse transcription not only comprises an independent tag sequence, but also a linker sequence at the 5 'end, a linker sequence at the 3' end and a BamHI enzyme cutting site between the linker sequences. The resulting single stranded cDNA is circularized by ssDNA cyclase and then a sequence complementary to the primer is annealed to the primer sequence by a PCR instrument. Then, the product was digested with BamHI to obtain a product having a linker sequence at the 5 'end, a tag sequence and a linker sequence at the 3' end. And then carrying out high-throughput sequencing on the obtained cDNA after PCR amplification. The specific technical flow chart is shown in figure 1.
Example 1 CLIP experiments with Halo-tagged PTB protein in the HEK293 cell line
In this example, the application of the GoldCLIP technology provided by the present invention to RNA binding proteins was further investigated by performing CLIP experiments on Halo-tagged PTB proteins in HEK293 cell lines.
(1) Expression of Halo-PTB fusion proteins in HEK293 cell lines
Plasmid MSCV-NHalo-3xTev-PTB-T2-puro, which carries the coding gene for the Halo-PTB fusion protein, was introduced into the HEK293 cell line, and a stable cell line was established by puromycin selection. In the fusion protein, the Halo tag and the PTB protein are joined by three tandem TEV protease recognition sequences (Glu-Asn-Leu-Tyr-Phe-Gln-Gly/Ser). The plasmid map is shown in figure 2, and the complete plasmid sequence is given as sequence 2 in the sequence listing, in which positions 1419-2309 encode the Halo tag protein shown as sequence 1, positions 2322-2342, 2349-2369 and 2376-2396 encode the three tandem TEV recognition sequences, and positions 2598-4193 encode human PTB protein. Expression of Halo-PTB in the cell line was examined with Alexa Fluor 660 ligand (Promega, G8471).
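The coordinates above can be cross-checked against the sequence listing by translating short segments of sequence 2. The Python sketch below uses the standard genetic code; the helper names `translate` and `codon_count` are hypothetical, and the two 21-nt segments are copied from sequence 2 (positions 1419-1439 and 2349-2369).

```python
# Standard genetic code: codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA segment into one-letter amino-acid codes ('*' = stop)."""
    seq = dna.upper()
    if len(seq) % 3:
        raise ValueError("segment length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[p:p + 3]] for p in range(0, len(seq), 3))


def codon_count(start: int, end: int) -> int:
    """Codons spanned by 1-based, inclusive positions start..end of the listing."""
    length = end - start + 1
    if length % 3:
        raise ValueError("region is not a whole number of codons")
    return length // 3


# Segments copied from sequence 2 (positions 1419-1439 and 2349-2369).
print(translate("atggcagaaatcggtactggc"))  # MAEIGTG, the first 7 residues of sequence 1
print(translate("gaaaatctgtacttccagggg"))  # ENLYFQG, a TEV recognition site
print(codon_count(1419, 2309))             # 297, the length of the Halo tag in sequence 1
print(codon_count(2598, 4193))             # 532 codons; the region ends in a TAG stop codon
```

The same two helpers can be applied to any other annotated region of the listing, such as the remaining TEV sites.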
As a control, a Halo-YFP fusion protein was expressed in the HEK293 cell line. The plasmid introduced into HEK293 cells was obtained by replacing positions 2598-4193 of sequence 2 in the sequence listing with the YFP coding gene (sequence 3); all other operations were as described above.
(2) Harvesting UV-crosslinked cells
Cells expressing Halo-PTB were grown on two 100 mm cell culture dishes. When one dish reached confluence, the cells were UV-crosslinked at 254 nm. When the other dish reached 80% confluence, 4SU (4-thiouridine) was added to a final concentration of 200 μM and incubation was continued for 24 hours, followed by UV crosslinking at 365 nm. Cells expressing Halo-YFP were grown on one 100 mm dish and, on reaching confluence, were UV-crosslinked at 254 nm. After UV crosslinking, the cells were scraped off with a cell scraper, washed twice with pre-chilled PBS and collected in a 15 ml centrifuge tube. The collected cells were either kept on ice for immediate use in the following steps or stored at -80 ℃.
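The 4SU addition is a simple C1V1 = C2V2 dilution. The sketch below assumes 10 ml of medium per 100 mm dish and a 100 mM 4SU stock; neither value is given in the text, and `stock_volume_ul` is a hypothetical helper.

```python
def stock_volume_ul(final_uM: float, medium_ml: float, stock_mM: float) -> float:
    """Volume of stock (ul) giving final_uM in medium_ml of medium (C1*V1 = C2*V2).

    uM * ml / mM = 1e-3 ml = 1 ul, so no further unit conversion is needed.
    Ignores the small volume contributed by the stock itself.
    """
    return final_uM * medium_ml / stock_mM


# Assumed: 10 ml medium per 100 mm dish and a 100 mM 4SU stock (not given in the text).
print(stock_volume_ul(final_uM=200, medium_ml=10, stock_mM=100))  # 20.0
```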
(3) Co-immunoprecipitation of Halo-PTB with Halo-ligand magnetic beads
Cells were resuspended in lysis buffer (formulation: 50 mM Tris-HCl pH 7.5; 100 mM NaCl; 1 mM DTT; 2 mM CaCl2; 10% glycerol; 1× protease inhibitor [Promega]; 0.2% Triton X-100; 0.5 U/μl micrococcal nuclease [NEB]; balance water; % indicates volume percentage), homogenized on ice with a Dounce homogenizer, and left on ice for 10 minutes. The lysate was then incubated at 37 ℃ for 3 minutes and immediately returned to ice. EDTA and EGTA were added to final concentrations of 2 mM each, and the mixture was centrifuged at high speed for 10 minutes. The supernatant was transferred to a fresh centrifuge tube, aliquots were set aside as the RNA input and the western blot input, and 300 μl of beads was added for immunoprecipitation overnight at 4 ℃. The next day, immunoprecipitation efficiency was examined with Alexa Fluor 660 ligand (Promega, G8471). Because the Halo tag forms a covalent bond with the magnetic beads, the efficiency of the immunoprecipitation can be estimated by comparing the amount of Halo-tagged protein remaining in the supernatant after precipitation (unbound) with the amount in the sample before precipitation (input): the greater the reduction, the more efficient the capture. The results are shown in FIG. 3.
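The input-versus-unbound comparison reduces to a depletion ratio. This minimal sketch assumes that both signals are measured on equivalent fractions of the sample and lie in the linear range of detection; the intensities are hypothetical.

```python
def depletion_efficiency(input_signal: float, unbound_signal: float) -> float:
    """Fraction of Halo-tagged protein captured, judged by its loss from the lysate.

    input_signal:   Halo signal in the lysate before bead capture (input).
    unbound_signal: Halo signal left in the supernatant after capture (unbound),
                    measured on an equivalent fraction of the sample.
    """
    if input_signal <= 0:
        raise ValueError("input signal must be positive")
    return 1.0 - unbound_signal / input_signal


# Hypothetical band intensities, for illustration only.
print(f"{depletion_efficiency(input_signal=1000, unbound_signal=150):.0%}")  # 85%
```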
(4) On-bead dephosphorylation
The beads bound to Halo-PTB or Halo-YFP protein were washed with PBST (non-denaturing wash: twice with PBS buffer containing 0.1% Triton X-100 and 500 mM NaCl, then 3 times with PBS buffer containing 0.1% Triton X-100; % indicates volume percentage) and then 3 times with 1× NEB CutSmart buffer. Next, 80 μl of alkaline phosphatase (CIP) and DNase I reaction mix (8 μl 10× NEB CutSmart; 8 μl 10 mM CaCl2; 5 μl CIP [NEB, M0290S]; 5 μl DNase I [Promega]; 2 μl RNase inhibitor [Promega]; 52 μl H2O) was added, and the reaction was incubated in a Thermomixer at 37 ℃ for 30 minutes (mixing at 1000 rpm for 15 seconds every 3 minutes).
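Because the dephosphorylation mix is assembled per sample, a small helper can confirm that the components add up to 80 μl and scale them into a master mix. The helper name `check_and_scale` and the 10% pipetting overage are assumptions for illustration; the three samples correspond to the three dishes in step (2).

```python
def check_and_scale(components: dict[str, float], expected_total_ul: float,
                    n_samples: int, overage: float = 0.1) -> dict[str, float]:
    """Verify per-sample volumes sum to the stated total, then scale for n_samples."""
    total = sum(components.values())
    if abs(total - expected_total_ul) > 1e-9:
        raise ValueError(f"components sum to {total} ul, expected {expected_total_ul} ul")
    factor = n_samples * (1 + overage)
    return {name: round(vol * factor, 1) for name, vol in components.items()}


# Per-sample volumes (ul) from step (4).
DEPHOS_MIX = {
    "10x NEB CutSmart": 8.0,
    "10 mM CaCl2": 8.0,
    "CIP (NEB M0290S)": 5.0,
    "DNase I (Promega)": 5.0,
    "RNase inhibitor (Promega)": 2.0,
    "H2O": 52.0,
}

# Halo-PTB UV254, Halo-PTB UV365 and Halo-YFP beads, with an assumed 10% overage.
for name, vol in check_and_scale(DEPHOS_MIX, 80.0, n_samples=3).items():
    print(f"{name}: {vol} ul")
```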
(5) First protein denaturing wash
After the dephosphorylation reaction was complete, 750 μl Trizol LS (Invitrogen) and 200 μl CHCl3 were added, followed by vortexing for 15 seconds. The beads were then washed twice with 8 M guanidine hydrochloride (5 minutes each), twice with 8 M urea (5 minutes each), and finally 3 times with PNK buffer (formulation: 20 mM Tris-HCl pH 7.0; 10 mM MgCl2; 0.1% Triton X-100; balance water; % indicates volume percentage).
(6) On-bead 3' RNA linker ligation
40 μl of RNA linker premix (formulation: 2 μl RNase inhibitor; 2 μl 3' RNA linker [100 pmol/μl]; 36 μl H2O; the 3' RNA linker sequence is 5'P/AGGTCGGAAGAGCGGTTCAG/3'ddC/, synthesized by IDT) was added to the beads, followed by 55 μl of RNA ligation premix (formulation: 10 μl 10× T4 RNA ligase buffer [NEB]; 10 μl BSA; 10 μl 10 mM ATP; 10 μl DMSO; 10 μl T4 RNA ligase [Ambion, AM2141]; 5 μl 0.1 M DTT). The reaction was carried out overnight at 16 ℃ in a Thermomixer (mixing at 1000 rpm for 15 seconds every 3 minutes). The next day, 12 μl of ligation top-up premix (formulation: 1 μl 3' RNA linker [100 pmol/μl]; 5 μl T4 RNA ligase [Ambion, AM2141]; 5 μl 10 mM ATP; 1 μl 0.1 M DTT) was added, and the reaction was continued at 25 ℃ for 3 hours (mixing at 1000 rpm for 15 seconds every 3 minutes).
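The component volumes in step (6) add up to 40 μl, 55 μl and 12 μl, and they also fix the linker concentration in the ligation. The sketch below checks the sums and estimates the total linker concentration, treating pmol/μl as μM and ignoring linker consumed by ligation (so the values are upper bounds); variable names are hypothetical.

```python
# Component volumes (ul) and totals for the three premixes in step (6).
PREMIXES = {
    "RNA linker premix": ({"RNase inhibitor": 2, "3' RNA linker": 2, "H2O": 36}, 40),
    "RNA ligation premix": ({"10x T4 RNA ligase buffer": 10, "BSA": 10, "10 mM ATP": 10,
                             "DMSO": 10, "T4 RNA ligase": 10, "0.1 M DTT": 5}, 55),
    "day-2 top-up premix": ({"3' RNA linker": 1, "T4 RNA ligase": 5, "10 mM ATP": 5,
                             "0.1 M DTT": 1}, 12),
}
LINKER_STOCK_PMOL_PER_UL = 100  # 3' RNA linker stock, 100 pmol/ul

for name, (parts, total) in PREMIXES.items():
    if sum(parts.values()) != total:
        raise ValueError(f"{name}: components do not sum to {total} ul")

# pmol/ul equals uM; linker consumed by ligation is ignored (upper bound).
day1_volume = 40 + 55
day1_linker_pmol = 2 * LINKER_STOCK_PMOL_PER_UL
day2_volume = day1_volume + 12
day2_linker_pmol = day1_linker_pmol + 1 * LINKER_STOCK_PMOL_PER_UL

print(f"overnight: {day1_linker_pmol / day1_volume:.2f} uM linker in {day1_volume} ul")
print(f"next day:  {day2_linker_pmol / day2_volume:.2f} uM linker in {day2_volume} ul")
```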
(7) Second protein denaturation wash
The ligation reaction mix was removed from the magnetic beads, which were then washed once with PBST and 3 times with SDS wash buffer (formulation: 10% SDS; 50 mM Tris-HCl pH 7.0; 1 mM EDTA; 1 mM DTT; balance water; % indicates mass per volume, i.e., 10% means 10 g/100 ml), each time shaking at 1000 rpm in a Thermomixer at 65 ℃ for 5 minutes. The beads were then washed twice with 8 M urea, shaking for 5 minutes at room temperature each time.
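The denaturing washes in steps (5) and (7) use concentrated solutions whose preparation is simple arithmetic: molarity × molar mass × volume for urea and guanidine hydrochloride, and grams per 100 ml for % (w/v) SDS as defined above. The molar masses are standard values (urea 60.06 g/mol, guanidine hydrochloride 95.53 g/mol); the 50 ml volume and the helper names are arbitrary choices for illustration.

```python
MOLAR_MASS_G_PER_MOL = {
    "urea": 60.06,
    "guanidine hydrochloride": 95.53,
}


def grams_for_molar(reagent: str, molar: float, volume_ml: float) -> float:
    """Grams of solid to dissolve and bring to volume_ml for a `molar` M solution."""
    return molar * MOLAR_MASS_G_PER_MOL[reagent] * volume_ml / 1000.0


def grams_for_w_v(percent_w_v: float, volume_ml: float) -> float:
    """% (w/v) means grams per 100 ml, e.g. 10% SDS = 10 g/100 ml."""
    return percent_w_v * volume_ml / 100.0


volume = 50.0  # ml, an arbitrary example volume
print(f"8 M urea:  {grams_for_molar('urea', 8, volume):.2f} g")                     # 24.02 g
print(f"8 M GuHCl: {grams_for_molar('guanidine hydrochloride', 8, volume):.2f} g")  # 38.21 g
print(f"10% SDS:   {grams_for_w_v(10, volume):.1f} g")                              # 5.0 g
```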
(8) Release of PTB protein from the magnetic beads by TEV protease digestion
The beads were washed 3 times with 1× TEV digestion buffer (formulation: 50 mM Tris-HCl pH 8.0; 1 mM EDTA; 1 mM DTT; 1% Triton X-100; 2 M urea; balance water; % indicates volume percentage). The beads were resuspended in 200 μl of 1× TEV digestion buffer, 5 μl of RNase inhibitor and 5 μl of TEV protease (5 μg/μl) were added, and the reaction was carried out at 25 ℃ for 2 hours in a Thermomixer (mixing at 1000 rpm for 15 seconds every 3 minutes). The reaction solution was transferred to a new 1.5 ml centrifuge tube. The beads were then resuspended in 100 μl of 1× TEV digestion buffer, 2 μl of RNase inhibitor and 2 μl of TEV protease (5 μg/μl) were added, and the reaction was carried out at 30 ℃ for 2 hours in a Thermomixer (mixing at 1000 rpm for 15 seconds every 3 minutes). This reaction solution was combined with the first.
(9) Extraction of RNA
100 μl of proteinase K (PK) reaction mix (formulation: 4 μl 5 M NaCl; 20 μl 1 M Tris-HCl pH 7.0; 8 μl 10% SDS; 8 μl H2O; 60 μl proteinase K [NEB, P8102S]) was added to the combined TEV reaction solution and incubated in a 37 ℃ water bath for 30 minutes. RNA was extracted with 400 μl phenol-chloroform solution (Sigma); after vortexing and centrifugation, the upper aqueous phase was transferred to a new 1.5 ml centrifuge tube, and 50 μl 3 M NaAc pH 5.5, 1 μl GlycoBlue and 1 ml of 1:1 (vol/vol) ethanol:isopropanol were added, followed by standing overnight at -20 ℃. The next day, after centrifugation, the pellet was washed twice with 75% ethanol and resuspended in 16 μl H2O to obtain the RNA library (i.e., the collection of target RNAs interacting with PTB protein).
(10) Construction of the cDNA library
The cDNA library was constructed following the iCLIP method (see König, J., et al. (2011) "iCLIP - transcriptome-wide mapping of protein-RNA interactions with individual nucleotide resolution." J Vis Exp (50)). Briefly, RNA was first reverse transcribed into cDNA using SuperScript III (Invitrogen). The reverse transcription primer was /5Phos/NN[Index]NNNAGATCGGAAGAGCGTGgatCCTGAACCGC, where Index is one of AACC, ACAA, ATTG, AGGT, CGCC, CCGG, CTAA, CATT, GCCA, GACC, GGTT, GTGG, TCCG, TGCC, TATT or TTAA. The cDNA was then run on a 6% TBE-urea gel (Invitrogen), and the 85-200 nt fragments of interest were excised and recovered by overnight ethanol precipitation. The recovered cDNA was circularized with CircLigase II (Epicentre). Then 30 μl of oligo annealing mix was added, and the oligo was annealed to the target sequence in a PCR thermocycler; the oligo sequence is 5'-GTTCAGGATCCACGACGCTCTTCAAAA-3'. The reaction was then digested with BamHI-HF (NEB) for 1 hour. Finally, the cDNA library was amplified with 2× Phusion Mix. The forward amplification primer was 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' and the reverse primer was 5'-CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT-3'.
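The primer design determines where the sample tag and random nucleotides appear in each read. The sketch below assumes, as in iCLIP, that read 1 begins with three random nucleotides, then the reverse complement of the 4-nt index, then two random nucleotides, followed by the cDNA insert. This layout is our reading of the primer /5Phos/NN[Index]NNN... and should be confirmed against a few raw reads; `split_read` and the other names are hypothetical.

```python
INDEXES = ["AACC", "ACAA", "ATTG", "AGGT", "CGCC", "CCGG", "CTAA", "CATT",
           "GCCA", "GACC", "GGTT", "GTGG", "TCCG", "TGCC", "TATT", "TTAA"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Assumed read layout: NNN + revcomp(index) + NN + insert.
BARCODE_TO_INDEX = {revcomp(idx): idx for idx in INDEXES}


def split_read(read: str):
    """Return (primer index, 5-nt random UMI, insert), or None if the barcode is unknown."""
    if len(read) < 9:
        return None
    umi_5p, barcode, umi_3p, insert = read[:3], read[3:7], read[7:9], read[9:]
    index = BARCODE_TO_INDEX.get(barcode)
    if index is None:
        return None
    return index, umi_5p + umi_3p, insert


print(split_read("ACG" + "GGTT" + "TT" + "ACGTACGT"))  # ('AACC', 'ACGTT', 'ACGTACGT')
```

Keeping the two random blocks together as a UMI allows PCR duplicates to be told apart from independent cDNA molecules that happen to share an insert.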
(11) High throughput sequencing
The resulting cDNA library was sequenced on an Illumina platform, mainly the HiSeq 2500 or HiSeq X Ten, using SE50, PE250 or a similar read configuration.
(12) Data processing
Adapter sequences were removed with Cutadapt (v1.10, http://journal.embnet.org/index.php/embnetjournal/article/view/200; cutadapt -a AGATCGGAAGAGCGGTTCAG -e 0.2 -q 20 -m 24), which also trimmed low-quality bases from the 3' end and discarded sequences shorter than 24 bp. Identical sequences were then collapsed, and the barcode and random sequences at the 5' end were removed. Peaks were then called on the processed data with Pyicoclip (Pyicoclip: -P-value 0.001, with the region supplied as a .bed file). The Pyicoclip results are shown in FIG. 4. With the method of the invention, two biological replicates of UV365-irradiated cells expressing Halo-PTB (experimental group) yielded approximately 10 million peaks, and two biological replicates of UV254-irradiated cells expressing Halo-PTB (experimental group) yielded approximately 20 million peaks. By contrast, two biological replicates of UV-irradiated cells expressing Halo-YFP (control group) yielded almost no peaks.
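The read preprocessing in this step can be illustrated with a simplified sketch: trim the 3' adapter, drop reads shorter than 24 nt, collapse identical reads (same random sequence plus insert, i.e., likely PCR duplicates), and then remove the 5' random-plus-barcode prefix. Unlike Cutadapt with `-e 0.2 -q 20`, it only matches the adapter exactly and does no quality trimming, and the 9-nt prefix length assumes an NNN + 4-nt barcode + NN read layout.

```python
from collections import Counter

ADAPTER = "AGATCGGAAGAGCGGTTCAG"  # 3' adapter passed to Cutadapt (-a) in step (12)
MIN_LENGTH = 24                  # mirrors Cutadapt -m 24
PREFIX_LENGTH = 9                # assumed NNN + 4-nt barcode + NN at the 5' end


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove the adapter (exact match) or a 3'-terminal adapter prefix >= min_overlap nt."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def collapse(reads) -> Counter:
    """Trim, length-filter and count identical reads (same random sequence + insert)."""
    trimmed = (trim_adapter(r) for r in reads)
    return Counter(r for r in trimmed if len(r) >= MIN_LENGTH)


def unique_inserts(reads) -> list:
    """One insert per collapsed read, with the 5' random/barcode prefix removed."""
    return [r[PREFIX_LENGTH:] for r in collapse(reads)]
```

Two reads with the same insert but different random sequences are kept as separate cDNAs, while exact duplicates are counted once.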
Each sequence was then aligned to the human genome (hg19) using Bowtie (v1.1.2 (REF); bowtie -f -p 8 -v 2 -k 1 --best --sam --un). The genome-wide distribution of the PTB peaks is shown in figure 5. For Halo-PTB cells irradiated at either UV365 or UV254, peaks in introns accounted for more than 60% of all peaks, followed by intergenic regions at about 10%.
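Once each peak has been assigned to one genomic category by overlap with an hg19 gene annotation (a step not shown here), the distribution in figure 5 is a simple tally. The sketch below uses hypothetical annotations for ten peaks.

```python
from collections import Counter


def region_fractions(peak_regions):
    """Fraction of peaks per annotation category, most frequent first."""
    counts = Counter(peak_regions)
    total = sum(counts.values())
    return {region: n / total for region, n in counts.most_common()}


# Hypothetical annotations for ten peaks, for illustration only.
toy = ["intron"] * 6 + ["intergenic"] + ["3'UTR"] * 2 + ["CDS"]
for region, frac in region_fractions(toy).items():
    print(f"{region}: {frac:.0%}")
```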
(13) Data correlation analysis
The reproducibility of the two replicate experiments for the Halo-PTB UV254, UV365 and novu data was assessed with the Pearson correlation coefficient. As shown in FIG. 6, the two PTB UV254 data sets have a correlation of R² = 0.97, and the two PTB UV365 data sets have a correlation of R² = 0.99. These results indicate that the experiment is reproducible and the data are reliable.
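The correlation in FIG. 6 is reported as R², the square of Pearson's r. A minimal sketch follows; the per-peak counts and the log2(count + 1) transform are hypothetical choices, since the text does not state which quantities were correlated.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-peak read counts from two replicates, log-transformed.
rep1 = [120, 45, 300, 8, 60]
rep2 = [110, 50, 280, 10, 66]
r = pearson_r([math.log2(c + 1) for c in rep1], [math.log2(c + 1) for c in rep2])
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```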
(14) Motif analysis
Motifs in the peak sequences were identified with HOMER (-p 8 -rna -S 10 -len 5,6,7), which compares the peak sequences with a background. The background was generated from three sets of random sequences from the same gene regions. FIG. 7 lists the top 5 motifs found. With this method, both the PTB UV254 and PTB UV365 experiments recover UC-rich motifs, consistent with published results [see Xue, Y., et al., Genome-wide analysis of PTB-RNA interactions reveals a strategy used by the general splicing repressor to modulate exon inclusion or skipping. Mol Cell, 2009. 36(6): p. 996-1006].
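The idea of comparing peak sequences with a background can be illustrated by a simple k-mer enrichment ratio. This is not HOMER's algorithm, which uses its own statistical scoring and motif optimization; the function names, pseudocount and toy sequences below are assumptions for illustration.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count overlapping k-mers in RNA sequences (T converted to U)."""
    counts = Counter()
    for s in seqs:
        s = s.upper().replace("T", "U")
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enriched_kmers(peaks, background, k=5, pseudocount=1.0, top=5):
    """Rank peak k-mers by their frequency ratio relative to the background."""
    fg, bg = kmer_counts(peaks, k), kmer_counts(background, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    if fg_total == 0 or bg_total == 0:
        return []

    def ratio(kmer):
        return ((fg[kmer] + pseudocount) / fg_total) / ((bg[kmer] + pseudocount) / bg_total)

    return sorted(fg, key=ratio, reverse=True)[:top]


# Toy sequences: pyrimidine-rich "peaks" against a purine-rich background.
peaks = ["UCUUCUUCUU", "AUCUUCUA"]
background = ["ACGAGCAGAG", "GAGACGCA"]
print(enriched_kmers(peaks, background, k=5, top=3))
```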
The results of this example show that the CLIP method described here, which uses UV crosslinking and co-immunoprecipitation of a Halo-tagged fusion protein without gel purification, reproduces well the experimental results published in the earlier literature. Operationally, it omits the membrane transfer and gel excision steps used to recover fragments in conventional CLIP. Because the Halo tag forms a covalent bond with the ligand on the magnetic beads, the method can include stringent washes performed directly on the beads, which were not possible in the prior art. The procedure is therefore simpler and faster, streamlines the CLIP workflow, and reduces sample loss during handling.
<110> Institute of Biophysics, Chinese Academy of Sciences
<120> Use of a tag protein capable of covalently binding a substrate in CLIP
<130> GNCLN171827
<160> 3
<170> PatentIn version 3.5
<210> 1
<211> 297
<212> PRT
<213> Artificial sequence
<220>
<223>
<400> 1
Met Ala Glu Ile Gly Thr Gly Phe Pro Phe Asp Pro His Tyr Val Glu
1 5 10 15
Val Leu Gly Glu Arg Met His Tyr Val Asp Val Gly Pro Arg Asp Gly
20 25 30
Thr Pro Val Leu Phe Leu His Gly Asn Pro Thr Ser Ser Tyr Val Trp
35 40 45
Arg Asn Ile Ile Pro His Val Ala Pro Thr His Arg Cys Ile Ala Pro
50 55 60
Asp Leu Ile Gly Met Gly Lys Ser Asp Lys Pro Asp Leu Gly Tyr Phe
65 70 75 80
Phe Asp Asp His Val Arg Phe Met Asp Ala Phe Ile Glu Ala Leu Gly
85 90 95
Leu Glu Glu Val Val Leu Val Ile His Asp Trp Gly Ser Ala Leu Gly
100 105 110
Phe His Trp Ala Lys Arg Asn Pro Glu Arg Val Lys Gly Ile Ala Phe
115 120 125
Met Glu Phe Ile Arg Pro Ile Pro Thr Trp Asp Glu Trp Pro Glu Phe
130 135 140
Ala Arg Glu Thr Phe Gln Ala Phe Arg Thr Thr Asp Val Gly Arg Lys
145 150 155 160
Leu Ile Ile Asp Gln Asn Val Phe Ile Glu Gly Thr Leu Pro Met Gly
165 170 175
Val Val Arg Pro Leu Thr Glu Val Glu Met Asp His Tyr Arg Glu Pro
180 185 190
Phe Leu Asn Pro Val Asp Arg Glu Pro Leu Trp Arg Phe Pro Asn Glu
195 200 205
Leu Pro Ile Ala Gly Glu Pro Ala Asn Ile Val Ala Leu Val Glu Glu
210 215 220
Tyr Met Asp Trp Leu His Gln Ser Pro Val Pro Lys Leu Leu Phe Trp
225 230 235 240
Gly Thr Pro Gly Val Leu Ile Pro Pro Ala Glu Ala Ala Arg Leu Ala
245 250 255
Lys Ser Leu Pro Asn Cys Lys Ala Val Asp Ile Gly Pro Gly Leu Asn
260 265 270
Leu Leu Gln Glu Asp Asn Pro Asp Leu Ile Gly Ser Glu Ile Ala Arg
275 280 285
Trp Leu Ser Thr Leu Glu Ile Ser Gly
290 295
<210> 2
<211> 9099
<212> DNA
<213> Artificial sequence
<220>
<223>
<400> 2
tgaaagaccc cacctgtagg tttggcaagc tagcttaagt aacgccattt tgcaaggcat 60
ggaaaataca taactgagaa tagagaagtt cagatcaagg ttaggaacag agagacagca 120
gaatatgggc caaacaggat atctgtggta agcagttcct gccccggctc agggccaaga 180
acagatggtc cccagatgcg gtcccgccct cagcagtttc tagagaacca tcagatgttt 240
ccagggtgcc ccaaggacct gaaatgaccc tgtgccttat ttgaactaac caatcagttc 300
gcttctcgct tctgttcgcg cgcttctgct ccccgagctc aataaaagag cccacaaccc 360
ctcactcggc gcgccagtcc tccgatagac tgcgtcgccc gggtacccgt attcccaata 420
aagcctcttg ctgtttgcat ccgaatcgtg gactcgctga tccttgggag ggtctcctca 480
gattgattga ctgcccacct cgggggtctt tcatttggag gttccaccga gatttggaga 540
cccctgccca gggaccaccg acccccccgc cgggaggtaa gctggccagc ggtcgtttcg 600
tgtctgtctc tgtctttgtg cgtgtttgtg ccggcatcta atgtttgcgc ctgcgtctgt 660
actagttagc taactagctc tgtatctggc ggacccgtgg tggaactgac gagttctgaa 720
cacccggccg caaccctggg agacgtccca gggactttgg gggccgtttt tgtggcccga 780
cctgaggaag ggagtcgatg tggaatccga ccccgtcagg atatgtggtt ctggtaggag 840
acgagaacct aaaacagttc ccgcctccgt ctgaattttt gctttcggtt tggaaccgaa 900
gccgcgcgtc ttgtctgctg cagcgctgca gcatcgttct gtgttgtctc tgtctgactg 960
tgtttctgta tttgtctgaa aattagggcc agactgttac cactccctta agtttgacct 1020
taggtcactg gaaagatgtc gagcggatcg ctcacaacca gtcggtagat gtcaagaaga 1080
gacgttgggt taccttctgc tctgcagaat ggccaacctt taacgtcgga tggccgcgag 1140
acggcacctt taaccgagac ctcatcaccc aggttaagat caaggtcttt tcacctggcc 1200
cgcatggaca cccagaccag gtcccctaca tcgtgacctg ggaagccttg gcttttgacc 1260
cccctccctg ggtcaagccc tttgtacacc ctaagcctcc gcctcctctt cctccatccg 1320
ccccgtctct cccccttgaa cctcctcgtt cgaccccgcc tcgatcctcc ctttatccag 1380
ccctcactcc ttctctaggc gccggaatta gatccaccat ggcagaaatc ggtactggct 1440
ttccattcga cccccattat gtggaagtcc tgggcgagcg catgcactac gtcgatgttg 1500
gtccgcgcga tggcacccct gtgctgttcc tgcacggtaa cccgacctcc tcctacgtgt 1560
ggcgcaacat catcccgcat gttgcaccga cccatcgctg cattgctcca gacctgatcg 1620
gtatgggcaa atccgacaaa ccagacctgg gttatttctt cgacgaccac gtccgcttca 1680
tggatgcctt catcgaagcc ctgggtctgg aagaggtcgt cctggtcatt cacgactggg 1740
gctccgctct gggtttccac tgggccaagc gcaatccaga gcgcgtcaaa ggtattgcat 1800
ttatggagtt catccgccct atcccgacct gggacgaatg gccagaattt gcccgcgaga 1860
ccttccaggc cttccgcacc accgacgtcg gccgcaagct gatcatcgat cagaacgttt 1920
ttatcgaggg tacgctgccg atgggtgtcg tccgcccgct gactgaagtc gagatggacc 1980
attaccgcga gccgttcctg aatcctgttg accgcgagcc actgtggcgc ttcccaaacg 2040
agctgccaat cgccggtgag ccagcgaaca tcgtcgcgct ggtcgaagaa tacatggact 2100
ggctgcacca gtcccctgtc ccgaagctgc tgttctgggg caccccaggc gttctgatcc 2160
caccggccga agccgctcgc ctggccaaaa gcctgcctaa ctgcaaggct gtggacatcg 2220
gcccgggtct gaatctgctg caagaagaca acccggacct gatcggcagc gagatcgcgc 2280
gctggctgtc gacgctcgag atttccggcg agccaaccac tgaggatctg tactttcaga 2340
gcatcgatga aaatctgtac ttccagggga tcgatgagaa cctgtacttt caggggagcg 2400
cctggtccca cccccagttc gaaaagggcg gcggaagcgg aggaggctcc ggaggttccg 2460
cttggtccca cccgcagttc gagaagaccg gcgccattta tcaaacaagt ttgtacaaaa 2520
aagcaggctc caccatggga accaattcag tcgactggat cctcatggct agcatgactg 2580
gtggacagca aatgggtatg gacggcattg tcccagatat agccgttggt acaaagcggg 2640
gatctgacga gcttttctct acttgtgtca ctaacggacc gtttatcatg agcagcaact 2700
cggcttctgc agcaaacgga aatgacagca agaagttcaa aggtgacagc cgaagtgcag 2760
gcgtcccctc tagagtgatc cacatccgga agctccccat cgacgtcacg gagggggaag 2820
tcatctccct ggggctgccc tttgggaagg tcaccaacct cctgatgctg aaggggaaaa 2880
accaggcctt catcgagatg aacacggagg aggctgccaa caccatggtg aactactaca 2940
cctcggtgac ccctgtgctg cgcggccagc ccatctacat ccagttctct aaccacaagg 3000
agctgaagac cgacagctct cccaaccagg cgcgggccca ggcggccctg caggcggtga 3060
actcggtcca gtcggggaac ctggccttgg ctgcctcggc ggcggccgtg gacgcaggga 3120
tggcgatggc cgggcagagc cctgtgctca ggatcatcgt ggagaacctc ttctaccctg 3180
tgaccctgga tgtgctgcac cagattttct ccaagttcgg cacagtgttg aagatcatca 3240
ccttcaccaa gaacaaccag ttccaggccc tgctgcagta tgcggacccc gtgagcgccc 3300
agcacgccaa gctgtcgctg gacgggcaga acatctacaa cgcctgctgc acgctgcgca 3360
tcgacttttc caagctcacc agcctcaacg tcaagtacaa caatgacaag agccgtgact 3420
acacacgccc agacctgcct tccggggaca gccagccctc gctggaccag accatggccg 3480
cggccttcgg cctttccgtt ccgaacgtcc acggcgccct ggcccccctg gccatcccct 3540
cggcggcggc ggcagctgcg gcggcaggtc ggatcgccat cccgggcctg gcgggggcag 3600
gaaattctgt attgctggtc agcaacctca acccagagag agtcacaccc caaagcctct 3660
ttattctttt cggcgtctac ggtgacgtgc agcgcgtgaa gatcctgttc aataagaagg 3720
agaacgccct agtgcagatg gcggacggca accaggccca gctggccatg agccacctga 3780
acgggcacaa gctgcacggg aagcccatcc gcatcacgct ctcgaagcac cagaacgtgc 3840
agctgccccg cgagggccag gaggaccagg gcctgaccaa ggactacggc aactcacccc 3900
tgcaccgctt caagaagccg ggctccaaga acttccagaa catattcccg ccctcggcca 3960
ctctgcacct ctccaacatc ccgccctcag tctccgagga ggatctcaag gtcctgtttt 4020
ccagcaatgg gggcgtcgtc aaaggattca agttcttcca gaaggaccgc aagatggcac 4080
tgatccagat gggctccgtg gaggaggcgg tccaggccct cattgacctg cacaaccacg 4140
acctcgggga gaaccaccac ctgcgggtct ccttctccaa gtccaccatc tagctcgaga 4200
tatctagacc cagctttctt gtacaaagtg gttcgataaa ttgacgtaag ctagtctaga 4260
cggaattcta ccgggtaggg gaggcgcttt tcccaaggca gtctggagca tgcgctttag 4320
cagccccgct gggcacttgg cgctacacaa gtggcctctg gcctcgcaca cattccacat 4380
ccaccggtag gcgccaaccg gctccgttct ttggtggccc cttcgcgcca ccttctactc 4440
ctcccctagt caggaagttc ccccccgccc cgcagctcgc gtcgtgcagg acgtgacaaa 4500
tggaagtagc acgtctcact agtctcgtgc agatggacag caccgctgag caatggaagc 4560
gggtaggcct ttggggcagc ggccaatagc agctttgctc cttcgctttc tgggctcaga 4620
ggctgggaag gggtgggtcc gggggcgggc tcaggggcgg gctcaggggc ggggcgggcg 4680
cccgaaggtc ctccggaggc ccggcattct gcacgcttca aaagcgcacg tctgccgcgc 4740
tgttctcctc ttcctcatct ccgggccttt cgacctgcag cccaagctta ccatgaccga 4800
gtacaagccc acggtgcgcc tcgccacccg cgacgacgtc cccagggccg tacgcaccct 4860
cgccgccgcg ttcgccgact accccgccac gcgccacacc gtcgatccgg accgccacat 4920
cgagcgggtc accgagctgc aagaactctt cctcacgcgc gtcgggctcg acatcggcaa 4980
ggtgtgggtc gcggacgacg gcgccgcggt ggcggtctgg accacgccgg agagcgtcga 5040
agcgggggcg gtgttcgccg agatcggccc gcgcatggcc gagttgagcg gttcccggct 5100
ggccgcgcag caacagatgg aaggcctcct ggcgccgcac cggcccaagg agcccgcgtg 5160
gttcctggcc accgtcggcg tctcgcccga ccaccagggc aagggtctgg gcagcgccgt 5220
cgtgctcccc ggagtggagg cggccgagcg cgccggggtg cccgccttcc tggagacctc 5280
cgcgccccgc aacctcccct tctacgagcg gctcggcttc accgtcaccg ccgacgtcga 5340
ggtgcccgaa ggaccgcgca cctggtgcat gacccgcaag cccggtgcct gacgcccgcc 5400
ccacgacccg cagcgcccga ccgaaaggag cgcacgaccc catgcatcga taaaataaaa 5460
gattttattt agtctccaga aaaagggggg aatgaaagac cccacctgta ggtttggcaa 5520
gctagcttaa gtaacgccat tttgcaaggc atggaaaata cataactgag aatagagaag 5580
ttcagatcaa ggttaggaac agagagacag cagaatatgg gccaaacagg atatctgtgg 5640
taagcagttc ctgccccggc tcagggccaa gaacagatgg tccccagatg cggtcccgcc 5700
ctcagcagtt tctagagaac catcagatgt ttccagggtg ccccaaggac ctgaaatgac 5760
cctgtgcctt atttgaacta accaatcagt tcgcttctcg cttctgttcg cgcgcttctg 5820
ctccccgagc tcaataaaag agcccacaac ccctcactcg gcgcgccagt cctccgatag 5880
actgcgtcgc ccgggtaccc gtgtatccaa taaaccctct tgcagttgca tccgacttgt 5940
ggtctcgctg ttccttggga gggtctcctc tgagtgattg actacccgtc agcgggggtc 6000
tttcatgggt aacagtttct tgaagttgga gaacaacatt ctgagggtag gagtcgaata 6060
ttaagtaatc ctgactcaat tagccactgt tttgaatcca catactccaa tactcctgaa 6120
atagttcatt atggacagcg cagaaagagc tggggagaat tgtgaaattg ttatccgctc 6180
acaattccac acaacatacg agccggaagc ataaagtgta aagcctgggg tgcctaatga 6240
gtgagctaac tcacattaat tgcgttgcgc tcactgcccg ctttccagtc gggaaacctg 6300
tcgtgccagc tgcattaatg aatcggccaa cgcgcgggga gaggcggttt gcgtattggg 6360
cgctcttccg cttcctcgct cactgactcg ctgcgctcgg tcgttcggct gcggcgagcg 6420
gtatcagctc actcaaaggc ggtaatacgg ttatccacag aatcagggga taacgcagga 6480
aagaacatgt gagcaaaagg ccagcaaaag gccaggaacc gtaaaaaggc cgcgttgctg 6540
gcgtttttcc ataggctccg cccccctgac gagcatcaca aaaatcgacg ctcaagtcag 6600
aggtggcgaa acccgacagg actataaaga taccaggcgt ttccccctgg aagctccctc 6660
gtgcgctctc ctgttccgac cctgccgctt accggatacc tgtccgcctt tctcccttcg 6720
ggaagcgtgg cgctttctca tagctcacgc tgtaggtatc tcagttcggt gtaggtcgtt 6780
cgctccaagc tgggctgtgt gcacgaaccc cccgttcagc ccgaccgctg cgccttatcc 6840
ggtaactatc gtcttgagtc caacccggta agacacgact tatcgccact ggcagcagcc 6900
actggtaaca ggattagcag agcgaggtat gtaggcggtg ctacagagtt cttgaagtgg 6960
tggcctaact acggctacac tagaagaaca gtatttggta tctgcgctct gctgaagcca 7020
gttaccttcg gaaaaagagt tggtagctct tgatccggca aacaaaccac cgctggtagc 7080
ggtggttttt ttgtttgcaa gcagcagatt acgcgcagaa aaaaaggatc tcaagaagat 7140
cctttgatct tttctacggg gtctgacgct cagtggaacg aaaactcacg ttaagggatt 7200
ttggtcatga gattatcaaa aaggatcttc acctagatcc ttttaaatta aaaatgaagt 7260
tttaaatcaa tctaaagtat atatgagtaa acttggtctg acagttacca atgcttaatc 7320
agtgaggcac ctatctcagc gatctgtcta tttcgttcat ccatagttgc ctgactcccc 7380
gtcgtgtaga taactacgat acgggagggc ttaccatctg gccccagtgc tgcaatgata 7440
ccgcgagacc cacgctcacc ggctccagat ttatcagcaa taaaccagcc agccggaagg 7500
gccgagcgca gaagtggtcc tgcaacttta tccgcctcca tccagtctat taattgttgc 7560
cgggaagcta gagtaagtag ttcgccagtt aatagtttgc gcaacgttgt tgccattgct 7620
acaggcatcg tggtgtcacg ctcgtcgttt ggtatggctt cattcagctc cggttcccaa 7680
cgatcaaggc gagttacatg atcccccatg ttgtgcaaaa aagcggttag ctccttcggt 7740
cctccgatcg ttgtcagaag taagttggcc gcagtgttat cactcatggt tatggcagca 7800
ctgcataatt ctcttactgt catgccatcc gtaagatgct tttctgtgac tggtgagtac 7860
tcaaccaagt cattctgaga atagtgtatg cggcgaccga gttgctcttg cccggcgtca 7920
atacgggata ataccgcgcc acatagcaga actttaaaag tgctcatcat tggaaaacgt 7980
tcttcggggc gaaaactctc aaggatctta ccgctgttga gatccagttc gatgtaaccc 8040
actcgtgcac ccaactgatc ttcagcatct tttactttca ccagcgtttc tgggtgagca 8100
aaaacaggaa ggcaaaatgc cgcaaaaaag ggaataaggg cgacacggaa atgttgaata 8160
ctcatactct tcctttttca atattattga agcatttatc agggttattg tctcatgagc 8220
ggatacatat ttgaatgtat ttagaaaaat aaacaaatag gggttccgcg cacatttccc 8280
cgaaaagtgc cacctgacgt ctaagaaacc attattatca tgacattaac ctataaaaat 8340
aggcgtatca cgaggccctt tcgtctcgcg cgtttcggtg atgacggtga aaacctctga 8400
cacatgcagc tcccggagac ggtcacagct tgtctgtaag cggatgccgg gagcagacaa 8460
gcccgtcagg gcgcgtcagc gggtgttggc gggtgtcggg gctggcttaa ctatgcggca 8520
tcagagcaga ttgtactgag agtgcaccat atgcggtgtg aaataccgca cagatgcgta 8580
aggagaaaat accgcatcag gcgccattcg ccattcaggc tgcgcaactg ttgggaaggg 8640
cgatcggtgc gggcctcttc gctattacgc cagctggcga aagggggatg tgctgcaagg 8700
cgattaagtt gggtaacgcc agggttttcc cagtcacgac gttgtaaaac gacggcgcaa 8760
ggaatggtgc atgcaaggag atggcgccca acagtccccc ggccacgggg cctgccacca 8820
tacccacgcc gaaacaagcg ctcatgagcc cgaagtggcg agcccgatct tccccatcgg 8880
tgatgtcggc gatataggcg ccagcaaccg cacctgtggc gccggtgatg ccggccacga 8940
tgcgtccggc gtagaggcga ttagtccaat ttgttaaaga caggatatca gtggtccagg 9000
ctctagtttt gactcaacaa tatcaccagc tgaagcctat agagtacgag ccatagataa 9060
aataaaagat tttatttagt ctccagaaaa aggggggaa 9099
<210> 3
<211> 720
<212> DNA
<213> Artificial sequence
<220>
<223>
<400> 3
atggtgagca agggcgagga gctgttcacc ggggtggtgc ccatcctggt cgagctggac 60
ggcgacgtaa acggccacaa gttcagcgtg tccggcgagg gcgagggcga tgccacctac 120
ggcaagctga ccctgaagtt catctgcacc accggcaagc tgcccgtgcc ctggcccacc 180
ctcgtgacca ccttcggcta cggcctgcaa tgcttcgccc gctaccccga ccacatgaag 240
ctgcacgact tcttcaagtc cgccatgccc gaaggctacg tccaggagcg caccatcttc 300
ttcaaggacg acggcaacta caagacccgc gccgaggtga agttcgaggg cgacaccctg 360
gtgaaccgca tcgagctgaa gggcatcgac ttcaaggagg acggcaacat cctggggcac 420
aagctggagt acaactacaa cagccacaac gtctatatca tggccgacaa gcagaagaac 480
ggcatcaagg tgaacttcaa gatccgccac aacatcgagg acggcagcgt gcagctcgcc 540
gaccactacc agcagaacac ccccatcggc gacggccccg tgctgctgcc cgacaaccac 600
tacctgagct accagtccgc cctgagcaaa gaccccaacg agaagcgcga tcacatggtc 660
ctgctggagt tcgtgaccgc cgccgggatc actctcggca tggacgagct gtacaagtaa 720