CN111471665B - DNA cyclization molecule and application thereof

Info

Publication number: CN111471665B
Application number: CN201910063623.6A
Authority: CN
Inventors: 冯松杰; 江雯; 黄行许
Original assignee: ShanghaiTech University
Current assignee: ShanghaiTech University
Priority date: 2019-01-23
Filing date: 2019-01-23
Publication date: 2023-07-04
Anticipated expiration: 2039-01-23
Also published as: CN111471665A

Abstract

The invention relates to the field of biotechnology, and in particular to a fusion protein and its use. The invention provides a fusion protein comprising a homodimer fragment and a dCas9 protein fragment, wherein the homodimer fragment comprises an LDB1 protein fragment or a dimer domain fragment of the LDB1 protein. The invention fuses LDB1 with spdCas9 to form a novel fusion protein, LDB1-dCas9, which regulates target gene expression by changing the three-dimensional structure of chromatin, without altering genomic sequence information or epigenetic modifications. The approach is simple to operate, has a short experimental preparation period, and greatly reduces time and labor costs. No small molecule needs to be added to induce dimer formation, and because of the homodimer monomers a single CRISPR-Cas9 is sufficient to target two sites simultaneously and form a loop. The efficiency of DNA looping can be further improved by adding more gene target sites.

Description

A DNA cyclization molecule and its use

Technical Field

The present invention relates to the field of biotechnology, and in particular to a fusion protein and its use.

Background Art

The folding of chromatin in the cell nucleus and the interactions between its different regions give rise to the three-dimensional structure of chromatin, which in turn plays a key role in gene expression. Promoters and enhancers are cis-acting elements that regulate gene expression. A promoter is a DNA region immediately upstream of a gene that recruits transcription factors to initiate gene expression. An enhancer is a sequence located some distance upstream or downstream of its target gene; it activates or increases expression of the target gene by interacting with the target gene's promoter through DNA looping. Studies of different stages of embryonic development have described in greater detail how dynamic changes in enhancers and enhancer looping during development regulate the spatiotemporal expression of genes. The most intensively studied example is the human β-globin gene cluster, whose long-range enhancer is called the locus control region (LCR). During development, the LCR forms loops with the promoters of different genes and thereby sequentially regulates expression of embryonic ε-globin (HBE), fetal γ-globin (HBG), and adult δ-globin (HBD) and β-globin (HBB). Artificial looping is therefore a feasible strategy for studying how enhancers regulate endogenous gene expression, and it may even have potential for use in disease treatment.

Summary of the Invention

In view of the above shortcomings of the prior art, the object of the present invention is to provide a fusion protein and its use, in order to solve the problems in the prior art.

To achieve the above object and other related objects, one aspect of the present invention provides a fusion protein comprising a homodimer fragment and a dCas9 protein fragment, wherein the homodimer fragment comprises an LDB1 protein fragment or a dimer domain fragment of the LDB1 protein.

In some embodiments of the present invention, the amino acid sequence of the LDB1 protein fragment comprises:

a) the amino acid sequence shown in SEQ ID NO.1; or

b) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.1 and has the function of the amino acid sequence defined in a), preferably being able to form a homodimer.

In some embodiments of the present invention, the amino acid sequence of the dimer domain fragment of the LDB1 protein comprises:

c) the amino acid sequence shown in SEQ ID NO.2; or

d) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.2 and has the function of the amino acid sequence defined in c), preferably being able to form a homodimer.

In some embodiments of the present invention, the amino acid sequence of the dCas9 protein fragment comprises:

e) the amino acid sequence shown in SEQ ID NO.3; or

f) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.3 and has the function of the amino acid sequence defined in e), preferably being able to cooperate with an sgRNA that specifically targets a site.
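The 80% threshold in items b), d) and f) can be illustrated with a minimal sketch. The text does not state how sequence similarity is to be computed, so the example below assumes the simplest possible measure: percent identity over two sequences that have already been aligned to the same length, with `-` marking gaps. The function names `percent_identity` and `meets_threshold` are hypothetical and chosen only for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two pre-aligned, equal-length sequences.

    Assumption: '-' marks a gap; columns where both sequences have a gap
    are ignored, and a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `meets_threshold("MLDRDVGPTP", "MLDRDVGPTA")` compares the first ten residues of SEQ ID NO.1 with a single-substitution variant. A similarity measure that scores conservative substitutions (for instance with a substitution matrix) would give different values, and the functional condition (for example, homodimer formation) is not something a sequence comparison can establish.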

In some embodiments of the present invention, the fusion protein comprises, in order from the 5' end to the 3' end, the homodimer fragment and the dCas9 protein fragment.

In some embodiments of the present invention, the fusion protein comprises, in order from the 5' end to the 3' end, the dCas9 protein fragment and the homodimer fragment.

In some embodiments of the present invention, the fusion protein further comprises a flexible linker peptide located between the homodimer fragment and the dCas9 protein fragment. Preferably, the amino acid sequence of the flexible linker peptide is as shown in SEQ ID NO.41 to 42.

In some embodiments of the present invention, the amino acid sequence of the fusion protein is as shown in one of SEQ ID No.37 to 40.

Another aspect of the present invention provides an isolated polynucleotide encoding the fusion protein.

Another aspect of the present invention provides a DNA looping system comprising the fusion protein, a promoter-targeting sgRNA, and an enhancer-targeting sgRNA.

In some embodiments of the present invention, the promoter-targeting sgRNA targets the region from -100 to -200 bp upstream of the TSS of the gene.

In some embodiments of the present invention, the promoter-targeting sgRNA has the feature gnnnnnnnnnnnnnnnnnnnNGG (SEQ ID NO.43).

In some embodiments of the present invention, the GC content of the targeted promoter is between 40% and 60%.
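The three promoter-targeting criteria above (a window 100 to 200 bp upstream of the TSS, the gN19-NGG feature, and a GC content of 40% to 60%) can be combined into a simple candidate filter. The sketch below is illustrative only and rests on several assumptions that the text does not state: the motif is read as a 20-nt protospacer beginning with G followed by an NGG PAM; the GC criterion is applied to the 20-nt protospacer; only the forward strand is scanned; and the gene lies on the forward strand with the TSS given as a 0-based index. All function names are hypothetical.

```python
import re

# Assumed reading of "gnnnnnnnnnnnnnnnnnnnNGG": G + 19 bases, then an NGG PAM.
# The lookahead allows overlapping candidates to be reported.
PROTOSPACER_WITH_PAM = re.compile(r"(?=(G[ACGT]{19})[ACGT]GG)")


def upstream_window(sequence: str, tss_index: int, near: int = 100, far: int = 200) -> str:
    """Return the bases from -far to -near relative to the TSS (forward strand)."""
    if tss_index < far:
        raise ValueError("sequence does not extend far enough upstream of the TSS")
    return sequence[tss_index - far:tss_index - near]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in an upper-case DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_protospacers(window: str, gc_min: float = 0.40, gc_max: float = 0.60):
    """List (offset, protospacer) pairs in the window that pass the GC filter."""
    window = window.upper()
    hits = []
    for match in PROTOSPACER_WITH_PAM.finditer(window):
        spacer = match.group(1)
        if gc_min <= gc_fraction(spacer) <= gc_max:
            hits.append((match.start(), spacer))
    return hits
```

A typical call would be `candidate_protospacers(upstream_window(genomic_sequence, tss_index))`, where `genomic_sequence` and `tss_index` are supplied by the user. A practical design would also scan the reverse strand and screen for off-target matches, neither of which is covered here.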

In some embodiments of the present invention, the promoter-targeting sgRNA targets the promoter region of the HBB gene. Preferably, the sequence of the promoter-targeting sgRNA is as shown in SEQ ID NO.4 to 6.

In some embodiments of the present invention, the enhancer-targeting sgRNA targets the DHS region of the enhancer.

In some embodiments of the present invention, the enhancer-targeting sgRNA targets the vicinity of DHS2 in the LCR region of β-globin.

In some embodiments of the present invention, the sequence of the enhancer-targeting sgRNA is as shown in SEQ ID NO.7 to 9.

Another aspect of the present invention provides an expression system comprising a host cell capable of expressing the fusion protein, the promoter-targeting sgRNA, and the enhancer-targeting sgRNA.

In some embodiments of the present invention, the expression system comprises a host cell containing an expression vector that carries a polynucleotide encoding the fusion protein, or a host cell in which a polynucleotide encoding the fusion protein is integrated into the chromosome.

In some embodiments of the present invention, the expression system comprises a host cell containing an expression vector that carries a polynucleotide encoding the promoter-targeting sgRNA, or a host cell in which a polynucleotide encoding the promoter-targeting sgRNA is integrated into the chromosome.

In some embodiments of the present invention, the expression system comprises a host cell containing an expression vector that carries a polynucleotide encoding the enhancer-targeting sgRNA, or a host cell in which a polynucleotide encoding the enhancer-targeting sgRNA is integrated into the chromosome.

In some embodiments of the present invention, the expression system further comprises a host cell capable of expressing the target gene.

In some embodiments of the present invention, the host cell is selected from eukaryotic cells.

In some embodiments of the present invention, the host cell is selected from primary cells or immortalized cell lines of metazoan origin.

In some embodiments of the present invention, the host cell is selected from blood-lineage cell lines.

In some embodiments of the present invention, the host cell is selected from human K562 cells.

Another aspect of the present invention provides use of the DNA cyclization molecule, the polynucleotide, the looping system, or the expression system in gene expression.

In some embodiments of the present invention, the use in gene expression is use in gene expression in eukaryotes.

In some embodiments of the present invention, the eukaryote is selected from metazoans.

In some embodiments of the present invention, the eukaryote is selected from one of, or a combination of more than one of, humans, mice, nematodes, and fruit flies.

Another aspect of the present invention provides a gene expression method, comprising: bringing the target sites closer together in three-dimensional space by means of the fusion protein or the looping system, and thereby carrying out gene expression.

In some embodiments of the present invention, the gene expression method comprises: culturing, under appropriate conditions and in the presence of the looping system, a host cell capable of expressing the target gene.

In some embodiments of the present invention, the gene expression method is an in vitro gene expression method.

In some embodiments of the present invention, the gene expression method comprises: culturing the expression system under appropriate conditions.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the reprogramming of the spatial position of a target gene by LDB1-dCas9-mediated DNA looping according to the present invention.

FIG. 2 shows the changes in expression of each gene in the β-globin gene cluster after DNA looping mediated by LDB1-dCas9 and dCas9-LDB1 of the present invention.

FIG. 3 is a schematic diagram of the expression of the other globin genes in the gene cluster according to the present invention.

FIG. 4 shows a comparison of the efficiency with which LDB1-dCas9, dCas9-LDB1, dCas9-DD, and DD-dCas9 of the present invention activate the HBB gene.

Detailed Description

After extensive exploratory research, the inventors have provided a novel DNA cyclization molecule comprising a fusion protein formed from LDB1 and dCas9, which can regulate gene expression by reprogramming the spatial position of genes. The present invention was completed on this basis.

The first aspect of the present invention provides a fusion protein comprising a homodimer fragment and a dCas9 protein fragment, wherein the homodimer fragment comprises an LDB1 protein fragment or a dimer domain (DD) fragment of the LDB1 protein. The cyclization molecule provided by the present invention is generally a fusion protein that regulates gene expression by reprogramming the spatial position of genes. Because of the homodimer fragment, a single CRISPR-Cas9 is sufficient to target two sites simultaneously and form a loop, which brings the promoter, enhancer, and target gene closer together in space, improves the efficiency of DNA looping, and thereby regulates expression of the target gene.

In the fusion protein provided by the present invention, the amino acid sequence of the LDB1 protein fragment may include: a) the amino acid sequence shown in SEQ ID NO.1; or b) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.1 and has the function of the amino acid sequence defined in a). Specifically, the amino acid sequence in b) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID No.1 by substitution, deletion, or addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, or 1-3) amino acids, or by addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, or 1-3) amino acids at the N-terminus and/or C-terminus, and that has the function of the polypeptide fragment whose amino acid sequence is shown in SEQ ID No.1, for example the ability to form a homodimer. The amino acid sequence in b) may have 80%, 85%, 90%, 93%, 95%, 97%, or 99% or more similarity to SEQ ID No.1.

MLDRDVGPTPMYPPTYLEPGIGRHTPYGNQTDYRIFELNKRLQNWTEECDNLWWDAFTTEFFEDDAMLTITFCLEDGPKRYTIGRTLIPRYFRSIFEGGATELYYVLKHPKEAFHSNFVSLDCDQGSMVTQHGKPMFTQVCVEGRLYLEFMFDDMMRIKTWHFSIRQHRELIPRSILAMHAQDPQMLDQLSKNITRCGLSNSTLNYLRLCVILEPMQELMSRHKTYSLSPRDCLKTCLFQKWQRMVAPPAEPTRQQPSKRRKRKMSGGSTMSSGGGNTNNSNSKKKSPASTFALSSQVPDVMVVGEPTLMGGEFGDEDERLITRLENTQFDAANGIDDEDSFNNSPALGANSPWNSKPPSSQESKSENPTSQASQ(SEQ ID NO.1)

In the fusion protein provided by the present invention, the dimer domain fragment of the LDB1 protein is the fragment of the LDB1 protein that forms the homodimer. The amino acid sequence of the dimer domain fragment of the LDB1 protein may include: c) the amino acid sequence shown in SEQ ID NO.2; or d) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.2 and has the function of the amino acid sequence defined in c). Specifically, the amino acid sequence in d) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID No.2 by substitution, deletion, or addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, or 1-3) amino acids, or by addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, or 1-3) amino acids at the N-terminus and/or C-terminus, and that has the function of the polypeptide fragment whose amino acid sequence is shown in SEQ ID No.2, for example the ability to form a homodimer. The amino acid sequence in d) may have 80%, 85%, 90%, 93%, 95%, 97%, or 99% or more similarity to SEQ ID No.2.

MLDRDVGPTPMYPPTYLEPGIGRHTPYGNQTDYRIFELNKRLQNWTEECDNLWWDAFTTEFFEDDAMLTITFCLEDGPKRYTIGRTLIPRYFRSIFEGGATELYYVLKHPKEAFHSNFVSLDCDQGSMVTQHGKPMFTQVCVEGRLYLEFMFDDMMRIKTWHFSIRQHRELIPRSILAMHAQDPQMLDQLSKNITRCGLS(SEQ ID NO.2)

In the fusion protein provided by the present invention, the amino acid sequence of the dCas9 protein fragment may include: e) the amino acid sequence shown in SEQ ID NO.3; or f) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.3 and has the function of the amino acid sequence defined in e). Specifically, the amino acid sequence in f) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID No.3 by substitution, deletion, or addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, or 1-3) amino acids, or by addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, or 1-3) amino acids at the N-terminus and/or C-terminus, and that has the function of the polypeptide fragment whose amino acid sequence is shown in SEQ ID No.3, for example the ability to cooperate with an sgRNA that specifically targets a site (e.g., a promoter or an enhancer) to recognize the target site, so that the three-dimensional spatial distance between target sites can be shortened. The amino acid sequence in f) may have 80%, 85%, 90%, 93%, 95%, 97%, or 99% or more similarity to SEQ ID No.3.

DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD(SEQ ID NO.3)

In the fusion protein provided by the present invention, the fusion protein may comprise, in order from the 5' end to the 3' end, the homodimer fragment and the dCas9 protein fragment; for example, the homodimer fragment may be linked to the amino terminus of the dCas9 protein. Alternatively, the fusion protein may comprise, in order from the 5' end to the 3' end, the dCas9 protein fragment and the homodimer fragment; for example, the homodimer fragment may be linked to the carboxyl terminus of the dCas9 protein.

In the fusion protein provided by the present invention, the fusion protein further comprises a flexible linker peptide, which is generally located between the homodimer fragment and the dCas9 protein fragment. Those skilled in the art can generally select a suitable flexible linker peptide to connect the homodimer fragment and the dCas9 protein fragment. For example, when the fusion protein comprises, in order from the 5' end to the 3' end, the homodimer fragment and the dCas9 protein fragment, the amino acid sequence of the flexible linker peptide connecting the two fragments may be SGSETPGTSESATPES (SEQ ID NO.41). As another example, when the fusion protein comprises, in order from the 5' end to the 3' end, the dCas9 protein fragment and the homodimer fragment, the amino acid sequence of the flexible linker peptide connecting the two fragments may be GRAGGGSGGGSGGGS (SEQ ID NO.42).
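The two fragment orders and their respective linkers can be summarized in a short sketch. It is a simplified illustration only: the function name `assemble_fusion` and its parameters are hypothetical, the fragments are passed in as plain one-letter amino acid strings, and the sketch joins just the homodimer fragment, the linker, and the dCas9 fragment. The full sequences listed under SEQ ID No.37 to 39 carry additional residues (for example, GSPKKKRKV following the dCas9 segment) that are deliberately left out here.

```python
# Linker sequences as given in the text.
LINKER_DIMER_FIRST = "SGSETPGTSESATPES"   # SEQ ID NO.41, homodimer fragment before dCas9
LINKER_DCAS9_FIRST = "GRAGGGSGGGSGGGS"    # SEQ ID NO.42, dCas9 before homodimer fragment


def assemble_fusion(dimer_fragment: str, dcas9_fragment: str, dimer_first: bool) -> str:
    """Join the two fragments with the linker that the text pairs with each order.

    Simplified: no additional residues (such as those seen after the dCas9
    segment in SEQ ID No.37 to 39) are added.
    """
    if not dimer_fragment or not dcas9_fragment:
        raise ValueError("both fragments must be non-empty")
    if dimer_first:
        return dimer_fragment + LINKER_DIMER_FIRST + dcas9_fragment
    return dcas9_fragment + LINKER_DCAS9_FIRST + dimer_fragment
```

With the toy fragments `"MLDRD"` and `"DKKYS"` (the first five residues of SEQ ID NO.1 and SEQ ID NO.3), `assemble_fusion("MLDRD", "DKKYS", dimer_first=True)` places the homodimer fragment at the amino terminus of the dCas9 fragment, and `dimer_first=False` places it at the carboxyl terminus.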

In a specific embodiment of the present invention, the amino acid sequence of the fusion protein may be as shown in SEQ ID No.37 to 40.

DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDGSPKKKRKVGRAGGGSGGGSGGGSMLDRDVGPTPMYPPTYLEPGIGRHTPYGNQTDYRIFELNKRLQNWTEECDNLWWDAFTTEFFEDDAMLTITFCLEDGPKRYTIGRTLIPRYFRSIFEGGATELYYVLKHPKEAFHSNFVSLDCDQGSMVTQHGKPMFTQVCVEGRLYLEFMFDDMMRIKTWHFSIRQHRELIPRSILAMHAQDPQMLDQLSKNITRCGLSNSTLNYLRLCVILEPMQELMSRHKTYSLSPRDCLKTCLFQKWQRMVAPPAEPTRQQPSKRRKRKMSGGSTMSSGGGNTNNSNSKKKSPASTFALSSQVPDVMVVGEPTLMGGEFGDEDERLITRLENTQFDAANGIDDEDSFNNSPALGANSPWNSKPPSSQESKSENPTSQASQG(SEQ ID No.37)

MLDRDVGPTPMYPPTYLEPGIGRHTPYGNQTDYRIFELNKRLQNWTEECDNLWWDAFTTEFFEDDAMLTITFCLEDGPKRYTIGRTLIPRYFRSIFEGGATELYYVLKHPKEAFHSNFVSLDCDQGSMVTQHGKPMFTQVCVEGRLYLEFMFDDMMRIKTWHFSIRQHRELIPRSILAMHAQDPQMLDQLSKNITRCGLSNSTLNYLRLCVILEPMQELMSRHKTYSLSPRDCLKTCLFQKWQRMVAPPAEPTRQQPSKRRKRKMSGGSTMSSGGGNTNNSNSKKKSPASTFALSSQVPDVMVVGEPTLMGGEFGDEDERLITRLENTQFDAANGIDDEDSFNNSPALGANSPWNSKPPSSQESKSENPTSQASQSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDGSPKKKRKV(SEQ ID No.38)

MLDRDVGPTPMYPPTYLEPGIGRHTPYGNQTDYRIFELNKRLQNWTEECDNLWWDAFTTEFFEDDAMLTITFCLEDGPKRYTIGRTLIPRYFRSIFEGGATELYYVLKHPKEAFHSNFVSLDCDQGSMVTQHGKPMFTQVCVEGRLYLEFMFDDMMRIKTWHFSIRQHRELIPRSILAMHAQDPQMLDQLSKNITRCGLSSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDGSPKKKRKV(SEQ ID No.39)

DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDGSPKKKRKVGRAGGGSGGGSGGGSMLDRDVGPTPMYPPTYLEPGIGRHTPYGNQTDYRIFELNKRLQNWTEECDNLWWDAFTTEFFEDDAMLTITFCLEDGPKRYTIGRTLIPRYFRSIFEGGATELYYVLKHPKEAFHSNFVSLDCDQGSMVTQHGKPMFTQVCVEGRLYLEFMFDDMMRIKTWHFSIRQHRELIPRSILAMHAQDPQMLDQLSKNITRCGLSG(SEQ ID No.40)

The second aspect of the present invention provides an isolated polynucleotide encoding the fusion protein provided by the first aspect of the present invention.

The third aspect of the present invention provides a DNA looping system comprising the fusion protein provided by the first aspect of the present invention, together with an sgRNA targeting a promoter and an sgRNA targeting an enhancer. Those skilled in the art can select a suitable promoter-targeting sgRNA and/or enhancer-targeting sgRNA according to the target gene to be expressed. For example, the sequence of the promoter-targeting sgRNA can generally be at least partially complementary to the promoter of the target gene, and the sequence of the enhancer-targeting sgRNA can generally be at least partially complementary to the enhancer of the target gene, so that the dimer formed by the looping molecule brings the targeted sites closer together in three-dimensional space.

In the DNA looping system provided by the present invention, the promoter-targeting sgRNA can generally target the region from -100 to -200 bp upstream of the transcription start site (TSS) of the gene, its sequence can generally be designed to have the feature gnnnnnnnnnnnnnnnnnnnNGG (SEQ ID NO.43), and its GC content can generally be between 40% and 60%. In a specific embodiment of the present invention, the promoter-targeting sgRNA targets the promoter region of the HBB gene; specifically, its sequence can be as shown in SEQ ID NO.4 to 6.

In the DNA looping system provided by the present invention, the enhancer-targeting sgRNA can target the DHS (DNase hypersensitive site) region of the enhancer, specifically a position near or inside the DHS. Its sequence can generally be designed to have the feature gnnnnnnnnnnnnnnnnnnnNGG (SEQ ID NO.43), and its GC content can generally be between 40% and 60%. In a specific embodiment of the present invention, the enhancer-targeting sgRNA targets an LCR region, preferably the LCR of the β-globin gene cluster, and more preferably the hypersensitive site DHS2 within that LCR; specifically, the sequence of the enhancer-targeting sgRNA can be as shown in SEQ ID NO.7 to 9.
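The design features stated above (a leading g, 19 further bases, an NGG PAM, and a GC content of 40-60%) can be checked mechanically. The following is a minimal sketch under stated assumptions, not the inventors' design tool: the function names `gc_percent` and `check_guide` are hypothetical, the GC content is computed over the 20-nt guide only, and the PAM `tgg` in the usage line is a placeholder that does not come from this document. The guide itself is the enhancer-targeting sequence given as SEQ ID NO.8.

```python
import re

# Feature stated in the text: 'g' + 19 bases, followed by an NGG PAM (23 nt in total).
TARGET_PATTERN = re.compile(r"g[acgt]{19}[acgt]gg")


def gc_percent(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.lower()
    return 100.0 * sum(base in "gc" for base in seq) / len(seq)


def check_guide(guide: str, pam: str) -> dict:
    """Compare a 20-nt guide plus its 3-nt PAM with the stated design features."""
    site = (guide + pam).lower()
    gc = gc_percent(guide)  # assumption: GC content is measured on the guide alone
    return {
        "matches_pattern": TARGET_PATTERN.fullmatch(site) is not None,
        "gc_percent": gc,
        "gc_in_range": 40.0 <= gc <= 60.0,
    }


if __name__ == "__main__":
    # Guide from SEQ ID NO.8; the PAM "tgg" is a hypothetical placeholder.
    print(check_guide("ggactatgggaggtcactaa", "tgg"))
```

Because the text describes these features as what the sgRNA can "generally" have, a guide that fails one of the checks is not necessarily excluded; the sketch only reports how a candidate compares with the stated features.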

The fourth aspect of the present invention provides an expression system comprising a host cell capable of expressing the fusion protein, the promoter-targeting sgRNA, and the enhancer-targeting sgRNA, so that the dimer formed by the fusion protein brings the targeted sites closer together in three-dimensional space and the target gene is expressed. Methods for enabling the expression system to express the fusion protein and the two sgRNAs should be known to those skilled in the art. For example, the expression system may include a host cell containing an expression vector that carries a polynucleotide encoding the fusion protein, or a host cell with such a polynucleotide integrated into its chromosome; likewise, it may include a host cell containing an expression vector that carries a polynucleotide encoding the promoter-targeting sgRNA, or a host cell with such a polynucleotide integrated into its chromosome; and it may include a host cell containing an expression vector that carries a polynucleotide encoding the enhancer-targeting sgRNA, or a host cell with such a polynucleotide integrated into its chromosome. The expression system may also include a host cell capable of expressing a target gene. In a specific embodiment of the present invention, the target gene may be a gene in the β-globin gene cluster, more specifically a silenced gene, and more specifically the HBB gene. In another specific embodiment of the present invention, the host cell may be a eukaryotic cell, more specifically a metazoan cell, and more specifically a primary cell or an immortalized cell line derived from a metazoan (including but not limited to human and mouse), for example a blood cell line, and more specifically human K562 cells. Those skilled in the art can select a suitable expression vector according to the type of host cell; for example, the expression vector may be, without limitation, a transient expression vector such as pCDNA3.1 or pST1374, or a lentiviral vector. In the expression system, the host cell may be capable of expressing one or more of the target gene, the fusion protein, the promoter-targeting sgRNA, and the enhancer-targeting sgRNA, so that the DNA looping system can be formed in the expression system.

The fifth aspect of the present invention provides use of the DNA looping molecule provided by the first aspect, the looping system provided by the third aspect, or the expression system provided by the fourth aspect of the present invention in gene expression, preferably in gene expression in eukaryotes. The eukaryote may specifically be a metazoan, including but not limited to human, mouse, fruit fly, and nematode. In a specific embodiment of the present invention, the target gene to be expressed may be a gene in the β-globin gene cluster, more specifically a silenced gene, and more specifically the HBB gene.

The sixth aspect of the present invention provides a gene expression method, which may be an in vitro gene expression method, comprising: using the fusion protein provided by the first aspect or the looping system provided by the third aspect of the present invention to bring the targeted sites closer together in three-dimensional space and thereby express the gene. For example, the method may include culturing a host cell capable of expressing the target gene under appropriate conditions in the presence of the looping system. As another example, the method may include culturing the expression system provided by the fourth aspect of the present invention under appropriate conditions.

To address the shortcomings of existing DNA looping systems, the present invention fuses LDB1 with spdCas9 to form a new fusion protein, LDB1-dCas9. Compared with the prior art, the present invention works by changing the three-dimensional structure of chromatin and does not require altering genomic sequence information or epigenetic modifications. It is simple to operate and has a short experimental preparation cycle, which greatly reduces time and labor costs, and it requires no added small molecule to induce dimer formation. Moreover, because a homodimerizing monomer is present, a single CRISPR-Cas9 is sufficient to target two sites simultaneously and form a loop, and increasing the number of targeted sites on the gene improves the efficiency of DNA looping.

The embodiments of the present invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the present invention.

Before the specific embodiments of the present invention are described further, it should be understood that the scope of protection of the present invention is not limited to the specific embodiments described below. It should also be understood that the terms used in the examples of the present invention serve to describe those specific embodiments and not to limit the scope of protection of the present invention. In the specification and claims of the present invention, unless the context clearly indicates otherwise, the singular forms "a", "an" and "the" include the plural forms.

Where the examples give numerical ranges, it should be understood that, unless otherwise specified in the present invention, both endpoints of each range and any value between them can be selected. Unless otherwise defined, all technical and scientific terms used in the present invention have the meanings commonly understood by those skilled in the art. In addition to the specific methods, equipment, and materials used in the examples, any prior-art methods, equipment, and materials that are similar or equivalent to those described in the examples can also be used to practice the present invention, based on the skilled person's knowledge of the prior art and the disclosure of the present invention.

Unless otherwise stated, the experimental methods, detection methods, and preparation methods disclosed in the present invention use conventional techniques of molecular biology, biochemistry, chromatin structure and analysis, analytical chemistry, cell culture, recombinant DNA technology, and related fields. These techniques are well described in the literature; see Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, Second edition, Cold Spring Harbor Laboratory Press, 1989 and Third edition, 2001; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, 1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press, San Diego; Wolffe, CHROMATIN STRUCTURE AND FUNCTION, Third edition, Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304, Chromatin (P.M. Wassarman and A.P. Wolffe, eds.), Academic Press, San Diego, 1999; and METHODS IN MOLECULAR BIOLOGY, Vol. 119, Chromatin Protocols (P.B. Becker, ed.), Humana Press, Totowa, 1999.

Example 1

Construction of the LDB1-dCas9 plasmid

Human LDB1 cDNA (NM_001113407.2), with the nucleotide sequence atgctggatagggatgtgggtccaactcccatgtatccgcctacatacctggagccagggattgggaggcacacaccatatggcaaccaaactgactacagaatatttgagcttaacaaacggcttcagaactggacagaggagtgtgacaatctctggtgggatgcattcacgactgagttctttgaggatgatgccatgttgaccatcactttctgcctggaggatggaccaaagagatataccattggccggaccctgatcccacgctacttccgcagcatctttgaggggggtgctacggagctgtactatgttcttaagcaccccaaggaggcattccacagcaactttgtgtccctcgactgtgaccagggcagcatggtgacccagcatggcaagcccatgttcacccaggtgtgtgtggagggccggttgtacctggagttcatgtttgacgacatgatgcggataaagacgtggcacttcagcatccggcagcaccgagagctcatcccccgcagcatccttgccatgcatgcccaagacccccagatgttggatcagctctccaaaaacatcactcggtgtgggctgtccaattccactctcaactacctccgactctgtgtgatactcgagcccatgcaagagctcatgtcacgccacaagacctacagcctcagcccccgcgactgcctcaagacctgccttttccagaagtggcagcgcatggtagcaccccctgcggagcccacacgtcagcagcccagcaaacggcggaaacggaagatgtcagggggcagcaccatgagctctggtggtggcaacaccaacaacagcaacagcaagaagaagagcccagctagcaccttcgccctctccagccaggtacctgatgtgatggtggtgggggagcccaccctgatgggcggggagttcggggacgaggacgagaggctcatcacccggctggagaacacccagtttgacgcagccaacggcattgacgacgaggacagctttaacaactcccctgcactgggcgccaacagcccctggaacagcaagcctccgtccagccaagaaagcaaatcggagaaccccacgtcacaggcctcccag (SEQ ID NO.37), was diluted to 10 μL and used as the PCR template. A forward primer carrying a NotI restriction site, gggacctaagaaaaagaggaaggtggcggccgctggcggcagcatgctggatagggatgtgggtccaactcccatgtatccg (SEQ ID NO.29), and a reverse primer carrying a KpnI restriction site, ctctcgggggtggcgctctcgctggtaccgggggtctcgctgccgctctgggaggcctgtgacgt (SEQ ID NO.30), were designed and dissolved in water to 10 μM. The LDB1 cDNA fragment was amplified with the Vazyme high-fidelity enzyme kit (Vazyme, p501-d2). The amplification system and PCR reaction conditions were as follows:

The PCR amplification product was purified and recovered with the AxyPrep PCR Clean-up Kit (Axygen, AP-PCR-500G). Separately, 1 μg of the pST1374-N-NLS-flag-linker-dCas9 vector was digested with NotI-HF (NEB, R3189S) and KpnI-HF (NEB, R3142S) at 37°C for 2 h. The digestion system was as follows:

The digested product was gel-purified with the AxyPrep DNA Gel Extraction Kit (Axygen, AP-GX-250G). The PCR fragment and the digested vector fragment were then joined by recombination with the Vazyme recombination kit (Vazyme, C112-01). The ligation system was as follows:

The ligation reaction was incubated at 37°C for 0.5 h, then transformed and plated, and the correct LDB1-dCas9 plasmid was confirmed by Sanger sequencing. Its sequence is given in SEQ ID NO.11.
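As a quick consistency check on the template used in this construction, the start of the LDB1 cDNA (SEQ ID NO.37) can be translated and compared with the N-terminus of the LDB1-containing fusion proteins, which begin with MLDRDVGPTPMYPPTYLEPG in SEQ ID No.38 and 39. The sketch below is illustrative only: the helper `translate` and the constant names are hypothetical, the standard genetic code is assumed, and only the first 60 nt of the cDNA are used.

```python
from itertools import product

BASES = "tcag"
# Standard genetic code, listed in t/c/a/g order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    cdna = cdna.lower()
    usable = len(cdna) - len(cdna) % 3
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(0, usable, 3))


# First 60 nt of the LDB1 cDNA (SEQ ID NO.37) as given in this example.
LDB1_CDNA_START = "atgctggatagggatgtgggtccaactcccatgtatccgcctacatacctggagccaggg"

if __name__ == "__main__":
    print(translate(LDB1_CDNA_START))
```

The same function could be applied to the full cDNA to compare it against the LDB1 portion of the fusion protein sequences; it is not a substitute for the Sanger sequencing step described above.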

Construction of the dCas9-LDB1 plasmid

Using the LDB1 cDNA as the PCR template, a forward primer carrying a BssHII restriction site, gggcgcgctggaggaggatccggaggaggatccggaggaggatccatgctggatagggatgtgggtccaactcccatgtatccg (SEQ ID NO.31), and a reverse primer carrying an ApaI restriction site, gaagggcccctgggaggcctgtgacgt (SEQ ID NO.32), were designed and dissolved in water to 10 μM. The LDB1 cDNA fragment was amplified with the Vazyme high-fidelity enzyme kit (Vazyme, p501-d2). The amplification system and PCR reaction conditions were as follows:

The PCR amplification product was purified and recovered with the AxyPrep PCR Clean-up Kit (Axygen, AP-PCR-500G), and 1 μg was taken. Separately, 1 μg of the pST1374-N-NLS-flag-linker-dCas9 vector was taken. The PCR fragment and the vector were each digested with ApaI (NEB, R0114S) and BssHII (NEB, R0119S) at 25°C for 2 h. The digestion system was as follows:

The digested products were gel-purified with the AxyPrep DNA Gel Extraction Kit (Axygen, AP-GX-250G). The digested PCR fragment and vector fragment were ligated with T4 ligase (NEB, M0202S). The ligation system was as follows:

The ligation reaction was incubated at 16°C for 2 h, then transformed and plated, and the correct dCas9-LDB1 plasmid was confirmed by Sanger sequencing. Its sequence is given in SEQ ID NO.12.

Construction of the DD-dCas9 plasmid

Using the LDB1 cDNA as the PCR template, a forward primer carrying a NotI restriction site, gtggcggccgctggcggcagcatgctggatagggatgtgggtccaactcccatgtatccg (SEQ ID NO.33), and a reverse primer carrying a KpnI restriction site, cgctggtaccgggggtctcgctgccgctggacagcccacaccgagtgatgtttttgg (SEQ ID NO.34), were designed and dissolved in water to 10 μM. The dimer domain (DD) fragment of LDB1 was amplified with the Vazyme high-fidelity enzyme kit (Vazyme, p501-d2). The amplification system and PCR reaction conditions were as follows:

The PCR amplification product was purified and recovered with the AxyPrep PCR Clean-up Kit (Axygen, AP-PCR-500G), and 1 μg was taken. Separately, 1 μg of the pST1374-N-NLS-flag-linker-dCas9 vector was taken. Both were digested with NotI-HF (NEB, R3189S) and KpnI-HF (NEB, R3142S) at 37°C for 2 h. The digestion system was as follows:

The ligation reaction was incubated at 16°C for 2 h, then transformed and plated, and the correct DD-dCas9 plasmid was confirmed by Sanger sequencing. Its sequence is given in SEQ ID NO.13.

Construction of the dCas9-DD plasmid

Using the LDB1 cDNA as the PCR template, a forward primer carrying a BssHII restriction site, gggcgcgctggaggaggatccggaggaggatccggaggaggatccatgctggatagggatgtgggtccaactcccatgtatccg (SEQ ID NO.35), and a reverse primer carrying an ApaI restriction site, tcgaagggcccggacagcccacaccgagtgatgtt (SEQ ID NO.36), were designed and dissolved in water to 10 μM. The LDB1 cDNA fragment was amplified with the Vazyme high-fidelity enzyme kit (Vazyme, p501-d2). The amplification system and PCR reaction conditions were as follows:

The PCR amplification product was purified and recovered with the AxyPrep PCR Clean-up Kit (Axygen, AP-PCR-500G), and 1 μg was taken. Separately, 1 μg of the pST1374-N-NLS-flag-linker-dCas9 vector was taken. The PCR fragment and the vector were each digested with ApaI (NEB, R0114S) and BssHII (NEB, R0119S) at 25°C for 2 h. The digestion system was as follows:

The ligation reaction was incubated at 16°C for 2 h, then transformed and plated, and the correct dCas9-DD plasmid was confirmed by Sanger sequencing. Its sequence is given in SEQ ID NO.14.
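Each cloning primer in the constructions above is described as carrying a specific restriction site. A simple way to confirm this is to search each primer for the enzyme's recognition sequence. The sketch below assumes the commonly documented recognition sequences (NotI GCGGCCGC, KpnI GGTACC, BssHII GCGCGC, ApaI GGGCCC); the names `RECOGNITION_SITES`, `PRIMERS`, and `site_position` are hypothetical, and only four of the eight primers are included for brevity.

```python
# Recognition sequences (assumed from standard enzyme documentation, not from this text).
# All four are palindromic, so searching one strand is sufficient.
RECOGNITION_SITES = {
    "NotI": "gcggccgc",
    "KpnI": "ggtacc",
    "BssHII": "gcgcgc",
    "ApaI": "gggccc",
}

# Primers copied from this example, keyed by SEQ ID NO, with the site each is said to carry.
PRIMERS = {
    31: ("BssHII", "gggcgcgctggaggaggatccggaggaggatccggaggaggatcc"
                   "atgctggatagggatgtgggtccaactcccatgtatccg"),
    32: ("ApaI", "gaagggcccctgggaggcctgtgacgt"),
    33: ("NotI", "gtggcggccgctggcggcagcatgctggatagggatgtgggtccaactcccatgtatccg"),
    34: ("KpnI", "cgctggtaccgggggtctcgctgccgctggacagcccacaccgagtgatgtttttgg"),
}


def site_position(primer: str, enzyme: str) -> int:
    """Return the 0-based index of the enzyme's site in the primer, or -1 if absent."""
    return primer.lower().find(RECOGNITION_SITES[enzyme])


if __name__ == "__main__":
    for seq_id, (enzyme, primer) in PRIMERS.items():
        print(f"SEQ ID NO.{seq_id}: {enzyme} site at index {site_position(primer, enzyme)}")
```

A result of -1 would indicate a primer that lacks its stated site, for instance because of a transcription error in the sequence.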

Construction of the target-site sgRNA plasmids

Three sgRNAs were designed against DHS2 in the LCR of the β-globin gene cluster of K562 cells: L-sg1, with the sequence aatatgtcacattctgtctc (SEQ ID NO.7); L-sg3, with the sequence ggactatgggaggtcactaa (SEQ ID NO.8); and L-sg4, with the sequence gaaggttacacagaaccaga (SEQ ID NO.9). Three sgRNAs were designed against the promoter region of the HBB gene: P-sg1, with the sequence ggccaagagatatatcttag (SEQ ID NO.4); P-sg3, with the sequence gtgccagaagagccaaggac (SEQ ID NO.5); and P-sg4, with the sequence gtggagccacaccctagggt (SEQ ID NO.6). The negative control sgRNA targets EGFP and is named sg-egfp, with the sequence ggagcgcaccatcttcttca (SEQ ID NO.10). For each sgRNA, complementary positive-strand and negative-strand primers were designed, with the bases ACCG added to the 5' end of the positive strand and the bases AAAC added to the 5' end of the negative strand, and dissolved in sterile water to 100 μM. Annealing produced a double-stranded DNA fragment with overhangs, which was ligated into the pGL3-U6-sgRNA (Addgene #51133) vector linearized with BsaI (NEB, R0535S) to construct each target-specific sgRNA plasmid. The primer sequences for the sgRNAs of all target sites are listed below with their SEQ ID NOs:

L-sg1 positive strand primer sequence: ACCG AATATGTCACATTCTGTCTC (SEQ ID NO.15)

L-sg1 negative strand primer sequence: AAAC GAGACAGAATGTGACATATT (SEQ ID NO.16)

L-sg3 positive strand primer sequence: ACCG GGACTATGGGAGGTCACTAA (SEQ ID NO.17)

L-sg3 negative strand primer sequence: AAAC TTAGTGACCTCCCATAGTCC (SEQ ID NO.18)

L-sg4 positive strand primer sequence: ACCG GAAGGTTACACAGAACCAGA (SEQ ID NO.19)

L-sg4 negative strand primer sequence: AAAC TCTGGTTCTGTGTAACCTTC (SEQ ID NO.20)

P-sg1 positive strand primer sequence: ACCG GGCCAAGAGATATATCTTAG (SEQ ID NO.21)

P-sg1 negative strand primer sequence: AAAC CTAAGATATATCTCTTGGCC (SEQ ID NO.22)

P-sg3 positive strand primer sequence: ACCG GTGCCAGAAGAGCCAAGGAC (SEQ ID NO.23)

P-sg3 negative strand primer sequence: AAAC GTCCTTGGCTCTTCTGGCAC (SEQ ID NO.24)

P-sg4 positive strand primer sequence: ACCG GTGGAGCCACACCCTAGGGT (SEQ ID NO.25)

P-sg4 negative strand primer sequence: AAAC ACCCTAGGGTGTGGCTCCAC (SEQ ID NO.26)

sg-egfp positive strand primer sequence: ACCG GGAGCGCACCATCTTCTTCA (SEQ ID NO.27)

sg-egfp negative strand primer sequence: AAAC TGAAGAAGATGGTGCGCTCC (SEQ ID NO.28)
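The primer design rule described above (ACCG plus the spacer for the positive strand, AAAC plus the reverse complement of the spacer for the negative strand) can be sketched as follows. This is a minimal illustration only; the helper names `reverse_complement` and `design_oligos` are hypothetical and do not come from the patent.

```python
# Sketch: derive the annealing primer pair for a 20-nt sgRNA spacer.
# The function names are illustrative assumptions, not the patent's own tooling.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_oligos(spacer: str) -> tuple[str, str]:
    """Positive strand = ACCG + spacer; negative strand = AAAC + reverse complement."""
    spacer = spacer.upper()
    if set(spacer) - set("ACGT"):
        raise ValueError("spacer must contain only A, C, G, T")
    return "ACCG" + spacer, "AAAC" + reverse_complement(spacer)


# L-sg1 spacer as given in the text (SEQ ID NO.7)
positive, negative = design_oligos("aatatgtcacattctgtctc")
print(positive)
print(negative)
```

Running the helper on the L-sg1 spacer reproduces the two L-sg1 primers listed above, which is a quick way to check any entry in the list for transcription errors.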

The annealing system and annealing program are as follows:

The pGL3-U6-sgRNA (Addgene #51133) plasmid was digested with BsaI (NEB, R0535S) to obtain the linearized sgRNA vector. The digestion system is as follows:

The digestion product was gel-purified with the AxyPrep DNA gel recovery kit (Axygen, AP-GX-250G) to obtain the linearized vector. 50 ng of linearized vector was ligated with 3 μl of annealed product using T4 ligase (NEB, M0202S) at 16°C for 2 hours, then transformed and plated. Correct target-specific sgRNA constructs were identified by Sanger sequencing.

The ligation system is as follows:

A schematic diagram of LDB1-dCas9 reprogramming the spatial position of a target gene through artificial DNA looping is shown in Figure 1.

Example 2

LDB1-dCas9 and dCas9-LDB1 activate HBB expression by reprogramming the spatial position of the gene through DNA looping:

K562 cells were transfected with the LDB1-dCas9 and dCas9-LDB1 systems described above by electroporation, as follows:

1) K562 cells (from ATCC) were thawed and cultured in 10 cm culture dishes (Corning, 430167) in RPMI 1640 medium (Gibco, 11875093) supplemented with 10% fetal bovine serum (HyClone, SV30087), at 37°C and 5% carbon dioxide.

2) Cells were collected when the density reached 1×10⁶/ml, at 1×10⁶ cells per tube, and pelleted by centrifugation at 1000 r/min. Electroporation used the Amaxa Cell Line Nucleofector Kit V (Lonza, VCA-1003). Each transfection received 1 μg of LDB1-dCas9 or dCas9-LDB1 plasmid, 0.5 μg of sgRNA plasmid targeting DHS2 in the LCR region, and 0.5 μg of sgRNA plasmid targeting the HBB promoter region. The electroporation program was T-016 (Lonza 2b). The three LCR-targeting sgRNAs and the three HBB promoter-targeting sgRNAs give 9 combinations in total. The negative control group was electroporated with the sgRNA targeting egfp.

3) After electroporation, the cells were gently flushed out with 500 μl of medium and transferred to a 12-well plate containing 1.5 ml of RPMI 1640 medium per well.

4) 24 hours after transfection, the cells were placed under drug selection with Puromycin (InvivoGen, ant-pr-1) at a final concentration of 2 ng/ml and Blasticidin (InvivoGen, ant-bl-1) at a final concentration of 10 ng/ml.
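The 9 sgRNA pairings mentioned in step 2) can be enumerated with a short sketch such as the one below. The constant and function names are illustrative assumptions; only the sgRNA names and plasmid amounts are taken from the text.

```python
from itertools import product

# sgRNA names as given in the text
LCR_SGRNAS = ["L-sg1", "L-sg3", "L-sg4"]
PROMOTER_SGRNAS = ["P-sg1", "P-sg3", "P-sg4"]

# Plasmid amounts per electroporation in micrograms, from step 2)
FUSION_UG = 1.0
SGRNA_UG_EACH = 0.5


def sgrna_pairs():
    """Return every (LCR sgRNA, promoter sgRNA) pairing."""
    return list(product(LCR_SGRNAS, PROMOTER_SGRNAS))


for lcr, promoter in sgrna_pairs():
    print(f"{lcr} + {promoter}: {FUSION_UG} ug fusion plasmid, "
          f"{SGRNA_UG_EACH} ug of each sgRNA plasmid")
```

Each pairing is run once per fusion construct (LDB1-dCas9 or dCas9-LDB1), alongside the sg-egfp negative control.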

72 hours after transfection, cells were harvested and lysed with 500 μl of Trizol (Invitrogen, 15596018), and RNA was extracted with the TransZol Up Plus RNA Kit (ER501-01). For each sample, 500 ng of RNA was reverse-transcribed into cDNA with the Toyobo reverse transcription kit (Toyobo, FSQ-301), and the cDNA was diluted 10-fold before use. qPCR was performed with SYBR Green qPCR Master Mix (Biotool, B21703), using the following reaction system:

The regulation of HBB gene expression by LDB1-dCas9 and dCas9-LDB1 under each sgRNA combination is shown in Figure 2. Compared with the negative control group, both LDB1-dCas9 and dCas9-LDB1 specifically upregulated HBB expression by reprogramming the spatial position of the HBB gene through DNA looping. LDB1-dCas9 activated HBB more strongly than dCas9-LDB1. HBB upregulation was highest with the L-sg3 and P-sg3 combination: 12-fold with LDB1-dCas9 and nearly 8-fold with dCas9-LDB1.
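The text reports fold changes relative to the control but does not state how they were computed from the qPCR data. One common approach is the 2^(-ΔΔCt) method. The sketch below assumes that method and a reference gene, neither of which is named in the source, and the Ct values are invented for illustration only.

```python
def fold_change(ct_target_treated: float, ct_ref_treated: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^(-ddCt) method (assumed, not stated in the patent)."""
    # Normalize the target gene to the reference gene within each sample
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control sample
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target amplifies 3 cycles earlier in the treated sample
print(fold_change(22.0, 18.0, 25.0, 18.0))  # 8.0
```

A 3-cycle shift corresponds to an 8-fold change under the assumption of perfect doubling per cycle; real primer efficiencies may differ.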

To verify that the increase in HBB expression results specifically from bringing HBB into spatial proximity with the LCR, we also measured the expression of the other globin genes in the cluster under the sgRNA combinations L-sg3 and P-sg1, L-sg3 and P-sg3, and L-sg1 and P-sg3, as shown in Figure 3. Compared with the blank control group, LDB1-dCas9 increased HBB expression 27-fold with the L-sg3 and P-sg1 combination and 25-fold with the L-sg3 and P-sg3 combination; dCas9-LDB1 increased HBB expression 15-fold with the L-sg1 and P-sg3 combination and 14-fold with the L-sg3 and P-sg3 combination. HBD expression also increased 4- to 5-fold. We speculate that, because the HBD gene lies close to the HBB gene, DNA looping also brings the LCR closer to HBD and thereby raises its expression.

Full-length LDB1 protein activates HBB gene expression more strongly:

To compare the efficiency of LDB1 and its DD domain in regulating the HBB gene, the spatial position of the HBB gene was reprogrammed in parallel with LDB1-dCas9, DD-dCas9, dCas9-LDB1, and dCas9-DD. To also test whether targeting multiple sites has a synergistic effect on the efficiency of DNA looping, two target sites were selected in the LCR or in the HBB promoter region. 1 μg of the plasmid vector for each of the above fusion proteins was electroporated together with each of the sgRNA plasmid groups P-sg3, P-sg1+3, L-sg3&P-sg1+3, and L-sg1+3&P-sg1+3. Each sgRNA group totaled 0.5 μg; when a group contained several plasmid vectors, they were used in equal amounts. The electroporation program was T-016, as described above.
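The equal split of the 0.5 μg sgRNA total within each group can be worked out as in the sketch below. It assumes that a label such as "P-sg1+3" denotes the two plasmids P-sg1 and P-sg3; the parsing helper `per_plasmid_ug` is hypothetical and is not part of the patent.

```python
TOTAL_SGRNA_UG = 0.5  # total sgRNA plasmid per group, from the text


def per_plasmid_ug(group: str, total_ug: float = TOTAL_SGRNA_UG) -> dict[str, float]:
    """Split the group total equally among the plasmids named in the label."""
    plasmids = []
    for part in group.split("&"):            # e.g. "L-sg1+3" and "P-sg1+3"
        prefix, numbers = part.split("-sg")  # e.g. "L" and "1+3"
        plasmids.extend(f"{prefix}-sg{n}" for n in numbers.split("+"))
    share = total_ug / len(plasmids)
    return {name: share for name in plasmids}


for group in ["P-sg3", "P-sg1+3", "L-sg3&P-sg1+3", "L-sg1+3&P-sg1+3"]:
    print(group, per_plasmid_ug(group))
```

Under this reading, the per-plasmid amount falls from 0.5 μg for a single sgRNA to 0.125 μg when four sgRNA plasmids share the group total.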

RNA extraction and qPCR were performed as described above. The changes in HBB expression are shown in Figure 4. With every sgRNA group, full-length LDB1 activated the HBB gene more efficiently than the DD domain. Notably, increasing the number of target sites in the promoter region had a synergistic effect on HBB activation by LDB1-dCas9 and dCas9-LDB1. Therefore, adding sgRNAs that target the promoter region of the gene of interest can significantly improve DNA looping efficiency and thereby increase gene expression.

In summary, the present invention effectively overcomes various shortcomings of the prior art and has high industrial utilization value.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with the art may modify or alter the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or alterations made by a person of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Sequence Listing

<110> ShanghaiTech University

<120> A DNA cyclization molecule and its use

<160> 43

<170> SIPOSequenceListing 1.0

<210> 1

<211> 375

<212> PRT

<213> Artificial Sequence

<400> 1

Met Leu Asp Arg Asp Val Gly Pro Thr Pro Met Tyr Pro Pro Thr Tyr

1 5 10 15

Leu Glu Pro Gly Ile Gly Arg His Thr Pro Tyr Gly Asn Gln Thr Asp

20 25 30

Tyr Arg Ile Phe Glu Leu Asn Lys Arg Leu Gln Asn Trp Thr Glu Glu

35 40 45

Cys Asp Asn Leu Trp Trp Asp Ala Phe Thr Thr Glu Phe Phe Glu Asp

50 55 60

Asp Ala Met Leu Thr Ile Thr Phe Cys Leu Glu Asp Gly Pro Lys Arg

65 70 75 80

Tyr Thr Ile Gly Arg Thr Leu Ile Pro Arg Tyr Phe Arg Ser Ile Phe

85 90 95

Glu Gly Gly Ala Thr Glu Leu Tyr Tyr Val Leu Lys His Pro Lys Glu

100 105 110

Ala Phe His Ser Asn Phe Val Ser Leu Asp Cys Asp Gln Gly Ser Met

115 120 125

Val Thr Gln His Gly Lys Pro Met Phe Thr Gln Val Cys Val Glu Gly

130 135 140

Arg Leu Tyr Leu Glu Phe Met Phe Asp Asp Met Met Arg Ile Lys Thr

145 150 155 160

Trp His Phe Ser Ile Arg Gln His Arg Glu Leu Ile Pro Arg Ser Ile

165 170 175

Leu Ala Met His Ala Gln Asp Pro Gln Met Leu Asp Gln Leu Ser Lys

180 185 190

Asn Ile Thr Arg Cys Gly Leu Ser Asn Ser Thr Leu Asn Tyr Leu Arg

195 200 205

Leu Cys Val Ile Leu Glu Pro Met Gln Glu Leu Met Ser Arg His Lys

210 215 220

Thr Tyr Ser Leu Ser Pro Arg Asp Cys Leu Lys Thr Cys Leu Phe Gln

225 230 235 240

Lys Trp Gln Arg Met Val Ala Pro Pro Ala Glu Pro Thr Arg Gln Gln

245 250 255

Pro Ser Lys Arg Arg Lys Arg Lys Met Ser Gly Gly Ser Thr Met Ser

260 265 270

Ser Gly Gly Gly Asn Thr Asn Asn Ser Asn Ser Lys Lys Lys Ser Pro

275 280 285

Ala Ser Thr Phe Ala Leu Ser Ser Gln Val Pro Asp Val Met Val Val

290 295 300

Gly Glu Pro Thr Leu Met Gly Gly Glu Phe Gly Asp Glu Asp Glu Arg

305 310 315 320

Leu Ile Thr Arg Leu Glu Asn Thr Gln Phe Asp Ala Ala Asn Gly Ile

325 330 335

Asp Asp Glu Asp Ser Phe Asn Asn Ser Pro Ala Leu Gly Ala Asn Ser

340 345 350

Pro Trp Asn Ser Lys Pro Pro Ser Ser Gln Glu Ser Lys Ser Glu Asn

355 360 365

Pro Thr Ser Gln Ala Ser Gln

370 375

<210> 2

<211> 200

<212> PRT

<213> Artificial Sequence

<400> 2

1 5 10 15

20 25 30

35 40 45

50 55 60

65 70 75 80

85 90 95

100 105 110

115 120 125

130 135 140

145 150 155 160

165 170 175

180 185 190

Asn Ile Thr Arg Cys Gly Leu Ser

195 200

<210> 3

<211> 1367

<212> PRT

<213> Artificial Sequence

<400> 3

Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly

1 5 10 15

Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys

20 25 30

Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly

35 40 45

Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys

50 55 60

Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr

65 70 75 80

Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe

85 90 95

Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His

100 105 110

Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His

115 120 125

Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser

130 135 140

Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met

145 150 155 160

Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp

165 170 175

Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn

180 185 190

Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys

195 200 205

Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu

210 215 220

Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu

225 230 235 240

Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp

245 250 255

Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp

260 265 270

Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu

275 280 285

Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile

290 295 300

Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met

305 310 315 320

Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala

325 330 335

Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp

340 345 350

Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln

355 360 365

Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly

370 375 380

Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys

385 390 395 400

Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly

405 410 415

Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu

420 425 430

Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro

435 440 445

Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met

450 455 460

Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val

465 470 475 480

Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn

485 490 495

Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu

500 505 510

Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr

515 520 525

Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys

530 535 540

Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val

545 550 555 560

Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser

565 570 575

Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr

580 585 590

Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn

595 600 605

Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu

610 615 620

Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His

625 630 635 640

Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr

645 650 655

Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys

660 665 670

Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala

675 680 685

Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys

690 695 700

Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His

705 710 715 720

Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile

725 730 735

Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg

740 745 750

His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr

755 760 765

Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu

770 775 780

Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val

785 790 795 800

Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln

805 810 815

Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu

820 825 830

Ser Asp Tyr Asp Val Asp Ala Ile Val Pro Gln Ser Phe Leu Lys Asp

835 840 845

Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Ala Arg Gly

850 855 860

Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn

865 870 875 880

Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

885 890 895

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys

900 905 910

Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys

915 920 925

His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu

930 935 940

Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys

945 950 955 960

Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu

965 970 975

Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val

980 985 990

Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val

995 1000 1005

Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser

1010 1015 1020

Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn

1025 1030 1035 1040

Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile

1045 1050 1055

Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val

1060 1065 1070

Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met

1075 1080 1085

Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe

1090 1095 1100

Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala

1105 1110 1115 1120

Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro

1125 1130 1135

Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly Lys

1140 1145 1150

Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met

1155 1160 1165

Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys

1170 1175 1180

Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr

1185 1190 1195 1200

Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala

1205 1210 1215

Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro

1235 1240 1245

Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr

1250 1255 1260

Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile

1265 1270 1275 1280

Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His

1285 1290 1295

Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe

1300 1305 1310

Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr

1315 1320 1325

Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala

1330 1335 1340

Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp

1345 1350 1355 1360

Leu Ser Gln Leu Gly Gly Asp

1365

<210> 4

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 4

ggccaagaga tatatcttag 20

<210> 5

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 5

gtgccagaag agccaaggac 20

<210> 6

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 6

gtggagccac accctagggt 20

<210> 7

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 7

aatatgtcac attctgtctc 20

<210> 8

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 8

ggactatggg aggtcactaa 20

<210> 9

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 9

gaaggttaca cagaaccaga 20

<210> 10

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 10

ggagcgcacc atcttcttca 20

<210> 11

<211> 10428

<212> DNA

<213> Artificial Sequence

<400> 11

gacggatcgg gagatctccc gatcccctat ggtcgactct cagtacaatc tgctctgatg 60gacggatcgg gagatctccc gatcccctat ggtcgactct cagtacaatc tgctctgatg 60

ccgcatagtt aagccagtat ctgctccctg cttgtgtgtt ggaggtcgct gagtagtgcg 120ccgcatagtt aagccagtat ctgctccctg cttgtgtgtt ggaggtcgct gagtagtgcg 120

cgagcaaaat ttaagctaca acaaggcaag gcttgaccga caattgcatg aagaatctgc 180

ttagggttag gcgttttgcg ctgcttcgcg atgtacgggc cagatatacg cgttgacatt 240

gattattgac tagttattaa tagtaatcaa ttacggggtc attagttcat agcccatata 300

tggagttccg cgttacataa cttacggtaa atggcccgcc tggctgaccg cccaacgacc 360

cccgcccatt gacgtcaata atgacgtatg ttcccatagt aacgccaata gggactttcc 420

attgacgtca atgggtggag tatttacggt aaactgccca cttggcagta catcaagtgt 480

atcatatgcc aagtacgccc cctattgacg tcaatgacgg taaatggccc gcctggcatt 540

atgcccagta catgacctta tgggactttc ctacttggca gtacatctac gtattagtca 600

tcgctattac catggtgatg cggttttggc agtacatcaa tgggcgtgga tagcggtttg 660

actcacgggg atttccaagt ctccacccca ttgacgtcaa tgggagtttg ttttggcacc 720

aaaatcaacg ggactttcca aaatgtcgta acaactccgc cccattgacg caaatgggcg 780

gtaggcgtgt acggtgggag gtctatataa gcagagctct ctggctaact agagaaccca 840

ctgcttactg gcttatcgaa attaatacga ctcactatag ggagacccaa gctggctagc 900

accatgggac ctaagaaaaa gaggaaggtg gcggccgctg gcggcagcat gctggatagg 960

gatgtgggtc caactcccat gtatccgcct acatacctgg agccagggat tgggaggcac 1020

acaccatatg gcaaccaaac tgactacaga atatttgagc ttaacaaacg gcttcagaac 1080

tggacagagg agtgtgacaa tctctggtgg gatgcattca cgactgagtt ctttgaggat 1140

gatgccatgt tgaccatcac tttctgcctg gaggatggac caaagagata taccattggc 1200

cggaccctga tcccacgcta cttccgcagc atctttgagg ggggtgctac ggagctgtac 1260

tatgttctta agcaccccaa ggaggcattc cacagcaact ttgtgtccct cgactgtgac 1320

cagggcagca tggtgaccca gcatggcaag cccatgttca cccaggtgtg tgtggagggc 1380

cggttgtacc tggagttcat gtttgacgac atgatgcgga taaagacgtg gcacttcagc 1440

atccggcagc accgagagct catcccccgc agcatccttg ccatgcatgc ccaagacccc 1500

cagatgttgg atcagctctc caaaaacatc actcggtgtg ggctgtccaa ttccactctc 1560

aactacctcc gactctgtgt gatactcgag cccatgcaag agctcatgtc acgccacaag 1620

acctacagcc tcagcccccg cgactgcctc aagacctgcc ttttccagaa gtggcagcgc 1680

atggtagcac cccctgcgga gcccacacgt cagcagccca gcaaacggcg gaaacggaag 1740

atgtcagggg gcagcaccat gagctctggt ggtggcaaca ccaacaacag caacagcaag 1800

aagaagagcc cagctagcac cttcgccctc tccagccagg tacctgatgt gatggtggtg 1860

ggggagccca ccctgatggg cggggagttc ggggacgagg acgagaggct catcacccgg 1920

ctggagaaca cccagtttga cgcagccaac ggcattgacg acgaggacag ctttaacaac 1980

tcccctgcac tgggcgccaa cagcccctgg aacagcaagc ctccgtccag ccaagaaagc 2040

aaatcggaga accccacgtc acaggcctcc cagagcggca gcgagacccc cggtaccagc 2100

gagagcgcca cccccgagag cgacaagaaa tactctattg gactggctat cgggacaaac 2160

tccgttggct gggccgtcat aaccgacgag tataaggtgc caagcaagaa attcaaggtg 2220

ctgggtaata ctgaccgcca ttcaatcaag aagaacctga tcggagcact cctcttcgac 2280

tccggtgaaa ccgctgaagc tactcggctg aagcggaccg caaggcggag atacacccgc 2340

cgcaagaatc ggatatgtta tctgcaagag atctttagca acgaaatggc taaggtggac 2400

gactccttct ttcaccgcct ggaagagagc tttctggtgg aggaggataa gaaacacgag 2460

aggcacccta tattcggaaa tatcgtggat gaggtggctt accatgaaaa gtatcctaca 2520

atctaccatc tgaggaagaa gctggtggac agcaccgata aagcagacct gaggctcatc 2580

tatctggccc tggctcatat gataaagttt agaggacact ttctgatcga gggcgacctg 2640

aatcccgata attccgatgt ggataaactc ttcattcaac tggtgcagac atataaccaa 2700

ctgttcgagg agaatcccat aaacgcttct ggtgtggatg ccaaggctat tctgtccgct 2760

cggctgtcca agtcacgcag actggagaat ctgattgccc aactgccagg agaaaagaag 2820

aacggcctgt ttgggaacct catcgccctg agcctgggcc tgacacctaa cttcaagtcc 2880

aattttgatc tggccgaaga tgctaaactc cagctctcca aggacaccta tgacgatgat 2940

ctggacaacc tgctcgcaca gataggcgac cagtacgccg atctctttct ggctgctaag 3000

aatctctccg acgccattct gctgagcgac atactccggg tcaacactga gatcaccaaa 3060

gcacctctga gcgcctccat gataaaacgc tatgatgaac accatcaaga cctgactctg 3120

ctcaaagccc tcgtgaggca acagctgcca gagaagtaca aagagatatt cttcgaccag 3180

agcaagaatg gatatgccgg atacatcgat ggcggagcat cacaggaaga attttacaag 3240

ttcatcaaac caatcctcga gaagatggac ggtactgaag agctgctggt gaagctgaac 3300

agggaggacc tgctgaggaa gcagaggacc tttgataatg gctccattcc acatcagata 3360

cacctgggag agctgcatgc aatcctccgc aggcaggagg atttctatcc tttcctgaag 3420

gataaccggg agaagataga gaagatcctg accttcagga tcccttatta cgtcggccct 3480

ctggctagag gcaactcccg cttcgcttgg atgaccagga aatctgagga gacaattact 3540

ccttggaact tcgaagaggt cgtggataag ggcgcaagcg cccagtcatt catcgaacgg 3600

atgaccaatt tcgataagaa cctgccaaac gagaaggtcc tgcccaaaca ttcactcctg 3660

tacgagtatt tcaccgtcta taacgagctg actaaagtga agtacgtgac cgagggcatg 3720

aggaagcctg ccttcctgtc cggagagcag aagaaggcta tcgttgatct gctcttcaag 3780

actaatagaa aggtgacagt gaagcagctc aaggaggatt actttaagaa gatcgaatgc 3840

tttgactcag tggaaatctc tggcgtggag gaccgcttta atgccagcct gggcacttac 3900

catgatctgc tgaagataat caaagacaaa gatttcctcg ataatgagga gaacgaggac 3960

atcctggaag atatcgtgct gaccctgact ctgttcgagg atagagagat gatcgaagag 4020

cgcctgaaga cctatgccca tctgtttgac gataaagtca tgaaacagct caagcggcgg 4080

cgctacactg ggtggggtag actctccagg aaactcataa acggcatccg cgacaaacag 4140

agcggaaaga ccatcctgga tttcctgaaa tccgacggat tcgctaacag gaacttcatg 4200

caactgattc acgatgactc tctgacattt aaagaggaca tccagaaggc acaggtgagc 4260

ggtcaaggcg acagcctgca cgagcacatc gccaacctcg ctggatcacc cgccataaag 4320

aagggaatac tgcagacagt caaggtcgtg gacgaactcg tcaaagtgat gggtcggcac 4380

aagccagaga atatcgttat cgaaatggca agggagaacc aaaccaccca gaagggccag 4440

aagaactctc gggaacggat gaaaagaatc gaagagggaa ttaaggagct gggatctcag 4500

atactgaagg agcaccctgt ggagaataca cagctccaga acgagaaact ctacctgtac 4560

tacctccaga acgggcggga catgtacgtt gaccaggaac tcgacatcaa ccggctgtcc 4620

gattatgacg tggacgctat tgttccacag tccttcctca aagatgactc cattgacaac 4680

aaggtgctga ccagatccga taaggcccgc ggtaagtctg acaatgttcc atcagaagag 4740

gtggtcaaga agatgaagaa ttactggcgg cagctcctca acgccaaact gatcacccag 4800

cggaagtttg acaatctgac taaggcagaa agaggaggtc tgagcgaact cgacaaggcc 4860

ggctttatta agaggcaact ggtcgaaaca cgccagatta ccaaacacgt ggcacaaatc 4920

ctcgactcta ggatgaacac taagtacgat gagaacgata agctgatcag ggaagtgaaa 4980

gtgataactc tgaagagcaa gctggtgtct gacttccgga aggactttca attctacaaa 5040

gttcgcgaaa taaacaatta ccatcatgct cacgatgcct atctcaatgc tgtcgttggc 5100

accgccctga tcaagaaata ccctaaactg gagtctgagt tcgtgtacgg tgactataaa 5160

gtctacgatg tgaggaagat gatagcaaag tctgagcaag agattggcaa agccaccgcc 5220

aagtacttct tctactctaa tatcatgaat ttctttaaga ctgagataac cctggctaac 5280

ggcgaaatcc ggaagcgccc actgatcgaa acaaacggag aaacaggaga aatcgtgtgg 5340

gataaaggca gggacttcgc aactgtgcgg aaggtgctgt ccatgccaca agtcaatatc 5400

gtgaagaaga ccgaagtgca gaccggcgga ttctcaaagg agagcatcct gccaaagcgg 5460

aactctgaca agctgatcgc caggaagaaa gattgggacc caaagaagta tggcggtttc 5520

gattccccta cagtggctta ttccgttctg gtcgtggcaa aagtggagaa aggcaagtcc 5580

aagaaactca agtctgttaa ggagctgctc ggaattacta ttatggagag atccagcttc 5640

gagaagaatc caatcgattt cctggaagct aagggctata aagaagtgaa gaaagatctc 5700

atcatcaaac tgcccaagta ctctctcttt gagctggaga atggtaggaa gcggatgctg 5760

gcctccgccg gagagctgca gaaaggaaac gagctggctc tgccctccaa atacgtgaac 5820

ttcctgtatc tggcctccca ctacgagaaa ctcaaaggta gccctgaaga caatgagcag 5880

aagcaactct ttgttgagca acataaacac tacctggacg aaatcattga acagattagc 5940

gagttcagca agcgggttat tctggccgat gcaaacctcg ataaagtgct gagcgcatat 6000

aataagcaca gggacaagcc aattcgcgaa caagcagaga atattatcca cctctttact 6060

ctgactaatc tgggcgctcc tgctgccttc aagtatttcg atacaactat tgacaggaag 6120

cggtacacct ctaccaaaga agttctcgat gccaccctga tacaccagtc aattaccgga 6180

ctgtacgaga ctcgcatcga cctgtctcag ctcggcggcg acggttctcc caagaagaag 6240

aggaaagtct cgagcggtgg agctgcagga taggaattcg ggcccttcga aggtaagcct 6300

atccctaacc ctctcctcgg tctcgattct acgcgtaccg gtcatcatca ccatcaccat 6360

tgagtttaaa cccgctgatc agcctcgact gtgccttcta gttgccagcc atctgttgtt 6420

tgcccctccc ccgtgccttc cttgaccctg gaaggtgcca ctcccactgt cctttcctaa 6480

taaaatgagg aaattgcatc gcattgtctg agtaggtgtc attctattct ggggggtggg 6540

gtggggcagg acagcaaggg ggaggattgg gaagacaata gcaggcatgc tggggatgcg 6600

gtgggctcta tggcttctga ggcggaaaga accagctggg gctctagggg gtatccccac 6660

gcgccctgta gcggcgcatt aagcgcggcg ggtgtggtgg ttacgcgcag cgtgaccgct 6720

acacttgcca gcgccctagc gcccgctcct ttcgctttct tcccttcctt tctcgccacg 6780

ttcgccggct ttccccgtca agctctaaat cggggcatcc ctttagggtt ccgatttagt 6840

gctttacggc acctcgaccc caaaaaactt gattagggtg atggttcacg tagtgggcca 6900

tcgccctgat agacggtttt tcgccctttg acgttggagt ccacgttctt taatagtgga 6960

ctcttgttcc aaactggaac aacactcaac cctatctcgg tctattcttt tgatttataa 7020

gggattttgg ggatttcggc ctattggtta aaaaatgagc tgatttaaca aaaatttaac 7080

gcgaattaat tctgtggaat gtgtgtcagt tagggtgtgg aaagtcccca ggctccccag 7140

gcaggcagaa gtatgcaaag catgcatctc aattagtcag caaccaggtg tggaaagtcc 7200

ccaggctccc cagcaggcag aagtatgcaa agcatgcatc tcaattagtc agcaaccata 7260

gtcccgcccc taactccgcc catcccgccc ctaactccgc ccagttccgc ccattctccg 7320

ccccatggct gactaatttt ttttatttat gcagaggccg aggccgcctc tgcctctgag 7380

ctattccaga agtagtgagg aggctttttt ggaggcctag gcttttgcaa aaagctcccg 7440

ggagcttgta tatccatttt cggatctgat cagcacgtgt tgacaattaa tcatcggcat 7500

agtatatcgg catagtataa tacgacaagg tgaggaacta aaccatggcc aagcctttgt 7560

ctcaagaaga atccaccctc attgaaagag caacggctac aatcaacagc atccccatct 7620

ctgaagacta cagcgtcgcc agcgcagctc tctctagcga cggccgcatc ttcactggtg 7680

tcaatgtata tcattttact gggggacctt gtgcagaact cgtggtgctg ggcactgctg 7740

ctgctgcggc agctggcaac ctgacttgta tcgtcgcgat cggaaatgag aacaggggca 7800

tcttgagccc ctgcggacgg tgtcgacagg tgcttctcga tctgcatcct gggatcaaag 7860

cgatagtgaa ggacagtgat ggacagccga cggcagttgg gattcgtgaa ttgctgccct 7920

ctggttatgt gtgggagggc taagcacttc gtggccgagg agcaggactg acacgtgcta 7980

cgagatttcg attccaccgc cgccttctat gaaaggttgg gcttcggaat cgttttccgg 8040

gacgccggct ggatgatcct ccagcgcggg gatctcatgc tggagttctt cgcccacccc 8100

aacttgttta ttgcagctta taatggttac aaataaagca atagcatcac aaatttcaca 8160

aataaagcat ttttttcact gcattctagt tgtggtttgt ccaaactcat caatgtatct 8220

tatcatgtct gtataccgtc gacctctagc tagagcttgg cgtaatcatg gtcatagctg 8280

tttcctgtgt gaaattgtta tccgctcaca attccacaca acatacgagc cggaagcata 8340

aagtgtaaag cctggggtgc ctaatgagtg agctaactca cattaattgc gttgcgctca 8400

ctgcccgctt tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc 8460

gcggggagag gcggtttgcg tattgggcgc tcttccgctt cctcgctcac tgactcgctg 8520

cgctcggtcg ttcggctgcg gcgagcggta tcagctcact caaaggcggt aatacggtta 8580

tccacagaat caggggataa cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc 8640

aggaaccgta aaaaggccgc gttgctggcg tttttccata ggctccgccc ccctgacgag 8700

catcacaaaa atcgacgctc aagtcagagg tggcgaaacc cgacaggact ataaagatac 8760

caggcgtttc cccctggaag ctccctcgtg cgctctcctg ttccgaccct gccgcttacc 8820

ggatacctgt ccgcctttct cccttcggga agcgtggcgc tttctcaatg ctcacgctgt 8880

aggtatctca gttcggtgta ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc 8940

gttcagcccg accgctgcgc cttatccggt aactatcgtc ttgagtccaa cccggtaaga 9000

cacgacttat cgccactggc agcagccact ggtaacagga ttagcagagc gaggtatgta 9060

ggcggtgcta cagagttctt gaagtggtgg cctaactacg gctacactag aaggacagta 9120

tttggtatct gcgctctgct gaagccagtt accttcggaa aaagagttgg tagctcttga 9180

tccggcaaac aaaccaccgc tggtagcggt ggtttttttg tttgcaagca gcagattacg 9240

cgcagaaaaa aaggatctca agaagatcct ttgatctttt ctacggggtc tgacgctcag 9300

tggaacgaaa actcacgtta agggattttg gtcatgagat tatcaaaaag gatcttcacc 9360

tagatccttt taaattaaaa atgaagtttt aaatcaatct aaagtatata tgagtaaact 9420

tggtctgaca gttaccaatg cttaatcagt gaggcaccta tctcagcgat ctgtctattt 9480

cgttcatcca tagttgcctg actccccgtc gtgtagataa ctacgatacg ggagggctta 9540

ccatctggcc ccagtgctgc aatgataccg cgagacccac gctcaccggc tccagattta 9600

tcagcaataa accagccagc cggaagggcc gagcgcagaa gtggtcctgc aactttatcc 9660

gcctccatcc agtctattaa ttgttgccgg gaagctagag taagtagttc gccagttaat 9720

agtttgcgca acgttgttgc cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt 9780

atggcttcat tcagctccgg ttcccaacga tcaaggcgag ttacatgatc ccccatgttg 9840

tgcaaaaaag cggttagctc cttcggtcct ccgatcgttg tcagaagtaa gttggccgca 9900

gtgttatcac tcatggttat ggcagcactg cataattctc ttactgtcat gccatccgta 9960

agatgctttt ctgtgactgg tgagtactca accaagtcat tctgagaata gtgtatgcgg 10020

cgaccgagtt gctcttgccc ggcgtcaata cgggataata ccgcgccaca tagcagaact 10080

ttaaaagtgc tcatcattgg aaaacgttct tcggggcgaa aactctcaag gatcttaccg 10140

ctgttgagat ccagttcgat gtaacccact cgtgcaccca actgatcttc agcatctttt 10200

actttcacca gcgtttctgg gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga 10260

ataagggcga cacggaaatg ttgaatactc atactcttcc tttttcaata ttattgaagc 10320

atttatcagg gttattgtct catgagcgga tacatatttg aatgtattta gaaaaataaa 10380

caaatagggg ttccgcgcac atttccccga aaagtgccac ctgacgtc 10428

<210> 12

<211> 10458

<212> DNA

<213> Artificial Sequence

<400> 12

accatgggac ctaagaaaaa gaggaaggtg gcggccgctg actacaaaga ccatgacggt 960

gattataaag atcatgacat cgactacaag gatgacgatg acaagtctag agacaagaaa 1020

tactctattg gactggctat cgggacaaac tccgttggct gggccgtcat aaccgacgag 1080

tataaggtgc caagcaagaa attcaaggtg ctgggtaata ctgaccgcca ttcaatcaag 1140

aagaacctga tcggagcact cctcttcgac tccggtgaaa ccgctgaagc tactcggctg 1200

aagcggaccg caaggcggag atacacccgc cgcaagaatc ggatatgtta tctgcaagag 1260

atctttagca acgaaatggc taaggtggac gactccttct ttcaccgcct ggaagagagc 1320

tttctggtgg aggaggataa gaaacacgag aggcacccta tattcggaaa tatcgtggat 1380

gaggtggctt accatgaaaa gtatcctaca atctaccatc tgaggaagaa gctggtggac 1440

agcaccgata aagcagacct gaggctcatc tatctggccc tggctcatat gataaagttt 1500

agaggacact ttctgatcga gggcgacctg aatcccgata attccgatgt ggataaactc 1560

ttcattcaac tggtgcagac atataaccaa ctgttcgagg agaatcccat aaacgcttct 1620

ggtgtggatg ccaaggctat tctgtccgct cggctgtcca agtcacgcag actggagaat 1680

ctgattgccc aactgccagg agaaaagaag aacggcctgt ttgggaacct catcgccctg 1740

agcctgggcc tgacacctaa cttcaagtcc aattttgatc tggccgaaga tgctaaactc 1800

cagctctcca aggacaccta tgacgatgat ctggacaacc tgctcgcaca gataggcgac 1860

cagtacgccg atctctttct ggctgctaag aatctctccg acgccattct gctgagcgac 1920

atactccggg tcaacactga gatcaccaaa gcacctctga gcgcctccat gataaaacgc 1980

tatgatgaac accatcaaga cctgactctg ctcaaagccc tcgtgaggca acagctgcca 2040

gagaagtaca aagagatatt cttcgaccag agcaagaatg gatatgccgg atacatcgat 2100

ggcggagcat cacaggaaga attttacaag ttcatcaaac caatcctcga gaagatggac 2160

ggtactgaag agctgctggt gaagctgaac agggaggacc tgctgaggaa gcagaggacc 2220

tttgataatg gctccattcc acatcagata cacctgggag agctgcatgc aatcctccgc 2280

aggcaggagg atttctatcc tttcctgaag gataaccggg agaagataga gaagatcctg 2340

accttcagga tcccttatta cgtcggccct ctggctagag gcaactcccg cttcgcttgg 2400

atgaccagga aatctgagga gacaattact ccttggaact tcgaagaggt cgtggataag 2460

ggcgcaagcg cccagtcatt catcgaacgg atgaccaatt tcgataagaa cctgccaaac 2520

gagaaggtcc tgcccaaaca ttcactcctg tacgagtatt tcaccgtcta taacgagctg 2580

actaaagtga agtacgtgac cgagggcatg aggaagcctg ccttcctgtc cggagagcag 2640

aagaaggcta tcgttgatct gctcttcaag actaatagaa aggtgacagt gaagcagctc 2700

aaggaggatt actttaagaa gatcgaatgc tttgactcag tggaaatctc tggcgtggag 2760

gaccgcttta atgccagcct gggcacttac catgatctgc tgaagataat caaagacaaa 2820

gatttcctcg ataatgagga gaacgaggac atcctggaag atatcgtgct gaccctgact 2880

ctgttcgagg atagagagat gatcgaagag cgcctgaaga cctatgccca tctgtttgac 2940

gataaagtca tgaaacagct caagcggcgg cgctacactg ggtggggtag actctccagg 3000

aaactcataa acggcatccg cgacaaacag agcggaaaga ccatcctgga tttcctgaaa 3060

tccgacggat tcgctaacag gaacttcatg caactgattc acgatgactc tctgacattt 3120

aaagaggaca tccagaaggc acaggtgagc ggtcaaggcg acagcctgca cgagcacatc 3180

gccaacctcg ctggatcacc cgccataaag aagggaatac tgcagacagt caaggtcgtg 3240

gacgaactcg tcaaagtgat gggtcggcac aagccagaga atatcgttat cgaaatggca 3300

agggagaacc aaaccaccca gaagggccag aagaactctc gggaacggat gaaaagaatc 3360

gaagagggaa ttaaggagct gggatctcag atactgaagg agcaccctgt ggagaataca 3420

cagctccaga acgagaaact ctacctgtac tacctccaga acgggcggga catgtacgtt 3480

gaccaggaac tcgacatcaa ccggctgtcc gattatgacg tggacgctat tgttccacag 3540

tccttcctca aagatgactc cattgacaac aaggtgctga ccagatccga taaggcccgc 3600

ggtaagtctg acaatgttcc atcagaagag gtggtcaaga agatgaagaa ttactggcgg 3660

cagctcctca acgccaaact gatcacccag cggaagtttg acaatctgac taaggcagaa 3720

agaggaggtc tgagcgaact cgacaaggcc ggctttatta agaggcaact ggtcgaaaca 3780

cgccagatta ccaaacacgt ggcacaaatc ctcgactcta ggatgaacac taagtacgat 3840

gagaacgata agctgatcag ggaagtgaaa gtgataactc tgaagagcaa gctggtgtct 3900

gacttccgga aggactttca attctacaaa gttcgcgaaa taaacaatta ccatcatgct 3960

cacgatgcct atctcaatgc tgtcgttggc accgccctga tcaagaaata ccctaaactg 4020

gagtctgagt tcgtgtacgg tgactataaa gtctacgatg tgaggaagat gatagcaaag 4080

tctgagcaag agattggcaa agccaccgcc aagtacttct tctactctaa tatcatgaat 4140

ttctttaaga ctgagataac cctggctaac ggcgaaatcc ggaagcgccc actgatcgaa 4200

acaaacggag aaacaggaga aatcgtgtgg gataaaggca gggacttcgc aactgtgcgg 4260

aaggtgctgt ccatgccaca agtcaatatc gtgaagaaga ccgaagtgca gaccggcgga 4320

ttctcaaagg agagcatcct gccaaagcgg aactctgaca agctgatcgc caggaagaaa 4380

gattgggacc caaagaagta tggcggtttc gattccccta cagtggctta ttccgttctg 4440

gtcgtggcaa aagtggagaa aggcaagtcc aagaaactca agtctgttaa ggagctgctc 4500

ggaattacta ttatggagag atccagcttc gagaagaatc caatcgattt cctggaagct 4560ggaattacta ttatggagag atccagcttc gagaagaatc caatcgattt cctggaagct 4560

aagggctata aagaagtgaa gaaagatctc atcatcaaac tgcccaagta ctctctcttt 4620aagggctata aagaagtgaa gaaagatctc atcatcaaac tgcccaagta ctctctcttt 4620

gagctggaga atggtaggaa gcggatgctg gcctccgccg gagagctgca gaaaggaaac 4680gagctggaga atggtaggaa gcggatgctg gcctccgccg gagagctgca gaaaggaaac 4680

gagctggctc tgccctccaa atacgtgaac ttcctgtatc tggcctccca ctacgagaaa 4740

ctcaaaggta gccctgaaga caatgagcag aagcaactct ttgttgagca acataaacac 4800

tacctggacg aaatcattga acagattagc gagttcagca agcgggttat tctggccgat 4860

gcaaacctcg ataaagtgct gagcgcatat aataagcaca gggacaagcc aattcgcgaa 4920

caagcagaga atattatcca cctctttact ctgactaatc tgggcgctcc tgctgccttc 4980

aagtatttcg atacaactat tgacaggaag cggtacacct ctaccaaaga agttctcgat 5040

gccaccctga tacaccagtc aattaccgga ctgtacgaga ctcgcatcga cctgtctcag 5100

ctcggcggcg acggttctcc caagaagaag aggaaagtcg ggcgcgctgg aggaggatcc 5160

ggaggaggat ccggaggagg atccatgctg gatagggatg tgggtccaac tcccatgtat 5220

ccgcctacat acctggagcc agggattggg aggcacacac catatggcaa ccaaactgac 5280

tacagaatat ttgagcttaa caaacggctt cagaactgga cagaggagtg tgacaatctc 5340

tggtgggatg cattcacgac tgagttcttt gaggatgatg ccatgttgac catcactttc 5400

tgcctggagg atggaccaaa gagatatacc attggccgga ccctgatccc acgctacttc 5460

cgcagcatct ttgagggggg tgctacggag ctgtactatg ttcttaagca ccccaaggag 5520

gcattccaca gcaactttgt gtccctcgac tgtgaccagg gcagcatggt gacccagcat 5580

ggcaagccca tgttcaccca ggtgtgtgtg gagggccggt tgtacctgga gttcatgttt 5640

gacgacatga tgcggataaa gacgtggcac ttcagcatcc ggcagcaccg agagctcatc 5700

ccccgcagca tccttgccat gcatgcccaa gacccccaga tgttggatca gctctccaaa 5760

aacatcactc ggtgtgggct gtccaattcc actctcaact acctccgact ctgtgtgata 5820

ctcgagccca tgcaagagct catgtcacgc cacaagacct acagcctcag cccccgcgac 5880

tgcctcaaga cctgcctttt ccagaagtgg cagcgcatgg tagcaccccc tgcggagccc 5940

acacgtcagc agcccagcaa acggcggaaa cggaagatgt cagggggcag caccatgagc 6000

tctggtggtg gcaacaccaa caacagcaac agcaagaaga agagcccagc tagcaccttc 6060

gccctctcca gccaggtacc tgatgtgatg gtggtggggg agcccaccct gatgggcggg 6120

gagttcgggg acgaggacga gaggctcatc acccggctgg agaacaccca gtttgacgca 6180

gccaacggca ttgacgacga ggacagcttt aacaactccc ctgcactggg cgccaacagc 6240

ccctggaaca gcaagcctcc gtccagccaa gaaagcaaat cggagaaccc cacgtcacag 6300

gcctcccagg ggcccttcga aggtaagcct atccctaacc ctctcctcgg tctcgattct 6360

acgcgtaccg gtcatcatca ccatcaccat tgagtttaaa cccgctgatc agcctcgact 6420

gtgccttcta gttgccagcc atctgttgtt tgcccctccc ccgtgccttc cttgaccctg 6480

gaaggtgcca ctcccactgt cctttcctaa taaaatgagg aaattgcatc gcattgtctg 6540

agtaggtgtc attctattct ggggggtggg gtggggcagg acagcaaggg ggaggattgg 6600

gaagacaata gcaggcatgc tggggatgcg gtgggctcta tggcttctga ggcggaaaga 6660

accagctggg gctctagggg gtatccccac gcgccctgta gcggcgcatt aagcgcggcg 6720

ggtgtggtgg ttacgcgcag cgtgaccgct acacttgcca gcgccctagc gcccgctcct 6780

ttcgctttct tcccttcctt tctcgccacg ttcgccggct ttccccgtca agctctaaat 6840

cggggcatcc ctttagggtt ccgatttagt gctttacggc acctcgaccc caaaaaactt 6900

gattagggtg atggttcacg tagtgggcca tcgccctgat agacggtttt tcgccctttg 6960

acgttggagt ccacgttctt taatagtgga ctcttgttcc aaactggaac aacactcaac 7020

cctatctcgg tctattcttt tgatttataa gggattttgg ggatttcggc ctattggtta 7080

aaaaatgagc tgatttaaca aaaatttaac gcgaattaat tctgtggaat gtgtgtcagt 7140

tagggtgtgg aaagtcccca ggctccccag gcaggcagaa gtatgcaaag catgcatctc 7200

aattagtcag caaccaggtg tggaaagtcc ccaggctccc cagcaggcag aagtatgcaa 7260

agcatgcatc tcaattagtc agcaaccata gtcccgcccc taactccgcc catcccgccc 7320

ctaactccgc ccagttccgc ccattctccg ccccatggct gactaatttt ttttatttat 7380

gcagaggccg aggccgcctc tgcctctgag ctattccaga agtagtgagg aggctttttt 7440

ggaggcctag gcttttgcaa aaagctcccg ggagcttgta tatccatttt cggatctgat 7500

cagcacgtgt tgacaattaa tcatcggcat agtatatcgg catagtataa tacgacaagg 7560

tgaggaacta aaccatggcc aagcctttgt ctcaagaaga atccaccctc attgaaagag 7620

caacggctac aatcaacagc atccccatct ctgaagacta cagcgtcgcc agcgcagctc 7680

tctctagcga cggccgcatc ttcactggtg tcaatgtata tcattttact gggggacctt 7740

gtgcagaact cgtggtgctg ggcactgctg ctgctgcggc agctggcaac ctgacttgta 7800

tcgtcgcgat cggaaatgag aacaggggca tcttgagccc ctgcggacgg tgtcgacagg 7860

tgcttctcga tctgcatcct gggatcaaag cgatagtgaa ggacagtgat ggacagccga 7920

cggcagttgg gattcgtgaa ttgctgccct ctggttatgt gtgggagggc taagcacttc 7980

gtggccgagg agcaggactg acacgtgcta cgagatttcg attccaccgc cgccttctat 8040

gaaaggttgg gcttcggaat cgttttccgg gacgccggct ggatgatcct ccagcgcggg 8100

gatctcatgc tggagttctt cgcccacccc aacttgttta ttgcagctta taatggttac 8160

aaataaagca atagcatcac aaatttcaca aataaagcat ttttttcact gcattctagt 8220

tgtggtttgt ccaaactcat caatgtatct tatcatgtct gtataccgtc gacctctagc 8280

tagagcttgg cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta tccgctcaca 8340

attccacaca acatacgagc cggaagcata aagtgtaaag cctggggtgc ctaatgagtg 8400

agctaactca cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg 8460

tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg tattgggcgc 8520

tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta 8580

tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag 8640

aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc gttgctggcg 8700

tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 8760

tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 8820

cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 8880

agcgtggcgc tttctcaatg ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc 8940

tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 9000

aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 9060

ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 9120

cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct gaagccagtt 9180

accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 9240

ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 9300

ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 9360

gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt 9420

aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt 9480

gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc 9540

gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg 9600

cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc 9660

gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg 9720

gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctaca 9780

ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga 9840

tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct 9900

ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg 9960

cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca 10020

accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaata 10080

cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct 10140

tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact 10200

cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa 10260

acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc 10320

atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga 10380

tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac atttccccga 10440

aaagtgccac ctgacgtc 10458

<210> 13

<211> 9903

<212> DNA

<213> Artificial Sequence

<400> 13

cagatgttgg atcagctctc caaaaacatc actcggtgtg ggctgtccag cggcagcgag 1560

acccccggta ccagcgagag cgccaccccc gagagcgaca agaaatactc tattggactg 1620

gctatcggga caaactccgt tggctgggcc gtcataaccg acgagtataa ggtgccaagc 1680

aagaaattca aggtgctggg taatactgac cgccattcaa tcaagaagaa cctgatcgga 1740

gcactcctct tcgactccgg tgaaaccgct gaagctactc ggctgaagcg gaccgcaagg 1800

cggagataca cccgccgcaa gaatcggata tgttatctgc aagagatctt tagcaacgaa 1860

atggctaagg tggacgactc cttctttcac cgcctggaag agagctttct ggtggaggag 1920

gataagaaac acgagaggca ccctatattc ggaaatatcg tggatgaggt ggcttaccat 1980

gaaaagtatc ctacaatcta ccatctgagg aagaagctgg tggacagcac cgataaagca 2040

gacctgaggc tcatctatct ggccctggct catatgataa agtttagagg acactttctg 2100

atcgagggcg acctgaatcc cgataattcc gatgtggata aactcttcat tcaactggtg 2160

cagacatata accaactgtt cgaggagaat cccataaacg cttctggtgt ggatgccaag 2220

gctattctgt ccgctcggct gtccaagtca cgcagactgg agaatctgat tgcccaactg 2280

ccaggagaaa agaagaacgg cctgtttggg aacctcatcg ccctgagcct gggcctgaca 2340

cctaacttca agtccaattt tgatctggcc gaagatgcta aactccagct ctccaaggac 2400

acctatgacg atgatctgga caacctgctc gcacagatag gcgaccagta cgccgatctc 2460

tttctggctg ctaagaatct ctccgacgcc attctgctga gcgacatact ccgggtcaac 2520

actgagatca ccaaagcacc tctgagcgcc tccatgataa aacgctatga tgaacaccat 2580

caagacctga ctctgctcaa agccctcgtg aggcaacagc tgccagagaa gtacaaagag 2640

atattcttcg accagagcaa gaatggatat gccggataca tcgatggcgg agcatcacag 2700

gaagaatttt acaagttcat caaaccaatc ctcgagaaga tggacggtac tgaagagctg 2760

ctggtgaagc tgaacaggga ggacctgctg aggaagcaga ggacctttga taatggctcc 2820

attccacatc agatacacct gggagagctg catgcaatcc tccgcaggca ggaggatttc 2880

tatcctttcc tgaaggataa ccgggagaag atagagaaga tcctgacctt caggatccct 2940

tattacgtcg gccctctggc tagaggcaac tcccgcttcg cttggatgac caggaaatct 3000

gaggagacaa ttactccttg gaacttcgaa gaggtcgtgg ataagggcgc aagcgcccag 3060

tcattcatcg aacggatgac caatttcgat aagaacctgc caaacgagaa ggtcctgccc 3120

aaacattcac tcctgtacga gtatttcacc gtctataacg agctgactaa agtgaagtac 3180

gtgaccgagg gcatgaggaa gcctgccttc ctgtccggag agcagaagaa ggctatcgtt 3240

gatctgctct tcaagactaa tagaaaggtg acagtgaagc agctcaagga ggattacttt 3300

aagaagatcg aatgctttga ctcagtggaa atctctggcg tggaggaccg ctttaatgcc 3360

agcctgggca cttaccatga tctgctgaag ataatcaaag acaaagattt cctcgataat 3420

gaggagaacg aggacatcct ggaagatatc gtgctgaccc tgactctgtt cgaggataga 3480

gagatgatcg aagagcgcct gaagacctat gcccatctgt ttgacgataa agtcatgaaa 3540

cagctcaagc ggcggcgcta cactgggtgg ggtagactct ccaggaaact cataaacggc 3600

atccgcgaca aacagagcgg aaagaccatc ctggatttcc tgaaatccga cggattcgct 3660

aacaggaact tcatgcaact gattcacgat gactctctga catttaaaga ggacatccag 3720

aaggcacagg tgagcggtca aggcgacagc ctgcacgagc acatcgccaa cctcgctgga 3780

tcacccgcca taaagaaggg aatactgcag acagtcaagg tcgtggacga actcgtcaaa 3840

gtgatgggtc ggcacaagcc agagaatatc gttatcgaaa tggcaaggga gaaccaaacc 3900

acccagaagg gccagaagaa ctctcgggaa cggatgaaaa gaatcgaaga gggaattaag 3960

gagctgggat ctcagatact gaaggagcac cctgtggaga atacacagct ccagaacgag 4020

aaactctacc tgtactacct ccagaacggg cgggacatgt acgttgacca ggaactcgac 4080

atcaaccggc tgtccgatta tgacgtggac gctattgttc cacagtcctt cctcaaagat 4140

gactccattg acaacaaggt gctgaccaga tccgataagg cccgcggtaa gtctgacaat 4200

gttccatcag aagaggtggt caagaagatg aagaattact ggcggcagct cctcaacgcc 4260

aaactgatca cccagcggaa gtttgacaat ctgactaagg cagaaagagg aggtctgagc 4320

gaactcgaca aggccggctt tattaagagg caactggtcg aaacacgcca gattaccaaa 4380

cacgtggcac aaatcctcga ctctaggatg aacactaagt acgatgagaa cgataagctg 4440

atcagggaag tgaaagtgat aactctgaag agcaagctgg tgtctgactt ccggaaggac 4500

tttcaattct acaaagttcg cgaaataaac aattaccatc atgctcacga tgcctatctc 4560

aatgctgtcg ttggcaccgc cctgatcaag aaatacccta aactggagtc tgagttcgtg 4620

tacggtgact ataaagtcta cgatgtgagg aagatgatag caaagtctga gcaagagatt 4680

ggcaaagcca ccgccaagta cttcttctac tctaatatca tgaatttctt taagactgag 4740

ataaccctgg ctaacggcga aatccggaag cgcccactga tcgaaacaaa cggagaaaca 4800

ggagaaatcg tgtgggataa aggcagggac ttcgcaactg tgcggaaggt gctgtccatg 4860

ccacaagtca atatcgtgaa gaagaccgaa gtgcagaccg gcggattctc aaaggagagc 4920

atcctgccaa agcggaactc tgacaagctg atcgccagga agaaagattg ggacccaaag 4980

aagtatggcg gtttcgattc ccctacagtg gcttattccg ttctggtcgt ggcaaaagtg 5040

gagaaaggca agtccaagaa actcaagtct gttaaggagc tgctcggaat tactattatg 5100

gagagatcca gcttcgagaa gaatccaatc gatttcctgg aagctaaggg ctataaagaa 5160

gtgaagaaag atctcatcat caaactgccc aagtactctc tctttgagct ggagaatggt 5220

aggaagcgga tgctggcctc cgccggagag ctgcagaaag gaaacgagct ggctctgccc 5280

tccaaatacg tgaacttcct gtatctggcc tcccactacg agaaactcaa aggtagccct 5340

gaagacaatg agcagaagca actctttgtt gagcaacata aacactacct ggacgaaatc 5400

attgaacaga ttagcgagtt cagcaagcgg gttattctgg ccgatgcaaa cctcgataaa 5460

gtgctgagcg catataataa gcacagggac aagccaattc gcgaacaagc agagaatatt 5520

atccacctct ttactctgac taatctgggc gctcctgctg ccttcaagta tttcgataca 5580

actattgaca ggaagcggta cacctctacc aaagaagttc tcgatgccac cctgatacac 5640

cagtcaatta ccggactgta cgagactcgc atcgacctgt ctcagctcgg cggcgacggt 5700

tctcccaaga agaagaggaa agtctcgagc ggtggagctg caggatagga attcgggccc 5760

ttcgaaggta agcctatccc taaccctctc ctcggtctcg attctacgcg taccggtcat 5820

catcaccatc accattgagt ttaaacccgc tgatcagcct cgactgtgcc ttctagttgc 5880

cagccatctg ttgtttgccc ctcccccgtg ccttccttga ccctggaagg tgccactccc 5940

actgtccttt cctaataaaa tgaggaaatt gcatcgcatt gtctgagtag gtgtcattct 6000

attctggggg gtggggtggg gcaggacagc aagggggagg attgggaaga caatagcagg 6060

catgctgggg atgcggtggg ctctatggct tctgaggcgg aaagaaccag ctggggctct 6120

agggggtatc cccacgcgcc ctgtagcggc gcattaagcg cggcgggtgt ggtggttacg 6180

cgcagcgtga ccgctacact tgccagcgcc ctagcgcccg ctcctttcgc tttcttccct 6240

tcctttctcg ccacgttcgc cggctttccc cgtcaagctc taaatcgggg catcccttta 6300

gggttccgat ttagtgcttt acggcacctc gaccccaaaa aacttgatta gggtgatggt 6360

tcacgtagtg ggccatcgcc ctgatagacg gtttttcgcc ctttgacgtt ggagtccacg 6420

ttctttaata gtggactctt gttccaaact ggaacaacac tcaaccctat ctcggtctat 6480

tcttttgatt tataagggat tttggggatt tcggcctatt ggttaaaaaa tgagctgatt 6540

taacaaaaat ttaacgcgaa ttaattctgt ggaatgtgtg tcagttaggg tgtggaaagt 6600

ccccaggctc cccaggcagg cagaagtatg caaagcatgc atctcaatta gtcagcaacc 6660

aggtgtggaa agtccccagg ctccccagca ggcagaagta tgcaaagcat gcatctcaat 6720

tagtcagcaa ccatagtccc gcccctaact ccgcccatcc cgcccctaac tccgcccagt 6780

tccgcccatt ctccgcccca tggctgacta atttttttta tttatgcaga ggccgaggcc 6840

gcctctgcct ctgagctatt ccagaagtag tgaggaggct tttttggagg cctaggcttt 6900

tgcaaaaagc tcccgggagc ttgtatatcc attttcggat ctgatcagca cgtgttgaca 6960

attaatcatc ggcatagtat atcggcatag tataatacga caaggtgagg aactaaacca 7020

tggccaagcc tttgtctcaa gaagaatcca ccctcattga aagagcaacg gctacaatca 7080

acagcatccc catctctgaa gactacagcg tcgccagcgc agctctctct agcgacggcc 7140

gcatcttcac tggtgtcaat gtatatcatt ttactggggg accttgtgca gaactcgtgg 7200

tgctgggcac tgctgctgct gcggcagctg gcaacctgac ttgtatcgtc gcgatcggaa 7260

atgagaacag gggcatcttg agcccctgcg gacggtgtcg acaggtgctt ctcgatctgc 7320

atcctgggat caaagcgata gtgaaggaca gtgatggaca gccgacggca gttgggattc 7380

gtgaattgct gccctctggt tatgtgtggg agggctaagc acttcgtggc cgaggagcag 7440

gactgacacg tgctacgaga tttcgattcc accgccgcct tctatgaaag gttgggcttc 7500

ggaatcgttt tccgggacgc cggctggatg atcctccagc gcggggatct catgctggag 7560

ttcttcgccc accccaactt gtttattgca gcttataatg gttacaaata aagcaatagc 7620

atcacaaatt tcacaaataa agcatttttt tcactgcatt ctagttgtgg tttgtccaaa 7680

ctcatcaatg tatcttatca tgtctgtata ccgtcgacct ctagctagag cttggcgtaa 7740

tcatggtcat agctgtttcc tgtgtgaaat tgttatccgc tcacaattcc acacaacata 7800

cgagccggaa gcataaagtg taaagcctgg ggtgcctaat gagtgagcta actcacatta 7860

attgcgttgc gctcactgcc cgctttccag tcgggaaacc tgtcgtgcca gctgcattaa 7920

tgaatcggcc aacgcgcggg gagaggcggt ttgcgtattg ggcgctcttc cgcttcctcg 7980

ctcactgact cgctgcgctc ggtcgttcgg ctgcggcgag cggtatcagc tcactcaaag 8040

gcggtaatac ggttatccac agaatcaggg gataacgcag gaaagaacat gtgagcaaaa 8100

ggccagcaaa aggccaggaa ccgtaaaaag gccgcgttgc tggcgttttt ccataggctc 8160

cgcccccctg acgagcatca caaaaatcga cgctcaagtc agaggtggcg aaacccgaca 8220

ggactataaa gataccaggc gtttccccct ggaagctccc tcgtgcgctc tcctgttccg 8280

accctgccgc ttaccggata cctgtccgcc tttctccctt cgggaagcgt ggcgctttct 8340

caatgctcac gctgtaggta tctcagttcg gtgtaggtcg ttcgctccaa gctgggctgt 8400

gtgcacgaac cccccgttca gcccgaccgc tgcgccttat ccggtaacta tcgtcttgag 8460

tccaacccgg taagacacga cttatcgcca ctggcagcag ccactggtaa caggattagc 8520

agagcgaggt atgtaggcgg tgctacagag ttcttgaagt ggtggcctaa ctacggctac 8580

actagaagga cagtatttgg tatctgcgct ctgctgaagc cagttacctt cggaaaaaga 8640

gttggtagct cttgatccgg caaacaaacc accgctggta gcggtggttt ttttgtttgc 8700

aagcagcaga ttacgcgcag aaaaaaagga tctcaagaag atcctttgat cttttctacg 8760

gggtctgacg ctcagtggaa cgaaaactca cgttaaggga ttttggtcat gagattatca 8820

aaaaggatct tcacctagat ccttttaaat taaaaatgaa gttttaaatc aatctaaagt 8880

atatatgagt aaacttggtc tgacagttac caatgcttaa tcagtgaggc acctatctca 8940

gcgatctgtc tatttcgttc atccatagtt gcctgactcc ccgtcgtgta gataactacg 9000

atacgggagg gcttaccatc tggccccagt gctgcaatga taccgcgaga cccacgctca 9060

ccggctccag atttatcagc aataaaccag ccagccggaa gggccgagcg cagaagtggt 9120

cctgcaactt tatccgcctc catccagtct attaattgtt gccgggaagc tagagtaagt 9180

agttcgccag ttaatagttt gcgcaacgtt gttgccattg ctacaggcat cgtggtgtca 9240

cgctcgtcgt ttggtatggc ttcattcagc tccggttccc aacgatcaag gcgagttaca 9300

tgatccccca tgttgtgcaa aaaagcggtt agctccttcg gtcctccgat cgttgtcaga 9360tgatccccca tgttgtgcaa aaaagcggtt agctccttcg gtcctccgat cgttgtcaga 9360

agtaagttgg ccgcagtgtt atcactcatg gttatggcag cactgcataa ttctcttact 9420agtaagttgg ccgcagtgtt atcactcatg gttatggcag cactgcataa ttctcttact 9420

gtcatgccat ccgtaagatg cttttctgtg actggtgagt actcaaccaa gtcattctga 9480gtcatgccat ccgtaagatg cttttctgtg actggtgagt actcaaccaa gtcattctga 9480

gaatagtgta tgcggcgacc gagttgctct tgcccggcgt caatacggga taataccgcg 9540gaatagtgta tgcggcgacc gagttgctct tgcccggcgt caatacggga taataccgcg 9540

ccacatagca gaactttaaa agtgctcatc attggaaaac gttcttcggg gcgaaaactc 9600ccacatagca gaactttaaa agtgctcatc attggaaaac gttcttcggg gcgaaaactc 9600

tcaaggatct taccgctgtt gagatccagt tcgatgtaac ccactcgtgc acccaactga 9660tcaaggatct taccgctgtt gagatccagt tcgatgtaac ccactcgtgc acccaactga 9660

tcttcagcat cttttacttt caccagcgtt tctgggtgag caaaaacagg aaggcaaaat 9720tcttcagcat cttttacttt caccagcgtt tctgggtgag caaaaacagg aaggcaaaat 9720

gccgcaaaaa agggaataag ggcgacacgg aaatgttgaa tactcatact cttccttttt 9780gccgcaaaaa agggaataag ggcgacacgg aaatgttgaa tactcatact cttccttttt 9780

caatattatt gaagcattta tcagggttat tgtctcatga gcggatacat atttgaatgt 9840caatattatt gaagcattta tcagggttat tgtctcatga gcggatacat atttgaatgt 9840

atttagaaaa ataaacaaat aggggttccg cgcacatttc cccgaaaagt gccacctgac 9900atttagaaaa ataaacaaat aggggttccg cgcacatttc cccgaaaagt gccacctgac 9900

gtc 9903

<210> 14

<211> 9933

<212> DNA

<213> Artificial Sequence

<400> 14

aacatcactc ggtgtgggct gtccgggccc ttcgaaggta agcctatccc taaccctctc 5820

ctcggtctcg attctacgcg taccggtcat catcaccatc accattgagt ttaaacccgc 5880

tgatcagcct cgactgtgcc ttctagttgc cagccatctg ttgtttgccc ctcccccgtg 5940

ccttccttga ccctggaagg tgccactccc actgtccttt cctaataaaa tgaggaaatt 6000

gcatcgcatt gtctgagtag gtgtcattct attctggggg gtggggtggg gcaggacagc 6060

aagggggagg attgggaaga caatagcagg catgctgggg atgcggtggg ctctatggct 6120

tctgaggcgg aaagaaccag ctggggctct agggggtatc cccacgcgcc ctgtagcggc 6180

gcattaagcg cggcgggtgt ggtggttacg cgcagcgtga ccgctacact tgccagcgcc 6240

ctagcgcccg ctcctttcgc tttcttccct tcctttctcg ccacgttcgc cggctttccc 6300

cgtcaagctc taaatcgggg catcccttta gggttccgat ttagtgcttt acggcacctc 6360

gaccccaaaa aacttgatta gggtgatggt tcacgtagtg ggccatcgcc ctgatagacg 6420

gtttttcgcc ctttgacgtt ggagtccacg ttctttaata gtggactctt gttccaaact 6480

ggaacaacac tcaaccctat ctcggtctat tcttttgatt tataagggat tttggggatt 6540

tcggcctatt ggttaaaaaa tgagctgatt taacaaaaat ttaacgcgaa ttaattctgt 6600

ggaatgtgtg tcagttaggg tgtggaaagt ccccaggctc cccaggcagg cagaagtatg 6660

caaagcatgc atctcaatta gtcagcaacc aggtgtggaa agtccccagg ctccccagca 6720

ggcagaagta tgcaaagcat gcatctcaat tagtcagcaa ccatagtccc gcccctaact 6780

ccgcccatcc cgcccctaac tccgcccagt tccgcccatt ctccgcccca tggctgacta 6840

atttttttta tttatgcaga ggccgaggcc gcctctgcct ctgagctatt ccagaagtag 6900

tgaggaggct tttttggagg cctaggcttt tgcaaaaagc tcccgggagc ttgtatatcc 6960

attttcggat ctgatcagca cgtgttgaca attaatcatc ggcatagtat atcggcatag 7020

tataatacga caaggtgagg aactaaacca tggccaagcc tttgtctcaa gaagaatcca 7080

ccctcattga aagagcaacg gctacaatca acagcatccc catctctgaa gactacagcg 7140

tcgccagcgc agctctctct agcgacggcc gcatcttcac tggtgtcaat gtatatcatt 7200

ttactggggg accttgtgca gaactcgtgg tgctgggcac tgctgctgct gcggcagctg 7260

gcaacctgac ttgtatcgtc gcgatcggaa atgagaacag gggcatcttg agcccctgcg 7320

gacggtgtcg acaggtgctt ctcgatctgc atcctgggat caaagcgata gtgaaggaca 7380

gtgatggaca gccgacggca gttgggattc gtgaattgct gccctctggt tatgtgtggg 7440

agggctaagc acttcgtggc cgaggagcag gactgacacg tgctacgaga tttcgattcc 7500

accgccgcct tctatgaaag gttgggcttc ggaatcgttt tccgggacgc cggctggatg 7560

atcctccagc gcggggatct catgctggag ttcttcgccc accccaactt gtttattgca 7620

gcttataatg gttacaaata aagcaatagc atcacaaatt tcacaaataa agcatttttt 7680

tcactgcatt ctagttgtgg tttgtccaaa ctcatcaatg tatcttatca tgtctgtata 7740

ccgtcgacct ctagctagag cttggcgtaa tcatggtcat agctgtttcc tgtgtgaaat 7800

tgttatccgc tcacaattcc acacaacata cgagccggaa gcataaagtg taaagcctgg 7860

ggtgcctaat gagtgagcta actcacatta attgcgttgc gctcactgcc cgctttccag 7920

tcgggaaacc tgtcgtgcca gctgcattaa tgaatcggcc aacgcgcggg gagaggcggt 7980

ttgcgtattg ggcgctcttc cgcttcctcg ctcactgact cgctgcgctc ggtcgttcgg 8040

ctgcggcgag cggtatcagc tcactcaaag gcggtaatac ggttatccac agaatcaggg 8100

gataacgcag gaaagaacat gtgagcaaaa ggccagcaaa aggccaggaa ccgtaaaaag 8160

gccgcgttgc tggcgttttt ccataggctc cgcccccctg acgagcatca caaaaatcga 8220

cgctcaagtc agaggtggcg aaacccgaca ggactataaa gataccaggc gtttccccct 8280

ggaagctccc tcgtgcgctc tcctgttccg accctgccgc ttaccggata cctgtccgcc 8340

tttctccctt cgggaagcgt ggcgctttct caatgctcac gctgtaggta tctcagttcg 8400

gtgtaggtcg ttcgctccaa gctgggctgt gtgcacgaac cccccgttca gcccgaccgc 8460

tgcgccttat ccggtaacta tcgtcttgag tccaacccgg taagacacga cttatcgcca 8520

ctggcagcag ccactggtaa caggattagc agagcgaggt atgtaggcgg tgctacagag 8580

ttcttgaagt ggtggcctaa ctacggctac actagaagga cagtatttgg tatctgcgct 8640

ctgctgaagc cagttacctt cggaaaaaga gttggtagct cttgatccgg caaacaaacc 8700

accgctggta gcggtggttt ttttgtttgc aagcagcaga ttacgcgcag aaaaaaagga 8760

tctcaagaag atcctttgat cttttctacg gggtctgacg ctcagtggaa cgaaaactca 8820

cgttaaggga ttttggtcat gagattatca aaaaggatct tcacctagat ccttttaaat 8880

taaaaatgaa gttttaaatc aatctaaagt atatatgagt aaacttggtc tgacagttac 8940

caatgcttaa tcagtgaggc acctatctca gcgatctgtc tatttcgttc atccatagtt 9000

gcctgactcc ccgtcgtgta gataactacg atacgggagg gcttaccatc tggccccagt 9060

gctgcaatga taccgcgaga cccacgctca ccggctccag atttatcagc aataaaccag 9120

ccagccggaa gggccgagcg cagaagtggt cctgcaactt tatccgcctc catccagtct 9180

attaattgtt gccgggaagc tagagtaagt agttcgccag ttaatagttt gcgcaacgtt 9240

gttgccattg ctacaggcat cgtggtgtca cgctcgtcgt ttggtatggc ttcattcagc 9300

tccggttccc aacgatcaag gcgagttaca tgatccccca tgttgtgcaa aaaagcggtt 9360

agctccttcg gtcctccgat cgttgtcaga agtaagttgg ccgcagtgtt atcactcatg 9420

gttatggcag cactgcataa ttctcttact gtcatgccat ccgtaagatg cttttctgtg 9480

actggtgagt actcaaccaa gtcattctga gaatagtgta tgcggcgacc gagttgctct 9540

tgcccggcgt caatacggga taataccgcg ccacatagca gaactttaaa agtgctcatc 9600

attggaaaac gttcttcggg gcgaaaactc tcaaggatct taccgctgtt gagatccagt 9660

tcgatgtaac ccactcgtgc acccaactga tcttcagcat cttttacttt caccagcgtt 9720

tctgggtgag caaaaacagg aaggcaaaat gccgcaaaaa agggaataag ggcgacacgg 9780

aaatgttgaa tactcatact cttccttttt caatattatt gaagcattta tcagggttat 9840

tgtctcatga gcggatacat atttgaatgt atttagaaaa ataaacaaat aggggttccg 9900

cgcacatttc cccgaaaagt gccacctgac gtc 9933

<210> 15

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 15

accgaatatg tcacattctg tctc 24

<210> 16

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 16

aaacgagaca gaatgtgaca tatt 24

<210> 17

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 17

accgggacta tgggaggtca ctaa 24

<210> 18

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 18

aaacttagtg acctcccata gtcc 24

<210> 19

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 19

accggaaggt tacacagaac caga 24

<210> 20

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 20

aaactctggt tctgtgtaac cttc 24

<210> 21

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 21

accgggccaa gagatatatc ttag 24

<210> 22

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 22

aaacctaaga tatatctctt ggcc 24

<210> 23

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 23

accggtgcca gaagagccaa ggac 24

<210> 24

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 24

aaacgtcctt ggctcttctg gcac 24

<210> 25

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 25

accggtggag ccacacccta gggt 24

<210> 26

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 26

aaacacccta gggtgtggct ccac 24

<210> 27

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 27

aaacggagcg caccatcttc ttca 24

<210> 28

<211> 24

<212> DNA

<213> Artificial Sequence

<400> 28

aaactgaaga agatggtgcg ctcc 24

<210> 29

<211> 82

<212> DNA

<213> Artificial Sequence

<400> 29

gggacctaag aaaaagagga aggtggcggc cgctggcggc agcatgctgg atagggatgt 60

gggtccaact cccatgtatc cg 82

<210> 30

<211> 65

<212> DNA

<213> Artificial Sequence

<400> 30

ctctcggggg tggcgctctc gctggtaccg ggggtctcgc tgccgctctg ggaggcctgt 60

gacgt 65

<210> 31

<211> 84

<212> DNA

<213> Artificial Sequence

<400> 31

gggcgcgctg gaggaggatc cggaggagga tccggaggag gatccatgct ggatagggat 60

gtgggtccaa ctcccatgta tccg 84

<210> 32

<211> 27

<212> DNA

<213> Artificial Sequence

<400> 32

gaagggcccc tgggaggcct gtgacgt 27

<210> 33

<211> 60

<212> DNA

<213> Artificial Sequence

<400> 33

gtggcggccg ctggcggcag catgctggat agggatgtgg gtccaactcc catgtatccg 60

<210> 34

<211> 57

<212> DNA

<213> Artificial Sequence

<400> 34

cgctggtacc gggggtctcg ctgccgctgg acagcccaca ccgagtgatg tttttgg 57

<210> 35

<211> 84

<212> DNA

<213> Artificial Sequence

<400> 35

gtgggtccaa ctcccatgta tccg 84

<210> 36

<211> 35

<212> DNA

<213> Artificial Sequence

<400> 36

tcgaagggcc cggacagccc acaccgagtg atgtt 35

<210> 37

<211> 1767

<212> PRT

<213> Artificial Sequence

<400> 37

1 5 10 15

20 25 30

35 40 45

50 55 60

65 70 75 80

85 90 95

100 105 110

115 120 125

130 135 140

145 150 155 160

165 170 175

180 185 190

195 200 205

210 215 220

225 230 235 240

245 250 255

260 265 270

275 280 285

290 295 300

305 310 315 320

325 330 335

340 345 350

355 360 365

370 375 380

385 390 395 400

405 410 415

420 425 430

435 440 445

450 455 460

465 470 475 480

485 490 495

500 505 510

515 520 525

530 535 540

545 550 555 560

565 570 575

580 585 590

595 600 605

610 615 620

625 630 635 640

645 650 655

660 665 670

675 680 685

690 695 700

705 710 715 720

725 730 735

740 745 750

755 760 765

770 775 780

785 790 795 800

805 810 815

820 825 830

835 840 845

850 855 860

865 870 875 880

885 890 895

900 905 910

915 920 925

930 935 940

945 950 955 960

965 970 975

980 985 990

995 1000 1005

1010 1015 1020

1025 1030 1035 1040

1045 1050 1055

1060 1065 1070

1075 1080 1085

1090 1095 1100

1105 1110 1115 1120

1125 1130 1135

1140 1145 1150

1155 1160 1165

1170 1175 1180

1185 1190 1195 1200

1205 1210 1215

1220 1225 1230

1235 1240 1245

1250 1255 1260

1265 1270 1275 1280

1285 1290 1295

1300 1305 1310

1315 1320 1325

1330 1335 1340

1345 1350 1355 1360

Leu Ser Gln Leu Gly Gly Asp Gly Ser Pro Lys Lys Lys Arg Lys Val

1365 1370 1375

Gly Arg Ala Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser Met

1380 1385 1390

Leu Asp Arg Asp Val Gly Pro Thr Pro Met Tyr Pro Pro Thr Tyr Leu

1395 1400 1405

Glu Pro Gly Ile Gly Arg His Thr Pro Tyr Gly Asn Gln Thr Asp Tyr

1410 1415 1420

Arg Ile Phe Glu Leu Asn Lys Arg Leu Gln Asn Trp Thr Glu Glu Cys

1425 1430 1435 1440

Asp Asn Leu Trp Trp Asp Ala Phe Thr Thr Glu Phe Phe Glu Asp Asp

1445 1450 1455

Ala Met Leu Thr Ile Thr Phe Cys Leu Glu Asp Gly Pro Lys Arg Tyr

1460 1465 1470

Thr Ile Gly Arg Thr Leu Ile Pro Arg Tyr Phe Arg Ser Ile Phe Glu

1475 1480 1485

Gly Gly Ala Thr Glu Leu Tyr Tyr Val Leu Lys His Pro Lys Glu Ala

1490 1495 1500

Phe His Ser Asn Phe Val Ser Leu Asp Cys Asp Gln Gly Ser Met Val

1505 1510 1515 1520

Thr Gln His Gly Lys Pro Met Phe Thr Gln Val Cys Val Glu Gly Arg

1525 1530 1535

Leu Tyr Leu Glu Phe Met Phe Asp Asp Met Met Arg Ile Lys Thr Trp

1540 1545 1550

His Phe Ser Ile Arg Gln His Arg Glu Leu Ile Pro Arg Ser Ile Leu

1555 1560 1565

Ala Met His Ala Gln Asp Pro Gln Met Leu Asp Gln Leu Ser Lys Asn

1570 1575 1580

Ile Thr Arg Cys Gly Leu Ser Asn Ser Thr Leu Asn Tyr Leu Arg Leu

1585 1590 1595 1600

Cys Val Ile Leu Glu Pro Met Gln Glu Leu Met Ser Arg His Lys Thr

1605 1610 1615

Tyr Ser Leu Ser Pro Arg Asp Cys Leu Lys Thr Cys Leu Phe Gln Lys

1620 1625 1630

Trp Gln Arg Met Val Ala Pro Pro Ala Glu Pro Thr Arg Gln Gln Pro

1635 1640 1645

Ser Lys Arg Arg Lys Arg Lys Met Ser Gly Gly Ser Thr Met Ser Ser

1650 1655 1660

Gly Gly Gly Asn Thr Asn Asn Ser Asn Ser Lys Lys Lys Ser Pro Ala

1665 1670 1675 1680

Ser Thr Phe Ala Leu Ser Ser Gln Val Pro Asp Val Met Val Val Gly

1685 1690 1695

Glu Pro Thr Leu Met Gly Gly Glu Phe Gly Asp Glu Asp Glu Arg Leu

1700 1705 1710

Ile Thr Arg Leu Glu Asn Thr Gln Phe Asp Ala Ala Asn Gly Ile Asp

1715 1720 1725

Asp Glu Asp Ser Phe Asn Asn Ser Pro Ala Leu Gly Ala Asn Ser Pro

1730 1735 1740

Trp Asn Ser Lys Pro Pro Ser Ser Gln Glu Ser Lys Ser Glu Asn Pro

1745 1750 1755 1760

Thr Ser Gln Ala Ser Gln Gly

1765

<210> 38

<211> 1767

<212> PRT

<213> Artificial Sequence

<400> 38

1 5 10 15

20 25 30

35 40 45

50 55 60

65 70 75 80

85 90 95

100 105 110

115 120 125

130 135 140

145 150 155 160

165 170 175

180 185 190

195 200 205

210 215 220

225 230 235 240

245 250 255

260 265 270

275 280 285

290 295 300

305 310 315 320

325 330 335

340 345 350

355 360 365

Pro Thr Ser Gln Ala Ser Gln Ser Gly Ser Glu Thr Pro Gly Thr Ser

370 375 380

Glu Ser Ala Thr Pro Glu Ser Asp Lys Lys Tyr Ser Ile Gly Leu Ala

385 390 395 400

Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys

405 410 415

Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser

420 425 430

Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr

435 440 445

Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg

450 455 460

Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met

465 470 475 480

Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu

485 490 495

Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile

500 505 510

Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu

515 520 525

Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile

530 535 540

Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile

545 550 555 560

Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile

565 570 575

Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn

580 585 590

Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys

595 600 605

Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys

610 615 620

Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro

625 630 635 640

Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu

645 650 655

Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile

660 665 670

Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp

675 680 685

Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys

690 695 700

Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln

705 710 715 720

Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys

725 730 735

Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr

740 745 750

Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro

755 760 765

Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn

770 775 780

Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile

785 790 795 800

Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln

805 810 815

Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys

820 825 830

Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly

835 840 845

Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr

850 855 860

Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser

865 870 875 880

Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys

885 890 895

Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn

900 905 910

Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala

915 920 925

Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys

930 935 940

Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys

945 950 955 960

Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg

965 970 975

Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys

980 985 990

Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp

995 1000 1005

Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu

1010 1015 1020

Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln

1025 1030 1035 1040

Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu

1045 1050 1055

Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe

1060 1065 1070

Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His

1075 1080 1085

Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser

1090 1095 1100

Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser

1105 1110 1115 1120

Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu

1125 1130 1135

Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu

1140 1145 1150

Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg

1155 1160 1165

Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln

1170 1175 1180

Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys

1185 1190 1195 1200

Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln

1205 1210 1215

Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp Ala Ile Val

1220 1225 1230

Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr

1235 1240 1245

Arg Ser Asp Lys Ala Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu

1250 1255 1260

Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys

1265 1270 1275 1280

Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly

1285 1290 1295

Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val

1300 1305 1310

Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg

1315 1320 1325

Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys

1330 1335 1340

Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe

1345 1350 1355 1360

Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His AspGln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp

1365 1370 13751365 1370 1375

Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr ProAla Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro

1380 1385 13901380 1385 1390

Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp ValLys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val

1395 1400 14051395 1400 1405

Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr AlaArg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala

1410 1415 14201410 1415 1420

Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu IleLys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile

1425 1430 1435 14401425 1430 1435 1440

Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn

1445 1450 1455

Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr

1460 1465 1470

Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1475 1480 1485

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg

1490 1495 1500

Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys

1505 1510 1515 1520

Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val

1525 1530 1535

Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu

1540 1545 1550

Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro

1555 1560 1565

Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu

1570 1575 1580

Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg

1585 1590 1595 1600

Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu

1605 1610 1615

Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr

1620 1625 1630

Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe

1635 1640 1645

Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser

1650 1655 1660

Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val

1665 1670 1675 1680

Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala

1685 1690 1695

Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala

1700 1705 1710

Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser

1715 1720 1725

Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly

1730 1735 1740

Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Gly Ser

1745 1750 1755 1760

Pro Lys Lys Lys Arg Lys Val

1765

<210> 39

<211> 1592

<212> PRT

<213> Artificial Sequence

<400> 39

[Residues 1-192 are not present in the source text.]

Asn Ile Thr Arg Cys Gly Leu Ser Ser Gly Ser Glu Thr Pro Gly Thr

195 200 205

Ser Glu Ser Ala Thr Pro Glu Ser Asp Lys Lys Tyr Ser Ile Gly Leu

210 215 220

Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr

225 230 235 240

Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His

245 250 255

Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu

260 265 270

Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr

275 280 285

Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu

290 295 300

Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe

305 310 315 320

Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn

325 330 335

Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His

340 345 350

Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu

355 360 365

Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu

370 375 380

Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe

385 390 395 400

Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile

405 410 415

Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser

420 425 430

Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys

435 440 445

Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr

450 455 460

Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln

465 470 475 480

Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln

485 490 495

Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser

500 505 510

Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr

515 520 525

Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His

530 535 540

Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu

545 550 555 560

Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly

565 570 575

Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys

580 585 590

Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu

595 600 605

Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser

610 615 620

Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg

625 630 635 640

Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu

645 650 655

Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg

660 665 670

Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile

675 680 685

Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln

690 695 700

Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu

705 710 715 720

Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr

725 730 735

Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro

740 745 750

Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe

755 760 765

Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe

770 775 780

Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp

785 790 795 800

Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile

805 810 815

Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu

820 825 830

Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu

835 840 845

Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys

850 855 860

Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys

865 870 875 880

Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp

885 890 895

Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile

900 905 910

His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val

915 920 925

Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly

930 935 940

Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp

945 950 955 960

Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile

965 970 975

Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser

980 985 990

Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser

995 1000 1005

Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu

1010 1015 1020

Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp

1025 1030 1035 1040

Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp Ala Ile

1045 1050 1055

Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu

1060 1065 1070

Thr Arg Ser Asp Lys Ala Arg Gly Lys Ser Asp Asn Val Pro Ser Glu

1075 1080 1085

Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala

1090 1095 1100

Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg

1105 1110 1115 1120

Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu

1125 1130 1135

Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser

1140 1145 1150

Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val

1155 1160 1165

Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp

1170 1175 1180

Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His

1185 1190 1195 1200

Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr

1205 1210 1215

Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp

1220 1225 1230

Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr

1235 1240 1245

Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu

1250 1255 1260

Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr

1265 1270 1275 1280

Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala

1285 1290 1295

Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys

1300 1305 1310

Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1315 1320 1325

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys

1330 1335 1340

Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val

1345 1350 1355 1360

Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys

1365 1370 1375

Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn

1380 1385 1390

Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp

1395 1400 1405

Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly

1410 1415 1420

Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu

1425 1430 1435 1440

Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His

1445 1450 1455

Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu

1460 1465 1470

Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile

1475 1480 1485

Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys

1490 1495 1500

Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln

1505 1510 1515 1520

Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro

1525 1530 1535

Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr

1540 1545 1550

Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr

1555 1560 1565

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Gly

1570 1575 1580

Ser Pro Lys Lys Lys Arg Lys Val

1585 1590

<210> 40

<211> 1592

<212> PRT

<213> Artificial Sequence

<400> 40

[Residues 1-1584 are not present in the source text.]

Ile Thr Arg Cys Gly Leu Ser Gly

1585 1590

<210> 41

<211> 16

<212> PRT

<213> Artificial Sequence

<400> 41

Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser

1 5 10 15

<210> 42

<211> 15

<212> PRT

<213> Artificial Sequence

<400> 42

Gly Arg Ala Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser

1 5 10 15
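The short linker entries (SEQ ID No. 41 and 42) can be checked mechanically against their declared <211> lengths. The following is a minimal sketch, not part of the listing itself: the helper names `to_one_letter` and `matches_declared_length` are hypothetical, and it assumes only the 20 standard three-letter residue codes occur.

```python
# Standard three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues: str) -> str:
    """Convert whitespace-separated three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[code] for code in residues.split())


def matches_declared_length(residues: str, declared: int) -> bool:
    """Return True if the residue count equals the <211> value."""
    return len(residues.split()) == declared


# Residue lines copied from SEQ ID No. 41 and 42 above.
SEQ_41 = "Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser"
SEQ_42 = "Gly Arg Ala Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser"

if __name__ == "__main__":
    print(to_one_letter(SEQ_41), matches_declared_length(SEQ_41, 16))
    print(to_one_letter(SEQ_42), matches_declared_length(SEQ_42, 15))
```

The same check can be applied to the longer entries once their residue lines are complete.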

<210> 43

<211> 23

<212> DNA

<213> Artificial Sequence

<400> 43

gnnnnnnnnn nnnnnnnnnn ngg 23
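SEQ ID No. 43 is a 23-nt template: a leading g, twenty n positions, and a terminal gg. A minimal sketch of scanning a DNA string for sites that fit this template follows. It assumes that n stands for any of a, c, g or t and searches the given strand only; the function name `find_template_sites` and the toy input are hypothetical and are not taken from the patent.

```python
import re

# g + 20 arbitrary bases + gg (23 nt in total). The lookahead lets
# overlapping candidate sites be reported as well.
TEMPLATE = re.compile(r"(?=(g[acgt]{20}gg))", re.IGNORECASE)


def find_template_sites(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, 23-nt site) for every match on the given strand."""
    return [(m.start(), m.group(1)) for m in TEMPLATE.finditer(dna)]


if __name__ == "__main__":
    # Toy sequence: two flanking bases, one site fitting the template, two more bases.
    toy = "tt" + "g" + "a" * 20 + "gg" + "cc"
    for start, site in find_template_sites(toy):
        print(start, site, len(site))
```

Scanning the reverse strand would require reverse-complementing the input first, which this sketch leaves out.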

Claims

1. A fusion protein, characterized in that the amino acid sequence of the fusion protein is as shown in one of SEQ ID No. 37-40.

2. A DNA looping system comprising the fusion protein of claim 1, a promoter-targeting sgRNA and an enhancer-targeting sgRNA, wherein

the promoter-targeting sgRNA is located between -100 and -200 bp upstream of the TSS;

and/or, the promoter-targeting sgRNA has the gnnnnnnnnnnnnnnnngg feature;

and/or, the promoter-targeting sgRNA targets the promoter region of the HBB gene;

and/or, the enhancer-targeting sgRNA targets the vicinity of DHS2 of the LCR region of β-globin.

3. The DNA looping system of claim 2, wherein the sequence of the promoter-targeting sgRNA is as shown in SEQ ID No. 4-6.

4. The DNA looping system of claim 2, wherein the sequence of the enhancer-targeting sgRNA is as shown in SEQ ID No. 7-9.

5. An isolated polynucleotide encoding the fusion protein of claim 1.

6. An expression system comprising a host cell capable of expressing the fusion protein of claim 1, and the promoter-targeting sgRNA and the enhancer-targeting sgRNA of claim 2.

7. The expression system of claim 6, wherein the host cell is a eukaryotic cell.

8. The expression system of claim 6, wherein the host cell is selected from a primary cell of metazoan origin or an immortalized cell line.

9. The expression system of claim 6, wherein the host cell is a blood cell line.

10. The expression system of claim 8, wherein the host cell is a human K562 cell.

11. Use of the DNA looping system according to any one of claims 2 to 4, the polynucleotide according to claim 5, or the expression system according to any one of claims 6 to 10 for regulating gene expression for non-disease-therapeutic purposes.

12. The use of claim 11, wherein the gene expression is eukaryotic gene expression.

13. The use according to claim 12, wherein the eukaryote is a metazoan.

14. The use of claim 12, wherein the eukaryote is one or more of human, mouse, nematode and Drosophila.

15. A method of regulating gene expression for non-disease-therapeutic purposes, comprising: regulating gene expression by using the DNA looping system according to any one of claims 2 to 4 to reduce the three-dimensional spatial distance between target sites.

16. The method of claim 15, wherein the method comprises: culturing a host cell capable of expressing a gene of interest under suitable conditions in the presence of the DNA looping system;

and/or, the method is a method for regulating gene expression in vitro.