HK1165508A

HK1165508A - A method for detecting nucleotide sequence of disease-associated nucleic acid molecule in samples under testing

Info

Publication number: HK1165508A
Application number: HK12106351.6A
Authority: HK
Inventors: 魏晓明; 陈洋; 杨光辉; 朱倩; 谢姝琦; 汪建; 王俊; 杨焕明
Original assignee: 深圳华大基因科技有限公司; 深圳华大基因研究院
Filing date: 2012-06-28
Publication date: 2012-10-05

Abstract

The present invention relates to a method for determining the nucleotide sequence of disease-related nucleic acid molecules in a sample to be tested. The method comprises: adding an adaptor to the ends of fragmented double-stranded nucleic acid molecules derived from genomic DNA in the test sample, and enriching them; capturing the adaptor-containing DNA fragments using a nucleic acid chip; and sequencing the captured fragments on a high-throughput sequencing platform. By analyzing the sequencing results against known genetic locus information, the nucleotide sequences of disease-related nucleic acid molecules in the sample can be obtained rapidly and at high throughput, which can be used, for example, for the detection of monogenic diseases. The present invention also provides a nucleic acid chip, usable in the method, on which several to tens of thousands of disease-specific probes are immobilized, as well as a reagent kit comprising the chip.

Description

Method for determining nucleotide sequence of disease-related nucleic acid molecule in sample to be detected

Technical Field

The invention relates to the field of biotechnology, in particular to a method for determining the nucleotide sequence of a disease-related nucleic acid molecule in a sample to be tested. The method comprises the following steps: designing a probe chip specific for various diseases, capturing and enriching specific target DNA fragments carrying an adaptor, performing high-throughput sequencing, analyzing gene mutation site information, and the like.

Background

The completion of genome sequencing for various model organisms has greatly improved the understanding of the pathogenic mechanisms of disease and of the physiological state of organisms at the gene level, and has also greatly promoted the development of second-generation high-throughput sequencing technology. Organisms whose genomes have been sequenced to date include human, mouse, rat, fruit fly, rice, soybean, Arabidopsis, and others. However, because of the cost of sequencing, genome sequencing and the identification and analysis of disease-associated genes in individuals are far from meeting the growing demand.

A monogenic disease is a disease or pathological trait controlled by a pair of alleles, also known as a Mendelian or single-gene genetic disease. More than 6000 monogenic diseases have been found so far. For more than 1700 of them the phenotype is known but the molecular basis is unknown, and among the monogenic diseases whose phenotype and pathogenic molecular basis are both known (more than about 2900), many subtypes remain undiscovered because of genetic heterogeneity. A gene is a genetic unit located on a chromosome. Chromosomes are divided into autosomes and sex chromosomes, and genes can be dominant or recessive, so pathogenic genes located on different chromosomes have different modes of inheritance. Generally, monogenic diseases can be divided into autosomal dominant, autosomal recessive, X-linked dominant, X-linked recessive, and Y-linked diseases.

At present, the detection of monogenic diseases relies mainly on first-generation sequencing and related techniques, chiefly: pedigree analysis, karyotyping, enzymatic reaction and activity assays, RFLP, SSCP (single-strand conformation polymorphism), MALDI-TOF, FISH (fluorescence in situ hybridization), array CGH (array comparative genomic hybridization), qPCR, MLPA (multiplex ligation-dependent probe amplification), the Sanger method, and the like. These methods have many disadvantages. Pedigree analysis, karyotyping, enzymatic activity assays and FISH all detect at the chromosome level, and their accuracy is low. RFLP, SSCP and MALDI-TOF are indirect detection methods and cannot directly reflect changes at a site. Array CGH, qPCR and MLPA can only be used for specific sites and cannot read newly discovered mutation sites; their throughput is very small, and a PCR amplification step must be carried out first. Therefore, although first-generation sequencing based on the Sanger method is currently the gold standard for detecting monogenic diseases, the number of samples that can be sequenced simultaneously is small, the types of monogenic disease detected are limited to one or a few, the sequencing cost is high, and monogenic diseases with multiple known molecular bases cannot be detected simultaneously, which greatly limits the identification of genetic diseases in individuals.

There is currently no efficient method for determining the nucleotide sequence of a disease-associated nucleic acid molecule in a sample to be tested. Therefore, there is an urgent need to develop a novel method for detecting the nucleotide sequence of a disease-associated nucleic acid molecule in an individualized sample, based on the genetic information of known various diseases.

Disclosure of Invention

The invention aims to provide a method for determining the nucleotide sequence of a disease-related nucleic acid molecule in a sample to be detected and application thereof.

Another object of the present invention is to provide a kit for determining the nucleotide sequence of a nucleic acid molecule associated with a disease in a sample to be tested.

In a first aspect of the present invention, there is provided a method for determining the nucleotide sequence of a disease-associated nucleic acid molecule in a sample to be tested, comprising the steps of:

a. providing a sample to be tested, said sample containing broken double-stranded nucleic acid fragments derived from genomic DNA and said nucleic acid fragments having blunt ends;

b. adding an adaptor connecting sequence to the end of the double-stranded nucleic acid fragment of the previous step; and adding adaptors to both ends of the double-stranded nucleic acid fragment by means of the adaptor-joining sequence, wherein the adaptors have a primer binding region and a junction-complementary region, the junction-complementary region being complementary to the adaptor-joining sequence;

c. performing PCR amplification on the DNA double-stranded nucleic acid fragment with the adaptor obtained in step (b) using a first primer and a second primer having an adaptor binding region corresponding to the primer binding region of the adaptor and a sequencing probe binding region outside the adaptor binding region, thereby obtaining a mixture of first PCR amplification products;

d. subjecting the mixture of the first PCR amplification products to single-stranded formation, and blocking regions corresponding to the first primer and the second primer at both ends of the amplification products with blocking molecules, thereby obtaining a mixture of single-stranded amplification products blocked at both ends;

e. capturing disease-associated nucleic acid molecules from said mixture of blocked single-stranded amplification products using a nucleic acid chip;

f. performing PCR amplification on the nucleic acid molecules captured in the previous step by using a third primer and a fourth primer, thereby obtaining a mixture of second PCR amplification products, wherein the third primer and the fourth primer specifically correspond to or are combined with the first primer and the second primer respectively;

g. sequencing the mixture of the second PCR amplification products obtained in the previous step, thereby obtaining the nucleotide sequence of the disease-associated nucleic acid molecule in the sample.

In another preferred embodiment, in step (g), the mixture of the second PCR amplification products is hybridized with sequencing probes immobilized on a solid-phase carrier, and solid-phase bridge PCR amplification is performed to form sequencing clusters; the sequencing clusters are then sequenced by the sequencing-by-synthesis method, thereby obtaining the nucleotide sequence of the disease-related nucleic acid molecule in the sample.

In another preferred embodiment, said disrupted, genomic DNA-derived double stranded nucleic acid fragment of step (a) is of a length: 100-.

In another preferred embodiment, the fragment has a length of 150-500bp, preferably 200-300 bp.

In another preferred embodiment, the nucleic acid fragment has blunt ends prepared by a method of end repair.

In another preferred embodiment, the adaptor-joining sequence in step (b) is poly(N)_n, wherein each N is independently selected from A, T, G or C, and n is any positive integer selected from 1-20.

In another preferred embodiment, the adaptor-joining sequence in step (b) is poly(A)_n, wherein n is a positive integer of 1-20, preferably 1-2.

In another preferred embodiment, the sequence of the junction-complementary region in step (b) is poly(N')_m, wherein each N' is independently selected from A, T, G or C, m is a positive integer from 1 to 20, and poly(N)_n and poly(N')_m are complementary sequences.

In another preferred embodiment, m is any positive integer selected from 1 to 3.

In another preferred embodiment, the length of the junction-complementary region is the same as the length of the adaptor-joining sequence, i.e., poly(N)_n and poly(N')_m are fully complementary sequences.

In another preferred embodiment, the junction-complementary region is poly(T)_m, wherein m is a positive integer of 1-20, preferably 1-2.
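By way of illustration only, the complementarity condition between the sequence added to the fragment ends, poly(N)_n, and the matching region of the adaptor, poly(N')_m, can be sketched in Python as follows. The function names are hypothetical, both sequences are assumed to be written 5' to 3', and exact Watson-Crick pairing over the full length is assumed, which corresponds to the fully complementary embodiment.

```python
# Watson-Crick pairing for DNA bases.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def overhangs_pair(joining_seq: str, adaptor_region: str, max_len: int = 20) -> bool:
    """Check that poly(N)_n on the fragment and poly(N')_m on the adaptor
    are fully complementary: same length, every base paired.

    max_len reflects the 1-20 range stated for n and m.
    """
    n, m = len(joining_seq), len(adaptor_region)
    if not (1 <= n <= max_len and 1 <= m <= max_len):
        return False
    return adaptor_region.upper() == reverse_complement(joining_seq)
```

For the preferred case of a single "A" added to the fragment and a single "T" on the adaptor, `overhangs_pair("A", "T")` evaluates to true, whereas sequences of unequal length are rejected.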

In another preferred embodiment, the first primer and the second primer in step (c) are oligonucleotides having a length of 30-80 bp.

In another preferred embodiment, the first primer and the second primer are 55-65bp in length.

In another preferred embodiment, said first primer and said second primer are different, and/or said third primer and said fourth primer are different.

In another preferred embodiment, the blocking molecule of step (d) blocks 70% to 100% of the region of the first PCR amplification product corresponding to the first primer and the second primer.

In another preferred embodiment, the blocking molecule described in step (d) blocks 100% of the region of the first PCR amplification product corresponding to the first primer and the second primer.

In another preferred example, the nucleic acid chip in the step (e) is immobilized with 5 to 200,000 specific probes corresponding to the disease.

In another preferred embodiment, the types of the specific probes on the chip in step (e) are 50-150,000, more preferably 500-100,000, and most preferably 5000-80,000.

In another preferred embodiment, the sequence of the probe corresponds to the following regions of the disease-causing gene: the exon and/or the 200 bp upstream and downstream of the exon.

In another preferred embodiment, the specific probe has a length of 20-120mer, preferably 50-100mer, more preferably 60-80 mer.
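By way of illustration only, the probe region and probe length of the preceding embodiments can be sketched in Python as follows. The function names, the 0-based end-exclusive coordinates, the 70 mer probe length and the 35 bp step between probe starts are assumptions made for this sketch; only the 200 bp flank and the 60-80 mer range come from the text.

```python
from typing import List, Tuple


def target_region(exon_start: int, exon_end: int, flank: int = 200) -> Tuple[int, int]:
    """Extend an exon interval (0-based, end-exclusive) by `flank` bp on each side."""
    return max(0, exon_start - flank), exon_end + flank


def tile_probes(start: int, end: int, probe_len: int = 70, step: int = 35) -> List[Tuple[int, int]]:
    """Cover [start, end) with fixed-length probe windows.

    Windows start every `step` bp; a final window is added, shifted left,
    so that the last probe ends exactly at `end`. A region no longer than
    one probe is returned as a single window.
    """
    if end - start <= probe_len:
        return [(start, end)]
    windows = [(pos, pos + probe_len) for pos in range(start, end - probe_len + 1, step)]
    if windows[-1][1] < end:
        windows.append((end - probe_len, end))
    return windows


# Hypothetical exon at positions 1000-1150 of some reference sequence.
region = target_region(1000, 1150)
probes = tile_probes(*region)
```

The overlap between neighbouring windows is a design choice of the sketch; a real chip design would also take sequence uniqueness and hybridization behaviour into account.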

In another preferred embodiment, the specific probe is fully synthesized or synthesized by in vitro cloning.

In another preferred embodiment, the third primer and the fourth primer in step (f) are specifically bound to the outside of the first primer and the second primer, respectively, and have a length smaller than that of the first primer and the second primer.

In another preferred embodiment, the length of the third primer and the fourth primer is 15-40bp, preferably 20-25 bp.

In another preferred embodiment, the sample is derived from a human, an animal, a plant, or a microorganism.

In another preferred embodiment, the sample to be tested is derived from a human or non-human mammal, preferably from a human.

In another preferred embodiment, the sample to be tested contains human genomic DNA.

In another preferred embodiment, the disease is a Mendelian monogenic disease.

In another preferred embodiment, the disease is selected from the group consisting of: familial adenomatous polyposis, achondroplasia, familial hypercholesterolemia, polydactyly, Marfan syndrome, hereditary chorea, alopecia, phenylketonuria, cystinuria, hereditary high myopia, vitamin D-resistant rickets, hereditary nephritis, hemophilia, thalassemia, tuberous sclerosis syndrome, Duchenne muscular dystrophy, progressive muscular dystrophy, polycystic kidney syndrome, sex reversal due to mutation of the sex-determining gene, or a combination thereof.

In a second aspect of the invention, there is provided a kit for determining the nucleotide sequence of a disease-associated nucleic acid molecule in a sample to be tested, which kit is useful in the method of the first aspect of the invention, said kit comprising:

(1) a first container and a nucleic acid chip located in the container;

(2) a second container and an adaptor located within the container;

(3) a third container and a primer located within the container selected from the group consisting of: (a) a first primer and/or a second primer; or (b) a third primer and/or a fourth primer;

(4) a fourth container and a blocking molecule located within the container;

(5) instructions for detection.

In another preferred embodiment, the kit further comprises a reagent selected from the group consisting of: reagents required for performing a PCR amplification, reagents required for performing a blocking reaction, reagents required for performing a hybridization reaction, or a combination thereof.

In another preferred embodiment, the disease is a Mendelian monogenic disease.

In another preferred embodiment, the nucleic acid chip has one or more probes immobilized on the surface thereof, wherein the probes are selected from the group consisting of:

Probe 1: sequence shown in SEQ ID NO: 7, captures position 112073411, detects familial adenomatous polyposis;

Probe 2: sequence shown in SEQ ID NO: 8, captures position 51479999, detects polycystic kidney syndrome;

Probe 3: sequence shown in SEQ ID NO: , captures position 135766620, detects tuberous sclerosis syndrome;

Probe 4: sequence shown in SEQ ID NO: 10, captures position 103231969, detects phenylketonuria;

Probe 5: sequence shown in SEQ ID NO: 11, captures position 48700368, detects Marfan syndrome;

Probe 6: sequence shown in SEQ ID NO: 12, captures position 31137199, detects Duchenne muscular dystrophy.

It is to be understood that, within the scope of the present invention, the above-described features of the present invention and those specifically described below (e.g., in the examples) may be combined with each other to form new or preferred embodiments. For reasons of space, they are not described one by one here.

Drawings

The following drawings are included to illustrate specific embodiments of the invention and are not intended to limit the scope of the invention as defined by the claims.

FIG. 1 shows a flow chart of the present invention for simultaneous detection of multiple monogenic disorders.

Detailed Description

The present inventors, after extensive and intensive study, have for the first time established a method for determining the nucleotide sequence of a disease-associated nucleic acid molecule in a sample to be tested. Specifically, based on information on existing disease genes, the present inventors designed a nucleic acid chip on which a plurality of disease-specific probes are immobilized; added an adaptor to the ends of fragmented double-stranded nucleic acid molecules derived from genomic DNA in the sample to be tested, and enriched them; captured the adaptor-containing DNA fragments with the nucleic acid chip; sequenced the captured fragments on a high-throughput sequencing platform; and analyzed the sequencing results based on known gene locus information to obtain the nucleotide sequence of the disease-related nucleic acid molecules in the sample.

Terms

As used herein, the term "comprising" includes "having", "consists essentially of", and "consists of".

Monogenic diseases

As used herein, the term "monogenic disease" refers to a disease or pathological trait controlled by a pair of alleles, also known as a Mendelian hereditary disease, which can be classified as an autosomal dominant, autosomal recessive, X-linked, or Y-linked hereditary disease.

The causative gene of an autosomal dominant hereditary disease is located on an autosome. Common subtypes are: complete dominance: homozygous and heterozygous patients are phenotypically indistinguishable; incomplete dominance: the phenotype of the heterozygote lies between that of the dominant homozygous patient and that of a normal individual, and is often a mild form of the disease; irregular dominance: for some reason the dominant gene in the heterozygote may not produce the corresponding symptoms; codominance: there is no dominant-recessive distinction between the alleles, and both genes act in the heterozygote; delayed dominance: the heterozygote does not express the dominant gene early in life and expresses it only after a certain age; sex-influenced dominance: the expression of the heterozygote is influenced by sex, so that the corresponding phenotype appears in one sex and not in the other. In an autosomal recessive genetic disorder, the causative gene on the autosome does not produce the corresponding disease in the heterozygous state and causes disease only in the homozygous state. A disease-causing gene located on the X chromosome is inherited along with the X chromosome, either as X-linked dominant or as X-linked recessive inheritance. A disease-causing gene located on the Y chromosome is inherited along with the Y chromosome.

Suitable monogenic diseases for use in the screening methods of the present invention include, but are not limited to: familial adenomatous polyposis, achondroplasia, familial hypercholesterolemia, polydactyly, Marfan syndrome, hereditary chorea, alopecia, phenylketonuria, cystinuria, hereditary high myopia, vitamin D-resistant rickets, hereditary nephritis, hemophilia, thalassemia, tuberous sclerosis syndrome, Duchenne muscular dystrophy, progressive muscular dystrophy, polycystic kidney syndrome, sex reversal due to mutation of the sex-determining gene, or a combination thereof.

Exon(s)

As used herein, the term "exon" refers to the portion of a gene that is retained in the mature mRNA, i.e., the portion of the gene corresponding to the mature mRNA. Introns are the portions that are spliced out during mRNA processing and are not present in the mature mRNA. Both exons and introns are defined with respect to genes: the coding part is the exon and the non-coding part is the intron, and the intron has no genetic effect.

Probe

As used herein, the term "probe" refers to a simple DNA or RNA molecule capable of detecting a complementary nucleic acid sequence. The probe must be pure and free of other nucleic acids of different sequence. Typically, a cloned DNA sequence, DNA obtained by PCR amplification, an artificially synthesized oligonucleotide, or RNA obtained by in vitro transcription of a cloned DNA sequence may be used as the probe. The probe length may be 20-120mer, preferably 50-100mer, more preferably 60-80mer. Methods for designing and synthesizing probes are well known to those skilled in the art; here, probes are designed based on the exons of known pathogenic genes of monogenic diseases and the sequences flanking them on both sides (preferably about 200 bp). In a preferred embodiment, the probe is 50-80mer in length. Probes can be chemically synthesized, or commercially available probes can be used. A typical probe sequence is shown in Table 2.

Chip

As used herein, the term "chip" refers to a material containing a large number of probes, which is obtained by processing various fine structures on a substrate material of a chip by micromachining, applying necessary biochemical substances and performing surface treatment to immobilize various probe molecules on a surface.

Chips can be obtained by a person skilled in the art using common methods. There are generally four methods for preparing DNA chips. The first is light-directed in situ synthesis, which combines the photolithography process of micromachining technology with photochemical synthesis. The second is a chemical jetting method, in which pre-synthesized oligonucleotide probes are sprayed onto and immobilized at fixed positions on the chip. The third is a contact spotting method, in which a high-speed precision robot moves a spotting head into contact with a glass chip to deposit DNA probes on it. In the fourth method, DNA probes are synthesized in parallel on the chip using four piezoelectric nozzles, each containing one of the A, T, G and C nucleosides.

The invention provides a nucleic acid chip with a surface fixed with probes corresponding to specific sequences of known genes, the types of the probes on the surface of the chip can reach tens of thousands, and the chip can detect a plurality of diseases for one same sample to be detected.

DNA library and preparation thereof

As used herein, the term "DNA library preparation" refers to the disruption of a desired segment of a genome to obtain a mixture of DNA fragments of a defined size.

Methods for preparing libraries are well known to those skilled in the art and include, but are not limited to, the steps of:

1. providing a sample to be tested, said sample containing broken double-stranded nucleic acid fragments derived from genomic DNA and said nucleic acid fragments having blunt ends;

2. adding an adaptor-joining sequence to the ends of the double-stranded nucleic acid fragments of the previous step, and adding adaptors to both ends of the double-stranded nucleic acid fragments by means of the adaptor-joining sequence, wherein the adaptors have a primer binding region and a junction-complementary region, the junction-complementary region being complementary to the adaptor-joining sequence; the primer binding regions of the adaptors flanking the 3' and 5' ends differ in sequence;

3. amplifying the adaptor-carrying double-stranded nucleic acid fragments obtained in the previous step with a first primer and a second primer, thereby obtaining a mixture of PCR amplification products, wherein the primers have an adaptor binding region corresponding to the primer binding region of the adaptor and a sequencing probe binding region located outside the adaptor binding region.

In a preferred embodiment, the cleavage product, the end-repair product, the linker product and the enrichment product may also be purified. Purification conditions and parameters are well known to those skilled in the art, and it is within the ability of those skilled in the art to make certain changes or optimizations to the reaction conditions.

Exon capture

As used herein, the terms "exon capture", "chip hybridization" and "hybridization" are used interchangeably and refer to the process of specifically selecting and binding DNA fragments in a library containing regions of target exons to a chip with disease-specific probes.

DNA molecules are normally double-stranded, so before capture the DNA must be made single-stranded. This is typically done by heat denaturation, which melts the duplex, followed by rapid cooling so that the melted DNA remains single-stranded. The denatured library is then hybridized with the chip on the hybridization platform for capture. Molecular hybridization between the DNA fragments containing the target exon regions and the probes immobilized on the chip is performed under stringent conditions. Preferably, the concentration of probe molecules on the chip is much higher than the concentration of target molecules. After hybridization, the captured sequences are eluted by methods such as denaturation and purified to obtain a mixture of captured sequences.

Exon capture, and elution and purification of the desired fragments, can be carried out by a person skilled in the art using general methods, or using commercially available kits (e.g., the MinElute PCR Purification Kit from Qiagen, Germany).

In a preferred embodiment, the mixture of PCR amplification products of the DNA library to be tested is rendered single-stranded, and blocking molecules are used to block the regions of the amplification products corresponding to the first primer and the second primer, thereby obtaining a mixture of single-stranded amplification products blocked at both ends; disease-associated nucleic acid molecules are captured from said mixture of blocked single-stranded amplification products using a nucleic acid chip; the captured nucleic acid molecules are amplified with a third primer and a fourth primer, which specifically bind to said first primer and said second primer, respectively, thereby obtaining a mixture of second PCR amplification products; and the mixture of second PCR amplification products is sequenced, thereby obtaining the nucleotide sequence of the disease-associated nucleic acid molecule in the sample.

Primer

As used herein, the term "primer" is a generic term for an oligonucleotide that pairs complementarily with a template and, through the action of a DNA polymerase, allows synthesis of a DNA strand complementary to the template. The primer can be natural RNA or DNA, or any form of natural nucleotide, and can even consist of non-natural nucleotides such as LNA or ZNA.

A primer is "substantially" complementary to a particular sequence on one strand of the template. The primer must be sufficiently complementary to one strand of the template to begin extension, but the sequence of the primer need not be completely complementary to the sequence of the template. For example, a primer that is complementary to the template at its 3' end and has a sequence that is not complementary to the template at its 5' end remains substantially complementary to the template. Primers that are not perfectly complementary can also form a primer-template complex with the template, so long as the primer binds the template sufficiently for amplification to occur.
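By way of illustration only, the notion of a primer whose 3' end is complementary to the template while its 5' end is not can be sketched in Python as follows. The function names and the 12-base anchor length are hypothetical, both sequences are assumed to be written 5' to 3', and exact matching of the 3' anchor is a simplification, since real annealing tolerates some mismatches.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def three_prime_anneals(primer: str, template: str, anchor_len: int = 12) -> bool:
    """Return True if the last `anchor_len` bases of the primer (its 3' end)
    can pair with some site on the template strand.

    The 5' part of the primer is ignored, mirroring the case of a primer
    that carries a non-complementary 5' tail.
    """
    if len(primer) < anchor_len:
        return False
    anchor = primer.upper()[-anchor_len:]
    return reverse_complement(anchor) in template.upper()


# Hypothetical template and a primer with a non-complementary 5' tail.
template = "GGATCCTTAGCATGCAAGTCCGATTACG"
primer = "AATGATACGG" + reverse_complement("GCATGCAAGTCC")
anneals = three_prime_anneals(primer, template)
```

A mismatch at the last base of the primer makes the check fail, which reflects why complementarity at the 3' end matters most for extension.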

In the present invention, the sequences and names of several types of important primers are shown in Table 1.

TABLE 1

A first primer (SEQ ID NO: 1) and a second primer (SEQ ID NO: 2) amplify a DNA double-stranded nucleic acid fragment with a linker to obtain a first PCR amplification product, the first primer and the second primer having a linker binding region corresponding to the primer binding region of the linker and a sequencing probe binding region located outside the linker binding region. The blocking molecules 1(SEQ ID NO: 3) and 2(SEQ ID NO: 4) function to complement the linker during sequence capture to avoid capture of non-specific sequences. The third primer (SEQ ID NO: 5) and the fourth primer (SEQ ID NO: 6) function to amplify the captured specific DNA fragments in large quantities for further sequencing.

Enrichment detection

The invention also provides methods for detecting the enrichment of amplification products, including ligation-mediated polymerase chain reaction (LM-PCR) and qPCR (real-time quantitative PCR). One skilled in the art can detect the enrichment with a fluorescent quantitative nucleic acid amplification detection system. In qPCR, an excess of fluorescent dye (SYBR or the like) is added to the PCR reaction system. The dye emits a fluorescent signal once it is specifically incorporated into double-stranded DNA, whereas SYBR dye molecules that are not incorporated emit no fluorescent signal. By continuously monitoring the change in fluorescence intensity during the exponential phase of PCR, the amount of specific product is determined in real time, and the initial amount of the target gene is inferred from it.

As used herein, LM-PCR refers to the specific amplification of DNA fragments by ligation of specific linkers for the purpose of sensitive detection of nucleic acid fragments. Furthermore, the LM-PCR assay is semi-quantitative, so that comparisons of different samples can be made.

In a preferred embodiment of the invention, the enrichment degree detection comprises the following steps:

1) take out the diluted 4 NSC Assay mixes and thaw them on ice;

2) based on the concentration measured with the NanoDrop, dilute the Non-Captured and Captured LM-PCR products to 1 ng/μl, with a required volume of more than 12 μl;

3) run 4 NSC Assays per sample; each sample comprises 2 DNA templates, so each sample requires 4 × 2 = 8 reactions, and each of the 4 assays on a plate requires 1 negative control;

4) prepare the qPCR reaction mixture in a 1.5 ml centrifuge tube;

5) transfer 12 μl of the prepared qPCR reaction mixture to a 96-well qPCR reaction plate and add 3 μl of the diluted 1 ng/μl LM-PCR product; after all reagents and samples have been added, seal the plate with sealing film and centrifuge at 4000 rpm for 2 min;

6) place the 96-well plate on the qPCR instrument for detection;

7) after the run is finished, analyze the results: collate the qPCR data, calculate the enrichment according to the formula, and judge whether the library is qualified; only a qualified library proceeds to the next step. When the average fold enrichment is greater than 60, the library is qualified and can proceed to sequencing. The format of the enrichment calculation is shown in Table 2.

TABLE 2
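As an illustration separate from the table, the qualification rule of step 7) can be sketched in Python. The sketch assumes a commonly used qPCR convention, in which the fold enrichment of one assay is E raised to the power (Ct of the Non-Captured template minus Ct of the Captured template), with an amplification efficiency E of 2. That formula, the function names, the assay labels and the Ct values are assumptions for this sketch and are not taken from the text; only the use of 4 NSC Assays and the threshold of 60 for the average fold enrichment come from the procedure above.

```python
from statistics import mean
from typing import Dict, Tuple


def fold_enrichment(ct_non_captured: float, ct_captured: float, efficiency: float = 2.0) -> float:
    """Fold enrichment of one assay, assuming E ** (Ct_non_captured - Ct_captured)."""
    return efficiency ** (ct_non_captured - ct_captured)


def library_passes(
    ct_by_assay: Dict[str, Tuple[float, float]],
    threshold: float = 60.0,
    efficiency: float = 2.0,
) -> Tuple[float, bool]:
    """Average the fold enrichment over all assays and compare it with the threshold.

    ct_by_assay maps an assay label to (Ct of Non-Captured, Ct of Captured).
    Returns the average fold enrichment and whether the library qualifies.
    """
    folds = [fold_enrichment(nc, c, efficiency) for nc, c in ct_by_assay.values()]
    average = mean(folds)
    return average, average > threshold


# Hypothetical Ct values for the 4 NSC Assays of one sample.
example_cts = {
    "NSC-1": (25.0, 18.0),
    "NSC-2": (24.0, 18.0),
    "NSC-3": (26.0, 20.0),
    "NSC-4": (25.0, 19.0),
}
average_fold, qualified = library_passes(example_cts)
```

If the laboratory's own formula differs, for example by using a measured efficiency per assay, only `fold_enrichment` would need to change.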

High-throughput sequencing

Genome "resequencing" allows abnormal changes in disease-associated genes to be discovered as early as possible, facilitating in-depth research on the diagnosis and treatment of diseases in individuals. One skilled in the art can generally use one of three second-generation platforms for high-throughput sequencing: 454 FLX (Roche), Solexa Genome Analyzer (Illumina), and SOLiD (Applied Biosystems). What these platforms have in common is extremely high sequencing throughput: compared with conventional 96-capillary sequencing, high-throughput sequencing can read 400,000 to 4,000,000 sequences in one experiment. Read length ranges from 25 bp to 450 bp depending on the platform, so the platforms read between 1 Gb and 14 Gb of bases in one experiment.

Solexa high-throughput sequencing comprises two steps, DNA cluster formation and on-instrument sequencing: the mixture of PCR amplification products is hybridized with sequencing probes immobilized on a solid-phase carrier, and solid-phase bridge PCR amplification is performed to form sequencing clusters; the sequencing clusters are then sequenced by the sequencing-by-synthesis method to obtain the nucleotide sequence of the disease-related nucleic acid molecules in the sample.

DNA clusters are formed on a sequencing chip (flow cell) whose surface carries a layer of single-stranded primers. Single-stranded DNA fragments are immobilized on the chip surface through complementary base pairing between their adaptor sequences and the surface primers. An amplification reaction converts the immobilized single-stranded DNA into double-stranded DNA, which is then denatured into single strands again; one end of each strand is anchored on the sequencing chip, and the other end randomly pairs with another nearby primer and is anchored as well, forming a bridge. More than ten million single DNA molecules react simultaneously on one sequencing chip. Each single-stranded bridge is amplified again on the chip surface, using the surrounding primers as amplification primers, to form a double strand; the double strand is denatured into single strands, which form bridges again and serve as templates for the next round of amplification. After 30 rounds of amplification, each single molecule has been amplified 1000-fold and is called a monoclonal DNA cluster.

The DNA clusters are sequenced by synthesis on a Solexa sequencer. In the sequencing reaction, the four bases are each labeled with a different fluorescent dye, and the end of each base is blocked by a protecting group, so that only one base can be added in a single reaction. After the color of the reaction is scanned and read, the protecting group is removed and the next reaction can proceed; repeating these steps yields the exact base sequence. In Solexa multiplexed sequencing (Multiplexed Sequencing), an Index (tag) is used to distinguish samples: after conventional sequencing is completed, the Index portion is sequenced for 7 additional cycles, and through identification of the Index, 12 different samples can be distinguished in one sequencing lane.
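The Index-based separation of samples described above can be illustrated with a minimal Python sketch. The index sequences, sample names, and the mismatch tolerance are hypothetical and are not taken from this description; the sketch only shows the general idea of assigning each read to a sample by its Index read.

```python
from collections import defaultdict

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, index_to_sample, max_mismatches=0):
    """Group (index_read, insert_read) pairs by sample.

    A read whose Index matches no known Index, or matches more than one
    within the allowed mismatches, is placed in the 'undetermined' bin.
    """
    bins = defaultdict(list)
    for index_read, insert_read in reads:
        hits = [
            sample
            for idx, sample in index_to_sample.items()
            if len(idx) == len(index_read)
            and hamming(idx, index_read) <= max_mismatches
        ]
        key = hits[0] if len(hits) == 1 else "undetermined"
        bins[key].append(insert_read)
    return dict(bins)

# Hypothetical 7-base Index sequences (assumption, for illustration only).
INDEXES = {"ATCACGA": "sample_01", "CGATGTA": "sample_02", "TTAGGCA": "sample_03"}

reads = [
    ("ATCACGA", "ACGTACGT"),
    ("CGATGTA", "TTGACCAA"),
    ("ATCACGT", "GGGTTTAA"),  # one mismatch relative to sample_01
    ("NNNNNNN", "CCCCAAAA"),  # unreadable Index
]

strict = demultiplex(reads, INDEXES)
tolerant = demultiplex(reads, INDEXES, max_mismatches=1)
```

With exact matching, the read carrying a one-base Index error is left undetermined; allowing one mismatch recovers it, at the cost of requiring Index sequences that differ from each other at more than two positions.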

The invention provides a method for determining the nucleotide sequence of a disease-associated nucleic acid molecule in a sample to be tested. Referring to fig. 1, a preferred embodiment of the present invention includes (but is not limited to) the following steps:

The genomic DNA in the sample is broken into small fragments with a main band of 200-250 bp; the double-stranded DNA is end-repaired to obtain blunt-ended DNA; an "A" is added to the 3' end of each strand, and the fragments are ligated to linkers carrying a "T", forming a mixture of double-stranded DNA fragments with linkers at both ends; the mixture is hybridized with a chip on which disease-specific probes are fixed, the disease-specific DNA fragments are captured, and the captured DNA fragments are enriched and subjected to solid-phase bridge PCR amplification to form sequencing clusters; the sequencing clusters are sequenced on the instrument by the sequencing-by-synthesis method, and finally the data are analyzed.

Analysis of the sequencing results:

(1) Quality control is performed on the raw reads; the items included in the raw-read quality control are shown in Table 3;

TABLE 3

(2) Short-read alignment is performed and the raw alignment result is output as a SAM file;

(3) The alignment results are processed with the samtools tool, comprising: format conversion and compression; sorting the alignments by chromosome number and coordinate; merging the lane results from the same library; removing duplicates from each library separately; merging all libraries together; and finally performing SNP detection with the SOAPsnp tool.
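The order of the post-alignment steps in (3) can be sketched in plain Python on a toy set of alignment records. This is only an illustration of the logic (pool lanes per library, remove duplicates within each library, merge libraries, sort by chromosome and coordinate); it does not reproduce the behavior of samtools, and the record fields and the rule of keeping the highest mapping quality per position are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical minimal alignment record (not the SAM format itself).
Alignment = namedtuple("Alignment", "library lane chrom pos strand mapq name")

def chrom_key(chrom):
    """Sort numeric chromosomes numerically, then the others (X, Y, ...) by name."""
    c = chrom[3:] if chrom.startswith("chr") else chrom
    return (0, int(c), "") if c.isdigit() else (1, 0, c)

def sort_alignments(alns):
    """Sort by chromosome number, then by coordinate."""
    return sorted(alns, key=lambda a: (chrom_key(a.chrom), a.pos))

def remove_duplicates(alns):
    """Within ONE library, keep the best-MAPQ read per (chrom, pos, strand)."""
    best = {}
    for a in alns:
        key = (a.chrom, a.pos, a.strand)
        if key not in best or a.mapq > best[key].mapq:
            best[key] = a
    return sort_alignments(best.values())

def process(alns):
    """Pool lanes per library, deduplicate each library, merge, and sort."""
    by_library = {}
    for a in alns:  # the lane label is ignored here, which pools the lanes
        by_library.setdefault(a.library, []).append(a)
    merged = []
    for lib in sorted(by_library):
        merged.extend(remove_duplicates(by_library[lib]))
    return sort_alignments(merged)

alns = [
    Alignment("libA", "L1", "chr2", 500, "+", 30, "r1"),
    Alignment("libA", "L2", "chr2", 500, "+", 60, "r2"),   # duplicate of r1, other lane
    Alignment("libB", "L1", "chr2", 500, "+", 40, "r3"),   # other library: kept
    Alignment("libA", "L1", "chr10", 100, "-", 50, "r4"),
    Alignment("libA", "L1", "chrX", 5, "+", 50, "r5"),
]

kept = [a.name for a in process(alns)]
```

Deduplicating per library before merging matters because identical positions in two independently prepared libraries are independent observations, whereas identical positions within one library are likely to be copies produced by amplification.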

Reagent kit

The present invention also provides a kit for determining the nucleotide sequence of a disease-associated nucleic acid molecule in a sample to be tested, said kit comprising:

(1) a first container and a nucleic acid chip located in the container;

(2) a second container and a linker located within the container;

(4) a fourth container and a blocking molecule located within the container;

(5) instructions for detection.

In a preferred embodiment of the present invention, the kit further comprises a reagent selected from any of the following groups:

reagents required for performing a PCR amplification, reagents required for performing a blocking reaction, reagents required for performing a hybridization reaction, or a combination thereof.

The main advantages of the invention include:

1. capturing target DNA fragments through a chip fixed with a nucleic acid probe, and covering the target DNA fragments comprehensively;

2. all the captured fragments are amplified with a single pair of primers that bind specifically to the adapters at both ends of the DNA fragments, giving an amplification mixture with identical adapter sequences and different inserts;

3. the amplification products are first amplified into sequencing clusters and then sequenced by synthesis, so that the efficiency is high, repeated sequences can be read accurately, and a high sequencing depth can be achieved;

4. a plurality of samples can be detected simultaneously, and the interference of fluorescence background is avoided;

5. the test cost is low, only 1/100 of that of the traditional method;

6. the method is not limited by species, and the individual detection can be carried out on human beings, animals, microorganisms, plants and the like;

7. high sensitivity, high accuracy and good repeatability.

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Experimental procedures for which no specific conditions are noted in the following examples are generally performed according to conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or according to the manufacturer's recommendations.

Example 1

Establishing chip hybridization platform

The probes are designed from the exon sequences of known pathogenic genes of monogenic diseases together with the 100 bp upstream and downstream of each exon. About 70,000 probes in total are designed; their SEQ ID NOs., chromosome coordinates, capture positions, capture lengths, and the types of the related diseases are shown in Table 4.
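The definition of the capture target (each exon plus 100 bp on either side) can be sketched as a small interval computation. The coordinates below are invented for illustration, and merging overlapping padded regions is an assumption of this sketch rather than a step stated in the example.

```python
def target_regions(exons, flank=100):
    """Pad each exon by `flank` bp on both sides and merge overlapping regions.

    exons: iterable of (chrom, start, end) with 1-based inclusive coordinates.
    Returns a sorted list of merged (chrom, start, end) target regions.
    """
    padded = sorted((c, max(1, s - flank), e + flank) for c, s, e in exons)
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

# Hypothetical exon coordinates (assumption, not from Table 4).
exons = [
    ("chr7", 1000, 1200),
    ("chr7", 1350, 1500),  # padded region overlaps the previous exon's padding
    ("chr7", 5000, 5100),
    ("chr7", 50, 80),      # padding is clipped at position 1
]

regions = target_regions(exons)
```

Probes would then be tiled across each merged region; the tiling scheme itself is not specified here and is left out of the sketch.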

TABLE 4

Example 2

Preparation of DNA library

1. Genomic DNA acquisition

Human peripheral blood was taken and genomic DNA was extracted to obtain 3 μg of DNA.

2. DNA fragmentation

The extracted human genomic DNA sample is fragmented on a Covaris S2 instrument (purchased from Covaris, USA) into a mixture of double-stranded DNA fragments with a main band of 200 bp, and the fragments are purified with Ampure Beads according to the Agencourt AMPure protocol (Beckman, USA).

3. DNA fragment ligation

The DNA fragments are end-repaired to obtain a blunt-ended fragment mixture, and an A is added to the 3' end of each single strand so that the fragments can be ligated to a linker carrying a T. After ligation, the product is purified with Ampure Beads according to the Agencourt AMPure protocol (Beckman, USA). Purification removes excess reagents such as buffer, enzymes, and ATP, so that only the set of linker-ligated DNA fragments remains.
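The pairing rule behind the A-tailed fragment and the T-carrying linker can be expressed as a short check. This is a minimal sketch that only tests whether two 3' overhangs are fully complementary; the function names are hypothetical, and the longer overhangs in the usage lines are illustrative rather than conditions used in this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def overhangs_compatible(fragment_overhang: str, linker_overhang: str) -> bool:
    """True if two 3' overhangs (each written 5'->3') can base-pair completely.

    The strands pair antiparallel, so the linker overhang must equal the
    reverse complement of the fragment overhang.
    """
    return linker_overhang == reverse_complement(fragment_overhang)

single_a_t = overhangs_compatible("A", "T")    # the A/T case used in this example
double_a_t = overhangs_compatible("AA", "TT")  # a longer overhang of the same kind
mismatch = overhangs_compatible("A", "G")      # not complementary
```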

4. Amplification of DNA fragments

Since the concentration of the linker-ligated DNA sample is very low, it must be enriched by amplification. The PCR reaction was performed on a PTC-200 PCR instrument from Bio-Rad. The composition of the PCR amplification reagents is shown in Table 5.

The PCR program is as follows: initial denaturation at 98 ℃ for 30 s; 4-10 cycles of denaturation at 98 ℃ for 15 s, annealing at 65 ℃ for 30 s, and extension at 72 ℃ for 30 s; final extension at 72 ℃ for 5 min.

TABLE 5

The amplified, linker-ligated DNA (the PCR product) was purified with Ampure Beads according to the Agencourt AMPure protocol (Beckman, USA).

5. The purified product is dissolved in 25 μL of pure water, and the concentration of the PCR product is measured with a NanoDrop 1000. This product constitutes the DNA library, which can be stored for several days at 4 ℃ or for several weeks at -20 ℃, or used directly in the subsequent procedures.

Example 3

Sequence Capture

1. Denaturation of libraries

The prepared DNA sample was dried by evaporation at 60 ℃ in a SpeedVac, and then 11.2 μL of ultrapure water was added and the sample was fully dissolved. The sample was centrifuged at full speed for 30 seconds and the following two reagents were added: 18.5 μL of 2X SC Hybridization Buffer (available from Roche NimbleGen, USA) and 7.3 μL of 1X SC Hybridization Component A (available from Roche NimbleGen, USA). The mixture was shaken to mix, centrifuged at full speed for 30 seconds, and then denatured at 95 ℃ for 10 minutes to fully denature the DNA, giving a single-stranded DNA library with linkers.

2. Hybridization/sequence Capture

The chip carrying the corresponding probes of Example 1 was fixed on a hybridization instrument (Roche NimbleGen, USA), the denatured sample from the previous step was added to the chip, the chip was sealed, and hybridization was carried out at 42 ℃ for 64 hours. In the hybridization system, the concentration of the probe molecules on the gene chip is much higher than that of the target molecules.

The hybridization reaction system is shown in table 6:

TABLE 6

Here, Cot-1 DNA effectively blocks nonspecific hybridization arising from repetitive sequences of the genome and thus maximizes hybridization efficiency; PE Block 1.0 and PE Block 2.0 block PE Primer 1.0 and PE Primer 2.0 of Example 2 to avoid nonspecific capture.

3. Chip washing and sample purification

Chip washing and sample purification were performed according to kit instructions from Roche NimbleGen, USA, and the specific steps are shown in Table 7.

TABLE 7

The NaOH eluate was recovered and neutralized with 32 μL of 20% glacial acetic acid, and the neutralized solution was purified with the MinElute PCR Purification Kit from Qiagen, Germany, to obtain the captured sample, which was finally dissolved in 138 μL of purified water.

Example 4

PCR amplification of captured sequences

Since the concentration of the captured DNA fragments containing the specific sequences was very low, they were amplified by PCR. The reaction volume was 50 μL per tube, and the reaction components are shown in Table 8.

TABLE 8

Reaction conditions are as follows:

Pre-denaturation at 98 ℃ for 30 s; 20 cycles of denaturation at 98 ℃ for 15 s, annealing at 62 ℃ for 30 s, and extension at 72 ℃ for 30 s; final extension at 72 ℃ for 5 min; the product is then held at 4 ℃ overnight.
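As a quick arithmetic check of the program above, the programmed hold times can be summed. This is a minimal sketch: it counts only the hold times stated in this example and ignores temperature ramping and the overnight hold at 4 ℃, so the real run time on an instrument will be longer.

```python
def program_hold_time_s(initial_s, cycle_steps_s, cycles, final_s):
    """Total programmed hold time in seconds (temperature ramping excluded)."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s

# 98 C for 30 s; 20 x (15 s + 30 s + 30 s); 72 C for 5 min
total_s = program_hold_time_s(30, (15, 30, 30), 20, 5 * 60)
total_min = total_s / 60
```

For the stated program this gives 1830 s, i.e. 30.5 minutes of programmed hold time.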

The PCR product was purified using the Ampure Beads procedure.

After purification, the product was dissolved in 50 μL of EB, and the concentration was measured using a NanoDrop and a Bioanalyzer 2100.

Example 5

Detecting enrichment of capture sequences

1. The four diluted NSC Assay mixes (available from Roche NimbleGen, USA) were taken out according to the kit instructions and thawed on ice. The non-captured and captured LM-PCR products were diluted to 1 ng/μL, with a final volume of more than 12 μL.

2. The qPCR reaction mix was prepared in a 1.5 mL centrifuge tube and distributed into a 96-well qPCR reaction plate, and 3 μL of the diluted 1 ng/μL LM-PCR product was added to each well. After all reagents and samples had been added, the plate was sealed with sealing film and centrifuged at 4000 rpm for 2 min.

3. The 96-well plate was placed on a qPCR instrument and the operation was performed according to the instruction manual.

4. After the experiment is completed, the qPCR data are collated and analyzed and the degree of enrichment (Enrichment) is calculated. The results show that human genomic DNA samples (n = 10) treated by the method of Examples 1-5 have an enrichment of more than 60 and can be used for subsequent sequencing.
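The example does not state how the enrichment value is computed from the qPCR data. A commonly used approach, shown here only as an assumption, compares the quantification cycle (Cq) of the captured and non-captured LM-PCR products at each control locus: fold enrichment = E^(Cq_non-captured − Cq_captured), where E is the amplification efficiency (2 for perfect doubling). The Cq values and the way the threshold of 60 is applied below are illustrative.

```python
from statistics import mean

def fold_enrichment(cq_non_captured, cq_captured, efficiency=2.0):
    """Fold enrichment at one locus from the Cq difference (assumed formula)."""
    return efficiency ** (cq_non_captured - cq_captured)

def mean_enrichment(cq_pairs, efficiency=2.0):
    """Average fold enrichment over (Cq_non_captured, Cq_captured) pairs."""
    return mean(fold_enrichment(n, c, efficiency) for n, c in cq_pairs)

# Hypothetical Cq values for two control loci (not measured data).
cq_pairs = [(26.0, 20.0), (25.0, 19.0)]

enrichment = mean_enrichment(cq_pairs)
passes = enrichment > 60  # threshold mentioned in this example
```

A Cq difference of 6 cycles at perfect efficiency corresponds to a 64-fold enrichment, which would pass the threshold of 60 used in this example.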

Example 6

Solexa high throughput sequencing and data analysis

The mixture of PCR amplification products is hybridized with sequencing probes fixed on a solid-phase carrier and subjected to solid-phase bridge PCR amplification to form sequencing clusters; the sequencing clusters are sequenced by the "sequencing-by-synthesis" method, thereby obtaining the nucleotide sequence of the disease-associated nucleic acid molecules in the sample. The steps are as follows:

A layer of single-stranded primers is attached to the surface of the dedicated sequencing chip (flow cell) used for Solexa sequencing, and each single-stranded DNA fragment is anchored to the chip at one end through base complementarity with the chip surface. The single-stranded DNA is converted into double-stranded DNA by an amplification reaction. The double strand is denatured into single strands again, one end of which is "anchored" to the sequencing chip, while the other end (5' or 3') randomly pairs with another nearby primer and is also "anchored", forming a "bridge". More than ten million single DNA molecules react together on the sequencing chip. Each single-stranded bridge uses the surrounding primers as amplification primers and is amplified again on the surface of the sequencing chip to form a double strand; the double strand is denatured into single strands, which form bridges again and serve as templates for the next round of amplification. After 30 rounds of amplification, each single molecule has been amplified 1000-fold and forms a monoclonal "DNA cluster". The "DNA clusters" are then sequenced on a Solexa sequencer. In the sequencing reaction, a "reversible terminator" reaction allows sequencing to proceed during base synthesis: the four bases are each labeled with a different fluorescent dye, and the end of each base is blocked by a protecting group, so that only one base can be added in a single reaction; after the color of the reaction is scanned and read, the protecting group is removed and the next reaction can proceed, and repeating these steps yields the exact base sequence. The bases are read automatically, and the data are transferred to an automated analysis pipeline for secondary analysis.
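The cycle-by-cycle readout described above (one base added per reaction, its color scanned, then the protecting group removed) can be illustrated with a toy base caller. This is a simplified sketch, not the instrument's algorithm: the channel order, the intensity values, and the rule of calling the brightest channel (with "N" for a tie) are assumptions made for illustration.

```python
# Assumed order of the four fluorescence channels in each intensity tuple.
CHANNELS = ("A", "C", "G", "T")

def call_bases(cycles):
    """Call one base per cycle from four channel intensities.

    cycles: list of 4-tuples of intensities ordered as CHANNELS.
    The brightest channel is called; a tie for brightest is reported as 'N'.
    """
    read = []
    for intensities in cycles:
        top = max(intensities)
        if intensities.count(top) == 1:
            read.append(CHANNELS[intensities.index(top)])
        else:
            read.append("N")
    return "".join(read)

# Hypothetical intensities for four sequencing cycles of one cluster.
cycles = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.1, 0.8, 0.2),
    (0.5, 0.5, 0.1, 0.1),  # ambiguous cycle
    (0.0, 0.1, 0.1, 0.7),
]

read = call_bases(cycles)
```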

Example 7

Four methods were used to test whether the samples carried the following three monogenic diseases.

Specifically, examples 1-5 were repeated, with the exception of the sequencing method and the linker junction region. The differences and the results are shown in Table 9.

TABLE 9

As can be seen from Table 9, the DNA libraries with different linker junction regions prepared by the method of the present invention were analyzed in combination with second-generation sequencing, and verification by the Sanger method showed that the method of the present invention obtains accurate screening results.

Example 8

Preparation of the kit

A kit for determining the nucleotide sequence of a disease-associated nucleic acid molecule in a sample to be tested, comprising the components:

(1) a first container and a nucleic acid chip located in the container;

(2) a second container and a linker located within the container;

(3) a third container and a first primer and/or a second primer located in the container; and a third primer and/or a fourth primer;

(4) a fourth container and a blocking molecule located within the container;

(5) a fifth container and reagents required for PCR amplification located in the containers;

(6) a sixth container and reagents required for carrying out a blocking reaction in the container;

(7) a seventh container and reagents required for the hybridization reaction in the container;

(8) instructions for detection.

All documents referred to herein are incorporated by reference into this application as if each were individually incorporated by reference. Furthermore, it should be understood that various changes and modifications of the present invention can be made by those skilled in the art after reading the above teachings of the present invention, and these equivalents also fall within the scope of the present invention as defined by the appended claims.

Claims

1. A method for determining the nucleotide sequence of a disease-associated nucleic acid molecule in a sample to be tested, comprising the steps of:

2. The method of claim 1, wherein in step (g), the mixture of second PCR amplification products is hybridized to sequencing probes immobilized on a solid support and subjected to solid-phase bridge PCR amplification to form sequencing clusters; and the sequencing clusters are sequenced by the "sequencing-by-synthesis" method, thereby obtaining the nucleotide sequence of the disease-related nucleic acid molecule in the sample.

3. The method of claim 1, wherein the broken double stranded nucleic acid fragment derived from genomic DNA in step (a) is of a length: 100-;

preferably, the fragment has a length of 150-500bp, preferably 200-300 bp.

4. The method of claim 1, wherein the nucleic acid fragments of step (a) have blunt ends prepared by a method of end repair.

5. The method of claim 1, wherein the linker sequence in step (b) is poly(N)_n, wherein each N is independently selected from A, T, G or C, and n is any positive integer selected from 1-20;

preferably, the linker sequence is poly(A)_n, wherein n is a positive integer of 1-20, preferably n is 1-2.

6. The method of claim 1, wherein the linker junction complementary region sequence in step (b) is poly(N')_m, wherein each N' is independently selected from A, T, G or C, m is any positive integer selected from 1-20, and poly(N)_n and poly(N')_m are complementary sequences;

preferably, m is any positive integer selected from 1 to 3; or preferably, the length of the linker junction complementary region is the same as the length of the linker junction sequence, i.e., poly(N)_n and poly(N')_m are completely complementary sequences; or preferably, the linker junction complementary region is poly(T)_m, wherein m is a positive integer of 1 to 20, more preferably m is 1 to 2.

7. The method of claim 1, wherein the adapter ligation sequence of step (b) is A and the adapter ligation complementary region sequence is T.

8. The method of claim 1, wherein the first primer and the second primer in step (c) are oligonucleotides 30-80bp in length; more preferably, the first primer and the second primer are 55-65bp in length.

9. The method of claim 1, wherein in step (c) the first primer and the second primer are different, and/or the third primer and the fourth primer are different.

10. The method of claim 1, wherein the blocking molecule of step (d) blocks 70% to 100% of the region of the first PCR amplification product corresponding to the first primer and the second primer;

preferably, the blocking molecule in step (d) blocks 100% of the region of the first PCR amplification product corresponding to the first primer and the second primer.

11. The method of claim 1, wherein the nucleic acid chip of step (e) is immobilized with 5 to 200,000 specific probes corresponding to the disease;

preferably, the species of the specific probes on the chip in step (e) are 50-150,000, more preferably 500-100,000, and most preferably 5000-80,000.

12. The method of claim 1, wherein the sequence of the probe in step (e) corresponds to the following region of a disease-causing gene: the exon and/or the 200 bp upstream and downstream of the exon;

preferably, the specific probe has a length of 20-120mer, preferably 50-100mer, more preferably 60-80 mer.

13. The method of claim 1, wherein the method has one or more characteristics selected from the group consisting of:

the specific probe is prepared by fully artificial synthesis or by in vitro cloning;

the third primer and the fourth primer of step (f) are specifically combined on the outer sides of the first primer and the second primer respectively, and have smaller lengths than the first primer and the second primer;

the length of the third primer and the fourth primer is 15-40bp, preferably 20-25 bp;

the sample is derived from a human, animal, plant, or microorganism;

the sample to be tested is from a human or non-human mammal, preferably from a human;

the sample to be detected contains human genome DNA;

the disease is Mendelian monogenic disease.

14. A kit for determining the nucleotide sequence of a nucleic acid molecule associated with a disease in a sample to be assayed for use in the method of claim 1, comprising:

(1) a first container and a nucleic acid chip located in the container;

(2) a second container and a linker located within the container;

(4) a fourth container and a blocking molecule located within the container;

(5) instructions for detection.

15. The kit of claim 14, wherein the disease is a Mendelian monogenic disease; preferably, the disease is selected from the group consisting of: familial adenomatous polyposis, achondroplasia, familial hypercholesterolemia, polydactyly, Marfan syndrome, hereditary chorea, alopecia, phenylketonuria, cystinuria, hereditary high myopia, vitamin D-resistant rickets, hereditary nephritis, hemophilia, thalassemia, tuberous sclerosis, Duchenne muscular dystrophy, progressive muscular dystrophy, polycystic kidney disease, sex reversal due to sex-determining gene mutation, or a combination thereof.

16. The kit of claim 14, further comprising a reagent selected from the group consisting of: reagents required for performing a PCR amplification, reagents required for performing a blocking reaction, reagents required for performing a hybridization reaction, or a combination thereof; and/or

One or more probes selected from the following group are fixed on the surface of the nucleic acid chip: