HK1199070B - Methods of amplifying whole genome of a single cell - Google Patents

Publication number
HK1199070B
Authority
HK
Hong Kong
Prior art keywords
dna
amplification
cell
polymerase
reaction mixture
Prior art date
Application number
HK14112461.9A
Other languages
Chinese (zh)
Other versions
HK1199070A1 (en)
Inventor
Xiaoliang Sunney Xie
Chenghang ZONG
Sijia Lu
Original Assignee
President And Fellows Of Harvard College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by President And Fellows Of Harvard College
Priority claimed from PCT/US2012/038930 (WO2012166425A2)
Publication of HK1199070A1
Publication of HK1199070B

Description

Single-cell whole genome amplification method

Related Applications

This application claims priority to the following U.S. provisional patent applications: No. 61/621,271, filed April 6, 2012; No. 61/550,667, filed October 24, 2011; No. 61/510,539, filed July 22, 2011; and No. 61/490,790, filed May 27, 2011.

Statement of Government Interests

This invention was supported by NIH grants 3R01HG005097-02 and 3R01HG005097-02S1. The government has certain rights in this invention.

Technical Field

Embodiments of the present invention relate generally to methods and compositions for amplifying genomic sequences, such as whole genome amplification from a single cell.

Background Art

A common problem when amplifying DNA or RNA with random primers is the formation of primer dimers. Primer concentrations must be high enough for efficient amplification, but high primer concentrations promote dimer formation, which significantly reduces primer binding to the DNA or RNA template. Other problems with random-primer amplification of DNA include failure to amplify the entire genome because of locus dropout, the generation of short amplicons, and, in some cases, the inability to amplify partially degraded or artificially modified DNA.

Genomic PCR amplification methods have been reported in the literature, for example in US 7,718,403, US 2003/0108870, and US 7,402,386. PCR-based amplification methods such as GenomePlex and PicoPlex show marked bias (non-uniformity) across genomic loci, so that some loci are lost after amplification. Sequencing the products these methods generate from a cell often recovers only 5-30% of the genome sequence. Multiple displacement amplification (MDA) amplifies at low temperature, which can produce large numbers of chimeras; these sequences, which are not present in the original genome, cause substantial false-positive sequencing results and sequencing artifacts. Although whole genome amplification has been attempted, the existing methods are generally inefficient, complex, and expensive. There is therefore a need for an effective method of amplifying the whole genome from very small amounts of DNA, such as that of a single cell.

Summary of the Invention

This patent relates to a DNA amplification method that uses a small amount of genomic DNA, or a small number of genomic DNA sequences, obtained from a single cell, from a few cells of the same cell type, or from blood or body fluid, of a human or another species. The amplification can be carried out in a single reaction tube on a PCR thermal cycler. The method can be used for high-throughput sequencing of whole-genome DNA amplified from a single cell and achieves substantial sequencing depth.

The method amplifies the loci across the whole genome of a single cell with high fidelity, high uniformity, and high sequencing breadth, and the amplicons are used for high-throughput sequencing analysis. The method reduces sequencing bias and can therefore provide relatively comprehensive genomic DNA sequence information from a single cell, covering more than 90% of the genome; at sequencing depths of 7x, 10x, or 15x, more than 70% of the genome sequence can be obtained, essentially without interference from chimera formation. The method avoids sequencing artifacts and facilitates advanced genomic analyses such as single nucleotide polymorphisms (SNPs), copy number variations (CNVs), and structural variations (SVs), as well as high-throughput sequencing analysis. It is particularly useful for biological systems made up of multiple cell populations, such as tumors and neural tissue.

The primers used in the method contain a common sequence, a variable sequence, and a fixed sequence. After the DNA is denatured, the single-stranded primers anneal to the single-stranded template at low temperature and are then extended at a higher temperature by at least one polymerase having strand displacement activity and 3' exonuclease activity. At the end of the second amplification cycle, the double-stranded DNA (dsDNA), which carries the primer sequence at one end and the complementary primer sequence at the other, is denatured to produce single-stranded DNA (ssDNA). The reaction temperature is then lowered to a temperature at which free primers can hybridize to the 3' end of the single-stranded amplicons, which prevents the amplicons from hybridizing with themselves or with one another, and is then lowered further to a temperature suitable for primer annealing and the next amplification cycle.

The method can be used to amplify small or trace amounts of DNA; to perform karyotyping or high-throughput screening at multiple sites in sample DNA; to rapidly generate band-specific probes for particular chromosomal regions; for microdissection; to amplify unknown chromosomal regions or the marker chromosomes of labeled abnormal chromosomes; and for rapid cloning of amplicons or construction of DNA libraries. The method is therefore useful not only for karyotyping and high-throughput screening but also for cytogenetic diagnosis.

The method uses a non-conventional PCR cycling scheme to amplify the whole genome sequence from a single mammalian cell. Certain DNA polymerases give representative results. An additional step can be included in which free primers bind to complementary sequences on the amplicons, for example at the 3' end. This prevents the amplicons from hybridizing with themselves or with one another, which would otherwise produce chimeras, and also provides a way to remove the primers from both ends of the amplicons.

The method amplifies the whole genome sequence of one or a few cells. In an exemplary embodiment, a single cell is first isolated and lysed in a defined volume of lysis buffer to release its genomic DNA, and the lysate is divided into anywhere from 2 to 100,000 aliquots. Dividing the DNA in this way can place the two homologous copies of a locus in different aliquots; the more aliquots there are, the more likely it is that homologous loci are separated. Separating the two homologous copies makes haplotype analysis of the genomic DNA possible. The genomic material in each aliquot can be amplified to give amplicons for the corresponding part of the genome, which are then sequenced. A sample of two or more cells, of the same or different types, can be processed and analyzed in the same way as a single-cell sample.
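The statement that more aliquots (sub-portions of the lysate) make separation of homologous loci more likely can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that each of the two homologous fragments lands in an aliquot independently and uniformly at random; the function name is hypothetical and the text does not specify any such model.

```python
from fractions import Fraction


def separation_probability(n_aliquots: int) -> Fraction:
    """Probability that two homologous fragments end up in different aliquots.

    Assumption (not from the source): each fragment is assigned to one of
    n_aliquots independently and uniformly at random, so the chance that
    both land in the same aliquot is 1 / n_aliquots.
    """
    if n_aliquots < 1:
        raise ValueError("need at least one aliquot")
    return 1 - Fraction(1, n_aliquots)


if __name__ == "__main__":
    # The text gives 2 to 100,000 aliquots as the range.
    for n in (2, 10, 1000, 100_000):
        print(n, float(separation_probability(n)))
```

Under this simplified model the probability rises from one half with 2 aliquots toward 1 as the number of aliquots grows, which matches the qualitative claim in the paragraph above.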

The method amplifies the whole genome sequence of one or a few cells. The genetic material of a single cell can be divided into several aliquots that are amplified and analyzed separately, enabling haploid or diploid karyotyping (haplotyping). The method can be used for de novo genome sequencing and assembly in species that have no reference genome sequence or whose genomes show complex structural variation. It can be applied to DNA from many sources, including heterogeneous tissues such as tumors, rare and precious samples such as embryonic stem cells, and non-dividing cells such as neurons, and it can be combined with different sequencing platforms and karyotyping methods. Haplotyping methods are described in the following references: Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007); de Bakker, P. I. et al. A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat. Genet. 38, 1166-1172 (2006); Nagel, R. L. et al. The Senegal DNA haplotype is associated with the amelioration of anemia in African-American sickle cell anemia patients. Blood 77, 1371-1375 (1991); Drysdale, C. M. et al. Complex promoter and coding region beta2-adrenergic receptor haplotypes alter receptor expression and predict in vivo responsiveness. Proc. Natl. Acad. Sci. USA 97, 10483-10488 (2000); Sun, T. et al. Haplotypes in matrix metalloproteinase gene cluster on chromosome 11q22 contribute to the risk of lung cancer development and progression. Clin. Cancer Res. 12, 7009-7017 (2008); Kitzman, J. O. et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat. Biotech. 29, 59-63 (2011); International HapMap Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52-58 (2010); Xiao, M. et al. Direct determination of haplotypes from single DNA molecules. Nat. Methods 6, 199-201 (2009); Fan, H. C. et al. Whole-genome molecular haplotyping of single cells. Nat. Biotech. 29, 51-57 (2011).

Other features and advantages of the method are set out more fully in the description that follows.

Brief Description of the Drawings

The method described above and its other features and advantages are illustrated in the following drawings.

Figure 1: Schematic diagram of single-cell DNA amplification.

Figure 2: Single-cell genome sequencing results at a sequencing depth of 15x.

Figure 2A: Single-cell genome sequencing results across the whole of chromosome 1.

Figure 3: Flowchart of single-cell DNA amplification for high-throughput sequencing, including the linear pre-amplification and exponential amplification stages.

Figure 4: Schematic of the method of this patent, Multiple Annealing and Looping-Based Amplification Cycles (MALBAC). First, primers anneal to the single-stranded DNA template and are extended by a polymerase with strand displacement activity, producing semi-amplicons. Single-stranded full amplicons with complementary sequences at their two ends are then generated. At an intermediate temperature the two ends of a full amplicon anneal to each other, so it cannot serve as a template in this step. The result is near-linear amplification, which avoids the usual amplification bias.

Figure 5A shows Lorenz curves comparing the uniformity of sequencing coverage for MALBAC, MDA, and an unamplified sample. The single-cell and unamplified samples are from the SW480 cell line. The MDA data are from Fan et al., Nat. Biotech. 29, 51-57 (2011). Completely uniform, unbiased amplification would appear as the diagonal line; the further a curve departs from the diagonal, the more severe the amplification bias. The arrows mark the fraction of the genome that was not amplified (MDA 55%, MALBAC 15%, unamplified sample 5%). Figure 5B shows the distribution of reads across the whole genome. The MALBAC result resembles that of the unamplified sample, whereas MDA shows many low-density regions, indicating that regions several megabases in size are under- or over-represented in the amplified product.
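A Lorenz curve of the kind described for Figure 5A can be computed from per-bin read counts. The following is a minimal sketch, assuming the genome has been divided into equal-sized bins and that a read count is available for each bin; the function names and the toy inputs are hypothetical and are not taken from the patent.

```python
def lorenz_curve(read_counts):
    """Return (xs, ys) for a Lorenz curve of sequencing coverage.

    xs: cumulative fraction of genome bins, sorted from lowest to highest coverage.
    ys: cumulative fraction of all reads contained in those bins.
    Perfectly uniform coverage gives the diagonal ys == xs.
    """
    counts = sorted(read_counts)
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("need at least one bin with reads")
    n = len(counts)
    xs, ys = [0.0], [0.0]
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        xs.append(i / n)
        ys.append(running / total)
    return xs, ys


def uncovered_fraction(read_counts):
    """Fraction of bins with zero reads (the flat start of the curve)."""
    return sum(1 for c in read_counts if c == 0) / len(read_counts)


if __name__ == "__main__":
    uniform = [5, 5, 5, 5]   # toy data: every bin equally covered
    biased = [0, 0, 3, 7]    # toy data: half the bins have no reads
    print(lorenz_curve(uniform))
    print(lorenz_curve(biased), uncovered_fraction(biased))
```

With the uniform toy input the points fall on the diagonal; with the biased input the curve stays at zero for the uncovered half of the bins, which is the quantity the arrows in the figure indicate.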

Detailed Description

This patent relies on conventional techniques of molecular biology, recombinant DNA, and microbiology, which are described in detail in the literature. See: Sambrook, Fritsch, and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd Edition (1989); Oligonucleotide Synthesis (M. J. Gait, ed., 1984); Animal Cell Culture (R. I. Freshney, ed., 1987); the series Methods in Enzymology (Academic Press, Inc.); Gene Transfer Vectors for Mammalian Cells (J. M. Miller and M. P. Calos, eds., 1987); Handbook of Experimental Immunology (D. M. Weir and C. C. Blackwell, eds.); Current Protocols in Molecular Biology (F. M. Ausubel et al., eds., 1987); Current Protocols in Immunology (J. E. Coligan et al., eds., 1991); Annual Review of Immunology; and Advances in Immunology.

Terms and symbols of nucleic acid chemistry, biochemistry, genetics, and molecular biology follow the standard usage of the field, as found for example in Kornberg and Baker, DNA Replication, 2nd Edition (W. H. Freeman, NY, 1992); Lehninger, Biochemistry, 2nd Edition (Worth Publishers, NY, 1975); Strachan and Read, Human Molecular Genetics, 2nd Edition (Wiley-Liss, NY, 1999); Eckstein, editor, Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, NY, 1991); and Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984).

The invention is based in part on the development of a method for amplifying DNA or RNA (such as mRNA) from genomic material, transcriptomic material, or both. The DNA or RNA may come from a single cell or from a few cells of the same type. The method can be used to amplify DNA from any species or organism and can be run as a single amplification reaction in one PCR tube. The DNA can also be divided into several aliquots, using a pipette or another micro-pipetting device, so that each aliquot contains only part of the original DNA. Each aliquot then represents genomic DNA at the sub-cellular level, that is, a portion of the genome of a single cell, and is amplified and analyzed accordingly. The method amplifies DNA without depending on any particular DNA sequence, and the DNA may come from, but is not limited to, human, animal, plant, fungal, viral, or other eukaryotic or prokaryotic sources.

The method can be used to amplify nearly the entire genome or nearly all genes of the transcriptome (referred to as whole genome amplification and whole transcriptome amplification, respectively) without losing the representativeness of the amplified sequence through dropout of particular regions. "Nearly the entire" means 80-99% or more of the whole genome sequence. Such amplification may sometimes amplify sequences from different parts of the genome to unequal extents.

The method can be carried out in two steps in a single reaction tube or microplate well. The first step generates an amplicon library using primers that contain a common sequence, a variable sequence, and a fixed sequence, together with at least one polymerase having strand displacement activity or exonuclease activity. The method includes a post-annealing step in which free primers bind to the ends of the amplicons, preventing the amplicons from binding to themselves or to one another. This linear amplification markedly reduces amplification bias. The primer design and the post-annealing step together maximize the breadth of genome amplification. In the second step, the molecules of the amplicon library are amplified by standard PCR, using Taq polymerase and primers directed at the known sequence, giving unbiased amplification of the whole genome or whole transcriptome by several thousand-fold; the product can be amplified further to a million-fold or more.

The method uses at least one DNA polymerase to amplify DNA. Primers anneal to the DNA template, and the polymerase synthesizes the complementary strand starting from the 3' end of the primer. The steps are denaturation of the template DNA, annealing of the primers to the template DNA, and extension of the primers by the polymerase along the template to synthesize the complementary strand.

In general, DNA amplification begins with high-temperature denaturation of the dsDNA template into ssDNA. The temperature is then lowered (annealing) so that primers bind to the ssDNA template, and one or more polymerases in the reaction extend the primers along the template to synthesize the complementary strand. The DNA polymerase may have 5' to 3' exonuclease activity or strand displacement activity.

After the first amplification cycle, the 5' end of each newly synthesized complementary strand carries the primer sequence. The denaturation, annealing, and extension steps are then repeated in cycles.

At the end of each cycle, the amplified DNA molecules carry the primer sequence at one end and the complement of the primer sequence at the other. Under suitable reaction conditions, free primers can bind to the template, which prevents the amplicons from binding to themselves or to one another.

The DNA template may be genomic DNA, microdissected chromosomal DNA, YAC DNA, cosmid DNA, phage DNA, PAC DNA, or BAC DNA. It may be mammalian, plant, fungal, viral, or prokaryotic DNA, and it may come from humans, cattle, dogs, pigs, mice, birds, fish, shrimp, plants, fungi, viruses, or bacteria.

The primers used in the method contain a common (constant) sequence, a variable sequence, and a fixed sequence, and are sometimes called quasi-degenerate primers. The common and variable sequences are chosen so that they are essentially not complementary to themselves or to other primers in the reaction. The common sequence is preferably known and can serve as the target for later amplification. The variable sequence can be chosen at random or selected according to the source and sequence characteristics of the template DNA. The common sequence is shared by all primers in the reaction, is 10-30 bp long, and contains no C. The variable sequence is 3-7 bp long and may contain any base; a 5-base variable sequence, for example, has 4^5 (1,024) possible sequences. The fixed sequence is 2-4 bp long and contains no self-complementary sequence; examples are GGG, TTT, GAA, and ATG.

Typical primers are 5'-GT GAG TGA TGG TTG AGG TAG TGT GGA GNNNNNGGG-3' and 5'-GT GAG TGA TGG TTG AGG TAG TGT GGA GNNNNNTTT-3'. The 5N3G/5N3T pair can be replaced by any of the following pairs: 5N3G and 5N3A; 5N3C and 5N3T; or 5N3C and 5N3A. The common sequence can be modified to reduce primer-dimer formation.
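The primer layout described above (common sequence, variable run of N bases, fixed 3' triplet) can be expanded and checked programmatically. The sketch below uses the common sequence from the example primers with spaces removed; the helper name `expand_primers` is hypothetical, and the checks only restate constraints given in the text (common sequence 10-30 bases with no C, 4^5 variants for five N positions).

```python
from itertools import product

# Common sequence of the example primers, spaces removed.
COMMON = "GTGAGTGATGGTTGAGGTAGTGTGGAG"


def expand_primers(common: str, n_variable: int, fixed: str) -> list[str]:
    """Enumerate every concrete primer: common + variable (all ACGT combos) + fixed."""
    return [
        common + "".join(variable) + fixed
        for variable in product("ACGT", repeat=n_variable)
    ]


if __name__ == "__main__":
    primers_g = expand_primers(COMMON, 5, "GGG")  # the "5N3G" pool
    primers_t = expand_primers(COMMON, 5, "TTT")  # the "5N3T" pool

    print(len(COMMON), "C" in COMMON)   # length of common part, contains C?
    print(len(primers_g), len(primers_t))
    print(primers_g[0], primers_t[-1])
```

Enumerating the pool this way is only practical for short variable runs; for the 3-7 base range given in the text the largest pool has 4^7 = 16,384 members per fixed triplet.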

The reaction mixture contains the denatured, single-stranded template and primers. There may be a single primer or several degenerate primers, each able to bind to at least some portion of the single-stranded template molecule. The template may be DNA or RNA.

The reaction mixture contains primers, template, and at least one DNA polymerase with 5' to 3' exonuclease activity or strand displacement activity. A strand-displacing DNA polymerase can open the double-stranded DNA template as it extends; examples include Φ29 polymerase, Bst polymerase, Pyrophage 3173, Vent polymerase, Deep Vent polymerase, TOPO Taq, Vent (exo-) polymerase, Deep Vent (exo-) polymerase, 9°Nm polymerase, the Klenow fragment of DNA polymerase, MMLV reverse transcriptase, AMV reverse transcriptase, HIV reverse transcriptase, a variant of T7 phage DNA polymerase lacking exonuclease activity, or any mixture thereof. One or more of the polymerases has 5' to 3' exonuclease activity, for example Taq polymerase, Bst DNA polymerase (full length), E. coli DNA polymerase, LongAmp Taq polymerase, OneTaq polymerase, or any mixture thereof.

The assembled reaction is taken through multiple amplification cycles. The annealing temperature is below 30°C, and the extension temperature ranges from 10°C to above 65°C depending on the polymerase. Φ29 polymerase works best above 30°C, whereas Bst polymerase and Pyrophage 3173 (exo-) work best above 62°C. The denaturation temperature is 90-100°C. There is also an additional annealing temperature of 55-60°C, usually 58°C. When the denatured amplicons are held at this temperature, the two ends of each amplicon anneal to each other, so the amplicon temporarily cannot serve as a template, and amplification is therefore linear.

The amplicons can be treated to remove the primer sequences at both ends, amplified further by conventional PCR, or, if the quantity is sufficient, used directly for high-throughput sequencing.

The DNA denaturation temperature is 90-100°C, typically 95°C, and the denaturation time is 10 seconds to 5 minutes, typically 2 minutes.

The annealing temperature is 0-30°C and the annealing time is 10 seconds to 5 minutes. The extension temperature depends on the DNA polymerase used; when several polymerases are present, a temperature gradient can be used to activate each enzyme in turn. The extension time is 2-7 minutes, typically 5 minutes.

The temperatures, times, and other parameters given above can be adjusted as appropriate, generally without noticeably affecting amplification.
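The temperature and time ranges stated above can be collected into a small table and used to sanity-check a proposed cycling program. The sketch below is illustrative only: the `Step` class, the step names, and the specific annealing temperature, extension temperature, and looping time in `example_cycle` are hypothetical choices within the stated ranges, not values prescribed by the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # "denature", "loop", "anneal", or "extend"
    temp_c: float    # temperature in degrees Celsius
    seconds: int     # hold time in seconds


# (temperature range, time range) as stated in the text; None = not constrained here.
RANGES = {
    "denature": ((90, 100), (10, 300)),  # 90-100 C, 10 s to 5 min
    "anneal": ((0, 30), (10, 300)),      # 0-30 C, 10 s to 5 min
    "extend": (None, (120, 420)),        # temperature is polymerase-dependent; 2-7 min
    "loop": ((55, 60), None),            # additional annealing step, usually 58 C
}


def check_step(step: Step) -> list[str]:
    """Return a list of human-readable problems; an empty list means in range."""
    if step.name not in RANGES:
        return [f"unknown step {step.name!r}"]
    temp_range, time_range = RANGES[step.name]
    problems = []
    if temp_range and not (temp_range[0] <= step.temp_c <= temp_range[1]):
        problems.append(f"{step.name}: {step.temp_c} C outside {temp_range}")
    if time_range and not (time_range[0] <= step.seconds <= time_range[1]):
        problems.append(f"{step.name}: {step.seconds} s outside {time_range}")
    return problems


example_cycle = [
    Step("denature", 95, 120),  # typical values from the text
    Step("loop", 58, 20),       # 58 C from the text; duration is a placeholder
    Step("anneal", 20, 60),     # placeholder values inside the stated range
    Step("extend", 65, 300),    # temperature is a placeholder; 5 min is typical
]

if __name__ == "__main__":
    for step in example_cycle:
        print(step, check_step(step) or "ok")
```

A table-driven check like this keeps the stated ranges in one place, so adjusting a parameter, as the text allows, only requires editing `RANGES`.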

As shown in Figure 3, an isolated single cell is lysed to release its genomic DNA, which is linearly pre-amplified and then amplified by conventional PCR. The final product is used for high-throughput sequencing or screening.

As shown in Figures 1, 3, and 4, single-cell genome amplification by MALBAC (Multiple Annealing and Looping-Based Amplification Cycles) begins with a linear pre-amplification that produces overlapping fragments covering most of the genome. Pre-amplification is followed by conventional exponential amplification, such as PCR, to amplify the genome further. The amplicons can be used for next-generation sequencing.

As shown in Figures 1, 3, and 4, the nucleic acid of a single cell, such as a chromosome, is the template in the amplification mixture. The nucleic acid is denatured into single strands, which serve as templates for amplification. Under suitable conditions, a large number of random primers hybridize to the single-stranded nucleic acid, and a polymerase with strand displacement activity synthesizes fragments complementary to the single-stranded template. Each random primer consists of a run of fixed bases and a run of degenerate bases. Amplification starts at random sites on the single-stranded template. The fragments amplified from the single-stranded nucleic acid template are called first-round amplicons or semi-amplicons.

Complementary sequences hybridized to the single-stranded template can be denatured into single strands under suitable conditions, which maximizes the amount of single-stranded template available for primer hybridization. Double-stranded nucleic acid in the amplification mixture is separated into single strands by the strand-displacing polymerase, increasing the amount of single-stranded nucleic acid available for primer hybridization and amplification.

The 5' and 3' ends of the first-round amplicons, that is, the semi-amplicons, are complementary, so a semi-amplicon can form a looped structure, referred to as a semi-amplicon loop. A semi-amplicon loop is amplified very inefficiently and so blocks further amplification, which is an important step in keeping the amplification linear.

Linear first-round amplicons are present in the amplification mixture together with the single-stranded nucleic acid. Both the first-round amplicons and the single-stranded nucleic acid serve as templates for primers in the second round of the reaction.

Under suitable conditions, primers hybridize at different positions on the first-round amplicons and on the single-stranded nucleic acid.

Under suitable conditions, the strand-displacing polymerase acts where random primers have bound to the first-round amplicons and synthesizes the complement of each first-round amplicon. The end of each newly synthesized complementary strand carries the complement of the primer sequence. Products made using first-round amplicons as templates are second-round amplicons, or full amplicons. Likewise, the strand-displacing polymerase acts where random primers have bound to the single-stranded nucleic acid and synthesizes its complement. Products made using the original single-stranded genomic nucleic acid as template are first-round amplicons. As noted above, some first-round amplicons form looped semi-amplicons.

In the annealing step, complementary fragments hybridize to nucleic acids in the amplification mixture, such as the single-stranded template, half amplicons or full amplicons. Under suitable conditions, the nucleic acids in the amplification mixture are denatured into single strands, which increases the amount of single-stranded nucleic acid available for hybridization with primers and for amplification. Even though a polymerase with strand-displacement activity is used, double-stranded nucleic acids remain in the amplification mixture; these can likewise be denatured into single strands, further increasing the amount of single-stranded nucleic acid available for hybridization with primers and for amplification.

To reduce the formation of concatemers, which arise when the 3' end of one full amplicon anneals to the 5' end of another full amplicon or of another nucleic acid, the two complementary ends of each second-round amplicon hybridize to each other to form a loop. A looped second-round amplicon can neither form concatemers nor serve as a template for amplification. Consequently, under suitable conditions, even when excess primer is available to hybridize with the 3' end of second-round amplicons, amplification proceeds linearly and no concatemers are produced. This patent uses looping of the complementary 5' and 3' ends of the full amplicons to reduce the production of concatemers and similar by-products and to prevent full amplicons from serving as templates in the next amplification cycle, while the DNA and the half amplicons in the amplification mixture remain available as templates for the next cycle.

In subsequent cycles, under suitable conditions, primers hybridize at various sites on the first-round amplicons and on the single-stranded nucleic acid, and these are amplified. The second-round amplicons, or full amplicons, cannot be amplified further, because their 3' ends are bound by primer or because their 5' and 3' ends hybridize to form a loop. The amplification mixture is then placed under conditions suitable for producing half amplicons and full amplicons from the single-stranded nucleic acid and from the half amplicons, followed by conditions suitable for the second-round amplicons, or full amplicons, to hybridize into loops. The cycle is repeated as many times as needed.
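
The cycle logic described above can be illustrated with a small counting model. This is only a sketch under simplifying assumptions that are not stated in the patent: each original template and each half amplicon is copied exactly once per cycle, and every full amplicon loops and is never copied again. The function name `simulate_cycles` is hypothetical.

```python
def simulate_cycles(n_cycles, templates=1):
    """Count half and full amplicons after each cycle of the looping scheme.

    Assumptions (illustrative only): one copy per template per cycle;
    full amplicons loop and never act as templates.
    """
    half = 0   # first-round (half) amplicons present
    full = 0   # second-round (full) amplicons present
    history = []
    for cycle in range(1, n_cycles + 1):
        new_full = half        # each existing half amplicon yields one full amplicon
        new_half = templates   # each original template yields one half amplicon
        full += new_full
        half += new_half
        history.append((cycle, half, full))
    return history


for cycle, half, full in simulate_cycles(5):
    print(cycle, half, full)
```

Under these assumptions the number of half amplicons grows linearly with the cycle number and the number of full amplicons grows quadratically, in contrast to the doubling per cycle that occurs when products are themselves templates.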

Further analysis of the amplified DNA products includes genotyping of the DNA amplicons. The amplified DNA products can also be used to identify polymorphisms in the DNA amplicons, for example by single nucleotide polymorphism (SNP) analysis. In certain embodiments, SNPs can be detected by well-known methods, including DNA sequencing (amplifying a PCR product and then sequencing it), the oligonucleotide ligation assay (OLA), single-base extension, allele-specific primer extension and mismatch hybridization. Preferably, the SNPs identified are shown to be associated with a phenotype, including a disease phenotype. The DNA amplicons can be used to construct DNA libraries, including, without limitation, genomic DNA libraries, microdissected chromosome DNA libraries, BAC libraries, YAC libraries, PAC libraries, cDNA libraries, phage libraries and plasmid libraries.

The term "genome" is defined as the complete set of genes carried by an individual, cell or organelle. The term "genomic DNA" is defined as all of the DNA carried by an individual, cell or organelle. The term "transcriptome" is defined as the RNA of a single cell.

The term "nucleoside" refers to a molecule in which a purine or pyrimidine base is covalently linked to a ribose or deoxyribose sugar. Nucleosides include adenosine, guanosine, cytidine, uridine and thymidine. Nucleosides also include rare nucleosides such as inosine, methylinosine, pseudouridine, 5,6-dihydrouridine, ribothymidine, 2N-methylguanosine and 2,2N,N-dimethylguanosine. The term "nucleotide" refers to a molecule formed from a nucleoside and one or more phosphate groups. Nucleotides include adenosine, guanosine, cytidine and thymidine nucleotides. The terms "polynucleotide," "oligonucleotide" and "nucleic acid molecule" all refer to polymers of nucleotides, that is, macromolecules in which deoxyribonucleotides or ribonucleotides are joined by 3',5'-phosphodiester bonds. Polynucleotides can have various three-dimensional structures and can perform many functions. The following are all polynucleotides: a gene or gene fragment (for example, a probe, primer, EST or SAGE tag), exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, DNA sequences, RNA sequences, nucleic acid probes and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. Unless otherwise stated, a polynucleotide of this patent encompasses both the double-stranded form and each of the complementary single-stranded forms. A polynucleotide is a specific sequence of four nucleotide bases: adenine (A), thymine (T), guanine (G) and cytosine (C); when the polynucleotide is RNA, uracil (U) takes the place of thymine. A polynucleotide sequence is therefore the alphabetical representation of the bases of a polynucleotide molecule. Such letter sequences can be stored in a computer database and analyzed with bioinformatics software for applications such as functional genomics and homology searching.

The terms "RNA," "RNA molecule" and "ribonucleic acid molecule" refer to a polymer of ribonucleotides. The terms "DNA," "DNA molecule" and "deoxyribonucleic acid molecule" refer to a polymer of deoxyribonucleotides. Both DNA and RNA can be synthesized naturally (for example, by DNA replication and by transcription of DNA, respectively). DNA and RNA can exist in single-stranded form (for example, ssRNA and ssDNA) or multi-stranded form (for example, dsDNA and dsRNA). "mRNA" or "messenger RNA" is single-stranded RNA that specifies the amino acid sequence of a polypeptide chain.

In this patent, the term "small interfering RNA" ("siRNA"), also referred to in the literature as "short interfering RNA," means an RNA (or RNA analog) of 10 to 50 nucleotides (or nucleotide analogs) in length that is capable of directing or mediating RNA interference. An siRNA can comprise between 15 and 30 nucleotides or nucleotide analogs, between 16 and 25, between 18 and 23, or even between 19 and 22 nucleotides or nucleotide analogs (that is, 19, 20, 21 or 22 nucleotides or nucleotide analogs). The term "short" siRNA refers to an siRNA of about 21 nucleotides (or nucleotide analogs), for example 19, 20, 21 or 22 nucleotides. The term "long" siRNA refers to an siRNA of about 24 to 25 nucleotides, for example 23, 24, 25 or 26 nucleotides. Short siRNAs may include shorter siRNAs of fewer than 19 nucleotides, for example 16, 17 or 18 nucleotides, provided that the shorter siRNA retains the ability to mediate RNAi. Long siRNAs may include longer siRNAs of more than 26 nucleotides, provided that the longer siRNA retains the ability to mediate RNAi without further processing, such as enzymatic cleavage into a short siRNA.

The terms "nucleotide analog," "altered nucleotide" and "modified nucleotide" refer to a non-standard nucleotide, including non-naturally occurring ribonucleotides or deoxyribonucleotides. A nucleotide analog is a nucleotide modified at any position so as to alter certain chemical properties while retaining the ability to perform its intended function. Positions of the nucleotide that may be derivatized include the 5 position, for example 5-(2-amino)propyluridine, 5-bromouridine, 5-propynyluridine and 5-propenyluridine; the 6 position, for example 6-(2-amino)propyluridine; and the 8 position of adenosine and/or guanosine, for example 8-propynyluridine, 8-chloroguanosine and 8-fluoroguanosine. Nucleotide analogs also include deaza nucleotides, such as 7-deaza-adenosine; O- and N-modified nucleotides (for example alkylated nucleotides, such as N6-methyladenosine, or nucleotides carrying other modifications known in the literature); and other heterocyclically modified nucleotide analogs such as those described in Herdewijn, Antisense Nucleic Acid Drug Dev., 2000 Aug. 10(4):297-310.

Nucleotide analogs may also result from modification of the sugar moiety of the nucleotide. For example, the 2' OH group may be replaced by a group selected from H, OR, R, F, Cl, Br, I, SH, SR, NH2, NHR, NR2 and COOR, where R is substituted or unsubstituted C1-C6 alkyl, alkenyl, alkynyl, aryl and the like. Other possible modifications include those described in U.S. Patent Nos. 5,858,988 and 6,291,438.

The phosphate group of the nucleotide may also be modified, for example by replacing one or more of the oxygen atoms of the phosphate group with sulfur (as in phosphorothioates), or by making other substitutions that allow the nucleotide to perform its intended function, as described in Eckstein, Antisense Nucleic Acid Drug Dev. 2000 Apr. 10(2):117-21; Rusckowski et al., Antisense Nucleic Acid Drug Dev. 2000 Oct. 10(5):333-45; Stein, Antisense Nucleic Acid Drug Dev. 2001 Oct. 11(5):317-25; Vorobjev et al., Antisense Nucleic Acid Drug Dev. 2001 Apr. 11(2):77-85; and U.S. Patent No. 5,684,143. The modifications described above (for example, phosphate group modifications) reduce the hybridization efficiency of the nucleotide analogs in vivo and in vitro.

In this patent, the term "isolated RNA" (for example, "isolated mRNA") refers to RNA molecules that are substantially free of other cellular material or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.

The term "in vitro" refers to settings involving purified reagents or extracts, for example cell extracts. The term "in vivo" refers to settings involving living cells, for example immortalized cells, primary cells, cell lines and/or cells within an organism.

The terms "complementary" and "complementarity" are used for nucleic acid sequences related by the base-pairing rules. For example, the sequence 5'-AGT-3' is complementary to the sequence 5'-ACT-3'. Complementarity can be partial or total. Partial complementarity occurs when one or more bases fail to match according to the base-pairing rules. Total or complete complementarity occurs when every base of one nucleic acid is matched with a base of the other. The degree of complementarity between nucleic acid strands strongly affects the efficiency and strength of hybridization between them.
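
The base-pairing example above (5'-AGT-3' with 5'-ACT-3') can be checked with a short script. This is a minimal sketch for DNA sequences of equal length; the function names are illustrative and are not part of the patent.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the antiparallel complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def complementarity(seq_a, seq_b):
    """Fraction of positions at which seq_a pairs with antiparallel seq_b.

    Both sequences are written 5' to 3' and must have the same length.
    1.0 means total complementarity; lower values mean partial complementarity.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    partner = reverse_complement(seq_b)
    paired = sum(a == p for a, p in zip(seq_a.upper(), partner))
    return paired / len(seq_a)


print(reverse_complement("AGT"))      # ACT
print(complementarity("AGT", "ACT"))  # 1.0
```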

The term "homology," when used for a relationship between nucleic acids, refers to the degree of complementarity. Homology may be partial (that is, partial identity) or complete (that is, complete identity). A partially complementary sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid; it is described by the functional term "substantially homologous." Inhibition of hybridization of the completely complementary sequence to the target sequence can be examined with a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe (that is, an oligonucleotide capable of hybridizing to the oligonucleotide of interest) competes with and inhibits the binding (that is, the hybridization) of a completely homologous sequence to the target under conditions of low stringency. This does not mean that low-stringency conditions permit non-specific binding; low-stringency conditions require that the binding of the two sequences to each other be a specific (that is, selective) interaction. The absence of non-specific binding can be tested with a second target that lacks even partial complementarity (for example, less than about 30% identity); in the absence of non-specific binding, the probe will not hybridize to the non-complementary target.

When used for a double-stranded nucleic acid such as a cDNA or genomic clone, the term "substantially homologous" refers to any probe that can hybridize to either or both strands of the double-stranded nucleic acid under conditions of low stringency. When used for a single-stranded nucleic acid, the term "substantially homologous" refers to any probe that can hybridize to the single-stranded nucleic acid sequence under conditions of low stringency.

The following terms are used to describe the relationship between two or more sequences: "reference sequence," "sequence identity," "percentage of sequence identity" and "substantial identity." A "reference sequence" is a defined sequence used as the basis for a sequence comparison; a reference sequence may be a subset of a larger sequence, for example a segment of a full-length cDNA given in a sequence listing, or it may comprise a complete gene sequence. Generally, a reference sequence is at least 20 nucleotides long, frequently at least 25 nucleotides long, and often at least 50 nucleotides long. Because two polynucleotides may each (1) comprise a sequence (that is, a portion of the complete polynucleotide sequence) that is similar between the two polynucleotides and (2) further comprise a sequence that diverges between them, sequence comparisons between two (or more) polynucleotides are typically performed by comparing the two sequences over a "comparison window" to identify and compare local regions of sequence similarity. A "comparison window," as used here, refers to a conceptual segment of at least 20 contiguous nucleotide positions that may be compared with a reference sequence of at least 20 contiguous nucleotides. The portion of the polynucleotide sequence in the comparison window may contain additions or deletions (that is, gaps) of 20% or less relative to the reference sequence (which contains no additions or deletions). Optimal alignment of sequences within a comparison window may be obtained by computer implementations of the local homology algorithm of Smith and Waterman (Smith and Waterman (1981) Adv. Appl. Math. 2:482), the homology alignment algorithm of Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)), the similarity search method of Pearson and Lipman (Proc. Natl. Acad. Sci. USA 85:2444 (1988)) and similar algorithms (GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by running several methods and selecting the best alignment (that is, the one giving the highest percentage of identity over the comparison window).
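
To make the idea of a local alignment concrete, the following is a minimal sketch of Smith-Waterman scoring with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration; they are not taken from the patent or from the cited software package, and the function returns only the best local score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Classic dynamic programming: each cell holds the best score of a local
    alignment ending at that pair of positions, floored at zero so that an
    alignment may start anywhere.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diagonal, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("TTACGTT", "ACG"))  # 6: the shared ACG scores 3 matches
```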

The term "sequence identity" means that two polynucleotide sequences are identical (that is, on a nucleotide-by-nucleotide basis) over the comparison window. The "percentage of sequence identity" is calculated by comparing the two optimally aligned sequences over the comparison window, counting the positions at which the identical nucleic acid base (for example A, T, C, G, U or I) occurs in both sequences, dividing the number of matched positions by the total number of positions in the comparison window (that is, the window size), and multiplying the result by 100. The term "substantial identity" describes a polynucleotide sequence that has at least 85% sequence identity to another sequence, preferably 90 to 95% sequence identity, and more usually at least 99% sequence identity to a reference sequence over a comparison window of at least 20 nucleotide positions, typically 25 to 50 nucleotides. The reference sequence may be a subset of a larger sequence, for example a segment of a full-length sequence claimed in the present invention.
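
The percentage calculation described above can be written out directly. This sketch assumes the two sequences have already been aligned over the same comparison window without gaps, so they have equal length; the function names and the use of the 85% figure as a default threshold are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions in the comparison window with identical bases."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matched = sum(x == y for x, y in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matched / len(seq_a)


def has_substantial_identity(seq_a, seq_b, threshold=85.0):
    """True when identity over the window meets the threshold (85% by default)."""
    return percent_identity(seq_a, seq_b) >= threshold


reference = "ACGTACGTACGTACGTACGT"   # 20-nucleotide window
candidate = "ACGTACGTACGTACGTATTA"   # differs at 3 of 20 positions
print(percent_identity(reference, candidate))          # 85.0
print(has_substantial_identity(reference, candidate))  # True
```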

The term "hybridization" refers to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (that is, the strength of the association between the nucleic acids) are affected by factors such as the degree of complementarity between the nucleic acids, the stringency of the conditions, the Tm of the hybrid formed and the G:C content of the nucleic acids.

The term "Tm" refers to the melting temperature of a nucleic acid. The melting temperature is the temperature at which half of a population of double-stranded nucleic acid molecules has dissociated into single strands. As a reference method of calculation, when the nucleic acid is in an aqueous solution of 1 M NaCl, the Tm can be estimated simply with the formula Tm = 81.5 + 0.41(%G+C) (see, for example, Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)). More sophisticated calculations of Tm take both structural and sequence characteristics of the nucleic acid into account.
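
As a worked illustration of the simple estimate Tm = 81.5 + 0.41(%G+C), the sketch below computes the G+C percentage of a sequence and applies the formula. It only reproduces the formula as stated for 1 M NaCl and makes no correction for sequence length or structure; the function names are illustrative.

```python
def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def estimate_tm(seq):
    """Simple Tm estimate in degrees Celsius: 81.5 + 0.41 * (%G+C)."""
    return 81.5 + 0.41 * gc_percent(seq)


# A sequence with 50% G+C gives 81.5 + 0.41 * 50 = 102.0
print(round(estimate_tm("ATGCATGC"), 1))
```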

The term "stringency" refers to the conditions under which nucleic acid hybridization is carried out, such as temperature, ionic strength and the presence or absence of other compounds such as organic solvents.

In nucleic acid hybridization, "low stringency conditions" means binding or hybridization at 42°C in a solution containing 5×SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5× Denhardt's reagent (50× Denhardt's reagent contains, per 500 ml, 5 g Ficoll (Type 400; Pharmacia) and 5 g BSA (Fraction V; Sigma)) and 100 μg/ml denatured salmon sperm DNA, followed by washing at 42°C in 5.0×SSPE containing 0.1% SDS, when a probe of about 500 nucleotides in length is used.

In nucleic acid hybridization, "medium stringency conditions" means binding or hybridization at 42°C in a solution containing 5×SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA, followed by washing at 42°C in 1.0×SSPE containing 1.0% SDS, when a probe of about 500 nucleotides in length is used.

"High stringency conditions" means binding or hybridization of a probe of about 500 nucleotides at 42°C in a solution containing 5×SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA, followed, once the probe has bound the nucleic acid, by washing at 42°C in 0.1×SSPE containing 1.0% SDS.
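
The three definitions share the same hybridization buffer base and differ mainly in the wash step. The sketch below collects the wash conditions stated above into a small lookup table and scales the stated 5×SSPE composition to any other fold concentration by simple proportion. The dictionary and function names are illustrative, and linear scaling of the recipe is an assumption of the sketch.

```python
# Composition of 5x SSPE as stated in the text, in grams per litre.
SSPE_5X_G_PER_L = {"NaCl": 43.8, "NaH2PO4.H2O": 6.9, "EDTA": 1.85}

# Wash step for each stringency level: (SSPE fold concentration, % SDS).
# All washes are performed at 42 degrees Celsius.
WASH_CONDITIONS = {
    "low": (5.0, 0.1),
    "medium": (1.0, 1.0),
    "high": (0.1, 1.0),
}


def sspe_composition(fold):
    """Grams per litre of each SSPE component at the given fold concentration."""
    return {name: grams * fold / 5.0 for name, grams in SSPE_5X_G_PER_L.items()}


for level, (fold, sds) in WASH_CONDITIONS.items():
    nacl = sspe_composition(fold)["NaCl"]
    print(f"{level}: {fold}x SSPE ({nacl:.3f} g/l NaCl), {sds}% SDS")
```

Lower salt in the wash corresponds to higher stringency, which is why the SSPE concentration falls from 5.0× to 0.1× across the three levels.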

Many factors affect the stringency of a hybridization reaction. The length, nature and base composition of the probe; the nature, base composition and storage conditions of the target; the concentration of salts and other components (for example, the presence or absence of formamide, dextran sulfate or polyethylene glycol); and the hybridization conditions themselves can all lower the specificity of the reaction. Higher stringency can also be obtained by raising the hybridization temperature or by adding formamide to the wash solution.

In some embodiments, cells are first identified and then a single cell or a plurality of cells is isolated. Cells within the scope of this patent include any type of cell whose DNA or RNA content is considered worth determining. These include cancer cells of various types, hepatocytes, oocytes, embryos, stem cells, iPS cells, ES cells, neurons, erythrocytes, melanocytes, astrocytes, germ cells, oligodendrocytes, kidney cells and the like. The methods of this patent are applied to the DNA or RNA of a single cell. A plurality of cells includes from about 2 to about 1,000,000 cells.

The nucleic acid referred to in the present method may be DNA, RNA or a DNA-RNA chimera. The nucleic acid may come from a variety of sources, for example human samples, including whole blood, serum, plasma, cerebrospinal fluid, cheek scrapings, breast milk, biopsy tissue, semen, urine, feces, hair follicles, saliva, sweat, and immunoprecipitated or physically isolated chromatin.

The nucleic acid molecules amplified from the template by the present method can provide diagnostic or prognostic information. For example, the processed nucleic acid molecules from a sample can provide information on genomic copy number and/or sequence, allelic variation, cancer diagnosis, prenatal diagnosis, paternity, disease diagnosis, detection and monitoring, and the corresponding treatment.

As used here, a "single cell" refers to one cell. Single cells may be obtained from tissue sections, whole blood or cell cultures. Cells from specific organs, tissues, tumors and the like are also suitable for the present method. Cells from populations of prokaryotic microorganisms (such as bacteria) or eukaryotic microorganisms (such as yeast) are suitable as well. A single-cell suspension can be obtained by conventional methods, for example by digesting the sample tissue enzymatically with trypsin or papain, or by separating the cells physically. The single cells obtained can be placed in separate sterile containers for individual processing, for example in individual centrifuge tubes or in the wells of a 96-well plate, so that each well holds one cell.

Commonly used methods for manipulating single cells include fluorescence-activated cell sorting (FACS), flow cytometry (FCM; Herzenberg, PNAS USA 76:1453-55, 1979), micromanipulation and semi-automated cell pickers (for example, the Quixell™ cell transfer system from Stoelting). Individual cells can be selected on the basis of features detectable by microscopic observation, such as location, morphology or expression of a reporter gene. Combining gradient centrifugation with flow cytometry can also increase the efficiency of cell isolation and sorting.

Once the desired cell has been identified, it is lysed by a commonly used method to release its contents, including DNA and RNA; these operations are all carried out in a tube. Cells can be lysed by heating, with detergents or other chemical methods, or by a combination of these. Mild lysis helps prevent the release of nuclear chromatin. For example, incubating cells at 72°C for 2 minutes in the presence of Tween-20 is sufficient to lyse them fully, but it leads to degradation of the genomic DNA in nuclear chromatin. The following methods may therefore be chosen for cell lysis: heating at 65°C in a water bath for 10 minutes (Esumi et al., Neurosci Res., 60(4):439-51, 2008); incubating at 70°C for 90 seconds in PCR buffer II (Applied Biosystems) supplemented with 0.5% NP-40; digesting with proteinase K; or using a guanidine isothiocyanate solution (U.S. Patent Publication No. 2007/0281313). Genomic DNA can be amplified directly from the cell lysate, that is, the reaction mixture can be added directly to the lysate. Alternatively, the lysate can first be divided among several tubes, so that each reaction tube contains genomic DNA, and each aliquot is then amplified separately.

The nucleic acids used in the present invention include both natural and non-natural bases. The natural bases are one or more of adenine, thymine (uracil in RNA), cytosine and guanine. Typical non-natural bases that a nucleic acid may contain, whether it has a natural backbone or an analogous structure, include but are not limited to: inosine, xanthine, hypoxanthine, isocytosine, isoguanine, 5-methylcytosine, 2-aminoadenine, 6-methyladenine, 6-methylguanine, 2-propylguanine, 2-propyladenine, 2-thiothymine, 2-thiocytosine, 15-halouracil, 15-halocytosine, 5-propynyluracil, 5-propynylcytosine, 6-azouracil, 6-azocytosine, 6-azothymine, 5-uracil, 4-thiouracil, 8-haloadenine or guanine, 8-aminoadenine or guanine, 8-thioadenine or guanine, 8-thioalkyladenine or guanine, 8-hydroxyadenine or guanine, 5-halouracil or cytosine, 7-methylguanine, 7-methyladenine, 8-azaguanine, 8-azaadenine, 7-deazaguanine, 7-deazaadenine, 3-deazaguanine, 3-deazaadenine and their analogs. Under certain conditions, isocytosine and isoguanine can be included in a nucleic acid sequence to reduce non-specific hybridization, as described in U.S. Patent No. 5,681,702.

A "primer," as used here, is a natural or synthetic oligonucleotide. A primer binds to a template nucleic acid and is extended from its 3' end by a DNA polymerase, ultimately producing a copy of the template. Primers are generally 3 to 36 nucleotides long; the primers of this patent are 17 to 30 nucleotides long and include orthogonal primers, amplification primers, construction primers and the like. The two primers of a pair can each bind one or more target sequences. Primers and probes can be degenerate or quasi-degenerate in sequence. The primers of the present invention bind adjacent to a target sequence. A "primer" can be regarded as a short polynucleotide with a free 3'-OH group that binds the target sequence or template by hybridization and then promotes synthesis of a strand complementary to the target. The primers of the present invention range from 17 to 30 nucleotides.

The method of this patent uses PCR for DNA amplification. PCR, the polymerase chain reaction (U.S. Patent Nos. 4,683,195, 4,683,202, and 4,965,188), uses a pair of specific primers that bind complementarily to the target sequence; thermal cycling (denaturation, annealing, extension) in the presence of a DNA polymerase then yields a large quantity of the intact target fragment, i.e., the PCR product. The two primers are complementary to opposite strands of the double-stranded template. During amplification, the template is first denatured, the primers then anneal to their complementary template sequences, and the DNA polymerase extends the primers to form new complementary strands. One pass through denaturation, annealing, and extension constitutes one cycle, and repeating the cycle many times yields a large number of copies of the target fragment. The length of the amplified fragment is determined by the relative positions at which the primers bind the template and is therefore a controllable parameter.
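The statement that fragment length follows from the primer binding positions can be illustrated with a minimal Python sketch. The function names and the toy template are assumptions made for illustration only; they are not part of the patent, and the sketch assumes exact primer matches on an error-free template.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def amplicon_length(template: str, forward: str, reverse: str) -> int:
    """Length of the product bounded by two primers on the top strand.

    `forward` matches the top strand directly. `reverse` anneals to the
    top strand, so its reverse complement is searched for there.
    Returns -1 if a site is missing or the primers do not face each other.
    """
    template = template.upper()
    start = template.find(forward.upper())
    rev_site = template.find(reverse_complement(reverse))
    if start == -1 or rev_site == -1 or rev_site < start:
        return -1
    # Product runs from the 5' end of the forward primer
    # to the end of the reverse primer's binding site.
    return rev_site + len(reverse) - start


# Hypothetical 28-nt template with two 6-nt primer sites 8 nt apart.
toy_template = "TTTTACGTACGGGGGGGGCATTGAAAAA"
print(amplicon_length(toy_template, "ACGTAC", "TCAATG"))
```

Moving either primer site along the template changes the returned length, which is the sense in which the product size is a controllable parameter.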

With PCR, a single copy of a target sequence in genomic DNA can be amplified to a level detectable by several different methods (e.g., hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of 32P-labeled dNTPs such as dCTP or dATP). Given a suitable primer pair, any oligonucleotide or polynucleotide, and not only genomic DNA, can serve as a PCR template. In particular, the fragments produced by PCR are themselves efficient templates for subsequent PCR amplification. The DNA polymerase used in PCR is a thermostable enzyme. In Southern and Northern experiments, primers also serve as hybridization probes.

Amplification methods include PCR, the ligase chain reaction (LCR), and other amplification methods. These are commonly used methods; see U.S. Patent Nos. 4,683,195 and 4,683,202; Innis et al., "PCR Protocols: A Guide to Methods and Applications", Academic Press, Incorporated (1990) (for PCR); and Wu et al. (1989) Genomics 4:560-569 (for LCR). In general, PCR comprises the following steps: (i) specific binding of the primers to the DNA template; (ii) amplification by a DNA polymerase through repeated rounds of annealing, extension, and denaturation; and (iii) detection of the size of the PCR product. Primers should be specific and of appropriate length; that is, each primer is designed against the gene locus so that it is specifically complementary to the template.

The reagents and equipment required for the amplification reaction are all commercially available. Primers used in the amplification reaction are preferably specifically complementary to the target gene sequence. The amplified product can be used directly for sequencing.

The complementary or homologous strands obtained by amplification can be quantified on the basis of base composition and the rules of complementary base pairing.

"Reverse transcription PCR" or "RT-PCR" refers to the conversion of template mRNA into complementary single-stranded DNA (cDNA) by a reverse transcriptase, followed by PCR amplification of that DNA.

The terms "PCR product", "PCR fragment", and "amplicon" all refer to the mixture obtained after a PCR reaction has completed two or more cycles of denaturation, annealing, and extension. These terms also cover cases in which one or more fragments of one or more target sequences are amplified simultaneously.

The term "amplification reagents" covers reagents such as dNTPs and buffer, together with the primers, nucleic acid template, and amplification enzyme required for amplification. The amplification reagents and the other reaction components are combined in a reaction vessel (test tube, centrifuge tube, etc.). Amplification methods include conventional PCR, rolling circle amplification (RCA), hyperbranched rolling circle amplification (HRCA), and loop-mediated isothermal amplification (LAMP).

Emulsion PCR (ePCR) refers to vigorously shaking or stirring a "water-in-oil" mixture to generate millions of micron-sized aqueous compartments, each of which supports its own amplification. The DNA library is mixed at limiting dilution either with the beads before emulsification or directly into the emulsion mixture. The compartment size, together with the limiting dilution of beads and target molecules, produces microreactors that contain on average only one DNA molecule and one bead (under optimal dilution, many compartments contain a bead but no target molecule). To improve amplification efficiency, both an upstream PCR primer (low concentration) and a downstream PCR primer (high concentration) are included in the reaction mixture. Depending on the size of the aqueous compartments, an emulsion PCR can contain up to 3×10⁹ separate PCR reactions. Depending on the emulsification conditions, the average compartment diameter in an emulsion ranges from submicron to more than 100 microns.
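The effect of limiting dilution on compartment occupancy can be sketched numerically. The Poisson loading model below is an assumption that is commonly used for this kind of partitioning; the patent itself does not specify a statistical model, and the function names are illustrative.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k templates in a compartment (Poisson model)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def compartment_fractions(mean_templates: float) -> dict:
    """Fractions of compartments with zero, one, or several templates.

    `mean_templates` is the average number of template molecules per
    compartment, a quantity set by the dilution (assumed, not from the source).
    """
    empty = poisson_pmf(0, mean_templates)
    single = poisson_pmf(1, mean_templates)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


# Example: a dilute library averaging 0.1 template per compartment.
for label, fraction in compartment_fractions(0.1).items():
    print(f"{label}: {fraction:.4f}")
```

Under this model, a dilute library leaves most compartments without a template while keeping compartments with more than one template rare, which is consistent with the observation that many compartments hold a bead but no target molecule.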

"Identity", "homology", and "similarity" are used interchangeably and refer to the sequence similarity between two nucleic acid molecules. Identity is determined by comparing corresponding positions in each aligned sequence: when the same position in the compared sequences is occupied by the same base or amino acid, the sequences are identical at that position. The degree of identity between sequences is a function of the number of matching or identical positions they share. Unrelated or non-homologous sequences share less than 25%-40% identity.

"Sequence identity" is expressed as a percentage (e.g., 60%, 65%, 70%, 75%, or 99%), namely the percentage of positions at which the compared sequences carry the same base. The percentage of identity or homology of aligned sequences can be determined with specialized software such as BLAST, in particular BLASTN and BLASTP; details of these programs are available on the NCBI website.
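As a minimal sketch of how a percent identity can be computed for two sequences that are already aligned, consider the following Python function. It is not the BLAST algorithm, and the convention of skipping gapped columns is an assumption; tools differ in how they treat gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared columns where two aligned sequences agree.

    Both inputs must have the same length and use '-' for gaps.
    Columns containing a gap in either sequence are skipped
    (an assumed convention, chosen here for simplicity).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# One mismatch in eight aligned positions.
print(percent_identity("ACGTACGT", "ACGTTCGT"))
```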

Sequencing analysis of a target nucleic acid sequence can be performed with a variety of sequencing methods, including but not limited to sequencing by hybridization (SBH), sequencing by ligation (SBL), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, fluorescent in situ sequencing (FISSEQ), FISSEQ beads (U.S. Patent No. 7,425,431), wobble sequencing (PCT/US05/27695), multiplex sequencing (U.S. Serial No. 12/027,039, filed February 6, 2008; Porreca et al. (2007) Nat. Methods 4:931), polymerized colony (POLONY) sequencing (U.S. Patent Nos. 6,432,360, 6,485,944, and 6,511,803, and PCT/US05/06425), nanogrid rolling circle sequencing (ROLONY) (U.S. Serial No. 12/120,241, filed May 14, 2008), allele-specific oligomer ligation assays (e.g., the oligomer ligation assay (OLA), single template molecule OLA using a ligated linear probe and a rolling circle amplification (RCA) readout, ligated padlock probes, and/or single template molecule OLA using a ligated circular padlock probe and an RCA readout), and the like. High-throughput sequencing methods, for example cyclic array sequencing on platforms such as Roche 454, Illumina Solexa, AB-SOLiD, Helicos, and Polonator, can also be used. High-throughput sequencing methods are described in U.S. Serial No. 61/162,913, filed March 24, 2009. A variety of light-based sequencing technologies are known in the art (Landegren et al. (1998) Genome Res. 8:769-76; Kwok (2000) Pharmacogenomics 1:95-100; and Shi (2001) Clin. Chem. 47:164-172).

The amplified DNA can be sequenced by a variety of methods. In particular, high-throughput sequencing can be used, such as the SOLiD sequencing method of Applied Biosystems or the Genome Analyzer platform of Illumina, yielding from 10,000 to 1 billion reads. A "read" here refers to one contiguous nucleic acid sequence obtained from a sequencing reaction.

"Shotgun sequencing" is suited to sequencing large DNA (such as whole genomes). The DNA is first fragmented into small pieces that can be sequenced individually, and the sequences of these pieces are then reassembled in their original order to give the complete sequence. DNA can be fragmented in many ways, including enzymatic digestion and mechanical shearing. Overlapping sequences can be aligned and analyzed with suitable bioinformatics software.
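The reassembly of overlapping fragments can be illustrated with a toy greedy assembler. This is a sketch under strong assumptions (error-free reads, all from the same strand, no repeats longer than the minimum overlap); production assemblers use far more robust algorithms, and nothing here is taken from the patent.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps; return the contigs found so far
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping fragments of a hypothetical 14-nt sequence.
print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
```

Greedy merging can misassemble repetitive sequence, which is one reason real genomes need dedicated assembly software.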

Amplification and sequencing methods are useful in predictive medicine, in which diagnostic assays, prognostic assays, pharmacogenomics, and the monitoring of clinical trials are used for prognostic (predictive) purposes so that an individual can be treated prophylactically. Accordingly, one aspect of the present invention relates to a diagnostic assay that examines genomic DNA to determine whether an individual is at risk of developing a disorder and/or disease. Such assays can be used for prognostic and/or predictive purposes so that an individual can be treated prophylactically before the onset of the disorder or disease.

The method of the invention also provides a way to determine the haplotype structure of homologous chromosomes. Haplotype information is important for describing and interpreting genomes and genetic diversity. The haplotype information of an individual supports better use of personalized medicine, and complete individual haplotype information also benefits genome-wide association studies (GWAS) of complex traits. For this purpose, genetic material from a single cell of an individual, or from multiple cells of the same cell type, is divided into multiple fractions or reaction mixtures so that DNA from the homologous chromosomes of that individual is separated. After the amplification, sequencing, and analysis methods described above, each amplified fraction is analyzed for single nucleotide polymorphisms (SNPs) against a reference genome sequence, such as the human genome (see International Human Genome Sequencing Consortium, Nature 431, 931-945 (2004)). The method applies equally to other species that have a reference genome. SNP analysis can be used to determine haplotypes, as described in H.C. Fan et al., Nat. Biotech. 29, 51-57 (2011). From the sequence information and the SNPs, a haplotype of the single cell is constructed, for example one spanning more than 100 kb. In another aspect, haplotypes from multiple single cells of the same individual are compared to determine the complete haplotype of that individual.

The method for determining the haplotype of a single cell includes extracting DNA from tissue material and collecting cells, single cells, and DNA molecules that show no significant damage or degradation. The extracted DNA is divided into multiple fractions, and reaction solution is added, such that each reaction contains only one subfraction of the genome. In this way, DNA from homologous chromosomes is amplified, sequenced, and analyzed for polymorphisms separately. Haplotype analysis of the cell and SNP analysis then follow. In another aspect, the haplotype of the single-cell genomic DNA is obtained by comparing the DNA amplified from each fraction.
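The idea that alleles observed together in one fraction lie on the same homolog can be sketched as a simple grouping procedure. The data layout (one dictionary of SNP calls per fraction), the function name, and the assumptions that calls are error-free and that each fraction carries a single homolog at every locus it covers are all illustrative; they are not the analysis pipeline of the patent.

```python
def phase_fractions(fractions: list) -> tuple:
    """Greedily sort per-fraction SNP calls into two haplotypes.

    Each fraction is a dict mapping a SNP identifier to the allele seen
    in that fraction. A fraction joins the haplotype it agrees with at
    shared SNPs; if it only conflicts with one haplotype, it joins the other.
    Fractions sharing no SNP with either haplotype are left unassigned.
    """
    if not fractions:
        return {}, {}
    pending = [dict(f) for f in fractions]
    hap_a, hap_b = dict(pending.pop(0)), {}
    progress = True
    while pending and progress:
        progress = False
        for frac in list(pending):
            in_a = [s for s in frac if s in hap_a]
            in_b = [s for s in frac if s in hap_b]
            if in_a and all(frac[s] == hap_a[s] for s in in_a):
                target = hap_a
            elif in_b and all(frac[s] == hap_b[s] for s in in_b):
                target = hap_b
            elif in_a and not in_b:
                target = hap_b  # conflicts with A, nothing known about B
            elif in_b and not in_a:
                target = hap_a  # conflicts with B, nothing known about A
            else:
                continue  # no shared SNPs yet, or conflicts with both
            target.update(frac)
            pending.remove(frac)
            progress = True
    return hap_a, hap_b


# Hypothetical calls from four fractions covering three heterozygous SNPs.
calls = [
    {"s1": "A", "s2": "G"},
    {"s1": "C", "s2": "T"},
    {"s2": "G", "s3": "T"},
    {"s3": "C", "s2": "T"},
]
print(phase_fractions(calls))
```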

When the sample species has no reference sequence, or when certain cancer samples carry complex structural variations, de novo genome assembly is required. De novo assembly is performed by assembling the sequencing results obtained from each genome subfraction and mapping them against one another. The de novo assembly approach can also be applied to DNA extracted from a single cell or from multiple cells. For example, multiple cells of the same cell type can be isolated and their DNA extracted; the DNA is then amplified and sequenced separately, and the sequencing results are compared, analyzed, and assembled without a reference sequence to obtain complete genome information.

The term "biological sample", as used in the method of this patent, includes but is not limited to tissues, cells, biological fluids, and isolates obtained from these samples.

In the method of this patent, "electronic device readable medium" refers to any medium that stores, holds, or contains data or information and that can be read and accessed directly by an electronic device. Such media include, but are not limited to: magnetic storage media, such as floppy disks, hard disks, and magnetic tape; optical storage media, such as compact discs; electronic storage media, such as RAM, ROM, EPROM, EEPROM, and the like; general hard disks; and hybrids of these categories, such as magnetic/optical storage media. The medium is used to record one or more expression profiles.

The term "electronic device", as used above, refers to any suitable computer or instrument, or any other device configured to store data and information. Examples include: stand-alone computing systems; networks, including local area networks (LAN), wide area networks (WAN), intranets, and extranets; electronic appliances such as personal digital assistants, cell phones, pagers, and the like; and distributed processing systems.

"Recording", as used above, refers to the process of storing or encoding information on an electronic device readable medium. Those skilled in the art can readily adopt any existing method for recording information on such a medium to produce a result comprising one or more expression profiles.

A variety of software programs and formats can be used to record the genomic DNA information of the present invention on an electronic device. For example, a nucleotide sequence can be represented as a word processing text file formatted in commercially available software such as WordPerfect or Microsoft Word, or as an ASCII file, and stored in a database application such as DB2, Sybase, Oracle, or the like. Any number of data processing formats (e.g., text file or database) can be used to obtain or create a medium on which one or more expression profiles, as described above, are recorded.

It should be understood that the specific embodiments of the invention described above merely illustrate some applications of its principles. Numerous modifications may be made to the methods described herein without departing from the spirit and scope of the invention. All references, patents, and published patent applications cited in this specification are incorporated herein by reference.

The following examples illustrate the method of the invention. They are not to be construed as limiting its scope, since these and other equivalent embodiments will be apparent from the present disclosure, figures, and accompanying documents.

Example I

Single-cell DNA amplification

Isolation of cells using a laser dissection microscope

A single cell is selected from the cell culture dish and isolated with a laser dissection microscope (LDM-6500, Leica). The cells are first detached and suspended in a centrifuge tube, then transferred to a culture dish for culture and observed by bright-field microscopy under a 10x objective. A UV laser is then used to cut the membrane around the selected single cell, which is collected in a PCR tube. The tube is centrifuged briefly so that the cell settles at the bottom. 5 μl of lysis buffer (30 mM Tris-Cl pH 7.8, 2 mM KCl, 0.2% Triton X-100, 12.5 μg/ml Qiagen Protease) is added to the side of the PCR tube and allowed to run down into the tube. The captured cell is then lysed in a PCR instrument under the following temperature conditions: 50°C for 3 hours, 75°C for 20 minutes, and 80°C for 5 minutes.

Linear preamplification

In linear preamplification, a pair of quasi-degenerate primers is used to amplify the entire genomic DNA. The primers are NG and NT:

NG: 5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNGGG-3'

NT: 5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNTTT-3'
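Each "N" in the NG and NT primers stands for any of the four bases, so each primer is in fact a pool of sequences. The following Python sketch, with illustrative function names that are not part of the patent, counts and enumerates the members of such a pool.

```python
from itertools import product

# Primer sequences as given in the text; N denotes any of A, C, G, T.
NG = "GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNGGG"
NT = "GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNTTT"


def count_variants(primer: str) -> int:
    """Number of distinct sequences represented by a primer containing N."""
    return 4 ** primer.count("N")


def expand(primer: str):
    """Yield every concrete sequence in the degenerate primer pool."""
    slots = ["ACGT" if base == "N" else base for base in primer]
    for combo in product(*slots):
        yield "".join(combo)


print(len(NG), count_variants(NG))  # primer length and pool size
print(next(expand(NT)))             # first member of the NT pool
```

The fixed 27-nt portion shared by both primers is the same sequence used as the single primer in Example II.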

The following solution is added to the PCR tube for the first round of amplification: 1.5 μl ThermoPol Buffer (NEB), 1.5 μl Φ29 Reaction Buffer (NEB), 1.0 μl dNTP (10 mM), 26 μl H2O (Ambion), and 0.1 μl of NG and NT primers (50 μM).

After the PCR buffer has been added to the PCR tube containing the lysed single cell, the tube is heated at 94°C for 3 minutes to denature the DNA into single strands. The sample is then quickly placed on ice to bring it to 0°C, during which the primers anneal to the template. 0.6 μl of a mixture of Bst large fragment polymerase and Φ29 polymerase (NEB) is added to the PCR tube, and the following temperature program is run in the PCR instrument:

10°C - 45 seconds

20°C - 45 seconds

30°C - 60 seconds

40°C - 45 seconds

50°C - 45 seconds

62°C - 3 minutes

95°C - 20 seconds

The PCR tube is then placed on ice to stop the reaction and allow a new round of priming to begin.

In the second and subsequent cycles, fresh polymerase mixture is added to the PCR tube and the following program is run in the PCR instrument:

10°C - 45 seconds

20°C - 45 seconds

30°C - 60 seconds

40°C - 45 seconds

50°C - 45 seconds

62°C - 3 minutes

95°C - 20 seconds

58°C - at least 20 seconds
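The two preamplification programs above can be written down as data, which makes it easy to check the hold time per cycle. The sketch below is illustrative only: the variable names are assumptions, ramp times between temperatures are ignored, and the final 58°C step is counted at its stated minimum of 20 seconds.

```python
# Each step is (temperature in °C, hold time in seconds), as listed in the text.
FIRST_CYCLE = [
    (10, 45), (20, 45), (30, 60), (40, 45), (50, 45),
    (62, 180),  # extension
    (95, 20),   # denaturation
]

# Later cycles repeat the same steps and add a 58 °C hold of at least 20 s.
LATER_CYCLE = FIRST_CYCLE + [(58, 20)]


def hold_seconds(steps: list) -> int:
    """Sum of the hold times of a program, excluding ramping."""
    return sum(seconds for _, seconds in steps)


print(hold_seconds(FIRST_CYCLE))  # hold time of the first cycle
print(hold_seconds(LATER_CYCLE))  # minimum hold time of each later cycle
```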

Example II

Further amplification of the amplicons of Example I using standard methods

The amplification products of Example I are amplified exponentially by standard PCR. Working on ice, the following reaction components are added to the PCR tube:

3.0 μl ThermoPol Buffer (NEB)

1.0 μl dNTP (10 mM)

26 μl H2O (Ambion)

0.1 μl primer (100 μM) (5'-GTGAGTGATGGTTGAGGTAGTGTGGAG-3')

1.0 μl Deep VentR (exo-) (NEB)

The following standard PCR program is used to amplify 1-2 μg of DNA product:

94°C - 20 seconds

59°C - 20 seconds

72°C - 3 minutes

17 cycles

72°C - 5 minutes

4°C - ∞

After exponential amplification, the DNA is purified on a Qiagen column and stored, and the primer ends of the DNA amplicons are removed.

The other known amplification methods described below can also be used.

One of the most common amplification methods is PCR, which is described in detail in U.S. Patent Nos. 4,683,195, 4,683,202, and 4,800,159, each incorporated herein by reference. Although PCR is an accepted method of DNA amplification, it should be noted that the method of the invention may also be carried out with other amplification techniques. These techniques are generally regarded as conventional and are discussed briefly below. In PCR, a pair of primers binds selectively to a nucleic acid under conditions that permit selective binding. A primer is any nucleic acid capable of priming the synthesis of a nascent nucleic acid in a template-dependent process. Primers can be single-stranded or double-stranded, but single-stranded primers are preferred. In template-dependent amplification processes, primers are used to amplify a target gene sequence on a known template.

The targets of the disclosed DNA amplification methods are nucleic acids, which can be any nucleic acid or nucleic acid analog that can be amplified by existing amplification techniques. For example, target nucleic acids in the context of the method of the invention include, but are not limited to, genomic DNA, cDNA, RNA, mRNA, cosmid DNA, BAC DNA, PAC DNA, YAC DNA, and synthetic DNA. Genomic DNA from prokaryotic cells, eukaryotic cells, or tissues can serve as a template for the disclosed DNA amplification methods. Isolated poly(A)-tailed mRNA can be reverse transcribed (RT) into cDNA, which is then used as the template for DNA amplification; cDNA can also be amplified directly as a DNA template. In addition, when the starting material is an RNA sample rather than a DNA sample, the RNA or mRNA can be amplified directly.

In a PCR reaction, the sequences of the primer pair are complementary to the sequence of the target gene. If the target gene is present in the template, the primers bind to it to form a DNA template:primer complex. An excess of dNTPs and a DNA polymerase (e.g., Taq polymerase) are added to the reaction.

Once the DNA template:primer complex has formed, the DNA polymerase catalyzes extension of the primer along the target gene sequence. Raising and lowering the temperature of the reaction causes the extended primer to dissociate from the template; this is the reaction product. The remaining primers then bind to the template and to the newly formed product, and the process repeats. These steps together are called one "cycle". Cycling continues until a sufficient amount of amplification product has been produced.
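How many cycles count as "sufficient" can be estimated with the usual idealized model, in which each cycle multiplies the product by (1 + efficiency). This model and the function names below are assumptions for illustration; real reactions plateau as reagents are consumed, and the patent gives no efficiency figures.

```python
import math


def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold increase after a number of cycles.

    `efficiency` is the assumed fraction of templates copied per cycle
    (1.0 means perfect doubling).
    """
    return (1.0 + efficiency) ** cycles


def cycles_needed(fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles reaching at least `fold` amplification."""
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))


print(fold_amplification(17))      # ideal yield of a 17-cycle program
print(cycles_needed(1e6))          # cycles for a million-fold increase
print(cycles_needed(1e6, 0.8))     # same target at 80% assumed efficiency
```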

The amplification product is then detected. Reliable methods generally detect the product by visual means. Alternatively, the product can be detected indirectly by fluorescent labeling, chemiluminescence, radioisotope labeling or labeled nucleotides, mass tags, or systems based on electrical or thermal impulse signals.

Another amplification method is the ligase chain reaction (LCR), disclosed in European Patent Application No. 320,308. In LCR, two complementary probe pairs are prepared; in the presence of the target gene sequence, each pair binds to one of the two template strands such that the two bound probes abut each other. Upon addition of a ligase, the two adjacent probes are joined. As in PCR, temperature cycling dissociates the ligated units from the template, and they then serve as new templates for binding the excess probes. U.S. Patent No. 4,883,750 describes a similar amplification method that relies on binding probes to the template strand.

The Qbeta replicase method described in PCT Patent Application No. PCT/US87/00880 is a further amplification method. In this method, a replicative RNA sequence complementary to the target gene sequence is added to a system containing an RNA polymerase and the template. The polymerase copies the replicative sequence, which can then be detected.

Isothermal amplification techniques use restriction endonucleases and ligases to amplify target molecules containing a 5'-[α-thio]triphosphate site. This approach can also be used for DNA amplification, as in the amplification method described by Walker et al.

Another isothermal amplification method is strand displacement amplification (SDA), which involves multiple rounds of strand displacement and synthesis. SDA is described in U.S. Patent Nos. 5,712,124, 5,648,211, and 5,455,166. A similar method is the repair chain reaction (RCR), in which several primer probes are annealed across the region targeted for amplification, followed by a strand repair reaction in which only two of the four bases are present. The other two bases are added to the reaction as biotinylated derivatives. A similar detection approach can be used in SDA.

A specific target gene can also be detected with the cyclic probe reaction (CPR). In CPR, the probe carries nonspecific DNA at its 3' and 5' ends and a specific RNA sequence in the middle. After the probe has hybridized to the template DNA, the reaction is treated with RNase H, and the digestion releases products that identify the target. The original template then anneals to another probe, and a new round of the reaction begins.

Other disclosed nucleic acid amplification methods include transcription-based amplification systems (TAS), nucleic acid sequence based amplification (NASBA), and 3SR (Kwoh et al., Proc Natl Acad Sci USA, 86:1173-77, 1989; PCT WO 88/10315, 1989). In NASBA, the nucleic acids can be prepared by standard phenol/chloroform extraction, heat denaturation of the sample, and treatment with lysis buffer, with DNA and RNA separated on minispin columns or RNA extracted with guanidine hydrochloride. These amplification techniques generally comprise annealing a primer to the template strand, polymerization, digestion of the DNA/RNA hybrid by RNase H, and heat denaturation of double-stranded DNA. In either case, single-stranded DNA is made double-stranded by adding a primer directed against the second strand and carrying out polymerization. The double-stranded DNA is then transcribed many times over by a polymerase such as T7 or SP6. In an isothermal cyclic reaction, the RNA is reverse transcribed into double-stranded DNA and transcribed again with T7 or SP6 polymerase. The resulting products, whether full-length or truncated, are specific to the target gene.

The amplification techniques described in UK Patent No. GB 2,202,328 and PCT No. PCT/US89/01025 can also be used. The former uses modified primers in a PCR-like, template- and enzyme-dependent synthesis. The primers may be modified with a capture group (such as biotin) or a detection group (such as an enzyme). In the latter, an excess of labeled probe is added to the reaction. When the target sequence is present, the probe binds to it and is cleaved catalytically. The target sequence is released intact after cleavage and can bind further probe, and the signal generated by probe cleavage indicates the presence of the target sequence.

Davey et al., European Patent No. 329,822, disclose a method for cyclically amplifying single-stranded RNA (ssRNA), ssDNA, and double-stranded DNA (dsDNA). An oligonucleotide primer first anneals to the initial ssRNA template and is extended by reverse transcriptase. RNase H then removes the RNA from the resulting DNA:RNA duplex. The remaining ssDNA serves as a template for a second primer, which carries an RNA polymerase promoter sequence 5' to its region of homology with the template, and a DNA polymerase extends this primer to form a double-stranded DNA molecule. This dsDNA has the same sequence as the original RNA between the primers and additionally carries a promoter sequence at one end. An appropriate RNA polymerase can use this promoter to make many RNA copies of the DNA. These copies re-enter the cycle, which leads to very rapid amplification. With a suitable choice of enzymes, the amplification can be run isothermally without adding fresh enzyme at each cycle, and the starting template can be either DNA or RNA.

Miller et al., PCT WO 89/06700, disclose an amplification method in which a promoter/primer sequence is hybridized to a target single-stranded DNA and many RNA copies of the sequence are then transcribed. This method is not cyclic: no new templates are produced from the resulting RNA transcripts.

Other amplification methods include RACE and one-sided PCR (Frohman, In: PCR Protocols: A Guide To Methods And Applications, Academic Press, N.Y., 1990). Methods based on ligating two (or more) oligonucleotides in the presence of a nucleic acid that contains the sequence of the resulting ligated oligonucleotide, thereby amplifying the ligated oligonucleotide, can also be used to amplify DNA (Wu et al., Genomics 4:560-569, 1989).

Example III

Removal of primer sequences from PCR amplification products

Phosphorothioate nucleotides offer one way to remove primer sequences from the amplification product. The DNA product is re-amplified using dA, dC, dT, and phosphorothioate dG. Because the primers contain no dC nucleotides, their complementary sequences contain no dG nucleotides. Exonuclease III digests the PCR product from the 3' end but cannot digest phosphorothioate nucleotides, so digestion proceeds only as far as the first phosphorothioate dG. Mung Bean exonuclease is also included in the enzyme mixture to remove the single-stranded 5' ends left by exonuclease III digestion.

The reaction buffer and conditions are as follows: 1X Buffer 1 (New England Biolabs), 5 U ExoIII, and 0.5 U Mung Bean exonuclease (New England Biolabs). The reaction is incubated at room temperature for 2 minutes and is then stopped by adding 5 μl of 0.5 M EDTA.

The reaction steps for removing primer sequences from the DNA product are shown in the following flowchart. Once the primer sequences have been removed, the product can be used for high-throughput sequencing. For example, the primer sequence can be removed first from the 3' end and then from the 5' end.
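The trimming logic of this example can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the patent's protocol: the primer and insert sequences are hypothetical, every dG in the re-amplified strand is assumed to be phosphorothioate-protected, and the helper names (`reverse_complement`, `exo3_trim`) are illustrative only.

```python
# Toy model of 3'-end trimming that stops at a protected (phosphorothioate) dG.
# All sequences and function names here are hypothetical illustrations.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def exo3_trim(strand, protected_base="G"):
    """Remove bases from the 3' end until a protected base is reached.

    The strand is written 5'->3'; the protected base itself is kept.
    """
    end = len(strand)
    while end > 0 and strand[end - 1] != protected_base:
        end -= 1
    return strand[:end]


primer = "GTGAGTGATGG"  # hypothetical primer containing no C
insert = "ACGTTAGCAT"   # hypothetical genomic insert

# The 3' end of a product strand is the complement of the primer,
# which therefore contains no G and is fully digestible.
strand = insert + reverse_complement(primer)
trimmed = exo3_trim(strand)

print(strand)   # insert followed by the primer complement
print(trimmed)  # digestion has stopped at the last G of the insert
```

In this model the digestion removes the whole primer complement and continues into the insert until it meets the first protected dG, so a few insert bases 3' of that dG are lost along with the primer sequence.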

Example IV

Removal of priming sequences from PCR amplicons

Hemimethylated primers are used to facilitate excision of the priming sequence from the amplicon product. The hemimethylated primers are used in the second round of amplification so that the priming sequence can be cleaved by the MspJI restriction endonuclease, or by another enzyme with similar properties that requires methylation to recognize its restriction site. The primer sequences are as follows:

First round amplification primers:

GT GAG TGA TGG TTG AGG TCT TGT GGA GNNNNNGGG

GT GAG TGA TGG TTG AGG TCT TGT GGA GNNNNNTTT
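Each first round primer contains five degenerate positions (N), so it stands for a pool of 4^5 = 1024 distinct oligonucleotides. The short Python sketch below enumerates that pool; the function name `expand_degenerate` is an illustrative assumption, and only the symbol N is handled, not the full IUPAC degenerate alphabet.

```python
from itertools import product


def expand_degenerate(primer):
    """Yield every concrete sequence encoded by a primer containing N positions."""
    pools = ["ACGT" if base == "N" else base for base in primer]
    for combo in product(*pools):
        yield "".join(combo)


# First round primer from the text, written without spaces.
primer = "GTGAGTGATGGTTGAGGTCTTGTGGAGNNNNNGGG"
variants = list(expand_degenerate(primer))

print(len(variants))  # number of distinct oligonucleotides in the pool
print(variants[0])    # first variant in enumeration order
```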

Second round amplification primer:

GT GAG TGA TGG TTG AGG TmCT TGT GGA G

After the second round of amplification, MspJI enzyme (0.5 U), Buffer 4 (New England Biolabs), and BSA are added to the purified DNA product. The reaction is incubated at 37°C for 4 hours and is then stopped by column purification.

Example V

Useful labels

Incorporating a detectable group into the primers or amplification products, or attaching one to a probe, facilitates qualitative and quantitative analysis of the amplification products. Many kinds of labels can serve this purpose, including fluorophores, chromophores, radioisotopes, enzyme labels, antibody labels, chemiluminescent groups, electroluminescent groups, affinity tags, and others. Those skilled in the art will be familiar with these labels and with others not mentioned here.

Examples of affinity tags include, but are not limited to, antibodies, antibody fragments, receptor proteins, hormones, biotin, DNP, or any polypeptide or protein molecule that binds to an affinity tag and can be used to isolate the amplified gene.

Examples of enzyme labels include urease, alkaline phosphatase, and peroxidase. Colorimetric indicator substrates can also be used, giving a signal that is visible to the human eye or measurable by spectrophotometry, to identify specific hybridization with complementary nucleic acid samples. All of these examples are well known to those skilled in the art, and the published literature describes others beyond those listed here.

Among the currently known fluorophores, the following are commonly used: Alexa 350, Alexa 430, AMCA, BODIPY 630/650, BODIPY 650/665, BODIPY-FL, BODIPY-R6G, BODIPY-TMR, BODIPY-TRX, Cascade Blue, Cy2, Cy3, Cy5, 6-FAM, Fluorescein, HEX, 6-JOE, Oregon Green 488, Oregon Green 500, Oregon Green 514, Pacific Blue, REG, Rhodamine Green, Rhodamine Red, ROX, TAMRA, TET, Tetramethylrhodamine, and Texas Red.

Example VI

Separation techniques

After amplification, it may be desirable to separate amplification products of different sizes from one another, and from the starting template and excess primers, in order to determine whether specific amplification has occurred.

In standard laboratory practice, amplification products can be separated by agarose, agarose-acrylamide, or polyacrylamide gel electrophoresis (Sambrook et al., "Molecular Cloning," A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press, New York, 13.7-13.9: 1989). Gel electrophoresis separation techniques are well known.

Alternatively, separation can be achieved by chromatography. Many kinds of chromatography are available, including adsorption, partition, ion-exchange, and molecular sieve chromatography, together with many specialized techniques for applying them, such as column, paper, thin-layer, and gas chromatography (Freifelder, Physical Biochemistry Applications to Biochemistry and Molecular Biology, 2nd ed. Wm. Freeman and Co., New York, N.Y., 1982). Another alternative is to capture nucleic acid products by means of a tag, for example by using magnetic beads coated with avidin or with an antibody to capture nucleic acids carrying biotin or the corresponding antigen, respectively.

Microfluidic techniques include separation on platforms such as microcapillaries (ACLARA BioSciences Inc.) and microchips (Caliper Technologies Inc.). These microfluidic platforms require only nanoliter sample volumes, whereas other separation techniques require microliter volumes. Some of the steps involved in genetic analysis have been implemented on microfluidic devices. For example, published PCT application No. WO 94/05414 (Northrup and White) reports an integrated micro-PCR™ device for collecting and amplifying nucleic acids from a specimen. U.S. Patent Nos. 5,304,487, 5,296,375, and 5,856,174 describe similar devices and methods.

In some instances, it may be desirable to provide an additional or alternative means of analyzing the amplified DNA. In these instances, microcapillary arrays are contemplated for the analysis. Microcapillary array electrophoresis generally uses a thin capillary or channel containing a particular separation medium. Electrophoresis of a sample through the capillary separates the sample into bands. Microcapillary array electrophoresis generally provides a rapid method for sample analysis, PCR™ product analysis, and restriction fragment sizing. The high surface-to-volume ratio of these capillaries allows higher electric fields to be applied across the capillary without substantial thermal variation, which permits faster separations. In addition, when combined with confocal imaging, the method offers sensitivity comparable to that of radioactive sequencing methods. Microfabrication of microfluidic devices, including microcapillary electrophoretic devices, has been discussed in detail in, for example, Jacobson et al., Anal Chem, 66:1107-1113, 1994; Effenhauser et al., Anal Chem, 66:2949-2953, 1994; Harrison et al., Science, 261:895-897, 1993; Effenhauser et al., Anal Chem, 65:2637-2642, 1993; Manz et al., J. Chromatogr, 593:253-258, 1992; and U.S. Pat. No. 5,904,824, incorporated herein by reference. Typically, these methods involve photolithographic etching of micrometer-scale channels in silica, silicon, or other crystalline substrates or chips, and they can be readily adapted for use with the present invention.

Tsuda et al. (Anal Chem, 62:2149-2152, 1990) describe rectangular capillaries as an alternative to cylindrical glass capillary tubes. Advantages of these systems include efficient heat dissipation owing to their large height-to-width ratio, and hence high surface-to-volume ratio, and high sensitivity for optical on-column detection modes. These flat separation channels can perform two-dimensional separations, with one force applied across the separation channel and the sample zones detected by a multichannel array detector.

In many capillary electrophoresis methods, the capillaries (e.g., fused silica capillaries, or channels etched, machined, or molded into planar substrates) are filled with an appropriate separation/sieving matrix. A variety of known sieving matrices may be used in microcapillary arrays, for example hydroxyethyl cellulose, polyacrylamide, agarose, dextran, and the like. The specific gel matrix, running buffer, and running conditions are selected to maximize separation for the particular application, taking into account, for example, the size of the nucleic acid fragments, the required resolution, and whether the nucleic acid molecules are native or denatured. For example, the running buffer may contain denaturants or chaotropic agents such as urea to denature the nucleic acids in the sample.

Mass spectrometry provides a means of "weighing" individual molecules by ionizing them in a vacuum and making them "fly" by volatilization. Under the influence of combinations of electric and magnetic fields, the ions follow trajectories that depend on their individual mass (m) and charge (z). For low molecular weight molecules, mass spectrometry has long been part of the routine physical-organic repertoire for analyzing and characterizing organic molecules by determining the mass of the parent molecular ion. In addition, when the parent molecular ion collides with other particles, it fragments into secondary ions by so-called collision-induced dissociation (CID). The fragmentation pattern and pathway very often allow detailed structural information to be derived. Other applications of mass spectrometric methods are summarized in Methods in Enzymology, Vol. 193: "Mass Spectrometry" (J. A. McCloskey, editor), 1990, Academic Press, New York.

Mass spectrometry offers clear analytical advantages: high detection sensitivity, accurate mass measurement, detailed structural information from CID in an MS/MS configuration, speed, and on-line data transfer to a computer. There has therefore been considerable interest in using it for the structural analysis of nucleic acids. Reviews of this field include Schram, Methods Biochem Anal, 34:203-287, 1990, and Crain, Mass Spectrometry Reviews, 9:505-554, 1990. The biggest hurdle in applying mass spectrometry to nucleic acids is the difficulty of volatilizing these biopolymers. "Sequencing" has therefore been limited to low molecular weight synthetic oligonucleotides, either by determining the mass of the parent molecular ion and thereby confirming the already known sequence, or by confirming the known sequence through secondary ions (fragment ions) generated by CID in an MS/MS configuration, using in particular fast atom bombardment (FAB mass spectrometry) or plasma desorption (PD mass spectrometry) for ionization and volatilization. For example, FAB mass spectrometry has been applied to the analysis of protected dimeric blocks used in the chemical synthesis of oligodeoxynucleotides (Koster et al., Biomedical Environmental Mass Spectrometry 14:111-116, 1987).

Two ionization/desorption techniques are electrospray/ionspray (ES) and matrix-assisted laser desorption/ionization (MALDI). ES mass spectrometry was introduced by Fenn et al., J. Phys. Chem. 88:4451-59, 1984; PCT Application No. WO 90/14148. Its applications are summarized in Smith et al., Anal Chem 62:882-89, 1990, and Ardrey, Electrospray mass spectrometry, Spectroscopy Europe, 4:10-18, 1992. The quadrupole is the most commonly used mass analyzer. Molecular weights can be determined very accurately from femtomole amounts of sample because multiple ion peaks are present, all of which can be used for the mass calculation.
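The way multiple ion peaks support a mass calculation can be illustrated with a minimal Python sketch. It assumes positive-ion mode, in which each charge comes from one added proton, and two peaks that belong to adjacent charge states of the same molecule; nucleic acids are often measured as negative ions, where the sign of the proton term flips. The function names and the example mass are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of one added proton (positive-ion mode assumed)


def mz(mass, charge):
    """m/z of a molecule of neutral mass `mass` carrying `charge` added protons."""
    return (mass + charge * PROTON_MASS) / charge


def mass_from_adjacent_peaks(mz_low_charge, mz_high_charge):
    """Estimate charge and neutral mass from two adjacent charge-state peaks.

    mz_low_charge  : m/z of the peak with charge z (the larger m/z value)
    mz_high_charge : m/z of the peak with charge z + 1 (the smaller m/z value)
    """
    z = round((mz_high_charge - PROTON_MASS) / (mz_low_charge - mz_high_charge))
    mass = z * (mz_low_charge - PROTON_MASS)
    return z, mass


# Hypothetical 10 kDa analyte observed at charge states 5+ and 6+.
peak_5 = mz(10000.0, 5)
peak_6 = mz(10000.0, 6)
charge, mass = mass_from_adjacent_peaks(peak_5, peak_6)
print(charge, round(mass, 3))
```

Every adjacent pair of peaks yields an independent estimate of the same neutral mass, and averaging those estimates is what makes the determination accurate.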

MALDI mass spectrometry, in contrast, is particularly attractive when a time-of-flight (TOF) configuration is used as the mass analyzer. MALDI-TOF mass spectrometry was introduced by Hillenkamp et al. (Hillenkamp et al., Biological Mass Spectrometry, eds. Burlingame and McCloskey, Elsevier Science Publishers, Amsterdam, pp. 49-60, 1990). Because this technique in most cases does not produce multiple molecular ion peaks, the mass spectra generally look simpler than ES mass spectra. DNA molecules with molecular weights of up to 410,000 Daltons can be desorbed and volatilized (Williams et al., Science, 246:1585-87, 1989). More recently, infrared (IR) lasers have been used with this technique to analyze larger nucleic acids, such as synthetic DNA, restriction enzyme fragments of plasmid DNA, and RNA transcripts of up to 2180 nucleotides (Berkenkamp et al., Science, 281:260-2, 1998). Berkenkamp et al. also describe how DNA and RNA can be analyzed by IR MALDI-TOF from limited amounts of purified sample.

Japanese Patent No. 59-131909 describes an instrument that detects nucleic acid fragments separated by electrophoresis, liquid chromatography, or high-speed gel filtration. Mass spectrometric detection is achieved by incorporating into the nucleic acids atoms that do not normally occur in DNA, such as S, Br, I, Ag, Au, Pt, Os, and Hg.

Labeling hybridization oligonucleotide probes with fluorescent labels is a well-known technique and provides a sensitive, non-radioactive way to detect probe hybridization. More recently developed detection methods measure fluorescence energy transfer (FET) rather than fluorescence intensity directly. FET occurs between a donor fluorophore and an acceptor dye (which may or may not be a fluorophore) when the absorption spectrum of one (the acceptor) overlaps the emission spectrum of the other (the donor) and the two dyes are in close proximity. Dyes with these properties are referred to as donor/acceptor dye pairs or energy transfer dye pairs. The excited-state energy of the donor is transferred to the neighboring acceptor by a resonant dipole-induced dipole interaction, which quenches the donor fluorescence. If the acceptor is also a fluorophore, its fluorescence intensity may be enhanced. The efficiency of energy transfer depends strongly on the distance between donor and acceptor, and equations describing this relationship were developed by Förster, Ann Phys 2:55-75, 1948. The donor-acceptor distance at which the energy transfer efficiency is 50% is called the Förster distance (Ro). Other known mechanisms of fluorescence quenching include charge transfer and collisional quenching.
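The distance dependence can be made concrete with the standard Förster relation, E = 1 / (1 + (r/Ro)^6), which equals 50% at r = Ro. The Python sketch below evaluates it for a few separations; the Ro value of 5.0 nm is a hypothetical placeholder, since the actual Förster distance depends on the particular dye pair.

```python
def fret_efficiency(distance, forster_distance):
    """Energy transfer efficiency for a donor-acceptor separation.

    Both arguments must use the same length unit (e.g. nanometers).
    """
    return 1.0 / (1.0 + (distance / forster_distance) ** 6)


R0 = 5.0  # nm, hypothetical Forster distance for an unspecified dye pair

for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r, R0):.3f}")
```

The sixth-power dependence is why quenching works as a switch: moving the dyes from half of Ro to twice Ro takes the efficiency from nearly complete transfer to almost none.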

Energy transfer and other mechanisms that rely on the interaction of two dyes in close proximity to produce quenching are an attractive means of detecting or identifying nucleotide sequences, because such assays can be run in homogeneous formats. Homogeneous assay formats differ from conventional probe hybridization assays that rely on detecting the fluorescence of a single fluorophore label, because those heterogeneous assays generally require additional steps to separate hybridized label from free label. Several formats for FET hybridization assays are reviewed in Nonisotopic DNA Probe Techniques (Academic Press, Inc., pp. 311-352, 1992).

Homogeneous methods that use energy transfer or other mechanisms of fluorescence quenching to detect nucleic acid amplification have also been described. Higuchi et al. (Biotechnology 10:413-417, 1992) disclose methods for detecting DNA amplification in real time by monitoring the fluorescence of ethidium bromide as it binds to double-stranded DNA. The sensitivity of this method is limited because ethidium bromide binding is not target specific, so background amplification products are detected as well. Lee et al. (Nucleic Acids Res 21:3761-3766, 1993) disclose a real-time detection method in which a doubly labeled probe is cleaved during PCR amplification of the target. The probe hybridizes downstream of the amplification primer, so that the 5'-3' exonuclease activity of Taq polymerase digests the probe and separates the two fluorescent dyes that form an energy transfer pair. Fluorescence intensity increases as the probe is cleaved. Published PCT application WO 96/21144 discloses continuous fluorometric assays in which enzyme-mediated cleavage of nucleic acids results in increased fluorescence. Fluorescence energy transfer is suggested for use there, but only in the context of a method that uses a single fluorescent label which is quenched on hybridization to the target.

Signal primers or detector probes that hybridize to the target sequence downstream of the hybridization site of the amplification primers have been described for detecting nucleic acid amplification (U.S. Patent No. 5,547,861). The signal primer is extended by the polymerase in the same way as the amplification primers. Extension of the amplification primer then displaces the extension product of the signal primer in a target amplification-dependent manner, producing a double-stranded secondary amplification product that can be detected as an indication of target amplification. The secondary amplification products can be detected by means of a variety of labels and reporter groups carried by the signal primer, restriction sites in the signal primer that are cleaved to give fragments of a characteristic size, capture groups, and structural features such as triple helices and recognition sites for double-stranded DNA binding proteins.

Many well-known donor/acceptor dye pairs can be used in the methods of the present invention. These include, but are not limited to, fluorescein isothiocyanate (FITC)/tetramethylrhodamine isothiocyanate (TRITC), FITC/Texas Red™ (Molecular Probes), FITC/N-hydroxysuccinimidyl 1-pyrenebutyrate (PYB), FITC/eosin isothiocyanate (EITC), N-hydroxysuccinimidyl 1-pyrenesulfonate (PYS)/FITC, FITC/Rhodamine X, FITC/tetramethylrhodamine (TAMRA), and others. The choice of a particular donor/acceptor pair is not critical. For energy transfer quenching mechanisms, it is only necessary that the emission wavelengths of the donor fluorophore overlap the excitation wavelengths of the acceptor; that is, there must be sufficient spectral overlap between the two dyes for efficient energy transfer, charge transfer, or fluorescence quenching. p-(Dimethylaminophenylazo)benzoic acid (DABCYL) is a non-fluorescent acceptor dye that effectively quenches the fluorescence of an adjacent fluorophore such as fluorescein or 5-(2'-aminoethyl)aminonaphthalene (EDANS). Any dye pair that produces fluorescence quenching in a nucleic acid detection method is suitable for use in the methods of the present invention. Known terminal and internal labeling methods can be used to link the donor and acceptor dyes at their respective sites in the nucleic acid.

This technology can also be used with microarray and/or chip-based DNA technologies such as those described by Hacia et al., Nature Genet, 14:441-449, 1996, and Shoemaker et al., Nature Genetics, 14:450-456, 1996. These techniques provide quantitative methods for analyzing large numbers of genes rapidly and accurately. By tagging genes with oligonucleotides or by using fixed probe arrays, chip technology can be used to sort target molecules by high-throughput hybridization (Pease et al., Proc Natl Acad Sci USA, 91:5022-5026, 1994; Fodor et al., Nature, 364:555-556, 1993).

Products amplified using this technology can also be used with BioStar's OIA technology. OIA uses the mirror-like surface of a silicon wafer as a substrate. A thin-film optical coating and a capture antibody are attached to the silicon wafer. White light reflected through the coating appears as a golden background color. This color does not change until the thickness of the optical film changes.

When a positive sample is applied to the silicon wafer, binding occurs between the ligand and the antibody. When a further ligand is added to complete the mass enhancement, the thickness of the molecular thin film increases and the color changes correspondingly from gold to purple/blue. This technique is described in U.S. Patent No. 5,541,057, which is incorporated herein by reference.

RNA or DNA can be quantified by real-time quantitative PCR (RT-PCR) (Higuchi et al., Biotechnology 10:413-417, 1992). By determining the concentration of amplified product after a given number of amplification cycles, while the reaction is still in its linear range, the relative concentration of the target sequence in the original mixture can be determined. For example, if the DNA mixture is cDNA synthesized from RNA extracted from different tissues or cells, the relative abundance of the mRNA from which the target sequence was derived can be determined for each tissue or cell. This direct proportionality between the concentration of the amplified product and the relative abundance of the target holds only in the linear range of the amplification reaction.

In the plateau portion of the amplification curve, the final concentration of the amplified target DNA is determined by the availability of reagents in the reaction mixture and is independent of the original concentration of the target DNA. The first condition for accurate relative quantification of RNA or DNA by real-time quantitative PCR is therefore that the concentration of the amplified product be measured while the reaction is in its linear range. The second condition is that the cDNA amplified from a specific mRNA sequence in the RT-PCR reaction be normalized to some independent standard. The goal of an RT-PCR experiment is to determine the abundance of a particular RNA or DNA sequence relative to all of the RNA or DNA in the sample.
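Both conditions can be illustrated with a short Python sketch. It assumes the usual exponential model for the range the text calls linear, N_c = N_0 (1 + E)^c, with a per-cycle efficiency E that is the same for every sample, and it normalizes to a reference sequence using the widely used comparative threshold-cycle calculation. That calculation is a swapped-in standard approach, not a procedure stated in this document, and the function names and Ct values are hypothetical.

```python
def relative_start_ratio(ct_a, ct_b, efficiency=1.0):
    """Ratio of starting amounts N0_a / N0_b for two reactions.

    ct_a, ct_b : cycle at which each reaction crosses the same threshold
    efficiency : per-cycle amplification efficiency E (1.0 means doubling)
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)


def normalized_fold_change(ct_target_sample, ct_ref_sample,
                           ct_target_control, ct_ref_control,
                           efficiency=1.0):
    """Target abundance in sample vs control, each normalized to a reference."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return (1.0 + efficiency) ** (delta_control - delta_sample)


# Hypothetical threshold cycles.
print(relative_start_ratio(20, 23))              # reaction A started with more template
print(normalized_fold_change(22, 18, 25, 18))    # fold change after normalization
```

A reaction that has reached the plateau cannot be used this way, because its product concentration no longer reflects the starting amount.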

Luminex technology enables the quantification of nucleic acids immobilized on color-coded microspheres. The magnitude of the biomolecular reaction is measured using a second molecule called a reporter. The reporter molecule signals the extent of the reaction by attaching to the microspheres. Because both the microspheres and the reporter molecules are color-coded, digital signal processing can translate these signals in real time into quantitative data for each reaction. These standard techniques are described in U.S. Patent Nos. 5,736,303 and 6,057,107, which are incorporated herein by reference.

Example VII

Detection techniques

The amplified products must be detected to confirm whether the target gene has been amplified. One visualization method is gel staining with a fluorescent dye, such as ethidium bromide or Vistra Green, followed by detection under UV light. Alternatively, if the amplified products are labeled with radioactive or fluorescent nucleotides, they can be exposed directly to X-ray film or detected at the appropriate excitation wavelength.

In one embodiment, visualization is achieved indirectly, using a nucleic acid probe. After the amplified products are separated, a labeled nucleic acid probe is mixed with them. The probe is preferably conjugated to a dye molecule, but it may also be radiolabeled. In another embodiment, the probe is conjugated to a binding partner, such as an antibody or biotin, or to another detectable ligand. In another embodiment, the probe carries a fluorescent dye or label. In another embodiment, the probe carries a mass label that is used to detect the amplified molecule. Other contemplated embodiments include TaqMan™ and molecular beacon probes. In yet another embodiment, solid-phase capture methods combined with standard probes may also be used.

The type of label incorporated into the DNA amplification products is dictated by the analytical method. When separation is performed by capillary electrophoresis, microfluidic electrophoresis, high-performance liquid chromatography (HPLC), or liquid chromatography (LC), incorporated or intercalated fluorescent dyes are used to label and detect the amplified products. The samples are detected dynamically: fluorescence is quantified as the labeled molecules pass the detector. If any electrophoretic, HPLC, or LC method is used for separation, the products can also be detected by UV absorption, which is an intrinsic property of DNA and therefore requires no added label. If polyacrylamide gel or slab gel electrophoresis is used, the amplification primers can be labeled with a fluorophore, a chromophore, or a radioisotope, or they can be detected through an enzyme-catalyzed reaction. Enzymatic detection involves binding an enzyme to the primer, for example through a biotin:avidin interaction after the amplified products have been separated on a gel, followed by detection with a chemical reaction such as luminol chemiluminescence. A fluorescent signal can be monitored dynamically. Detection with a radioisotope or an enzymatic reaction requires an initial gel separation followed by transfer of the DNA molecules to a solid support. If the amplified products are separated with a mass spectrometer, no label is required because the nucleic acids are detected directly.

The separation methods described above can be combined to exploit two or more different properties of a molecule. For example, some PCR primers can be conjugated to a ligand that allows affinity capture, while other primers remain unlabeled. Such labels include carbohydrates (for binding to a lectin column), hydrophobic groups (for binding to a reverse-phase column), biotin (for binding to a streptavidin column), and antigens (for binding to an antibody column). The sample is passed through the affinity chromatography column. The flow-through fraction is collected, and the bound fraction is eluted (by chemical cleavage, salt elution, and so on). Each sample is then further separated on the basis of another property, such as mass, to identify its individual components.

Example VIII

Kits

The materials and reagents used in the amplification methods disclosed herein can be assembled into a kit. The amplification kits disclosed here generally comprise at least the enzymes, nucleotides, and primer sequences required for the reaction. In a preferred embodiment, the kit also includes instructions for amplifying DNA from a DNA sample.

Kits for the methods disclosed herein will generally include one or more preselected primer and/or probe sequences, which may detect the genes to be amplified specifically or non-specifically. The kits may further include one or more nucleic acid probes and/or primer sets for detecting nucleic acids. In certain embodiments, such as kits for amplifying nucleic acids, the means of detecting the nucleic acid is a label, such as a fluorophore, a radiolabel, or an enzyme label, attached to the nucleic acid primer or to the nucleic acid sequence itself. It is contemplated that a kit may contain several primer combinations to carry out the individual amplification reactions of the disclosed methods. It is also contemplated that a kit may contain precipitation reagents so that DNA samples can subsequently be preserved on a solid medium.

A whole genome amplification kit is used here as an example. In a preferred embodiment, the kit provides a primer with a defined 5' sequence, a random middle segment, and a fixed 3' end, which hybridizes uniformly or substantially uniformly to genomic DNA. The kit may include a second primer whose sequence is identical to the fixed 5' sequence of the first primer. For example, the kit can be used to amplify all genes and chromosomal regions of a given species, whether or not the sequences of those genes and regions are known. The kit also contains enzymes suitable for nucleic acid amplification, nucleotides, and the buffers required for amplification.
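The primer layout described above (fixed 5' sequence, random middle, fixed 3' end) can be sketched in a few lines. This is an illustration only: it borrows the 27-nt fixed sequence and the GGG/TTT 3' ends from the GAT3G and GAT3T primers quoted in Example XIV, and the function names are hypothetical. A synthesized primer with N positions is a degenerate pool; the code draws one member of that pool at random.

```python
import random
import re

# 27-nt fixed 5' sequence shared by the primers quoted in Example XIV.
FIXED_5P = "GTGAGTGATGGTTGAGGTAGTGTGGAG"


def make_primer(three_prime, n_random=5, rng=None):
    """Return one concrete member of the degenerate primer pool."""
    rng = rng or random.Random()
    middle = "".join(rng.choice("ACGT") for _ in range(n_random))
    return FIXED_5P + middle + three_prime


def has_primer_layout(primer, three_prime, n_random=5):
    """Check for fixed 5' sequence + n_random bases + fixed 3' end."""
    pattern = FIXED_5P + "[ACGT]{%d}" % n_random + three_prime
    return re.fullmatch(pattern, primer) is not None


# The second primer is simply the fixed 5' sequence of the first.
SECOND_PRIMER = FIXED_5P
example_primer = make_primer("GGG", rng=random.Random(0))
```

Because every first-round product begins with `FIXED_5P`, a single second primer with that sequence can amplify all of them.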

Kits for the disclosed methods may contain primers carrying one or more ligand modifications. In another aspect, the kit contains one or more polynucleotide sequences that do not base-pair, or do not substantially base-pair, with one another or with other sequences in the same reaction. The kit may contain a DNA polymerase, or a polymerase with strand displacement activity, including, for example, one or a mixture of the following: Phi29 polymerase, Bst polymerase, Pyrophage 3173 polymerase, Vent polymerase, Deep Vent polymerase, TOPOTaq polymerase, Vent exo- polymerase, Deep Vent exo- polymerase, 9°Nm polymerase, Klenow fragment, MMLV reverse transcriptase, AMV reverse transcriptase, HIV reverse transcriptase, and an exo- mutant of T7 phage DNA polymerase.

The kit may provide a separate container for each reagent and enzyme and for each probe and primer. Each biological reagent should be dispensed into its own suitable container. These containers generally include at least one tube or vial. Flasks, reagent bottles, or other types of reagent containers may also be used. The containers are sealed for commercial sale. Suitable larger containers may be used to hold injection or loading accessories that match the test tubes. Detailed instructions may be provided with the kit.

Example IX

Amplicon sequencing

After the primer sequences are removed, the DNA amplification products can be sequenced using published or otherwise well-known high-throughput sequencing methods and protocols.

Example X

Whole genome amplification and sequencing of a portion of the genetic material of a single cell for genome-wide haplotyping

A single cell can be isolated and placed in a PCR tube using a mouth pipette or other well-known methods such as laser microdissection or flow cytometry. The PCR tube can then be centrifuged to bring the cell to the bottom of the tube.

A 1 µl blood sample is obtained by fingerstick and placed in 200 µl of red blood cell lysis buffer (0.1% Triton X-100, 10 mM EDTA in PBS). After 5 minutes at room temperature, most red blood cells are lysed, while the nuclei of white blood cells and other cells remain intact. A single cell is then drawn from the lysate with a mouth pipette and placed in a tube.

60 µl of lysis buffer (30 mM Tris-Cl pH 7.8, 2 mM EDTA, 20 mM KCl, 0.3% Triton X-100, 30 mM DTT, 12.5 µg/ml Qiagen Protease, 0.1 µl of primer 5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNGGG-3') is added to the tube containing the single cell. The following temperature program is used for cell lysis: 50°C for 3 hours, then 70°C for 20 minutes. The lysis buffer and the gentle heating are intended to minimize double-strand breaks and single-strand nicks in the DNA. DNA treated in this way has an average length greater than 100 kb. The 0.1 µM primer is added to the lysis buffer so that it adsorbs to the walls of the tube and the pipette tip, which prevents excessive genomic DNA from adsorbing to these surfaces. For tissue samples, the tissue can be ground, or a portion of it can be isolated by laser microdissection, and then placed in 60 µl of lysis buffer. For multiple cells, the cells can first be concentrated by centrifugation and then lysed with 60 µl of lysis buffer.

The following procedure is used to separate the released DNA molecules into multiple fractions, each containing a portion of the genome. The DNA from the single-cell lysate is dispersed in 60 µl of lysis buffer, and the lysate can be divided into N fractions by various methods. As a result, the two alleles usually end up in different fractions. For a diploid cell, the probability that the two alleles are not separated into different fractions is 1/N. In a single-cell haplotyping experiment, N can be made arbitrarily large to ensure that the alleles are fully separated. If multiple identical cells can be collected, the same procedure can be applied to the different cells to ensure that the alleles are separated in at least some of them. In one aspect, low-binding PCR tubes and pipette tips are used to prevent DNA from adsorbing to the tube walls, and the amplification primers are added to the lysis buffer to prevent genomic DNA from adsorbing to the tube walls. After the single-cell lysis step, the released DNA can be divided into N separate aliquots with a pipette. Alternatively, the DNA can be divided into N separate fractions using microfluidics or other known methods. The released DNA is gently agitated in a microfluidic device at 4°C. The microfluidic device is designed with multiple outlet channels (approximately 30 channels). The solution from each channel is then collected for whole genome amplification. Alternatively, for samples of multiple cells or tissue, the isolated DNA can be diluted to less than one cell's worth of genomic DNA; the genomic DNA of one human cell is approximately 6 pg.
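The 1/N figure follows if each of the two allele copies lands in one of the N fractions independently and with equal probability, which is an assumption of this sketch rather than a statement from the text. Under that assumption the second copy matches the first copy's fraction with probability 1/N. The short simulation below (function names hypothetical) checks the closed form numerically.

```python
import random


def same_fraction_probability(n_fractions):
    """Closed form: chance that both allele copies share a fraction."""
    return 1.0 / n_fractions


def simulate_same_fraction(n_fractions, trials=100_000, seed=0):
    """Monte Carlo estimate of the same quantity.

    Each trial drops the two allele copies of one locus into
    independently chosen, equally likely fractions.
    """
    rng = random.Random(seed)
    hits = sum(
        rng.randrange(n_fractions) == rng.randrange(n_fractions)
        for _ in range(trials)
    )
    return hits / trials
```

For example, with N = 24 fractions the two copies of a given locus are expected to share a tube in roughly 4% of cases, and that share falls as N grows.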

The DNA in each fraction is then subjected to whole genome amplification using the methods disclosed herein. Alternatively, the DNA in each fraction can be amplified by various known methods, such as Multiple Displacement Amplification (MDA), or with commercially available whole genome amplification kits such as PicoPlex (Rubicon Genomics), GenomePlex (Sigma-Aldrich), GenomiPhi (GE Healthcare Life Sciences), or similar products.

Regional haplotypes can be determined by genotyping the single nucleotide polymorphisms (SNPs) in the genome of each subcellular fraction separately. Genome-wide haplotypes can then be obtained by comparing the regional haplotypes from multiple fractions. First, the genome of each subcellular fraction is sequenced separately. The sequences are aligned to the human reference genome to determine the SNPs in each subcellular fraction. Because each subcellular fraction contains only a small portion of the single-cell genome, the sequencing coverage of each fraction appears as discrete blocks, with most of the genome uncovered. Statistically, all sequencing reads that belong to one block originate from the same chromosome, so all SNPs found within the same block can be linked and assigned to the same haplotype. The length over which SNPs can be linked into a haplotype is determined by the length of the DNA fragments after extraction. Genome-wide or chromosome-wide haplotypes can be obtained by repeating the above steps with multiple cells. In this case, the haplotypes obtained from the individual cells overlap, so the SNPs on a single chromosome can be phased accurately. Besides genome sequencing, other genotyping methods (such as SNP microarrays) can also be used.
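The block-linking logic above can be sketched as a comparison of SNP calls at shared positions. This is a simplified illustration under stated assumptions: each block is represented as a dictionary from genomic position to the observed allele, every site is a biallelic heterozygous SNP, and genotyping errors are ignored. The function names and the example positions are hypothetical.

```python
def compare_blocks(block_a, block_b):
    """Classify two SNP blocks by their calls at shared positions."""
    shared = block_a.keys() & block_b.keys()
    if not shared:
        return "no_overlap"
    agreeing = sum(block_a[pos] == block_b[pos] for pos in shared)
    if agreeing == len(shared):
        return "same_haplotype"
    if agreeing == 0:
        return "opposite_haplotype"
    return "conflict"  # mixed calls, e.g. errors or co-located alleles


def extend_haplotype(haplotype, block):
    """Merge a block into a haplotype when every shared SNP agrees."""
    merged = dict(haplotype)
    if compare_blocks(haplotype, block) == "same_haplotype":
        merged.update(block)
    return merged


# Hypothetical blocks from three fractions covering one region.
fraction_3 = {1000: "A", 2500: "G", 4000: "T"}
fraction_17 = {2500: "G", 4000: "T", 9000: "C"}
fraction_8 = {2500: "C", 4000: "A"}
phased = extend_haplotype(fraction_3, fraction_17)
```

Overlapping blocks from additional fractions or cells extend the phased region in the same way, which is why repeating the procedure across cells lengthens the haplotypes.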

Also disclosed here is a method for de novo whole genome assembly in which the haplotype blocks of each subcellular fraction are assembled de novo and then compared with one another. De novo genome assembly is necessary for species without a reference genome and for cases of substantial genomic variation, such as cancer. After sequencing, the sequence of each subcellular fraction is determined separately. These sequences can be obtained with conventional de novo assembly programs or algorithms, such as CLC Genomics Workbench (CLC bio), SeqMan NGen (DNASTAR), or similar software. The DNA blocks are typically about 100 kb in size. The assembled blocks are then aligned against one another to build the assembled genome. Because the blocks are of similar size, about 100 kb, the complexity of this assembly scales linearly with genome size, whereas the complexity of conventional genome assembly scales exponentially with genome size.

Example XI

Single-cell haplotyping

A single white blood cell from de-identified patient P2 was isolated with a mouth pipette and lysed in 10 µl of lysis buffer. The tube containing the lysate was placed on a thermal cycler: 50°C for 3 hours, then 70°C for 20 minutes. 60 µl of lysis buffer (without enzyme) was added, and the resulting 70 µl of solution was mixed and divided among 24 PCR tubes. The DNA in each of the 24 PCR tubes was amplified separately with Deep VentR (exo-) polymerase using the method described in Example I. After purification, the PCR products were tested by quantitative PCR for two loci, one on chromosome 1 and one on chromosome 2. Table 1 shows the results for the chromosome 1 and chromosome 2 loci in the 24 PCR tubes. The numbers in the table are qPCR Ct values and indicate a positive result for the locus. An 'x' indicates a negative result for the locus.

Table 1

| Tube | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------|---|---|---|---|---|---|---|---|---|----|----|----|
| chr1 | x | x | x | x | x | 18.1 | x | x | x | x | x | x |
| chr2 | x | x | x | x | x | x | x | x | x | x | x | x |

| Tube | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|------|----|----|----|----|----|----|----|----|----|----|----|----|
| chr1 | x | x | x | x | x | x | x | x | x | x | x | 20.8 |
| chr2 | 22 | x | x | x | x | x | x | x | 20.2 | x | x | x |

Each of the two loci was positive in two tubes, which shows that allele separation and amplification succeeded for all four allele copies (one pair of alleles on chromosome 1 and one pair on chromosome 2). When a locus fails to amplify in an experiment, possible causes include loss during DNA transfer, the stochastic nature of amplification, and failure of allele separation. The methods disclosed here reduce the likelihood of these events.

Example XII

Haplotype analysis using two cells

Two white blood cells from de-identified patient P2 were isolated with a mouth pipette and lysed in 10 µl of lysis buffer. The tube containing the lysate was placed on a thermal cycler: 50°C for 3 hours, then 70°C for 20 minutes. 60 µl of lysis buffer (without enzyme) was added, and the resulting 70 µl of solution was mixed and divided among 12 PCR tubes. The DNA in each of the 12 PCR tubes was amplified separately with Deep VentR (exo-) polymerase using the method described in Example I. After purification, the PCR products were tested by quantitative PCR for eight loci, one on each of chromosomes 1-7 and 9. Table 2 shows the results for these loci in the 12 PCR tubes. The numbers in the table are qPCR Ct values and indicate a positive result for the locus. An 'x' indicates a negative result for the locus.

Table 2

| Tube | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------|---|---|---|---|---|---|---|---|---|----|----|----|
| chr1 | x | x | x | 20.5 | x | 21 | x | x | 20.8 | 20.2 | x | x |
| chr2 | x | x | x | x | x | x | 22.5 | x | 21.7 | 26 | x | 25.2 |
| chr3 | x | x | x | x | x | x | x | x | x | x | x | x |
| chr4 | x | 22.2 | x | x | 23 | x | 24.7 | x | 22.5 | x | x | x |
| chr5 | 23.5 | x | x | 22 | x | x | x | x | x | x | x | x |
| chr6 | 25.4 | 26.4 | x | x | x | x | x | x | 23.1 | x | x | x |
| chr7 | x | x | x | 23.1 | x | x | 23.4 | 25.3 | x | 24.5 | x | x |
| chr9 | 25 | x | x | x | x | 25.5 | 24.8 | x | 22.6 | x | x | x |

For the loci on chromosomes 1, 2, 4, 7, and 9, four tubes were positive, which means that the two pairs of alleles from the two cells were separated into four of the 12 tubes. For chromosome 3, no tube was positive, which shows that amplification failed for this locus. For the loci on chromosomes 5 and 6, fewer than four tubes were positive. This may be because some alleles were not amplified, or because two or more alleles ended up in the same tube. Increasing the number of tubes helps ensure that the great majority of alleles are separated into different tubes.
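The tube counts discussed above can be reproduced directly from the Ct values in Table 2. The sketch below transcribes only the positive entries (tube number mapped to Ct) and compares the number of positive tubes per locus with the four allele copies expected from two diploid cells. The category labels and function name are hypothetical conveniences, not terms from the text.

```python
# Positive entries of Table 2: locus -> {tube number: Ct value}.
TABLE_2 = {
    "chr1": {4: 20.5, 6: 21.0, 9: 20.8, 10: 20.2},
    "chr2": {7: 22.5, 9: 21.7, 10: 26.0, 12: 25.2},
    "chr3": {},
    "chr4": {2: 22.2, 5: 23.0, 7: 24.7, 9: 22.5},
    "chr5": {1: 23.5, 4: 22.0},
    "chr6": {1: 25.4, 2: 26.4, 9: 23.1},
    "chr7": {4: 23.1, 7: 23.4, 8: 25.3, 10: 24.5},
    "chr9": {1: 25.0, 6: 25.5, 7: 24.8, 9: 22.6},
}

EXPECTED_COPIES = 4  # two diploid cells -> four allele copies per locus


def classify_loci(table, expected_copies):
    """Label each locus by how many tubes tested positive."""
    labels = {}
    for locus, hits in table.items():
        if len(hits) == expected_copies:
            labels[locus] = "all_separated"
        elif not hits:
            labels[locus] = "none_detected"
        else:
            labels[locus] = "partial"
    return labels


labels = classify_loci(TABLE_2, EXPECTED_COPIES)
fully_separated = sorted(k for k, v in labels.items() if v == "all_separated")
```

A "partial" label cannot by itself distinguish amplification dropout from two copies sharing a tube, which is the ambiguity the paragraph above describes.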

Example XIII

Haplotyping without isolating single cells

1 µl of blood from volunteer P1 was placed in 100 µl of cell lysis buffer and subjected to the following temperature program: 50°C for 3 hours, then 70°C for 20 minutes. The lysed blood was diluted 400-fold and 1000-fold, and 1 µl of each dilution was used for amplification. After dilution, each tube contained genomic DNA equivalent to roughly 1/4 of a cell or 1/10 of a cell, respectively. Table 3 shows the quantitative PCR results for the 1/4-cell samples (1-5) and the 1/10-cell samples (6-10) at loci on chromosomes 1-7 and 9. The numbers in the table are qPCR Ct values and indicate a positive result for the locus.
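The 1/4-cell and 1/10-cell figures are consistent with a simple dilution calculation. The sketch below is illustrative only: the starting count of about 10,000 nucleated cells per µl of blood is an assumed value chosen because it reproduces the quoted fractions, not a number given in the text, and the function name is hypothetical.

```python
def cell_equivalents_per_ul(cells_per_ul_blood, blood_ul, lysis_ul, dilution):
    """Genome equivalents per microliter after lysis and dilution."""
    total_cells = cells_per_ul_blood * blood_ul
    lysate_concentration = total_cells / (blood_ul + lysis_ul)
    return lysate_concentration / dilution


ASSUMED_CELLS_PER_UL = 10_000  # assumption, not stated in the text

quarter_cell = cell_equivalents_per_ul(ASSUMED_CELLS_PER_UL, 1, 100, 400)
tenth_cell = cell_equivalents_per_ul(ASSUMED_CELLS_PER_UL, 1, 100, 1000)
```

With these inputs, 1 µl of the 400-fold dilution carries about 0.25 genome equivalents and 1 µl of the 1000-fold dilution about 0.1.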

Table 3

The uniform distribution of the loci shows that chromosome entanglement is not a significant factor when alleles are separated by dilution. The allele separation observed for chromosomes 2, 3, 6, 7, and 9 shows that isolating a single cell is not a prerequisite for single-cell haplotyping.

Example XIV

Single-cell whole genome amplification

Multiple Annealing and Looping Based Amplification Cycles (MALBAC) was carried out as follows. The SW480 colon adenocarcinoma cell line was obtained from the American Type Culture Collection (ATCC, Rockville). SW480 cells were cultured in ATCC-formulated Leibovitz's L-15 medium supplemented with 10% fetal bovine serum (ATCC), 100 I.U./ml penicillin, and 100 µg/ml streptomycin (ATCC). The cells were treated with 0.25% trypsin-EDTA and placed on PEN-membrane glass slides (Leica). The cells were fixed in 70% ethanol for three minutes and rinsed with PBS. They were then stained in 0.5% methylene green for about 20 seconds and rinsed twice with ddH2O.

Before laser dissection, DNA contamination in the low-binding PCR tubes was eliminated by UV irradiation. Single cells were then dissected into individual PCR tubes with a laser microdissection system (Leica LMD7000). After a brief centrifugation to bring the single cell to the bottom of the PCR tube, 5 µl of freshly prepared cell lysis buffer (30 mM Tris-Cl pH 7.8, 2 mM EDTA, 20 mM KCl, 0.2% Triton X-100, 12.5 µg/ml QIAGEN Protease) was added to each tube. Single-cell lysis used the following temperature program: 50°C for 3 hours, 75°C for 20 minutes, and 80°C for 5 minutes. Single cells can also be isolated by other means, such as a mouth pipette or flow cytometry.

For pre-amplification, 30 µl of pre-amplification buffer (20 mM Tris-Cl pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 3 mM MgSO4, 0.1% Triton X-100, 0.32 µM GAT3T primer (5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNTTT-3'), 0.25 µM GAT3G primer (5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNGGG-3')) was added to each PCR tube containing a lysed single cell, and the tube was heated at 94°C for 3 minutes to melt the double-stranded genomic DNA into single strands. The single-stranded DNA was then quickly placed on ice to ensure efficient binding of the primers. Next, 2.5 U of Bst large fragment (NEB), or 2 U of Bst large fragment plus 0.8 U of Pyrophage 3173 exo- (Lucigen), was added, and the following temperature program was run: 10°C for 45 seconds, 20°C for 45 seconds, 30°C for 60 seconds, 40°C for 45 seconds, 50°C for 45 seconds, 62°C for 2 minutes, and 95°C for 20 seconds. The PCR tube was then quickly placed on ice.

After annealing on ice, the same polymerase mixture was added and the following temperature program was run: 10°C for 45 seconds, 20°C for 45 seconds, 30°C for 60 seconds, 40°C for 45 seconds, 50°C for 45 seconds, 62°C for 2 minutes, 95°C for 20 seconds, and 58°C for 20 seconds. These steps were repeated four times to obtain the pre-amplification product mixture.
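The temperature program above can be written down as data, which makes it easy to check hold times. The sketch below is only a bookkeeping aid: the step list is transcribed from the text, the function name is hypothetical, and the totals count programmed hold time only, ignoring ramping between temperatures and the time spent on ice while adding polymerase.

```python
# (temperature in degrees C, hold time in seconds) for one cycle,
# transcribed from the program above.
PREAMP_CYCLE = [
    (10, 45),
    (20, 45),
    (30, 60),
    (40, 45),
    (50, 45),
    (62, 120),
    (95, 20),
    (58, 20),
]


def hold_seconds(steps, repeats=1):
    """Total programmed hold time, excluding ramp and handling time."""
    return repeats * sum(seconds for _temp, seconds in steps)


one_cycle = hold_seconds(PREAMP_CYCLE)
four_repeats = hold_seconds(PREAMP_CYCLE, repeats=4)
```

One pass through the program holds for 400 seconds, so four repeats account for 1,600 seconds of programmed hold time.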

After several of the cycles described above, the products may include both full amplicons and semi-amplicons. Annealing at 58°C after melting allows the two ends of a full amplicon to form double-stranded DNA. The products can then be treated with a restriction endonuclease that recognizes the primer sequence (such as BseGI). As a result, only the single-ended semi-amplicons remain intact. The restriction endonuclease can be inactivated by heating at 80°C for 20 minutes, after which further amplification cycles can be run to generate amplicons. This approach yields linear amplification.

The products described above can be further amplified by PCR for next-generation genome sequencing. 30 µl of freshly prepared amplification mix (20 mM Tris-Cl pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 4 mM MgSO4, 0.1% Triton X-100, 0.66 µM Bio-GAT primer (5'/Biosg/GTGAGTGATGGTTGAGGTAGTGTGGAG-3'), 2.4 U Deep VentR exo- polymerase (NEB)) was added to the pre-amplification product. The temperature program of 94°C for 20 seconds, 59°C for 20 seconds, 65°C for 1 minute, and 72°C for 2 minutes was repeated 18 times. Starting from a single cell, a total of 2-3 µg of double-stranded DNA product is obtained for high-throughput sequencing.

The amplicons range from 500 bp to 1500 bp in length and carry a biotin modification at the 5' end. Quantitative PCR at 16 loci, each on a different chromosome, can be used to check the uniformity of amplification (Rochette et al., Journal of Molecular Biology 352, 44 (2005)). Of the 16 loci tested, 14 were amplified successfully, a success rate consistent with the 92% coverage obtained by whole-genome sequencing at 30X depth. The data are summarized in Table 4 below, which lists the Ct values for the 16 randomly selected loci.

Table 4

| qPCR | Chr1 | Chr2 | Chr3 | Chr4 | Chr5 | Chr6 | Chr7 | Chr9 |
|------|------|------|------|------|------|------|------|------|
| Single cell | 22.5 | 24.4 | 36.5 | 24.8 | 25.4 | 26.9 | 25.5 | 25.2 |
| Positive control | 22.2 | 24.5 | 29.6 | 24.4 | 24.4 | 24.4 | 25.5 | 25.1 |

| qPCR | Chr12 | Chr13 | Chr14 | Chr15 | Chr16 | Chr17 | Chr18 | Chr19 |
|------|-------|-------|-------|-------|-------|-------|-------|-------|
| Single cell | 25.8 | 23.9 | 39.0 | 24.7 | 21.3 | 24.3 | 24.2 | 27.0 |
| Positive control | 26.4 | 26.3 | 30.0 | 23.8 | 20.0 | 25.8 | 23.7 | 22.5 |
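The 14-of-16 success rate can be reproduced from the single-cell Ct values in Table 4. The sketch below assumes that a locus counts as amplified when its Ct is at most 30; this cutoff is a hypothetical choice for illustration and is not defined as a threshold in the text. The values are listed in table order.

```python
# Single-cell Ct values from Table 4, in table order.
SINGLE_CELL_CT = [
    22.5, 24.4, 36.5, 24.8, 25.4, 26.9, 25.5, 25.2,
    25.8, 23.9, 39.0, 24.7, 21.3, 24.3, 24.2, 27.0,
]

CT_CUTOFF = 30.0  # assumed cutoff for calling a locus amplified


def amplification_success(ct_values, cutoff):
    """Return (number amplified, number tested, success fraction)."""
    amplified = sum(1 for ct in ct_values if ct <= cutoff)
    return amplified, len(ct_values), amplified / len(ct_values)


amplified, tested, fraction = amplification_success(SINGLE_CELL_CT, CT_CUTOFF)
```

Under this cutoff the two loci with Ct values of 36.5 and 39.0 are the failures, giving 14 of 16, or 87.5%.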

The single-cell amplification results are consistent with those of the positive control, which was amplified from 500 pg of genomic DNA. In the negative control, the Ct values at all loci were greater than 30. The primers for these loci are as follows:

Chr1+: AGG AAA GGC ATA CTG GAG GGA CAT

Chr1-: TTA GGG ATG GCA CCA CAC TCT TGA

Chr2+: TCC CAG AGA AGC ATC CTC CAT GTT

Chr2-: CAC CAC ACT GCC TCA AAT GTT GCT

Chr3+: TCA AGT TGC CAG CTG TGG CTG TAT

Chr3-: AGA AGG GCA TTT CCT GTC AGT GGA

Chr4+: ATG GGC AAA TCC AGA AGA GTC CAG

Chr4-: CCA TTC ACT TCC TTG GAA AGG TAG CC

Chr5+: AAT AGC GTG CAG TTC TGG GTA GCA

Chr5-: TTC ACA TCC TGG GAG GAA CAG CAT

Chr6+: TGA ATG CCA GGG TGA GAC CTT TGA

Chr6-: TGT TCA TTA TCC CAC GCC AGG ACT

Chr7+: ACC AAA GGA AAG CCA GCC AGT CTA

Chr7-: ACT CCA CAG CTC CCA AGC ATA CAA

Chr9+: TCC CAG CTC TCT CTC TTG CAT CTT

Chr9-: AGT GAA GCT GGT GTA TGC AGA GGT

Chr12+: AGA GGG CTG CTT TAT GCA GGT G

Chr12-: CTA CAT TTG GGT CTT TGC TGC CAT G

Chr13+: AGC AGC CCC AGG CAG AT

Chr13-: CGG AGA GGA CGG TCA CGT TTA C

Chr14+: GTC CAG CAC TAG TGA TCT TGT CC

Chr14-: CGT GGG AGT TTT GAA ATG CGA TGT

Chr15+: CCT GTC TCT GCT CCT GCG

Chr15-: TGC ACA CAT GCA CAG TGG AG

Chr16+: CTC CAA GGT TCT GCA GCC TC

Chr16-: GGT ATG ACT ACA CAT TCA GGC TGG

Chr17+: GTG GTA CAT AGT GCA TGG TCC G

Chr17-: GGC GAC ATA CCC CAA CTT CAT AAG

Chr18+: CGT TCT TAG GAC CAA AGG GCT G

Chr18-: CCA GCA TCC ATG TCT CTG CAC

Chr19+: GCC CAG AGC GCC TGA

Chr19-: CCA GCC CCT GGA CCA CT

The uniformity of amplification can be assessed with the Lorenz curves in Figure 5A, which plot the cumulative fraction of reads against the cumulative fraction of the genome covered for an unamplified bulk (positive control) sample, a sample amplified by MALBAC according to the present invention, and a sample amplified by MDA. Perfectly uniform amplification yields the diagonal; larger deviations from the diagonal indicate stronger amplification bias. The three samples compared in Figure 5A were all normalized to 8x sequencing depth. The unamplified bulk sample lies closest to the diagonal. At 8x sequencing depth, MALBAC covers 85% of the genome, which is close to the 95% of the bulk sample and much better than the 45% of MDA (in Figure 5A, the uncovered fractions of the genome are 5%, 15%, and 55% for the bulk sample, MALBAC, and MDA, respectively).
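The Lorenz curve described above can be computed from binned read counts. The following is a minimal sketch, assuming that reads have already been counted in fixed-size genomic bins; the function names and the toy counts are illustrative and are not taken from the experiments reported here.

```python
def lorenz_curve(bin_counts):
    """Return (genome_fraction, read_fraction) points with bins sorted by read count."""
    counts = sorted(bin_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    points = [(0.0, 0.0)]
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        points.append((i / len(counts), running / total))
    return points


def uncovered_fraction(bin_counts):
    """Fraction of bins that received no reads at all."""
    return sum(1 for c in bin_counts if c == 0) / len(bin_counts)


def deviation_from_diagonal(points):
    """Twice the area between the diagonal and the curve (0 means perfectly uniform)."""
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1.0 - 2.0 * area


# Toy data (hypothetical): one evenly covered sample and one heavily biased sample.
uniform = [10, 10, 10, 10]
biased = [0, 0, 0, 40]
print(deviation_from_diagonal(lorenz_curve(uniform)))
print(deviation_from_diagonal(lorenz_curve(biased)))
print(uncovered_fraction(biased))
```

A curve that hugs the diagonal gives a deviation near zero, while bins with no reads push the curve along the horizontal axis and increase the deviation.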

Copy number variations (CNVs) are insertions, deletions, and duplications of genomic segments that range in size from a few kilobases to entire chromosomes. CNVs are found in nearly all tumor types. Arising in single cells, they create genomic variation that is important for tumor initiation and progression. The method presented here achieves single-copy resolution in single-cell CNV analysis. To further demonstrate the uniformity of amplification, we show the power spectrum of the read density, which describes the spatial frequencies of read-density fluctuations. A perfectly uniform distribution would give a delta-function power spectrum. The power spectrum of the MALBAC-amplified sample is similar to that of the unamplified bulk sample. MDA, by contrast, shows high amplitudes at scales from tens of kilobases to tens of megabases, reflecting over- and under-amplification of the genome at these scales. This demonstrates that the MALBAC method can be used to detect CNVs.
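As an illustration of the power-spectrum comparison, the sketch below computes the power spectrum of a binned read-density profile with a direct discrete Fourier transform. It is a minimal example under stated assumptions: the profiles are toy values, and the function name and normalization are choices made here, not taken from the original analysis.

```python
import cmath


def power_spectrum(density):
    """Power at each spatial frequency k = 0 .. n//2 of a binned read-density profile."""
    n = len(density)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(density))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# A perfectly uniform profile has all of its power at zero frequency.
uniform = [5] * 8
# A profile that alternates between over- and under-amplified bins
# has additional power at the corresponding spatial frequency.
alternating = [9, 1] * 4

print(power_spectrum(uniform))
print(power_spectrum(alternating))
```

For real data, a fast Fourier transform would replace the direct sum, which is used here only to keep the example dependency-free.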

Owing to the uniformity of MALBAC amplification and its single-copy resolution, genome-wide CNVs could be determined for two individual SW480 cancer cells. The two cells differ from each other and show digitized copy number differences. Without single-cell measurements, this information would be hidden in the bulk measurement. We used a hidden Markov model to call single-cell CNVs with statistical significance. As a control, we obtained the karyotype of the cell line. In a diploid cell, most chromosomes are present in two copies; in the SW480 cell line, however, chromosome 8 has only one copy and chromosome 17 has three copies. The digitized copy numbers agree with the karyotype.

To further investigate CNVs in single cells, we calculated the ratio between the two cells' profiles across the whole genome. This normalization removes residual amplification bias and gives a smoother CNV profile. About eight distinct differences are visible genome-wide, indicating eight significant CNV differences between the two cancer cells. For example, cell 1 has three copies of chromosome 13, whereas cell 2 has only two.
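The ratio-based normalization can be sketched as follows. This is a minimal illustration, assuming per-window read counts for the two cells over the same windows; the counts and the function name are hypothetical.

```python
def window_ratios(counts_a, counts_b):
    """Depth-normalized ratio of cell A to cell B in each window (None if B has no reads)."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    ratios = []
    for a, b in zip(counts_a, counts_b):
        if b == 0:
            ratios.append(None)
        else:
            ratios.append((a / total_a) / (b / total_b))
    return ratios


# Toy example: the third window has three copies in cell 1 and two copies in cell 2.
cell_1 = [100, 100, 150, 100]
cell_2 = [100, 100, 100, 100]
ratios = window_ratios(cell_1, cell_2)
# Relative to an unchanged window, the gained window shows a ratio of about 3/2.
print(ratios[2] / ratios[0])
```

Because both cells are amplified by the same method, biases that are common to both cancel in the ratio, which is why the resulting profile is smoother.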

The amplified DNA products were used to construct libraries for the SOLiD and Illumina sequencing systems. After fragmentation, the DNA ends were blunted. The biotin modification at the 5' end prevents efficient ligation of the sequencing adapters to the amplification adapters. As a result, the fraction of reads that fail to align to the reference genome because of amplification primer sequence drops significantly.

Single nucleotide variations (SNVs) in single cells of the SW480 cell line can be analyzed as follows. When a particular SNV is present in only one cell, sequencing the bulk population cannot detect it, which shows why SNV analysis at the single-cell level is necessary. Each single cell contains about 1.6x10^6 SNVs (Table 5), consistent with the frequency of about one SNV per 1 kb observed in the bulk.

Table 5

Most of the SNVs match those found in the bulk, indicating that they are common SNVs inherited from earlier generations of the cell line. However, amplification introduces errors at a rate of about 10^-5, so false-positive SNVs far outnumber the SNVs newly acquired by a single cell. This is a major challenge in single-cell sequencing. To overcome it and call single-cell SNVs unambiguously, we sequenced and analyzed the two daughter cells produced by the division of one cell (Table 1). The chance that the same erroneous SNV appears in both daughter cells is about 10^-10, which is negligible. We found 13 newly acquired SNVs, which could not have been detected by sequencing the genome of the bulk population.
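The 10^-10 estimate follows from treating amplification errors in the two daughter cells as independent events. As a rough worked calculation, assuming a per-base error rate of about 10^-5 in each cell and a genome of about 3x10^9 bases (the genome size is an assumption made here, not a figure stated in this document):

```latex
\[
P(\text{same error in both cells}) \approx \left(10^{-5}\right)^{2} = 10^{-10},
\qquad
\text{expected coincident errors} \approx 3\times10^{9} \times 10^{-10} = 0.3 .
\]
```

Under these assumptions, fewer than one coincident false positive is expected genome-wide, which is why an SNV seen in both daughter cells is unlikely to be an amplification artifact.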

Six of the 13 newly acquired SNVs cluster within a 500-bp region, indicating that they did not arise at random. Sequencing data generated on the Illumina platform were aligned with BWA (Bioinformatics 25, 1754 (2009)); data generated on the SOLiD platform were aligned with BioScope. After alignment, duplicate reads were removed. SNVs with a phred quality score greater than 50 and a site frequency of at least 0.5 were then identified using vcftools. For the two daughter cells we detected 2,459,529 and 2,225,969 SNVs, respectively, of which 1,774,887 were present in both cells. We further restricted the analysis to SNVs with at least 10x sequencing depth. To identify newly acquired SNVs, we filtered against the unamplified bulk data: SNVs present in both daughter cells but absent from the bulk are candidate newly acquired SNVs. Some false positives nevertheless remain because of systematic errors in amplification and sequencing. These can be identified and removed by amplifying and sequencing an unrelated single cell. Finally, the remaining newly acquired SNVs were required to have at least 20x coverage.
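The filtering logic described above amounts to a few set operations. The sketch below is a minimal illustration with hypothetical call records and function names; it does not reproduce the vcftools commands that were actually used, and the thresholds are simply the ones quoted in the text.

```python
def passes_quality(call, min_qual=50, min_freq=0.5, min_depth=10):
    """Apply the quality, site-frequency, and depth thresholds quoted in the text."""
    return (call["qual"] > min_qual
            and call["freq"] >= min_freq
            and call["depth"] >= min_depth)


def candidate_new_snvs(daughter_a, daughter_b, bulk, unrelated_cell):
    """Each argument is a set of (chromosome, position, alt_base) keys."""
    shared = daughter_a & daughter_b      # present in both daughter cells
    not_in_bulk = shared - bulk           # absent from the unamplified bulk
    return not_in_bulk - unrelated_cell   # drop systematic errors seen in an unrelated cell


# Toy call set for one daughter cell (hypothetical values).
calls_a = [
    {"key": ("chr1", 100, "A"), "qual": 60, "freq": 1.0, "depth": 25},
    {"key": ("chr2", 200, "T"), "qual": 70, "freq": 0.5, "depth": 30},
    {"key": ("chr3", 300, "G"), "qual": 40, "freq": 0.5, "depth": 30},  # fails quality
    {"key": ("chr4", 400, "C"), "qual": 80, "freq": 0.6, "depth": 22},
]
daughter_a = {c["key"] for c in calls_a if passes_quality(c)}
daughter_b = {("chr1", 100, "A"), ("chr2", 200, "T"),
              ("chr4", 400, "C"), ("chr5", 500, "G")}
bulk = {("chr1", 100, "A")}
unrelated_cell = {("chr4", 400, "C")}

print(candidate_new_snvs(daughter_a, daughter_b, bulk, unrelated_cell))
```

In this toy example only the chr2 call survives: it is shared by both daughter cells, absent from the bulk, and absent from the unrelated single cell.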

The error rate of the amplification method was estimated by comparing the SNVs detected in single cells with those in the unamplified bulk sample. SNVs found in a single cell but absent from the bulk are mostly caused by amplification errors. For the two daughter cells we found 51,805 and 42,705 such false positives, corresponding to amplification error rates of 1.7x10^-5 and 1.4x10^-5, respectively.
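The quoted error rates are consistent with dividing the false-positive counts by the number of bases examined. The short check below assumes a denominator of about 3x10^9 bases; that figure is an assumption made here for illustration and is not stated in this document.

```python
GENOME_BASES = 3.0e9  # assumed number of bases examined (not stated in the text)


def error_rate(false_positives, bases=GENOME_BASES):
    """Errors per base, given a count of false-positive SNVs."""
    return false_positives / bases


for count in (51805, 42705):
    print(f"{count} false positives -> {error_rate(count):.1e} errors per base")
```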

Single-cell copy number variations were called with a hidden Markov model. Reads simulated with wgsim were used to correct for mappability: windows were defined with an average length of 250 kb such that each window contains the same number of simulated reads. For each window, copy numbers from 0 to 9 were allowed. A transition matrix was used to model deviations from the diploid copy number. Specifically, the probability of a transition from copy number m to copy number n is:

$$
T(m,n) =
\begin{cases}
1 - f/b, & m = n = 2 \\
1 - b/l, & m = n \neq 2 \\
(b - f)/l, & m \neq n = 2 \\
f/(N-2)/l, & \text{otherwise}
\end{cases}
$$

where f = 10^-9 is the frequency of copy number variation, l = 5x10^7 is the length of a copy number variation, b = 250000 is the window size, and N = 10 is the number of states.
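The piecewise definition of T(m,n) can be written out directly. The sketch below is a literal transcription of the four cases with the stated parameter values; the function and argument names are chosen here for readability and are not part of the original description.

```python
def transition_matrix(cnv_frequency=1e-9, cnv_length=5e7, window=250000, n_states=10):
    """Build T[m][n], the probability of moving from copy number m to copy number n."""
    f, l, b, N = cnv_frequency, cnv_length, window, n_states
    T = [[0.0] * N for _ in range(N)]
    for m in range(N):
        for n in range(N):
            if m == n == 2:
                T[m][n] = 1 - f / b        # stay diploid
            elif m == n:
                T[m][n] = 1 - b / l        # stay in a non-diploid state
            elif n == 2:
                T[m][n] = (b - f) / l      # return to the diploid state
            else:
                T[m][n] = f / (N - 2) / l  # any other change of copy number
    return T


T = transition_matrix()
print(T[3][3], T[3][2], T[2][5])
```

With these parameters each row sums to approximately one, and leaving the diploid state is far less likely than returning to it, so the model favors long diploid stretches interrupted by CNV segments.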

The model probabilities are calibrated from a single normal blood cell processed with linear preamplification, on the assumption that this cell has no significant copy number variation. P2 denotes the distribution of read counts in a window at diploid copy number, and P1 denotes the corresponding distribution at haploid copy number. Assuming that the diploid read count is the sum of two independent haploid read counts, P2 = P1*P1, where * denotes convolution. This relationship can be inverted in Fourier space. Once P1 is obtained, the distributions for higher copy numbers follow from Pn = Pn-1*P1. To account for sequencing depth, the total read depth of the cancer cell is normalized to that of the normal cell, and the most likely sequence of states is then computed with the Viterbi algorithm.
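The convolution relation Pn = Pn-1*P1 and the Viterbi decoding step can be sketched together. This is a toy illustration under stated assumptions: the haploid distribution `p1`, the four-state transition matrix, and the observed read counts are hypothetical, and the Fourier-space inversion used to obtain P1 from P2 is not shown.

```python
import math


def convolve(p, q):
    """Distribution of the sum of two independent read counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def copy_number_distributions(p1, max_copies):
    """dists[n] is the read-count distribution at copy number n (P0 is always zero reads)."""
    dists = [[1.0]]
    for _ in range(max_copies):
        dists.append(convolve(dists[-1], p1))
    return dists


def viterbi(observations, dists, trans, prior):
    """Most likely sequence of copy-number states, computed in log space."""
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    def emit(state, reads):
        d = dists[state]
        return log(d[reads]) if reads < len(d) else float("-inf")

    states = range(len(dists))
    score = [log(prior[s]) + emit(s, observations[0]) for s in states]
    back = []
    for reads in observations[1:]:
        new_score, pointers = [], []
        for s in states:
            best = max(states, key=lambda r: score[r] + log(trans[r][s]))
            new_score.append(score[best] + log(trans[best][s]) + emit(s, reads))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy model: each copy contributes 0, 1, or 2 reads per window (hypothetical values).
p1 = [0.1, 0.8, 0.1]
dists = copy_number_distributions(p1, max_copies=3)
trans = [[0.98 if m == n else 0.02 / 3 for n in range(4)] for m in range(4)]
prior = [0.25] * 4
reads_per_window = [2, 2, 2, 2, 1, 1, 1, 1]
print(viterbi(reads_per_window, dists, trans, prior))
```

In this toy run the decoded path switches from two copies to one copy where the read count drops, because four consecutive low windows outweigh the cost of one state change.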

Lorenz curves were calculated for the unamplified bulk sample, MALBAC, and MDA. Reads were counted in 5-kb bins before constructing the Lorenz curves, so amplification bias at scales below 5 kb is averaged out and the binned curves reflect only bias at scales above 5 kb. The amplification bias of MALBAC lies mainly below 5 kb, whereas binning at 5 kb improves the MDA Lorenz curve only slightly, indicating that MDA has amplification bias at scales much larger than 5 kb.

The results of the sequencing experiments are summarized in Table 6. The SW480 genome is incomplete and contains only about 90% of the reference genome; single-cell coverage was normalized by this fraction. *Using Bst large fragment together with Pyrophage 3173 (exo-) for MALBAC amplification improved coverage compared with Bst large fragment alone. **S: data from the SOLiD system; I: data from the Illumina system.

Table 6

Example XV

Prenatal diagnosis

According to certain aspects of the methods described herein, non-invasive prenatal testing can be performed by amplifying genetic material, for example from a single fetal cell, from several fetal cells, or from cell-free fetal DNA in the mother. According to certain aspects, the entire fetal genome, or a significant portion of it, can be obtained and analyzed, for example to identify abnormal or disease-causing genes.

According to certain other aspects, methods for preimplantation genetic screening (PGS) and preimplantation genetic diagnosis (PGD) are provided. More than 100,000 in vitro fertilization (IVF) procedures are performed in the United States each year, so the ability to screen embryos before implantation is important. Current PGS methods involve biopsy of polar bodies, blastomeres, or blastocysts, followed by genetic screening methods such as fluorescence in situ hybridization (FISH) or polymerase chain reaction (PCR) to screen for specific chromosomal abnormalities (such as trisomy 21) or to detect specific mutations with a known phenotype. In IVF, embryos are known to pass through stages of 2 to 12 cells before implantation. According to certain aspects, genetic material (such as DNA) from one or a few cells of the embryo is extracted and amplified so that the whole genome of the embryo can be obtained and analyzed for abnormalities and disease-causing variants.

According to certain aspects, nucleated fetal cells can be isolated from maternal blood, and nucleic acid such as DNA can be extracted from one or more of these cells. Extracellular fetal nucleic acid can also be obtained from maternal blood. See Lo et al., Nature Reviews Genetics, Vol. 8, pp. 71-77 (2007) and Lo et al., Sci. Transl. Med. 2, 61ra91 (2010), which are incorporated herein by reference. The nucleic acid can be linearly amplified by the methods described herein to provide, for example, the entire fetal genome or specific loci of the genome for analysis. In this way, genomic variations associated with a disease or a known phenotypic outcome can be identified.

Nucleated fetal cells can be obtained as follows. Maternal blood is collected by venipuncture or finger prick, in a volume of 0.1 ml to 100 ml. Nucleated fetal cells (including fetal nucleated red blood cells, white blood cells, and trophoblasts) can be isolated at around eight weeks of gestation. A variety of isolation methods can be used, including fluorescence-activated cell sorting (FACS) based on light scattering and surface markers, magnetic-activated cell sorting, microdissection, size-based cell separation (for example, microfluidic devices), and density-based cell separation (for example, centrifugation).

Nucleic acid, such as DNA, can be extracted from nucleated fetal cells as follows. A single cell is placed in 30 μl of freshly prepared lysis buffer (30 mM Tris-Cl pH 7.8, 2 mM EDTA, 20 mM KCl, 0.2% Triton X-100, 12.5 μg/ml QIAGEN Protease). See Bianchi et al., Isolation of fetal DNA from nucleated erythrocytes in maternal blood, Proc. Natl. Acad. Sci. USA Vol. 87, pp. 3279-3283, May 1990, and Wachtel et al., Clin. Genet. 2001: 59: 74-79 (2001), which are incorporated herein by reference. Other methods, such as alkaline lysis or freeze-thaw lysis, can also be used to extract nucleic acid, as can other methods described herein or known in the art.

The extracted nucleic acid can be amplified as follows. To the tube containing the lysed single cell, add 30 μl of amplification buffer (20 mM Tris-Cl pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 3 mM MgSO4, 0.1% Triton X-100, 0.32 μM primer GAT3T (GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNG GG), and 0.25 μM primer GAT3G (GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNT TT)), then heat at 94°C for 3 minutes to melt the double-stranded genomic DNA into single strands. Quench the sample immediately on ice so that the primers anneal efficiently to the single-stranded DNA. Then add 2.5 U of Bst large fragment (NEB), or 2 U of Bst large fragment plus 0.8 U of Pyrophage 3173 exo- (Lucigen), and run the following thermal program: 10°C for 45 seconds, 20°C for 45 seconds, 30°C for 60 seconds, 40°C for 45 seconds, 50°C for 45 seconds, 62°C for 2 minutes, and 95°C for 20 seconds. Then quench the PCR tube on ice.

After quenching on ice, add the same polymerase mixture and run the following thermal program: 10°C for 45 seconds, 20°C for 45 seconds, 30°C for 60 seconds, 40°C for 45 seconds, 50°C for 45 seconds, 62°C for 2 minutes, 95°C for 20 seconds, and 58°C for 20 seconds. Repeat the above steps four times to obtain the preamplification product.
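The two preamplification programs can be written down as data to check the total hold time. The sketch below is illustrative only: it ignores ramp times and the steps on ice, and it reads "repeat four times" as four passes of the second program, which is an assumption about the intended cycle count.

```python
# (temperature in °C, hold time in seconds)
FIRST_ROUND = [(10, 45), (20, 45), (30, 60), (40, 45), (50, 45), (62, 120), (95, 20)]
LATER_ROUND = FIRST_ROUND + [(58, 20)]  # later rounds add a 58°C hold at the end


def hold_time(program):
    """Total hold time of one pass through a thermal program, in seconds."""
    return sum(seconds for _temperature, seconds in program)


def total_hold_time(later_rounds=4):
    """Hold time of the first round plus the assumed number of later rounds."""
    return hold_time(FIRST_ROUND) + later_rounds * hold_time(LATER_ROUND)


print(hold_time(FIRST_ROUND), hold_time(LATER_ROUND), total_hold_time())
```

Under these assumptions the preamplification holds add up to 1980 seconds, about 33 minutes, before the PCR stage.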

The preamplification product can be further amplified by PCR to produce material for next-generation sequencing. Add 30 μl of freshly prepared amplification mix (20 mM Tris-Cl pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 4 mM MgSO4, 0.1% Triton X-100, 0.66 μM Bio-GAT primer (5'/Biosg/GTGAGTGATGGTTGAGGTAGTGTGGAG-3'), and 2.4 U Deep VentR exo- polymerase (NEB)) to the preamplification product. Thermal program: 94°C for 20 seconds, 59°C for 20 seconds, 65°C for 1 minute, and 72°C for 2 minutes, for 18 cycles. Starting from a single cell, this yields 2-3 μg of double-stranded DNA for high-throughput sequencing.

Genetic analysis of the amplification product can be performed at the whole-genome scale, on selected significant portions of the genome, or at specific genomic loci known to be associated with disease. Examples of whole-genome analysis include next-generation whole-genome sequencing (Illumina, SOLiD, etc.) and hybridization-based whole-genome genotyping methods such as single nucleotide polymorphism (SNP) arrays and comparative genomic hybridization arrays. Examples of analyzing significant portions of the genome include targeted resequencing and sequencing of specific parts of the genome, such as the exome or a particular chromosome. Examples of analyzing specific genomic loci include hybridizing nucleic acid probes to the amplified whole genome followed by imaging or sequencing, and amplifying specific genomic regions by PCR or multiplex PCR followed by sequencing or genotyping.

Genetic variations relevant to prenatal screening and diagnosis span a wide range, including but not limited to single nucleotide variations (SNVs), small insertions or deletions (indels) of 1-100 bp, copy number variations (CNVs) of 100 bp-100 Mbp, sequence inversions and duplications of 1 bp-1 Mbp, loss of heterozygosity (LOH) over 10 bp-100 Mbp, and abnormalities at the whole-chromosome level, including chromosomal translocations, aneuploidy, and deletion or duplication of part or all of a chromosome.

Examples of "genomic variations associated with a disease or a known phenotypic outcome" include diseases with a known association with genetic variation, such as beta-thalassemia, which is caused by a 4-bp deletion in codons 41 and 42 of the hemoglobin beta (HBB) gene, and Down syndrome, which is caused by an extra copy of chromosome 21 (trisomy 21). "Known phenotypic outcomes" include potential health conditions or physical traits that are not yet regarded as genetic diseases, such as susceptibility to certain diseases (for example, cancer) and the sex of the fetus. For specific examples, see Cheung et al., Nature Genetics, Vol. 14, pp. 264-268 (1996) (sickle cell anemia, thalassemia) and Belroud et al., The Lancet, Vol. 361, pp. 1013-1014 (2003) (spinal muscular atrophy).

According to certain other aspects, one or more cells can be biopsied or isolated from an IVF embryo. Nucleic acid, such as DNA, can be extracted from a single cell or from several cells of the embryo. The nucleic acid is then amplified by the methods described herein so that, for example, the whole genome or specific portions of it can be analyzed. Genomic variations associated with a disease or a known phenotypic outcome can thereby be detected.

One or more cells can be obtained from an IVF embryo by puncturing the embryo under micromanipulation. The cells taken may be somatic cells of the embryo, trophectoderm cells, or blastomeres, among others. The biopsy is performed between day 0 and day 6 after fertilization.

Nucleic acid, such as DNA, can be extracted from embryonic cells as follows. A single cell is placed in 30 μl of freshly prepared lysis buffer (30 mM Tris-Cl pH 7.8, 2 mM EDTA, 20 mM KCl, 0.2% Triton X-100, 12.5 μg/ml QIAGEN Protease). Other methods, such as alkaline lysis or freeze-thaw lysis, can also be used to extract nucleic acid, as can other methods described herein or known in the art.

The extracted nucleic acid can be amplified as follows. To the tube containing the lysed single cell, add 30 μl of amplification buffer (20 mM Tris-Cl pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 3 mM MgSO4, 0.1% Triton X-100, 0.32 μM primer GAT3T (GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNG GG), and 0.25 μM primer GAT3G (GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNT TT)), then heat at 94°C for 3 minutes to melt the double-stranded genomic DNA into single strands. Quench the sample immediately on ice so that the primers anneal efficiently to the single-stranded DNA. Then add 2.5 U of Bst large fragment (NEB), or 2 U of Bst large fragment plus 0.8 U of Pyrophage 3173 exo- (Lucigen), and run the following thermal program: 10°C for 45 seconds, 20°C for 45 seconds, 30°C for 60 seconds, 40°C for 45 seconds, 50°C for 45 seconds, 62°C for 2 minutes, and 95°C for 20 seconds. Then quench the PCR tube on ice.

After quenching on ice, add the same polymerase mixture and run the following thermal program: 10°C for 45 seconds, 20°C for 45 seconds, 30°C for 60 seconds, 40°C for 45 seconds, 50°C for 45 seconds, 62°C for 2 minutes, 95°C for 20 seconds, and 58°C for 20 seconds. Repeat the above steps four times to obtain the preamplification product.

The preamplification product can be further amplified by PCR to produce material for next-generation sequencing. Add 30 μl of freshly prepared amplification mix (20 mM Tris-Cl pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 4 mM MgSO4, 0.1% Triton X-100, 0.66 μM Bio-GAT primer (5'/Biosg/GTGAGTGATGGTTGAGGTAGTGTGGAG-3'), and 2.4 U Deep VentR exo- polymerase (NEB)) to the preamplification product. Thermal program: 94°C for 20 seconds, 59°C for 20 seconds, 65°C for 1 minute, and 72°C for 2 minutes, for 18 cycles. Starting from a single cell, this yields 2-3 μg of double-stranded DNA for high-throughput sequencing.

Genetic analysis of the amplification product can be performed at the whole-genome scale, on selected significant portions of the genome, or at specific genomic loci known to be associated with disease. Examples of whole-genome analysis include next-generation whole-genome sequencing (Illumina, SOLiD, etc.) and hybridization-based whole-genome genotyping methods such as single nucleotide polymorphism (SNP) arrays and comparative genomic hybridization arrays. Examples of analyzing significant portions of the genome include targeted resequencing and sequencing of specific parts of the genome, such as the exome or a particular chromosome. Examples of analyzing specific genomic loci include hybridizing nucleic acid probes to the amplified whole genome followed by imaging or sequencing, and amplifying specific genomic regions by PCR or multiplex PCR followed by sequencing or genotyping.

Genetic variations relevant to prenatal screening and diagnosis span a wide range, including but not limited to single nucleotide variations (SNVs), small insertions or deletions (indels) of 1-100 bp, copy number variations (CNVs) of 100 bp-100 Mbp, sequence inversions and duplications of 1 bp-1 Mbp, loss of heterozygosity (LOH) over 10 bp-100 Mbp, and abnormalities at the whole-chromosome level, including chromosomal translocations, aneuploidy, deletion or duplication of part or all of a chromosome, and other chromosomal or genetic variations known to be potentially disease-causing. The genetic diseases mentioned here are illustrative, not exhaustive. The methods described herein are intended for analyzing the genome of a single cell, so any disease-related abnormality that can be detected by analyzing a cell's genome is regarded as an example of their application.

Examples of "genomic variations associated with a disease or a known phenotypic outcome" include diseases with a known association with genetic variation, such as beta-thalassemia, which is caused by a 4-bp deletion in codons 41 and 42 of the hemoglobin beta (HBB) gene, and Down syndrome, which is caused by an extra copy of chromosome 21 (trisomy 21). "Known phenotypic outcomes" include potential health conditions or physical traits that are not yet regarded as genetic diseases, such as susceptibility to certain diseases (for example, cancer) and the sex of the fetus. Other fetal or embryonic diseases and conditions that can be identified by prenatal testing include cystic fibrosis, sickle cell disease, Tay-Sachs disease, fragile X syndrome, spinal muscular atrophy, hemoglobinopathies, thalassemia, X-linked disorders (disorders caused by genes on the X chromosome), spina bifida, anencephaly, congenital heart disease, obesity, diabetes, cancer, fetal sex, fetal RHD status, fetal HLA type, paternally derived mutations, and chromosomal aneuploidy.

Example XVI

Cancer diagnosis

According to certain aspects of the methods described herein, whole-genome analysis of a single cancer cell, a few cancer cells, or a specific cancer cell is provided. The methods are particularly useful when only a small number of cancer cells is available, for example when the sample is scarce or the cells that can be obtained and isolated are rare. A specific example is circulating tumor cells (CTCs) in an individual's blood. Cancer cells such as tumor cells enter the bloodstream through interactions between the tumor and blood vessels and during metastasis. Current CTC-based diagnostics rely on counting the CTCs that can be enriched. Because CTCs are extremely rare in blood (about one CTC per 10^9 blood cells) and highly heterogeneous, enrichment efficiency varies from case to case. Diagnosis based on CTC counts is therefore not very reliable, and the approach is currently in clinical trials. CellSearch is currently the only FDA-approved instrument for CTC enrichment and counting. Conventional genetic diagnostic methods that require about one million cells may not be applicable to CTCs.

Circulating tumor cells are cancer cells that originate from a primary tumor or a metastatic site, and they can be detected and analyzed for early diagnosis. Although CTCs are few in number, they can be collected without invasive procedures such as surgery, so they offer a non-invasive route to testing. The method disclosed here enables analysis of CTC DNA, which may allow cancer to be detected at an early stage, when cancer cells are first shed into the bloodstream. The method provides reliable, uniform whole genome amplification from a single cell, from about 10 cells, or from about 100 cells without introducing excessive amplification bias or allelic dropout. Because it can amplify the whole or nearly the whole genome of a single cell, the method is particularly useful for CTCs, which are few in number yet important for early diagnosis.

According to one aspect, the MALBAC method disclosed here, which uses multiple annealing and looping-based amplification cycles, can amplify the whole genome of a single cancer cell, such as a circulating tumor cell. According to one aspect, circulating tumor cells can be obtained from a patient's blood. Cancer cells can also be obtained from a primary or metastatic tumor by minimally invasive procedures such as fine needle aspiration (FNA) in order to perform minimal sample mass diagnosis (MSMD). Obtaining one or more cancer cells in these ways can be regarded as non-invasive. The method is especially useful when only a few cancer cells can be obtained, and it also applies when many cancer cells are available but single-cell analysis is desired.

According to one aspect, a method is provided for analyzing DNA from a cancer cell. "Cancer" refers to various types of malignant neoplasms, most of which can invade surrounding tissues and may metastasize to other sites (see, e.g., PDR Medical Dictionary, 1st edition (1995)). The terms "neoplasm" and "tumor" refer to abnormal tissue that proliferates rapidly and keeps growing after the stimuli that initiated proliferation are removed. Such tissue partially or completely lacks the structural organization and functional coordination of normal tissue, and it may be benign (a benign tumor) or malignant (a malignant tumor).

Examples of general categories of cancer include, but are not limited to, carcinomas (malignant tumors derived from epithelial cells, such as the common forms of breast, prostate, lung, and colon cancer); sarcomas (malignant tumors derived from connective tissue or mesenchymal cells); lymphomas and leukemias (malignancies derived from hematopoietic cells); germ cell tumors (tumors derived from totipotent cells, most often found in the testicle or ovary in adults and on the body midline, particularly at the tip of the tailbone, in infants and young children); blastic tumors (tumors resembling immature or embryonic tissue); and the like. Those skilled in the art will understand that these examples are illustrative rather than exhaustive, and that other types of neoplasms may be analyzed by the methods disclosed here.

Neoplasms encompassed by the method of the present invention include, but are not limited to: acute lymphoblastic leukemia; myeloid leukemia; childhood acute myeloid leukemia; adrenocortical carcinoma; AIDS-related cancers; AIDS-related lymphoma; anal cancer; appendix cancer; astrocytoma (e.g., cerebellar, cerebral); atypical teratoid/rhabdoid tumor; basal cell carcinoma; extrahepatic bile duct cancer; bladder cancer; bone cancer, osteosarcoma, and malignant fibrous histiocytoma; brain tumors (e.g., brain stem glioma, central nervous system atypical teratoid/rhabdoid tumor, central nervous system embryonal tumors, cerebellar astrocytoma, cerebral astrocytoma/malignant glioma, craniopharyngioma, ependymoma, medulloblastoma, medulloepithelioma, pineal parenchymal tumors of intermediate differentiation, supratentorial primitive neuroectodermal tumors, pineoblastoma, visual pathway and hypothalamic glioma, brain and spinal cord tumors); breast cancer; bronchial tumors; Burkitt lymphoma; carcinoid tumor (e.g., gastrointestinal); carcinoma of unknown primary; central nervous system tumors (e.g., atypical teratoid/rhabdoid tumor, embryonal tumors, primary lymphoma); cerebellar astrocytoma; cerebral astrocytoma/malignant glioma; cervical cancer; chordoma; chronic lymphocytic leukemia; chronic myeloproliferative disorders; colon cancer; rectal cancer; craniopharyngioma; cutaneous T-cell lymphoma; embryonal tumors of the central nervous system; endometrial cancer; ependymoma; esophageal cancer; Ewing family of tumors; extracranial germ cell tumor; extragonadal germ cell tumor; extrahepatic bile duct cancer; eye cancer (e.g., intraocular melanoma, retinoblastoma); gallbladder cancer; gastric cancer; gastrointestinal tumors (e.g., carcinoid tumor, stromal tumor (GIST), stromal cell tumor); germ cell tumors (e.g., extracranial, extragonadal, ovarian); gestational trophoblastic tumor; glioma (e.g., brain stem, cerebral astrocytoma); hairy cell leukemia; head and neck cancer; liver cancer; Hodgkin lymphoma; hypopharyngeal cancer; hypothalamic and visual pathway glioma; intraocular melanoma; islet cell tumors; Kaposi sarcoma; kidney cancer; large cell tumors; laryngeal cancer; leukemia (e.g., acute lymphoblastic, acute myeloid, chronic lymphocytic, chronic myelogenous, hairy cell); lip and/or oral cavity cancer; liver cancer; lung cancer (e.g., non-small cell, small cell); lymphoma (e.g., AIDS-related, Burkitt, cutaneous T-cell, Hodgkin, non-Hodgkin, primary central nervous system); Waldenström macroglobulinemia; malignant fibrous histiocytoma of bone and/or osteosarcoma; medulloblastoma; medulloepithelioma; melanoma; Merkel cell carcinoma; mesothelioma; metastatic squamous neck cancer; mouth cancer; multiple endocrine neoplasia syndrome; multiple myeloma/plasma cell neoplasm; mycosis fungoides; myelodysplastic syndromes; myelodysplastic/myeloproliferative diseases; myelogenous leukemia (e.g., chronic, acute, multiple); chronic myeloproliferative disorders; nasal cavity and/or paranasal sinus cancer; nasopharyngeal cancer; neuroblastoma; non-Hodgkin lymphoma; non-small cell lung cancer; oral cancer; oral cavity cancer; oropharyngeal cancer; osteosarcoma and/or malignant fibrous histiocytoma of bone; ovarian cancer (e.g., ovarian epithelial cancer, ovarian germ cell tumor, ovarian low malignant potential tumor); pancreatic cancer (e.g., islet cell tumors); papillomatosis; paranasal sinus and/or nasal cavity cancer; thyroid cancer; penile cancer; pharyngeal cancer; pheochromocytoma; pineal parenchymal tumors of intermediate differentiation; pineoblastoma and supratentorial primitive neuroectodermal tumors; pituitary tumor; plasma cell neoplasm/multiple myeloma; pleuropulmonary blastoma; primary central nervous system lymphoma; prostate cancer; rectal cancer; renal cell cancer; transitional cell cancer of the renal pelvis and/or ureter; respiratory tract carcinoma involving the NUT gene on chromosome 15; retinoblastoma; rhabdomyosarcoma; salivary gland cancer; sarcoma (e.g., Ewing family of tumors, Kaposi, soft tissue, uterine); Sézary syndrome; skin cancer (e.g., non-melanoma, melanoma, Merkel cell); small cell lung cancer; small intestine cancer; soft tissue sarcoma; squamous cell carcinoma; metastatic squamous neck cancer with occult primary; stomach cancer; supratentorial primitive neuroectodermal tumors; cutaneous T-cell lymphoma; testicular cancer; throat cancer; thymoma and/or thymic carcinoma; thyroid cancer; trophoblastic tumor; carcinoma of unknown primary site; urethral cancer; uterine cancer; endometrial cancer; uterine sarcoma; vaginal cancer; visual pathway and/or hypothalamic glioma; vulvar cancer; Wilms tumor; and the like. For a review, see the National Cancer Institute's website (cancer.gov/cancertopics/alphalist). One skilled in the art will understand that this list is exemplary only and not exhaustive, and will readily be able to identify additional cancers and/or neoplasms based on the disclosure herein.

According to certain aspects, circulating tumor cells are isolated from an individual's blood. Nucleic acid is extracted from one or more of the circulating tumor cells and amplified by the methods described here, for example by multiple annealing and looping-based amplification cycles, with or without subsequent exponential amplification. This provides the whole genome of the cell, or specific genomic loci, for analysis. The genome is then analyzed for genetic variations associated with genomic disorders or cancer.

Circulating tumor cells can be obtained as follows. About 10 ml of blood is drawn from the patient by routine venipuncture. CTCs can then be isolated by a variety of methods, including the commercial CellSearch system (Clin. Cancer Res. 2004, 10, 6897-6904 and Clin. Cancer Res. 2010, 16, 2634-2645), size-based filtration devices (J. Pathol. 2000, 156, 57-63 and Cancer Res. 2010, 70, 6420-6428), wide-field imaging with fiber-optic array scanning technology (Proc. Natl. Acad. Sci. USA 2004, 101, 10501-1050), and antibody-based capture in microfluidic devices (Lab Chip 2010, 10, 837-842; Nature 2007, 450, 1235-1239; Anal. Chem. 2011, 83, 2301-2309; Angew. Chem. 2011, 123, 3140-3144; and Angew. Chem. Int. Ed. 2011, 50, 3084-3088).
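To give a feel for how scarce CTCs are in such a sample, the following minimal sketch combines two figures stated in this example (about one CTC per 10^9 blood cells, about 10 ml of blood per draw) with an assumed blood cell concentration. The value of 5 × 10^9 cells per ml is an illustrative assumption and is not taken from this document; real CTC counts vary widely between patients.

```python
# Rough estimate of how many CTCs a single blood draw might contain.
CTC_PER_BLOOD_CELL = 1e-9     # from the text: about one CTC per 10^9 blood cells
BLOOD_DRAW_ML = 10.0          # from the text: about 10 ml per venipuncture
ASSUMED_CELLS_PER_ML = 5e9    # ASSUMPTION: illustrative blood cell concentration


def expected_ctcs(volume_ml=BLOOD_DRAW_ML,
                  cells_per_ml=ASSUMED_CELLS_PER_ML,
                  ctc_fraction=CTC_PER_BLOOD_CELL):
    """Return the expected number of CTCs in a blood sample of the given volume."""
    return volume_ml * cells_per_ml * ctc_fraction


if __name__ == "__main__":
    print(f"Expected CTCs in a {BLOOD_DRAW_ML:.0f} ml draw: about {expected_ctcs():.0f}")
```

Under these assumptions a draw holds only tens of CTCs, which is why an amplification method that works from a single cell or a handful of cells matters here.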

According to certain other aspects, one or more cells are obtained from a mass or tissue block by fine needle aspiration. Nucleic acid, such as DNA, is extracted from the single cell or the several cells obtained in this way. Linear pre-amplification is then performed by the methods described here to provide the whole genome of the sampled cells, or specific genomic loci, for analysis. Genetic variations associated with known congenital disorders or known phenotypes are then identified.

Cells are biopsied and collected by fine needle aspiration as follows. The skin over the area to be examined is first cleaned with antiseptic solution and sterile surgical towels. After the mass is located by X-ray or palpation, a special needle of very fine diameter (22 or 25 gauge) is inserted into it. Once the needle is in place, cells are withdrawn by aspiration with a syringe and transferred to a single tube. The cells are labeled and stored. Individual cancer cells are then picked under a fluorescence microscope by mouth pipetting or laser dissection.

Nucleic acid, such as DNA, can be extracted from CTCs or from fine-needle-aspirated cells by protease digestion. The single cell is placed in 30 µl of freshly prepared lysis buffer (30 mM Tris-Cl pH 7.8, 2 mM EDTA, 20 mM KCl, 0.2% Triton X-100, 12.5 µg/ml QIAGEN Protease). Other methods, such as alkaline lysis or freeze-thaw lysis, or other methods described here or known in the art, can also be used to extract the nucleic acid.
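Preparing 30 µl of this lysis buffer is a dilution calculation (final volume × final concentration ÷ stock concentration). The sketch below works it through in Python. The final concentrations are those given above, but every stock concentration is an assumed, illustrative value; substitute the stocks actually on hand.

```python
# Volumes of stock solutions needed for 30 ul of lysis buffer.
FINAL_VOLUME_UL = 30.0

# name: (final concentration from the text, ASSUMED stock concentration);
# both values in each pair use the same unit.
COMPONENTS = {
    "Tris-Cl pH 7.8 (mM)": (30.0, 1000.0),
    "EDTA (mM)": (2.0, 500.0),
    "KCl (mM)": (20.0, 1000.0),
    "Triton X-100 (%)": (0.2, 10.0),
    "QIAGEN Protease (ug/ml)": (12.5, 1000.0),
}


def stock_volumes(final_volume_ul, components):
    """Return the volume in ul of each stock, plus water to reach the final volume."""
    volumes = {
        name: final_volume_ul * final / stock
        for name, (final, stock) in components.items()
    }
    volumes["water"] = final_volume_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ul in stock_volumes(FINAL_VOLUME_UL, COMPONENTS).items():
        print(f"{name:28s} {ul:7.3f} ul")
```

Several of the resulting volumes are well below 1 µl, so in practice a larger master mix would usually be prepared and dispensed in 30 µl portions.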

The extracted nucleic acid can be amplified as follows. To the tube containing the lysed single cell, add 30 µl of amplification buffer (20 mM Tris-Cl pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 3 mM MgSO4, 0.1% Triton X-100, 0.32 µM primer GAT3T (GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNG GG), and 0.25 µM primer GAT3G (GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNT TT)), then heat at 94 °C for 3 minutes to melt the double-stranded genomic DNA into single strands. Quench the sample immediately on ice so that the primers anneal efficiently to the single-stranded DNA. Next, add either 2.5 U of Bst large fragment (NEB), or 2 U of Bst large fragment together with 0.8 U of Pyrophage 3173 exo- (Lucigen), and run the following thermal program: 10 °C for 45 seconds, 20 °C for 45 seconds, 30 °C for 60 seconds, 40 °C for 45 seconds, 50 °C for 45 seconds, 62 °C for 2 minutes, and 95 °C for 20 seconds. Then quench the PCR tube quickly on ice.
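The two primers share one layout: a common sequence, a run of random bases (N), and a short fixed 3' sequence. The claims of this patent specify a common sequence of 10-30 nucleotides that contains G, T, and A but no C, a variable sequence of 3-7 nucleotides, and a fixed sequence of 2-4 nucleotides. The following sketch checks the two primer sequences given above against that layout; the function names are illustrative and not part of the patent.

```python
import re

# Primer sequences as written in the protocol above.
PRIMERS = {
    "GAT3T": "GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNG GG",
    "GAT3G": "GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNT TT",
}


def split_primer(seq):
    """Split a primer into (common, variable, fixed) around its run of N bases."""
    compact = seq.replace(" ", "").upper()
    match = re.fullmatch(r"([ACGT]+)(N+)([ACGT]+)", compact)
    if match is None:
        raise ValueError(f"unexpected primer layout: {seq!r}")
    return match.groups()


def describe(seq):
    """Summarize the properties that the claims place limits on."""
    common, variable, fixed = split_primer(seq)
    return {
        "common_len": len(common),            # claimed range: 10-30 nt
        "variable_len": len(variable),        # claimed range: 3-7 nt
        "fixed": fixed,                       # claimed range: 2-4 nt
        "common_has_C": "C" in common,        # the claims require no C
        "n_variants": 4 ** len(variable),     # distinct sequences in the random pool
    }


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, describe(seq))
```

Both primers have a 27-nucleotide C-free common sequence and five random positions, so each primer is really a pool of 4^5 = 1024 sequences sharing the same ends.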

After quenching on ice, add the same polymerase mix again and run the following thermal program: 10 °C for 45 seconds, 20 °C for 45 seconds, 30 °C for 60 seconds, 40 °C for 45 seconds, 50 °C for 45 seconds, 62 °C for 2 minutes, 95 °C for 20 seconds, and 58 °C for 20 seconds. Repeat this step four times to obtain the pre-amplification product, a mixture of amplicons.
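The pre-amplification program is easier to check when written out as data. The sketch below encodes the first cycle and the later cycles (which add the 58 °C step) and totals the programmed hold time. It assumes that "four times" means four later cycles after the first one, and it ignores ramp times, ice quenching, and enzyme additions; both are simplifications made only for illustration.

```python
# (temperature in degrees C, hold time in seconds), as listed in the protocol.
FIRST_CYCLE = [(10, 45), (20, 45), (30, 60), (40, 45), (50, 45), (62, 120), (95, 20)]
LATER_CYCLE = FIRST_CYCLE + [(58, 20)]  # later cycles end with the 58 C step


def hold_seconds(steps):
    """Sum the programmed hold times of one cycle."""
    return sum(seconds for _, seconds in steps)


def total_hold_seconds(n_later_cycles=4):
    """Total hold time: one first cycle plus n later cycles (ASSUMED reading)."""
    return hold_seconds(FIRST_CYCLE) + n_later_cycles * hold_seconds(LATER_CYCLE)


if __name__ == "__main__":
    print("first cycle :", hold_seconds(FIRST_CYCLE), "s")
    print("later cycle :", hold_seconds(LATER_CYCLE), "s")
    print("total       :", total_hold_seconds() / 60, "min")
```

Under this reading the programmed holds add up to roughly half an hour, before counting the manual steps between cycles.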

The amplicons above can be further amplified by PCR for next-generation sequencing. Add 30 µl of freshly prepared amplification mix (20 mM Tris-Cl pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 4 mM MgSO4, 0.1% Triton X-100, 0.66 µM Bio-GAT primer (5'/Biosg/GTGAGTGATGGTTGAGGTAGTGTGGAG-3'), and 2.4 U Deep VentR exo- polymerase (NEB)) to the pre-amplification product. Run 18 cycles of 94 °C for 20 seconds, 59 °C for 20 seconds, 65 °C for 1 minute, and 72 °C for 2 minutes. Starting from a single cell, this yields a total of 2-3 µg of double-stranded DNA for high-throughput sequencing.
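The stated yield can be put in perspective with a short calculation. The sketch below totals the hold time of the 18-cycle PCR program and estimates the overall fold amplification implied by a 2-3 µg yield. The input mass of about 6 pg of DNA per diploid human cell is an assumption that does not appear in this document, and the result covers pre-amplification and PCR together.

```python
import math

ASSUMED_GENOME_PG = 6.0  # ASSUMPTION: approximate DNA mass of one diploid human cell

# (temperature in degrees C, hold time in seconds) for one PCR cycle, from the text.
PCR_CYCLE = [(94, 20), (59, 20), (65, 60), (72, 120)]
PCR_CYCLES = 18


def pcr_hold_minutes(cycle=PCR_CYCLE, n_cycles=PCR_CYCLES):
    """Programmed hold time of the PCR stage in minutes (ramp times ignored)."""
    return n_cycles * sum(seconds for _, seconds in cycle) / 60


def fold_amplification(yield_ug, input_pg=ASSUMED_GENOME_PG):
    """Overall fold increase in DNA mass from input (pg) to final yield (ug)."""
    return yield_ug * 1e6 / input_pg


def equivalent_doublings(yield_ug, input_pg=ASSUMED_GENOME_PG):
    """Number of perfect doublings that would give the same fold increase."""
    return math.log2(fold_amplification(yield_ug, input_pg))


if __name__ == "__main__":
    print(f"PCR hold time: {pcr_hold_minutes():.0f} min")
    for yield_ug in (2.0, 3.0):
        print(f"{yield_ug} ug: {fold_amplification(yield_ug):.2e}-fold, "
              f"about {equivalent_doublings(yield_ug):.1f} doublings")
```

With the assumed input mass, a 2-3 µg yield corresponds to an overall amplification of several hundred thousand fold.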

Genetic testing of the amplicons can cover the whole genome, a substantial selected portion of the genome, or specific genomic loci known to cause disease. Examples of whole genome analysis include next-generation whole genome sequencing (Illumina, SOLiD, etc.) and hybridization-based whole genome genotyping, such as single nucleotide polymorphism (SNP) arrays and comparative genomic hybridization arrays. Examples of analyzing a substantial portion of the genome include targeted resequencing and sequencing of particular parts of the genome, such as the exome or a specific chromosome. Examples of analyzing specific genomic loci include hybridizing nucleic acid probes to the amplified whole genome followed by imaging or sequencing, and amplifying specific genomic regions by PCR or multiplex PCR followed by sequencing or genotyping.

Genetic variations relevant to cancer diagnosis cover a wide range, including but not limited to single nucleotide variations (SNVs); small insertions or deletions (indels) of 1-100 bp; copy number variations (CNVs) of 100 bp-100 Mbp; sequence inversions and duplications of 1 bp-1 Mbp; loss of heterozygosity (LOH) over 10 bp-100 Mbp; and whole-chromosome abnormalities, including chromosomal translocations, aneuploidy, deletion or duplication of part or all of a chromosome, and other chromosomal or genetic variations known to be potentially pathogenic. The diseases named here are illustrative and are not an exhaustive list. The purpose of the disclosed method is to analyze the genome of a single cell, so any disease-related abnormality that can be detected by analyzing a cell's genome is considered an example application of the method.
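The size ranges quoted for these variation classes overlap, so the length of an event does not determine its class on its own. The short sketch below encodes the ranges stated above and lists which classes an event of a given length could belong to. The function and dictionary names are illustrative only; real variant callers also use the type of evidence (read depth, split reads, allele frequencies), not just size.

```python
# Size ranges in base pairs, as stated in the text.
SIZE_RANGES_BP = {
    "indel": (1, 100),
    "CNV": (100, 100_000_000),
    "inversion_or_duplication": (1, 1_000_000),
    "LOH": (10, 100_000_000),
}


def candidate_classes(length_bp):
    """Return the variation classes whose stated size range includes length_bp."""
    return sorted(
        name
        for name, (low, high) in SIZE_RANGES_BP.items()
        if low <= length_bp <= high
    )


if __name__ == "__main__":
    for length in (1, 50, 100, 5_000_000):
        print(f"{length:>9} bp:", candidate_classes(length))
```

For instance, a 50 bp event falls within the stated ranges for indels, inversions or duplications, and LOH, while a 5 Mbp event fits only CNV and LOH.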

For specific examples of known cancer genomic variations, see existing public databases such as the Cancer Genome Anatomy Project (CGAP) of the U.S. National Institutes of Health and the Catalogue of Somatic Mutations in Cancer (COSMIC).

Claims (28)

1. A method of amplifying the whole genome of a single cell for non-diagnostic purposes, comprising:
(a) providing genomic DNA from the single cell in single-stranded form in a reaction vessel;
(b) adding to the reaction vessel primers having a common sequence of 10-30 nucleotides, a variable sequence of 3-7 nucleotides, and a fixed sequence of 2-4 nucleotides to produce a reaction mixture, wherein the common sequence contains G, T, and A but not C;
(c) subjecting the reaction mixture to a low temperature between 0 °C and 10 °C, at which the primers anneal to the single-stranded genomic DNA;
(d) adding to the reaction mixture at least one DNA polymerase having strand displacement activity or 5'-3' exonuclease activity, and subjecting the reaction mixture to a temperature at which DNA amplification occurs, to produce single-stranded or double-stranded DNA;
(e) subjecting the reaction mixture to a temperature that produces single-stranded amplicons;
(f) subjecting the reaction mixture to a temperature between 55 °C and 60 °C at which free primers anneal to the 3' ends of the amplicons, thereby maintaining a linear structure and preventing the formation of chimeras; and
(g) repeating steps (c) to (f) to produce amplicons of the genomic DNA.

2. The method of claim 1, wherein the primers have the following sequences: 5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNGGG-3' and 5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNTTT-3'.

3. The method of claim 1, wherein the DNA polymerase comprises at least one of Φ29 polymerase or Bst polymerase.

4. The method of claim 1, wherein the DNA polymerase comprises Φ29 polymerase and Bst polymerase.

5. The method of claim 1, further comprising the step of removing the primer sequences and complementary primer sequences from the 5' and 3' ends of the amplicons.

6. The method of claim 1, wherein the temperature at which DNA amplification occurs is between 30 °C and 65 °C.

7. The method of claim 1, wherein the annealing of the free primers is performed before the low-temperature quenching.

8. The method of claim 1, wherein step (f) is optional and is performed during the second and subsequent thermal cycles.

9. A method of amplifying the genome of a single cell for non-diagnostic purposes, comprising:
(a) providing genomic DNA derived from the single cell in single-stranded form in a reaction vessel;
(b) adding to the reaction vessel primers having a common sequence of 10-30 nucleotides, a variable sequence of 3-7 nucleotides, and a fixed sequence of 2-4 nucleotides to produce a reaction mixture, wherein the common sequence contains G, T, and A but not C;
(c) subjecting the reaction mixture to a low temperature between 0 °C and 10 °C, at which the primers anneal to the single-stranded genomic DNA;
(d) adding to the reaction mixture at least one DNA polymerase having strand displacement activity or 5'-3' exonuclease activity, and subjecting the reaction mixture to a temperature at which DNA amplification occurs, to produce first-round amplicons of the single-stranded genomic DNA having the primer sequence at the 3' end;
(e) subjecting the reaction mixture to a temperature that denatures double-stranded nucleic acid into single-stranded nucleic acid;
(f) subjecting the reaction mixture to a low temperature at which the primers anneal to the single-stranded genomic DNA and to the first-round amplicons;
(g) adding to the reaction mixture at least one DNA polymerase having strand displacement activity or 5'-3' exonuclease activity, or a polymerase having 5' flap endonuclease activity, and subjecting the reaction mixture to a temperature at which DNA amplification occurs, to produce first-round amplicons of the single-stranded genomic DNA having the primer sequence at the 3' end, and second-round amplicons of the first-round amplicons, the second-round amplicons having complementary primer sequences at their 3' and 5' ends;
(h) subjecting the reaction mixture to a temperature that denatures double-stranded nucleic acid into single-stranded nucleic acid;
(i) subjecting the reaction mixture to a temperature between 55 °C and 60 °C at which free primers anneal to the 3' ends of the second-round amplicons to maintain a linear structure, or to a temperature between 55 °C and 60 °C at which the 3' end of each second-round amplicon anneals to its own 5' end to form a looped structure, so that the second-round amplicons are unavailable as templates for amplification; and
(j) repeating steps (f) to (i) to produce amplicons of the genomic DNA.

10. The method of claim 9, wherein the primers have the following sequences: 5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNGGG-3' and 5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNTTT-3'.

11. The method of claim 9, wherein the DNA polymerase comprises at least one of Φ29 polymerase or Bst polymerase.

12. The method of claim 9, wherein the DNA polymerase comprises Φ29 polymerase and Bst polymerase.

13. The method of claim 9, further comprising the step of removing the primer sequences and complementary sequences from the 5' and 3' ends of the amplicons.

14. The method of claim 9, wherein the temperature at which DNA amplification occurs is between 30 °C and 65 °C.

15. The method of claim 9, wherein the annealing of the free primers is performed before the low-temperature quenching.

16. The method of claim 9, wherein the single cell is a prenatal cell.

17. The method of claim 9, wherein the single cell is a cancer cell.

18. The method of claim 9, wherein the single cell is a circulating tumor cell.

19. A method of amplifying the genome of one or more cells for non-diagnostic purposes, comprising:
(a) providing genomic DNA derived from the one or more cells in single-stranded form in a reaction vessel;
(b) adding to the reaction vessel primers having a common sequence of 10-30 nucleotides, a variable sequence of 3-7 nucleotides, and a fixed sequence of 2-4 nucleotides to produce a reaction mixture, wherein the common sequence contains G, T, and A but not C;
(c) subjecting the reaction mixture to a low temperature between 0 °C and 10 °C, at which the primers anneal to the single-stranded genomic DNA;
(d) adding to the reaction mixture at least one DNA polymerase having strand displacement activity or 5'-3' exonuclease activity, and subjecting the reaction mixture to a temperature at which DNA amplification occurs, to produce first-round amplicons of the single-stranded genomic DNA having the primer sequence at the 3' end;
(e) subjecting the reaction mixture to a temperature that denatures double-stranded nucleic acid into single-stranded nucleic acid;
(f) subjecting the reaction mixture to a low temperature at which the primers anneal to the single-stranded genomic DNA and to the first-round amplicons;
(g) adding to the reaction mixture at least one DNA polymerase having strand displacement activity or 5'-3' exonuclease activity, or a polymerase having 5' flap endonuclease activity, and subjecting the reaction mixture to a temperature at which DNA amplification occurs, to produce first-round amplicons of the single-stranded genomic DNA having the primer sequence at the 3' end, and second-round amplicons of the first-round amplicons, the second-round amplicons having complementary primer sequences at their 3' and 5' ends;
(h) subjecting the reaction mixture to a temperature that denatures double-stranded nucleic acid into single-stranded nucleic acid;
(i) subjecting the reaction mixture to a temperature between 55 °C and 60 °C at which free primers anneal to the 3' ends of the second-round amplicons to maintain a linear structure, or at which the 3' end of each second-round amplicon anneals to its own 5' end to form a looped structure, so that the second-round amplicons are unavailable as templates for amplification; and
(j) repeating steps (f) to (i) to produce amplicons of the genomic DNA.

20. The method of claim 19, wherein the primers have the following sequences: 5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNGGG-3' and 5'-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNTTT-3'.

21. The method of claim 19, wherein the DNA polymerase comprises at least one of Φ29 polymerase or Bst polymerase.

22. The method of claim 19, wherein the DNA polymerase comprises Φ29 polymerase and Bst polymerase.

23. The method of claim 19, further comprising the step of removing the primer sequences and complementary sequences from the 5' and 3' ends of the amplicons.

24. The method of claim 19, wherein the temperature at which DNA amplification occurs is between 30 °C and 65 °C.

25. The method of claim 19, wherein the annealing of the free primers is performed before the low-temperature quenching.

26. The method of claim 19, wherein the cell is a prenatal cell.

27. The method of claim 19, wherein the cell is a cancer cell.

28. The method of claim 19, wherein the cell is a circulating tumor cell.
HK14112461.9A 2011-05-27 2012-05-22 Methods of amplifying whole genome of a single cell HK1199070B (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US201161490790P 2011-05-27 2011-05-27
US61/490,790 2011-05-27
US201161510539P 2011-07-22 2011-07-22
US61/510,539 2011-07-22
US201161550677P 2011-10-24 2011-10-24
US61/550,677 2011-10-24
US201261621271P 2012-04-06 2012-04-06
US61/621,271 2012-04-06
PCT/US2012/038930 WO2012166425A2 (en) 2011-05-27 2012-05-22 Methods of amplifying whole genome of a single cell

Publications (2)

Publication Number Publication Date
HK1199070A1 HK1199070A1 (en) 2015-06-19
HK1199070B HK1199070B (en) 2020-01-24
